date:20240320

[PATCH] target/ppc: Do not clear MSR[ME] on MCE interrupts to supervisor

2024-03-20 Thread Nicholas Piggin

Hardware clears the MSR[ME] bit when delivering a machine check
interrupt, so that is what QEMU does.

The spapr environment runs in supervisor mode though, and receives
machine check interrupts after they are processed by the hypervisor,
and MSR[ME] must always be enabled in supervisor mode (otherwise it
could checkstop the system). So MSR[ME] must not be cleared when
delivering machine checks to the supervisor.

The fix to prevent supervisor mode from modifying MSR[ME] also
prevented it from re-enabling the incorrectly cleared MSR[ME] bit
when returning from handling the interrupt. Before that fix, the
problem was not very noticeable with well-behaved code. So the
Fixes tag is not strictly correct, but practically they go together.

Found by kvm-unit-tests machine check tests (not yet upstream).

Fixes: 678b6f1af75ef ("target/ppc: Prevent supervisor from modifying MSR[ME]")
Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 80f584f933..674c05a2ce 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1345,9 +1345,10 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
  * clear (e.g., see FWNMI in PAPR).
  */
 new_msr |= (target_ulong)MSR_HVB;
+
+/* HV machine check exceptions don't have ME set */
+new_msr &= ~((target_ulong)1 << MSR_ME);
 }
-/* machine check exceptions don't have ME set */
-new_msr &= ~((target_ulong)1 << MSR_ME);
 
 msr |= env->error_code;
 break;
-- 
2.42.0
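
A minimal sketch of the masking logic after this patch, written as standalone C rather than QEMU code. The names sk_mce_new_msr, SK_MSR_ME, SK_MSR_HV and the has_hv flag are illustrative assumptions, and the bit positions are placeholders for the example. Only the relative placement of the two operations is taken from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch; not taken from the patch. */
#define SK_MSR_ME 12
#define SK_MSR_HV 60

/*
 * Compute the MSR used when delivering a machine check.
 * has_hv is a hypothetical flag standing in for the condition that
 * guards the HV block in powerpc_excp_books().
 */
uint64_t sk_mce_new_msr(uint64_t new_msr, bool has_hv)
{
    if (has_hv) {
        new_msr |= (uint64_t)1 << SK_MSR_HV;
        /* Only machine checks delivered with HV set get ME cleared. */
        new_msr &= ~((uint64_t)1 << SK_MSR_ME);
    }
    /* Without HV (the spapr supervisor case) ME is left untouched. */
    return new_msr;
}
```

With has_hv false, an input that has ME set comes back with ME still set, which is the behaviour the commit message asks for in supervisor mode.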

Re: [PATCH v2] target/riscv: Fix the element agnostic function problem

2024-03-20 Thread LIU Zhiwei




On 2024/3/21 11:58, Huang Tao wrote:

In RVV and vcrypto instructions, the masked and tail elements are set to 1s
using vext_set_elems_1s function if the vma/vta bit is set. It is the element
agnostic policy.

However, this function can't deal with the big-endian situation. This patch
fixes the problem by adding handling for that case.

Signed-off-by: Huang Tao 
Suggested-by: Richard Henderson 
---
Changes in v2:
- Keep the api of vext_set_elems_1s
- Reduce the number of patches.
---
  target/riscv/vector_internals.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 12f5964fbb..3e45b9b4a7 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -30,6 +30,28 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
  if (tot - cnt == 0) {
  return ;
  }
+
+#if HOST_BIG_ENDIAN
+/*
+ * Deal with the situation when the elements are inside
+ * only one uint64 block including setting the
+ * masked-off element.
+ */
+if ((tot - 1) ^ cnt < 8) {
+memset(base + H1(tot - 1), -1, tot - cnt);
+return;
+}
+/*
+ * Otherwise, at least cross two uint64_t blocks.
+ * Set first unaligned block.
+ */
+if (cnt % 8 != 0) {
+uint32_t j = ROUND_UP(cnt, 8);
+memset(base + H1(j - 1), -1, j - cnt);
+cnt = j;
+}
+/* Set other 64bit aligned blocks */
+#endif


Reviewed-by: LIU Zhiwei 

Zhiwei


  memset(base + cnt, -1, tot - cnt);
  }
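
The block-wise approach in the quoted patch can be checked against a byte-at-a-time reference, sketched below as standalone C. It assumes that on a big-endian host the byte index mapping is H1(x) = x ^ 7, modelled here by the hypothetical macro SK_H1, and that tot is a multiple of 8 whenever the range crosses a block boundary. The function names are made up for this sketch, and the single-block test is written with explicit parentheses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed big-endian-host mapping of a logical byte index to memory. */
#define SK_H1(x) ((x) ^ 7)

/* Reference: set logical bytes [cnt, tot) to 0xff one at a time. */
void sk_set_1s_ref(uint8_t *base, uint32_t cnt, uint32_t tot)
{
    for (uint32_t i = cnt; i < tot; i++) {
        base[SK_H1(i)] = 0xff;
    }
}

/* Block-wise version following the structure of the patch. */
void sk_set_1s_blocks(uint8_t *base, uint32_t cnt, uint32_t tot)
{
    assert(cnt <= tot);
    if (tot == cnt) {
        return;
    }
    /* First and last byte live in the same 8-byte block. */
    if (((tot - 1) ^ cnt) < 8) {
        memset(base + SK_H1(tot - 1), 0xff, tot - cnt);
        return;
    }
    /* Fill the rest of the first, partially covered block. */
    if (cnt % 8 != 0) {
        uint32_t j = (cnt + 7) & ~7u;
        memset(base + SK_H1(j - 1), 0xff, j - cnt);
        cnt = j;
    }
    /* Remaining blocks are whole, so byte order inside them is irrelevant. */
    memset(base + cnt, 0xff, tot - cnt);
}
```

Because the mapping is pure index arithmetic, the two functions can be compared on any host, regardless of its real endianness.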

Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-20 Thread Jason Wang

On Thu, Mar 21, 2024 at 4:29 AM Si-Wei Liu  wrote:
>
>
>
> On 3/19/2024 8:25 PM, Jason Wang wrote:
> > On Tue, Mar 19, 2024 at 6:06 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/17/2024 8:20 PM, Jason Wang wrote:
> >>> On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu  wrote:
> 
>  On 3/14/2024 8:50 PM, Jason Wang wrote:
> > On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  
> > wrote:
> >> There could be a mix of both vhost-user and vhost-kernel clients
> >> in the same QEMU process, where separate vhost loggers for the
> >> specific vhost type have to be used. Make the vhost logger per
> >> backend type, and have them properly reference counted.
> > It's better to describe what's the advantage of doing this.
>  Yes, I can add that to the log. Although it's a niche use case, it was
>  actually a long standing limitation / bug that vhost-user and
>  vhost-kernel loggers can't co-exist per QEMU process, but today it's
>  just silent failure that may be ended up with. This bug fix removes that
>  implicit limitation in the code.
> >>> Ok.
> >>>
> >> Suggested-by: Michael S. Tsirkin 
> >> Signed-off-by: Si-Wei Liu 
> >>
> >> ---
> >> v3->v4:
> >>  - remove checking NULL return value from vhost_log_get
> >>
> >> v2->v3:
> >>  - remove non-effective assertion that can never be reached
> >>  - do not return NULL from vhost_log_get()
> >>  - add necessary assertions to vhost_log_get()
> >> ---
> >> hw/virtio/vhost.c | 45 
> >> +
> >> 1 file changed, 33 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 2c9ac79..612f4db 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -43,8 +43,8 @@
> >> do { } while (0)
> >> #endif
> >>
> >> -static struct vhost_log *vhost_log;
> >> -static struct vhost_log *vhost_log_shm;
> >> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >>
> >> /* Memslots used by backends that support private memslots 
> >> (without an fd). */
> >> static unsigned int used_memslots;
> >> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct 
> >> vhost_dev *dev,
> >> r = -1;
> >> }
> >>
> >> +if (r == 0) {
> >> +assert(dev->vhost_ops->backend_type == backend_type);
> >> +}
> >> +
> > Under which condition could we hit this?
>  Just in case some other function inadvertently corrupted this earlier,
>  we have to capture discrepancy in the first place... On the other hand,
>  it will be helpful for other vhost backend writers to diagnose day-one
>  bug in the code. I feel just code comment here will not be
>  sufficient/helpful.
> >>> See below.
> >>>
> > It seems not good to assert a local logic.
>  It seems to me quite a few local asserts are in the same file already,
>  vhost_save_backend_state,
> >>> For example it has assert for
> >>>
> >>> assert(!dev->started);
> >>>
> >>> which is not the logic of the function itself but require
> >>> vhost_dev_start() not to be called before.
> >>>
> >>> But it looks like this patch you assert the code just a few lines
> >>> above the assert itself?
> >> Yes, that was the intent - for e.g. xxx_ops may contain corrupted
> >> xxx_ops.backend_type already before coming to this
> >> vhost_set_backend_type() function. And we may capture this corrupted
> >> state by asserting the expected xxx_ops.backend_type (to be consistent
> >> with the backend_type passed in),
> > This can happen for all variables. Not sure why backend_ops is special.
> The assert is just checking the backend_type field only. The other op
> fields in backend_ops have similar assert within the op function itself
> also. For e.g. vhost_user_requires_shm_log() and a lot of other
> vhost_user ops have the following:
>
>  assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>
> vhost_vdpa_vq_get_addr() and a lot of other vhost_vdpa ops have:
>
>  assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>
> vhost_kernel ops has similar assertions as well.
>
> The reason why it has to be checked against here is now the callers of
> vhost_log_get(), would pass in dev->vhost_ops->backend_type to the API,
> which are unable to verify the validity of the backend_type by
> themselves. The vhost_log_get() has necessary asserts to make bound
> check for the vhost_log[] or vhost_log_shm[] array, but specific assert
> against the exact backend type in vhost_set_backend_type() will further
> harden the implementation in vhost_log_get() and other backend ops.

As discussed, those assertions are to make sure of the logic
dependencies of other functions. (The assignment
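
For readers following the thread, the idea of one reference-counted logger per backend type, with bounds asserts in the getter, can be sketched roughly as follows. This is standalone C with hypothetical names (sk_log, sk_log_get, sk_log_put, the SK_BACKEND_* enum), not the vhost code itself, and it leaves out the shared-memory variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backend type enum with a NONE..MAX layout. */
enum sk_backend_type {
    SK_BACKEND_NONE = 0,
    SK_BACKEND_KERNEL,
    SK_BACKEND_USER,
    SK_BACKEND_MAX,
};

struct sk_log {
    size_t size;
    unsigned refcnt;
};

/* One logger slot per backend type instead of a single global. */
static struct sk_log *sk_logs[SK_BACKEND_MAX];

struct sk_log *sk_log_get(enum sk_backend_type type, size_t size)
{
    /* Bounds check before indexing the per-type array. */
    assert(type > SK_BACKEND_NONE && type < SK_BACKEND_MAX);

    struct sk_log *log = sk_logs[type];
    if (!log || log->size != size) {
        /* No logger of this size for this backend yet: create one. */
        log = calloc(1, sizeof(*log));
        assert(log);
        log->size = size;
        log->refcnt = 1;
        sk_logs[type] = log;
    } else {
        log->refcnt++;
    }
    return log;
}

void sk_log_put(enum sk_backend_type type, struct sk_log *log)
{
    assert(type > SK_BACKEND_NONE && type < SK_BACKEND_MAX);
    assert(log && log->refcnt > 0);
    if (--log->refcnt == 0) {
        if (sk_logs[type] == log) {
            sk_logs[type] = NULL;
        }
        free(log);
    }
}
```

Two devices of the same backend type share one logger, while a device of another type gets its own, so the two kinds of logger can coexist in one process.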

[PATCH v2] target/riscv: Fix the element agnostic function problem

2024-03-20 Thread Huang Tao

In RVV and vcrypto instructions, the masked and tail elements are set to 1s
using vext_set_elems_1s function if the vma/vta bit is set. It is the element
agnostic policy.

However, this function can't deal with the big-endian situation. This patch
fixes the problem by adding handling for that case.

Signed-off-by: Huang Tao 
Suggested-by: Richard Henderson 
---
Changes in v2:
- Keep the api of vext_set_elems_1s
- Reduce the number of patches.
---
 target/riscv/vector_internals.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 12f5964fbb..3e45b9b4a7 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -30,6 +30,28 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
 if (tot - cnt == 0) {
 return ;
 }
+
+#if HOST_BIG_ENDIAN
+/*
+ * Deal with the situation when the elements are inside
+ * only one uint64 block including setting the
+ * masked-off element.
+ */
+if ((tot - 1) ^ cnt < 8) {
+memset(base + H1(tot - 1), -1, tot - cnt);
+return;
+}
+/*
+ * Otherwise, at least cross two uint64_t blocks.
+ * Set first unaligned block.
+ */
+if (cnt % 8 != 0) {
+uint32_t j = ROUND_UP(cnt, 8);
+memset(base + H1(j - 1), -1, j - cnt);
+cnt = j;
+}
+/* Set other 64bit aligned blocks */
+#endif
 memset(base + cnt, -1, tot - cnt);
 }
 
-- 
2.41.0
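
One point that may be worth checking in the hunk above is the condition "(tot - 1) ^ cnt < 8". In C the relational operator binds tighter than bitwise XOR, so the expression parses as "(tot - 1) ^ (cnt < 8)". The standalone sketch below, with hypothetical helper names, contrasts that parse with the check the comment appears to describe. It is a reading of the C precedence rules, not something raised in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* How C parses "(tot - 1) ^ cnt < 8": the comparison is evaluated first. */
bool sk_as_written(uint32_t tot, uint32_t cnt)
{
    return ((tot - 1) ^ (uint32_t)(cnt < 8)) != 0;
}

/* The presumably intended check: both ends fall in one 8-byte block. */
bool sk_same_block(uint32_t tot, uint32_t cnt)
{
    return ((tot - 1) ^ cnt) < 8;
}
```

For tot = 16 and cnt = 3 the range crosses a block boundary, so sk_same_block() is false, yet sk_as_written() is true and the single-block path would be taken.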

Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-20 Thread Jason Wang

On Thu, Mar 21, 2024 at 5:03 AM Si-Wei Liu  wrote:
>
>
>
> On 3/19/2024 8:27 PM, Jason Wang wrote:
> > On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/17/2024 8:22 PM, Jason Wang wrote:
> >>> On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  wrote:
> 
>  On 3/14/2024 9:03 PM, Jason Wang wrote:
> > On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  
> > wrote:
> >> On setups with one or more virtio-net devices with vhost on,
> >> the cost of each dirty tracking iteration increases with the
> >> number of queues set up. E.g. on idle guest migration the
> >> following is observed with virtio-net with vhost=on:
> >>
> >> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>
> >> With high memory rates the symptom is lack of convergence as soon
> >> as it has a vhost device with a sufficiently high number of queues,
> >> or a sufficient number of vhost devices.
> >>
> >> On every migration iteration (every 100msecs) it will redundantly
> >> query the *shared log* the number of queues configured with vhost
> >> that exist in the guest. For the virtqueue data, this is necessary,
> >> but not for the memory sections which are the same. So essentially
> >> we end up scanning the dirty log too often.
> >>
> >> To fix that, select a vhost device responsible for scanning the
> >> log with regards to memory sections dirty tracking. It is selected
> >> when we enable the logger (during migration) and cleared when we
> >> disable the logger. If the vhost logger device goes away for some
> >> reason, the logger will be re-selected from the rest of vhost
> >> devices.
> >>
> >> After making mem-section logger a singleton instance, constant cost
> >> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >> queues or how many vhost devices are configured:
> >>
> >> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>
> >> Co-developed-by: Joao Martins 
> >> Signed-off-by: Joao Martins 
> >> Signed-off-by: Si-Wei Liu 
> >>
> >> ---
> >> v3 -> v4:
> >>  - add comment to clarify effect on cache locality and
> >>performance
> >>
> >> v2 -> v3:
> >>  - add after-fix benchmark to commit log
> >>  - rename vhost_log_dev_enabled to vhost_dev_should_log
> >>  - remove unneeded comparisons for backend_type
> >>  - use QLIST array instead of single flat list to store vhost
> >>logger devices
> >>  - simplify logger election logic
> >> ---
> >> hw/virtio/vhost.c | 67 
> >> ++-
> >> include/hw/virtio/vhost.h |  1 +
> >> 2 files changed, 62 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 612f4db..58522f1 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -45,6 +45,7 @@
> >>
> >> static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >> static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
> >>
> >> /* Memslots used by backends that support private memslots 
> >> (without an fd). */
> >> static unsigned int used_memslots;
> >> @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
> >> }
> >> }
> >>
> >> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> >> +{
> >> +assert(dev->vhost_ops);
> >> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
> >> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
> >> +
> >> +return dev == 
> >> QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
> > A dumb question, why not simple check
> >
> > dev->log == vhost_log_shm[dev->vhost_ops->backend_type]
>  Because we are not sure if the logger comes from vhost_log_shm[] or
>  vhost_log[]. Don't want to complicate the check here by calling into
>  vhost_dev_log_is_shared() everytime when the .log_sync() is called.
> >>> It has very low overhead, isn't it?
> >> Whether this has low overhead will have to depend on the specific
> >> backend's implementation for .vhost_requires_shm_log(), which the common
> >> vhost layer should not assume upon or rely on the current implementation.
> >>
> >>> static bool vhost_dev_log_is_shared(struct vhost_dev *dev)
> >>> {
> >>>   return dev->vhost_ops->vhost_requires_shm_log &&
> >>>
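
The election scheme discussed above, where the first device in a per-backend list is the only one that scans memory sections, can be sketched in standalone C as follows. The sketch uses a plain singly linked list in place of QLIST, inserts at the head, and all names (sk_dev, sk_log_devs, sk_log_dev_add, sk_log_dev_remove) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SK_TYPE_NONE = 0, SK_TYPE_KERNEL, SK_TYPE_USER, SK_TYPE_MAX };

struct sk_dev {
    int backend_type;
    struct sk_dev *next;   /* plain list link, standing in for QLIST */
};

/* The head of each list is the elected memory-section logger. */
static struct sk_dev *sk_log_devs[SK_TYPE_MAX];

bool sk_dev_should_log(const struct sk_dev *dev)
{
    assert(dev->backend_type > SK_TYPE_NONE);
    assert(dev->backend_type < SK_TYPE_MAX);
    return dev == sk_log_devs[dev->backend_type];
}

/* Called when logging is enabled for dev. */
void sk_log_dev_add(struct sk_dev *dev)
{
    dev->next = sk_log_devs[dev->backend_type];
    sk_log_devs[dev->backend_type] = dev;
}

/* Called when logging is disabled or dev goes away. */
void sk_log_dev_remove(struct sk_dev *dev)
{
    struct sk_dev **p = &sk_log_devs[dev->backend_type];

    while (*p && *p != dev) {
        p = &(*p)->next;
    }
    if (*p) {
        *p = dev->next;    /* the next device, if any, becomes the logger */
        dev->next = NULL;
    }
}
```

The check in sk_dev_should_log() is a single pointer comparison and does not call into any backend op, which is the property argued for in the reply above.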

Re: [PATCH v1] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'

2024-03-20 Thread gaosong


On 2024/3/21 at 10:50 AM, Richard Henderson wrote:

On 3/20/24 16:11, Song Gao wrote:

qemu-system-loongarch64 fails an assertion when run with the option '-d int':
helper_idle() raises the exception EXCP_HLT, but no name is defined for that
exception.


Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c | 75 ++
  1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..17a923de02 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,46 @@ const char * const fregnames[32] = {
  "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
  };
  -static const char * const excp_names[] = {
-    [EXCCODE_INT] = "Interrupt",
-    [EXCCODE_PIL] = "Page invalid exception for load",
-    [EXCCODE_PIS] = "Page invalid exception for store",
-    [EXCCODE_PIF] = "Page invalid exception for fetch",
-    [EXCCODE_PME] = "Page modified exception",
-    [EXCCODE_PNR] = "Page Not Readable exception",
-    [EXCCODE_PNX] = "Page Not Executable exception",
-    [EXCCODE_PPI] = "Page Privilege error",
-    [EXCCODE_ADEF] = "Address error for instruction fetch",
-    [EXCCODE_ADEM] = "Address error for Memory access",
-    [EXCCODE_SYS] = "Syscall",
-    [EXCCODE_BRK] = "Break",
-    [EXCCODE_INE] = "Instruction Non-Existent",
-    [EXCCODE_IPE] = "Instruction privilege error",
-    [EXCCODE_FPD] = "Floating Point Disabled",
-    [EXCCODE_FPE] = "Floating Point Exception",
-    [EXCCODE_DBP] = "Debug breakpoint",
-    [EXCCODE_BCE] = "Bound Check Exception",
-    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-    [EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+    int32_t exccode;
+    const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+    {EXCCODE_INT, "Interrupt"},
+    {EXCCODE_PIL, "Page invalid exception for load"},
+    {EXCCODE_PIS, "Page invalid exception for store"},
+    {EXCCODE_PIF, "Page invalid exception for fetch"},
+    {EXCCODE_PME, "Page modified exception"},
+    {EXCCODE_PNR, "Page Not Readable exception"},
+    {EXCCODE_PNX, "Page Not Executable exception"},
+    {EXCCODE_PPI, "Page Privilege error"},
+    {EXCCODE_ADEF, "Address error for instruction fetch"},
+    {EXCCODE_ADEM, "Address error for Memory access"},
+    {EXCCODE_SYS, "Syscall"},
+    {EXCCODE_BRK, "Break"},
+    {EXCCODE_INE, "Instruction Non-Existent"},
+    {EXCCODE_IPE, "Instruction privilege error"},
+    {EXCCODE_FPD, "Floating Point Disabled"},
+    {EXCCODE_FPE, "Floating Point Exception"},
+    {EXCCODE_DBP, "Debug breakpoint"},
+    {EXCCODE_BCE, "Bound Check Exception"},
+    {EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+    {EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
  };
    const char *loongarch_exception_name(int32_t exception)
  {
-    assert(excp_names[exception]);
-    return excp_names[exception];
+    int i;
+    const char *name = "unknown";
+
+    for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+    if (excp_names[i].exccode == exception) {
+    name = excp_names[i].name;
+    break;
+    }
+    }
+    return name;
  }


I think you should return null for unknown, and then...


    void G_NORETURN do_raise_exception(CPULoongArchState *env,
@@ -79,11 +92,17 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,

 uintptr_t pc)
  {
  CPUState *cs = env_cpu(env);
+    const char *name;
  +    if (exception == EXCP_HLT) {
+    name = "EXCP_HLT";
+    } else {
+    name = loongarch_exception_name(exception);
+    }
  qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
    __func__,
    exception,
-  loongarch_exception_name(exception));
+  name);


... use two different printfs, one of which prints the exception number.
Why would you special case HLT here instead of putting it in the table?


Hmm, putting HLT in the table is no problem. I will correct it.

I did not consider HLT a real exception in the LoongArch architecture, so
I didn't put it in the table.


Thanks.
Song Gao


r~

Re: [PATCH v1] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'

2024-03-20 Thread Richard Henderson


On 3/20/24 16:11, Song Gao wrote:

qemu-system-loongarch64 fails an assertion when run with the option '-d int':
helper_idle() raises the exception EXCP_HLT, but no name is defined for that
exception.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c | 75 ++
  1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..17a923de02 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,46 @@ const char * const fregnames[32] = {
  "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
  };
  
-static const char * const excp_names[] = {

-[EXCCODE_INT] = "Interrupt",
-[EXCCODE_PIL] = "Page invalid exception for load",
-[EXCCODE_PIS] = "Page invalid exception for store",
-[EXCCODE_PIF] = "Page invalid exception for fetch",
-[EXCCODE_PME] = "Page modified exception",
-[EXCCODE_PNR] = "Page Not Readable exception",
-[EXCCODE_PNX] = "Page Not Executable exception",
-[EXCCODE_PPI] = "Page Privilege error",
-[EXCCODE_ADEF] = "Address error for instruction fetch",
-[EXCCODE_ADEM] = "Address error for Memory access",
-[EXCCODE_SYS] = "Syscall",
-[EXCCODE_BRK] = "Break",
-[EXCCODE_INE] = "Instruction Non-Existent",
-[EXCCODE_IPE] = "Instruction privilege error",
-[EXCCODE_FPD] = "Floating Point Disabled",
-[EXCCODE_FPE] = "Floating Point Exception",
-[EXCCODE_DBP] = "Debug breakpoint",
-[EXCCODE_BCE] = "Bound Check Exception",
-[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-[EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+int32_t exccode;
+const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+{EXCCODE_INT, "Interrupt"},
+{EXCCODE_PIL, "Page invalid exception for load"},
+{EXCCODE_PIS, "Page invalid exception for store"},
+{EXCCODE_PIF, "Page invalid exception for fetch"},
+{EXCCODE_PME, "Page modified exception"},
+{EXCCODE_PNR, "Page Not Readable exception"},
+{EXCCODE_PNX, "Page Not Executable exception"},
+{EXCCODE_PPI, "Page Privilege error"},
+{EXCCODE_ADEF, "Address error for instruction fetch"},
+{EXCCODE_ADEM, "Address error for Memory access"},
+{EXCCODE_SYS, "Syscall"},
+{EXCCODE_BRK, "Break"},
+{EXCCODE_INE, "Instruction Non-Existent"},
+{EXCCODE_IPE, "Instruction privilege error"},
+{EXCCODE_FPD, "Floating Point Disabled"},
+{EXCCODE_FPE, "Floating Point Exception"},
+{EXCCODE_DBP, "Debug breakpoint"},
+{EXCCODE_BCE, "Bound Check Exception"},
+{EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+{EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
  };
  
  const char *loongarch_exception_name(int32_t exception)

  {
-assert(excp_names[exception]);
-return excp_names[exception];
+int i;
+const char *name = "unknown";
+
+for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+if (excp_names[i].exccode == exception) {
+name = excp_names[i].name;
+break;
+}
+}
+return name;
  }


I think you should return null for unknown, and then...

  
  void G_NORETURN do_raise_exception(CPULoongArchState *env,

@@ -79,11 +92,17 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
 uintptr_t pc)
  {
  CPUState *cs = env_cpu(env);
+const char *name;
  
+if (exception == EXCP_HLT) {

+name = "EXCP_HLT";
+} else {
+name = loongarch_exception_name(exception);
+}
  qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
__func__,
exception,
-  loongarch_exception_name(exception));
+  name);


... use two different printfs, one of which prints the exception number.
Why would you special case HLT here instead of putting it in the table?


r~
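
A rough sketch of what the review comments above ask for: the lookup returns NULL for an unknown code, HLT sits in the table like any other entry, and the caller picks between two formats, one of which prints the raw number. This is standalone C. The SK_* codes are placeholder values, the table is cut down to three entries, and snprintf into a buffer stands in for qemu_log_mask().

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder codes for this sketch; the real values live in QEMU headers. */
enum { SK_EXCCODE_INT = 0, SK_EXCCODE_SYS = 11, SK_EXCP_HLT = 0x10001 };

struct sk_excp {
    int32_t exccode;
    const char *name;
};

static const struct sk_excp sk_excp_names[] = {
    { SK_EXCCODE_INT, "Interrupt" },
    { SK_EXCCODE_SYS, "Syscall" },
    { SK_EXCP_HLT,    "EXCP_HLT" },   /* in the table, no special case */
};

/* Returns NULL when the code is not in the table. */
const char *sk_exception_name(int32_t exception)
{
    size_t n = sizeof(sk_excp_names) / sizeof(sk_excp_names[0]);

    for (size_t i = 0; i < n; i++) {
        if (sk_excp_names[i].exccode == exception) {
            return sk_excp_names[i].name;
        }
    }
    return NULL;
}

/* Two formats: the name when known, the raw number otherwise. */
int sk_format_exception(char *buf, size_t len, int32_t exception)
{
    const char *name = sk_exception_name(exception);

    if (name) {
        return snprintf(buf, len, "do_raise_exception: %s", name);
    }
    return snprintf(buf, len, "do_raise_exception: unknown exception %d",
                    (int)exception);
}
```

A linear search over code and name pairs also avoids indexing an array by a large code such as the HLT value, which is what the original designated-initializer table could not handle.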

Re: [PATCH v3 1/1] target/i386: Enable page walking from MMIO memory

2024-03-20 Thread Richard Henderson


Paolo, ping!

On 3/13/24 09:30, Richard Henderson wrote:

On 3/7/24 05:53, Jonathan Cameron wrote:

From: Gregory Price 

CXL emulation of interleave requires read and write hooks due to
requirement for subpage granularity. The Linux kernel stack now enables
using this memory as conventional memory in a separate NUMA node. If a
process is deliberately forced to run from that node
$ numactl --membind=1 ls
the page table walk on i386 fails.

Useful part of backtrace:

 (cpu=cpu@entry=0x56fd9000, fmt=fmt@entry=0x55fe3378 "cpu_io_recompile: 
could not find TB for pc=%p")

 at ../../cpu-target.c:359
 (retaddr=0, addr=19595792376, attrs=..., xlat=, cpu=0x56fd9000, 
out_offset=)

 at ../../accel/tcg/cputlb.c:1339
 (cpu=0x56fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, 
size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
 (cpu=cpu@entry=0x56fd9000, p=p@entry=0x756fddc0, mmu_idx=, 
type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
../../accel/tcg/cputlb.c:2356
 (cpu=cpu@entry=0x56fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, 
ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at 
../../accel/tcg/cputlb.c:2439

 at ../../accel/tcg/ldst_common.c.inc:301
 at ../../target/i386/tcg/sysemu/excp_helper.c:173
 (err=0x756fdf80, out=0x756fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, 
addr=18446744072116178925, env=0x56fdb7c0)

 at ../../target/i386/tcg/sysemu/excp_helper.c:578
 (cs=0x56fd9000, addr=18446744072116178925, size=, 
access_type=MMU_INST_FETCH, mmu_idx=0, probe=, retaddr=0) at 
../../target/i386/tcg/sysemu/excp_helper.c:604


Avoid this by plumbing the address all the way down from
x86_cpu_tlb_fill(), where it is available as retaddr, to the actual accessors,
which provide it to probe_access_full(), which already handles MMIO accesses.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Suggested-by: Peter Maydell 
Signed-off-by: Gregory Price 
Signed-off-by: Jonathan Cameron 
---
v3: No change.


Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2180
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2220


r~

Re: [PATCH-for-9.1 22/27] target/s390x: Convert to TCGCPUOps::get_cpu_state()

2024-03-20 Thread Richard Henderson


On 3/19/24 21:09, Philippe Mathieu-Daudé wrote:

On 19/3/24 22:05, Richard Henderson wrote:

On 3/19/24 05:42, Philippe Mathieu-Daudé wrote:

Convert cpu_get_tb_cpu_state() to TCGCPUOps::get_cpu_state().

Note, now s390x_get_cpu_state() is restricted to TCG.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/s390x/cpu.h    | 30 --
  target/s390x/s390x-internal.h |  2 ++
  target/s390x/cpu.c    |  1 +
  target/s390x/tcg/mem_helper.c |  2 +-
  target/s390x/tcg/translate.c  | 23 +++
  5 files changed, 27 insertions(+), 31 deletions(-)


Why is the function in translate.c, not cpu.c (with or without ifdefs)?


My understanding is target/foo/tcg/ is better for TCG-specific handlers,
less #ifdef'ry and stubs. Then bar_helper.c are meant for TCG helpers
(including "exec/helper-proto.h").

Can you think of a better file (new name?) in tcg/ or do you rather
keep it in the main cpu.c?


Given that all other targets to this point used cpu.c, I would prefer s390x and sparc to 
not be the only exceptions.



r~

[PATCH v1] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'

2024-03-20 Thread Song Gao

qemu-system-loongarch64 fails an assertion when run with the option '-d int':
helper_idle() raises the exception EXCP_HLT, but no name is defined for that
exception.

Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c | 75 ++
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..17a923de02 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,46 @@ const char * const fregnames[32] = {
 "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
 };
 
-static const char * const excp_names[] = {
-[EXCCODE_INT] = "Interrupt",
-[EXCCODE_PIL] = "Page invalid exception for load",
-[EXCCODE_PIS] = "Page invalid exception for store",
-[EXCCODE_PIF] = "Page invalid exception for fetch",
-[EXCCODE_PME] = "Page modified exception",
-[EXCCODE_PNR] = "Page Not Readable exception",
-[EXCCODE_PNX] = "Page Not Executable exception",
-[EXCCODE_PPI] = "Page Privilege error",
-[EXCCODE_ADEF] = "Address error for instruction fetch",
-[EXCCODE_ADEM] = "Address error for Memory access",
-[EXCCODE_SYS] = "Syscall",
-[EXCCODE_BRK] = "Break",
-[EXCCODE_INE] = "Instruction Non-Existent",
-[EXCCODE_IPE] = "Instruction privilege error",
-[EXCCODE_FPD] = "Floating Point Disabled",
-[EXCCODE_FPE] = "Floating Point Exception",
-[EXCCODE_DBP] = "Debug breakpoint",
-[EXCCODE_BCE] = "Bound Check Exception",
-[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-[EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+int32_t exccode;
+const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+{EXCCODE_INT, "Interrupt"},
+{EXCCODE_PIL, "Page invalid exception for load"},
+{EXCCODE_PIS, "Page invalid exception for store"},
+{EXCCODE_PIF, "Page invalid exception for fetch"},
+{EXCCODE_PME, "Page modified exception"},
+{EXCCODE_PNR, "Page Not Readable exception"},
+{EXCCODE_PNX, "Page Not Executable exception"},
+{EXCCODE_PPI, "Page Privilege error"},
+{EXCCODE_ADEF, "Address error for instruction fetch"},
+{EXCCODE_ADEM, "Address error for Memory access"},
+{EXCCODE_SYS, "Syscall"},
+{EXCCODE_BRK, "Break"},
+{EXCCODE_INE, "Instruction Non-Existent"},
+{EXCCODE_IPE, "Instruction privilege error"},
+{EXCCODE_FPD, "Floating Point Disabled"},
+{EXCCODE_FPE, "Floating Point Exception"},
+{EXCCODE_DBP, "Debug breakpoint"},
+{EXCCODE_BCE, "Bound Check Exception"},
+{EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+{EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
 };
 
 const char *loongarch_exception_name(int32_t exception)
 {
-assert(excp_names[exception]);
-return excp_names[exception];
+int i;
+const char *name = "unknown";
+
+for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+if (excp_names[i].exccode == exception) {
+name = excp_names[i].name;
+break;
+}
+}
+return name;
 }
 
 void G_NORETURN do_raise_exception(CPULoongArchState *env,
@@ -79,11 +92,17 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
uintptr_t pc)
 {
 CPUState *cs = env_cpu(env);
+const char *name;
 
+if (exception == EXCP_HLT) {
+name = "EXCP_HLT";
+} else {
+name = loongarch_exception_name(exception);
+}
 qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
   __func__,
   exception,
-  loongarch_exception_name(exception));
+  name);
 cs->exception_index = exception;
 
 cpu_loop_exit_restore(cs, pc);
@@ -159,13 +178,11 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 uint32_t vec_size = FIELD_EX64(env->CSR_ECFG, CSR_ECFG, VS);
 
 if (cs->exception_index != EXCCODE_INT) {
-if (cs->exception_index < 0 ||
-cs->exception_index >= ARRAY_SIZE(excp_names)) {
-name = "unknown";
+if (cs->exception_index == EXCP_HLT) {
+name = "EXCP_HLT";
 } else {
-name = excp_names[cs->exception_index];
+name = loongarch_exception_name(cs->exception_index);
 }
-
 qemu_log_mask(CPU_LOG_INT,
  "%s enter: pc " TARGET_FMT_lx " ERA " TARGET_FMT_lx
  " TLBRERA " TARGET_FMT_lx " %s exception\n", __func__,
-- 
2.25.1

[PATCH v4] ui/gtk: flush display pipeline before saving vmstate when blob=true

2024-03-20 Thread dongwon . kim

From: Dongwon Kim 

The current scanout frame must be completed before the guest's run-state
transitions to save. Otherwise the guest may, upon resume, keep waiting
for the response to the resource flush of the old scanout frame.

v2: Giving some time for the fence to be signaled before flushing
the pipeline

v3: Prevent redundant calls of gd_hw_gl_flushed by checking dmabuf
and fence_fd >= 0 in it (e.g. during and after eglClientWaitSync
in gd_change_runstate)

v4: Rewrote the commit msg

Create fence_fd in the same function where sync is created, to
handle the case where a valid sync is created but creating
fence_fd fails.

0 is a valid fd so any fence_fd > -1 for the fence in draw function
in gtk-egl.c and gtk-gl-area.c will be considered valid

egl_sync and fence_fd for it are created in the same function

Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/egl-helpers.c | 16 ++--
 ui/gtk-egl.c | 10 ++
 ui/gtk-gl-area.c |  9 ++---
 ui/gtk.c | 31 +++
 4 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 3d19dbe382..b6a8169ffc 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -376,20 +376,16 @@ void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
 EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
 if (sync != EGL_NO_SYNC_KHR) {
 dmabuf->sync = sync;
+dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
+  dmabuf->sync);
+if (dmabuf->fence_fd < 0) {
+eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
+dmabuf->sync = NULL;
+}
 }
 }
 }
 
-void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
-{
-if (dmabuf->sync) {
-dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
-  dmabuf->sync);
-eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
-dmabuf->sync = NULL;
-}
-}
-
 #endif /* CONFIG_GBM */
 
 /* -- */
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 3af5ac5bcf..683a87c6b3 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -98,8 +98,8 @@ void gd_egl_draw(VirtualConsole *vc)
 glFlush();
 #ifdef CONFIG_GBM
 if (dmabuf) {
-egl_dmabuf_create_fence(dmabuf);
-if (dmabuf->fence_fd > 0) {
+egl_dmabuf_create_sync(dmabuf);
+if (dmabuf->fence_fd > -1) {
 qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
 return;
 }
@@ -348,12 +348,6 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb, !vc->gfx.y0_top);
 }
 
-#ifdef CONFIG_GBM
-if (vc->gfx.guest_fb.dmabuf) {
-egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
-}
-#endif
-
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 52dcac161e..7791498646 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -77,16 +77,11 @@ void gd_gl_area_draw(VirtualConsole *vc)
 glBlitFramebuffer(0, y1, vc->gfx.w, y2,
   0, 0, ww, wh,
   GL_COLOR_BUFFER_BIT, GL_NEAREST);
-#ifdef CONFIG_GBM
-if (dmabuf) {
-egl_dmabuf_create_sync(dmabuf);
-}
-#endif
 glFlush();
 #ifdef CONFIG_GBM
 if (dmabuf) {
-egl_dmabuf_create_fence(dmabuf);
-if (dmabuf->fence_fd > 0) {
+egl_dmabuf_create_sync(dmabuf);
+if (dmabuf->fence_fd > -1) {
 qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
 return;
 }
diff --git a/ui/gtk.c b/ui/gtk.c
index 810d7fc796..bbe05a0baf 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -597,10 +597,14 @@ void gd_hw_gl_flushed(void *vcon)
 VirtualConsole *vc = vcon;
 QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
 
-qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
-close(dmabuf->fence_fd);
-dmabuf->fence_fd = -1;
-graphic_hw_gl_block(vc->gfx.dcl.con, false);
+if (dmabuf && dmabuf->fence_fd > -1) {
+qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
+close(dmabuf->fence_fd);
+dmabuf->fence_fd = -1;
+eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
+dmabuf->sync = NULL;
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
 }
 
 /** DisplayState Callbacks (opengl version) **/
@@ -678,6 +682,25 @@ static const DisplayGLCtxOps egl_ctx_ops = {
 static void gd_change_runstate(void *opaque, bool running, RunState state)
 {
 GtkDisplayState *s = opaque;
+int i;
+
+if (state == RUN_STATE_SAVE_VM) {
+

RE: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression

2024-03-20 Thread Liu, Yuan1

> -Original Message-
> From: Peter Xu 
> Sent: Thursday, March 21, 2024 4:32 AM
> To: Liu, Yuan1 
> Cc: Daniel P. Berrangé ; faro...@suse.de; qemu-
> de...@nongnu.org; hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou,
> Nanhai 
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
> 
> On Wed, Mar 20, 2024 at 04:23:01PM +, Liu, Yuan1 wrote:
> > let me explain here, during the decompression operation of IAA, the
> > decompressed data can be directly output to the virtual address of the
> > guest memory by IAA hardware.  It can avoid copying the decompressed data
> > to guest memory by CPU.
> 
> I see.
> 
> > Without -mem-prealloc, all the guest memory is not populated, and IAA
> > hardware needs to trigger I/O page fault first and then output the
> > decompressed data to the guest memory region.  Besides that, CPU page
> > faults will also trigger IOTLB flush operation when IAA devices use SVM.
> 
> Oh so the IAA hardware already can use CPU pgtables?  Nice..
> 
> Why IOTLB flush is needed?  AFAIU we're only installing new pages, the
> request can either come from a CPU access or a DMA.  In all cases there
> should have no tearing down of an old page.  Isn't an iotlb flush only
> needed if a tear down happens?

As far as I know, IAA hardware uses SVM to share the CPU's page table for
address translation (IOMMU scalable mode accesses the CPU page table
directly). Therefore, when the CPU page table changes, an invalidation
must be triggered to update the IOMMU's and the device's caches.

My current kernel version is mainline 6.2. The call chain I see is as follows:
-- handle_mm_fault
   |
   -- wp_page_copy
      |
      -- mmu_notifier_invalidate_range
         |
         -- intel_invalidate_range
            |
            -- qi_flush_piotlb
            -- qi_flush_dev_iotlb_pasid
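
The cost described here comes from guest pages being populated lazily.
As a rough illustration only, the sketch below shows in plain POSIX C
what populating memory up front amounts to: every page is written once
by the CPU before any device touches it. alloc_prefaulted() is a
hypothetical helper, not QEMU code, and it is only an approximation of
what -mem-prealloc achieves.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not QEMU code: map `len` bytes of anonymous
 * memory and write one byte per page so that every page is populated
 * before it is handed to a device. Returns the mapping, or NULL on
 * failure; *pages receives the number of pages touched. */
void *alloc_prefaulted(size_t len, size_t *pages)
{
    long psz = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    size_t off, n = 0;

    if (psz <= 0 || len == 0) {
        return NULL;
    }

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }

    for (off = 0; off < len; off += (size_t)psz) {
        p[off] = 0;     /* the CPU write fault populates this page now */
        n++;
    }

    if (pages) {
        *pages = n;
    }
    return p;
}
```

Once every page has been touched this way, later accesses to the region
should no longer need to fault pages in.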
 

> > Due to the inability to quickly resolve a large number of IO page faults
> > and IOTLB flushes, the decompression throughput of the IAA device will
> > decrease significantly.
> 
> --
> Peter Xu

Re: [External] Re: [PATCH v3 1/2] memory tier: dax/kmem: create CPUless memory tiers after obtaining HMAT info

2024-03-20 Thread Ho-Ren (Jack) Chuang

On Wed, Mar 20, 2024 at 12:15 AM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > More details:
>
> You have done several things in one patch.  So you need "more details".
> You may separate them into multiple patches.  One for each "*" below.
> But I have no strong opinion on that.
>
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Abstract common functions into `mt_find_alloc_memory_type()`
> > Since different memory devices require finding or allocating a memory type,
> > these common steps are abstracted into a single function,
> > `mt_find_alloc_memory_type()`, enhancing code scalability and conciseness.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Change adist calculation code to use another new lock, `mt_perf_lock`.
> > In the current implementation, iterating through CPUless nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `mt_perf_lock` to protect
> > `default_dram_perf`. This approach not only avoids deadlock but also
> > prevents holding a large lock simultaneously.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > ---
> >  drivers/dax/kmem.c   | 13 +
> >  include/linux/memory-tiers.h |  7 +++
> >  mm/memory-tiers.c| 94 +---
> >  3 files changed, 95 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 42ee360cf4e3..de1333aa7b3e 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
> >
> >  static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
> >  {
> > - bool found = false;
> >   struct memory_dev_type *mtype;
> >
> >   mutex_lock(&kmem_memory_type_lock);
> > - list_for_each_entry(mtype, &kmem_memory_types, list) {
> > - if (mtype->adistance == adist) {
> > - found = true;
> > - break;
> > - }
> > - }
> > - if (!found) {
> > - mtype = alloc_memory_type(adist);
> > - if (!IS_ERR(mtype))
> > - list_add(&mtype->list, &kmem_memory_types);
> > - }
> > + mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
> >   mutex_unlock(&kmem_memory_type_lock);
> >
> >   return mtype;
>
> It seems that there's some miscommunication about my previous comments
> about this.  What I suggested is to create one separate patch, which
> moves mt_find_alloc_memory_type() and mt_put_memory_types() into
> memory-tiers.c.  And make this patch the first one of the series.
>

I will make mt_find_alloc/mt_put_memory_type changes as
a separate patch and the first of my patch series. Thanks.
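
For readers following along, the find-or-allocate step being factored
out can be sketched in userspace C. This is only an illustration under
stated assumptions: struct mem_type, its singly linked list, and
find_alloc_mem_type() are stand-ins invented here, not the kernel's
struct memory_dev_type, list_head, or mt_find_alloc_memory_type().

```c
#include <stdlib.h>

/* Userspace stand-in for the kernel's memory type; the adistance field
 * mirrors the quoted diff, the singly linked list is an assumption. */
struct mem_type {
    int adistance;
    struct mem_type *next;
};

/* Return the entry whose adistance matches, or allocate a new one and
 * push it onto the list. Returns NULL if allocation fails. The caller
 * is expected to hold whatever lock protects *head. */
struct mem_type *find_alloc_mem_type(int adist, struct mem_type **head)
{
    struct mem_type *t;

    for (t = *head; t != NULL; t = t->next) {
        if (t->adistance == adist) {
            return t;       /* reuse the existing type */
        }
    }

    t = malloc(sizeof(*t));
    if (t == NULL) {
        return NULL;
    }
    t->adistance = adist;
    t->next = *head;
    *head = t;
    return t;
}
```

Calling it twice with the same adistance returns the same entry, which
is the property the callers in dax/kmem rely on.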


> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> > index 69e781900082..b2135334ac18 100644
> > --- a/include/linux/memory-tiers.h
> > +++ b/include/linux/memory-tiers.h
> > @@ -48,6 +48,8 @@ int mt_calc_adistance(int node, int *adist);
> >  int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> >

[PATCH v2 2/2] Implement QEMU GA commands for Windows

2024-03-20 Thread aidan_leuck

From: Aidan Leuck 

Signed-off-by: Aidan Leuck 
---
 qga/commands-posix-ssh.c   | 47 +---
 qga/commands-ssh-core.c| 57 +
 qga/commands-ssh-core.h| 15 +
 qga/commands-windows-ssh.c | 64 --
 qga/commands-windows-ssh.h | 15 -
 qga/meson.build|  5 +++
 6 files changed, 86 insertions(+), 117 deletions(-)
 create mode 100644 qga/commands-ssh-core.c
 create mode 100644 qga/commands-ssh-core.h

diff --git a/qga/commands-posix-ssh.c b/qga/commands-posix-ssh.c
index 236f80de44..9a71b109f9 100644
--- a/qga/commands-posix-ssh.c
+++ b/qga/commands-posix-ssh.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 
+#include "commands-ssh-core.h"
 #include "qapi/error.h"
 #include "qga-qapi-commands.h"
 
@@ -80,37 +81,6 @@ mkdir_for_user(const char *path, const struct passwd *p,
 return true;
 }
 
-static bool
-check_openssh_pub_key(const char *key, Error **errp)
-{
-/* simple sanity-check, we may want more? */
-if (!key || key[0] == '#' || strchr(key, '\n')) {
-error_setg(errp, "invalid OpenSSH public key: '%s'", key);
-return false;
-}
-
-return true;
-}
-
-static bool
-check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
-{
-size_t n = 0;
-strList *k;
-
-for (k = keys; k != NULL; k = k->next) {
-if (!check_openssh_pub_key(k->value, errp)) {
-return false;
-}
-n++;
-}
-
-if (nkeys) {
-*nkeys = n;
-}
-return true;
-}
-
 static bool
 write_authkeys(const char *path, const GStrv keys,
const struct passwd *p, Error **errp)
@@ -139,21 +109,6 @@ write_authkeys(const char *path, const GStrv keys,
 return true;
 }
 
-static GStrv
-read_authkeys(const char *path, Error **errp)
-{
-g_autoptr(GError) err = NULL;
-g_autofree char *contents = NULL;
-
-if (!g_file_get_contents(path, &contents, NULL, &err)) {
-error_setg(errp, "failed to read '%s': %s", path, err->message);
-return NULL;
-}
-
-return g_strsplit(contents, "\n", -1);
-
-}
-
 void
 qmp_guest_ssh_add_authorized_keys(const char *username, strList *keys,
   bool has_reset, bool reset,
diff --git a/qga/commands-ssh-core.c b/qga/commands-ssh-core.c
new file mode 100644
index 00..c77cee8a11
--- /dev/null
+++ b/qga/commands-ssh-core.c
@@ -0,0 +1,57 @@
+ /*
+  * This work is licensed under the terms of the GNU GPL, version 2 or later.
+  * See the COPYING file in the top-level directory.
+  */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include "qapi/error.h"
+#include "commands-ssh-core.h"
+
+GStrv read_authkeys(const char *path, Error **errp)
+{
+g_autoptr(GError) err = NULL;
+g_autofree char *contents = NULL;
+
+if (!g_file_get_contents(path, &contents, NULL, &err))
+{
+error_setg(errp, "failed to read '%s': %s", path, err->message);
+return NULL;
+}
+
+return g_strsplit(contents, "\n", -1);
+}
+
+bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
+{
+size_t n = 0;
+strList *k;
+
+for (k = keys; k != NULL; k = k->next)
+{
+if (!check_openssh_pub_key(k->value, errp))
+{
+return false;
+}
+n++;
+}
+
+if (nkeys)
+{
+*nkeys = n;
+}
+return true;
+}
+
+bool check_openssh_pub_key(const char *key, Error **errp)
+{
+/* simple sanity-check, we may want more? */
+if (!key || key[0] == '#' || strchr(key, '\n'))
+{
+error_setg(errp, "invalid OpenSSH public key: '%s'", key);
+return false;
+}
+
+return true;
+}
diff --git a/qga/commands-ssh-core.h b/qga/commands-ssh-core.h
new file mode 100644
index 00..9c9e992c62
--- /dev/null
+++ b/qga/commands-ssh-core.h
@@ -0,0 +1,15 @@
+/*
+ * Header file for commands-ssh-core.c
+ *
+ * Copyright IBM Corp. 2024
+ *
+ * Authors:
+ *  Aidan Leuck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+GStrv read_authkeys(const char *path, Error **errp);
+bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp);
+bool check_openssh_pub_key(const char *key, Error **errp);
\ No newline at end of file
diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c
index e9faae90fc..0739d694ed 100644
--- a/qga/commands-windows-ssh.c
+++ b/qga/commands-windows-ssh.c
@@ -23,7 +23,6 @@
 #include "lmapibuf.h"
 #include "lmerr.h"
 #include "qapi/error.h"
-
 #include "qga-qapi-commands.h"
 #include "sddl.h"
 #include "shlobj.h"
@@ -35,69 +34,6 @@
 #define ADMIN_SID "S-1-5-32-544"
 #define WORLD_SID "S-1-1-0"
 
-/*
- * Reads the authorized_keys file and returns an array of strings for each entry
- * parameters:
- * path -> Path to the authorized_keys file
- * errp -> Error structure that will contain errors upon failure.
- * returns: Array of strings, where

[PATCH v2 1/2] Implement QEMU GA commands for Windows

2024-03-20 Thread aidan_leuck

From: Aidan Leuck 

Signed-off-by: Aidan Leuck 
---
 qga/commands-windows-ssh.c | 823 +
 qga/commands-windows-ssh.h |  26 ++
 qga/meson.build|   9 +-
 qga/qapi-schema.json   |  22 +-
 4 files changed, 867 insertions(+), 13 deletions(-)
 create mode 100644 qga/commands-windows-ssh.c
 create mode 100644 qga/commands-windows-ssh.h

diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c
new file mode 100644
index 00..e9faae90fc
--- /dev/null
+++ b/qga/commands-windows-ssh.c
@@ -0,0 +1,823 @@
+/*
+ * QEMU Guest Agent win32-specific command implementations for SSH keys.
+ * The implementation is opinionated and expects the SSH implementation to
+ * be OpenSSH.
+ *
+ * Copyright Schweitzer Engineering Laboratories. 2024
+ *
+ * Authors:
+ *  Aidan Leuck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+
+#include "commands-windows-ssh.h"
+#include "guest-agent-core.h"
+#include "limits.h"
+#include "lmaccess.h"
+#include "lmapibuf.h"
+#include "lmerr.h"
+#include "qapi/error.h"
+
+#include "qga-qapi-commands.h"
+#include "sddl.h"
+#include "shlobj.h"
+#include "userenv.h"
+
+#define AUTHORIZED_KEY_FILE "authorized_keys"
+#define AUTHORIZED_KEY_FILE_ADMIN "administrators_authorized_keys"
+#define LOCAL_SYSTEM_SID "S-1-5-18"
+#define ADMIN_SID "S-1-5-32-544"
+#define WORLD_SID "S-1-1-0"
+
+/*
+ * Reads the authorized_keys file and returns an array of strings for each entry
+ * parameters:
+ * path -> Path to the authorized_keys file
+ * errp -> Error structure that will contain errors upon failure.
+ * returns: Array of strings, where each entry is an authorized key.
+ */
+static GStrv read_authkeys(const char *path, Error **errp)
+{
+  g_autoptr(GError) err = NULL;
+  g_autofree char *contents = NULL;
+
+  if (!g_file_get_contents(path, &contents, NULL, &err)) {
+error_setg(errp, "failed to read '%s': %s", path, err->message);
+return NULL;
+  }
+
+  return g_strsplit(contents, "\n", -1);
+}
+
+/*
+ * Checks if an OpenSSH key is valid
+ * parameters:
+ * key -> Key to check for validity
+ * errp -> Error structure that will contain errors upon failure.
+ * returns: true if key is valid, false otherwise
+ */
+static bool check_openssh_pub_key(const char *key, Error **errp)
+{
+  /* simple sanity-check, we may want more? */
+  if (!key || key[0] == '#' || strchr(key, '\n')) {
+error_setg(errp, "invalid OpenSSH public key: '%s'", key);
+return false;
+  }
+
+  return true;
+}
+
+/*
+ * Checks if all openssh keys in the array are valid
+ * parameters:
+ * keys -> Array of keys to check
+ * errp -> Error structure that will contain errors upon failure.
+ * returns: true if all keys are valid, false otherwise
+ */
+static bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
+{
+  size_t n = 0;
+  strList *k;
+
+  for (k = keys; k != NULL; k = k->next) {
+if (!check_openssh_pub_key(k->value, errp)) {
+  return false;
+}
+n++;
+  }
+
+  if (nkeys) {
+*nkeys = n;
+  }
+  return true;
+}
+
+/*
+ * Frees userInfo structure. This implements the g_auto cleanup
+ * for the structure.
+ */
+void free_userInfo(PWindowsUserInfo info)
+{
+  g_free(info->sshDirectory);
+  g_free(info->authorizedKeyFile);
+  LocalFree(info->SSID);
+  g_free(info->username);
+  g_free(info);
+}
+
+/*
+ * Gets the admin SSH folder for OpenSSH. OpenSSH does not store
+ * the authorized_key file in the user's home directory for security reasons
+ * and instead stores it at %PROGRAMDATA%/ssh. This function returns the path
+ * to that directory on the user's machine.
+ * parameters:
+ * errp -> Error structure to set when an error occurs.
+ * returns: The path to the ssh folder in %PROGRAMDATA%, or NULL if an error
+ * occurred.
+ */
+static char *get_admin_ssh_folder(Error **errp)
+{
+  // Allocate memory for the program data path
+  g_autofree char *programDataPath = NULL;
+  char *authkeys_path = NULL;
+  PWSTR pgDataW;
+  GError *gerr = NULL;
+
+  // Get the KnownFolderPath on the machine.
+  HRESULT folderResult =
+  SHGetKnownFolderPath(&FOLDERID_ProgramData, 0, NULL, &pgDataW);
+  if (folderResult != S_OK) {
+error_setg(errp, "Failed to retrieve ProgramData folder");
+goto error;
+  }
+
+  // Convert from a wide string back to a standard character string.
+  programDataPath = g_utf16_to_utf8(pgDataW, -1, NULL, NULL, &gerr);
+  if (!programDataPath) {
+goto error;
+  }
+
+  // Build the path to the file.
+  authkeys_path = g_build_filename(programDataPath, "ssh", NULL);
+  CoTaskMemFree(pgDataW);
+  return authkeys_path;
+
+error:
+  CoTaskMemFree(pgDataW);
+  
+  if (gerr) {
+error_setg(errp,"Failed to convert program data path from wide string to standard utf 8 string. %s", gerr->message);
+g_error_free(gerr);
+  }
+
+  return NULL;
+}
+
+/*
+ * Gets the path to the SSH folder for the specified user. If

Re: [PATCH-for-9.0] monitor/hmp-cmds-target.c: append a space in error message in gpa2hva()

2024-03-20 Thread Dr. David Alan Gilbert

* Philippe Mathieu-Daudé (phi...@linaro.org) wrote:
> On 19/3/24 03:16, Shiyang Ruan via wrote:
> > From: Yao Xingtao 
> > 
> > In qemu monitor mode, when we use gpa2hva command to print the host
> > virtual address corresponding to a guest physical address, if the gpa is
> > not in RAM, the error message is below:
> > 
> > (qemu) gpa2hva 0x75000
> > Memory at address 0x75000is not RAM
> > 
> > A space is missing between '0x75000' and 'is'.
> > 
> > Signed-off-by: Yao Xingtao 
> > ---
> >   monitor/hmp-cmds-target.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
> > index 9338ae8440..ff01cf9d8d 100644
> > --- a/monitor/hmp-cmds-target.c
> > +++ b/monitor/hmp-cmds-target.c
> > @@ -261,7 +261,7 @@ void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp)
> >   }
> >   if (!memory_region_is_ram(mrs.mr) && !memory_region_is_romd(mrs.mr)) {
> > -error_setg(errp, "Memory at address 0x%" HWADDR_PRIx "is not RAM", addr);
> > +error_setg(errp, "Memory at address 0x%" HWADDR_PRIx " is not RAM", addr);
> >   memory_region_unref(mrs.mr);
> >   return NULL;
> >   }
> 
> Fixes: e9628441df ("hmp: gpa2hva and gpa2hpa hostaddr command")
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks,

Reviewed-by: Dr. David Alan Gilbert 

Cc'ing in Trivial.

Dave

> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/
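
The bug fixed above follows from how C joins adjacent string literals
around a format macro: nothing is inserted between the pieces, so the
space has to be written explicitly. A minimal sketch, using the standard
PRIx64 as a stand-in for QEMU's HWADDR_PRIx; format_not_ram() is a
hypothetical helper written for this illustration.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the message with or without the space
 * after the address. PRIx64 stands in for HWADDR_PRIx here. */
int format_not_ram(char *buf, size_t len, uint64_t addr, int with_space)
{
    if (with_space) {
        /* literals concatenate to "...0x%<fmt> is not RAM" */
        return snprintf(buf, len,
                        "Memory at address 0x%" PRIx64 " is not RAM", addr);
    }
    /* literals concatenate to "...0x%<fmt>is not RAM": no separator */
    return snprintf(buf, len,
                    "Memory at address 0x%" PRIx64 "is not RAM", addr);
}
```

Called with address 0x75000, the second form reproduces the message
from the report and the first form gives the corrected one.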

Re: [PATCH v3 40/49] hw/i386/sev: Add function to get SEV metadata from OVMF header

2024-03-20 Thread Michael Roth

On Wed, Mar 20, 2024 at 10:55:35AM -0700, Isaku Yamahata wrote:
> On Wed, Mar 20, 2024 at 03:39:36AM -0500,
> Michael Roth  wrote:
> 
> > From: Brijesh Singh 
> > 
> > A recent version of OVMF expanded the reset vector GUID list to add
> > SEV-specific metadata GUID. The SEV metadata describes the reserved
> > memory regions such as the secrets and CPUID page used during the SEV-SNP
> > guest launch.
> > 
> > The pc_system_get_ovmf_sev_metadata_ptr() is used to retrieve the SEV
> > metadata pointer from the OVMF GUID list.
> > 
> > Signed-off-by: Brijesh Singh 
> > Signed-off-by: Michael Roth 
> > ---
> >  hw/i386/pc_sysfw_ovmf.c | 33 +
> >  include/hw/i386/pc.h| 26 ++
> >  2 files changed, 59 insertions(+)
> > 
> > diff --git a/hw/i386/pc_sysfw_ovmf.c b/hw/i386/pc_sysfw_ovmf.c
> > index 07a4c267fa..32efa34614 100644
> > --- a/hw/i386/pc_sysfw_ovmf.c
> > +++ b/hw/i386/pc_sysfw_ovmf.c
> > @@ -35,6 +35,31 @@ static const int bytes_after_table_footer = 32;
> >  static bool ovmf_flash_parsed;
> >  static uint8_t *ovmf_table;
> >  static int ovmf_table_len;
> > +static OvmfSevMetadata *ovmf_sev_metadata_table;
> > +
> > +#define OVMF_SEV_META_DATA_GUID "dc886566-984a-4798-A75e-5585a7bf67cc"
> > +typedef struct __attribute__((__packed__)) OvmfSevMetadataOffset {
> > +uint32_t offset;
> > +} OvmfSevMetadataOffset;
> > +
> > +static void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size)
> > +{
> > +OvmfSevMetadata *metadata;
> > +OvmfSevMetadataOffset  *data;
> > +
> > +if (!pc_system_ovmf_table_find(OVMF_SEV_META_DATA_GUID, (uint8_t **)&data,
> > +   NULL)) {
> > +return;
> > +}
> > +
> > +metadata = (OvmfSevMetadata *)(flash_ptr + flash_size - data->offset);
> > +if (memcmp(metadata->signature, "ASEV", 4) != 0) {
> > +return;
> > +}
> > +
> > +ovmf_sev_metadata_table = g_malloc(metadata->len);
> > +memcpy(ovmf_sev_metadata_table, metadata, metadata->len);
> > +}
> >  
> >  void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size)
> >  {
> > @@ -90,6 +115,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size)
> >   */
> >  memcpy(ovmf_table, ptr - tot_len, tot_len);
> >  ovmf_table += tot_len;
> > +
> > +/* Copy the SEV metadata table (if it exists) */
> > +pc_system_parse_sev_metadata(flash_ptr, flash_size);
> >  }
> 
> Can we move this call to x86_firmware_configure() @ pc_sysfw.c, and move sev
> specific bits to somewhere to sev specific file?  We don't have to parse sev
> metadata for non-SEV case, right?
> 
> We don't have to touch common ovmf file. It also will be consistent with tdx
> case.  TDX patch series adds tdx_parse_tdvf() to x86_firmware_configure().

Yep, makes sense to handle it similarly for SNP.

Thanks,

Mike

> 
> thanks,
> 
> >  
> >  /**
> > @@ -159,3 +187,8 @@ bool pc_system_ovmf_table_find(const char *entry, uint8_t **data,
> >  }
> >  return false;
> >  }
> > +
> > +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void)
> > +{
> > +return ovmf_sev_metadata_table;
> > +}
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index fb1d4106e5..df9a61540d 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -163,6 +163,32 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
> >  #define PCI_HOST_ABOVE_4G_MEM_SIZE "above-4g-mem-size"
> >  #define PCI_HOST_PROP_SMM_RANGES   "smm-ranges"
> >  
> > +typedef enum {
> > +SEV_DESC_TYPE_UNDEF,
> > +/* The section contains the region that must be validated by the VMM. */
> > +SEV_DESC_TYPE_SNP_SEC_MEM,
> > +/* The section contains the SNP secrets page */
> > +SEV_DESC_TYPE_SNP_SECRETS,
> > +/* The section contains address that can be used as a CPUID page */
> > +SEV_DESC_TYPE_CPUID,
> > +
> > +} ovmf_sev_metadata_desc_type;
> > +
> > +typedef struct __attribute__((__packed__)) OvmfSevMetadataDesc {
> > +uint32_t base;
> > +uint32_t len;
> > +ovmf_sev_metadata_desc_type type;
> > +} OvmfSevMetadataDesc;
> > +
> > +typedef struct __attribute__((__packed__)) OvmfSevMetadata {
> > +uint8_t signature[4];
> > +uint32_t len;
> > +uint32_t version;
> > +uint32_t num_desc;
> > +OvmfSevMetadataDesc descs[];
> > +} OvmfSevMetadata;
> > +
> > +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void);
> >  
> >  void pc_pci_as_mapping_init(MemoryRegion *system_memory,
> >  MemoryRegion *pci_address_space);
> > -- 
> > 2.25.1
> > 
> > 
> 
> -- 
> Isaku Yamahata
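
Wherever the parsing ends up living, its core is locating a header at a
given distance from the end of the flash image and checking the "ASEV"
signature. The sketch below is an illustration only: struct sev_meta_hdr
copies the fixed fields of the quoted OvmfSevMetadata, find_sev_meta()
is a hypothetical name, and the range checks are an addition of this
sketch that the quoted patch does not contain. Host byte order is
assumed to match the firmware's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of the quoted OvmfSevMetadata, without the descriptors. */
struct sev_meta_hdr {
    uint8_t signature[4];
    uint32_t len;
    uint32_t version;
    uint32_t num_desc;
};

/* Locate the header `offset` bytes before the end of the flash image.
 * Returns 0 and fills *out on success, -1 if the offset is out of
 * range, the signature does not match, or the length is implausible.
 * The range checks are an assumption added for this sketch. */
int find_sev_meta(const uint8_t *flash, size_t flash_size,
                  uint32_t offset, struct sev_meta_hdr *out)
{
    const uint8_t *p;

    if (offset > flash_size || offset < sizeof(*out)) {
        return -1;
    }

    p = flash + (flash_size - offset);
    if (memcmp(p, "ASEV", 4) != 0) {
        return -1;
    }

    memcpy(out, p, sizeof(*out));   /* copy out to avoid unaligned reads */
    if (out->len < sizeof(*out) || out->len > offset) {
        return -1;
    }
    return 0;
}
```

Returning -1 for a missing signature mirrors the silent early return in
the quoted function, so a non-SEV firmware image is simply skipped.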

Re: [PATCH v3 37/49] i386/sev: Add the SNP launch start context

2024-03-20 Thread Michael Roth

On Wed, Mar 20, 2024 at 10:58:30AM +0100, Paolo Bonzini wrote:
> On 3/20/24 09:39, Michael Roth wrote:
> > From: Brijesh Singh 
> > 
> > The SNP_LAUNCH_START is called first to create a cryptographic launch
> > context within the firmware.
> > 
> > Signed-off-by: Brijesh Singh 
> > Signed-off-by: Michael Roth 
> > ---
> >   target/i386/sev.c| 42 +++-
> >   target/i386/trace-events |  1 +
> >   2 files changed, 42 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 3b4dbc63b1..9f63a41f08 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -39,6 +39,7 @@
> >   #include "confidential-guest.h"
> >   #include "hw/i386/pc.h"
> >   #include "exec/address-spaces.h"
> > +#include "qemu/queue.h"
> >   OBJECT_DECLARE_SIMPLE_TYPE(SevCommonState, SEV_COMMON)
> >   OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
> > @@ -106,6 +107,16 @@ struct SevSnpGuestState {
> >   #define DEFAULT_SEV_DEVICE  "/dev/sev"
> >   #define DEFAULT_SEV_SNP_POLICY  0x3
> > +typedef struct SevLaunchUpdateData {
> > +QTAILQ_ENTRY(SevLaunchUpdateData) next;
> > +hwaddr gpa;
> > +void *hva;
> > +uint64_t len;
> > +int type;
> > +} SevLaunchUpdateData;
> > +
> > +static QTAILQ_HEAD(, SevLaunchUpdateData) launch_update;
> > +
> >   #define SEV_INFO_BLOCK_GUID "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
> >   typedef struct __attribute__((__packed__)) SevInfoBlock {
> >   /* SEV-ES Reset Vector Address */
> > @@ -668,6 +679,30 @@ sev_read_file_base64(const char *filename, guchar **data, gsize *len)
> >   return 0;
> >   }
> > +static int
> > +sev_snp_launch_start(SevSnpGuestState *sev_snp_guest)
> > +{
> > +int fw_error, rc;
> > +SevCommonState *sev_common = SEV_COMMON(sev_snp_guest);
> > +struct kvm_sev_snp_launch_start *start = &sev_snp_guest->kvm_start_conf;
> > +
> > +trace_kvm_sev_snp_launch_start(start->policy, sev_snp_guest->guest_visible_workarounds);
> > +
> > +rc = sev_ioctl(sev_common->sev_fd, KVM_SEV_SNP_LAUNCH_START,
> > +   start, &fw_error);
> > +if (rc < 0) {
> > +error_report("%s: SNP_LAUNCH_START ret=%d fw_error=%d '%s'",
> > +__func__, rc, fw_error, fw_error_to_str(fw_error));
> > +return 1;
> > +}
> > +
> > +QTAILQ_INIT(&launch_update);
> > +
> > +sev_set_guest_state(sev_common, SEV_STATE_LAUNCH_UPDATE);
> > +
> > +return 0;
> > +}
> > +
> >   static int
> >   sev_launch_start(SevGuestState *sev_guest)
> >   {
> > @@ -1007,7 +1042,12 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> >   goto err;
> >   }
> > -ret = sev_launch_start(SEV_GUEST(sev_common));
> > +if (sev_snp_enabled()) {
> > +ret = sev_snp_launch_start(SEV_SNP_GUEST(sev_common));
> > +} else {
> > +ret = sev_launch_start(SEV_GUEST(sev_common));
> > +}
> 
> Instead of an "if", this should be a method in sev-common.  Likewise for
> launch_finish in the next patch.

Makes sense.

> 
> Also, patch 47 should introduce an "int (*launch_update_data)(hwaddr gpa,
> uint8_t *ptr, uint64_t len)" method whose implementation is either the
> existing sev_launch_update_data() for sev-guest, or a wrapper around
> snp_launch_update_data() (to add KVM_SEV_SNP_PAGE_TYPE_NORMAL) for
> sev-snp-guest.

I suppose if we end up introducing an unused 'gpa' parameter in the case
of sev_launch_update_data() that's still worth the change? Seems
reasonable to me.
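
To make the shape of that suggestion concrete, here is a minimal sketch
of per-type methods where the legacy variant simply ignores gpa. It uses
a plain struct of function pointers rather than QOM classes, and every
name in it (struct sev_ops, legacy_ops, snp_ops, guest_launch,
snp_last_gpa) is hypothetical and exists only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical method table; a real implementation would presumably
 * put these in the QOM class of sev-common. */
struct sev_ops {
    int (*launch_start)(void);
    int (*launch_update_data)(uint64_t gpa, uint8_t *ptr, uint64_t len);
};

uint64_t snp_last_gpa;  /* records the gpa seen by the SNP variant */

int legacy_launch_start(void)
{
    return 0;
}

int legacy_launch_update_data(uint64_t gpa, uint8_t *ptr, uint64_t len)
{
    (void)gpa;                      /* unused for legacy SEV/SEV-ES */
    return (ptr != NULL && len > 0) ? 0 : -1;
}

int snp_launch_start(void)
{
    return 0;
}

int snp_launch_update_data(uint64_t gpa, uint8_t *ptr, uint64_t len)
{
    snp_last_gpa = gpa;             /* the SNP path needs the gpa */
    return (ptr != NULL && len > 0) ? 0 : -1;
}

const struct sev_ops legacy_ops = {
    legacy_launch_start, legacy_launch_update_data
};
const struct sev_ops snp_ops = {
    snp_launch_start, snp_launch_update_data
};

/* The caller no longer branches on the guest type. */
int guest_launch(const struct sev_ops *ops, uint64_t gpa,
                 uint8_t *ptr, uint64_t len)
{
    int ret = ops->launch_start();

    if (ret) {
        return ret;
    }
    return ops->launch_update_data(gpa, ptr, len);
}
```

The unused parameter costs one cast in the legacy variant, and in
exchange the call site carries no sev_snp_enabled() check.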

> 
> In general, the only uses of sev_snp_enabled() should be in
> sev_add_kernel_loader_hashes() and kvm_handle_vmgexit_ext_req().  I would
> not be that strict for the QMP and HMP functions, but if you want to make
> those methods of sev-common I wouldn't complain.

There's a good bit of duplication in those cases which is a little
awkward to break out into a common helper. Will consider these as well
though.

Thanks,

Mike

> 
> Paolo
> 
> >   if (ret) {
> >   error_setg(errp, "%s: failed to create encryption context", __func__);
> >   goto err;
> > diff --git a/target/i386/trace-events b/target/i386/trace-events
> > index 2cd8726eeb..cb26d8a925 100644
> > --- a/target/i386/trace-events
> > +++ b/target/i386/trace-events
> > @@ -11,3 +11,4 @@ kvm_sev_launch_measurement(const char *value) "data %s"
> >   kvm_sev_launch_finish(void) ""
> >   kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) "hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
> >   kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data %s"
> > +kvm_sev_snp_launch_start(uint64_t policy, char *gosvw) "policy 0x%" PRIx64 " gosvw %s"
>

Re: [PATCH v3 31/49] i386/sev: Update query-sev QAPI format to handle SEV-SNP

2024-03-20 Thread Michael Roth via

On Wed, Mar 20, 2024 at 12:10:04PM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:27AM -0500, Michael Roth wrote:
> > Most of the current 'query-sev' command is relevant to both legacy
> > SEV/SEV-ES guests and SEV-SNP guests, with 2 exceptions:
> > 
> >   - 'policy' is a 64-bit field for SEV-SNP, not 32-bit, and
> > the meaning of the bit positions has changed
> >   - 'handle' is not relevant to SEV-SNP
> > 
> > To address this, this patch adds a new 'sev-type' field that can be
> > used as a discriminator to select between SEV and SEV-SNP-specific
> > fields/formats without breaking compatibility for existing management
> > tools (so long as management tools that add support for launching
> > SEV-SNP guest update their handling of query-sev appropriately).
> > 
> > The corresponding HMP command has also been fixed up similarly.
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >  qapi/misc-target.json | 71 ++-
> >  target/i386/sev.c | 50 --
> >  target/i386/sev.h |  3 ++
> >  3 files changed, 94 insertions(+), 30 deletions(-)
> > 
> > diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> > index 4e0a6492a9..daceb85d95 100644
> > --- a/qapi/misc-target.json
> > +++ b/qapi/misc-target.json
> > @@ -47,6 +47,49 @@
> > 'send-update', 'receive-update' ],
> >'if': 'TARGET_I386' }
> >  
> > +##
> > +# @SevGuestType:
> > +#
> > +# An enumeration indicating the type of SEV guest being run.
> > +#
> > +# @sev: The guest is a legacy SEV or SEV-ES guest.
> > +# @sev-snp: The guest is an SEV-SNP guest.
> > +#
> > +# Since: 6.2
> 
> Now 9.1 at the earliest.
> 
> > +##
> > +{ 'enum': 'SevGuestType',
> > +  'data': [ 'sev', 'sev-snp' ],
> > +  'if': 'TARGET_I386' }
> > +
> > +##
> > +# @SevGuestInfo:
> > +#
> > +# Information specific to legacy SEV/SEV-ES guests.
> > +#
> > +# @policy: SEV policy value
> > +#
> > +# @handle: SEV firmware handle
> > +#
> > +# Since: 2.12
> > +##
> > +{ 'struct': 'SevGuestInfo',
> > +  'data': { 'policy': 'uint32',
> > +'handle': 'uint32' },
> > +  'if': 'TARGET_I386' }
> > +
> > +##
> > +# @SevSnpGuestInfo:
> > +#
> > +# Information specific to SEV-SNP guests.
> > +#
> > +# @snp-policy: SEV-SNP policy value
> > +#
> > +# Since: 6.2
> > +##
> > +{ 'struct': 'SevSnpGuestInfo',
> > +  'data': { 'snp-policy': 'uint64' },
> > +  'if': 'TARGET_I386' }
> 
> IMHO it can just be called 'policy' still, since
> it is implicitly within a 'Snp' specific type.
> 
> 
> > +
> >  ##
> >  # @SevInfo:
> >  #
> > @@ -60,25 +103,25 @@
> >  #
> >  # @build-id: SEV FW build id
> >  #
> > -# @policy: SEV policy value
> > -#
> >  # @state: SEV guest state
> >  #
> > -# @handle: SEV firmware handle
> > +# @sev-type: Type of SEV guest being run
> >  #
> >  # Since: 2.12
> >  ##
> > -{ 'struct': 'SevInfo',
> > -'data': { 'enabled': 'bool',
> > -  'api-major': 'uint8',
> > -  'api-minor' : 'uint8',
> > -  'build-id' : 'uint8',
> > -  'policy' : 'uint32',
> > -  'state' : 'SevState',
> > -  'handle' : 'uint32'
> > -},
> > -  'if': 'TARGET_I386'
> > -}
> > +{ 'union': 'SevInfo',
> > +  'base': { 'enabled': 'bool',
> > +'api-major': 'uint8',
> > +'api-minor' : 'uint8',
> > +'build-id' : 'uint8',
> > +'state' : 'SevState',
> > +'sev-type' : 'SevGuestType' },
> > +  'discriminator': 'sev-type',
> > +  'data': {
> > +  'sev': 'SevGuestInfo',
> > +  'sev-snp': 'SevSnpGuestInfo' },
> > +  'if': 'TARGET_I386' }
> > +
> >  
> >  ##
> >  # @query-sev:
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 43e6c0172f..b03d70a3d1 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -353,25 +353,27 @@ static SevInfo *sev_get_info(void)
> >  {
> >  SevInfo *info;
> > >  SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
> > -SevGuestState *sev_guest =
> > -(SevGuestState *)object_dynamic_cast(OBJECT(sev_common),
> > - TYPE_SEV_GUEST);
> >  
> >  info = g_new0(SevInfo, 1);
> >  info->enabled = sev_enabled();
> >  
> >  if (info->enabled) {
> > -if (sev_guest) {
> > -info->handle = sev_guest->handle;
> > -}
> >  info->api_major = sev_common->api_major;
> >  info->api_minor = sev_common->api_minor;
> >  info->build_id = sev_common->build_id;
> >  info->state = sev_common->state;
> > > -/* we only report the lower 32-bits of policy for SNP, ok for now... */
> > -info->policy =
> > -(uint32_t)object_property_get_uint(OBJECT(sev_common),
> > -   "policy", NULL);
> > +
> > +if (sev_snp_enabled()) {
> > +info->sev_type = SEV_GUEST_TYPE_SEV_SNP;
> > +

Re: [PATCH v3 25/49] i386/sev: Skip RAMBlock notifiers for SNP

2024-03-20 Thread Michael Roth

On Wed, Mar 20, 2024 at 10:46:29AM +0100, Paolo Bonzini wrote:
> On 3/20/24 09:39, Michael Roth wrote:
> > SEV uses these notifiers to register/pin pages prior to guest use, since
> > they could potentially be used for private memory where page migration
> > is not supported. But SNP only uses guest_memfd-provided pages for
> > private memory, which has its own kernel-internal mechanisms for
> > registering/pinning memory.
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >   target/i386/sev.c | 10 +-
> >   1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 61af312a11..774262d834 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -982,7 +982,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, 
> > Error **errp)
> >   goto err;
> >   }
> > -ram_block_notifier_add(_ram_notifier);
> > +if (!sev_snp_enabled()) {
> > +/*
> > + * SEV uses these notifiers to register/pin pages prior to guest 
> > use,
> > + * but SNP relies on guest_memfd for private pages, which has it's
> > + * own internal mechanisms for registering/pinning private memory.
> > + */
> > +ram_block_notifier_add(_ram_notifier);
> > +}
> > +
> >   qemu_add_machine_init_done_notifier(_machine_done_notify);
> >   qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
> 
> These three lines can be done in any order, so I suggest removing
> ram_block_notifier_add + qemu_add_machine_init_done_notifier from the
> sev-common implementation of kvm_init (let's call it sev_common_kvm_init);
> and add an override in sev-guest that calls them if sev_common_kvm_init()
> succeeds.
> 
> (treat this as a review for 25/26/29).

Makes sense. Will split out the common bits of sev_kvm_init() and use
class methods for initialization specific to sev-guest and
sev-snp-guest.
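
A rough illustration of that split, as a self-contained sketch and not actual QEMU code: a common init routine does the shared work, then calls an optional per-type hook. Every name below (SketchSevClass, kvm_init_extra, the trace buffer) is a hypothetical stand-in; the real series would presumably use QOM class structs and the existing notifier calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a QOM class struct with one optional hook. */
typedef struct SketchSevClass {
    /* Type-specific init; NULL means "nothing beyond the common part". */
    int (*kvm_init_extra)(char *log, size_t len);
} SketchSevClass;

/* Append one step name to a NUL-terminated trace buffer of size len. */
static void sketch_log_step(char *log, size_t len, const char *step)
{
    strncat(log, step, len - strlen(log) - 1);
}

/* sev-guest only: stands in for the two notifier registrations. */
int sketch_sev_guest_init_extra(char *log, size_t len)
{
    sketch_log_step(log, len, "ram-notifier;");
    sketch_log_step(log, len, "machine-done;");
    return 0;
}

/* Common part first; the per-type hook runs only after it. */
int sketch_sev_common_kvm_init(const SketchSevClass *klass,
                               char *log, size_t len)
{
    sketch_log_step(log, len, "common;");
    /* A real implementation would return early here if the common part failed. */
    if (klass->kvm_init_extra) {
        return klass->kvm_init_extra(log, len);
    }
    return 0;
}

/* sev-guest registers the hook; sev-snp-guest leaves it unset. */
const SketchSevClass sketch_sev_guest_class = { sketch_sev_guest_init_extra };
const SketchSevClass sketch_sev_snp_guest_class = { NULL };
```

With this shape the SNP type never touches the RAMBlock notifier, so no sev_snp_enabled() check is needed inside the common path.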

-Mike

> 
> Paolo
>

Re: [PATCH v3 23/49] i386/sev: Add a sev_snp_enabled() helper

2024-03-20 Thread Michael Roth via

On Wed, Mar 20, 2024 at 12:35:09PM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:19AM -0500, Michael Roth wrote:
> > Add a simple helper to check if the current guest type is SNP. Also have
> > SNP-enabled imply that SEV-ES is enabled as well, and fix up any places
> > where the sev_es_enabled() check is expecting a pure/non-SNP guest.
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >  target/i386/sev.c | 13 -
> >  target/i386/sev.h |  2 ++
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 7e6dab642a..2eb13ba639 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> 
> 
> > @@ -933,7 +942,9 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, 
> > Error **errp)
> >   __func__);
> >  goto err;
> >  }
> > +}
> >  
> > +if (sev_es_enabled() && !sev_snp_enabled()) {
> >  if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
> >  error_report("%s: guest policy requires SEV-ES, but "
> >   "host SEV-ES support unavailable",
> 
> Oops, pre-existing bug here - this method has an 'Error **errp'
> parameter, so should not be using 'error_report'.
> 
> There are several more examples of this in this method that
> predate your patch series.  Can you put a patch at the start
> of this series that fixes them before introducing SNP.

Sure, will add a pre-patch to fix up all the pre-existing issues
you've noted.
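
For readers unfamiliar with the convention under discussion: a function that takes an 'Error **errp' reports failure by filling in errp and returning an error code, leaving it to the caller to decide whether and how to print. The sketch below uses a tiny stand-in Error type so it compiles on its own; sketch_error_setg(), SKETCH_FLAG_ES and sketch_check_es() are hypothetical names for illustration, not the QEMU API.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for QEMU's Error object; NOT the real type. */
typedef struct Error {
    char msg[160];
} Error;

/* Stand-in for error_setg(): store the message if the caller asked for it. */
void sketch_error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;             /* caller is not interested in details */
    }
    *errp = calloc(1, sizeof(Error));
    if (!*errp) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Hypothetical stand-in for SEV_STATUS_FLAGS_CONFIG_ES. */
#define SKETCH_FLAG_ES 0x1u

/* Reports through errp instead of printing directly. */
int sketch_check_es(unsigned int flags, Error **errp)
{
    if (!(flags & SKETCH_FLAG_ES)) {
        sketch_error_setg(errp, "%s: guest policy requires SEV-ES, but "
                          "host SEV-ES support unavailable", __func__);
        return -1;
    }
    return 0;
}
```

The caller owns the returned Error and can print, propagate, or free it; printing inside the callee would take that choice away.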

-Mike

> 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>

Re: [PATCH v3 22/49] i386/sev: Introduce 'sev-snp-guest' object

2024-03-20 Thread Michael Roth via

On Wed, Mar 20, 2024 at 11:58:57AM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:18AM -0500, Michael Roth wrote:
> > From: Brijesh Singh 
> > 
> > SEV-SNP support relies on a different set of properties/state than the
> > existing 'sev-guest' object. This patch introduces the 'sev-snp-guest'
> > object, which can be used to configure an SEV-SNP guest. For example,
> > a default-configured SEV-SNP guest with no additional information
> > passed in for use with attestation:
> > 
> >   -object sev-snp-guest,id=sev0
> > 
> > or a fully-specified SEV-SNP guest where all spec-defined binary
> > blobs are passed in as base64-encoded strings:
> > 
> >   -object sev-snp-guest,id=sev0, \
> > policy=0x3, \
> > init-flags=0, \
> > id-block=YWFhYWFhYWFhYWFhYWFhCg==, \
> > id-auth=CxHK/OKLkXGn/KpAC7Wl1FSiisWDbGTEKz..., \
> > auth-key-enabled=on, \
> > host-data=LNkCWBRC5CcdGXirbNUV1OrsR28s..., \
> > guest-visible-workarounds=AA==, \
> > 
> > See the QAPI schema updates included in this patch for more usage
> > details.
> > 
> > In some cases these blobs may be up to 4096 characters, but this is
> > generally well below the default limit for linux hosts where
> > command-line sizes are defined by the sysconf-configurable ARG_MAX
> > value, which defaults to 2097152 characters for Ubuntu hosts, for
> > example.
> > 
> > Signed-off-by: Brijesh Singh 
> > Co-developed-by: Michael Roth 
> > Acked-by: Markus Armbruster  (for QAPI schema)
> > Signed-off-by: Michael Roth 
> > ---
> >  docs/system/i386/amd-memory-encryption.rst |  78 ++-
> >  qapi/qom.json  |  51 +
> >  target/i386/sev.c  | 241 +
> >  target/i386/sev.h  |   1 +
> >  4 files changed, 369 insertions(+), 2 deletions(-)
> > 
> 
> > +##
> > +# @SevSnpGuestProperties:
> > +#
> > +# Properties for sev-snp-guest objects. Most of these are direct arguments
> > +# for the KVM_SNP_* interfaces documented in the linux kernel source
> > +# under Documentation/virt/kvm/amd-memory-encryption.rst, which are in
> > +# turn closely coupled with the SNP_INIT/SNP_LAUNCH_* firmware commands
> > +# documented in the SEV-SNP Firmware ABI Specification (Rev 0.9).
> > +#
> > +# More usage information is also available in the QEMU source tree under
> > +# docs/amd-memory-encryption.
> > +#
> > +# @policy: the 'POLICY' parameter to the SNP_LAUNCH_START command, as
> > +#  defined in the SEV-SNP firmware ABI (default: 0x3)
> > +#
> > +# @guest-visible-workarounds: 16-byte, base64-encoded blob to report
> > +# hypervisor-defined workarounds, corresponding
> > +# to the 'GOSVW' parameter of the
> > +# SNP_LAUNCH_START command defined in the
> > +# SEV-SNP firmware ABI (default: all-zero)
> > +#
> > +# @id-block: 96-byte, base64-encoded blob to provide the 'ID Block'
> > +#structure for the SNP_LAUNCH_FINISH command defined in the
> > +#SEV-SNP firmware ABI (default: all-zero)
> > +#
> > > +# @id-auth: 4096-byte, base64-encoded blob to provide the 'ID Authentication
> > > +#   Information Structure' for the SNP_LAUNCH_FINISH command defined
> > > +#   in the SEV-SNP firmware ABI (default: all-zero)
> > > +#
> > > +# @auth-key-enabled: true if 'id-auth' blob contains the 'AUTHOR_KEY' field
> > > +#defined in the SEV-SNP firmware ABI (default: false)
> > +#
> > +# @host-data: 32-byte, base64-encoded, user-defined blob to provide to the
> > +# guest, as documented for the 'HOST_DATA' parameter of the
> > +# SNP_LAUNCH_FINISH command in the SEV-SNP firmware ABI
> > +# (default: all-zero)
> > +#
> > +# Since: 7.2
> 
> This will be 9.1 at the earliest now.

Amazing how good I am at remembering these once I see a reply to a
schema patch I'd already hit 'send' on :)

> 
> > +##
> > +{ 'struct': 'SevSnpGuestProperties',
> > +  'base': 'SevCommonProperties',
> > +  'data': {
> > +'*policy': 'uint64',
> > +'*guest-visible-workarounds': 'str',
> > +'*id-block': 'str',
> > +'*id-auth': 'str',
> > +'*auth-key-enabled': 'bool',
> > +'*host-data': 'str' } }
> > +
> 
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 63a220de5e..7e6dab642a 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -42,6 +42,7 @@
> >  
> >  OBJECT_DECLARE_SIMPLE_TYPE(SevCommonState, SEV_COMMON)
> >  OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
> > +OBJECT_DECLARE_SIMPLE_TYPE(SevSnpGuestState, SEV_SNP_GUEST)
> >  
> >  struct SevCommonState {
> >  X86ConfidentialGuest parent_obj;
> > @@ -87,8 +88,22 @@ struct SevGuestState {
> >  bool kernel_hashes;
> >  };
> >  
> > +struct SevSnpGuestState {
> > +SevCommonState sev_common;
> > +
> > +/*

Re: [PATCH v3 21/49] i386/sev: Introduce "sev-common" type to encapsulate common SEV state

2024-03-20 Thread Michael Roth via

On Wed, Mar 20, 2024 at 11:47:28AM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:17AM -0500, Michael Roth wrote:
> > Currently all SEV/SEV-ES functionality is managed through a single
> > 'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this
> > same approach won't work well since some of the properties/state
> > managed by 'sev-guest' is not applicable to SEV-SNP, which will instead
> > rely on a new QOM type with its own set of properties/state.
> > 
> > To prepare for this, this patch moves common state into an abstract
> > 'sev-common' parent type to encapsulate properties/state that are
> > common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific
> > properties/state in the current 'sev-guest' type. This should not
> > affect current behavior or command-line options.
> > 
> > As part of this patch, some related changes are also made:
> > 
> >   - a static 'sev_guest' variable is currently used to keep track of
> > the 'sev-guest' instance. SEV-SNP would similarly introduce an
> > 'sev_snp_guest' static variable. But these instances are now
> > available via qdev_get_machine()->cgs, so switch to using that
> > instead and drop the static variable.
> > 
> >   - 'sev_guest' is currently used as the name for the static variable
> > holding a pointer to the 'sev-guest' instance. Re-purpose the name
> > > as a local variable referring to the 'sev-guest' instance, and use
> > that consistently throughout the code so it can be easily
> > distinguished from sev-common/sev-snp-guest instances.
> > 
> >   - 'sev' is generally used as the name for local variables holding a
> > pointer to the 'sev-guest' instance. In cases where that now points
> > to common state, use the name 'sev_common'; in cases where that now
> > points to state specific to 'sev-guest' instance, use the name
> > 'sev_guest'
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >  qapi/qom.json |  32 ++--
> >  target/i386/sev.c | 457 ++
> >  target/i386/sev.h |   3 +
> >  3 files changed, 281 insertions(+), 211 deletions(-)
> > 
> 
> >  static SevInfo *sev_get_info(void)
> >  {
> >  SevInfo *info;
> > > +SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
> > +SevGuestState *sev_guest =
> > +(SevGuestState *)object_dynamic_cast(OBJECT(sev_common),
> > + TYPE_SEV_GUEST);
> >  
> >  info = g_new0(SevInfo, 1);
> >  info->enabled = sev_enabled();
> >  
> >  if (info->enabled) {
> > -info->api_major = sev_guest->api_major;
> > -info->api_minor = sev_guest->api_minor;
> > -info->build_id = sev_guest->build_id;
> > -info->policy = sev_guest->policy;
> > -info->state = sev_guest->state;
> > -info->handle = sev_guest->handle;
> > +if (sev_guest) {
> > +info->handle = sev_guest->handle;
> > +}
> 
> If we're not going to provide a value for 'handle', then
> we should update the QAPI for this to mark the property
> as optional, which would then require doing
> 
>   info->has_handle = true;
> 
> inside this 'if' block.

I think this is another temporarily-awkward case that gets resolved
with:

  i386/sev: Update query-sev QAPI format to handle SEV-SNP

With that patch 'handle' is always available for SEV guests, and never
available for SNP, and that's managed through a discriminated union
type. I think that info->handle should be treated the same as the
other fields as part of this patch and any changes in how they are
reported should be kept in the above-mentioned patch.

This might be another artifact from v2's handling. Will get this fixed
up.
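
To make the discriminated-union idea concrete, here is a hand-written sketch of roughly what such a type looks like in C. It is an assumption for illustration only: the names (SketchSevInfo, SketchGuestType, sketch_get_handle) are hypothetical and the layout is not the code the QAPI generator would emit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Discriminator: which kind of guest the info describes. */
typedef enum {
    SKETCH_GUEST_TYPE_SEV,      /* legacy SEV or SEV-ES */
    SKETCH_GUEST_TYPE_SEV_SNP,  /* SEV-SNP */
} SketchGuestType;

typedef struct SketchSevInfo {
    bool enabled;
    SketchGuestType sev_type;
    union {
        struct {
            uint32_t policy;
            uint32_t handle;    /* only meaningful for SEV/SEV-ES */
        } sev;
        struct {
            uint64_t snp_policy;
        } sev_snp;
    } u;
} SketchSevInfo;

/*
 * Returns true and stores the firmware handle only for SEV/SEV-ES guests.
 * For SNP guests there is no handle member to read, so no "has_handle"
 * flag is needed.
 */
bool sketch_get_handle(const SketchSevInfo *info, uint32_t *handle)
{
    if (!info->enabled || info->sev_type != SKETCH_GUEST_TYPE_SEV) {
        return false;
    }
    *handle = info->u.sev.handle;
    return true;
}
```

Consumers switch on the discriminator before touching type-specific members, which is what makes 'handle' "always available for SEV guests, and never available for SNP".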

-Mike

> > +}

> 
> > +info->api_major = sev_common->api_major;
> > +info->api_minor = sev_common->api_minor;
> > +info->build_id = sev_common->build_id;
> > +info->state = sev_common->state;
> > > +/* we only report the lower 32-bits of policy for SNP, ok for now... */
> > +info->policy =
> > +(uint32_t)object_property_get_uint(OBJECT(sev_common),
> > +   "policy", NULL);
> >  }
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>

Re: [PATCH v3 21/49] i386/sev: Introduce "sev-common" type to encapsulate common SEV state

2024-03-20 Thread Michael Roth via

On Wed, Mar 20, 2024 at 11:44:13AM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:17AM -0500, Michael Roth wrote:
> > Currently all SEV/SEV-ES functionality is managed through a single
> > 'sev-guest' QOM type. With upcoming support for SEV-SNP, taking this
> > same approach won't work well since some of the properties/state
> > managed by 'sev-guest' is not applicable to SEV-SNP, which will instead
> > rely on a new QOM type with its own set of properties/state.
> > 
> > To prepare for this, this patch moves common state into an abstract
> > 'sev-common' parent type to encapsulate properties/state that are
> > common to both SEV/SEV-ES and SEV-SNP, leaving only SEV/SEV-ES-specific
> > properties/state in the current 'sev-guest' type. This should not
> > affect current behavior or command-line options.
> > 
> > As part of this patch, some related changes are also made:
> > 
> >   - a static 'sev_guest' variable is currently used to keep track of
> > the 'sev-guest' instance. SEV-SNP would similarly introduce an
> > 'sev_snp_guest' static variable. But these instances are now
> > available via qdev_get_machine()->cgs, so switch to using that
> > instead and drop the static variable.
> > 
> >   - 'sev_guest' is currently used as the name for the static variable
> > holding a pointer to the 'sev-guest' instance. Re-purpose the name
> > > as a local variable referring to the 'sev-guest' instance, and use
> > that consistently throughout the code so it can be easily
> > distinguished from sev-common/sev-snp-guest instances.
> > 
> >   - 'sev' is generally used as the name for local variables holding a
> > pointer to the 'sev-guest' instance. In cases where that now points
> > to common state, use the name 'sev_common'; in cases where that now
> > points to state specific to 'sev-guest' instance, use the name
> > 'sev_guest'
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >  qapi/qom.json |  32 ++--
> >  target/i386/sev.c | 457 ++
> >  target/i386/sev.h |   3 +
> >  3 files changed, 281 insertions(+), 211 deletions(-)
> > 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index baae3a183f..66b5781ca6 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -875,12 +875,29 @@
> >'data': { '*filename': 'str' } }
> >  
> >  ##
> > -# @SevGuestProperties:
> > +# @SevCommonProperties:
> >  #
> > -# Properties for sev-guest objects.
> > +# Properties common to objects that are derivatives of sev-common.
> >  #
> >  # @sev-device: SEV device to use (default: "/dev/sev")
> >  #
> > +# @cbitpos: C-bit location in page table entry (default: 0)
> > +#
> > +# @reduced-phys-bits: number of bits in physical addresses that become
> > +# unavailable when SEV is enabled
> > +#
> > +# Since: 2.12
> 
> Not quite sure what we've done in this scenario before.
> It feels weird to use '2.12' for the new base type, even
> though in effect the properties all existed since 2.12 in
> the sub-class.
> 
> Perhaps 'Since: 9.1' for the type, but 'Since: 2.12' for the
> properties, along with an explanatory comment about stuff
> moving into the new base type ?
> 
> Markus, opinions ?

My thinking is that the internal details are less important than what's
actually exposed to users in the form of command-line options/etc. So
in that context the "Since: 2.12" sort of becomes the "default" for when
those properties were first made available to users, and then anything we
add after would then get special treatment with the per-property
versioning. But no issue with taking a different approach if that's
preferred.

> 
> > +##
> > +{ 'struct': 'SevCommonProperties',
> > +  'data': { '*sev-device': 'str',
> > +'*cbitpos': 'uint32',
> > +'reduced-phys-bits': 'uint32' } }
> > +
> > +##
> > +# @SevGuestProperties:
> > +#
> > +# Properties for sev-guest objects.
> > +#
> >  # @dh-cert-file: guest owners DH certificate (encoded with base64)
> >  #
> >  # @session-file: guest owners session parameters (encoded with base64)
> > @@ -889,11 +906,6 @@
> >  #
> >  # @handle: SEV firmware handle (default: 0)
> >  #
> > -# @cbitpos: C-bit location in page table entry (default: 0)
> > -#
> > -# @reduced-phys-bits: number of bits in physical addresses that become
> > -# unavailable when SEV is enabled
> > -#
> >  # @kernel-hashes: if true, add hashes of kernel/initrd/cmdline to a
> >  # designated guest firmware page for measured boot with -kernel
> >  # (default: false) (since 6.2)
> > @@ -901,13 +913,11 @@
> >  # Since: 2.12
> >  ##
> >  { 'struct': 'SevGuestProperties',
> > -  'data': { '*sev-device': 'str',
> > -'*dh-cert-file': 'str',
> > +  'base': 'SevCommonProperties',
> > +  'data': { '*dh-cert-file': 'str',
> >  '*session-file': 'str',
> >  '*policy': 'uint32',
> >  '*handle': 'uint32',
> > -'*cbitpos': 'uint32',
> > -

[PATCH v2 0/2] Implement QEMU GA commands for Windows

2024-03-20 Thread aidan_leuck

From: Aidan Leuck 

* Fixed styling errors
* Moved from wcstombs to g_utf functions
* Removed unnecessary if checks on calls to free
* Fixed copyright headers
* Refactored create_acl functions into base function, admin function and user function
* Removed unused user count function
* Split up refactor into a separate patch

Aidan Leuck (2):
  Implement QEMU GA commands for Windows
  Factored out common functions between POSIX and Windows implementation

 qga/commands-posix-ssh.c   |  47 +--
 qga/commands-ssh-core.c|  57 +++
 qga/commands-ssh-core.h|  15 +
 qga/commands-windows-ssh.c | 759 +
 qga/commands-windows-ssh.h |  27 ++
 qga/meson.build|  12 +-
 qga/qapi-schema.json   |  22 +-
 7 files changed, 881 insertions(+), 58 deletions(-)
 create mode 100644 qga/commands-ssh-core.c
 create mode 100644 qga/commands-ssh-core.h
 create mode 100644 qga/commands-windows-ssh.c
 create mode 100644 qga/commands-windows-ssh.h

-- 
2.34.1

From b77264e7ba7390adeaaa9d5df707c60693d78c16 Mon Sep 17 00:00:00 2001
From: Aidan Leuck 
Date: Wed, 20 Mar 2024 15:36:18 -0600
Subject: [PATCH v2 1/2] Implement QEMU GA commands for Windows

Signed-off-by: Aidan Leuck 
---
 qga/commands-windows-ssh.c | 823 +
 qga/commands-windows-ssh.h |  26 ++
 qga/meson.build|   9 +-
 qga/qapi-schema.json   |  22 +-
 4 files changed, 867 insertions(+), 13 deletions(-)
 create mode 100644 qga/commands-windows-ssh.c
 create mode 100644 qga/commands-windows-ssh.h

diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c
new file mode 100644
index 00..e9faae90fc
--- /dev/null
+++ b/qga/commands-windows-ssh.c
@@ -0,0 +1,823 @@
+/*
+ * QEMU Guest Agent win32-specific command implementations for SSH keys.
+ * The implementation is opinionated and expects the SSH implementation to
+ * be OpenSSH.
+ *
+ * Copyright Schweitzer Engineering Laboratories. 2024
+ *
+ * Authors:
+ *  Aidan Leuck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+
+#include "commands-windows-ssh.h"
+#include "guest-agent-core.h"
+#include "limits.h"
+#include "lmaccess.h"
+#include "lmapibuf.h"
+#include "lmerr.h"
+#include "qapi/error.h"
+
+#include "qga-qapi-commands.h"
+#include "sddl.h"
+#include "shlobj.h"
+#include "userenv.h"
+
+#define AUTHORIZED_KEY_FILE "authorized_keys"
+#define AUTHORIZED_KEY_FILE_ADMIN "administrators_authorized_keys"
+#define LOCAL_SYSTEM_SID "S-1-5-18"
+#define ADMIN_SID "S-1-5-32-544"
+#define WORLD_SID "S-1-1-0"
+
+/*
+ * Reads the authorized_keys file and returns an array of strings for each
+ * entry
+ * parameters:
+ * path -> Path to the authorized_keys file
+ * errp -> Error structure that will contain errors upon failure.
+ * returns: Array of strings, where each entry is an authorized key.
+ */
+static GStrv read_authkeys(const char *path, Error **errp)
+{
+  g_autoptr(GError) err = NULL;
+  g_autofree char *contents = NULL;
+
+  if (!g_file_get_contents(path, &contents, NULL, &err)) {
+error_setg(errp, "failed to read '%s': %s", path, err->message);
+return NULL;
+  }
+
+  return g_strsplit(contents, "\n", -1);
+}
+
+/*
+ * Checks if an OpenSSH key is valid
+ * parameters:
+ * key -> Key to check for validity
+ * errp -> Error structure that will contain errors upon failure.
+ * returns: true if key is valid, false otherwise
+ */
+static bool check_openssh_pub_key(const char *key, Error **errp)
+{
+  /* simple sanity-check, we may want more? */
+  if (!key || key[0] == '#' || strchr(key, '\n')) {
+error_setg(errp, "invalid OpenSSH public key: '%s'", key);
+return false;
+  }
+
+  return true;
+}
+
+/*
+ * Checks if all openssh keys in the array are valid
+ * parameters:
+ * keys -> Array of keys to check
+ * errp -> Error structure that will contain errors upon failure.
+ * returns: true if all keys are valid, false otherwise
+ */
+static bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
+{
+  size_t n = 0;
+  strList *k;
+
+  for (k = keys; k != NULL; k = k->next) {
+if (!check_openssh_pub_key(k->value, errp)) {
+  return false;
+}
+n++;
+  }
+
+  if (nkeys) {
+*nkeys = n;
+  }
+  return true;
+}
+
+/*
+ * Frees userInfo structure. This implements the g_auto cleanup
+ * for the structure.
+ */
+void free_userInfo(PWindowsUserInfo info)
+{
+  g_free(info->sshDirectory);
+  g_free(info->authorizedKeyFile);
+  LocalFree(info->SSID);
+  g_free(info->username);
+  g_free(info);
+}
+
+/*
+ * Gets the admin SSH folder for OpenSSH. OpenSSH does not store
+ * the authorized_key file in the users home directory for security reasons and
+ * instead stores it at %PROGRAMDATA%/ssh. This function returns the path to
+ * that directory on the users machine parameters: errp -> error structure to
+ * set when an error occurs returns: The path to

Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'

2024-03-20 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Wed, 20 Mar 2024 at 17:06, Philippe Mathieu-Daudé wrote:
> >
> > +Alex/Daniel
> >
> > On 20/3/24 17:53, Peter Maydell wrote:
> > > On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé wrote:
> > >>
> > >> 'info tlb' and 'info mem' commands don't scale in heterogeneous
> > >> emulation. They will be reworked after the next release, hidden
> > >> behind the 'info mmu' command. It is not too late to deprecate
> > >> commands, so add the 'info mmu' command as wrapper to the other
> > >> ones, but already deprecate them.
> > >>
> > >> Philippe Mathieu-Daudé (2):
> > >>target/monitor: Introduce 'info mmu' command
> > >>target/monitor: Deprecate 'info tlb' and 'info mem' commands
> > >
> > > This seems to replace "info tlb" and "info mem" with "info mmu -t"
> > > and "info mmu -m", but it doesn't really say anything about:
> > >   * what the difference is between these two things
> >
> > I really don't know; I'm only trying to keep the monitor interface
> > identical.
> 
> You don't, though: you change it from "info tlb" to "info mmu -t" etc.
> 
> > >   * which targets implement which and why
> >
> > This one is easy to answer:
> >
> > #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \
> >  defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K)
> >  {
> >  .name   = "tlb",
> >
> > #if defined(TARGET_I386) || defined(TARGET_RISCV)
> >  {
> >  .name   = "mem",
> >
> > >   * what the plan is for the future
> >
> > My problem is with linking a single QEMU binary, as these two symbols
> > (hmp_info_mem and hmp_info_tlb) clash.
> 
> Yes, but they both (implicitly) operate on the current HMP CPU,
> so the problem with linking into a single binary is that they're
> not indirected through a method on the CPU object, not the syntax
> used in the monitor to invoke them, presumably.
> 
> > I'm indeed only postponing the problem, without looking at what
> > this code does. I did it adding hmp_info_mmu_tlb/mem hooks in
> > TCGCPUOps ("hw/core/tcg-cpu-ops.h"), so the command can be
> > dispatched per target vcpu as target-agnostic code in
> > monitor/hmp-cmds.c:
> >
> > +#include "hw/core/tcg-cpu-ops.h"
> > +
> > +static void hmp_info_mmu_tlb(Monitor *mon, CPUState *cpu)
> > +{
> > +const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
> > +
> > +if (tcg_ops->hmp_info_mmu_tlb) {
> > +tcg_ops->hmp_info_mmu_tlb(mon, cpu_env(cpu));
> > +} else {
> > +monitor_puts(mon, "No per-CPU information available on this target\n");
> > +}
> > +}
> 
> These aren't TCG specific though, so why TCGCPUOps ?
> 
> > > I am definitely not a fan of either of these commands, because
> > > (as we currently implement them) they effectively require each
> > > target architecture to implement a second copy of the page table
> > > walking code. But before we can deprecate them we need to be
> > > pretty sure that "info mmu" is what we want to replace them with.
> >
> > An alternative is to just deprecate them, without adding "info mmu" :)
> >
> > It is OK to un-deprecate stuff if we realize its usefulness.
> 
> The commands are there because some users find them useful.
> I just dislike them because I think they're a bit niche and
> annoying to implement and not consistent across target
> architectures and not very well documented...
> 
> By the way, we have no obligation to follow the deprecate-and-drop
> process for HMP commands; unlike QMP, we give ourselves the
> license to vary it when we feel like it, because the users are
> humans, not programs or scripts.

Right, so no rush to get the deprecation in; change it when you agree
what you'd like a replacement to look like.

Dave

> -- PMM
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/

[PATCH] migration/postcopy: Fix high frequency sync

2024-03-20 Thread peterx

From: Peter Xu 

On current code base I can observe extremely high sync count during
precopy, as long as one enables postcopy-ram=on before switchover to
postcopy.

To provide some context of when we decide to do a full sync: we check
must_precopy (which implies "data must be sent during precopy phase"), and
as long as it is lower than the threshold size we calculated (out of
bandwidth and expected downtime) we will kick off the slow sync.

However, when postcopy is enabled (even if still during precopy phase), RAM
reports all pages as can_postcopy and reports must_precopy==0.  Then
"must_precopy <= threshold_size" almost always triggers and enforces a slow
sync for every call to migration_iteration_run() when postcopy is enabled
even if not used.  That is insane.

It turns out this was a regression introduced by the earlier refactoring in
QEMU 8.0 in late 2022. Fix this by checking the whole RAM size rather than
must_precopy, like before.  Not copying stable yet as many things changed, and
even if this should be a major performance regression, no functional change
has been observed (and that's also probably why nobody found it).  I only
noticed this when looking for another bug reported by Nina.
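
The difference between the two conditions can be seen with a couple of plain functions. This is only a sketch of the comparison described above, with hypothetical helper names; it is not the migration code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition being replaced: looks only at data that must go during precopy. */
bool sketch_sync_on_must_precopy(uint64_t must_precopy, uint64_t can_postcopy,
                                 uint64_t threshold_size)
{
    (void)can_postcopy;     /* ignored, which is the problem */
    return must_precopy <= threshold_size;
}

/* Condition after the fix: looks at everything that is still pending. */
bool sketch_sync_on_pending_size(uint64_t must_precopy, uint64_t can_postcopy,
                                 uint64_t threshold_size)
{
    uint64_t pending_size = must_precopy + can_postcopy;

    return pending_size < threshold_size;
}
```

With postcopy-ram=on, RAM reports must_precopy == 0, so the first helper returns true on every iteration no matter how much RAM is still dirty, while the second returns true only once the total pending size drops below the threshold.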

While at it, clean up the surrounding lines a little.

Cc: Nina Schoetterl-Glausch 
Fixes: c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")
Signed-off-by: Peter Xu 
---

Nina: I copied you only because this might still be relevant, as this issue
also mysteriously points back to c8df4a7aef.  However I don't think it
should be a fix for your problem; at most it can change the likelihood of
reproducing it.

This is not a regression for this release, but I still want to have it for
9.0.  Fabiano, any opinions / objections?
---
 migration/migration.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 047b6b49cf..9fe8fd2afd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3199,17 +3199,16 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-uint64_t must_precopy, can_postcopy;
+uint64_t must_precopy, can_postcopy, pending_size;
 Error *local_err = NULL;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 bool can_switchover = migration_can_switchover(s);
 
 qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-uint64_t pending_size = must_precopy + can_postcopy;
-
+pending_size = must_precopy + can_postcopy;
 trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
-if (must_precopy <= s->threshold_size) {
+if (pending_size < s->threshold_size) {
 qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
 pending_size = must_precopy + can_postcopy;
 trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
-- 
2.44.0

Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-20 Thread Si-Wei Liu





On 3/19/2024 8:27 PM, Jason Wang wrote:

On Tue, Mar 19, 2024 at 6:16 AM Si-Wei Liu  wrote:



On 3/17/2024 8:22 PM, Jason Wang wrote:

On Sat, Mar 16, 2024 at 2:45 AM Si-Wei Liu  wrote:


On 3/14/2024 9:03 PM, Jason Wang wrote:

On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:

On setups with one or more virtio-net devices with vhost on, the cost
of each dirty tracking iteration grows with the number of queues that
are set up. E.g. on idle guest migration the following is observed
with virtio-net with vhost=on:

48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14

With high memory dirtying rates the symptom is lack of convergence as
soon as the guest has a vhost device with a sufficiently high number of
queues, or a sufficient number of vhost devices.

On every migration iteration (every 100msecs) it will redundantly
query the *shared log* as many times as the number of queues configured
with vhost that exist in the guest. For the virtqueue data, this is
necessary, but not for the memory sections, which are the same. So
essentially we end up scanning the dirty log too often.

To fix that, select a vhost device responsible for scanning the
log with regards to memory sections dirty tracking. It is selected
when we enable the logger (during migration) and cleared when we
disable the logger. If the vhost logger device goes away for some
reason, the logger will be re-selected from the rest of vhost
devices.

After making the mem-section logger a singleton instance, a constant
cost of 7%-9% (like the 1 queue report) will be seen, no matter how many
queues or how many vhost devices are configured:

48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
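
The election scheme described above can be sketched outside of QEMU as follows. This is a minimal illustration, not the patch's code: the types, the hand-rolled singly linked list (the patch uses a QLIST array) and all function names are hypothetical stand-ins, and appending at the tail is a choice of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the QEMU definitions. */
enum sk_backend { SK_BACKEND_NONE = 0, SK_BACKEND_KERNEL, SK_BACKEND_USER,
                  SK_BACKEND_MAX };

struct sk_dev {
    enum sk_backend type;
    struct sk_dev *next;   /* next logging device of the same backend type */
};

/* One list per backend type; the first entry is the elected logger. */
static struct sk_dev *sk_log_devs[SK_BACKEND_MAX];

static void sk_check_type(const struct sk_dev *d)
{
    assert(d->type > SK_BACKEND_NONE);
    assert(d->type < SK_BACKEND_MAX);
}

/* Called when logging is enabled for a device. */
void sk_logger_add(struct sk_dev *d)
{
    struct sk_dev **p;

    sk_check_type(d);
    /* Append at the tail so that an already elected logger stays elected. */
    for (p = &sk_log_devs[d->type]; *p; p = &(*p)->next) {
    }
    d->next = NULL;
    *p = d;
}

/* Called when logging is disabled or the device goes away. */
void sk_logger_remove(struct sk_dev *d)
{
    struct sk_dev **p;

    sk_check_type(d);
    for (p = &sk_log_devs[d->type]; *p; p = &(*p)->next) {
        if (*p == d) {
            *p = d->next;   /* the next device, if any, becomes the logger */
            d->next = NULL;
            return;
        }
    }
}

/* Only the elected device scans the memory sections. */
bool sk_dev_should_log(const struct sk_dev *d)
{
    sk_check_type(d);
    return d == sk_log_devs[d->type];
}
```

With two devices of the same backend type registered, only the first one reports that it should log; once it is removed, the remaining one takes over without any extra election state.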

Co-developed-by: Joao Martins 
Signed-off-by: Joao Martins 
Signed-off-by: Si-Wei Liu 

---
v3 -> v4:
 - add comment to clarify effect on cache locality and
   performance

v2 -> v3:
 - add after-fix benchmark to commit log
 - rename vhost_log_dev_enabled to vhost_dev_should_log
 - remove unneeded comparisons for backend_type
 - use QLIST array instead of single flat list to store vhost
   logger devices
 - simplify logger election logic
---
hw/virtio/vhost.c | 67 ++-
include/hw/virtio/vhost.h |  1 +
2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 612f4db..58522f1 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -45,6 +45,7 @@

static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
+static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];

/* Memslots used by backends that support private memslots (without an fd). */
static unsigned int used_memslots;
@@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
}
}

+static inline bool vhost_dev_should_log(struct vhost_dev *dev)
+{
+assert(dev->vhost_ops);
+assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
+
+return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);

A dumb question, why not simple check

dev->log == vhost_log_shm[dev->vhost_ops->backend_type]

Because we are not sure if the logger comes from vhost_log_shm[] or
vhost_log[]. I don't want to complicate the check here by calling into
vhost_dev_log_is_shared() every time .log_sync() is called.

It has very low overhead, isn't it?

Whether this has low overhead depends on the specific backend's
implementation of .vhost_requires_shm_log(). The common vhost layer
should not make assumptions about, or rely on, the current implementation.


static bool vhost_dev_log_is_shared(struct vhost_dev *dev)
{
  return dev->vhost_ops->vhost_requires_shm_log &&
 dev->vhost_ops->vhost_requires_shm_log(dev);
}

For example, if I understand the code correctly, the log type won't be
changed during runtime, so we can end up with a boolean to record that
instead of a query ops?

Right now the log type won't change during runtime, but I am not sure if
this may prohibit a future revisit to allow changing it at runtime; then
there'll be complex code involved to maintain the state.


Other than this, I think it's insufficient to just check the shm log
vs. the normal log. We need to identify a leading logger device, which
gets elected in vhost_dev_elect_mem_logger(). As all the dev->log
pointers point to the same reference-counted logger, we would have to
add an extra field and complex logic to maintain the election
status. I thought that Eugenio's previous suggestion tried to simplify
the logic in vhost_dev_elect_mem_logger(), as the QLIST_FIRST

Re: [PATCH v2 1/2] target/riscv/csr.c: Add functional of hvictl CSR

2024-03-20 Thread Daniel Henrique Barboza


Hi,

This patch doesn't apply in master or alistair/riscv-to-apply.next. Can you
please re-send?


Thanks,


Daniel


On 3/20/24 13:42, Irina Ryapolova wrote:

CSR hvictl (Hypervisor Virtual Interrupt Control) provides further flexibility
for injecting interrupts into VS level in situations not fully supported by the
facilities described thus far, but only with more active involvement of the
hypervisor.

A hypervisor must use hvictl for any of the following:
• asserting for VS level a major interrupt not supported by hvien and hvip;
• implementing configurability of priorities at VS level for major interrupts
  beyond those supported by hviprio1 and hviprio2; or
• emulating an external interrupt controller for a virtual hart without the use
  of an IMSIC’s guest interrupt file, while also supporting configurable
  priorities both for external interrupts and for major interrupts to the
  virtual hart.

All hvictl fields together can affect the value of CSR vstopi (Virtual
Supervisor Top Interrupt) and therefore the interrupt identity reported in
vscause when an interrupt traps to VS-mode. When hvictl.VTI = 1, the absence
of an interrupt for VS level can be indicated only by setting hvictl.IID = 9.
Software might want to use the pair IID = 9, IPRIO = 0 generally to represent
no interrupt in hvictl.

(See riscv-interrupts-1.0: Interrupts at VS level)
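
The "no interrupt" encoding described above can be illustrated with a small stand-alone check. This is a sketch, not QEMU code: the SK_* macros and the function name are hypothetical, and the bit positions (VTI at bit 30, IID in bits 27:16, IPRIO in bits 7:0) are an assumption about the hvictl layout that should be checked against the AIA specification. The condition itself mirrors the one in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed hvictl field layout; verify against the AIA specification. */
#define SK_HVICTL_VTI    UINT32_C(0x40000000)  /* bit 30 */
#define SK_HVICTL_IID    UINT32_C(0x0fff0000)  /* bits 27:16 */
#define SK_HVICTL_IPRIO  UINT32_C(0x000000ff)  /* bits 7:0 */
#define SK_IID_S_EXT     9u                    /* supervisor external */

/*
 * Hypothetical helper: returns true when VTI is set and the IID/IPRIO
 * pair is anything other than the "no interrupt" pair (IID = 9, IPRIO = 0).
 */
bool sk_hvictl_requests_interrupt(uint32_t hvictl)
{
    uint32_t iid = (hvictl & SK_HVICTL_IID) >> 16;
    uint32_t iprio = hvictl & SK_HVICTL_IPRIO;

    if (!(hvictl & SK_HVICTL_VTI)) {
        return false;   /* the quoted hunk only acts when VTI is set */
    }
    return !(iid == SK_IID_S_EXT && iprio == 0);
}
```

A value with VTI set, IID = 9 and IPRIO = 0 is treated as "nothing to inject", while the same value with a different IID or a non-zero IPRIO is not.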

Signed-off-by: Irina Ryapolova 
---
Changes for v2:
   -added more information in commit message
---
  target/riscv/csr.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..0c21145eaf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3585,6 +3585,21 @@ static int read_hvictl(CPURISCVState *env, int csrno, target_ulong *val)
  static int write_hvictl(CPURISCVState *env, int csrno, target_ulong val)
  {
  env->hvictl = val & HVICTL_VALID_MASK;
+if (env->hvictl & HVICTL_VTI)
+{
+uint32_t hviid = get_field(env->hvictl, HVICTL_IID);
+uint32_t hviprio = get_field(env->hvictl, HVICTL_IPRIO);
+/* the pair IID = 9, IPRIO = 0 generally to represent no interrupt in hvictl. */
+if (!(hviid == IRQ_S_EXT && hviprio == 0)) {
+uint64_t new_val = BIT(hviid) ;
+ if (new_val & S_MODE_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val << 1, new_val << 1);
+} else if (new_val & LOCAL_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val, new_val);
+}
+}
+}
+
  return RISCV_EXCP_NONE;
  }

Re: [PATCH RFC v3 00/49] Add AMD Secure Nested Paging (SEV-SNP) support

2024-03-20 Thread Xiaoyao Li


On 3/21/2024 1:08 AM, Paolo Bonzini wrote:

On Wed, Mar 20, 2024 at 10:59 AM Paolo Bonzini  wrote:

I will now focus on reviewing patches 6-20.  This way we can prepare a
common tree for SEV_INIT2/SNP/TDX, for both vendors to build upon.


Ok, the attachment is the delta that I have. The only major change is
requiring discard (thus effectively blocking VFIO support for
SEV-SNP/TDX, at least for now).

I will push it shortly to the same sevinit2 branch, and will post the
patches sometime soon.

Xiaoyao, you can use that branch too (it's on
https://gitlab.com/bonzini/qemu) as the basis for your TDX work.


Sure, that's really good news for us.

BTW, there are some minor comments on the guest_memfd patches of my v5
post[*]. Could you please resolve them in your branch?


[*] 
https://lore.kernel.org/qemu-devel/20240229063726.610065-1-xiaoyao...@intel.com/



Paolo

[PATCH 1/3] hw/virtio: initialize QemuDmaBuf using the function from ui/console

2024-03-20 Thread dongwon . kim

From: Dongwon Kim 

QemuDmaBuf is an abstraction of dmabuf specifically for ui/console usage.
To enhance safety and maintainability, its creation and initialization
should be centralized within ui/console using the newly introduced methods.

Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 hw/display/virtio-gpu-udmabuf.c | 27 +++
 include/hw/virtio/virtio-gpu.h  |  2 +-
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index d51184d658..dde6c8e9d9 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -162,7 +162,7 @@ static void virtio_gpu_free_dmabuf(VirtIOGPU *g, VGPUDMABuf *dmabuf)
 struct virtio_gpu_scanout *scanout;
 
 scanout = &g->parent_obj.scanout[dmabuf->scanout_id];
-dpy_gl_release_dmabuf(scanout->con, &dmabuf->buf);
+dpy_gl_release_dmabuf(scanout->con, dmabuf->buf);
 QTAILQ_REMOVE(&g->dmabuf.bufs, dmabuf, next);
 g_free(dmabuf);
 }
@@ -181,17 +181,10 @@ static VGPUDMABuf
 }
 
 dmabuf = g_new0(VGPUDMABuf, 1);
-dmabuf->buf.width = r->width;
-dmabuf->buf.height = r->height;
-dmabuf->buf.stride = fb->stride;
-dmabuf->buf.x = r->x;
-dmabuf->buf.y = r->y;
-dmabuf->buf.backing_width = fb->width;
-dmabuf->buf.backing_height = fb->height;
-dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
-dmabuf->buf.fd = res->dmabuf_fd;
-dmabuf->buf.allow_fences = true;
-dmabuf->buf.draw_submitted = false;
+dmabuf->buf = dpy_gl_create_dmabuf(r->width, r->height, fb->stride,
+   r->x, r->y, fb->width, fb->height,
+   qemu_pixman_to_drm_format(fb->format),
+   0, res->dmabuf_fd, false);
 dmabuf->scanout_id = scanout_id;
 QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);
 
@@ -206,21 +199,23 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
 {
 struct virtio_gpu_scanout *scanout = &g->parent_obj.scanout[scanout_id];
 VGPUDMABuf *new_primary, *old_primary = NULL;
+uint32_t width, height;
 
 new_primary = virtio_gpu_create_dmabuf(g, scanout_id, res, fb, r);
 if (!new_primary) {
 return -EINVAL;
 }
 
+width = dpy_gl_dmabuf_get_width(new_primary->buf);
+height = dpy_gl_dmabuf_get_height(new_primary->buf);
+
 if (g->dmabuf.primary[scanout_id]) {
 old_primary = g->dmabuf.primary[scanout_id];
 }
 
 g->dmabuf.primary[scanout_id] = new_primary;
-qemu_console_resize(scanout->con,
-new_primary->buf.width,
-new_primary->buf.height);
-dpy_gl_scanout_dmabuf(scanout->con, &new_primary->buf);
+qemu_console_resize(scanout->con, width, height);
+dpy_gl_scanout_dmabuf(scanout->con, new_primary->buf);
 
 if (old_primary) {
 virtio_gpu_free_dmabuf(g, old_primary);
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index ed44cdad6b..010083e8e3 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -169,7 +169,7 @@ struct VirtIOGPUBaseClass {
 DEFINE_PROP_UINT32("yres", _state, _conf.yres, 800)
 
 typedef struct VGPUDMABuf {
-QemuDmaBuf buf;
+QemuDmaBuf *buf;
 uint32_t scanout_id;
 QTAILQ_ENTRY(VGPUDMABuf) next;
 } VGPUDMABuf;
-- 
2.34.1

[PATCH 2/3] hw/vfio: initialize QemuDmaBuf using the function from ui/console

2024-03-20 Thread dongwon . kim

From: Dongwon Kim 

QemuDmaBuf is an abstraction of dmabuf specifically for ui/console usage.
To enhance safety and maintainability, its creation and initialization
should be centralized within ui/console using the newly introduced methods.

Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 hw/vfio/display.c | 35 ---
 include/hw/vfio/vfio-common.h |  2 +-
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 1aa440c663..a3bdb01789 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -241,14 +241,10 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice *vdev,
 
 dmabuf = g_new0(VFIODMABuf, 1);
 dmabuf->dmabuf_id  = plane.dmabuf_id;
-dmabuf->buf.width  = plane.width;
-dmabuf->buf.height = plane.height;
-dmabuf->buf.backing_width = plane.width;
-dmabuf->buf.backing_height = plane.height;
-dmabuf->buf.stride = plane.stride;
-dmabuf->buf.fourcc = plane.drm_format;
-dmabuf->buf.modifier = plane.drm_format_mod;
-dmabuf->buf.fd = fd;
+dmabuf->buf = dpy_gl_create_dmabuf(plane.width, plane.height, plane.stride,
+   0, 0, plane.width, plane.height,
+   plane.drm_format, plane.drm_format_mod,
+   fd, false);
 if (plane_type == DRM_PLANE_TYPE_CURSOR) {
 vfio_display_update_cursor(dmabuf, &plane);
 }
@@ -259,9 +255,15 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice *vdev,
 
 static void vfio_display_free_one_dmabuf(VFIODisplay *dpy, VFIODMABuf *dmabuf)
 {
+int fd;
+
 QTAILQ_REMOVE(&dpy->dmabuf.bufs, dmabuf, next);
-dpy_gl_release_dmabuf(dpy->con, &dmabuf->buf);
-close(dmabuf->buf.fd);
+fd = dpy_gl_dmabuf_get_fd(dmabuf->buf);
+if (fd > -1) {
+close(fd);
+}
+
+dpy_gl_release_dmabuf(dpy->con, dmabuf->buf);
 g_free(dmabuf);
 }
 
@@ -286,6 +288,7 @@ static void vfio_display_dmabuf_update(void *opaque)
 VFIOPCIDevice *vdev = opaque;
 VFIODisplay *dpy = vdev->dpy;
 VFIODMABuf *primary, *cursor;
+uint32_t width, height;
 bool free_bufs = false, new_cursor = false;
 
 primary = vfio_display_get_dmabuf(vdev, DRM_PLANE_TYPE_PRIMARY);
@@ -296,11 +299,13 @@ static void vfio_display_dmabuf_update(void *opaque)
 return;
 }
 
+width = dpy_gl_dmabuf_get_width(primary->buf);
+height = dpy_gl_dmabuf_get_height(primary->buf);
+
 if (dpy->dmabuf.primary != primary) {
 dpy->dmabuf.primary = primary;
-qemu_console_resize(dpy->con,
-primary->buf.width, primary->buf.height);
-dpy_gl_scanout_dmabuf(dpy->con, &primary->buf);
+qemu_console_resize(dpy->con, width, height);
+dpy_gl_scanout_dmabuf(dpy->con, primary->buf);
 free_bufs = true;
 }
 
@@ -314,7 +319,7 @@ static void vfio_display_dmabuf_update(void *opaque)
 if (cursor && (new_cursor || cursor->hot_updates)) {
 bool have_hot = (cursor->hot_x != 0x &&
  cursor->hot_y != 0x);
-dpy_gl_cursor_dmabuf(dpy->con, &cursor->buf, have_hot,
+dpy_gl_cursor_dmabuf(dpy->con, cursor->buf, have_hot,
  cursor->hot_x, cursor->hot_y);
 cursor->hot_updates = 0;
 } else if (!cursor && new_cursor) {
@@ -328,7 +333,7 @@ static void vfio_display_dmabuf_update(void *opaque)
 cursor->pos_updates = 0;
 }
 
-dpy_gl_update(dpy->con, 0, 0, primary->buf.width, primary->buf.height);
+dpy_gl_update(dpy->con, 0, 0, width, height);
 
 if (free_bufs) {
 vfio_display_free_dmabufs(vdev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..d66e27db02 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -148,7 +148,7 @@ typedef struct VFIOGroup {
 } VFIOGroup;
 
 typedef struct VFIODMABuf {
-QemuDmaBuf buf;
+QemuDmaBuf *buf;
 uint32_t pos_x, pos_y, pos_updates;
 uint32_t hot_x, hot_y, hot_updates;
 int dmabuf_id;
-- 
2.34.1

[PATCH 3/3] ui/console: add methods for allocating, initializing and accessing QemuDmaBuf

2024-03-20 Thread dongwon . kim

From: Dongwon Kim 

This commit introduces new methods within ui/console to handle the allocation,
initialization, and field retrieval of QemuDmaBuf. By isolating these
operations within ui/console, it enhances safety and encapsulation of
the struct.

Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 include/ui/console.h | 10 
 ui/console.c | 55 
 2 files changed, 65 insertions(+)

diff --git a/include/ui/console.h b/include/ui/console.h
index 0bc7a00ac0..70903f1b0d 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -279,6 +279,7 @@ typedef struct DisplayChangeListenerOps {
 /* optional */
 void (*dpy_gl_cursor_position)(DisplayChangeListener *dcl,
uint32_t pos_x, uint32_t pos_y);
+
 /* optional */
 void (*dpy_gl_release_dmabuf)(DisplayChangeListener *dcl,
   QemuDmaBuf *dmabuf);
@@ -358,6 +359,15 @@ void dpy_gl_cursor_dmabuf(QemuConsole *con, QemuDmaBuf *dmabuf,
   bool have_hot, uint32_t hot_x, uint32_t hot_y);
 void dpy_gl_cursor_position(QemuConsole *con,
 uint32_t pos_x, uint32_t pos_y);
+QemuDmaBuf *dpy_gl_create_dmabuf(uint32_t width, uint32_t height,
+ uint32_t stride, uint32_t x,
+ uint32_t y, uint32_t backing_width,
+ uint32_t backing_height, uint32_t fourcc,
+ uint32_t modifier, uint32_t dmabuf_fd,
+ bool allow_fences);
+uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf);
+uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf);
+int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf);
 void dpy_gl_release_dmabuf(QemuConsole *con,
QemuDmaBuf *dmabuf);
 void dpy_gl_update(QemuConsole *con,
diff --git a/ui/console.c b/ui/console.c
index 43226c5c14..bac24756f0 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1132,6 +1132,60 @@ void dpy_gl_cursor_position(QemuConsole *con,
 }
 }
 
+QemuDmaBuf *dpy_gl_create_dmabuf(uint32_t width, uint32_t height,
+ uint32_t stride, uint32_t x,
+ uint32_t y, uint32_t backing_width,
+ uint32_t backing_height, uint32_t fourcc,
+ uint32_t modifier, uint32_t dmabuf_fd,
+ bool allow_fences)
+{
+QemuDmaBuf *dmabuf;
+
+dmabuf = g_new0(QemuDmaBuf, 1);
+
+dmabuf->width = width;
+dmabuf->height = height;
+dmabuf->stride = stride;
+dmabuf->x = x;
+dmabuf->y = y;
+dmabuf->backing_width = backing_width;
+dmabuf->backing_height = backing_height;
+dmabuf->fourcc = fourcc;
+dmabuf->modifier = modifier;
+dmabuf->fd = dmabuf_fd;
+dmabuf->allow_fences = allow_fences;
+dmabuf->fence_fd = -1;
+
+return dmabuf;
+}
+
+uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf)
+{
+if (dmabuf) {
+return dmabuf->width;
+}
+
+return 0;
+}
+
+uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf)
+{
+if (dmabuf) {
+return dmabuf->height;
+}
+
+return 0;
+}
+
+int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf)
+{
+if (dmabuf) {
+return dmabuf->fd;
+}
+
+return -1;
+}
+
 void dpy_gl_release_dmabuf(QemuConsole *con,
   QemuDmaBuf *dmabuf)
 {
@@ -1145,6 +1199,7 @@ void dpy_gl_release_dmabuf(QemuConsole *con,
 if (dcl->ops->dpy_gl_release_dmabuf) {
 dcl->ops->dpy_gl_release_dmabuf(dcl, dmabuf);
 }
+g_free(dmabuf);
 }
 }
 
-- 
2.34.1

[PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console

2024-03-20 Thread dongwon . kim

From: Dongwon Kim 

QemuDmaBuf struct is defined and primarily used by ui/console/gl so it is
better to handle its creation, initialization and access within ui/console
rather than within hw modules such as hw/display/virtio-gpu and
hw/vfio/display.

To achieve this, new methods for allocating, initializing the struct, and
accessing certain fields necessary for hardware modules have been introduced
in ui/console.c.
(3rd patch)

Furthermore, modifications have been made to hw/display/virtio-gpu and
hw/vfio/display to utilize these new methods instead of setting up the struct
independently.
(1st and 2nd patches)
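
The general shape of this change, where one module owns allocation and field access and its users only hold a pointer, can be sketched as below. This is an illustration only: SketchDmaBuf and the sketch_* functions are hypothetical, reduced stand-ins and not the QemuDmaBuf API from the series, and plain calloc()/free() are used here in place of the GLib allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* What users of the module would see: an incomplete (opaque) type. */
typedef struct SketchDmaBuf SketchDmaBuf;

/* In a real split this definition would live only in the owning .c file. */
struct SketchDmaBuf {
    uint32_t width;
    uint32_t height;
    int fd;
};

/* Single place where the struct is allocated and initialized. */
SketchDmaBuf *sketch_dmabuf_new(uint32_t width, uint32_t height, int fd)
{
    SketchDmaBuf *buf = calloc(1, sizeof(*buf));

    if (!buf) {
        abort();   /* this sketch simply gives up on allocation failure */
    }
    buf->width = width;
    buf->height = height;
    buf->fd = fd;
    return buf;
}

/* Accessors tolerate NULL and return a neutral value in that case. */
uint32_t sketch_dmabuf_get_width(const SketchDmaBuf *buf)
{
    return buf ? buf->width : 0;
}

uint32_t sketch_dmabuf_get_height(const SketchDmaBuf *buf)
{
    return buf ? buf->height : 0;
}

int sketch_dmabuf_get_fd(const SketchDmaBuf *buf)
{
    return buf ? buf->fd : -1;
}

void sketch_dmabuf_free(SketchDmaBuf *buf)
{
    free(buf);
}
```

Callers then store a pointer instead of embedding the struct, and read the size or the file descriptor through the accessors, which is the direction the 1st and 2nd patches take for the hardware modules.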

Dongwon Kim (3):
  hw/virtio: initialize QemuDmaBuf using the function from ui/console
  hw/vfio: initialize QemuDmaBuf using the function from ui/console
  ui/console: add methods for allocating, initializing and accessing
QemuDmaBuf

 hw/display/virtio-gpu-udmabuf.c | 27 +++-
 hw/vfio/display.c   | 35 -
 include/hw/vfio/vfio-common.h   |  2 +-
 include/hw/virtio/virtio-gpu.h  |  2 +-
 include/ui/console.h| 10 ++
 ui/console.c| 55 +
 6 files changed, 98 insertions(+), 33 deletions(-)

-- 
2.34.1

Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()

2024-03-20 Thread Peter Xu

On Wed, Mar 20, 2024 at 03:46:44PM -0400, Peter Xu wrote:
> On Wed, Mar 20, 2024 at 08:21:30PM +0100, Nina Schoetterl-Glausch wrote:
> > On Wed, 2024-03-20 at 14:57 -0400, Peter Xu wrote:
> > > On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote:
> > > > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> > > > > From: Peter Xu 
> > > > > 
> > > > > When the migration framework fetches the exact pending sizes, it means
> > > > > this check:
> > > > > 
> > > > >   remaining_size < s->threshold_size
> > > > > 
> > > > > Must have been done already, actually at migration_iteration_run():
> > > > > 
> > > > > if (must_precopy <= s->threshold_size) {
> > > > > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> > > > > 
> > > > > That should be after one round of ram_state_pending_estimate().  It makes
> > > > > the 2nd check meaningless and can be dropped.
> > > > > 
> > > > > To say it in another way, when reaching ->state_pending_exact(), we
> > > > > unconditionally sync dirty bits for precopy.
> > > > > 
> > > > > Then we can drop migrate_get_current() there too.
> > > > > 
> > > > > Signed-off-by: Peter Xu 
> > > > 
> > > > Hi Peter,
> > > 
> > > Hi, Nina,
> > > 
> > > > 
> > > > could you have a look at this issue:
> > > > https://gitlab.com/qemu-project/qemu/-/issues/1565
> > > > 
> > > > which I reopened. Previous thread here:
> > > > 
> > > > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/
> > > > 
> > > > I'm seeing migration failures with s390x TCG again, which look the same to me
> > > > as those a while back.
> > > 
> > > I'm still quite confused how that could be caused by this.
> > > 
> > > What you described in the previous bug report seems to imply some page was
> > > leftover in migration so some page got corrupted after migrated.
> > > 
> > > However what this patch mostly does is it can sync more than before even if
> > > I overlooked the condition check there (I still think the check is
> > > redundant, there's one outlier when remaining_size == threshold_size, but I
> > > don't think it should matter here as of now).  It would make more sense if
> > > this patch made the sync happen less often, but the opposite is the case.
> > 
> > [...]
> > 
> > > In the previous discussion, you mentioned that you bisected to the commit
> > > and also verified the fix.  Now you also mentioned in the bz that you 
> > > can't
> > > reporduce this bug manually.
> > > 
> > > Is it still possible to be reproduced with some scripts?  Do you also mean
> > > that it's harder to reproduce comparing to before?  In all cases, some way
> > > to reproduce it would definitely be helpful.
> > 
> > I tried running the kvm-unit-test a bunch of times in a loop and couldn't
> > trigger a failure. I just tried again on a different system and managed just
> > fine, yay. No idea why it wouldn't on the first system tho.
> 
> There's probably still a bug somewhere.  If reproduction rate changed, it's
> also a sign that it might not be directly relevant to this change, as
> otherwise it should reproduce the same as before.
> 
> > > 
> > > Even if we want to revert this change, we'll need to know whether this will
> > > fix your case so we need something to verify it before a revert.  I'll
> > > consider that as the last resort though, as I had a feeling this is papering
> > > over something else.
> > 
> > I can check if I can reproduce the issue before & after b0504edd ("migration:
> > Drop unnecessary check in ram's pending_exact()").
> > I can also check if I can reproduce it on x86, that worked last time.
> > Anything else? Ideas on how to pinpoint where the corruption happens?
> 
> I don't have a solid clue yet, but more information of the single case
> where it reproduced could help.
> 
> I saw from the bug link that the cmdline is pretty simple.  However still
> not sure of something that can be relevant.  E.g., did you use postcopy
> (including when postcopy-ram enabled but precopy completed)?  Is there any
> special device, like s390's CMMA (would that simplest cmdline include such
> a device; apologies, I have zero knowledge there before today)?
> 
> I _think_ when reading the code I already found something quite unusual,
> but only when postcopy is selected: I notice postcopy will frequently sync
> dirty bitmap while it doesn't really necessarily need to, because
> ram_state_pending_estimate() will report all ram as "can_postcopy"; it
> means it's highly likely that this check will 99.999% always be true simply
> because must_precopy can in most cases be zero:
> 
> if (must_precopy <= s->threshold_size) { <--- here
> qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> pending_size = must_precopy + can_postcopy;
> trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
> }
> 
> I need to think more of this, but this doesn't

Re: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression

2024-03-20 Thread Peter Xu

On Wed, Mar 20, 2024 at 04:23:01PM +0000, Liu, Yuan1 wrote:
> let me explain here, during the decompression operation of IAA, the
> decompressed data can be directly output to the virtual address of the
> guest memory by IAA hardware.  It can avoid copying the decompressed data
> to guest memory by CPU.

I see.

> Without -mem-prealloc, all the guest memory is not populated, and IAA
> hardware needs to trigger I/O page fault first and then output the
> decompressed data to the guest memory region.  Besides that, CPU page
> faults will also trigger IOTLB flush operation when IAA devices use SVM.

Oh so the IAA hardware already can use CPU pgtables?  Nice..

Why is an IOTLB flush needed?  AFAIU we're only installing new pages, the
request can either come from a CPU access or a DMA.  In all cases there
should be no tearing down of an old page.  Isn't an iotlb flush only
needed if a tear down happens?

>
> Due to the inability to quickly resolve a large number of IO page faults
> and IOTLB flushes, the decompression throughput of the IAA device will
> decrease significantly.

-- 
Peter Xu

Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-20 Thread Si-Wei Liu





On 3/19/2024 8:25 PM, Jason Wang wrote:

On Tue, Mar 19, 2024 at 6:06 AM Si-Wei Liu  wrote:



On 3/17/2024 8:20 PM, Jason Wang wrote:

On Sat, Mar 16, 2024 at 2:33 AM Si-Wei Liu  wrote:


On 3/14/2024 8:50 PM, Jason Wang wrote:

On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:

There could be a mix of both vhost-user and vhost-kernel clients
in the same QEMU process, where separate vhost loggers for the
specific vhost type have to be used. Make the vhost logger per
backend type, and have them properly reference counted.

It's better to describe what's the advantage of doing this.

Yes, I can add that to the log. Although it's a niche use case, it was
actually a long-standing limitation / bug that vhost-user and
vhost-kernel loggers can't co-exist in one QEMU process, and today it
may just end up in a silent failure. This bug fix removes that
implicit limitation in the code.

Ok.


Suggested-by: Michael S. Tsirkin 
Signed-off-by: Si-Wei Liu 

---
v3->v4:
 - remove checking NULL return value from vhost_log_get

v2->v3:
 - remove non-effective assertion that never be reached
 - do not return NULL from vhost_log_get()
 - add necessary assertions to vhost_log_get()
---
hw/virtio/vhost.c | 45 +
1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2c9ac79..612f4db 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -43,8 +43,8 @@
do { } while (0)
#endif

-static struct vhost_log *vhost_log;
-static struct vhost_log *vhost_log_shm;
+static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
+static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];

/* Memslots used by backends that support private memslots (without an fd). */
static unsigned int used_memslots;
@@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
r = -1;
}

+if (r == 0) {
+assert(dev->vhost_ops->backend_type == backend_type);
+}
+

Under which condition could we hit this?

Just in case some other function inadvertently corrupted this earlier,
we have to capture discrepancy in the first place... On the other hand,
it will be helpful for other vhost backend writers to diagnose day-one
bug in the code. I feel just code comment here will not be
sufficient/helpful.

See below.


It seems not good to assert a local logic.

It seems to me quite a few local asserts are in the same file already,
vhost_save_backend_state,

For example it has assert for

assert(!dev->started);

which is not the logic of the function itself but require
vhost_dev_start() not to be called before.

But it looks like in this patch you assert on code that is just a few
lines above the assert itself?

Yes, that was the intent - for e.g. xxx_ops may contain corrupted
xxx_ops.backend_type already before coming to this
vhost_set_backend_type() function. And we may capture this corrupted
state by asserting the expected xxx_ops.backend_type (to be consistent
with the backend_type passed in),

This can happen for all variables. Not sure why backend_ops is special.

The assert is just checking the backend_type field only. The other op
fields in backend_ops have similar asserts within the op functions
themselves. For example, vhost_user_requires_shm_log() and a lot of other
vhost_user ops have the following:


    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);

vhost_vdpa_vq_get_addr() and a lot of other vhost_vdpa ops have:

    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);

vhost_kernel ops has similar assertions as well.

The reason it has to be checked here is that the callers of
vhost_log_get() now pass in dev->vhost_ops->backend_type to the API,
and are unable to verify the validity of the backend_type by
themselves. vhost_log_get() has the necessary asserts to bounds-check
the vhost_log[] or vhost_log_shm[] array, but a specific assert
against the exact backend type in vhost_set_backend_type() will further
harden the implementation of vhost_log_get() and other backend ops.
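
The kind of bounds check discussed here can be sketched in isolation as follows. This is a hedged illustration: the enum, struct and function names are hypothetical and only loosely modelled on the per-backend-type vhost_log[] and vhost_log_shm[] arrays from the patch; it is not the vhost_log_get() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the backend type enum and the log object. */
enum sk_backend_type { SK_TYPE_NONE = 0, SK_TYPE_KERNEL, SK_TYPE_USER,
                       SK_TYPE_VDPA, SK_TYPE_MAX };

struct sk_log {
    int refcnt;
};

/* One logger slot per backend type, for normal and shared-memory logs. */
static struct sk_log *sk_logs[SK_TYPE_MAX];
static struct sk_log *sk_logs_shm[SK_TYPE_MAX];

/*
 * Returns the slot for the given backend type. The asserts make an
 * out-of-range type fail loudly instead of indexing past the arrays.
 */
struct sk_log **sk_log_slot(enum sk_backend_type type, bool share)
{
    assert(type > SK_TYPE_NONE);
    assert(type < SK_TYPE_MAX);

    return share ? &sk_logs_shm[type] : &sk_logs[type];
}
```

The range asserts only protect the array index; they cannot tell whether a caller passed a valid but wrong backend type, which is the gap the additional assert in vhost_set_backend_type() is meant to narrow.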





which needs be done in the first place
when this discrepancy is detected. In practice I think there should be
no harm to add this assert, but this will add warranted guarantee to the
current code.

For example, such corruption can happen after the assert(), so there is
a TOCTOU issue.

Sure, it's best effort only. As pointed out earlier, I think together
with this, there are other similar asserts already in various backend
ops, which could be helpful to nail down the earliest point or a
specific range where things may go wrong in the first place.


Thanks,
-Siwei



Thanks


Regards,
-Siwei


dev->vhost_ops = _ops;

...

assert(dev->vhost_ops->backend_type == backend_type)

?

Thanks


vhost_load_backend_state,
vhost_virtqueue_mask, vhost_config_mask, just to name a few. Why local
assert a problem?

Thanks,
-Siwei


Thanks

[PATCH v2 0/2] Add support for STM32G0 SoC family

2024-03-20 Thread Felipe Balbi

Hi all,

These two patches add support for STM32G0 family and nucleo-g071rb
board. Patches have been tested with minimal embedded rust examples.

Changes since v1:

  - Patch 1:
- Convert tabs to spaces (checkpatch.pl)
- Correct lines longer than 80 characters (checkpatch.pl)
- Correct num-prio-bits (Samuel Tardieu)
- Correct num-irqs (Found reviewing RM0444)

  - Patch 2:
- Convert tabs to spaces (checkpatch.pl)

Felipe Balbi (2):
  hw/arm: Add support for stm32g000 SoC family
  hw/arm: Add nucleo-g071rb board

 MAINTAINERS|  13 ++
 hw/arm/Kconfig |  12 ++
 hw/arm/meson.build |   2 +
 hw/arm/nucleo-g071rb.c |  70 +
 hw/arm/stm32g000_soc.c | 253 +
 include/hw/arm/stm32g000_soc.h |  62 
 6 files changed, 412 insertions(+)
 create mode 100644 hw/arm/nucleo-g071rb.c
 create mode 100644 hw/arm/stm32g000_soc.c
 create mode 100644 include/hw/arm/stm32g000_soc.h

-- 
2.44.0

[PATCH v2 1/2] hw/arm: Add support for stm32g000 SoC family

2024-03-20 Thread Felipe Balbi

Minimal support with USARTs and SPIs working. This SoC will be used to
create a nucleo-g071rb board.

Signed-off-by: Felipe Balbi 
---

Changes since v1:
- Convert tabs to spaces (checkpatch.pl)
- Correct lines longer than 80 characters (checkpatch.pl)
- Correct num-prio-bits (Samuel Tardieu)
- Correct num-irqs (Found reviewing RM0444)

 MAINTAINERS|   7 +
 hw/arm/Kconfig |   6 +
 hw/arm/meson.build |   1 +
 hw/arm/stm32g000_soc.c | 253 +
 include/hw/arm/stm32g000_soc.h |  62 
 5 files changed, 329 insertions(+)
 create mode 100644 hw/arm/stm32g000_soc.c
 create mode 100644 include/hw/arm/stm32g000_soc.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 409d7db4d457..bce2eb3ad70b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1134,6 +1134,13 @@ F: hw/misc/stm32l4x5_rcc.c
 F: hw/gpio/stm32l4x5_gpio.c
 F: include/hw/*/stm32l4x5_*.h
 
+STM32G000 SoC Family
+M: Felipe Balbi 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/arm/stm32g000_soc.c
+F: include/hw/*/stm32g000_*.h
+
 B-L475E-IOT01A IoT Node
 M: Arnaud Minier 
 M: Inès Varhol 
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 893a7bff66b9..28a46d2b1ad3 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -463,6 +463,12 @@ config STM32F405_SOC
 select STM32F4XX_SYSCFG
 select STM32F4XX_EXTI
 
+config STM32G000_SOC
+bool
+select ARM_V7M
+select STM32F2XX_USART
+select STM32F2XX_SPI
+
 config B_L475E_IOT01A
 bool
 default y
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 6808135c1f79..9c4137a988e1 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -34,6 +34,7 @@ arm_ss.add(when: ['CONFIG_RASPI', 'TARGET_AARCH64'], if_true: files('bcm2838.c',
 arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c'))
+arm_ss.add(when: 'CONFIG_STM32G000_SOC', if_true: files('stm32g000_soc.c'))
 arm_ss.add(when: 'CONFIG_B_L475E_IOT01A', if_true: files('b-l475e-iot01a.c'))
 arm_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_soc.c'))
 arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 'xlnx-zcu102.c'))
diff --git a/hw/arm/stm32g000_soc.c b/hw/arm/stm32g000_soc.c
new file mode 100644
index 000000000000..48531d41fcc7
--- /dev/null
+++ b/hw/arm/stm32g000_soc.c
@@ -0,0 +1,253 @@
+/*
+ * STM32G000 SoC
+ *
+ * Copyright (c) 2024 Felipe Balbi 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "hw/arm/boot.h"
+#include "exec/address-spaces.h"
+#include "hw/arm/stm32g000_soc.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "hw/misc/unimp.h"
+#include "sysemu/sysemu.h"
+
+/* stm32g000_soc implementation is derived from stm32f100_soc */
+
+struct stm32g0_ip_config {
+const char  *name;
+uint32_t addr;
+uint32_t irq;
+};
+
+#define STM32G0_DEFINE_IP(n, a, i)\
+{ \
+.name = (n),  \
+.addr = (a),  \
+.irq = (i),   \
+}
+
+static const struct stm32g0_ip_config usart_config[STM_NUM_USARTS] = {
+STM32G0_DEFINE_IP("USART1", 0x40013800, 27),
+STM32G0_DEFINE_IP("USART2", 0x40004000, 28),
+STM32G0_DEFINE_IP("USART3", 0x40004400, 29),
+STM32G0_DEFINE_IP("USART4", 0x40004800, 29),
+STM32G0_DEFINE_IP("USART5", 0x40004c00, 29),
+STM32G0_DEFINE_IP("USART6", 0x40005000, 29),
+STM32G0_DEFINE_IP("LPUSART1", 0x40008000, 29),
+STM32G0_DEFINE_IP("LPUSART2", 0x40008400, 28),
+};
+
+static const struct stm32g0_ip_config spi_config[STM_NUM_SPIS] = {
+STM32G0_DEFINE_IP("SPI1", 0x40013000, 25),
+STM32G0_DEFINE_IP("SPI2", 0x40003800, 26),
+
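
Illustrative sketch (not part of the patch): the table-driven peripheral layout above can be modelled in standalone C. The struct and the first three USART entries are copied from the patch; the shortened names and the find_ip() helper are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct stm32g0_ip_config: one entry per peripheral. */
struct ip_config {
    const char *name;
    uint32_t addr;   /* MMIO base address */
    uint32_t irq;    /* interrupt line */
};

#define DEFINE_IP(n, a, i) { .name = (n), .addr = (a), .irq = (i) }

/* First three USART entries as listed in the patch. */
static const struct ip_config usart_table[] = {
    DEFINE_IP("USART1", 0x40013800, 27),
    DEFINE_IP("USART2", 0x40004000, 28),
    DEFINE_IP("USART3", 0x40004400, 29),
};

#define USART_TABLE_LEN (sizeof(usart_table) / sizeof(usart_table[0]))

/* Hypothetical helper: look an entry up by name, NULL if it is absent. */
static const struct ip_config *find_ip(const char *name)
{
    for (size_t i = 0; i < USART_TABLE_LEN; i++) {
        if (strcmp(usart_table[i].name, name) == 0) {
            return &usart_table[i];
        }
    }
    return NULL;
}
```

Keeping address and IRQ in one const table means a realize function can loop over the entries instead of repeating per-instance setup code.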

[PATCH v2 2/2] hw/arm: Add nucleo-g071rb board

2024-03-20 Thread Felipe Balbi

This board is based around STM32G071RB SoC, a Cortex-M0 based
device. More information can be found at:

https://www.st.com/en/product/nucleo-g071rb.html

Signed-off-by: Felipe Balbi 
---

Changes since v1:

- Convert tabs to spaces (checkpatch.pl)

 MAINTAINERS|  6 
 hw/arm/Kconfig |  6 
 hw/arm/meson.build |  1 +
 hw/arm/nucleo-g071rb.c | 70 ++
 4 files changed, 83 insertions(+)
 create mode 100644 hw/arm/nucleo-g071rb.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bce2eb3ad70b..052ce4dcfb97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1116,6 +1116,12 @@ L: qemu-...@nongnu.org
 S: Maintained
 F: hw/arm/netduinoplus2.c
 
+Nucleo G071RB
+M: Felipe Balbi 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/arm/nucleo-g071rb.c
+
 Olimex STM32 H405
 M: Felipe Balbi 
 L: qemu-...@nongnu.org
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 28a46d2b1ad3..5938bb8208a1 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -310,6 +310,12 @@ config STM32VLDISCOVERY
 depends on TCG && ARM
 select STM32F100_SOC
 
+config NUCLEO_G071RB
+bool
+default y
+depends on TCG && ARM
+select STM32G000_SOC
+
 config STRONGARM
 bool
 select PXA2XX
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 9c4137a988e1..580c2d55fc3f 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -18,6 +18,7 @@ arm_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview.c'))
 arm_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa-ref.c'))
 arm_ss.add(when: 'CONFIG_STELLARIS', if_true: files('stellaris.c'))
 arm_ss.add(when: 'CONFIG_STM32VLDISCOVERY', if_true: files('stm32vldiscovery.c'))
+arm_ss.add(when: 'CONFIG_NUCLEO_G071RB', if_true: files('nucleo-g071rb.c'))
 arm_ss.add(when: 'CONFIG_ZYNQ', if_true: files('xilinx_zynq.c'))
 arm_ss.add(when: 'CONFIG_SABRELITE', if_true: files('sabrelite.c'))
 
diff --git a/hw/arm/nucleo-g071rb.c b/hw/arm/nucleo-g071rb.c
new file mode 100644
index 000000000000..580b52bacf2c
--- /dev/null
+++ b/hw/arm/nucleo-g071rb.c
@@ -0,0 +1,70 @@
+/*
+ * ST Nucleo G071RB
+ *
+ * Copyright (c) 2024 Felipe Balbi 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "qemu/error-report.h"
+#include "hw/arm/stm32g000_soc.h"
+#include "hw/arm/boot.h"
+
+/* nucleo_g071rb implementation is derived from olimex-stm32-h405.c */
+
+/* Main SYSCLK frequency in Hz (48MHz) */
+#define SYSCLK_FRQ 48000000ULL
+
+static void nucleo_g071rb_init(MachineState *machine)
+{
+DeviceState *dev;
+Clock *sysclk;
+
+/* This clock doesn't need migration because it is fixed-frequency */
+sysclk = clock_new(OBJECT(machine), "SYSCLK");
+clock_set_hz(sysclk, SYSCLK_FRQ);
+
+dev = qdev_new(TYPE_STM32G000_SOC);
+object_property_add_child(OBJECT(machine), "soc", OBJECT(dev));
+qdev_connect_clock_in(dev, "sysclk", sysclk);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
+armv7m_load_kernel(ARM_CPU(first_cpu),
+   machine->kernel_filename,
+   0, FLASH_SIZE);
+}
+
+static void nucleo_g071rb_machine_init(MachineClass *mc)
+{
+static const char * const valid_cpu_types[] = {
+ARM_CPU_TYPE_NAME("cortex-m0"),
+NULL
+};
+
+mc->desc = "ST Nucleo-G071RB (Cortex-M0)";
+mc->init = nucleo_g071rb_init;
+mc->valid_cpu_types = valid_cpu_types;
+}
+
+DEFINE_MACHINE("nucleo-g071rb", nucleo_g071rb_machine_init)
-- 
2.44.0

Re: [PATCH] target/riscv: Fix mode in riscv_tlb_fill

2024-03-20 Thread Daniel Henrique Barboza





On 3/20/24 14:28, Irina Ryapolova wrote:

Need to convert mmu_idx to privilege mode for PMP function.



Please add:

Fixes: b297129ae1 ("target/riscv: propagate PMP permission to TLB page")


Signed-off-by: Irina Ryapolova 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index ce7322011d..fc090d729a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1315,7 +1315,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
  bool two_stage_lookup = mmuidx_2stage(mmu_idx);
  bool two_stage_indirect_error = false;
  int ret = TRANSLATE_FAIL;
-int mode = mmu_idx;
+int mode = mmuidx_priv(mmu_idx);
  /* default TLB page size */
  target_ulong tlb_size = TARGET_PAGE_SIZE;
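
Illustrative sketch (not from the patch): why the raw mmu_idx cannot be passed where a privilege mode is expected. The numeric values and the model_ names below are assumptions modelled on QEMU's RISC-V MMU index layout, not copied from the tree.

```c
#include <stdbool.h>

/* Assumed encodings, for illustration only. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };
enum { MMUIDX_U = 0, MMUIDX_S = 1, MMUIDX_S_SUM = 2, MMUIDX_M = 3 };
#define MMU_2STAGE_BIT (1 << 2)

/* Strip the two-stage flag and fold the S+SUM index back into S mode. */
static int model_mmuidx_priv(int mmu_idx)
{
    int priv = mmu_idx & 3;

    if (priv == MMUIDX_S_SUM) {
        priv = PRV_S;
    }
    return priv;
}

/* True when the index selects two-stage (guest) translation. */
static bool model_mmuidx_2stage(int mmu_idx)
{
    return (mmu_idx & MMU_2STAGE_BIT) != 0;
}
```

Under these assumptions an index such as MMUIDX_S | MMU_2STAGE_BIT has the raw value 5, which is not a valid privilege level, while the converted value is PRV_S.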

Re: [PATCH] target/riscv: rvv: Remove redudant SEW checking for vector fp narrow/widen instructions

2024-03-20 Thread Daniel Henrique Barboza





On 3/20/24 04:25, Max Chou wrote:

If the checking functions check both the single and double width
operators at the same time, then the single width operator checking
functions (require_rvf[min]) already reject SEW = 8, so the explicit
SEW check is redundant.

Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/insn_trans/trans_rvv.c.inc | 16 
  1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 19059fea5f..08c22f48cb 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2333,7 +2333,6 @@ static bool opfvv_widen_check(DisasContext *s, arg_rmrr 
*a)
  return require_rvv(s) &&
 require_rvf(s) &&
 require_scale_rvf(s) &&
-   (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
 vext_check_dss(s, a->rd, a->rs1, a->rs2, a->vm);
  }
@@ -2373,7 +2372,6 @@ static bool opfvf_widen_check(DisasContext *s, arg_rmrr 
*a)
  return require_rvv(s) &&
 require_rvf(s) &&
 require_scale_rvf(s) &&
-   (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
 vext_check_ds(s, a->rd, a->rs2, a->vm);
  }
@@ -2406,7 +2404,6 @@ static bool opfwv_widen_check(DisasContext *s, arg_rmrr 
*a)
  return require_rvv(s) &&
 require_rvf(s) &&
 require_scale_rvf(s) &&
-   (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
 vext_check_dds(s, a->rd, a->rs1, a->rs2, a->vm);
  }
@@ -2446,7 +2443,6 @@ static bool opfwf_widen_check(DisasContext *s, arg_rmrr 
*a)
  return require_rvv(s) &&
 require_rvf(s) &&
 require_scale_rvf(s) &&
-   (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
 vext_check_dd(s, a->rd, a->rs2, a->vm);
  }
@@ -2704,8 +2700,7 @@ static bool opffv_widen_check(DisasContext *s, arg_rmr *a)
  {
  return opfv_widen_check(s, a) &&
 require_rvfmin(s) &&
-   require_scale_rvfmin(s) &&
-   (s->sew != MO_8);
+   require_scale_rvfmin(s);
  }
  
  #define GEN_OPFV_WIDEN_TRANS(NAME, CHECK, HELPER, FRM) \

@@ -2810,16 +2805,14 @@ static bool opffv_narrow_check(DisasContext *s, arg_rmr 
*a)
  {
  return opfv_narrow_check(s, a) &&
 require_rvfmin(s) &&
-   require_scale_rvfmin(s) &&
-   (s->sew != MO_8);
+   require_scale_rvfmin(s);
  }
  
  static bool opffv_rod_narrow_check(DisasContext *s, arg_rmr *a)

  {
  return opfv_narrow_check(s, a) &&
 require_rvf(s) &&
-   require_scale_rvf(s) &&
-   (s->sew != MO_8);
+   require_scale_rvf(s);
  }
  
  #define GEN_OPFV_NARROW_TRANS(NAME, CHECK, HELPER, FRM)\

@@ -2947,8 +2940,7 @@ static bool freduction_widen_check(DisasContext *s, 
arg_rmrr *a)
  {
  return reduction_widen_check(s, a) &&
 require_rvf(s) &&
-   require_scale_rvf(s) &&
-   (s->sew != MO_8);
+   require_scale_rvf(s);
  }
  
  GEN_OPFVV_WIDEN_TRANS(vfwredusum_vs, freduction_widen_check)
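
Illustrative sketch (not from the patch): a standalone model of why the explicit SEW check becomes redundant. The extension mapping inside model_require_rvf() is an assumption about what require_rvf() does; all names here are hypothetical.

```c
#include <stdbool.h>

enum sew { MO_8, MO_16, MO_32, MO_64 };

struct ctx {
    enum sew sew;
    bool fs_enabled;   /* stands in for mstatus.FS not being disabled */
    bool zvfh;
    bool zve32f;
    bool zve64d;
};

/* Assumed behaviour: only 16/32/64-bit elements can ever pass. */
static bool model_require_rvf(const struct ctx *s)
{
    if (!s->fs_enabled) {
        return false;
    }
    switch (s->sew) {
    case MO_16:
        return s->zvfh;
    case MO_32:
        return s->zve32f;
    case MO_64:
        return s->zve64d;
    default:            /* MO_8 lands here */
        return false;
    }
}

static bool old_check(const struct ctx *s)
{
    return model_require_rvf(s) && s->sew != MO_8;
}

static bool new_check(const struct ctx *s)
{
    return model_require_rvf(s);
}

/* Compare both checks over every SEW and every flag combination. */
static bool checks_agree(void)
{
    for (int sew = MO_8; sew <= MO_64; sew++) {
        for (int bits = 0; bits < 16; bits++) {
            struct ctx s = {
                .sew = (enum sew)sew,
                .fs_enabled = (bits & 1) != 0,
                .zvfh = (bits & 2) != 0,
                .zve32f = (bits & 4) != 0,
                .zve64d = (bits & 8) != 0,
            };
            if (old_check(&s) != new_check(&s)) {
                return false;
            }
        }
    }
    return true;
}
```

Because the single-width check falls through to false for MO_8, the conjunction with (sew != MO_8) never changes the result in this model.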

Re: [PATCH] target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w

2024-03-20 Thread Daniel Henrique Barboza





On 3/20/24 04:25, Max Chou wrote:

The opfv_narrow_check needs to check the single width float operator by
require_rvf.

Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/insn_trans/trans_rvv.c.inc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 6cb9bc9fde..19059fea5f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2817,6 +2817,7 @@ static bool opffv_narrow_check(DisasContext *s, arg_rmr 
*a)
  static bool opffv_rod_narrow_check(DisasContext *s, arg_rmr *a)
  {
  return opfv_narrow_check(s, a) &&
+   require_rvf(s) &&
 require_scale_rvf(s) &&
 (s->sew != MO_8);
  }

Re: [PATCH] target/riscv: rvv: Check single width operator for vector fp widen instructions

2024-03-20 Thread Daniel Henrique Barboza





On 3/20/24 04:25, Max Chou wrote:

The require_scale_rvf function only checks the double width operator for
the vector floating point widen instructions, so most of the widen
checking functions need to add require_rvf for single width operator.

The vfwcvt.f.x.v and vfwcvt.f.xu.v instructions convert single width
integer to double width float, so the opfxv_widen_check function doesn’t
need require_rvf for the single width operator(integer).

Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/insn_trans/trans_rvv.c.inc | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index ef568e263d..6cb9bc9fde 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2331,6 +2331,7 @@ GEN_OPFVF_TRANS(vfrsub_vf,  opfvf_check)
  static bool opfvv_widen_check(DisasContext *s, arg_rmrr *a)
  {
  return require_rvv(s) &&
+   require_rvf(s) &&
 require_scale_rvf(s) &&
 (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
@@ -2370,6 +2371,7 @@ GEN_OPFVV_WIDEN_TRANS(vfwsub_vv, opfvv_widen_check)
  static bool opfvf_widen_check(DisasContext *s, arg_rmrr *a)
  {
  return require_rvv(s) &&
+   require_rvf(s) &&
 require_scale_rvf(s) &&
 (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
@@ -2402,6 +2404,7 @@ GEN_OPFVF_WIDEN_TRANS(vfwsub_vf)
  static bool opfwv_widen_check(DisasContext *s, arg_rmrr *a)
  {
  return require_rvv(s) &&
+   require_rvf(s) &&
 require_scale_rvf(s) &&
 (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
@@ -2441,6 +2444,7 @@ GEN_OPFWV_WIDEN_TRANS(vfwsub_wv)
  static bool opfwf_widen_check(DisasContext *s, arg_rmrr *a)
  {
  return require_rvv(s) &&
+   require_rvf(s) &&
 require_scale_rvf(s) &&
 (s->sew != MO_8) &&
 vext_check_isa_ill(s) &&
@@ -2941,6 +2945,7 @@ GEN_OPFVV_TRANS(vfredmin_vs, freduction_check)
  static bool freduction_widen_check(DisasContext *s, arg_rmrr *a)
  {
  return reduction_widen_check(s, a) &&
+   require_rvf(s) &&
 require_scale_rvf(s) &&
 (s->sew != MO_8);
  }

Re: [PATCH] target/riscv: rvv: Fix Zvfhmin checking for vfwcvt.f.f.v and vfncvt.f.f.w instructions

2024-03-20 Thread Daniel Henrique Barboza





On 3/20/24 04:25, Max Chou wrote:

According to section 18.4 of the V spec, only the vfwcvt.f.f.v and
vfncvt.f.f.w instructions will be affected by the Zvfhmin extension.
And the vfwcvt.f.f.v and vfncvt.f.f.w instructions only support the
conversions of

* From 1*SEW(16/32) to 2*SEW(32/64)
* From 2*SEW(32/64) to 1*SEW(16/32)

Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/insn_trans/trans_rvv.c.inc | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 7d84e7d812..ef568e263d 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -50,6 +50,22 @@ static bool require_rvf(DisasContext *s)
  }
  }
  
+static bool require_rvfmin(DisasContext *s)

+{
+if (s->mstatus_fs == EXT_STATUS_DISABLED) {
+return false;
+}
+
+switch (s->sew) {
+case MO_16:
+return s->cfg_ptr->ext_zvfhmin;
+case MO_32:
+return s->cfg_ptr->ext_zve32f;
+default:
+return false;
+}
+}
+
  static bool require_scale_rvf(DisasContext *s)
  {
  if (s->mstatus_fs == EXT_STATUS_DISABLED) {
@@ -75,8 +91,6 @@ static bool require_scale_rvfmin(DisasContext *s)
  }
  
  switch (s->sew) {

-case MO_8:
-return s->cfg_ptr->ext_zvfhmin;
  case MO_16:
  return s->cfg_ptr->ext_zve32f;
  case MO_32:
@@ -2685,6 +2699,7 @@ static bool opxfv_widen_check(DisasContext *s, arg_rmr *a)
  static bool opffv_widen_check(DisasContext *s, arg_rmr *a)
  {
  return opfv_widen_check(s, a) &&
+   require_rvfmin(s) &&
 require_scale_rvfmin(s) &&
 (s->sew != MO_8);
  }
@@ -2790,6 +2805,7 @@ static bool opfxv_narrow_check(DisasContext *s, arg_rmr 
*a)
  static bool opffv_narrow_check(DisasContext *s, arg_rmr *a)
  {
  return opfv_narrow_check(s, a) &&
+   require_rvfmin(s) &&
 require_scale_rvfmin(s) &&
 (s->sew != MO_8);
  }
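
Illustrative sketch (not from the patch): how the source-width and destination-width checks combine for a widening conversion. The source-width mapping follows the require_rvfmin() hunk above; the SEW_32 arm of the destination check is an assumption, since the quoted hunk is cut off at that point. All names are hypothetical.

```c
#include <stdbool.h>

enum vsew { SEW_8, SEW_16, SEW_32, SEW_64 };

struct vcfg {
    bool fs_enabled;   /* stands in for mstatus.FS not being disabled */
    bool zvfhmin;
    bool zve32f;
    bool zve64d;
};

/* Single-width (source) operand, as in require_rvfmin(). */
static bool src_width_ok(const struct vcfg *c, enum vsew sew)
{
    if (!c->fs_enabled) {
        return false;
    }
    switch (sew) {
    case SEW_16:
        return c->zvfhmin;
    case SEW_32:
        return c->zve32f;
    default:
        return false;
    }
}

/* Double-width (destination) operand; SEW_32 arm is assumed. */
static bool dst_width_ok(const struct vcfg *c, enum vsew sew)
{
    if (!c->fs_enabled) {
        return false;
    }
    switch (sew) {
    case SEW_16:
        return c->zve32f;   /* 16 -> 32 bit result */
    case SEW_32:
        return c->zve64d;   /* 32 -> 64 bit result (assumption) */
    default:
        return false;
    }
}

static bool widen_convert_ok(const struct vcfg *c, enum vsew sew)
{
    return src_width_ok(c, sew) && dst_width_ok(c, sew);
}
```

In this model a 16 to 32 bit conversion needs both Zvfhmin and Zve32f, and SEW = 8 is rejected by either check on its own.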

Re: [PATCH v3 11/49] physmem: Introduce ram_block_discard_guest_memfd_range()

2024-03-20 Thread David Hildenbrand


On 20.03.24 18:38, Michael Roth wrote:

On Wed, Mar 20, 2024 at 10:37:14AM +0100, David Hildenbrand wrote:

On 20.03.24 09:39, Michael Roth wrote:

From: Xiaoyao Li 

When a memory page is converted from private to shared, the original
private memory is backed by guest_memfd. Introduce
ram_block_discard_guest_memfd_range() for discarding memory in
guest_memfd.

Originally-from: Isaku Yamahata 
Codeveloped-by: Xiaoyao Li 


"Co-developed-by"


Signed-off-by: Xiaoyao Li 
Reviewed-by: David Hildenbrand 


Your SOB should go here.


---
Changes in v5:
- Collect Reviewed-by from David;

Changes in v4:
- Drop ram_block_convert_range() and open code its implementation in the
next Patch.

Signed-off-by: Michael Roth 


I only received 3 patches from this series, and now I am confused: changelog
talks about v5 and this is "PATCH v3"

Please make sure to send at least the cover letter along (I might not need
the other 46 patches :D ).


Sorry for the confusion, you got auto-Cc'd by git, which is good, but
not sure there's a good way to make sure everyone gets a copy of the
cover letter. I could see how it would be useful to potential
reviewers though. I'll try to come up with a script for it and take that
approach in the future.


A script shared with me in the past to achieve that in most cases:

$ cat cc-cmd.sh
#!/bin/bash

if [[ $1 == *gitsendemail.msg* || $1 == *cover-letter* ]]; then
    grep ': .* <.*@.*>' -h *.patch | sed 's/^.*: //' | sort | uniq
fi


And attach to "git send-email ... *.patch": --cc-cmd=./cc-cmd.sh

--
Cheers,

David / dhildenb

Re: [PATCH v3 19/49] kvm: Make kvm_convert_memory() obey ram_block_discard_is_enabled()

2024-03-20 Thread Michael Roth

On Wed, Mar 20, 2024 at 05:26:00PM +0100, Paolo Bonzini wrote:
> On 3/20/24 09:39, Michael Roth wrote:
> > Some subsystems like VFIO might disable ram block discard for
> > uncoordinated cases. kvm_convert_memory()/guest_memfd don't
> > implement a RamDiscardManager handler to convey discard operations to
> > various listeners like VFIO. Because of this, sequences like the
> > following can lead to stale IOMMU mappings:
> 
> Alternatively, should guest-memfd memory regions call
> ram_block_discard_require(true)?  This will prevent VFIO from operating, but
> it will avoid consuming twice the memory.
> 
> If desirable, guest-memfd support can be changed to implement an extension
> of RamDiscardManager that notifies about private/shared memory changes, and
> then guest-memfd would be able to support coordinated discard.  But I wonder

In an earlier/internal version of the SNP+gmem patches (when there was still
a dedicated hostmem-memfd-private backend for restrictedmem/gmem), we had a
rough implementation of RamDiscardManager that did this:

https://github.com/AMDESE/qemu/blob/snp-latest-gmem-v12/backends/hostmem-memfd-private.c#L75

Now that gmem handling is mostly done transparently to the HostMem
backend in use I'm not sure what the right place would be to implement
something similar, but maybe it can be done in a more generic way.

There were some notable downsides to that approach though that I'm a
little hazy on now, but I think they were both kernel limitations:

  - VFIO seemed to have some limitation where it expects that the
DMA mapping for a particular iova will be unmapped/mapped with
the same granularity, but for an SNP guest there's no guarantee
that if you flip a 2MB page from shared->private, that it won't
later be flipped private->shared again but this time with a 4K
granularity/sub-range. I think the current code still treats
this as an -EINVAL case. So we end up needing to do everything
with 4K granularity, which I *think* results in 4K IOMMU page
table mappings, but I'd need to confirm.

  - VFIO doesn't seem to be optimized for this sort of use case and
generally expects a much larger granularity and defaults to 64K
max DMA entries, so for a 16GB guest you need to configure VFIO
with something like:

  vfio_iommu_type1.dma_entry_limit=4194304

I didn't see any reason to suggest that's problematic but it
makes me wonder if there's other stuff we might run into.
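
Illustrative sketch (not from the original mail): the arithmetic behind that dma_entry_limit value, assuming the worst case where every 4K page of the guest ends up as its own DMA mapping. The function name is hypothetical.

```c
#include <stdint.h>

/* Worst-case number of DMA mappings if each granule is mapped separately. */
static uint64_t worst_case_dma_entries(uint64_t guest_bytes,
                                       uint64_t granule_bytes)
{
    return guest_bytes / granule_bytes;
}
```

For a 16GB guest at 4K granularity this gives 4194304 entries, which matches the value above; at 2MB granularity the same guest would need only 8192.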

> if that's doable at all - how common are shared<->private flips, and is it
> feasible to change the IOMMU page tables every time?

- For OVMF+guest kernel that don't do lazy-acceptance:

  I think the bulk of the flipping is during boot where most of
  shared GPA ranges get converted to private memory, and then
  later on the guest kernel switches memory back to to shared
  for stuff like SWIOTLB, and after that I think DMA mappings
  would be fairly stable.

- For OVMF+guest kernel that support lazy-acceptance:

  The first 4GB get converted to private, and the rest remains
  shared until guest kernel needs to allocate memory from it.
  I'm not sure if SWIOTLB allocation is optimized to avoid
  unnecessary flipping if it's allocated from that pool of
  still-shared memory, but normal/private allocations will
  result in a steady stream of DMA unmap operations as the
  guest faults in its working set.

> 
> If the real solution is SEV-TIO (which means essentially guest_memfd support
> for VFIO), calling ram_block_discard_require(true) may be the simplest
> stopgap solution.

Hard to guess how cloud vendors will feel about waiting for trusted I/O.
It does make sense in the context of CoCo to expect them to wait, but
would be nice to have a stop-gap to offer like disabling discard, since
it has minimal requirements on the QEMU/VFIO side and might be enough to
get early adopters up and running at least.

All that said, if you think something based around RamDiscardManager
seems tenable given all above then we can re-visit that approach as well.

-Mike

> 
> Paolo
> 
> >- convert page shared->private
> >- discard shared page
> >- convert page private->shared
> >- new page is allocated
> >- issue DMA operations against that shared page
> > 
> > Address this by taking ram_block_discard_is_enabled() into account when
> > deciding whether or not to discard pages.
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >   accel/kvm/kvm-all.c | 8 ++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 53ce4f091e..6ae03c880f 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -2962,10 +2962,14 @@ static int kvm_convert_memory(hwaddr start, hwaddr 
> > size, bool to_private)
> >   */
> >   return 0;
> >   } else {
> > -ret = ram_block_discard_range(rb, offset, size);
> > +ret =

Re: [PATCH v3 11/49] physmem: Introduce ram_block_discard_guest_memfd_range()

2024-03-20 Thread Michael Roth

On Wed, Mar 20, 2024 at 10:37:14AM +0100, David Hildenbrand wrote:
> On 20.03.24 09:39, Michael Roth wrote:
> > From: Xiaoyao Li 
> > 
> > When a memory page is converted from private to shared, the original
> > private memory is backed by guest_memfd. Introduce
> > ram_block_discard_guest_memfd_range() for discarding memory in
> > guest_memfd.
> > 
> > Originally-from: Isaku Yamahata 
> > Codeveloped-by: Xiaoyao Li 
> 
> "Co-developed-by"
> 
> > Signed-off-by: Xiaoyao Li 
> > Reviewed-by: David Hildenbrand 
> 
> Your SOB should go here.
> 
> > ---
> > Changes in v5:
> > - Collect Reviewed-by from David;
> > 
> > Changes in v4:
> > - Drop ram_block_convert_range() and open code its implementation in the
> >next Patch.
> > 
> > Signed-off-by: Michael Roth 
> 
> I only received 3 patches from this series, and now I am confused: changelog
> talks about v5 and this is "PATCH v3"
> 
> Please make sure to send at least the cover letter along (I might not need
> the other 46 patches :D ).

Sorry for the confusion, you got auto-Cc'd by git, which is good, but
not sure there's a good way to make sure everyone gets a copy of the
cover letter. I could see how it would be useful to potential
reviewers though. I'll try to come up with a script for it and take that
approach in the future.

-Mike

> 
> -- 
> Cheers,
> 
> David / dhildenb
>

Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()

2024-03-20 Thread Peter Xu

On Wed, Mar 20, 2024 at 08:21:30PM +0100, Nina Schoetterl-Glausch wrote:
> On Wed, 2024-03-20 at 14:57 -0400, Peter Xu wrote:
> > On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote:
> > > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> > > > From: Peter Xu 
> > > > 
> > > > When the migration framework fetches the exact pending sizes, it means
> > > > this check:
> > > > 
> > > >   remaining_size < s->threshold_size
> > > > 
> > > > Must have been done already, actually at migration_iteration_run():
> > > > 
> > > > if (must_precopy <= s->threshold_size) {
> > > > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> > > > 
> > > > That should be after one round of ram_state_pending_estimate().  It 
> > > > makes
> > > > the 2nd check meaningless and can be dropped.
> > > > 
> > > > To say it in another way, when reaching ->state_pending_exact(), we
> > > > unconditionally sync dirty bits for precopy.
> > > > 
> > > > Then we can drop migrate_get_current() there too.
> > > > 
> > > > Signed-off-by: Peter Xu 
> > > 
> > > Hi Peter,
> > 
> > Hi, Nina,
> > 
> > > 
> > > could you have a look at this issue:
> > > https://gitlab.com/qemu-project/qemu/-/issues/1565
> > > 
> > > which I reopened. Previous thread here:
> > > 
> > > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/
> > > 
> > > I'm seeing migration failures with s390x TCG again, which look the same 
> > > to me
> > > as those a while back.
> > 
> > I'm still quite confused how that could be caused by this.
> > 
> > What you described in the previous bug report seems to imply some page was
> > leftover in migration so some page got corrupted after migrated.
> > 
> > However what this patch mostly does is it can sync more than before even if
> > I overlooked the condition check there (I still think the check is
> > redundant, there's one outlier when remaining_size == threshold_size, but I
> > don't think it should matter here as of now).  It'd make more sense if
> > this patch made it sync less, but it's the other way around.
> 
> [...]
> 
> > In the previous discussion, you mentioned that you bisected to the commit
> > and also verified the fix.  Now you also mentioned in the bz that you can't
> > reproduce this bug manually.
> > 
> > Is it still possible to be reproduced with some scripts?  Do you also mean
> > that it's harder to reproduce comparing to before?  In all cases, some way
> > to reproduce it would definitely be helpful.
> 
> I tried running the kvm-unit-test a bunch of times in a loop and couldn't
> trigger a failure. I just tried again on a different system and managed just
> fine, yay. No idea why it wouldn't on the first system tho.

There's probably still a bug somewhere.  If reproduction rate changed, it's
also a sign that it might not be directly relevant to this change, as
otherwise it should reproduce the same as before.

> > 
> > Even if we want to revert this change, we'll need to know whether this will
> > fix your case so we need something to verify it before a revert.  I'll
> > consider that a last resort, though, as I have a feeling this is papering over
> > something else.
> 
> I can check if I can reproduce the issue before & after b0504edd ("migration:
> Drop unnecessary check in ram's pending_exact()").
> I can also check if I can reproduce it on x86, that worked last time.
> Anything else? Ideas on how to pinpoint where the corruption happens?

I don't have a solid clue yet, but more information of the single case
where it reproduced could help.

I saw from the bug link that the cmdline is pretty simple.  However I'm still
not sure what else could be relevant.  E.g., did you use postcopy
(including when postcopy-ram is enabled but precopy completed)?  Is there any
special device, like s390's CMMA (would that simplest cmdline include such
a device; apologies, I have zero knowledge there before today)?

I _think_ when reading the code I already found something quite unusual,
but only when postcopy is selected: I notice postcopy will frequently sync
dirty bitmap while it doesn't really necessarily need to, because
ram_state_pending_estimate() will report all ram as "can_postcopy"; it
means it's highly likely that this check will 99.999% always be true simply
because must_precopy can in most cases be zero:

if (must_precopy <= s->threshold_size) { <-- here
qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
pending_size = must_precopy + can_postcopy;
trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
}

I need to think more of this, but this doesn't sound right at all.  There's
no such issue with precopy-only, and I'm surprised it is like that for years.
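
Illustrative sketch (not QEMU code): a toy model of the check described above. It assumes RAM is the only contributor to the estimate and that, with postcopy enabled, all dirty RAM is reported as can_postcopy. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct pending {
    uint64_t must_precopy;
    uint64_t can_postcopy;
};

/* Toy estimate: postcopy moves every dirty byte into can_postcopy. */
static struct pending model_estimate(uint64_t dirty_bytes, bool postcopy)
{
    struct pending p = { 0, 0 };

    if (postcopy) {
        p.can_postcopy = dirty_bytes;
    } else {
        p.must_precopy = dirty_bytes;
    }
    return p;
}

/* The condition guarding the expensive exact query (dirty bitmap sync). */
static bool would_query_exact(struct pending p, uint64_t threshold)
{
    return p.must_precopy <= threshold;
}
```

With postcopy selected, must_precopy stays at zero in this model, so the guard is true on every iteration regardless of how much memory is dirty.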

-- 
Peter Xu

Re: [PATCH 3/4] hw/nmi: Remove @cpu_index argument from NMIClass::nmi_handler()

2024-03-20 Thread Peter Maydell

On Wed, 20 Mar 2024 at 19:05, Markus Armbruster  wrote:
>
> Philippe Mathieu-Daudé  writes:
>
> > On 20/3/24 14:23, Peter Maydell wrote:
> >> On Tue, 20 Feb 2024 at 15:09, Philippe Mathieu-Daudé  
> >> wrote:
> >>>
> >>> Only s390x was using the 'cpu_index' argument, but since the
> >>> previous commit it isn't anymore (it uses the first CPU).
> >>> Since this argument is now completely unused, remove it. Have
> >>> the callback return a boolean indicating failure.
> >>>
> >>> Signed-off-by: Philippe Mathieu-Daudé 
> >>> ---
> >>>   include/hw/nmi.h   | 11 ++-
> >>>   hw/core/nmi.c  |  3 +--
> >>>   hw/hppa/machine.c  |  8 +---
> >>>   hw/i386/x86.c  |  7 ---
> >>>   hw/intc/m68k_irqc.c|  6 --
> >>>   hw/m68k/q800-glue.c|  6 --
> >>>   hw/misc/macio/gpio.c   |  6 --
> >>>   hw/ppc/pnv.c   |  6 --
> >>>   hw/ppc/spapr.c |  6 --
> >>>   hw/s390x/s390-virtio-ccw.c |  6 --
> >>>   10 files changed, 44 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/include/hw/nmi.h b/include/hw/nmi.h
> >>> index fff41bebc6..c70db941c9 100644
> >>> --- a/include/hw/nmi.h
> >>> +++ b/include/hw/nmi.h
> >>> @@ -37,7 +37,16 @@ typedef struct NMIState NMIState;
> >>>   struct NMIClass {
> >>>   InterfaceClass parent_class;
> >>>
> >>> -void (*nmi_monitor_handler)(NMIState *n, int cpu_index, Error 
> >>> **errp);
> >>> +/**
> >>> + * nmi_handler: Callback to handle NMI notifications.
> >>> + *
> >>> + * @n: Class #NMIState state
> >>> + * @errp: pointer to error object
> >>> + *
> >>> + * On success, return %true.
> >>> + * On failure, store an error through @errp and return %false.
> >>> + */
> >>> +bool (*nmi_handler)(NMIState *n, Error **errp);
> >> Any particular reason to change the method name here?
> >> Do we really need to indicate failure both through the bool return
> >> and the Error** ?
> >
> > No, but this is the style *recommended* by the Error API since
> > commit e3fe3988d7 ("error: Document Error API usage rules"):
> >
> > error: Document Error API usage rules
> >
> > This merely codifies existing practice, with one exception: the rule
> > advising against returning void, where existing practice is mixed.
> >
> > When the Error API was created, we adopted the (unwritten) rule to
> > return void when the function returns no useful value on success,
> > unlike GError, which recommends to return true on success and false
> > on error then.
> >
> > [...]
> >
> > Make the rule advising against returning void official by putting it
> > in writing.  This will hopefully reduce confusion.
> >
> >   * - Whenever practical, also return a value that indicates success /
> >   *   failure.  This can make the error checking more concise, and can
> >   *   avoid useless error object creation and destruction.  Note that
>
> It's the difference between
>
>     if (!frobnicate(arg, errp)) {
>         return;
>     }
>
> and
>
>     frobnicate(arg, &err);
>     if (err) {
>         error_propagate(errp, err);
>         return;
>     }
>
> Readability dies by a thousand cuts.
>
> GError got this right.  We deviated from it for Error, until we
> understood why it's right.
>
> Another win: &error_abort gives you a backtrace into frobnicate() with
> the former, and into error_propagate() with the latter.

Fair enough. (When I made the comment I was vaguely wondering
if we wanted to keep the return value available to distinguish
"this hook has handled the NMI, don't keep iterating" from
"no error, but you should keep iterating through other handlers".
But I think in the end my feeling is we should always stop
after the first NMI handler we find regardless.)

-- PMM

Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()

2024-03-20 Thread Nina Schoetterl-Glausch

On Wed, 2024-03-20 at 14:57 -0400, Peter Xu wrote:
> On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote:
> > On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> > > From: Peter Xu 
> > > 
> > > > When the migration framework fetches the exact pending sizes, it means
> > > this check:
> > > 
> > >   remaining_size < s->threshold_size
> > > 
> > > Must have been done already, actually at migration_iteration_run():
> > > 
> > > >     if (must_precopy <= s->threshold_size) {
> > > >         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> > > 
> > > That should be after one round of ram_state_pending_estimate().  It makes
> > > the 2nd check meaningless and can be dropped.
> > > 
> > > To say it in another way, when reaching ->state_pending_exact(), we
> > > unconditionally sync dirty bits for precopy.
> > > 
> > > Then we can drop migrate_get_current() there too.
> > > 
> > > Signed-off-by: Peter Xu 
> > 
> > Hi Peter,
> 
> Hi, Nina,
> 
> > 
> > could you have a look at this issue:
> > https://gitlab.com/qemu-project/qemu/-/issues/1565
> > 
> > which I reopened. Previous thread here:
> > 
> > https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/
> > 
> > I'm seeing migration failures with s390x TCG again, which look the same to 
> > me
> > as those a while back.
> 
> I'm still quite confused about how that could be caused by this.
> 
> What you described in the previous bug report seems to imply that some page
> was left over during migration, so it ended up corrupted after migration.
> 
> However, what this patch mostly does is sync more than before, even if
> I overlooked the condition check there (I still think the check is
> redundant; there's one outlier when remaining_size == threshold_size, but I
> don't think it should matter here as of now).  It would make more sense if
> this patch made the sync happen less often, but it is the other way around.

[...]

> In the previous discussion, you mentioned that you bisected to the commit
> and also verified the fix.  Now you also mentioned in the bz that you can't
> reproduce this bug manually.
> 
> Is it still possible to reproduce it with some scripts?  Do you also mean
> that it's harder to reproduce compared to before?  In any case, some way
> to reproduce it would definitely be helpful.

I tried running the kvm-unit-test a bunch of times in a loop and couldn't
trigger a failure. I just tried again on a different system and managed just
fine, yay. No idea why it wouldn't on the first system tho.
> 
> Even if we want to revert this change, we'll need to know whether this will
> fix your case, so we need something to verify it before a revert.  I'd
> consider that a last resort though, as I have a feeling it would be papering
> over something else.

I can check if I can reproduce the issue before & after b0504edd ("migration:
Drop unnecessary check in ram's pending_exact()").
I can also check if I can reproduce it on x86, that worked last time.
Anything else? Ideas on how to pinpoint where the corruption happens?

> 
> Thanks,
>

Re: [PATCH 1/2] hw/arm: Add support for stm32g000 SoC family

2024-03-20 Thread Samuel Tardieu

Felipe Balbi  writes:

> +qdev_prop_set_uint8(armv7m, "num-prio-bits", 4);

Hi Felipe.

This should be 2, not 4. From RM0454 section 11.1 on page 250: "4 programmable 
priority levels (2 bits of interrupt priority are used)".

  Sam
-- 
Samuel Tardieu
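
For readers following the arithmetic in this review comment: the number of programmable priority levels is two to the power of the number of implemented priority bits, so 4 levels corresponds to 2 bits, while the value 4 in the patch would describe 16 levels. A minimal sketch of that relation (prio_levels() is a hypothetical helper for illustration, not a QEMU function):

```c
#include <assert.h>

/*
 * Hypothetical helper: number of distinct priority levels that can be
 * encoded with the given number of implemented priority bits.
 */
static unsigned int prio_levels(unsigned int prio_bits)
{
    assert(prio_bits < 32);   /* keep the shift well defined */
    return 1u << prio_bits;
}
```

With this helper, prio_levels(2) gives the 4 levels quoted from RM0454, and prio_levels(4) gives 16.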

Re: [PATCH 3/4] hw/nmi: Remove @cpu_index argument from NMIClass::nmi_handler()

2024-03-20 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 20/3/24 14:23, Peter Maydell wrote:
>> On Tue, 20 Feb 2024 at 15:09, Philippe Mathieu-Daudé  
>> wrote:
>>>
>>> Only s390x was using the 'cpu_index' argument, but since the
>>> previous commit it isn't anymore (it uses the first CPU).
>>> Since this argument is now completely unused, remove it. Have
>>> the callback return a boolean indicating failure.
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>   include/hw/nmi.h   | 11 ++-
>>>   hw/core/nmi.c  |  3 +--
>>>   hw/hppa/machine.c  |  8 +---
>>>   hw/i386/x86.c  |  7 ---
>>>   hw/intc/m68k_irqc.c|  6 --
>>>   hw/m68k/q800-glue.c|  6 --
>>>   hw/misc/macio/gpio.c   |  6 --
>>>   hw/ppc/pnv.c   |  6 --
>>>   hw/ppc/spapr.c |  6 --
>>>   hw/s390x/s390-virtio-ccw.c |  6 --
>>>   10 files changed, 44 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/include/hw/nmi.h b/include/hw/nmi.h
>>> index fff41bebc6..c70db941c9 100644
>>> --- a/include/hw/nmi.h
>>> +++ b/include/hw/nmi.h
>>> @@ -37,7 +37,16 @@ typedef struct NMIState NMIState;
>>>   struct NMIClass {
>>>   InterfaceClass parent_class;
>>>
>>> -void (*nmi_monitor_handler)(NMIState *n, int cpu_index, Error **errp);
>>> +/**
>>> + * nmi_handler: Callback to handle NMI notifications.
>>> + *
>>> + * @n: Class #NMIState state
>>> + * @errp: pointer to error object
>>> + *
>>> + * On success, return %true.
>>> + * On failure, store an error through @errp and return %false.
>>> + */
>>> +bool (*nmi_handler)(NMIState *n, Error **errp);
>> Any particular reason to change the method name here?
>> Do we really need to indicate failure both through the bool return
>> and the Error** ?
>
> No, but this is the style *recommended* by the Error API since
> commit e3fe3988d7 ("error: Document Error API usage rules"):
>
> error: Document Error API usage rules
>
> This merely codifies existing practice, with one exception: the rule
> advising against returning void, where existing practice is mixed.
>
> When the Error API was created, we adopted the (unwritten) rule to
> return void when the function returns no useful value on success,
> unlike GError, which recommends to return true on success and false
> on error then.
>
> [...]
>
> Make the rule advising against returning void official by putting it
> in writing.  This will hopefully reduce confusion.
>
>   * - Whenever practical, also return a value that indicates success /
>   *   failure.  This can make the error checking more concise, and can
>   *   avoid useless error object creation and destruction.  Note that

It's the difference between

    if (!frobnicate(arg, errp)) {
        return;
    }

and

    frobnicate(arg, &err);
    if (err) {
        error_propagate(errp, err);
        return;
    }

Readability dies by a thousand cuts.

GError got this right.  We deviated from it for Error, until we
understood why it's right.

Another win: &error_abort gives you a backtrace into frobnicate() with
the former, and into error_propagate() with the latter.

>   *   we still have many functions returning void.  We recommend
>   *   • bool-valued functions return true on success / false on failure,
>   *   • pointer-valued functions return non-null / null pointer, and
>   *   • integer-valued functions return non-negative / negative.
>
> Anyway I'll respin removing @cpu_index as a single change :)
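
The two calling styles discussed in this message can be compared side by side in a small self-contained sketch. Everything below is a toy stand-in written for illustration: the Error struct, toy_error_set() and toy_error_propagate() only imitate the shape of QEMU's Error API and are not the real implementation, and frobnicate_void() is a hypothetical void-returning variant of the frobnicate() example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for QEMU's Error object; not the real type. */
typedef struct Error {
    const char *msg;
} Error;

/* Toy stand-in for error_setg(): store an error through errp if wanted. */
static void toy_error_set(Error **errp, const char *msg)
{
    static Error storage;   /* one static slot is enough for a sketch */

    if (errp) {
        storage.msg = msg;
        *errp = &storage;
    }
}

/* Toy stand-in for error_propagate(). */
static void toy_error_propagate(Error **dst, Error *src)
{
    if (dst) {
        *dst = src;
    }
}

/* Recommended style: return false and set *errp on failure. */
static bool frobnicate(int arg, Error **errp)
{
    if (arg < 0) {
        toy_error_set(errp, "negative argument");
        return false;
    }
    return true;
}

/* Older style: return void, failure is visible only through *errp. */
static void frobnicate_void(int arg, Error **errp)
{
    if (arg < 0) {
        toy_error_set(errp, "negative argument");
    }
}

/* Caller of the bool variant: no local Error, no propagation step. */
static bool caller_bool(int arg, Error **errp)
{
    if (!frobnicate(arg, errp)) {
        return false;
    }
    return true;
}

/* Caller of the void variant: needs a local Error plus propagation. */
static bool caller_void(int arg, Error **errp)
{
    Error *err = NULL;

    frobnicate_void(arg, &err);
    if (err) {
        toy_error_propagate(errp, err);
        return false;
    }
    return true;
}
```

Both callers report the same outcome; the bool-returning variant simply gets there without the temporary error object and the extra propagation call.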

Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()

2024-03-20 Thread Peter Xu

On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote:
> On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> > From: Peter Xu 
> > 
> > When the migration framework fetches the exact pending sizes, it means
> > this check:
> > 
> >   remaining_size < s->threshold_size
> > 
> > Must have been done already, actually at migration_iteration_run():
> > 
> >     if (must_precopy <= s->threshold_size) {
> >         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> > 
> > That should be after one round of ram_state_pending_estimate().  It makes
> > the 2nd check meaningless and can be dropped.
> > 
> > To say it in another way, when reaching ->state_pending_exact(), we
> > unconditionally sync dirty bits for precopy.
> > 
> > Then we can drop migrate_get_current() there too.
> > 
> > Signed-off-by: Peter Xu 
> 
> Hi Peter,

Hi, Nina,

> 
> could you have a look at this issue:
> https://gitlab.com/qemu-project/qemu/-/issues/1565
> 
> which I reopened. Previous thread here:
> 
> https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/
> 
> I'm seeing migration failures with s390x TCG again, which look the same to me
> as those a while back.

I'm still quite confused about how that could be caused by this.

What you described in the previous bug report seems to imply that some page
was left over during migration, so it ended up corrupted after migration.

However, what this patch mostly does is sync more than before, even if
I overlooked the condition check there (I still think the check is
redundant; there's one outlier when remaining_size == threshold_size, but I
don't think it should matter here as of now).  It would make more sense if
this patch made the sync happen less often, but it is the other way around.

> 
> > ---
> >  migration/ram.c | 9 -
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index c0cdcccb75..d5b7cd5ac2 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void 
> > *opaque, uint64_t *must_precopy,
> >  static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
> >  uint64_t *can_postcopy)
> >  {
> > -MigrationState *s = migrate_get_current();
> >  RAMState **temp = opaque;
> >  RAMState *rs = *temp;
> > +uint64_t remaining_size;
> >  
> > -uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> > -
> > -if (!migration_in_postcopy() && remaining_size < s->threshold_size) {
> > +if (!migration_in_postcopy()) {
> >  bql_lock();
> >  WITH_RCU_READ_LOCK_GUARD() {
> >  migration_bitmap_sync_precopy(rs, false);
> >  }
> >  bql_unlock();
> > -remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> >  }
> >  
> > +remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> > +
> >  if (migrate_postcopy_ram()) {
> >  /* We can do postcopy, and all the data is postcopiable */
> >  *can_postcopy += remaining_size;
> 
> This basically reverts 28ef5339c3 ("migration: fix 
> ram_state_pending_exact()"), which originally
> made the issue disappear.
> 
> Any thoughts on the matter appreciated.

In the previous discussion, you mentioned that you bisected to the commit
and also verified the fix.  Now you also mentioned in the bz that you can't
reproduce this bug manually.

Is it still possible to reproduce it with some scripts?  Do you also mean
that it's harder to reproduce compared to before?  In any case, some way
to reproduce it would definitely be helpful.

Even if we want to revert this change, we'll need to know whether this will
fix your case, so we need something to verify it before a revert.  I'd
consider that a last resort though, as I have a feeling it would be papering
over something else.

Thanks,

-- 
Peter Xu
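
The conditions argued about in this message can be written down as plain predicates. This is only a sketch of the comparisons quoted in the thread; the function names are hypothetical, and for simplicity it treats must_precopy and remaining_size as the same quantity, which is an assumption (in QEMU the former is summed over all devices, the latter is RAM only).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Caller-side gate, as quoted from migration_iteration_run(). */
static bool caller_requests_exact(uint64_t must_precopy, uint64_t threshold)
{
    return must_precopy <= threshold;
}

/* Sync condition inside ram_state_pending_exact() before the patch. */
static bool sync_before_patch(bool in_postcopy, uint64_t remaining,
                              uint64_t threshold)
{
    return !in_postcopy && remaining < threshold;
}

/* Sync condition after the patch: the size comparison is gone. */
static bool sync_after_patch(bool in_postcopy)
{
    return !in_postcopy;
}
```

Under that assumption, the only precopy input where the two versions differ once the caller's gate has passed is the outlier mentioned above, remaining_size == threshold_size: the old code skipped the sync there, the new code performs it.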

[PATCH 1/2] hw/arm: Add support for stm32g000 SoC family

2024-03-20 Thread Felipe Balbi

From: Felipe Balbi 

Minimal support with USARTs and SPIs working. This SoC will be used to
create a nucleo-g071rb board.

Signed-off-by: Felipe Balbi 
---
 hw/arm/Kconfig |   6 +
 hw/arm/meson.build |   1 +
 hw/arm/stm32g000_soc.c | 246 +
 include/hw/arm/stm32g000_soc.h |  62 +
 4 files changed, 315 insertions(+)
 create mode 100644 hw/arm/stm32g000_soc.c
 create mode 100644 include/hw/arm/stm32g000_soc.h

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 893a7bff66b9..28a46d2b1ad3 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -463,6 +463,12 @@ config STM32F405_SOC
 select STM32F4XX_SYSCFG
 select STM32F4XX_EXTI
 
+config STM32G000_SOC
+bool
+select ARM_V7M
+select STM32F2XX_USART
+select STM32F2XX_SPI
+
 config B_L475E_IOT01A
 bool
 default y
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 6808135c1f79..9c4137a988e1 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -34,6 +34,7 @@ arm_ss.add(when: ['CONFIG_RASPI', 'TARGET_AARCH64'], if_true: 
files('bcm2838.c',
 arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c'))
+arm_ss.add(when: 'CONFIG_STM32G000_SOC', if_true: files('stm32g000_soc.c'))
 arm_ss.add(when: 'CONFIG_B_L475E_IOT01A', if_true: files('b-l475e-iot01a.c'))
 arm_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_soc.c'))
 arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 
'xlnx-zcu102.c'))
diff --git a/hw/arm/stm32g000_soc.c b/hw/arm/stm32g000_soc.c
new file mode 100644
index 000000000000..8f97d8c89ad9
--- /dev/null
+++ b/hw/arm/stm32g000_soc.c
@@ -0,0 +1,246 @@
+/*
+ * STM32G000 SoC
+ *
+ * Copyright (c) 2024 Felipe Balbi 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "hw/arm/boot.h"
+#include "exec/address-spaces.h"
+#include "hw/arm/stm32g000_soc.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "hw/misc/unimp.h"
+#include "sysemu/sysemu.h"
+
+/* stm32g000_soc implementation is derived from stm32f100_soc */
+
+struct stm32g0_ip_config {
+const char *name;
+uint32_t   addr;
+uint32_t   irq;
+};
+
+#define STM32G0_DEFINE_IP(n, a, i)\
+{ \
+.name = (n),  \
+.addr = (a),  \
+.irq = (i),   \
+}
+
+static const struct stm32g0_ip_config usart_config[STM_NUM_USARTS] = {
+STM32G0_DEFINE_IP("USART1", 0x40013800, 27),
+STM32G0_DEFINE_IP("USART2", 0x40004000, 28),
+STM32G0_DEFINE_IP("USART3", 0x40004400, 29),
+STM32G0_DEFINE_IP("USART4", 0x40004800, 29),
+STM32G0_DEFINE_IP("USART5", 0x40004c00, 29),
+STM32G0_DEFINE_IP("USART6", 0x40005000, 29),
+STM32G0_DEFINE_IP("LPUSART1", 0x40008000, 29),
+STM32G0_DEFINE_IP("LPUSART2", 0x40008400, 28),
+};
+
+static const struct stm32g0_ip_config spi_config[STM_NUM_SPIS] = {
+STM32G0_DEFINE_IP("SPI1", 0x40013000, 25),
+STM32G0_DEFINE_IP("SPI2", 0x40003800, 26),
+/* STM32G0_DEFINE_IP("SPI3", 0x4003c000, 26), only on STM32G0B1xx and STM32G0C1xx */
+};
+
+static void stm32g000_soc_initfn(Object *obj)
+{
+STM32G000State *s = STM32G000_SOC(obj);
+int i;
+
+object_initialize_child(obj, "armv7m", &s->armv7m, TYPE_ARMV7M);
+
+for (i = 0; i < STM_NUM_USARTS; i++) {
+object_initialize_child(obj, "usart[*]", &s->usart[i],
+TYPE_STM32F2XX_USART);
+}
+
+for (i = 0; i < STM_NUM_SPIS; i++) {
+object_initialize_child(obj, "spi[*]", &s->spi[i], TYPE_STM32F2XX_SPI);
+}
+
+s->sysclk = qdev_init_clock_in(DEVICE(s), "sysclk", NULL, NULL, 0);
+s->refclk =

[PATCH 2/2] hw/arm: Add nucleo-g071rb board

2024-03-20 Thread Felipe Balbi

From: Felipe Balbi 

This board is based around STM32G071RB SoC, a Cortex-M0 based
device. More information can be found at:

https://www.st.com/en/product/nucleo-g071rb.html

Signed-off-by: Felipe Balbi 
---
 hw/arm/Kconfig |  6 
 hw/arm/meson.build |  1 +
 hw/arm/nucleo-g071rb.c | 70 ++
 3 files changed, 77 insertions(+)
 create mode 100644 hw/arm/nucleo-g071rb.c

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 28a46d2b1ad3..5938bb8208a1 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -310,6 +310,12 @@ config STM32VLDISCOVERY
 depends on TCG && ARM
 select STM32F100_SOC
 
+config NUCLEO_G071RB
+bool
+default y
+depends on TCG && ARM
+select STM32G000_SOC
+
 config STRONGARM
 bool
 select PXA2XX
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 9c4137a988e1..580c2d55fc3f 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -18,6 +18,7 @@ arm_ss.add(when: 'CONFIG_REALVIEW', if_true: 
files('realview.c'))
 arm_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa-ref.c'))
 arm_ss.add(when: 'CONFIG_STELLARIS', if_true: files('stellaris.c'))
 arm_ss.add(when: 'CONFIG_STM32VLDISCOVERY', if_true: 
files('stm32vldiscovery.c'))
+arm_ss.add(when: 'CONFIG_NUCLEO_G071RB', if_true: files('nucleo-g071rb.c'))
 arm_ss.add(when: 'CONFIG_ZYNQ', if_true: files('xilinx_zynq.c'))
 arm_ss.add(when: 'CONFIG_SABRELITE', if_true: files('sabrelite.c'))
 
diff --git a/hw/arm/nucleo-g071rb.c b/hw/arm/nucleo-g071rb.c
new file mode 100644
index 000000000000..580b52bacf2c
--- /dev/null
+++ b/hw/arm/nucleo-g071rb.c
@@ -0,0 +1,70 @@
+/*
+ * ST Nucleo G071RB
+ *
+ * Copyright (c) 2024 Felipe Balbi 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "qemu/error-report.h"
+#include "hw/arm/stm32g000_soc.h"
+#include "hw/arm/boot.h"
+
+/* nucleo_g071rb implementation is derived from olimex-stm32-h405.c */
+
+/* Main SYSCLK frequency in Hz (48MHz) */
+#define SYSCLK_FRQ 48000000ULL
+
+static void nucleo_g071rb_init(MachineState *machine)
+{
+DeviceState *dev;
+Clock *sysclk;
+
+/* This clock doesn't need migration because it is fixed-frequency */
+sysclk = clock_new(OBJECT(machine), "SYSCLK");
+clock_set_hz(sysclk, SYSCLK_FRQ);
+
+dev = qdev_new(TYPE_STM32G000_SOC);
+object_property_add_child(OBJECT(machine), "soc", OBJECT(dev));
+qdev_connect_clock_in(dev, "sysclk", sysclk);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
+armv7m_load_kernel(ARM_CPU(first_cpu),
+   machine->kernel_filename,
+   0, FLASH_SIZE);
+}
+
+static void nucleo_g071rb_machine_init(MachineClass *mc)
+{
+static const char * const valid_cpu_types[] = {
+ARM_CPU_TYPE_NAME("cortex-m0"),
+NULL
+};
+
+mc->desc = "ST Nucleo-G071RB (Cortex-M0)";
+mc->init = nucleo_g071rb_init;
+mc->valid_cpu_types = valid_cpu_types;
+}
+
+DEFINE_MACHINE("nucleo-g071rb", nucleo_g071rb_machine_init)
-- 
2.44.0

[PATCH 0/2] Add support for STM32G0 SoC family

2024-03-20 Thread Felipe Balbi

From: Felipe Balbi 

Hi all,

These two patches add support for STM32G0 family and nucleo-g071rb
board. Patches have been tested with minimal embedded rust examples.

Felipe Balbi (2):
  hw/arm: Add support for stm32g000 SoC family
  hw/arm: Add nucleo-g071rb board

 hw/arm/Kconfig |  12 ++
 hw/arm/meson.build |   2 +
 hw/arm/nucleo-g071rb.c |  70 ++
 hw/arm/stm32g000_soc.c | 246 +
 include/hw/arm/stm32g000_soc.h |  62 +
 5 files changed, 392 insertions(+)
 create mode 100644 hw/arm/nucleo-g071rb.c
 create mode 100644 hw/arm/stm32g000_soc.c
 create mode 100644 include/hw/arm/stm32g000_soc.h

-- 
2.44.0

[PATCH] coroutine: reserve 5,000 mappings

2024-03-20 Thread Stefan Hajnoczi

Daniel P. Berrangé pointed out that the coroutine
pool size heuristic is very conservative. Instead of halving
max_map_count, he suggested reserving 5,000 mappings for non-coroutine
users based on observations of guests he has access to.

Fixes: 86a637e48104 ("coroutine: cap per-thread local pool size")
Signed-off-by: Stefan Hajnoczi 
---
 util/qemu-coroutine.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 2790959eaf..eb4eebefdf 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -377,12 +377,17 @@ static unsigned int get_global_pool_hard_max_size(void)
 NULL) &&
 qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
 /*
- * This is a conservative upper bound that avoids exceeding
- * max_map_count. Leave half for non-coroutine users like library
- * dependencies, vhost-user, etc. Each coroutine takes up 2 VMAs so
- * halve the amount again.
+ * This is an upper bound that avoids exceeding max_map_count. Leave a
+ * fixed amount for non-coroutine users like library dependencies,
+ * vhost-user, etc. Each coroutine takes up 2 VMAs so halve the
+ * remaining amount.
  */
-return max_map_count / 4;
+if (max_map_count > 5000) {
+return (max_map_count - 5000) / 2;
+} else {
+/* Disable the global pool but threads still have local pools */
+return 0;
+}
 }
 #endif
 
-- 
2.44.0
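
To put numbers on the change, the old and new bounds can be compared in isolation. The sketch below only mirrors the arithmetic in the diff; pool_max_old() and pool_max_new() are hypothetical names, and the input is assumed to be a non-negative max_map_count that was already read successfully.

```c
#include <assert.h>

/* Mappings left to non-coroutine users, the value chosen by the patch. */
#define RESERVED_MAPPINGS 5000

/* Old bound: keep half for other users, then 2 VMAs per coroutine. */
static unsigned int pool_max_old(int max_map_count)
{
    return max_map_count / 4;
}

/* New bound: reserve a fixed amount, then 2 VMAs per coroutine. */
static unsigned int pool_max_new(int max_map_count)
{
    if (max_map_count > RESERVED_MAPPINGS) {
        return (max_map_count - RESERVED_MAPPINGS) / 2;
    }
    /* Global pool disabled; thread-local pools are unaffected. */
    return 0;
}
```

For an input of 65530, a commonly seen Linux default for vm.max_map_count that is used here purely as an example value, the old bound is 16382 coroutines and the new one is 30265.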

Re: [PATCH v3 48/49] hw/i386/sev: Use guest_memfd for legacy ROMs

2024-03-20 Thread Isaku Yamahata

On Wed, Mar 20, 2024 at 03:39:44AM -0500,
Michael Roth  wrote:

> TODO: make this SNP-specific if TDX disables legacy ROMs in general

TDX disables pc.rom, but does not disable isa-bios. IIRC, TDX doesn't need pc pflash.
Xiaoyao can chime in.

Thanks,

> 
> Current SNP guest kernels will attempt to access these regions with
> C-bit set, so guest_memfd is needed to handle that. Otherwise,
> kvm_convert_memory() will fail when the guest kernel tries to access it
> and QEMU attempts to call KVM_SET_MEMORY_ATTRIBUTES to set these ranges
> to private.
> 
> Whether guests should actually try to access ROM regions in this way (or
> need to deal with legacy ROM regions at all), is a separate issue to be
> addressed on kernel side, but current SNP guest kernels will exhibit
> this behavior and so this handling is needed to allow QEMU to continue
> running existing SNP guest kernels.
> 
> Signed-off-by: Michael Roth 
> ---
>  hw/i386/pc.c   | 13 +
>  hw/i386/pc_sysfw.c | 13 ++---
>  2 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index feb7a93083..5feaeb43ee 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1011,10 +1011,15 @@ void pc_memory_init(PCMachineState *pcms,
>  pc_system_firmware_init(pcms, rom_memory);
>  
>  option_rom_mr = g_malloc(sizeof(*option_rom_mr));
> -memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
> -   &error_fatal);
> -if (pcmc->pci_enabled) {
> -memory_region_set_readonly(option_rom_mr, true);
> +if (machine_require_guest_memfd(machine)) {
> +memory_region_init_ram_guest_memfd(option_rom_mr, NULL, "pc.rom",
> +   PC_ROM_SIZE, &error_fatal);
> +} else {
> +memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
> +   &error_fatal);
> +if (pcmc->pci_enabled) {
> +memory_region_set_readonly(option_rom_mr, true);
> +}
>  }
>  memory_region_add_subregion_overlap(rom_memory,
>  PC_ROM_MIN_VGA,
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index 9dbb3f7337..850f86edd4 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -54,8 +54,13 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
>  /* map the last 128KB of the BIOS in ISA space */
>  isa_bios_size = MIN(flash_size, 128 * KiB);
>  isa_bios = g_malloc(sizeof(*isa_bios));
> -memory_region_init_ram(isa_bios, NULL, "isa-bios", isa_bios_size,
> -   &error_fatal);
> +if (machine_require_guest_memfd(current_machine)) {
> +memory_region_init_ram_guest_memfd(isa_bios, NULL, "isa-bios",
> +   isa_bios_size, &error_fatal);
> +} else {
> +memory_region_init_ram(isa_bios, NULL, "isa-bios", isa_bios_size,
> +   &error_fatal);
> +}
>  memory_region_add_subregion_overlap(rom_memory,
>  0x100000 - isa_bios_size,
>  isa_bios,
> @@ -68,7 +73,9 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
> ((uint8_t*)flash_ptr) + (flash_size - isa_bios_size),
> isa_bios_size);
>  
> -memory_region_set_readonly(isa_bios, true);
> +if (!machine_require_guest_memfd(current_machine)) {
> +memory_region_set_readonly(isa_bios, true);
> +}
>  }
>  
>  static PFlashCFI01 *pc_pflash_create(PCMachineState *pcms,
> -- 
> 2.25.1
> 
> 

-- 
Isaku Yamahata

Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()

2024-03-20 Thread Nina Schoetterl-Glausch

I cc'ed Juan, but it looks like he is no longer with Red Hat.

Re: [PATCH v3 40/49] hw/i386/sev: Add function to get SEV metadata from OVMF header

2024-03-20 Thread Isaku Yamahata

On Wed, Mar 20, 2024 at 03:39:36AM -0500,
Michael Roth  wrote:

> From: Brijesh Singh 
> 
> A recent version of OVMF expanded the reset vector GUID list to add an
> SEV-specific metadata GUID. The SEV metadata describes the reserved
> memory regions such as the secrets and CPUID page used during the SEV-SNP
> guest launch.
> 
> pc_system_get_ovmf_sev_metadata_ptr() is used to retrieve the SEV
> metadata pointer from the OVMF GUID list.
> 
> Signed-off-by: Brijesh Singh 
> Signed-off-by: Michael Roth 
> ---
>  hw/i386/pc_sysfw_ovmf.c | 33 +
>  include/hw/i386/pc.h| 26 ++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/hw/i386/pc_sysfw_ovmf.c b/hw/i386/pc_sysfw_ovmf.c
> index 07a4c267fa..32efa34614 100644
> --- a/hw/i386/pc_sysfw_ovmf.c
> +++ b/hw/i386/pc_sysfw_ovmf.c
> @@ -35,6 +35,31 @@ static const int bytes_after_table_footer = 32;
>  static bool ovmf_flash_parsed;
>  static uint8_t *ovmf_table;
>  static int ovmf_table_len;
> +static OvmfSevMetadata *ovmf_sev_metadata_table;
> +
> +#define OVMF_SEV_META_DATA_GUID "dc886566-984a-4798-A75e-5585a7bf67cc"
> +typedef struct __attribute__((__packed__)) OvmfSevMetadataOffset {
> +uint32_t offset;
> +} OvmfSevMetadataOffset;
> +
> +static void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t 
> flash_size)
> +{
> +OvmfSevMetadata *metadata;
> +OvmfSevMetadataOffset  *data;
> +
> +if (!pc_system_ovmf_table_find(OVMF_SEV_META_DATA_GUID, (uint8_t **)&data,
> +   NULL)) {
> +return;
> +}
> +
> +metadata = (OvmfSevMetadata *)(flash_ptr + flash_size - data->offset);
> +if (memcmp(metadata->signature, "ASEV", 4) != 0) {
> +return;
> +}
> +
> +ovmf_sev_metadata_table = g_malloc(metadata->len);
> +memcpy(ovmf_sev_metadata_table, metadata, metadata->len);
> +}
>  
>  void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size)
>  {
> @@ -90,6 +115,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t 
> flash_size)
>   */
>  memcpy(ovmf_table, ptr - tot_len, tot_len);
>  ovmf_table += tot_len;
> +
> +/* Copy the SEV metadata table (if exist) */
> +pc_system_parse_sev_metadata(flash_ptr, flash_size);
>  }

Can we move this call to x86_firmware_configure() in pc_sysfw.c, and move the
SEV-specific bits to an SEV-specific file?  We don't have to parse SEV
metadata in the non-SEV case, right?

Then we don't have to touch the common OVMF file. It will also be consistent
with the TDX case: the TDX patch series adds tdx_parse_tdvf() to
x86_firmware_configure().

thanks,

>  
>  /**
> @@ -159,3 +187,8 @@ bool pc_system_ovmf_table_find(const char *entry, uint8_t 
> **data,
>  }
>  return false;
>  }
> +
> +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void)
> +{
> +return ovmf_sev_metadata_table;
> +}
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index fb1d4106e5..df9a61540d 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -163,6 +163,32 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
> level);
>  #define PCI_HOST_ABOVE_4G_MEM_SIZE "above-4g-mem-size"
>  #define PCI_HOST_PROP_SMM_RANGES   "smm-ranges"
>  
> +typedef enum {
> +SEV_DESC_TYPE_UNDEF,
> +/* The section contains the region that must be validated by the VMM. */
> +SEV_DESC_TYPE_SNP_SEC_MEM,
> +/* The section contains the SNP secrets page */
> +SEV_DESC_TYPE_SNP_SECRETS,
> +/* The section contains address that can be used as a CPUID page */
> +SEV_DESC_TYPE_CPUID,
> +
> +} ovmf_sev_metadata_desc_type;
> +
> +typedef struct __attribute__((__packed__)) OvmfSevMetadataDesc {
> +uint32_t base;
> +uint32_t len;
> +ovmf_sev_metadata_desc_type type;
> +} OvmfSevMetadataDesc;
> +
> +typedef struct __attribute__((__packed__)) OvmfSevMetadata {
> +uint8_t signature[4];
> +uint32_t len;
> +uint32_t version;
> +uint32_t num_desc;
> +OvmfSevMetadataDesc descs[];
> +} OvmfSevMetadata;
> +
> +OvmfSevMetadata *pc_system_get_ovmf_sev_metadata_ptr(void);
>  
>  void pc_pci_as_mapping_init(MemoryRegion *system_memory,
>  MemoryRegion *pci_address_space);
> -- 
> 2.25.1
> 
> 

-- 
Isaku Yamahata
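
For readers who want to see the lookup in pc_system_parse_sev_metadata() in isolation: the header sits `offset` bytes before the end of the flash image and is accepted only if it starts with the "ASEV" signature. The sketch below is a standalone illustration with hypothetical names (ToySevMetadata, toy_find_sev_metadata()); the struct copies only the fixed header fields from the patch, and the bounds check is an addition of this sketch, not something the quoted patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy header mirroring the fixed fields of OvmfSevMetadata in the patch. */
typedef struct ToySevMetadata {
    uint8_t signature[4];
    uint32_t len;
    uint32_t version;
    uint32_t num_desc;
} ToySevMetadata;

/*
 * Return a pointer to the metadata header located `offset` bytes before
 * the end of the flash image, or NULL if the offset is out of range or
 * the "ASEV" signature does not match.
 */
static const uint8_t *toy_find_sev_metadata(const uint8_t *flash,
                                            size_t flash_size,
                                            uint32_t offset)
{
    const uint8_t *p;

    /* Sketch-only sanity check: the header must fit inside the image. */
    if (offset > flash_size || offset < sizeof(ToySevMetadata)) {
        return NULL;
    }

    p = flash + flash_size - offset;
    if (memcmp(p, "ASEV", 4) != 0) {
        return NULL;
    }
    return p;
}
```

A caller would then copy `len` bytes from the returned pointer, as the patch does with g_malloc() and memcpy().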

Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()

2024-03-20 Thread Nina Schoetterl-Glausch

On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> From: Peter Xu 
> 
> When the migration framework fetches the exact pending sizes, it means
> this check:
> 
>   remaining_size < s->threshold_size
> 
> Must have been done already, actually at migration_iteration_run():
> 
>     if (must_precopy <= s->threshold_size) {
>         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> 
> That should be after one round of ram_state_pending_estimate().  It makes
> the 2nd check meaningless and can be dropped.
> 
> To say it in another way, when reaching ->state_pending_exact(), we
> unconditionally sync dirty bits for precopy.
> 
> Then we can drop migrate_get_current() there too.
> 
> Signed-off-by: Peter Xu 

Hi Peter,

could you have a look at this issue:
https://gitlab.com/qemu-project/qemu/-/issues/1565

which I reopened. Previous thread here:

https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/

I'm seeing migration failures with s390x TCG again, which look the same to me
as those a while back.

> ---
>  migration/ram.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c0cdcccb75..d5b7cd5ac2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, uint64_t *must_precopy,
>  static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
>  uint64_t *can_postcopy)
>  {
> -MigrationState *s = migrate_get_current();
>  RAMState **temp = opaque;
>  RAMState *rs = *temp;
> +uint64_t remaining_size;
>  
> -uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> -
> -if (!migration_in_postcopy() && remaining_size < s->threshold_size) {
> +if (!migration_in_postcopy()) {
>  bql_lock();
>  WITH_RCU_READ_LOCK_GUARD() {
>  migration_bitmap_sync_precopy(rs, false);
>  }
>  bql_unlock();
> -remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
>  }
>  
> +remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> +
>  if (migrate_postcopy_ram()) {
>  /* We can do postcopy, and all the data is postcopiable */
>  *can_postcopy += remaining_size;

This basically reverts 28ef5339c3 ("migration: fix ram_state_pending_exact()"),
which originally made the issue disappear.

Any thoughts on the matter appreciated.

Thanks,
Nina
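
For readers following the thread, the change in when the dirty-bitmap sync runs can be modelled in a few lines. This is a simplified stand-alone sketch, not QEMU code: `sync_gate_old`, `sync_gate_new`, and their parameters are hypothetical names, and the model only captures the two conditions visible in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Condition before the patch: sync only while still in precopy AND when
 * the RAM-only remaining size is strictly below the threshold.
 */
bool sync_gate_old(bool in_postcopy, uint64_t remaining, uint64_t threshold)
{
    return !in_postcopy && remaining < threshold;
}

/* Condition after the patch: sync whenever we are still in precopy. */
bool sync_gate_new(bool in_postcopy, uint64_t remaining, uint64_t threshold)
{
    (void)remaining;   /* no longer consulted */
    (void)threshold;
    return !in_postcopy;
}

/* Small usage example comparing the two gates. */
void sync_gate_demo(void)
{
    /* Well below the threshold: both variants sync. */
    assert(sync_gate_old(false, 10, 100) && sync_gate_new(false, 10, 100));
    /* At or above the threshold: only the new variant syncs. */
    assert(!sync_gate_old(false, 100, 100) && sync_gate_new(false, 100, 100));
    /* In postcopy neither variant syncs. */
    assert(!sync_gate_old(true, 10, 100) && !sync_gate_new(true, 10, 100));
}
```

In this model the two gates differ only when the remaining size is at or above the threshold. Whether that case is what triggers the s390x failures is not established here.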

Re: [PATCH] libqos/virtio.c: Correct 'flags' reading in qvirtqueue_kick

2024-03-20 Thread Stefan Hajnoczi

On Wed, 20 Mar 2024 at 09:10, Zheyu Ma  wrote:
>
> In qvirtqueue_kick(), the 'flags' were previously being incorrectly read from
> vq->avail instead of the correct vq->used location. This update ensures 'flags'
> are read from the correct location as per the virtio standard.
>
> Signed-off-by: Zheyu Ma 
> ---
>  tests/qtest/libqos/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 

> diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
> index 82a6e122bf..a21b6eee9c 100644
> --- a/tests/qtest/libqos/virtio.c
> +++ b/tests/qtest/libqos/virtio.c
> @@ -394,7 +394,7 @@ void qvirtqueue_kick(QTestState *qts, QVirtioDevice *d, QVirtQueue *vq,
>  qvirtio_writew(d, qts, vq->avail + 2, idx + 1);
>
>  /* Must read after idx is updated */
> -flags = qvirtio_readw(d, qts, vq->avail);
> +flags = qvirtio_readw(d, qts, vq->used);
>  avail_event = qvirtio_readw(d, qts, vq->used + 4 +
>  sizeof(struct vring_used_elem) * vq->size);
>
> --
> 2.34.1
>
>
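
As background for the fix above, the byte offsets that a split-virtqueue driver reads can be sketched as follows. This is an illustrative stand-alone model of the ring layout, not libqos code; `struct used_elem` and the helper names are hypothetical. The device's "do not notify" hint lives in the flags word of the used ring, which is why the read has to target `vq->used` rather than `vq->avail`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One used-ring element: 32-bit descriptor id plus 32-bit written length. */
struct used_elem {
    uint32_t id;
    uint32_t len;
};

/* Byte offsets inside the available (driver-owned) ring. */
size_t avail_flags_offset(void) { return 0; }
size_t avail_idx_offset(void)   { return 2; }

/* Byte offsets inside the used (device-owned) ring. */
size_t used_flags_offset(void) { return 0; }
size_t used_idx_offset(void)   { return 2; }

/* avail_event follows flags, idx and queue_size used elements. */
size_t used_avail_event_offset(uint16_t queue_size)
{
    assert(queue_size > 0);
    return 4 + sizeof(struct used_elem) * (size_t)queue_size;
}
```

The last helper mirrors the `vq->used + 4 + sizeof(struct vring_used_elem) * vq->size` expression in the quoted hunk.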

Re: [PATCH for-9.1 v5 09/14] memory: Add Error** argument to .log_global_start() handler

2024-03-20 Thread Peter Xu

On Wed, Mar 20, 2024 at 05:15:06PM +0100, Cédric Le Goater wrote:
> Sure, or I will in a v6. Markus had a comment on 8/14.

Yeah, I can handle both if they're the only ones.  Thanks,

-- 
Peter Xu

[PATCH] target/riscv: Fix mode in riscv_tlb_fill

2024-03-20 Thread Irina Ryapolova

mmu_idx must be converted to a privilege mode before it is passed to the PMP functions.

Signed-off-by: Irina Ryapolova 
---
 target/riscv/cpu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index ce7322011d..fc090d729a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1315,7 +1315,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 bool two_stage_lookup = mmuidx_2stage(mmu_idx);
 bool two_stage_indirect_error = false;
 int ret = TRANSLATE_FAIL;
-int mode = mmu_idx;
+int mode = mmuidx_priv(mmu_idx);
 /* default TLB page size */
 target_ulong tlb_size = TARGET_PAGE_SIZE;
 
-- 
2.25.1
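
To illustrate why an MMU index cannot be used directly as a privilege mode, here is a minimal sketch. The enum values and the `idx_to_priv` helper are assumptions made for illustration; they are modelled on the idea that an MMU index carries extra state beyond the privilege level and are not copied from the QEMU sources.

```c
#include <assert.h>

/* Hypothetical privilege encoding, for illustration only. */
enum { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Hypothetical MMU index encoding: privilege plus extra state. */
enum {
    IDX_U = 0,
    IDX_S = 1,
    IDX_S_SUM = 2,               /* S-mode allowed to touch user pages */
    IDX_M = 3,
    IDX_TWO_STAGE_BIT = 1 << 2,  /* extra flag, not part of the privilege */
};

/* Reduce an MMU index to the privilege mode it represents. */
int idx_to_priv(int mmu_idx)
{
    int priv;

    assert(mmu_idx >= 0);
    priv = mmu_idx & 3;          /* drop flag bits above the low two */
    if (priv == IDX_S_SUM) {
        priv = PRIV_S;           /* the SUM variant is still S-mode */
    }
    return priv;
}
```

With an encoding like this, passing the raw index to a permission check would turn the two-stage flag or the SUM variant into a value that is not a valid privilege level.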

[PATCH v8 6/6] target/riscv: Enable updates for pointer masking variables and thus enable pointer masking extension

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Signed-off-by: Alexey Baturo 

Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 73c69f3d0a..9e3bf6c5c5 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -190,6 +190,9 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
 ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, PRIV_VERSION_1_12_0, ext_svpbmt),
+ISA_EXT_DATA_ENTRY(ssnpm, PRIV_VERSION_1_12_0, ext_ssnpm),
+ISA_EXT_DATA_ENTRY(smnpm, PRIV_VERSION_1_12_0, ext_smnpm),
+ISA_EXT_DATA_ENTRY(smmpm, PRIV_VERSION_1_12_0, ext_smmpm),
 ISA_EXT_DATA_ENTRY(xtheadba, PRIV_VERSION_1_11_0, ext_xtheadba),
 ISA_EXT_DATA_ENTRY(xtheadbb, PRIV_VERSION_1_11_0, ext_xtheadbb),
 ISA_EXT_DATA_ENTRY(xtheadbs, PRIV_VERSION_1_11_0, ext_xtheadbs),
@@ -1561,6 +1564,11 @@ const RISCVCPUMultiExtConfig riscv_cpu_vendor_exts[] = {
 
 /* These are experimental so mark with 'x-' */
 const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
+/* Zjpm v0.8 extensions */
+MULTI_EXT_CFG_BOOL("x-ssnpm", ext_ssnpm, false),
+MULTI_EXT_CFG_BOOL("x-smnpm", ext_smnpm, false),
+MULTI_EXT_CFG_BOOL("x-smmpm", ext_smmpm, false),
+
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1

Re: [PATCH v4 1/3] qio: add support for SO_PEERCRED for socket channel

2024-03-20 Thread Daniel P . Berrangé

On Mon, Mar 18, 2024 at 04:12:14PM +0100, Anthony Harivel wrote:
> The function qio_channel_get_peercred() returns a pointer to the
> credentials of the peer process connected to this socket.
> 
> This credentials structure is defined in  as follows:
> 
> struct ucred {
>   pid_t pid;/* Process ID of the sending process */
>   uid_t uid;/* User ID of the sending process */
>   gid_t gid;/* Group ID of the sending process */
> };
> 
> The use of this function is possible only for connected AF_UNIX stream
> sockets and for AF_UNIX stream and datagram socket pairs.
> 
> On platforms other than Linux, the function returns 0.
> 
> Signed-off-by: Anthony Harivel 
> ---
>  include/io/channel.h | 21 +
>  io/channel-socket.c  | 24 
>  io/channel.c | 12 
>  3 files changed, 57 insertions(+)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 7986c49c713a..01ad7bd7e430 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -160,6 +160,9 @@ struct QIOChannelClass {
>void *opaque);
>  int (*io_flush)(QIOChannel *ioc,
>  Error **errp);
> +void (*io_peerpid)(QIOChannel *ioc,
> +   unsigned int *pid,
> +   Error **errp);
>  };
>  
>  /* General I/O handling functions */
> @@ -981,4 +984,22 @@ int coroutine_mixed_fn qio_channel_writev_full_all(QIOChannel *ioc,
>  int qio_channel_flush(QIOChannel *ioc,
>Error **errp);
>  
> +/**
> + * qio_channel_get_peercred:
> + * @ioc: the channel object
> + * @pid: pointer to pid
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Returns the pid of the peer process connected to this socket.
> + *
> + * The use of this function is possible only for connected
> + * AF_UNIX stream sockets and for AF_UNIX stream and datagram
> + * socket pairs on Linux.
> + * Return an error with pid -1 for the non-Linux OS.
> + *
> + */
> +void qio_channel_get_peerpid(QIOChannel *ioc,
> + unsigned int *pid,
> + Error **errp);
> +
>  #endif /* QIO_CHANNEL_H */
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 3a899b060858..fcff92ecc151 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -841,6 +841,29 @@ qio_channel_socket_set_cork(QIOChannel *ioc,
>  socket_set_cork(sioc->fd, v);
>  }
>  
> +static void
> +qio_channel_socket_get_peerpid(QIOChannel *ioc,
> +   unsigned int *pid,
> +   Error **errp)
> +{
> +#ifdef CONFIG_LINUX
> +QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> +Error *err = NULL;
> +socklen_t len = sizeof(struct ucred);
> +
> +struct ucred cred;
> +if (getsockopt(sioc->fd,
> +   SOL_SOCKET, SO_PEERCRED,
> +   &cred, &len) == -1) {

Set '*pid = -1'

> +error_setg_errno(&err, errno, "Unable to get peer credentials");
> +error_propagate(errp, err);

and 'return;' here, since accessing 'cred.pid' below
is undefined behaviour if getsockopt failed.

> +}
> +*pid = (unsigned int)cred.pid;
> +#else
> +error_setg(errp, "Unsupported feature");
> +*pid = -1;
> +#endif
> +}
>  
>  static int
>  qio_channel_socket_close(QIOChannel *ioc,
> @@ -938,6 +961,7 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
>  #ifdef QEMU_MSG_ZEROCOPY
>  ioc_klass->io_flush = qio_channel_socket_flush;
>  #endif
> +ioc_klass->io_peerpid = qio_channel_socket_get_peerpid;
>  }
>  
>  static const TypeInfo qio_channel_socket_info = {
> diff --git a/io/channel.c b/io/channel.c
> index a1f12f8e9096..777989bc9a81 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -548,6 +548,18 @@ void qio_channel_set_cork(QIOChannel *ioc,
>  }
>  }
>  
> +void qio_channel_get_peerpid(QIOChannel *ioc,
> + unsigned int *pid,
> + Error **errp)
> +{
> +QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +if (!klass->io_peerpid) {
> +error_setg(errp, "Channel does not support peer pid");
> +return;
> +}
> +klass->io_peerpid(ioc, pid, errp);
> +}
>  
>  off_t qio_channel_io_seek(QIOChannel *ioc,
>off_t offset,
> -- 
> 2.44.0
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
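
The two review comments above (set the pid to -1 and return early on failure) can be sketched outside QEMU as follows. This is a Linux-only illustration using plain POSIX/glibc calls; `peer_pid_of` is a hypothetical helper and does not use the QIOChannel or Error APIs.

```c
#define _GNU_SOURCE          /* glibc exposes struct ucred under _GNU_SOURCE */
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store the peer's PID in *pid and return 0. On failure, set *pid to -1
 * and return -1, leaving errno as set by getsockopt().
 */
int peer_pid_of(int fd, pid_t *pid)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    assert(pid != NULL);
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) == -1) {
        *pid = -1;
        return -1;           /* never read cred after a failed call */
    }
    *pid = cred.pid;
    return 0;
}
```

On a pair created with `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, the reported peer is the process that created the pair.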

[PATCH v8 4/6] target/riscv: Add pointer masking tb flags

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Signed-off-by: Alexey Baturo 

Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.h| 3 +++
 target/riscv/cpu_helper.c | 3 +++
 target/riscv/translate.c  | 5 +
 3 files changed, 11 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0112b568a0..404f6ec50d 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -566,6 +566,9 @@ FIELD(TB_FLAGS, ITRIGGER, 20, 1)
 FIELD(TB_FLAGS, VIRT_ENABLED, 21, 1)
 FIELD(TB_FLAGS, PRIV, 22, 2)
 FIELD(TB_FLAGS, AXL, 24, 2)
+/* If pointer masking should be applied and address sign extended */
+FIELD(TB_FLAGS, PM_PMM, 26, 2)
+FIELD(TB_FLAGS, PM_SIGNEXTEND, 28, 1)
 
 #ifdef TARGET_RISCV32
 #define riscv_cpu_mxl(env)  ((void)(env), MXL_RV32)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index a563451c48..4dea564fd8 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -68,6 +68,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
 RISCVCPU *cpu = env_archcpu(env);
 RISCVExtStatus fs, vs;
 uint32_t flags = 0;
+bool pm_signext = riscv_cpu_virt_mem_enabled(env);
 
 *pc = env->xl == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;
 *cs_base = 0;
@@ -138,6 +139,8 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
 flags = FIELD_DP32(flags, TB_FLAGS, VS, vs);
 flags = FIELD_DP32(flags, TB_FLAGS, XL, env->xl);
 flags = FIELD_DP32(flags, TB_FLAGS, AXL, cpu_address_xl(env));
+flags = FIELD_DP32(flags, TB_FLAGS, PM_PMM, riscv_pm_get_pmm(env));
+flags = FIELD_DP32(flags, TB_FLAGS, PM_SIGNEXTEND, pm_signext);
 
 *pflags = flags;
 }
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 3382eb0a5f..a85a2abf2e 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -103,6 +103,9 @@ typedef struct DisasContext {
 bool vl_eq_vlmax;
 CPUState *cs;
 TCGv zero;
+/* actual address width */
+uint8_t addr_width;
+bool addr_signed;
 /* Ztso */
 bool ztso;
 /* Use icount trigger for native debug */
@@ -1180,6 +1183,8 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 ctx->xl = FIELD_EX32(tb_flags, TB_FLAGS, XL);
 ctx->address_xl = FIELD_EX32(tb_flags, TB_FLAGS, AXL);
 ctx->cs = cs;
+ctx->addr_width = 0;
+ctx->addr_signed = false;
 ctx->ztso = cpu->cfg.ext_ztso;
 ctx->itrigger = FIELD_EX32(tb_flags, TB_FLAGS, ITRIGGER);
 ctx->zero = tcg_constant_tl(0);
-- 
2.34.1
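
The two new TB flag fields can be pictured as plain bit-field packing. The sketch below is a stand-alone approximation of what depositing into and extracting from a 32-bit flags word does; `field_deposit32` and `field_extract32` are hypothetical helpers, and only the bit positions (26, width 2, and 28, width 1) are taken from the FIELD() lines in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Positions taken from the FIELD() lines in the patch. */
enum { PM_PMM_SHIFT = 26, PM_PMM_LEN = 2 };
enum { PM_SIGNEXT_SHIFT = 28, PM_SIGNEXT_LEN = 1 };

/* Return flags with len bits starting at shift replaced by val. */
uint32_t field_deposit32(uint32_t flags, int shift, int len, uint32_t val)
{
    uint32_t mask;

    assert(shift >= 0 && len > 0 && len < 32 && shift + len <= 32);
    mask = ((UINT32_C(1) << len) - 1) << shift;
    return (flags & ~mask) | ((val << shift) & mask);
}

/* Return the len-bit field that starts at bit shift. */
uint32_t field_extract32(uint32_t flags, int shift, int len)
{
    assert(shift >= 0 && len > 0 && len < 32 && shift + len <= 32);
    return (flags >> shift) & ((UINT32_C(1) << len) - 1);
}
```

Because the fields sit directly above AXL (bits 24 and 25), they use bits 26 through 28 of the 32-bit flags word.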

[PATCH v8 5/6] target/riscv: Update address modify functions to take into account pointer masking

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Signed-off-by: Alexey Baturo 

Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/translate.c | 22 --
 target/riscv/vector_helper.c | 13 +
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index a85a2abf2e..99c5c6a530 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -581,8 +581,10 @@ static TCGv get_address(DisasContext *ctx, int rs1, int imm)
 TCGv src1 = get_gpr(ctx, rs1, EXT_NONE);
 
 tcg_gen_addi_tl(addr, src1, imm);
-if (get_address_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32u_tl(addr, addr);
+if (ctx->addr_signed) {
+tcg_gen_sextract_tl(addr, addr, 0, ctx->addr_width);
+} else {
+tcg_gen_extract_tl(addr, addr, 0, ctx->addr_width);
 }
 
 return addr;
@@ -595,8 +597,10 @@ static TCGv get_address_indexed(DisasContext *ctx, int rs1, TCGv offs)
 TCGv src1 = get_gpr(ctx, rs1, EXT_NONE);
 
 tcg_gen_add_tl(addr, src1, offs);
-if (get_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32u_tl(addr, addr);
+if (ctx->addr_signed) {
+tcg_gen_sextract_tl(addr, addr, 0, ctx->addr_width);
+} else {
+tcg_gen_extract_tl(addr, addr, 0, ctx->addr_width);
 }
 return addr;
 }
@@ -1183,8 +1187,14 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 ctx->xl = FIELD_EX32(tb_flags, TB_FLAGS, XL);
 ctx->address_xl = FIELD_EX32(tb_flags, TB_FLAGS, AXL);
 ctx->cs = cs;
-ctx->addr_width = 0;
-ctx->addr_signed = false;
+if (get_xl(ctx) == MXL_RV32) {
+ctx->addr_width = 32;
+ctx->addr_signed = false;
+} else {
+int pm_pmm = FIELD_EX32(tb_flags, TB_FLAGS, PM_PMM);
+ctx->addr_width = 64 - riscv_pm_get_pmlen(pm_pmm);
+ctx->addr_signed = FIELD_EX32(tb_flags, TB_FLAGS, PM_SIGNEXTEND);
+}
 ctx->ztso = cpu->cfg.ext_ztso;
 ctx->itrigger = FIELD_EX32(tb_flags, TB_FLAGS, ITRIGGER);
 ctx->zero = tcg_constant_tl(0);
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 4934b43722..c77fbd8929 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -104,6 +104,19 @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz)
 
 static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)
 {
+RISCVPmPmm pmm = riscv_pm_get_pmm(env);
+if (pmm == PMM_FIELD_DISABLED) {
+return addr;
+}
+int pmlen = riscv_pm_get_pmlen(pmm);
+bool signext = riscv_cpu_virt_mem_enabled(env);
+addr = addr << pmlen;
+/* sign/zero extend masked address by N-1 bit */
+if (signext) {
+addr = (target_long)addr >> pmlen;
+} else {
+addr = addr >> pmlen;
+}
 return addr;
 }
 
-- 
2.34.1
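
The shift-based masking in adjust_addr() can be tried out in isolation. The following is a minimal stand-alone sketch for 64-bit addresses; `mask_pointer` is a hypothetical name. It assumes, as mainstream compilers do, that right-shifting a negative signed value is an arithmetic shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Discard the top pmlen bits of addr, then refill them either with
 * copies of the highest remaining bit (sign_extend) or with zeros.
 */
uint64_t mask_pointer(uint64_t addr, int pmlen, bool sign_extend)
{
    assert(pmlen >= 0 && pmlen < 64);
    addr <<= pmlen;                       /* push the tag bits out */
    if (sign_extend) {
        /* Arithmetic shift: relies on implementation-defined behaviour. */
        return (uint64_t)((int64_t)addr >> pmlen);
    }
    return addr >> pmlen;                 /* logical shift: zero fill */
}
```

For example, with pmlen 7 the address 0x0100000000001234 is left unchanged by zero extension, while sign extension turns it into 0xFF00000000001234 because bit 56 is set.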

[PATCH v8 3/6] target/riscv: Add helper functions to calculate current number of masked bits for pointer masking

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Signed-off-by: Alexey Baturo 

Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.h|  4 +++
 target/riscv/cpu_helper.c | 58 +++
 2 files changed, 62 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index b694cc62bf..0112b568a0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -700,6 +700,10 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
 
 bool riscv_cpu_is_32bit(RISCVCPU *cpu);
 
+bool riscv_cpu_virt_mem_enabled(CPURISCVState *env);
+RISCVPmPmm riscv_pm_get_pmm(CPURISCVState *env);
+int riscv_pm_get_pmlen(RISCVPmPmm pmm);
+
 RISCVException riscv_csrrw(CPURISCVState *env, int csrno,
target_ulong *ret_value,
target_ulong new_value, target_ulong write_mask);
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index d20bffdd5a..a563451c48 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -142,6 +142,64 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
 *pflags = flags;
 }
 
+RISCVPmPmm riscv_pm_get_pmm(CPURISCVState *env)
+{
+int pmm = 0;
+#ifndef CONFIG_USER_ONLY
+int priv_mode = cpu_address_mode(env);
+/* Get current PMM field */
+switch (priv_mode) {
+case PRV_M:
+pmm = riscv_cpu_cfg(env)->ext_smmpm ?
+  get_field(env->mseccfg, MSECCFG_PMM) : PMM_FIELD_DISABLED;
+break;
+case PRV_S:
+pmm = riscv_cpu_cfg(env)->ext_smnpm ?
+  get_field(env->menvcfg, MENVCFG_PMM) : PMM_FIELD_DISABLED;
+break;
+case PRV_U:
+pmm = riscv_cpu_cfg(env)->ext_ssnpm ?
+  get_field(env->senvcfg, SENVCFG_PMM) : PMM_FIELD_DISABLED;
+break;
+default:
+g_assert_not_reached();
+}
+#endif
+return pmm;
+}
+
+bool riscv_cpu_virt_mem_enabled(CPURISCVState *env)
+{
+bool virt_mem_en = false;
+#ifndef CONFIG_USER_ONLY
+int satp_mode = 0;
+int priv_mode = cpu_address_mode(env);
+/* Get current PMM field */
+if (riscv_cpu_mxl(env) == MXL_RV32) {
+satp_mode = get_field(env->satp, SATP32_MODE);
+} else {
+satp_mode = get_field(env->satp, SATP64_MODE);
+}
+virt_mem_en = ((satp_mode != VM_1_10_MBARE) && (priv_mode != PRV_M));
+#endif
+return virt_mem_en;
+}
+
+int riscv_pm_get_pmlen(RISCVPmPmm pmm)
+{
+switch (pmm) {
+case PMM_FIELD_DISABLED:
+return 0;
+case PMM_FIELD_PMLEN7:
+return 7;
+case PMM_FIELD_PMLEN16:
+return 16;
+default:
+g_assert_not_reached();
+}
+return -1;
+}
+
 #ifndef CONFIG_USER_ONLY
 
 /*
-- 
2.34.1
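
The helpers in this patch boil down to reading a two-bit PMM field and mapping it to a number of masked bits. Below is a stand-alone sketch of that mapping. `field_get` is a hypothetical stand-in for a mask-based field accessor, the enum mirrors the values listed in the series, and returning -1 for the reserved encoding is a choice made for this sketch (the patch asserts instead).

```c
#include <assert.h>
#include <stdint.h>

/* Same bit position as the MENVCFG_PMM define in the series. */
#define ENVCFG_PMM_MASK (UINT64_C(3) << 32)

enum pmm {
    PMM_DISABLED = 0,
    PMM_RESERVED = 1,
    PMM_PMLEN7   = 2,
    PMM_PMLEN16  = 3,
};

/* Extract a contiguous field: mask it, then divide by its lowest bit. */
uint64_t field_get(uint64_t reg, uint64_t mask)
{
    assert(mask != 0);
    return (reg & mask) / (mask & ~(mask << 1));
}

/* Number of masked top bits, or -1 for the reserved encoding. */
int pmm_to_pmlen(enum pmm pmm)
{
    switch (pmm) {
    case PMM_DISABLED:
        return 0;
    case PMM_PMLEN7:
        return 7;
    case PMM_PMLEN16:
        return 16;
    default:
        return -1;
    }
}
```

With PMLEN 16 the effective address width on RV64 becomes 64 - 16 = 48 bits, which matches the `64 - riscv_pm_get_pmlen(pm_pmm)` computation in patch 5/6.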

[PATCH v8 2/6] target/riscv: Add new CSR fields for S{sn, mn, m}pm extensions as part of Zjpm v0.8

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Signed-off-by: Alexey Baturo 

Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.h  |  8 
 target/riscv/cpu_bits.h |  3 +++
 target/riscv/cpu_cfg.h  |  3 +++
 target/riscv/csr.c  | 11 +++
 target/riscv/machine.c  | 10 +++---
 target/riscv/pmp.c  | 13 ++---
 target/riscv/pmp.h  | 11 ++-
 7 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index cfad5281a1..b694cc62bf 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -123,6 +123,14 @@ typedef enum {
 EXT_STATUS_DIRTY,
 } RISCVExtStatus;
 
+/* Enum holds PMM field values for Zjpm v0.8 extension */
+typedef enum {
+PMM_FIELD_DISABLED = 0,
+PMM_FIELD_RESERVED = 1,
+PMM_FIELD_PMLEN7   = 2,
+PMM_FIELD_PMLEN16  = 3,
+} RISCVPmPmm;
+
 #define MMU_USER_IDX 3
 
 #define MAX_RISCV_PMPS (16)
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 5098d2d613..e9e6e1f952 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -708,6 +708,7 @@ typedef enum RISCVException {
 #define MENVCFG_CBIE   (3UL << 4)
 #define MENVCFG_CBCFE  BIT(6)
 #define MENVCFG_CBZE   BIT(7)
+#define MENVCFG_PMM(3ULL << 32)
 #define MENVCFG_ADUE   (1ULL << 61)
 #define MENVCFG_PBMTE  (1ULL << 62)
 #define MENVCFG_STCE   (1ULL << 63)
@@ -721,11 +722,13 @@ typedef enum RISCVException {
 #define SENVCFG_CBIE   MENVCFG_CBIE
 #define SENVCFG_CBCFE  MENVCFG_CBCFE
 #define SENVCFG_CBZE   MENVCFG_CBZE
+#define SENVCFG_PMMMENVCFG_PMM
 
 #define HENVCFG_FIOM   MENVCFG_FIOM
 #define HENVCFG_CBIE   MENVCFG_CBIE
 #define HENVCFG_CBCFE  MENVCFG_CBCFE
 #define HENVCFG_CBZE   MENVCFG_CBZE
+#define HENVCFG_PMMMENVCFG_PMM
 #define HENVCFG_ADUE   MENVCFG_ADUE
 #define HENVCFG_PBMTE  MENVCFG_PBMTE
 #define HENVCFG_STCE   MENVCFG_STCE
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 2040b90da0..963de724c2 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -118,6 +118,9 @@ struct RISCVCPUConfig {
 bool ext_ssaia;
 bool ext_sscofpmf;
 bool ext_smepmp;
+bool ext_ssnpm;
+bool ext_smnpm;
+bool ext_smmpm;
 bool rvv_ta_all_1s;
 bool rvv_ma_all_1s;
 
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index ffb5a1102e..69c0279c12 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -530,6 +530,9 @@ static RISCVException have_mseccfg(CPURISCVState *env, int csrno)
 if (riscv_cpu_cfg(env)->ext_zkr) {
 return RISCV_EXCP_NONE;
 }
+if (riscv_cpu_cfg(env)->ext_smmpm) {
+return RISCV_EXCP_NONE;
+}
 
 return RISCV_EXCP_ILLEGAL_INST;
 }
@@ -2080,6 +2083,10 @@ static RISCVException write_menvcfg(CPURISCVState *env, int csrno,
 (cfg->ext_sstc ? MENVCFG_STCE : 0) |
 (cfg->ext_svadu ? MENVCFG_ADUE : 0);
 }
+/* Update PMM field only if the value is valid according to Zjpm v0.8 */
+if (((val & MENVCFG_PMM) >> 32) != PMM_FIELD_RESERVED) {
+mask |= MENVCFG_PMM;
+}
 env->menvcfg = (env->menvcfg & ~mask) | (val & mask);
 
 return RISCV_EXCP_NONE;
@@ -2124,6 +2131,10 @@ static RISCVException write_senvcfg(CPURISCVState *env, int csrno,
 target_ulong val)
 {
 uint64_t mask = SENVCFG_FIOM | SENVCFG_CBIE | SENVCFG_CBCFE | SENVCFG_CBZE;
+/* Update PMM field only if the value is valid according to Zjpm v0.8 */
+if (((val & SENVCFG_PMM) >> 32) != PMM_FIELD_RESERVED) {
+mask |= SENVCFG_PMM;
+}
 RISCVException ret;
 
 ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index 64ab66e332..28f373 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -152,15 +152,19 @@ static const VMStateDescription vmstate_vector = {
 
 static bool pointermasking_needed(void *opaque)
 {
-return false;
+RISCVCPU *cpu = opaque;
+return cpu->cfg.ext_ssnpm || cpu->cfg.ext_smnpm || cpu->cfg.ext_smmpm;
 }
 
 static const VMStateDescription vmstate_pointermasking = {
 .name = "cpu/pointer_masking",
-.version_id = 1,
-.minimum_version_id = 1,
+.version_id = 2,
+.minimum_version_id = 2,
 .needed = pointermasking_needed,
 .fields = (const VMStateField[]) {
+VMSTATE_UINTTL(env.mseccfg, RISCVCPU),
+VMSTATE_UINTTL(env.senvcfg, RISCVCPU),
+VMSTATE_UINTTL(env.menvcfg, RISCVCPU),
 VMSTATE_END_OF_LIST()
 }
 };
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 2a76b611a0..7ddb9dbf0b 100644
--- a/target/riscv/pmp.c
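
The write_menvcfg()/write_senvcfg() hunks above make the PMM field writable only when the new value is not the reserved encoding. That behaviour can be modelled on its own as below. The names are hypothetical and the sketch only tracks the PMM bits, ignoring every other writable field of the register.

```c
#include <assert.h>
#include <stdint.h>

#define PMM_SHIFT    32
#define PMM_MASK     (UINT64_C(3) << PMM_SHIFT)
#define PMM_RESERVED UINT64_C(1)

/*
 * Model of a CSR write restricted to the PMM field: a write carrying the
 * reserved encoding leaves the stored field untouched.
 */
uint64_t envcfg_write_pmm(uint64_t old, uint64_t val)
{
    uint64_t mask = 0;

    /* Invariant: the stored field never holds the reserved value. */
    assert(((old & PMM_MASK) >> PMM_SHIFT) != PMM_RESERVED);

    if (((val & PMM_MASK) >> PMM_SHIFT) != PMM_RESERVED) {
        mask |= PMM_MASK;
    }
    return (old & ~mask) | (val & mask);
}
```

Starting from a register with PMM = 2, writing PMM = 1 keeps the field at 2, while writing PMM = 3 or PMM = 0 updates it.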

[PATCH v8 0/6] Pointer Masking update for Zjpm v0.8

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Hi,

Rebasing patches on current qemu branch and resubmitting them.

Thanks.

[v7]:
I'm terribly sorry, but the previous rebase went wrong and somehow I missed it.
This time I double-checked the rebased version.
This patch series is properly rebased on 
https://github.com/alistair23/qemu/tree/riscv-to-apply.next 

[v6]:
This patch series is rebased on 
https://github.com/alistair23/qemu/tree/riscv-to-apply.next 

[v5]:
This patch series targets Zjpm v0.8 extension.
The spec itself could be found here: 
https://github.com/riscv/riscv-j-extension/blob/8088461d8d66a7676872b61c908cbeb7cf5c5d1d/zjpm-spec.pdf
This patch series is updated after the suggested comments:
- add "x-" to the extension names to indicate experimental

[v4]:
Patch series updated after the suggested comments:
- removed J-letter extension as it's unused
- renamed and fixed function to detect if address should be sign-extended
- zeroed unused context variables and moved computation logic to another patch
- bumped pointer masking version_id and minimum_version_id by 1

[v3]:
These patches are updated after Richard's comments:
- moved new tb flags to the end
- used tcg_gen_(s)extract to get the final address
- properly handle CONFIG_USER_ONLY

[v2]:
As per Richard's suggestion, I made the pmm field part of tb_flags.
This made it possible to get rid of the global variable that stored pmlen
and to simplify all the machinery around it.

[v1]:
It looks like Zjpm v0.8 is almost frozen and we don't expect it to change
drastically anymore.
Compared to the original implementation with explicit base and mask CSRs, we
now only have several fixed options for the number of masked bits, which are
set using existing CSRs.
The changes have been tested with handwritten assembly tests and the LLVM
HWASAN test suite.

Alexey Baturo (6):
  target/riscv: Remove obsolete pointer masking extension code.
  target/riscv: Add new CSR fields for S{sn,mn,m}pm extensions as part
of Zjpm v0.8
  target/riscv: Add helper functions to calculate current number of
masked bits for pointer masking
  target/riscv: Add pointer masking tb flags
  target/riscv: Update address modify functions to take into account
pointer masking
  target/riscv: Enable updates for pointer masking variables and thus
enable pointer masking extension

 target/riscv/cpu.c   |  21 +--
 target/riscv/cpu.h   |  45 +++--
 target/riscv/cpu_bits.h  |  90 +-
 target/riscv/cpu_cfg.h   |   3 +
 target/riscv/cpu_helper.c|  97 +-
 target/riscv/csr.c   | 337 ++-
 target/riscv/machine.c   |  20 +--
 target/riscv/pmp.c   |  13 +-
 target/riscv/pmp.h   |  11 +-
 target/riscv/tcg/tcg-cpu.c   |   5 +-
 target/riscv/translate.c |  46 ++---
 target/riscv/vector_helper.c |  15 +-
 12 files changed, 157 insertions(+), 546 deletions(-)

-- 
2.34.1

[PATCH v8 1/6] target/riscv: Remove obsolete pointer masking extension code.

2024-03-20 Thread Alexey Baturo

From: Alexey Baturo 

Zjpm v0.8 is almost frozen and is much simpler than the existing version:
the newer version doesn't allow specifying a custom mask or base for masking.
Instead, it allows only certain fixed options for masking the top bits.

Signed-off-by: Alexey Baturo 

Acked-by: Alistair Francis 
---
 target/riscv/cpu.c   |  13 +-
 target/riscv/cpu.h   |  30 +---
 target/riscv/cpu_bits.h  |  87 --
 target/riscv/cpu_helper.c|  52 --
 target/riscv/csr.c   | 326 ---
 target/riscv/machine.c   |  14 +-
 target/riscv/tcg/tcg-cpu.c   |   5 +-
 target/riscv/translate.c |  27 +--
 target/riscv/vector_helper.c |   2 +-
 9 files changed, 13 insertions(+), 543 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c160b9216b..73c69f3d0a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -42,7 +42,7 @@
 /* RISC-V CPU definitions */
 static const char riscv_single_letter_exts[] = "IEMAFDQCBPVH";
 const uint32_t misa_bits[] = {RVI, RVE, RVM, RVA, RVF, RVD, RVV,
-  RVC, RVS, RVU, RVH, RVJ, RVG, RVB, 0};
+  RVC, RVS, RVU, RVH, RVG, RVB, 0};
 
 /*
  * From vector_helper.c
@@ -793,13 +793,6 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 CSR_MSCRATCH,
 CSR_SSCRATCH,
 CSR_SATP,
-CSR_MMTE,
-CSR_UPMBASE,
-CSR_UPMMASK,
-CSR_SPMBASE,
-CSR_SPMMASK,
-CSR_MPMBASE,
-CSR_MPMMASK,
 };
 
 for (i = 0; i < ARRAY_SIZE(dump_csrs); ++i) {
@@ -979,8 +972,6 @@ static void riscv_cpu_reset_hold(Object *obj)
 }
 i++;
 }
-/* mmte is supposed to have pm.current hardwired to 1 */
-env->mmte |= (EXT_STATUS_INITIAL | MMTE_M_PM_CURRENT);
 
 /*
  * Bits 10, 6, 2 and 12 of mideleg are read only 1 when the Hypervisor
@@ -1002,7 +993,6 @@ static void riscv_cpu_reset_hold(Object *obj)
 pmp_unlock_entries(env);
 #endif
 env->xl = riscv_cpu_mxl(env);
-riscv_cpu_update_mask(env);
 cs->exception_index = RISCV_EXCP_NONE;
 env->load_res = -1;
 set_default_nan_mode(1, &env->fp_status);
@@ -1393,7 +1383,6 @@ static const MISAExtInfo misa_ext_info_arr[] = {
 MISA_EXT_INFO(RVS, "s", "Supervisor-level instructions"),
 MISA_EXT_INFO(RVU, "u", "User-level instructions"),
 MISA_EXT_INFO(RVH, "h", "Hypervisor"),
-MISA_EXT_INFO(RVJ, "x-j", "Dynamic translated languages"),
 MISA_EXT_INFO(RVV, "v", "Vector operations"),
 MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"),
 MISA_EXT_INFO(RVB, "x-b", "Bit manipulation (Zba_Zbb_Zbs)")
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3b1a02b944..cfad5281a1 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -68,7 +68,6 @@ typedef struct CPUArchState CPURISCVState;
 #define RVS RV('S')
 #define RVU RV('U')
 #define RVH RV('H')
-#define RVJ RV('J')
 #define RVG RV('G')
 #define RVB RV('B')
 
@@ -395,17 +394,6 @@ struct CPUArchState {
 /* True if in debugger mode.  */
 bool debugger;
 
-/*
- * CSRs for PointerMasking extension
- */
-target_ulong mmte;
-target_ulong mpmmask;
-target_ulong mpmbase;
-target_ulong spmmask;
-target_ulong spmbase;
-target_ulong upmmask;
-target_ulong upmbase;
-
 /* CSRs for execution environment configuration */
 uint64_t menvcfg;
 uint64_t mstateen[SMSTATEEN_MAX_COUNT];
@@ -414,9 +402,6 @@ struct CPUArchState {
 target_ulong senvcfg;
 uint64_t henvcfg;
 #endif
-target_ulong cur_pmmask;
-target_ulong cur_pmbase;
-
 /* Fields from here on are preserved across CPU reset. */
 QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
 QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */
@@ -565,16 +550,14 @@ FIELD(TB_FLAGS, VSTART_EQ_ZERO, 15, 1)
 /* The combination of MXL/SXL/UXL that applies to the current cpu mode. */
 FIELD(TB_FLAGS, XL, 16, 2)
 /* If PointerMasking should be applied */
-FIELD(TB_FLAGS, PM_MASK_ENABLED, 18, 1)
-FIELD(TB_FLAGS, PM_BASE_ENABLED, 19, 1)
-FIELD(TB_FLAGS, VTA, 20, 1)
-FIELD(TB_FLAGS, VMA, 21, 1)
+FIELD(TB_FLAGS, VTA, 18, 1)
+FIELD(TB_FLAGS, VMA, 19, 1)
 /* Native debug itrigger */
-FIELD(TB_FLAGS, ITRIGGER, 22, 1)
+FIELD(TB_FLAGS, ITRIGGER, 20, 1)
 /* Virtual mode enabled */
-FIELD(TB_FLAGS, VIRT_ENABLED, 23, 1)
-FIELD(TB_FLAGS, PRIV, 24, 2)
-FIELD(TB_FLAGS, AXL, 26, 2)
+FIELD(TB_FLAGS, VIRT_ENABLED, 21, 1)
+FIELD(TB_FLAGS, PRIV, 22, 2)
+FIELD(TB_FLAGS, AXL, 24, 2)
 
 #ifdef TARGET_RISCV32
 #define riscv_cpu_mxl(env)  ((void)(env), MXL_RV32)
@@ -707,7 +690,6 @@ static inline uint32_t vext_get_vlmax(uint32_t vlenb, uint32_t vsew,
 void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
   uint64_t *cs_base, uint32_t *pflags);
 
-void riscv_cpu_update_mask(CPURISCVState *env);
 bool riscv_cpu_is_32bit(RISCVCPU *cpu);

Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'

2024-03-20 Thread Peter Maydell

On Wed, 20 Mar 2024 at 17:06, Philippe Mathieu-Daudé  wrote:
>
> +Alex/Daniel
>
> On 20/3/24 17:53, Peter Maydell wrote:
> > On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé wrote:
> >>
> >> 'info tlb' and 'info mem' commands don't scale in heterogeneous
> >> emulation. They will be reworked after the next release, hidden
> >> behind the 'info mmu' command. It is not too late to deprecate
> >> commands, so add the 'info mmu' command as wrapper to the other
> >> ones, but already deprecate them.
> >>
> >> Philippe Mathieu-Daudé (2):
> >>target/monitor: Introduce 'info mmu' command
> >>target/monitor: Deprecate 'info tlb' and 'info mem' commands
> >
> > This seems to replace "info tlb" and "info mem" with "info mmu -t"
> > and "info mmu -m", but it doesn't really say anything about:
> >   * what the difference is between these two things
>
> I really don't know; I'm only trying to keep the monitor interface
> identical.

You don't, though: you change it from "info tlb" to "info mmu -t" etc.

> >   * which targets implement which and why
>
> This one is easy to answer:
>
> #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \
>  defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K)
>  {
>  .name   = "tlb",
>
> #if defined(TARGET_I386) || defined(TARGET_RISCV)
>  {
>  .name   = "mem",
>
> >   * what the plan is for the future
>
> My problem is with linking a single QEMU binary, as these two symbols
> (hmp_info_mem and hmp_info_tlb) clash.

Yes, but they both (implicitly) operate on the current HMP CPU,
so the problem with linking into a single binary is that they're
not indirected through a method on the CPU object, not the syntax
used in the monitor to invoke them, presumably.

> I'm indeed only postponing the problem, without looking at what
> this code does. I did it adding hmp_info_mmu_tlb/mem hooks in
> TCGCPUOps ("hw/core/tcg-cpu-ops.h"), so the command can be
> dispatched per target vcpu as target-agnostic code in
> monitor/hmp-cmds.c:
>
> +#include "hw/core/tcg-cpu-ops.h"
> +
> +static void hmp_info_mmu_tlb(Monitor *mon, CPUState *cpu)
> +{
> +const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
> +
> +if (tcg_ops->hmp_info_mmu_tlb) {
> +tcg_ops->hmp_info_mmu_tlb(mon, cpu_env(cpu));
> +} else {
> +monitor_puts(mon, "No per-CPU information available on this target\n");
> +}
> +}

These aren't TCG specific though, so why TCGCPUOps ?

> > I am definitely not a fan of either of these commands, because
> > (as we currently implement them) they effectively require each
> > target architecture to implement a second copy of the page table
> > walking code. But before we can deprecate them we need to be
> > pretty sure that "info mmu" is what we want to replace them with.
>
> An alternative is to just deprecate them, without adding "info mmu" :)
>
> It is OK to un-deprecate stuff if we realize its usefulness.

The commands are there because some users find them useful.
I just dislike them because I think they're a bit niche and
annoying to implement and not consistent across target
architectures and not very well documented...

By the way, we have no obligation to follow the deprecate-and-drop
process for HMP commands; unlike QMP, we give ourselves the
license to vary it when we feel like it, because the users are
humans, not programs or scripts.

-- PMM

Re: [PATCH v2] vfio/pci: migration: Skip config space check for vendor specific capability during restore/load

2024-03-20 Thread Vinayak Kale





On 18/03/24 8:28 pm, Alex Williamson wrote:



On Fri, 15 Mar 2024 23:22:22 +0530
Vinayak Kale  wrote:


On 11/03/24 8:32 pm, Alex Williamson wrote:



On Mon, 11 Mar 2024 17:45:19 +0530
Vinayak Kale  wrote:


During a migration restore, QEMU checks the config space of the PCI device
against the config space captured in the migration stream during save. If the
config space data does not match, the restore operation fails.

The config space check is done in get_pci_config_device(). By default the VSC
(vendor-specific capability) in config space is checked.

Ideally QEMU should not check the VSC of a VFIO-PCI device during restore/load,
as QEMU is not aware of the VSC ABI.


It's disappointing that we can't seem to have a discussion about why
it's not the responsibility of the underlying migration support in the
vfio-pci variant driver to make the vendor specific capability
consistent across migration.


I think it is the device vendor driver's responsibility to ensure that the
VSC is consistent across migration. Here consistency could mean that the VSC
format should be the same on source and destination; however, the actual VSC
contents may not be byte-for-byte identical.

If a vfio-pci device is migration capable and the vfio-pci vendor driver
is OK with volatile VSC contents as long as consistency is maintained
for the VSC format, then QEMU should exempt the VSC contents from the
config space check.


I tend to agree that ultimately the variant driver is responsible for
making the device consistent during migration and QEMU's policy that
even vendor defined ABI needs to be byte for byte identical is somewhat
arbitrary.


Also, for future maintenance, specifically what device is currently
broken by this and under what conditions?


Under certain conditions VSC contents vary for NVIDIA vGPU devices in
case of live migration. Due to QEMU's current config space check for
VSC, live migration is broken across NVIDIA vGPU devices.


This is incredibly vague.  We've been testing NVIDIA vGPU migration and
have not experienced a migration failure due to VSC mismatch.  Does this
require a specific device?  A specific workload?  What specific
conditions trigger this problem?


During live migration, when the source and destination host drivers
differ, the Vendor Specific Information in the VSC varies on the
destination to ensure that the vGPU feature capabilities exposed to the
guest driver are compatible with the destination host. This applies to all
NVIDIA vGPU devices.




While as above, I agree in theory that the responsibility lies on the
migration support in the variant driver, there are risks involved,
particularly if new dependencies on the VSC contents are developed in
the guest.  For future maintenance and development in this space, the
commit log should describe exactly the scenario that requires this
policy change.  Thanks,


I'll add the aforementioned scenario (the situation in which live migration
is broken for NVIDIA vGPU devices) to the commit description. Thanks.




Alex


This patch skips the check for VFIO-PCI device by clearing pdev->cmask[] for VSC
offsets. If cmask[] is not set for an offset, then qemu skips config space check
for that offset.

Signed-off-by: Vinayak Kale 
---
Version History
v1->v2:
  - Limited scope of change to vfio-pci devices instead of all pci devices.

   hw/vfio/pci.c | 19 +++
   1 file changed, 19 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d7fe06715c..9edaff4b37 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2132,6 +2132,22 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, 
uint8_t pos)
   }
   }

+static int vfio_add_vendor_specific_cap(VFIOPCIDevice *vdev, int pos,
+uint8_t size, Error **errp)
+{
+PCIDevice *pdev = &vdev->pdev;
+
+pos = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, size, errp);
+if (pos < 0) {
+return pos;
+}
+
+/* Exempt config space check for VSC during restore/load  */
+memset(pdev->cmask + pos, 0, size);


This excludes the entire capability from comparison, including the
capability ID, next pointer, and capability length.  Even if the
contents of the capability are considered volatile vendor information,
the header is spec defined ABI which must be consistent.  Thanks,
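
One way to read this request, as a minimal sketch and not the eventual v3 patch: keep the compare mask set for the three header bytes (capability ID, next pointer, length) and clear it only for the vendor-defined body. The comparison loop below is a simplified stand-in for the real check, which also takes the write masks into account; all demo_* names and DEMO_* constants are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_CFG_SIZE 256
/* Vendor-specific capability header: ID, next pointer, length. */
#define DEMO_VSC_HDR_LEN 3

/* Clear compare-mask bits for the capability body only. */
static void demo_exempt_vsc_body(uint8_t *cmask, unsigned pos, unsigned size)
{
    if (size > DEMO_VSC_HDR_LEN && pos + size <= DEMO_CFG_SIZE) {
        memset(cmask + pos + DEMO_VSC_HDR_LEN, 0, size - DEMO_VSC_HDR_LEN);
    }
}

/* Return 0 when every byte selected by cmask matches, -1 otherwise. */
static int demo_config_compare(const uint8_t *saved, const uint8_t *current,
                               const uint8_t *cmask)
{
    for (unsigned i = 0; i < DEMO_CFG_SIZE; i++) {
        if ((saved[i] ^ current[i]) & cmask[i]) {
            return -1;
        }
    }
    return 0;
}
```

Under this sketch a difference in the vendor body is tolerated, while a changed length or next pointer still fails the comparison.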


This makes sense, I'll address this in V3. Thanks.



Alex


+
+return pos;
+}
+
   static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
   {
   PCIDevice *pdev = &vdev->pdev;
@@ -2199,6 +2215,9 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos, Error **errp)
   vfio_check_af_flr(vdev, pos);
   ret = pci_add_capability(pdev, cap_id, pos, size, errp);
   break;
+case PCI_CAP_ID_VNDR:
+ret = vfio_add_vendor_specific_cap(vdev, pos, size, errp);
+break;
   default:
   ret =

Re: [PATCH RFC v3 00/49] Add AMD Secure Nested Paging (SEV-SNP) support

2024-03-20 Thread Paolo Bonzini

On Wed, Mar 20, 2024 at 10:59 AM Paolo Bonzini  wrote:
> I will now focus on reviewing patches 6-20.  This way we can prepare a
> common tree for SEV_INIT2/SNP/TDX, for both vendors to build upon.

Ok, the attachment is the delta that I have. The only major change is
requiring discard (thus effectively blocking VFIO support for
SEV-SNP/TDX, at least for now).

I will push it shortly to the same sevinit2 branch, and will post the
patches sometime soon.

Xiaoyao, you can use that branch too (it's on
https://gitlab.com/bonzini/qemu) as the basis for your TDX work.

Paolo
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index bf0ae0c8adb..428468950d9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -285,19 +285,8 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo
 {
 KVMState *s = kvm_state;
 struct kvm_userspace_memory_region2 mem;
-static int cap_user_memory2 = -1;
 int ret;
 
-if (cap_user_memory2 == -1) {
-cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
-}
-
-if (!cap_user_memory2 && slot->guest_memfd >= 0) {
-error_report("%s, KVM doesn't support KVM_CAP_USER_MEMORY2,"
- " which is required by guest memfd!", __func__);
-exit(1);
-}
-
 mem.slot = slot->slot | (kml->as_id << 16);
 mem.guest_phys_addr = slot->start_addr;
 mem.userspace_addr = (unsigned long)slot->ram;
@@ -310,7 +299,7 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo
  * value. This is needed based on KVM commit 75d61fbc. */
 mem.memory_size = 0;
 
-if (cap_user_memory2) {
+if (kvm_guest_memfd_supported) {
 ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
 } else {
 ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
@@ -320,7 +309,7 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo
 }
 }
 mem.memory_size = slot->memory_size;
-if (cap_user_memory2) {
+if (kvm_guest_memfd_supported) {
 ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
 } else {
 ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
@@ -332,7 +321,7 @@ err:
   mem.userspace_addr, mem.guest_memfd,
   mem.guest_memfd_offset, ret);
 if (ret < 0) {
-if (cap_user_memory2) {
+if (kvm_guest_memfd_supported) {
 error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
 " start=0x%" PRIx64 ", size=0x%" PRIx64 ","
 " flags=0x%" PRIx32 ", guest_memfd=%" PRId32 ","
@@ -502,6 +491,7 @@ static int kvm_mem_flags(MemoryRegion *mr)
 flags |= KVM_MEM_READONLY;
 }
 if (memory_region_has_guest_memfd(mr)) {
+assert(kvm_guest_memfd_supported);
 flags |= KVM_MEM_GUEST_MEMFD;
 }
 return flags;
@@ -1310,18 +1300,7 @@ static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
 struct kvm_memory_attributes attrs;
 int r;
 
-if (kvm_supported_memory_attributes == 0) {
-error_report("No memory attribute supported by KVM\n");
-return -EINVAL;
-}
-
-if ((attr & kvm_supported_memory_attributes) != attr) {
-error_report("memory attribute 0x%lx not supported by KVM,"
- " supported bits are 0x%lx\n",
- attr, kvm_supported_memory_attributes);
-return -EINVAL;
-}
-
+assert((attr & kvm_supported_memory_attributes) == attr);
 attrs.attributes = attr;
 attrs.address = start;
 attrs.size = size;
@@ -2488,11 +2467,14 @@ static int kvm_init(MachineState *ms)
 }
 s->as = g_new0(struct KVMAs, s->nr_as);
 
-kvm_guest_memfd_supported = kvm_check_extension(s, KVM_CAP_GUEST_MEMFD);
-
 ret = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
 kvm_supported_memory_attributes = ret > 0 ? ret : 0;
 
+kvm_guest_memfd_supported =
+kvm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
+kvm_check_extension(s, KVM_CAP_USER_MEMORY2) &&
+(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
+
 if (object_property_find(OBJECT(current_machine), "kvm-type")) {
 g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
 "kvm-type",
@@ -2962,14 +2944,10 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
 */
 return 0;
 } else {
-ret = ram_block_discard_is_disabled()
-  ? ram_block_discard_range(rb, offset, size)
-  : 0;
+ret = ram_block_discard_range(rb, offset, size);
 }
 } else {
-ret = ram_block_discard_is_disabled()
-  ? ram_block_discard_guest_memfd_range(rb, offset,

Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'

2024-03-20 Thread Philippe Mathieu-Daudé


+Alex/Daniel

On 20/3/24 17:53, Peter Maydell wrote:

On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé  wrote:


'info tlb' and 'info mem' commands don't scale in heterogeneous
emulation. They will be reworked after the next release, hidden
behind the 'info mmu' command. It is not too late to deprecate
commands, so add the 'info mmu' command as wrapper to the other
ones, but already deprecate them.

Philippe Mathieu-Daudé (2):
   target/monitor: Introduce 'info mmu' command
   target/monitor: Deprecate 'info tlb' and 'info mem' commands


This seems to replace "info tlb" and "info mem" with "info mmu -t"
and "info mmu -m", but it doesn't really say anything about:
  * what the difference is between these two things


I really don't know; I'm only trying to keep the monitor interface
identical.


  * which targets implement which and why


This one is easy to answer:

#if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) 
|| \

defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K)
{
.name   = "tlb",

#if defined(TARGET_I386) || defined(TARGET_RISCV)
{
.name   = "mem",


  * what the plan is for the future


My problem is with linking a single QEMU binary, as these two symbols
(hmp_info_mem and hmp_info_tlb) clash.

Luckily for me these are the only 2 implemented by more than one target:

$ git grep TARGET_ -- hmp-commands*
hmp-commands-info.hx:116:#if defined(TARGET_I386)
hmp-commands-info.hx:225:#if defined(TARGET_I386) || defined(TARGET_SH4) 
|| defined(TARGET_SPARC) || \
hmp-commands-info.hx:226:defined(TARGET_PPC) || 
defined(TARGET_XTENSA) || defined(TARGET_M68K)

hmp-commands-info.hx:241:#if defined(TARGET_I386) || defined(TARGET_RISCV)
hmp-commands-info.hx:729:#if defined(TARGET_S390X)
hmp-commands-info.hx:744:#if defined(TARGET_S390X)
hmp-commands-info.hx:828:#if defined(TARGET_I386)
hmp-commands-info.hx:882:#if defined(TARGET_I386)
hmp-commands.hx:1126:#if defined(TARGET_S390X)
hmp-commands.hx:1141:#if defined(TARGET_S390X)
hmp-commands.hx:1489:#if defined(TARGET_I386)

All the other ones are only implemented by a single target, so not a
problem for now.

I'm indeed only postponing the problem, without looking at what
this code does. I did it adding hmp_info_mmu_tlb/mem hooks in
TCGCPUOps ("hw/core/tcg-cpu-ops.h"), so the command can be
dispatched per target vcpu as target-agnostic code in
monitor/hmp-cmds.c:

+#include "hw/core/tcg-cpu-ops.h"
+
+static void hmp_info_mmu_tlb(Monitor *mon, CPUState *cpu)
+{
+const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
+
+if (tcg_ops->hmp_info_mmu_tlb) {
+tcg_ops->hmp_info_mmu_tlb(mon, cpu_env(cpu));
+} else {
+monitor_puts(mon, "No per-CPU information available on this 
target\n");

+}
+}


I am definitely not a fan of either of these commands, because
(as we currently implement them) they effectively require each
target architecture to implement a second copy of the page table
walking code. But before we can deprecate them we need to be
pretty sure that "info mmu" is what we want to replace them with.


An alternative is to just deprecate them, without adding "info mmu" :)

It is OK to un-deprecate stuff if we realize its usefulness.

Regards,

Phil.

Re: [PULL 0/6] QEMU bug fixes for 20240320

2024-03-20 Thread Peter Maydell

On Wed, 20 Mar 2024 at 10:32, Paolo Bonzini  wrote:
>
> The following changes since commit ba49d760eb04630e7b15f423ebecf6c871b8f77b:
>
>   Merge tag 'pull-maintainer-final-130324-1' of 
> https://gitlab.com/stsquad/qemu into staging (2024-03-13 15:12:14 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 05007258f02da253af370387b69fe98e9f37b320:
>
>   meson: remove dead dictionary access (2024-03-20 11:30:49 +0100)
>
> 
> * fix use-after-free issue
> * fix i386 TLB issue
> * fix crash with wrong -M confidential-guest-support argument
> * fix NULL pointer dereference in x86 MCE injection
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL 0/5] Ui patches

2024-03-20 Thread Peter Maydell

On Wed, 20 Mar 2024 at 13:54,  wrote:
>
> From: Marc-André Lureau 
>
> The following changes since commit c62d54d0a8067ffb3d5b909276f7296d7df33fa7:
>
>   Update version for v9.0.0-rc0 release (2024-03-19 19:13:52 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/marcandre.lureau/qemu.git tags/ui-pull-request
>
> for you to fetch changes up to d4069a84a3380247c1b524096c6a807743bf687a:
>
>   ui: compile dbus-display1.c with -fPIC as necessary (2024-03-20 10:28:00 
> +0400)
>
> 
> UI: fixes
>
> - dbus-display shared-library compilation fix
> - remove console_select() and fix related issues
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL 0/5] Edk2 20240320 patches

2024-03-20 Thread Peter Maydell

On Wed, 20 Mar 2024 at 07:09, Gerd Hoffmann  wrote:
>
> The following changes since commit ba49d760eb04630e7b15f423ebecf6c871b8f77b:
>
>   Merge tag 'pull-maintainer-final-130324-1' of 
> https://gitlab.com/stsquad/qemu into staging (2024-03-13 15:12:14 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/kraxel/qemu.git tags/edk2-20240320-pull-request
>
> for you to fetch changes up to 4a1babe58a1b3cd2c493ee6e0d774e70f62ad9c3:
>
>   update edk2 binaries for arm, risc-v and x86 secure boot. (2024-03-19 
> 16:42:10 +0100)
>
> 
> edk2: cleanup fix, update build config, rebuild binaries.
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PATCH] libqos/virtio.c: Correct 'flags' reading in qvirtqueue_kick

2024-03-20 Thread Philippe Mathieu-Daudé

Cc'ing Marc & Stefan for commit 1053587c3f ("libqos: Added EVENT_IDX 
support").


On 20/3/24 10:04, Zheyu Ma wrote:

In qvirtqueue_kick(), the 'flags' were previously being incorrectly read from
vq->avail instead of the correct vq->used location. This update ensures 'flags'
are read from the correct location as per the virtio standard.

Signed-off-by: Zheyu Ma 
---
  tests/qtest/libqos/virtio.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
index 82a6e122bf..a21b6eee9c 100644
--- a/tests/qtest/libqos/virtio.c
+++ b/tests/qtest/libqos/virtio.c
@@ -394,7 +394,7 @@ void qvirtqueue_kick(QTestState *qts, QVirtioDevice *d, 
QVirtQueue *vq,
  qvirtio_writew(d, qts, vq->avail + 2, idx + 1);
  
  /* Must read after idx is updated */

-flags = qvirtio_readw(d, qts, vq->avail);
+flags = qvirtio_readw(d, qts, vq->used);
  avail_event = qvirtio_readw(d, qts, vq->used + 4 +
  sizeof(struct vring_used_elem) * vq->size);
  


Reviewed-by: Philippe Mathieu-Daudé
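
For reference, the byte offsets that the patch relies on can be sketched as below. This assumes the split-virtqueue layout as I understand it from the virtio specification (16-bit flags and idx at the start of each ring, 2-byte available entries, 8-byte used entries); the demo_* helpers are hypothetical and not part of libqos.

```c
#include <stdint.h>

/* Size of one used-ring element: 32-bit id plus 32-bit len. */
#define DEMO_USED_ELEM_SIZE 8u

/* Available ring: flags, idx, ring[size], then used_event. */
static uint64_t demo_avail_flags(uint64_t avail) { return avail; }
static uint64_t demo_avail_idx(uint64_t avail) { return avail + 2; }
static uint64_t demo_used_event(uint64_t avail, uint32_t size)
{
    return avail + 4 + (uint64_t)2 * size;
}

/* Used ring: flags, idx, ring[size], then avail_event. */
static uint64_t demo_used_flags(uint64_t used) { return used; }
static uint64_t demo_used_idx(uint64_t used) { return used + 2; }
static uint64_t demo_avail_event(uint64_t used, uint32_t size)
{
    return used + 4 + (uint64_t)DEMO_USED_ELEM_SIZE * size;
}
```

The flags word that tells the driver whether a notification is wanted is written by the device, so it sits at the start of the used ring; that is the location demo_used_flags() returns and the one the patch switches to.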

Re: [PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'

2024-03-20 Thread Peter Maydell

On Wed, 20 Mar 2024 at 16:40, Philippe Mathieu-Daudé  wrote:
>
> 'info tlb' and 'info mem' commands don't scale in heterogeneous
> emulation. They will be reworked after the next release, hidden
> behind the 'info mmu' command. It is not too late to deprecate
> commands, so add the 'info mmu' command as wrapper to the other
> ones, but already deprecate them.
>
> Philippe Mathieu-Daudé (2):
>   target/monitor: Introduce 'info mmu' command
>   target/monitor: Deprecate 'info tlb' and 'info mem' commands

This seems to replace "info tlb" and "info mem" with "info mmu -t"
and "info mmu -m", but it doesn't really say anything about:
 * what the difference is between these two things
 * which targets implement which and why
 * what the plan is for the future

I am definitely not a fan of either of these commands, because
(as we currently implement them) they effectively require each
target architecture to implement a second copy of the page table
walking code. But before we can deprecate them we need to be
pretty sure that "info mmu" is what we want to replace them with.

thanks
-- PMM

Re: [PATCH 3/4] hw/nmi: Remove @cpu_index argument from NMIClass::nmi_handler()

2024-03-20 Thread Philippe Mathieu-Daudé


On 20/3/24 14:23, Peter Maydell wrote:

On Tue, 20 Feb 2024 at 15:09, Philippe Mathieu-Daudé  wrote:


Only s390x was using the 'cpu_index' argument, but since the
previous commit it no longer does (it uses the first CPU).
Since this argument is now completely unused, remove it. Have
the callback return a boolean indicating failure.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/nmi.h   | 11 ++-
  hw/core/nmi.c  |  3 +--
  hw/hppa/machine.c  |  8 +---
  hw/i386/x86.c  |  7 ---
  hw/intc/m68k_irqc.c|  6 --
  hw/m68k/q800-glue.c|  6 --
  hw/misc/macio/gpio.c   |  6 --
  hw/ppc/pnv.c   |  6 --
  hw/ppc/spapr.c |  6 --
  hw/s390x/s390-virtio-ccw.c |  6 --
  10 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/include/hw/nmi.h b/include/hw/nmi.h
index fff41bebc6..c70db941c9 100644
--- a/include/hw/nmi.h
+++ b/include/hw/nmi.h
@@ -37,7 +37,16 @@ typedef struct NMIState NMIState;
  struct NMIClass {
  InterfaceClass parent_class;

-void (*nmi_monitor_handler)(NMIState *n, int cpu_index, Error **errp);
+/**
+ * nmi_handler: Callback to handle NMI notifications.
+ *
+ * @n: Class #NMIState state
+ * @errp: pointer to error object
+ *
+ * On success, return %true.
+ * On failure, store an error through @errp and return %false.
+ */
+bool (*nmi_handler)(NMIState *n, Error **errp);


Any particular reason to change the method name here?

Do we really need to indicate failure both through the bool return
and the Error** ?


No, but this is the style *recommended* by the Error API since
commit e3fe3988d7 ("error: Document Error API usage rules"):

error: Document Error API usage rules

This merely codifies existing practice, with one exception: the rule
advising against returning void, where existing practice is mixed.

When the Error API was created, we adopted the (unwritten) rule to
return void when the function returns no useful value on success,
unlike GError, which recommends to return true on success and false
on error then.

[...]

Make the rule advising against returning void official by putting it
in writing.  This will hopefully reduce confusion.

  * - Whenever practical, also return a value that indicates success /
  *   failure.  This can make the error checking more concise, and can
  *   avoid useless error object creation and destruction.  Note that
  *   we still have many functions returning void.  We recommend
  *   • bool-valued functions return true on success / false on failure,
  *   • pointer-valued functions return non-null / null pointer, and
  *   • integer-valued functions return non-negative / negative.
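
A minimal sketch of the bool-plus-errp convention quoted above, using a hypothetical DemoError type in place of QEMU's real Error object (which is richer and is normally filled in through helper functions). The point is only that the caller can test the return value directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal error object; not QEMU's Error type. */
typedef struct DemoError {
    const char *msg;
} DemoError;

static DemoError demo_no_cpu = { "no CPU available" };

/* Returns true on success; on failure stores an error and returns false. */
static bool demo_nmi_handler(int nr_cpus, DemoError **errp)
{
    if (nr_cpus <= 0) {
        if (errp) {
            *errp = &demo_no_cpu;
        }
        return false;
    }
    return true;
}

/* The caller checks the return value; no local error object is needed. */
static int demo_inject_nmi(int nr_cpus, DemoError **errp)
{
    if (!demo_nmi_handler(nr_cpus, errp)) {
        return -1;
    }
    return 0;
}
```

With a void-returning handler the caller would instead have to pass a local error pointer and inspect it afterwards, which is the extra object creation the quoted rule mentions.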

Anyway I'll respin removing @cpu_index as a single change :)

[PATCH v2 2/2] target/riscv/csr: Added the ability to delegate LCOFI to VS

2024-03-20 Thread Irina Ryapolova

From: Vadim Shakirov 

The AIA specification, in the paragraph "Virtual interrupts for VS level",
states for interrupts 13-63: if the bit in hideleg is set, then the
corresponding vsip and vsie bits are aliases of sip and sie.

Signed-off-by: Vadim Shakirov 
Reviewed-by: Alistair Francis 
---
 target/riscv/csr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0c21145eaf..51b1099e10 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1136,7 +1136,7 @@ static RISCVException write_stimecmph(CPURISCVState *env, 
int csrno,
 static const uint64_t delegable_ints =
 S_MODE_INTERRUPTS | VS_MODE_INTERRUPTS | MIP_LCOFIP;
 static const uint64_t vs_delegable_ints =
-(VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS) & ~MIP_LCOFIP;
+VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS;
 static const uint64_t all_ints = M_MODE_INTERRUPTS | S_MODE_INTERRUPTS |
  HS_MODE_INTERRUPTS | LOCAL_INTERRUPTS;
 #define DELEGABLE_EXCPS ((1ULL << (RISCV_EXCP_INST_ADDR_MIS)) | \
-- 
2.25.1

[PATCH v2 1/2] target/riscv/csr.c: Add functional of hvictl CSR

2024-03-20 Thread Irina Ryapolova

CSR hvictl (Hypervisor Virtual Interrupt Control) provides further flexibility
for injecting interrupts into VS level in situations not fully supported by the
facilities described thus far, but only with more active involvement of the
hypervisor.

A hypervisor must use hvictl for any of the following:
• asserting for VS level a major interrupt not supported by hvien and hvip;
• implementing configurability of priorities at VS level for major
  interrupts beyond those supported by hviprio1 and hviprio2; or
• emulating an external interrupt controller for a virtual hart without
  the use of an IMSIC’s guest interrupt file, while also supporting
  configurable priorities both for external interrupts and for major
  interrupts to the virtual hart.

All hvictl fields together can affect the value of CSR vstopi (Virtual
Supervisor Top Interrupt) and therefore the interrupt identity reported in
vscause when an interrupt traps to VS-mode. When hvictl.VTI = 1, the
absence of an interrupt for VS level can be indicated only by setting
hvictl.IID = 9. Software might want to use the pair IID = 9, IPRIO = 0
generally to represent no interrupt in hvictl.

(See riscv-interrupts-1.0: Interrupts at VS level)
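
The "no interrupt" encoding described above can be sketched as a small decoder. The field positions used here (VTI at bit 30, IID in bits 27:16, IPRIO in bits 7:0) are an assumption based on my reading of the AIA specification, and the DEMO_* names are hypothetical; the condition mirrors the one used in this patch rather than the full vstopi rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed hvictl layout; check against the AIA specification. */
#define DEMO_HVICTL_VTI    (1u << 30)
#define DEMO_HVICTL_IID    0x0fff0000u
#define DEMO_HVICTL_IPRIO  0x000000ffu
#define DEMO_IRQ_S_EXT     9u

static uint32_t demo_hvictl_iid(uint32_t v)
{
    return (v & DEMO_HVICTL_IID) >> 16;
}

static uint32_t demo_hvictl_iprio(uint32_t v)
{
    return v & DEMO_HVICTL_IPRIO;
}

/* True when VTI is set and the IID/IPRIO pair is not "IID = 9, IPRIO = 0". */
static bool demo_hvictl_has_interrupt(uint32_t v)
{
    if (!(v & DEMO_HVICTL_VTI)) {
        return false;
    }
    return !(demo_hvictl_iid(v) == DEMO_IRQ_S_EXT &&
             demo_hvictl_iprio(v) == 0);
}
```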

Signed-off-by: Irina Ryapolova 
---
Changes for v2:
  -added more information in commit message
---
 target/riscv/csr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..0c21145eaf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3585,6 +3585,21 @@ static int read_hvictl(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_hvictl(CPURISCVState *env, int csrno, target_ulong val)
 {
 env->hvictl = val & HVICTL_VALID_MASK;
+if (env->hvictl & HVICTL_VTI)
+{
+uint32_t hviid = get_field(env->hvictl, HVICTL_IID);
+uint32_t hviprio = get_field(env->hvictl, HVICTL_IPRIO);
+/* the pair IID = 9, IPRIO = 0 generally to represent no interrupt in 
hvictl. */
+if (!(hviid == IRQ_S_EXT && hviprio == 0)) {
+uint64_t new_val = BIT(hviid) ;
+ if (new_val & S_MODE_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val << 1, new_val << 1);
+} else if (new_val & LOCAL_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val, new_val);
+}
+}
+}
+
 return RISCV_EXCP_NONE;
 }
 
-- 
2.25.1

[PATCH-for-9.0 2/2] target/monitor: Deprecate 'info tlb' and 'info mem' commands

2024-03-20 Thread Philippe Mathieu-Daudé

'info tlb' has been replaced by 'info mmu -t', and
'info mem' by 'info mmu -m'.

Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/about/deprecated.rst| 10 ++
 include/monitor/hmp-target.h |  2 ++
 monitor/hmp-cmds-target.c| 20 
 target/i386/monitor.c|  4 ++--
 target/m68k/monitor.c|  2 +-
 target/nios2/monitor.c   |  2 +-
 target/ppc/ppc-qmp-cmds.c|  2 +-
 target/riscv/monitor.c   |  2 +-
 target/sh4/monitor.c |  2 +-
 target/sparc/monitor.c   |  2 +-
 target/xtensa/monitor.c  |  2 +-
 hmp-commands-info.hx |  8 
 12 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7b548519b5..4f5f4becbe 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -158,6 +158,16 @@ points was removed in 7.0. However QMP still exposed the 
vcpu
 parameter. This argument has now been deprecated and the remaining
 remaining trace points that used it are selected just by name.
 
+Human Monitor Protocol (HMP) commands
+-
+
+``info tlb`` and ``info mem`` (since 9.0)
+'
+
+The ``info tlb`` and ``info mem`` commands have been replaced by
+the ``info mmu`` command, which has the same behaviour but a less
+misleading name.
+
 Host Architectures
 --
 
diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
index 2af84b1915..057f7c6841 100644
--- a/include/monitor/hmp-target.h
+++ b/include/monitor/hmp-target.h
@@ -46,7 +46,9 @@ CPUState *mon_get_cpu(Monitor *mon);
 
 void hmp_info_mmu(Monitor *mon, const QDict *qdict);
 void hmp_info_mem(Monitor *mon, const QDict *qdict);
+void hmp_info_mem_deprecated(Monitor *mon, const QDict *qdict);
 void hmp_info_tlb(Monitor *mon, const QDict *qdict);
+void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict);
 void hmp_mce(Monitor *mon, const QDict *qdict);
 void hmp_info_local_apic(Monitor *mon, const QDict *qdict);
 void hmp_info_sev(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
index 71bce4870a..086b58b8d6 100644
--- a/monitor/hmp-cmds-target.c
+++ b/monitor/hmp-cmds-target.c
@@ -382,19 +382,31 @@ void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
 #endif
 
 __attribute__((weak))
-void hmp_info_mem(Monitor *mon, const QDict *qdict)
+void hmp_info_mem_deprecated(Monitor *mon, const QDict *qdict)
 {
 monitor_puts(mon,
  "No per-CPU mapping information available on this target\n");
 }
 
 __attribute__((weak))
-void hmp_info_tlb(Monitor *mon, const QDict *qdict)
+void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict)
 {
 monitor_puts(mon,
  "No per-CPU TLB information available on this target\n");
 }
 
+void hmp_info_mem(Monitor *mon, const QDict *qdict)
+{
+monitor_puts(mon, "This command is deprecated, please use 'info mmu 
-m'\n");
+hmp_info_mem_deprecated(mon, qdict);
+}
+
+void hmp_info_tlb(Monitor *mon, const QDict *qdict)
+{
+monitor_puts(mon, "This command is deprecated, please use 'info mmu 
-t'\n");
+hmp_info_tlb_deprecated(mon, qdict);
+}
+
 void hmp_info_mmu(Monitor *mon, const QDict *qdict)
 {
 bool tlb = qdict_get_try_bool(qdict, "tlb", false);
@@ -410,9 +422,9 @@ void hmp_info_mmu(Monitor *mon, const QDict *qdict)
 }
 
 if (mem) {
-hmp_info_mem(mon, qdict);
+hmp_info_mem_deprecated(mon, qdict);
 }
 if (tlb) {
-hmp_info_tlb(mon, qdict);
+hmp_info_tlb_deprecated(mon, qdict);
 }
 }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 3a281dab02..5da77b6b22 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -217,7 +217,7 @@ static void tlb_info_la57(Monitor *mon, CPUArchState *env)
 }
 #endif /* TARGET_X86_64 */
 
-void hmp_info_tlb(Monitor *mon, const QDict *qdict)
+void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict)
 {
 CPUArchState *env;
 
@@ -545,7 +545,7 @@ static void mem_info_la57(Monitor *mon, CPUArchState *env)
 }
 #endif /* TARGET_X86_64 */
 
-void hmp_info_mem(Monitor *mon, const QDict *qdict)
+void hmp_info_mem_deprecated(Monitor *mon, const QDict *qdict)
 {
 CPUArchState *env;
 
diff --git a/target/m68k/monitor.c b/target/m68k/monitor.c
index 2bdf6acae0..ea303805c4 100644
--- a/target/m68k/monitor.c
+++ b/target/m68k/monitor.c
@@ -10,7 +10,7 @@
 #include "monitor/hmp-target.h"
 #include "monitor/monitor.h"
 
-void hmp_info_tlb(Monitor *mon, const QDict *qdict)
+void hmp_info_tlb_deprecated(Monitor *mon, const QDict *qdict)
 {
 CPUArchState *env1 = mon_get_cpu_env(mon);
 
diff --git a/target/nios2/monitor.c b/target/nios2/monitor.c
index 0152dec3fa..2e4efee1aa 100644
--- a/target/nios2/monitor.c
+++ b/target/nios2/monitor.c
@@ -27,7 +27,7 @@
 #include "monitor/hmp-target.h"
 #include "monitor/hmp.h"
 
-void hmp_info_tlb(Monitor *mon, const QDict *qdict)
+void

[PATCH-for-9.0 1/2] target/monitor: Introduce 'info mmu' command

2024-03-20 Thread Philippe Mathieu-Daudé

Introduce the 'info mmu' command. For now it only
forwards to the 'info tlb' and 'info mem' commands,
which will be deprecated.
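
The flag handling of the new command can be sketched in isolation: with neither -t nor -m given, both views are shown. The helper and constants below are hypothetical and only model the selection logic, not the monitor plumbing.

```c
#include <stdbool.h>

#define DEMO_SHOW_MEM 1u
#define DEMO_SHOW_TLB 2u

/* With neither flag given, select both, as hmp_info_mmu() does in this patch. */
static unsigned demo_mmu_selection(bool tlb, bool mem)
{
    if (!tlb && !mem) {
        tlb = mem = true;
    }
    return (mem ? DEMO_SHOW_MEM : 0u) | (tlb ? DEMO_SHOW_TLB : 0u);
}
```

So a bare 'info mmu' behaves like 'info mmu -t -m', while each flag alone selects a single view.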

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/monitor/hmp-target.h |  1 +
 monitor/hmp-cmds-target.c| 37 
 hmp-commands-info.hx | 14 ++
 3 files changed, 52 insertions(+)

diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
index d78e979f05..2af84b1915 100644
--- a/include/monitor/hmp-target.h
+++ b/include/monitor/hmp-target.h
@@ -44,6 +44,7 @@ int target_get_monitor_def(CPUState *cs, const char *name, 
uint64_t *pval);
 CPUArchState *mon_get_cpu_env(Monitor *mon);
 CPUState *mon_get_cpu(Monitor *mon);
 
+void hmp_info_mmu(Monitor *mon, const QDict *qdict);
 void hmp_info_mem(Monitor *mon, const QDict *qdict);
 void hmp_info_tlb(Monitor *mon, const QDict *qdict);
 void hmp_mce(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
index 9338ae8440..71bce4870a 100644
--- a/monitor/hmp-cmds-target.c
+++ b/monitor/hmp-cmds-target.c
@@ -31,6 +31,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "sysemu/hw_accel.h"
+#include "sysemu/tcg.h"
 
 /* Set the current CPU defined by the user. Callers must hold BQL. */
 int monitor_set_cpu(Monitor *mon, int cpu_index)
@@ -379,3 +380,39 @@ void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
 memory_region_unref(mr);
 }
 #endif
+
+__attribute__((weak))
+void hmp_info_mem(Monitor *mon, const QDict *qdict)
+{
+monitor_puts(mon,
+ "No per-CPU mapping information available on this target\n");
+}
+
+__attribute__((weak))
+void hmp_info_tlb(Monitor *mon, const QDict *qdict)
+{
+monitor_puts(mon,
+ "No per-CPU TLB information available on this target\n");
+}
+
+void hmp_info_mmu(Monitor *mon, const QDict *qdict)
+{
+bool tlb = qdict_get_try_bool(qdict, "tlb", false);
+bool mem = qdict_get_try_bool(qdict, "mem", false);
+
+if (!tcg_enabled()) {
+monitor_puts(mon, "This command is specific to TCG accelerator\n");
+return;
+}
+
+if (!tlb && !mem) {
+tlb = mem = true;
+}
+
+if (mem) {
+hmp_info_mem(mon, qdict);
+}
+if (tlb) {
+hmp_info_tlb(mon, qdict);
+}
+}
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index ad1b1306e3..e31f2467fb 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -208,6 +208,20 @@ SRST
 Show PCI information.
 ERST
 
+{
+.name   = "mmu",
+.args_type  = "tlb:-t,mem:-m",
+.params = "[-t][-m]",
+.help   = "show virtual to physical memory "
+  "(-t: TLB; -m: active mapping)",
+.cmd= hmp_info_mmu,
+},
+
+SRST
+  ``info mmu``
+Show virtual to physical memory mappings.
+ERST
+
 #if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \
 defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K)
 {
-- 
2.41.0

[PATCH-for-9.0 0/2] target/monitor: Deprecate 'info tlb/mem' in favor of 'info mmu'

2024-03-20 Thread Philippe Mathieu-Daudé

'info tlb' and 'info mem' commands don't scale in heterogeneous
emulation. They will be reworked after the next release, hidden
behind the 'info mmu' command. It is not too late to deprecate
commands, so add the 'info mmu' command as wrapper to the other
ones, but already deprecate them.

Philippe Mathieu-Daudé (2):
  target/monitor: Introduce 'info mmu' command
  target/monitor: Deprecate 'info tlb' and 'info mem' commands

 docs/about/deprecated.rst| 10 
 include/monitor/hmp-target.h |  3 +++
 monitor/hmp-cmds-target.c| 49 
 target/i386/monitor.c|  4 +--
 target/m68k/monitor.c|  2 +-
 target/nios2/monitor.c   |  2 +-
 target/ppc/ppc-qmp-cmds.c|  2 +-
 target/riscv/monitor.c   |  2 +-
 target/sh4/monitor.c |  2 +-
 target/sparc/monitor.c   |  2 +-
 target/xtensa/monitor.c  |  2 +-
 hmp-commands-info.hx | 22 +---
 12 files changed, 89 insertions(+), 13 deletions(-)

-- 
2.41.0

Re: [PATCH v3 06/49] RAMBlock: Add support of KVM private guest memfd

2024-03-20 Thread Paolo Bonzini


On 3/20/24 09:39, Michael Roth wrote:

@@ -1842,6 +1842,17 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
  }
  }
  
+if (kvm_enabled() && (new_block->flags & RAM_GUEST_MEMFD)) {

+assert(new_block->guest_memfd < 0);
+
+new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
+0, errp);
+if (new_block->guest_memfd < 0) {
+qemu_mutex_unlock_ramlist();
+return;
+}
+}
+


This potentially leaks new_block->host.  This can be squashed into the patch:

diff --git a/system/physmem.c b/system/physmem.c
index 3a4a3f10d5a..0836aff190e 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1810,6 +1810,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
 const bool shared = qemu_ram_is_shared(new_block);
 RAMBlock *block;
 RAMBlock *last_block = NULL;
+bool free_on_error = false;
 ram_addr_t old_ram_size, new_ram_size;
 Error *err = NULL;
 
@@ -1839,6 +1841,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)

 return;
 }
 memory_try_enable_merging(new_block->host, new_block->max_length);
+free_on_error = true;
 }
 }
 
@@ -1849,7 +1852,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)

 0, errp);
 if (new_block->guest_memfd < 0) {
 qemu_mutex_unlock_ramlist();
-return;
+goto out_free;
 }
 }
 
@@ -1901,6 +1904,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)

 ram_block_notify_add(new_block->host, new_block->used_length,
  new_block->max_length);
 }
+return;
+
+out_free:
+if (free_on_error) {
+qemu_anon_ram_free(new_block->host, new_block->max_length);
+new_block->host = NULL;
+}
 }
 
 #ifdef CONFIG_POSIX

Re: Intention to work on GSoC project

2024-03-20 Thread Eugenio Perez Martin

On Mon, Mar 18, 2024 at 8:47 PM Sahil  wrote:
>
> Hi,
>
> I was reading the "Virtqueues and virtio ring: How the data travels"
> article [1]. There are a few things that I have not understood in the
> "avail rings" section.
>
> Q1.
> Step 2 in the "Process to make a buffer available" diagram depicts
> how the virtio driver writes the descriptor index in the avail ring.
> In the example, the descriptor index #0 is written in the first entry.
> But in figure 2, the number 0 is in the 4th position in the avail ring.
> Is the avail ring queue an array of "struct virtq_avail" which maintains
> metadata such as the number of descriptor indexes in the header?
>

struct virtq_avail has two members: uint16_t idx and ring[]. To be in
the first position of the avail ring means to be in ring[0] there.

Idx and ring[] are just headers in the figure, not actual positions.
Same as Avail. Now that you mention it, maybe there is a better way to
represent that, yes.

Let me know if I didn't explain it well.

> Also, in the second position, the number changes from 0 (figure 1) to
> 1 (figure 2). I haven't understood what idx, 0 (later 1) and ring[] represent
> in the figures. Does this number represent the number of descriptors
> that are currently in the avail ring?
>

It is the position in ring[] where the device needs to stop looking
for descriptors. It starts at 0, and when the device sees 1 it means
ring[0] has a descriptor to process.

Now you need to apply a "modulo virtqueue size" to that index. So if
the virtqueue size is 256, avail_idx 257 means the last valid descriptor
is at ring[0]. This happens naturally when the driver keeps adding
descriptors and wraps the queue.

The authoritative source of this is the VirtQueues section of the
virtio standard [1], feel free to check it in case it clarifies
something better.
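
The relationship between idx and ring[] described above can be sketched
in C. This is only an illustration, assuming the split virtqueue layout
from the virtio specification (which also places a flags field before
idx). The demo_ names and the fixed queue size are hypothetical and are
not taken from Linux or QEMU.

```c
#include <stdint.h>

#define DEMO_QUEUE_SIZE 256u   /* hypothetical queue size; a power of two */

/* Layout modelled on the split-virtqueue available ring. */
struct demo_virtq_avail {
    uint16_t flags;
    uint16_t idx;                    /* free-running; next slot the driver fills */
    uint16_t ring[DEMO_QUEUE_SIZE];  /* heads of descriptor chains */
};

/* Driver side: publish descriptor head `desc`, then advance idx. */
static void demo_avail_publish(struct demo_virtq_avail *avail, uint16_t desc)
{
    avail->ring[avail->idx % DEMO_QUEUE_SIZE] = desc;
    /* A real driver needs a write memory barrier before updating idx. */
    avail->idx++;
}

/* Device side: how many entries were published since `last_seen`. */
static uint16_t demo_avail_pending(const struct demo_virtq_avail *avail,
                                   uint16_t last_seen)
{
    return (uint16_t)(avail->idx - last_seen);
}

/* Slot in ring[] that a free-running index refers to. */
static uint16_t demo_avail_slot(uint16_t free_running_idx)
{
    return (uint16_t)(free_running_idx % DEMO_QUEUE_SIZE);
}
```

After one call to demo_avail_publish() on a zeroed ring, idx is 1 and
ring[0] holds the descriptor head, which matches the "when the device
sees 1 it means ring[0] has a descriptor" description.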

> Q2.
>
> There's this paragraph in the article right below the above mentioned
> diagram:
>
> > The avail ring must be able to hold the same number of descriptors
> > as the descriptor area, and the descriptor area must have a size power
> > of two, so idx wraps naturally at some point. For example, if the ring
> > size is 256 entries, idx 1 references the same descriptor as idx 257, 513...
> > And it will wrap at a 16 bit boundary. This way, neither side needs to
> > worry about processing an invalid idx: They are all valid.
>
> I haven't really understood this. I have understood that idx is calculated
> as idx mod queue_length. But I haven't understood the "16 bit boundary"
> part.
>

avail_idx is a uint16_t, so it wraps from 65535 back to 0:
(uint16_t)(65535 + 1) == 0.

> I am also not very clear on how a queue length that is not a power of 2
> might cause trouble. Could you please expand on this?
>

That's a limitation in the standard, but I'm not sure where it comes
from beyond being computationally easier to calculate ring position
with a mask than with a remainder of a random non-power-of-two number.
Packed virtqueue removes that limitation.
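
As a small sketch of the mask-versus-remainder point, the two helpers
below compute a ring slot from a free-running 16-bit index. The names
are hypothetical. The example is plain arithmetic and makes no claim
about the rationale in the standard.

```c
#include <stdint.h>

/* Ring slot via mask; only correct when `size` is a power of two. */
static uint16_t demo_slot_mask(uint16_t idx, uint16_t size)
{
    return (uint16_t)(idx & (uint16_t)(size - 1));
}

/* Ring slot via remainder; works for any non-zero `size`. */
static uint16_t demo_slot_mod(uint16_t idx, uint16_t size)
{
    return (uint16_t)(idx % size);
}
```

With size 256, index 65535 maps to slot 255 and the wrapped index 0 maps
to slot 0, so the slot sequence stays continuous across the 16-bit wrap.
With a size such as 100, index 65535 maps to slot 35 while the wrapped
index 0 maps to slot 0, so the sequence would jump. That is one
arithmetic consequence of 65536 being a multiple of every power of two
up to 65536.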

> Q3.
> I have started going through the source code in 
> "drivers/virtio/virtio_ring.c".
> I have understood that the virtio driver runs in the guest's kernel. Does that
> mean the drivers in "drivers/virtio/*" are enabled when linux is being run in
> a guest VM?
>

For PCI devices, as long as the kernel detects a device with vendor ==
Red Hat, Inc. (0x1AF4) and device ID 0x1000 through 0x107F inclusive, yes.
You can also load and unload them manually with modprobe, like other
drivers.
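
The ID check above amounts to a range test. A minimal sketch, using only
the vendor and device ID values quoted in this mail; the macro and
function names are hypothetical and this is not the kernel's actual
matching code:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VIRTIO_PCI_VENDOR     0x1AF4u  /* Red Hat, Inc. */
#define DEMO_VIRTIO_PCI_DEVICE_MIN 0x1000u
#define DEMO_VIRTIO_PCI_DEVICE_MAX 0x107Fu

/* True when the vendor/device pair falls in the virtio PCI ID range. */
static bool demo_is_virtio_pci(uint16_t vendor, uint16_t device)
{
    return vendor == DEMO_VIRTIO_PCI_VENDOR &&
           device >= DEMO_VIRTIO_PCI_DEVICE_MIN &&
           device <= DEMO_VIRTIO_PCI_DEVICE_MAX;
}
```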

Let me know if you have more doubts. Thanks!

[1] https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html

> Thanks,
> Sahil
>
> [1] https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
>
>
>
>

RE: [PATCH v5 7/7] tests/migration-test: add qpl compression test

2024-03-20 Thread Liu, Yuan1

> -Original Message-
> From: Daniel P. Berrangé 
> Sent: Wednesday, March 20, 2024 11:40 PM
> To: Liu, Yuan1 
> Cc: pet...@redhat.com; faro...@suse.de; qemu-devel@nongnu.org;
> hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai
> 
> Subject: Re: [PATCH v5 7/7] tests/migration-test: add qpl compression test
> 
> On Wed, Mar 20, 2024 at 03:30:40PM +, Liu, Yuan1 wrote:
> > > -Original Message-
> > > From: Daniel P. Berrangé 
> > > Sent: Wednesday, March 20, 2024 6:46 PM
> > > To: Liu, Yuan1 
> > > Cc: pet...@redhat.com; faro...@suse.de; qemu-devel@nongnu.org;
> > > hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai
> > > 
> > > Subject: Re: [PATCH v5 7/7] tests/migration-test: add qpl compression
> test
> > >
> > > On Wed, Mar 20, 2024 at 12:45:27AM +0800, Yuan Liu wrote:
> > > > add qpl to compression method test for multifd migration
> > > >
> > > > the migration with qpl compression needs to access IAA hardware
> > > > resource, please run "check-qtest" with sudo or root permission,
> > > > otherwise migration test will fail
> > >
> > > That's not an acceptable requirement.
> > >
> > > If someone builds QEMU with QPL, the migration test *must*
> > > pass 100% reliably when either running on a host without
> > > the QPL required hardware, or when lacking permissions.
> > >
> > > The test case needs to detect these scenarios and automatically
> > > skip the test if it is incapable of running successfully.
> > > This raises another question though. If QPL migration requires
> > > running as root, then it is effectively unusable for QEMU, as
> > > no sane deployment ever runs QEMU as root.
> > >
> > > Is there a way to make QPL work for non-root users ?
> >
> > There are two issues here
> > 1. I need to add an IAA resource detection before the QPL test begins
> >In this way, when QPL resources are unavailable, the live migration
> >test will not be affected.
> >
> > 2. I need to add some additional information about IAA configuration in
> >the devel/qpl-compression.rst documentation. In addition to
> configuring
> >IAA resources, the system administrator also needs to assign IAA
> resources
> >to user groups.
> >For example, the system administrator runs "chown -R user /dev/iax",
> then
> >all IAA resources can be accessed by "user", this method does not
> require
> >sudo or root permissions
> 
> Ok, so in the test suite you likely should do something
> approximately like
> 
> #ifdef CONFIG_QPL
>   if (access("/dev/iax", R_OK|W_OK) == 0) {
> migration_test_add("/migration/multifd/tcp/plain/qpl",
>test_multifd_tcp_qpl);
>   }
> #endif
> 
> possibly more if you need to actually query supported features
> of /dev/iax before trying to use it

Yes, thank you very much for the suggestion. I will fix this in the next version.
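
The access() check suggested above can be factored into a small helper
so that test registration stays readable. This is a sketch only: the
helper name is hypothetical, and the "/dev/iax" path is the one
mentioned in this thread, not a verified device node layout.

```c
#include <stdbool.h>
#include <unistd.h>

/* True when `path` exists and the calling user may read and write it. */
static bool demo_device_usable(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}
```

A caller would register the QPL test only when
demo_device_usable("/dev/iax") returns true, so hosts without the
hardware or without permissions never run it.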

> > > > Signed-off-by: Yuan Liu 
> > > > Reviewed-by: Nanhai Zou 
> > > > ---
> > > >  tests/qtest/migration-test.c | 24 
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-
> test.c
> > > > index 71895abb7f..052d0d60fd 100644
> > > > --- a/tests/qtest/migration-test.c
> > > > +++ b/tests/qtest/migration-test.c
> > > > @@ -2815,6 +2815,15 @@
> > > test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
> > > >  }
> > > >  #endif /* CONFIG_ZSTD */
> > > >
> > > > +#ifdef CONFIG_QPL
> > > > +static void *
> > > > +test_migrate_precopy_tcp_multifd_qpl_start(QTestState *from,
> > > > +QTestState *to)
> > > > +{
> > > > +return test_migrate_precopy_tcp_multifd_start_common(from, to,
> > > "qpl");
> > > > +}
> > > > +#endif /* CONFIG_QPL */
> > > > +
> > > >  static void test_multifd_tcp_none(void)
> > > >  {
> > > >  MigrateCommon args = {
> > > > @@ -2880,6 +2889,17 @@ static void test_multifd_tcp_zstd(void)
> > > >  }
> > > >  #endif
> > > >
> > > > +#ifdef CONFIG_QPL
> > > > +static void test_multifd_tcp_qpl(void)
> > > > +{
> > > > +MigrateCommon args = {
> > > > +.listen_uri = "defer",
> > > > +.start_hook = test_migrate_precopy_tcp_multifd_qpl_start,
> > > > +};
> > > > +test_precopy_common(&args);
> > > > +}
> > > > +#endif
> > > > +
> > > >  #ifdef CONFIG_GNUTLS
> > > >  static void *
> > > >  test_migrate_multifd_tcp_tls_psk_start_match(QTestState *from,
> > > > @@ -3789,6 +3809,10 @@ int main(int argc, char **argv)
> > > >  migration_test_add("/migration/multifd/tcp/plain/zstd",
> > > > test_multifd_tcp_zstd);
> > > >  #endif
> > > > +#ifdef CONFIG_QPL
> > > > +migration_test_add("/migration/multifd/tcp/plain/qpl",
> > > > +   test_multifd_tcp_qpl);
> > > > +#endif
> > > >  #ifdef CONFIG_GNUTLS
> > > >  migration_test_add("/migration/multifd/tcp/tls/psk/match",
> > > > test_multifd_tcp_tls_psk_match);
> > > > --
> > > > 2.39.3
> > > >
> >

Re: [PATCH v3 19/49] kvm: Make kvm_convert_memory() obey ram_block_discard_is_enabled()

2024-03-20 Thread Paolo Bonzini


On 3/20/24 09:39, Michael Roth wrote:

Some subsystems like VFIO might disable ram block discard for
uncoordinated cases. kvm_convert_memory()/guest_memfd don't
implement a RamDiscardManager handler to convey discard operations to
various listeners like VFIO. Because of this, sequences like the
following can go wrong due to stale IOMMU mappings:


Alternatively, should guest-memfd memory regions call 
ram_block_discard_require(true)?  This will prevent VFIO from operating, 
but it will avoid consuming twice the memory.


If desirable, guest-memfd support can be changed to implement an 
extension of RamDiscardManager that notifies about private/shared memory 
changes, and then guest-memfd would be able to support coordinated 
discard.  But I wonder if that's doable at all - how common are 
shared<->private flips, and is it feasible to change the IOMMU page 
tables every time?


If the real solution is SEV-TIO (which means essentially guest_memfd 
support for VFIO), calling ram_block_discard_require(true) may be the 
simplest stopgap solution.


Paolo


   - convert page shared->private
   - discard shared page
   - convert page private->shared
   - new page is allocated
   - issue DMA operations against that shared page

Address this by taking ram_block_discard_is_enabled() into account when
deciding whether or not to discard pages.

Signed-off-by: Michael Roth 
---
  accel/kvm/kvm-all.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 53ce4f091e..6ae03c880f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2962,10 +2962,14 @@ static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
  */
  return 0;
  } else {
-ret = ram_block_discard_range(rb, offset, size);
+ret = ram_block_discard_is_disabled()
+  ? ram_block_discard_range(rb, offset, size)
+  : 0;
  }
  } else {
-ret = ram_block_discard_guest_memfd_range(rb, offset, size);
+ret = ram_block_discard_is_disabled()
+  ? ram_block_discard_guest_memfd_range(rb, offset, size)
+  : 0;
  }
  } else {
  error_report("Convert non guest_memfd backed memory region "

RE: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression

2024-03-20 Thread Liu, Yuan1

> -Original Message-
> From: Peter Xu 
> Sent: Wednesday, March 20, 2024 11:35 PM
> To: Liu, Yuan1 
> Cc: Daniel P. Berrangé ; faro...@suse.de; qemu-
> de...@nongnu.org; hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou,
> Nanhai 
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
> 
> On Wed, Mar 20, 2024 at 03:02:59PM +, Liu, Yuan1 wrote:
> > > > +static int alloc_zbuf(QplData *qpl, uint8_t chan_id, Error **errp)
> > > > +{
> > > > +int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
> > > > +uint32_t size = qpl->job_num * qpl->data_size;
> > > > +uint8_t *buf;
> > > > +
> > > > +buf = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE,
> flags, -
> > > 1, 0);
> > > > +if (buf == MAP_FAILED) {
> > > > +error_setg(errp, "multifd: %u: alloc_zbuf failed, job
> num %u,
> > > size %u",
> > > > +   chan_id, qpl->job_num, qpl->data_size);
> > > > +return -1;
> > > > +}
> > >
> > > What's the reason for using mmap here, rather than a normal
> > > malloc ?
> >
> > I want to populate the memory accessed by the IAA device in the
> initialization
> > phase, and then avoid initiating I/O page faults through the IAA device
> during
> > migration, a large number of I/O page faults are not good for
> performance.
> 
> mmap() doesn't populate pages, unless with MAP_POPULATE.  And even with
> that it shouldn't be guaranteed, as the populate phase should ignore all
> errors.
> 
>MAP_POPULATE (since Linux 2.5.46)
>   Populate (prefault) page tables for a mapping.  For a file
> map‐
>   ping, this causes read-ahead on the file.  This will help to
> re‐
>   duce  blocking  on  page  faults later.  The mmap() call
> doesn't
>   fail if the mapping cannot be populated  (for  example,  due
> to
>   limitations  on  the  number  of  mapped  huge  pages when
> using
>   MAP_HUGETLB).  Support for MAP_POPULATE in conjunction with
> pri‐
>   vate mappings was added in Linux 2.6.23.
> 
> OTOH, I think g_malloc0() should guarantee to prefault everything in as
> long as the call returned (even though they can be swapped out later, but
> that applies to all cases anyway).

Thanks, Peter. I will try the g_malloc0 method here
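
For illustration, prefaulting can also be made explicit instead of
relying on allocator or mmap() behaviour: allocate the buffer, then
write one byte to every page. The sketch below uses plain malloc() so
that it is self-contained. The function name is hypothetical, and it
does not stop pages from being swapped out later.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate `size` bytes and touch every page so that each one is
 * faulted in before the buffer is handed to a device.
 * Returns NULL on failure; release the buffer with free().
 */
static uint8_t *demo_alloc_prefaulted(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile uint8_t *touch;
    uint8_t *buf;
    size_t off;

    if (size == 0 || page <= 0) {
        return NULL;
    }
    buf = malloc(size);
    if (!buf) {
        return NULL;
    }
    /* volatile keeps the compiler from dropping the per-page writes. */
    touch = buf;
    for (off = 0; off < size; off += (size_t)page) {
        touch[off] = 0;
    }
    touch[size - 1] = 0;   /* the last page may not be hit by the loop */
    return buf;
}
```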

> > This problem also occurs at the destination, therefore, I recommend that
> > customers need to add -mem-prealloc for destination boot parameters.
> 
> I'm not sure what issue you hit when testing it, but -mem-prealloc flag
> should only control the guest memory backends not the buffers that QEMU
> internally use, afaiu.
> 
> Thanks,
> 
> --
> Peter Xu

Let me explain. During IAA decompression, the hardware can write the
decompressed data directly to the virtual address of the guest memory,
which avoids a CPU copy of the decompressed data into guest memory.

Without -mem-prealloc, none of the guest memory is populated, so the IAA
hardware has to trigger an I/O page fault before it can write the
decompressed data to the guest memory region. In addition, CPU page
faults trigger IOTLB flushes when IAA devices use SVM.

Because a large number of I/O page faults and IOTLB flushes cannot be
resolved quickly, the decompression throughput of the IAA device drops
significantly.

[PATCH v4 3/7] qga/commands-posix: qmp_guest_shutdown: use ga_run_command helper

2024-03-20 Thread Andrey Drobyshev

Also remove the G_GNUC_UNUSED attribute added in the previous commit from
the helper.

Signed-off-by: Andrey Drobyshev 
Reviewed-by: Daniel P. Berrangé 
---
 qga/commands-posix.c | 39 ++-
 1 file changed, 6 insertions(+), 33 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 9b1bdf194c..cb9eed9a0b 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -108,7 +108,6 @@ static ssize_t ga_pipe_read_str(int fd[2], char **str)
  * sending string to stdin and taking error message from
  * stdout/err.
  */
-G_GNUC_UNUSED
 static int ga_run_command(const char *argv[], const char *in_str,
   const char *action, Error **errp)
 {
@@ -230,8 +229,6 @@ void qmp_guest_shutdown(const char *mode, Error **errp)
 {
 const char *shutdown_flag;
 Error *local_err = NULL;
-pid_t pid;
-int status;
 
 #ifdef CONFIG_SOLARIS
 const char *powerdown_flag = "-i5";
@@ -260,46 +257,22 @@ void qmp_guest_shutdown(const char *mode, Error **errp)
 return;
 }
 
-pid = fork();
-if (pid == 0) {
-/* child, start the shutdown */
-setsid();
-reopen_fd_to_null(0);
-reopen_fd_to_null(1);
-reopen_fd_to_null(2);
-
+const char *argv[] = {"/sbin/shutdown",
 #ifdef CONFIG_SOLARIS
-execl("/sbin/shutdown", "shutdown", shutdown_flag, "-g0", "-y",
-  "hypervisor initiated shutdown", (char *)NULL);
+  shutdown_flag, "-g0", "-y",
 #elif defined(CONFIG_BSD)
-execl("/sbin/shutdown", "shutdown", shutdown_flag, "+0",
-   "hypervisor initiated shutdown", (char *)NULL);
+  shutdown_flag, "+0",
 #else
-execl("/sbin/shutdown", "shutdown", "-h", shutdown_flag, "+0",
-   "hypervisor initiated shutdown", (char *)NULL);
+  "-h", shutdown_flag, "+0",
 #endif
-_exit(EXIT_FAILURE);
-} else if (pid < 0) {
-error_setg_errno(errp, errno, "failed to create child process");
-return;
-}
+  "hypervisor initiated shutdown", (char *) NULL};
 
-ga_wait_child(pid, &status, &local_err);
+ga_run_command(argv, NULL, "shutdown", &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
 
-if (!WIFEXITED(status)) {
-error_setg(errp, "child process has terminated abnormally");
-return;
-}
-
-if (WEXITSTATUS(status)) {
-error_setg(errp, "child process has failed to shutdown");
-return;
-}
-
 /* succeeded */
 }
 
-- 
2.39.3

[PATCH v4 1/7] qga: guest-get-fsinfo: add optional 'total-bytes-privileged' field

2024-03-20 Thread Andrey Drobyshev

Since the commit 25b5ff1a86 ("qga: add mountpoint usage info to
GuestFilesystemInfo") we have 2 values reported in guest-get-fsinfo:
used = (f_blocks - f_bfree), total = (f_blocks - f_bfree + f_bavail) as
returned by statvfs(3).  While on Windows guests that's all we can get
with GetDiskFreeSpaceExA(), on POSIX guests we might also be interested in
total file system size, as it's visible for root user.  Let's add an
optional field 'total-bytes-privileged' to GuestFilesystemInfo struct,
which'd only be reported on POSIX and represent f_blocks value as returned
by statvfs(3).

While here, also tweak the docs to reflect better where those values
come from.

Signed-off-by: Andrey Drobyshev 
---
 qga/commands-posix.c | 2 ++
 qga/commands-win32.c | 1 +
 qga/qapi-schema.json | 7 +--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 26008db497..7df2d72e9f 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1569,8 +1569,10 @@ static GuestFilesystemInfo *build_guest_fsinfo(struct FsMount *mount,
 nonroot_total = used + buf.f_bavail;
 fs->used_bytes = used * fr_size;
 fs->total_bytes = nonroot_total * fr_size;
+fs->total_bytes_privileged = buf.f_blocks * fr_size;
 
 fs->has_total_bytes = true;
+fs->has_total_bytes_privileged = true;
 fs->has_used_bytes = true;
 }
 
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 6242737b00..6fee0e1e6f 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -1143,6 +1143,7 @@ static GuestFilesystemInfo *build_guest_fsinfo(char *guid, Error **errp)
 fs = g_malloc(sizeof(*fs));
 fs->name = g_strdup(guid);
 fs->has_total_bytes = false;
+fs->has_total_bytes_privileged = false;
 fs->has_used_bytes = false;
 if (len == 0) {
 fs->mountpoint = g_strdup("System Reserved");
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 9554b566a7..dcc469b268 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1026,7 +1026,10 @@
 #
 # @used-bytes: file system used bytes (since 3.0)
 #
-# @total-bytes: non-root file system total bytes (since 3.0)
+# @total-bytes: filesystem capacity in bytes for unprivileged users (since 3.0)
+#
+# @total-bytes-privileged: filesystem capacity in bytes for privileged users
+# (since 9.0)
 #
 # @disk: an array of disk hardware information that the volume lies
 # on, which may be empty if the disk type is not supported
@@ -1036,7 +1039,7 @@
 { 'struct': 'GuestFilesystemInfo',
   'data': {'name': 'str', 'mountpoint': 'str', 'type': 'str',
'*used-bytes': 'uint64', '*total-bytes': 'uint64',
-   'disk': ['GuestDiskAddress']} }
+   '*total-bytes-privileged': 'uint64', 'disk': ['GuestDiskAddress']} }
 
 ##
 # @guest-get-fsinfo:
-- 
2.39.3
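
For reference, the three values discussed in this patch can be computed
from statvfs(3) as sketched below. The struct and function names are
hypothetical and are not part of the guest agent; only the arithmetic
follows the commit message.

```c
#include <stdint.h>
#include <sys/statvfs.h>

struct demo_fs_usage {
    uint64_t used_bytes;
    uint64_t total_bytes;             /* capacity for unprivileged users */
    uint64_t total_bytes_privileged;  /* capacity for privileged users */
};

/* Fill `out` for the file system holding `path`; 0 on success, -1 on error. */
static int demo_fs_usage(const char *path, struct demo_fs_usage *out)
{
    struct statvfs buf;
    uint64_t fr_size, used;

    if (statvfs(path, &buf) != 0) {
        return -1;
    }
    fr_size = buf.f_frsize;
    used = buf.f_blocks - buf.f_bfree;
    out->used_bytes = used * fr_size;
    out->total_bytes = (used + buf.f_bavail) * fr_size;
    out->total_bytes_privileged = (uint64_t)buf.f_blocks * fr_size;
    return 0;
}
```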

[PATCH v4 5/7] qga/commands-posix: execute_fsfreeze_hook: use ga_run_command helper

2024-03-20 Thread Andrey Drobyshev

There's no need to check for the existence of the hook executable, as the
exec() call will do that for us.

Signed-off-by: Andrey Drobyshev 
Reviewed-by: Daniel P. Berrangé 
---
 qga/commands-posix.c | 35 +++
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 545f3c99dc..9b993772f5 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -736,8 +736,6 @@ static const char *fsfreeze_hook_arg_string[] = {
 
 static void execute_fsfreeze_hook(FsfreezeHookArg arg, Error **errp)
 {
-int status;
-pid_t pid;
 const char *hook;
 const char *arg_str = fsfreeze_hook_arg_string[arg];
 Error *local_err = NULL;
@@ -746,42 +744,15 @@ static void execute_fsfreeze_hook(FsfreezeHookArg arg, Error **errp)
 if (!hook) {
 return;
 }
-if (access(hook, X_OK) != 0) {
-error_setg_errno(errp, errno, "can't access fsfreeze hook '%s'", hook);
-return;
-}
 
-slog("executing fsfreeze hook with arg '%s'", arg_str);
-pid = fork();
-if (pid == 0) {
-setsid();
-reopen_fd_to_null(0);
-reopen_fd_to_null(1);
-reopen_fd_to_null(2);
-
-execl(hook, hook, arg_str, NULL);
-_exit(EXIT_FAILURE);
-} else if (pid < 0) {
-error_setg_errno(errp, errno, "failed to create child process");
-return;
-}
+const char *argv[] = {hook, arg_str, NULL};
 
-ga_wait_child(pid, &status, &local_err);
+slog("executing fsfreeze hook with arg '%s'", arg_str);
+ga_run_command(argv, NULL, "execute fsfreeze hook", &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
-
-if (!WIFEXITED(status)) {
-error_setg(errp, "fsfreeze hook has terminated abnormally");
-return;
-}
-
-status = WEXITSTATUS(status);
-if (status) {
-error_setg(errp, "fsfreeze hook has failed with status %d", status);
-return;
-}
 }
 
 /*
-- 
2.39.3
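
To illustrate why the access() pre-check is redundant, here is a minimal
fork/exec/wait sketch in which a missing executable shows up as a
non-zero exit status. It is not the ga_run_command() helper from this
series, which per its comment also feeds stdin and collects
stdout/stderr; the function name and the 127 exit code are assumptions
made for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments and wait for it to finish.
 * Returns the child's exit status, or -1 if the child could not be
 * created or did not exit normally.
 */
static int demo_run_command(const char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv(argv[0], (char *const *)argv);
        /* Only reached if exec failed, e.g. file missing or not executable. */
        _exit(127);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (!WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling it with a path that does not exist returns 127, so the caller
learns about the missing hook from the same error path that handles any
other failure.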

[PATCH v4 0/7] qga/commands-posix: replace code duplicating commands with a helper

2024-03-20 Thread Andrey Drobyshev

v3 -> v4:
  * Patch 1/7:
- Replaced "since 8.3" with "since 9.0" as we're now at v9.0.0-rc0;
- Renamed the field to 'total-bytes-privileged';
- Got rid of the implementation details in the docs;
  * Patch 6/7: added g_autoptr macro to local error declaration.

v3: https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg04068.html

Andrey Drobyshev (7):
  qga: guest-get-fsinfo: add optional 'total-bytes-privileged' field
  qga: introduce ga_run_command() helper for guest cmd execution
  qga/commands-posix: qmp_guest_shutdown: use ga_run_command helper
  qga/commands-posix: qmp_guest_set_time: use ga_run_command helper
  qga/commands-posix: execute_fsfreeze_hook: use ga_run_command helper
  qga/commands-posix: don't do fork()/exec() when suspending via sysfs
  qga/commands-posix: qmp_guest_set_user_password: use ga_run_command
helper

 qga/commands-posix.c | 404 +++
 qga/commands-win32.c |   1 +
 qga/qapi-schema.json |   7 +-
 3 files changed, 187 insertions(+), 225 deletions(-)

-- 
2.39.3

[PATCH v4 6/7] qga/commands-posix: don't do fork()/exec() when suspending via sysfs

2024-03-20 Thread Andrey Drobyshev

Since commit 246d76eba ("qga: guest_suspend: decoupling pm-utils and sys
logic") pm-utils logic is running in a separate child from the sysfs
logic.  Now when suspending via sysfs we don't really need to do that in
a separate process as we only need to perform one write to /sys/power/state.

Let's just use g_file_set_contents() to simplify things here.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Daniel P. Berrangé 
---
 qga/commands-posix.c | 41 +
 1 file changed, 5 insertions(+), 36 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 9b993772f5..9910957ff5 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1928,52 +1928,21 @@ static bool linux_sys_state_supports_mode(SuspendMode mode, Error **errp)
 
 static void linux_sys_state_suspend(SuspendMode mode, Error **errp)
 {
-Error *local_err = NULL;
+g_autoptr(GError) local_gerr = NULL;
 const char *sysfile_strs[3] = {"disk", "mem", NULL};
 const char *sysfile_str = sysfile_strs[mode];
-pid_t pid;
-int status;
 
 if (!sysfile_str) {
 error_setg(errp, "unknown guest suspend mode");
 return;
 }
 
-pid = fork();
-if (!pid) {
-/* child */
-int fd;
-
-setsid();
-reopen_fd_to_null(0);
-reopen_fd_to_null(1);
-reopen_fd_to_null(2);
-
-fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
-if (fd < 0) {
-_exit(EXIT_FAILURE);
-}
-
-if (write(fd, sysfile_str, strlen(sysfile_str)) < 0) {
-_exit(EXIT_FAILURE);
-}
-
-_exit(EXIT_SUCCESS);
-} else if (pid < 0) {
-error_setg_errno(errp, errno, "failed to create child process");
-return;
-}
-
-ga_wait_child(pid, &status, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!g_file_set_contents(LINUX_SYS_STATE_FILE, sysfile_str,
+ -1, &local_gerr)) {
+error_setg(errp, "suspend: cannot write to '%s': %s",
+   LINUX_SYS_STATE_FILE, local_gerr->message);
 return;
 }
-
-if (WEXITSTATUS(status)) {
-error_setg(errp, "child process has failed to suspend");
-}
-
 }
 
 static void guest_suspend(SuspendMode mode, Error **errp)
-- 
2.39.3
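
For comparison, the removed child process did little more than open the
state file and issue one write(). A plain POSIX sketch of that single
write follows; the function name is hypothetical and this is not what
the patch uses. GLib documents g_file_set_contents() as writing to a
temporary file and renaming it into place, so it may be worth checking
that this behaves as intended on a sysfs attribute.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `str` to `path` with a single write(); 0 on success, -1 on error. */
static int demo_write_string(const char *path, const char *str)
{
    size_t len = strlen(str);
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        return -1;
    }
    n = write(fd, str, len);
    if (close(fd) != 0 || n < 0 || (size_t)n != len) {
        return -1;
    }
    return 0;
}
```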

[PATCH v4 4/7] qga/commands-posix: qmp_guest_set_time: use ga_run_command helper

2024-03-20 Thread Andrey Drobyshev

There's no need to check for the existence of "/sbin/hwclock", the
exec() call will do that for us.

Signed-off-by: Andrey Drobyshev 
Reviewed-by: Daniel P. Berrangé 
---
 qga/commands-posix.c | 43 +++
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index cb9eed9a0b..545f3c99dc 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -279,21 +279,9 @@ void qmp_guest_shutdown(const char *mode, Error **errp)
 void qmp_guest_set_time(bool has_time, int64_t time_ns, Error **errp)
 {
 int ret;
-int status;
-pid_t pid;
 Error *local_err = NULL;
 struct timeval tv;
-static const char hwclock_path[] = "/sbin/hwclock";
-static int hwclock_available = -1;
-
-if (hwclock_available < 0) {
-hwclock_available = (access(hwclock_path, X_OK) == 0);
-}
-
-if (!hwclock_available) {
-error_setg(errp, QERR_UNSUPPORTED);
-return;
-}
+const char *argv[] = {"/sbin/hwclock", has_time ? "-w" : "-s", NULL};
 
 /* If user has passed a time, validate and set it. */
 if (has_time) {
@@ -324,37 +312,12 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, Error **errp)
  * just need to synchronize the hardware clock. However, if no time was
  * passed, user is requesting the opposite: set the system time from the
  * hardware clock (RTC). */
-pid = fork();
-if (pid == 0) {
-setsid();
-reopen_fd_to_null(0);
-reopen_fd_to_null(1);
-reopen_fd_to_null(2);
-
-/* Use '/sbin/hwclock -w' to set RTC from the system time,
- * or '/sbin/hwclock -s' to set the system time from RTC. */
-execl(hwclock_path, "hwclock", has_time ? "-w" : "-s", NULL);
-_exit(EXIT_FAILURE);
-} else if (pid < 0) {
-error_setg_errno(errp, errno, "failed to create child process");
-return;
-}
-
-ga_wait_child(pid, &status, &local_err);
+ga_run_command(argv, NULL, "set hardware clock to system time",
+   &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
-
-if (!WIFEXITED(status)) {
-error_setg(errp, "child process has terminated abnormally");
-return;
-}
-
-if (WEXITSTATUS(status)) {
-error_setg(errp, "hwclock failed to set hardware clock to system time");
-return;
-}
 }
 
 typedef enum {
-- 
2.39.3

[PATCH v4 7/7] qga/commands-posix: qmp_guest_set_user_password: use ga_run_command helper

2024-03-20 Thread Andrey Drobyshev

There's no need to check for the existence of the "chpasswd", "pw"
executables, as the exec() call will do that for us.

Signed-off-by: Andrey Drobyshev 
Reviewed-by: Daniel P. Berrangé 
---
 qga/commands-posix.c | 96 ++--
 1 file changed, 13 insertions(+), 83 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 9910957ff5..7a065c4085 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2151,14 +2151,8 @@ void qmp_guest_set_user_password(const char *username,
  Error **errp)
 {
 Error *local_err = NULL;
-char *passwd_path = NULL;
-pid_t pid;
-int status;
-int datafd[2] = { -1, -1 };
-char *rawpasswddata = NULL;
+g_autofree char *rawpasswddata = NULL;
 size_t rawpasswdlen;
-char *chpasswddata = NULL;
-size_t chpasswdlen;
 
 rawpasswddata = (char *)qbase64_decode(password, -1, &rawpasswdlen, errp);
 if (!rawpasswddata) {
@@ -2169,95 +2163,31 @@ void qmp_guest_set_user_password(const char *username,
 
 if (strchr(rawpasswddata, '\n')) {
 error_setg(errp, "forbidden characters in raw password");
-goto out;
+return;
 }
 
 if (strchr(username, '\n') ||
 strchr(username, ':')) {
 error_setg(errp, "forbidden characters in username");
-goto out;
+return;
 }
 
 #ifdef __FreeBSD__
-chpasswddata = g_strdup(rawpasswddata);
-passwd_path = g_find_program_in_path("pw");
+g_autofree char *chpasswddata = g_strdup(rawpasswddata);
+const char *crypt_flag = crypted ? "-H" : "-h";
+const char *argv[] = {"pw", "usermod", "-n", username,
+  crypt_flag, "0", NULL};
 #else
-chpasswddata = g_strdup_printf("%s:%s\n", username, rawpasswddata);
-passwd_path = g_find_program_in_path("chpasswd");
+g_autofree char *chpasswddata = g_strdup_printf("%s:%s\n", username,
+rawpasswddata);
+const char *crypt_flag = crypted ? "-e" : NULL;
+const char *argv[] = {"chpasswd", crypt_flag, NULL};
 #endif
 
-chpasswdlen = strlen(chpasswddata);
-
-if (!passwd_path) {
-error_setg(errp, "cannot find 'passwd' program in PATH");
-goto out;
-}
-
-if (!g_unix_open_pipe(datafd, FD_CLOEXEC, NULL)) {
-error_setg(errp, "cannot create pipe FDs");
-goto out;
-}
-
-pid = fork();
-if (pid == 0) {
-close(datafd[1]);
-/* child */
-setsid();
-dup2(datafd[0], 0);
-reopen_fd_to_null(1);
-reopen_fd_to_null(2);
-
-#ifdef __FreeBSD__
-const char *h_arg;
-h_arg = (crypted) ? "-H" : "-h";
-execl(passwd_path, "pw", "usermod", "-n", username, h_arg, "0", NULL);
-#else
-if (crypted) {
-execl(passwd_path, "chpasswd", "-e", NULL);
-} else {
-execl(passwd_path, "chpasswd", NULL);
-}
-#endif
-_exit(EXIT_FAILURE);
-} else if (pid < 0) {
-error_setg_errno(errp, errno, "failed to create child process");
-goto out;
-}
-close(datafd[0]);
-datafd[0] = -1;
-
-if (qemu_write_full(datafd[1], chpasswddata, chpasswdlen) != chpasswdlen) {
-error_setg_errno(errp, errno, "cannot write new account password");
-goto out;
-}
-close(datafd[1]);
-datafd[1] = -1;
-
-ga_wait_child(pid, &status, &local_err);
+ga_run_command(argv, chpasswddata, "set user password", &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-goto out;
-}
-
-if (!WIFEXITED(status)) {
-error_setg(errp, "child process has terminated abnormally");
-goto out;
-}
-
-if (WEXITSTATUS(status)) {
-error_setg(errp, "child process has failed to set user password");
-goto out;
-}
-
-out:
-g_free(chpasswddata);
-g_free(rawpasswddata);
-g_free(passwd_path);
-if (datafd[0] != -1) {
-close(datafd[0]);
-}
-if (datafd[1] != -1) {
-close(datafd[1]);
+return;
 }
 }
 #else /* __linux__ || __FreeBSD__ */
-- 
2.39.3
