date:20240329

Re: [PATCH v10 17/23] hw/intc/arm_gicv3: Add NMI handling CPU interface registers

2024-03-29 Thread Jinjie Ruan via




On 2024/3/28 22:50, Peter Maydell wrote:
> On Mon, 25 Mar 2024 at 08:53, Jinjie Ruan  wrote:
>>
>> Add the NMIAR CPU interface registers which deal with acknowledging NMI.
>>
>> When introduce NMI interrupt, there are some updates to the semantics for the
>> register ICC_IAR1_EL1 and ICC_HPPIR1_EL1. For ICC_IAR1_EL1 register, it
>> should return 1022 if the intid has non-maskable property. And for
>> ICC_NMIAR1_EL1 register, it should return 1023 if the intid do not have
>> non-maskable property. Howerever, these are not necessary for ICC_HPPIR1_EL1
>> register.
>>
>> And the APR and RPR has NMI bits which should be handled correctly.
>>
>> Signed-off-by: Jinjie Ruan 
>> Reviewed-by: Richard Henderson 
>> ---
>> v10:
>> - is_nmi -> nmi.
>> - is_hppi -> hppi.
>> - Exchange the order of nmi and hppi parameters.
>> - superprio -> nmi.
>> - Handle APR and RPR NMI bits.
>> - Update the commit message, super priority -> non-maskable property.
>> v7:
>> - Add Reviewed-by.
>> v4:
>> - Define ICC_NMIAR1_EL1 only if FEAT_GICv3_NMI is implemented.
>> - Check sctrl_elx.SCTLR_NMI to return 1022 for icc_iar1_read().
>> - Add gicv3_icc_nmiar1_read() trace event.
>> - Do not check icc_hppi_can_preempt() for icc_nmiar1_read().
>> - Add icv_nmiar1_read() and call it when EL2Enabled() and HCR_EL2.IMO == '1'
>> ---
>>  hw/intc/arm_gicv3_cpuif.c | 115 ++
>>  hw/intc/gicv3_internal.h  |   5 ++
>>  hw/intc/trace-events  |   1 +
>>  3 files changed, 110 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
>> index e1a60d8c15..76e2286e70 100644
>> --- a/hw/intc/arm_gicv3_cpuif.c
>> +++ b/hw/intc/arm_gicv3_cpuif.c
>> @@ -795,6 +795,13 @@ static uint64_t icv_iar_read(CPUARMState *env, const 
>> ARMCPRegInfo *ri)
>>  return intid;
>>  }
>>
>> +static uint64_t icv_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
>> +{
>> +/* todo */
>> +uint64_t intid = INTID_SPURIOUS;
>> +return intid;
>> +}
>> +
>>  static uint32_t icc_fullprio_mask(GICv3CPUState *cs)
>>  {
>>  /*
>> @@ -825,11 +832,15 @@ static inline int icc_num_aprs(GICv3CPUState *cs)
>>  return aprmax;
>>  }
>>
>> -static int icc_highest_active_prio(GICv3CPUState *cs)
>> +static uint64_t icc_highest_active_prio(GICv3CPUState *cs)
>>  {
>>  /* Calculate the current running priority based on the set bits
>>   * in the Active Priority Registers.
>>   */
>> +ARMCPU *cpu = ARM_CPU(cs->cpu);
>> +CPUARMState *env = >env;
>> +
>> +uint64_t prio;
>>  int i;
>>
>>  for (i = 0; i < icc_num_aprs(cs); i++) {
>> @@ -839,7 +850,32 @@ static int icc_highest_active_prio(GICv3CPUState *cs)
>>  if (!apr) {
>>  continue;
>>  }
>> -return (i * 32 + ctz32(apr)) << (icc_min_bpr(cs) + 1);
>> +prio = (i * 32 + ctz32(apr)) << (icc_min_bpr(cs) + 1);
>> +
>> +if (cs->gic->nmi_support) {
>> +if (cs->gic->gicd_ctlr & GICD_CTLR_DS) {
>> +if ((cs->icc_apr[GICV3_G0][i] & ICC_AP1R_EL1_NMI) ||
>> +(cs->icc_apr[GICV3_G1][i] & ICC_AP1R_EL1_NMI) ||
>> +(cs->icc_apr[GICV3_G1NS][i] & ICC_AP1R_EL1_NMI)) {
>> +prio |= ICC_RPR_EL1_NMI;
>> +}
>> +} else if (!arm_is_secure(env)) {
>> +if (cs->icc_apr[GICV3_G1NS][i] & ICC_AP1R_EL1_NMI) {
>> +prio |= ICC_RPR_EL1_NMI;
>> +}
>> +} else {
>> +if (cs->icc_apr[GICV3_G1][i] & ICC_AP1R_EL1_NMI) {
>> +prio |= ICC_RPR_EL1_NMI;
>> +}
>> +}
>> +
>> +if (arm_feature(env, ARM_FEATURE_EL3) &&
>> +cs->icc_apr[GICV3_G1NS][i] & ICC_AP1R_EL1_NMI) {
>> +prio |= ICC_RPR_EL1_NSNMI;
>> +}
>> +}
>> +
>> +return prio;
> 
> This function is used both for getting the ICC_RPR value,
> and also in icc_hppi_can_preempt(). So we can't put the
> special RPR NMI bits in here. Also doing that will not work well
> with the way the code in icc_rpr_read() adjusts the priority
> for non-secure accesses. I think we should follow the structure
> of the pseudocode here, and do the setting of the RPR bits 62 and 63
> in icc_rpr_read(). (In the pseudocode this is ICC_RPR_EL1 calling
> GetHighestActivePriority() and then doing the NMI bits locally.)
> 
> The NMI bit also exists only in the AP1R0 bit, not in every AP
> register. So you can check it before the for() loop, something like this:
> 
> if (cs->gic->nmi_support) {
> /*
>  * If an NMI is active this takes precedence over anything else
>  * for priority purposes; the NMI bit is only in the AP1R0 bit.
>  * We return here the effective priority of the NMI, which is
>  * either 0x0 or 0x80. Callers will need to check NMI again for
>  * purposes of either setting the RPR register bits or for
>

Re: [RFC PATCH v2 5/6] cxl: add definition for transaction types

2024-03-29 Thread Dan Williams

Shiyang Ruan wrote:
> The transaction types are defined in General Media Event Record/DRAM Event
> per CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43 and
> Section 8.2.9.2.1.2; Table 8-44.  Add them for Event Record handler use.

Combine this patch with the one that uses them so that the use case can
be reviewed together with the implementation.

Re: [RFC PATCH v2 4/6] cxl/core: report poison when injecting from debugfs

2024-03-29 Thread Dan Williams

Shiyang Ruan wrote:
> Poison injection from debugfs is silent too.  Add calling
> cxl_mem_report_poison() to make it able to do memory_failure().

Why does this needs to be signalled? It is a debug interface, the
debugger can also trigger a read after the injection, or trigger page
soft-offline.

Re: [RFC PATCH v2 3/6] cxl/core: add report option for cxl_mem_get_poison()

2024-03-29 Thread Dan Williams

Shiyang Ruan wrote:
> The GMER only has "Physical Address" field, no such one indicates length.
> So, when a poison event is received, we could use GET_POISON_LIST command
> to get the poison list.  Now driver has cxl_mem_get_poison(), so
> reuse it and add a parameter 'bool report', report poison record to MCE
> if set true.

I am not sure I agree with the rationale here because there is no
correlation between the event being signaled and the current state of
the poison list. It also establishes race between multiple GMER events,
i.e. imagine the hardware sends 4 GMER events to communicate a 256B
poison discovery event. Does the driver need logic to support GMER event
2, 3, and 4 if it already say all 256B of poison after processing GMER
event 1?

I think the best the driver can do is assume at least 64B of poison
per-event and depend on multiple notifications to handle larger poison
lengths.

Otherwise, the poison list is really only useful for pre-populating
pages to offline after a reboot, i.e. to catch the kernel up with the
state of poison pages after a reboot.

Re: [PATCH v10 14/23] hw/intc/arm_gicv3: Add irq non-maskable property

2024-03-29 Thread Jinjie Ruan via




On 2024/3/28 22:54, Peter Maydell wrote:
> On Mon, 25 Mar 2024 at 08:52, Jinjie Ruan  wrote:
>>
>> A SPI, PPI or SGI interrupt can have non-maskable property. So maintain
>> non-maskable property in PendingIrq and GICR/GICD. Since add new device
>> state, it also needs to be migrated, so also save NMI info in
>> vmstate_gicv3_cpu and vmstate_gicv3.
>>
>> Signed-off-by: Jinjie Ruan 
>> Acked-by: Richard Henderson 
>> ---
>> v10:
>> - superprio -> nmi, gicr_isuperprio -> gicr_inmir0.
>> - Save NMI state in vmstate_gicv3_cpu and vmstate_gicv3.
>> - Update the commit message.
>> v3:
>> - Place this ahead of implement GICR_INMIR.
>> - Add Acked-by.
>> ---
>>  hw/intc/arm_gicv3_common.c | 44 ++
>>  include/hw/intc/arm_gicv3_common.h |  4 +++
>>  2 files changed, 48 insertions(+)
>>
>> diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
>> index 2d2cea6858..be76ae0be6 100644
>> --- a/hw/intc/arm_gicv3_common.c
>> +++ b/hw/intc/arm_gicv3_common.c
>> @@ -164,6 +164,24 @@ const VMStateDescription vmstate_gicv3_gicv4 = {
>>  }
>>  };
>>
>> +static bool nmi_needed(void *opaque)
>> +{
>> +GICv3CPUState *cs = opaque;
>> +
>> +return cs->gic->nmi_support != 0;
> 
> nmi_support is already a bool, so you can
>return cs->gic_nmi_support;
> 
> 
>> +}
>> +
>> +static const VMStateDescription vmstate_gicv3_cpu_nmi = {
>> +.name = "arm_gicv3_cpu/nmi",
>> +.version_id = 1,
>> +.minimum_version_id = 1,
>> +.needed = nmi_needed,
>> +.fields = (const VMStateField[]) {
>> +VMSTATE_UINT32(gicr_inmir0, GICv3CPUState),
>> +VMSTATE_END_OF_LIST()
>> +}
>> +};
>> +
>>  static const VMStateDescription vmstate_gicv3_cpu = {
>>  .name = "arm_gicv3_cpu",
>>  .version_id = 1,
>> @@ -197,6 +215,10 @@ static const VMStateDescription vmstate_gicv3_cpu = {
>>  _gicv3_cpu_sre_el1,
>>  _gicv3_gicv4,
>>  NULL
>> +},
>> +.subsections = (const VMStateDescription * const []) {
>> +_gicv3_cpu_nmi,
>> +NULL
> 
> You add your subsection to the existing .subsections[] list.
> Otherwise this field initializer overwrites the previous one.
> 
>>  }
>>  };
>>
>> @@ -238,6 +260,24 @@ const VMStateDescription 
>> vmstate_gicv3_gicd_no_migration_shift_bug = {
>>  }
>>  };
>>
>> +static bool needed_nmi(void *opaque)
>> +{
>> +GICv3State *cs = opaque;
>> +
>> +return cs->nmi_support != 0;
>> +}
> 
> You already have nmi_needed() above, so you can use it
> as the .needed function for both vmstate struct definitions.

The input opaque pointer seems not same, one is "GICv3CPUState *", but
another is "GICv3State *"

> 
>> +
>> +const VMStateDescription vmstate_gicv3_gicd_nmi = {
>> +.name = "arm_gicv3/gicd_nmi",
>> +.version_id = 1,
>> +.minimum_version_id = 1,
>> +.needed = needed_nmi,
>> +.fields = (const VMStateField[]) {
>> +VMSTATE_UINT32_ARRAY(nmi, GICv3State, GICV3_BMP_SIZE),
>> +VMSTATE_END_OF_LIST()
>> +}
>> +};
>> +
>>  static const VMStateDescription vmstate_gicv3 = {
>>  .name = "arm_gicv3",
>>  .version_id = 1,
>> @@ -267,6 +307,10 @@ static const VMStateDescription vmstate_gicv3 = {
>>  .subsections = (const VMStateDescription * const []) {
>>  _gicv3_gicd_no_migration_shift_bug,
>>  NULL
>> +},
>> +.subsections = (const VMStateDescription * const []) {
>> +_gicv3_gicd_nmi,
>> +NULL
> 
> Similarly here this must be added to the existing list.
> 
>>  }
>>  };
>>
>> diff --git a/include/hw/intc/arm_gicv3_common.h 
>> b/include/hw/intc/arm_gicv3_common.h
>> index 4358c5319c..88533749eb 100644
>> --- a/include/hw/intc/arm_gicv3_common.h
>> +++ b/include/hw/intc/arm_gicv3_common.h
>> @@ -146,6 +146,7 @@ typedef struct {
>>  int irq;
>>  uint8_t prio;
>>  int grp;
>> +bool nmi;
>>  } PendingIrq;
>>
>>  struct GICv3CPUState {
>> @@ -172,6 +173,7 @@ struct GICv3CPUState {
>>  uint32_t gicr_ienabler0;
>>  uint32_t gicr_ipendr0;
>>  uint32_t gicr_iactiver0;
>> +uint32_t gicr_inmir0;
>>  uint32_t edge_trigger; /* ICFGR0 and ICFGR1 even bits */
>>  uint32_t gicr_igrpmodr0;
>>  uint32_t gicr_nsacr;
>> @@ -275,6 +277,7 @@ struct GICv3State {
>>  GIC_DECLARE_BITMAP(active);   /* GICD_ISACTIVER */
>>  GIC_DECLARE_BITMAP(level);/* Current level */
>>  GIC_DECLARE_BITMAP(edge_trigger); /* GICD_ICFGR even bits */
>> +GIC_DECLARE_BITMAP(nmi);  /* GICD_INMIR */
>>  uint8_t gicd_ipriority[GICV3_MAXIRQ];
>>  uint64_t gicd_irouter[GICV3_MAXIRQ];
>>  /* Cached information: pointer to the cpu i/f for the CPUs specified
>> @@ -314,6 +317,7 @@ GICV3_BITMAP_ACCESSORS(pending)
>>  GICV3_BITMAP_ACCESSORS(active)
>>  GICV3_BITMAP_ACCESSORS(level)
>>  GICV3_BITMAP_ACCESSORS(edge_trigger)
>> +GICV3_BITMAP_ACCESSORS(nmi)
>>
>>  #define TYPE_ARM_GICV3_COMMON "arm-gicv3-common"
>>  typedef struct ARMGICv3CommonClass

Re: [RFC PATCH v2 2/6] cxl/core: introduce cxl_mem_report_poison()

2024-03-29 Thread Dan Williams

Shiyang Ruan wrote:
> If poison is detected(reported from cxl memdev), OS should be notified to
> handle it. So, introduce this helper function for later use:
>   1. translate DPA to HPA;
>   2. enqueue records into memory_failure's work queue;
> 
> Signed-off-by: Shiyang Ruan 

This patch is too small, it needs the corresponding caller to make sense
of the proposed change.

> ---
> 
> Currently poison injection from debugfs always create a 64-bytes-length
> record, which is fine.  But the injection from qemu's QMP API:
> qmp_cxl_inject_poison() could create a poison record contains big length,
> which may cause many many times of calling memory_failure_queue().
> Though the MEMORY_FAILURE_FIFO_SIZE is 1 << 4, it seems not enougth.

What matters is what devices do in practice, the kernel should not be
worried about odd corner cases that only exist in QEMU injection
scenarios.

Re: [RFC PATCH v2 1/6] cxl/core: correct length of DPA field masks

2024-03-29 Thread Dan Williams

Shiyang Ruan wrote:
> The length of Physical Address in General Media Event Record/DRAM Event
> Record is 64-bit, so the field mask should be defined as such length.
> Otherwise, this causes cxl_general_media and cxl_dram tracepoints to
> mask off the upper-32-bits of DPA addresses. The cxl_poison event is
> unaffected.
> 
> If userspace was doing its own DPA-to-HPA translation this could lead to
> incorrect page retirement decisions, but there is no known consumer
> (like rasdaemon) of this event today.
> 
> Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
> Cc: 
> Cc: Dan Williams 
> Cc: Davidlohr Bueso 
> Cc: Jonathan Cameron 
> Cc: Ira Weiny 
> Signed-off-by: Shiyang Ruan 
> ---
>  drivers/cxl/core/trace.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index e5f13260fc52..e2d1f296df97 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -253,11 +253,11 @@ TRACE_EVENT(cxl_generic_event,
>   * DRAM Event Record
>   * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
>   */
> -#define CXL_DPA_FLAGS_MASK   0x3F
> +#define CXL_DPA_FLAGS_MASK   0x3FULL

This change makes sense...

>  #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK)
>  
> -#define CXL_DPA_VOLATILE BIT(0)
> -#define CXL_DPA_NOT_REPAIRABLE   BIT(1)
> +#define CXL_DPA_VOLATILE BIT_ULL(0)
> +#define CXL_DPA_NOT_REPAIRABLE   BIT_ULL(1)

...these do not. The argument to __print_flags() is an unsigned long, so
they will be cast down to (unsigned long), and they are never used as a
mask so the generated code should not change.

[PATCH v2] linux-user/syscall: fix target_msqid_ds time fields order

2024-03-29 Thread Max Filippov

target_msqid_ds::msg_*time field pairs are reversed on 32-bit TARGET_PPC
and TARGET_SPARC and on big-endian TARGET_MIPS and TARGET_XTENSA.
Fix the order to match the kernel definitions.
The issue is spotted by the libc-test http://nsz.repo.hu/git/?p=libc-test
on big-endian xtensa core.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Filippov 
---
Changes v1->v2:
- split into a separate patch
- add PPC, SPARC and big-endian MIPS

 linux-user/syscall.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index d9bfd31c1cad..781ed14bc613 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4113,17 +4113,27 @@ static inline abi_long do_semtimedop(int semid,
 struct target_msqid_ds
 {
 struct target_ipc_perm msg_perm;
-abi_ulong msg_stime;
 #if TARGET_ABI_BITS == 32
+#if defined(TARGET_PPC) || defined(TARGET_SPARC) || \
+(TARGET_BIG_ENDIAN && (defined(TARGET_MIPS) || defined(TARGET_XTENSA)))
+abi_ulong __unused1;
+abi_ulong msg_stime;
+abi_ulong __unused2;
+abi_ulong msg_rtime;
+abi_ulong __unused3;
+abi_ulong msg_ctime;
+#else
+abi_ulong msg_stime;
 abi_ulong __unused1;
-#endif
 abi_ulong msg_rtime;
-#if TARGET_ABI_BITS == 32
 abi_ulong __unused2;
-#endif
 abi_ulong msg_ctime;
-#if TARGET_ABI_BITS == 32
 abi_ulong __unused3;
+#endif
+#else
+abi_ulong msg_stime;
+abi_ulong msg_rtime;
+abi_ulong msg_ctime;
 #endif
 abi_ulong __msg_cbytes;
 abi_ulong msg_qnum;
-- 
2.39.2

[PULL 15/18] target/hppa: Generate getshadowregs inline

2024-03-29 Thread Richard Henderson

This operation is trivial and does not require a helper.

Reviewed-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/helper.h |  1 -
 target/hppa/sys_helper.c |  4 ++--
 target/hppa/translate.c  | 17 +
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 8fd7ba65d8..5900fd70bc 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -86,7 +86,6 @@ DEF_HELPER_FLAGS_0(read_interval_timer, TCG_CALL_NO_RWG, tl)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(halt, noreturn, env)
 DEF_HELPER_1(reset, noreturn, env)
-DEF_HELPER_1(getshadowregs, void, env)
 DEF_HELPER_1(rfi, void, env)
 DEF_HELPER_1(rfi_r, void, env)
 DEF_HELPER_FLAGS_2(write_interval_timer, TCG_CALL_NO_RWG, void, env, tl)
diff --git a/target/hppa/sys_helper.c b/target/hppa/sys_helper.c
index 4a31748342..208e51c086 100644
--- a/target/hppa/sys_helper.c
+++ b/target/hppa/sys_helper.c
@@ -95,7 +95,7 @@ void HELPER(rfi)(CPUHPPAState *env)
 cpu_hppa_put_psw(env, env->cr[CR_IPSW]);
 }
 
-void HELPER(getshadowregs)(CPUHPPAState *env)
+static void getshadowregs(CPUHPPAState *env)
 {
 env->gr[1] = env->shadow[0];
 env->gr[8] = env->shadow[1];
@@ -108,7 +108,7 @@ void HELPER(getshadowregs)(CPUHPPAState *env)
 
 void HELPER(rfi_r)(CPUHPPAState *env)
 {
-helper_getshadowregs(env);
+getshadowregs(env);
 helper_rfi(env);
 }
 
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 6da9503f33..29e4a64e40 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2385,14 +2385,23 @@ static bool trans_reset(DisasContext *ctx, arg_reset *a)
 #endif
 }
 
-static bool trans_getshadowregs(DisasContext *ctx, arg_getshadowregs *a)
+static bool do_getshadowregs(DisasContext *ctx)
 {
 CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
-#ifndef CONFIG_USER_ONLY
 nullify_over(ctx);
-gen_helper_getshadowregs(tcg_env);
+tcg_gen_ld_i64(cpu_gr[1], tcg_env, offsetof(CPUHPPAState, shadow[0]));
+tcg_gen_ld_i64(cpu_gr[8], tcg_env, offsetof(CPUHPPAState, shadow[1]));
+tcg_gen_ld_i64(cpu_gr[9], tcg_env, offsetof(CPUHPPAState, shadow[2]));
+tcg_gen_ld_i64(cpu_gr[16], tcg_env, offsetof(CPUHPPAState, shadow[3]));
+tcg_gen_ld_i64(cpu_gr[17], tcg_env, offsetof(CPUHPPAState, shadow[4]));
+tcg_gen_ld_i64(cpu_gr[24], tcg_env, offsetof(CPUHPPAState, shadow[5]));
+tcg_gen_ld_i64(cpu_gr[25], tcg_env, offsetof(CPUHPPAState, shadow[6]));
 return nullify_end(ctx);
-#endif
+}
+
+static bool trans_getshadowregs(DisasContext *ctx, arg_getshadowregs *a)
+{
+return do_getshadowregs(ctx);
 }
 
 static bool trans_nop_addrx(DisasContext *ctx, arg_ldst *a)
-- 
2.34.1

[PULL 10/18] target/hppa: Optimize UADDCM with no condition

2024-03-29 Thread Richard Henderson

With r1 as zero is by far the most common usage of UADDCM, as the
easiest way to invert a register.  The compiler does occasionally
use the addition step as well, and we can simplify that to avoid
a temp and write directly into the destination.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index a3f425d861..3fc3e7754c 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2763,9 +2763,29 @@ static bool do_uaddcm(DisasContext *ctx, arg_rrr_cf_d 
*a, bool is_tc)
 {
 TCGv_i64 tcg_r1, tcg_r2, tmp;
 
-if (a->cf) {
-nullify_over(ctx);
+if (a->cf == 0) {
+tcg_r2 = load_gpr(ctx, a->r2);
+tmp = dest_gpr(ctx, a->t);
+
+if (a->r1 == 0) {
+/* UADDCM r0,src,dst is the common idiom for dst = ~src. */
+tcg_gen_not_i64(tmp, tcg_r2);
+} else {
+/*
+ * Recall that r1 - r2 == r1 + ~r2 + 1.
+ * Thus r1 + ~r2 == r1 - r2 - 1,
+ * which does not require an extra temporary.
+ */
+tcg_r1 = load_gpr(ctx, a->r1);
+tcg_gen_sub_i64(tmp, tcg_r1, tcg_r2);
+tcg_gen_subi_i64(tmp, tmp, 1);
+}
+save_gpr(ctx, a->t, tmp);
+cond_free(>null_cond);
+return true;
 }
+
+nullify_over(ctx);
 tcg_r1 = load_gpr(ctx, a->r1);
 tcg_r2 = load_gpr(ctx, a->r2);
 tmp = tcg_temp_new_i64();
-- 
2.34.1

[PATCH v2] linux-user/syscall: xtensa: fix ipc_perm conversion

2024-03-29 Thread Max Filippov

target_ipc_perm::mode and target_ipc_perm::__seq fields are 32-bit wide
on xtensa and thus need to use tswap32.
The issue is spotted by the libc-test http://nsz.repo.hu/git/?p=libc-test
on big-endian xtensa core.

Cc: qemu-sta...@nongnu.org
Fixes: a3da8be5126b ("target/xtensa: linux-user: fix sysv IPC structures")
Signed-off-by: Max Filippov 
---
Changes v1->v2:
- split into a separate patch

 linux-user/syscall.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e384e1424890..d9bfd31c1cad 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3758,12 +3758,13 @@ static inline abi_long target_to_host_ipc_perm(struct 
ipc_perm *host_ip,
 host_ip->gid = tswap32(target_ip->gid);
 host_ip->cuid = tswap32(target_ip->cuid);
 host_ip->cgid = tswap32(target_ip->cgid);
-#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC)
+#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC) || \
+defined(TARGET_XTENSA)
 host_ip->mode = tswap32(target_ip->mode);
 #else
 host_ip->mode = tswap16(target_ip->mode);
 #endif
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(TARGET_XTENSA)
 host_ip->__seq = tswap32(target_ip->__seq);
 #else
 host_ip->__seq = tswap16(target_ip->__seq);
@@ -3786,12 +3787,13 @@ static inline abi_long 
host_to_target_ipc_perm(abi_ulong target_addr,
 target_ip->gid = tswap32(host_ip->gid);
 target_ip->cuid = tswap32(host_ip->cuid);
 target_ip->cgid = tswap32(host_ip->cgid);
-#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC)
+#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC) || \
+defined(TARGET_XTENSA)
 target_ip->mode = tswap32(host_ip->mode);
 #else
 target_ip->mode = tswap16(host_ip->mode);
 #endif
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(TARGET_XTENSA)
 target_ip->__seq = tswap32(host_ip->__seq);
 #else
 target_ip->__seq = tswap16(host_ip->__seq);
-- 
2.39.2

[PULL 11/18] target/hppa: Fix unit carry conditions

2024-03-29 Thread Richard Henderson

Split do_unit_cond to do_unit_zero_cond to only handle conditions
versus zero.  These are the only ones that are legal for UXOR.
Simplify trans_uxor accordingly.

Rename do_unit to do_unit_addsub, since xor has been split.
Properly compute carry-out bits for add and subtract, mirroring
the code in do_add and do_sub.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Fixes: b2167459ae4 ("target-hppa: Implement basic arithmetic")
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 218 +---
 1 file changed, 113 insertions(+), 105 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 3fc3e7754c..99c5c4cbca 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -936,98 +936,44 @@ static DisasCond do_sed_cond(DisasContext *ctx, unsigned 
orig, bool d,
 return do_log_cond(ctx, c * 2 + f, d, res);
 }
 
-/* Similar, but for unit conditions.  */
-
-static DisasCond do_unit_cond(unsigned cf, bool d, TCGv_i64 res,
-  TCGv_i64 in1, TCGv_i64 in2)
+/* Similar, but for unit zero conditions.  */
+static DisasCond do_unit_zero_cond(unsigned cf, bool d, TCGv_i64 res)
 {
-DisasCond cond;
-TCGv_i64 tmp, cb = NULL;
+TCGv_i64 tmp;
 uint64_t d_repl = d ? 0x00010001ull : 1;
-
-if (cf & 8) {
-/* Since we want to test lots of carry-out bits all at once, do not
- * do our normal thing and compute carry-in of bit B+1 since that
- * leaves us with carry bits spread across two words.
- */
-cb = tcg_temp_new_i64();
-tmp = tcg_temp_new_i64();
-tcg_gen_or_i64(cb, in1, in2);
-tcg_gen_and_i64(tmp, in1, in2);
-tcg_gen_andc_i64(cb, cb, res);
-tcg_gen_or_i64(cb, cb, tmp);
-}
+uint64_t ones = 0, sgns = 0;
 
 switch (cf >> 1) {
-case 0: /* never / TR */
-cond = cond_make_f();
-break;
-
 case 1: /* SBW / NBW */
 if (d) {
-tmp = tcg_temp_new_i64();
-tcg_gen_subi_i64(tmp, res, d_repl * 0x0001u);
-tcg_gen_andc_i64(tmp, tmp, res);
-tcg_gen_andi_i64(tmp, tmp, d_repl * 0x8000u);
-cond = cond_make_0(TCG_COND_NE, tmp);
-} else {
-/* undefined */
-cond = cond_make_f();
+ones = d_repl;
+sgns = d_repl << 31;
 }
 break;
-
 case 2: /* SBZ / NBZ */
-/* See hasless(v,1) from
- * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
- */
-tmp = tcg_temp_new_i64();
-tcg_gen_subi_i64(tmp, res, d_repl * 0x01010101u);
-tcg_gen_andc_i64(tmp, tmp, res);
-tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80808080u);
-cond = cond_make_0(TCG_COND_NE, tmp);
+ones = d_repl * 0x01010101u;
+sgns = ones << 7;
 break;
-
 case 3: /* SHZ / NHZ */
-tmp = tcg_temp_new_i64();
-tcg_gen_subi_i64(tmp, res, d_repl * 0x00010001u);
-tcg_gen_andc_i64(tmp, tmp, res);
-tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80008000u);
-cond = cond_make_0(TCG_COND_NE, tmp);
+ones = d_repl * 0x00010001u;
+sgns = ones << 15;
 break;
-
-case 4: /* SDC / NDC */
-tcg_gen_andi_i64(cb, cb, d_repl * 0xu);
-cond = cond_make_0(TCG_COND_NE, cb);
-break;
-
-case 5: /* SWC / NWC */
-if (d) {
-tcg_gen_andi_i64(cb, cb, d_repl * 0x8000u);
-cond = cond_make_0(TCG_COND_NE, cb);
-} else {
-/* undefined */
-cond = cond_make_f();
-}
-break;
-
-case 6: /* SBC / NBC */
-tcg_gen_andi_i64(cb, cb, d_repl * 0x80808080u);
-cond = cond_make_0(TCG_COND_NE, cb);
-break;
-
-case 7: /* SHC / NHC */
-tcg_gen_andi_i64(cb, cb, d_repl * 0x80008000u);
-cond = cond_make_0(TCG_COND_NE, cb);
-break;
-
-default:
-g_assert_not_reached();
 }
-if (cf & 1) {
-cond.c = tcg_invert_cond(cond.c);
+if (ones == 0) {
+/* Undefined, or 0/1 (never/always). */
+return cf & 1 ? cond_make_t() : cond_make_f();
 }
 
-return cond;
+/*
+ * See hasless(v,1) from
+ * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
+ */
+tmp = tcg_temp_new_i64();
+tcg_gen_subi_i64(tmp, res, ones);
+tcg_gen_andc_i64(tmp, tmp, res);
+tcg_gen_andi_i64(tmp, tmp, sgns);
+
+return cond_make_0_tmp(cf & 1 ? TCG_COND_EQ : TCG_COND_NE, tmp);
 }
 
 static TCGv_i64 get_carry(DisasContext *ctx, bool d,
@@ -1330,34 +1276,86 @@ static bool do_log_reg(DisasContext *ctx, arg_rrr_cf_d 
*a,
 return nullify_end(ctx);
 }
 
-static void do_unit(DisasContext *ctx, unsigned rt, TCGv_i64 in1,
-TCGv_i64 in2, unsigned cf, bool d, bool is_tc,
-void (*fn)(TCGv_i64, TCGv_i64, TCGv_i64))
+static void do_unit_addsub(DisasContext

[PULL 01/18] target/hppa: Fix BE,L set of sr0

2024-03-29 Thread Richard Henderson

The return address comes from IA*Q_Next, and IASQ_Next
is always equal to IASQ_Back, not IASQ_Front.

Tested-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 19594f917e..1766a63001 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -3817,7 +3817,7 @@ static bool trans_be(DisasContext *ctx, arg_be *a)
 load_spr(ctx, new_spc, a->sp);
 if (a->l) {
 copy_iaoq_entry(ctx, cpu_gr[31], ctx->iaoq_n, ctx->iaoq_n_var);
-tcg_gen_mov_i64(cpu_sr[0], cpu_iasq_f);
+tcg_gen_mov_i64(cpu_sr[0], cpu_iasq_b);
 }
 if (a->n && use_nullify_skip(ctx)) {
 copy_iaoq_entry(ctx, cpu_iaoq_f, -1, tmp);
-- 
2.34.1

[PULL 18/18] target/hppa: Clear psw_n for BE on use_nullify_skip path

2024-03-29 Thread Richard Henderson

Along this path we have already skipped the insn to be
nullified, so the subsequent insn should be executed.

Cc: qemu-sta...@nongnu.org
Reported-by: Sven Schnelle 
Tested-by: Sven Schnelle 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 143818c2d9..8a1a8bc3aa 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -3948,6 +3948,7 @@ static bool trans_be(DisasContext *ctx, arg_be *a)
 copy_iaoq_entry(ctx, cpu_iaoq_b, -1, tmp);
 tcg_gen_mov_i64(cpu_iasq_f, new_spc);
 tcg_gen_mov_i64(cpu_iasq_b, cpu_iasq_f);
+nullify_set(ctx, 0);
 } else {
 copy_iaoq_entry(ctx, cpu_iaoq_f, ctx->iaoq_b, cpu_iaoq_b);
 if (ctx->iaoq_b == -1) {
-- 
2.34.1

[PULL 03/18] target/hppa: Handle unit conditions for wide mode

2024-03-29 Thread Richard Henderson

From: Sven Schnelle 

Wide mode provides two more conditions, add them.

Fixes: 59963d8fdf42 ("target/hppa: Pass d to do_unit_cond")
Signed-off-by: Sven Schnelle 
Reviewed-by: Richard Henderson 
Message-Id: <20240321184228.611897-1-sv...@stackframe.org>
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index f875d76a23..2cb91956da 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -967,11 +967,22 @@ static DisasCond do_unit_cond(unsigned cf, bool d, 
TCGv_i64 res,
 
 switch (cf >> 1) {
 case 0: /* never / TR */
-case 1: /* undefined */
-case 5: /* undefined */
 cond = cond_make_f();
 break;
 
+case 1: /* SBW / NBW */
+if (d) {
+tmp = tcg_temp_new_i64();
+tcg_gen_subi_i64(tmp, res, d_repl * 0x0001u);
+tcg_gen_andc_i64(tmp, tmp, res);
+tcg_gen_andi_i64(tmp, tmp, d_repl * 0x8000u);
+cond = cond_make_0(TCG_COND_NE, tmp);
+} else {
+/* undefined */
+cond = cond_make_f();
+}
+break;
+
 case 2: /* SBZ / NBZ */
 /* See hasless(v,1) from
  * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
@@ -996,6 +1007,16 @@ static DisasCond do_unit_cond(unsigned cf, bool d, 
TCGv_i64 res,
 cond = cond_make_0(TCG_COND_NE, cb);
 break;
 
+case 5: /* SWC / NWC */
+if (d) {
+tcg_gen_andi_i64(cb, cb, d_repl * 0x8000u);
+cond = cond_make_0(TCG_COND_NE, cb);
+} else {
+/* undefined */
+cond = cond_make_f();
+}
+break;
+
 case 6: /* SBC / NBC */
 tcg_gen_andi_i64(cb, cb, d_repl * 0x80808080u);
 cond = cond_make_0(TCG_COND_NE, cb);
-- 
2.34.1

[PULL 02/18] target/hppa: Fix B,GATE for wide mode

2024-03-29 Thread Richard Henderson

Do not clobber the high bits of the address by using a 32-bit deposit.

Reviewed-by: Helge Deller 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 1766a63001..f875d76a23 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -3880,7 +3880,7 @@ static bool trans_b_gate(DisasContext *ctx, arg_b_gate *a)
 }
 /* No change for non-gateway pages or for priv decrease.  */
 if (type >= 4 && type - 4 < ctx->privilege) {
-dest = deposit32(dest, 0, 2, type - 4);
+dest = deposit64(dest, 0, 2, type - 4);
 }
 } else {
 dest &= -4;  /* priv = 0 */
-- 
2.34.1

[PULL 14/18] target/hppa: Fix overflow computation for shladd

2024-03-29 Thread Richard Henderson

Overflow indicator should include the effect of the shift step.
We had previously left ??? comments about the issue.

Tested-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 84 +++--
 1 file changed, 65 insertions(+), 19 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 9d31ef5764..6da9503f33 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -994,7 +994,8 @@ static TCGv_i64 get_psw_carry(DisasContext *ctx, bool d)
 
 /* Compute signed overflow for addition.  */
 static TCGv_i64 do_add_sv(DisasContext *ctx, TCGv_i64 res,
-  TCGv_i64 in1, TCGv_i64 in2)
+  TCGv_i64 in1, TCGv_i64 in2,
+  TCGv_i64 orig_in1, int shift, bool d)
 {
 TCGv_i64 sv = tcg_temp_new_i64();
 TCGv_i64 tmp = tcg_temp_new_i64();
@@ -1003,9 +1004,49 @@ static TCGv_i64 do_add_sv(DisasContext *ctx, TCGv_i64 
res,
 tcg_gen_xor_i64(tmp, in1, in2);
 tcg_gen_andc_i64(sv, sv, tmp);
 
+switch (shift) {
+case 0:
+break;
+case 1:
+/* Shift left by one and compare the sign. */
+tcg_gen_add_i64(tmp, orig_in1, orig_in1);
+tcg_gen_xor_i64(tmp, tmp, orig_in1);
+/* Incorporate into the overflow. */
+tcg_gen_or_i64(sv, sv, tmp);
+break;
+default:
+{
+int sign_bit = d ? 63 : 31;
+
+/* Compare the sign against all lower bits. */
+tcg_gen_sextract_i64(tmp, orig_in1, sign_bit, 1);
+tcg_gen_xor_i64(tmp, tmp, orig_in1);
+/*
+ * If one of the bits shifting into or through the sign
+ * differs, then we have overflow.
+ */
+tcg_gen_extract_i64(tmp, tmp, sign_bit - shift, shift);
+tcg_gen_movcond_i64(TCG_COND_NE, sv, tmp, ctx->zero,
+tcg_constant_i64(-1), sv);
+}
+}
 return sv;
 }
 
+/* Compute unsigned overflow for addition.  */
+static TCGv_i64 do_add_uv(DisasContext *ctx, TCGv_i64 cb, TCGv_i64 cb_msb,
+  TCGv_i64 in1, int shift, bool d)
+{
+if (shift == 0) {
+return get_carry(ctx, d, cb, cb_msb);
+} else {
+TCGv_i64 tmp = tcg_temp_new_i64();
+tcg_gen_extract_i64(tmp, in1, (d ? 63 : 31) - shift, shift);
+tcg_gen_or_i64(tmp, tmp, get_carry(ctx, d, cb, cb_msb));
+return tmp;
+}
+}
+
 /* Compute signed overflow for subtraction.  */
 static TCGv_i64 do_sub_sv(DisasContext *ctx, TCGv_i64 res,
   TCGv_i64 in1, TCGv_i64 in2)
@@ -1020,19 +1061,19 @@ static TCGv_i64 do_sub_sv(DisasContext *ctx, TCGv_i64 
res,
 return sv;
 }
 
-static void do_add(DisasContext *ctx, unsigned rt, TCGv_i64 in1,
+static void do_add(DisasContext *ctx, unsigned rt, TCGv_i64 orig_in1,
TCGv_i64 in2, unsigned shift, bool is_l,
bool is_tsv, bool is_tc, bool is_c, unsigned cf, bool d)
 {
-TCGv_i64 dest, cb, cb_msb, cb_cond, sv, tmp;
+TCGv_i64 dest, cb, cb_msb, in1, uv, sv, tmp;
 unsigned c = cf >> 1;
 DisasCond cond;
 
 dest = tcg_temp_new_i64();
 cb = NULL;
 cb_msb = NULL;
-cb_cond = NULL;
 
+in1 = orig_in1;
 if (shift) {
 tmp = tcg_temp_new_i64();
 tcg_gen_shli_i64(tmp, in1, shift);
@@ -1050,9 +1091,6 @@ static void do_add(DisasContext *ctx, unsigned rt, 
TCGv_i64 in1,
 }
 tcg_gen_xor_i64(cb, in1, in2);
 tcg_gen_xor_i64(cb, cb, dest);
-if (cond_need_cb(c)) {
-cb_cond = get_carry(ctx, d, cb, cb_msb);
-}
 } else {
 tcg_gen_add_i64(dest, in1, in2);
 if (is_c) {
@@ -1063,18 +1101,23 @@ static void do_add(DisasContext *ctx, unsigned rt, 
TCGv_i64 in1,
 /* Compute signed overflow if required.  */
 sv = NULL;
 if (is_tsv || cond_need_sv(c)) {
-sv = do_add_sv(ctx, dest, in1, in2);
+sv = do_add_sv(ctx, dest, in1, in2, orig_in1, shift, d);
 if (is_tsv) {
 if (!d) {
 tcg_gen_ext32s_i64(sv, sv);
 }
-/* ??? Need to include overflow from shift.  */
 gen_helper_tsv(tcg_env, sv);
 }
 }
 
+/* Compute unsigned overflow if required.  */
+uv = NULL;
+if (cond_need_cb(c)) {
+uv = do_add_uv(ctx, cb, cb_msb, orig_in1, shift, d);
+}
+
 /* Emit any conditional trap before any writeback.  */
-cond = do_cond(ctx, cf, d, dest, cb_cond, sv);
+cond = do_cond(ctx, cf, d, dest, uv, sv);
 if (is_tc) {
 tmp = tcg_temp_new_i64();
 tcg_gen_setcond_i64(cond.c, tmp, cond.a0, cond.a1);
@@ -2843,7 +2886,6 @@ static bool trans_dcor_i(DisasContext *ctx, arg_rr_cf_d 
*a)
 static bool trans_ds(DisasContext *ctx, arg_rrr_cf *a)
 {
 TCGv_i64 dest, add1, add2, addc, in1, in2;
-TCGv_i64 cout;
 
 nullify_over(ctx);
 
@@ -2880,19 +2922,23 @@ static bool

[PULL 13/18] target/hppa: Replace c with uv in do_cond

2024-03-29 Thread Richard Henderson

Prepare for proper indication of shladd unsigned overflow.
The UV indicator will be zero/not-zero instead of a single bit.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index a70d644c0b..9d31ef5764 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -707,7 +707,7 @@ static bool cond_need_cb(int c)
  */
 
 static DisasCond do_cond(DisasContext *ctx, unsigned cf, bool d,
- TCGv_i64 res, TCGv_i64 cb_msb, TCGv_i64 sv)
+ TCGv_i64 res, TCGv_i64 uv, TCGv_i64 sv)
 {
 DisasCond cond;
 TCGv_i64 tmp;
@@ -754,14 +754,12 @@ static DisasCond do_cond(DisasContext *ctx, unsigned cf, 
bool d,
 }
 cond = cond_make_0_tmp(TCG_COND_EQ, tmp);
 break;
-case 4: /* NUV / UV  (!C / C) */
-/* Only bit 0 of cb_msb is ever set. */
-cond = cond_make_0(TCG_COND_EQ, cb_msb);
+case 4: /* NUV / UV  (!UV / UV) */
+cond = cond_make_0(TCG_COND_EQ, uv);
 break;
-case 5: /* ZNV / VNZ (!C | Z / C & !Z) */
+case 5: /* ZNV / VNZ (!UV | Z / UV & !Z) */
 tmp = tcg_temp_new_i64();
-tcg_gen_neg_i64(tmp, cb_msb);
-tcg_gen_and_i64(tmp, tmp, res);
+tcg_gen_movcond_i64(TCG_COND_EQ, tmp, uv, ctx->zero, ctx->zero, res);
 if (!d) {
 tcg_gen_ext32u_i64(tmp, tmp);
 }
-- 
2.34.1

[PULL 09/18] target/hppa: Fix DCOR reconstruction of carry bits

2024-03-29 Thread Richard Henderson

The carry bits for each nibble N are located in bit (N+1)*4,
so the shift by 3 was off by one.  Furthermore, the carry bit
for the most significant carry bit is indeed located in bit 64,
which is located in a different storage word.

Use a double-word shift-right to reassemble into a single word
and place them all at bit 0 of their respective nibbles.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Fixes: b2167459ae4 ("target-hppa: Implement basic arithmetic")
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index e041310207..a3f425d861 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2791,7 +2791,7 @@ static bool do_dcor(DisasContext *ctx, arg_rr_cf_d *a, 
bool is_i)
 nullify_over(ctx);
 
 tmp = tcg_temp_new_i64();
-tcg_gen_shri_i64(tmp, cpu_psw_cb, 3);
+tcg_gen_extract2_i64(tmp, cpu_psw_cb, cpu_psw_cb_msb, 4);
 if (!is_i) {
 tcg_gen_not_i64(tmp, tmp);
 }
-- 
2.34.1

[PULL 05/18] target/hppa: Mark interval timer write as io

2024-03-29 Thread Richard Henderson

Reviewed-by: Helge Deller 
Tested-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index ceb739c54a..8c1a564c5d 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2162,6 +2162,9 @@ static bool trans_mtctl(DisasContext *ctx, arg_mtctl *a)
 
 switch (ctl) {
 case CR_IT:
+if (translator_io_start(>base)) {
+ctx->base.is_jmp = DISAS_IAQ_N_STALE;
+}
 gen_helper_write_interval_timer(tcg_env, reg);
 break;
 case CR_EIRR:
-- 
2.34.1

[PULL 16/18] target/hppa: Move diag argument handling to decodetree

2024-03-29 Thread Richard Henderson

Split trans_diag into per-operation functions.

Reviewed-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/insns.decode |  8 +++-
 target/hppa/translate.c  | 34 +-
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index 6a74cf23cd..9f6ffd8e2c 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -634,4 +634,10 @@ fdiv_d  001110 . . 011 . ... .  
@f0e_d_3
 xmpyu   001110 . . 010 .0111 .00 t:5r1=%ra64 r2=%rb64
 
 # diag
-diag000101 i:26
+{
+  [
+diag_btlb   000101 00    0001  
+diag_cout   000101 00    0001  0001
+  ]
+  diag_unimp000101 i:26
+}
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 29e4a64e40..42dd3f2c8d 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -4572,23 +4572,31 @@ static bool trans_fmpyfadd_d(DisasContext *ctx, 
arg_fmpyfadd_d *a)
 return nullify_end(ctx);
 }
 
-static bool trans_diag(DisasContext *ctx, arg_diag *a)
+/* Emulate PDC BTLB, called by SeaBIOS-hppa */
+static bool trans_diag_btlb(DisasContext *ctx, arg_diag_btlb *a)
 {
 CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
 #ifndef CONFIG_USER_ONLY
-if (a->i == 0x100) {
-/* emulate PDC BTLB, called by SeaBIOS-hppa */
-nullify_over(ctx);
-gen_helper_diag_btlb(tcg_env);
-return nullify_end(ctx);
-}
-if (a->i == 0x101) {
-/* print char in %r26 to first serial console, used by SeaBIOS-hppa */
-nullify_over(ctx);
-gen_helper_diag_console_output(tcg_env);
-return nullify_end(ctx);
-}
+nullify_over(ctx);
+gen_helper_diag_btlb(tcg_env);
+return nullify_end(ctx);
 #endif
+}
+
+/* Print char in %r26 to first serial console, used by SeaBIOS-hppa */
+static bool trans_diag_cout(DisasContext *ctx, arg_diag_cout *a)
+{
+CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
+#ifndef CONFIG_USER_ONLY
+nullify_over(ctx);
+gen_helper_diag_console_output(tcg_env);
+return nullify_end(ctx);
+#endif
+}
+
+static bool trans_diag_unimp(DisasContext *ctx, arg_diag_unimp *a)
+{
+CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
 qemu_log_mask(LOG_UNIMP, "DIAG opcode 0x%04x ignored\n", a->i);
 return true;
 }
-- 
2.34.1

[PULL 04/18] target/hppa: Fix ADD/SUB trap on overflow for narrow mode

2024-03-29 Thread Richard Henderson

From: Sven Schnelle 

Fixes: c53e401ed9ff ("target/hppa: Remove TARGET_REGISTER_BITS")
Signed-off-by: Sven Schnelle 
Reviewed-by: Richard Henderson 
Message-Id: <20240321184228.611897-2-sv...@stackframe.org>
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 2cb91956da..ceb739c54a 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -1126,6 +1126,9 @@ static void do_add(DisasContext *ctx, unsigned rt, 
TCGv_i64 in1,
 if (is_tsv || cond_need_sv(c)) {
 sv = do_add_sv(ctx, dest, in1, in2);
 if (is_tsv) {
+if (!d) {
+tcg_gen_ext32s_i64(sv, sv);
+}
 /* ??? Need to include overflow from shift.  */
 gen_helper_tsv(tcg_env, sv);
 }
@@ -1217,6 +1220,9 @@ static void do_sub(DisasContext *ctx, unsigned rt, 
TCGv_i64 in1,
 if (is_tsv || cond_need_sv(c)) {
 sv = do_sub_sv(ctx, dest, in1, in2);
 if (is_tsv) {
+if (!d) {
+tcg_gen_ext32s_i64(sv, sv);
+}
 gen_helper_tsv(tcg_env, sv);
 }
 }
-- 
2.34.1

[PULL 06/18] target/hppa: Tidy read of interval timer

2024-03-29 Thread Richard Henderson

The call to gen_helper_read_interval_timer is
identical on both sides of the IF.

Reviewed-by: Helge Deller 
Tested-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 8c1a564c5d..5b8c1b06c3 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2082,11 +2082,9 @@ static bool trans_mfctl(DisasContext *ctx, arg_mfctl *a)
 nullify_over(ctx);
 tmp = dest_gpr(ctx, rt);
 if (translator_io_start(>base)) {
-gen_helper_read_interval_timer(tmp);
 ctx->base.is_jmp = DISAS_IAQ_N_STALE;
-} else {
-gen_helper_read_interval_timer(tmp);
 }
+gen_helper_read_interval_timer(tmp);
 save_gpr(ctx, rt, tmp);
 return nullify_end(ctx);
 case 26:
-- 
2.34.1

[PULL 08/18] target/hppa: Use gva_offset_mask() everywhere

2024-03-29 Thread Richard Henderson

From: Sven Schnelle 

Move it to cpu.h, so it can also be used in hppa_form_gva_psw().

Signed-off-by: Sven Schnelle 
Reviewed-by: Helge Deller 
Reviewed-by: Richard Henderson 
Message-Id: <20240324080945.991100-2-sv...@stackframe.org>
Signed-off-by: Richard Henderson 
---
 target/hppa/cpu.h   | 10 --
 target/hppa/translate.c | 12 +++-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index a92dc352cb..a072d0bb63 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -285,14 +285,20 @@ void hppa_translate_init(void);
 
 #define CPU_RESOLVING_TYPE TYPE_HPPA_CPU
 
+static inline uint64_t gva_offset_mask(target_ulong psw)
+{
+return (psw & PSW_W
+? MAKE_64BIT_MASK(0, 62)
+: MAKE_64BIT_MASK(0, 32));
+}
+
 static inline target_ulong hppa_form_gva_psw(target_ulong psw, uint64_t spc,
  target_ulong off)
 {
 #ifdef CONFIG_USER_ONLY
 return off;
 #else
-off &= psw & PSW_W ? MAKE_64BIT_MASK(0, 62) : MAKE_64BIT_MASK(0, 32);
-return spc | off;
+return spc | (off & gva_offset_mask(psw));
 #endif
 }
 
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 46b2d6508d..e041310207 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -586,17 +586,10 @@ static bool nullify_end(DisasContext *ctx)
 return true;
 }
 
-static uint64_t gva_offset_mask(DisasContext *ctx)
-{
-return (ctx->tb_flags & PSW_W
-? MAKE_64BIT_MASK(0, 62)
-: MAKE_64BIT_MASK(0, 32));
-}
-
 static void copy_iaoq_entry(DisasContext *ctx, TCGv_i64 dest,
 uint64_t ival, TCGv_i64 vval)
 {
-uint64_t mask = gva_offset_mask(ctx);
+uint64_t mask = gva_offset_mask(ctx->tb_flags);
 
 if (ival != -1) {
 tcg_gen_movi_i64(dest, ival & mask);
@@ -1430,7 +1423,8 @@ static void form_gva(DisasContext *ctx, TCGv_i64 *pgva, 
TCGv_i64 *pofs,
 
 *pofs = ofs;
 *pgva = addr = tcg_temp_new_i64();
-tcg_gen_andi_i64(addr, modify <= 0 ? ofs : base, gva_offset_mask(ctx));
+tcg_gen_andi_i64(addr, modify <= 0 ? ofs : base,
+ gva_offset_mask(ctx->tb_flags));
 #ifndef CONFIG_USER_ONLY
 if (!is_phys) {
 tcg_gen_or_i64(addr, addr, space_select(ctx, sp, base));
-- 
2.34.1

[PULL 17/18] target/hppa: Add diag instructions to set/restore shadow registers

2024-03-29 Thread Richard Henderson

From: Helge Deller 

The 32-bit PA-7300LC (PCX-L2) CPU and the 64-bit PA8700 (PCX-W2) CPU
use different diag instructions to save or restore the CPU registers
to/from the shadow registers.

Implement those per-CPU architecture diag instructions to fix those
parts of the HP ODE testcases (L2DIAG and WDIAG, section 1) which test
the shadow registers.

Signed-off-by: Helge Deller 
[rth: Use decodetree to distinguish cases]
Signed-off-by: Richard Henderson 
Reviewed-by: Helge Deller 
Tested-by: Helge Deller 
---
 target/hppa/insns.decode | 10 ++
 target/hppa/translate.c  | 34 ++
 2 files changed, 44 insertions(+)

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index 9f6ffd8e2c..71074a64c1 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -65,6 +65,8 @@
 # Argument set definitions
 
 
+
+
 # All insns that need to form a virtual address should use this set.
t b x disp sp m scale size
 
@@ -638,6 +640,14 @@ xmpyu   001110 . . 010 .0111 .00 t:5
r1=%ra64 r2=%rb64
   [
 diag_btlb   000101 00    0001  
 diag_cout   000101 00    0001  0001
+
+# For 32-bit PA-7300LC (PCX-L2)
+diag_getshadowregs_pa1  000101 00   0001 1010  
+diag_putshadowregs_pa1  000101 00   0001 1010 0100 
+
+# For 64-bit PA8700 (PCX-W2)
+diag_getshadowregs_pa2  000101 00 0111 1000 0001 1000 0100 
+diag_putshadowregs_pa2  000101 00 0111  0001 1000 0100 
   ]
   diag_unimp000101 i:26
 }
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 42dd3f2c8d..143818c2d9 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2399,6 +2399,20 @@ static bool do_getshadowregs(DisasContext *ctx)
 return nullify_end(ctx);
 }
 
+static bool do_putshadowregs(DisasContext *ctx)
+{
+CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
+nullify_over(ctx);
+tcg_gen_st_i64(cpu_gr[1], tcg_env, offsetof(CPUHPPAState, shadow[0]));
+tcg_gen_st_i64(cpu_gr[8], tcg_env, offsetof(CPUHPPAState, shadow[1]));
+tcg_gen_st_i64(cpu_gr[9], tcg_env, offsetof(CPUHPPAState, shadow[2]));
+tcg_gen_st_i64(cpu_gr[16], tcg_env, offsetof(CPUHPPAState, shadow[3]));
+tcg_gen_st_i64(cpu_gr[17], tcg_env, offsetof(CPUHPPAState, shadow[4]));
+tcg_gen_st_i64(cpu_gr[24], tcg_env, offsetof(CPUHPPAState, shadow[5]));
+tcg_gen_st_i64(cpu_gr[25], tcg_env, offsetof(CPUHPPAState, shadow[6]));
+return nullify_end(ctx);
+}
+
 static bool trans_getshadowregs(DisasContext *ctx, arg_getshadowregs *a)
 {
 return do_getshadowregs(ctx);
@@ -4594,6 +4608,26 @@ static bool trans_diag_cout(DisasContext *ctx, 
arg_diag_cout *a)
 #endif
 }
 
+static bool trans_diag_getshadowregs_pa1(DisasContext *ctx, arg_empty *a)
+{
+return !ctx->is_pa20 && do_getshadowregs(ctx);
+}
+
+static bool trans_diag_getshadowregs_pa2(DisasContext *ctx, arg_empty *a)
+{
+return ctx->is_pa20 && do_getshadowregs(ctx);
+}
+
+static bool trans_diag_putshadowregs_pa1(DisasContext *ctx, arg_empty *a)
+{
+return !ctx->is_pa20 && do_putshadowregs(ctx);
+}
+
+static bool trans_diag_putshadowregs_pa2(DisasContext *ctx, arg_empty *a)
+{
+return ctx->is_pa20 && do_putshadowregs(ctx);
+}
+
 static bool trans_diag_unimp(DisasContext *ctx, arg_diag_unimp *a)
 {
 CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
-- 
2.34.1

[PULL 12/18] target/hppa: Squash d for pa1.x during decode

2024-03-29 Thread Richard Henderson

The cond_need_ext predicate was created while we still had a
32-bit compilation mode.  It now makes more sense to treat D
as an absolute indicator of a 64-bit operation.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/insns.decode | 20 +---
 target/hppa/translate.c  | 38 --
 2 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index f58455dfdb..6a74cf23cd 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -57,6 +57,9 @@
 %neg_to_m   0:1  !function=neg_to_m
 %a_to_m 2:1  !function=neg_to_m
 %cmpbid_c   13:2 !function=cmpbid_c
+%d_55:1  !function=pa20_d
+%d_11   11:1 !function=pa20_d
+%d_13   13:1 !function=pa20_d
 
 
 # Argument set definitions
@@ -84,15 +87,16 @@
 # Format definitions
 
 
-@rr_cf_d.. r:5 . cf:4 .. d:1 t:5_cf_d
+@rr_cf_d.. r:5 . cf:4 .. . t:5  _cf_d d=%d_5
 @rrr.. r2:5 r1:5  ... t:5   
 @rrr_cf .. r2:5 r1:5 cf:4 ... t:5   _cf
-@rrr_cf_d   .. r2:5 r1:5 cf:4 .. d:1 t:5_cf_d
+@rrr_cf_d   .. r2:5 r1:5 cf:4 .. . t:5  _cf_d d=%d_5
 @rrr_sh .. r2:5 r1:5  sh:2 . t:5_sh
-@rrr_cf_d_sh.. r2:5 r1:5 cf:4  sh:2 d:1 t:5 _cf_d_sh
-@rrr_cf_d_sh0   .. r2:5 r1:5 cf:4 .. d:1 t:5_cf_d_sh sh=0
+@rrr_cf_d_sh.. r2:5 r1:5 cf:4  sh:2 . t:5   _cf_d_sh d=%d_5
+@rrr_cf_d_sh0   .. r2:5 r1:5 cf:4 .. . t:5  _cf_d_sh d=%d_5 
sh=0
 @rri_cf .. r:5  t:5  cf:4 . ... _cf i=%lowsign_11
-@rri_cf_d   .. r:5  t:5  cf:4 d:1 ...   _cf_d i=%lowsign_11
+@rri_cf_d   .. r:5  t:5  cf:4 . ... \
+_cf_d d=%d_11 i=%lowsign_11
 
 @rrb_cf .. r2:5 r1:5 c:3 ... n:1 .  \
 _c_f disp=%assemble_12
@@ -368,8 +372,10 @@ fmpysub_d   100110 . . . . 1 .  
@mpyadd
 # Conditional Branches
 
 
-bb_sar  11 0 r:5 c:1 1 d:1 ... n:1 . disp=%assemble_12
-bb_imm  110001 p:5   r:5 c:1 1 d:1 ... n:1 . disp=%assemble_12
+bb_sar  11 0 r:5 c:1 1 . ... n:1 . \
+disp=%assemble_12 d=%d_13
+bb_imm  110001 p:5   r:5 c:1 1 . ... n:1 . \
+disp=%assemble_12 d=%d_13
 
 movb110010 . . ... ... . .  @rrb_cf f=0
 movbi   110011 . . ... ... . .  @rib_cf f=0
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 99c5c4cbca..a70d644c0b 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -200,6 +200,14 @@ static int cmpbid_c(DisasContext *ctx, int val)
 return val ? val : 4; /* 0 == "*<<" */
 }
 
+/*
+ * In many places pa1.x did not decode the bit that later became
+ * the pa2.0 D bit.  Suppress D unless the cpu is pa2.0.
+ */
+static int pa20_d(DisasContext *ctx, int val)
+{
+return ctx->is_pa20 & val;
+}
 
 /* Include the auto-generated decoder.  */
 #include "decode-insns.c.inc"
@@ -693,12 +701,6 @@ static bool cond_need_cb(int c)
 return c == 4 || c == 5;
 }
 
-/* Need extensions from TCGv_i32 to TCGv_i64. */
-static bool cond_need_ext(DisasContext *ctx, bool d)
-{
-return !(ctx->is_pa20 && d);
-}
-
 /*
  * Compute conditional for arithmetic.  See Page 5-3, Table 5-1, of
  * the Parisc 1.1 Architecture Reference Manual for details.
@@ -715,7 +717,7 @@ static DisasCond do_cond(DisasContext *ctx, unsigned cf, 
bool d,
 cond = cond_make_f();
 break;
 case 1: /* = / <>(Z / !Z) */
-if (cond_need_ext(ctx, d)) {
+if (!d) {
 tmp = tcg_temp_new_i64();
 tcg_gen_ext32u_i64(tmp, res);
 res = tmp;
@@ -725,7 +727,7 @@ static DisasCond do_cond(DisasContext *ctx, unsigned cf, 
bool d,
 case 2: /* < / >=(N ^ V / !(N ^ V) */
 tmp = tcg_temp_new_i64();
 tcg_gen_xor_i64(tmp, res, sv);
-if (cond_need_ext(ctx, d)) {
+if (!d) {
 tcg_gen_ext32s_i64(tmp, tmp);
 }
 cond = cond_make_0_tmp(TCG_COND_LT, tmp);
@@ -742,7 +744,7 @@ static DisasCond do_cond(DisasContext *ctx, unsigned cf, 
bool d,
  */
 tmp = tcg_temp_new_i64();
 tcg_gen_eqv_i64(tmp, res, sv);
-if (cond_need_ext(ctx, d)) {
+if (!d) {
 tcg_gen_sextract_i64(tmp, tmp, 31, 1);
 tcg_gen_and_i64(tmp, tmp, res);
 tcg_gen_ext32u_i64(tmp, tmp);
@@ -760,13 +762,13 @@ static DisasCond do_cond(DisasContext *ctx, unsigned cf, 
bool d,
 tmp = tcg_temp_new_i64();
 tcg_gen_neg_i64(tmp, cb_msb);
 tcg_gen_and_i64(tmp, tmp, res);
-if (cond_need_ext(ctx, d)) {
+if (!d) {

[PULL 00/18] target/hppa patch queue

2024-03-29 Thread Richard Henderson

The following changes since commit 5012e522aca161be5c141596c66e5cc6082538a9:

  Update version for v9.0.0-rc1 release (2024-03-26 19:46:55 +)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-pa-20240329

for you to fetch changes up to 4a3aa11e1fb25c28c24a43fd2835c429b00a463d:

  target/hppa: Clear psw_n for BE on use_nullify_skip path (2024-03-29 08:15:01 
-1000)


target/hppa: Fix BE,L set of sr0
target/hppa: Fix B,GATE for wide mode
target/hppa: Mark interval timer write as io
target/hppa: Fix EIRR, EIEM versus icount
target/hppa: Fix DCOR reconstruction of carry bits
target/hppa: Fix unit carry conditions
target/hppa: Fix overflow computation for shladd
target/hppa: Add diag instructions to set/restore shadow registers
target/hppa: Clear psw_n for BE on use_nullify_skip path


Helge Deller (1):
  target/hppa: Add diag instructions to set/restore shadow registers

Richard Henderson (14):
  target/hppa: Fix BE,L set of sr0
  target/hppa: Fix B,GATE for wide mode
  target/hppa: Mark interval timer write as io
  target/hppa: Tidy read of interval timer
  target/hppa: Fix EIRR, EIEM versus icount
  target/hppa: Fix DCOR reconstruction of carry bits
  target/hppa: Optimize UADDCM with no condition
  target/hppa: Fix unit carry conditions
  target/hppa: Squash d for pa1.x during decode
  target/hppa: Replace c with uv in do_cond
  target/hppa: Fix overflow computation for shladd
  target/hppa: Generate getshadowregs inline
  target/hppa: Move diag argument handling to decodetree
  target/hppa: Clear psw_n for BE on use_nullify_skip path

Sven Schnelle (3):
  target/hppa: Handle unit conditions for wide mode
  target/hppa: Fix ADD/SUB trap on overflow for narrow mode
  target/hppa: Use gva_offset_mask() everywhere

 target/hppa/cpu.h|  10 +-
 target/hppa/helper.h |   2 -
 target/hppa/insns.decode |  38 +++-
 target/hppa/int_helper.c |  14 +-
 target/hppa/sys_helper.c |   4 +-
 target/hppa/translate.c  | 488 +++
 6 files changed, 364 insertions(+), 192 deletions(-)

[PULL 07/18] target/hppa: Fix EIRR, EIEM versus icount

2024-03-29 Thread Richard Henderson

Call translator_io_start before write to EIRR.
Move evaluation of EIRR vs EIEM to hppa_cpu_exec_interrupt.
Exit TB after write to EIEM, but otherwise use a straight store.

Reviewed-by: Helge Deller 
Tested-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 target/hppa/helper.h |  1 -
 target/hppa/int_helper.c | 14 --
 target/hppa/translate.c  | 10 +++---
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 1bdbcd8f98..8fd7ba65d8 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -91,7 +91,6 @@ DEF_HELPER_1(rfi, void, env)
 DEF_HELPER_1(rfi_r, void, env)
 DEF_HELPER_FLAGS_2(write_interval_timer, TCG_CALL_NO_RWG, void, env, tl)
 DEF_HELPER_FLAGS_2(write_eirr, TCG_CALL_NO_RWG, void, env, tl)
-DEF_HELPER_FLAGS_2(write_eiem, TCG_CALL_NO_RWG, void, env, tl)
 DEF_HELPER_FLAGS_2(swap_system_mask, TCG_CALL_NO_RWG, tl, env, tl)
 DEF_HELPER_FLAGS_3(itlba_pa11, TCG_CALL_NO_RWG, void, env, tl, tl)
 DEF_HELPER_FLAGS_3(itlbp_pa11, TCG_CALL_NO_RWG, void, env, tl, tl)
diff --git a/target/hppa/int_helper.c b/target/hppa/int_helper.c
index efe638b36e..90437a92cd 100644
--- a/target/hppa/int_helper.c
+++ b/target/hppa/int_helper.c
@@ -28,7 +28,7 @@
 static void eval_interrupt(HPPACPU *cpu)
 {
 CPUState *cs = CPU(cpu);
-if (cpu->env.cr[CR_EIRR] & cpu->env.cr[CR_EIEM]) {
+if (cpu->env.cr[CR_EIRR]) {
 cpu_interrupt(cs, CPU_INTERRUPT_HARD);
 } else {
 cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
@@ -89,14 +89,6 @@ void HELPER(write_eirr)(CPUHPPAState *env, target_ulong val)
 bql_unlock();
 }
 
-void HELPER(write_eiem)(CPUHPPAState *env, target_ulong val)
-{
-env->cr[CR_EIEM] = val;
-bql_lock();
-eval_interrupt(env_archcpu(env));
-bql_unlock();
-}
-
 void hppa_cpu_do_interrupt(CPUState *cs)
 {
 HPPACPU *cpu = HPPA_CPU(cs);
@@ -280,7 +272,9 @@ bool hppa_cpu_exec_interrupt(CPUState *cs, int 
interrupt_request)
 }
 
 /* If interrupts are requested and enabled, raise them.  */
-if ((env->psw & PSW_I) && (interrupt_request & CPU_INTERRUPT_HARD)) {
+if ((interrupt_request & CPU_INTERRUPT_HARD)
+&& (env->psw & PSW_I)
+&& (env->cr[CR_EIRR] & env->cr[CR_EIEM])) {
 cs->exception_index = EXCP_EXT_INTERRUPT;
 hppa_cpu_do_interrupt(cs);
 return true;
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 5b8c1b06c3..46b2d6508d 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2166,10 +2166,10 @@ static bool trans_mtctl(DisasContext *ctx, arg_mtctl *a)
 gen_helper_write_interval_timer(tcg_env, reg);
 break;
 case CR_EIRR:
+/* Helper modifies interrupt lines and is therefore IO. */
+translator_io_start(>base);
 gen_helper_write_eirr(tcg_env, reg);
-break;
-case CR_EIEM:
-gen_helper_write_eiem(tcg_env, reg);
+/* Exit to re-evaluate interrupts in the main loop. */
 ctx->base.is_jmp = DISAS_IAQ_N_STALE_EXIT;
 break;
 
@@ -2195,6 +2195,10 @@ static bool trans_mtctl(DisasContext *ctx, arg_mtctl *a)
 #endif
 break;
 
+case CR_EIEM:
+/* Exit to re-evaluate interrupts in the main loop. */
+ctx->base.is_jmp = DISAS_IAQ_N_STALE_EXIT;
+/* FALLTHRU */
 default:
 tcg_gen_st_i64(reg, tcg_env, offsetof(CPUHPPAState, cr[ctl]));
 break;
-- 
2.34.1

[PULL 2/7] linux-user: Fix shmat() strace

2024-03-29 Thread Richard Henderson

From: Ilya Leoshkevich 

The indices of arguments passed to print_shmat() are all off-by-1,
because arg1 is the ipc() command. Fix them.

New output for linux-shmat-maps test:

3501769 shmat(4784214,0x0080,SHM_RND) = 0

Fixes: 9f7c97324c27 ("linux-user: Add strace for shmat")
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20240325192436.561154-3-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 linux-user/strace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 51a5bdd95f..b4d1098170 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -701,7 +701,7 @@ print_ipc(CPUArchState *cpu_env, const struct syscallname 
*name,
 break;
 case IPCOP_shmat:
 print_shmat(cpu_env, &(const struct syscallname){ .name = "shmat" },
-arg1, arg4, arg2, 0, 0, 0);
+arg2, arg5, arg3, 0, 0, 0);
 break;
 default:
 qemu_log(("%s("
-- 
2.34.1

[PULL 4/7] tests/tcg: Test shmat(NULL)

2024-03-29 Thread Richard Henderson

From: Ilya Leoshkevich 

Add a small test to prevent regressions.

Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20240325192436.561154-5-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 tests/tcg/multiarch/linux/linux-shmat-null.c | 38 
 1 file changed, 38 insertions(+)
 create mode 100644 tests/tcg/multiarch/linux/linux-shmat-null.c

diff --git a/tests/tcg/multiarch/linux/linux-shmat-null.c 
b/tests/tcg/multiarch/linux/linux-shmat-null.c
new file mode 100644
index 00..94eaaec371
--- /dev/null
+++ b/tests/tcg/multiarch/linux/linux-shmat-null.c
@@ -0,0 +1,38 @@
+/*
+ * Test shmat(NULL).
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include 
+#include 
+#include 
+#include 
+
+int main(void)
+{
+int shmid;
+char *p;
+int err;
+
+/* Create, attach and intialize shared memory. */
+shmid = shmget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
+assert(shmid != -1);
+p = shmat(shmid, NULL, 0);
+assert(p != (void *)-1);
+*p = 42;
+
+/* Reattach, check that the value is still there. */
+err = shmdt(p);
+assert(err == 0);
+p = shmat(shmid, NULL, 0);
+assert(p != (void *)-1);
+assert(*p == 42);
+
+/* Detach. */
+err = shmdt(p);
+assert(err == 0);
+err = shmctl(shmid, IPC_RMID, NULL);
+assert(err == 0);
+
+return EXIT_SUCCESS;
+}
-- 
2.34.1

[PULL 6/7] disas: Show opcodes for target_disas and monitor_disas

2024-03-29 Thread Richard Henderson

Fixes: 83b4613ba83 ("disas: introduce show_opcodes")
Signed-off-by: Richard Henderson 
---
 disas/disas-mon.c | 1 +
 disas/disas.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/disas/disas-mon.c b/disas/disas-mon.c
index 48ac492c6c..5d6d9aa02d 100644
--- a/disas/disas-mon.c
+++ b/disas/disas-mon.c
@@ -34,6 +34,7 @@ void monitor_disas(Monitor *mon, CPUState *cpu, uint64_t pc,
 disas_initialize_debug_target(, cpu);
 s.info.fprintf_func = disas_gstring_printf;
 s.info.stream = (FILE *)ds;  /* abuse this slot */
+s.info.show_opcodes = true;
 
 if (is_physical) {
 s.info.read_memory_func = physical_read_memory;
diff --git a/disas/disas.c b/disas/disas.c
index 17170d291e..7e3b0bb46c 100644
--- a/disas/disas.c
+++ b/disas/disas.c
@@ -211,6 +211,7 @@ void target_disas(FILE *out, CPUState *cpu, uint64_t code, 
size_t size)
 s.info.stream = out;
 s.info.buffer_vma = code;
 s.info.buffer_length = size;
+s.info.show_opcodes = true;
 
 if (s.info.cap_arch >= 0 && cap_disas_target(, code, size)) {
 return;
-- 
2.34.1

[PULL 7/7] accel/tcg: Use CPUState.get_pc in cpu_io_recompile

2024-03-29 Thread Richard Henderson

Using log_pc produces the pc at the beginning of TB,
not the actual pc installed by cpu_restore_state_from_tb,
which could be any of the guest instructions within TB.

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c1f57e894a..83cc14fbde 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -634,7 +634,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 cpu->cflags_next_tb = curr_cflags(cpu) | CF_MEMI_ONLY | n;
 
 if (qemu_loglevel_mask(CPU_LOG_EXEC)) {
-vaddr pc = log_pc(cpu, tb);
+vaddr pc = cpu->cc->get_pc(cpu);
 if (qemu_log_in_addr_range(pc)) {
 qemu_log("cpu_io_recompile: rewound execution of TB to %016"
  VADDR_PRIx "\n", pc);
-- 
2.34.1

[PULL 3/7] linux-user: Fix shmat(NULL) for h != g

2024-03-29 Thread Richard Henderson

From: Ilya Leoshkevich 

In the h != g && shmaddr == NULL && !reserved_va case, target_shmat()
incorrectly mmap()s the initial anonymous range with
MAP_FIXED_NOREPLACE, even though the earlier mmap_find_vma() has
already reserved the respective address range.

Fix by using MAP_FIXED when "mapped", which is set after
mmap_find_vma(), is true.

Fixes: 78bc8ed9a8f0 ("linux-user: Rewrite target_shmat")
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20240325192436.561154-4-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 4505fd7376..be3b9a68eb 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -1354,7 +1354,7 @@ abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
 if (h_len != t_len) {
 int mmap_p = PROT_READ | (shmflg & SHM_RDONLY ? 0 : PROT_WRITE);
 int mmap_f = MAP_PRIVATE | MAP_ANONYMOUS
-   | (reserved_va || (shmflg & SHM_REMAP)
+   | (reserved_va || mapped || (shmflg & SHM_REMAP)
   ? MAP_FIXED : MAP_FIXED_NOREPLACE);
 
 test = mmap(want, m_len, mmap_p, mmap_f, -1, 0);
-- 
2.34.1

[PULL 5/7] tcg/optimize: Fix sign_mask for logical right-shift

2024-03-29 Thread Richard Henderson

The 'sign' computation is attempting to locate the sign bit that has
been repeated, so that we can test if that bit is known zero.  That
computation can be zero if there are no known sign repetitions.

Cc: qemu-sta...@nongnu.org
Fixes: 93a967fbb57 ("tcg/optimize: Propagate sign info for shifting")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2248
Signed-off-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tcg/optimize.c|  2 +-
 tests/tcg/aarch64/test-2248.c | 28 
 tests/tcg/aarch64/Makefile.target |  1 +
 3 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/aarch64/test-2248.c

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 752cc5c56b..275db77b42 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2376,7 +2376,7 @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
  * will not reduced the number of input sign repetitions.
  */
 sign = (s_mask & -s_mask) >> 1;
-if (!(z_mask & sign)) {
+if (sign && !(z_mask & sign)) {
 ctx->s_mask = s_mask;
 }
 break;
diff --git a/tests/tcg/aarch64/test-2248.c b/tests/tcg/aarch64/test-2248.c
new file mode 100644
index 00..aac2e17836
--- /dev/null
+++ b/tests/tcg/aarch64/test-2248.c
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2248 */
+
+#include 
+
+__attribute__((noinline))
+long test(long x, long y, long sh)
+{
+long r;
+asm("cmp   %1, %2\n\t"
+"cset  x12, lt\n\t"
+"and   w11, w12, #0xff\n\t"
+"cmp   w11, #0\n\t"
+"csetm x14, ne\n\t"
+"lsr   x13, x14, %3\n\t"
+"sxtb  %0, w13"
+: "=r"(r)
+: "r"(x), "r"(y), "r"(sh)
+: "x11", "x12", "x13", "x14");
+return r;
+}
+
+int main()
+{
+long r = test(0, 1, 2);
+assert(r == -1);
+return 0;
+}
diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index ea3e232e65..0efd565f05 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -10,6 +10,7 @@ VPATH += $(AARCH64_SRC)
 
 # Base architecture tests
 AARCH64_TESTS=fcvt pcalign-a64 lse2-fault
+AARCH64_TESTS += test-2248
 
 fcvt: LDFLAGS+=-lm
 
-- 
2.34.1

[PULL 0/7] tcg + linux-user patch queue

2024-03-29 Thread Richard Henderson

The following changes since commit 5012e522aca161be5c141596c66e5cc6082538a9:

  Update version for v9.0.0-rc1 release (2024-03-26 19:46:55 +)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240329

for you to fetch changes up to dafa0ecc97850c325fe85cd87dc0b536858d171a:

  accel/tcg: Use CPUState.get_pc in cpu_io_recompile (2024-03-29 12:16:00 -1000)


linux-user: Fix shmat(NULL) for host != guest page size
tcg/optimize: Fix sign_mask for logical right-shift
accel/tcg: Use CPUState.get_pc in cpu_io_recompile
disas: Show opcodes for target_disas and monitor_disas


Ilya Leoshkevich (4):
  linux-user: Fix semctl() strace
  linux-user: Fix shmat() strace
  linux-user: Fix shmat(NULL) for h != g
  tests/tcg: Test shmat(NULL)

Richard Henderson (3):
  tcg/optimize: Fix sign_mask for logical right-shift
  disas: Show opcodes for target_disas and monitor_disas
  accel/tcg: Use CPUState.get_pc in cpu_io_recompile

 accel/tcg/translate-all.c|  2 +-
 disas/disas-mon.c|  1 +
 disas/disas.c|  1 +
 linux-user/mmap.c|  2 +-
 linux-user/strace.c  | 10 +++-
 tcg/optimize.c   |  2 +-
 tests/tcg/aarch64/test-2248.c| 28 
 tests/tcg/multiarch/linux/linux-shmat-null.c | 38 
 tests/tcg/aarch64/Makefile.target|  1 +
 9 files changed, 75 insertions(+), 10 deletions(-)
 create mode 100644 tests/tcg/aarch64/test-2248.c
 create mode 100644 tests/tcg/multiarch/linux/linux-shmat-null.c

[PULL 1/7] linux-user: Fix semctl() strace

2024-03-29 Thread Richard Henderson

From: Ilya Leoshkevich 

The indices of arguments used with semctl() are all off-by-1, because
arg1 is the ipc() command. Fix them. While at it, reuse print_semctl().

New output (for a small test program):

3540333 semctl(999,888,SEM_INFO,0x7fe5051ee9a0) = -1 errno=14 (Bad 
address)

Fixes: 7ccfb2eb5f9d ("Fix warnings that would be caused by gcc flag 
-Wwrite-strings")
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20240325192436.561154-2-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 linux-user/strace.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 8d13e55a5b..51a5bdd95f 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -657,7 +657,6 @@ print_newselect(CPUArchState *cpu_env, const struct 
syscallname *name,
 }
 #endif
 
-#ifdef TARGET_NR_semctl
 static void
 print_semctl(CPUArchState *cpu_env, const struct syscallname *name,
  abi_long arg1, abi_long arg2, abi_long arg3,
@@ -668,7 +667,6 @@ print_semctl(CPUArchState *cpu_env, const struct 
syscallname *name,
 print_ipc_cmd(arg3);
 qemu_log(",0x" TARGET_ABI_FMT_lx ")", arg4);
 }
-#endif
 
 static void
 print_shmat(CPUArchState *cpu_env, const struct syscallname *name,
@@ -698,10 +696,8 @@ print_ipc(CPUArchState *cpu_env, const struct syscallname 
*name,
 {
 switch(arg1) {
 case IPCOP_semctl:
-qemu_log("semctl(" TARGET_ABI_FMT_ld "," TARGET_ABI_FMT_ld ",",
- arg1, arg2);
-print_ipc_cmd(arg3);
-qemu_log(",0x" TARGET_ABI_FMT_lx ")", arg4);
+print_semctl(cpu_env, &(const struct syscallname){ .name = "semctl" },
+ arg2, arg3, arg4, arg5, 0, 0);
 break;
 case IPCOP_shmat:
 print_shmat(cpu_env, &(const struct syscallname){ .name = "shmat" },
-- 
2.34.1

Re: [PATCH] linux-user/syscall: xtensa: fix target_msqid_ds and ipc_perm conversion

2024-03-29 Thread Max Filippov

On Fri, Mar 29, 2024 at 5:48 AM Philippe Mathieu-Daudé
 wrote:
>
> Hi Max,
>
> On 29/3/24 07:31, Max Filippov wrote:
> > - target_ipc_perm::mode and target_ipc_perm::__seq fields are 32-bit wide
> >on xtensa and thus need to use tswap32
> > - target_msqid_ds::msg_*time field pairs are reversed on big-endian
> >xtensa
>
> Please split in 2 distinct patches.

Ok.

> >   struct target_msqid_ds
> >   {
> >   struct target_ipc_perm msg_perm;
> > +#if defined(TARGET_XTENSA) && TARGET_BIG_ENDIAN
>
> Why restrict to only Xtensa here?

I have detected and tested it on xtensa.
I see other architectures (mips, parisc, ppc, sparc) that may need
that, but AFAICS it's not that it's applicable for all big endians.

> > +abi_ulong __unused1;
> > +abi_ulong msg_stime;
> > +abi_ulong __unused2;
> > +abi_ulong msg_rtime;
> > +abi_ulong __unused3;
> > +abi_ulong msg_ctime;
> > +#else

-- 
Thanks.
-- Max

Re: [RFC PATCH v2 0/6] cxl: add poison event handler

2024-03-29 Thread Dan Williams

Alison Schofield wrote:
> On Fri, Mar 29, 2024 at 11:22:32AM -0700, Dan Williams wrote:
> > Alison Schofield wrote:
> > [..]
> > > Upon receipt of that new poison list, call memory_failture_queue()
> > > on *any* poison in a mapped space. Is that OK?  Can we call
> > > memory_failure_queue() on any and every poison report that is in
> > > HPA space regardless of whether it first came to us through a GMER?
> > > I'm actually wondering if that is going to be the next ask anyway -
> > > ie report all poison.
> > 
> > memory_failure_queue() should be called on poison creation events. Leave
> > the MF_ACTION_REQUIRED flag not set so that memory_failure() performs
> > "action optional" handling.  So I would expect memory_failure_queue()
> > notification for GMER events, but not on poison list events.
> 
> Seems I totally missed the point of this patch set.
> Is it's only purpose to make sure that poison that is injected gets
> reported to memory_failure?

Clarify terms, "poison injection" to me is a debug-only event to test
that poison handling is working, "poison creation" is an event where new
poison was encountered by CPU consumption, deposited by a
DMA-with-poison transaction, or discovered by a background scrub
operation.

> 
> So this single patch only:
> 1. Poison inject leads to this GMER/CXL_EVENT_TRANSACTION_INJECT_POISON 

Inject is a special case. Likely this should copy the PMEM legacy where
notifying memory_failure() on injected poison is optional:

"ndctl inject-error --no-notify"

> 2. Driver sees GMER/CXL_EVENT_TRANSACTION_INJECT_POISON and reads poison
> list to get accurate length.

Again, inject is the least interesting for the common case, production
kernels care about "Media ECC Error" and "Scrub Media ECC Error"
regardless of transaction type.

> 3. Driver reports that to memory_failure_queue()
> 
> Still expect there's some code sharing opportunities and I still wonder
> about what is next in this area.

One area this needs to be careful is in unifying the OS-first and
FW-first paths. In the FW-first case the platform can trigger
memory_failure() along with the GMER by just posting a memory failure
CPER record. So there is a risk that things get doubly reported if the
GMER handling code blindly triggers memory_failure(). Might be benign,
but probably best avoided.

Re: Backdoor in xz, should we switch compression format for tarballs?

2024-03-29 Thread Alex Bennée

Also does qemu link to libarchive? The original analysis wasn't a full
reverse engineer of the payload so we don't know if it only affects sshd.

On Sat, 30 Mar 2024, 07:01 Daniel P. Berrangé,  wrote:

> On Fri, Mar 29, 2024 at 06:59:30PM +0100, Paolo Bonzini wrote:
> > For more info, see
> >
> https://lwn.net/ml/oss-security/20240329155126.kjjfduxw2yrlx...@awork3.anarazel.de/
> > but, essentially, xz was backdoored and it seems like upstream was
> directly
> > responsible for this.
> >
> > Based on this, should we switch our distribution from bz2+xz to bz2+zstd
> or
> > bz2+lzip?
>
> Based on the attack vector of pre-loading git with an exploit, but then
> modifying the tarball to activate it, there's a bigger question of whether
> users should really trust manually created tarballs at all ? You don't
> anything about either the tarball creator, or the state of creators'
> machine,
> even if it is signed. How can you trust that its contents is a faithful
> representation of the tagged release from git it claims to be?
>
> This issue could prompt a push towards distros only handling tarballs
> directly auto-generated from a git tag, in a reliably reproducible manner.
>
> Obviously you couldn't actually trust the upstream maintainer in this
> case, but at least if you're using a reproducible git tarball you can
> verify every link in the chain right through each git commit, and don't
> have this manual tarball whose contents need to be to picked apart.
>
> TL;DR; I think we should consider our tarball distribution options, but
> lets wait for the dust to settle and not rush into decisions.
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>
>

Re: Backdoor in xz, should we switch compression format for tarballs?

2024-03-29 Thread Daniel P . Berrangé

On Fri, Mar 29, 2024 at 06:59:30PM +0100, Paolo Bonzini wrote:
> For more info, see
> https://lwn.net/ml/oss-security/20240329155126.kjjfduxw2yrlx...@awork3.anarazel.de/
> but, essentially, xz was backdoored and it seems like upstream was directly
> responsible for this.
> 
> Based on this, should we switch our distribution from bz2+xz to bz2+zstd or
> bz2+lzip?

Based on the attack vector of pre-loading git with an exploit, but then
modifying the tarball to activate it, there's a bigger question of whether
users should really trust manually created tarballs at all ? You don't
anything about either the tarball creator, or the state of creators' machine,
even if it is signed. How can you trust that its contents is a faithful
representation of the tagged release from git it claims to be?

This issue could prompt a push towards distros only handling tarballs
directly auto-generated from a git tag, in a reliably reproducible manner.

Obviously you couldn't actually trust the upstream maintainer in this
case, but at least if you're using a reproducible git tarball you can
verify every link in the chain right through each git commit, and don't
have this manual tarball whose contents need to be to picked apart.

TL;DR; I think we should consider our tarball distribution options, but
lets wait for the dust to settle and not rush into decisions.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-03-29 Thread Daniel P . Berrangé

On Fri, Mar 29, 2024 at 11:28:54AM +0100, Philippe Mathieu-Daudé wrote:
> Hi Zhijian,
> 
> On 29/3/24 02:53, Zhijian Li (Fujitsu) wrote:
> > 
> > 
> > On 28/03/2024 23:01, Peter Xu wrote:
> > > On Thu, Mar 28, 2024 at 11:18:04AM -0300, Fabiano Rosas wrote:
> > > > Philippe Mathieu-Daudé  writes:
> > > > 
> > > > > The whole RDMA subsystem was deprecated in commit e9a54265f5
> > > > > ("hw/rdma: Deprecate the pvrdma device and the rdma subsystem")
> > > > > released in v8.2.
> > > > > 
> > > > > Remove:
> > > > >- RDMA handling from migration
> > > > >- dependencies on libibumad, libibverbs and librdmacm
> > > > > 
> > > > > Keep the RAM_SAVE_FLAG_HOOK definition since it might appears
> > > > > in old migration streams.
> > > > > 
> > > > > Cc: Peter Xu 
> > > > > Cc: Li Zhijian 
> > > > > Acked-by: Fabiano Rosas 
> > > > > Signed-off-by: Philippe Mathieu-Daudé 
> > > > 
> > > > Just to be clear, because people raised the point in the last version,
> > > > the first link in the deprecation commit links to a thread comprising
> > > > entirely of rdma migration patches. I don't see any ambiguity on whether
> > > > the deprecation was intended to include migration. There's even an ack
> > > > from Juan.
> > > 
> > > Yes I remember that's the plan.
> > > 
> > > > 
> > > > So on the basis of not reverting the previous maintainer's decision, my
> > > > Ack stands here.
> > > > 
> > > > We also had pretty obvious bugs ([1], [2]) in the past that would have
> > > > been caught if we had any kind of testing for the feature, so I can't
> > > > even say this thing works currently.
> > > > 
> > > > @Peter Xu, @Li Zhijian, what are your thoughts on this?
> > > 
> > > Generally I definitely agree with such a removal sooner or later, as 
> > > that's
> > > how deprecation works, and even after Juan's left I'm not aware of any
> > > other new RDMA users.  Personally, I'd slightly prefer postponing it one
> > > more release which might help a bit of our downstream maintenance, however
> > > I assume that's not a blocker either, as I think we can also manage it.
> > > 
> > > IMHO it's more important to know whether there are still users and whether
> > > they would still like to see it around. That's also one thing I notice 
> > > that
> > > e9a54265f533f didn't yet get acks from RDMA users that we are aware, even
> > > if they're rare. According to [2] it could be that such user may only rely
> > > on the release versions of QEMU when it broke things.
> > > 
> > > So I'm copying Yu too (while Zhijian is already in the loop), just in case
> > > someone would like to stand up and speak.
> > 
> > 
> > I admit RDMA migration was lack of testing(unit/CI test), which led to the 
> > a few
> > obvious bugs being noticed too late.
> > However I was a bit surprised when I saw the removal of the RDMA migration. 
> > I wasn't
> > aware that this feature has not been marked as deprecated(at least there is 
> > no
> > prompt to end-user).
> > 
> > 
> > > IMHO it's more important to know whether there are still users and whether
> > > they would still like to see it around.
> > 
> > Agree.
> > I didn't immediately express my opinion in V1 because I'm also consulting 
> > our
> > customers for this feature in the future.
> > 
> > Personally, I agree with Perter's idea that "I'd slightly prefer postponing 
> > it one
> > more release which might help a bit of our downstream maintenance"
> 
> Do you mind posting a deprecation patch to clarify the situation?

The key thing the first deprecation patch missed was that it failed
to issue a warning message when RDMA migration was actually used.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC PATCH v2 0/6] cxl: add poison event handler

2024-03-29 Thread Alison Schofield

On Fri, Mar 29, 2024 at 11:22:32AM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> [..]
> > Upon receipt of that new poison list, call memory_failture_queue()
> > on *any* poison in a mapped space. Is that OK?  Can we call
> > memory_failure_queue() on any and every poison report that is in
> > HPA space regardless of whether it first came to us through a GMER?
> > I'm actually wondering if that is going to be the next ask anyway -
> > ie report all poison.
> 
> memory_failure_queue() should be called on poison creation events. Leave
> the MF_ACTION_REQUIRED flag not set so that memory_failure() performs
> "action optional" handling.  So I would expect memory_failure_queue()
> notification for GMER events, but not on poison list events.

Seems I totally missed the point of this patch set.
Is it's only purpose to make sure that poison that is injected gets
reported to memory_failure?

So this single patch only:
1. Poison inject leads to this GMER/CXL_EVENT_TRANSACTION_INJECT_POISON 
2. Driver sees GMER/CXL_EVENT_TRANSACTION_INJECT_POISON and reads poison
list to get accurate length.
3. Driver reports that to memory_failure_queue()

Still expect there's some code sharing opportunities and I still wonder
about what is next in this area.

--Alison

Re: [PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()

2024-03-29 Thread Vladimir Sementsov-Ogievskiy


On 29.03.24 13:56, Cédric Le Goater wrote:

This allows to report more precise errors in the migration handler
dirty_bitmap_save_setup().

Suggested-by Vladimir Sementsov-Ogievskiy  
Signed-off-by: Cédric Le Goater


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

[PATCH v3 4/5] qapi: introduce device-sync-config

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

Add command to sync config from vhost-user backend to the device. It
may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
triggered interrupt to the guest or just not available (not supported
by vhost-user server).

Command result is racy if allow it during migration. Let's allow the
sync only in RUNNING state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/block/vhost-user-blk.c | 27 --
 hw/virtio/virtio-pci.c|  9 
 include/hw/qdev-core.h|  3 +++
 include/sysemu/runstate.h |  1 +
 qapi/qdev.json| 21 +
 system/qdev-monitor.c | 47 +++
 system/runstate.c |  5 +
 7 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..2f301f380c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 s->blkcfg.wce = blkcfg->wce;
 }
 
+static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp)
+{
+int ret;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,
+   vdev->config_len, errp);
+if (ret < 0) {
+return ret;
+}
+
+memcpy(vdev->config, >blkcfg, vdev->config_len);
+virtio_notify_config(vdev);
+
+return 0;
+}
+
 static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
 {
 int ret;
-VirtIODevice *vdev = dev->vdev;
-VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
 
 if (!dev->started) {
 return 0;
 }
 
-ret = vhost_dev_get_config(dev, (uint8_t *)>blkcfg,
-   vdev->config_len, _err);
+ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
 }
 
-memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
-virtio_notify_config(dev->vdev);
-
 return 0;
 }
 
@@ -576,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, 
void *data)
 
 device_class_set_props(dc, vhost_user_blk_properties);
 dc->vmsd = _vhost_user_blk;
+dc->sync_config = vhost_user_blk_sync_config;
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 vdc->realize = vhost_user_blk_device_realize;
 vdc->unrealize = vhost_user_blk_device_unrealize;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eaaf86402c..92afbae71c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2501,6 +2501,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, 
Error **errp)
 vpciklass->parent_dc_realize(qdev, errp);
 }
 
+static int virtio_pci_sync_config(DeviceState *dev, Error **errp)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(dev);
+VirtIODevice *vdev = virtio_bus_get_device(>bus);
+
+return qdev_sync_config(DEVICE(vdev), errp);
+}
+
 static void virtio_pci_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2517,6 +2525,7 @@ static void virtio_pci_class_init(ObjectClass *klass, 
void *data)
 device_class_set_parent_realize(dc, virtio_pci_dc_realize,
 >parent_dc_realize);
 rc->phases.hold = virtio_pci_bus_reset_hold;
+dc->sync_config = virtio_pci_sync_config;
 }
 
 static const TypeInfo virtio_pci_info = {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87..87135bdcdf 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev);
 typedef void (*DeviceReset)(DeviceState *dev);
 typedef void (*BusRealize)(BusState *bus, Error **errp);
 typedef void (*BusUnrealize)(BusState *bus);
+typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
 
 /**
  * struct DeviceClass - The base class for all devices.
@@ -162,6 +163,7 @@ struct DeviceClass {
 DeviceReset reset;
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+DeviceSyncConfig sync_config;
 
 /**
  * @vmsd: device state serialisation description for
@@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp);
  */
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
+int qdev_sync_config(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp);
 void qdev_machine_creation_done(void);
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d243c4..296af52322 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -5,6 +5,7 @@
 #include "qemu/notify.h"
 
 bool runstate_check(RunState state);
+const char

[PATCH v3 5/5] qapi: introduce CONFIG_READ event

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

Send a new event when guest reads virtio-pci config after
virtio_notify_config() call.

That's useful to check that guest fetched modified config, for example
after resizing disk backend.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/virtio/virtio-pci.c |  9 +
 include/monitor/qdev.h |  2 ++
 monitor/monitor.c  |  1 +
 qapi/qdev.json | 33 +
 stubs/qdev.c   |  6 ++
 system/qdev-monitor.c  |  6 ++
 6 files changed, 57 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 92afbae71c..c0c158dae2 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -23,6 +23,7 @@
 #include "hw/boards.h"
 #include "hw/virtio/virtio.h"
 #include "migration/qemu-file-types.h"
+#include "monitor/qdev.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -530,6 +531,10 @@ static uint64_t virtio_pci_config_read(void *opaque, 
hwaddr addr,
 }
 addr -= config;
 
+if (vdev->generation > 0) {
+qdev_virtio_config_read_event(DEVICE(proxy));
+}
+
 switch (size) {
 case 1:
 val = virtio_config_readb(vdev, addr);
@@ -1884,6 +1889,10 @@ static uint64_t virtio_pci_device_read(void *opaque, 
hwaddr addr,
 return UINT64_MAX;
 }
 
+if (vdev->generation > 0) {
+qdev_virtio_config_read_event(DEVICE(proxy));
+}
+
 switch (size) {
 case 1:
 val = virtio_config_modern_readb(vdev, addr);
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 1d57bf6577..fc9a834dca 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -36,4 +36,6 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
  */
 const char *qdev_set_id(DeviceState *dev, char *id, Error **errp);
 
+void qdev_virtio_config_read_event(DeviceState *dev);
+
 #endif
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 01ede1babd..5b06146503 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -316,6 +316,7 @@ static MonitorQAPIEventConf 
monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
 [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
 [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
 [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
+[QAPI_EVENT_VIRTIO_CONFIG_READ] = { 300 * SCALE_MS },
 };
 
 /*
diff --git a/qapi/qdev.json b/qapi/qdev.json
index e8be79c3d5..29a4f47360 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -182,3 +182,36 @@
 { 'command': 'device-sync-config',
   'features': [ 'unstable' ],
   'data': {'id': 'str'} }
+
+##
+# @VIRTIO_CONFIG_READ:
+#
+# Emitted whenever guest reads virtio device configuration after
+# configuration change.
+#
+# The event may be used in pair with device-sync-config. It shows
+# that guest has re-read updated configuration. It doesn't
+# guarantee that guest successfully handled it and updated the
+# view of the device for the user, but still it's a kind of
+# success indicator.
+#
+# @device: device name
+#
+# @path: device path
+#
+# Features:
+#
+# @unstable: The event is experimental.
+#
+# Since: 9.1
+#
+# Example:
+#
+# <- { "event": "VIRTIO_CONFIG_READ",
+#  "data": { "device": "virtio-net-pci-0",
+#"path": "/machine/peripheral/virtio-net-pci-0" },
+#  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+##
+{ 'event': 'VIRTIO_CONFIG_READ',
+  'features': [ 'unstable' ],
+  'data': { '*device': 'str', 'path': 'str' } }
diff --git a/stubs/qdev.c b/stubs/qdev.c
index 6869f6f90a..ab6c4afe0b 100644
--- a/stubs/qdev.c
+++ b/stubs/qdev.c
@@ -26,3 +26,9 @@ void qapi_event_send_device_unplug_guest_error(const char 
*device,
 {
 /* Nothing to do. */
 }
+
+void qapi_event_send_virtio_config_read(const char *device,
+const char *path)
+{
+/* Nothing to do. */
+}
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index cb35ea0b86..8a2ca77fde 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -26,6 +26,7 @@
 #include "sysemu/runstate.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-qdev.h"
+#include "qapi/qapi-events-qdev.h"
 #include "qapi/qmp/dispatch.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
@@ -1206,3 +1207,8 @@ bool qmp_command_available(const QmpCommand *cmd, Error 
**errp)
 }
 return true;
 }
+
+void qdev_virtio_config_read_event(DeviceState *dev)
+{
+qapi_event_send_virtio_config_read(dev->id, dev->canonical_path);
+}
-- 
2.34.1

[PATCH v3 0/5] vhost-user-blk: live resize additional APIs

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

v3:
02: add r-b by Markus
03: improve commit message
04: improve documentation, merge race-fix here (which was v2:05),
rebase on master (migration_is_running() now without arguments)
05: improve documentation

Vladimir Sementsov-Ogievskiy (5):
  vhost-user-blk: simplify and fix vhost_user_blk_handle_config_change
  qdev-monitor: fix error message in find_device_state()
  qdev-monitor: add option to report GenericError from find_device_state
  qapi: introduce device-sync-config
  qapi: introduce CONFIG_READ event

 hw/block/vhost-user-blk.c | 32 +++---
 hw/virtio/virtio-pci.c| 18 ++
 include/hw/qdev-core.h|  3 ++
 include/monitor/qdev.h|  2 ++
 include/sysemu/runstate.h |  1 +
 monitor/monitor.c |  1 +
 qapi/qdev.json| 54 ++
 stubs/qdev.c  |  6 
 system/qdev-monitor.c | 70 ---
 system/runstate.c |  5 +++
 10 files changed, 175 insertions(+), 17 deletions(-)

-- 
2.34.1

[PATCH v3 1/5] vhost-user-blk: simplify and fix vhost_user_blk_handle_config_change

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

Let's not care about what was changed and update the whole config,
reasons:

1. config->geometry should be updated together with capacity, so we fix
   a bug.

2. Vhost-user protocol doesn't say anything about config change
   limitation. Silent ignore of changes doesn't seem to be correct.

3. vhost-user-vsock reads the whole config

4. on realize we don't do any checks on retrieved config, so no reason
   to care here

Comment "valid for resize only" exists since introduction the whole
hw/block/vhost-user-blk.c in commit
   00343e4b54ba0685e9ebe928ec5713b0cf7f1d1c
"vhost-user-blk: introduce a new vhost-user-blk host device",
seems it was just an extra limitation.

Also, let's notify guest unconditionally:

1. So does vhost-user-vsock

2. We are going to reuse the functionality in new cases when we do want
   to notify the guest unconditionally. So, no reason to create extra
   branches in the logic.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Acked-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..9e6bbc6950 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -91,7 +91,6 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
 {
 int ret;
-struct virtio_blk_config blkcfg;
 VirtIODevice *vdev = dev->vdev;
 VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
@@ -100,19 +99,15 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 return 0;
 }
 
-ret = vhost_dev_get_config(dev, (uint8_t *),
+ret = vhost_dev_get_config(dev, (uint8_t *)>blkcfg,
vdev->config_len, _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
 }
 
-/* valid for resize only */
-if (blkcfg.capacity != s->blkcfg.capacity) {
-s->blkcfg.capacity = blkcfg.capacity;
-memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
-virtio_notify_config(dev->vdev);
-}
+memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
+virtio_notify_config(dev->vdev);
 
 return 0;
 }
-- 
2.34.1

[PATCH v3 2/5] qdev-monitor: fix error message in find_device_state()

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

This "hotpluggable" here is misleading. Actually we check is object a
device or not. Let's drop the word.

Suggested-by: Markus Armbruster 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Markus Armbruster 
---
 system/qdev-monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index c1243891c3..840177d19f 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -891,7 +891,7 @@ static DeviceState *find_device_state(const char *id, Error 
**errp)
 
 dev = (DeviceState *)object_dynamic_cast(obj, TYPE_DEVICE);
 if (!dev) {
-error_setg(errp, "%s is not a hotpluggable device", id);
+error_setg(errp, "%s is not a device", id);
 return NULL;
 }
 
-- 
2.34.1

[PATCH v3 3/5] qdev-monitor: add option to report GenericError from find_device_state

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

Here we just prepare for the following patch, making possible to report
GenericError as recommended.

This patch doesn't aim to prevent further use of DeviceNotFound by
future interfaces:

 - find_device_state() is used in blk_by_qdev_id() and qmp_get_blk()
   functions, which may lead to spread of DeviceNotFound anyway
 - also, nothing prevent simply copy-pasting find_device_state() calls
   with false argument

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 system/qdev-monitor.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 840177d19f..7e075d91c1 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -878,13 +878,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, 
Error **errp)
 object_unref(OBJECT(dev));
 }
 
-static DeviceState *find_device_state(const char *id, Error **errp)
+/*
+ * Note that creating new APIs using error classes other than GenericError is
+ * not recommended. Set use_generic_error=true for new interfaces.
+ */
+static DeviceState *find_device_state(const char *id, bool use_generic_error,
+  Error **errp)
 {
 Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
 DeviceState *dev;
 
 if (!obj) {
-error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+error_set(errp,
+  (use_generic_error ?
+   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
   "Device '%s' not found", id);
 return NULL;
 }
@@ -948,7 +955,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 
 void qmp_device_del(const char *id, Error **errp)
 {
-DeviceState *dev = find_device_state(id, errp);
+DeviceState *dev = find_device_state(id, false, errp);
 if (dev != NULL) {
 if (dev->pending_deleted_event &&
 (dev->pending_deleted_expires_ms == 0 ||
@@ -1068,7 +1075,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
 
 GLOBAL_STATE_CODE();
 
-dev = find_device_state(id, errp);
+dev = find_device_state(id, false, errp);
 if (dev == NULL) {
 return NULL;
 }
-- 
2.34.1

Re: Backdoor in xz, should we switch compression format for tarballs?

2024-03-29 Thread Alex Bennée

Um maybe?

>From what I've read so far it doesn't seem the format is compromised but it
certainly seems like a concerted attempt to subvert an upstream. However a
knee-jerk jump to another format might be premature without carefully
considering if other upstreams have been targeted.

I guess zstd is overseen by Facebook but it's still a mostly single
contributor repo. Lzip's history directly ties to the original author of xz
and we haven't heard from them yet.

We should certainly keep an eye on the situation but let's not be too hasty.

On Sat, 30 Mar 2024, 05:00 Paolo Bonzini,  wrote:

> For more info, see
> https://lwn.net/ml/oss-security/20240329155126.kjjfduxw2yrlx...@awork3.anarazel.de/
> but, essentially, xz was backdoored and it seems like upstream was directly
> responsible for this.
>
> Based on this, should we switch our distribution from bz2+xz to bz2+zstd
> or bz2+lzip?
>
> Thanks,
>
> Paolo
>

Re: [RFC PATCH v2 6/6] cxl/core: add poison injection event handler

2024-03-29 Thread Alison Schofield

On Fri, Mar 29, 2024 at 02:36:14PM +0800, Shiyang Ruan wrote:
> Currently driver only traces cxl events, poison injection (for both vmem
> and pmem type) on cxl memdev is silent.  OS needs to be notified then it
> could handle poison range in time.  Per CXL spec, the device error event
> could be signaled through FW-First and OS-First methods.
> 
> So, add poison event handler in OS-First method:
>   - qemu:
> - CXL device report POISON event to OS by MSI by sending GMER after
>   injecting a poison record
>   - CXL driver
> a. parse the POISON event from GMER;<-- this patch
> b. retrieve POISON list from memdev;
> c. translate poisoned DPA to HPA;
> d. enqueue poisoned PFN to memory_failure's work queue;
> 
> Signed-off-by: Shiyang Ruan 
> ---
> 
> the reply to Jonathan's comment in last version:
> > I'm not 100% convinced this is necessary poison causing.  Also
> > the text tells us we should see 'an appropriate event'.
> > DRAM one seems likely to be chosen by some vendors.
> I think it's right to use DRAM Event Record for volatile-memdev, but 
> should poison on a persistent-memdev also use DRAM Event Record too? 
> Though its 'Physical Address' feild has the 'Volatile' bit too, which is 
> same as General Media Event Record.  I am a bit confused about this.


Similar thought as shared in cover letter -
Can the driver trigger new poison list read on any events of interest,
and implement a policy to report mem failures on all poison list reads
that hit mapped addresses?

I guess if the answer to that question is NO - we only report memory
failures on GMER/poison, can we find more synergy and not repeat so 
much work.

--Alison

> 
> ---
>  drivers/cxl/core/mbox.c | 100 ++--
>  drivers/cxl/cxlmem.h|   8 ++--
>  2 files changed, 91 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 19b46fb06ed6..97ef45d808b8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -837,25 +837,99 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - enum cxl_event_type event_type,
> - const uuid_t *uuid, union cxl_event *evt)
> +struct cxl_event_poison_context {
> + u64 dpa;
> + u64 length;
> +};
> +
> +static int __cxl_report_poison(struct device *dev, void *arg)
> +{
> + struct cxl_event_poison_context *ctx = arg;
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_memdev *cxlmd;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + if (!cxled || !cxled->dpa_res || !resource_size(cxled->dpa_res))
> + return 0;
> +
> + if (cxled->mode == CXL_DECODER_MIXED) {
> + dev_dbg(dev, "poison list read unsupported in mixed mode\n");
> + return 0;
> + }
> +
> + if (ctx->dpa > cxled->dpa_res->end || ctx->dpa < cxled->dpa_res->start)
> + return 0;
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxl_mem_get_poison(cxlmd, ctx->dpa, ctx->length, cxled->cxld.region,
> +true);
> +
> + return 1;
> +}
> +
> +static void cxl_event_handle_poison(struct cxl_memdev *cxlmd,
> + struct cxl_event_gen_media *rec)
> +{
> + struct cxl_port *port = cxlmd->endpoint;
> + u64 phys_addr = le64_to_cpu(rec->phys_addr);
> + struct cxl_event_poison_context ctx = {
> + .dpa = phys_addr & CXL_DPA_MASK,
> + };
> +
> + /* No regions mapped to this memdev, that is to say no HPA is mapped */
> + if (!port || !is_cxl_endpoint(port) ||
> + cxl_num_decoders_committed(port) == 0)
> + return;
> +
> + /*
> +  * Host Inject Poison may have a range of DPA, but the GMER only has
> +  * "Physical Address" field, no such one indicates length.  So it's
> +  * better to call cxl_mem_get_poison() to find this poison record.
> +  */
> + ctx.length = phys_addr & CXL_DPA_VOLATILE ?
> + resource_size(>cxlds->ram_res) :
> + resource_size(>cxlds->pmem_res) - ctx.dpa;
> +
> + device_for_each_child(>dev, , __cxl_report_poison);
> +}
> +
> +static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
> +enum cxl_event_log_type type,
> +struct cxl_event_gen_media *rec)
> +{
> + if (type == CXL_EVENT_TYPE_FAIL) {
> + switch (rec->transaction_type) {
> + case CXL_EVENT_TRANSACTION_READ:
> + case CXL_EVENT_TRANSACTION_WRITE:
> + case CXL_EVENT_TRANSACTION_INJECT_POISON:
> + cxl_event_handle_poison(cxlmd, rec);
> + break;
> + default:
> +

Re: [RFC PATCH v2 0/6] cxl: add poison event handler

2024-03-29 Thread Dan Williams

Alison Schofield wrote:
[..]
> Upon receipt of that new poison list, call memory_failture_queue()
> on *any* poison in a mapped space. Is that OK?  Can we call
> memory_failure_queue() on any and every poison report that is in
> HPA space regardless of whether it first came to us through a GMER?
> I'm actually wondering if that is going to be the next ask anyway -
> ie report all poison.

memory_failure_queue() should be called on poison creation events. Leave
the MF_ACTION_REQUIRED flag not set so that memory_failure() performs
"action optional" handling.  So I would expect memory_failure_queue()
notification for GMER events, but not on poison list events.

Re: [RFC PATCH v2 4/6] cxl/core: report poison when injecting from debugfs

2024-03-29 Thread Alison Schofield

On Fri, Mar 29, 2024 at 02:36:12PM +0800, Shiyang Ruan wrote:
> Poison injection from debugfs is silent too.  Add calling
> cxl_mem_report_poison() to make it able to do memory_failure().

Curious as to why it is silent? Will a GMER poison event occur
and trigger the path to report it via memory_failure?

> 
> Signed-off-by: Shiyang Ruan 
> ---
>  drivers/cxl/core/memdev.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index e976141ca4a9..b0dcbe6f1004 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -366,6 +366,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>   .length = cpu_to_le32(1),
>   };
>   trace_cxl_poison(cxlmd, cxlr, , 0, 0, CXL_POISON_TRACE_INJECT);
> + cxl_mem_report_poison(cxlmd, cxlr, );
>  out:
>   up_read(_dpa_rwsem);
>   up_read(_region_rwsem);
> -- 
> 2.34.1
> 
>

Backdoor in xz, should we switch compression format for tarballs?

2024-03-29 Thread Paolo Bonzini

For more info, see
https://lwn.net/ml/oss-security/20240329155126.kjjfduxw2yrlx...@awork3.anarazel.de/
but, essentially, xz was backdoored and it seems like upstream was directly
responsible for this.

Based on this, should we switch our distribution from bz2+xz to bz2+zstd or
bz2+lzip?

Thanks,

Paolo

Re: [PATCH 1/1] e1000: Get debug flags from an environment variable

2024-03-29 Thread Richard Henderson


On 3/29/24 05:04, Don Porter wrote:

From: Austin Clements 

The E1000 debug messages are very useful for developing drivers, so
this introduces an E1000_DEBUG environment variable that lets the
debug flags be set without recompiling QEMU.

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 

...

-/* #define E1000_DEBUG */
-
-#ifdef E1000_DEBUG
  enum {
  DEBUG_GENERAL,  DEBUG_IO,   DEBUG_MMIO, DEBUG_INTERRUPT,
  DEBUG_RX,   DEBUG_TX,   DEBUG_MDIC, DEBUG_EEPROM,
  DEBUG_UNKNOWN,  DEBUG_TXSUM,DEBUG_TXERR,DEBUG_RXERR,
  DEBUG_RXFILTER, DEBUG_PHY,  DEBUG_NOTYET,
  };
+
+static const char *debugnames[] = {
+"GENERAL",  "IO",   "MMIO", "INTERRUPT",
+"RX",   "TX",   "MDIC", "EEPROM",
+"UNKNOWN",  "TXSUM","TXERR","RXERR",
+"RXFILTER", "PHY",  "NOTYET",   NULL
+};
  #define DBGBIT(x)(1<

These should be converted to tracepoints.
See docs/devel/tracing.rst.


r~

[PATCH v3 3/5] hw/char/stm32l4x5_usart: Add options for serial parameters setting

2024-03-29 Thread Arnaud Minier

Add a function to change the settings of the
serial connection.

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
Reviewed-by: Peter Maydell 
---
 hw/char/stm32l4x5_usart.c | 98 +++
 hw/char/trace-events  |  1 +
 2 files changed, 99 insertions(+)

diff --git a/hw/char/stm32l4x5_usart.c b/hw/char/stm32l4x5_usart.c
index b948a89c13..0f5ab7e0cb 100644
--- a/hw/char/stm32l4x5_usart.c
+++ b/hw/char/stm32l4x5_usart.c
@@ -268,6 +268,92 @@ static void usart_cancel_transmit(Stm32l4x5UsartBaseState 
*s)
 }
 }
 
+static void stm32l4x5_update_params(Stm32l4x5UsartBaseState *s)
+{
+int speed, parity, data_bits, stop_bits;
+uint32_t value, usart_div;
+QEMUSerialSetParams ssp;
+
+/* Select the parity type */
+if (s->cr1 & R_CR1_PCE_MASK) {
+if (s->cr1 & R_CR1_PS_MASK) {
+parity = 'O';
+} else {
+parity = 'E';
+}
+} else {
+parity = 'N';
+}
+
+/* Select the number of stop bits */
+switch (FIELD_EX32(s->cr2, CR2, STOP)) {
+case 0:
+stop_bits = 1;
+break;
+case 2:
+stop_bits = 2;
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+"UNIMPLEMENTED: fractionnal stop bits; CR2[13:12] = %u",
+FIELD_EX32(s->cr2, CR2, STOP));
+return;
+}
+
+/* Select the length of the word */
+switch ((FIELD_EX32(s->cr1, CR1, M1) << 1) | FIELD_EX32(s->cr1, CR1, M0)) {
+case 0:
+data_bits = 8;
+break;
+case 1:
+data_bits = 9;
+break;
+case 2:
+data_bits = 7;
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+"UNDEFINED: invalid word length, CR1.M = 0b11");
+return;
+}
+
+/* Select the baud rate */
+value = FIELD_EX32(s->brr, BRR, BRR);
+if (value < 16) {
+qemu_log_mask(LOG_GUEST_ERROR,
+"UNDEFINED: BRR less than 16: %u", value);
+return;
+}
+
+if (FIELD_EX32(s->cr1, CR1, OVER8) == 0) {
+/*
+ * Oversampling by 16
+ * BRR = USARTDIV
+ */
+usart_div = value;
+} else {
+/*
+ * Oversampling by 8
+ * - BRR[2:0] = USARTDIV[3:0] shifted 1 bit to the right.
+ * - BRR[3] must be kept cleared.
+ * - BRR[15:4] = USARTDIV[15:4]
+ * - The frequency is multiplied by 2
+ */
+usart_div = ((value & 0xFFF0) | ((value & 0x0007) << 1)) / 2;
+}
+
+speed = clock_get_hz(s->clk) / usart_div;
+
+ssp.speed = speed;
+ssp.parity= parity;
+ssp.data_bits = data_bits;
+ssp.stop_bits = stop_bits;
+
+qemu_chr_fe_ioctl(>chr, CHR_IOCTL_SERIAL_SET_PARAMS, );
+
+trace_stm32l4x5_usart_update_params(speed, parity, data_bits, stop_bits);
+}
+
 static void stm32l4x5_usart_base_reset_hold(Object *obj)
 {
 Stm32l4x5UsartBaseState *s = STM32L4X5_USART_BASE(obj);
@@ -366,16 +452,19 @@ static void stm32l4x5_usart_base_write(void *opaque, 
hwaddr addr,
 switch (addr) {
 case A_CR1:
 s->cr1 = value;
+stm32l4x5_update_params(s);
 stm32l4x5_update_irq(s);
 return;
 case A_CR2:
 s->cr2 = value;
+stm32l4x5_update_params(s);
 return;
 case A_CR3:
 s->cr3 = value;
 return;
 case A_BRR:
 s->brr = value;
+stm32l4x5_update_params(s);
 return;
 case A_GTPR:
 s->gtpr = value;
@@ -444,10 +533,19 @@ static void stm32l4x5_usart_base_init(Object *obj)
 s->clk = qdev_init_clock_in(DEVICE(s), "clk", NULL, s, 0);
 }
 
+static int stm32l4x5_usart_base_post_load(void *opaque, int version_id)
+{
+Stm32l4x5UsartBaseState *s = (Stm32l4x5UsartBaseState *)opaque;
+
+stm32l4x5_update_params(s);
+return 0;
+}
+
 static const VMStateDescription vmstate_stm32l4x5_usart_base = {
 .name = TYPE_STM32L4X5_USART_BASE,
 .version_id = 1,
 .minimum_version_id = 1,
+.post_load = stm32l4x5_usart_base_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(cr1, Stm32l4x5UsartBaseState),
 VMSTATE_UINT32(cr2, Stm32l4x5UsartBaseState),
diff --git a/hw/char/trace-events b/hw/char/trace-events
index f22f0ee2bc..8875758076 100644
--- a/hw/char/trace-events
+++ b/hw/char/trace-events
@@ -116,6 +116,7 @@ stm32l4x5_usart_irq_raised(uint32_t reg) "USART: IRQ 
raised: 0x%08"PRIx32
 stm32l4x5_usart_irq_lowered(void) "USART: IRQ lowered"
 stm32l4x5_usart_overrun_detected(uint8_t current, uint8_t received) "USART: 
Overrun detected, RDR='0x%x', received 0x%x"
 stm32l4x5_usart_receiver_not_enabled(uint8_t ue_bit, uint8_t re_bit) "USART: 
Receiver not enabled, UE=0x%x, RE=0x%x"
+stm32l4x5_usart_update_params(int speed, uint8_t parity, int data, int stop) 
"USART: speed: %d, parity: %c, data bits: %d, stop bits: %d"
 
 # xen_console.c
 xen_console_connect(unsigned int idx, unsigned int ring_ref, unsigned int 
port, unsigned int limit) "idx %u ring_ref %u

[PATCH v3 4/5] hw/arm: Add the USART to the stm32l4x5 SoC

2024-03-29 Thread Arnaud Minier

Add the USART to the SoC and connect it to the other implemented devices.

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
Reviewed-by: Peter Maydell 
---
 docs/system/arm/b-l475e-iot01a.rst |  2 +-
 hw/arm/Kconfig |  1 +
 hw/arm/stm32l4x5_soc.c | 82 +++---
 include/hw/arm/stm32l4x5_soc.h |  7 +++
 4 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/docs/system/arm/b-l475e-iot01a.rst 
b/docs/system/arm/b-l475e-iot01a.rst
index 0afef8e4f4..a76c9976c5 100644
--- a/docs/system/arm/b-l475e-iot01a.rst
+++ b/docs/system/arm/b-l475e-iot01a.rst
@@ -19,13 +19,13 @@ Currently B-L475E-IOT01A machine's only supports the 
following devices:
 - STM32L4x5 SYSCFG (System configuration controller)
 - STM32L4x5 RCC (Reset and clock control)
 - STM32L4x5 GPIOs (General-purpose I/Os)
+- STM32L4x5 USARTs, UARTs and LPUART (Serial ports)
 
 Missing devices
 """
 
 The B-L475E-IOT01A does *not* support the following devices:
 
-- Serial ports (UART)
 - Analog to Digital Converter (ADC)
 - SPI controller
 - Timer controller (TIMER)
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 893a7bff66..098d043375 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -477,6 +477,7 @@ config STM32L4X5_SOC
 select STM32L4X5_SYSCFG
 select STM32L4X5_RCC
 select STM32L4X5_GPIO
+select STM32L4X5_USART
 
 config XLNX_ZYNQMP_ARM
 bool
diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
index 40e294f838..8fc369fff0 100644
--- a/hw/arm/stm32l4x5_soc.c
+++ b/hw/arm/stm32l4x5_soc.c
@@ -28,6 +28,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/or-irq.h"
 #include "hw/arm/stm32l4x5_soc.h"
+#include "hw/char/stm32l4x5_usart.h"
 #include "hw/gpio/stm32l4x5_gpio.h"
 #include "hw/qdev-clock.h"
 #include "hw/misc/unimp.h"
@@ -116,6 +117,22 @@ static const struct {
 { 0x48001C00, 0x000F, 0x, 0x },
 };
 
+static const hwaddr usart_addr[] = {
+0x40013800, /* "USART1", 0x400 */
+0x40004400, /* "USART2", 0x400 */
+0x40004800, /* "USART3", 0x400 */
+};
+static const hwaddr uart_addr[] = {
+0x40004C00, /* "UART4" , 0x400 */
+0x40005000  /* "UART5" , 0x400 */
+};
+
+#define LPUART_BASE_ADDRESS 0x40008000
+
+static const int usart_irq[] = { 37, 38, 39 };
+static const int uart_irq[] = { 52, 53 };
+#define LPUART_IRQ 70
+
 static void stm32l4x5_soc_initfn(Object *obj)
 {
 Stm32l4x5SocState *s = STM32L4X5_SOC(obj);
@@ -132,6 +149,18 @@ static void stm32l4x5_soc_initfn(Object *obj)
 g_autofree char *name = g_strdup_printf("gpio%c", 'a' + i);
 object_initialize_child(obj, name, >gpio[i], TYPE_STM32L4X5_GPIO);
 }
+
+for (int i = 0; i < STM_NUM_USARTS; i++) {
+object_initialize_child(obj, "usart[*]", >usart[i],
+TYPE_STM32L4X5_USART);
+}
+
+for (int i = 0; i < STM_NUM_UARTS; i++) {
+object_initialize_child(obj, "uart[*]", >uart[i],
+TYPE_STM32L4X5_UART);
+}
+object_initialize_child(obj, "lpuart1", >lpuart,
+TYPE_STM32L4X5_LPUART);
 }
 
 static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
@@ -279,6 +308,53 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, 
Error **errp)
 sysbus_mmio_map(busdev, 0, RCC_BASE_ADDRESS);
 sysbus_connect_irq(busdev, 0, qdev_get_gpio_in(armv7m, RCC_IRQ));
 
+/* USART devices */
+for (int i = 0; i < STM_NUM_USARTS; i++) {
+g_autofree char *name = g_strdup_printf("usart%d-out", i + 1);
+dev = DEVICE(&(s->usart[i]));
+qdev_prop_set_chr(dev, "chardev", serial_hd(i));
+qdev_connect_clock_in(dev, "clk",
+qdev_get_clock_out(DEVICE(&(s->rcc)), name));
+busdev = SYS_BUS_DEVICE(dev);
+if (!sysbus_realize(busdev, errp)) {
+return;
+}
+sysbus_mmio_map(busdev, 0, usart_addr[i]);
+sysbus_connect_irq(busdev, 0, qdev_get_gpio_in(armv7m, usart_irq[i]));
+}
+
+/*
+ * TODO: Connect the USARTs, UARTs and LPUART to the EXTI once the EXTI
+ * can handle other gpio-in than the gpios. (e.g. Direct Lines for the 
usarts)
+ */
+
+/* UART devices */
+for (int i = 0; i < STM_NUM_UARTS; i++) {
+g_autofree char *name = g_strdup_printf("uart%d-out", STM_NUM_USARTS + 
i + 1);
+dev = DEVICE(&(s->uart[i]));
+qdev_prop_set_chr(dev, "chardev", serial_hd(STM_NUM_USARTS + i));
+qdev_connect_clock_in(dev, "clk",
+qdev_get_clock_out(DEVICE(&(s->rcc)), name));
+busdev = SYS_BUS_DEVICE(dev);
+if (!sysbus_realize(busdev, errp)) {
+return;
+}
+sysbus_mmio_map(busdev, 0, uart_addr[i]);
+sysbus_connect_irq(busdev, 0, qdev_get_gpio_in(armv7m, uart_irq[i]));
+}
+
+/* LPUART device*/
+dev = DEVICE(&(s->lpuart));
+qdev_prop_set_chr(dev, "chardev", serial_hd(STM_NUM_USARTS + 
STM_NUM_UARTS));
+

[PATCH v3 5/5] tests/qtest: Add tests for the STM32L4x5 USART

2024-03-29 Thread Arnaud Minier

Test:
- read/write from/to the usart registers
- send/receive a character/string over the serial port

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
---
 tests/qtest/meson.build|   4 +-
 tests/qtest/stm32l4x5_usart-test.c | 325 +
 2 files changed, 328 insertions(+), 1 deletion(-)
 create mode 100644 tests/qtest/stm32l4x5_usart-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 36c5c13a7b..b128fa5a4b 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -7,6 +7,7 @@ slow_qtests = {
   'npcm7xx_pwm-test': 300,
   'npcm7xx_watchdog_timer-test': 120,
   'qom-test' : 900,
+  'stm32l4x5_usart-test' : 600,
   'test-hmp' : 240,
   'pxe-test': 610,
   'prom-env-test': 360,
@@ -205,7 +206,8 @@ qtests_stm32l4x5 = \
   ['stm32l4x5_exti-test',
'stm32l4x5_syscfg-test',
'stm32l4x5_rcc-test',
-   'stm32l4x5_gpio-test']
+   'stm32l4x5_gpio-test',
+   'stm32l4x5_usart-test']
 
 qtests_arm = \
   (config_all_devices.has_key('CONFIG_MPS2') ? ['sse-timer-test'] : []) + \
diff --git a/tests/qtest/stm32l4x5_usart-test.c 
b/tests/qtest/stm32l4x5_usart-test.c
new file mode 100644
index 00..ca6aa33ff6
--- /dev/null
+++ b/tests/qtest/stm32l4x5_usart-test.c
@@ -0,0 +1,325 @@
+/*
+ * QTest testcase for STML4X5_USART
+ *
+ * Copyright (c) 2023 Arnaud Minier 
+ * Copyright (c) 2023 Inès Varhol 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "hw/misc/stm32l4x5_rcc_internals.h"
+#include "hw/registerfields.h"
+
+#define RCC_BASE_ADDR 0x40021000
+/* Use USART 1 ADDR, assume the others work the same */
+#define USART1_BASE_ADDR 0x40013800
+
+/* See stm32l4x5_usart for definitions */
+REG32(CR1, 0x00)
+FIELD(CR1, M1, 28, 1)
+FIELD(CR1, OVER8, 15, 1)
+FIELD(CR1, M0, 12, 1)
+FIELD(CR1, PCE, 10, 1)
+FIELD(CR1, TXEIE, 7, 1)
+FIELD(CR1, RXNEIE, 5, 1)
+FIELD(CR1, TE, 3, 1)
+FIELD(CR1, RE, 2, 1)
+FIELD(CR1, UE, 0, 1)
+REG32(CR2, 0x04)
+REG32(CR3, 0x08)
+FIELD(CR3, OVRDIS, 12, 1)
+REG32(BRR, 0x0C)
+REG32(GTPR, 0x10)
+REG32(RTOR, 0x14)
+REG32(RQR, 0x18)
+REG32(ISR, 0x1C)
+FIELD(ISR, TXE, 7, 1)
+FIELD(ISR, RXNE, 5, 1)
+FIELD(ISR, ORE, 3, 1)
+REG32(ICR, 0x20)
+REG32(RDR, 0x24)
+REG32(TDR, 0x28)
+
+#define NVIC_ISPR1 0XE000E204
+#define NVIC_ICPR1 0xE000E284
+#define USART1_IRQ 37
+
+static bool check_nvic_pending(QTestState *qts, unsigned int n)
+{
+/* No USART interrupts are less than 32 */
+assert(n > 32);
+n -= 32;
+return qtest_readl(qts, NVIC_ISPR1) & (1 << n);
+}
+
+static bool clear_nvic_pending(QTestState *qts, unsigned int n)
+{
+/* No USART interrupts are less than 32 */
+assert(n > 32);
+n -= 32;
+qtest_writel(qts, NVIC_ICPR1, (1 << n));
+return true;
+}
+
+/*
+ * Wait indefinitely for the flag to be updated.
+ * If this is run on a slow CI runner,
+ * the meson harness will timeout after 10 minutes for us.
+ */
+static bool usart_wait_for_flag(QTestState *qts, uint32_t event_addr, uint32_t 
flag)
+{
+while (true) {
+if ((qtest_readl(qts, event_addr) & flag)) {
+return true;
+}
+g_usleep(1000);
+}
+
+return false;
+}
+
+static void usart_receive_string(QTestState *qts, int sock_fd, const char *in,
+   char *out)
+{
+int i, in_len = strlen(in);
+
+g_assert_true(send(sock_fd, in, in_len, 0) == in_len);
+for (i = 0; i < in_len; i++) {
+g_assert_true(usart_wait_for_flag(qts,
+USART1_BASE_ADDR + A_ISR, R_ISR_RXNE_MASK));
+out[i] = qtest_readl(qts, USART1_BASE_ADDR + A_RDR);
+}
+out[i] = '\0';
+}
+
+static void usart_send_string(QTestState *qts, const char *in)
+{
+int i, in_len = strlen(in);
+
+for (i = 0; i < in_len; i++) {
+qtest_writel(qts, USART1_BASE_ADDR + A_TDR, in[i]);
+g_assert_true(usart_wait_for_flag(qts,
+USART1_BASE_ADDR + A_ISR, R_ISR_TXE_MASK));
+}
+}
+
+/* Init the RCC clocks to run at 80 MHz */
+static void init_clocks(QTestState *qts)
+{
+uint32_t value;
+
+/* MSIRANGE can be set only when MSI is OFF or READY */
+qtest_writel(qts, (RCC_BASE_ADDR + A_CR), R_CR_MSION_MASK);
+
+/* Clocking from MSI, in case MSI was not the default source */
+qtest_writel(qts, (RCC_BASE_ADDR + A_CFGR), 0);
+
+/*
+ * Update PLL and set MSI as the source clock.
+ * PLLM = 1 --> 000
+ * PLLN = 40 --> 40
+ * PPLLR = 2 --> 00
+ * PLLDIV = unused, PLLP = unused (SAI3), PLLQ = unused (48M1)
+ * SRC = MSI --> 01
+ */
+qtest_writel(qts, (RCC_BASE_ADDR + A_PLLCFGR), R_PLLCFGR_PLLREN_MASK |
+(40 << R_PLLCFGR_PLLN_SHIFT) |
+(0b01 << R_PLLCFGR_PLLSRC_SHIFT));
+
+/* PLL activation */
+
+value = qtest_readl(qts, (RCC_BASE_ADDR + A_CR));
+qtest_writel(qts, (RCC_BASE_ADDR +

[PATCH v3 1/5] hw/char: Implement STM32L4x5 USART skeleton

2024-03-29 Thread Arnaud Minier

Add the basic infrastructure (register read/write, type...)
to implement the STM32L4x5 USART.

Also create different types for the USART, UART and LPUART
of the STM32L4x5 to deduplicate code and enable the
implementation of different behaviors depending on the type.

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
Reviewed-by: Peter Maydell 
---
 MAINTAINERS   |   1 +
 hw/char/Kconfig   |   3 +
 hw/char/meson.build   |   1 +
 hw/char/stm32l4x5_usart.c | 396 ++
 hw/char/trace-events  |   4 +
 include/hw/char/stm32l4x5_usart.h |  66 +
 6 files changed, 471 insertions(+)
 create mode 100644 hw/char/stm32l4x5_usart.c
 create mode 100644 include/hw/char/stm32l4x5_usart.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a07af6b9d4..fa7212b9b1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1128,6 +1128,7 @@ M: Inès Varhol 
 L: qemu-...@nongnu.org
 S: Maintained
 F: hw/arm/stm32l4x5_soc.c
+F: hw/char/stm32l4x5_usart.c
 F: hw/misc/stm32l4x5_exti.c
 F: hw/misc/stm32l4x5_syscfg.c
 F: hw/misc/stm32l4x5_rcc.c
diff --git a/hw/char/Kconfig b/hw/char/Kconfig
index 6b6cf2fc1d..4fd74ea878 100644
--- a/hw/char/Kconfig
+++ b/hw/char/Kconfig
@@ -41,6 +41,9 @@ config VIRTIO_SERIAL
 config STM32F2XX_USART
 bool
 
+config STM32L4X5_USART
+bool
+
 config CMSDK_APB_UART
 bool
 
diff --git a/hw/char/meson.build b/hw/char/meson.build
index 006d20f1e2..e5b13b6958 100644
--- a/hw/char/meson.build
+++ b/hw/char/meson.build
@@ -31,6 +31,7 @@ system_ss.add(when: 'CONFIG_RENESAS_SCI', if_true: 
files('renesas_sci.c'))
 system_ss.add(when: 'CONFIG_SIFIVE_UART', if_true: files('sifive_uart.c'))
 system_ss.add(when: 'CONFIG_SH_SCI', if_true: files('sh_serial.c'))
 system_ss.add(when: 'CONFIG_STM32F2XX_USART', if_true: 
files('stm32f2xx_usart.c'))
+system_ss.add(when: 'CONFIG_STM32L4X5_USART', if_true: 
files('stm32l4x5_usart.c'))
 system_ss.add(when: 'CONFIG_MCHP_PFSOC_MMUART', if_true: 
files('mchp_pfsoc_mmuart.c'))
 system_ss.add(when: 'CONFIG_HTIF', if_true: files('riscv_htif.c'))
 system_ss.add(when: 'CONFIG_GOLDFISH_TTY', if_true: files('goldfish_tty.c'))
diff --git a/hw/char/stm32l4x5_usart.c b/hw/char/stm32l4x5_usart.c
new file mode 100644
index 00..13ce0eb89d
--- /dev/null
+++ b/hw/char/stm32l4x5_usart.c
@@ -0,0 +1,396 @@
+/*
+ * STM32L4X5 USART (Universal Synchronous Asynchronous Receiver Transmitter)
+ *
+ * Copyright (c) 2023 Arnaud Minier 
+ * Copyright (c) 2023 Inès Varhol 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * The STM32L4X5 USART is heavily inspired by the stm32f2xx_usart
+ * by Alistair Francis.
+ * The reference used is the STMicroElectronics RM0351 Reference manual
+ * for STM32L4x5 and STM32L4x6 advanced Arm ® -based 32-bit MCUs.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "chardev/char-fe.h"
+#include "chardev/char-serial.h"
+#include "migration/vmstate.h"
+#include "hw/char/stm32l4x5_usart.h"
+#include "hw/clock.h"
+#include "hw/irq.h"
+#include "hw/qdev-clock.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "hw/registerfields.h"
+#include "trace.h"
+
+
+REG32(CR1, 0x00)
+FIELD(CR1, M1, 28, 1)/* Word length (part 2, see M0) */
+FIELD(CR1, EOBIE, 27, 1) /* End of Block interrupt enable */
+FIELD(CR1, RTOIE, 26, 1) /* Receiver timeout interrupt enable */
+FIELD(CR1, DEAT, 21, 5)  /* Driver Enable assertion time */
+FIELD(CR1, DEDT, 16, 5)  /* Driver Enable de-assertion time */
+FIELD(CR1, OVER8, 15, 1) /* Oversampling mode */
+FIELD(CR1, CMIE, 14, 1)  /* Character match interrupt enable */
+FIELD(CR1, MME, 13, 1)   /* Mute mode enable */
+FIELD(CR1, M0, 12, 1)/* Word length (part 1, see M1) */
+FIELD(CR1, WAKE, 11, 1)  /* Receiver wakeup method */
+FIELD(CR1, PCE, 10, 1)   /* Parity control enable */
+FIELD(CR1, PS, 9, 1) /* Parity selection */
+FIELD(CR1, PEIE, 8, 1)   /* PE interrupt enable */
+FIELD(CR1, TXEIE, 7, 1)  /* TXE interrupt enable */
+FIELD(CR1, TCIE, 6, 1)   /* Transmission complete interrupt enable */
+FIELD(CR1, RXNEIE, 5, 1) /* RXNE interrupt enable */
+FIELD(CR1, IDLEIE, 4, 1) /* IDLE interrupt enable */
+FIELD(CR1, TE, 3, 1) /* Transmitter enable */
+FIELD(CR1, RE, 2, 1) /* Receiver enable */
+FIELD(CR1, UESM, 1, 1)   /* USART enable in Stop mode */
+FIELD(CR1, UE, 0, 1) /* USART enable */
+REG32(CR2, 0x04)
+FIELD(CR2, ADD_1, 28, 4)/* ADD[7:4] */
+FIELD(CR2, ADD_0, 24, 1)/* ADD[3:0] */
+FIELD(CR2, RTOEN, 23, 1)/* Receiver timeout enable */
+FIELD(CR2, ABRMOD, 21, 2)   /* Auto baud rate mode */
+FIELD(CR2, ABREN, 20, 1)/* Auto baud rate enable */
+FIELD(CR2, MSBFIRST, 19, 1)

[PATCH v3 2/5] hw/char/stm32l4x5_usart: Enable serial read and write

2024-03-29 Thread Arnaud Minier

Implement the ability to read and write characters to the
usart using the serial port.

The character transmission is based on the
cmsdk-apb-uart implementation.

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
Reviewed-by: Peter Maydell 
---
 hw/char/stm32l4x5_usart.c | 140 ++
 hw/char/trace-events  |   7 ++
 include/hw/char/stm32l4x5_usart.h |   1 +
 3 files changed, 148 insertions(+)

diff --git a/hw/char/stm32l4x5_usart.c b/hw/char/stm32l4x5_usart.c
index 13ce0eb89d..b948a89c13 100644
--- a/hw/char/stm32l4x5_usart.c
+++ b/hw/char/stm32l4x5_usart.c
@@ -154,6 +154,120 @@ REG32(RDR, 0x24)
 REG32(TDR, 0x28)
 FIELD(TDR, TDR, 0, 9)
 
+static void stm32l4x5_update_irq(Stm32l4x5UsartBaseState *s)
+{
+if (((s->isr & R_ISR_WUF_MASK) && (s->cr3 & R_CR3_WUFIE_MASK))||
+((s->isr & R_ISR_CMF_MASK) && (s->cr1 & R_CR1_CMIE_MASK)) ||
+((s->isr & R_ISR_ABRF_MASK) && (s->cr1 & R_CR1_RXNEIE_MASK))  ||
+((s->isr & R_ISR_EOBF_MASK) && (s->cr1 & R_CR1_EOBIE_MASK))   ||
+((s->isr & R_ISR_RTOF_MASK) && (s->cr1 & R_CR1_RTOIE_MASK))   ||
+((s->isr & R_ISR_CTSIF_MASK) && (s->cr3 & R_CR3_CTSIE_MASK))  ||
+((s->isr & R_ISR_LBDF_MASK) && (s->cr2 & R_CR2_LBDIE_MASK))   ||
+((s->isr & R_ISR_TXE_MASK) && (s->cr1 & R_CR1_TXEIE_MASK))||
+((s->isr & R_ISR_TC_MASK) && (s->cr1 & R_CR1_TCIE_MASK))  ||
+((s->isr & R_ISR_RXNE_MASK) && (s->cr1 & R_CR1_RXNEIE_MASK))  ||
+((s->isr & R_ISR_IDLE_MASK) && (s->cr1 & R_CR1_IDLEIE_MASK))  ||
+((s->isr & R_ISR_ORE_MASK) &&
+((s->cr1 & R_CR1_RXNEIE_MASK) || (s->cr3 & R_CR3_EIE_MASK)))  ||
+/* TODO: Handle NF ? */
+((s->isr & R_ISR_FE_MASK) && (s->cr3 & R_CR3_EIE_MASK))   ||
+((s->isr & R_ISR_PE_MASK) && (s->cr1 & R_CR1_PEIE_MASK))) {
+qemu_irq_raise(s->irq);
+trace_stm32l4x5_usart_irq_raised(s->isr);
+} else {
+qemu_irq_lower(s->irq);
+trace_stm32l4x5_usart_irq_lowered();
+}
+}
+
+static int stm32l4x5_usart_base_can_receive(void *opaque)
+{
+Stm32l4x5UsartBaseState *s = opaque;
+
+if (!(s->isr & R_ISR_RXNE_MASK)) {
+return 1;
+}
+
+return 0;
+}
+
+static void stm32l4x5_usart_base_receive(void *opaque, const uint8_t *buf, int 
size)
+{
+Stm32l4x5UsartBaseState *s = opaque;
+
+if (!((s->cr1 & R_CR1_UE_MASK) && (s->cr1 & R_CR1_RE_MASK))) {
+trace_stm32l4x5_usart_receiver_not_enabled(
+FIELD_EX32(s->cr1, CR1, UE), FIELD_EX32(s->cr1, CR1, RE));
+return;
+}
+
+/* Check if overrun detection is enabled and if there is an overrun */
+if (!(s->cr3 & R_CR3_OVRDIS_MASK) && (s->isr & R_ISR_RXNE_MASK)) {
+/*
+ * A character has been received while
+ * the previous has not been read = Overrun.
+ */
+s->isr |= R_ISR_ORE_MASK;
+trace_stm32l4x5_usart_overrun_detected(s->rdr, *buf);
+} else {
+/* No overrun */
+s->rdr = *buf;
+s->isr |= R_ISR_RXNE_MASK;
+trace_stm32l4x5_usart_rx(s->rdr);
+}
+
+stm32l4x5_update_irq(s);
+}
+
+/*
+ * Try to send tx data, and arrange to be called back later if
+ * we can't (ie the char backend is busy/blocking).
+ */
+static gboolean usart_transmit(void *do_not_use, GIOCondition cond, void 
*opaque)
+{
+Stm32l4x5UsartBaseState *s = STM32L4X5_USART_BASE(opaque);
+int ret;
+/* TODO: Handle 9 bits transmission */
+uint8_t ch = s->tdr;
+
+s->watch_tag = 0;
+
+if (!(s->cr1 & R_CR1_TE_MASK) || (s->isr & R_ISR_TXE_MASK)) {
+return G_SOURCE_REMOVE;
+}
+
+ret = qemu_chr_fe_write(>chr, , 1);
+if (ret <= 0) {
+s->watch_tag = qemu_chr_fe_add_watch(>chr, G_IO_OUT | G_IO_HUP,
+ usart_transmit, s);
+if (!s->watch_tag) {
+/* Most common reason to be here is "no chardev backend":
+ * just insta-drain the buffer, so the serial output
+ * goes into a void, rather than blocking the guest.
+ */
+goto buffer_drained;
+}
+/* Transmit pending */
+trace_stm32l4x5_usart_tx_pending();
+return G_SOURCE_REMOVE;
+}
+
+buffer_drained:
+/* Character successfully sent */
+trace_stm32l4x5_usart_tx(ch);
+s->isr |= R_ISR_TC_MASK | R_ISR_TXE_MASK;
+stm32l4x5_update_irq(s);
+return G_SOURCE_REMOVE;
+}
+
+static void usart_cancel_transmit(Stm32l4x5UsartBaseState *s)
+{
+if (s->watch_tag) {
+g_source_remove(s->watch_tag);
+s->watch_tag = 0;
+}
+}
+
 static void stm32l4x5_usart_base_reset_hold(Object *obj)
 {
 Stm32l4x5UsartBaseState *s = STM32L4X5_USART_BASE(obj);
@@ -167,6 +281,22 @@ static void stm32l4x5_usart_base_reset_hold(Object *obj)
 s->isr = 0x02C0;
 s->rdr = 0x;
 s->tdr = 0x;
+
+

[PATCH v3 0/5] hw/char: Implement the STM32L4x5 USART, UART and LPUART

2024-03-29 Thread Arnaud Minier

This patch adds the STM32L4x5 USART
(Universal Synchronous/Asynchronous Receiver/Transmitter)
device and is part of a series implementing the
STM32L4x5 with a few peripherals.

It implements the necessary functionalities to receive/send
characters over the serial port, which are useful to
communicate with the program currently running.

Thank you Peter and Thomas for your reviews !

Changes from v1 to v2:
- Use asynchronous transmission for serial communication
  (based on cmsdk-apb-uart implementation)
- Use qemu_log_mask instead of error_report
- Squash the commit that renamed the base struct
- Use switch statements where appropriate
- Fix RDR and TDR mask size
- Increase time limit in tests
- Remove the global qtest in the tests
- Use assert when checking the interrupt number in the tests
- Correct usage of g_autofree in the SoC

Changes from v2 to v3:
- Fix typos and comment formatting
- Declare variables at the start of code blocks in the SoC
- Use %u instead of %x in an error log
- Add ".abstract = true" to the base usart class
- Change tests to use meson harness timeout
- Drop merged RCC commit

Arnaud Minier (5):
  hw/char: Implement STM32L4x5 USART skeleton
  hw/char/stm32l4x5_usart: Enable serial read and write
  hw/char/stm32l4x5_usart: Add options for serial parameters setting
  hw/arm: Add the USART to the stm32l4x5 SoC
  tests/qtest: Add tests for the STM32L4x5 USART

 MAINTAINERS|   1 +
 docs/system/arm/b-l475e-iot01a.rst |   2 +-
 hw/arm/Kconfig |   1 +
 hw/arm/stm32l4x5_soc.c |  82 +++-
 hw/char/Kconfig|   3 +
 hw/char/meson.build|   1 +
 hw/char/stm32l4x5_usart.c  | 634 +
 hw/char/trace-events   |  12 +
 include/hw/arm/stm32l4x5_soc.h |   7 +
 include/hw/char/stm32l4x5_usart.h  |  67 +++
 tests/qtest/meson.build|   4 +-
 tests/qtest/stm32l4x5_usart-test.c | 325 +++
 12 files changed, 1131 insertions(+), 8 deletions(-)
 create mode 100644 hw/char/stm32l4x5_usart.c
 create mode 100644 include/hw/char/stm32l4x5_usart.h
 create mode 100644 tests/qtest/stm32l4x5_usart-test.c

-- 
2.34.1

Re: [RFC PATCH v2 0/6] cxl: add poison event handler

2024-03-29 Thread Alison Schofield

On Fri, Mar 29, 2024 at 02:36:08PM +0800, Shiyang Ruan wrote:
> Changes:
> RFCv1 -> RFCv2:
> 1. update commit message of PATCH 1
> 2. use memory_failure_queue() instead of MCE
> 3. also report poison in debugfs when injecting poison
> 4. correct DPA->HPA logic:
>  find memdev's endpoint decoder to find the region it belongs to
> 5. distinguish transaction_type of GMER, only handle POISON related
>  event for now
> 
> 
> Currently driver only traces cxl events, poison injection (for both vmem
> and pmem type) on cxl memdev is silent.  OS needs to be notified then it
> could handle poison range in time.  Per CXL spec, the device error event
> could be signaled through FW-First and OS-First methods.
> 
> So, add poison event handler in OS-First method:
>   - qemu:
> - CXL device report POISON event to OS by MSI by sending GMER after
>   injecting a poison record
>   - CXL driver  <-- this patchset
> a. parse the POISON event from GMER;
> b. retrieve POISON list from memdev;
> c. translate poisoned DPA to HPA;
> d. enqueue poisoned PFN to memory_failure's work queue;

Hi,

Yesterday I posted code adding the HPAs to cxl_general_media & dram
events[1], so as I review this patchset today it's fresh in my mind.

Can we integrate this into the trace_ path directly:

1) On any GMER/poison, trigger a new poison list read

BTW - I'm not sure where to trigger that because we want to keep all
the locking in place and read by endpoints like is done now. It may
not be safe to sneak in a direct call to cxl_mem_get_poison()
as is done in this patch set.

2) Teach the poison list read trace event handler to call
memory_failure_queue().

Upon receipt of that new poison list, call memory_failture_queue()
on *any* poison in a mapped space. Is that OK?  Can we call
memory_failure_queue() on any and every poison report that is in
HPA space regardless of whether it first came to us through a GMER?
I'm actually wondering if that is going to be the next ask anyway -
ie report all poison.

I'll comment a bit more on individual patches.

--Alison

[1] 
https://lore.kernel.org/linux-cxl/cover.1711598777.git.alison.schofi...@intel.com/

> 
> 
> Shiyang Ruan (6):
>   cxl/core: correct length of DPA field masks
>   cxl/core: introduce cxl_mem_report_poison()
>   cxl/core: add report option for cxl_mem_get_poison()
>   cxl/core: report poison when injecting from debugfs
>   cxl: add definition for transaction_type
>   cxl/core: add poison injection event handler
> 
>  drivers/cxl/core/mbox.c   | 126 +-
>  drivers/cxl/core/memdev.c |   5 +-
>  drivers/cxl/core/region.c |   8 +--
>  drivers/cxl/core/trace.h  |   6 +-
>  drivers/cxl/cxlmem.h  |  13 ++--
>  include/linux/cxl-event.h |  17 -
>  6 files changed, 144 insertions(+), 31 deletions(-)
> 
> -- 
> 2.34.1
> >

[PATCH net v3] virtio_net: Do not send RSS key if it is not supported

2024-03-29 Thread Breno Leitao

There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.

Running the following command in any QEMU virtual machine with virtionet
will reproduce this problem:

# ethtool -X eth0  hfunc toeplitz

This is how the problem happens:

1) ethtool_set_rxfh() calls virtnet_set_rxfh()

2) virtnet_set_rxfh() calls virtnet_commit_rss_command()

3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather

4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;

5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):

  if (!sz) {
  virtio_error(vdev, "virtio: zero sized buffers are not allowed");

6) virtio_error() (also QEMU function) set the device as broken

vdev->broken = true;

7) Qemu bails out, and do not repond this crazy kernel.

8) The kernel is waiting for the response to come back (function
virtnet_send_command())

9) The kernel is waiting doing the following :

  while (!virtqueue_get_buf(vi->cvq, ) &&
 !virtqueue_is_broken(vi->cvq))
  cpu_relax();

10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.

Fix it by not sending RSS commands if the feature is not available in
the device.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: sta...@vger.kernel.org
Cc: qemu-devel@nongnu.org
Signed-off-by: Breno Leitao 
---
Changelog:

V2:
  * Moved from creating a valid packet, by rejecting the request
completely
V3:
  * Got some good feedback from and Xuan Zhuo and Heng Qi, and reworked
the rejection path.

---
 drivers/net/virtio_net.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..c4a21ec51adf 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3807,6 +3807,7 @@ static int virtnet_set_rxfh(struct net_device *dev,
struct netlink_ext_ack *extack)
 {
struct virtnet_info *vi = netdev_priv(dev);
+   bool update = false;
int i;
 
if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
@@ -3814,13 +3815,24 @@ static int virtnet_set_rxfh(struct net_device *dev,
return -EOPNOTSUPP;
 
if (rxfh->indir) {
+   if (!vi->has_rss)
+   return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = rxfh->indir[i];
+   update = true;
}
-   if (rxfh->key)
+
+   if (rxfh->key) {
+   if (!vi->has_rss && !vi->has_rss_hash_report)
+   return -EOPNOTSUPP;
+
memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
+   update = true;
+   }
 
-   virtnet_commit_rss_command(vi);
+   if (update)
+   virtnet_commit_rss_command(vi);
 
return 0;
 }
@@ -4729,13 +4741,15 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
 
-   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+   }
+
+   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, 
rss_max_key_size));
 
-- 
2.43.0

Re: [RFC] util/error-report: Add "error: " prefix for error-level report

2024-03-29 Thread Zhao Liu

Hi Paolo,

On Fri, Mar 29, 2024 at 12:10:17PM +0100, Paolo Bonzini wrote:
> Date: Fri, 29 Mar 2024 12:10:17 +0100
> From: Paolo Bonzini 
> Subject: Re: [RFC] util/error-report: Add "error: " prefix for error-level
>  report
> 
> On Fri, Mar 29, 2024 at 10:37 AM  wrote:
> > > This was done in the context of inheriting the original error_report()
> > > interface without the prefix style. And it was also useful to have a
> > > means of error handling, such as exit(), when error occurs, so that the
> > > error message - the most serious level - can be noticed by the user.
> > >
> > > Nowadays, however, error_report() and its variants have a tendency to be
> > > "abused": it is used a lot just for the sake of logging something more
> > > noticeable than the "warn" or "info" level, in the absence of
> > > appropriate error handling logic.
> 
> Unfortunately, this is the reason why you _cannot_ do what this patch does.
> 
> For example:
> 
>   error_reportf_err(local_err, "Disconnect client, due to: ");
>   error_report("terminating on signal %d", shutdown_signal);
> 
> This should not be prepending "error" - it's not an error.

So I feel these 2 cases maybe should use info_report()?

>   error_report_once("%s: detected read error on DMAR slpte "
> 
> This is a guest error, so "error:" is probably not a good idea (it
> should use qemu_log_mask).

Yes, here I can do a cleanup.

> And so on. :(

error_report() and its variants have 2600+ use cases, and it's
impossible to distinguish whether ther're appropriate or not.

Thanks for your explanation, I understand this is not workable,
since there is too heavy the debt to sort out error_report().

Regards,
Zhao

[PATCH 0/1] Upstreaming Course Debugging Changes

2024-03-29 Thread Don Porter

Hi all,

I am a CS professor (and first time contributor) and have been using
qemu in my courses for over a decade, especially a course that asks
students to write major pieces of an OS kernel from starter code.

I have some patches, originally from Austin Clements at MIT, that I
have found useful over the years and that may be useful to others.  It
would also be nice not to have to build a custom qemu each semester.  I
have cleared upstreaming these with Austin, the original author.

In order to learn the process and community, I thought I would start
with one patch.  As the description says, it enables e1000 debugging
without recompilation.  One project in the course is to write an e1000
driver, and these logs are quite helpful to students.

Thank you in advance for your time,
Don Porter

Austin Clements (1):
  e1000: Get debug flags from an environment variable

 hw/net/e1000.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

--
2.25.1

[PATCH 1/1] e1000: Get debug flags from an environment variable

2024-03-29 Thread Don Porter

From: Austin Clements 

The E1000 debug messages are very useful for developing drivers, so
this introduces an E1000_DEBUG environment variable that lets the
debug flags be set without recompiling QEMU.

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 
---
 hw/net/e1000.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 43f3a4a701..8d46225944 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -30,11 +30,14 @@
 #include "hw/pci/pci_device.h"
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
+#include "monitor/monitor.h"
 #include "net/eth.h"
 #include "net/net.h"
 #include "net/checksum.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/dma.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
 #include "qemu/iov.h"
 #include "qemu/module.h"
 #include "qemu/range.h"
@@ -44,15 +47,19 @@
 #include "trace.h"
 #include "qom/object.h"
 
-/* #define E1000_DEBUG */
-
-#ifdef E1000_DEBUG
 enum {
 DEBUG_GENERAL,  DEBUG_IO,   DEBUG_MMIO, DEBUG_INTERRUPT,
 DEBUG_RX,   DEBUG_TX,   DEBUG_MDIC, DEBUG_EEPROM,
 DEBUG_UNKNOWN,  DEBUG_TXSUM,DEBUG_TXERR,DEBUG_RXERR,
 DEBUG_RXFILTER, DEBUG_PHY,  DEBUG_NOTYET,
 };
+
+static const char *debugnames[] = {
+"GENERAL",  "IO",   "MMIO", "INTERRUPT",
+"RX",   "TX",   "MDIC", "EEPROM",
+"UNKNOWN",  "TXSUM","TXERR","RXERR",
+"RXFILTER", "PHY",  "NOTYET",   NULL
+};
 #define DBGBIT(x)(1<

[PATCH v6 1/2] Refactor common functions between POSIX and Windows implementation

2024-03-29 Thread aidan_leuck

From: aidaleuc 

Signed-off-by: aidaleuc 
---
 qga/commands-common-ssh.c | 50 +++
 qga/commands-common-ssh.h | 10 
 qga/commands-posix-ssh.c  | 47 +---
 qga/meson.build   |  1 +
 4 files changed, 62 insertions(+), 46 deletions(-)
 create mode 100644 qga/commands-common-ssh.c
 create mode 100644 qga/commands-common-ssh.h

diff --git a/qga/commands-common-ssh.c b/qga/commands-common-ssh.c
new file mode 100644
index 00..537869fb98
--- /dev/null
+++ b/qga/commands-common-ssh.c
@@ -0,0 +1,50 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "commands-common-ssh.h"
+
+GStrv read_authkeys(const char *path, Error **errp)
+{
+g_autoptr(GError) err = NULL;
+g_autofree char *contents = NULL;
+
+if (!g_file_get_contents(path, , NULL, )) {
+error_setg(errp, "failed to read '%s': %s", path, err->message);
+return NULL;
+}
+
+return g_strsplit(contents, "\n", -1);
+}
+
+bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
+{
+size_t n = 0;
+strList *k;
+
+for (k = keys; k != NULL; k = k->next) {
+if (!check_openssh_pub_key(k->value, errp)) {
+return false;
+}
+n++;
+}
+
+if (nkeys) {
+*nkeys = n;
+}
+return true;
+}
+
+bool check_openssh_pub_key(const char *key, Error **errp)
+{
+/* simple sanity-check, we may want more? */
+if (!key || key[0] == '#' || strchr(key, '\n')) {
+error_setg(errp, "invalid OpenSSH public key: '%s'", key);
+return false;
+}
+
+return true;
+}
diff --git a/qga/commands-common-ssh.h b/qga/commands-common-ssh.h
new file mode 100644
index 00..14d955fa84
--- /dev/null
+++ b/qga/commands-common-ssh.h
@@ -0,0 +1,10 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qapi/qapi-builtin-types.h"
+
+GStrv read_authkeys(const char *path, Error **errp);
+bool check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp);
+bool check_openssh_pub_key(const char *key, Error **errp);
diff --git a/qga/commands-posix-ssh.c b/qga/commands-posix-ssh.c
index 236f80de44..dd2ecb453a 100644
--- a/qga/commands-posix-ssh.c
+++ b/qga/commands-posix-ssh.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 
+#include "commands-common-ssh.h"
 #include "qapi/error.h"
 #include "qga-qapi-commands.h"
 
@@ -80,37 +81,6 @@ mkdir_for_user(const char *path, const struct passwd *p,
 return true;
 }
 
-static bool
-check_openssh_pub_key(const char *key, Error **errp)
-{
-/* simple sanity-check, we may want more? */
-if (!key || key[0] == '#' || strchr(key, '\n')) {
-error_setg(errp, "invalid OpenSSH public key: '%s'", key);
-return false;
-}
-
-return true;
-}
-
-static bool
-check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
-{
-size_t n = 0;
-strList *k;
-
-for (k = keys; k != NULL; k = k->next) {
-if (!check_openssh_pub_key(k->value, errp)) {
-return false;
-}
-n++;
-}
-
-if (nkeys) {
-*nkeys = n;
-}
-return true;
-}
-
 static bool
 write_authkeys(const char *path, const GStrv keys,
const struct passwd *p, Error **errp)
@@ -139,21 +109,6 @@ write_authkeys(const char *path, const GStrv keys,
 return true;
 }
 
-static GStrv
-read_authkeys(const char *path, Error **errp)
-{
-g_autoptr(GError) err = NULL;
-g_autofree char *contents = NULL;
-
-if (!g_file_get_contents(path, , NULL, )) {
-error_setg(errp, "failed to read '%s': %s", path, err->message);
-return NULL;
-}
-
-return g_strsplit(contents, "\n", -1);
-
-}
-
 void
 qmp_guest_ssh_add_authorized_keys(const char *username, strList *keys,
   bool has_reset, bool reset,
diff --git a/qga/meson.build b/qga/meson.build
index 1c3d2a3d1b..4c3899751b 100644
--- a/qga/meson.build
+++ b/qga/meson.build
@@ -66,6 +66,7 @@ qga_ss.add(files(
   'guest-agent-command-state.c',
   'main.c',
   'cutils.c',
+  'commands-common-ssh.c'
 ))
 if host_os == 'windows'
   qga_ss.add(files(
-- 
2.34.1

[PATCH v6 2/2] Implement SSH commands in QEMU GA for Windows

2024-03-29 Thread aidan_leuck

From: aidaleuc 

Signed-off-by: aidaleuc 
---
 qga/commands-windows-ssh.c | 789 +
 qga/commands-windows-ssh.h |  26 ++
 qga/meson.build|   5 +-
 qga/qapi-schema.json   |  17 +-
 4 files changed, 826 insertions(+), 11 deletions(-)
 create mode 100644 qga/commands-windows-ssh.c
 create mode 100644 qga/commands-windows-ssh.h

diff --git a/qga/commands-windows-ssh.c b/qga/commands-windows-ssh.c
new file mode 100644
index 00..bfd944f0a4
--- /dev/null
+++ b/qga/commands-windows-ssh.c
@@ -0,0 +1,789 @@
+/*
+ * QEMU Guest Agent win32-specific command implementations for SSH keys.
+ * The implementation is opinionated and expects the SSH implementation to
+ * be OpenSSH.
+ *
+ * Copyright Schweitzer Engineering Laboratories. 2024
+ *
+ * Authors:
+ *  Aidan Leuck 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+
+#include "commands-common-ssh.h"
+#include "commands-windows-ssh.h"
+#include "guest-agent-core.h"
+#include "limits.h"
+#include "lmaccess.h"
+#include "lmapibuf.h"
+#include "lmerr.h"
+#include "qapi/error.h"
+
+#include "qga-qapi-commands.h"
+#include "sddl.h"
+#include "shlobj.h"
+#include "userenv.h"
+
+#define AUTHORIZED_KEY_FILE "authorized_keys"
+#define AUTHORIZED_KEY_FILE_ADMIN "administrators_authorized_keys"
+#define LOCAL_SYSTEM_SID "S-1-5-18"
+#define ADMIN_SID "S-1-5-32-544"
+#define WORLD_SID "S-1-1-0"
+
+/*
+ * Frees userInfo structure. This implements the g_auto cleanup
+ * for the structure.
+ */
+void free_userInfo(PWindowsUserInfo info)
+{
+g_free(info->sshDirectory);
+g_free(info->authorizedKeyFile);
+LocalFree(info->SSID);
+g_free(info->username);
+g_free(info);
+}
+
+/*
+ * Gets the admin SSH folder for OpenSSH. OpenSSH does not store
+ * the authorized_key file in the users home directory for security reasons and
+ * instead stores it at %PROGRAMDATA%/ssh. This function returns the path to
+ * that directory on the users machine
+ *
+ * parameters:
+ * errp -> error structure to set when an error occurs
+ * returns: The path to the ssh folder in %PROGRAMDATA% or NULL if an error
+ * occurred.
+ */
+static char *get_admin_ssh_folder(Error **errp)
+{
+/* Allocate memory for the program data path */
+g_autofree char *programDataPath = NULL;
+char *authkeys_path = NULL;
+PWSTR pgDataW = NULL;
+g_autoptr(GError) gerr = NULL;
+
+/* Get the KnownFolderPath on the machine. */
+HRESULT folderResult =
+SHGetKnownFolderPath(_ProgramData, 0, NULL, );
+if (folderResult != S_OK) {
+error_setg(errp, "Failed to retrieve ProgramData folder");
+return NULL;
+}
+
+/* Convert from a wide string back to a standard character string. */
+programDataPath = g_utf16_to_utf8(pgDataW, -1, NULL, NULL, );
+CoTaskMemFree(pgDataW);
+if (!programDataPath) {
+error_setg(errp,
+   "Failed converting ProgramData folder path to UTF-16 %s",
+   gerr->message);
+return NULL;
+}
+
+/* Build the path to the file. */
+authkeys_path = g_build_filename(programDataPath, "ssh", NULL);
+return authkeys_path;
+}
+
+/*
+ * Gets the path to the SSH folder for the specified user. If the user is an
+ * admin it returns the ssh folder located at %PROGRAMDATA%/ssh. If the user is
+ * not an admin it returns %USERPROFILE%/.ssh
+ *
+ * parameters:
+ * username -> Username to get the SSH folder for
+ * isAdmin -> Whether the user is an admin or not
+ * errp -> Error structure to set any errors that occur.
+ * returns: path to the ssh folder as a string.
+ */
+static char *get_ssh_folder(const char *username, const bool isAdmin,
+Error **errp)
+{
+DWORD maxSize = MAX_PATH;
+g_autofree char *profilesDir = g_new0(char, maxSize);
+
+if (isAdmin) {
+return get_admin_ssh_folder(errp);
+}
+
+/* If not an Admin the SSH key is in the user directory. */
+/* Get the user profile directory on the machine. */
+BOOL ret = GetProfilesDirectory(profilesDir, );
+if (!ret) {
+error_setg_win32(errp, GetLastError(),
+ "failed to retrieve profiles directory");
+return NULL;
+}
+
+/* Builds the filename */
+return g_build_filename(profilesDir, username, ".ssh", NULL);
+}
+
+/*
+ * Creates an entry for the everyone group. This is used when the user is an
+ * Administrator This is consistent with the folder permissions that OpenSSH
+ * creates when it is installed. Anyone can read the file, but only
+ * Administrators and SYSTEM can modify the file.
+ *
+ * parameters:
+ * userInfo -> Information about the current user
+ * pACL -> Pointer to an ACL structure
+ * errp -> Error structure to set any errors that occur
+ * returns: 1 on success, 0 otherwise
+ */
+static bool

[PATCH v6 0/2] Implement SSH commands in QEMU GA for Windows

2024-03-29 Thread aidan_leuck

From: aidaleuc 

This patch aims to implement guest-ssh-add-authorized-keys, 
guest-ssh-remove-authorized-keys, and guest-ssh-get-authorized-keys
for Windows. This PR is based on Microsoft's OpenSSH implementation 
https://github.com/PowerShell/Win32-OpenSSH. The guest agents 
will support Kubevirt and allow guest agent propagation to be used to 
dynamically inject SSH keys. 
https://kubevirt.io/user-guide/virtual_machines/accessing_virtual_machines/#dynamic-ssh-public-key-injection-via-qemu-guest-agent

Changes since v5
* Fixed spurious formatting 

Changes since v4
* Moved qapi/error.h to commands-common-ssh.c
* Changed  to "qapi/qapi-builtin-types.h" 
* Removed stbool.h from commands-common-ssh.h

Changes since v3
* Renamed commands-ssh-core.c/h to commands-common-ssh.c/h
* Fixed styling errors discovered by checkpatch.pl 
* Moved some header includes to the commands-common-ssh.h

Changes since v2
* Set indent to 4 spaces
* Moved all comments to C style comments
* Fixed a segfault bug in get_user_info function related to non zeroed memory 
when a user did not exist.
* Used g_new0 instead of g_malloc where applicable
* Modified newlines in qapi-schema.json
* Added newlines at the end of all files
* GError functions now use g_autoptr instead of being freed manually.
* Refactored get_ssh_folder to remove goto error statement
* Fixed uninitialized variable pgDataW
* Modified patch order so that the generalization patch is the first patch
* Removed unnecssary ZeroMemory calls

Changes since v1
* Fixed styling errors
* Moved from wcstombs to g_utf functions
* Removed unnecessary if checks on calls to free
* Fixed copyright headers
* Refactored create_acl functions into base function, admin function and user 
function
* Removed unused user count function
* Split up refactor of existing code into a separate patch

aidaleuc (2):
  Refactor common functions between POSIX and Windows implementation
  Implement SSH commands in QEMU GA for Windows

 qga/commands-common-ssh.c  |  50 +++
 qga/commands-common-ssh.h  |  10 +
 qga/commands-posix-ssh.c   |  47 +--
 qga/commands-windows-ssh.c | 789 +
 qga/commands-windows-ssh.h |  26 ++
 qga/meson.build|  12 +-
 qga/qapi-schema.json   |  17 +-
 7 files changed, 893 insertions(+), 58 deletions(-)
 create mode 100644 qga/commands-common-ssh.c
 create mode 100644 qga/commands-common-ssh.h
 create mode 100644 qga/commands-windows-ssh.c
 create mode 100644 qga/commands-windows-ssh.h

-- 
2.34.1

Re: [PATCH v1 01/11] Introduce a common abstract struct HostIOMMUDevice

2024-03-29 Thread Cédric Le Goater


Hello Zhenzhong,

On 3/28/24 04:06, Duan, Zhenzhong wrote:

Hi Cédric,


-Original Message-
From: Cédric Le Goater 
Subject: Re: [PATCH v1 01/11] Introduce a common abstract struct
HostIOMMUDevice

Hello Zhenzhong,

On 3/19/24 12:58, Duan, Zhenzhong wrote:

Hi Cédric,


-Original Message-
From: Cédric Le Goater 
Sent: Tuesday, March 19, 2024 4:17 PM
To: Duan, Zhenzhong ; qemu-
de...@nongnu.org
Cc: alex.william...@redhat.com; eric.au...@redhat.com;
pet...@redhat.com; jasow...@redhat.com; m...@redhat.com;
j...@nvidia.com; nicol...@nvidia.com; joao.m.mart...@oracle.com; Tian,
Kevin ; Liu, Yi L ; Sun, Yi Y
; Peng, Chao P 
Subject: Re: [PATCH v1 01/11] Introduce a common abstract struct
HostIOMMUDevice

Hello Zhenzhong,

On 2/28/24 04:58, Zhenzhong Duan wrote:

HostIOMMUDevice will be inherited by two sub classes,
legacy and iommufd currently.

Introduce a helper function host_iommu_base_device_init to initialize it.

Suggested-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
---
include/sysemu/host_iommu_device.h | 22

++

1 file changed, 22 insertions(+)
create mode 100644 include/sysemu/host_iommu_device.h

diff --git a/include/sysemu/host_iommu_device.h

b/include/sysemu/host_iommu_device.h

new file mode 100644
index 00..fe80ab25fb
--- /dev/null
+++ b/include/sysemu/host_iommu_device.h
@@ -0,0 +1,22 @@
+#ifndef HOST_IOMMU_DEVICE_H
+#define HOST_IOMMU_DEVICE_H
+
+typedef enum HostIOMMUDevice_Type {
+HID_LEGACY,
+HID_IOMMUFD,
+HID_MAX,
+} HostIOMMUDevice_Type;
+
+typedef struct HostIOMMUDevice {
+HostIOMMUDevice_Type type;


A type field is not a good sign and that's where QOM is useful.


Yes, agree.
I didn't choose QOM because in iommufd-cdev series, VFIOContainer

chooses not using QOM model.

See the discussion:

https://lore.kernel.org/all/YmuFv2s5TPuw7K%2Fu@yekko/

I thought HostIOMMUDevice need to follow same rule.

But after further digging into this, I think it may be ok to use QOM model

as long as we don't expose

HostIOMMUDevice in qapi/qom.json and not use USER_CREATABLE

interface. Your thoughts?

yes. Can we change a bit this series to use QOM ? something like :

 typedef struct HostIOMMUDevice {
 Object parent;
 } HostIOMMUDevice;

 #define TYPE_HOST_IOMMU "host.iommu"
 OBJECT_DECLARE_TYPE(HostIOMMUDevice, HostIOMMUClass,
HOST_IOMMU)

 struct HostIOMMUClass {
 ObjectClass parent_class;

 int (*get_type)(HostIOMMUDevice *hiod, uint64_t *type, Error **errp);
 int (*get_cap)(HostIOMMUDevice *hiod, uint64_t *cap, Error **errp);
 };

Inherited objects would be TYPE_HOST_IOMMU_IOMMUFD and
TYPE_HOST_IOMMU_LEGACY.
Each class implementing the handlers or not (legacy mode).


Understood, thanks for your guide.



The class handlers are introduced for the intel-iommu helper
vtd_check_hdev()
in order to avoid using iommufd routines directly. HostIOMMUDevice is
supposed
to abstract the Host IOMMU device, so we need to abstract also all the
interfaces to this object.


I'd like to have a minimal adjustment to class handers. Just let me know if you 
have strong
preference.

Cap/ecap is intel_iommu specific, I'd like to make it a bit generic also for 
arm smmu usage,
and merge get_type and get_cap into one function as they both calls 
ioctl(IOMMU_GET_HW_INFO),
something like:
get_info(HostIOMMUDevice *hiod, enum iommu_hw_info_type *type, void **data, 
void **len,  Error **errp);


OK. Let's see how it goes. Having more users of this new object Host
IOMMU device is important to get a better feeling of the interface.
As of today, it doesn't have not much value. The iommufd object could
be QOM linked to the vIOMMU when available and we could get the bind
devid in some other ways I suppose. Anyhow, please keep it simple and
let's explore.

Thanks,

C.





and let iommu emulater to extract content of *data. For intel_iommu, it's:

struct iommu_hw_info_vtd {
 __u32 flags;
 __u32 __reserved;
 __aligned_u64 cap_reg;
 __aligned_u64 ecap_reg;
};



The .host_iommu_device_create() handler could be merged
in .attach_device()
possibly. Anyhow, please use now object_new() and object_unref() instead.
host_iommu_base_device_init() is useless IMHO.


Good idea, will do.







Is vtd_check_hdev() the only use of this field ?


Currently yes. virtio-iommu may have similar usage.


If so, can we simplify with a QOM interface in any way ?


QOM interface is a set of callbacks, guess you mean QOM class,
saying HostIOMMUDevice class, IOMMULegacyDevice class and

IOMMUFDDevice class?

See above proposal. it should work fine.

Also, I think it is better to use a IOMMUFDBackend* parameter for
iommufd_device_get_info() to be consistent with the other routines.


Sure, then I'd like to also rename it to iommufd_backend_get_device_info().

Thanks
Zhenzhong



Then It would interesting to see how this applies to Eric's series.

Thanks,

C.

[PATCH] raspi4b: Reduce RAM to 1Gb on 32-bit hosts

2024-03-29 Thread Cédric Le Goater

Change the board revision number and RAM size to 1Gb on 32-bit hosts.
On these systems, RAM has a 2047 MB limit and this breaks the tests.

Fixes: 7785e8ea2204 ("hw/arm: Introduce Raspberry PI 4 machine")
Signed-off-by: Cédric Le Goater 
---
 hw/arm/raspi4b.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/raspi4b.c b/hw/arm/raspi4b.c
index 
cb1b1f2f147e8685a1dba6f137335ea0bc89bca5..85877880fc706d216de04ff1e081d66e6080ebac
 100644
--- a/hw/arm/raspi4b.c
+++ b/hw/arm/raspi4b.c
@@ -112,7 +112,11 @@ static void raspi4b_machine_class_init(ObjectClass *oc, 
void *data)
 MachineClass *mc = MACHINE_CLASS(oc);
 RaspiBaseMachineClass *rmc = RASPI_BASE_MACHINE_CLASS(oc);
 
+#if HOST_LONG_BITS == 32
+rmc->board_rev = 0xa03111; /* Revision 1.1, 1 Gb RAM */
+#else
 rmc->board_rev = 0xb03115; /* Revision 1.5, 2 Gb RAM */
+#endif
 raspi_machine_class_common_init(mc, rmc->board_rev);
 mc->init = raspi4b_machine_init;
 }
-- 
2.44.0

[PATCH] target/riscv/cpu_helper.c: fix wrong exception raise

2024-03-29 Thread Alexei Filippov

Successed two stage translation, but failed pmp check can cause guest
page fault instead of regular page fault.

In case of execution ld instuction in VS mode we can
face situation when two stages of translation was passed successfully,
and if PMP check was failed first_stage_error variable going to be
setted to false and raise_mmu_exception will raise
RISCV_EXCP_LOAD_GUEST_ACCESS_FAULT(scause == 21) instead of
RISCV_EXCP_LOAD_ACCESS_FAULT(scause == 5) and this is wrong, according
to RISCV privileged spec sect. 3.7: Attempting to execute a load or
load-reserved instruction which accesses a physical address within a
PMP region without read permissions raises a load access-fault
exception.

Signed-off-by: Alexei Filippov 
---
 target/riscv/cpu_helper.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index bc70ab5abc..eaef1c2c62 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1408,9 +1408,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
   __func__, pa, ret, prot_pmp, tlb_size);
 
 prot &= prot_pmp;
-}
-
-if (ret != TRANSLATE_SUCCESS) {
+} else {
 /*
  * Guest physical address translation failed, this is a HS
  * level exception
-- 
2.34.1

QEMU Community Call Agenda Items (April 2nd, 2024)

2024-03-29 Thread Philippe Mathieu-Daudé


Hi,

The KVM/QEMU community call is at:

  https://meet.jit.si/kvmcallmeeting
  @
  2/4/2024 14:00 UTC

Are there any agenda items for the sync-up?

Alex maintains the invite on our Linaro project calendar here:


https://calendar.google.com/calendar/event?action=TEMPLATE=MWd2dWI5NDM1bzdocnJlbTBhMHJhbG5sNWlfMjAyNDAyMjBUMTQwMDAwWiBjX2s1cDJscGd2YnB0ZGlya3U1c2kwMWJsbW5rQGc=c_k5p2lpgvbptdirku5si01blmnk%40group.calendar.google.com=ALL

If you want to be added to the invite list let him know and you can get
spammed by your calendar app as well 

Regards,

Phil.

Re: [RFC PATCH-for-9.1 14/21] system: Introduce QMP generic_query_cpu_definitions()

2024-03-29 Thread Philippe Mathieu-Daudé


Hi Markus,

On 26/3/24 14:28, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


Each target use a common template for qmp_query_cpu_definitions().

Extract it as generic_query_cpu_definitions(), keeping the
target-specific implementations as the following SysemuCPUOps
handlers:
  - cpu_list_compare()
  - add_definition()
  - add_alias_definitions()

Signed-off-by: Philippe Mathieu-Daudé 
---
  MAINTAINERS   |  2 +
  include/hw/core/sysemu-cpu-ops.h  | 14 ++
  include/qapi/commands-target-compat.h | 14 ++
  system/cpu-qmp-cmds.c | 71 +++
  system/meson.build|  1 +
  5 files changed, 102 insertions(+)
  create mode 100644 include/qapi/commands-target-compat.h
  create mode 100644 system/cpu-qmp-cmds.c




diff --git a/system/cpu-qmp-cmds.c b/system/cpu-qmp-cmds.c
new file mode 100644
index 00..daeb131159
--- /dev/null
+++ b/system/cpu-qmp-cmds.c
@@ -0,0 +1,71 @@
+/*
+ * QAPI helpers for target specific QMP commands
+ *
+ * SPDX-FileCopyrightText: 2024 Linaro Ltd.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "qapi/commands-target-compat.h"
+#include "sysemu/arch_init.h"
+#include "hw/core/cpu.h"
+#include "hw/core/sysemu-cpu-ops.h"
+
+static void cpu_common_add_definition(gpointer data, gpointer user_data)
+{
+ObjectClass *oc = data;
+CpuDefinitionInfoList **cpu_list = user_data;
+CpuDefinitionInfo *info;
+const char *typename;
+
+typename = object_class_get_name(oc);
+info = g_malloc0(sizeof(*info));
+info->name = cpu_model_from_type(typename);
+info->q_typename = g_strdup(typename);
+
+QAPI_LIST_PREPEND(*cpu_list, info);
+}
+
+static void arch_add_cpu_definitions(CpuDefinitionInfoList **cpu_list,
+ const char *cpu_typename)
+{
+ObjectClass *oc;
+GSList *list;
+const struct SysemuCPUOps *ops;
+
+oc = object_class_by_name(cpu_typename);
+if (!oc) {
+return;
+}
+ops = CPU_CLASS(oc)->sysemu_ops;
+
+list = object_class_get_list(cpu_typename, false);
+if (ops->cpu_list_compare) {
+list = g_slist_sort(list, ops->cpu_list_compare);
+}
+g_slist_foreach(list, ops->add_definition ? : cpu_common_add_definition,
+cpu_list);
+g_slist_free(list);
+
+if (ops->add_alias_definitions) {
+ops->add_alias_definitions(cpu_list);
+}
+}
+
+CpuDefinitionInfoList *generic_query_cpu_definitions(Error **errp)
+{
+CpuDefinitionInfoList *cpu_list = NULL;
+
+for (unsigned i = 0; i <= QEMU_ARCH_BIT_LAST; i++) {
+const char *cpu_typename;
+
+cpu_typename = cpu_typename_by_arch_bit(i);
+if (!cpu_typename) {
+continue;
+}
+arch_add_cpu_definitions(_list, cpu_typename);
+}
+
+return cpu_list;
+}


The target-specific qmp_query_cpu_definitions() this is going to replace
each execute the equivalent of *one* loop iteration: the one
corresponding to their own arch bit.

For the replacement to be faithful, as cpu_typename_by_arch_bit() must
return non-null exactly once.

This is the case for the qemu-system-TARGET.  The solution feels
overengineered there.

I figure cpu_typename_by_arch_bit() will return non-null multiple times
in a future single binary supporting heterogeneous machines.

Such a single binary then can't serve as drop-in replacement for the
qemu-system-TARGET, because query-cpu-definitions returns more.

To get a drop-in replacement, we'll need additional logic to restrict
the query for the homogeneous use case.


Can we ask the management layer to provide the current homogeneous
target via argument? Otherwise we can add a new query-cpu-definitions-v2
command requiring an explicit target argument, allowing 'all', and
deprecate the current query-cpu-definitions.


I think this needs to be discussed in the commit message.

Possibly easier: don't loop over the bits, relying on
cpu_typename_by_arch_bit() to select the right one.  Instead get the
right bit from somewhere.

We can switch to a loop when we need it for the heterogeneous case.


Alex suggested to consider heterogeneous emulation the new default,
and the current homogeneous use as a particular case. I'd rather not
plan on a "heterogeneous switch day" and get things integrated in
the way, otherwise we'll never get there...

Regards,

Phil.

Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup

2024-03-29 Thread Zhu Yangyang via

On Thu, 28 Mar 2024 07:40:14 -0500, Eric Blake wrote:
> On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang via wrote:
> > If g_main_loop_run()/aio_poll() is called in the coroutine context,
> > the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> > may be disordered.
> > 
> > When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> > some listened events is completed. Therefore, the completion callback
> > function is dispatched.
> > 
> > If this callback function needs to invoke aio_co_enter(), it will only
> > wake up the coroutine (because we are already in coroutine context),
> > which may cause that the data on this listening event_fd/socket_fd
> > is not read/cleared. When the next poll () exits, it will be woken up again
> > and inserted into the wakeup queue again.
> > 
> > For example, if TLS is enabled in NBD, the server will call 
> > g_main_loop_run()
> > in the coroutine, and repeatedly wake up the io_read event on a socket.
> > The call stack is as follows:
> > 
> > aio_co_enter()
> > aio_co_wake()
> > qio_channel_restart_read()
> > aio_dispatch_handler()
> > aio_dispatch_handlers()
> > aio_dispatch()
> > aio_ctx_dispatch()
> > g_main_context_dispatch()
> > g_main_loop_run()
> > nbd_negotiate_handle_starttls()
> > nbd_negotiate_options()
> > nbd_negotiate()
> > nbd_co_client_start()
> > coroutine_trampoline()
> 
> zhuyangyang, do you have a reliable reproduction setup for how you
> were able to trigger this?  Obviously, it only happens when TLS is
> enabled (we aren't creating a g_main_loop_run for any other NBD
> command), and only when the server is first starting to serve a
> client; is this a case where you were hammering a long-running qemu
> process running an NBD server with multiple clients trying to
> reconnect to the server all near the same time?

I'm sorry I didn't make the background of the problem clear before,
this problem is not on the NBD command, but on the VM live migration
with qemu TLS.

Next, I'll detail how to reproduce the issue.

1. Make the problem more obvious.

When TLS is enabled during live migration, the migration progress
may be suspended because some I/O are not returned during the mirror job
on target host.

Now we know that the reason is that some coroutines are lost.
The entry function of these lost coroutines are nbd_trip().

Add an assertion on the target host side to make the problem
show up quickly.

$ git diff util/async.c
diff --git a/util/async.c b/util/async.c
index 0467890052..4e3547c3ea 100644
--- a/util/async.c
+++ b/util/async.c
@@ -705,6 +705,7 @@ void aio_co_enter(AioContext *ctx, Coroutine *co)
 if (qemu_in_coroutine()) {
 Coroutine *self = qemu_coroutine_self();
 assert(self != co);
+assert(!co->co_queue_next.sqe_next);
 QSIMPLEQ_INSERT_TAIL(>co_queue_wakeup, co, co_queue_next);
 } else {
 qemu_aio_coroutine_enter(ctx, co);

2. Reproduce the issue

1) start vm on the origin host

Note: Configuring multiple disks for a VM 
(It is recommended to configure more than 6 disks)

These disk tasks(nbd_trip) will be performed simultaneously 
with nbd_negotiate_handle_starttls() on the main thread during migrate.


  centos7.3_64_server
  4194304
  4194304
  4
  
/machine
  
  
hvm

  
  



  
  
  



  
  destroy
  restart
  restart
  
/usr/bin/qemu-kvm

  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  

  


$ virsh create vm_x86.xml
Domain 'centos7.3_64_server' created from /home/vm_x86.xml

2) migrate the vm to target host
virsh migrate --live --p2p \
   --migrateuri tcp:10.91.xxx.xxx centos7.3_64_server 
qemu+tcp://10.91.xxx.xxx/system \
   --copy-storage-all \
   --tls

Than, An error is reported on the peer host
qemu-kvm: ../util/async.c:705: aio_co_enter: Assertion 
`!co->co_queue_next.sqe_next' failed.

> 
> If we can come up with a reliable formula for reproducing the
> corrupted coroutine list, it would make a great iotest addition
> alongside the existing qemu-iotests 233 for ensuring that NBD TLS
> traffic is handled correctly in both server and client.

I'm not sure if this can be used for testing of qemu-nbd

> 
> > 
> > Signed-off-by: zhuyangyang 
> 
> Side note: this appears to be your first qemu contribution (based on
> 'git shortlog --author zhuyangyang').  While I am not in a position to
> presume how you would like your name Anglicized, I will point out that
> the prevailing style is to separate given name from family name (just
> because your username at work has no spaces does not mean that your
> S-o-b has to follow suit).  It is also permissible to list your name
> in native characters alongside or in place of the

Re: [PATCH for-9.1 v5 1/3] hw: Add compat machines for 9.1

2024-03-29 Thread Paolo Bonzini

On Thu, Mar 28, 2024 at 11:07 AM Zhao Liu  wrote:
>
> Hi Paolo,
>
> Just meet typos when compiling ;-)

Thank you very much! Fixed both.

Paolo

> On Mon, Mar 25, 2024 at 03:14:20PM +0100, Paolo Bonzini wrote:
> > Date: Mon, 25 Mar 2024 15:14:20 +0100
> > From: Paolo Bonzini 
> > Subject: [PATCH for-9.1 v5 1/3] hw: Add compat machines for 9.1
> > X-Mailer: git-send-email 2.44.0
>
> [snip]
>
> > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > index b1dcb3857f0..67e8b0b05e8 100644
> > --- a/hw/s390x/s390-virtio-ccw.c
> > +++ b/hw/s390x/s390-virtio-ccw.c
> > @@ -859,14 +859,26 @@ bool css_migration_enabled(void)
> >  }  
> >\
> >  type_init(ccw_machine_register_##suffix)
> >
> > +static void ccw_machine_9_1_instance_options(MachineState *machine)
> > +{
> > +}
> > +
> > +static void ccw_machine_9_1_class_options(MachineClass *mc)
> > +{
> > +}
> > +DEFINE_CCW_MACHINE(9_0, "9.1", true);
>
> Should be:
>
> DEFINE_CCW_MACHINE(9_1, "9.1", true);
>
> > +
> >  static void ccw_machine_9_0_instance_options(MachineState *machine)
> >  {
> > +ccw_machine_9_1_instance_options(machine);
> >  }
> >
> >  static void ccw_machine_9_0_class_options(MachineClass *mc)
> >  {
> > +ccw_machine_9_1_class_options(machine);
>
> s/machine/mc/
>
> Regards,
> Zhao
>

Re: [PATCH] hw/xen_evtchn: Initialize flush_kvm_routes

2024-03-29 Thread Philippe Mathieu-Daudé


On 29/3/24 12:39, Artem Chernyshev wrote:

In xen_evtchn_soft_reset() variable flush_kvm_routes can
be used before being initialized.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Oleg Sviridov 
Signed-off-by: Artem Chernyshev 
---
  hw/i386/kvm/xen_evtchn.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()

2024-03-29 Thread Philippe Mathieu-Daudé


On 29/3/24 11:56, Cédric Le Goater wrote:

This allows to report more precise errors in the migration handler
dirty_bitmap_save_setup().

Suggested-by Vladimir Sementsov-Ogievskiy 
Signed-off-by: Cédric Le Goater 
---

  To apply on top of :
  https://lore.kernel.org/qemu-devel/20240320064911.545001-1-...@redhat.com/
  
  migration/block-dirty-bitmap.c | 34 --

  1 file changed, 16 insertions(+), 18 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH for-9.0] tests/qtest: Fix STM32L4x5 GPIO test on 32-bit

2024-03-29 Thread Philippe Mathieu-Daudé


Hi Cédric, Thomas,

On 29/3/24 10:27, Cédric Le Goater wrote:

The test mangles the GPIO address and the pin number in the
qtest_add_data_func data parameter. Doing so, it assumes that the host
pointer size is always 64-bit, which breaks on 32-bit :

../tests/qtest/stm32l4x5_gpio-test.c: In function ‘test_gpio_output_mode’:
../tests/qtest/stm32l4x5_gpio-test.c:272:25: error: cast from pointer to 
integer of different size [-Werror=pointer-to-int-cast]
   272 | unsigned int pin = ((uint64_t)data) & 0xF;
   | ^
../tests/qtest/stm32l4x5_gpio-test.c:273:22: error: cast from pointer to 
integer of different size [-Werror=pointer-to-int-cast]
   273 | uint32_t gpio = ((uint64_t)data) >> 32;
   |  ^


Any clue why this isn't this covered by CI?



To fix, improve the mangling of the GPIO address and pin number fields
by using GPIO_SIZE so that the resulting value fits in a 32-bit pointer.
While at it, include some helpers to hide the details.

Cc: Arnaud Minier 
Cc: Inès Varhol 
Signed-off-by: Cédric Le Goater 
---
  tests/qtest/stm32l4x5_gpio-test.c | 59 ++-
  1 file changed, 35 insertions(+), 24 deletions(-)

Re: [PATCH] linux-user/syscall: xtensa: fix target_msqid_ds and ipc_perm conversion

2024-03-29 Thread Philippe Mathieu-Daudé


Hi Max,

On 29/3/24 07:31, Max Filippov wrote:

- target_ipc_perm::mode and target_ipc_perm::__seq fields are 32-bit wide
   on xtensa and thus need to use tswap32
- target_msqid_ds::msg_*time field pairs are reversed on big-endian
   xtensa


Please split in 2 distinct patches.


Both issues result in incorrect conversion results on big-endian xtensa
targets, spotted by the libc-test http://nsz.repo.hu/git/?p=libc-test

Cc: qemu-sta...@nongnu.org
Fixes: a3da8be5126b ("target/xtensa: linux-user: fix sysv IPC structures")
Signed-off-by: Max Filippov 
---
  linux-user/syscall.c | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e384e1424890..cb334e90d6f0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3758,12 +3758,13 @@ static inline abi_long target_to_host_ipc_perm(struct 
ipc_perm *host_ip,
  host_ip->gid = tswap32(target_ip->gid);
  host_ip->cuid = tswap32(target_ip->cuid);
  host_ip->cgid = tswap32(target_ip->cgid);
-#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC)
+#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC) || \
+defined(TARGET_XTENSA)
  host_ip->mode = tswap32(target_ip->mode);
  #else
  host_ip->mode = tswap16(target_ip->mode);
  #endif
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(TARGET_XTENSA)
  host_ip->__seq = tswap32(target_ip->__seq);
  #else
  host_ip->__seq = tswap16(target_ip->__seq);
@@ -3786,12 +3787,13 @@ static inline abi_long 
host_to_target_ipc_perm(abi_ulong target_addr,
  target_ip->gid = tswap32(host_ip->gid);
  target_ip->cuid = tswap32(host_ip->cuid);
  target_ip->cgid = tswap32(host_ip->cgid);
-#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC)
+#if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_PPC) || \
+defined(TARGET_XTENSA)
  target_ip->mode = tswap32(host_ip->mode);
  #else
  target_ip->mode = tswap16(host_ip->mode);
  #endif
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(TARGET_XTENSA)
  target_ip->__seq = tswap32(host_ip->__seq);
  #else
  target_ip->__seq = tswap16(host_ip->__seq);
@@ -4111,6 +4113,14 @@ static inline abi_long do_semtimedop(int semid,
  struct target_msqid_ds
  {
  struct target_ipc_perm msg_perm;
+#if defined(TARGET_XTENSA) && TARGET_BIG_ENDIAN


Why restrict to only Xtensa here?


+abi_ulong __unused1;
+abi_ulong msg_stime;
+abi_ulong __unused2;
+abi_ulong msg_rtime;
+abi_ulong __unused3;
+abi_ulong msg_ctime;
+#else
  abi_ulong msg_stime;
  #if TARGET_ABI_BITS == 32
  abi_ulong __unused1;
@@ -4122,6 +4132,7 @@ struct target_msqid_ds
  abi_ulong msg_ctime;
  #if TARGET_ABI_BITS == 32
  abi_ulong __unused3;
+#endif
  #endif
  abi_ulong __msg_cbytes;
  abi_ulong msg_qnum;

[PATCH] hw/xen_evtchn: Initialize flush_kvm_routes

2024-03-29 Thread Artem Chernyshev

In xen_evtchn_soft_reset() variable flush_kvm_routes can
be used before being initialized.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Oleg Sviridov 
Signed-off-by: Artem Chernyshev 
---
 hw/i386/kvm/xen_evtchn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index a5052c0ea3..07bd0c9ab8 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1097,7 +1097,7 @@ static int close_port(XenEvtchnState *s, evtchn_port_t 
port,
 int xen_evtchn_soft_reset(void)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
-bool flush_kvm_routes;
+bool flush_kvm_routes = false;
 int i;
 
 if (!s) {
-- 
2.37.3

QEMU NVMe Emulation does not seem to work without Shadow Buffers Config

2024-03-29 Thread Constantine Gavrilov

I am trying to write a custom NVMe driver and I tested it with QEMU
NVMe emulation on Fedora 39.
I have not tried to set shadow buffers, even though the controller
reports such support.
What I see is the following:
1. Only the first command in the Admin queue generates interrupt and
completes (Identify Controller command).
2. Any subsequent command does not complete (no CQE even if polling
CQ) and does not generate an interrupt.

I have a suspicion that the implemented shadow buffers config support
causes this behavior.

Any ideas?

-- 

Constantine Gavrilov
Systems Architect

Re: QEMU NVMe Emulation does not seem to work without Shadow Buffers Config

2024-03-29 Thread Constantine Gavrilov

Please ignore. The issue was on my side.

On Fri, Mar 29, 2024 at 1:07 PM Constantine Gavrilov
 wrote:
>
> I am trying to write a custom NVMe driver and I tested it with QEMU
> NVMe emulation on Fedora 39.
> I have not tried to set shadow buffers, even though the controller
> reports such support.
> What I see is the following:
> 1. Only the first command in the Admin queue generates interrupt and
> completes (Identify Controller command).
> 2. Any subsequent command does not complete (no CQE even if polling
> CQ) and does not generate an interrupt.
>
> I have a suspicion that the implemented shadow buffers config support
> causes this behavior.
>
> Any ideas?
>
> --
> 
> Constantine Gavrilov
> Systems Architect
> 



-- 

Constantine Gavrilov
Systems Architect

Re: [PATCH] spapr: nested: use bitwise NOT operator for flags check

2024-03-29 Thread Philippe Mathieu-Daudé


On 29/3/24 05:34, Harsh Prateek Bora wrote:

Check for flag bit in H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE need to use
bitwise NOT operator to ensure no other flag bits are set.
Reported by Coverity as CID 1540008, 1540009.

Reported-by: Peter Maydell 
Signed-off by: Harsh Prateek Bora 
---
  hw/ppc/spapr_nested.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH v2] riscv: thead: Add th.sxstatus CSR emulation

2024-03-29 Thread Christoph Müllner

The th.sxstatus CSR can be used to identify available custom extension
on T-Head CPUs. The CSR is documented here:
  https://github.com/T-head-Semi/thead-extension-spec/pull/46

An important property of this patch is, that the th.sxstatus MAEE field
is not set (indicating that XTheadMaee is not available).
XTheadMaee is a memory attribute extension (similar to Svpbmt) which is
implemented in many T-Head CPUs (C906, C910, etc.) and utilizes bits
in PTEs that are marked as reserved. QEMU maintainers prefer to not
implement XTheadMaee, so we need give kernels a mechanism to identify
if XTheadMaee is available in a system or not. And this patch introduces
this mechanism in QEMU in a way that's compatible with real HW
(i.e., probing the th.sxstatus.MAEE bit).

Further context can be found on the list:
https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg00775.html

Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu.c   |  1 +
 target/riscv/cpu.h   |  3 ++
 target/riscv/meson.build |  1 +
 target/riscv/th_csr.c| 78 
 4 files changed, 83 insertions(+)
 create mode 100644 target/riscv/th_csr.c

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 36e3e5fdaf..b82ba95ae6 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -545,6 +545,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 cpu->cfg.mvendorid = THEAD_VENDOR_ID;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_SV39);
+th_register_custom_csrs(cpu);
 #endif
 
 /* inherited from parent obj via riscv_cpu_init() */
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3b1a02b944..c9f8f06751 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -824,4 +824,7 @@ void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 uint8_t satp_mode_max_from_map(uint32_t map);
 const char *satp_mode_str(uint8_t satp_mode, bool is_32_bit);
 
+/* Implemented in th_csr.c */
+void th_register_custom_csrs(RISCVCPU *cpu);
+
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index a5e0734e7f..a4bd61e52a 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -33,6 +33,7 @@ riscv_system_ss.add(files(
   'monitor.c',
   'machine.c',
   'pmu.c',
+  'th_csr.c',
   'time_helper.c',
   'riscv-qmp-cmds.c',
 ))
diff --git a/target/riscv/th_csr.c b/target/riscv/th_csr.c
new file mode 100644
index 00..66d260cabd
--- /dev/null
+++ b/target/riscv/th_csr.c
@@ -0,0 +1,78 @@
+/*
+ * T-Head-specific CSRs.
+ *
+ * Copyright (c) 2024 VRULL GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "cpu_vendorid.h"
+
+#define CSR_TH_SXSTATUS 0x5c0
+
+/* TH_SXSTATUS bits */
+#define TH_SXSTATUS_UCMEBIT(16)
+#define TH_SXSTATUS_MAEEBIT(21)
+#define TH_SXSTATUS_THEADISAEE  BIT(22)
+
+typedef struct {
+int csrno;
+int (*insertion_test)(RISCVCPU *cpu);
+riscv_csr_operations csr_ops;
+} riscv_csr;
+
+static RISCVException s_mode_csr(CPURISCVState *env, int csrno)
+{
+if (env->debugger)
+return RISCV_EXCP_NONE;
+
+if (env->priv >= PRV_S)
+return RISCV_EXCP_NONE;
+
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+static int test_thead_mvendorid(RISCVCPU *cpu)
+{
+if (cpu->cfg.mvendorid != THEAD_VENDOR_ID)
+return -1;
+return 0;
+}
+
+static RISCVException read_th_sxstatus(CPURISCVState *env, int csrno,
+   target_ulong *val)
+{
+/* We don't set MAEE here, because QEMU does not implement MAEE. */
+*val = TH_SXSTATUS_UCME | TH_SXSTATUS_THEADISAEE;
+return RISCV_EXCP_NONE;
+}
+
+static riscv_csr th_csr_list[] = {
+{
+.csrno = CSR_TH_SXSTATUS,
+.insertion_test = test_thead_mvendorid,
+.csr_ops = { "th.sxstatus", s_mode_csr, read_th_sxstatus }
+}
+};
+
+void th_register_custom_csrs(RISCVCPU *cpu)
+{
+for (size_t i = 0; i < ARRAY_SIZE(th_csr_list); i++) {
+int csrno = th_csr_list[i].csrno;
+riscv_csr_operations *csr_ops = _csr_list[i].csr_ops;
+if (!th_csr_list[i].insertion_test(cpu))
+riscv_set_csr_ops(csrno, csr_ops);
+}
+}
-- 
2.44.0

Re: [PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler

2024-03-29 Thread Vladimir Sementsov-Ogievskiy


On 29.03.24 13:53, Cédric Le Goater wrote:

Hello Vladimir,

On 3/29/24 10:32, Vladimir Sementsov-Ogievskiy wrote:

On 20.03.24 09:49, Cédric Le Goater wrote:

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,12 +1213,14 @@ fail:
  return ret;
  }
-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
  if (init_dirty_bitmap_migration(s) < 0) {
+    error_setg(errp,
+   "Failed to initialize dirty tracking bitmap for blocks");


No, that's not about initializing a bitmap. This all is about migration of 
block-dirty-bitmaps themselves.

So correct would be say "Failed to initialize migration of block dirty bitmaps".

with this, for block dirty bitmap migration:
Acked-by: Vladimir Sementsov-Ogievskiy 


I had kept your previous R-b.


Oh, I missed that)



Should we remove it ? or is it ok if I address your comments below in a
followup patch, in which case the error message above would be removed.
 
Yes of course, you can keep my old r-b. Followup patch is appreciated





Still, a lot better is add errp to init_dirty_bitmap_migration() and to 
add_bitmaps_to_list() too: look,

init_dirty_bitmap_migration() fails only if add_bitmaps_to_list() fails

in turn,

add_bitmaps_to_list() have several clear failure points, where it always does 
error_report (or error_report_err), which would be better to pass-through to 
the user.


Good idea. Will do.

Thanks,

C.




--
Best regards,
Vladimir

[PATCH 1/1] ebpf: Added traces back. Changed source set for eBPF to 'system'.

2024-03-29 Thread Andrew Melnychenko

There was an issue with Qemu build with "--disable-system".
The traces could be generated and the build fails.
The traces were 'cut out' for previous patches, and overall,
the 'system' source set should be used like in pre-'eBPF blob' patches.

Signed-off-by: Andrew Melnychenko 
---
 ebpf/ebpf_rss.c  | 7 +++
 ebpf/meson.build | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index 2e506f97435..c8a68da2c56 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -25,6 +25,8 @@
 #include "ebpf/rss.bpf.skeleton.h"
 #include "ebpf/ebpf.h"
 
+#include "trace.h"
+
 void ebpf_rss_init(struct EBPFRSSContext *ctx)
 {
 if (ctx != NULL) {
@@ -55,18 +57,21 @@ static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
PROT_READ | PROT_WRITE, MAP_SHARED,
ctx->map_configuration, 0);
 if (ctx->mmap_configuration == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration array");
 return false;
 }
 ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
PROT_READ | PROT_WRITE, MAP_SHARED,
ctx->map_toeplitz_key, 0);
 if (ctx->mmap_toeplitz_key == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
 goto toeplitz_fail;
 }
 ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
PROT_READ | PROT_WRITE, MAP_SHARED,
ctx->map_indirections_table, 0);
 if (ctx->mmap_indirections_table == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection table");
 goto indirection_fail;
 }
 
@@ -108,12 +113,14 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 
 rss_bpf_ctx = rss_bpf__open();
 if (rss_bpf_ctx == NULL) {
+trace_ebpf_error("eBPF RSS", "can not open eBPF RSS object");
 goto error;
 }
 
 bpf_program__set_type(rss_bpf_ctx->progs.tun_rss_steering_prog, 
BPF_PROG_TYPE_SOCKET_FILTER);
 
 if (rss_bpf__load(rss_bpf_ctx)) {
+trace_ebpf_error("eBPF RSS", "can not load RSS program");
 goto error;
 }
 
diff --git a/ebpf/meson.build b/ebpf/meson.build
index c5bf9295a20..bff6156f518 100644
--- a/ebpf/meson.build
+++ b/ebpf/meson.build
@@ -1 +1 @@
-common_ss.add(when: libbpf, if_true: files('ebpf.c', 'ebpf_rss.c'), if_false: 
files('ebpf_rss-stub.c'))
+system_ss.add(when: libbpf, if_true: files('ebpf.c', 'ebpf_rss.c'), if_false: 
files('ebpf_rss-stub.c'))
-- 
2.43.0

Re: [RFC] util/error-report: Add "error: " prefix for error-level report

2024-03-29 Thread Paolo Bonzini

On Fri, Mar 29, 2024 at 10:37 AM  wrote:
> > This was done in the context of inheriting the original error_report()
> > interface without the prefix style. And it was also useful to have a
> > means of error handling, such as exit(), when error occurs, so that the
> > error message - the most serious level - can be noticed by the user.
> >
> > Nowadays, however, error_report() and its variants have a tendency to be
> > "abused": it is used a lot just for the sake of logging something more
> > noticeable than the "warn" or "info" level, in the absence of
> > appropriate error handling logic.

Unfortunately, this is the reason why you _cannot_ do what this patch does.

For example:

  error_reportf_err(local_err, "Disconnect client, due to: ");
  error_report("terminating on signal %d", shutdown_signal);

This should not be prepending "error" - it's not an error.

  error_report_once("%s: detected read error on DMAR slpte "

This is a guest error, so "error:" is probably not a good idea (it
should use qemu_log_mask).

And so on. :(

Paolo

> > But, in the use case above, due to the lack of a prefix, it is in fact
> > less informative to the user than warn_report()/info_report() (with
> > "warn:" or "info: " prfix), which does not match its highest level.
> >
> > Therefore, add "error: " prefix for error-level report.
> >
> > [1]: https://lore.kernel.org/qemu-devel/87r2xuay5h@dusky.pond.sub.org/#t
> >
> > Signed-off-by: Zhao Liu 
> > ---
> >   util/error-report.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/util/error-report.c b/util/error-report.c
> > index 6e44a5573217..e981c0b032f0 100644
> > --- a/util/error-report.c
> > +++ b/util/error-report.c
> > @@ -213,6 +213,7 @@ static void vreport(report_type type, const char *fmt, 
> > va_list ap)
> >
> >   switch (type) {
> >   case REPORT_TYPE_ERROR:
> > +error_printf("error: ");
> >   break;
> >   case REPORT_TYPE_WARNING:
> >   error_printf("warning: ");
>
> Sounds like a good idea to me, but I think you should then also remove
> the hard-coded "error:" strings in the various error_reports():
>
> $ grep -ri 'error_report.*"error:'
> migration/block-dirty-bitmap.c:error_report("Error: block device name 
> is not set");
> migration/block-dirty-bitmap.c:error_report("Error: Unknown 
> bitmap alias '%s' on node "
> migration/block-dirty-bitmap.c:error_report("Error: unknown 
> dirty bitmap "
> migration/block-dirty-bitmap.c:error_report("Error: block device name 
> is not set");
> target/i386/kvm/kvm.c:error_report("error: failed to set MSR 0x%" 
> PRIx32 " to 0x%" PRIx64,
> target/i386/kvm/kvm.c:error_report("error: failed to get MSR 0x%" 
> PRIx32,
> util/vhost-user-server.c:error_report("Error: too big message 
> request: %d, "
> accel/hvf/hvf-all.c:error_report("Error: HV_ERROR");
> accel/hvf/hvf-all.c:error_report("Error: HV_BUSY");
> accel/hvf/hvf-all.c:error_report("Error: HV_BAD_ARGUMENT");
> accel/hvf/hvf-all.c:error_report("Error: HV_NO_RESOURCES");
> accel/hvf/hvf-all.c:error_report("Error: HV_NO_DEVICE");
> accel/hvf/hvf-all.c:error_report("Error: HV_UNSUPPORTED");
> accel/hvf/hvf-all.c:error_report("Error: HV_DENIED");
> monitor/hmp-cmds.c:error_reportf_err(err, "Error: ");
> hw/arm/xlnx-zcu102.c:error_report("ERROR: RAM size 0x%" PRIx64 " 
> above max supported of "
> hw/block/dataplane/xen-block.c:error_report("error: unknown operation 
> (%d)", request->req.operation);
> hw/block/dataplane/xen-block.c:error_report("error: write req for ro 
> device");
> hw/block/dataplane/xen-block.c:error_report("error: nr_segments 
> too big");
> hw/block/dataplane/xen-block.c:error_report("error: first > last 
> sector");
> hw/block/dataplane/xen-block.c:error_report("error: page 
> crossing");
> hw/block/dataplane/xen-block.c:error_report("error: access beyond end 
> of file");
> hw/rdma/rdma_backend.c:rdma_error_report("Error: Not a MAD 
> request, skipping");
> hw/s390x/s390-skeys.c:error_report("Error: Setting storage keys for 
> pages with unallocated "
> hw/s390x/s390-skeys.c:error_report("Error: Getting storage keys for 
> pages with unallocated "
> hw/usb/bus.c:error_report("Error: no usb bus to attach usbdevice %s, "
> gdbstub/gdbstub.c:error_report("Error: Bad gdb register numbering 
> for '%s', "
>
>   Thomas
>

[PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()

2024-03-29 Thread Cédric Le Goater

This allows to report more precise errors in the migration handler
dirty_bitmap_save_setup().

Suggested-by Vladimir Sementsov-Ogievskiy  
Signed-off-by: Cédric Le Goater 
---

 To apply on top of : 
 https://lore.kernel.org/qemu-devel/20240320064911.545001-1-...@redhat.com/
 
 migration/block-dirty-bitmap.c | 34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
542a8c297b329abc30d1b3a205d29340fa59a961..a7d55048c23505fde565ca784cec3c917dca37e5
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -481,13 +481,13 @@ static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
 
 /* Called with the BQL taken. */
 static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
-   const char *bs_name, GHashTable *alias_map)
+   const char *bs_name, GHashTable *alias_map,
+   Error **errp)
 {
 BdrvDirtyBitmap *bitmap;
 SaveBitmapState *dbms;
 GHashTable *bitmap_aliases;
 const char *node_alias, *bitmap_name, *bitmap_alias;
-Error *local_err = NULL;
 
 /* When an alias map is given, @bs_name must be @bs's node name */
 assert(!alias_map || !strcmp(bs_name, bdrv_get_node_name(bs)));
@@ -504,8 +504,8 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 bitmap_name = bdrv_dirty_bitmap_name(bitmap);
 
 if (!bs_name || strcmp(bs_name, "") == 0) {
-error_report("Bitmap '%s' in unnamed node can't be migrated",
- bitmap_name);
+error_setg(errp, "Bitmap '%s' in unnamed node can't be migrated",
+   bitmap_name);
 return -1;
 }
 
@@ -525,9 +525,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 }
 
 if (node_alias[0] == '#') {
-error_report("Bitmap '%s' in a node with auto-generated "
- "name '%s' can't be migrated",
- bitmap_name, node_alias);
+error_setg(errp, "Bitmap '%s' in a node with auto-generated "
+   "name '%s' can't be migrated",
+   bitmap_name, node_alias);
 return -1;
 }
 
@@ -538,8 +538,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 continue;
 }
 
-if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, _err)) {
-error_report_err(local_err);
+if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) {
 return -1;
 }
 
@@ -558,9 +557,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 }
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
-error_report("Cannot migrate bitmap '%s' on node '%s': "
- "Name is longer than %u bytes",
- bitmap_name, bs_name, UINT8_MAX);
+error_setg(errp, "Cannot migrate bitmap '%s' on node '%s': "
+   "Name is longer than %u bytes",
+   bitmap_name, bs_name, UINT8_MAX);
 return -1;
 }
 bitmap_alias = bitmap_name;
@@ -599,7 +598,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 }
 
 /* Called with the BQL taken. */
-static int init_dirty_bitmap_migration(DBMSaveState *s)
+static int init_dirty_bitmap_migration(DBMSaveState *s, Error **errp)
 {
 BlockDriverState *bs;
 SaveBitmapState *dbms;
@@ -643,7 +642,7 @@ static int init_dirty_bitmap_migration(DBMSaveState *s)
 }
 
 if (bs && bs->drv && !bs->drv->is_filter) {
-if (add_bitmaps_to_list(s, bs, name, NULL)) {
+if (add_bitmaps_to_list(s, bs, name, NULL, errp)) {
 goto fail;
 }
 g_hash_table_add(handled_by_blk, bs);
@@ -656,7 +655,8 @@ static int init_dirty_bitmap_migration(DBMSaveState *s)
 continue;
 }
 
-if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs), alias_map)) {
+if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs), alias_map,
+errp)) {
 goto fail;
 }
 }
@@ -1218,9 +1218,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void 
*opaque, Error **errp)
 DBMSaveState *s = &((DBMState *)opaque)->save;
 SaveBitmapState *dbms = NULL;
 
-if (init_dirty_bitmap_migration(s) < 0) {
-error_setg(errp,
-   "Failed to initialize dirty tracking bitmap for blocks");
+if (init_dirty_bitmap_migration(s, errp) < 0) {
 return -1;
 }
 
-- 
2.44.0

Re: [PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler

2024-03-29 Thread Cédric Le Goater


Hello Vladimir,

On 3/29/24 10:32, Vladimir Sementsov-Ogievskiy wrote:

On 20.03.24 09:49, Cédric Le Goater wrote:

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,12 +1213,14 @@ fail:
  return ret;
  }
-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
  if (init_dirty_bitmap_migration(s) < 0) {
+    error_setg(errp,
+   "Failed to initialize dirty tracking bitmap for blocks");


No, that's not about initializing a bitmap. This all is about migration of 
block-dirty-bitmaps themselves.

So correct would be say "Failed to initialize migration of block dirty bitmaps".

with this, for block dirty bitmap migration:
Acked-by: Vladimir Sementsov-Ogievskiy 


I had kept your previous R-b.

Should we remove it ? or is it ok if I address your comments below in a
followup patch, in which case the error message above would be removed.


Still, a lot better is add errp to init_dirty_bitmap_migration() and to 
add_bitmaps_to_list() too: look,

init_dirty_bitmap_migration() fails only if add_bitmaps_to_list() fails

in turn,

add_bitmaps_to_list() have several clear failure points, where it always does 
error_report (or error_report_err), which would be better to pass-through to 
the user.


Good idea. Will do.

Thanks,

C.

Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-03-29 Thread Michael S. Tsirkin

On Fri, Mar 29, 2024 at 03:20:59PM +0800, Jason Wang wrote:
> On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  wrote:
> >
> > On 2024/3/28 20:36, Michael S. Tsirkin wrote:
> >  +}
> >  +
> >   static void virtio_pci_bus_reset_hold(Object *obj)
> >   {
> >   PCIDevice *dev = PCI_DEVICE(obj);
> >   DeviceState *qdev = DEVICE(obj);
> > 
> >  +if (virtio_pci_no_soft_reset(dev)) {
> >  +return;
> >  +}
> >  +
> >   virtio_pci_reset(qdev);
> > 
> >   if (pci_is_express(dev)) {
> >  @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
> >   VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
> >   DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
> >   VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
> >  +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, flags,
> >  +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),
> 
> Why does it come with an x prefix?
> 
> >   DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
> >   VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
> >   DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> > >>>
> > >>> I am a bit confused about this part.
> > >>> Do you want to make this software controllable?
> > >> Yes, because even the real hardware, this bit is not always set.
> 
> We are talking about emulated devices here.
> 
> > >
> > > So which virtio devices should and which should not set this bit?
> > This depends on the scenario the virtio-device is used, if we want to 
> > trigger an internal soft reset for the virtio-device during S3, this bit 
> > shouldn't be set.
> 
> If the device doesn't need reset, why bother the driver for this?
> 
> Btw, no_soft_reset is insufficient for some cases, there's a proposal
> for the virtio-spec. I think we need to wait until it is done.

That seems orthogonal or did I miss something?

> > In my use case on my environment, I don't want to reset virtio-gpu during 
> > S3,
> > because once the display resources are destroyed, there are not enough 
> > information to re-create them, so this bit should be set.
> > Making this bit software controllable is convenient for users to take their 
> > own choices.
> 
> Thanks
> 
> >
> > >
> > >>> Or should this be set to true by default and then
> > >>> changed to false for old machine types?
> > >> How can I do so?
> > >> Do you mean set this to true by default, and if old machine types don't 
> > >> need this bit, they can pass false config to qemu when running qemu?
> > >
> > > No, you would use compat machinery. See how is x-pcie-flr-init handled.
> > >
> > >
> >
> > --
> > Best regards,
> > Jiqian Chen.

Re: [RFC QEMU PATCH v8 2/2] virtio-pci: implement No_Soft_Reset bit

2024-03-29 Thread Jason Wang

On Fri, Mar 29, 2024 at 4:00 PM Chen, Jiqian  wrote:
>
> On 2024/3/29 15:20, Jason Wang wrote:
> > On Fri, Mar 29, 2024 at 3:07 PM Chen, Jiqian  wrote:
> >>
> >> On 2024/3/28 20:36, Michael S. Tsirkin wrote:
> >> +}
> >> +
> >>  static void virtio_pci_bus_reset_hold(Object *obj)
> >>  {
> >>  PCIDevice *dev = PCI_DEVICE(obj);
> >>  DeviceState *qdev = DEVICE(obj);
> >>
> >> +if (virtio_pci_no_soft_reset(dev)) {
> >> +return;
> >> +}
> >> +
> >>  virtio_pci_reset(qdev);
> >>
> >>  if (pci_is_express(dev)) {
> >> @@ -2484,6 +2511,8 @@ static Property virtio_pci_properties[] = {
> >>  VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
> >>  DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
> >>  VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
> >> +DEFINE_PROP_BIT("x-pcie-pm-no-soft-reset", VirtIOPCIProxy, flags,
> >> +VIRTIO_PCI_FLAG_PM_NO_SOFT_RESET_BIT, false),
> >
> > Why does it come with an x prefix?
> Sorry, it's my misunderstanding of this prefix, if No_Soft_Reset doesn't need 
> this prefix, I will delete it in next version.
> Does x prefix means compat machinery? Or other meanings?
>
> >
> >>  DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
> >>  VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
> >>  DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
> >
> > I am a bit confused about this part.
> > Do you want to make this software controllable?
>  Yes, because even the real hardware, this bit is not always set.
> >
> > We are talking about emulated devices here.
> Yes, I just gave an example. It actually this bit is not always set. What's 
> your opinion about when to set this bit or which virtio-device should set 
> this bit?

If the implementation of Qemu is correct, we should set it unless we
need compatibility.

>
> >
> >>>
> >>> So which virtio devices should and which should not set this bit?
> >> This depends on the scenario the virtio-device is used, if we want to 
> >> trigger an internal soft reset for the virtio-device during S3, this bit 
> >> shouldn't be set.
> >
> > If the device doesn't need reset, why bother the driver for this?
> I don't know what you mean.
> If the device doesn't need reset, we can config true to set this bit, then on 
> the driver side, driver finds this bit is set, then driver will not trigger a 
> soft reset.

I mean if the device can suspend without reset, we don't need to
bother the driver to save and load states.

>
> >
> > Btw, no_soft_reset is insufficient for some cases,
> May I know which cases?
>
> > there's a proposal for the virtio-spec. I think we need to wait until it is 
> > done.
> Can you share the proposal?

See this

https://lore.kernel.org/all/20240227015345.3614965-1-steve...@chromium.org/T/

Thanks

>
> >
> >> In my use case on my environment, I don't want to reset virtio-gpu during 
> >> S3,
> >> because once the display resources are destroyed, there are not enough 
> >> information to re-create them, so this bit should be set.
> >> Making this bit software controllable is convenient for users to take 
> >> their own choices.
> >
> > Thanks
> >
> >>
> >>>
> > Or should this be set to true by default and then
> > changed to false for old machine types?
>  How can I do so?
>  Do you mean set this to true by default, and if old machine types don't 
>  need this bit, they can pass false config to qemu when running qemu?
> >>>
> >>> No, you would use compat machinery. See how is x-pcie-flr-init handled.
> >>>
> >>>
> >>
> >> --
> >> Best regards,
> >> Jiqian Chen.
> >
>
> --
> Best regards,
> Jiqian Chen.

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-03-29 Thread Philippe Mathieu-Daudé


Hi Zhijian,

On 29/3/24 02:53, Zhijian Li (Fujitsu) wrote:



On 28/03/2024 23:01, Peter Xu wrote:

On Thu, Mar 28, 2024 at 11:18:04AM -0300, Fabiano Rosas wrote:

Philippe Mathieu-Daudé  writes:


The whole RDMA subsystem was deprecated in commit e9a54265f5
("hw/rdma: Deprecate the pvrdma device and the rdma subsystem")
released in v8.2.

Remove:
   - RDMA handling from migration
   - dependencies on libibumad, libibverbs and librdmacm

Keep the RAM_SAVE_FLAG_HOOK definition since it might appears
in old migration streams.

Cc: Peter Xu 
Cc: Li Zhijian 
Acked-by: Fabiano Rosas 
Signed-off-by: Philippe Mathieu-Daudé 


Just to be clear, because people raised the point in the last version,
the first link in the deprecation commit links to a thread comprising
entirely of rdma migration patches. I don't see any ambiguity on whether
the deprecation was intended to include migration. There's even an ack
from Juan.


Yes I remember that's the plan.



So on the basis of not reverting the previous maintainer's decision, my
Ack stands here.

We also had pretty obvious bugs ([1], [2]) in the past that would have
been caught if we had any kind of testing for the feature, so I can't
even say this thing works currently.

@Peter Xu, @Li Zhijian, what are your thoughts on this?


Generally I definitely agree with such a removal sooner or later, as that's
how deprecation works, and even after Juan's left I'm not aware of any
other new RDMA users.  Personally, I'd slightly prefer postponing it one
more release which might help a bit of our downstream maintenance, however
I assume that's not a blocker either, as I think we can also manage it.

IMHO it's more important to know whether there are still users and whether
they would still like to see it around. That's also one thing I notice that
e9a54265f533f didn't yet get acks from RDMA users that we are aware, even
if they're rare. According to [2] it could be that such user may only rely
on the release versions of QEMU when it broke things.

So I'm copying Yu too (while Zhijian is already in the loop), just in case
someone would like to stand up and speak.



I admit RDMA migration was lack of testing(unit/CI test), which led to the a few
obvious bugs being noticed too late.
However I was a bit surprised when I saw the removal of the RDMA migration. I 
wasn't
aware that this feature has not been marked as deprecated(at least there is no
prompt to end-user).



IMHO it's more important to know whether there are still users and whether
they would still like to see it around.


Agree.
I didn't immediately express my opinion in V1 because I'm also consulting our
customers for this feature in the future.

Personally, I agree with Perter's idea that "I'd slightly prefer postponing it 
one
more release which might help a bit of our downstream maintenance"


Do you mind posting a deprecation patch to clarify the situation?

Thanks,

Phil.



Thanks
Zhijian



Thanks,



1- https://lore.kernel.org/r/20230920090412.726725-1-lizhij...@fujitsu.com
2- 
https://lore.kernel.org/r/cahecvy7hxswn4ow_kog+q+tn6f_kmeichevz1qgm-fbxbpp...@mail.gmail.com

Re: [PATCH-for-9.0 v2] hw/i386/pc: Deprecate 64-bit CPUs on ISA-only PC machine

2024-03-29 Thread Philippe Mathieu-Daudé


On 28/3/24 16:39, Thomas Huth wrote:

On 28/03/2024 16.12, Mark Cave-Ayland wrote:

On 27/03/2024 16:54, Philippe Mathieu-Daudé wrote:


Per Daniel suggestion [*]:

  > isapc could arguably be restricted to just 32-bit CPU models,
  > because we should not need it to support any feature that didn't
  > exist prior to circa 1995. eg refuse to start with isapc, if 'lm'
  > is present in the CPU model for example.

Display a warning when such CPU is used:

   $ qemu-system-x86_64 -monitor stdio -S -M isapc -cpu Westmere
   qemu-system-x86_64: warning: Use of 64-bit CPU 'Westmere' is 
deprecated on the ISA-only PC machine

   QEMU 8.2.91 monitor - type 'help' for more information
   (qemu) q

   $ qemu-system-x86_64 -monitor stdio -S -M isapc -cpu athlon
   QEMU 8.2.91 monitor - type 'help' for more information
   (qemu) q

[*] https://lore.kernel.org/qemu-devel/zgqks4rpmst5x...@redhat.com/

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
  docs/about/deprecated.rst |  7 +++
  include/hw/i386/pc.h  |  1 +
  hw/i386/pc_piix.c | 14 ++
  3 files changed, 22 insertions(+)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7b548519b5..345c35507f 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -208,6 +208,13 @@ is no longer packaged in any distro making it 
harder to run the

  ``check-tcg`` tests. Unless we can improve the testing situation there
  is a chance the code will bitrot without anyone noticing.
+64-bit (x86_64) CPUs on the ``isapc`` machine (since 9.0)
+'
+
+The ``isapc`` machine aims to emulate old PC machine without PCI was
+generalized, so hardware available around 1995, before 64-bit intel
+CPUs were produced.
+
  System emulator machines
  
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 27a68071d7..2d202b9549 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -96,6 +96,7 @@ struct PCMachineClass {
  const char *default_south_bridge;
  /* Compat options: */
+    bool deprecate_64bit_cpu; /* Specific to the 'isapc' machine */
  /* Default CPU model version.  See 
x86_cpu_set_default_version(). */

  int default_cpu_version;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 18ba076609..2e5b2efc33 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -182,7 +182,20 @@ static void pc_init1(MachineState *machine, 
const char *pci_type)

  }
  pc_machine_init_sgx_epc(pcms);
+
  x86_cpus_init(x86ms, pcmc->default_cpu_version);
+    if (pcmc->deprecate_64bit_cpu) {
+    X86CPU *cpu = X86_CPU(first_cpu);
+
+    if (cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
+    const char *cpu_type = 
object_get_typename(OBJECT(first_cpu));
+    int cpu_len = strlen(cpu_type) - 
strlen(X86_CPU_TYPE_SUFFIX);

+
+    warn_report("Use of 64-bit CPU '%.*s' is deprecated"
+    " on the ISA-only PC machine",
+    cpu_len, cpu_type);
+    }
+    }
  if (kvm_enabled()) {
  kvmclock_create(pcmc->kvmclock_create_always);
@@ -918,6 +931,7 @@ static void isapc_machine_options(MachineClass *m)
  pcmc->gigabyte_align = false;
  pcmc->smbios_legacy_mode = true;
  pcmc->has_reserved_memory = false;
+    pcmc->deprecate_64bit_cpu = true;
  m->default_nic = "ne2k_isa";
  m->default_cpu_type = X86_CPU_TYPE_NAME("486");
  m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);


The logic around checking CPUID_EXT2_LM looks good to me. Slightly 
curious as to whether people feel updating PCMachineClass is 
necessary, or you can simply do qdev_get_machine() and use 
object_dynamic_cast() to see if the machine matches 
MACHINE_NAME("isapc") and warn that way?


Why don't you simply pass it as a parameter from pc_init_isa() instead? 
Or do the whole check in pc_init_isa() instead?


Because the CPU isn't instantiated so we can't check the CPUID_EXT2_LM
feature :/

Re: [RFC PATCH-for-9.1 09/29] hw/i386/pc: Pass PCMachineState argument to acpi_setup()

2024-03-29 Thread Philippe Mathieu-Daudé


On 28/3/24 19:45, BALATON Zoltan wrote:

On Thu, 28 Mar 2024, Philippe Mathieu-Daudé wrote:

acpi_setup() caller knows about the machine state, so pass
it as argument to avoid a qdev_get_machine() call.

We already resolved X86_MACHINE(pcms) as 'x86ms' so use the
latter.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/i386/acpi-build.h | 3 ++-
hw/i386/acpi-build.c | 5 ++---
hw/i386/pc.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/i386/acpi-build.h b/hw/i386/acpi-build.h
index 0dce155c8c..31de5bddbd 100644
--- a/hw/i386/acpi-build.h
+++ b/hw/i386/acpi-build.h
@@ -2,6 +2,7 @@
#ifndef HW_I386_ACPI_BUILD_H
#define HW_I386_ACPI_BUILD_H
#include "hw/acpi/acpi-defs.h"
+#include "hw/i386/pc.h"

extern const struct AcpiGenericAddress x86_nvdimm_acpi_dsmio;

@@ -9,7 +10,7 @@ extern const struct AcpiGenericAddress 
x86_nvdimm_acpi_dsmio;

#define ACPI_PCIHP_SEJ_BASE 0x8
#define ACPI_PCIHP_BNMR_BASE 0x10

-void acpi_setup(void);
+void acpi_setup(PCMachineState *pcms);


This is changed to PcPciMachineState * in a following patch so can't you 
already introduce it here to avoid some churn?


Unfortunately not, because we'd need to use:

  PcPciMachineState *ppms = PC_PCI_MACHINE(pcms);

which would trigger an assertion at this point.



Regards,
BALATON Zoltan

Re: [PATCH for-9.0] docs/about: Mark the iaspc machine type as deprecated

2024-03-29 Thread Bernhard Beschow

Am 28. März 2024 14:09:52 UTC schrieb Mark Cave-Ayland 
:
>On 27/03/2024 07:09, Gerd Hoffmann wrote:
>
>> On Tue, Mar 26, 2024 at 01:30:48PM +, Mark Cave-Ayland wrote:
>>> Heh I've actually been using isapc over the past couple of weeks to fire up
>>> some old programs in a Windows 3 VM :)
>> 
>> I'm wondering why these use cases can't simply use the 'pc' machine
>> type?
>> 
>> The early pci chipsets of the 90-ies have been designed in a
>> backward-compatible manner, with devices such as the IDE controller
>> being mapped to the standard ISA ioports.  So even an historic OS which
>> does not know what PCI is can run on that hardware, by simply talking to
>> devices using the standard ISA io ports ...
>
>Hmmm that's a fair point: I think the pc machine has a PCI-ISA bridge 
>included, so ISA devices can be plugged in as needed. The reason I ended up on 
>that configuration was because I ended up chasing down a regression, and 
>wanted to quickly eliminate things such as ACPI.

In theory you could pass `-M acpi=off` to not instantiate the PIIX4 ACPI 
function, essentially turning the Frankenstein-PIIX4 SB into a PIIX3. However, 
this also removes SMI registers used by SeaBIOS to handle SMM setup which may 
create unwanted side effects. On a real PIIX3, these registers are located in 
the ISA function. I wonder if it made sense to implement that for greater 
compatibility.

What do you think? Gerd, what do you think w.r.t. SeaBIOS?

Best regards,
Bernhard

>
>
>ATB,
>
>Mark.
>
>

Re: [PATCH 2/2] backup: add minimum cluster size to performance options

2024-03-29 Thread Vladimir Sementsov-Ogievskiy


On 08.03.24 18:51, Fiona Ebner wrote:

Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.

Backup/block-copy will use at least this granularity for copy operations
and in particular, discard requests to the backup source will too. If
the granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---
  block/backup.c| 2 +-
  block/copy-before-write.c | 2 ++
  block/copy-before-write.h | 1 +
  blockdev.c| 3 +++
  qapi/block-core.json  | 9 +++--
  5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 3dd2e229d2..a1292c01ec 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -458,7 +458,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
  }
  
  cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,

-  , errp);
+  perf->min_cluster_size, , errp);
  if (!cbw) {
  goto error;
  }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index f9896c6c1e..55a9272485 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -545,6 +545,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+  int64_t min_cluster_size,


same note, suggest uint32_t


BlockCopyState **bcs,
Error **errp)
  {
@@ -563,6 +564,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
  }
  qdict_put_str(opts, "file", bdrv_get_node_name(source));
  qdict_put_str(opts, "target", bdrv_get_node_name(target));
+qdict_put_int(opts, "min-cluster-size", min_cluster_size);
  
  top = bdrv_insert_node(source, opts, flags, errp);

  if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 01af0cd3c4..dc6cafe7fa 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+  int64_t min_cluster_size,
BlockCopyState **bcs,
Error **errp);
  void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index daceb50460..8e6bdbc94a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2653,6 +2653,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
  if (backup->x_perf->has_max_chunk) {
  perf.max_chunk = backup->x_perf->max_chunk;
  }
+if (backup->x_perf->has_min_cluster_size) {
+perf.min_cluster_size = backup->x_perf->min_cluster_size;
+}
  }
  
  if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 85c8f88f6e..ba0836892f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1551,11 +1551,16 @@
  # it should not be less than job cluster size which is calculated
  # as maximum of target image cluster size and 64k.  Default 0.
  #
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# and background copy operations.  Has to be a power of 2.  No
+# effect if smaller than the maximum of the target's cluster size
+# and 64 KiB.  Default 0.  (Since 9.0)


9.1


+#
  # Since: 6.0
  ##
  { 'struct': 'BackupPerf',
-  'data': { '*use-copy-range': 'bool',
-'*max-workers': 'int', '*max-chunk': 'int64' } }
+  'data': { '*use-copy-range': 'bool', '*max-workers': 'int',
+'*max-chunk': 'int64', '*min-cluster-size': 'uint32' } }
  
  ##

  # @BackupCommon:




--
Best regards,
Vladimir

Re: [PATCH 1/2] copy-before-write: allow specifying minimum cluster size

2024-03-29 Thread Vladimir Sementsov-Ogievskiy


On 08.03.24 18:51, Fiona Ebner wrote:

Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.

Copy-before-write operations will use at least this granularity and in
particular, discard requests to the source node will too. If the
granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.

The QAPI uses uint32 so the value will be non-negative, but still fit
into a uint64_t.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---
  block/block-copy.c | 17 +
  block/copy-before-write.c  |  3 ++-
  include/block/block-copy.h |  1 +
  qapi/block-core.json   |  8 +++-
  4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 7e3b378528..adb1cbb440 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool 
use_copy_range,
  }
  
  static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,

+ int64_t min_cluster_size,


Maybe better use uint32_t here as well.


   Error **errp)
  {
  int ret;
@@ -335,7 +336,7 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
  "used. If the actual block size of the target exceeds "
  "this default, the backup may be unusable",
  BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
-return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
  } else if (ret < 0 && !target_does_cow) {
  error_setg_errno(errp, -ret,
  "Couldn't determine the cluster size of the target image, "
@@ -345,16 +346,18 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
  return ret;
  } else if (ret < 0 && target_does_cow) {
  /* Not fatal; just trudge on ahead. */
-return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
  }
  
-return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);

+return MAX(min_cluster_size,
+   MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size));
  }
  
  BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,

   BlockDriverState *copy_bitmap_bs,
   const BdrvDirtyBitmap *bitmap,
   bool discard_source,
+ int64_t min_cluster_size,


and here why not uint32_t


   Error **errp)
  {
  ERRP_GUARD();
@@ -365,7 +368,13 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
  
  GLOBAL_STATE_CODE();
  
-cluster_size = block_copy_calculate_cluster_size(target->bs, errp);

+if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
+error_setg(errp, "min-cluster-size needs to be a power of 2");
+return NULL;
+}
+
+cluster_size = block_copy_calculate_cluster_size(target->bs,
+ min_cluster_size, errp);
  if (cluster_size < 0) {
  return NULL;
  }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index dac57481c5..f9896c6c1e 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -476,7 +476,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
  
  s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;

  s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
-  flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
+  flags & BDRV_O_CBW_DISCARD_SOURCE,
+  opts->min_cluster_size, errp);


I assume it is guaranteed to be 0 when not specified by user.


  if (!s->bcs) {
  error_prepend(errp, "Cannot create block-copy-state: ");
  return -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index bdc703bacd..77857c6c68 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
   BlockDriverState *copy_bitmap_bs,
   const BdrvDirtyBitmap *bitmap,
   bool discard_source,
+ int64_t min_cluster_size,
   Error **errp);
  
  /* Function should be called prior any actual copy request */

diff --git a/qapi/block-core.json b/qapi/block-core.json
index

[PATCH for-9.1 6/7] target/i386: Fix duplicated kvmclock name in FEAT_KVM

2024-03-29 Thread Zhao Liu

From: Tim Wiederhake 

The commit 642258c6c7 ("kvm: add kvmclock to its second bit") gave the
old and new kvmclocks with the same name "kvmclock", to facilitate user
to set/unset the feature bits for both 2 kvmclock features together.

This could work because:
* QEMU side:
  - x86_cpu_register_bit_prop() supports "the same property name can be
registered multiple times to make it affect multiple bits in the
same FeatureWord".
* KVM side:
  - The only difference between 2 version kvmclocks is their MSRs have
different addresses.
  - When 2 kvmclocks are both enabled, KVM will prioritize the use of
new kvmclock's MSRs.

However, there're reasons we need give the second kvmclock a new name:
* Based on the KVM mechanism, it doesn't make sense to bind two
  kvmclocks together. Since kvmclock is enabled by default in most cases
  as a KVM PV feature, the benefit of the same naming is reflected in
  the fact that -kvmclock can disable both. But, following the KVM
  interface style (i.e., separating the two kvmclocks) is clearly
  clearer to the user.
* For developers, identical names have been creating confusion along
  with persistent doubts about the naming.
* FeatureWordInfo should define names based on hardware/Host CPUID bit,
  and the name is used to distinguish the bit.
* User actions based on +/- feature names should only work on
  independent feature bits. The common effect of multiple features
  should be controlled by an additional CPU property or additional code
  logic to show the association between different feature bits.
* The old kvmclock will eventually be removed. Different naming can ease
  the burden of future cleanups.

Therefore, rename the new kvmclock feature as "kvmclock2".

Additionally, add "kvmclock2" entry in kvm_default_props[] since the
oldest kernel supported by QEMU (v4.5) has supported the new kvm clock.

Signed-off-by: Tim Wiederhake 
Signed-off-by: Zhao Liu 
---
Based on Tim's original patch, rewrote the commit message and added the
tiny fix for compatibility.
---
 target/i386/cpu.c | 2 +-
 target/i386/kvm/kvm-cpu.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1b6caf071a6d..0a1dac60f5de 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -855,7 +855,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 [FEAT_KVM] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
-"kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
+"kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock2",
 "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
 NULL, "kvm-pv-tlb-flush", "kvm-asyncpf-vmexit", "kvm-pv-ipi",
 "kvm-poll-control", "kvm-pv-sched-yield", "kvm-asyncpf-int", 
"kvm-msi-ext-dest-id",
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index ae3cb27c8aa8..753f90c18bd6 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -77,7 +77,7 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
 
 if (cpu->legacy_kvmclock) {
 /*
- * The old and new kvmclock are both set by default from the
+ * The kvmclock and kvmclock2 are both set by default from the
  * oldest KVM supported (v4.5, see "OS requirements" section at
  * docs/system/target-i386.rst). So when one of them is missing,
  * it is only possible that the user is actively masking it.
@@ -179,6 +179,7 @@ static void kvm_cpu_xsave_init(void)
  */
 static PropValue kvm_default_props[] = {
 { "kvmclock", "on" },
+{ "kvmclock2", "on" },
 { "kvm-nopiodelay", "on" },
 { "kvm-asyncpf", "on" },
 { "kvm-steal-time", "on" },
-- 
2.34.1

[PATCH for-9.1 3/7] target/i386/kvm: Only Save/load kvmclock MSRs when kvmclock enabled

2024-03-29 Thread Zhao Liu

From: Zhao Liu 

MSR_KVM_SYSTEM_TIME and MSR_KVM_WALL_CLOCK are attached with the (old)
kvmclock feature (KVM_FEATURE_CLOCKSOURCE).

So, just save/load them only when kvmclock (KVM_FEATURE_CLOCKSOURCE) is
enabled.

Signed-off-by: Zhao Liu 
---
 target/i386/kvm/kvm.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index be88339fb8bd..498580d60c3b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3319,8 +3319,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
  */
 if (level >= KVM_PUT_RESET_STATE) {
 kvm_msr_entry_add(cpu, MSR_IA32_TSC, env->tsc);
-kvm_msr_entry_add(cpu, MSR_KVM_SYSTEM_TIME, env->system_time_msr);
-kvm_msr_entry_add(cpu, MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+if (env->features[FEAT_KVM] & CPUID_FEAT_KVM_CLOCK) {
+kvm_msr_entry_add(cpu, MSR_KVM_SYSTEM_TIME, env->system_time_msr);
+kvm_msr_entry_add(cpu, MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+}
 if (env->features[FEAT_KVM] & CPUID_FEAT_KVM_ASYNCPF_INT) {
 kvm_msr_entry_add(cpu, MSR_KVM_ASYNC_PF_INT, 
env->async_pf_int_msr);
 }
@@ -3784,8 +3786,10 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_LSTAR, 0);
 }
 #endif
-kvm_msr_entry_add(cpu, MSR_KVM_SYSTEM_TIME, 0);
-kvm_msr_entry_add(cpu, MSR_KVM_WALL_CLOCK, 0);
+if (env->features[FEAT_KVM] & CPUID_FEAT_KVM_CLOCK) {
+kvm_msr_entry_add(cpu, MSR_KVM_SYSTEM_TIME, 0);
+kvm_msr_entry_add(cpu, MSR_KVM_WALL_CLOCK, 0);
+}
 if (env->features[FEAT_KVM] & CPUID_FEAT_KVM_ASYNCPF_INT) {
 kvm_msr_entry_add(cpu, MSR_KVM_ASYNC_PF_INT, 0);
 }
-- 
2.34.1

1 2 >

1 - 100 of 131 matches

Mail list logo