date:20211129

Re: [PATCH v8 09/10] target/ppc: PMU Event-Based exception support

2021-11-29 Thread David Gibson

On Thu, Nov 25, 2021 at 12:08:16PM -0300, Daniel Henrique Barboza wrote:
> From: Gustavo Romero 
> 
> Following up the rfebb implementation, this patch adds support for EBB
> exceptions that are triggered by Performance Monitor alerts. This exception
> occurs when an enabled PMU condition or event happens and both MMCR0_EBE
> and BESCR_PME are set.
> 
> The supported PM alerts will consist of counter negative conditions of
> the PMU counters. This will be achieved by a timer mechanism that will
> predict when a counter becomes negative. The PMU timer callback will set
> the appropriate bits in MMCR0 and fire a PMC interrupt. The EBB
> exception code will then set the appropriate BESCR bits, save the next
> instruction pointer in the return register (SPR_EBBRR), and redirect
> execution to the handler (pointed to by SPR_EBBHR).
> 
> CC: Gustavo Romero 
> Signed-off-by: Gustavo Romero 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  target/ppc/cpu.h |  5 -
>  target/ppc/excp_helper.c | 29 +
>  target/ppc/power8-pmu.c  | 35 +--
>  3 files changed, 66 insertions(+), 3 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index edb4488176..28ae904d76 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -129,8 +129,10 @@ enum {
>  /* ISA 3.00 additions */
>  POWERPC_EXCP_HVIRT= 101,
>  POWERPC_EXCP_SYSCALL_VECTORED = 102, /* scv exception */
> +POWERPC_EXCP_EBB = 103, /* Event-based branch exception */
> +
>  /* EOL */
> -POWERPC_EXCP_NB   = 103,
> +POWERPC_EXCP_NB   = 104,
>  /* QEMU exceptions: special cases we want to stop translation */
>  POWERPC_EXCP_SYSCALL_USER = 0x203, /* System call in user mode only */
>  };
> @@ -2453,6 +2455,7 @@ enum {
>  PPC_INTERRUPT_HMI,/* Hypervisor Maintenance interrupt*/
>  PPC_INTERRUPT_HDOORBELL,  /* Hypervisor Doorbell interrupt*/
>  PPC_INTERRUPT_HVIRT,  /* Hypervisor virtualization interrupt  */
> +PPC_INTERRUPT_PMC,/* Hypervisor virtualization interrupt  */

I'm guessing the comment here should be updated; it looks like a copy
of the PPC_INTERRUPT_HVIRT one.

>  };
>  
>  /* Processor Compatibility mask (PCR) */
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 7ead32279c..a26d266fe6 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -799,6 +799,23 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  cpu_abort(cs, "Non maskable external exception "
>"is not implemented yet !\n");
>  break;
> +case POWERPC_EXCP_EBB:   /* Event-based branch exception */
> +if ((env->spr[SPR_FSCR] & (1ull << FSCR_EBB)) &&
> +(env->spr[SPR_BESCR] & BESCR_GE) &&
> +(env->spr[SPR_BESCR] & BESCR_PME)) {
> +target_ulong nip;
> +
> +env->spr[SPR_BESCR] &= ~BESCR_GE;   /* Clear GE */
> +env->spr[SPR_BESCR] |= BESCR_PMEO;  /* Set PMEO */
> +env->spr[SPR_EBBRR] = env->nip; /* Save NIP for rfebb insn */
> +nip = env->spr[SPR_EBBHR];  /* EBB handler */
> +powerpc_set_excp_state(cpu, nip, env->msr);
> +}
> +/*
> + * This interrupt is handled by userspace. No need
> + * to proceed.
> + */
> +return;
>  default:
>  excp_invalid:
>  cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
> @@ -1046,6 +1063,18 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_THERM);
>  return;
>  }
> +/* PMC -> Event-based branch exception */
> +if (env->pending_interrupts & (1 << PPC_INTERRUPT_PMC)) {
> +/*
> + * Performance Monitor event-based exception can only
> + * occur in problem state.
> + */
> +if (msr_pr == 1) {
> +env->pending_interrupts &= ~(1 << PPC_INTERRUPT_PMC);
> +powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_EBB);
> +return;
> +}
> +}
>  }
>  
>  if (env->resume_as_sreset) {
> diff --git a/target/ppc/power8-pmu.c b/target/ppc/power8-pmu.c
> index 98797f0b2f..330e0d2ae8 100644
> --- a/target/ppc/power8-pmu.c
> +++ b/target/ppc/power8-pmu.c
> @@ -290,6 +290,15 @@ void helper_store_pmc(CPUPPCState *env, uint32_t sprn, 
> uint64_t value)
>  pmc_update_overflow_timer(env, sprn);
>  }
>  
> +static void pmu_delete_timers(CPUPPCState *env)
> +{
> +int i;
> +
> +for (i = 0; i < PMU_TIMERS_NUM; i++) {
> +timer_del(env->pmu_cyc_overflow_timers[i]);
> +}
> +}
> +
>  static void fire_PMC_interrupt(PowerPCCPU *cpu)
>  {
>
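
For reference, the delivery conditions and state changes in the hunks above
can be modelled outside QEMU. This is only a minimal sketch, not the patch's
code: `struct ebb_state`, `deliver_pm_ebb()` and the boolean fields are
hypothetical stand-ins, and `PPC_BIT()` is assumed to use IBM bit numbering
(bit 0 is the most significant bit). It also collapses the two checks (problem
state in ppc_hw_interrupt(), FSCR/BESCR bits in powerpc_excp()) into one
function and does not model the pending-interrupt bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IBM bit numbering: bit 0 is the MSB of a 64-bit value. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

#define BESCR_GE   PPC_BIT(0)   /* Global Enable */
#define BESCR_PME  PPC_BIT(31)  /* PM event-based exception enable */
#define BESCR_PMEO PPC_BIT(63)  /* PM event-based exception occurred */

/* Hypothetical stand-in for the CPU state the patch touches. */
struct ebb_state {
    uint64_t bescr;
    uint64_t ebbrr;
    uint64_t ebbhr;
    uint64_t nip;
    bool fscr_ebb;  /* models the FSCR_EBB facility bit */
    bool msr_pr;    /* models MSR[PR], i.e. problem state */
};

/* Returns true if the EBB was delivered, false if it was gated off. */
static bool deliver_pm_ebb(struct ebb_state *s)
{
    assert(s);

    /* PM event-based exceptions only occur in problem state. */
    if (!s->msr_pr) {
        return false;
    }
    if (!s->fscr_ebb || !(s->bescr & BESCR_GE) || !(s->bescr & BESCR_PME)) {
        return false;
    }

    s->bescr &= ~BESCR_GE;   /* clear GE so EBBs do not nest */
    s->bescr |= BESCR_PMEO;  /* record that a PM EBB occurred */
    s->ebbrr = s->nip;       /* saved for the later rfebb */
    s->nip = s->ebbhr;       /* continue at the EBB handler */
    return true;
}
```

In this model a second call right after a successful delivery returns false,
because GE stays clear until the handler re-enables it.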

Re: [PATCH v8 08/10] PPC64/TCG: Implement 'rfebb' instruction

2021-11-29 Thread David Gibson

On Thu, Nov 25, 2021 at 12:08:15PM -0300, Daniel Henrique Barboza wrote:
> An Event-Based Branch (EBB) allows applications to change the NIA when an
> event-based exception occurs. Event-based exceptions are enabled by
> setting the Branch Event Status and Control Register (BESCR). If the
> event-based exception is enabled when the exception occurs, an EBB
> happens.
> 
> The following operations happen during an EBB:
> 
> - Global Enable (GE) bit of BESCR is set to 0;
> - bits 0-61 of the Event-Based Branch Return Register (EBBRR) are set
> to the effective address of the NIA that would have executed if the EBB
> didn't happen;
> - Instruction fetch and execution will continue in the effective address
> contained in the Event-Based Branch Handler Register (EBBHR).
> 
> The EBB Handler will process the event and then execute the Return From
> Event-Based Branch (rfebb) instruction. rfebb sets BESCR_GE and then
> redirects execution to the address pointed in EBBRR. This process is
> described in the PowerISA v3.1, Book II, Chapter 6 [1].
> 
> This patch implements the rfebb instruction. Descriptions of all
> relevant BESCR bits are also added - this patch is only using BESCR_GE,
> but the next patches will use the remaining bits.
> 
> [1] https://wiki.raptorcs.com/w/images/f/f5/PowerISA_public.v3.1.pdf
> 
> Reviewed-by: Matheus Ferst 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

I'm guessing that for some applications rfebb could be a fairly hot
path, so you might want to rework this to avoid the helper.  But that
can certainly be a later improvement.
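
For anyone following along, the semantics the helper below implements can be
sketched as plain C. This is a rough model under assumptions, not the QEMU
helper: `struct rfebb_state` and `rfebb_model()` are made-up names, the
program exception is reduced to a `false` return, and `PPC_BIT()` is assumed
to use IBM bit numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IBM bit numbering: bit 0 is the MSB of a 64-bit value. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

#define BESCR_GE      PPC_BIT(0)
#define BESCR_INVALID (PPC_BIT(32) | PPC_BIT(33))

/* Hypothetical stand-in for the registers rfebb reads and writes. */
struct rfebb_state {
    uint64_t bescr;
    uint64_t ebbrr;
    uint64_t nip;
    bool is_64bit;  /* models the msr_is_64bit() check */
};

/*
 * Returns false where the real helper would raise an invalid-form
 * program exception; in that case no state is modified.
 */
static bool rfebb_model(struct rfebb_state *st, unsigned s)
{
    assert(st);

    if (st->bescr & BESCR_INVALID) {
        return false;
    }

    /* Return to the saved address, cropped in 32-bit mode. */
    st->nip = st->is_64bit ? st->ebbrr : (uint32_t)st->ebbrr;

    /* The S operand decides whether EBBs are globally re-enabled. */
    if (s) {
        st->bescr |= BESCR_GE;
    } else {
        st->bescr &= ~BESCR_GE;
    }
    return true;
}
```

The work is a handful of loads, stores and bit operations, which is what makes
the "avoid the helper" idea plausible as a later optimisation.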

> ---
>  target/ppc/cpu.h   | 13 ++
>  target/ppc/excp_helper.c   | 31 
>  target/ppc/helper.h|  1 +
>  target/ppc/insn32.decode   |  5 
>  target/ppc/translate.c |  2 ++
>  target/ppc/translate/branch-impl.c.inc | 33 ++
>  6 files changed, 85 insertions(+)
>  create mode 100644 target/ppc/translate/branch-impl.c.inc
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 993884164f..edb4488176 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -393,6 +393,19 @@ typedef enum {
>  /* PMU uses CTRL_RUN to sample PM_RUN_INST_CMPL */
>  #define CTRL_RUN PPC_BIT(63)
>  
> +/* EBB/BESCR bits */
> +/* Global Enable */
> +#define BESCR_GE PPC_BIT(0)
> +/* External Event-based Exception Enable */
> +#define BESCR_EE PPC_BIT(30)
> +/* Performance Monitor Event-based Exception Enable */
> +#define BESCR_PME PPC_BIT(31)
> +/* External Event-based Exception Occurred */
> +#define BESCR_EEO PPC_BIT(62)
> +/* Performance Monitor Event-based Exception Occurred */
> +#define BESCR_PMEO PPC_BIT(63)
> +#define BESCR_INVALID PPC_BITMASK(32, 33)
> +
>  /* LPCR bits */
>  #define LPCR_VPM0 PPC_BIT(0)
>  #define LPCR_VPM1 PPC_BIT(1)
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 17607adbe4..7ead32279c 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -1250,6 +1250,37 @@ void helper_hrfid(CPUPPCState *env)
>  }
>  #endif
>  
> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
> +void helper_rfebb(CPUPPCState *env, target_ulong s)
> +{
> +target_ulong msr = env->msr;
> +
> +/*
> + * Handling of BESCR bits 32:33 according to PowerISA v3.1:
> + *
> + * "If BESCR 32:33 != 0b00 the instruction is treated as if
> + *  the instruction form were invalid."
> + */
> +if (env->spr[SPR_BESCR] & BESCR_INVALID) {
> +raise_exception_err(env, POWERPC_EXCP_PROGRAM,
> +POWERPC_EXCP_INVAL | POWERPC_EXCP_INVAL_INVAL);
> +}
> +
> +env->nip = env->spr[SPR_EBBRR];
> +
> +/* Switching to 32-bit ? Crop the nip */
> +if (!msr_is_64bit(env, msr)) {
> +env->nip = (uint32_t)env->spr[SPR_EBBRR];
> +}
> +
> +if (s) {
> +env->spr[SPR_BESCR] |= BESCR_GE;
> +} else {
> +env->spr[SPR_BESCR] &= ~BESCR_GE;
> +}
> +}
> +#endif
> +
>  
> /*/
>  /* Embedded PowerPC specific helpers */
>  void helper_40x_rfci(CPUPPCState *env)
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index d8a23e054a..b0535b389b 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -18,6 +18,7 @@ DEF_HELPER_2(pminsn, void, env, i32)
>  DEF_HELPER_1(rfid, void, env)
>  DEF_HELPER_1(rfscv, void, env)
>  DEF_HELPER_1(hrfid, void, env)
> +DEF_HELPER_2(rfebb, void, env, tl)
>  DEF_HELPER_2(store_lpcr, void, env, tl)
>  DEF_HELPER_2(store_pcr, void, env, tl)
>  DEF_HELPER_2(store_mmcr0, void, env, tl)
> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> index e135b8aba4..6cad783dde 100644
> --- a/target/ppc/insn32.decode
> +++ b/target/ppc/insn32.decode
> @@ -427,3 +427,8 @@ XXSPLTW 00 . ---.. . 010100100 . .  
> @XX2
>  ## VSX Vector

Re: [PATCH v8 10/10] target/ppc/excp_helper.c: EBB handling adjustments

2021-11-29 Thread David Gibson

On Thu, Nov 25, 2021 at 12:08:17PM -0300, Daniel Henrique Barboza wrote:
> The current logic is only considering event-based exceptions triggered
> by the performance monitor. This is true now, but we might want to add
> support for external event-based exceptions in the future.
> 
> Let's make it a bit easier to do so by adding the bit logic that would
> happen in case we were dealing with an external event-based exception.
> 
> While we're at it, add a few comments explaining why we're setting and
> clearing BESCR bits.
> 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



How to simulate a microcontroller whose ROM and RAM are the same address space?

2021-11-29 Thread Duo jia

How to simulate a microcontroller whose ROM and RAM are the same address
space?

Some microcontrollers have a Harvard architecture. ROM and RAM have separate
buses, which means they have the same address space, such as 0-0x100. How
do I set the memory region?

Thanks!

Re: [PATCH 1/1] ppc/pnv.c: add a friendly warning when accel=kvm is used

2021-11-29 Thread David Gibson

On Mon, Nov 29, 2021 at 06:09:41PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 11/27/21 02:14, David Gibson wrote:
> > On Fri, Nov 26, 2021 at 06:51:38PM +0100, Cédric le Goater wrote:
> > > On 11/26/21 02:11, David Gibson wrote:
> > > > On Thu, Nov 25, 2021 at 07:42:02PM -0300, Daniel Henrique Barboza wrote:
> > > > > If one tries to use -machine powernv9,accel=kvm in a Power9 host, a
> > > > > cryptic error will be shown:
> > > > > 
> > > > > qemu-system-ppc64: Register sync failed... If you're using kvm-hv.ko, only "-cpu host" is possible
> > > > > qemu-system-ppc64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
> > > > > 
> > > > > Appending '-cpu host' will throw another error:
> > > > > 
> > > > > qemu-system-ppc64: invalid chip model 'host' for powernv9 machine
> > > > > 
> > > > > The root cause is that in IBM PowerPC we have different specs for the
> > > > > bare-metal and the guests. The bare-metal follows OPAL, the guests follow
> > > > > PAPR. The kernel KVM modules present in the ppc kernels implement PAPR.
> > > > > This means that we can't use KVM accel when using the powernv machine,
> > > > > which is the emulation of the bare-metal host.
> > > > > 
> > > > > All that said, let's give a more informative error in this case.
> > > > > 
> > > > > Signed-off-by: Daniel Henrique Barboza 
> > > > > ---
> > > > >hw/ppc/pnv.c | 5 +
> > > > >1 file changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> > > > > index 71e45515f1..e5b87e8730 100644
> > > > > --- a/hw/ppc/pnv.c
> > > > > +++ b/hw/ppc/pnv.c
> > > > > @@ -742,6 +742,11 @@ static void pnv_init(MachineState *machine)
> > > > >DriveInfo *pnor = drive_get(IF_MTD, 0, 0);
> > > > >DeviceState *dev;
> > > > > +if (kvm_enabled()) {
> > > > > +error_report("The powernv machine does not work with KVM acceleration");
> > > > > +exit(EXIT_FAILURE);
> > > > > +}
> > > > 
> > > > 
> > > > Hmm.. my only concern here is that powernv could, at least
> > > > theoretically, work with KVM PR.  I don't think it does right now,
> > > > though.
> > > 
> > > At the same time, it is nice to not let the user think that it could work
> > > in its current state. Don't you think so ?
> > 
> > Right, I'm thinking of the implication if you have an old qemu but a
> > new KVM which let it work.  Chances of KVM actually implementing this
> > probably aren't good though, so requiring the qemu update if we ever
> > do is probably the better deal.
> 
> 
> If the KVM module implements powernv accel support in the future, I wouldn't
> take my chances with the powernv machine working out of the box with it.
> 
> Most likely, if an endeavor of supporting KVM accel for powernv ever takes
> place, we'll need QEMU changes to go with it. And when that happens we can
> revert this patch and make the other necessary changes/fixes.
> 
> All that said, perhaps it's useful to add a note in docs/system/ppc/powernv.rst
> explaining the rationale for what we're doing here.

Yeah, makes sense to me.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH v8 07/10] target/ppc/power8-pmu.c: add PM_RUN_INST_CMPL (0xFA) event

2021-11-29 Thread David Gibson

On Thu, Nov 25, 2021 at 12:08:14PM -0300, Daniel Henrique Barboza wrote:
> PM_RUN_INST_CMPL, instructions completed with the run latch set, is
> the architected PowerISA v3.1 event defined with PMC4SEL = 0xFA.
> 
> Implement it by checking for the CTRL RUN bit before incrementing the
> counter. To make this work properly we also need to force a new
> translation block each time SPR_CTRL is written. A small tweak in
> pmu_increment_insns() is then needed to only increment this event
> if the thread has the run latch.
> 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

Obviously, it would also be possible to treat the runlatch
instructions event like the all-instructions event but force an update
on runlatch changes.  Having to incorporate CTRL into the active
counter logic as well as the other stuff seems like it might make
things messier that way overall though.
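
To make the approach taken in the patch concrete, here is a small standalone
model of the per-counter decision. It is a sketch only: `count_insns()` and
the `EVENT_*` names are hypothetical, the overflow handling is left out, and
`CTRL_RUN` is assumed to be the least significant bit of CTRL (`PPC_BIT(63)`
in IBM numbering).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed IBM bit numbering: bit 0 is the MSB of a 64-bit value. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))
#define CTRL_RUN PPC_BIT(63)

/* Hypothetical mirror of the event types used by the PMU code. */
enum pmu_event {
    EVENT_INACTIVE,
    EVENT_CYCLES,
    EVENT_INSTRUCTIONS,
    EVENT_INSN_RUN_LATCH,
};

/* Returns the new counter value after num_insns instructions completed. */
static uint64_t count_insns(uint64_t counter, enum pmu_event evt,
                            uint64_t ctrl, uint32_t num_insns)
{
    assert(evt <= EVENT_INSN_RUN_LATCH);

    switch (evt) {
    case EVENT_INSTRUCTIONS:
        /* All completed instructions count. */
        return counter + num_insns;
    case EVENT_INSN_RUN_LATCH:
        /* Only count while the run latch is set. */
        return (ctrl & CTRL_RUN) ? counter + num_insns : counter;
    default:
        /* Cycle and inactive counters ignore instruction completion. */
        return counter;
    }
}
```

The check only sees CTRL as it is when the count is applied, which is why the
patch ends the translation block on every SPR_CTRL write.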

> ---
>  target/ppc/cpu.h|  4 
>  target/ppc/cpu_init.c   |  2 +-
>  target/ppc/power8-pmu.c | 24 ++--
>  target/ppc/spr_tcg.h|  1 +
>  target/ppc/translate.c  | 12 
>  5 files changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 38cd2b5c43..993884164f 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -304,6 +304,7 @@ typedef enum {
>  PMU_EVENT_INACTIVE,
>  PMU_EVENT_CYCLES,
>  PMU_EVENT_INSTRUCTIONS,
> +PMU_EVENT_INSN_RUN_LATCH,
>  } PMUEventType;
>  
>  
> /*/
> @@ -389,6 +390,9 @@ typedef enum {
>  #define MMCR1_PMC4SEL_START 56
>  #define MMCR1_PMC4EVT_EXTR (64 - MMCR1_PMC4SEL_START - MMCR1_EVT_SIZE)
>  
> +/* PMU uses CTRL_RUN to sample PM_RUN_INST_CMPL */
> +#define CTRL_RUN PPC_BIT(63)
> +
>  /* LPCR bits */
>  #define LPCR_VPM0 PPC_BIT(0)
>  #define LPCR_VPM1 PPC_BIT(1)
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 2d72dde26d..ecce4c7c1e 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -6749,7 +6749,7 @@ static void register_book3s_ctrl_sprs(CPUPPCState *env)
>  {
>  spr_register(env, SPR_CTRL, "SPR_CTRL",
>   SPR_NOACCESS, SPR_NOACCESS,
> - SPR_NOACCESS, &spr_write_generic,
> + SPR_NOACCESS, &spr_write_CTRL,
>   0x);
>  spr_register(env, SPR_UCTRL, "SPR_UCTRL",
>   &spr_read_ureg, SPR_NOACCESS,
> diff --git a/target/ppc/power8-pmu.c b/target/ppc/power8-pmu.c
> index 59d0def79d..98797f0b2f 100644
> --- a/target/ppc/power8-pmu.c
> +++ b/target/ppc/power8-pmu.c
> @@ -96,6 +96,15 @@ static PMUEventType pmc_get_event(CPUPPCState *env, int 
> sprn)
>  evt_type = PMU_EVENT_CYCLES;
>  }
>  break;
> +case 0xFA:
> +/*
> + * PMC4SEL = 0xFA is the "instructions completed
> + * with run latch set" event.
> + */
> +if (sprn == SPR_POWER_PMC4) {
> +evt_type = PMU_EVENT_INSN_RUN_LATCH;
> +}
> +break;
>  case 0xFE:
>  /*
>   * PMC1SEL = 0xFE is the architected PowerISA v3.1
> @@ -119,11 +128,22 @@ static bool pmu_increment_insns(CPUPPCState *env, 
> uint32_t num_insns)
>  
>  /* PMC6 never counts instructions */
>  for (sprn = SPR_POWER_PMC1; sprn <= SPR_POWER_PMC5; sprn++) {
> -if (pmc_get_event(env, sprn) != PMU_EVENT_INSTRUCTIONS) {
> +PMUEventType evt_type = pmc_get_event(env, sprn);
> +bool insn_event = evt_type == PMU_EVENT_INSTRUCTIONS ||
> +  evt_type == PMU_EVENT_INSN_RUN_LATCH;
> +
> +if (pmc_is_inactive(env, sprn) || !insn_event) {
>  continue;
>  }
>  
> -env->spr[sprn] += num_insns;
> +if (evt_type == PMU_EVENT_INSTRUCTIONS) {
> +env->spr[sprn] += num_insns;
> +}
> +
> +if (evt_type == PMU_EVENT_INSN_RUN_LATCH &&
> +env->spr[SPR_CTRL] & CTRL_RUN) {
> +env->spr[sprn] += num_insns;
> +}
>  
>  if (env->spr[sprn] >= PMC_COUNTER_NEGATIVE_VAL &&
>  pmc_has_overflow_enabled(env, sprn)) {
> diff --git a/target/ppc/spr_tcg.h b/target/ppc/spr_tcg.h
> index 1d6521eedc..f98d97c0ba 100644
> --- a/target/ppc/spr_tcg.h
> +++ b/target/ppc/spr_tcg.h
> @@ -28,6 +28,7 @@ void spr_write_generic(DisasContext *ctx, int sprn, int 
> gprn);
>  void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn);
>  void spr_write_MMCR1(DisasContext *ctx, int sprn, int gprn);
>  void spr_write_PMC(DisasContext *ctx, int sprn, int gprn);
> +void spr_write_CTRL(DisasContext *ctx, int sprn, int gprn);
>  void spr_read_xer(DisasContext *ctx, int gprn, int sprn);
>  void spr_write_xer(DisasContext *ctx, int sprn, int gprn);
>  void spr_read_lr(DisasContext *ctx, int gprn, int sprn);
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index ccc83d0603..d0e361a9d1 100644
> --- a/target/ppc/translate.c
> +++

Re: [PATCH v4] target/ppc: fix Hash64 MMU update of PTE bit R

2021-11-29 Thread David Gibson

On Mon, Nov 29, 2021 at 03:57:51PM -0300, Leandro Lupori wrote:
> When updating the R bit of a PTE, the Hash64 MMU was using a wrong byte
> offset, causing the first byte of the adjacent PTE to be corrupted.
> This caused a panic when booting FreeBSD, using the Hash MMU.
> 
> Fixes: a2dd4e83e76b ("ppc/hash64: Rework R and C bit updates")
> Signed-off-by: Leandro Lupori 

Reviewed-by: David Gibson 

Thanks for your patience with our nitpicking :).
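
For readers of the archive, the off-by-two is easy to see with the new
defines. The sketch below is an illustration under assumptions rather than
QEMU code: it assumes a 16-byte PTE stored as two big-endian doublewords,
assumes the R and C bits are 0x100 and 0x80 of the second doubleword, and uses
made-up helpers (`hpte_set_r()`, `hpte_set_c()`, `load_be64()`) that OR the
bit into the byte instead of going through the guest memory API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_PTE_SIZE_64 16                  /* assumed: two 64-bit doublewords */
#define HPTE64_DW1       (HASH_PTE_SIZE_64 / 2)
#define HPTE64_DW1_R     (HPTE64_DW1 + 6)    /* byte 14 of the PTE */
#define HPTE64_DW1_C     (HPTE64_DW1 + 7)    /* byte 15 of the PTE */

/* Assumed positions of the R and C bits within the second doubleword. */
#define HPTE64_R_R 0x0000000000000100ULL
#define HPTE64_R_C 0x0000000000000080ULL

/* Read a big-endian 64-bit value, as the hash table is stored. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Set R by touching only the single byte that contains it. */
static void hpte_set_r(uint8_t *htab, size_t ptex)
{
    assert(htab);
    htab[ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R] |= (uint8_t)(HPTE64_R_R >> 8);
}

/* Set C the same way; it lives in the last byte of the PTE. */
static void hpte_set_c(uint8_t *htab, size_t ptex)
{
    assert(htab);
    htab[ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C] |= (uint8_t)HPTE64_R_C;
}
```

With these values the old offset of 16 for PTE n is the same address as byte 0
of PTE n + 1, which matches the corruption described in the commit message.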

> ---
> Changes from v3:
> - rename defines
> ---
>  hw/ppc/spapr.c  | 8 
>  hw/ppc/spapr_softmmu.c  | 2 +-
>  target/ppc/mmu-hash64.c | 4 ++--
>  target/ppc/mmu-hash64.h | 5 +
>  4 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 163c90388a..3b5fd749be 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1414,7 +1414,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>  kvmppc_write_hpte(ptex, pte0, pte1);
>  } else {
>  if (pte0 & HPTE64_V_VALID) {
> -stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
> +stq_p(spapr->htab + offset + HPTE64_DW1, pte1);
>  /*
>   * When setting valid, we write PTE1 first. This ensures
>   * proper synchronization with the reading code in
> @@ -1430,7 +1430,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>   * ppc_hash64_pteg_search()
>   */
>  smp_wmb();
> -stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
> +stq_p(spapr->htab + offset + HPTE64_DW1, pte1);
>  }
>  }
>  }
> @@ -1438,7 +1438,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>  static void spapr_hpte_set_c(PPCVirtualHypervisor *vhyp, hwaddr ptex,
>   uint64_t pte1)
>  {
> -hwaddr offset = ptex * HASH_PTE_SIZE_64 + 15;
> +hwaddr offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
>  SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>  
>  if (!spapr->htab) {
> @@ -1454,7 +1454,7 @@ static void spapr_hpte_set_c(PPCVirtualHypervisor 
> *vhyp, hwaddr ptex,
>  static void spapr_hpte_set_r(PPCVirtualHypervisor *vhyp, hwaddr ptex,
>   uint64_t pte1)
>  {
> -hwaddr offset = ptex * HASH_PTE_SIZE_64 + 14;
> +hwaddr offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
>  SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>  
>  if (!spapr->htab) {
> diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_softmmu.c
> index f8924270ef..4ee03c83e4 100644
> --- a/hw/ppc/spapr_softmmu.c
> +++ b/hw/ppc/spapr_softmmu.c
> @@ -426,7 +426,7 @@ static void new_hpte_store(void *htab, uint64_t pteg, int 
> slot,
>  addr += slot * HASH_PTE_SIZE_64;
>  
>  stq_p(addr, pte0);
> -stq_p(addr + HASH_PTE_SIZE_64 / 2, pte1);
> +stq_p(addr + HPTE64_DW1, pte1);
>  }
>  
>  static int rehash_hpte(PowerPCCPU *cpu,
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 19832c4b46..da9fe99ff8 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -786,7 +786,7 @@ static void ppc_hash64_set_dsi(CPUState *cs, int mmu_idx, 
> uint64_t dar, uint64_t
>  
>  static void ppc_hash64_set_r(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
>  {
> -hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + 16;
> +hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
>  
>  if (cpu->vhyp) {
>  PPCVirtualHypervisorClass *vhc =
> @@ -803,7 +803,7 @@ static void ppc_hash64_set_r(PowerPCCPU *cpu, hwaddr 
> ptex, uint64_t pte1)
>  
>  static void ppc_hash64_set_c(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
>  {
> -hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + 15;
> +hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
>  
>  if (cpu->vhyp) {
>  PPCVirtualHypervisorClass *vhc =
> diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
> index c5b2f97ff7..1496955d38 100644
> --- a/target/ppc/mmu-hash64.h
> +++ b/target/ppc/mmu-hash64.h
> @@ -97,6 +97,11 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
>  #define HPTE64_V_1TB_SEG0x4000ULL
>  #define HPTE64_V_VRMA_MASK  0x4001ff00ULL
>  
> +/* PTE offsets */
> +#define HPTE64_DW1  (HASH_PTE_SIZE_64 / 2)
> +#define HPTE64_DW1_R(HPTE64_DW1 + 6)
> +#define HPTE64_DW1_C(HPTE64_DW1 + 7)
> +
>  /* Format changes for ARCH v3 */
>  #define HPTE64_V_COMMON_BITS0x000fULL
>  #define HPTE64_R_3_0_SSIZE_SHIFT 58

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[PATCH 1/1] hw/arm/virt: Support for virtio-mem-pci

2021-11-29 Thread Gavin Shan

This supports virtio-mem-pci device on "virt" platform, by simply
following the implementation on x86.

   * The patch was written by David Hildenbrand 
 modified by Jonathan Cameron 

   * This implements the hotplug handlers to support virtio-mem-pci
 device hot-add, while the hot-remove isn't supported as we have
 on x86.

   * The block size is 1GB on ARM64 instead of 128MB on x86.

   * It has been passing the tests with various combinations like 64KB
 and 4KB page sizes on host and guest, different memory device
 backends like normal, transparent huge page and HugeTLB, plus
 migration.

Signed-off-by: Gavin Shan 
---
 hw/arm/Kconfig |  1 +
 hw/arm/virt.c  | 68 +-
 hw/virtio/virtio-mem.c |  2 ++
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 2d37d29f02..15aff8efb8 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -27,6 +27,7 @@ config ARM_VIRT
 select DIMM
 select ACPI_HW_REDUCED
 select ACPI_APEI
+select VIRTIO_MEM_SUPPORTED
 
 config CHEETAH
 bool
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 369552ad45..f4599a5ef0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -71,9 +71,11 @@
 #include "hw/arm/smmuv3.h"
 #include "hw/acpi/acpi.h"
 #include "target/arm/internals.h"
+#include "hw/mem/memory-device.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
 #include "hw/acpi/generic_event_device.h"
+#include "hw/virtio/virtio-mem-pci.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
@@ -2480,6 +2482,63 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
  dev, &error_abort);
 }
 
+static void virt_virtio_md_pci_pre_plug(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
+Error *local_err = NULL;
+
+if (!hotplug_dev2 && dev->hotplugged) {
+/*
+ * Without a bus hotplug handler, we cannot control the plug/unplug
+ * order. We should never reach this point when hotplugging on x86,
+ * however, better add a safety net.
+ */
+error_setg(errp, "hotplug of virtio based memory devices not supported"
+   " on this bus.");
+return;
+}
+/*
+ * First, see if we can plug this memory device at all. If that
+ * succeeds, branch off to the actual hotplug handler.
+ */
+memory_device_pre_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev), NULL,
+   &local_err);
+if (!local_err && hotplug_dev2) {
+hotplug_handler_pre_plug(hotplug_dev2, dev, &local_err);
+}
+error_propagate(errp, local_err);
+}
+
+static void virt_virtio_md_pci_plug(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
+Error *local_err = NULL;
+
+/*
+ * Plug the memory device first and then branch off to the actual
+ * hotplug handler. If that one fails, we can easily undo the memory
+ * device bits.
+ */
+memory_device_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
+if (hotplug_dev2) {
+hotplug_handler_plug(hotplug_dev2, dev, &local_err);
+if (local_err) {
+memory_device_unplug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
+}
+}
+error_propagate(errp, local_err);
+}
+
+static void virt_virtio_md_pci_unplug_request(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
+{
+/* We don't support hot unplug of virtio based memory devices */
+error_setg(errp, "virtio based memory devices cannot be unplugged.");
+}
+
+
 static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
@@ -2513,6 +2572,8 @@ static void 
virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
 qdev_prop_set_uint32(dev, "len-reserved-regions", 1);
 qdev_prop_set_string(dev, "reserved-regions[0]", resv_prop_str);
 g_free(resv_prop_str);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+virt_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
 }
 }
 
@@ -2538,6 +2599,8 @@ static void virt_machine_device_plug_cb(HotplugHandler 
*hotplug_dev,
 vms->iommu = VIRT_IOMMU_VIRTIO;
 vms->virtio_iommu_bdf = pci_get_bdf(pdev);
 create_virtio_iommu_dt_bindings(vms);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+virt_virtio_md_pci_plug(hotplug_dev, dev, errp);
 }
 }
 
@@ -2588,6 +2651,8 @@ static void 
virt_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 {
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
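
The plug ordering used by virt_virtio_md_pci_plug() above (plug the memory
device first, then call the bus hotplug handler, and undo the first step if
the second fails) can be illustrated without QEMU. This is only a sketch: every
name in it (`struct fake_machine`, `fake_bus_plug()`, `plug_with_rollback()`)
is hypothetical, and the bus handler failure is simulated with a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical machine state: just counts plugged memory devices. */
struct fake_machine {
    int plugged_memory_devices;
};

static void fake_memory_device_plug(struct fake_machine *m)
{
    m->plugged_memory_devices++;
}

static void fake_memory_device_unplug(struct fake_machine *m)
{
    m->plugged_memory_devices--;
}

/* Simulated bus hotplug handler; returns false on failure. */
static bool fake_bus_plug(bool should_fail)
{
    return !should_fail;
}

/* Plug the memory device first, then the bus handler, rolling back on error. */
static bool plug_with_rollback(struct fake_machine *m, bool have_bus_handler,
                               bool bus_handler_fails)
{
    assert(m);

    fake_memory_device_plug(m);
    if (have_bus_handler && !fake_bus_plug(bus_handler_fails)) {
        /* Undo the memory device bits so no half-plugged state remains. */
        fake_memory_device_unplug(m);
        return false;
    }
    return true;
}
```

The point of the ordering is that the memory device step is the one that is
cheap to undo, so a late failure leaves the machine as it was.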

[PATCH 0/1] hw/arm/virt: Support for virtio-mem-pci

2021-11-29 Thread Gavin Shan

This series supports virtio-mem-pci device, by simply following the
implementation on x86. The exception is the block size is 1GB on ARM64
instead of 128MB on x86.

The work was done by David Hildenbrand and then Jonathan Cameron. I'm
taking over the patch and putting more effort into it, which at the
current stage is all about testing.

Testing
===

The upstream linux kernel (v5.16.rc3) is used on host/guest during
the testing. The guest kernel includes changes to enable virtio-mem
driver, which is simply to enable CONFIG_VIRTIO_MEM on ARM64.

Multiple combinations of host/guest page sizes, memory backend devices,
etc. are covered in the testing. Migration is also tested. The command
lines below are used for the VM and for virtio-mem-pci device hot-add.
Note that virtio-mem-pci device hot-remove isn't supported, similar to
what we have on x86.

  host.pgsize  guest.pgsize  backend  hot-add  hot-remove  migration
  ------------------------------------------------------------------
  4KB          4KB           normal   ok       ok          ok
                             THP      ok       ok          ok
                             hugeTLB  ok       ok          ok
  4KB          64KB          normal   ok       ok          ok
                             THP      ok       ok          ok
                             hugeTLB  ok       ok          ok
  64KB         4KB           normal   ok       ok          ok
                             THP      ok       ok          ok
                             hugeTLB  ok       ok          ok
  64KB         64KB          normal   ok       ok          ok
                             THP      ok       ok          ok
                             hugeTLB  ok       ok          ok

The following command line is used to start the VM. When hugeTLBfs is
used, all memory backend objects are populated on /dev/hugepages-2048kB
or /dev/hugepages-524288kB, depending on the host page size.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host \
  -cpu host -smp 4,sockets=2,cores=2,threads=1 \
  -m 1024M,slots=16,maxmem=64G \
  -object memory-backend-ram,id=mem0,size=512M \
  -object memory-backend-ram,id=mem1,size=512M \
  -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
  -numa node,nodeid=1,cpus=2-3,memdev=mem1 \
     :
  -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image \
  -initrd /home/gavin/sandbox/images/rootfs.cpio.xz \
  -append earlycon=pl011,mmio,0x900 \
  -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
  -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
  -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
  -object memory-backend-ram,id=vmem0,size=512M \
  -device virtio-mem-pci,id=vm0,bus=pcie.1,memdev=vmem0,node=0,requested-size=0 \
  -object memory-backend-ram,id=vmem1,size=512M \
  -device virtio-mem-pci,id=vm1,bus=pcie.2,memdev=vmem1,node=1,requested-size=0

Command lines used for memory hot-add and hot-remove:

  (qemu) qom-set vm1 requested-size 512M
  (qemu) qom-set vm1 requested-size 0
  (qemu) qom-set vm1 requested-size 512M

Command lines used for virtio-mem-pci device hot-add:

  (qemu) object_add memory-backend-ram,id=hp-mem1,size=512M
  (qemu) device_add virtio-mem-pci,id=hp-vm1,bus=pcie.3,memdev=hp-mem1,node=1
  (qemu) qom-set hp-vm1 requested-size 512M
  (qemu) qom-set hp-vm1 requested-size 0
  (qemu) qom-set hp-vm1 requested-size 512M

Gavin Shan (1):
  hw/arm/virt: Support for virtio-mem-pci

 hw/arm/Kconfig |  1 +
 hw/arm/virt.c  | 68 +-
 hw/virtio/virtio-mem.c |  2 ++
 3 files changed, 70 insertions(+), 1 deletion(-)

-- 
2.23.0

Re: [PATCH 10/10] vhost-user-blk: propagate error return from generic vhost

2021-11-29 Thread Raphael Norwitz

Ditto - not for 6.2.

I'm happy with this once the vhost and vhost-user patches go in.

Looks like vhost-user-vgpu, vhost-user-input and vhost-user-vsock also
return -1 on vhost_user_*_handle_config_change, so presumably those
should be fixed too.
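
To spell out why the one-line change matters for callers, here is a tiny
standalone comparison. It is a sketch with made-up names
(`fake_get_config()`, `handle_config_change_old()`,
`handle_config_change_new()`), not the vhost code: the backend call is
replaced by a stub that simply returns the simulated result.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the generic vhost call; returns 0 or -errno. */
static int fake_get_config(int simulated_result)
{
    assert(simulated_result <= 0);
    return simulated_result;
}

/* Old behaviour: every failure is flattened to -1. */
static int handle_config_change_old(int simulated_result)
{
    int ret = fake_get_config(simulated_result);
    if (ret < 0) {
        return -1;
    }
    return 0;
}

/* New behaviour: the negative errno value reaches the caller unchanged. */
static int handle_config_change_new(int simulated_result)
{
    int ret = fake_get_config(simulated_result);
    if (ret < 0) {
        return ret;
    }
    return 0;
}
```

With the old variant a caller cannot tell -ECONNREFUSED from -EPROTO, and on
Linux the flattened -1 even happens to equal -EPERM.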

On Thu, Nov 11, 2021 at 06:33:54PM +0300, Roman Kagan wrote:
> Fix the only callsite that doesn't propagate the error code from the
> generic vhost code.
> 
> Signed-off-by: Roman Kagan 
> ---

Reviewed-by: Raphael Norwitz 

>  hw/block/vhost-user-blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index f9b17f6813..ab11ce8252 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -100,7 +100,7 @@ static int vhost_user_blk_handle_config_change(struct 
> vhost_dev *dev)
> &local_err);
>  if (ret < 0) {
>  error_report_err(local_err);
> -return -1;
> +return ret;
>  }
>  
>  /* valid for resize only */
> -- 
> 2.33.1
>

Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-29 Thread Raphael Norwitz

As mst said, not for 6.2.

On Thu, Nov 11, 2021 at 06:33:45PM +0300, Roman Kagan wrote:
> vhost-user-blk realize only attempts to reconnect if the previous
> connection attempt failed on "a problem with the connection and not an
> error related to the content (which would fail again the same way in the
> next attempt)".
> 
> However this distinction is very subtle, and may be inadvertently broken
> if the code changes somewhere deep down the stack and a new error gets
> propagated up to here.
> 
> OTOH now that the number of reconnection attempts is limited it seems
> harmless to try reconnecting on any error.
> 
> So relax the condition of whether to retry connecting to check for any
> error.
> 
> This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> during realize".
> 
> Signed-off-by: Roman Kagan 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index ba13cb87e5..f9b17f6813 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -511,7 +511,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  *errp = NULL;
>  }
>  ret = vhost_user_blk_realize_connect(s, errp);
> -} while (ret == -EPROTO && retries--);
> +} while (ret < 0 && retries--);
>  
>  if (ret < 0) {
>  goto virtio_err;
> -- 
> 2.33.1
>

Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-29 Thread Raphael Norwitz

> > 
> > I see. I hadn't looked at the rest of the series yet because I ran out
> > of time, but now that I'm skimming them, I see quite a few places that
> > use non-EPROTO, but I wonder which of them actually should be
> > reconnected. So far all I saw were presumably persistent errors where a
> > retry won't help. Can you give me some examples?
> 
> E.g. the particular case you mention earlier, -ECONNREFUSED, is not
> unlikely to happen due to the vhost-user server restart for maintenance;
> in this case retrying looks like a reasonable thing to do, doesn't it?
>

Seems like a net-positive to me, especially with the cleanups in the
rest of the series, but I don't feel strongly.

> Thanks,
> Roman.
>

Re: [PATCH] hw/vhost-user-blk: turn on VIRTIO_BLK_F_SIZE_MAX feature for virtio blk device

2021-11-29 Thread Raphael Norwitz

Just a commit message nit. Otherwise I'm happy with this. OFC should not
be queued for 6.2.

On Fri, Nov 26, 2021 at 10:00:18AM +0800, Andy Pei wrote:
> Turn on pre-defined feature VIRTIO_BLK_F_SIZE_MAX virtio blk device
> to avoid guest DMA request size is too large to exceed hardware spec.

Grammar here. Should be something like "...DMA request sizes which are
too large for the hardware spec".

> 
> Signed-off-by: Andy Pei 

Acked-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index ba13cb8..eb1264a 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -252,6 +252,7 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  
>  /* Turn on pre-defined features */
> +virtio_add_feature(&features, VIRTIO_BLK_F_SIZE_MAX);
>  virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
>  virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
>  virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
> -- 
> 1.8.3.1
>

[PATCH for-6.2 v2] block/nbd: forbid incompatible change of server options on reconnect

2021-11-29 Thread Vladimir Sementsov-Ogievskiy

Reconnect feature was never prepared to handle server options changed
on reconnect. Let's be stricter and check what exactly is changed. If
server capabilities just got richer don't worry. Otherwise fail and
drop the established connection.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

v2: by Eric's comments:
 - drop extra check about old->min_block % new->min_block
 - make context_id check conditional itself
 - don't handle READ_ONLY flag here (see comment in code)
 - wording

 Code seems quite obvious, but honestly I still haven't tested that it does
 what it should :( And I'm afraid QEMU doesn't actually provide a good
 way to do so.

 Eric, maybe you know some simple way to test it with nbdkit?

 include/block/nbd.h |  9 +
 nbd/client-connection.c | 88 +
 2 files changed, 97 insertions(+)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 78d101b774..9e1943d24c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -157,6 +157,10 @@ enum {
 #define NBD_FLAG_SEND_RESIZE   (1 << NBD_FLAG_SEND_RESIZE_BIT)
 #define NBD_FLAG_SEND_CACHE(1 << NBD_FLAG_SEND_CACHE_BIT)
 #define NBD_FLAG_SEND_FAST_ZERO(1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
+/*
+ * WARNING! If you add any new NBD_FLAG_ flag, check that logic in
+ * nbd_is_new_info_compatible() is still good about handling flags.
+ */
 
 /* New-style handshake (global) flags, sent from server to client, and
control what will happen during handshake phase. */
@@ -305,6 +309,11 @@ struct NBDExportInfo {
 
 uint32_t context_id;
 
+/*
+ * WARNING! When adding any new field to the structure, don't forget
+ * to check and update the nbd_is_new_info_compatible() function.
+ */
+
 /* Set by server results during nbd_receive_export_list() */
 char *description;
 int n_contexts;
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 695f855754..d50c187482 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -37,6 +37,10 @@ struct NBDClientConnection {
 bool do_negotiation;
 bool do_retry;
 
+/* Used only by connection thread, does not need mutex protection */
+bool has_prev_info;
+NBDExportInfo prev_info;
+
 QemuMutex mutex;
 
 /*
@@ -160,6 +164,69 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
 return 0;
 }
 
+static bool nbd_is_new_info_compatible(NBDExportInfo *old, NBDExportInfo *new,
+   Error **errp)
+{
+uint32_t dropped_flags;
+
+if (old->structured_reply && !new->structured_reply) {
+error_setg(errp, "Server options degraded after reconnect: "
+   "structured_reply is not supported anymore");
+return false;
+}
+
+if (old->base_allocation) {
+if (!new->base_allocation) {
+error_setg(errp, "Server options degraded after reconnect: "
+   "base_allocation is not supported anymore");
+return false;
+}
+
+if (old->context_id != new->context_id) {
+error_setg(errp, "Meta context id changed after reconnect");
+return false;
+}
+}
+
+if (old->size != new->size) {
+error_setg(errp, "NBD export size changed after reconnect");
+return false;
+}
+
+/*
+ * No worry if rotational status changed.
+ *
+ * Also, we can't handle NBD_FLAG_READ_ONLY properly at this level: we don't
+ * actually know, does our client need write access or not. So, it's handled
+ * in block layer in nbd_handle_updated_info().
+ *
+ * All other flags are feature flags, they should not degrade.
+ */
+dropped_flags = (old->flags & ~new->flags) &
+~(NBD_FLAG_ROTATIONAL | NBD_FLAG_READ_ONLY);
+if (dropped_flags) {
+error_setg(errp, "Server options degraded after reconnect: flags 0x%"
+   PRIx32 " are not reported anymore", dropped_flags);
+return false;
+}
+
+if (new->min_block > old->min_block) {
+error_setg(errp, "Server requires more strict min_block after "
+   "reconnect: %" PRIu32 " instead of %" PRIu32,
+   new->min_block, old->min_block);
+return false;
+}
+
+if (new->max_block < old->max_block) {
+error_setg(errp, "Server requires more strict max_block after "
+   "reconnect: %" PRIu32 " instead of %" PRIu32,
+   new->max_block, old->max_block);
+return false;
+}
+
+return true;
+}
+
 static void *connect_thread_func(void *opaque)
 {
 NBDClientConnection *conn = opaque;
@@ -183,6 +250,27 @@ static void *connect_thread_func(void *opaque)
   conn->do_negotiation ? &conn->updated_info : NULL,
   conn->tlscreds, &conn->ioc, &conn->err);
 
+if (ret == 0) {
+if (conn->has_prev_info &&
+

Re: [PATCH 00/10] vhost: stick to -errno error return convention

2021-11-29 Thread Roman Kagan

On Sun, Nov 28, 2021 at 04:47:20PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> > Error propagation between the generic vhost code and the specific backends is
> > not quite consistent: some places follow "return -1 and set errno" convention,
> > while others assume "return negated errno".  Furthermore, not enough care is
> > taken not to clobber errno.
> > 
> > As a result, on certain code paths the errno resulting from a failure may get
> > overridden by another function call, and then that zero errno indicating
> > success is propagated up the stack, leading to failures being lost.  In
> > particular, we've seen errors in the communication with a vhost-user-blk slave
> > not trigger an immediate connection drop and reconnection, leaving it in a
> > broken state.
> > 
> > Rework error propagation to always return negated errno on errors and
> > correctly pass it up the stack.
> 
> Hi Roman,
> if there are bugfixes here I'll be happy to take them right now.
> The wholesale rework seems inappropriate for 6.2, I'll be
> happy to tag it for after 6.2. Pls ping me after release to help
> make sure it's not lost.

All these patches are bugfixes in one way or another.  That said, none
of the problems being addressed are recent regressions.  OTOH the
patches introduce non-zero churn and change behavior on some error
paths, so I'd suggest to postpone the whole series till after 6.2 is
out.

Thanks,
Roman.
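
To make the failure mode from the cover letter concrete, here is a minimal sketch of how a "return -1 and set errno" callee can lose its error when the caller reads errno too late, next to the "return negated errno" style the series converges on. All function names below are hypothetical and exist only for this illustration; none of them are taken from the vhost code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical low-level operation using "return -1 and set errno". */
static int backend_op_sets_errno(int fail_with)
{
    if (fail_with) {
        errno = fail_with;
        return -1;
    }
    return 0;
}

/* Hypothetical cleanup step that succeeds but resets errno on the way. */
static void cleanup_clobbers_errno(void)
{
    errno = 0;
}

/* Fragile: reads errno only after another call could overwrite it. */
static int fragile_wrapper(int fail_with)
{
    if (backend_op_sets_errno(fail_with) < 0) {
        cleanup_clobbers_errno();
        return -errno;          /* may be 0, so the failure is lost */
    }
    return 0;
}

/* Robust: capture errno immediately and hand back a negated errno. */
static int robust_wrapper(int fail_with)
{
    assert(fail_with >= 0);
    if (backend_op_sets_errno(fail_with) < 0) {
        int saved = errno;

        cleanup_clobbers_errno();
        return -saved;
    }
    return 0;
}
```

A caller that checks "ret < 0" sees nothing wrong after fragile_wrapper(EIO), whereas robust_wrapper(EIO) reports -EIO regardless of what ran in between.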

Re: [PATCH 1/1] ppc/pnv.c: add a friendly warning when accel=kvm is used

2021-11-29 Thread Daniel Henrique Barboza





On 11/27/21 02:14, David Gibson wrote:

On Fri, Nov 26, 2021 at 06:51:38PM +0100, Cédric le Goater wrote:

On 11/26/21 02:11, David Gibson wrote:

On Thu, Nov 25, 2021 at 07:42:02PM -0300, Daniel Henrique Barboza wrote:

If one tries to use -machine powernv9,accel=kvm in a Power9 host, a
cryptic error will be shown:

qemu-system-ppc64: Register sync failed... If you're using kvm-hv.ko, only "-cpu host" is possible
qemu-system-ppc64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument

Appending '-cpu host' will throw another error:

qemu-system-ppc64: invalid chip model 'host' for powernv9 machine

The root cause is that in IBM PowerPC we have different specs for the bare-metal
and the guests. The bare-metal follows OPAL, the guests follow PAPR. The KVM
kernel modules present in the ppc kernels implement PAPR. This means that we
can't use KVM accel when using the powernv machine, which is the emulation of
the bare-metal host.

All that said, let's give a more informative error in this case.

Signed-off-by: Daniel Henrique Barboza 
---
   hw/ppc/pnv.c | 5 +
   1 file changed, 5 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 71e45515f1..e5b87e8730 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -742,6 +742,11 @@ static void pnv_init(MachineState *machine)
   DriveInfo *pnor = drive_get(IF_MTD, 0, 0);
   DeviceState *dev;
+if (kvm_enabled()) {
+error_report("The powernv machine does not work with KVM acceleration");
+exit(EXIT_FAILURE);
+}



Hmm.. my only concern here is that powernv could, at least
theoretically, work with KVM PR.  I don't think it does right now,
though.


At the same time, it is nice to not let the user think that it could work
in its current state. Don't you think so ?


Right, I'm thinking of the implication if you have an old qemu but a
new KVM which let it work.  Chances of KVM actually implementing this
probably aren't good though, so requiring the qemu update if we ever
do is probably the better deal.



If the KVM module implements powernv accel support in the future, I wouldn't
take my chances with the powernv machine working out of the box with it.

Most likely, if an endeavor of supporting KVM accel for powernv ever takes
place, we'll need QEMU changes to go with it. And when that happens we can
revert this patch and make the other necessary changes/fixes.

All that said, perhaps it's useful to add a note in docs/system/ppc/powernv.rst
explaining the rationale for what we're doing here.



Thanks,


Daniel

Re: [PATCH for-7.0 0/4] qemu-common.h include cleanup

2021-11-29 Thread Peter Maydell

On Mon, 29 Nov 2021 at 20:05, Peter Maydell  wrote:
>
> qemu-common.h has a comment at the top:
>
>  * This file is supposed to be included only by .c files. No header file should
>  * depend on qemu-common.h, as this would easily lead to circular header
>  * dependencies.

As a side note, that comment was added back in 2012 when qemu-common.h
was bigger, included other headers, and did some of the work we currently
use osdep.h for. As it stands today qemu-common.h includes no other
files so it isn't a source of possible circular dependencies -- it's
just a grab-bag of miscellaneous prototypes that in an ideal world
would be in more focused individual headers[*]. So there's an argument
for deleting this comment...

[*] A cleanup that would be nice, and I'm about to send out a patchset
that splits out the rtc related functions; but the grab-bag at the
bottom of osdep.h is probably higher priority because that header
gets pulled in by an order of magnitude more C files.

-- PMM

[PATCH for-7.0] rtc: Move RTC function prototypes to their own header

2021-11-29 Thread Peter Maydell

softmmu/rtc.c defines two public functions: qemu_get_timedate() and
qemu_timedate_diff().  Currently we keep the prototypes for these in
qemu-common.h, but most files don't need them.  Move them to their
own header, a new include/sysemu/rtc.h.

Since the C files using these two functions did not need to include
qemu-common.h for any other reason, we can remove those include lines
when we add the include of the new rtc.h.

The license for the .h file follows that of the softmmu/rtc.c
where both the functions are defined.

Signed-off-by: Peter Maydell 
---
I have added documentation comments for the two functions, but
since my understanding of them and their purpose is a little shaky,
review would be welcome.
---
 include/qemu-common.h|  3 ---
 include/sysemu/rtc.h | 58 
 hw/arm/omap1.c   |  2 +-
 hw/arm/pxa2xx.c  |  2 +-
 hw/arm/strongarm.c   |  2 +-
 hw/misc/mac_via.c|  2 +-
 hw/misc/macio/cuda.c |  2 +-
 hw/misc/macio/pmu.c  |  2 +-
 hw/ppc/spapr_rtc.c   |  2 +-
 hw/rtc/allwinner-rtc.c   |  2 +-
 hw/rtc/aspeed_rtc.c  |  2 +-
 hw/rtc/ds1338.c  |  2 +-
 hw/rtc/exynos4210_rtc.c  |  2 +-
 hw/rtc/goldfish_rtc.c|  2 +-
 hw/rtc/m41t80.c  |  2 +-
 hw/rtc/m48t59.c  |  2 +-
 hw/rtc/mc146818rtc.c |  2 +-
 hw/rtc/pl031.c   |  2 +-
 hw/rtc/twl92230.c|  2 +-
 hw/rtc/xlnx-zynqmp-rtc.c |  2 +-
 hw/s390x/tod-tcg.c   |  2 +-
 hw/scsi/megasas.c|  2 +-
 net/dump.c   |  2 +-
 softmmu/rtc.c|  2 +-
 24 files changed, 80 insertions(+), 25 deletions(-)
 create mode 100644 include/sysemu/rtc.h

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 73bcf763ed8..bed0b06a3d2 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -26,9 +26,6 @@
 int qemu_main(int argc, char **argv, char **envp);
 #endif
 
-void qemu_get_timedate(struct tm *tm, int offset);
-int qemu_timedate_diff(struct tm *tm);
-
 void *qemu_oom_check(void *ptr);
 
 ssize_t qemu_write_full(int fd, const void *buf, size_t count)
diff --git a/include/sysemu/rtc.h b/include/sysemu/rtc.h
new file mode 100644
index 000..159702b45b5
--- /dev/null
+++ b/include/sysemu/rtc.h
@@ -0,0 +1,58 @@
+/*
+ * RTC configuration and clock read
+ *
+ * Copyright (c) 2003-2021 QEMU contributors
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef SYSEMU_RTC_H
+#define SYSEMU_RTC_H
+
+/**
+ * qemu_get_timedate: Get the current RTC time
+ * @tm: struct tm to fill in with RTC time
+ * @offset: offset in seconds to adjust the RTC time by before
+ *  converting to struct tm format.
+ *
+ * This function fills in @tm with the current RTC time, as adjusted
+ * by @offset (for example, if @offset is 3600 then the returned time/date
+ * will be one hour further ahead than the current RTC time).
+ *
+ * The usual use is by RTC device models, which should call this function
+ * to find the time/date value that they should return to the guest
+ * when it reads the RTC registers.
+ *
+ * The behaviour of the clock whose value this function returns will
+ * depend on the -rtc command line option passed by the user.
+ */
+void qemu_get_timedate(struct tm *tm, int offset);
+
+/**
+ * qemu_timedate_diff: Return difference between a struct tm and the RTC
+ * @tm: struct tm containing the date/time to compare against
+ *
+ * Returns the difference in seconds between the RTC clock time
+ * and the date/time specified in @tm. For example, if @tm specifies
+ * a timestamp one hour further ahead than the current RTC time
+ * then this function will return 3600.
+ */
+int qemu_timedate_diff(struct tm *tm);
+
+#endif
diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index 180d3788f89..9852c2a07ec 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -21,7 +21,6 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qapi/error.h"
-#include "qemu-common.h"

[PULL 0/1] ppc queue

2021-11-29 Thread Cédric Le Goater

The following changes since commit a0fd8a5492240379a07c0b39c8dae3b8341b458f:

  Merge tag 'pull-for-6.2-291121-1' of https://github.com/stsquad/qemu into staging (2021-11-29 18:58:06 +0100)

are available in the Git repository at:

  https://github.com/legoater/qemu/ tags/pull-ppc-20211129

for you to fetch changes up to 7bf00dfb51566070960e0b7977e41abba96c130e:

  target/ppc: fix Hash64 MMU update of PTE bit R (2021-11-29 21:00:08 +0100)


ppc 6.2 queue:

* Hash64 MMU fix for FreeBSD installer


Leandro Lupori (1):
  target/ppc: fix Hash64 MMU update of PTE bit R

 target/ppc/mmu-hash64.h | 5 +
 hw/ppc/spapr.c  | 8 
 hw/ppc/spapr_softmmu.c  | 2 +-
 target/ppc/mmu-hash64.c | 4 ++--
 4 files changed, 12 insertions(+), 7 deletions(-)

[PULL 1/1] target/ppc: fix Hash64 MMU update of PTE bit R

2021-11-29 Thread Cédric Le Goater

From: Leandro Lupori 

When updating the R bit of a PTE, the Hash64 MMU was using a wrong byte
offset, causing the first byte of the adjacent PTE to be corrupted.
This caused a panic when booting FreeBSD, using the Hash MMU.

Fixes: a2dd4e83e76b ("ppc/hash64: Rework R and C bit updates")
Signed-off-by: Leandro Lupori 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/mmu-hash64.h | 5 +
 hw/ppc/spapr.c  | 8 
 hw/ppc/spapr_softmmu.c  | 2 +-
 target/ppc/mmu-hash64.c | 4 ++--
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index c5b2f97ff74f..1496955d389b 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -97,6 +97,11 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
 #define HPTE64_V_1TB_SEG0x4000ULL
 #define HPTE64_V_VRMA_MASK  0x4001ff00ULL
 
+/* PTE offsets */
+#define HPTE64_DW1              (HASH_PTE_SIZE_64 / 2)
+#define HPTE64_DW1_R            (HPTE64_DW1 + 6)
+#define HPTE64_DW1_C            (HPTE64_DW1 + 7)
+
 /* Format changes for ARCH v3 */
 #define HPTE64_V_COMMON_BITS0x000fULL
 #define HPTE64_R_3_0_SSIZE_SHIFT 58
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 163c90388af2..3b5fd749be89 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1414,7 +1414,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
 kvmppc_write_hpte(ptex, pte0, pte1);
 } else {
 if (pte0 & HPTE64_V_VALID) {
-stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+stq_p(spapr->htab + offset + HPTE64_DW1, pte1);
 /*
  * When setting valid, we write PTE1 first. This ensures
  * proper synchronization with the reading code in
@@ -1430,7 +1430,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
  * ppc_hash64_pteg_search()
  */
 smp_wmb();
-stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+stq_p(spapr->htab + offset + HPTE64_DW1, pte1);
 }
 }
 }
@@ -1438,7 +1438,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
 static void spapr_hpte_set_c(PPCVirtualHypervisor *vhyp, hwaddr ptex,
  uint64_t pte1)
 {
-hwaddr offset = ptex * HASH_PTE_SIZE_64 + 15;
+hwaddr offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
 SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
 
 if (!spapr->htab) {
@@ -1454,7 +1454,7 @@ static void spapr_hpte_set_c(PPCVirtualHypervisor *vhyp, hwaddr ptex,
 static void spapr_hpte_set_r(PPCVirtualHypervisor *vhyp, hwaddr ptex,
  uint64_t pte1)
 {
-hwaddr offset = ptex * HASH_PTE_SIZE_64 + 14;
+hwaddr offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
 SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
 
 if (!spapr->htab) {
diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_softmmu.c
index f8924270eff5..4ee03c83e48e 100644
--- a/hw/ppc/spapr_softmmu.c
+++ b/hw/ppc/spapr_softmmu.c
@@ -426,7 +426,7 @@ static void new_hpte_store(void *htab, uint64_t pteg, int slot,
 addr += slot * HASH_PTE_SIZE_64;
 
 stq_p(addr, pte0);
-stq_p(addr + HASH_PTE_SIZE_64 / 2, pte1);
+stq_p(addr + HPTE64_DW1, pte1);
 }
 
 static int rehash_hpte(PowerPCCPU *cpu,
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 19832c4b46f2..da9fe99ff8bd 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -786,7 +786,7 @@ static void ppc_hash64_set_dsi(CPUState *cs, int mmu_idx, uint64_t dar, uint64_t
 
 static void ppc_hash64_set_r(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
 {
-hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + 16;
+hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
 
 if (cpu->vhyp) {
 PPCVirtualHypervisorClass *vhc =
@@ -803,7 +803,7 @@ static void ppc_hash64_set_r(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
 
 static void ppc_hash64_set_c(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
 {
-hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + 15;
+hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
 
 if (cpu->vhyp) {
 PPCVirtualHypervisorClass *vhc =
-- 
2.31.1
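
As a worked illustration of the offset arithmetic in this patch: the sketch below assumes a PTE size of 16 bytes (two 8-byte doublewords), which is consistent with the literal 14 and 15 the patch replaces in spapr.c. The macro names mirror the patch; the helper functions are hypothetical and exist only to show why a literal 16 lands in the adjacent PTE.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_PTE_SIZE_64 16     /* assumed: bytes per PTE, two doublewords */
#define HPTE64_DW1       (HASH_PTE_SIZE_64 / 2)
#define HPTE64_DW1_R     (HPTE64_DW1 + 6)
#define HPTE64_DW1_C     (HPTE64_DW1 + 7)

/* Index of the PTE that owns a given byte offset into the hash table. */
static size_t pte_index_of(size_t byte_offset)
{
    return byte_offset / HASH_PTE_SIZE_64;
}

/* Byte offset of the R-bit byte of PTE number `ptex`. */
static size_t r_byte_offset(size_t ptex)
{
    size_t offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;

    assert(pte_index_of(offset) == ptex);   /* must stay inside this PTE */
    return offset;
}

/* The pre-fix computation from mmu-hash64.c, which used a literal 16. */
static size_t old_r_byte_offset(size_t ptex)
{
    return ptex * HASH_PTE_SIZE_64 + 16;
}
```

For ptex 3 the fixed computation yields byte 62, inside PTE 3, while the old one yields byte 64, which is byte 0 of PTE 4. That matches the commit message's description of the first byte of the adjacent PTE being corrupted.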

Re: [RFC for-6.2] block/nbd: forbid incompatible change of server options on reconnect

2021-11-29 Thread Vladimir Sementsov-Ogievskiy


29.11.2021 22:16, Eric Blake wrote:

On Wed, Nov 24, 2021 at 03:09:51PM +0100, Vladimir Sementsov-Ogievskiy wrote:

Reconnect feature was never prepared to handle server options changed
on reconnect. Let's be stricter and check what exactly is changed. If
server capabilities just got richer don't worry. Otherwise fail and
drop the established connection.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
+/*
+ * No worry if rotational status changed. But other flags are feature flags,
+ * they should not degrade.
+ */
+dropped_flags = (old->flags & ~new->flags) & ~NBD_FLAG_ROTATIONAL;
+if (dropped_flags) {
+error_setg(errp, "Server options degrade after reconnect: flags 0x%"
+   PRIx32 " are not reported anymore", dropped_flags);
+return false;
+}


Your logic is good for most flags, but somewhat wrong for
NBD_FLAG_READ_ONLY_BIT.  For cases where we are only using the block
device read-only, we don't care about changes of that bit, in either
direction.  But for cases where we want to use the block device
read-write, the bit changing from clear in the old to set in the new
server is an incompatible change that your logic failed to flag.



Oh right! Will fix it and resend soon.

--
Best regards,
Vladimir
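
A rough sketch of the distinction Eric describes, for anyone reading the thread later. The flag values and all names below are assumptions made for this illustration only; in particular flags_compatible() and its client_needs_write parameter are hypothetical, and the v2 patch handles the read-only case in the block layer instead of here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; treat them as assumptions of this sketch. */
#define SK_FLAG_READ_ONLY   (1u << 1)
#define SK_FLAG_SEND_FLUSH  (1u << 2)
#define SK_FLAG_ROTATIONAL  (1u << 4)

/* Feature flags that the old server offered and the new one dropped. */
static uint32_t dropped_feature_flags(uint32_t old_flags, uint32_t new_flags)
{
    return (old_flags & ~new_flags) &
           ~(uint32_t)(SK_FLAG_ROTATIONAL | SK_FLAG_READ_ONLY);
}

/*
 * Read-only needs the client's intent: a server turning read-only only
 * matters when the client wants to write.
 */
static bool flags_compatible(uint32_t old_flags, uint32_t new_flags,
                             bool client_needs_write)
{
    /* A writing client cannot have been attached to a read-only export. */
    assert(!(client_needs_write && (old_flags & SK_FLAG_READ_ONLY)));

    if (dropped_feature_flags(old_flags, new_flags)) {
        return false;
    }
    if (client_needs_write && (new_flags & SK_FLAG_READ_ONLY)) {
        return false;
    }
    return true;
}
```

The point is that a bit changing from clear to set never shows up in "old & ~new", so the read-only transition has to be checked separately and only where the client's access mode is known.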

Re: [RFC for-6.2] block/nbd: forbid incompatible change of server options on reconnect

2021-11-29 Thread Vladimir Sementsov-Ogievskiy


29.11.2021 20:34, Eric Blake wrote:

On Wed, Nov 24, 2021 at 03:09:51PM +0100, Vladimir Sementsov-Ogievskiy wrote:

Reconnect feature was never prepared to handle server options changed
on reconnect. Let's be stricter and check what exactly is changed. If
server capabilities just got richer don't worry. Otherwise fail and
drop the established connection.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! The patch is probably good for 6.2. It's an RFC because I haven't
tested it yet. But I want to send it early, so that my proposed design is
available for discussion.


We're cutting it awfully close.  My justification for including it in
-rc3 (if we like it) is that it is a lot easier to audit that we
reject server downgrades than it is to audit whether we have a CVE
because of a server downgrade across a reconnect.  But it is not a new
regression to 6.2, so slipping it to 7.0 (if we don't feel comfortable
with the current iteration of the patch) is okay on that front.




  include/block/nbd.h |  9 +
  nbd/client-connection.c | 86 +
  2 files changed, 95 insertions(+)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 78d101b774..3d379b5539 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -157,6 +157,10 @@ enum {
  #define NBD_FLAG_SEND_RESIZE   (1 << NBD_FLAG_SEND_RESIZE_BIT)
  #define NBD_FLAG_SEND_CACHE(1 << NBD_FLAG_SEND_CACHE_BIT)
  #define NBD_FLAG_SEND_FAST_ZERO(1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
+/*
+ * If you add any new NBD_FLAG_ flag, check that logic in
+ * nbd_is_new_info_compatible() is still good about handling flags.
+ */
  
  /* New-style handshake (global) flags, sent from server to client, and

 control what will happen during handshake phase. */
@@ -305,6 +309,11 @@ struct NBDExportInfo {
  
  uint32_t context_id;
  
+/*

+ * WARNING! when add any new field to the structure, don't forget to check


adding


+ * and updated nbd_is_new_info_compatible() function.


update the


+ */


Odd that one comment has WARNING! and the other does not.


+
  /* Set by server results during nbd_receive_export_list() */
  char *description;
  int n_contexts;
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 695f855754..2d66993632 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -37,6 +37,10 @@ struct NBDClientConnection {
  bool do_negotiation;
  bool do_retry;
  
+/* Used only by connection thread, no need in locking the mutex */


s/no need in locking the mutex/does not need mutex protection/


+bool has_prev_info;
+NBDExportInfo prev_info;
+
  QemuMutex mutex;
  
  /*

@@ -160,6 +164,67 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
  return 0;
  }
  
+static bool nbd_is_new_info_compatible(NBDExportInfo *old, NBDExportInfo *new,

+   Error **errp)
+{
+uint32_t dropped_flags;
+
+if (old->structured_reply && !new->structured_reply) {
+error_setg(errp, "Server options degrade after reconnect: "


degraded


+   "structured_reply is not supported anymore");
+return false;
+}
+
+if (old->base_allocation && !new->base_allocation) {
+error_setg(errp, "Server options degrade after reconnect: "


degraded


+   "base_allocation is not supported anymore");
+return false;
+}


Do we also need to insist that the context id value be identical, or
can our code gracefully deal with it being different?  We don't ever
send the context id, so even if we retry a CMD_BLOCK_STATUS, our real
risk is whether we will reject the new server's reply because it used
a different id than we were expecting.


+
+if (old->size != new->size) {
+error_setg(errp, "NBD export size changed after reconnect");
+return false;
+}
+
+/*
+ * No worry if rotational status changed. But other flags are feature flags,
+ * they should not degrade.
+ */
+dropped_flags = (old->flags & ~new->flags) & ~NBD_FLAG_ROTATIONAL;
+if (dropped_flags) {
+error_setg(errp, "Server options degrade after reconnect: flags 0x%"


degraded


+   PRIx32 " are not reported anymore", dropped_flags);
+return false;
+}
+
+if (new->min_block > old->min_block) {
+error_setg(errp, "Server requires more strict min_block after "
+   "reconnect: %" PRIu32 " instead of %" PRIu32,
+   new->min_block, old->min_block);
+return false;
+}


Good...


+if (new->min_block && (old->min_block % new->min_block)) {
+error_setg(errp, "Server requires new min_block %" PRIu32
+   " after reconnect, incompatible with old one %" PRIu32,
+   new->min_block, old->min_block);
+return false;
+}


...but why is this one necessary?  Since min_block has to be a power

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread David Woodhouse

On Mon, 2021-11-29 at 20:55 +0100, Claudio Fontana wrote:
> On 11/29/21 8:19 PM, David Woodhouse wrote:
> > On Mon, 2021-11-29 at 20:10 +0100, Claudio Fontana wrote:
> > > 
> > > Hmm I thought what you actually care for, for cpu "host", is just the kvm_enable_x2apic() call, not the kvm_default_props.
> > > 
> > > 
> > > 
> > > Do you also expect the kvm_default_prop "kvm-msi-ext-dest-id" to be switched to "on" and applied?
> > 
> > It's already on today. It just isn't *true* because QEMU never called
> > kvm_enable_x2apic().
> 
> 
> property should be on, but not by setting in kvm_default_prop / applied via kvm_default_prop, that mechanism is for the versioned cpu models,
> which use X86CPUModel / X86CPUDefinition , and "host" isn't one of them.
> 
> Out of curiosity, does my previous snippet actually work? Not that I am sure it is the best solution,
> just for my understanding. It would be surprising to me if there were a need to manually set "kvm-msi-ext-dest-id" to "on" there.
> 

This one?

--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -161,14 +161,14 @@ static void kvm_cpu_instance_init(CPUState *cs)
 
 host_cpu_instance_init(cpu);
 
-if (xcc->model) {
 /* only applies to builtin_x86_defs cpus */
 if (!kvm_irqchip_in_kernel()) {
 x86_cpu_change_kvm_default("x2apic", "off");
 } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
-x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
+   x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
 }
 
+if (xcc->model) {
 /* Special cases not set in the X86CPUDefinition structs: */
 x86_cpu_apply_props(cpu, kvm_default_props);
 }

Note that in today's HEAD we already advertise X2APIC and ext-dest-id
to the '-cpu host' guest; it's just not *true* because we never call
kvm_enable_x2apic().

So yes, the above works on a modern kernel where kvm_enable_x2apic()
succeeds. But that's the easy case.

Where your snippet *won't* work is in the case of running on an old
kernel where kvm_enable_x2apic() fails.

In that case it needs to turn x2apic support *off*. But simply calling
(or not calling) x86_cpu_change_kvm_default() makes absolutely no
difference unless those defaults are *applied* by calling
x86_cpu_apply_props() or making the same change by some other means.


> > So what I care about (in case ∃ APIC IDs >= 255) is two things:
> > 
> >  1. Qemu needs to call kvm_enable_x2apic().
> >  2. If that *fails* qemu needs to *stop* advertising X2APIC and ext-dest-id.
> > 
> > 
> > That last patch snippet in pc_machine_done() should suffice to achieve
> > that, I think. Because if kvm_enable_x2apic() fails and qemu has been
> > asked for that many CPUs, it aborts completely. Which seems right.
> > 
> 
> seems right to abort if requesting > 255 APIC IDs cannot be satisfied, I agree.
> 
> So I think in the end, we want to:
> 
> 1) make sure that when accel=kvm and smp > 255 for i386, using cpu "host", kvm_enable_x2apic() is called and successful.
> 
> 2) in addressing requirement 1), we do not break something else (other machines, other cpu classes/models, TCG, ...).
> 
> 3) as a plus we might want to cleanup and determine once and for all where kvm_enable_x2apic() should be called:
>    we have calls in intel_iommu.c and in the kvm cpu class instance initialization here in kvm-cpu.c today:
>    before adding a third call we should really ask ourselves where the proper initialization of this should happen.
> 

I think the existing two calls to kvm_enable_x2apic() become mostly
redundant. Because in fact the vtd_decide_config() and
kvm_cpu_instance_init() callers would both be perfectly OK without
kvm_enable_x2apic() if there isn't a CPU with an APIC ID >= 255
anyway. 

And that means that with my patch, pc_machine_done() will have
*aborted* if their conditions aren't met.

But then again, since kvm_enable_x2apic() is both the initial
initialisation *and* a cached sanity check that it has indeed been
enabled successfully, there perhaps isn't any *harm* in having them do
the check for themselves?




[PATCH for-7.0 4/4] hw/arm: Don't include qemu-common.h unnecessarily

2021-11-29 Thread Peter Maydell

A lot of C files in hw/arm include qemu-common.h when they don't
need anything from it. Drop the include lines.

omap1.c, pxa2xx.c and strongarm.c retain the include because they
use it for the prototype of qemu_get_timedate().

Signed-off-by: Peter Maydell 
---
 hw/arm/boot.c   | 1 -
 hw/arm/digic_boards.c   | 1 -
 hw/arm/highbank.c   | 1 -
 hw/arm/npcm7xx_boards.c | 1 -
 hw/arm/sbsa-ref.c   | 1 -
 hw/arm/stm32f405_soc.c  | 1 -
 hw/arm/vexpress.c   | 1 -
 hw/arm/virt.c   | 1 -
 8 files changed, 8 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 74ad397b1ff..399f8e837ce 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -8,7 +8,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
diff --git a/hw/arm/digic_boards.c b/hw/arm/digic_boards.c
index b771a3d8b74..4093af09cb2 100644
--- a/hw/arm/digic_boards.c
+++ b/hw/arm/digic_boards.c
@@ -25,7 +25,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "hw/boards.h"
 #include "qemu/error-report.h"
diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index c3cb315dbc6..4210894d814 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -18,7 +18,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "qapi/error.h"
 #include "hw/sysbus.h"
diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index dec7d16ae51..aff8c870420 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -24,7 +24,6 @@
 #include "hw/qdev-core.h"
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "qemu/units.h"
 #include "sysemu/blockdev.h"
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 358714bd3e8..dd944553f78 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -18,7 +18,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
diff --git a/hw/arm/stm32f405_soc.c b/hw/arm/stm32f405_soc.c
index 0019b7f4785..c07947d9f8b 100644
--- a/hw/arm/stm32f405_soc.c
+++ b/hw/arm/stm32f405_soc.c
@@ -24,7 +24,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qemu-common.h"
 #include "exec/address-spaces.h"
 #include "sysemu/sysemu.h"
 #include "hw/arm/stm32f405_soc.h"
diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index 58481c07629..3e6d63c7f96 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -23,7 +23,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "cpu.h"
 #include "hw/sysbus.h"
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 30da05dfe04..3e2144e31af 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -29,7 +29,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qemu/datadir.h"
 #include "qemu/units.h"
 #include "qemu/option.h"
-- 
2.25.1

[PATCH for-7.0 0/4] qemu-common.h include cleanup

2021-11-29 Thread Peter Maydell

qemu-common.h has a comment at the top:

 * This file is supposed to be included only by .c files. No header file should
 * depend on qemu-common.h, as this would easily lead to circular header
 * dependencies.

We still have a few .h files which include it, though.  The first 3
patches in this series fix that: in 3 out of 4 cases we didn't need
the #include at all, and in the 4th case we can instead #include
qemu-common.h from just one .c file.

Patch 4 is just removing the #include from 8 files in hw/arm which
don't need it at all.  (Probably there are other files like this, but
I just did the Arm related ones.)

Tested by pushing to gitlab for the CI build.

-- PMM

Peter Maydell (4):
  include/hw/i386: Don't include qemu-common.h in .h files
  target/hexagon/cpu.h: don't include qemu-common.h
  target/rx/cpu.h: Don't include qemu-common.h
  hw/arm: Don't include qemu-common.h unnecessarily

 include/hw/i386/microvm.h | 1 -
 include/hw/i386/x86.h | 1 -
 target/hexagon/cpu.h  | 1 -
 target/rx/cpu.h   | 1 -
 hw/arm/boot.c | 1 -
 hw/arm/digic_boards.c | 1 -
 hw/arm/highbank.c | 1 -
 hw/arm/npcm7xx_boards.c   | 1 -
 hw/arm/sbsa-ref.c | 1 -
 hw/arm/stm32f405_soc.c| 1 -
 hw/arm/vexpress.c | 1 -
 hw/arm/virt.c | 1 -
 linux-user/hexagon/cpu_loop.c | 1 +
 13 files changed, 1 insertion(+), 12 deletions(-)

-- 
2.25.1

[PATCH for-7.0 3/4] target/rx/cpu.h: Don't include qemu-common.h

2021-11-29 Thread Peter Maydell

The qemu-common.h header is not supposed to be included from any
other header files, only from .c files (as documented in a comment at
the start of it).

Nothing actually relies on target/rx/cpu.h including it, so we can
just drop the include.

Signed-off-by: Peter Maydell 
---
 target/rx/cpu.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/rx/cpu.h b/target/rx/cpu.h
index 4ac71aec370..657db84ef0a 100644
--- a/target/rx/cpu.h
+++ b/target/rx/cpu.h
@@ -20,7 +20,6 @@
 #define RX_CPU_H
 
 #include "qemu/bitops.h"
-#include "qemu-common.h"
 #include "hw/registerfields.h"
 #include "cpu-qom.h"
 
-- 
2.25.1

[PATCH for-7.0 1/4] include/hw/i386: Don't include qemu-common.h in .h files

2021-11-29 Thread Peter Maydell

The qemu-common.h header is not supposed to be included from any
other header files, only from .c files (as documented in a comment at
the start of it).

include/hw/i386/x86.h and include/hw/i386/microvm.h break this rule.
In fact, the include is not required at all, so we can just drop it
from both files.

Signed-off-by: Peter Maydell 
---
 include/hw/i386/microvm.h | 1 -
 include/hw/i386/x86.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index 4d9c732d4b2..efcbd926fd4 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -18,7 +18,6 @@
 #ifndef HW_I386_MICROVM_H
 #define HW_I386_MICROVM_H
 
-#include "qemu-common.h"
 #include "exec/hwaddr.h"
 #include "qemu/notify.h"
 
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index bb1cfb88966..a145a303703 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -17,7 +17,6 @@
 #ifndef HW_I386_X86_H
 #define HW_I386_X86_H
 
-#include "qemu-common.h"
 #include "exec/hwaddr.h"
 #include "qemu/notify.h"
 
-- 
2.25.1

[PATCH for-7.0 2/4] target/hexagon/cpu.h: don't include qemu-common.h

2021-11-29 Thread Peter Maydell

The qemu-common.h header is not supposed to be included from any
other header files, only from .c files (as documented in a comment at
the start of it).

Move the include to linux-user/hexagon/cpu_loop.c, which needs it for
the declaration of cpu_exec_step_atomic().

Signed-off-by: Peter Maydell 
---
 target/hexagon/cpu.h  | 1 -
 linux-user/hexagon/cpu_loop.c | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index de121d950f2..58a0d3870bb 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -23,7 +23,6 @@ typedef struct CPUHexagonState CPUHexagonState;
 
 #include "fpu/softfloat-types.h"
 
-#include "qemu-common.h"
 #include "exec/cpu-defs.h"
 #include "hex_regs.h"
 #include "mmvec/mmvec.h"
diff --git a/linux-user/hexagon/cpu_loop.c b/linux-user/hexagon/cpu_loop.c
index 6b24cbaba93..e47f8348d56 100644
--- a/linux-user/hexagon/cpu_loop.c
+++ b/linux-user/hexagon/cpu_loop.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu-common.h"
 #include "qemu.h"
 #include "user-internals.h"
 #include "cpu_loop-common.h"
-- 
2.25.1

Observing VM Status Changes

2021-11-29 Thread John Snow

Is there a generic event for observing VM state changes? I see we have a
lot of bespoke events like "STOP", "RESUME", "SHUTDOWN" and so forth, but I
can't quickly and at a glance determine if we have a 1:1 correlation for
every QAPI RunState to a QMP Event announcing that state.

I'm looking at e.g. the PAUSED runstate and I can see it set in several
places:

migration/migration.c:runstate_set(RUN_STATE_PAUSED);
migration/savevm.c:runstate_set(RUN_STATE_PAUSED);

but for, say, the migration/savevm.c route, it doesn't look like it's
accompanied by a QMP event -- that appears to only be emitted by
softmmu/cpus.c -- and only when the vcpus were already running. In this
case, the savevm route only occurs before we've started the vCPUs.

So as far as I can tell, there's really no well-defined relationship
between the various events in qapi/run-state.json and the RunState
enumeration. This would make it hard for a client to keep track of the VM
state without having to re-query it a lot. Am I mistaken?

(I was looking into adding VM state into the qmp-shell tool such that it
spied on QMP events and updated a toolbar accordingly. However, not every
state seems to be preceded by an event, and not every event gives a strong
indication of what the resulting VM state would actually be. Some runstate
changes don't appear to be announced by any event at all.)
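
To make the problem concrete, here is a minimal sketch of the client-side
bookkeeping I have in mind. The struct and function are hypothetical (they
are not part of QEMU or qmp-shell), and the event-to-state mapping is my
assumption: RESUME is treated as "running", while STOP and everything else
only tell the client that it has to fall back to query-status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical client-side view of the VM; not a QEMU structure. */
struct vm_view {
    bool running;      /* best guess from the last event seen */
    bool needs_query;  /* true when only query-status can give the RunState */
};

/* Update the view from one QMP event name. */
static void on_qmp_event(struct vm_view *v, const char *event)
{
    assert(v != NULL && event != NULL);
    if (strcmp(event, "RESUME") == 0) {
        /* Assumed mapping: RESUME means the vCPUs are running again. */
        v->running = true;
        v->needs_query = false;
    } else if (strcmp(event, "STOP") == 0) {
        /* Stopped, but the event alone does not say which RunState. */
        v->running = false;
        v->needs_query = true;
    } else {
        /* SHUTDOWN and others: no assumed mapping to a RunState. */
        v->needs_query = true;
    }
}
```

The needs_query flag ends up set far more often than I would like, which is
the re-querying cost described above.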

Re: [RFC PATCH v3 0/5] QMP support for cold-plugging devices

2021-11-29 Thread Dr. David Alan Gilbert

* Markus Armbruster (arm...@redhat.com) wrote:
> Damien Hedde  writes:

> > Patches 1, 3 and 5 miss a review.
> >
> > The series is organized as follow:
> >
> > + Patches 1 and 2 converts the MachinePhase enum to a qapi definition
> >   and add the 'query-machine-phase'. It allows to introspect the
> >   current machine phase during preconfig as we will now be able to
> >   reach several machine phases using QMP.
> 
> If we fold MachinePhase into RunState, we can reuse query-status.
> 
> Having two state machines run one after the other feels like one too
> many.

Be careful, the RunState is API and things watch for events on it, so
any changes to it are delicate.

Dave

> > + Patch 3 adds the 'x-machine-init' QMP command to stop QEMU at
> >   machine-initialized phase during preconfig.
> > + Patch 4 allows issuing device_add QMP command during the
> >   machine-initialized phase.
> > + Patch 5 improves the doc about preconfig in consequence. 
> 
> I understand you want to make progress towards machine configuration
> with QMP.  However, QEMU startup is (in my educated opinion) a hole, and
> we should be wary of digging deeper.
> 
> The "timeline" you gave above illustrates this.  It's a complicated
> shuffling of command line options and QMP commands that basically nobody
> can keep in working memory.  We have reshuffled it / made it more
> complicated quite a few times already to support new features.  Based on
> your cover letter, I figure you're making it more complicated once more.
> 
> At some point, we need to stop digging us deeper into the hole.  This is
> not an objection to merging your work.  It's a call to stop and think.
> 
> Let me quote the sketch I posted to the "Stabilize preconfig" thread:
> 
> 1. Start event loop
> 
> 2. Feed it CLI left to right.  Each option runs a handler just like each
>QMP command does.
> 
>Options that read a configuration file inject the file into the feed.
> 
>Options that create a monitor create it suspended.
> 
>Options may advance the phase / run state, and they may require
>certain phase(s).
> 
> 3. When we're done with CLI, resume any monitors we created.
> 
> 4. Monitors now feed commands to the event loop.  Commands may advance
>the phase / run state, and they may require certain phase(s).
> 
> Users can do as much or as little with the CLI as they want.  You'd
> probably want to set up a QMP monitor and no more.
> 
> device_add becomes possible at a certain state of the phase / run state
> machine.  It changes from cold to hot plug at a certain later state.
> 
> > [1]: 
> > https://lore.kernel.org/qemu-devel/b31f442d28920447690a6b8cee865bdbacde1283.1635160056.git.mpriv...@redhat.com
> >
> > Thanks for your feedback.
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread Claudio Fontana

On 11/29/21 8:19 PM, David Woodhouse wrote:
> On Mon, 2021-11-29 at 20:10 +0100, Claudio Fontana wrote:
>>
>> Hmm I thought what you actually care for, for cpu "host", is just the 
>> kvm_enable_x2apic() call, not the kvm_default_props.
>>
>>
>>
>> Do you also expect the kvm_default_prop "kvm-msi-ext-dest-id" to be switch 
>> to "on" and applied?
> 
> It's already on today. It just isn't *true* because QEMU never called
> kvm_enable_x2apic().

The property should be on, but not by setting it in kvm_default_props and
applying it from there: that mechanism is for the versioned cpu models,
which use X86CPUModel / X86CPUDefinition, and "host" isn't one of them.

Out of curiosity, does my previous snippet actually work? Not that I am sure it
is the best solution, just for my understanding. It would be surprising to me if
there were a need to manually set "kvm-msi-ext-dest-id" to "on" there.

> 
> So what I care about (in case ∃ APIC IDs >= 255) is two things:
> 
>  1. Qemu needs to call kvm_enable_x2apic().
>  2. If that *fails* qemu needs to *stop* advertising X2APIC and ext-dest-id.
> 
> 
> That last patch snippet in pc_machine_done() should suffice to achieve
> that, I think. Because if kvm_enable_x2apic() fails and qemu has been
> asked for that many CPUs, it aborts completely. Which seems right.
> 

seems right to abort if requesting > 255 APIC IDs cannot be satisfied, I agree.

So I think in the end, we want to:

1) make sure that when accel=kvm and smp > 255 for i386, using cpu "host", 
kvm_enable_x2apic() is called and successful.

2) in addressing requirement 1), we do not break something else (other 
machines, other cpu classes/models, TCG, ...).

3) as a plus we might want to cleanup and determine once and for all where 
kvm_enable_x2apic() should be called:
   we have calls in intel_iommu.c and in the kvm cpu class instance 
initialization here in kvm-cpu.c today:
   before adding a third call we should really ask ourselves where the proper 
initialization of this should happen.

Let me know about the previous snippet, and I'd really look for other comments 
from Eduardo or Paolo at this point, regarding the "what should be" question.

Ciao,

Claudio

Re: [PULL for 6.2 0/8] more tcg, plugin, test and build fixes

2021-11-29 Thread Richard Henderson


On 11/29/21 6:14 PM, Alex Bennée wrote:

The following changes since commit e750c10167fa8ad3fcc98236a474c46e52e7c18c:

   Merge tag 'pull-target-arm-20211129' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2021-11-29 
11:56:07 +0100)

are available in the Git repository at:

   https://github.com/stsquad/qemu.git tags/pull-for-6.2-291121-1

for you to fetch changes up to d5615bbf9103f01911df683cc3e4e85c49a92593:

   tests/plugin/syscall.c: fix compiler warnings (2021-11-29 15:13:22 +)


TCG, plugin and build fixes:

   - introduce CF_NOIRQ to avoid watchpoint race
   - fix avocado plugin test
   - fix linker issue with weird paths
   - band-aid for gdbstub race
   - updates for MAINTAINERS
   - fix some compiler warning in example plugin


Alex Bennée (5):
   accel/tcg: introduce CF_NOIRQ
   accel/tcg: suppress IRQ check for special TBs
   tests/avocado: fix tcg_plugin mem access count test
   plugins/meson.build: fix linker issue with weird paths
   gdbstub: handle a potentially racing TaskState

Juro Bystricky (1):
   tests/plugin/syscall.c: fix compiler warnings

Philippe Mathieu-Daudé (1):
   MAINTAINERS: Add section for Aarch64 GitLab custom runner

Willian Rampazzo (1):
   MAINTAINERS: Remove me as a reviewer for the build and test/avocado

  include/exec/exec-all.h  |  1 +
  include/exec/gen-icount.h| 21 +
  accel/tcg/cpu-exec.c |  9 +
  accel/tcg/translate-all.c|  4 ++--
  gdbstub.c|  2 +-
  softmmu/physmem.c|  4 ++--
  tests/plugin/syscall.c   |  8 +++-
  MAINTAINERS  | 10 --
  plugins/meson.build  |  4 ++--
  tests/avocado/tcg_plugins.py |  2 +-
  10 files changed, 46 insertions(+), 19 deletions(-)


Applied, thanks.


r~

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread David Woodhouse

On Mon, 2021-11-29 at 20:10 +0100, Claudio Fontana wrote:
> 
> Hmm I thought what you actually care for, for cpu "host", is just the 
> kvm_enable_x2apic() call, not the kvm_default_props.
> 
> 
> 
> Do you also expect the kvm_default_prop "kvm-msi-ext-dest-id" to be switch to 
> "on" and applied?

It's already on today. It just isn't *true* because QEMU never called
kvm_enable_x2apic().

So what I care about (in case ∃ APIC IDs >= 255) is two things:

 1. Qemu needs to call kvm_enable_x2apic().
 2. If that *fails* qemu needs to *stop* advertising X2APIC and ext-dest-id.

That last patch snippet in pc_machine_done() should suffice to achieve
that, I think. Because if kvm_enable_x2apic() fails and qemu has been
asked for that many CPUs, it aborts completely. Which seems right.
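
A minimal sketch of that rule, assuming nothing about the real
pc_machine_done() code: the function name, its signature and the two stub
callbacks are hypothetical, and only the ">= 255" threshold comes from the
discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kvm_enable_x2apic() on a new and an old kernel. */
static bool enable_ok(void)   { return true; }
static bool enable_fail(void) { return false; }

/*
 * Hypothetical check in the spirit of the pc_machine_done() snippet:
 * returns false when startup should abort.
 */
static bool apic_ids_supported(unsigned max_apic_id,
                               bool (*enable_x2apic)(void))
{
    assert(enable_x2apic != NULL);
    if (max_apic_id < 255) {
        return true;           /* nothing to enable for small guests */
    }
    return enable_x2apic();    /* APIC IDs >= 255 need it to succeed */
}
```

With enable_fail standing in for an old kernel, a guest whose highest APIC
ID is 255 or more is refused, while a small guest is unaffected.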


Re: [RFC for-6.2] block/nbd: forbid incompatible change of server options on reconnect

2021-11-29 Thread Eric Blake

On Wed, Nov 24, 2021 at 03:09:51PM +0100, Vladimir Sementsov-Ogievskiy wrote:
> Reconnect feature was never prepared to handle server options changed
> on reconnect. Let's be stricter and check what exactly is changed. If
> server capabilities just got richer don't worry. Otherwise fail and
> drop the established connection.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> +/*
> + * No worry if rotational status changed. But other flags are feature 
> flags,
> + * they should not degrade.
> + */
> +dropped_flags = (old->flags & ~new->flags) & ~NBD_FLAG_ROTATIONAL;
> +if (dropped_flags) {
> +error_setg(errp, "Server options degrade after reconnect: flags 0x%"
> +   PRIx32 " are not reported anymore", dropped_flags);
> +return false;
> +}

Your logic is good for most flags, but somewhat wrong for
NBD_FLAG_READ_ONLY_BIT.  For cases where we are only using the block
device read-only, we don't care about changes of that bit, in either
direction.  But for cases where we want to use the block device
read-write, the bit changing from clear in the old to set in the new
server is an incompatible change that your logic failed to flag.
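
A minimal sketch of the check I am describing, not the code from the patch:
the helper name and the client_writes parameter are hypothetical, and the
two flag values are the bit positions I take from the NBD protocol document
(bit 1 for read-only, bit 4 for rotational).

```c
#include <stdbool.h>
#include <stdint.h>

/* Transmission flag bits, assumed from the NBD protocol document. */
#define NBD_FLAG_READ_ONLY   (1u << 1)
#define NBD_FLAG_ROTATIONAL  (1u << 4)

/*
 * Hypothetical helper: returns true if a reconnect from old_flags to
 * new_flags is acceptable for this client.
 */
static bool nbd_flags_compatible(uint32_t old_flags, uint32_t new_flags,
                                 bool client_writes)
{
    /* Clearing either of these bits never takes a feature away. */
    uint32_t not_features = NBD_FLAG_ROTATIONAL | NBD_FLAG_READ_ONLY;
    uint32_t dropped = (old_flags & ~new_flags) & ~not_features;
    /* Read-only bit that was clear before and is set now. */
    uint32_t became_ro = ~old_flags & new_flags & NBD_FLAG_READ_ONLY;

    if (dropped) {
        return false;          /* a feature flag disappeared */
    }
    if (client_writes && became_ro) {
        return false;          /* read-write user lost write access */
    }
    return true;
}
```

A read-only user passes regardless of how the read-only bit moves; a
read-write user is rejected only when the bit goes from clear to set.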

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v6 3/4] virtio-iommu: Fix the domain_range end

2021-11-29 Thread Jean-Philippe Brucker

On Sat, Nov 27, 2021 at 08:29:09AM +0100, Eric Auger wrote:
> in old times the domain range was defined by a domain_bits le32.
> This was then converted into a domain_range struct. During the
> upgrade the original value of '32' (bits) has been kept while
> the end field now is the max value of the domain id (UINT32_MAX).
> Fix that and also use UINT64_MAX for the input_range.end.
> 
> Signed-off-by: Eric Auger 
> Reported-by: Jean-Philippe Brucker 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  hw/virtio/virtio-iommu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 30ee09187b8..aa9c16a17b1 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -978,8 +978,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, 
> Error **errp)
>  s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
>  
>  s->config.page_size_mask = TARGET_PAGE_MASK;
> -s->config.input_range.end = -1UL;
> -s->config.domain_range.end = 32;
> +s->config.input_range.end = UINT64_MAX;
> +s->config.domain_range.end = UINT32_MAX;
>  s->config.probe_size = VIOMMU_PROBE_SIZE;
>  
>  virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
> -- 
> 2.26.3
>
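
As a side note on the input_range.end change, here is a small sketch of one
plausible reason to prefer UINT64_MAX over -1UL. This is my assumption, the
commit message does not state it: on a host where unsigned long is 32 bits
wide, -1UL is 0xffffffff, and storing it in a 64-bit field zero-extends it.
The function names are hypothetical, and the narrow unsigned long is
emulated with uint32_t so the sketch behaves the same on any host.

```c
#include <stdint.h>

/* What "-1UL" yields when unsigned long is 32 bits wide (emulated here). */
static uint64_t end_from_32bit_ulong(void)
{
    uint32_t minus_one = (uint32_t)-1;   /* stand-in for -1UL on such a host */
    return minus_one;                    /* zero-extends to 0x00000000ffffffff */
}

/* The width-independent spelling used by the patch. */
static uint64_t end_from_uint64_max(void)
{
    return UINT64_MAX;
}
```

The domain_range.end change is a separate matter: as the commit message
says, the field holds the maximum domain ID rather than a bit count, so 32
was simply the wrong value.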

Re: [PATCH v6 2/4] virtio-iommu: Fix endianness in get_config

2021-11-29 Thread Jean-Philippe Brucker

On Sat, Nov 27, 2021 at 08:29:08AM +0100, Eric Auger wrote:
> Endianess is not properly handled when populating
> the returned config. Use the cpu_to_le* primitives
> for each separate field. Also, while at it, trace
> the domain range start.
> 
> Signed-off-by: Eric Auger 
> Reported-by: Thomas Huth 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  hw/virtio/trace-events   |  2 +-
>  hw/virtio/virtio-iommu.c | 22 +++---
>  2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 54bd7da00c8..f7ad6be5fbb 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -91,7 +91,7 @@ virtio_mmio_setting_irq(int level) "virtio_mmio setting IRQ 
> %d"
>  virtio_iommu_device_reset(void) "reset!"
>  virtio_iommu_get_features(uint64_t features) "device supports 
> features=0x%"PRIx64
>  virtio_iommu_device_status(uint8_t status) "driver status = %d"
> -virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t 
> end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" 
> start=0x%"PRIx64" end=0x%"PRIx64" domain_range=%d probe_size=0x%x"
> +virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t 
> end, uint32_t domain_start, uint32_t domain_end, uint32_t probe_size) 
> "page_size_mask=0x%"PRIx64" input range start=0x%"PRIx64" input range 
> end=0x%"PRIx64" domain range start=%d domain range end=%d probe_size=0x%x"
>  virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d 
> endpoint=%d"
>  virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d 
> endpoint=%d"
>  virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, 
> uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" 
> virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 645c0aa3997..30ee09187b8 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -822,14 +822,22 @@ unlock:
>  static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>  VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> -struct virtio_iommu_config *config = &dev->config;
> +struct virtio_iommu_config *dev_config = &dev->config;
> +struct virtio_iommu_config *out_config = (void *)config_data;
>  
> -trace_virtio_iommu_get_config(config->page_size_mask,
> -  config->input_range.start,
> -  config->input_range.end,
> -  config->domain_range.end,
> -  config->probe_size);
> -memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
> +out_config->page_size_mask = cpu_to_le64(dev_config->page_size_mask);
> +out_config->input_range.start = 
> cpu_to_le64(dev_config->input_range.start);
> +out_config->input_range.end = cpu_to_le64(dev_config->input_range.end);
> +out_config->domain_range.start = 
> cpu_to_le32(dev_config->domain_range.start);
> +out_config->domain_range.end = cpu_to_le32(dev_config->domain_range.end);
> +out_config->probe_size = cpu_to_le32(dev_config->probe_size);
> +
> +trace_virtio_iommu_get_config(dev_config->page_size_mask,
> +  dev_config->input_range.start,
> +  dev_config->input_range.end,
> +  dev_config->domain_range.start,
> +  dev_config->domain_range.end,
> +  dev_config->probe_size);
>  }
>  
>  static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
> -- 
> 2.26.3
>
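
For readers less familiar with the cpu_to_le* helpers, a minimal sketch of
what a per-field little-endian store amounts to. The two functions below are
hypothetical and are not the QEMU primitives; they write the bytes
explicitly, so the result is the same on little- and big-endian hosts, which
a plain memcpy() of the host structure does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value least significant byte first. */
static void store_le32(uint8_t *dst, uint32_t v)
{
    assert(dst != NULL);
    for (int i = 0; i < 4; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: same for a 64-bit value. */
static void store_le64(uint8_t *dst, uint64_t v)
{
    assert(dst != NULL);
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}
```

Each config field needs its own conversion because the fields have different
widths; that is why the patch converts them one by one.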

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread Claudio Fontana

On 11/29/21 6:17 PM, David Woodhouse wrote:
> On Mon, 2021-11-29 at 17:57 +0100, Claudio Fontana wrote:
>> On 11/29/21 4:11 PM, David Woodhouse wrote:
>>> On Mon, 2021-11-29 at 15:14 +0100, Claudio Fontana wrote:
>>>> On 11/29/21 12:39 PM, Woodhouse, David wrote:
>>>>> On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote:
>>>>>>  static void kvm_cpu_instance_init(CPUState *cs)
>>>>>>  {
>>>>>>  X86CPU *cpu = X86_CPU(cs);
>>>>>> +X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
>>>>>>
>>>>>>  host_cpu_instance_init(cpu);
>>>>>>
>>>>>> -if (!kvm_irqchip_in_kernel()) {
>>>>>> -x86_cpu_change_kvm_default("x2apic", "off");
>>>>>> -} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
>>>>>> -x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
>>>>>> -}
>>>>>> -
>>>>>> -/* Special cases not set in the X86CPUDefinition structs: */
>>>>>> +if (xcc->model) {
>>>>>> +/* only applies to builtin_x86_defs cpus */
>>>>>> +if (!kvm_irqchip_in_kernel()) {
>>>>>> +x86_cpu_change_kvm_default("x2apic", "off");
>>>>>> +} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
>>>>>> +x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
>>>>>> +}
>>>>>>
>>>>>> -x86_cpu_apply_props(cpu, kvm_default_props);
>>>>>> +/* Special cases not set in the X86CPUDefinition structs: */
>>>>>> +x86_cpu_apply_props(cpu, kvm_default_props);
>>>>>> +}
>>>>>>
>>>>>
>>>>> I think this causes a regression in x2apic and kvm-msi-ext-dest-id
>>>>> support. If you start qemu thus:
>>>>
>>>> If I recall correctly, this change just tries to restore the behavior 
>>>> prior to
>>>> commit f5cc5a5c168674f84bf061cdb307c2d25fba5448 ,
>>>>
>>>> fixing the issue introduced with the refactoring at that time.
>>>>
>>>> Can you try bisecting prior to
>>>> f5cc5a5c168674f84bf061cdb307c2d25fba5448 , to see if the actual
>>>> breakage comes from somewhere else?
>>>
>>> Hm, so it looks like it never worked for '-cpu host' *until* commit
>>> f5cc5a5c16.
>>
>> Right, so here we are talking about properly supporting this for the first 
>> time.
>>
>> The fact that it works with f5cc5a5c16 is more an accident than anything 
>> else, that commit was clearly broken
>> (exemplified by reports of failed boots).
>>
>> So we need to find the proper solution, ie, exactly which features should be 
>> enabled for which cpu classes and models.
>>
>>>
>>> It didn't matter before c1bb5418e3 because you couldn't enable that
>>> many vCPUs without an IOMMU, and the *IOMMU* setup would call
>>> kvm_enable_x2apic().
>>>
>>> But after that, nothing ever called kvm_enable_x2apic() in the '-cpu
>>> host' case until commit f5cc5a5c16, which fixed it... until you
>>> restored the previous behaviour :)
>>>
>>> This "works" to fix this case, but presumably isn't correct:
>>
>> Right, we cannot just enable all this code, or the original refactor would 
>> have been right.
>>
>> These kvm default properties have been as far as I know intended for the cpu 
>> actual models (builtin_x86_defs),
>> and not for the special cpu classes max, host and base. This is what the 
>> revert addresses.
>>
>> I suspect what we actually need here is to review exactly in which specific 
>> cases kvm_enable_x2apic() should be called in the end.
>>
>> The code there is mixing changes to the kvm_default_props that are then 
>> applied using x86_cpu_apply_props (and that part should be only for 
>> xcc->model != NULL),
>> with the actual enablement of the kvm x2apic using kvm_vm_enable_cap(s, 
>> KVM_CAP_X2APIC_API, 0, flags) via kvm_enable_x2apic().
>>
>> One way is to ignore this detail and just move out those checks, since 
>> changes to kvm_default_props are harmless once we skip the 
>> x86_cpu_apply_props call,
>> as such: 
>>
>> --
>>
>> static void kvm_cpu_instance_init(CPUState *cs)
>> {
>> X86CPU *cpu = X86_CPU(cs);
>> X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
>>
>> host_cpu_instance_init(cpu);
>>
>> /* only applies to builtin_x86_defs cpus */
>> if (!kvm_irqchip_in_kernel()) {
>> x86_cpu_change_kvm_default("x2apic", "off");
>> } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
>> x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
>> }
>>
>> if (xcc->model) {
>> /* Special cases not set in the X86CPUDefinition structs: */
>> x86_cpu_apply_props(cpu, kvm_default_props);
>> }
>>
> 
> I don't believe that works in the case when kvm_enable_x2apic() fails
> on an older kernel. Although it sets the defaults, it still doesn't
> then *apply* them so it makes no difference.

Hmm I thought what you actually care for, for cpu "host", is just the 
kvm_enable_x2apic() call, not the kvm_default_props.

Do you also expect the kvm_default_prop "kvm-msi-ext-dest-id" to be switched to 
"on" and applied?

kvm_default_props were never applied to cpus without an x86 model definition 
(except for that

Re: [PATCH v6 1/4] virtio-iommu: Remove set_config callback

2021-11-29 Thread Jean-Philippe Brucker

On Sat, Nov 27, 2021 at 08:29:07AM +0100, Eric Auger wrote:
> The spec says "the driver must not write to device configuration
> fields". So remove the set_config() callback which anyway did
> not do anything.
> 
> Signed-off-by: Eric Auger 

Removing this makes sense. For bypass, I'll add the function back with a
reduced trace.

Reviewed-by: Jean-Philippe Brucker 

> ---
>  hw/virtio/trace-events   |  1 -
>  hw/virtio/virtio-iommu.c | 14 --
>  2 files changed, 15 deletions(-)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 650e521e351..54bd7da00c8 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -92,7 +92,6 @@ virtio_iommu_device_reset(void) "reset!"
>  virtio_iommu_get_features(uint64_t features) "device supports 
> features=0x%"PRIx64
>  virtio_iommu_device_status(uint8_t status) "driver status = %d"
>  virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t 
> end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" 
> start=0x%"PRIx64" end=0x%"PRIx64" domain_range=%d probe_size=0x%x"
> -virtio_iommu_set_config(uint64_t page_size_mask, uint64_t start, uint64_t 
> end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" 
> start=0x%"PRIx64" end=0x%"PRIx64" domain_bits=%d probe_size=0x%x"
>  virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d 
> endpoint=%d"
>  virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d 
> endpoint=%d"
>  virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, 
> uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" 
> virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 1b23e8e18c7..645c0aa3997 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -832,19 +832,6 @@ static void virtio_iommu_get_config(VirtIODevice *vdev, 
> uint8_t *config_data)
>  memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
>  }
>  
> -static void virtio_iommu_set_config(VirtIODevice *vdev,
> -  const uint8_t *config_data)
> -{
> -struct virtio_iommu_config config;
> -
> -memcpy(&config, config_data, sizeof(struct virtio_iommu_config));
> -trace_virtio_iommu_set_config(config.page_size_mask,
> -  config.input_range.start,
> -  config.input_range.end,
> -  config.domain_range.end,
> -  config.probe_size);
> -}
> -
>  static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
>Error **errp)
>  {
> @@ -1185,7 +1172,6 @@ static void virtio_iommu_class_init(ObjectClass *klass, 
> void *data)
>  vdc->unrealize = virtio_iommu_device_unrealize;
>  vdc->reset = virtio_iommu_device_reset;
>  vdc->get_config = virtio_iommu_get_config;
> -vdc->set_config = virtio_iommu_set_config;
>  vdc->get_features = virtio_iommu_get_features;
>  vdc->set_status = virtio_iommu_set_status;
>  vdc->vmsd = &vmstate_virtio_iommu_device;
> -- 
> 2.26.3
>

[PATCH v4] target/ppc: fix Hash64 MMU update of PTE bit R

2021-11-29 Thread Leandro Lupori

When updating the R bit of a PTE, the Hash64 MMU was using a wrong byte
offset, causing the first byte of the adjacent PTE to be corrupted.
This caused a panic when booting FreeBSD, using the Hash MMU.

Fixes: a2dd4e83e76b ("ppc/hash64: Rework R and C bit updates")
Signed-off-by: Leandro Lupori 
---
Changes from v3:
- rename defines
---
 hw/ppc/spapr.c  | 8 
 hw/ppc/spapr_softmmu.c  | 2 +-
 target/ppc/mmu-hash64.c | 4 ++--
 target/ppc/mmu-hash64.h | 5 +
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 163c90388a..3b5fd749be 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1414,7 +1414,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
 kvmppc_write_hpte(ptex, pte0, pte1);
 } else {
 if (pte0 & HPTE64_V_VALID) {
-stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+stq_p(spapr->htab + offset + HPTE64_DW1, pte1);
 /*
  * When setting valid, we write PTE1 first. This ensures
  * proper synchronization with the reading code in
@@ -1430,7 +1430,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
  * ppc_hash64_pteg_search()
  */
 smp_wmb();
-stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+stq_p(spapr->htab + offset + HPTE64_DW1, pte1);
 }
 }
 }
@@ -1438,7 +1438,7 @@ void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
 static void spapr_hpte_set_c(PPCVirtualHypervisor *vhyp, hwaddr ptex,
  uint64_t pte1)
 {
-hwaddr offset = ptex * HASH_PTE_SIZE_64 + 15;
+hwaddr offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
 SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
 
 if (!spapr->htab) {
@@ -1454,7 +1454,7 @@ static void spapr_hpte_set_c(PPCVirtualHypervisor *vhyp, 
hwaddr ptex,
 static void spapr_hpte_set_r(PPCVirtualHypervisor *vhyp, hwaddr ptex,
  uint64_t pte1)
 {
-hwaddr offset = ptex * HASH_PTE_SIZE_64 + 14;
+hwaddr offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
 SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
 
 if (!spapr->htab) {
diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_softmmu.c
index f8924270ef..4ee03c83e4 100644
--- a/hw/ppc/spapr_softmmu.c
+++ b/hw/ppc/spapr_softmmu.c
@@ -426,7 +426,7 @@ static void new_hpte_store(void *htab, uint64_t pteg, int 
slot,
 addr += slot * HASH_PTE_SIZE_64;
 
 stq_p(addr, pte0);
-stq_p(addr + HASH_PTE_SIZE_64 / 2, pte1);
+stq_p(addr + HPTE64_DW1, pte1);
 }
 
 static int rehash_hpte(PowerPCCPU *cpu,
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 19832c4b46..da9fe99ff8 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -786,7 +786,7 @@ static void ppc_hash64_set_dsi(CPUState *cs, int mmu_idx, 
uint64_t dar, uint64_t
 
 static void ppc_hash64_set_r(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
 {
-hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + 16;
+hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
 
 if (cpu->vhyp) {
 PPCVirtualHypervisorClass *vhc =
@@ -803,7 +803,7 @@ static void ppc_hash64_set_r(PowerPCCPU *cpu, hwaddr ptex, 
uint64_t pte1)
 
 static void ppc_hash64_set_c(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
 {
-hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + 15;
+hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
 
 if (cpu->vhyp) {
 PPCVirtualHypervisorClass *vhc =
diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index c5b2f97ff7..1496955d38 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -97,6 +97,11 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
 #define HPTE64_V_1TB_SEG0x4000ULL
 #define HPTE64_V_VRMA_MASK  0x4001ff00ULL
 
+/* PTE offsets */
+#define HPTE64_DW1  (HASH_PTE_SIZE_64 / 2)
+#define HPTE64_DW1_R  (HPTE64_DW1 + 6)
+#define HPTE64_DW1_C  (HPTE64_DW1 + 7)
+
 /* Format changes for ARCH v3 */
 #define HPTE64_V_COMMON_BITS0x000fULL
 #define HPTE64_R_3_0_SSIZE_SHIFT 58
-- 
2.25.1
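
The offset arithmetic behind this fix can be sketched as follows. This is
only an illustration, not the QEMU code: the sketch_* names are
hypothetical, and the R bit value (0x100 in the second doubleword) is an
assumption based on the usual Hash64 PTE layout. A PTE is 16 bytes stored
big-endian, so the second doubleword starts at byte 8 and R lives in byte
8 + 6 = 14. An offset of 16 is already the first byte of the next PTE.

```c
#include <stdint.h>

#define HASH_PTE_SIZE_64 16                     /* two 8-byte doublewords */
#define HPTE64_DW1       (HASH_PTE_SIZE_64 / 2) /* start of doubleword 1 */
#define HPTE64_DW1_R     (HPTE64_DW1 + 6)       /* byte holding the R bit */
#define HPTE64_DW1_C     (HPTE64_DW1 + 7)       /* byte holding the C bit */

/* Assumed position of R within the second doubleword. */
#define SKETCH_R_BIT 0x100ULL

/* Read a big-endian 64-bit value, the way the hash table stores it. */
static inline uint64_t sketch_load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Byte offset, from the table base, of the byte that holds R. */
static inline uint64_t sketch_r_offset(uint64_t ptex)
{
    return ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_R;
}

/* Set R with a single byte store that stays inside PTE number ptex. */
static inline void sketch_set_r(uint8_t *htab, uint64_t ptex)
{
    htab[sketch_r_offset(ptex)] |= (uint8_t)(SKETCH_R_BIT >> 8);
}
```

With a two-entry table, setting R on entry 0 changes byte 14 only; byte 16,
which belongs to entry 1, is left alone.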

Re: Follow-up on the CXL discussion at OFTC

2021-11-29 Thread Alex Bennée



Ben Widawsky  writes:

> On 21-11-26 12:08:08, Alex Bennée wrote:
>> 
>> Ben Widawsky  writes:
>> 
>> > On 21-11-19 02:29:51, Shreyas Shah wrote:
>> >> Hi Ben
>> >> 
>> >> Are you planning to add the CXL2.0 switch inside QEMU or already added in 
>> >> one of the version? 
>> >>  
>> >
>> > From me, there are no plans for QEMU anything until/unless upstream thinks 
>> > it
>> > will merge the existing patches, or provide feedback as to what it would 
>> > take to
>> > get them merged. If upstream doesn't see a point in these patches, then I 
>> > really
>> > don't see much value in continuing to further them. Once hardware comes 
>> > out, the
>> > value proposition is certainly less.
>> 
>> I take it:
>> 
>>   Subject: [RFC PATCH v3 00/31] CXL 2.0 Support
>>   Date: Mon,  1 Feb 2021 16:59:17 -0800
>>   Message-Id: <20210202005948.241655-1-ben.widaw...@intel.com>
>> 
>> is the current state of the support? I saw there was a fair amount of
>> discussion on the thread so assumed there would be a v4 forthcoming at
>> some point.
>
> Hi Alex,
>
> There is a v4, however, we never really had a solid plan for the primary issue
> which was around handling CXL memory expander devices properly (both from an
> interleaving standpoint as well as having a device which hosts multiple memory
> capacities, persistent and volatile). I didn't feel it was worth sending a v4
> unless someone could say
>
> 1. we will merge what's there and fix later, or
> 2. you must have a more perfect emulation in place, or
> 3. we want to see usages for a real guest

I think 1. is acceptable if the community is happy there will be ongoing
development and it's not just a code dump. Given it will have a
MAINTAINERS entry I think that is demonstrated.

What's the current use case? Testing drivers before real HW comes out?
Will it still be useful after real HW comes out for people wanting to
debug things without HW?

>
> I had hoped we could merge what was there mostly as is and fix it up as we go.
> It's useful in the state it is now, and as time goes on, we find more usecases
> for it in a VMM, and not just driver development.
>
>> 
>> Adding new subsystems to QEMU does seem to be a pain point for new
>> contributors. Patches tend to fall through the cracks of existing
>> maintainers who spend most of their time looking at stuff that directly
>> touches their files. There is also a reluctance to merge large chunks of
>> functionality without an identified maintainer (and maybe reviewers) who
>> can be the contact point for new patches. So in short you need:
>> 
>>  - Maintainer Reviewed-by/Acked-by on patches that touch other sub-systems
>
> This is the challenging one. I have Cc'd the relevant maintainers (hw/pci and
> hw/mem are the two) in the past, but I think their interest is lacking (and
> reasonably so, it is an entirely different subsystem).

So the best approach to that is to leave a Cc: tag in the patch itself
on your next posting so we can see the maintainer did see it but didn't
contribute a review tag. This is also a good reason to keep Message-Id
tags in patches so we can go back to the original threads.

So in my latest PR you'll see:

  Signed-off-by: Willian Rampazzo 
  Reviewed-by: Beraldo Leal 
  Message-Id: <20211122191124.31620-1-willi...@redhat.com>
  Signed-off-by: Alex Bennée 
  Reviewed-by: Philippe Mathieu-Daudé 
  Message-Id: <20211129140932.4115115-7-alex.ben...@linaro.org>

which shows the Message-Id from Willian's original posting and the
latest Message-Id from my posting of the maintainer tree (I trim off my
old ones).

>>  - Reviewed-by tags on the new sub-system patches from anyone who 
>> understands CXL
>
> I have/had those from Jonathan.
>
>>  - Some* in-tree testing (so it doesn't quietly bitrot)
>
> We had this, but it's stale now. We can bring this back up.
>
>>  - A patch adding the sub-system to MAINTAINERS with identified people
>
> That was there too. Since the original posting, I'd be happy to sign Jonathan 
> up
> to this if he's willing.

Sounds good to me.

>> * Some means at least ensuring qtest can instantiate the device and not
>>   fall over. Obviously more testing is better but it can always be
>>   expanded on in later series.
>
> This was in the patch series. It could use more testing for sure, but I had
> basic functional testing in place via qtest.

More is always better but the basic qtest does ensure a device doesn't
segfault if it's instantiated.

>
>> 
>> Is that the feedback you were looking for?
>
> You validated my assumptions as to what's needed, but your first bullet is the
> one I can't seem to pin down.
>
> Thanks.
> Ben


-- 
Alex Bennée

Re: [PATCH v3 15/23] multifd: Use a single writev on the send side

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Until now, we wrote the packet header with write(), and the rest of the
> pages with writev().  Just increase the size of the iovec and do a
> single writev().
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/multifd.c | 20 
>  1 file changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 71bdef068e..65676d56fd 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -643,7 +643,7 @@ static void *multifd_send_thread(void *opaque)
>  uint32_t used = p->pages->num;
>  uint64_t packet_num = p->packet_num;
>  uint32_t flags = p->flags;
> -p->iovs_num = 0;
> +p->iovs_num = 1;
>  
>  if (used) {
>  ret = multifd_send_state->ops->send_prepare(p, _err);
> @@ -663,20 +663,15 @@ static void *multifd_send_thread(void *opaque)
>  trace_multifd_send(p->id, packet_num, used, flags,
> p->next_packet_size);
>  
> -ret = qio_channel_write_all(p->c, (void *)p->packet,
> -p->packet_len, _err);
> +p->iov[0].iov_len = p->packet_len;
> +p->iov[0].iov_base = p->packet;
> +
> +ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> + _err);
>  if (ret != 0) {
>  break;
>  }
>  
> -if (used) {
> -ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> - _err);
> -if (ret != 0) {
> -break;
> -}
> -}
> -
>  qemu_mutex_lock(&p->mutex);
>  p->pending_job--;
>  qemu_mutex_unlock(&p->mutex);
> @@ -913,7 +908,8 @@ int multifd_save_setup(Error **errp)
>  p->packet->version = cpu_to_be32(MULTIFD_VERSION);
>  p->name = g_strdup_printf("multifdsend_%d", i);
>  p->tls_hostname = g_strdup(s->hostname);
> -p->iov = g_new0(struct iovec, page_count);
> +/* We need one extra place for the packet header */
> +p->iov = g_new0(struct iovec, page_count + 1);
>  socket_send_channel_create(multifd_new_send_channel_async, p);
>  }
>  
> -- 
> 2.33.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
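
The idea in this patch (reserve slot 0 of the iovec for the packet header
and send header plus pages with one vectored write) can be sketched with
plain POSIX writev(). This is a minimal illustration under assumptions, not
the multifd code: the sketch_* functions and their parameters are
hypothetical, and a bare writev() may write fewer bytes than requested,
which is the case qio_channel_writev_all() exists to handle.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Fill iov[1..n_pages] with one entry per page. Slot 0 is left free
 * for the header, so iov must have room for n_pages + 1 entries.
 * Returns the number of iovec entries in use.
 */
static inline int sketch_prepare(struct iovec *iov, unsigned char *host,
                                 const size_t *offsets, int n_pages,
                                 size_t page_size)
{
    int iovs_num = 1; /* start after the reserved header slot */

    for (int i = 0; i < n_pages; i++) {
        iov[iovs_num].iov_base = host + offsets[i];
        iov[iovs_num].iov_len = page_size;
        iovs_num++;
    }
    return iovs_num;
}

/* Put the header in slot 0 and issue a single vectored write. */
static inline ssize_t sketch_send(int fd, struct iovec *iov, int iovs_num,
                                  void *header, size_t header_len)
{
    iov[0].iov_base = header;
    iov[0].iov_len = header_len;
    return writev(fd, iov, iovs_num);
}
```

A caller would allocate page_count + 1 entries once, call sketch_prepare()
for each batch, and then sketch_send(), so that the header and the pages
leave in one system call instead of two.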

Re: [PATCH v3 12/23] multifd: Make zlib use iov's

2021-11-29 Thread Juan Quintela

"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>  migration/multifd-zlib.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
>> index da6201704c..478a4af115 100644
>> --- a/migration/multifd-zlib.c
>> +++ b/migration/multifd-zlib.c
>> @@ -143,6 +143,9 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
>> **errp)
>>  }
>>  out_size += available - zs->avail_out;
>>  }
>> +p->iov[p->iovs_num].iov_base = z->zbuff;
>> +p->iov[p->iovs_num].iov_len = out_size;
>> +p->iovs_num++;
>>  p->next_packet_size = out_size;
>
> Do you still need next_packet_size?

As my crystal ball didn't work so well, I ended up putting
next_packet_size on the wire.  So yes, I need it.

Yes, I also wanted to remove it.


Later, Juan.

>
> but:
>
>
> Reviewed-by: Dr. David Alan Gilbert 
>
>>  p->flags |= MULTIFD_FLAG_ZLIB;
>>  
>> @@ -162,10 +165,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, 
>> Error **errp)
>>   */
>>  static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error 
>> **errp)
>>  {
>> -struct zlib_data *z = p->data;
>> -
>> -return qio_channel_write_all(p->c, (void *)z->zbuff, 
>> p->next_packet_size,
>> - errp);
>> +return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
>>  }
>>  
>>  /**
>> -- 
>> 2.33.1
>>

Re: [PATCH v3 14/23] multifd: Remove send_write() method

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Everything uses iov's now.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/multifd.h  |  2 --
>  migration/multifd-zlib.c | 17 -
>  migration/multifd-zstd.c | 17 -
>  migration/multifd.c  | 20 ++--
>  4 files changed, 2 insertions(+), 54 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index c3f18af364..7496f951a7 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -164,8 +164,6 @@ typedef struct {
>  void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
>  /* Prepare the send packet */
>  int (*send_prepare)(MultiFDSendParams *p, Error **errp);
> -/* Write the send packet */
> -int (*send_write)(MultiFDSendParams *p, uint32_t used, Error **errp);
>  /* Setup for receiving side */
>  int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
>  /* Cleanup for receiving side */
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 478a4af115..f65159392a 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -152,22 +152,6 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  return 0;
>  }
>  
> -/**
> - * zlib_send_write: do the actual write of the data
> - *
> - * Do the actual write of the comprresed buffer.
> - *
> - * Returns 0 for success or -1 for error
> - *
> - * @p: Params for the channel that we are using
> - * @used: number of pages used
> - * @errp: pointer to an error
> - */
> -static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
> -{
> -return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
> -}
> -
>  /**
>   * zlib_recv_setup: setup receive side
>   *
> @@ -307,7 +291,6 @@ static MultiFDMethods multifd_zlib_ops = {
>  .send_setup = zlib_send_setup,
>  .send_cleanup = zlib_send_cleanup,
>  .send_prepare = zlib_send_prepare,
> -.send_write = zlib_send_write,
>  .recv_setup = zlib_recv_setup,
>  .recv_cleanup = zlib_recv_cleanup,
>  .recv_pages = zlib_recv_pages
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index 259277dc42..6933ba622a 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -163,22 +163,6 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  return 0;
>  }
>  
> -/**
> - * zstd_send_write: do the actual write of the data
> - *
> - * Do the actual write of the comprresed buffer.
> - *
> - * Returns 0 for success or -1 for error
> - *
> - * @p: Params for the channel that we are using
> - * @used: number of pages used
> - * @errp: pointer to an error
> - */
> -static int zstd_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
> -{
> -return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
> -}
> -
>  /**
>   * zstd_recv_setup: setup receive side
>   *
> @@ -320,7 +304,6 @@ static MultiFDMethods multifd_zstd_ops = {
>  .send_setup = zstd_send_setup,
>  .send_cleanup = zstd_send_cleanup,
>  .send_prepare = zstd_send_prepare,
> -.send_write = zstd_send_write,
>  .recv_setup = zstd_recv_setup,
>  .recv_cleanup = zstd_recv_cleanup,
>  .recv_pages = zstd_recv_pages
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 37487fd01c..71bdef068e 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -100,22 +100,6 @@ static int nocomp_send_prepare(MultiFDSendParams *p, 
> Error **errp)
>  return 0;
>  }
>  
> -/**
> - * nocomp_send_write: do the actual write of the data
> - *
> - * For no compression we just have to write the data.
> - *
> - * Returns 0 for success or -1 for error
> - *
> - * @p: Params for the channel that we are using
> - * @used: number of pages used
> - * @errp: pointer to an error
> - */
> -static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error 
> **errp)
> -{
> -return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
> -}
> -
>  /**
>   * nocomp_recv_setup: setup receive side
>   *
> @@ -173,7 +157,6 @@ static MultiFDMethods multifd_nocomp_ops = {
>  .send_setup = nocomp_send_setup,
>  .send_cleanup = nocomp_send_cleanup,
>  .send_prepare = nocomp_send_prepare,
> -.send_write = nocomp_send_write,
>  .recv_setup = nocomp_recv_setup,
>  .recv_cleanup = nocomp_recv_cleanup,
>  .recv_pages = nocomp_recv_pages
> @@ -687,7 +670,8 @@ static void *multifd_send_thread(void *opaque)
>  }
>  
>  if (used) {
> -ret = multifd_send_state->ops->send_write(p, used, 
> _err);
> +ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> + _err);
>  if (ret != 0) {
>  break;
>  }
> -- 
> 2.33.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v3 13/23] multifd: Make zstd use iov's

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/multifd-zstd.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index 2d5b61106c..259277dc42 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -154,6 +154,9 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  return -1;
>  }
>  }
> +p->iov[p->iovs_num].iov_base = z->zbuff;
> +p->iov[p->iovs_num].iov_len = z->out.pos;
> +p->iovs_num++;
>  p->next_packet_size = z->out.pos;
>  p->flags |= MULTIFD_FLAG_ZSTD;
>  
> @@ -173,10 +176,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>   */
>  static int zstd_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
>  {
> -struct zstd_data *z = p->data;
> -
> -return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
> - errp);
> +return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
>  }
>  
>  /**
> -- 
> 2.33.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PULL 0/1] Linux user for 6.2 patches

2021-11-29 Thread Richard Henderson


On 11/29/21 3:04 PM, Laurent Vivier wrote:

The following changes since commit e750c10167fa8ad3fcc98236a474c46e52e7c18c:

   Merge tag 'pull-target-arm-20211129' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2021-11-29 
11:56:07 +0100)

are available in the Git repository at:

   git://github.com/vivier/qemu.git tags/linux-user-for-6.2-pull-request

for you to fetch changes up to 0a761ce30338526213f74dfe9900b9213d4bbb0b:

   linux-user: implement more loop ioctls (2021-11-29 14:54:17 +0100)


linux-user pull request 20211129

Fix losetup



Andreas Schwab (1):
   linux-user: implement more loop ioctls

  linux-user/ioctls.h| 4 
  linux-user/linux_loop.h| 2 ++
  linux-user/syscall_defs.h  | 4 
  linux-user/syscall_types.h | 6 ++
  4 files changed, 16 insertions(+)


Applied, thanks.

r~

Re: [PATCH v3 12/23] multifd: Make zlib use iov's

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Signed-off-by: Juan Quintela 
> ---
>  migration/multifd-zlib.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index da6201704c..478a4af115 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -143,6 +143,9 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  }
>  out_size += available - zs->avail_out;
>  }
> +p->iov[p->iovs_num].iov_base = z->zbuff;
> +p->iov[p->iovs_num].iov_len = out_size;
> +p->iovs_num++;
>  p->next_packet_size = out_size;

Do you still need next_packet_size?

but:


Reviewed-by: Dr. David Alan Gilbert 

>  p->flags |= MULTIFD_FLAG_ZLIB;
>  
> @@ -162,10 +165,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>   */
>  static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
>  {
> -struct zlib_data *z = p->data;
> -
> -return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
> - errp);
> +return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
>  }
>  
>  /**
> -- 
> 2.33.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v3 11/23] multifd: Move iov from pages to params

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> This will allow us to reduce the number of system calls on the next patch.
> 
> Signed-off-by: Juan Quintela 

Leo: Does this make your zerocopy any harder?

Dave

> ---
>  migration/multifd.h |  8 ++--
>  migration/multifd.c | 34 --
>  2 files changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index e57adc783b..c3f18af364 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -62,8 +62,6 @@ typedef struct {
>  uint64_t packet_num;
>  /* offset of each page */
>  ram_addr_t *offset;
> -/* pointer to each page */
> -struct iovec *iov;
>  RAMBlock *block;
>  } MultiFDPages_t;
>  
> @@ -110,6 +108,10 @@ typedef struct {
>  uint64_t num_pages;
>  /* syncs main thread and channels */
>  QemuSemaphore sem_sync;
> +/* buffers to send */
> +struct iovec *iov;
> +/* number of iovs used */
> +uint32_t iovs_num;
>  /* used for compression methods */
>  void *data;
>  }  MultiFDSendParams;
> @@ -149,6 +151,8 @@ typedef struct {
>  uint64_t num_pages;
>  /* syncs main thread and channels */
>  QemuSemaphore sem_sync;
> +/* buffers to recv */
> +struct iovec *iov;
>  /* used for de-compression methods */
>  void *data;
>  } MultiFDRecvParams;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0533da154a..37487fd01c 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -86,7 +86,16 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, 
> Error **errp)
>   */
>  static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
> -p->next_packet_size = p->pages->num * qemu_target_page_size();
> +MultiFDPages_t *pages = p->pages;
> +size_t page_size = qemu_target_page_size();
> +
> +for (int i = 0; i < p->pages->num; i++) {
> +p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> +p->iov[p->iovs_num].iov_len = page_size;
> +p->iovs_num++;
> +}
> +
> +p->next_packet_size = p->pages->num * page_size;
>  p->flags |= MULTIFD_FLAG_NOCOMP;
>  return 0;
>  }
> @@ -104,7 +113,7 @@ static int nocomp_send_prepare(MultiFDSendParams *p, 
> Error **errp)
>   */
>  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error 
> **errp)
>  {
> -return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> +return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
>  }
>  
>  /**
> @@ -146,13 +155,18 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
>  static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>  {
>  uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
> +size_t page_size = qemu_target_page_size();
>  
>  if (flags != MULTIFD_FLAG_NOCOMP) {
>  error_setg(errp, "multifd %d: flags received %x flags expected %x",
> p->id, flags, MULTIFD_FLAG_NOCOMP);
>  return -1;
>  }
> -return qio_channel_readv_all(p->c, p->pages->iov, p->pages->num, errp);
> +for (int i = 0; i < p->pages->num; i++) {
> +p->iov[i].iov_base = p->pages->block->host + p->pages->offset[i];
> +p->iov[i].iov_len = page_size;
> +}
> +return qio_channel_readv_all(p->c, p->iov, p->pages->num, errp);
>  }
>  
>  static MultiFDMethods multifd_nocomp_ops = {
> @@ -242,7 +256,6 @@ static MultiFDPages_t *multifd_pages_init(size_t size)
>  MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
>  
>  pages->allocated = size;
> -pages->iov = g_new0(struct iovec, size);
>  pages->offset = g_new0(ram_addr_t, size);
>  
>  return pages;
> @@ -254,8 +267,6 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
>  pages->allocated = 0;
>  pages->packet_num = 0;
>  pages->block = NULL;
> -g_free(pages->iov);
> -pages->iov = NULL;
>  g_free(pages->offset);
>  pages->offset = NULL;
>  g_free(pages);
> @@ -365,8 +376,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
> *p, Error **errp)
>  return -1;
>  }
>  p->pages->offset[i] = offset;
> -p->pages->iov[i].iov_base = block->host + offset;
> -p->pages->iov[i].iov_len = page_size;
>  }
>  
>  return 0;
> @@ -470,8 +479,6 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, 
> ram_addr_t offset)
>  
>  if (pages->block == block) {
>  pages->offset[pages->num] = offset;
> -pages->iov[pages->num].iov_base = block->host + offset;
> -pages->iov[pages->num].iov_len = qemu_target_page_size();
>  pages->num++;
>  
>  if (pages->num < pages->allocated) {
> @@ -564,6 +571,8 @@ void multifd_save_cleanup(void)
>  p->packet_len = 0;
>  g_free(p->packet);
>  p->packet = NULL;
> +g_free(p->iov);
> +p->iov = NULL;
>  multifd_send_state->ops->send_cleanup(p, _err);
>

Re: [RFC for-6.2] block/nbd: forbid incompatible change of server options on reconnect

2021-11-29 Thread Eric Blake

On Wed, Nov 24, 2021 at 03:09:51PM +0100, Vladimir Sementsov-Ogievskiy wrote:
> Reconnect feature was never prepared to handle server options changed
> on reconnect. Let's be stricter and check what exactly is changed. If
> server capabilities just got richer don't worry. Otherwise fail and
> drop the established connection.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all! The patch is probably good for 6.2. It's an RFC because I haven't
> tested it yet, but I want to send it early so that my proposed design is
> available for discussion.

We're cutting it awfully close.  My justification for including it in
-rc3 (if we like it) is that it is a lot easier to audit that we
reject server downgrades than it is to audit whether we have a CVE
because of a server downgrade across a reconnect.  But it is not a new
regression to 6.2, so slipping it to 7.0 (if we don't feel comfortable
with the current iteration of the patch) is okay on that front.
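
The rule being proposed (a reconnect may gain capabilities but must not
lose any) can be sketched roughly as below. This is an illustration under
assumptions, not the patch: SketchExportInfo and the sketch_* functions are
hypothetical stand-ins, and the rotational flag is assumed to be bit 4 of
the transmission flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the rotational hint in the export flags. */
#define SKETCH_FLAG_ROTATIONAL (1u << 4)

/* Hypothetical subset of the negotiated export information. */
typedef struct {
    uint16_t flags;
    uint32_t min_block;
    uint64_t size;
} SketchExportInfo;

/* Flags the old server advertised and the new one no longer does. */
static inline uint16_t sketch_dropped_flags(uint16_t old_flags,
                                            uint16_t new_flags)
{
    /* Rotational is only a hint, so losing it is not a downgrade. */
    return (uint16_t)(old_flags & ~new_flags & ~SKETCH_FLAG_ROTATIONAL);
}

static inline bool sketch_is_compatible(const SketchExportInfo *old_info,
                                        const SketchExportInfo *new_info)
{
    if (old_info->size != new_info->size) {
        return false; /* export size changed */
    }
    if (sketch_dropped_flags(old_info->flags, new_info->flags)) {
        return false; /* a feature flag disappeared */
    }
    if (new_info->min_block > old_info->min_block) {
        return false; /* stricter alignment than before */
    }
    if (new_info->min_block &&
        old_info->min_block % new_info->min_block) {
        return false; /* old alignment is not a multiple of the new one */
    }
    return true;
}
```

A server that comes back with extra flags or a smaller, evenly dividing
min_block passes; one that stops advertising a feature flag is rejected.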

> 
> 
>  include/block/nbd.h |  9 +
>  nbd/client-connection.c | 86 +
>  2 files changed, 95 insertions(+)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 78d101b774..3d379b5539 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -157,6 +157,10 @@ enum {
>  #define NBD_FLAG_SEND_RESIZE   (1 << NBD_FLAG_SEND_RESIZE_BIT)
>  #define NBD_FLAG_SEND_CACHE(1 << NBD_FLAG_SEND_CACHE_BIT)
>  #define NBD_FLAG_SEND_FAST_ZERO(1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
> +/*
> + * If you add any new NBD_FLAG_ flag, check that logic in
> + * nbd_is_new_info_compatible() is still good about handling flags.
> + */
>  
>  /* New-style handshake (global) flags, sent from server to client, and
> control what will happen during handshake phase. */
> @@ -305,6 +309,11 @@ struct NBDExportInfo {
>  
>  uint32_t context_id;
>  
> +/*
> + * WARNING! when add any new field to the structure, don't forget to 
> check

adding

> + * and updated nbd_is_new_info_compatible() function.

update the

> + */

Odd that one comment has WARNING! and the other does not.

> +
>  /* Set by server results during nbd_receive_export_list() */
>  char *description;
>  int n_contexts;
> diff --git a/nbd/client-connection.c b/nbd/client-connection.c
> index 695f855754..2d66993632 100644
> --- a/nbd/client-connection.c
> +++ b/nbd/client-connection.c
> @@ -37,6 +37,10 @@ struct NBDClientConnection {
>  bool do_negotiation;
>  bool do_retry;
>  
> +/* Used only by connection thread, no need in locking the mutex */

s/no need in locking the mutex/does not need mutex protection/

> +bool has_prev_info;
> +NBDExportInfo prev_info;
> +
>  QemuMutex mutex;
>  
>  /*
> @@ -160,6 +164,67 @@ static int nbd_connect(QIOChannelSocket *sioc, 
> SocketAddress *addr,
>  return 0;
>  }
>  
> +static bool nbd_is_new_info_compatible(NBDExportInfo *old, NBDExportInfo 
> *new,
> +   Error **errp)
> +{
> +uint32_t dropped_flags;
> +
> +if (old->structured_reply && !new->structured_reply) {
> +error_setg(errp, "Server options degrade after reconnect: "

degraded

> +   "structured_reply is not supported anymore");
> +return false;
> +}
> +
> +if (old->base_allocation && !new->base_allocation) {
> +error_setg(errp, "Server options degrade after reconnect: "

degraded

> +   "base_allocation is not supported anymore");
> +return false;
> +}

Do we also need to insist that the context id value be identical, or
can our code gracefully deal with it being different?  We don't ever
send the context id, so even if we retry a CMD_BLOCK_STATUS, our real
risk is whether we will reject the new server's reply because it used
a different id than we were expecting.

> +
> +if (old->size != new->size) {
> +error_setg(errp, "NBD export size changed after reconnect");
> +return false;
> +}
> +
> +/*
> + * No worry if rotational status changed. But other flags are feature 
> flags,
> + * they should not degrade.
> + */
> +dropped_flags = (old->flags & ~new->flags) & ~NBD_FLAG_ROTATIONAL;
> +if (dropped_flags) {
> +error_setg(errp, "Server options degrade after reconnect: flags 0x%"

degraded

> +   PRIx32 " are not reported anymore", dropped_flags);
> +return false;
> +}
> +
> +if (new->min_block > old->min_block) {
> +error_setg(errp, "Server requires more strict min_block after "
> +   "reconnect: %" PRIu32 " instead of %" PRIu32,
> +   new->min_block, old->min_block);
> +return false;
> +}

Good...

> +if (new->min_block && (old->min_block % new->min_block)) {
> +error_setg(errp, "Server requires new min_block %" PRIu32
> +   " after reconnect, incompatible with old one %" PRIu32,
> +

Re: SME : Please review and merge : hw/arm/aspeed: Added eMMC boot support for AST2600 image.

2021-11-29 Thread Peter Maydell

On Tue, 9 Nov 2021 at 18:04, Shitalkumar Gandhi  wrote:
>
> Hi SME's,
>
> Please see the attached patch, which adds support for booting an eMMC image
> on the AST2600 machine in QEMU.
>
> qemu should be run as follows:
>
> ./qemu-system-arm -m 1G -M ast2600-evb -nographic -drive
> file=mmc-evb-ast2600.img,format=raw,if=sd,index=2
>
> Tested: Booted AST2600 eMMC image on QEMU.
>
> Suggested-by: Troy Lee leet...@gmail.com
> Reviewed-by: Troy Lee leet...@gmail.com
> Reviewed-by: Andrew Jeffery and...@aj.id.au
> Signed-off-by: Shitalkumar Gandhi shitalkumar.gan...@seagate.com

Hi; thanks for this patch.

Fishing the patch out of the attachment, the diff is:

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index ba5f1dc5af..6a890adb83 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -148,7 +148,7 @@ struct AspeedMachineState {
 SCU_AST2400_HW_STRAP_BOOT_MODE(AST2400_SPI_BOOT))

 /* AST2600 evb hardware value */
-#define AST2600_EVB_HW_STRAP1 0x00C0
+#define AST2600_EVB_HW_STRAP1 (0x00C0 | AST26500_HW_STRAP_BOOT_SRC_EMMC)
 #define AST2600_EVB_HW_STRAP2 0x0003

 /* Tacoma hardware value */
-- 

I've CC'd the aspeed maintainers, but since this has Andrew's R-by tag already
I'll put this into my set of patches to apply via target-arm.next for 7.0
unless somebody objects.

thanks
-- PMM

Re: [PATCH v3 04/10] hw/dma: Add the DMA control interface

2021-11-29 Thread Peter Maydell

On Wed, 24 Nov 2021 at 10:16, Francisco Iglesias
 wrote:
>
> Add an interface for controlling DMA models that are reused with other
> models. This allows a controlling model to start transfers through the
> DMA while reusing the DMA's handling of transfer state and completion
> signaling.
>
> Signed-off-by: Francisco Iglesias 
> Reviewed-by: Edgar E. Iglesias 

Could you give an expanded sketch of the design here, please?
What sort of objects would implement this new interface? Who
calls it? Should all new DMA engine devices consider implementing it?

If it's likely to be widely useful we should consider having
documentation under docs/devel for the API.

> ---
>  hw/dma/dma-ctrl.c | 31 
>  hw/dma/meson.build|  1 +
>  include/hw/dma/dma-ctrl.h | 74 
> +++
>  3 files changed, 106 insertions(+)
>  create mode 100644 hw/dma/dma-ctrl.c
>  create mode 100644 include/hw/dma/dma-ctrl.h
>
> diff --git a/hw/dma/dma-ctrl.c b/hw/dma/dma-ctrl.c
> new file mode 100644
> index 00..4a9b68dac1
> --- /dev/null
> +++ b/hw/dma/dma-ctrl.c
> @@ -0,0 +1,31 @@
> +/*
> + * DMA control interface.
> + *
> + * Copyright (c) 2021 Xilinx Inc.
> + * Written by Francisco Iglesias 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +#include "qemu/osdep.h"
> +#include "exec/hwaddr.h"
> +#include "hw/dma/dma-ctrl.h"
> +
> +void dma_ctrl_read_with_notify(DmaCtrl *dma_ctrl, hwaddr addr, uint32_t len,
> +   DmaCtrlNotify *notify, bool start_dma)
> +{
> +DmaCtrlClass *dcc =  DMA_CTRL_GET_CLASS(dma_ctrl);
> +dcc->read(dma_ctrl, addr, len, notify, start_dma);
> +}
> +
> +static const TypeInfo dma_ctrl_info = {
> +.name  = TYPE_DMA_CTRL,
> +.parent= TYPE_INTERFACE,
> +.class_size = sizeof(DmaCtrlClass),
> +};
> +
> +static void dma_ctrl_register_types(void)
> +{
> +type_register_static(&dma_ctrl_info);
> +}
> +
> +type_init(dma_ctrl_register_types)
> diff --git a/hw/dma/meson.build b/hw/dma/meson.build
> index f3f0661bc3..c0bc134046 100644
> --- a/hw/dma/meson.build
> +++ b/hw/dma/meson.build
> @@ -14,3 +14,4 @@ softmmu_ss.add(when: 'CONFIG_PXA2XX', if_true: 
> files('pxa2xx_dma.c'))
>  softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files('bcm2835_dma.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_PDMA', if_true: files('sifive_pdma.c'))
>  softmmu_ss.add(when: 'CONFIG_XLNX_CSU_DMA', if_true: files('xlnx_csu_dma.c'))
> +common_ss.add(when: 'CONFIG_XILINX_AXI', if_true: files('dma-ctrl.c'))
> diff --git a/include/hw/dma/dma-ctrl.h b/include/hw/dma/dma-ctrl.h
> new file mode 100644
> index 00..498469395f
> --- /dev/null
> +++ b/include/hw/dma/dma-ctrl.h
> @@ -0,0 +1,74 @@
> +/*
> + * DMA control interface.
> + *
> + * Copyright (c) 2021 Xilinx Inc.
> + * Written by Francisco Iglesias 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +#ifndef HW_DMA_CTRL_H
> +#define HW_DMA_CTRL_H
> +
> +#include "qemu-common.h"

Header files should not include qemu-common.h; the comment at the
top explains:

/*
 * This file is supposed to be included only by .c files. No header file should
 * depend on qemu-common.h, as this would easily lead to circular header
 * dependencies.
 *
 * If a header file uses a definition from qemu-common.h, that definition
 * must be moved to a separate header file, and the header that uses it
 * must include that header.
 */

> +#include "hw/hw.h"
> +#include "qom/object.h"
> +
> +#define TYPE_DMA_CTRL "dma-ctrl"
> +
> +#define DMA_CTRL_CLASS(klass) \
> + OBJECT_CLASS_CHECK(DmaCtrlClass, (klass), TYPE_DMA_CTRL)
> +#define DMA_CTRL_GET_CLASS(obj) \
> +OBJECT_GET_CLASS(DmaCtrlClass, (obj), TYPE_DMA_CTRL)

Use the DECLARE_CLASS_CHECKERS macro rather than hand-writing these.

> +#define DMA_CTRL(obj) \
> + INTERFACE_CHECK(DmaCtrl, (obj), TYPE_DMA_CTRL)
> +
> +typedef void (*dmactrl_notify_fn)(void *opaque);
> +
> +typedef struct DmaCtrlNotify {
> +void *opaque;
> +dmactrl_notify_fn cb;
> +} DmaCtrlNotify;
> +
> +typedef struct DmaCtrl {
> +Object Parent;
> +} DmaCtrl;
> +
> +typedef struct DmaCtrlClass {

Can you include either "If" or "Interface" in the class/struct names
of interfaces, please? (We have examples of both, eg ArmLinuxBootIf
and IDAUInterface.) I think it makes it clearer that this is an interface
and not a real object.

> +InterfaceClass parent;
> +
> +/*
> + * read: Start a read transfer on the DMA implementing the DMA control
> + * interface
> + *
> + * @dma_ctrl: the DMA implementing this interface
> + * @addr: the address to read
> + * @len: the amount of bytes to read at 'addr'
> + * @notify: the structure containing a callback to call and opaque pointer
> + * to pass the callback when the transfer has been completed
> + * @start_dma: true for starting the DMA transfer and false for just
> + * refilling and proceeding with an already started transfer
> + */
> +void

Re: SME : Please review and merge : hw/arm/aspeed: Added eMMC boot support for AST2600 image.

2021-11-29 Thread Cédric Le Goater


Hello,

On 11/29/21 18:20, Peter Maydell wrote:

On Tue, 9 Nov 2021 at 18:04, Shitalkumar Gandhi  wrote:


Hi SME's,

Please see the attached patch, which adds support for booting an eMMC image
on the AST2600 machine in QEMU.

qemu should be run as follows:

./qemu-system-arm -m 1G -M ast2600-evb -nographic -drive
file=mmc-evb-ast2600.img,format=raw,if=sd,index=2

Tested: Booted AST2600 eMMC image on QEMU.

Suggested-by: Troy Lee leet...@gmail.com
Reviewed-by: Troy Lee leet...@gmail.com
Reviewed-by: Andrew Jeffery and...@aj.id.au
Signed-off-by: Shitalkumar Gandhi shitalkumar.gan...@seagate.com


Hi; thanks for this patch.

Fishing the patch out of the attachment, the diff is:


Yes. A pull request was sent here also :
 
  https://github.com/openbmc/qemu/pull/35


The patch is based on the OpenBMC QEMU branch which includes a large
change adding eMMC support to the SD model. But without the eMMC
model upstream, it's pointless, we can only boot from flash.

For the time being, a "boot-emmc" machine option to set/unset the emmc
boot should be enough. It's all in my branch. I think the right approach
would be to use the boot index of the device on the command line to
change the hw strapping.

Thanks,

C.




diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index ba5f1dc5af..6a890adb83 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -148,7 +148,7 @@ struct AspeedMachineState {
  SCU_AST2400_HW_STRAP_BOOT_MODE(AST2400_SPI_BOOT))

  /* AST2600 evb hardware value */
-#define AST2600_EVB_HW_STRAP1 0x00C0
+#define AST2600_EVB_HW_STRAP1 (0x00C0 | AST26500_HW_STRAP_BOOT_SRC_EMMC)
  #define AST2600_EVB_HW_STRAP2 0x0003

  /* Tacoma hardware value */

Re: [PATCH v3 02/10] hw/arm/xlnx-versal: Connect Versal's PMC SLCR

2021-11-29 Thread Peter Maydell

On Wed, 24 Nov 2021 at 10:16, Francisco Iglesias
 wrote:
>
> Connect Versal's PMC SLCR (system-level control registers) model.
>
> Signed-off-by: Francisco Iglesias 
> Reviewed-by: Edgar E. Iglesias 
> ---
>  hw/arm/xlnx-versal.c | 18 ++
>  include/hw/arm/xlnx-versal.h |  6 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index b2705b6925..08e250945f 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -369,6 +369,23 @@ static void versal_create_efuse(Versal *s, qemu_irq *pic)
>  sysbus_connect_irq(SYS_BUS_DEVICE(ctrl), 0, pic[VERSAL_EFUSE_IRQ]);
>  }
>
> +static void versal_create_pmc_iou_slcr(Versal *s, qemu_irq *pic)
> +{
> +SysBusDevice *sbd;
> +
> +object_initialize_child(OBJECT(s), "versal-pmc-iou-slcr", &s->pmc.iou.slcr,
> +TYPE_XILINX_VERSAL_PMC_IOU_SLCR);
> +
> +sbd = SYS_BUS_DEVICE(&s->pmc.iou.slcr);
> +sysbus_realize(sbd, &error_fatal);
> +
> +memory_region_add_subregion(&s->mr_ps, MM_PMC_PMC_IOU_SLCR,
> +sysbus_mmio_get_region(sbd, 0));

Nit: the indent here is wrong.

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v3 03/10] include/hw/dma/xlnx_csu_dma: Add in missing includes in the header

2021-11-29 Thread Peter Maydell

On Wed, 24 Nov 2021 at 10:16, Francisco Iglesias
 wrote:
>
> Add in the missing includes in the header for being able to build the DMA
> model when reusing it.
>
> Signed-off-by: Francisco Iglesias 
> ---
>  include/hw/dma/xlnx_csu_dma.h | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/hw/dma/xlnx_csu_dma.h b/include/hw/dma/xlnx_csu_dma.h
> index 9e9dc551e9..28806628b1 100644
> --- a/include/hw/dma/xlnx_csu_dma.h
> +++ b/include/hw/dma/xlnx_csu_dma.h
> @@ -21,6 +21,11 @@
>  #ifndef XLNX_CSU_DMA_H
>  #define XLNX_CSU_DMA_H
>
> +#include "hw/sysbus.h"
> +#include "hw/register.h"
> +#include "hw/ptimer.h"
> +#include "hw/stream.h"
> +
>  #define TYPE_XLNX_CSU_DMA "xlnx.csu_dma"
>
>  #define XLNX_CSU_DMA_R_MAX (0x2c / 4)
> --
> 2.11.0

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v3 10/23] multifd: Make zlib compression method not use iovs

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/multifd-zlib.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index e85ef8824d..da6201704c 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include 
>  #include "qemu/rcu.h"
> +#include "exec/ramblock.h"
>  #include "exec/target_page.h"
>  #include "qapi/error.h"
>  #include "migration.h"
> @@ -98,8 +99,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
> **errp)
>   */
>  static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
> -struct iovec *iov = p->pages->iov;
>  struct zlib_data *z = p->data;
> +size_t page_size = qemu_target_page_size();
>  z_stream *zs = &z->zs;
>  uint32_t out_size = 0;
>  int ret;
> @@ -113,8 +114,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  flush = Z_SYNC_FLUSH;
>  }
>  
> -zs->avail_in = iov[i].iov_len;
> -zs->next_in = iov[i].iov_base;
> +zs->avail_in = page_size;
> +zs->next_in = p->pages->block->host + p->pages->offset[i];
>  
>  zs->avail_out = available;
>  zs->next_out = z->zbuff + out_size;
> @@ -235,6 +236,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
>  static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>  {
>  struct zlib_data *z = p->data;
> +size_t page_size = qemu_target_page_size();
>  z_stream *zs = &z->zs;
>  uint32_t in_size = p->next_packet_size;
>  /* we measure the change of total_out */
> @@ -259,7 +261,6 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error 
> **errp)
>  zs->next_in = z->zbuff;
>  
>  for (i = 0; i < p->pages->num; i++) {
> -struct iovec *iov = &p->pages->iov[i];
>  int flush = Z_NO_FLUSH;
>  unsigned long start = zs->total_out;
>  
> @@ -267,8 +268,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error 
> **errp)
>  flush = Z_SYNC_FLUSH;
>  }
>  
> -zs->avail_out = iov->iov_len;
> -zs->next_out = iov->iov_base;
> +zs->avail_out = page_size;
> +zs->next_out = p->pages->block->host + p->pages->offset[i];
>  
>  /*
>   * Welcome to inflate semantics
> @@ -281,8 +282,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error 
> **errp)
>  do {
>  ret = inflate(zs, flush);
>  } while (ret == Z_OK && zs->avail_in
> - && (zs->total_out - start) < iov->iov_len);
> -if (ret == Z_OK && (zs->total_out - start) < iov->iov_len) {
> + && (zs->total_out - start) < page_size);
> +if (ret == Z_OK && (zs->total_out - start) < page_size) {
>  error_setg(errp, "multifd %d: inflate generated too few output",
> p->id);
>  return -1;
> -- 
> 2.33.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
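
A standalone sketch of the addressing the patch above switches to: instead of reading a base pointer and length from an iovec entry, each page is located as the block's host base plus a per-page offset, and every page has the same fixed length. The `SketchBlock` and `SketchPages` types and the `page_ptr` helper are hypothetical, trimmed-down stand-ins invented for illustration; they are not QEMU's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a RAM block: only the host base pointer. */
typedef struct SketchBlock {
    uint8_t *host;      /* start of the block's host memory */
} SketchBlock;

/* Hypothetical stand-in for the per-packet page list. */
typedef struct SketchPages {
    SketchBlock *block;
    size_t num;         /* number of valid entries in offset[] */
    uint64_t *offset;   /* byte offset of each page inside block->host */
} SketchPages;

/* Address of page i, computed the way the patch does: base + offset[i]. */
static uint8_t *page_ptr(const SketchPages *pages, size_t i)
{
    assert(i < pages->num);
    return pages->block->host + pages->offset[i];
}
```

With this layout the length no longer has to be stored per page, which is why the patch can replace every `iov_len` with a single `page_size` value.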

Re: Follow-up on the CXL discussion at OFTC

2021-11-29 Thread Ben Widawsky

On 21-11-26 12:08:08, Alex Bennée wrote:
> 
> Ben Widawsky  writes:
> 
> > On 21-11-19 02:29:51, Shreyas Shah wrote:
> >> Hi Ben
> >> 
> >> Are you planning to add the CXL2.0 switch inside QEMU or already added in 
> >> one of the version? 
> >>  
> >
> > From me, there are no plans for QEMU anything until/unless upstream thinks 
> > it
> > will merge the existing patches, or provide feedback as to what it would 
> > take to
> > get them merged. If upstream doesn't see a point in these patches, then I 
> > really
> > don't see much value in continuing to further them. Once hardware comes 
> > out, the
> > value proposition is certainly less.
> 
> I take it:
> 
>   Subject: [RFC PATCH v3 00/31] CXL 2.0 Support
>   Date: Mon,  1 Feb 2021 16:59:17 -0800
>   Message-Id: <20210202005948.241655-1-ben.widaw...@intel.com>
> 
> is the current state of the support? I saw there was a fair amount of
> discussion on the thread so assumed there would be a v4 forthcoming at
> some point.

Hi Alex,

There is a v4, however, we never really had a solid plan for the primary issue
which was around handling CXL memory expander devices properly (both from an
interleaving standpoint as well as having a device which hosts multiple memory
capacities, persistent and volatile). I didn't feel it was worth sending a v4
unless someone could say
1. we will merge what's there and fix later, or
2. you must have a more perfect emulation in place, or
3. we want to see usages for a real guest

I had hoped we could merge what was there mostly as is and fix it up as we go.
It's useful in the state it is now, and as time goes on, we find more usecases
for it in a VMM, and not just driver development.

> 
> Adding new subsystems to QEMU does seem to be a pain point for new
> contributors. Patches tend to fall through the cracks of existing
> maintainers who spend most of their time looking at stuff that directly
> touches their files. There is also a reluctance to merge large chunks of
> functionality without an identified maintainer (and maybe reviewers) who
> can be the contact point for new patches. So in short you need:
> 
>  - Maintainer Reviewed-by/Acked-by on patches that touch other sub-systems

This is the challenging one. I have Cc'd the relevant maintainers (hw/pci and
hw/mem are the two) in the past, but I think their interest is lacking (and
reasonably so, it is an entirely different subsystem).

>  - Reviewed-by tags on the new sub-system patches from anyone who understands 
> CXL

I have/had those from Jonathan.

>  - Some* in-tree testing (so it doesn't quietly bitrot)

We had this, but it's stale now. We can bring this back up.

>  - A patch adding the sub-system to MAINTAINERS with identified people

That was there too. Since the original posting, I'd be happy to sign Jonathan up
to this if he's willing.

> 
> * Some means at least ensuring qtest can instantiate the device and not
>   fall over. Obviously more testing is better but it can always be
>   expanded on in later series.

This was in the patch series. It could use more testing for sure, but I had
basic functional testing in place via qtest.

> 
> Is that the feedback you were looking for?

You validated my assumptions as to what's needed, but your first bullet is the
one I can't seem to pin down.

Thanks.
Ben

Re: [PATCH] s390x/ipl: support extended kernel command line size

2021-11-29 Thread Christian Borntraeger





Am 22.11.21 um 12:29 schrieb Marc Hartmayer:

In the past s390 used a fixed command line length of 896 bytes. This has changed
with the Linux commit 5ecb2da660ab ("s390: support command lines longer than 896
bytes"). There is now a parm area indicating the maximum command line size. This
parm area has always been initialized to zero, so with older kernels this field
would read zero and we must then assume that only 896 bytes are available.

Acked-by: Viktor Mihajlovski 
Signed-off-by: Marc Hartmayer 


Reviewed-by: Christian Borntraeger 


---
  hw/s390x/ipl.c | 23 ---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 7ddca0127fc2..092c66b3f9f1 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -37,8 +37,9 @@
  
  #define KERN_IMAGE_START0x01UL

  #define LINUX_MAGIC_ADDR0x010008UL
+#define KERN_PARM_AREA_SIZE_ADDR0x010430UL
  #define KERN_PARM_AREA  0x010480UL
-#define KERN_PARM_AREA_SIZE 0x000380UL
+#define LEGACY_KERN_PARM_AREA_SIZE  0x000380UL
  #define INITRD_START0x80UL
  #define INITRD_PARM_START   0x010408UL
  #define PARMFILE_START  0x001000UL
@@ -110,6 +111,21 @@ static uint64_t bios_translate_addr(void *opaque, uint64_t 
srcaddr)
  return srcaddr + dstaddr;
  }
  
+static uint64_t get_max_kernel_cmdline_size(void)

+{
+uint64_t *size_ptr = rom_ptr(KERN_PARM_AREA_SIZE_ADDR, sizeof(*size_ptr));
+
+if (size_ptr) {
+uint64_t size;
+
+size = be64_to_cpu(*size_ptr);
+if (size != 0) {
+return size;
+}
+}
+return LEGACY_KERN_PARM_AREA_SIZE;
+}
+
  static void s390_ipl_realize(DeviceState *dev, Error **errp)
  {
  MachineState *ms = MACHINE(qdev_get_machine());
@@ -197,10 +213,11 @@ static void s390_ipl_realize(DeviceState *dev, Error 
**errp)
  ipl->start_addr = KERN_IMAGE_START;
  /* Overwrite parameters in the kernel image, which are "rom" */
  if (parm_area) {
-if (cmdline_size > KERN_PARM_AREA_SIZE) {
+uint64_t max_cmdline_size = get_max_kernel_cmdline_size();
+if (cmdline_size > max_cmdline_size) {
  error_setg(errp,
 "kernel command line exceeds maximum size: %zu > 
%lu",
-   cmdline_size, KERN_PARM_AREA_SIZE);
+   cmdline_size, max_cmdline_size);
  return;
  }
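
The fallback described in the commit message above can be sketched outside QEMU as follows. This is a minimal illustration, assuming the size field is an 8-byte big-endian value as the `be64_to_cpu()` call in the patch suggests; `load_be64` and `max_cmdline_size` are names made up for this sketch, and a NULL pointer stands in for the case where the field cannot be read from the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy limit from the patch: 0x380 == 896 bytes. */
#define SKETCH_LEGACY_PARM_AREA_SIZE 0x380u

/* Decode a big-endian 64-bit value byte by byte (host-endian independent). */
static uint64_t load_be64(const uint8_t *p)
{
    assert(p != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * size_field points at the 8-byte "maximum command line size" field,
 * or is NULL when the field is not available. A zero field means an
 * older kernel that never filled it in, so the legacy limit applies.
 */
static uint64_t max_cmdline_size(const uint8_t *size_field)
{
    if (size_field != NULL) {
        uint64_t size = load_be64(size_field);
        if (size != 0) {
            return size;
        }
    }
    return SKETCH_LEGACY_PARM_AREA_SIZE;
}
```

Treating zero as "not set" works only because, as the commit message notes, the area was always initialized to zero in older kernels.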

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread David Woodhouse

On Mon, 2021-11-29 at 17:57 +0100, Claudio Fontana wrote:
> On 11/29/21 4:11 PM, David Woodhouse wrote:
> > On Mon, 2021-11-29 at 15:14 +0100, Claudio Fontana wrote:
> > > On 11/29/21 12:39 PM, Woodhouse, David wrote:
> > > > On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote:
> > > > >  static void kvm_cpu_instance_init(CPUState *cs)
> > > > >  {
> > > > >  X86CPU *cpu = X86_CPU(cs);
> > > > > +X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
> > > > > 
> > > > >  host_cpu_instance_init(cpu);
> > > > > 
> > > > > -if (!kvm_irqchip_in_kernel()) {
> > > > > -x86_cpu_change_kvm_default("x2apic", "off");
> > > > > -} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> > > > > -x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> > > > > -}
> > > > > -
> > > > > -/* Special cases not set in the X86CPUDefinition structs: */
> > > > > +if (xcc->model) {
> > > > > +/* only applies to builtin_x86_defs cpus */
> > > > > +if (!kvm_irqchip_in_kernel()) {
> > > > > +x86_cpu_change_kvm_default("x2apic", "off");
> > > > > +} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> > > > > +x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> > > > > +}
> > > > > 
> > > > > -x86_cpu_apply_props(cpu, kvm_default_props);
> > > > > +/* Special cases not set in the X86CPUDefinition structs: */
> > > > > +x86_cpu_apply_props(cpu, kvm_default_props);
> > > > > +}
> > > > > 
> > > > 
> > > > I think this causes a regression in x2apic and kvm-msi-ext-dest-id
> > > > support. If you start qemu thus:
> > > 
> > > If I recall correctly, this change just tries to restore the behavior 
> > > prior to
> > > commit f5cc5a5c168674f84bf061cdb307c2d25fba5448 ,
> > > 
> > > fixing the issue introduced with the refactoring at that time.
> > > 
> > > Can you try bisecting prior to
> > > f5cc5a5c168674f84bf061cdb307c2d25fba5448 , to see if the actual
> > > breakage comes from somewhere else?
> > 
> > Hm, so it looks like it never worked for '-cpu host' *until* commit
> > f5cc5a5c16.
> 
> Right, so here we are talking about properly supporting this for the first 
> time.
> 
> The fact that it works with f5cc5a5c16 is more an accident than anything 
> else, that commit was clearly broken
> (exemplified by reports of failed boots).
> 
> So we need to find the proper solution, ie, exactly which features should be 
> enabled for which cpu classes and models.
> 
> > 
> > It didn't matter before c1bb5418e3 because you couldn't enable that
> > many vCPUs without an IOMMU, and the *IOMMU* setup would call
> > kvm_enable_x2apic().
> > 
> > But after that, nothing ever called kvm_enable_x2apic() in the '-cpu
> > host' case until commit f5cc5a5c16, which fixed it... until you
> > restored the previous behaviour :)
> > 
> > This "works" to fix this case, but presumably isn't correct:
> 
> Right, we cannot just enable all this code, or the original refactor would 
> have been right.
> 
> These kvm default properties have been as far as I know intended for the cpu 
> actual models (builtin_x86_defs),
> and not for the special cpu classes max, host and base. This is what the 
> revert addresses.
> 
> I suspect what we actually need here is to review exactly in which specific 
> cases kvm_enable_x2apic() should be called in the end.
> 
> The code there is mixing changes to the kvm_default_props that are then 
> applied using x86_cpu_apply_props (and that part should be only for 
> xcc->model != NULL),
> with the actual enablement of the kvm x2apic using kvm_vm_enable_cap(s, 
> KVM_CAP_X2APIC_API, 0, flags) via kvm_enable_x2apic().
> 
> One way is to ignore this detail and just move out those checks, since 
> changes to kvm_default_props are harmless once we skip the 
> x86_cpu_apply_props call,
> as such: 
> 
> --
> 
> static void kvm_cpu_instance_init(CPUState *cs)
> {
> X86CPU *cpu = X86_CPU(cs);
> X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
> 
> host_cpu_instance_init(cpu);
> 
> /* only applies to builtin_x86_defs cpus */
> if (!kvm_irqchip_in_kernel()) {
> x86_cpu_change_kvm_default("x2apic", "off");
> } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> }
> 
> if (xcc->model) {
> /* Special cases not set in the X86CPUDefinition structs: */
> x86_cpu_apply_props(cpu, kvm_default_props);
> }
> 

I don't believe that works in the case when kvm_enable_x2apic() fails
on an older kernel. Although it sets the defaults, it still doesn't
then *apply* them so it makes no difference.

How about this:

--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -739,9 +739,9 @@ void pc_machine_done(Notifier *notifier, void *data)
 
 
 if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
-!kvm_irqchip_in_kernel()) {
+(!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {

Re: [PATCH v3 09/23] multifd: Make zstd compression method not use iovs

2021-11-29 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/multifd-zstd.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index a8b104f4ee..2d5b61106c 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include 
>  #include "qemu/rcu.h"
> +#include "exec/ramblock.h"
>  #include "exec/target_page.h"
>  #include "qapi/error.h"
>  #include "migration.h"
> @@ -111,8 +112,8 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error 
> **errp)
>   */
>  static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>  {
> -struct iovec *iov = p->pages->iov;
>  struct zstd_data *z = p->data;
> +size_t page_size = qemu_target_page_size();
>  int ret;
>  uint32_t i;
>  
> @@ -126,8 +127,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  if (i == p->pages->num - 1) {
>  flush = ZSTD_e_flush;
>  }
> -z->in.src = iov[i].iov_base;
> -z->in.size = iov[i].iov_len;
> +z->in.src = p->pages->block->host + p->pages->offset[i];
> +z->in.size = page_size;
>  z->in.pos = 0;
>  
>  /*
> @@ -256,7 +257,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error 
> **errp)
>  {
>  uint32_t in_size = p->next_packet_size;
>  uint32_t out_size = 0;
> -uint32_t expected_size = p->pages->num * qemu_target_page_size();
> +size_t page_size = qemu_target_page_size();
> +uint32_t expected_size = p->pages->num * page_size;
>  uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>  struct zstd_data *z = p->data;
>  int ret;
> @@ -278,10 +280,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error 
> **errp)
>  z->in.pos = 0;
>  
>  for (i = 0; i < p->pages->num; i++) {
> -struct iovec *iov = &p->pages->iov[i];
> -
> -z->out.dst = iov->iov_base;
> -z->out.size = iov->iov_len;
> +z->out.dst = p->pages->block->host + p->pages->offset[i];
> +z->out.size = page_size;
>  z->out.pos = 0;
>  
>  /*
> @@ -295,8 +295,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error 
> **errp)
>  do {
>  ret = ZSTD_decompressStream(z->zds, &z->out, &z->in);
>  } while (ret > 0 && (z->in.size - z->in.pos > 0)
> - && (z->out.pos < iov->iov_len));
> -if (ret > 0 && (z->out.pos < iov->iov_len)) {
> + && (z->out.pos < page_size));
> +if (ret > 0 && (z->out.pos < page_size)) {
>  error_setg(errp, "multifd %d: decompressStream buffer too small",
> p->id);
>  return -1;
> -- 
> 2.33.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[PULL 8/8] tests/plugin/syscall.c: fix compiler warnings

2021-11-29 Thread Alex Bennée

From: Juro Bystricky 

Fix compiler warnings. The warnings can result in a broken build.
This patch fixes warnings such as:

In file included from /usr/include/glib-2.0/glib.h:111,
 from ../tests/plugin/syscall.c:13:
../tests/plugin/syscall.c: In function ‘print_entry’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘out’ may be
   used uninitialized in this function [-Werror=maybe-uninitialized]
   g_free (*pp);
   ^~~~
../tests/plugin/syscall.c:82:23: note: ‘out’ was declared here
 g_autofree gchar *out;
   ^~~
In file included from /usr/include/glib-2.0/glib.h:111,
 from ../tests/plugin/syscall.c:13:
../tests/plugin/syscall.c: In function ‘vcpu_syscall_ret’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘out’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
   g_free (*pp);
   ^~~~
../tests/plugin/syscall.c:73:27: note: ‘out’ was declared here
 g_autofree gchar *out;
   ^~~
cc1: all warnings being treated as errors

Signed-off-by: Juro Bystricky 
Signed-off-by: Alex Bennée 
Message-Id: <20211128011551.2115468-1-juro.bystri...@intel.com>
Reviewed-by: Richard Henderson 
Message-Id: <20211129140932.4115115-9-alex.ben...@linaro.org>

diff --git a/tests/plugin/syscall.c b/tests/plugin/syscall.c
index 484b48de49..96040c578f 100644
--- a/tests/plugin/syscall.c
+++ b/tests/plugin/syscall.c
@@ -70,19 +70,17 @@ static void vcpu_syscall_ret(qemu_plugin_id_t id, unsigned 
int vcpu_idx,
 }
 g_mutex_unlock();
 } else {
-g_autofree gchar *out;
-out = g_strdup_printf("syscall #%" PRIi64 " returned -> %" PRIi64 "\n",
-num, ret);
+g_autofree gchar *out = g_strdup_printf(
+ "syscall #%" PRIi64 " returned -> %" PRIi64 "\n", num, ret);
 qemu_plugin_outs(out);
 }
 }
 
 static void print_entry(gpointer val, gpointer user_data)
 {
-g_autofree gchar *out;
 SyscallStats *entry = (SyscallStats *) val;
 int64_t syscall_num = entry->num;
-out = g_strdup_printf(
+g_autofree gchar *out = g_strdup_printf(
 "%-13" PRIi64 "%-6" PRIi64 " %" PRIi64 "\n",
 syscall_num, entry->calls, entry->errors);
 qemu_plugin_outs(out);
-- 
2.30.2
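
For readers unfamiliar with why the compiler complains here: the error text points into GLib's autocleanup header, where the variable is freed when it leaves scope, so a variable declared without an initializer could be freed while still holding garbage. The sketch below reproduces the pattern with the GCC/Clang `cleanup` attribute directly, using plain `malloc`/`free` instead of GLib. `SKETCH_AUTOFREE`, `sketch_autofree`, `frees_seen` and `format_len` are names invented for this illustration, and the block assumes a compiler that supports that attribute.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts cleanup calls so the behaviour can be observed. */
static int frees_seen;

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void sketch_autofree(char **pp)
{
    frees_seen++;
    free(*pp);          /* free(NULL) is a no-op, garbage is not */
}

#define SKETCH_AUTOFREE __attribute__((cleanup(sketch_autofree)))

/* Copies s into a scoped buffer and returns its length. */
static size_t format_len(const char *s)
{
    assert(s != NULL);
    /* Initialized at declaration, so the handler never sees garbage. */
    SKETCH_AUTOFREE char *out = malloc(strlen(s) + 1);
    if (out == NULL) {
        return 0;
    }
    strcpy(out, s);
    return strlen(out); /* evaluated before the cleanup handler runs */
}
```

Declaring and initializing in one statement, as the patch does, leaves no path on which the handler can run before the pointer has a defined value.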

[PULL 2/8] accel/tcg: suppress IRQ check for special TBs

2021-11-29 Thread Alex Bennée

When we set cpu->cflags_next_tb it is because we want to carefully
control the execution of the next TB. Currently there is a race that
causes the second stage of watchpoint handling to get ignored if an
IRQ is processed before we finish executing the instruction that
triggers the watchpoint. Use the new CF_NOIRQ facility to avoid the
race.

We also suppress IRQs when handling precise self modifying code to
avoid unnecessary bouncing.

Signed-off-by: Alex Bennée 
Cc: Pavel Dovgalyuk 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245
Reviewed-by: Richard Henderson 
Message-Id: <20211129140932.4115115-3-alex.ben...@linaro.org>

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2d14d02f6c..409ec8c38c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -721,6 +721,15 @@ static inline bool need_replay_interrupt(int 
interrupt_request)
 static inline bool cpu_handle_interrupt(CPUState *cpu,
 TranslationBlock **last_tb)
 {
+/*
+ * If we have requested custom cflags with CF_NOIRQ we should
+ * skip checking here. Any pending interrupts will get picked up
+ * by the next TB we execute under normal cflags.
+ */
+if (cpu->cflags_next_tb != -1 && cpu->cflags_next_tb & CF_NOIRQ) {
+return false;
+}
+
 /* Clear the interrupt flag now since we're processing
  * cpu->interrupt_request and cpu->exit_request.
  * Ensure zeroing happens before reading cpu->exit_request or
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index bd0bb81d08..bd71db59a9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1738,7 +1738,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 if (current_tb_modified) {
 page_collection_unlock(pages);
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
 mmap_unlock();
 cpu_loop_exit_noexc(cpu);
 }
@@ -1906,7 +1906,7 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, 
uintptr_t pc)
 #ifdef TARGET_HAS_PRECISE_SMC
 if (current_tb_modified) {
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
 return true;
 }
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 314f8b439c..3524c04c2a 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -912,7 +912,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr 
len,
  */
 if (!cpu->can_do_io) {
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | CF_LAST_IO | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_LAST_IO | CF_NOIRQ | 
curr_cflags(cpu);
 cpu_loop_exit_restore(cpu, ra);
 }
 /*
@@ -946,7 +946,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr 
len,
 cpu_loop_exit(cpu);
 } else {
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | CF_LAST_IO | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_LAST_IO | CF_NOIRQ | 
curr_cflags(cpu);
 mmap_unlock();
 cpu_loop_exit_noexc(cpu);
 }
-- 
2.30.2
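
The check added to cpu_handle_interrupt() tests for the -1 sentinel before looking at the flag bit. A small standalone sketch of why that order matters, assuming a 32-bit unsigned field: the sentinel has every bit set, so testing the flag alone would wrongly report that IRQs should be skipped. `SKETCH_CF_NOIRQ` uses a bit value picked for this example and is not copied from QEMU's headers; `skip_irq_check` and `one_insn_noirq` are likewise hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value chosen for this sketch only. */
#define SKETCH_CF_NOIRQ 0x00020000u
/* "No custom flags requested", mirroring the -1 used in the patch. */
#define SKETCH_CFLAGS_UNSET UINT32_MAX

/* True when the next TB was requested with IRQ checking suppressed. */
static bool skip_irq_check(uint32_t cflags_next_tb)
{
    return cflags_next_tb != SKETCH_CFLAGS_UNSET &&
           (cflags_next_tb & SKETCH_CF_NOIRQ) != 0;
}

/* Build "execute one instruction, no IRQ check" flags from current flags. */
static uint32_t one_insn_noirq(uint32_t cur_cflags)
{
    assert(cur_cflags != SKETCH_CFLAGS_UNSET);
    return 1u | SKETCH_CF_NOIRQ | cur_cflags;
}
```

Here `skip_irq_check(SKETCH_CFLAGS_UNSET)` is false even though the sentinel has the flag bit set, while `skip_irq_check(one_insn_noirq(0))` is true.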

[PULL 4/8] plugins/meson.build: fix linker issue with weird paths

2021-11-29 Thread Alex Bennée

Signed-off-by: Alex Bennée 
Tested-by: Stefan Weil 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/712
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20211129140932.4115115-5-alex.ben...@linaro.org>

diff --git a/plugins/meson.build b/plugins/meson.build
index aeb386ebae..b3de57853b 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -2,9 +2,9 @@ plugin_ldflags = []
 # Modules need more symbols than just those in plugins/qemu-plugins.symbols
 if not enable_modules
   if 'CONFIG_HAS_LD_DYNAMIC_LIST' in config_host
-plugin_ldflags = ['-Wl,--dynamic-list=' + (meson.project_build_root() / 
'qemu-plugins-ld.symbols')]
+plugin_ldflags = ['-Wl,--dynamic-list=qemu-plugins-ld.symbols']
   elif 'CONFIG_HAS_LD_EXPORTED_SYMBOLS_LIST' in config_host
-plugin_ldflags = ['-Wl,-exported_symbols_list,' + 
(meson.project_build_root() / 'qemu-plugins-ld64.symbols')]
+plugin_ldflags = ['-Wl,-exported_symbols_list,qemu-plugins-ld64.symbols']
   endif
 endif
 
-- 
2.30.2

[PULL 7/8] MAINTAINERS: Add section for Aarch64 GitLab custom runner

2021-11-29 Thread Alex Bennée

From: Philippe Mathieu-Daudé 

Add a MAINTAINERS section to cover the GitLab YAML config file
containing the jobs run on the custom runner sponsored by the
Works On Arm project [*].

[*] https://developer.arm.com/solutions/infrastructure/works-on-arm

Suggested-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Alex Bennée 
Message-Id: <2026163226.2719320-1-f4...@amsat.org>
Message-Id: <20211129140932.4115115-8-alex.ben...@linaro.org>

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f5156bfa7..006a2293ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3511,6 +3511,12 @@ R: Beraldo Leal 
 S: Odd Fixes
 F: tests/avocado/
 
+GitLab custom runner (Works On Arm Sponsored)
+M: Alex Bennée 
+M: Philippe Mathieu-Daudé 
+S: Maintained
+F: .gitlab-ci.d/custom-runners/ubuntu-20.04-aarch64.yml
+
 Documentation
 -
 Build system architecture
-- 
2.30.2

[PULL 3/8] tests/avocado: fix tcg_plugin mem access count test

2021-11-29 Thread Alex Bennée

When we cleaned up argument handling the test was missed.

Fixes: 5ae589faad ("tests/plugins/mem: introduce "track" arg and make args not 
positional")
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20211129140932.4115115-4-alex.ben...@linaro.org>

diff --git a/tests/avocado/tcg_plugins.py b/tests/avocado/tcg_plugins.py
index 9ca1515c3b..642d2e49e3 100644
--- a/tests/avocado/tcg_plugins.py
+++ b/tests/avocado/tcg_plugins.py
@@ -131,7 +131,7 @@ def test_aarch64_virt_mem_icount(self):
  suffix=".log")
 
 self.run_vm(kernel_path, kernel_command_line,
-"tests/plugin/libmem.so,arg=both", plugin_log.name,
+"tests/plugin/libmem.so,inline=true,callback=true", 
plugin_log.name,
 console_pattern,
 args=('-icount', 'shift=1'))
 
-- 
2.30.2

[PULL for 6.2 0/8] more tcg, plugin, test and build fixes

2021-11-29 Thread Alex Bennée

The following changes since commit e750c10167fa8ad3fcc98236a474c46e52e7c18c:

  Merge tag 'pull-target-arm-20211129' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2021-11-29 
11:56:07 +0100)

are available in the Git repository at:

  https://github.com/stsquad/qemu.git tags/pull-for-6.2-291121-1

for you to fetch changes up to d5615bbf9103f01911df683cc3e4e85c49a92593:

  tests/plugin/syscall.c: fix compiler warnings (2021-11-29 15:13:22 +)


TCG, plugin and build fixes:

  - introduce CF_NOIRQ to avoid watchpoint race
  - fix avocado plugin test
  - fix linker issue with weird paths
  - band-aid for gdbstub race
  - updates for MAINTAINERS
  - fix some compiler warnings in example plugin


Alex Bennée (5):
  accel/tcg: introduce CF_NOIRQ
  accel/tcg: suppress IRQ check for special TBs
  tests/avocado: fix tcg_plugin mem access count test
  plugins/meson.build: fix linker issue with weird paths
  gdbstub: handle a potentially racing TaskState

Juro Bystricky (1):
  tests/plugin/syscall.c: fix compiler warnings

Philippe Mathieu-Daudé (1):
  MAINTAINERS: Add section for Aarch64 GitLab custom runner

Willian Rampazzo (1):
  MAINTAINERS: Remove me as a reviewer for the build and test/avocado

 include/exec/exec-all.h  |  1 +
 include/exec/gen-icount.h| 21 +
 accel/tcg/cpu-exec.c |  9 +
 accel/tcg/translate-all.c|  4 ++--
 gdbstub.c|  2 +-
 softmmu/physmem.c|  4 ++--
 tests/plugin/syscall.c   |  8 +++-
 MAINTAINERS  | 10 --
 plugins/meson.build  |  4 ++--
 tests/avocado/tcg_plugins.py |  2 +-
 10 files changed, 46 insertions(+), 19 deletions(-)

-- 
2.30.2

[PULL 6/8] MAINTAINERS: Remove me as a reviewer for the build and test/avocado

2021-11-29 Thread Alex Bennée

From: Willian Rampazzo 

Remove me as a reviewer for the Build and test automation and the
Integration Testing with the Avocado Framework and add Beraldo
Leal.

Signed-off-by: Willian Rampazzo 
Reviewed-by: Beraldo Leal 
Message-Id: <20211122191124.31620-1-willi...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20211129140932.4115115-7-alex.ben...@linaro.org>

diff --git a/MAINTAINERS b/MAINTAINERS
index d3879aa3c1..8f5156bfa7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3469,7 +3469,7 @@ M: Alex Bennée 
 M: Philippe Mathieu-Daudé 
 M: Thomas Huth 
 R: Wainer dos Santos Moschetta 
-R: Willian Rampazzo 
+R: Beraldo Leal 
 S: Maintained
 F: .github/lockdown.yml
 F: .gitlab-ci.yml
@@ -3507,7 +3507,7 @@ W: https://trello.com/b/6Qi1pxVn/avocado-qemu
 R: Cleber Rosa 
 R: Philippe Mathieu-Daudé 
 R: Wainer dos Santos Moschetta 
-R: Willian Rampazzo 
+R: Beraldo Leal 
 S: Odd Fixes
 F: tests/avocado/
 
-- 
2.30.2

[PULL 5/8] gdbstub: handle a potentially racing TaskState

2021-11-29 Thread Alex Bennée

When dealing with multi-threaded userspace programs there is a race
condition with the addition of cpu->opaque (aka TaskState). This is
due to cpu_copy calling cpu_create which updates the global vCPU list.
However the task state isn't set until later. This shouldn't be a
problem because the new thread can't have executed anything yet but
the gdbstub code does liberally iterate through the CPU list in
various places.

This sticking plaster ensures the not yet fully realized vCPU is given
a pid of -1 which should be enough to ensure it doesn't show up
anywhere else.

In the longer term I think the code that manages the association
between vCPUs and attached GDB processes could do with a clean-up and
re-factor.

Signed-off-by: Alex Bennée 
Tested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Cc: Richard Henderson 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/730
Message-Id: <20211129140932.4115115-6-alex.ben...@linaro.org>

diff --git a/gdbstub.c b/gdbstub.c
index 23baaef40e..141d7bc4ec 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -94,7 +94,7 @@ static inline int cpu_gdb_index(CPUState *cpu)
 {
 #if defined(CONFIG_USER_ONLY)
 TaskState *ts = (TaskState *) cpu->opaque;
-return ts->ts_tid;
+return ts ? ts->ts_tid : -1;
 #else
 return cpu->cpu_index + 1;
 #endif
-- 
2.30.2

[PULL 1/8] accel/tcg: introduce CF_NOIRQ

2021-11-29 Thread Alex Bennée

Here we introduce a new compiler flag to disable the checking of exit
request (icount_decr.u32). This is useful when we want to ensure the
next block cannot be preempted by an asynchronous event.

Suggested-by: Richard Henderson 
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20211129140932.4115115-2-alex.ben...@linaro.org>

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6bb2a0f7ec..35d8e93976 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -503,6 +503,7 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT0x0002
 #define CF_INVALID   0x0004 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL  0x0008 /* Generate code for a parallel context */
+#define CF_NOIRQ 0x0010 /* Generate an uninterruptible TB */
 #define CF_CLUSTER_MASK  0xff00 /* Top 8 bits are cluster ID */
 #define CF_CLUSTER_SHIFT 24
 
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 610cba58fe..c57204ddad 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -21,7 +21,6 @@ static inline void gen_tb_start(const TranslationBlock *tb)
 {
 TCGv_i32 count;
 
-tcg_ctx->exitreq_label = gen_new_label();
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
 count = tcg_temp_local_new_i32();
 } else {
@@ -42,7 +41,19 @@ static inline void gen_tb_start(const TranslationBlock *tb)
 icount_start_insn = tcg_last_op();
 }
 
-tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+/*
+ * Emit the check against icount_decr.u32 to see if we should exit
+ * unless we suppress the check with CF_NOIRQ. If we are using
+ * icount and have suppressed interruption the higher level code
+ * should have ensured we don't run more instructions than the
+ * budget.
+ */
+if (tb_cflags(tb) & CF_NOIRQ) {
+tcg_ctx->exitreq_label = NULL;
+} else {
+tcg_ctx->exitreq_label = gen_new_label();
+tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+}
 
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
 tcg_gen_st16_i32(count, cpu_env,
@@ -74,8 +85,10 @@ static inline void gen_tb_end(const TranslationBlock *tb, 
int num_insns)
tcgv_i32_arg(tcg_constant_i32(num_insns)));
 }
 
-gen_set_label(tcg_ctx->exitreq_label);
-tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
+if (tcg_ctx->exitreq_label) {
+gen_set_label(tcg_ctx->exitreq_label);
+tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
+}
 }
 
 #endif
-- 
2.30.2

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread Claudio Fontana

On 11/29/21 5:57 PM, Claudio Fontana wrote:
> On 11/29/21 4:11 PM, David Woodhouse wrote:
>> On Mon, 2021-11-29 at 15:14 +0100, Claudio Fontana wrote:
>>> On 11/29/21 12:39 PM, Woodhouse, David wrote:
 On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote:
>  static void kvm_cpu_instance_init(CPUState *cs)
>  {
>  X86CPU *cpu = X86_CPU(cs);
> +X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
>
>  host_cpu_instance_init(cpu);
>
> -if (!kvm_irqchip_in_kernel()) {
> -x86_cpu_change_kvm_default("x2apic", "off");
> -} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> -x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> -}
> -
> -/* Special cases not set in the X86CPUDefinition structs: */
> +if (xcc->model) {
> +/* only applies to builtin_x86_defs cpus */
> +if (!kvm_irqchip_in_kernel()) {
> +x86_cpu_change_kvm_default("x2apic", "off");
> +} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> +x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> +}
>
> -x86_cpu_apply_props(cpu, kvm_default_props);
> +/* Special cases not set in the X86CPUDefinition structs: */
> +x86_cpu_apply_props(cpu, kvm_default_props);
> +}
>

 I think this causes a regression in x2apic and kvm-msi-ext-dest-id
 support. If you start qemu thus:
>>>
>>> If I recall correctly, this change just tries to restore the behavior prior 
>>> to
>>> commit f5cc5a5c168674f84bf061cdb307c2d25fba5448 ,
>>>
>>> fixing the issue introduced with the refactoring at that time.
>>>
>>> Can you try bisecting prior to
>>> f5cc5a5c168674f84bf061cdb307c2d25fba5448 , to see if the actual
>>> breakage comes from somewhere else?
>>
>> Hm, so it looks like it never worked for '-cpu host' *until* commit
>> f5cc5a5c16.
> 
> Right, so here we are talking about properly supporting this for the first 
> time.
> 
> The fact that it works with f5cc5a5c16 is more an accident than anything 
> else, that commit was clearly broken
> (exemplified by reports of failed boots).
> 
> So we need to find the proper solution, ie, exactly which features should be 
> enabled for which cpu classes and models.
> 
>>
>> It didn't matter before c1bb5418e3 because you couldn't enable that
>> many vCPUs without an IOMMU, and the *IOMMU* setup would call
>> kvm_enable_x2apic().
>>
>> But after that, nothing ever called kvm_enable_x2apic() in the '-cpu
>> host' case until commit f5cc5a5c16, which fixed it... until you
>> restored the previous behaviour :)
>>
>> This "works" to fix this case, but presumably isn't correct:
> 
> Right, we cannot just enable all this code, or the original refactor would 
> have been right.
> 
> These kvm default properties have been as far as I know intended for the cpu 
> actual models (builtin_x86_defs),
> and not for the special cpu classes max, host and base. This is what the 
> revert addresses.
> 
> I suspect what we actually need here is to review exactly in which specific 
> cases kvm_enable_x2apic() should be called in the end.
> 
> The code there is mixing changes to the kvm_default_props that are then 
> applied using x86_cpu_apply_props (and that part should be only for 
> xcc->model != NULL),
> with the actual enablement of the kvm x2apic using kvm_vm_enable_cap(s, 
> KVM_CAP_X2APIC_API, 0, flags) via kvm_enable_x2apic().
> 
> One way is to ignore this detail and just move out those checks, since 
> changes to kvm_default_props are harmless once we skip the 
> x86_cpu_apply_props call,
> as such: 
> 
> --
> 
> static void kvm_cpu_instance_init(CPUState *cs)
> {
> X86CPU *cpu = X86_CPU(cs);
> X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
> 
> host_cpu_instance_init(cpu);
> 
> /* only applies to builtin_x86_defs cpus */
> if (!kvm_irqchip_in_kernel()) {
> x86_cpu_change_kvm_default("x2apic", "off");
> } else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> }
> 
> if (xcc->model) {
> /* Special cases not set in the X86CPUDefinition structs: */
> x86_cpu_apply_props(cpu, kvm_default_props);
> }
> 
> if (cpu->max_features) {
> kvm_cpu_max_instance_init(cpu);
> }
> 
> kvm_cpu_xsave_init();
> }
> 
> --
> 
> this might however cause further confusion later on, and I wonder if this is 
> actually correct, should we _always_ enable x2apic when 
> kvm_irqchip_is_split() returns true?

... and only when kvm_irqchip_is_split() ?

> Even for cpu class "base"? I am not too sure.
> 
> Another option that comes to mind is to add a call to enable x2apic for max 
> features cpus only ("host", "max") and not for base.
> 
> Thoughts? Paolo, Edoardo, anything comes to mind from your side?
> 
> Ciao,
> 
> Claudio
> 
> 
>>
>> ---

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread Claudio Fontana

On 11/29/21 4:11 PM, David Woodhouse wrote:
> On Mon, 2021-11-29 at 15:14 +0100, Claudio Fontana wrote:
>> On 11/29/21 12:39 PM, Woodhouse, David wrote:
>>> On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote:
  static void kvm_cpu_instance_init(CPUState *cs)
  {
  X86CPU *cpu = X86_CPU(cs);
 +X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);

  host_cpu_instance_init(cpu);

 -if (!kvm_irqchip_in_kernel()) {
 -x86_cpu_change_kvm_default("x2apic", "off");
 -} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
 -x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
 -}
 -
 -/* Special cases not set in the X86CPUDefinition structs: */
 +if (xcc->model) {
 +/* only applies to builtin_x86_defs cpus */
 +if (!kvm_irqchip_in_kernel()) {
 +x86_cpu_change_kvm_default("x2apic", "off");
 +} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
 +x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
 +}

 -x86_cpu_apply_props(cpu, kvm_default_props);
 +/* Special cases not set in the X86CPUDefinition structs: */
 +x86_cpu_apply_props(cpu, kvm_default_props);
 +}

>>>
>>> I think this causes a regression in x2apic and kvm-msi-ext-dest-id
>>> support. If you start qemu thus:
>>
>> If I recall correctly, this change just tries to restore the behavior prior 
>> to
>> commit f5cc5a5c168674f84bf061cdb307c2d25fba5448 ,
>>
>> fixing the issue introduced with the refactoring at that time.
>>
>> Can you try bisecting prior to
>> f5cc5a5c168674f84bf061cdb307c2d25fba5448 , to see if the actual
>> breakage comes from somewhere else?
> 
> Hm, so it looks like it never worked for '-cpu host' *until* commit
> f5cc5a5c16.

Right, so here we are talking about properly supporting this for the first time.

The fact that it works with f5cc5a5c16 is more an accident than anything else, 
that commit was clearly broken
(exemplified by reports of failed boots).

So we need to find the proper solution, ie, exactly which features should be 
enabled for which cpu classes and models.

> 
> It didn't matter before c1bb5418e3 because you couldn't enable that
> many vCPUs without an IOMMU, and the *IOMMU* setup would call
> kvm_enable_x2apic().
> 
> But after that, nothing ever called kvm_enable_x2apic() in the '-cpu
> host' case until commit f5cc5a5c16, which fixed it... until you
> restored the previous behaviour :)
> 
> This "works" to fix this case, but presumably isn't correct:

Right, we cannot just enable all this code, or the original refactor would have 
been right.

These kvm default properties have been as far as I know intended for the cpu 
actual models (builtin_x86_defs),
and not for the special cpu classes max, host and base. This is what the revert 
addresses.

I suspect what we actually need here is to review exactly in which specific 
cases kvm_enable_x2apic() should be called in the end.

The code there is mixing changes to the kvm_default_props that are then applied 
using x86_cpu_apply_props (and that part should be only for xcc->model != NULL),
with the actual enablement of the kvm x2apic using kvm_vm_enable_cap(s, 
KVM_CAP_X2APIC_API, 0, flags) via kvm_enable_x2apic().

One way is to ignore this detail and just move out those checks, since changes 
to kvm_default_props are harmless once we skip the x86_cpu_apply_props call,
as such: 

--

static void kvm_cpu_instance_init(CPUState *cs)
{
X86CPU *cpu = X86_CPU(cs);
X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);

host_cpu_instance_init(cpu);

/* only applies to builtin_x86_defs cpus */
if (!kvm_irqchip_in_kernel()) {
x86_cpu_change_kvm_default("x2apic", "off");
} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
}

if (xcc->model) {
/* Special cases not set in the X86CPUDefinition structs: */
x86_cpu_apply_props(cpu, kvm_default_props);
}

if (cpu->max_features) {
kvm_cpu_max_instance_init(cpu);
}

kvm_cpu_xsave_init();
}

--

this might however cause further confusion later on, and I wonder if this is 
actually correct, should we _always_ enable x2apic when kvm_irqchip_is_split() 
returns true?
Even for cpu class "base"? I am not too sure.

Another option that comes to mind is to add a call to enable x2apic for max 
features cpus only ("host", "max") and not for base.

Thoughts? Paolo, Edoardo, anything comes to mind from your side?

Ciao,

Claudio

> 
> --- a/target/i386/kvm/kvm-cpu.c
> +++ b/target/i386/kvm/kvm-cpu.c
> @@ -161,7 +161,7 @@ static void kvm_cpu_instance_init(CPUState *cs)
>  
>  host_cpu_instance_init(cpu);
>  
> -if (xcc->model) {
> +if (1 || xcc->model) {
>  /* only applies to builtin_x86_defs cpus */
>  if

Re: [PATCH v6 04/16] linux-user/host/mips: Add safe-syscall.inc.S

2021-11-29 Thread Richard Henderson


On 11/29/21 5:40 PM, Peter Maydell wrote:

+lw  a2, 16(sp)
+lw  a3, 20(sp)
+lw  t4, 24(sp)
+lw  t5, 28(sp)
+lw  t6, 32(sp)
+lw  t7, 40(sp)
+sw  t4, 16(sp)
+sw  t5, 20(sp)
+sw  t6, 24(sp)
+sw  t7, 28(sp)


This is a varargs call, so (unless I'm confused, which is
quite possible) the caller will only allocate enough stack
space for the arguments we're actually passed, right? That
means that unless the syscall actually has 3 or more arguments
the memory at 16(sp) will be whatever the caller had on the
stack above the argument-passing area, and we can't write to
it. I think we need to actually move sp down here so we have
some space we know we can scribble on.


Yep, good catch.


r~

Re: [PULL 0/7] virtio,pci,pc: bugfixes

2021-11-29 Thread Richard Henderson


On 11/29/21 2:51 PM, Michael S. Tsirkin wrote:

The following changes since commit dd4b0de45965538f19bb40c7ddaaba384a8c613a:

   Fix version for v6.2.0-rc2 release (2021-11-26 11:58:54 +0100)

are available in the Git repository at:

   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to bacf58ca18f06f0b464466bf8c19945f19791feb:

   Fix bad overflow check in hw/pci/pcie.c (2021-11-29 08:49:36 -0500)


virtio,pci,pc: bugfixes

Lots of small fixes all over the place.

Signed-off-by: Michael S. Tsirkin 


Cindy Lu (1):
   virtio-mmio : fix the crash in the vm shutdown

Daniella Lee (1):
   Fix bad overflow check in hw/pci/pcie.c

Eugenio Pérez (1):
   vdpa: Add dummy receive callback

Jason Wang (3):
   virtio-balloon: process all in sgs for free_page_vq
   virtio-balloon: correct used length
   intel-iommu: ignore leaf SNP bit in scalable mode

Laurent Vivier (1):
   failover: fix unplug pending detection

  hw/i386/intel_iommu_internal.h |  2 ++
  hw/acpi/pcihp.c| 30 +++---
  hw/i386/intel_iommu.c  |  6 ++
  hw/pci/pcie.c  |  4 ++--
  hw/virtio/virtio-balloon.c | 13 -
  hw/virtio/virtio-mmio.c| 12 
  net/vhost-vdpa.c   |  8 
  7 files changed, 65 insertions(+), 10 deletions(-)


Applied, thanks.

r~

Re: [PATCH v6 04/16] linux-user/host/mips: Add safe-syscall.inc.S

2021-11-29 Thread Peter Maydell

On Tue, 23 Nov 2021 at 17:44, Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/host/mips/hostdep.h  |   3 +
>  linux-user/host/mips/safe-syscall.inc.S | 123 
>  2 files changed, 126 insertions(+)
>  create mode 100644 linux-user/host/mips/safe-syscall.inc.S


> +LEAF(safe_syscall_base)
> +.cfi_startproc
> +#if _MIPS_SIM == _ABIO32
> +/*
> + * The syscall calling convention is nearly the same as C:
> + * we enter with a0 == _pending
> + *   a1 == syscall number
> + *   a2, a3, stack == syscall arguments
> + *   and return the result in a0
> + * and the syscall instruction needs
> + *   v0 == syscall number
> + *   a0 ... a3, stack == syscall arguments
> + *   and returns the result in v0
> + * Shuffle everything around appropriately.
> + */
> +move    t0, a0  /* signal_pending pointer */
> +move    v0, a1  /* syscall number */
> +move    a0, a2  /* syscall arguments */
> +move    a1, a3
> +lw  a2, 16(sp)
> +lw  a3, 20(sp)
> +lw  t4, 24(sp)
> +lw  t5, 28(sp)
> +lw  t6, 32(sp)
> +lw  t7, 40(sp)
> +sw  t4, 16(sp)
> +sw  t5, 20(sp)
> +sw  t6, 24(sp)
> +sw  t7, 28(sp)

This is a varargs call, so (unless I'm confused, which is
quite possible) the caller will only allocate enough stack
space for the arguments we're actually passed, right? That
means that unless the syscall actually has 3 or more arguments
the memory at 16(sp) will be whatever the caller had on the
stack above the argument-passing area, and we can't write to
it. I think we need to actually move sp down here so we have
some space we know we can scribble on.

-- PMM

Re: [PATCH] Fix STM32F2XX USART data register readout

2021-11-29 Thread Olivier Heriveaux

If I understand correctly (I'm not a QEMU internals expert), yes, this is
what happens.
Maybe stm32f2xx_usart_can_receive() is also called, but since the
USART_SR_RXNE flag is reset before USART_DR is read, it does not
prevent reading the next character.
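
To make the ordering issue concrete, here is a minimal sketch using a toy
model, not the QEMU device code. The ToyUart type and the toy_* functions
are hypothetical names invented for illustration; toy_accept_input() only
stands in for the backend delivering the next byte once RXNE is clear.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a UART with a one-byte data register. */
typedef struct {
    uint32_t dr;       /* data register */
    uint32_t pending;  /* next byte waiting in the backend, 0 if none */
    int rxne;          /* "receive register not empty" flag */
} ToyUart;

/* Stand-in for the backend pushing the next byte once RXNE is clear. */
static void toy_accept_input(ToyUart *u)
{
    if (!u->rxne && u->pending) {
        u->dr = u->pending;
        u->pending = 0;
        u->rxne = 1;
    }
}

/* Old order: the register is read after it may have been refilled. */
static uint32_t toy_read_dr_buggy(ToyUart *u)
{
    u->rxne = 0;
    toy_accept_input(u);
    return u->dr & 0x3FF;
}

/* Patched order: latch the value first, then allow new input. */
static uint32_t toy_read_dr_fixed(ToyUart *u)
{
    uint32_t retvalue = u->dr & 0x3FF;

    u->rxne = 0;
    toy_accept_input(u);
    return retvalue;
}
```

With 'A' in the data register and 'B' pending, the old order hands 'B' to
the guest and 'A' is lost, while the patched order returns 'A' and leaves
'B' in the register for the next read.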

Best regards,
Olivier Hériveaux

On Mon, 29 Nov 2021 at 11:46, Peter Maydell wrote:

> On Sun, 28 Nov 2021 at 12:07, Olivier Hériveaux
>  wrote:
> >
> > Fix issue where the data register may be overwritten by next character
> > reception before being read and returned.
> >
> > Signed-off-by: Olivier Hériveaux 
> > ---
> >  hw/char/stm32f2xx_usart.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/char/stm32f2xx_usart.c b/hw/char/stm32f2xx_usart.c
> > index 8df0832424..fde67f4f03 100644
> > --- a/hw/char/stm32f2xx_usart.c
> > +++ b/hw/char/stm32f2xx_usart.c
> > @@ -103,10 +103,11 @@ static uint64_t stm32f2xx_usart_read(void *opaque,
> hwaddr addr,
> >  return retvalue;
> >  case USART_DR:
> >  DB_PRINT("Value: 0x%" PRIx32 ", %c\n", s->usart_dr, (char)
> s->usart_dr);
> > +retvalue = s->usart_dr & 0x3FF;
> >  s->usart_sr &= ~USART_SR_RXNE;
> > >  qemu_chr_fe_accept_input(&s->chr);
> >  qemu_set_irq(s->irq, 0);
> > -return s->usart_dr & 0x3FF;
> > +return retvalue;
> >  case USART_BRR:
> >  return s->usart_brr;
> >  case USART_CR1:
> > --
> > 2.17.1
>
> The bug happens because qemu_chr_fe_accept_input() can cause
> stm32f2xx_usart_receive() to be called, right ?
>
> Reviewed-by: Peter Maydell 
>
> I'll put this in my list of patches to take via target-arm.next for the
> 7.0 release.
>
> thanks
> -- PMM
>


[PATCH] MAINTAINERS: Change my email address

2021-11-29 Thread Eduardo Habkost

The ehabk...@redhat.com email address will stop working on
2021-12-01; change it to my personal email address.

Signed-off-by: Eduardo Habkost 
---
Note: I will probably step down as maintainer of some areas, but
I will do this later because I will need a few weeks to figure
out how much time I will be able to dedicate to QEMU.
---
 MAINTAINERS | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d3879aa3c12..7e8a586b2ae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -324,7 +324,7 @@ F: disas/sparc.c
 X86 TCG CPUs
 M: Paolo Bonzini 
 M: Richard Henderson 
-M: Eduardo Habkost 
+M: Eduardo Habkost 
 S: Maintained
 F: target/i386/tcg/
 F: tests/tcg/i386/
@@ -1628,7 +1628,7 @@ F: include/hw/i386/microvm.h
 F: pc-bios/bios-microvm.bin
 
 Machine core
-M: Eduardo Habkost 
+M: Eduardo Habkost 
 M: Marcel Apfelbaum 
 R: Philippe Mathieu-Daudé 
 S: Supported
@@ -2648,13 +2648,13 @@ F: backends/cryptodev*.c
 Python library
 M: John Snow 
 M: Cleber Rosa 
-R: Eduardo Habkost 
+R: Eduardo Habkost 
 S: Maintained
 F: python/
 T: git https://gitlab.com/jsnow/qemu.git python
 
 Python scripts
-M: Eduardo Habkost 
+M: Eduardo Habkost 
 M: Cleber Rosa 
 S: Odd Fixes
 F: scripts/*.py
@@ -2730,7 +2730,7 @@ T: git https://github.com/mdroth/qemu.git qga
 QOM
 M: Paolo Bonzini 
 R: Daniel P. Berrange 
-R: Eduardo Habkost 
+R: Eduardo Habkost 
 S: Supported
 F: docs/qdev-device-use.txt
 F: hw/core/qdev*
@@ -2750,7 +2750,7 @@ F: tests/unit/check-qom-proplist.c
 F: tests/unit/test-qdev-global-props.c
 
 QOM boilerplate conversion script
-M: Eduardo Habkost 
+M: Eduardo Habkost 
 S: Maintained
 F: scripts/codeconverter/
 
-- 
2.32.0

[PATCH v7 0/3] support dirty restraint on vCPU

2021-11-29 Thread huangy81

From: Hyman Huang(黄勇) 

The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.

Please review, thanks!

v7:
- rebase on master
- polish the comments and error message according to the
  advice given by Markus
- introduce dirtylimit_enabled function to pre-check if dirty
  page limit is enabled before canceling.

v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate can reach the quota
  more quickly.
- fix percentage update in throttle iteration.

v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change dirty rate to dirty page rate

Thanks for the careful reviews made by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove the DirtyRateQuotaVcpu and use its field as option directly
  2. add comments to show details of what dirtylimit setup do
  3. explain how to use dirtylimit in combination with existing qmp
 commands "calc-dirty-rate" and "query-dirty-rate" in documentation.

Thanks for the careful reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle function of dirtylimit_calc, including
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce 'quit' field in dirtylimit_calc_state to implement the
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since it may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return;
- other modifications have been made after code review
  1. introduce 'bmap' and 'nr' field in dirtylimit_state to record the
     number of running threads forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit threads
     are stopped
  3. do some renaming
     dirtyrate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks !

Thanks for the accurate and timely advice given by Juan. We would really
appreciate any corrections and suggestions about this patchset.

Best Regards !

Hyman

v1:
This patchset introduces a mechanism to impose dirty restraint
on a vCPU, aiming to keep the vCPU running at a certain dirtyrate
given by the user. Dirty restraint on a vCPU may be an alternative
method to implement convergence logic for live migration,
which in theory could improve guest memory performance during
migration compared with the traditional method.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU will be unnecessarily penalized
- the throttle percentage increases step by step, which seems to
  struggle to find the optimal throttle percentage when the
  dirtyrate is high
- it is hard to predict the remaining time of migration if the
  throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these
effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward. We calculate the
vCPU dirtyrate via the Dirty Ring mechanism periodically,
as commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does. For a vCPU that is specified to have dirty restraint imposed,
we throttle it periodically as auto-converge does. After each round
of throttling, we compare the quota dirtyrate with the current
dirtyrate; if the current dirtyrate is not under the quota, we
increase the throttling percentage until the current dirtyrate is
under the quota.
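
As a rough illustration of that feedback step (a simplified sketch, not
the algorithm in the patches, which uses several policies and step sizes),
one adjustment round could look like the following. The names
toy_adjust_throttle, TOY_STEP and TOY_PCT_MAX and their values are
assumptions made up for this example.

```c
#include <stdint.h>

#define TOY_PCT_MAX 99  /* assumed upper bound on the throttle percentage */
#define TOY_STEP    5   /* assumed fixed increment per adjustment round */

/*
 * One round of the feedback loop: keep the percentage if the measured
 * dirtyrate is already under the quota, otherwise raise it by one step,
 * clamped to the maximum.
 */
static int toy_adjust_throttle(int pct, uint64_t current, uint64_t quota)
{
    if (current < quota) {
        return pct;
    }
    pct += TOY_STEP;
    return pct > TOY_PCT_MAX ? TOY_PCT_MAX : pct;
}
```

Calling this once per measurement period raises the percentage only while
the vCPU is still dirtying memory at or above its quota.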

This patchset is the basis for implementing a new auto-converge method
for live migration. We introduce two qmp commands to impose/cancel
the dirty restraint on a specified vCPU, so it can also serve as an
independent api for upper-layer apps such as libvirt, which can use it
to implement the convergence logic during live migration, supplemented
with the qmp 'calc-dirty-rate' command or whatever.

We post this patchset for RFC, and any corrections and suggestions about
the implementation, api, throttling algorithm or whatever are very
much appreciated!

Please review, thanks !

Best Regards !

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty

[PATCH v7 2/3] cpu-throttle: implement vCPU throttle

2021-11-29 Thread huangy81

From: Hyman Huang(黄勇) 

Impose dirty restraint on a vCPU by kicking it and making it sleep,
as auto-converge does during migration, but kick only the
specified vCPU instead of all vCPUs of the VM.

Start a thread to track the dirtylimit status and adjust
the throttle percentage dynamically depending on the current
and quota dirtyrate.
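
For orientation, the way a throttle percentage turns into a sleep time can
be sketched as below. This mirrors the arithmetic visible in
dirtylimit_vcpu_thread() in the diff, but the function name
toy_sleeptime_ns and the timeslice parameter are assumptions for
illustration, and the percentage is assumed to stay below 100.

```c
#include <stdint.h>

/*
 * Sleep time for one timeslice of run time. With pct = percentage / 100,
 * the vCPU sleeps pct / (1 - pct) timeslices per timeslice it runs, so
 * it is asleep for a fraction pct of the total time.
 * The percentage must be below 100 to avoid dividing by zero.
 */
static int64_t toy_sleeptime_ns(int percentage, int64_t timeslice_ns)
{
    double pct = (double)percentage / 100;
    double throttle_ratio = pct / (1 - pct);

    /* Add 1ns to compensate for double rounding (like 0.999...) */
    return (int64_t)(throttle_ratio * timeslice_ns + 1);
}
```

At 50% the ratio is 1, so the vCPU sleeps for about as long as it runs;
as the percentage approaches 99 the sleep time grows to roughly 99
timeslices per timeslice of run time.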

Introduce the util function in the header for the dirtylimit
implementation.

Signed-off-by: Hyman Huang(黄勇) 
---
 include/sysemu/cpu-throttle.h |  30 
 softmmu/cpu-throttle.c| 316 ++
 softmmu/trace-events  |   5 +
 3 files changed, 351 insertions(+)

diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
index d65bdef..334e5e2 100644
--- a/include/sysemu/cpu-throttle.h
+++ b/include/sysemu/cpu-throttle.h
@@ -65,4 +65,34 @@ bool cpu_throttle_active(void);
  */
 int cpu_throttle_get_percentage(void);
 
+/**
+ * dirtylimit_enabled
+ *
+ * Returns: %true if dirty page limit for vCPU is enabled, %false otherwise.
+ */
+bool dirtylimit_enabled(int cpu_index);
+
+/**
+ * dirtylimit_state_init:
+ *
+ * initialize global state for dirtylimit
+ */
+void dirtylimit_state_init(int max_cpus);
+
+/**
+ * dirtylimit_vcpu:
+ *
+ * impose dirtylimit on vcpu until reaching the quota dirtyrate
+ */
+void dirtylimit_vcpu(int cpu_index,
+ uint64_t quota);
+/**
+ * dirtylimit_cancel_vcpu:
+ *
+ * cancel dirtylimit for the specified vcpu
+ *
+ * Returns: the number of running threads for dirtylimit
+ */
+int dirtylimit_cancel_vcpu(int cpu_index);
+
 #endif /* SYSEMU_CPU_THROTTLE_H */
diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c
index 8c2144a..f199d68 100644
--- a/softmmu/cpu-throttle.c
+++ b/softmmu/cpu-throttle.c
@@ -29,6 +29,8 @@
 #include "qemu/main-loop.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-throttle.h"
+#include "sysemu/dirtylimit.h"
+#include "trace.h"
 
 /* vcpu throttling controls */
 static QEMUTimer *throttle_timer;
@@ -38,6 +40,320 @@ static unsigned int throttle_percentage;
 #define CPU_THROTTLE_PCT_MAX 99
 #define CPU_THROTTLE_TIMESLICE_NS 1000
 
+#define DIRTYLIMIT_TOLERANCE_RANGE  15  /* 15MB/s */
+
+#define DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK 75
+#define DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK90
+
+#define DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE 5
+#define DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE2
+
+typedef enum {
+RESTRAIN_KEEP,
+RESTRAIN_RATIO,
+RESTRAIN_HEAVY,
+RESTRAIN_SLIGHT,
+} RestrainPolicy;
+
+typedef struct DirtyLimitState {
+int cpu_index;
+bool enabled;
+uint64_t quota; /* quota dirtyrate MB/s */
+QemuThread thread;
+char *name; /* thread name */
+} DirtyLimitState;
+
+struct {
+DirtyLimitState *states;
+int max_cpus;
+unsigned long *bmap; /* running thread bitmap */
+unsigned long nr;
+} *dirtylimit_state;
+
+bool dirtylimit_enabled(int cpu_index)
+{
+return qatomic_read(&dirtylimit_state->states[cpu_index].enabled);
+}
+
+static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota)
+{
+qatomic_set(&dirtylimit_state->states[cpu_index].quota, quota);
+}
+
+static inline uint64_t dirtylimit_quota(int cpu_index)
+{
+return qatomic_read(&dirtylimit_state->states[cpu_index].quota);
+}
+
+static int64_t dirtylimit_current(int cpu_index)
+{
+return dirtylimit_calc_current(cpu_index);
+}
+
+static void dirtylimit_vcpu_thread(CPUState *cpu, run_on_cpu_data data)
+{
+double pct;
+double throttle_ratio;
+int64_t sleeptime_ns, endtime_ns;
+int *percentage = (int *)data.host_ptr;
+
+pct = (double)(*percentage) / 100;
+throttle_ratio = pct / (1 - pct);
+/* Add 1ns to fix double's rounding error (like 0.999...) */
+sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
+endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
+while (sleeptime_ns > 0 && !cpu->stop) {
+if (sleeptime_ns > SCALE_MS) {
+qemu_cond_timedwait_iothread(cpu->halt_cond,
+ sleeptime_ns / SCALE_MS);
+} else {
+qemu_mutex_unlock_iothread();
+g_usleep(sleeptime_ns / SCALE_US);
+qemu_mutex_lock_iothread();
+}
+sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+}
+qatomic_set(&cpu->throttle_thread_scheduled, 0);
+
+free(percentage);
+}
+
+static void dirtylimit_check(int cpu_index,
+ int percentage)
+{
+CPUState *cpu;
+int64_t sleeptime_ns, starttime_ms, currenttime_ms;
+int *pct_parameter;
+double pct;
+
+pct = (double) percentage / 100;
+
+starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+while (true) {
+CPU_FOREACH(cpu) {
+if ((cpu_index == cpu->cpu_index) &&
+(!qatomic_xchg(&cpu->throttle_thread_scheduled, 1))) {
+pct_parameter = malloc(sizeof(*pct_parameter));
+*pct_parameter = percentage;
+

[PATCH v7 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically

2021-11-29 Thread huangy81

From: Hyman Huang(黄勇) 

Introduce a third dirty-tracking method, GLOBAL_DIRTY_LIMIT, to
calculate the dirty rate periodically for dirty restraint.

Implement a thread that calculates the dirty rate periodically; it will
be used for dirty restraint.

Add dirtylimit.h to introduce the utility functions for the dirty
limit implementation.

Signed-off-by: Hyman Huang(黄勇) 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  44 ++
 migration/dirtyrate.c   | 139 
 migration/dirtyrate.h   |   2 +
 4 files changed, 179 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27..606bec8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 000..49298a2
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,44 @@
+/*
+ * dirty limit helper functions
+ *
+ * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_PERIOD_TIME_S   15  /* 15s */
+
+/**
+ * dirtylimit_calc_current:
+ *
+ * get current dirty page rate for specified vCPU.
+ */
+int64_t dirtylimit_calc_current(int cpu_index);
+
+/**
+ * dirtylimit_calc:
+ *
+ * start dirty page rate calculation thread.
+ */
+void dirtylimit_calc(void);
+
+/**
+ * dirtylimit_calc_quit:
+ *
+ * quit dirty page rate calculation thread.
+ */
+void dirtylimit_calc_quit(void);
+
+/**
+ * dirtylimit_calc_state_init:
+ *
+ * initialize dirty page rate calculation state.
+ */
+void dirtylimit_calc_state_init(int max_cpus);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..d370a21 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -27,6 +27,7 @@
 #include "qapi/qmp/qdict.h"
 #include "sysemu/kvm.h"
 #include "sysemu/runstate.h"
+#include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 
 /*
@@ -46,6 +47,134 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
+#define DIRTYLIMIT_CALC_TIME_MS 1000    /* 1000ms */
+
+struct {
+DirtyRatesData data;
+int64_t period;
+bool quit;
+} *dirtylimit_calc_state;
+
+static void dirtylimit_global_dirty_log_start(void)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
+qemu_mutex_unlock_iothread();
+}
+
+static void dirtylimit_global_dirty_log_stop(void)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
+qemu_mutex_unlock_iothread();
+}
+
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static void dirtylimit_calc_func(void)
+{
+CPUState *cpu;
+DirtyPageRecord *dirty_pages;
+int64_t start_time, end_time, calc_time;
+DirtyRateVcpu rate;
+int i = 0;
+
+dirty_pages = g_malloc0(sizeof(*dirty_pages) *
+dirtylimit_calc_state->data.nvcpu);
+
+dirtylimit_global_dirty_log_start();
+
+CPU_FOREACH(cpu) {
+record_dirtypages(dirty_pages, cpu, true);
+}
+
+start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
+end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+calc_time = end_time - start_time;
+
+dirtylimit_global_dirty_log_stop();
+
+CPU_FOREACH(cpu) {
+record_dirtypages(dirty_pages, cpu, false);
+}
+
+for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
+uint64_t increased_dirty_pages =
+dirty_pages[i].end_pages - dirty_pages[i].start_pages;
+uint64_t memory_size_MB =
+(increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
+
+rate.id = i;
+rate.dirty_rate  = dirtyrate;
+dirtylimit_calc_state->data.rates[i] = rate;
+
+trace_dirtyrate_do_calculate_vcpu(i,
+dirtylimit_calc_state->data.rates[i].dirty_rate);
+}
+}
+
+static void
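
As a hedged illustration of the per-vCPU arithmetic in dirtylimit_calc_func()
above, the conversion from a dirty-page delta to MB/s can be checked in
isolation. This is only a sketch: the helper name dirty_rate_mb_per_s and the
4 KiB page size are assumptions made for the example (the patch uses
TARGET_PAGE_SIZE), and nothing here is part of the posted series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch; the patch uses TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Mirror of the arithmetic in dirtylimit_calc_func(): convert the number
 * of pages dirtied during calc_time_ms milliseconds into MB/s.
 */
static int64_t dirty_rate_mb_per_s(uint64_t start_pages, uint64_t end_pages,
                                   int64_t calc_time_ms)
{
    assert(calc_time_ms > 0);

    uint64_t increased_dirty_pages = end_pages - start_pages;
    /* The shift truncates to whole megabytes before scaling to seconds. */
    uint64_t memory_size_mb = (increased_dirty_pages * SKETCH_PAGE_SIZE) >> 20;

    return (int64_t)(memory_size_mb * 1000) / calc_time_ms;
}
```

With these assumptions, 25600 pages dirtied in 1000 ms come out as 100 MB/s,
while a delta smaller than 1 MB (for example 255 pages) is reported as 0
because the shift happens before the multiplication.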

Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever

2021-11-29 Thread Dr. David Alan Gilbert

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Mon, Nov 29, 2021 at 11:20:08AM +, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> > > > When doing live migration with multifd channels 8, 16 or larger number,
> > > > the guest hangs in the presence of the network errors such as missing 
> > > > TCP ACKs.
> > > > 
> > > > At sender's side:
> > > > The main thread is blocked on qemu_thread_join, migration_fd_cleanup
> > > > is called because one thread fails on qio_channel_write_all when
> > > > the network problem happens and other send threads are blocked on 
> > > > sendmsg.
> > > > They could not be terminated. So the main thread is blocked on 
> > > > qemu_thread_join
> > > > to wait for the threads terminated.
> > > 
> > > Isn't the right answer here to ensure we've called 'shutdown' on
> > > all the FDs, so that the threads get kicked out of sendmsg, before
> > > trying to join the thread ?
> > 
> > I agree a timeout is wrong here; there is no way to get a good timeout
> > value.
> > However, I'm a bit confused - we should be able to try a shutdown on the
> > receive side using the 'yank' command. - that's what it's there for; Li
> > does this solve your problem?
> 
> Why do we even need to use 'yank' on the receive side ? Until migration
> has switched over from src to dst, the receive side is discardable and
> the whole process can just be terminated with kill(SIGTERM/SIGKILL).

True, although it's nice to be able to quit cleanly.

> On the source side 'yank' is needed, because the QEMU process is still
> running the live workload and thus is precious and mustn't be killed.

True.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever

2021-11-29 Thread Li Zhang




On 11/29/21 3:50 PM, Dr. David Alan Gilbert wrote:
> * Li Zhang (lizh...@suse.de) wrote:
> > On 11/29/21 12:20 PM, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > > On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> > > > > When doing live migration with multifd channels 8, 16 or larger number,
> > > > > the guest hangs in the presence of the network errors such as missing TCP ACKs.
> > > > >
> > > > > At sender's side:
> > > > > The main thread is blocked on qemu_thread_join, migration_fd_cleanup
> > > > > is called because one thread fails on qio_channel_write_all when
> > > > > the network problem happens and other send threads are blocked on sendmsg.
> > > > > They could not be terminated. So the main thread is blocked on qemu_thread_join
> > > > > to wait for the threads terminated.
> > > > Isn't the right answer here to ensure we've called 'shutdown' on
> > > > all the FDs, so that the threads get kicked out of sendmsg, before
> > > > trying to join the thread ?
> > > I agree a timeout is wrong here; there is no way to get a good timeout
> > > value.
> > > However, I'm a bit confused - we should be able to try a shutdown on the
> > > receive side using the 'yank' command. - that's what it's there for; Li
> > > does this solve your problem?
> >
> > No, I tried to register 'yank' on the receive side, the receive threads are
> > still waiting there.
> >
> > It seems that on send side, 'yank' doesn't work either when the send threads
> > are blocked.
> >
> > This may be not the case to call yank. I am not quite sure about it.
>
> We need to fix that; 'yank' should be able to recover from any network
> issue.  If it's not working we need to understand why.

OK, I will look into it.

> > > multifd_load_cleanup already kicks sem_sync before trying to do a
> > > thread_join - so have we managed to trigger that on the receive side?
> >
> > There is no problem with sem_sync in function multifd_load_cleanup.
> >
> > But it is not called in my case, because no errors are detected on the
> > receive side.
>
> If you're getting TCP errors why aren't you seeing any errors on the
> receive side?

That's a good point. I need to find out.

> > The problem is here:
> >
> > void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> > {
> >     MigrationIncomingState *mis = migration_incoming_get_current();
> >     Error *local_err = NULL;
> >     bool start_migration;
> >
> >     ...
> >
> >     if (!mis->from_src_file) {
> >
> >         ...
> >
> >     } else {
> >         /* Multiple connections */
> >         assert(migrate_use_multifd());
> >         start_migration = multifd_recv_new_channel(ioc, &local_err);
> >         if (local_err) {
> >             error_propagate(errp, local_err);
> >             return;
> >         }
> >     }
> >     if (start_migration) {
> >         migration_incoming_process();
> >     }
> > }
> >
> > start_migration is always 0, and migration is not started because some
> > receive threads are not created.
> >
> > No errors are detected here and the main process works well but receive
> > threads are all waiting for semaphore.
> >
> > It's hard to know if the receive threads are not created. If we can find a
> > way to check if any receive threads
>
> So is this only a problem for network issues that happen during startup,
> before all the threads have been created?

Yes, it is.

> Dave
>
> > are not created, we can kick the sem_sync and do cleanup.
> >
> > From the source code, the thread will be created when QIO channel detects
> > something by GIO watch if I understand correctly.
> >
> > If nothing is detected, socket_accept_incoming_migration won't be called, the
> > thread will not be created.
> >
> > socket_start_incoming_migration_internal ->
> >     qio_net_listener_set_client_func_full(listener,
> >                                           socket_accept_incoming_migration,
> >                                           NULL, NULL,
> >                                           g_main_context_get_thread_default());
> >
> >   qio_net_listener_set_client_func_full ->
> >     qio_channel_add_watch_source(
> >         QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> >         qio_net_listener_channel_func,
> >         listener, (GDestroyNotify)object_unref, context);
> >
> >   socket_accept_incoming_migration ->
> >     migration_channel_process_incoming ->
> >       migration_ioc_process_incoming ->
> >         multifd_recv_new_channel ->
> >           qemu_thread_create(&p->thread, p->name,
> >                              multifd_recv_thread, p,
> >                              QEMU_THREAD_JOINABLE);
> >
> > > Dave
> > >
> > > > Regards,
> > > > Daniel
> > > > --
> > > > |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> > > > |: https://libvirt.org -o-https://fstop138.berrange.com :|
> > > > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC PATCH 0/2] QEMU/openbios: PPC Software TLB support in the G4 family

2021-11-29 Thread Fabiano Rosas

Cédric Le Goater  writes:

>>> Right. If we're doing this to say "I can boot a kernel with a 7450 cpu in 
>>> QEMU" but
>>> the implementation is different from real hardware, then I'm not sure what 
>>> the real
>>> value is. That effectively leaves option b) if someone is willing to do the 
>>> work, or
>>> as you say to simply remove the code from QEMU.
>> 
>> Yeah, that is a good point. Although the software TLB is well contained,
>> so we could certainly document that our 7450s don't have that feature
>> and call it a day. Does QEMU have any policy on how much of a machine is
>> required to be implemented?
>> 
>> I am more inclined to apply c) for now as I said, just to have some code
>> running on the CPU and maybe document in a gitlab issue that we're
>> lacking the runtime switch and eventually implement that. It's not like
>> this is high traffic code anyway. It has been broken for 10+ years.
>> 
>> That said, if Cédric and Daniel see more value in moving the 7450s to
>> the POWERPC_MMU_32B I won't oppose.
>
> I am in favor of dropping unused code in QEMU and keeping the CPUs for
> which we have support in Linux using the POWERPC_MMU_32B in QEMU and the
> openbios patch. If we need SoftTLB support for the 74x CPUs in QEMU, we
> can always dig in the history.

Ack. I'll send a v2.

>
> We can give FreeBSD a try also since they had support for the G4 :
>
>https://people.freebsd.org/~arved/stuff/minimac
>
>
> With the openbios patch, Linux boots fine under 7450, 7455, 7447 CPUs.
>
> Under 7448, it drops in xmon with a :
>   
> kernel tried to execute exec-protected page (c07fdd98) - exploit attempt? 
> (uid: 0)
> BUG: Unable to handle kernel instruction fetch
> Faulting instruction address: 0xc07fdd98
> Vector: 400 (Instruction Access) at [f1019d30]
>  pc: c07fdd98: __do_softirq+0x0/0x2f0
>  lr: c00516a4: irq_exit+0xbc/0xf8
>  sp: f1019df0
> msr: 10001032
>current = 0xc0d0
>  pid   = 1, comm = swapper

I see two possible issues:

1) The 7448 is configured as a 7400 in QEMU (cpu-models.c), so it will
behave differently from the 7450s. The user manual seems to indicate it
is closer to a 7445 than a 7400. We need to double check what is correct.

2) OpenBIOS already has support for the 7448 PVR without my patch, but
given that no other cpu of the 7450 family is supported, I'd say this is
accidental. The mask that OpenBIOS uses for e600/MPC86xx is:

    .iu_version = 0x80040000,
    .name = "PowerPC,MPC86xx",

And the verification:

    iu_version = mfpvr() & 0xffff0000;

    for (i = 0; i < sizeof(ppc_defs) / sizeof(struct cpudef); i++) {
        if (iu_version == ppc_defs[i].iu_version)
            return &ppc_defs[i];
    }
    printk("Unknown cpu (pvr %x), freezing!\n", iu_version);

But QEMU says the PVRs are as follows:

CPU_POWERPC_e600   = 0x80040010,
#define CPU_POWERPC_MPC8610  CPU_POWERPC_e600
#define CPU_POWERPC_MPC8641  CPU_POWERPC_e600
#define CPU_POWERPC_MPC8641D CPU_POWERPC_e600

CPU_POWERPC_7448_v10   = 0x80040100,
CPU_POWERPC_7448_v11   = 0x80040101,
CPU_POWERPC_7448_v20   = 0x80040200,
CPU_POWERPC_7448_v21   = 0x80040201,

So by applying the mask, OpenBIOS is matching both 0x80040100 and
 0x80040010 when it looks like it only wants to match the latter.
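
The overlap described above can be reproduced with a small stand-alone
sketch. The two PVR values are the ones quoted from QEMU; the upper-16-bit
mask, the single table entry, and all names ending in _sketch are assumptions
made for illustration and are not copied from OpenBIOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PVR values quoted from QEMU in the message above. */
#define PVR_E600      0x80040010u
#define PVR_7448_V10  0x80040100u

/* Assumed mask: keep only the upper 16 bits of the PVR. */
#define PVR_FAMILY_MASK 0xffff0000u

struct cpudef_sketch {
    uint32_t iu_version;
    const char *name;
};

/* Hypothetical one-entry table standing in for the real CPU list. */
static const struct cpudef_sketch defs_sketch[] = {
    { 0x80040000u, "PowerPC,MPC86xx" },
};

/* Return the table entry whose iu_version equals the masked PVR, or NULL. */
static const struct cpudef_sketch *lookup_by_pvr(uint32_t pvr)
{
    uint32_t iu_version = pvr & PVR_FAMILY_MASK;

    for (size_t i = 0; i < sizeof(defs_sketch) / sizeof(defs_sketch[0]); i++) {
        if (iu_version == defs_sketch[i].iu_version) {
            return &defs_sketch[i];
        }
    }
    return NULL;
}
```

Under these assumptions both the e600 and the 7448 v1.0 PVR resolve to the
same "PowerPC,MPC86xx" entry, because the bits that tell them apart sit in
the lower half that the mask discards.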

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread David Woodhouse

On Mon, 2021-11-29 at 15:14 +0100, Claudio Fontana wrote:
> On 11/29/21 12:39 PM, Woodhouse, David wrote:
> > On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote:
> > >  static void kvm_cpu_instance_init(CPUState *cs)
> > >  {
> > >  X86CPU *cpu = X86_CPU(cs);
> > > +X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
> > > 
> > >  host_cpu_instance_init(cpu);
> > > 
> > > -if (!kvm_irqchip_in_kernel()) {
> > > -x86_cpu_change_kvm_default("x2apic", "off");
> > > -} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> > > -x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> > > -}
> > > -
> > > -/* Special cases not set in the X86CPUDefinition structs: */
> > > +if (xcc->model) {
> > > +/* only applies to builtin_x86_defs cpus */
> > > +if (!kvm_irqchip_in_kernel()) {
> > > +x86_cpu_change_kvm_default("x2apic", "off");
> > > +} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> > > +x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> > > +}
> > > 
> > > -x86_cpu_apply_props(cpu, kvm_default_props);
> > > +/* Special cases not set in the X86CPUDefinition structs: */
> > > +x86_cpu_apply_props(cpu, kvm_default_props);
> > > +}
> > > 
> > 
> > I think this causes a regression in x2apic and kvm-msi-ext-dest-id
> > support. If you start qemu thus:
> 
> If I recall correctly, this change just tries to restore the behavior prior to
> commit f5cc5a5c168674f84bf061cdb307c2d25fba5448 ,
> 
> fixing the issue introduced with the refactoring at that time.
> 
> Can you try bisecting prior to
> f5cc5a5c168674f84bf061cdb307c2d25fba5448 , to see if the actual
> breakage comes from somewhere else?

Hm, so it looks like it never worked for '-cpu host' *until* commit
f5cc5a5c16.

It didn't matter before c1bb5418e3 because you couldn't enable that
many vCPUs without an IOMMU, and the *IOMMU* setup would call
kvm_enable_x2apic().

But after that, nothing ever called kvm_enable_x2apic() in the '-cpu
host' case until commit f5cc5a5c16, which fixed it... until you
restored the previous behaviour :)

This "works" to fix this case, but presumably isn't correct:

--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -161,7 +161,7 @@ static void kvm_cpu_instance_init(CPUState *cs)
 
 host_cpu_instance_init(cpu);
 
-if (xcc->model) {
+if (1 || xcc->model) {
 /* only applies to builtin_x86_defs cpus */
 if (!kvm_irqchip_in_kernel()) {
 x86_cpu_change_kvm_default("x2apic", "off");


> > Any image to specifically test out? Would an actual 9 sockets machine be 
> > required to reproduce this?

No, but the more CPUs you have in the host the less you have to wait
for 288 vCPUs to spin up :)

My test is:

./qemu-system-x86_64 -machine q35,accel=kvm,usb=off,kernel_irqchip=split \
    -cpu host -m 2G -smp sockets=9,cores=16,threads=2 \
    -drive file=/var/lib/libvirt/images/fedora.qcow2,if=virtio \
    -serial mon:stdio -display none \
    -kernel ~/git/linux/arch/x86/boot/bzImage \
    -append "console=ttyS0,115200 root=/dev/vda1"


I then play with the affinity of the AHCI MSI. Pointing it at CPU 255
should show the problem. 

[root@localhost ~]# cd /proc/irq/313
[root@localhost 313]# echo 255 > smp_affinity_list 
[root@localhost 313]#
[   65.365821] Composed MSI for APIC 255 vector 0x22: 0/feeff000 22
[root@localhost 313]# grep ahci /proc/interrupts 


I also added some debugging into host and guest kernels to be a little
more explicit:

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b70344bf6600..53191db5145d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1866,6 +1866,7 @@ static __init void try_to_enable_x2apic(int remap_mode)
 * used for non-remapped IRQ domains.
 */
if (x86_init.hyper.msi_ext_dest_id()) {
+   pr_info("x2apic: support extended destination ID\n");
virt_ext_dest_id = 1;
apic_limit = 32767;
}
@@ -2539,6 +2540,7 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8;
else
WARN_ON_ONCE(cfg->dest_apicid > 0xFF);
+   printk("Composed MSI for APIC %d vector 0x%x: %x/%x %x\n", cfg->dest_apicid, cfg->vector, msg->address_hi, msg->address_lo, msg->data);
 }
 
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 59abbdad7729..f0a7715763a2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -856,6 +856,8 @@ static void __init kvm_apic_init(void)
 
 static bool __init kvm_msi_ext_dest_id(void)
 {
+   printk("dest id? %d (%x)\n", kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID),
+  kvm_arch_para_features());
return

Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever

2021-11-29 Thread Daniel P . Berrangé

On Mon, Nov 29, 2021 at 11:20:08AM +, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> > > When doing live migration with multifd channels 8, 16 or larger number,
> > > the guest hangs in the presence of the network errors such as missing TCP 
> > > ACKs.
> > > 
> > > At sender's side:
> > > The main thread is blocked on qemu_thread_join, migration_fd_cleanup
> > > is called because one thread fails on qio_channel_write_all when
> > > the network problem happens and other send threads are blocked on sendmsg.
> > > They could not be terminated. So the main thread is blocked on 
> > > qemu_thread_join
> > > to wait for the threads terminated.
> > 
> > Isn't the right answer here to ensure we've called 'shutdown' on
> > all the FDs, so that the threads get kicked out of sendmsg, before
> > trying to join the thread ?
> 
> I agree a timeout is wrong here; there is no way to get a good timeout
> value.
> However, I'm a bit confused - we should be able to try a shutdown on the
> receive side using the 'yank' command. - that's what it's there for; Li
> does this solve your problem?

Why do we even need to use 'yank' on the receive side ? Until migration
has switched over from src to dst, the receive side is discardable and
the whole process can just be terminated with kill(SIGTERM/SIGKILL).

On the source side 'yank' is needed, because the QEMU process is still
running the live workload and thus is precious and mustn't be killed.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
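
The socket behaviour that the 'shutdown' suggestion relies on can be shown
with a minimal single-threaded sketch. It is not QEMU code and assumes Linux
semantics for AF_UNIX stream sockets; the function name shutdown_unblocks_io
is made up for the example. After shutdown(), a recv() that would otherwise
wait for data returns end-of-stream at once, and send() fails with EPIPE,
which is the same effect that lets a thread stuck in sendmsg() or recvmsg()
return when another thread shuts the descriptor down.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected AF_UNIX stream pair, shut one end down, and check
 * what recv() and send() on that end do afterwards.
 * Returns 0 if both calls behave as described, -1 otherwise.
 */
static int shutdown_unblocks_io(void)
{
    int sv[2];
    char byte = 'x';
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* No data was ever sent, so without shutdown() a recv() would block. */
    if (shutdown(sv[0], SHUT_RDWR) < 0) {
        ok = -1;
    }

    /* After SHUT_RD, recv() returns 0 (end of stream) instead of waiting. */
    if (ok == 0 && recv(sv[0], &byte, 1, 0) != 0) {
        ok = -1;
    }

    /* After SHUT_WR, send() fails; MSG_NOSIGNAL suppresses SIGPIPE. */
    if (ok == 0) {
        ssize_t n = send(sv[0], &byte, 1, MSG_NOSIGNAL);
        if (!(n < 0 && errno == EPIPE)) {
            ok = -1;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

The sketch only demonstrates the return values; in the multi-threaded case
discussed in the thread, the shutdown() call would come from the thread that
wants to join the blocked workers.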

Re: [PATCH v2 2/2] virtio-net: Fix log message

2021-11-29 Thread Eugenio Perez Martin

On Mon, Nov 29, 2021 at 2:43 PM Michael S. Tsirkin  wrote:
>
> On Fri, Nov 26, 2021 at 10:54:32AM +0800, Jason Wang wrote:
> > On Thu, Nov 25, 2021 at 6:16 PM Eugenio Pérez  wrote:
> > >
> > > The message has never been true in the case of non tap networking, so
> > > only tell that userland networking will be used if possible.
> > >
> > > Signed-off-by: Eugenio Pérez 
> >
> > Acked-by: Jason Wang 
>
> Breaks make check. I suspect it's called without a peer or something.
>

You're right, sending it as a separate patch since I saw the other one
made it into the pull request.

Thanks!

> Dropped for 6.2.
>
> > > ---
> > >  hw/net/virtio-net.c | 11 ++-
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index f2014d5ea0..d6c98c3c2d 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -245,6 +245,7 @@ static void virtio_net_vhost_status(VirtIONet *n, 
> > > uint8_t status)
> > >  NetClientState *nc = qemu_get_queue(n->nic);
> > >  int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > >  int cvq = n->max_ncs - n->max_queue_pairs;
> > > +bool tap_backend = nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
> > >
> > >  if (!get_vhost_net(nc->peer)) {
> > >  return;
> > > @@ -258,9 +259,9 @@ static void virtio_net_vhost_status(VirtIONet *n, 
> > > uint8_t status)
> > >  int r, i;
> > >
> > >  if (n->needs_vnet_hdr_swap) {
> > > -error_report("backend does not support %s vnet headers; "
> > > - "falling back on userspace virtio",
> > > - virtio_is_big_endian(vdev) ? "BE" : "LE");
> > > +error_report("backend does not support %s vnet headers%s",
> > > +virtio_is_big_endian(vdev) ? "BE" : "LE",
> > > > +tap_backend ? "; falling back on userspace virtio" : "");
> > >  return;
> > >  }
> > >
> > > @@ -288,8 +289,8 @@ static void virtio_net_vhost_status(VirtIONet *n, 
> > > uint8_t status)
> > >  n->vhost_started = 1;
> > >  r = vhost_net_start(vdev, n->nic->ncs, queue_pairs, cvq);
> > >  if (r < 0) {
> > > -error_report("unable to start vhost net: %d: "
> > > - "falling back on userspace virtio", -r);
> > > +error_report("unable to start vhost net: %d%s", -r,
> > > +   tap_backend ? " falling back on userspace virtio" 
> > > : "");
> > >  n->vhost_started = 0;
> > >  }
> > >  } else {
> > > --
> > > 2.27.0
> > >
>
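
If the suspicion above is right and the function can run without a peer,
then the new tap_backend initializer dereferences nc->peer before the
existing get_vhost_net(nc->peer) check has a chance to return. The following
is a minimal sketch of a guarded test under that assumption. The types and
names (sketch_client, sketch_info, sketch_peer_is_tap) are hypothetical
stand-ins, not QEMU's NetClientState, and this is not the follow-up patch
mentioned in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's net client types; not the real structs. */
enum sketch_driver {
    SKETCH_DRIVER_NONE,
    SKETCH_DRIVER_TAP,
    SKETCH_DRIVER_USER,
};

struct sketch_info {
    enum sketch_driver type;
};

struct sketch_client {
    const struct sketch_info *info;
    struct sketch_client *peer;
};

/* True only when a peer exists and that peer is a tap backend. */
static bool sketch_peer_is_tap(const struct sketch_client *nc)
{
    return nc != NULL && nc->peer != NULL && nc->peer->info != NULL &&
           nc->peer->info->type == SKETCH_DRIVER_TAP;
}
```

Evaluating the backend type through a helper like this, or only after the
peer check has passed, keeps the log-message decision from faulting when no
peer is attached.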

[PATCH] block/file-posix.c: Fix compilation on macOS SDKs <10.12.4

2021-11-29 Thread Evan Miller

fpunchhole_t was introduced in the macOS 10.12.4 SDK. For reference, see:

https://developer.apple.com/documentation/kernel/fpunchhole_t

Test the SDK version before attempting any fpunchhole_t-related logic.


Signed-off-by: Evan Miller 

--- block/file-posix.c.orig
+++ block/file-posix.c
@@ -1830,7 +1830,9 @@
 ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
aiocb->aio_offset, aiocb->aio_nbytes);
 ret = translate_err(-errno);
-#elif defined(__APPLE__) && (__MACH__)
+#elif defined(__APPLE__) && (__MACH__) && \
+  defined(__MAC_OS_X_VERSION_MAX_ALLOWED) && \
+  __MAC_OS_X_VERSION_MAX_ALLOWED >= 101204
 fpunchhole_t fpunchhole;
 fpunchhole.fp_flags = 0;
 fpunchhole.reserved = 0;
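
The guard added by this patch can be tried on its own with a small sketch.
It assumes that on Apple platforms <Availability.h> provides
__MAC_OS_X_VERSION_MAX_ALLOWED for the SDK in use; the macro
SKETCH_HAVE_FPUNCHHOLE and the function sketch_have_fpunchhole are made-up
names for the example. Note that the sketch tests defined(__MACH__), whereas
the patch keeps the existing (__MACH__) form.

```c
#include <assert.h>

/*
 * On Apple platforms <Availability.h> defines
 * __MAC_OS_X_VERSION_MAX_ALLOWED to the version of the SDK being used.
 */
#if defined(__APPLE__)
#include <Availability.h>
#endif

/* 1 if the SDK is new enough to declare fpunchhole_t, 0 otherwise. */
#if defined(__APPLE__) && defined(__MACH__) && \
    defined(__MAC_OS_X_VERSION_MAX_ALLOWED) && \
    __MAC_OS_X_VERSION_MAX_ALLOWED >= 101204
#define SKETCH_HAVE_FPUNCHHOLE 1
#else
#define SKETCH_HAVE_FPUNCHHOLE 0
#endif

/* Report at run time which branch the preprocessor selected. */
static int sketch_have_fpunchhole(void)
{
    return SKETCH_HAVE_FPUNCHHOLE;
}
```

On a non-Apple host, or with an SDK older than 10.12.4, the guard evaluates
to 0 and the fpunchhole_t code path is simply not compiled.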

Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever

2021-11-29 Thread Dr. David Alan Gilbert

* Li Zhang (lizh...@suse.de) wrote:
> 
> On 11/29/21 12:20 PM, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> > > > When doing live migration with multifd channels 8, 16 or larger number,
> > > > the guest hangs in the presence of the network errors such as missing 
> > > > TCP ACKs.
> > > > 
> > > > At sender's side:
> > > > The main thread is blocked on qemu_thread_join, migration_fd_cleanup
> > > > is called because one thread fails on qio_channel_write_all when
> > > > the network problem happens and other send threads are blocked on 
> > > > sendmsg.
> > > > They could not be terminated. So the main thread is blocked on 
> > > > qemu_thread_join
> > > > to wait for the threads terminated.
> > > Isn't the right answer here to ensure we've called 'shutdown' on
> > > all the FDs, so that the threads get kicked out of sendmsg, before
> > > trying to join the thread ?
> > I agree a timeout is wrong here; there is no way to get a good timeout
> > value.
> > However, I'm a bit confused - we should be able to try a shutdown on the
> > receive side using the 'yank' command. - that's what it's there for; Li
> > does this solve your problem?
> 
> No, I tried to register 'yank' on the receive side, the receive threads are
> still waiting there.
> 
> It seems that on send side, 'yank' doesn't work either when the send threads
> are blocked.
> 
> This may be not the case to call yank. I am not quite sure about it.

We need to fix that; 'yank' should be able to recover from any network
issue.  If it's not working we need to understand why.

> > 
> > multifd_load_cleanup already kicks sem_sync before trying to do a
> > thread_join - so have we managed to trigger that on the receive side?
> 
> There is no problem with sem_sync in function multifd_load_cleanup.
> 
> But it is not called in my case, because no errors are detected on the
> receive side.

If you're getting TCP errors why aren't you seeing any errors on the
receive side?

> The problem is here:
> 
> void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> {
>     MigrationIncomingState *mis = migration_incoming_get_current();
>     Error *local_err = NULL;
>     bool start_migration;
> 
>    ...
> 
>     if (!mis->from_src_file) {
> 
>     ...
> 
>      } else {
>     /* Multiple connections */
>     assert(migrate_use_multifd());
>         start_migration = multifd_recv_new_channel(ioc, &local_err);
>     if (local_err) {
>     error_propagate(errp, local_err);
>     return;
>     }
>     }
>    if (start_migration) {
>     migration_incoming_process();
>     }
> }
> 
> start_migration is always 0, and migration is not started because some
> receive threads are not created.
> 
> No errors are detected here and the main process works well but receive
> threads are all waiting for semaphore.
> 
> It's hard to know if the receive threads are not created. If we can find a
> way to check if any receive threads

So is this only a problem for network issues that happen during startup,
before all the threads have been created?

Dave

> are not created, we can kick the sem_sync and do cleanup.
> 
> From the source code, the thread will be created when QIO channel detects
> something by GIO watch if I understand correctly.
> 
> If nothing is detected, socket_accept_incoming_migration won't be called, the
> thread will not be created.
> 
> socket_start_incoming_migration_internal ->
> 
>     qio_net_listener_set_client_func_full(listener,
> socket_accept_incoming_migration,
>   NULL, NULL,
> g_main_context_get_thread_default());
> 
>    qio_net_listener_set_client_func_full ->
> 
>    qio_channel_add_watch_source(
>     QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
>     qio_net_listener_channel_func,
>     listener, (GDestroyNotify)object_unref, context);
> 
>   socket_accept_incoming_migration ->
> 
>    migration_channel_process_incoming ->
> 
>    migration_ioc_process_incoming ->
> 
>  multifd_recv_new_channel ->
> 
>                             qemu_thread_create(&p->thread, p->name,
> multifd_recv_thread, p,
> QEMU_THREAD_JOINABLE);
> 
> > 
> > Dave
> > 
> > > Regards,
> > > Daniel
> > > -- 
> > > |: https://berrange.com  -o-
> > > https://www.flickr.com/photos/dberrange :|
> > > |: https://libvirt.org -o-
> > > https://fstop138.berrange.com :|
> > > |: https://entangle-photo.org-o-
> > > https://www.instagram.com/dberrange :|
> > > 
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: SEV guest attestation

2021-11-29 Thread Brijesh Singh





On 11/29/21 8:29 AM, Brijesh Singh wrote:



On 11/25/21 7:59 AM, Dov Murik wrote:

[+cc Tom, Brijesh]

On 25/11/2021 15:42, Daniel P. Berrangé wrote:

On Thu, Nov 25, 2021 at 02:44:51PM +0200, Dov Murik wrote:

[+cc jejb, tobin, jim, hubertus]


On 25/11/2021 9:14, Sergio Lopez wrote:
On Wed, Nov 24, 2021 at 06:29:07PM +, Dr. David Alan Gilbert wrote:

* Daniel P. Berrangé (berra...@redhat.com) wrote:

On Wed, Nov 24, 2021 at 11:34:16AM -0500, Tyler Fanelli wrote:

Hi,

We recently discussed a way for remote SEV guest attestation through QEMU.
My initial approach was to get data needed for attestation through different
QMP commands (all of which are already available, so no changes required
there), deriving hashes and certificate data; and collecting all of this
into a new QMP struct (SevLaunchStart, which would include the VM's policy,
secret, and GPA) which would need to be upstreamed into QEMU. Once this is
provided, QEMU would then need to have support for attestation before a VM
is started. Upon speaking to Dave about this proposal, he mentioned that
this may not be the best approach, as some situations would render the
attestation unavailable, such as the instance where a VM is running in a
cloud, and a guest owner would like to perform attestation via QMP (a likely
scenario), yet a cloud provider cannot simply let anyone pass arbitrary QMP
commands, as this could be an issue.


As a general point, QMP is a low level QEMU implementation detail,
which is generally expected to be consumed exclusively on the host
by a privileged mgmt layer, which will in turn expose its own higher
level APIs to users or other apps. I would not expect to see QMP
exposed to anything outside of the privileged host layer.

We also use the QAPI protocol for QEMU guest agent communication,
however, that is a distinct service from QMP on the host. It shares
most infra with QMP but has a completely different command set. On the
host it is not consumed inside QEMU, but instead consumed by a
mgmt app like libvirt.

So I ask, does anyone involved in QEMU's SEV implementation have any input
on a quality way to perform guest attestation? If so, I'd be interested.


I think what's missing is some clearer illustrations of how this
feature is expected to be consumed in some real world application
and the use cases we're trying to solve.

I'd like to understand how it should fit in with common libvirt
applications across the different virtualization management
scenarios - eg virsh (command line),  virt-manager (local desktop
GUI), cockpit (single host web mgmt), OpenStack (cloud mgmt), etc.
And of course any non-traditional virt use cases that might be
relevant such as Kata.


That's still not that clear; I know Alice and Sergio have some ideas
(cc'd).
There's also some standardisation efforts (e.g. 
https://www.potaroo.net/ietf/html/ids-wg-rats.html
and 
https://www.ietf.org/archive/id/draft-ietf-rats-architecture-00.html
) - that I can't claim to fully understand.
However, there are some themes that are emerging:

   a) One use is to only allow a VM to access some private data once we
prove it's the VM we expect running in a secure/confidential system
   b) (a) normally involves requesting some proof from the VM and then
providing it some confidential data/a key if it's OK
   c) RATs splits the problem up:
 
https://www.ietf.org/archive/id/draft-ietf-rats-architecture-00.html#name-architectural-overview
 I don't fully understand the split yet, but in principle there are
at least a few different things:

   d) The comms layer
   e) Something that validates the attestation message (i.e. the
signatures are valid, the hashes all add up etc)
   f) Something that knows what hashes to expect (i.e. oh that's a RHEL
8.4 kernel, or that's a valid kernel command line)
   g) Something that holds some secrets that can be handed

Re: SEV guest attestation

2021-11-29 Thread Brijesh Singh





On 11/25/21 7:59 AM, Dov Murik wrote:

[+cc Tom, Brijesh]

On 25/11/2021 15:42, Daniel P. Berrangé wrote:

On Thu, Nov 25, 2021 at 02:44:51PM +0200, Dov Murik wrote:

[+cc jejb, tobin, jim, hubertus]


On 25/11/2021 9:14, Sergio Lopez wrote:

On Wed, Nov 24, 2021 at 06:29:07PM +, Dr. David Alan Gilbert wrote:

* Daniel P. Berrangé (berra...@redhat.com) wrote:

On Wed, Nov 24, 2021 at 11:34:16AM -0500, Tyler Fanelli wrote:

Hi,

We recently discussed a way for remote SEV guest attestation through QEMU.
My initial approach was to get data needed for attestation through different
QMP commands (all of which are already available, so no changes required
there), deriving hashes and certificate data; and collecting all of this
into a new QMP struct (SevLaunchStart, which would include the VM's policy,
secret, and GPA) which would need to be upstreamed into QEMU. Once this is
provided, QEMU would then need to have support for attestation before a VM
is started. Upon speaking to Dave about this proposal, he mentioned that
this may not be the best approach, as some situations would render the
attestation unavailable, such as the instance where a VM is running in a
cloud, and a guest owner would like to perform attestation via QMP (a likely
scenario), yet a cloud provider cannot simply let anyone pass arbitrary QMP
commands, as this could be an issue.


As a general point, QMP is a low level QEMU implementation detail,
which is generally expected to be consumed exclusively on the host
by a privileged mgmt layer, which will in turn expose its own higher
level APIs to users or other apps. I would not expect to see QMP
exposed to anything outside of the privileged host layer.

We also use the QAPI protocol for QEMU guest agent communication,
however, that is a distinct service from QMP on the host. It shares
most infra with QMP but has a completely different command set. On the
host it is not consumed inside QEMU, but instead consumed by a
mgmt app like libvirt.


So I ask, does anyone involved in QEMU's SEV implementation have any input
on a quality way to perform guest attestation? If so, I'd be interested.


I think what's missing is some clearer illustrations of how this
feature is expected to be consumed in some real world application
and the use cases we're trying to solve.

I'd like to understand how it should fit in with common libvirt
applications across the different virtualization management
scenarios - eg virsh (command line),  virt-manager (local desktop
GUI), cockpit (single host web mgmt), OpenStack (cloud mgmt), etc.
And of course any non-traditional virt use cases that might be
relevant such as Kata.


That's still not that clear; I know Alice and Sergio have some ideas
(cc'd).
There's also some standardisation efforts (e.g. 
https://www.potaroo.net/ietf/html/ids-wg-rats.html
and 
https://www.ietf.org/archive/id/draft-ietf-rats-architecture-00.html
) - that I can't claim to fully understand.
However, there are some themes that are emerging:

   a) One use is to only allow a VM to access some private data once we
prove it's the VM we expect running in a secure/confidential system
   b) (a) normally involves requesting some proof from the VM and then
providing it some confidential data/a key if it's OK
   c) RATs splits the problem up:
 
https://www.ietf.org/archive/id/draft-ietf-rats-architecture-00.html#name-architectural-overview
 I don't fully understand the split yet, but in principle there are
at least a few different things:

   d) The comms layer
   e) Something that validates the attestation message (i.e. the
signatures are valid, the hashes all add up etc)
   f) Something that knows what hashes to expect (i.e. oh that's a RHEL
8.4 kernel, or that's a valid kernel command line)
   g) Something that holds some secrets that can be handed out if e & f
are happy.
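
As a rough illustration of items (e), (f) and (g) above, the split could be modelled as three separate checks. This is only a toy sketch: the struct, the function names, the hash strings and the secret are all hypothetical and are not taken from QEMU, SEV firmware or the RATS drafts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attestation message, reduced to what the sketch needs. */
struct toy_report {
    bool signature_ok;       /* result of checking the report's signatures */
    const char *launch_hash; /* measurement reported by the guest platform */
};

/* (e) Validate the message itself: signatures valid, fields present. */
static bool toy_validate(const struct toy_report *r)
{
    return r->signature_ok && r->launch_hash != NULL;
}

/* (f) Know which measurements are acceptable (made-up reference values). */
static bool toy_hash_expected(const char *hash)
{
    static const char *const known[] = { "hash-of-known-kernel",
                                         "hash-of-known-cmdline" };
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strcmp(hash, known[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* (g) Hand out the secret only if (e) and (f) are both satisfied. */
static const char *toy_release_secret(const struct toy_report *r)
{
    if (!toy_validate(r) || !toy_hash_expected(r->launch_hash)) {
        return NULL;
    }
    return "example-disk-key";
}
```

Keeping the three checks apart mirrors the point of the list: the component that knows the reference hashes does not have to be the one that holds the secrets.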

   There have also been proposals (e.g. Intel

Re: [PATCH 35/35] test/tcg/ppc64le: Add float reference files

2021-11-29 Thread Cédric Le Goater


On 11/22/21 12:16, Richard Henderson wrote:

On 11/22/21 10:43 AM, Richard Henderson wrote:

On 11/21/21 6:47 PM, Cédric Le Goater wrote:

I am getting an error with this test. See below.

...

  ### Rounding to nearest
  from single: f32(-nan:0xffa0)
-  to double: f64(-nan:0x00fff4) (INVALID)
+  to double: f64(-nan:0x00fff4) (OK)


Well that's disconcerting.

I can replicate this failure on an x86_64 host, but do not see the same error 
on a power9 ppc64le host.


Bah.  The test case is buggy.

It reads the fpscr for the flags *after* having gone through the printf for the 
result, at which point you are at the mercy of whatever other fp arithmetic 
libc chooses to do.

Fixed with

--- a/tests/tcg/multiarch/float_convs.c
+++ b/tests/tcg/multiarch/float_convs.c
@@ -51,8 +51,8 @@ static void convert_single_to_double(float input)

  output = input;

-    out_fmt = fmt_f64(output);
  flag_fmt = fmt_flags();
+    out_fmt = fmt_f64(output);
  printf("  to double: %s (%s)\n", out_fmt, flag_fmt);
  free(out_fmt);
  free(flag_fmt);

But this alone of course causes other "failures", because we've got some 
incorrect reference files.
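
The ordering problem can be shown with a toy model. The sketch below does not touch the real fpscr or <fenv.h>; the status variable, the flag names and both helper functions are hypothetical stand-ins, and the formatter is simply assumed to disturb the status the way printf-internal arithmetic might.

```c
#include <stdio.h>

/* Toy stand-in for an FP status register such as the fpscr. */
enum { FLAG_INVALID = 1, FLAG_INEXACT = 2 };
static unsigned status_flags;

/* Hypothetical conversion that raises INVALID, as a signalling NaN would. */
static double convert_raising_invalid(float input)
{
    status_flags = FLAG_INVALID;
    return (double)input;
}

/*
 * Hypothetical formatter. Like printf(), it may do FP work of its own,
 * modelled here by overwriting the status register.
 */
static void format_value(char *buf, size_t len, double value)
{
    snprintf(buf, len, "%g", value);
    status_flags = FLAG_INEXACT;
}

/* Buggy order: format first, then read the flags. */
static unsigned flags_read_after_format(float input)
{
    char buf[64];
    double out = convert_raising_invalid(input);

    format_value(buf, sizeof(buf), out);
    return status_flags;
}

/* Fixed order: snapshot the flags before anything else can run. */
static unsigned flags_read_before_format(float input)
{
    char buf[64];
    double out = convert_raising_invalid(input);
    unsigned snapshot = status_flags;

    format_value(buf, sizeof(buf), out);
    return snapshot;
}
```

In this model only the second function reports the flag raised by the conversion, which is the same reordering the two-line diff above applies to fmt_flags() and fmt_f64().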


The only one I have seen so far is on hexagon:

  https://gitlab.com/legoater/qemu/-/jobs/1829273672

C.

Re: [PATCH resend v2] qemu-binfmt-conf.sh: fix -F option

2021-11-29 Thread Laurent Vivier


Le 29/11/2021 à 14:51, mwi...@suse.com a écrit :

From: Martin Wilck 

qemu-binfmt-conf.sh should use "-F" as short option for "--qemu-suffix".
Fix the getopt call to make this work.

Signed-off-by: Martin Wilck 
---
previous: https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg03132.html


Sorry for the delay, didn't see the email.

To put "linux-user" in the email subject helps to pass through all my email 
filters...


ref: https://bugzilla.opensuse.org/show_bug.cgi?id=1186256


I think you don't need the suffix anymore with the new "--preserve-argv0" 
parameter.

See

6e1c0d7b951e ("linux-user: manage binfmt-misc preserve-arg[0] flag")

but you need v5.12 kernel.

2347961b11d4 ("binfmt_misc: pass binfmt_misc flags to the interpreter")

Moreover it seems it will be possible (soon?) to use a binfmt_misc 
configuration per container;

  [1/2] binfmt_misc: cleanup on filesystem umount

https://lore.kernel.org/lkml/20211105043000.ga25...@mail.hallyn.com/T/#m4a99a73c4e2c261a800dc6765fcbea8087635cfc 


  [2/2] binfmt_misc: enable sandboxed mounts

https://lore.kernel.org/lkml/20211105043000.ga25...@mail.hallyn.com/T/#m8d991c47721f34e37d8253cb54ddf7b56a048f3e


---
  scripts/qemu-binfmt-conf.sh | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
index 7de996d536..e9bfeb94d3 100755
--- a/scripts/qemu-binfmt-conf.sh
+++ b/scripts/qemu-binfmt-conf.sh
@@ -340,7 +340,9 @@ PERSISTENT=no
  PRESERVE_ARG0=no
  QEMU_SUFFIX=""
  
-options=$(getopt -o ds:Q:S:e:hc:p:g: -l debian,systemd:,qemu-path:,qemu-suffix:,exportdir:,help,credential:,persistent:,preserve-argv0: -- "$@")

+_longopts="debian,systemd:,qemu-path:,qemu-suffix:,exportdir:,help,credential:,\
+persistent:,preserve-argv0:"
+options=$(getopt -o ds:Q:S:e:hc:p:g:F: -l ${_longopts} -- "$@")
  eval set -- "$options"
  
  while true ; do




Fixes: 7155be7cda5c ("qemu-binfmt-conf.sh: allow to provide a suffix to the interpreter name")

Reviewed-by: Laurent Vivier
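
For readers less familiar with getopt, the same class of bug exists in the C API: a long option only gets a short alias if the letter also appears in the short-option string. The sketch below is a C analogue using getopt_long(3), not the shell script itself; parse_suffix() is a hypothetical helper, and resetting optind to 1 between calls assumes a glibc-style getopt.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Return the value given to -F / --qemu-suffix, or "" if absent.
 * The "F:" in the short-option string is what makes -F work; leaving it
 * out is the C equivalent of the bug fixed in the script.
 */
static const char *parse_suffix(int argc, char **argv)
{
    static const struct option longopts[] = {
        { "qemu-suffix", required_argument, NULL, 'F' },
        { NULL, 0, NULL, 0 }
    };
    const char *suffix = "";
    int c;

    optind = 1; /* restart scanning so the helper can be called again */
    while ((c = getopt_long(argc, argv, "F:", longopts, NULL)) != -1) {
        if (c == 'F') {
            suffix = optarg;
        }
    }
    return suffix;
}
```

Called with either "-F -static" or "--qemu-suffix -static", the helper returns "-static", because both spellings map to the same option code.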

Re: [PATCH 1/3] ppc/pnv: Tune the POWER9 PCIe Host bridge model

2021-11-29 Thread Frederic Barrat





On 26/11/2021 10:09, Cédric Le Goater wrote:

On 11/16/21 18:01, Frederic Barrat wrote:

The PHB v4 found on POWER9 doesn't request any LSI, so let's clear the
Interrupt Pin register in the config space so that the model matches
the hardware.

If we don't, then we inherit from the default pcie root bridge, which
requests a LSI. And because we don't map it correctly in the device
tree, all PHBs allocate the same bogus hw interrupt. We end up with
inconsistent interrupt controller (xive) data. The problem goes away
if we don't allocate the LSI in the first place.

Signed-off-by: Frederic Barrat 
---
  hw/pci-host/pnv_phb4.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 5c375a9f28..1659d55b4f 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1234,10 +1234,13 @@ static void pnv_phb4_reset(DeviceState *dev)
  PCIDevice *root_dev = PCI_DEVICE(&phb->root);
  /*
- * Configure PCI device id at reset using a property.
+ * Configure the PCI device at reset:
+ *   - set the Vendor and Device ID to for the root bridge
+ *   - no LSI
   */
  pci_config_set_vendor_id(root_dev->config, PCI_VENDOR_ID_IBM);
  pci_config_set_device_id(root_dev->config, phb->device_id);
+    pci_config_set_interrupt_pin(root_dev->config, 0);
  }
  static const char *pnv_phb4_root_bus_path(PCIHostState *host_bridge,



FYI, I am seeing an issue with FreeBSD when booting from iso :

   
https://download.freebsd.org/ftp/snapshots/powerpc/powerpc64/ISO-IMAGES/14.0/FreeBSD-14.0-CURRENT-powerpc-powerpc64-20211028-4827bf76bce-250301-disc1.iso.xz 






I see what's going on... Since the phb4 model borrows most of its code 
from the pcie_root bridge, there are several instances of code such as:


if (msix_enabled(dev)) {
do something;
} else if (msi_enabled(dev)) {
do something else;
} else {
yet something else which assumes a LSI;
}

With this series, I removed the LSI from the phb4 root port to match the 
hardware and fixed one such code pattern in patch 3. But there are 
others, and we hit one of those when installing from the FreeBSD ISO.


So this is going to need more work.
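
One way to picture the missing case is a fallback that checks whether the device has an interrupt pin at all before taking the LSI path. The sketch below is a toy model under stated assumptions: the struct and function names are hypothetical rather than QEMU's, PCI_NUM_PINS is assumed to be 4 (INTA..INTD), and the INTx index is assumed to be derived as "pin minus one", so an Interrupt Pin of 0 yields -1, which is outside the range checked by the assertion in the log below.

```c
#include <stdbool.h>

#define PCI_NUM_PINS 4 /* INTA..INTD; assumed value for this sketch */

/* Hypothetical, pared-down device state; not QEMU's PCIDevice. */
struct toy_pci_dev {
    bool msix_enabled;
    bool msi_enabled;
    int interrupt_pin; /* config-space Interrupt Pin: 0 = none, 1..4 = A..D */
};

enum irq_path { IRQ_PATH_MSIX, IRQ_PATH_MSI, IRQ_PATH_LSI, IRQ_PATH_NONE };

/* Zero-based INTx index; -1 when the device has no interrupt pin. */
static int toy_intx(const struct toy_pci_dev *dev)
{
    return dev->interrupt_pin - 1;
}

/* Same three-way pattern as above, plus an explicit "no LSI" outcome. */
static enum irq_path toy_select_irq_path(const struct toy_pci_dev *dev)
{
    int intx;

    if (dev->msix_enabled) {
        return IRQ_PATH_MSIX;
    }
    if (dev->msi_enabled) {
        return IRQ_PATH_MSI;
    }
    intx = toy_intx(dev);
    if (intx < 0 || intx >= PCI_NUM_PINS) {
        return IRQ_PATH_NONE; /* nothing to raise: the device has no LSI */
    }
    return IRQ_PATH_LSI;
}
```

With interrupt_pin set to 0 and neither MSI nor MSI-X enabled, the model returns IRQ_PATH_NONE instead of falling into code that assumes a valid pin.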

  Fred




Thanks,

C.

SIGTERM received, booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
---<>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT #0 main-n250301-4827bf76bce: Thu Oct 28 06:53:58 
UTC 2021
 
r...@releng1.nyi.freebsd.org:/usr/obj/usr/src/powerpc.powerpc64/sys/GENERIC64 
powerpc
FreeBSD clang version 12.0.1 (g...@github.com:llvm/llvm-project.git 
llvmorg-12.0.1-0-gfed41342a82f)

WARNING: WITNESS option enabled, expect reduced performance.
VT: init without driver.
ofw_initrd: initrd loaded at 0x2800-0x28c7928c
cpu0: IBM POWER9 revision 2.0, 1000.00 MHz
cpu0: Features 
dc007182
cpu0: Features2 
bee0

real memory  = 1014484992 (967 MB)
avail memory = 117903360 (112 MB)
random: registering fast source PowerISA DARN random number generator
random: fast provider: "PowerISA DARN random number generator"
arc4random: WARNING: initial seeding bypassed the cryptographic random 
device because it was not yet seeded and the knob 
'bypass_before_seeding' was enabled.

random: entropy device external interface
kbd0 at kbdmux0
ofwbus0:  on nexus0
opal0:  irq 
1048560,1048561,1048562,1048563,1048564,1048565,1048566,1048567,1048568,1048569,1048570,1048571,1048572,1048573 
on ofwbus0

opal0: registered as a time-of-day clock, resolution 0.002000s
simplebus0:  mem 
0x60300-0x60300 on ofwbus0
pcib0:  mem 
0x600c3c000-0x600c3cfff,0x600c3-0x600c30fff on ofwbus0

pci0:  numa-domain 0 on pcib0
qemu-system-ppc64: ../hw/pci/pci.c:1487: pci_irq_handler: Assertion `0 
<= irq_num && irq_num < PCI_NUM_PINS' failed.

Re: [PATCH v3 2/8] accel/tcg: suppress IRQ check for special TBs

2021-11-29 Thread Richard Henderson


On 11/29/21 3:09 PM, Alex Bennée wrote:

When we set cpu->cflags_next_tb it is because we want to carefully
control the execution of the next TB. Currently there is a race that
causes the second stage of watchpoint handling to get ignored if an
IRQ is processed before we finish executing the instruction that
triggers the watchpoint. Use the new CF_NOIRQ facility to avoid the
race.

We also suppress IRQs when handling precise self modifying code to
avoid unnecessary bouncing.

Signed-off-by: Alex Bennée
Cc: Pavel Dovgalyuk
Fixes:https://gitlab.com/qemu-project/qemu/-/issues/245

---
v2
   - split the CF_NOIRQ implementation
   - only apply CF_NOIRQ for watchpoints/SMC handling
   - minor reword of commit
v3
   - add additional two cases of | CF_NOIRQ
---
  accel/tcg/cpu-exec.c  | 9 +
  accel/tcg/translate-all.c | 4 ++--
  softmmu/physmem.c | 4 ++--
  3 files changed, 13 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v6 02/16] linux-user/host/ppc64: Use r11 for signal_pending address

2021-11-29 Thread Richard Henderson


On 11/29/21 12:01 PM, Peter Maydell wrote:

On Tue, 23 Nov 2021 at 17:40, Richard Henderson
 wrote:


We don't need a register that can live across the syscall;
we only need a register that can live until the syscall.


What about the case where:
  * we execute the sc instruction (r11 trashed)
  * the syscall is one that from the host kernel point of
view is restartable
  * the kernel arranges to restart the syscall by rewinding the
PC to point to the start of the 'sc' instruction
  * our rewind_if_in_safe_syscall() rewinds PC further to
point at safe_syscall_start
  * we want to use r11 again, but it was trashed in step 1
?

Put another way, this patch is effectively a revert of
commit 5d9f3ea081721, which was a fix to an observed bug.


Whoops.  I forgot about that (a mere 3 years ago).

r~

Re: [PATCH v3 8/8] tests/plugin/syscall.c: fix compiler warnings

2021-11-29 Thread Richard Henderson


On 11/29/21 3:09 PM, Alex Bennée wrote:

From: Juro Bystricky 

Fix compiler warnings. The warnings can result in a broken build.
This patch fixes warnings such as:

In file included from /usr/include/glib-2.0/glib.h:111,
  from ../tests/plugin/syscall.c:13:
../tests/plugin/syscall.c: In function ‘print_entry’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘out’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
g_free (*pp);
^~~~
../tests/plugin/syscall.c:82:23: note: ‘out’ was declared here
  g_autofree gchar *out;
^~~
In file included from /usr/include/glib-2.0/glib.h:111,
  from ../tests/plugin/syscall.c:13:
../tests/plugin/syscall.c: In function ‘vcpu_syscall_ret’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘out’ may be
 used uninitialized in this function [-Werror=maybe-uninitialized]
g_free (*pp);
^~~~
../tests/plugin/syscall.c:73:27: note: ‘out’ was declared here
  g_autofree gchar *out;
^~~
cc1: all warnings being treated as errors

Signed-off-by: Juro Bystricky 
Signed-off-by: Alex Bennée 
Message-Id: <20211128011551.2115468-1-juro.bystri...@intel.com>
---
  tests/plugin/syscall.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)



Reviewed-by: Richard Henderson 

r~




diff --git a/tests/plugin/syscall.c b/tests/plugin/syscall.c
index 484b48de49..96040c578f 100644
--- a/tests/plugin/syscall.c
+++ b/tests/plugin/syscall.c
@@ -70,19 +70,17 @@ static void vcpu_syscall_ret(qemu_plugin_id_t id, unsigned int vcpu_idx,
  }
  g_mutex_unlock(&lock);
  } else {
-g_autofree gchar *out;
-out = g_strdup_printf("syscall #%" PRIi64 " returned -> %" PRIi64 "\n",
-num, ret);
+g_autofree gchar *out = g_strdup_printf(
+ "syscall #%" PRIi64 " returned -> %" PRIi64 "\n", num, ret);
  qemu_plugin_outs(out);
  }
  }
  
  static void print_entry(gpointer val, gpointer user_data)

  {
-g_autofree gchar *out;
  SyscallStats *entry = (SyscallStats *) val;
  int64_t syscall_num = entry->num;
-out = g_strdup_printf(
+g_autofree gchar *out = g_strdup_printf(
  "%-13" PRIi64 "%-6" PRIi64 " %" PRIi64 "\n",
  syscall_num, entry->calls, entry->errors);
  qemu_plugin_outs(out);

Re: [PATCH v6 03/18] qemu/int128: addition of div/rem 128-bit operations

2021-11-29 Thread Frédéric Pétrot


On 29/11/2021 11:07, Richard Henderson wrote:

On 11/28/21 2:57 PM, Frédéric Pétrot wrote:

--- /dev/null
+++ b/util/int128.c
@@ -0,0 +1,145 @@
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/int128.h"


Missing file header and copyright boilerplate.


+#ifdef CONFIG_INT128
+
+Int128 int128_divu(Int128 a, Int128 b)
+{
+    return (__uint128_t)a / (__uint128_t)b;
+}
+
+Int128 int128_remu(Int128 a, Int128 b)
+{
+    return (__uint128_t)a % (__uint128_t)b;
+}
+
+Int128 int128_divs(Int128 a, Int128 b)
+{
+    return a / b;
+}
+
+Int128 int128_rems(Int128 a, Int128 b)
+{
+    return a % b;
+}


I think we should simply expose these inline, and let the compiler call its 
runtime function directly.


  Thanks.
  Ok, I'll drop that and handle the CONFIG_INT128 directly in the rv128
  div/rem helpers then.
  Frédéric



r~


--
+---+
| Frédéric Pétrot, Pr. Grenoble INP-Ensimag/TIMA,   Ensimag deputy director |
| Mob/Pho: +33 6 74 57 99 65/+33 4 76 57 48 70  Ad augusta  per angusta |
| http://tima.univ-grenoble-alpes.fr frederic.pet...@univ-grenoble-alpes.fr |
+---+

Re: [PATCH v6 03/18] qemu/int128: addition of div/rem 128-bit operations

2021-11-29 Thread Richard Henderson


On 11/29/21 3:27 PM, Frédéric Pétrot wrote:

On 29/11/2021 11:07, Richard Henderson wrote:

On 11/28/21 2:57 PM, Frédéric Pétrot wrote:

--- /dev/null
+++ b/util/int128.c
@@ -0,0 +1,145 @@
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/int128.h"


Missing file header and copyright boilerplate.


+#ifdef CONFIG_INT128
+
+Int128 int128_divu(Int128 a, Int128 b)
+{
+    return (__uint128_t)a / (__uint128_t)b;
+}
+
+Int128 int128_remu(Int128 a, Int128 b)
+{
+    return (__uint128_t)a % (__uint128_t)b;
+}
+
+Int128 int128_divs(Int128 a, Int128 b)
+{
+    return a / b;
+}
+
+Int128 int128_rems(Int128 a, Int128 b)
+{
+    return a % b;
+}


I think we should simply expose these inline, and let the compiler call its runtime 
function directly.


   Thanks.
   Ok, I'll drop that and handle the CONFIG_INT128 directly in the rv128
   div/rem helpers then.


No, that's not what I meant.  Copy these directly into include/qemu/int128.h and add 
static inline, within the existing CONFIG_INT128 block.
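
A minimal sketch of what that suggestion amounts to is shown below. It assumes a compiler that provides __int128 (GCC or Clang on 64-bit hosts), and the typedef is declared locally so the fragment stands alone; in QEMU it would live inside the existing CONFIG_INT128 block of the header instead.

```c
/* Stand-alone typedef for the sketch; assumes compiler __int128 support. */
typedef __int128 Int128;

/* Unsigned division: reinterpret both operands as unsigned 128-bit. */
static inline Int128 int128_divu(Int128 a, Int128 b)
{
    return (__uint128_t)a / (__uint128_t)b;
}

/* Unsigned remainder. */
static inline Int128 int128_remu(Int128 a, Int128 b)
{
    return (__uint128_t)a % (__uint128_t)b;
}

/* Signed division, truncating toward zero as C requires. */
static inline Int128 int128_divs(Int128 a, Int128 b)
{
    return a / b;
}

/* Signed remainder; the result takes the sign of the dividend. */
static inline Int128 int128_rems(Int128 a, Int128 b)
{
    return a % b;
}
```

Written this way, each call site compiles to the operator directly, and the compiler is free to emit a call to its own 128-bit division runtime routine where the target has no native instruction.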



r~

[PATCH v3 8/8] tests/plugin/syscall.c: fix compiler warnings

2021-11-29 Thread Alex Bennée

From: Juro Bystricky 

Fix compiler warnings. The warnings can result in a broken build.
This patch fixes warnings such as:

In file included from /usr/include/glib-2.0/glib.h:111,
 from ../tests/plugin/syscall.c:13:
../tests/plugin/syscall.c: In function ‘print_entry’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘out’ may be
   used uninitialized in this function [-Werror=maybe-uninitialized]
   g_free (*pp);
   ^~~~
../tests/plugin/syscall.c:82:23: note: ‘out’ was declared here
 g_autofree gchar *out;
   ^~~
In file included from /usr/include/glib-2.0/glib.h:111,
 from ../tests/plugin/syscall.c:13:
../tests/plugin/syscall.c: In function ‘vcpu_syscall_ret’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘out’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
   g_free (*pp);
   ^~~~
../tests/plugin/syscall.c:73:27: note: ‘out’ was declared here
 g_autofree gchar *out;
   ^~~
cc1: all warnings being treated as errors

Signed-off-by: Juro Bystricky 
Signed-off-by: Alex Bennée 
Message-Id: <20211128011551.2115468-1-juro.bystri...@intel.com>
---
 tests/plugin/syscall.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tests/plugin/syscall.c b/tests/plugin/syscall.c
index 484b48de49..96040c578f 100644
--- a/tests/plugin/syscall.c
+++ b/tests/plugin/syscall.c
@@ -70,19 +70,17 @@ static void vcpu_syscall_ret(qemu_plugin_id_t id, unsigned int vcpu_idx,
 }
 g_mutex_unlock(&lock);
 } else {
-g_autofree gchar *out;
-out = g_strdup_printf("syscall #%" PRIi64 " returned -> %" PRIi64 "\n",
-num, ret);
+g_autofree gchar *out = g_strdup_printf(
+ "syscall #%" PRIi64 " returned -> %" PRIi64 "\n", num, ret);
 qemu_plugin_outs(out);
 }
 }
 
 static void print_entry(gpointer val, gpointer user_data)
 {
-g_autofree gchar *out;
 SyscallStats *entry = (SyscallStats *) val;
 int64_t syscall_num = entry->num;
-out = g_strdup_printf(
+g_autofree gchar *out = g_strdup_printf(
 "%-13" PRIi64 "%-6" PRIi64 " %" PRIi64 "\n",
 syscall_num, entry->calls, entry->errors);
 qemu_plugin_outs(out);
-- 
2.30.2
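
The pattern in the patch above can be reproduced without GLib. g_autofree is built on the compiler's cleanup attribute, so the cleanup handler runs on every path out of the scope; if the pointer is declared first and assigned later, the compiler may not be able to prove it is always initialised when the handler fires. The sketch below assumes GCC or Clang, and the AUTOFREE macro, the helpers and the counter are all hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cleanup_calls; /* counts handler invocations, for illustration */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void auto_free(char **pp)
{
    free(*pp);
    cleanup_calls++;
}

#define AUTOFREE __attribute__((cleanup(auto_free)))

/* Allocate and format a message; the caller owns the result. */
static char *format_ret(long num, long ret)
{
    int len = snprintf(NULL, 0, "syscall #%ld returned -> %ld\n", num, ret);
    char *s = malloc((size_t)len + 1);

    if (s) {
        snprintf(s, (size_t)len + 1, "syscall #%ld returned -> %ld\n",
                 num, ret);
    }
    return s;
}

/* Declare and initialise in one statement, as the patch does. */
static size_t report_len(long num, long ret)
{
    AUTOFREE char *out = format_ret(num, ret);

    return out ? strlen(out) : 0;
}
```

Because out is initialised at its declaration, there is no window in which the handler could see an indeterminate pointer, which is what the warning is guarding against.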

[PATCH v3 2/8] accel/tcg: suppress IRQ check for special TBs

2021-11-29 Thread Alex Bennée

When we set cpu->cflags_next_tb it is because we want to carefully
control the execution of the next TB. Currently there is a race that
causes the second stage of watchpoint handling to get ignored if an
IRQ is processed before we finish executing the instruction that
triggers the watchpoint. Use the new CF_NOIRQ facility to avoid the
race.

We also suppress IRQs when handling precise self modifying code to
avoid unnecessary bouncing.

Signed-off-by: Alex Bennée 
Cc: Pavel Dovgalyuk 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245

---
v2
  - split the CF_NOIRQ implementation
  - only apply CF_NOIRQ for watchpoints/SMC handling
  - minor reword of commit
v3
  - add additional two cases of | CF_NOIRQ
---
 accel/tcg/cpu-exec.c  | 9 +
 accel/tcg/translate-all.c | 4 ++--
 softmmu/physmem.c | 4 ++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2d14d02f6c..409ec8c38c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -721,6 +721,15 @@ static inline bool need_replay_interrupt(int interrupt_request)
 static inline bool cpu_handle_interrupt(CPUState *cpu,
 TranslationBlock **last_tb)
 {
+/*
+ * If we have requested custom cflags with CF_NOIRQ we should
+ * skip checking here. Any pending interrupts will get picked up
+ * by the next TB we execute under normal cflags.
+ */
+if (cpu->cflags_next_tb != -1 && cpu->cflags_next_tb & CF_NOIRQ) {
+return false;
+}
+
 /* Clear the interrupt flag now since we're processing
  * cpu->interrupt_request and cpu->exit_request.
  * Ensure zeroing happens before reading cpu->exit_request or
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index bd0bb81d08..bd71db59a9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1738,7 +1738,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 if (current_tb_modified) {
 page_collection_unlock(pages);
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
 mmap_unlock();
 cpu_loop_exit_noexc(cpu);
 }
@@ -1906,7 +1906,7 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
 #ifdef TARGET_HAS_PRECISE_SMC
 if (current_tb_modified) {
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
 return true;
 }
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 314f8b439c..3524c04c2a 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -912,7 +912,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
  */
 if (!cpu->can_do_io) {
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | CF_LAST_IO | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_LAST_IO | CF_NOIRQ | curr_cflags(cpu);
 cpu_loop_exit_restore(cpu, ra);
 }
 /*
@@ -946,7 +946,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
 cpu_loop_exit(cpu);
 } else {
 /* Force execution of one insn next time.  */
-cpu->cflags_next_tb = 1 | CF_LAST_IO | curr_cflags(cpu);
+cpu->cflags_next_tb = 1 | CF_LAST_IO | CF_NOIRQ | curr_cflags(cpu);
 mmap_unlock();
 cpu_loop_exit_noexc(cpu);
 }
-- 
2.30.2
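
One detail of the check added to cpu_handle_interrupt() above is worth spelling out: the comparison against -1 has to come first. If -1 is the "no custom cflags requested" sentinel, as the test implies, then it has every bit set, including the CF_NOIRQ bit, so testing the bit alone would suppress the IRQ check even when nothing was requested. The sketch below is a toy model; the TOY_ names are hypothetical and the bit value is an assumption rather than a quote from QEMU's headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CF_NOIRQ     0x00020000u /* illustrative bit; value assumed */
#define TOY_CFLAGS_UNSET UINT32_MAX  /* the "-1" sentinel: nothing requested */

/* Mirrors the patch: sentinel test first, then the flag test. */
static bool toy_skip_irq_check(uint32_t cflags_next_tb)
{
    return cflags_next_tb != TOY_CFLAGS_UNSET &&
           (cflags_next_tb & TOY_CF_NOIRQ) != 0;
}

/* What would happen without the sentinel test. */
static bool toy_skip_irq_check_no_sentinel(uint32_t cflags_next_tb)
{
    return (cflags_next_tb & TOY_CF_NOIRQ) != 0;
}
```

In this model the unguarded variant reports "skip" for the sentinel value, while the guarded one only skips when a caller explicitly ORed the flag into the requested cflags.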

[PATCH v3 6/8] MAINTAINERS: Remove me as a reviewer for the build and test/avocado

2021-11-29 Thread Alex Bennée

From: Willian Rampazzo 

Remove me as a reviewer for the Build and test automation and the
Integration Testing with the Avocado Framework and add Beraldo
Leal.

Signed-off-by: Willian Rampazzo 
Reviewed-by: Beraldo Leal 
Message-Id: <20211122191124.31620-1-willi...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20211123205729.2205806-7-alex.ben...@linaro.org>
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d3879aa3c1..8f5156bfa7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3469,7 +3469,7 @@ M: Alex Bennée 
 M: Philippe Mathieu-Daudé 
 M: Thomas Huth 
 R: Wainer dos Santos Moschetta 
-R: Willian Rampazzo 
+R: Beraldo Leal 
 S: Maintained
 F: .github/lockdown.yml
 F: .gitlab-ci.yml
@@ -3507,7 +3507,7 @@ W: https://trello.com/b/6Qi1pxVn/avocado-qemu
 R: Cleber Rosa 
 R: Philippe Mathieu-Daudé 
 R: Wainer dos Santos Moschetta 
-R: Willian Rampazzo 
+R: Beraldo Leal 
 S: Odd Fixes
 F: tests/avocado/
 
-- 
2.30.2

[PATCH v3 3/8] tests/avocado: fix tcg_plugin mem access count test

2021-11-29 Thread Alex Bennée

When we cleaned up argument handling the test was missed.

Fixes: 5ae589faad ("tests/plugins/mem: introduce "track" arg and make args not positional")
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20211123205729.2205806-4-alex.ben...@linaro.org>
---
 tests/avocado/tcg_plugins.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/tcg_plugins.py b/tests/avocado/tcg_plugins.py
index 9ca1515c3b..642d2e49e3 100644
--- a/tests/avocado/tcg_plugins.py
+++ b/tests/avocado/tcg_plugins.py
@@ -131,7 +131,7 @@ def test_aarch64_virt_mem_icount(self):
  suffix=".log")
 
 self.run_vm(kernel_path, kernel_command_line,
-"tests/plugin/libmem.so,arg=both", plugin_log.name,
+"tests/plugin/libmem.so,inline=true,callback=true", 
plugin_log.name,
 console_pattern,
 args=('-icount', 'shift=1'))
 
-- 
2.30.2

Re: [PATCH for-6.1 v2] i386: do not call cpudef-only models functions for max, host, base

2021-11-29 Thread Woodhouse, David

On Fri, 2021-07-23 at 13:29 +0200, Claudio Fontana wrote:
>  static void kvm_cpu_instance_init(CPUState *cs)
>  {
>  X86CPU *cpu = X86_CPU(cs);
> +X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
>  
>  host_cpu_instance_init(cpu);
>  
> -if (!kvm_irqchip_in_kernel()) {
> -x86_cpu_change_kvm_default("x2apic", "off");
> -} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> -x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> -}
> -
> -/* Special cases not set in the X86CPUDefinition structs: */
> +if (xcc->model) {
> +/* only applies to builtin_x86_defs cpus */
> +if (!kvm_irqchip_in_kernel()) {
> +x86_cpu_change_kvm_default("x2apic", "off");
> +} else if (kvm_irqchip_is_split() && kvm_enable_x2apic()) {
> +x86_cpu_change_kvm_default("kvm-msi-ext-dest-id", "on");
> +}
>  
> -x86_cpu_apply_props(cpu, kvm_default_props);
> +/* Special cases not set in the X86CPUDefinition structs: */
> +x86_cpu_apply_props(cpu, kvm_default_props);
> +}
>  

I think this causes a regression in x2apic and kvm-msi-ext-dest-id
support. If you start qemu thus:

qemu-system-x86_64 -machine q35,accel=kvm,usb=off,kernel_irqchip=split -cpu host -smp 288,sockets=9,cores=16,threads=2

The guest now sees those features, but we don't actually call
kvm_enable_x2apic() so the APIC broadcast quirk doesn't get disabled,
and interrupts targeted at APIC ID 255 are interpreted as broadcasts:

[ 73.198504] __common_interrupt: 0.34 No irq handler for vector
[ 73.198515] __common_interrupt: 11.34 No irq handler for vector
[ 73.198517] __common_interrupt: 12.34 No irq handler for vector
[ 73.198521] __common_interrupt: 15.34 No irq handler for vector
[ 73.198524] __common_interrupt: 17.34 No irq handler for vector
[ 73.198528] __common_interrupt: 34.34 No irq handler for vector
[ 73.198529] __common_interrupt: 20.34 No irq handler for vector
[ 73.198533] __common_interrupt: 41.34 No irq handler for vector
[ 73.198539] __common_interrupt: 27.34 No irq handler for vector
[ 73.198542] __common_interrupt: 28.34 No irq handler for vector



Amazon Development Centre (London) Ltd. Registered in England and Wales with 
registration number 04543232 with its registered office at 1 Principal Place, 
Worship Street, London EC2A 2FA, United Kingdom.
