Re: [PULL 0/5] tcg patch queue

2023-01-20 Thread Richard Henderson

On 1/19/23 23:41, Thomas Huth wrote:

On 16/01/2023 23.36, Richard Henderson wrote:

The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a:

   tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 +)


are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116

for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07:

   accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000)


- Reorg cpu_tb_exec around setjmp.
- Use __attribute__((target)) for buffer_is_zero.
- Add perfmap and jitdump for perf support.


Ilya Leoshkevich (3):
   linux-user: Clean up when exiting due to a signal
   accel/tcg: Add debuginfo support
   tcg: add perfmap and jitdump

Richard Henderson (2):
   util/bufferiszero: Use __attribute__((target)) for avx2/avx512
   accel/tcg: Split out cpu_exec_{setjmp,loop}


  Hi Richard, hi Ilya,

with the recent QEMU master branch (commit 701ed34), I'm now seeing failures in 
Travis:

  https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411

Everything was still fine a couple of days ago (commit fb7e799):

  https://app.travis-ci.com/github/huth/qemu/builds/259755664

... so it seems this is likely related to this pull request. Could you please 
have a look?


Thankfully our s390x.ci.qemu.org machine has the same gcc version installed, and I was able
to reproduce this.  But only once -- the failure is irregular and very low frequency.


The code generated by gcc is correct and easy to inspect, since cpu_exec_setjmp is now 
quite small:


000f3250 <cpu_exec_setjmp>:
   f3250:   eb 6f f0 30 00 24   stmg%r6,%r15,48(%r15)
   f3256:   a7 39 00 00 lghi%r3,0
   f325a:   e3 f0 ff 58 ff 71   lay %r15,-168(%r15)

// Save cpu to stack+160.
   f3260:   e3 20 f0 a0 00 24   stg %r2,160(%r15)
   f3266:   41 20 20 f0 la  %r2,240(%r2)
   f326a:   c0 e5 ff fb 10 eb   brasl   %r14,55440 <__sigsetjmp@plt>
   f3270:   ec 26 00 0d 00 7e   cijne   %r2,0,f328a 


// Reload cpu for cpu_exec_loop().
   f3276:   e3 20 f0 a0 00 04   lg  %r2,160(%r15)
   f327c:   c0 e5 ff ff fb ee   brasl   %r14,f2a58 <cpu_exec_loop>

   f3282:   eb 6f f0 d8 00 04   lmg %r6,%r15,216(%r15)
   f3288:   07 fe   br  %r14

// Load tls pointer and current_cpu address.
   f328a:   b2 4f 00 10 ear %r1,%a0
   f328e:   c0 20 00 0a 35 9d   larl%r2,239dc8 

   f3294:   eb 11 00 20 00 0d   sllg%r1,%r1,32
   f329a:   e3 20 20 00 00 04   lg  %r2,0(%r2)
   f32a0:   b2 4f 00 11 ear %r1,%a1

// Reload cpu for comparison
   f32a4:   e3 30 f0 a0 00 04   lg  %r3,160(%r15)
// cpu == current_cpu
   f32aa:   e3 32 10 00 00 20   cg  %r3,0(%r2,%r1)
   f32b0:   a7 84 00 12 je  f32d4 

   ...

The only way I can imagine that this comparison fails is if we have corrupted the stack in 
some way.  I have not been able to induce failure under any sort of debugging, and I can't 
imagine where irregular corruption would have come from.



r~




Re: [PULL 4/9] MAINTAINERS: Remove bouncing mail address from Kamil Rytarowski

2023-01-20 Thread Brad Smith
I e-mailed his business address and received a bounce back from Google
saying the account does not exist.

On 1/18/2023 6:34 AM, Thomas Huth wrote:

When sending mail to Kamil's address, it's bouncing with a message
that the mailbox is full. This already happens since summer 2022,
and the last message that Kamil sent to the qemu-devel mailing list
is from November 2021 (as far as I can see), so we unfortunately
have to assume that this e-mail address is not valid anymore.

Message-Id: <20230113081735.1148057-1-th...@redhat.com>
Signed-off-by: Thomas Huth 
---
  MAINTAINERS | 2 --
  1 file changed, 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fe50d01e3..08ad1e5341 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -511,7 +511,6 @@ F: target/i386/hax/
  Guest CPU Cores (NVMM)
  --
  NetBSD Virtual Machine Monitor (NVMM) CPU support
-M: Kamil Rytarowski 
  M: Reinoud Zandijk 
  S: Maintained
  F: include/sysemu/nvmm.h
@@ -536,7 +535,6 @@ F: util/*posix*.c
  F: include/qemu/*posix*.h
  
  NETBSD

-M: Kamil Rytarowski 
  M: Reinoud Zandijk 
  M: Ryo ONODERA 
  S: Maintained




[PATCH v2 1/2] target/riscv: add Zicond as an experimental extension

2023-01-20 Thread Philipp Tomsich
This implements the Zicond (conditional integer operations) extension,
as of version 1.0-draft-20230120 as an experimental extension in QEMU
("x-zicond").

The Zicond extension acts as a building block for branchless sequences
including conditional-{arithmetic,logic,select,move}.  Refer to the
specification for usage scenarios and application guidance.

The following instructions constitute Zicond:
  - czero.eqz rd, rs1, rs2  =>  rd = (rs2 == 0) ? 0 : rs1
  - czero.nez rd, rs1, rs2  =>  rd = (rs2 != 0) ? 0 : rs1

See
  
https://github.com/riscv/riscv-zicond/releases/download/v1.0-draft-20230120/riscv-zicond_1.0-draft-20230120.pdf
for the (current version of the) Zicond specification and usage details.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- gates availability of the instructions through a REQUIRE_ZICOND
  macro (these were previously always enabled)

 MAINTAINERS  |  7 +++
 target/riscv/cpu.c   |  4 ++
 target/riscv/cpu.h   |  1 +
 target/riscv/insn32.decode   |  4 ++
 target/riscv/insn_trans/trans_rvzicond.c.inc | 54 
 target/riscv/translate.c |  1 +
 6 files changed, 71 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzicond.c.inc

diff --git a/MAINTAINERS b/MAINTAINERS
index 08ad1e5341..ca914c42fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -295,6 +295,13 @@ F: include/hw/riscv/
 F: linux-user/host/riscv32/
 F: linux-user/host/riscv64/
 
+RISC-V Zicond extension
+M: Philipp Tomsich 
+M: Christoph Muellner 
+L: qemu-ri...@nongnu.org
+S: Supported
+F: target/riscv/insn_trans/trans_zicond.c.inc
+
 RISC-V XVentanaCondOps extension
 M: Philipp Tomsich 
 L: qemu-ri...@nongnu.org
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index cc75ca7667..f214b03e95 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -73,6 +73,7 @@ struct isa_ext_data {
 static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
 ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
+ISA_EXT_DATA_ENTRY(zicond, true, PRIV_VERSION_1_12_0, ext_zicond),
 ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
 ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
 ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
ext_zihintpause),
@@ -1080,6 +1081,9 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("x-smaia", RISCVCPU, cfg.ext_smaia, false),
 DEFINE_PROP_BOOL("x-ssaia", RISCVCPU, cfg.ext_ssaia, false),
 
+/* Zicond 1.0-draft-20230120 */
+DEFINE_PROP_BOOL("x-zicond", RISCVCPU, cfg.ext_zicond, false),
+
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index f5609b62a2..4b6ff800d3 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -446,6 +446,7 @@ struct RISCVCPUConfig {
 bool ext_zkt;
 bool ext_ifencei;
 bool ext_icsr;
+bool ext_zicond;
 bool ext_zihintpause;
 bool ext_smstateen;
 bool ext_sstc;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b7e7613ea2..ca812c2f7a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -890,3 +890,7 @@ sm3p1   00 01000 01001 . 001 . 0010011 @r2
 # *** RV32 Zksed Standard Extension ***
 sm4ed   .. 11000 . . 000 . 0110011 @k_aes
 sm4ks   .. 11010 . . 000 . 0110011 @k_aes
+
+# *** Zicond Standard Extension ***
+czero_eqz   111 . . 101 . 0110011 @r
+czero_nez   111 . . 111 . 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvzicond.c.inc 
b/target/riscv/insn_trans/trans_rvzicond.c.inc
new file mode 100644
index 00..20e9694a2c
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzicond.c.inc
@@ -0,0 +1,54 @@
+/*
+ * RISC-V translation routines for the Zicond extension.
+ *
+ * Copyright (c) 2022 VRULL GmbH.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define REQUIRE_ZICOND(ctx) do { \
+if (!ctx->cfg_ptr->ext_zicond) { \
+return false;\
+}\
+} while (0)
+
+/* Emits "$rd = ($rs2 <cond> $zero) ? $zero : $rs1" */
+static

[PATCH v2 2/2] target/riscv: redirect XVentanaCondOps to use the Zicond functions

2023-01-20 Thread Philipp Tomsich
The Zicond standard extension implements the same instruction
semantics as XVentanaCondOps, although using different mnemonics and
opcodes.

Point XVentanaCondOps to the (newly implemented) Zicond implementation
to reduce the future maintenance burden.

Also update MAINTAINERS, as trans_xventanacondops.c.inc will not see
active maintenance from here forward.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Calls into the gen_czero_{eqz,nez} helpers instead of calling
  trans_czero_{eqz,nez} to bypass the require-check and ensure that
  XVentanaCondOps can be enabled/disabled independently of Zicond.

 MAINTAINERS|  2 +-
 .../insn_trans/trans_xventanacondops.c.inc | 18 +++---
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ca914c42fa..293a9d1c8c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -305,7 +305,7 @@ F: target/riscv/insn_trans/trans_zicond.c.inc
 RISC-V XVentanaCondOps extension
 M: Philipp Tomsich 
 L: qemu-ri...@nongnu.org
-S: Supported
+S: Odd Fixes
 F: target/riscv/XVentanaCondOps.decode
 F: target/riscv/insn_trans/trans_xventanacondops.c.inc
 
diff --git a/target/riscv/insn_trans/trans_xventanacondops.c.inc 
b/target/riscv/insn_trans/trans_xventanacondops.c.inc
index 16849e6d4e..38c15f2825 100644
--- a/target/riscv/insn_trans/trans_xventanacondops.c.inc
+++ b/target/riscv/insn_trans/trans_xventanacondops.c.inc
@@ -1,7 +1,7 @@
 /*
  * RISC-V translation routines for the XVentanaCondOps extension.
  *
- * Copyright (c) 2021-2022 VRULL GmbH.
+ * Copyright (c) 2021-2023 VRULL GmbH.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -16,24 +16,12 @@
  * this program.  If not, see .
  */
 
-static bool gen_vt_condmask(DisasContext *ctx, arg_r *a, TCGCond cond)
-{
-TCGv dest = dest_gpr(ctx, a->rd);
-TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
-TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
-
-tcg_gen_movcond_tl(cond, dest, src2, ctx->zero, src1, ctx->zero);
-
-gen_set_gpr(ctx, a->rd, dest);
-return true;
-}
-
 static bool trans_vt_maskc(DisasContext *ctx, arg_r *a)
 {
-return gen_vt_condmask(ctx, a, TCG_COND_NE);
+return gen_logic(ctx, a, gen_czero_eqz);
 }
 
 static bool trans_vt_maskcn(DisasContext *ctx, arg_r *a)
 {
-return gen_vt_condmask(ctx, a, TCG_COND_EQ);
+return gen_logic(ctx, a, gen_czero_nez);
 }
-- 
2.34.1




Re: [PATCH v9] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Chuck Zmudzinski
On 1/20/23 4:34 PM, Stefano Stabellini wrote:
> On Sat, 14 Jan 2023, Chuck Zmudzinski wrote:
> > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > as noted in docs/igd-assign.txt in the Qemu source code.
> > 
> > Currently, when the xl toolstack is used to configure a Xen HVM guest with
> > Intel IGD passthrough to the guest with the Qemu upstream device model,
> > a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> > a different slot. This problem often prevents the guest from booting.
> > 
> > The only available workaround is not good: Configure Xen HVM guests to use
> > the old and no longer maintained Qemu traditional device model available
> > from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> > 
> > To implement this feature in the Qemu upstream device model for Xen HVM
> > guests, introduce the following new functions, types, and macros:
> > 
> > * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> > * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> > * typedef XenPTQdevRealize function pointer
> > * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> > * xen_igd_reserve_slot and xen_igd_clear_slot functions
> > 
> > Michael Tsirkin:
> > * Introduce XEN_PCI_IGD_DOMAIN, XEN_PCI_IGD_BUS, XEN_PCI_IGD_DEV, and
> >   XEN_PCI_IGD_FN - use them to compute the value of XEN_PCI_IGD_SLOT_MASK
> > 
> > The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> > member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> > the xl toolstack with the gfx_passthru option enabled, which sets the
> > igd-passthru=on option to Qemu for the Xen HVM machine type.
> > 
> > The new xen_igd_reserve_slot function also needs to be implemented in
> > hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> > when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> > in which case it does nothing.
> > 
> > The new xen_igd_clear_slot function overrides qdev->realize of the parent
> > PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> > since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> > created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> > 
> > Move the call to xen_host_pci_device_get, and the associated error
> > handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> > initialize the device class and vendor values which enables the checks for
> > the Intel IGD to succeed. The verification that the host device is an
> > Intel IGD to be passed through is done by checking the domain, bus, slot,
> > and function values as well as by checking that gfx_passthru is enabled,
> > the device class is VGA, and the device vendor is Intel.
> > 
> > Signed-off-by: Chuck Zmudzinski 
>
> Hi Chuck,
>
> The approach looks OK in principle to me. I only have one question: for
> other PCI devices (not Intel IGD), where is xen_host_pci_device_get
> called now?

I think you are right that there might be a problem for the
devices being added after the Intel IGD. I think I only tested
the case when the Intel IGD is the last device being added.
I do expect that if I add the Intel IGD first, then there will be
problems when the subsequent type XEN_PT devices are
added because xen_pt_realize will not like it if
xen_host_pci_device_get does not get called. I will check
this over the weekend and, if a change is needed, I will
post it in v10.

I also think there is the same problem when the bit in
slot_reserved_mask is never set, which happens when
the guest has pci devices passed through but without
configuring Qemu with the igd-passthru=on option for
the xenfv machine type. I will also test this over the
weekend.

> It looks like that xen_igd_reserve_slot would return without setting
> slot_reserved_mask, hence xen_igd_clear_slot would also return without
> calling xen_host_pci_device_get. And xen_pt_realize doesn't call
> xen_host_pci_device_get any longer.
>
> Am I missing something?

No, you are not missing anything. You are pointing to some
cases that I need to test that probably would not work.
I think the fix is to have this at the beginning of
xen_igd_clear_slot instead of what I have now:

    xen_host_pci_device_get(&s->real_device,
                            s->hostaddr.domain, s->hostaddr.bus,
                            s->hostaddr.slot, s->hostaddr.function,
                            errp);
    if (*errp) {
        error_append_hint(errp, "Failed to \"open\" the real pci device");
        return;
    }

    if (!(pci_bus->slot_reserved_mask & XEN_PCI_IGD_SLOT_MASK))
        return;

That way xen_host_pci_device_get would still get called
when xen_igd_clear_slot returns because the bit in the
mask was not set.

Thanks for your careful review of the patch. I think you
did find a real mistake that needs to be fixed in v10 which
I hope to post with the above 

Re: [PATCH v10 3/9] KVM: Extend the memslot to support fd-based private memory

2023-01-20 Thread Jarkko Sakkinen
On Tue, Jan 10, 2023 at 05:14:32PM +0800, Chao Peng wrote:
> On Mon, Jan 09, 2023 at 07:32:05PM +, Sean Christopherson wrote:
> > On Fri, Jan 06, 2023, Chao Peng wrote:
> > > On Thu, Jan 05, 2023 at 11:23:01AM +, Jarkko Sakkinen wrote:
> > > > On Fri, Dec 02, 2022 at 02:13:41PM +0800, Chao Peng wrote:
> > > > > To make future maintenance easy, internally use a binary compatible
> > > > > alias struct kvm_user_mem_region to handle both the normal and the
> > > > > '_ext' variants.
> > > > 
> > > > Feels bit hacky IMHO, and more like a completely new feature than
> > > > an extension.
> > > > 
> > > > Why not just add a new ioctl? The commit message does not address
> > > > the most essential design here.
> > > 
> > > Yes, people can always choose to add a new ioctl for this kind of change
> > > and the balance point here is we want to also avoid 'too many ioctls' if
> > > the functionalities are similar.  The '_ext' variant reuses all the
> > > existing fields in the 'normal' variant and most importantly KVM
> > > internally can reuse most of the code. I certainly can add some words in
> > > the commit message to explain this design choice.
> > 
> > After seeing the userspace side of this, I agree with Jarkko; overloading
> > KVM_SET_USER_MEMORY_REGION is a hack.  E.g. the size validation ends up
> > being bogus, and userspace ends up abusing unions or implementing
> > kvm_user_mem_region itself.
> 
> How is the size validation being bogus? I don't quite follow. Then we
> will use kvm_userspace_memory_region2 as the KVM internal alias, right?
> I see similar examples use different functions to handle different
> versions but it does look easier if we use alias for this function.
> 
> > 
> > It feels absolutely ridiculous, but I think the best option is to do:
> > 
> > #define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
> >  struct kvm_userspace_memory_region2)
> 
> Just interesting, is 0x49 a safe number we can use? 
> 
> > 
> > /* for KVM_SET_USER_MEMORY_REGION2 */
> > struct kvm_user_mem_region2 {
> > __u32 slot;
> > __u32 flags;
> > __u64 guest_phys_addr;
> > __u64 memory_size;
> > __u64 userspace_addr;
> > __u64 restricted_offset;
> > __u32 restricted_fd;
> > __u32 pad1;
> > __u64 pad2[14];
> > }
> > 
> > And it's consistent with other KVM ioctls(), e.g. KVM_SET_CPUID2.
> 
> Okay, agree from KVM userspace API perspective this is more consistent
> with similar existing examples. I see several of them.
> 
> I think we will also need a CAP_KVM_SET_USER_MEMORY_REGION2 for this new
> ioctl.

The current API in the patch set is trivial for C user space, but for any
other, more "constrained" language such as Rust, a new ioctl would be
easier to adapt to.

> > 
> > Regarding the userspace side of things, please include Vishal's selftests
> > in v11, it's impossible to properly review the uAPI changes without seeing
> > the userspace side of things.  I'm in the process of reviewing Vishal's
> > v2[*], I'll try to massage it into a set of patches that you can
> > incorporate into your series.
> 
> Previously I included Vishal's selftests in the github repo, but not
> include them in this patch series. It's OK for me to incorporate them
> directly into this series and review together if Vishal is fine.
> 
> Chao
> > 
> > [*] 
> > https://lore.kernel.org/all/20221205232341.4131240-1-vannapu...@google.com

BR, Jarkko



Re: [PATCH v10 3/9] KVM: Extend the memslot to support fd-based private memory

2023-01-20 Thread Jarkko Sakkinen
On Mon, Jan 09, 2023 at 07:32:05PM +, Sean Christopherson wrote:
> On Fri, Jan 06, 2023, Chao Peng wrote:
> > On Thu, Jan 05, 2023 at 11:23:01AM +, Jarkko Sakkinen wrote:
> > > On Fri, Dec 02, 2022 at 02:13:41PM +0800, Chao Peng wrote:
> > > > To make future maintenance easy, internally use a binary compatible
> > > > alias struct kvm_user_mem_region to handle both the normal and the
> > > > '_ext' variants.
> > > 
> > > Feels bit hacky IMHO, and more like a completely new feature than
> > > an extension.
> > > 
> > > Why not just add a new ioctl? The commit message does not address
> > > the most essential design here.
> > 
> > Yes, people can always choose to add a new ioctl for this kind of change
> > and the balance point here is we want to also avoid 'too many ioctls' if
> > the functionalities are similar.  The '_ext' variant reuses all the
> > existing fields in the 'normal' variant and most importantly KVM
> > internally can reuse most of the code. I certainly can add some words in
> > the commit message to explain this design choice.
> 
> After seeing the userspace side of this, I agree with Jarkko; overloading
> KVM_SET_USER_MEMORY_REGION is a hack.  E.g. the size validation ends up being
> bogus, and userspace ends up abusing unions or implementing
> kvm_user_mem_region itself.
> 
> It feels absolutely ridiculous, but I think the best option is to do:
> 
> #define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
>struct kvm_userspace_memory_region2)
> 
> /* for KVM_SET_USER_MEMORY_REGION2 */
> struct kvm_user_mem_region2 {
>   __u32 slot;
>   __u32 flags;
>   __u64 guest_phys_addr;
>   __u64 memory_size;
>   __u64 userspace_addr;
>   __u64 restricted_offset;
>   __u32 restricted_fd;
>   __u32 pad1;
>   __u64 pad2[14];
> }
> 
> And it's consistent with other KVM ioctls(), e.g. KVM_SET_CPUID2.
> 
> Regarding the userspace side of things, please include Vishal's selftests
> in v11, it's impossible to properly review the uAPI changes without seeing
> the userspace side of things.  I'm in the process of reviewing Vishal's
> v2[*], I'll try to massage it into a set of patches that you can
> incorporate into your series.
> 
> [*] https://lore.kernel.org/all/20221205232341.4131240-1-vannapu...@google.com

+1

BR, Jarkko



Re: [PATCH v8 09/13] vfio/migration: Implement VFIO migration protocol v2

2023-01-20 Thread Alex Williamson
On Mon, 16 Jan 2023 16:11:31 +0200
Avihai Horon  wrote:

> Implement the basic mandatory part of VFIO migration protocol v2.
> This includes all functionality that is necessary to support
> VFIO_MIGRATION_STOP_COPY part of the v2 protocol.
> 
> The two protocols, v1 and v2, will co-exist and in the following patches
> v1 protocol code will be removed.
> 
> There are several main differences between v1 and v2 protocols:
> - VFIO device state is now represented as a finite state machine instead
>   of a bitmap.
> 
> - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
>   ioctl and normal read() and write() instead of the migration region.
> 
> - Pre-copy is made optional in v2 protocol. Support for pre-copy will be
>   added later on.
> 
> Detailed information about VFIO migration protocol v2 and its difference
> compared to v1 protocol can be found here [1].
> 
> [1]
> https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/
> 
> Signed-off-by: Avihai Horon 
> Reviewed-by: Cédric Le Goater 
> ---
>  include/hw/vfio/vfio-common.h |   5 +
>  hw/vfio/common.c  |  19 +-
>  hw/vfio/migration.c   | 455 +++---
>  hw/vfio/trace-events  |   7 +
>  4 files changed, 447 insertions(+), 39 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index bbaf72ba00..6d7d850bfe 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -66,6 +66,11 @@ typedef struct VFIOMigration {
>  int vm_running;
>  Notifier migration_state;
>  uint64_t pending_bytes;
> +uint32_t device_state;
> +int data_fd;
> +void *data_buffer;
> +size_t data_buffer_size;
> +bool v2;
>  } VFIOMigration;
>  
>  typedef struct VFIOAddressSpace {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 550b2d7ded..dcaa77d2a8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>  return false;
>  }
>  
> -if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +if (!migration->v2 &&
> +(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>  (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
>  return false;
>  }
> +
> +if (migration->v2 &&
> +(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
> +return false;
> +}
>  }
>  }
>  return true;
> @@ -385,7 +393,14 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>  return false;
>  }
>  
> -if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> +if (!migration->v2 &&
> +migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> +continue;
> +}
> +
> +if (migration->v2 &&
> +(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
>  continue;
>  } else {
>  return false;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 9df859f4d3..f19ada0f4f 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -10,6 +10,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/cutils.h"
> +#include "qemu/units.h"
>  #include 
>  #include 
>  
> @@ -44,8 +45,103 @@
>  #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
>  #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
>  
> +/*
> + * This is an arbitrary size based on migration of mlx5 devices, where typically
> + * total device migration size is on the order of 100s of MB. Testing with
> + * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
> + */
> +#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
> +
>  static int64_t bytes_transferred;
>  
> +static const char *mig_state_to_str(enum vfio_device_mig_state state)
> +{
> +switch (state) {
> +case VFIO_DEVICE_STATE_ERROR:
> +return "ERROR";
> +case VFIO_DEVICE_STATE_STOP:
> +return "STOP";
> +case VFIO_DEVICE_STATE_RUNNING:
> +return "RUNNING";
> +case VFIO_DEVICE_STATE_STOP_COPY:
> +return "STOP_COPY";
> +case VFIO_DEVICE_STATE_RESUMING:
> +return "RESUMING";
> +case VFIO_DEVICE_STATE_RUNNING_P2P:
> +return "RUNNING_P2P";
> +default:
> +return "UNKNOWN STATE";
> +}
> +}
> +
> +static int vfio_migration_set_state(VFIODevice *vbasedev,
> +   

Re: [PATCH v8 05/13] migration/qemu-file: Add qemu_file_get_to_fd()

2023-01-20 Thread Alex Williamson
On Mon, 16 Jan 2023 16:11:27 +0200
Avihai Horon  wrote:

> Add new function qemu_file_get_to_fd() that allows reading data from
> QEMUFile and writing it straight into a given fd.
> 
> This will be used later in VFIO migration code.
> 
> Signed-off-by: Avihai Horon 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Cédric Le Goater 
> ---
>  migration/qemu-file.h |  1 +
>  migration/qemu-file.c | 34 ++
>  2 files changed, 35 insertions(+)

quintela, dgilbert, Ping for ack.  Thanks,

Alex

 
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index fa13d04d78..9d0155a2a1 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f);
>  QEMUFile *qemu_file_get_return_path(QEMUFile *f);
>  void qemu_fflush(QEMUFile *f);
>  void qemu_file_set_blocking(QEMUFile *f, bool block);
> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
>  
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 2d5f74ffc2..102ab3b439 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
>  {
>  return file->ioc;
>  }
> +
> +/*
> + * Read size bytes from QEMUFile f and write them to fd.
> + */
> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
> +{
> +while (size) {
> +size_t pending = f->buf_size - f->buf_index;
> +ssize_t rc;
> +
> +if (!pending) {
> +rc = qemu_fill_buffer(f);
> +if (rc < 0) {
> +return rc;
> +}
> +if (rc == 0) {
> +return -EIO;
> +}
> +continue;
> +}
> +
> +rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
> +if (rc < 0) {
> +return -errno;
> +}
> +if (rc == 0) {
> +return -EIO;
> +}
> +f->buf_index += rc;
> +size -= rc;
> +}
> +
> +return 0;
> +}




Re: [PATCH v8 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support

2023-01-20 Thread Alex Williamson
On Mon, 16 Jan 2023 16:11:26 +0200
Avihai Horon  wrote:

> Currently, if IOMMU of a VFIO container doesn't support dirty page
> tracking, migration is blocked. This is because a DMA-able VFIO device
> can dirty RAM pages without updating QEMU about it, thus breaking the
> migration.
> 
> However, this doesn't mean that migration can't be done at all.
> In such case, allow migration and let QEMU VFIO code mark all pages
> dirty.
> 
> This guarantees that all pages that might have gotten dirty are reported
> back, and thus guarantees a valid migration even without VFIO IOMMU
> dirty tracking support.
> 
> The motivation for this patch is the introduction of iommufd [1].
> iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
> mapping them into its internal ops, allowing the usage of these IOCTLs
> over iommufd. However, VFIO IOMMU dirty tracking is not supported by
> this VFIO compatibility API.
> 
> This patch will allow migration by hosts that use the VFIO compatibility
> API and prevent migration regressions caused by the lack of VFIO IOMMU
> dirty tracking support.
> 
> [1]
> https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_...@nvidia.com/
> 
> Signed-off-by: Avihai Horon 
> ---
>  hw/vfio/common.c| 20 ++--
>  hw/vfio/migration.c |  3 +--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 130e5d1dc7..f6dd571549 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -488,6 +488,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
>  return -errno;
>  }
>  
> +if (iotlb && vfio_devices_all_running_and_saving(container)) {
> +cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
> +tcg_enabled() ? DIRTY_CLIENTS_ALL :
> +DIRTY_CLIENTS_NOCODE);

I take it this is an attempt to decipher the mask arg based on its use
in cpu_physical_memory_set_dirty_lebitmap().  I'm attempting to do the
same.  It seems like it must logically be the case that
global_dirty_tracking is set to pass the running-and-saving test, but I
can't connect the pieces.  Is this your understanding as well and the
reason we don't also need to optionally exclude DIRTY_MEMORY_MIGRATION?
Thanks,

Alex

> +}
> +
>  return 0;
>  }
>  
> @@ -1201,6 +1207,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>  .argsz = sizeof(dirty),
>  };
>  
> +if (!container->dirty_pages_supported) {
> +return;
> +}
> +
>  if (start) {
>  dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>  } else {
> @@ -1236,6 +1246,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>  uint64_t pages;
>  int ret;
>  
> +if (!container->dirty_pages_supported) {
> +cpu_physical_memory_set_dirty_range(ram_addr, size,
> +tcg_enabled() ? DIRTY_CLIENTS_ALL :
> +DIRTY_CLIENTS_NOCODE);
> +return 0;
> +}
> +
>  dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
>  
>  dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
> @@ -1409,8 +1426,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
>  {
>  VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>  
> -if (vfio_listener_skipped_section(section) ||
> -!container->dirty_pages_supported) {
> +if (vfio_listener_skipped_section(section)) {
>  return;
>  }
>  
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 09fe7c1de2..552c2313b2 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -860,11 +860,10 @@ int64_t vfio_mig_bytes_transferred(void)
>  
>  int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>  {
> -VFIOContainer *container = vbasedev->group->container;
>  struct vfio_region_info *info = NULL;
>  int ret = -ENOTSUP;
>  
> -if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
> +if (!vbasedev->enable_migration) {
>  goto add_blocker;
>  }
>  




Re: cxl nvdimm Potential probe ordering issues.

2023-01-20 Thread Dan Williams
Gregory Price wrote:
> On Fri, Jan 20, 2023 at 09:38:13AM -0800, Dan Williams wrote:
> > As it stands currently that dax device and the cxl device are not
> > related since a default dax-device is loaded just based on the presence
> > of an EFI_MEMORY_SP address range in the address map. With the new ram
> > enabling that default device will be elided and CXL will register a
> > dax-device parented by a cxl region.
> > 
> > >- The memory *does not* auto-online, instead the dax device can be
> > >  onlined as system-ram *manually* via ndctl and friends
> > 
> > That *manually* part is the problem that needs distro help to solve. It
> > should be the case that by default all Linux distributions auto-online
> > all dax-devices. If that happens to online memory that is too slow for
> > general use, or too high-performance / precious for general purpose use
> > then the administrator can set policy after the fact. Unfortunately user
> > policy can not be applied if these memory ranges were onlined by the
> > kernel at boot, so that's why the kernel policy defaults to not-online.
> > 
> > In other words, there is no guarantee that memory that was assigned to
> > the general purpose pool at boot can be removed. The only guaranteed
> > behavior is to never give the memory to the core kernel in the first
> > instance and always let user policy route the memory.
> > 
> > > 3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
> > >of the type-3 device configuration (pmem-only or vmem-only)
> > 
> > Correct, the top-level bus code (cxl_acpi) and the endpoint code
> > (cxl_mem, cxl_port) need to handshake before establishing regions. For
> > pmem regions the platform needs to claim the availability of a pmem
> > capable CXL window.
> > 
> > > 4) As you can see above, multiple decoders are registered.  I'm not sure
> > >if that's correct or not, but it does seem odd given there's only one
> > >cxl type-3 device.  Odd that decoder0.0 shows up when CFMW is there,
> > >but not when it isn't.
> > 
> > CXL windows are modeled as decoders hanging off the CXL root device
> > (ACPI0017 on ACPI based platforms). An endpoint decoder can then map a
> > selection of that window.
> > 
> > > Don't know why I haven't thought of this until now, but is the CFMW code
> > > reporting something odd about what's behind it?  Is it assuming the
> > > devices are pmem?
> > 
> > No, the cxl_acpi code is just advertising platform decode possibilities
> > independent of what devices show up. Think of this like the PCI MMIO
> > space that gets allocated to a root bridge at the beginning of time.
> > That space may or may not get consumed based on what devices show up
> > downstream.
> 
> Thank you for the explanation Dan, and thank you for your patience
> @JCameron.  I'm fairly sure I grok it now.
> 
> Summarizing to make sure: the cxl driver is providing what would be the
> CXL.io (control) path, and the CXL.mem path is basically being simulated
> by what otherwise would be a traditional PCI memory region. This explains
> why turning off Legacy mode drops the dax devices, and why the topology
> looks strange - the devices are basically attached in 2 different ways.
> 
> Might there be interest from the QEMU community to implement this
> legacy-style setup in the short term, in an effort to test the
> control path of type-3 devices while we wait for the kernel to catch up?
> 
> Or should we forget this mode ever existed and just barrel forward
> with HDM decoders and writing the kernel code to hook up the underlying
> devices in drivers/cxl?

Which mode are you referring to?

The next steps for the kernel enabling relevant to this thread are:

* ram region discovery (platform firmware or kexec established)
* ram region creation
* pmem region discovery (from labels)



[PATCH] target/i386: translate GPA to HPA even in unpaged mode

2023-01-20 Thread Bernhard Kauer
Guest to host page translation is missing if the guest runs in unpaged mode.
See last sentence in AMD SDM rev 3.40 section 15.25.5.

Signed-off-by: Bernhard Kauer 
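The flow this patch changes can be illustrated with a toy model (this is a simplified sketch, not the real excp_helper.c code): with guest paging off, GVA == GPA, but under SVM nested paging that GPA must still be run through the stage-2 tables before it is usable as a host physical address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of guest address translation; all names are invented. */
typedef struct {
    int guest_paging;     /* guest CR0.PG */
    int nested_paging;    /* stage-2 (NPT) translation active */
    uint64_t npt_offset;  /* stand-in for the real stage-2 page tables */
} ToyMMU;

/* Stand-in for walking the stage-2 (nested) page tables. */
static uint64_t stage2_translate(const ToyMMU *m, uint64_t gpa)
{
    return gpa + m->npt_offset;
}

static uint64_t toy_get_physical_address(const ToyMMU *m, uint64_t addr)
{
    if (m->guest_paging) {
        /* the stage-1 walk would run here and feed into stage 2 */
        return stage2_translate(m, addr);
    }
    /* The fix: an unpaged guest's GPA (== GVA) still goes through
     * stage 2 when nested paging is on, instead of being returned
     * as the host physical address directly. */
    if (m->nested_paging) {
        return stage2_translate(m, addr);
    }
    return addr;
}
```

Without the second branch, the unpaged-plus-nested case would fall through to the identity mapping, which is the bug the AMD SDM passage describes.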
---
 target/i386/tcg/sysemu/excp_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 55bd1194d3..8d9152245b 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -576,6 +576,9 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
 }
 return mmu_translate(env, &in, out, err);
 }
+if (use_stage2) {
+return get_physical_address(env, addr, access_type, MMU_NESTED_IDX, out, err);
+}
 break;
 }




[PATCH 0/2] tests/avocado: Pass parameters via Makefile

2023-01-20 Thread Fabiano Rosas
This is intended to replace the last two patches of Daniel's series:
https://lore.kernel.org/r/20230118124348.364771-1-dbarb...@ventanamicro.com

Currently, the initialization code in setUp() infers properties of the
tests & host/target machine by looking at the avocado tags present in
the test. If there are no tags, some best-effort fallbacks are chosen.

This means that for generic tests they end up always choosing the QEMU
target that matches the host architecture, which is not always
desirable. If we allow command line parameters to override the
fallback, we can alter this behavior.

patch 1 - change precedence to tags > params to avoid changing tests
  that do have tags;

patch 2 - wires up the new AVOCADO_PARAMS variable to the '-p' avocado
  command line option;

Fabiano Rosas (2):
  tests/avocado: Invert parameter vs. tag precedence during setUp
  tests/avocado: Allow passing command line parameters via Makefile

 tests/Makefile.include |  6 -
 tests/avocado/avocado_qemu/__init__.py | 32 +++---
 2 files changed, 24 insertions(+), 14 deletions(-)

-- 
2.35.3




[PATCH 2/2] tests/avocado: Allow passing command line parameters via Makefile

2023-01-20 Thread Fabiano Rosas
Add support for the 'avocado run' "-p" option, which allows us to pass
parameters in the form key=value to be applied to all tests selected
for a given run. This is useful to force generic tests to use a
specific machine, cpu or qemu-binary where otherwise the defaults
would be used.

E.g.:
 $ make check-avocado AVOCADO_PARAMS="machine=virt arch=riscv64"

 

Signed-off-by: Fabiano Rosas 
---
 tests/Makefile.include | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 9422ddaece..f92e730aa0 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -107,6 +107,10 @@ else
AVOCADO_CMDLINE_TAGS=$(addprefix -t , $(AVOCADO_TAGS))
 endif
 
+ifdef AVOCADO_PARAMS
+   AVOCADO_CMDLINE_PARAMS=$(addprefix -p , $(AVOCADO_PARAMS))
+endif
+
 quiet-venv-pip = $(quiet-@)$(call quiet-command-run, \
 $(TESTS_PYTHON) -m pip -q --disable-pip-version-check $1, \
 "VENVPIP","$1")
@@ -144,7 +148,7 @@ check-avocado: check-venv $(TESTS_RESULTS_DIR) get-vm-images
 --show=$(AVOCADO_SHOW) run --job-results-dir=$(TESTS_RESULTS_DIR) \
 $(if $(AVOCADO_TAGS),, --filter-by-tags-include-empty \
--filter-by-tags-include-empty-key) \
-$(AVOCADO_CMDLINE_TAGS) \
+$(AVOCADO_CMDLINE_TAGS) $(AVOCADO_CMDLINE_PARAMS) \
 $(if $(GITLAB_CI),,--failfast) $(AVOCADO_TESTS), \
 "AVOCADO", "tests/avocado")
 
-- 
2.35.3




[PATCH 1/2] tests/avocado: Invert parameter vs. tag precedence during setUp

2023-01-20 Thread Fabiano Rosas
We currently never pass parameters to the avocado process via
Makefile. To start doing so we need to invert the precedence between
command line parameters and tags, otherwise a command line parameter
would override values for all the tests, which is unlikely to be a
common use-case.

A more likely use-case is to force certain values for the tests that
have no tags. For instance, if a test has no 'arch' tags and therefore
can run for all targets, one could possibly force it to run on a
certain target with an arch=foo parameter.

This applies to the variables set during setUp(): arch, machine, cpu,
distro_name, distro_version. Parameters used directly in tests or read
via self.params.get are left unchanged.

Signed-off-by: Fabiano Rosas 
---
 tests/avocado/avocado_qemu/__init__.py | 32 +++---
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index 910f3ba1ea..a181cac383 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -240,12 +240,23 @@ def _get_unique_tag_val(self, tag_name):
 return vals.pop()
 return None
 
+def _get_prop(self, name):
+"""
+Infer test properties based on tags. If no tag is present,
+look for a command line parameter of the same name.
+"""
+val = self._get_unique_tag_val(name)
+if not val:
+# If there's no tag, look for a command line
+# parameter. This allows the user to override any defaults
+# the caller of this function would choose if we were to
+# return None.
+val = self.params.get(name)
+return val
+
 def setUp(self, bin_prefix):
-self.arch = self.params.get('arch',
-default=self._get_unique_tag_val('arch'))
-
-self.cpu = self.params.get('cpu',
-   default=self._get_unique_tag_val('cpu'))
+self.arch = self._get_prop('arch')
+self.cpu = self._get_prop('cpu')
 
 default_qemu_bin = pick_default_qemu_bin(bin_prefix, arch=self.arch)
 self.qemu_bin = self.params.get('qemu_bin',
@@ -274,8 +285,7 @@ def setUp(self):
 
 super().setUp('qemu-system-')
 
-self.machine = self.params.get('machine',
-   default=self._get_unique_tag_val('machine'))
+self.machine = self._get_prop('machine')
 
 def require_accelerator(self, accelerator):
 """
@@ -529,15 +539,11 @@ class LinuxTest(LinuxSSHMixIn, QemuSystemTest):
 memory = '1024'
 
 def _set_distro(self):
-distro_name = self.params.get(
-'distro',
-default=self._get_unique_tag_val('distro'))
+distro_name = self._get_prop('distro')
 if not distro_name:
 distro_name = 'fedora'
 
-distro_version = self.params.get(
-'distro_version',
-default=self._get_unique_tag_val('distro_version'))
+distro_version = self._get_prop('distro_version')
 if not distro_version:
 distro_version = '31'
 
-- 
2.35.3




[PATCH] target/i386: translate GPA to HPA even in unpaged mode

2023-01-20 Thread Bernhard Kauer
Guest to host page translation should be done even if the guest runs in unpaged 
mode.
See last sentence in AMD SDM rev 3.40 section 15.25.5.

Signed-off-by: Bernhard Kauer 
---
 target/i386/tcg/sysemu/excp_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 55bd1194d3..8d9152245b 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -576,6 +576,9 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
            }
            return mmu_translate(env, &in, out, err);
        }
+        if (use_stage2) {
+            return get_physical_address(env, addr, access_type, MMU_NESTED_IDX, out, err);
+        }
         break;
     }
 



Re: cxl nvdimm Potential probe ordering issues.

2023-01-20 Thread Gregory Price
On Fri, Jan 20, 2023 at 09:38:13AM -0800, Dan Williams wrote:
> As it stands currently that dax device and the cxl device are not
> related since a default dax-device is loaded just based on the presence
> of an EFI_MEMORY_SP address range in the address map. With the new ram
> enabling that default device will be elided and CXL will register a
> dax-device parented by a cxl region.
> 
> >  - The memory *does not* auto-online, instead the dax device can be
> >onlined as system-ram *manually* via ndctl and friends
> 
> That *manually* part is the problem that needs distro help to solve. It
> should be the case that by default all Linux distributions auto-online
> all dax-devices. If that happens to online memory that is too slow for
> general use, or too high-performance / precious for general purpose use
> then the administrator can set policy after the fact. Unfortunately user
> policy can not be applied if these memory ranges were onlined by the
> kernel at boot, so that's why the kernel policy defaults to not-online.
> 
> In other words, there is no guarantee that memory that was assigned to
> the general purpose pool at boot can be removed. The only guaranteed
> behavior is to never give the memory to the core kernel in the first
> instance and always let user policy route the memory.
> 
> > 3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
> >of the type-3 device configuration (pmem-only or vmem-only)
> 
> Correct, the top-level bus code (cxl_acpi) and the endpoint code
> (cxl_mem, cxl_port) need to handshake before establishing regions. For
> pmem regions the platform needs to claim the availability of a pmem
> capable CXL window.
> 
> > 4) As you can see above, multiple decoders are registered.  I'm not sure
> >if that's correct or not, but it does seem odd given there's only one
> >  cxl type-3 device.  Odd that decoder0.0 shows up when CFMW is there,
> >  but not when it isn't.
> 
> CXL windows are modeled as decoders hanging off the CXL root device
> (ACPI0017 on ACPI based platforms). An endpoint decoder can then map a
> selection of that window.
> 
> > Don't know why I haven't thought of this until now, but is the CFMW code
> > reporting something odd about what's behind it?  Is it assuming the
> > devices are pmem?
> 
> No, the cxl_acpi code is just advertising platform decode possibilities
> independent of what devices show up. Think of this like the PCI MMIO
> space that gets allocated to a root bridge at the beginning of time.
> That space may or may not get consumed based on what devices show up
> downstream.

Thank you for the explanation Dan, and thank you for your patience
@JCameron.  I'm fairly sure I grok it now.

Summarizing to make sure: the cxl driver is providing what would be the
CXL.io (control) path, and the CXL.mem path is basically being simulated
by what otherwise would be a traditional PCI memory region. This explains
why turning off Legacy mode drops the dax devices, and why the topology
looks strange - the devices are basically attached in 2 different ways.

Might there be interest from the QEMU community to implement this
legacy-style setup in the short term, in an effort to test the
control path of type-3 devices while we wait for the kernel to catch up?

Or should we forget this mode ever existed and just barrel forward
with HDM decoders and writing the kernel code to hook up the underlying
devices in drivers/cxl?



Patch: target/i386: Change CR4 before CR0 in SVM

2023-01-20 Thread Bernhard Kauer
 There is a dependency in cpu_x86_update_cr0() on the current value of CR4
 to enable or disable long mode.  That value is outdated when switching into
 or out of SVM, which leads to invalid CPU state when returning from an
 unpaged VM while EFER.LME is set.
Signed-off-by: Bernhard Kauer 
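The ordering hazard can be demonstrated with a toy model (the constants are the architectural bit positions; the update functions are simplified stand-ins for cpu_x86_update_cr0/cr4, not the real ones): LMA is recomputed from whatever CR4 holds at the moment CR0 is written, so writing CR0 against a stale CR4 computes the wrong long-mode state.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PG   (1u << 31)
#define CR4_PAE  (1u << 5)
#define EFER_LME (1u << 8)
#define EFER_LMA (1u << 10)

typedef struct { uint32_t cr0, cr4, efer; } ToyCPU;

/* Stand-in for cpu_x86_update_cr0(): recompute LMA from the *current*
 * CR4, which is why CR4 must be written first. */
static void update_cr0(ToyCPU *c, uint32_t new_cr0)
{
    c->cr0 = new_cr0;
    if ((c->efer & EFER_LME) && (new_cr0 & CR0_PG) && (c->cr4 & CR4_PAE)) {
        c->efer |= EFER_LMA;
    } else {
        c->efer &= ~EFER_LMA;
    }
}

static void update_cr4(ToyCPU *c, uint32_t new_cr4)
{
    c->cr4 = new_cr4;
}
```

With the pre-patch order (CR0 first), a guest whose VMCB carries LME plus a PAE-enabled CR4 ends up with LMA clear; swapping the calls, as the patch does, yields the correct state.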

diff --git a/target/i386/tcg/sysemu/svm_helper.c b/target/i386/tcg/sysemu/svm_helper.c
index 2d27731b60..229a22816e 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -312,8 +312,8 @@ void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
     x86_stq_phys(cs,
              env->vm_vmcb + offsetof(struct vmcb, control.exit_info_2), 0);
 
-    cpu_x86_update_cr0(env, new_cr0);
     cpu_x86_update_cr4(env, new_cr4);
+    cpu_x86_update_cr0(env, new_cr0);
     cpu_x86_update_cr3(env, new_cr3);
     env->cr[2] = x86_ldq_phys(cs,
                           env->vm_vmcb + offsetof(struct vmcb, save.cr2));
@@ -812,13 +812,13 @@ void do_vmexit(CPUX86State *env)
     env->idt.limit = x86_ldl_phys(cs, env->vm_hsave + offsetof(struct vmcb,
                                                        save.idtr.limit));
 
+    cpu_x86_update_cr4(env, x86_ldq_phys(cs,
+                                     env->vm_hsave + offsetof(struct vmcb,
+                                                              save.cr4)));
     cpu_x86_update_cr0(env, x86_ldq_phys(cs,
                                      env->vm_hsave + offsetof(struct vmcb,
                                                               save.cr0)) |
                        CR0_PE_MASK);
-    cpu_x86_update_cr4(env, x86_ldq_phys(cs,
-                                     env->vm_hsave + offsetof(struct vmcb,
-                                                              save.cr4)));
     cpu_x86_update_cr3(env, x86_ldq_phys(cs,
                                      env->vm_hsave + offsetof(struct vmcb,
                                                               save.cr3)));



Re: [PATCH v9] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Stefano Stabellini
On Sat, 14 Jan 2023, Chuck Zmudzinski wrote:
> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> as noted in docs/igd-assign.txt in the Qemu source code.
> 
> Currently, when the xl toolstack is used to configure a Xen HVM guest with
> Intel IGD passthrough to the guest with the Qemu upstream device model,
> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> a different slot. This problem often prevents the guest from booting.
> 
> The only available workaround is not good: Configure Xen HVM guests to use
> the old and no longer maintained Qemu traditional device model available
> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> 
> To implement this feature in the Qemu upstream device model for Xen HVM
> guests, introduce the following new functions, types, and macros:
> 
> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> * typedef XenPTQdevRealize function pointer
> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> * xen_igd_reserve_slot and xen_igd_clear_slot functions
> 
> Michael Tsirkin:
> * Introduce XEN_PCI_IGD_DOMAIN, XEN_PCI_IGD_BUS, XEN_PCI_IGD_DEV, and
>   XEN_PCI_IGD_FN - use them to compute the value of XEN_PCI_IGD_SLOT_MASK
> 
> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> the xl toolstack with the gfx_passthru option enabled, which sets the
> igd-passthru=on option to Qemu for the Xen HVM machine type.
> 
> The new xen_igd_reserve_slot function also needs to be implemented in
> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> in which case it does nothing.
> 
> The new xen_igd_clear_slot function overrides qdev->realize of the parent
> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> 
> Move the call to xen_host_pci_device_get, and the associated error
> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> initialize the device class and vendor values which enables the checks for
> the Intel IGD to succeed. The verification that the host device is an
> Intel IGD to be passed through is done by checking the domain, bus, slot,
> and function values as well as by checking that gfx_passthru is enabled,
> the device class is VGA, and the device vendor is Intel.
> 
> Signed-off-by: Chuck Zmudzinski 

Hi Chuck,

The approach looks OK in principle to me. I only have one question: for
other PCI devices (not Intel IGD), where is xen_host_pci_device_get
called now?

It looks like that xen_igd_reserve_slot would return without setting
slot_reserved_mask, hence xen_igd_clear_slot would also return without
calling xen_host_pci_device_get. And xen_pt_realize doesn't call
xen_host_pci_device_get any longer.

Am I missing something?
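For reference, the slot-reservation mechanism the patch builds on can be sketched like this (a standalone model; apart from slot_reserved_mask, which the commit message names, the types and function names are invented for the sketch):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t slot_reserved_mask;  /* bit N set => slot N is reserved */
} ToyPCIBus;

/* Mirrors what xen_igd_reserve_slot is described as doing: mark slot 2
 * unusable for ordinary devices so the IGD can claim it later. */
static void reserve_igd_slot(ToyPCIBus *bus)
{
    bus->slot_reserved_mask |= 1u << 2;
}

/* Mirrors the core-PCI check applied at device realize time. */
static int slot_is_available(const ToyPCIBus *bus, int slot)
{
    return !(bus->slot_reserved_mask & (1u << slot));
}

/* Mirrors xen_igd_clear_slot: the IGD's realize path lifts the
 * reservation for itself before plugging into slot 2. */
static void clear_igd_reservation(ToyPCIBus *bus)
{
    bus->slot_reserved_mask &= ~(1u << 2);
}
```

The open question above is exactly about the non-IGD path through the second and third steps.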


> ---
> Notes that might be helpful to reviewers of patched code in hw/xen:
> 
> The new functions and types are based on recommendations from Qemu docs:
> https://qemu.readthedocs.io/en/latest/devel/qom.html
> 
> Notes that might be helpful to reviewers of patched code in hw/i386:
> 
> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
> not affect builds that do not have CONFIG_XEN defined.
> 
> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
> existing function that is only true when Qemu is built with
> xen-pci-passthrough enabled and the administrator has configured the Xen
> HVM guest with Qemu's igd-passthru=on option.
> 
> v2: Remove From:  tag at top of commit message
> 
> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
> 
> if (is_igd_vga_passthrough(&s->real_device) &&
> (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
> 
> is changed to
> 
> if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
> && (s->hostaddr.function == 0)) {
> 
> I hoped that I could use the test in v2, since it matches the
> other tests for the Intel IGD in Qemu and Xen, but those tests
> do not work because the necessary data structures are not set with
> their values yet. So instead use the test that the administrator
> has enabled gfx_passthru and the device address on the host is
> 02.0. This test does detect the Intel IGD correctly.
> 
> v4: Use brchu...@aol.com instead of brchu...@netscape.net for the author's
> email address to match the address used by the same author in commits
> be9c61da and c0e86b76
> 
> Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
> 
> v5: The patch of xen_pt.c was re-worked to allow a more consistent test
> for the Intel IGD 

Re: [RFC PATCH v5 3/9] target/arm: Use "max" as default cpu for the virt machine with KVM

2023-01-20 Thread Richard Henderson

On 1/20/23 08:48, Fabiano Rosas wrote:

Now that the cortex-a15 is under CONFIG_TCG, use the 'max' cpu as the
default CPU for a KVM-only build.

Note that we cannot use 'host' here because the qtests can run without
any other accelerator (than qtest) and 'host' depends on KVM being
enabled.

Signed-off-by: Fabiano Rosas
Cc: Daniel P. Berrangé
Cc: Thomas Huth
---
  hw/arm/virt.c | 4 
  1 file changed, 4 insertions(+)


It does seem like the path of least resistance.

Acked-by: Richard Henderson 


r~



Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Chuck Zmudzinski
On 1/20/2023 11:01 AM, Igor Mammedov wrote:
> On Tue, 17 Jan 2023 09:50:23 -0500
> Chuck Zmudzinski  wrote:
>
> > On 1/17/2023 5:35 AM, Igor Mammedov wrote:
> > > On Mon, 16 Jan 2023 13:00:53 -0500
> > > Chuck Zmudzinski  wrote:
> > >  
> > > > On 1/16/23 10:33, Igor Mammedov wrote:  
> > > > > On Fri, 13 Jan 2023 16:31:26 -0500
> > > > > Chuck Zmudzinski  wrote:
> > > > > 
> > > > >> On 1/13/23 4:33 AM, Igor Mammedov wrote:
> > > > >> > On Thu, 12 Jan 2023 23:14:26 -0500
> > > > >> > Chuck Zmudzinski  wrote:
> > > > >> >   
> > > > >> >> On 1/12/23 6:03 PM, Michael S. Tsirkin wrote:  
> > > > >> >> > On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow 
> > > > >> >> > wrote:
> > > > >> >> >> I think the change Michael suggests is very minimalistic: Move 
> > > > >> >> >> the if
> > > > >> >> >> condition around xen_igd_reserve_slot() into the function 
> > > > >> >> >> itself and
> > > > >> >> >> always call it there unconditionally -- basically turning 
> > > > >> >> >> three lines
> > > > >> >> >> into one. Since xen_igd_reserve_slot() seems very problem 
> > > > >> >> >> specific,
> > > > >> >> >> Michael further suggests to rename it to something more 
> > > > >> >> >> general. All
> > > > >> >> >> in all no big changes required.
> > > > >> >> > 
> > > > >> >> > yes, exactly.
> > > > >> >> > 
> > > > >> >> 
> > > > >> >> OK, got it. I can do that along with the other suggestions.  
> > > > >> > 
> > > > >> > have you considered instead of reservation, putting a slot check 
> > > > >> > in device model
> > > > >> > and if it's intel igd being passed through, fail at realize time  
> > > > >> > if it can't take
> > > > >> > required slot (with a error directing user to fix command line)?   
> > > > >> >
> > > > >> 
> > > > >> Yes, but the core pci code currently already fails at realize time
> > > > >> with a useful error message if the user tries to use slot 2 for the
> > > > >> igd, because of the xen platform device which has slot 2. The user
> > > > >> can fix this without patching qemu, but having the user fix it on
> > > > >> the command line is not the best way to solve the problem, primarily
> > > > >> because the user would need to hotplug the xen platform device via a
> > > > >> command line option instead of having the xen platform device added 
> > > > >> by
> > > > >> pc_xen_hvm_init functions almost immediately after creating the pci
> > > > >> bus, and that delay in adding the xen platform device degrades
> > > > >> startup performance of the guest.
> > > > >> 
> > > > >> > That could be less complicated than dealing with slot reservations 
> > > > >> > at the cost of
> > > > >> > being less convenient.  
> > > > >> 
> > > > >> And also a cost of reduced startup performance
> > > > > 
> > > > > Could you clarify how it affects performance (and how much).
> > > > > (as I see, setup done at board_init time is roughly the same
> > > > > as with '-device foo' CLI options, modulo time needed to parse
> > > > > options which should be negligible. and both ways are done before
> > > > > guest runs)
> > > > 
> > > > I preface my answer by saying there is a v9, but you don't
> > > > need to look at that. I will answer all your questions here.
> > > > 
> > > > I am going by what I observe on the main HDMI display with the
> > > > different approaches. With the approach of not patching Qemu
> > > > to fix this, which requires adding the Xen platform device a
> > > > little later, the length of time it takes to fully load the
> > > > guest is increased. I also noticed with Linux guests that use
> > > > the grub bootloader, the grub vga driver cannot display the
> > > > grub boot menu at the native resolution of the display, which
> > > > in the tested case is 1920x1080, when the Xen platform device
> > > > is added via a command line option instead of by the
> > > > pc_xen_hvm_init_pci function in pc_piix.c, but with this patch
> > > > to Qemu, the grub menu is displayed at the full, 1920x1080
> > > > native resolution of the display. Once the guest fully loads,
> > > > there is no noticeable difference in performance. It is mainly
> > > > a degradation in startup performance, not performance once
> > > > the guest OS is fully loaded.  
> > > above hints on presence of bug[s] in igd-passthru implementation,
> > > and this patch effectively hides problem instead of trying to figure
> > > out what's wrong and fixing it.
> > >  
> > 
> > I agree that with the current Qemu there is a bug in the igd-passthru
> > implementation. My proposed patch fixes it. So why not support the
> > proposed patch with a Reviewed-by tag?
> I see the patch as a workaround (it might be a reasonable one) and
> it's upto xen maintainers to give Reviewed-by/accept it.
>
> Though I'd add to the commit message something about the performance issues
> you are experiencing when trying to pass the IGD through manually

No, the device that needs to be passed through manually 

[PATCH] tests/qtest: Plug memory leaks in qtest_get_machines

2023-01-20 Thread Fabiano Rosas
These leaks can be avoided:

 759 bytes in 61 blocks are still reachable in loss record 56 of 60
at 0x4034744: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x4A88518: g_malloc (in /usr/lib64/libglib-2.0.so.0.7000.5)
by 0x4AA313E: g_strdup (in /usr/lib64/libglib-2.0.so.0.7000.5)
by 0x12083E: qtest_get_machines (libqtest.c:1323)
by 0x12098C: qtest_cb_for_every_machine (libqtest.c:1348)
by 0x11556C: main (test-hmp.c:160)

 992 bytes in 1 blocks are still reachable in loss record 57 of 60
at 0x4034744: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x4A88518: g_malloc (in /usr/lib64/libglib-2.0.so.0.7000.5)
by 0x120725: qtest_get_machines (libqtest.c:1313)
by 0x12098C: qtest_cb_for_every_machine (libqtest.c:1348)
by 0x11556C: main (test-hmp.c:160)
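The fix below leans on glib's g_test_queue_destroy(), which registers a destroy notifier that the test harness invokes when the current test case ends. The pattern is roughly this (a glib-free sketch of the same idea; queue depth and ordering here are illustrative, not glib's exact behavior):

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*destroy_fn)(void *data);

typedef struct {
    destroy_fn fn[8];
    void *data[8];
    int n;
} TeardownQueue;

/* Like g_test_queue_destroy(): remember a destructor to run at test
 * teardown, so callers of qtest_get_machines() need no explicit free. */
static void queue_destroy(TeardownQueue *q, destroy_fn fn, void *data)
{
    q->fn[q->n] = fn;
    q->data[q->n] = data;
    q->n++;
}

/* Run by the harness at teardown (glib processes its own queue). */
static void run_teardown(TeardownQueue *q)
{
    while (q->n > 0) {
        q->n--;
        q->fn[q->n](q->data[q->n]);
    }
}

static int teardown_calls;  /* instrumentation for the demo below */

static void counting_free(void *data)
{
    free(data);
    teardown_calls++;
}
```

This is why the patch can register qtest_free_machine_info once, right after the terminating entry is written, and let the framework reclaim the array.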

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/libqtest.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 6b2216cb20..65abac5029 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -1285,6 +1285,18 @@ struct MachInfo {
 char *alias;
 };
 
+static void qtest_free_machine_info(gpointer data)
+{
+struct MachInfo *machines = data;
+int i;
+
+for (i = 0; machines[i].name != NULL; i++) {
+g_free((void *)machines[i].name);
+g_free((void *)machines[i].alias);
+}
+g_free(machines);
+}
+
 /*
  * Returns an array with pointers to the available machine names.
  * The terminating entry has the name set to NULL.
@@ -1336,6 +1348,8 @@ static struct MachInfo *qtest_get_machines(void)
 qobject_unref(response);
 
 memset(&machines[idx], 0, sizeof(struct MachInfo)); /* Terminating entry */
+g_test_queue_destroy(qtest_free_machine_info, machines);
+
 return machines;
 }
 
-- 
2.35.3




Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Chuck Zmudzinski
On 1/20/2023 11:57 AM, Stefano Stabellini wrote:
> On Tue, 17 Jan 2023, Chuck Zmudzinski wrote:
> > On 1/17/2023 6:04 AM, Igor Mammedov wrote:
> > > On Mon, 16 Jan 2023 13:00:53 -0500
> > > Chuck Zmudzinski  wrote:
> > >
> > > > On 1/16/23 10:33, Igor Mammedov wrote:
> > > > > On Fri, 13 Jan 2023 16:31:26 -0500
> > > > > Chuck Zmudzinski  wrote:
> > > > >   
> > > > >> On 1/13/23 4:33 AM, Igor Mammedov wrote:  
> > > > >> > On Thu, 12 Jan 2023 23:14:26 -0500
> > > > >> > Chuck Zmudzinski  wrote:
> > > > >> > 
> > > > >> >> On 1/12/23 6:03 PM, Michael S. Tsirkin wrote:
> > > > >> >> > On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow 
> > > > >> >> > wrote:  
> > > > >> >> >> I think the change Michael suggests is very minimalistic: Move 
> > > > >> >> >> the if
> > > > >> >> >> condition around xen_igd_reserve_slot() into the function 
> > > > >> >> >> itself and
> > > > >> >> >> always call it there unconditionally -- basically turning 
> > > > >> >> >> three lines
> > > > >> >> >> into one. Since xen_igd_reserve_slot() seems very problem 
> > > > >> >> >> specific,
> > > > >> >> >> Michael further suggests to rename it to something more 
> > > > >> >> >> general. All
> > > > >> >> >> in all no big changes required.  
> > > > >> >> > 
> > > > >> >> > yes, exactly.
> > > > >> >> >   
> > > > >> >> 
> > > > >> >> OK, got it. I can do that along with the other suggestions.
> > > > >> > 
> > > > >> > have you considered instead of reservation, putting a slot check 
> > > > >> > in device model
> > > > >> > and if it's intel igd being passed through, fail at realize time  
> > > > >> > if it can't take
> > > > >> > required slot (with a error directing user to fix command line)?   
> > > > >> >  
> > > > >> 
> > > > >> Yes, but the core pci code currently already fails at realize time
> > > > >> with a useful error message if the user tries to use slot 2 for the
> > > > >> igd, because of the xen platform device which has slot 2. The user
> > > > >> can fix this without patching qemu, but having the user fix it on
> > > > >> the command line is not the best way to solve the problem, primarily
> > > > >> because the user would need to hotplug the xen platform device via a
> > > > >> command line option instead of having the xen platform device added 
> > > > >> by
> > > > >> pc_xen_hvm_init functions almost immediately after creating the pci
> > > > >> bus, and that delay in adding the xen platform device degrades
> > > > >> startup performance of the guest.
> > > > >>   
> > > > >> > That could be less complicated than dealing with slot reservations 
> > > > >> > at the cost of
> > > > >> > being less convenient.
> > > > >> 
> > > > >> And also a cost of reduced startup performance  
> > > > > 
> > > > > Could you clarify how it affects performance (and how much).
> > > > > (as I see, setup done at board_init time is roughly the same
> > > > > as with '-device foo' CLI options, modulo time needed to parse
> > > > > options which should be negligible. and both ways are done before
> > > > > guest runs)  
> > > > 
> > > > I preface my answer by saying there is a v9, but you don't
> > > > need to look at that. I will answer all your questions here.
> > > > 
> > > > I am going by what I observe on the main HDMI display with the
> > > > different approaches. With the approach of not patching Qemu
> > > > to fix this, which requires adding the Xen platform device a
> > > > little later, the length of time it takes to fully load the
> > > > guest is increased. I also noticed with Linux guests that use
the grub bootloader, the grub vga driver cannot display the
> > > > grub boot menu at the native resolution of the display, which
> > > > in the tested case is 1920x1080, when the Xen platform device
> > > > is added via a command line option instead of by the
pc_xen_hvm_init_pci function in pc_piix.c, but with this patch
> > > > to Qemu, the grub menu is displayed at the full, 1920x1080
> > > > native resolution of the display. Once the guest fully loads,
> > > > there is no noticeable difference in performance. It is mainly
> > > > a degradation in startup performance, not performance once
> > > > the guest OS is fully loaded.
> > >
> > > Looking at igd-assign.txt, it recommends adding the IGD using the '-device'
> > > CLI option, and explicitly dropping at least the graphics defaults.
> > > So it is expected to work fine even when IGD is constructed with
> > > '-device'.
> > >
> > > Could you provide full CLI current xen starts QEMU with and then
> > > a CLI you used (with explicit -device for IGD) that leads
> > > to reduced performance?
> > >
> > > CCing vfio folks who might have an idea what could be wrong based
> > > on vfio experience.
> > 
> > Actually, the igd is not added with an explicit -device option using Xen:
> > 
> >    1573 ?    Ssl    0:42 /usr/bin/qemu-system-i386 -xen-domid 1 
> > -no-shutdown -chardev 
> > 

[RFC PATCH v5 7/9] target/avocado: Pass parameters to migration test on aarch64

2023-01-20 Thread Fabiano Rosas
The migration tests are currently broken on an aarch64 host because
the tests pass no 'machine' and 'cpu' options on the QEMU command
line. Most other architectures define a default value in QEMU for
these options, but arm does not.

Add these options to the test class in case the test is being executed
on an aarch64 host.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
---
 tests/avocado/migration.py | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/avocado/migration.py b/tests/avocado/migration.py
index 4b25680c50..ffd3db0f35 100644
--- a/tests/avocado/migration.py
+++ b/tests/avocado/migration.py
@@ -11,6 +11,8 @@
 
 
 import tempfile
+import os
+
 from avocado_qemu import QemuSystemTest
 from avocado import skipUnless
 
@@ -26,6 +28,14 @@ class Migration(QemuSystemTest):
 
 timeout = 10
 
+def setUp(self):
+super().setUp()
+
+arch = os.uname()[4]
+if arch == 'aarch64':
+self.machine = 'virt'
+self.cpu = 'max'
+
 @staticmethod
 def migration_finished(vm):
 return vm.command('query-migrate')['status'] in ('completed', 'failed')
-- 
2.35.3




[RFC PATCH v5 0/9] target/arm: Allow CONFIG_TCG=n builds

2023-01-20 Thread Fabiano Rosas
This series makes the necessary changes to allow the use of
--disable-tcg for arm.

Based on "target/arm: CONFIG_TCG=n part 1":
https://lore.kernel.org/r/20230118193518.26433-1-faro...@suse.de

Since v4:

- Used "max" as the default CPU for KVM-only builds. This allows me to
  drop all the clunky qtest changes and it keeps disabling TCG
  separate from changing cpu defaults.

  I'm neutral towards removing the defaults for arm. We can do that in a
  separate series. It would be nice to make the TCG default equal to the
  non-TCG one. Otherwise we're bound to get reports that "this command
  line used to work" if users switch from: 'CONFIG_TCG=n -accel kvm' to
  'CONFIG_TCG=y -accel kvm' (the latter would try to use the cortex-a15
  as default).

- Move the ifdef around valid_cpus into the patches that move the
  respective cpus. Patches 1 & 2.

v4:
https://lore.kernel.org/r/20230119135424.5417-1-faro...@suse.de

v3:
https://lore.kernel.org/r/20230113140419.4013-1-faro...@suse.de

v2:
https://lore.kernel.org/r/20230109224232.11661-1-faro...@suse.de

v1:
https://lore.kernel.org/r/20230104215835.24692-1-faro...@suse.de

Claudio Fontana (1):
  target/arm: move cpu_tcg to tcg/cpu32.c

Fabiano Rosas (8):
  target/arm: Move 64-bit TCG CPUs into tcg/
  target/arm: Use "max" as default cpu for the virt machine with KVM
  tests/qtest: arm-cpu-features: Match tests to required accelerators
  tests/qtest: Restrict tpm-tis-devices-{swtpm}-test to CONFIG_TCG
  tests/tcg: Do not build/run TCG tests if TCG is disabled
  target/avocado: Pass parameters to migration test on aarch64
  arm/Kconfig: Always select SEMIHOSTING when TCG is present
  arm/Kconfig: Do not build TCG-only boards on a KVM-only build

 configs/devices/aarch64-softmmu/default.mak |   4 -
 configs/devices/arm-softmmu/default.mak |  39 --
 configure   |   6 +-
 hw/arm/Kconfig  |  43 +-
 hw/arm/virt.c   |  10 +-
 target/arm/Kconfig  |   7 +
 target/arm/cpu64.c  | 632 +--
 target/arm/internals.h  |   4 +
 target/arm/meson.build  |   1 -
 target/arm/{cpu_tcg.c => tcg/cpu32.c}   |  13 +-
 target/arm/tcg/cpu64.c  | 654 
 target/arm/tcg/meson.build  |   2 +
 tests/avocado/migration.py  |  10 +
 tests/qtest/arm-cpu-features.c  |  34 +-
 tests/qtest/meson.build |   4 +-
 15 files changed, 772 insertions(+), 691 deletions(-)
 rename target/arm/{cpu_tcg.c => tcg/cpu32.c} (99%)
 create mode 100644 target/arm/tcg/cpu64.c

-- 
2.35.3




[RFC PATCH v5 9/9] arm/Kconfig: Do not build TCG-only boards on a KVM-only build

2023-01-20 Thread Fabiano Rosas
Move all the CONFIG_FOO=y from default.mak into "default y if TCG"
statements in Kconfig. That way they won't be selected when
CONFIG_TCG=n.

I'm leaving CONFIG_ARM_VIRT in default.mak because it allows us to
keep the two default.mak files non-empty and keep aarch64-default.mak
including arm-default.mak. That way we don't surprise anyone that's
used to altering these files.

With this change we can start building with --disable-tcg.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
---
 configs/devices/aarch64-softmmu/default.mak |  4 --
 configs/devices/arm-softmmu/default.mak | 37 --
 hw/arm/Kconfig  | 42 -
 3 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/configs/devices/aarch64-softmmu/default.mak 
b/configs/devices/aarch64-softmmu/default.mak
index cf43ac8da1..70e05a197d 100644
--- a/configs/devices/aarch64-softmmu/default.mak
+++ b/configs/devices/aarch64-softmmu/default.mak
@@ -2,7 +2,3 @@
 
 # We support all the 32 bit boards so need all their config
 include ../arm-softmmu/default.mak
-
-CONFIG_XLNX_ZYNQMP_ARM=y
-CONFIG_XLNX_VERSAL=y
-CONFIG_SBSA_REF=y
diff --git a/configs/devices/arm-softmmu/default.mak 
b/configs/devices/arm-softmmu/default.mak
index cb3e5aea65..647fbce88d 100644
--- a/configs/devices/arm-softmmu/default.mak
+++ b/configs/devices/arm-softmmu/default.mak
@@ -4,40 +4,3 @@
 # CONFIG_TEST_DEVICES=n
 
 CONFIG_ARM_VIRT=y
-CONFIG_CUBIEBOARD=y
-CONFIG_EXYNOS4=y
-CONFIG_HIGHBANK=y
-CONFIG_INTEGRATOR=y
-CONFIG_FSL_IMX31=y
-CONFIG_MUSICPAL=y
-CONFIG_MUSCA=y
-CONFIG_CHEETAH=y
-CONFIG_SX1=y
-CONFIG_NSERIES=y
-CONFIG_STELLARIS=y
-CONFIG_STM32VLDISCOVERY=y
-CONFIG_REALVIEW=y
-CONFIG_VERSATILE=y
-CONFIG_VEXPRESS=y
-CONFIG_ZYNQ=y
-CONFIG_MAINSTONE=y
-CONFIG_GUMSTIX=y
-CONFIG_SPITZ=y
-CONFIG_TOSA=y
-CONFIG_Z2=y
-CONFIG_NPCM7XX=y
-CONFIG_COLLIE=y
-CONFIG_ASPEED_SOC=y
-CONFIG_NETDUINO2=y
-CONFIG_NETDUINOPLUS2=y
-CONFIG_OLIMEX_STM32_H405=y
-CONFIG_MPS2=y
-CONFIG_RASPI=y
-CONFIG_DIGIC=y
-CONFIG_SABRELITE=y
-CONFIG_EMCRAFT_SF2=y
-CONFIG_MICROBIT=y
-CONFIG_FSL_IMX25=y
-CONFIG_FSL_IMX7=y
-CONFIG_FSL_IMX6UL=y
-CONFIG_ALLWINNER_H3=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index e0da8841db..4a2460311a 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -34,20 +34,24 @@ config ARM_VIRT
 
 config CHEETAH
 bool
+default y if TCG && ARM
 select OMAP
 select TSC210X
 
 config CUBIEBOARD
 bool
+default y if TCG && ARM
 select ALLWINNER_A10
 
 config DIGIC
 bool
+default y if TCG && ARM
 select PTIMER
 select PFLASH_CFI02
 
 config EXYNOS4
 bool
+default y if TCG && ARM
 imply I2C_DEVICES
 select A9MPCORE
 select I2C
@@ -60,6 +64,7 @@ config EXYNOS4
 
 config HIGHBANK
 bool
+default y if TCG && ARM
 select A9MPCORE
 select A15MPCORE
 select AHCI
@@ -74,6 +79,7 @@ config HIGHBANK
 
 config INTEGRATOR
 bool
+default y if TCG && ARM
 select ARM_TIMER
 select INTEGRATOR_DEBUG
 select PL011 # UART
@@ -86,12 +92,14 @@ config INTEGRATOR
 
 config MAINSTONE
 bool
+default y if TCG && ARM
 select PXA2XX
 select PFLASH_CFI01
 select SMC91C111
 
 config MUSCA
 bool
+default y if TCG && ARM
 select ARMSSE
 select PL011
 select PL031
@@ -103,6 +111,7 @@ config MARVELL_88W8618
 
 config MUSICPAL
 bool
+default y if TCG && ARM
 select OR_IRQ
 select BITBANG_I2C
 select MARVELL_88W8618
@@ -113,18 +122,22 @@ config MUSICPAL
 
 config NETDUINO2
 bool
+default y if TCG && ARM
 select STM32F205_SOC
 
 config NETDUINOPLUS2
 bool
+default y if TCG && ARM
 select STM32F405_SOC
 
 config OLIMEX_STM32_H405
 bool
+default y if TCG && ARM
 select STM32F405_SOC
 
 config NSERIES
 bool
+default y if TCG && ARM
 select OMAP
 select TMP105   # tempature sensor
 select BLIZZARD # LCD/TV controller
@@ -157,12 +170,14 @@ config PXA2XX
 
 config GUMSTIX
 bool
+default y if TCG && ARM
 select PFLASH_CFI01
 select SMC91C111
 select PXA2XX
 
 config TOSA
 bool
+default y if TCG && ARM
 select ZAURUS  # scoop
 select MICRODRIVE
 select PXA2XX
@@ -170,6 +185,7 @@ config TOSA
 
 config SPITZ
 bool
+default y if TCG && ARM
 select ADS7846 # touch-screen controller
 select MAX111X # A/D converter
 select WM8750  # audio codec
@@ -182,6 +198,7 @@ config SPITZ
 
 config Z2
 bool
+default y if TCG && ARM
 select PFLASH_CFI01
 select WM8750
 select PL011 # UART
@@ -189,6 +206,7 @@ config Z2
 
 config REALVIEW
 bool
+default y if TCG && ARM
 imply PCI_DEVICES
 imply PCI_TESTDEV
 imply I2C_DEVICES
@@ -217,6 +235,7 @@ config REALVIEW
 
 config SBSA_REF
 bool
+default y if TCG && AARCH64
 imply PCI_DEVICES
 select AHCI
 select ARM_SMMUV3
@@ -232,11 +251,13 @@ config SBSA_REF
 
 config SABRELITE
 bool
+default y if TCG && 

[RFC PATCH v5 1/9] target/arm: Move 64-bit TCG CPUs into tcg/

2023-01-20 Thread Fabiano Rosas
Move the 64-bit CPUs that are TCG-only:
- cortex-a35
- cortex-a55
- cortex-a72
- cortex-a76
- a64fx
- neoverse-n1

Keep the CPUs that can be used with KVM:
- cortex-a57
- cortex-a53
- max
- host

For the special case "max" CPU, there's a nuance that while KVM/HVF
use the "host" model instead, we still cannot move all of the TCG code
into the tcg directory because the qtests might reach the !kvm && !hvf
branch. Keep the cortex_a57_initfn() call to cover that scenario.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
---
First hunk is new.
Fixed missing return statement in aarch64_max_initfn.
---
 hw/arm/virt.c  |   6 +-
 target/arm/cpu64.c | 632 +--
 target/arm/internals.h |   4 +
 target/arm/tcg/cpu64.c | 654 +
 target/arm/tcg/meson.build |   1 +
 5 files changed, 675 insertions(+), 622 deletions(-)
 create mode 100644 target/arm/tcg/cpu64.c

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea2413a0ba..d243ae28e2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -205,14 +205,16 @@ static const int a15irqmap[] = {
 static const char *valid_cpus[] = {
 ARM_CPU_TYPE_NAME("cortex-a7"),
 ARM_CPU_TYPE_NAME("cortex-a15"),
+#ifdef CONFIG_TCG
 ARM_CPU_TYPE_NAME("cortex-a35"),
-ARM_CPU_TYPE_NAME("cortex-a53"),
 ARM_CPU_TYPE_NAME("cortex-a55"),
-ARM_CPU_TYPE_NAME("cortex-a57"),
 ARM_CPU_TYPE_NAME("cortex-a72"),
 ARM_CPU_TYPE_NAME("cortex-a76"),
 ARM_CPU_TYPE_NAME("a64fx"),
 ARM_CPU_TYPE_NAME("neoverse-n1"),
+#endif
+ARM_CPU_TYPE_NAME("cortex-a53"),
+ARM_CPU_TYPE_NAME("cortex-a57"),
 ARM_CPU_TYPE_NAME("host"),
 ARM_CPU_TYPE_NAME("max"),
 };
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 5dfdae7bd2..ce3af48e6d 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -24,6 +24,8 @@
 #include "qemu/module.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
+#include "sysemu/qtest.h"
+#include "sysemu/tcg.h"
 #include "kvm_arm.h"
 #include "hvf_arm.h"
 #include "qapi/visitor.h"
@@ -31,86 +33,6 @@
 #include "internals.h"
 #include "cpregs.h"
 
-static void aarch64_a35_initfn(Object *obj)
-{
-ARMCPU *cpu = ARM_CPU(obj);
-
-cpu->dtb_compatible = "arm,cortex-a35";
-set_feature(>env, ARM_FEATURE_V8);
-set_feature(>env, ARM_FEATURE_NEON);
-set_feature(>env, ARM_FEATURE_GENERIC_TIMER);
-set_feature(>env, ARM_FEATURE_AARCH64);
-set_feature(>env, ARM_FEATURE_CBAR_RO);
-set_feature(>env, ARM_FEATURE_EL2);
-set_feature(>env, ARM_FEATURE_EL3);
-set_feature(>env, ARM_FEATURE_PMU);
-
-/* From B2.2 AArch64 identification registers. */
-cpu->midr = 0x411fd040;
-cpu->revidr = 0;
-cpu->ctr = 0x84448004;
-cpu->isar.id_pfr0 = 0x0131;
-cpu->isar.id_pfr1 = 0x00011011;
-cpu->isar.id_dfr0 = 0x03010066;
-cpu->id_afr0 = 0;
-cpu->isar.id_mmfr0 = 0x10201105;
-cpu->isar.id_mmfr1 = 0x4000;
-cpu->isar.id_mmfr2 = 0x0126;
-cpu->isar.id_mmfr3 = 0x02102211;
-cpu->isar.id_isar0 = 0x02101110;
-cpu->isar.id_isar1 = 0x13112111;
-cpu->isar.id_isar2 = 0x21232042;
-cpu->isar.id_isar3 = 0x01112131;
-cpu->isar.id_isar4 = 0x00011142;
-cpu->isar.id_isar5 = 0x00011121;
-cpu->isar.id_aa64pfr0 = 0x;
-cpu->isar.id_aa64pfr1 = 0;
-cpu->isar.id_aa64dfr0 = 0x10305106;
-cpu->isar.id_aa64dfr1 = 0;
-cpu->isar.id_aa64isar0 = 0x00011120;
-cpu->isar.id_aa64isar1 = 0;
-cpu->isar.id_aa64mmfr0 = 0x00101122;
-cpu->isar.id_aa64mmfr1 = 0;
-cpu->clidr = 0x0a200023;
-cpu->dcz_blocksize = 4;
-
-/* From B2.4 AArch64 Virtual Memory control registers */
-cpu->reset_sctlr = 0x00c50838;
-
-/* From B2.10 AArch64 performance monitor registers */
-cpu->isar.reset_pmcr_el0 = 0x410a3000;
-
-/* From B2.29 Cache ID registers */
-cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
-cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */
-cpu->ccsidr[2] = 0x703fe03a; /* 512KB L2 cache */
-
-/* From B3.5 VGIC Type register */
-cpu->gic_num_lrs = 4;
-cpu->gic_vpribits = 5;
-cpu->gic_vprebits = 5;
-cpu->gic_pribits = 5;
-
-/* From C6.4 Debug ID Register */
-cpu->isar.dbgdidr = 0x3516d000;
-/* From C6.5 Debug Device ID Register */
-cpu->isar.dbgdevid = 0x00110f13;
-/* From C6.6 Debug Device ID Register 1 */
-cpu->isar.dbgdevid1 = 0x2;
-
-/* From Cortex-A35 SIMD and Floating-point Support r1p0 */
-/* From 3.2 AArch32 register summary */
-cpu->reset_fpsid = 0x41034043;
-
-/* From 2.2 AArch64 register summary */
-cpu->isar.mvfr0 = 0x10110222;
-cpu->isar.mvfr1 = 0x1211;
-cpu->isar.mvfr2 = 0x0043;
-
-/* These values are the same with A53/A57/A72. */
-define_cortex_a72_a57_a53_cp_reginfo(cpu);
-}
-
 void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 {
 /*
@@ -310,47 +232,6 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 cpu->sve_vq.map 

[RFC PATCH v5 2/9] target/arm: move cpu_tcg to tcg/cpu32.c

2023-01-20 Thread Fabiano Rosas
From: Claudio Fontana 

move the module containing cpu models definitions
for 32bit TCG-only CPUs to tcg/ and rename it for clarity.

Signed-off-by: Claudio Fontana 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
---
First and last hunks are new.
---
 hw/arm/virt.c |  2 +-
 target/arm/meson.build|  1 -
 target/arm/{cpu_tcg.c => tcg/cpu32.c} | 13 +++--
 target/arm/tcg/cpu64.c|  2 +-
 target/arm/tcg/meson.build|  1 +
 tests/qtest/arm-cpu-features.c| 12 +---
 6 files changed, 15 insertions(+), 16 deletions(-)
 rename target/arm/{cpu_tcg.c => tcg/cpu32.c} (99%)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d243ae28e2..b5c711c919 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -203,9 +203,9 @@ static const int a15irqmap[] = {
 };
 
 static const char *valid_cpus[] = {
+#ifdef CONFIG_TCG
 ARM_CPU_TYPE_NAME("cortex-a7"),
 ARM_CPU_TYPE_NAME("cortex-a15"),
-#ifdef CONFIG_TCG
 ARM_CPU_TYPE_NAME("cortex-a35"),
 ARM_CPU_TYPE_NAME("cortex-a55"),
 ARM_CPU_TYPE_NAME("cortex-a72"),
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 595d22a099..88f1a5c570 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -5,7 +5,6 @@ arm_ss.add(files(
   'gdbstub.c',
   'helper.c',
   'vfp_helper.c',
-  'cpu_tcg.c',
 ))
 arm_ss.add(zlib)
 
diff --git a/target/arm/cpu_tcg.c b/target/arm/tcg/cpu32.c
similarity index 99%
rename from target/arm/cpu_tcg.c
rename to target/arm/tcg/cpu32.c
index 64d5a785c1..caa5252ad9 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/tcg/cpu32.c
@@ -1,5 +1,5 @@
 /*
- * QEMU ARM TCG CPUs.
+ * QEMU ARM TCG-only CPUs.
  *
  * Copyright (c) 2012 SUSE LINUX Products GmbH
  *
@@ -10,9 +10,7 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
-#ifdef CONFIG_TCG
 #include "hw/core/tcg-cpu-ops.h"
-#endif /* CONFIG_TCG */
 #include "internals.h"
 #include "target/arm/idau.h"
 #if !defined(CONFIG_USER_ONLY)
@@ -93,7 +91,7 @@ void aa32_max_features(ARMCPU *cpu)
 /* CPU models. These are not needed for the AArch64 linux-user build. */
 #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
 
-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
+#if !defined(CONFIG_USER_ONLY)
 static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
 CPUClass *cc = CPU_GET_CLASS(cs);
@@ -117,7 +115,7 @@ static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int 
interrupt_request)
 }
 return ret;
 }
-#endif /* !CONFIG_USER_ONLY && CONFIG_TCG */
+#endif /* !CONFIG_USER_ONLY */
 
 static void arm926_initfn(Object *obj)
 {
@@ -1013,7 +1011,6 @@ static void pxa270c5_initfn(Object *obj)
 cpu->reset_sctlr = 0x0078;
 }
 
-#ifdef CONFIG_TCG
 static const struct TCGCPUOps arm_v7m_tcg_ops = {
 .initialize = arm_translate_init,
 .synchronize_from_tb = arm_cpu_synchronize_from_tb,
@@ -1034,7 +1031,6 @@ static const struct TCGCPUOps arm_v7m_tcg_ops = {
 .debug_check_breakpoint = arm_debug_check_breakpoint,
 #endif /* !CONFIG_USER_ONLY */
 };
-#endif /* CONFIG_TCG */
 
 static void arm_v7m_class_init(ObjectClass *oc, void *data)
 {
@@ -1042,10 +1038,7 @@ static void arm_v7m_class_init(ObjectClass *oc, void 
*data)
 CPUClass *cc = CPU_CLASS(oc);
 
 acc->info = data;
-#ifdef CONFIG_TCG
 cc->tcg_ops = _v7m_tcg_ops;
-#endif /* CONFIG_TCG */
-
 cc->gdb_core_xml_file = "arm-m-profile.xml";
 }
 
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 4d5bdddae4..92943853ce 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -457,7 +457,7 @@ static void aarch64_neoverse_n1_initfn(Object *obj)
 
 /*
  * -cpu max: a CPU with as many features enabled as our emulation supports.
- * The version of '-cpu max' for qemu-system-arm is defined in cpu_tcg.c;
+ * The version of '-cpu max' for qemu-system-arm is defined in cpu32.c;
  * this only needs to handle 64 bits.
  */
 void aarch64_max_tcg_initfn(Object *obj)
diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
index 8debe81fd5..cea1e594c1 100644
--- a/target/arm/tcg/meson.build
+++ b/target/arm/tcg/meson.build
@@ -18,6 +18,7 @@ gen = [
 arm_ss.add(gen)
 
 arm_ss.add(files(
+  'cpu32.c',
   'translate.c',
   'translate-m-nocp.c',
   'translate-mve.c',
diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index 8691802950..4ff2014bea 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -506,9 +506,15 @@ static void test_query_cpu_model_expansion_kvm(const void 
*data)
 QDict *resp;
 char *error;
 
-assert_error(qts, "cortex-a15",
-"We cannot guarantee the CPU type 'cortex-a15' works "
-"with KVM on this host", NULL);
+if (qtest_has_accel("tcg")) {
+assert_error(qts, "cortex-a15",
+ "We cannot guarantee the CPU type 'cortex-a15' works "
+ "with KVM on this host", NULL);
+

[RFC PATCH v5 5/9] tests/qtest: Restrict tpm-tis-devices-{swtpm}-test to CONFIG_TCG

2023-01-20 Thread Fabiano Rosas
These tests set -accel tcg, so restrict them to when TCG is present.

Signed-off-by: Fabiano Rosas 
Acked-by: Richard Henderson 
Reviewed-by: Thomas Huth 
---
Removed unneeded hunk restricting dependencies
Use config_all instead of config_devices_all to check for TCG
---
 tests/qtest/meson.build | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 1af63f8bd2..26d8cc4222 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -204,8 +204,8 @@ qtests_arm = \
 # TODO: once aarch64 TCG is fixed on ARM 32 bit host, make bios-tables-test 
unconditional
 qtests_aarch64 = \
   (cpu != 'arm' and unpack_edk2_blobs ? ['bios-tables-test'] : []) +   
 \
-  (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? 
['tpm-tis-device-test'] : []) +\
-  (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? 
['tpm-tis-device-swtpm-test'] : []) +  \
+  (config_all.has_key('CONFIG_TCG') and 
config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ?\
+['tpm-tis-device-test', 'tpm-tis-device-swtpm-test'] : []) +   
  \
   (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 
'fuzz-xlnx-dp-test'] : []) + \
   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
   ['arm-cpu-features',
-- 
2.35.3




[RFC PATCH v5 6/9] tests/tcg: Do not build/run TCG tests if TCG is disabled

2023-01-20 Thread Fabiano Rosas
The tests under tests/tcg depend on the TCG accelerator. Do not build
them if --disable-tcg was given in the configure line.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
---
 configure | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 9e407ce2e3..64960c6000 100755
--- a/configure
+++ b/configure
@@ -2483,7 +2483,11 @@ for target in $target_list; do
   tcg_tests_targets="$tcg_tests_targets $target"
   fi
 done
-echo "TCG_TESTS_TARGETS=$tcg_tests_targets" >> config-host.mak)
+
+if test "$tcg" = "enabled"; then
+echo "TCG_TESTS_TARGETS=$tcg_tests_targets" >> config-host.mak
+fi
+)
 
 if test "$skip_meson" = no; then
   cross="config-meson.cross.new"
-- 
2.35.3




[RFC PATCH v5 8/9] arm/Kconfig: Always select SEMIHOSTING when TCG is present

2023-01-20 Thread Fabiano Rosas
We are about to enable the build without TCG, so CONFIG_SEMIHOSTING
and CONFIG_ARM_COMPATIBLE_SEMIHOSTING cannot be unconditionally set in
default.mak anymore. So reflect the change in a Kconfig.

Instead of using semihosting/Kconfig, use a target-specific file, so
that the change doesn't affect other architectures which might
implement semihosting in a way compatible with KVM.

The selection from ARM_v7M needs to be removed to avoid a cycle during
parsing.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
---
 configs/devices/arm-softmmu/default.mak | 2 --
 hw/arm/Kconfig  | 1 -
 target/arm/Kconfig  | 7 +++
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/configs/devices/arm-softmmu/default.mak 
b/configs/devices/arm-softmmu/default.mak
index 1b49a7830c..cb3e5aea65 100644
--- a/configs/devices/arm-softmmu/default.mak
+++ b/configs/devices/arm-softmmu/default.mak
@@ -40,6 +40,4 @@ CONFIG_MICROBIT=y
 CONFIG_FSL_IMX25=y
 CONFIG_FSL_IMX7=y
 CONFIG_FSL_IMX6UL=y
-CONFIG_SEMIHOSTING=y
-CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
 CONFIG_ALLWINNER_H3=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 19d6b9d95f..e0da8841db 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -316,7 +316,6 @@ config ARM_V7M
 # currently v7M must be included in a TCG build due to translate.c
 default y if TCG && (ARM || AARCH64)
 select PTIMER
-select ARM_COMPATIBLE_SEMIHOSTING
 
 config ALLWINNER_A10
 bool
diff --git a/target/arm/Kconfig b/target/arm/Kconfig
index 3f3394a22b..39f05b6420 100644
--- a/target/arm/Kconfig
+++ b/target/arm/Kconfig
@@ -4,3 +4,10 @@ config ARM
 config AARCH64
 bool
 select ARM
+
+# This config exists just so we can make SEMIHOSTING default when TCG
+# is selected without also changing it for other architectures.
+config ARM_SEMIHOSTING
+bool
+default y if TCG && ARM
+select ARM_COMPATIBLE_SEMIHOSTING
-- 
2.35.3




[RFC PATCH v5 4/9] tests/qtest: arm-cpu-features: Match tests to required accelerators

2023-01-20 Thread Fabiano Rosas
Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
Acked-by: Thomas Huth 
---
 tests/qtest/arm-cpu-features.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index 4ff2014bea..1555b0bab8 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -21,7 +21,7 @@
 #define SVE_MAX_VQ 16
 
 #define MACHINE "-machine virt,gic-version=max -accel tcg "
-#define MACHINE_KVM "-machine virt,gic-version=max -accel kvm -accel tcg "
+#define MACHINE_KVM "-machine virt,gic-version=max -accel kvm "
 #define QUERY_HEAD  "{ 'execute': 'query-cpu-model-expansion', " \
 "  'arguments': { 'type': 'full', "
 #define QUERY_TAIL  "}}"
@@ -613,31 +613,39 @@ int main(int argc, char **argv)
 {
 g_test_init(, , NULL);
 
-qtest_add_data_func("/arm/query-cpu-model-expansion",
-NULL, test_query_cpu_model_expansion);
+if (qtest_has_accel("tcg")) {
+qtest_add_data_func("/arm/query-cpu-model-expansion",
+NULL, test_query_cpu_model_expansion);
+}
+
+if (!g_str_equal(qtest_get_arch(), "aarch64")) {
+goto out;
+}
 
 /*
  * For now we only run KVM specific tests with AArch64 QEMU in
  * order avoid attempting to run an AArch32 QEMU with KVM on
  * AArch64 hosts. That won't work and isn't easy to detect.
  */
-if (g_str_equal(qtest_get_arch(), "aarch64") && qtest_has_accel("kvm")) {
+if (qtest_has_accel("kvm")) {
 /*
  * This tests target the 'host' CPU type, so register it only if
  * KVM is available.
  */
 qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
 NULL, test_query_cpu_model_expansion_kvm);
+
+qtest_add_data_func("/arm/kvm/query-cpu-model-expansion/sve-off",
+NULL, sve_tests_sve_off_kvm);
 }
 
-if (g_str_equal(qtest_get_arch(), "aarch64")) {
+if (qtest_has_accel("tcg")) {
 qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
 NULL, sve_tests_sve_max_vq_8);
 qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
 NULL, sve_tests_sve_off);
-qtest_add_data_func("/arm/kvm/query-cpu-model-expansion/sve-off",
-NULL, sve_tests_sve_off_kvm);
 }
 
+out:
 return g_test_run();
 }
-- 
2.35.3




[RFC PATCH v5 3/9] target/arm: Use "max" as default cpu for the virt machine with KVM

2023-01-20 Thread Fabiano Rosas
Now that the cortex-a15 is under CONFIG_TCG, use the 'max' cpu as the
default CPU for a KVM-only build.

Note that we cannot use 'host' here because the qtests can run without
any accelerator other than qtest, and 'host' depends on KVM being
enabled.

Signed-off-by: Fabiano Rosas 
Cc: Daniel P. Berrangé 
Cc: Thomas Huth 
---
 hw/arm/virt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b5c711c919..8091c74e3d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3005,7 +3005,11 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
 mc->minimum_page_bits = 12;
 mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
 mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
+#ifdef CONFIG_TCG
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
+#else
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("max");
+#endif
 mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
 mc->kvm_type = virt_kvm_type;
 assert(!mc->get_hotplug_handler);
-- 
2.35.3




Re: [PATCH v3 0/2] various aarch64 fixes for running Hyper-V on TCG

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 15:59, Evgeny Iakovlev
 wrote:
>
> Small series of changes to aarch64 emulation to better support running
> Hyper-V as a TCG guest wtih EL3 firmware.



Applied to target-arm.next, thanks.

-- PMM



Re: [PATCH v3 4/5] hw/char/pl011: better handling of FIFO flags on LCR reset

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 15:54, Evgeny Iakovlev
 wrote:
>
> Current FIFO handling code does not reset RXFE/RXFF flags when guest
> resets FIFO by writing to UARTLCR register, although internal FIFO state
> is reset to 0 read count. Actual guest-visible flag update will happen
> only on the next data read or write attempt. As a result, any guest
> that expects RXFE flag to be set (and RXFF to be cleared) after resetting
> FIFO will never see that happen.
>
> Signed-off-by: Evgeny Iakovlev 
Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v3 3/5] hw/char/pl011: implement a reset method

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 15:54, Evgeny Iakovlev
 wrote:
>
> PL011 currently lacks a reset method. Implement it.
>
> Signed-off-by: Evgeny Iakovlev 
> ---
>  hw/char/pl011.c | 26 +-

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v3 1/5] hw/char/pl011: refactor FIFO depth handling code

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 15:54, Evgeny Iakovlev
 wrote:
>
> PL011 can be in either of 2 modes depending on guest config: FIFO and
> single register. The last mode could be viewed as a 1-element-deep FIFO.
>
> Current code open-codes a bunch of depth-dependent logic. Refactor FIFO
> depth handling code to isolate calculating current FIFO depth.
>
> One functional (albeit guest-invisible) side-effect of this change is
> that previously we would always increment s->read_pos in UARTDR read
> handler even if FIFO was disabled, now we are limiting read_pos to not
> exceed FIFO depth (read_pos itself is reset to 0 if user disables FIFO).
>
> Signed-off-by: Evgeny Iakovlev 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v3 2/5] hw/char/pl011: add post_load hook for backwards-compatibility

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 15:54, Evgeny Iakovlev
 wrote:
>
> Previous change slightly modified the way we handle data writes when
> FIFO is disabled. Previously we kept incrementing read_pos and were
> storing data at that position, although we only have a
> single-register-deep FIFO now. Then we changed it to always store data
> at pos 0.
>
> If the guest disables the FIFO and then proceeds to read data, it will work out
> fine, because we read from current read_pos before setting it to 0.
>
> However, to make the code less fragile, introduce a post_load hook for
> PL011State and fix up the read FIFO state there when the FIFO is disabled. Since
> we are introducing a post_load hook, also do some sanity checking on
> untrusted incoming input state.
>
> Signed-off-by: Evgeny Iakovlev 
> ---
>  hw/char/pl011.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/hw/char/pl011.c b/hw/char/pl011.c
> index 3fa3b75d04..4df649a064 100644
> --- a/hw/char/pl011.c
> +++ b/hw/char/pl011.c
> @@ -352,10 +352,35 @@ static const VMStateDescription vmstate_pl011_clock = {
>  }
>  };
>
> +static int pl011_post_load(void *opaque, int version_id)
> +{
> +PL011State* s = opaque;
> +
> +/* Sanity-check input state */
> +if (s->read_pos >= ARRAY_SIZE(s->read_fifo) ||
> +s->read_count > ARRAY_SIZE(s->read_fifo)) {
> +return -1;
> +}
> +
> +if (version_id < 3 && !pl011_is_fifo_enabled(s)) {
> +/*
> + * Older versions of PL011 didn't ensure that the single
> + * character in the FIFO in FIFO-disabled mode is in
> + * element 0 of the array; convert to follow the current
> + * code's assumptions.
> + */
> +s->read_fifo[0] = s->read_fifo[s->read_pos];
> +s->read_pos = 0;
> +}

You don't need to bump the version id and do this
check based on version ID. You can just check whether
the old state indicates that the data isn't in slot 0
of the array, the way I suggested in my comment on the
previous version of the patchset.

(New->old migration will work fine.)

thanks
-- PMM



Re: [RFC] cxl-host: Fix committed check for passthrough decoder

2023-01-20 Thread Jonathan Cameron via
On Mon, 16 Jan 2023 14:37:23 +
Jonathan Cameron via  wrote:

> On Fri, 13 Jan 2023 17:10:51 +
> Fan Ni  wrote:
> 
> > On Fri, Jan 13, 2023 at 09:47:25AM +, Jonathan Cameron wrote:
> >   
> > > On Fri, 13 Jan 2023 00:27:55 +
> > > Fan Ni  wrote:
> > > 
> > > > For passthrough decoder (a decoder hosted by a cxl component with only
> > > > one downstream port), its cache_mem_registers field COMMITTED
> > > > (see spec 2.0 8.2.5.12 - CXL HDM Decoder Capability Structure) will not
> > > > be set by the current Linux CXL driver. Without the fix, for a cxl
> > > > topology setup with a single HB and single root port, the memdev 
> > > > read/write
> > > > requests cannot be passed to the device successfully as the function
> > > > cxl_hdm_find_target will fail the decoder COMMITTED check and return
> > > > directly, which causes read/write not being directed to cxl type3 
> > > > device.
> > > > 
> > > > Before the fix, a segfault is observed when trying using cxl memory for
> > > > htop command through 'numactl --membind' after converting cxl memory
> > > > into normal RAM.
> > > 
> > > We also need to fix that segfault.
> > With the patch, we do not see the segfault anymore. The segfault was
> > there before the patch because for a passthrough decoder, we cannot find a
> > target as the committed field check cannot pass, the read request will
> > return 0 (in cxl_read_cfmws) which can be used for further addressing.
> > With the patch, we skip the committed check for passthrough decoder and
> > the requests can be passed to the device so the segfault is fixed. Our
> > concern is that the fix may also let the requests pass for unprogrammed
> > decoder, which is not allowed in current code.  
> 
> Agreed on the concern. That is one reason we need a more comprehensive 
> solution.
> 
> > > 
> > > 
> > > > 
> > > > Detailed steps to reproduce the issue with the cxl setup where there is
> > > > only one HB and a memdev is directly attached to the only root port of
> > > > the HB are listed as below,
> > > > 1. cxl create-region region0
> > > > 2. ndctl create-namespace -m dax -r region0
> > > > 3. daxctl reconfigure-device --mode=system-ram --no-online dax0.0
> > > > 4. daxctl online-memory dax0.0
> > > > 5. numactl --membind=1 htop
> > > > 
> > > > Signed-off-by: Fan Ni 
> > > 
> > > Ah. This mess is still going on. I've not been testing with this
> > > particular combination because the kernel didn't support it.
> > > The kernel code assumes that the implementation made the choice
> > > (which is an option in the spec) to not have any HDM decoders
> > > for the pass through case. As such it never programmed them
> > > (if you dig back a long way in the region bring patch sets in the
> > > kernel you'll find some discussion of this). Now I knew that meant
> > > the configuration didn't 'work' but nothing should be crashing -
> > > unless you mean that something in linux userspace is trying to
> > > access the memory and crashing because that fails
> > > (which is fine as far as I'm concerned ;)
> > > 
> > > The work around for QEMU testing so far has been to add another root
> > > port and put nothing below that. The HDM decoders then have to be
> > > implemented so the kernel does what we expect.
> > Do you mean we already have the workaround somewhere or it is what we
> > have planned? currently the kernel will create a passthrough decoder if
> > the number of downstream is 1. If we have the workaround, there
> > should never be a passthrough decoder being created and we should not
> > see the issue.  
> 
> I have code as described (now; I didn't until a few minutes ago).
> I'll send it out later this week after a little more testing / internal 
> review.
> 
> > > 
> > > I'm not against a more comprehensive fix.  Two options come to mind.
> > > 1) Add an option to the host bridge device to tell it not to implement
> > >hdm decoders at all. I'm not keen to just automatically drop them
> > >because having decoders on a pass through HB is a valid configuration.
> > > 2) Cheat and cleanly detect a pass through situation and let the accesses
> > >through.  I'm not particularly keen on this option though as it
> > >will fail to test the code once it's 'fixed' in Linux.  IIRC the spec
> > >doesn't say that programming such an HDM decoder is optional.
> > > 
> > > I guess we could be a bit naughty with option 1 and flip the logic even
> > > though it would break backwards compatibility. So default to no HDM 
> > > decoder.
> > > I doubt anyone will notice given that's the configuration that would have
> > > worked.  However I would want to keep the option to enable these decoders
> > > around.  I can spin up a patch or do you want to do it? My suggestion is 
> > > option
> > > 1 with default being no HDM decoder.  
> 
> I went with this option. Implementation uses the reset callback in QEMU
> for the pxb-cxl to edit the capability header table to 'hide' the HDM 

Re: [PATCH 3/3] plugins: Iterate on cb_lists in qemu_plugin_user_exit

2023-01-20 Thread Alex Bennée


Richard Henderson  writes:

> Rather than iterate over all plugins for all events,
> iterate over plugins that have registered a given event.
>
> Signed-off-by: Richard Henderson 

Queued to plugins/next, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: cxl nvdimm Potential probe ordering issues.

2023-01-20 Thread Dan Williams
Gregory Price wrote:
> On Thu, Jan 19, 2023 at 03:04:49PM +, Jonathan Cameron wrote:
> > Gregory, would you mind checking if
> > cxl_nvb is NULL here...
> > https://elixir.bootlin.com/linux/v6.2-rc4/source/drivers/cxl/pmem.c#L67
> > (printk before it is used should work).
> > 
> > Might also be worth checking cxl_nvd and cxl_ds
> > but my guess is cxl_nvb is our problem (it is when I deliberate change
> > the load order).
> > 
> > Jonathan
> > 
> 
> This is exactly the issue.  cxl_nvb is null, the rest appear fine.
> 
> Also, note, that weirdly the non-volatile bridge shows up when launching
> this in volatile mode, but no stack trace appears.
> 
> ¯\_(ツ)_/¯
> 
> After spending way too much time tracing through the current cxl driver
> code, I have only really determined that
> 
> 1) The code is very pmem oriented, and it's unclear to me how the driver
>as-is differentiates a persistent device from a volatile device. That
>code path still completely escapes me.  The only differentiating code
>i see is in the memdev probe path that creates mem#/pmem and mem#/ram

Yes, pmem was the initial focus because it had the most dependency on
the OS to setup vs BIOS, but the ram enabling is at the top of the queue
now.

> 
> 2) The code successfully manages probe, enable, and mount a REAL device
>- cxl memdev appears (/sys/bus/cxl/devices/mem0)
>- a dax device appears (/sys/bus/dax/devices/)
>  This happens at boot, which I assume must be bios related

As it stands currently that dax device and the cxl device are not
related since a default dax-device is loaded just based on the presence
of an EFI_MEMORY_SP address range in the address map. With the new ram
enabling that default device will be elided and CXL will register a
dax-device parented by a cxl region.

>- The memory *does not* auto-online, instead the dax device can be
>  onlined as system-ram *manually* via ndctl and friends

That *manually* part is the problem that needs distro help to solve. It
should be the case that by default all Linux distributions auto-online
all dax-devices. If that happens to online memory that is too slow for
general use, or too high-performance / precious for general purpose use
then the administrator can set policy after the fact. Unfortunately user
policy can not be applied if these memory ranges were onlined by the
kernel at boot, so that's why the kernel policy defaults to not-online.

In other words, there is no guarantee that memory that was assigned to
the general purpose pool at boot can be removed. The only guaranteed
behavior is to never give the memory to the core kernel in the first
instance and always let user policy route the memory.

> 3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
>of the type-3 device configuration (pmem-only or vmem-only)

Correct, the top-level bus code (cxl_acpi) and the endpoint code
(cxl_mem, cxl_port) need to handshake before establishing regions. For
pmem regions the platform needs to claim the availability of a pmem
capable CXL window.

> 
># CFMW defined
>[root@fedora ~]# ls /sys/bus/cxl/devices/
>decoder0.0  decoder2.0  mem0  port1
>decoder1.0  endpoint2   nvdimm-bridge0  root0
> 
># CFMW not defined
>[root@fedora ~]# ls /sys/bus/cxl/devices/
>decoder1.0  decoder2.0  endpoint2  mem0  port1  root0
> 
> 4) As you can see above, multiple decoders are registered.  I'm not sure
>if that's correct or not, but it does seem odd given there's only one
>cxl type-3 device.  Odd that decoder0.0 shows up when CFMW is there,
>but not when it isn't.

CXL windows are modeled as decoders hanging off the CXL root device
(ACPI0017 on ACPI based platforms). An endpoint decoder can then map a
selection of that window.

> 
>Note: All these tests have two root ports:
>-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
>-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
>-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,port=1,slot=1 \
> 
> 
> Don't know why I haven't thought of this until now, but is the CFMW code
> reporting something odd about what's behind it?  Is it assuming the
> devices are pmem?

No, the cxl_acpi code is just advertising platform decode possibilities
independent of what devices show up. Think of this like the PCI MMIO
space that gets allocated to a root bridge at the beginning of time.
That space may or may not get consumed based on what devices show up
downstream.



Re: [PATCH 2/3] plugins: Avoid deadlock in qemu_plugin_user_exit

2023-01-20 Thread Alex Bennée


Richard Henderson  writes:

> Use of start_exclusive on this exit path leads to deadlock,
> in particular when called from dump_core_and_abort.  There
> does not appear to be a need for it.

We don't want to be doing any translation while un-registering things
lest things get confused. You could split the patch in two though as the
early return seems reasonable.

>
> While we're at it, skip the entire function if no plugins.
>
> Signed-off-by: Richard Henderson 
> ---
>  plugins/core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/plugins/core.c b/plugins/core.c
> index ccb770a485..35aca0266d 100644
> --- a/plugins/core.c
> +++ b/plugins/core.c
> @@ -502,7 +502,9 @@ void qemu_plugin_user_exit(void)
>  
>  QEMU_LOCK_GUARD();
>  
> -start_exclusive();
> +if (QTAILQ_EMPTY()) {
> +return;
> +}
>  
>  /* un-register all callbacks except the final AT_EXIT one */
>  for (ev = 0; ev < QEMU_PLUGIN_EV_MAX; ev++) {
> @@ -520,8 +522,6 @@ void qemu_plugin_user_exit(void)
>  qemu_plugin_disable_mem_helpers(cpu);
>  }
>  
> -end_exclusive();
> -
>  /* now it's safe to handle the exit case */
>  qemu_plugin_atexit_cb();
>  }


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-20 Thread Anton Kuchin

On 20/01/2023 15:58, Michael S. Tsirkin wrote:

On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:

On 19/01/2023 14:51, Michael S. Tsirkin wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes VM unmigratable, that also prevents
qemu update without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give an option to orchestrator to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps default behavior that prevents migration with such devices
but adds migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
   hw/virtio/vhost-user-fs.c | 25 -
   qapi/migration.json   |  7 ++-
   2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
   #include "hw/virtio/vhost-user-fs.h"
   #include "monitor/monitor.h"
   #include "sysemu/sysemu.h"
+#include "migration/migration.h"
   static const int user_feature_bits[] = {
   VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
   return >vhost_dev;
   }
+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE "
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' "
+ "migration capability to true to enable migration.");

Isn't it possible that some backends are same and some are not?
Shouldn't this be a device property then?

If some are not the same it is not guaranteed that correct FUSE
state is present there, so orchestrator shouldn't set the capability
because this can result in destination devices being broken (they'll
be fine after the remount in guest, but this is guest visible and is
not acceptable).

I can imagine smart orchestrator and backend that can transfer
internal FUSE state, but we are not there yet, and this would be
their responsibility then to ensure endpoint compatibility between src
and dst and set the capability (that's why I put "e.g." and "or" in
the error description).

So instead of relying on the orchestrator how about making it a device
property?


We have to rely on the orchestrator here and I can't see how a property
can help us with this: the same device can be migratable or not depending
on whether the destination is the same host, what features the backend supports,
how management software works and other factors of environment that
are not accessible from qemu or backend daemon.
So in the end we'll end up with orchestrator having to setup flags for
each device before each migration based on information only it can
have - in my opinion this is worse than just giving the orchestrator
a single flag that it can set after calculating the decision for
the particular migration that it organizes.








+return -1;
+}
+
+return 0;
+}
+
   static const VMStateDescription vuf_vmstate = {
   .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
   };
   static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
   #will be handled faster.  This is a performance feature and
   #should not affect the correctness of postcopy migration.
   #(since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+# vhost-user-fs devices, this should be enabled only when
+# backend can preserve local FUSE state e.g. for qemu update
+# when dst reconnects to the same endpoints after migration.
+# (since 8.0)
   #
   # Features:
   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
  'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
  { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
  'validate-uuid', 'background-snapshot',
-   'zero-copy-send', 'postcopy-preempt'] }
+   'zero-copy-send', 

Re: cxl nvdimm Potential probe ordering issues.

2023-01-20 Thread Dan Williams
Jonathan Cameron wrote:
> On Thu, 19 Jan 2023 15:04:49 +
> Jonathan Cameron  wrote:
> 
> > On Thu, 19 Jan 2023 12:42:44 +
> > Jonathan Cameron via  wrote:
> > 
> > > On Wed, 18 Jan 2023 14:31:53 -0500
> > > Gregory Price  wrote:
> > >   
> > > > I apparently forgot an intro lol
> > > > 
> > > > I tested the DOE linux branch with the 2023-1-11 QEMU branch with both
> > > > volatile, non-volatile, and "legacy" (pre-my-patch) non-volatile mode.
> > > > 
> > > > 1) *In volatile mode, there are no stack traces present (during boot*)
> > > > 
> > > > On Wed, Jan 18, 2023 at 02:22:08PM -0500, Gregory Price wrote:
> > > > > 
> > > > > 1) No stack traces present
> > > > > 2) Device usage appears to work, but cxl-cli fails to create a 
> > > > > region, i
> > > > > haven't checked why yet (also tried ndctl-75, same results)
> > > > > 3) There seems to be some other regression with the cxl_pmem_init
> > > > > routine, because I get a stack trace in this setup regardless of 
> > > > > whether
> > > > > I apply the type-3 device commit.
> > > > > 
> > > > > 
> > > > > All tests below with the previously posted DOE linux branch.
> > > > > Base QEMU branch was Jonathan's 2023-1-11
> > > > > 
> > > > > 
> > > > > DOE Branch - 2023-1-11 (HEAD) (all commits)
> > > > > 
> > > > > QEMU Config:
> > > > > sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> > > > > -drive 
> > > > > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd
> > > > >  \
> > > > > -m 3G,slots=4,maxmem=8G \
> > > > > -smp 4 \
> > > > > -machine type=q35,accel=kvm,cxl=on \
> > > > > -enable-kvm \
> > > > > -nographic \
> > > > > -object memory-backend-ram,id=mem0,size=1G,share=on \
> > > > > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > > > > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
> > > > > -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> > > > > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> > > > > 
> > > > > Result:  This worked, but cxl-cli could not create a region (will look
> > > > > into this further later).
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > When running with a persistent memory configuration, I'm seeing a
> > > > > kernel stack trace on cxl_pmem_init
> > > > > 
> > > > > Config:
> > > > > sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> > > > > -drive 
> > > > > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd
> > > > >  \
> > > > > -m 3G,slots=4,maxmem=4G \
> > > > > -smp 4 \
> > > > > -machine type=q35,accel=kvm,cxl=on \
> > > > > -enable-kvm \
> > > > > -nographic \
> > > > > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > > > > -device cxl-rp,port=0,id=rp0,bus=cxl.0,chassis=0,slot=0 \
> > > > > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/mem0,size=1G \
> > > > > -object memory-backend-file,id=cxl-lsa0,mem-path=/tmp/lsa0,size=1G \
> > > > > -device 
> > > > > cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0
> > > > >  \
> > > > > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> > > > > 
> > > > > 
> > > > > [   62.167518] BUG: kernel NULL pointer dereference, address: 
> > > > > 04c0
> > > > > [   62.185069] #PF: supervisor read access in kernel mode
> > > > > [   62.198502] #PF: error_code(0x) - not-present page
> > > > > [   62.211019] PGD 0 P4D 0
> > > > > [   62.220521] Oops:  [#1] PREEMPT SMP PTI
> > > > > [   62.233457] CPU: 3 PID: 558 Comm: systemd-udevd Not tainted 
> > > > > 6.2.0-rc1+ #1
> > > > > [   62.252886] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
> > > > > BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> > > > > [   62.258432] Adding 2939900k swap on /dev/zram0.  Priority:100 
> > > > > extents:1 across:2939900k SSDscFS
> > > > > [   62.285513] RIP: 0010:cxl_nvdimm_probe+0x8d/0x130 [cxl_pmem]
> > > > > [   62.285529] Code: 85 c0 0f 85 90 00 00 00 f0 80 0c 24 40 f0 80 4c 
> > > > > 24 08 10 f0 80 4c 24 08 20 f0 80 4c 24 08 40 49 8d 84 24 b8 04 00 00 
> > > > > 4c 89 0
> > > > > [   62.285531] RSP: 0018:acff0141fc38 EFLAGS: 00010202
> > > > > [   62.285534] RAX: 97a8a37b84b8 RBX: 97a8a37b8000 RCX: 
> > > > > 
> > > > > [   62.285536] RDX: 0001 RSI: 97a8a37b8000 RDI: 
> > > > > 
> > > > > [   62.285537] RBP: 97a8a37b8000 R08: 0001 R09: 
> > > > > 0001
> > > > > [   62.285538] R10: 0001 R11:  R12: 
> > > > > 97a8a37b8000
> > > > > [   62.285539] R13: 97a982c3dc28 R14:  R15: 
> > > > > 
> > > > > [   62.285541] FS:  7f2619829580() GS:97a9bca0() 
> > > > > knlGS:
> > > > > [   62.285542] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > [   62.285544] CR2: 04c0 CR3: 0001056a8000 CR4: 
> > > > > 06e0
> > > > > [   62.285653] Call Trace:
> > > > > [   62.285656]  
> > > > > [   62.285660]  cxl_bus_probe+0x17/0x50
> > > > 

Re: [PATCH 15/25] target/arm: Allow users to set the number of VFP registers

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 08:40, Cédric Le Goater  wrote:
>
> On 1/19/23 13:34, Cédric Le Goater wrote:
> > Cortex A7 CPUs with an FPU implementing VFPv4 without NEON support
> > have 16 64-bit FPU registers and not 32 registers. Let users set the
> > number of VFP registers with a CPU property.
> >
> > The primary use case of this property is for the Cortex A7 of the
> > Aspeed AST2600 SoC.
> >
> > Signed-off-by: Cédric Le Goater 
> > ---
> >   target/arm/cpu.h|  2 ++
> >   hw/arm/aspeed_ast2600.c |  2 ++
> >   target/arm/cpu.c| 31 +++
> >   3 files changed, 35 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index bf2bce046d..5f2fefef4e 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -901,6 +901,8 @@ struct ArchCPU {
> >   bool has_pmu;
> >   /* CPU has VFP */
> >   bool has_vfp;
> > +/* CPU has 32 VFP registers */
> > +bool has_vfp_d32;
> >   /* CPU has Neon */
> >   bool has_neon;
> >   /* CPU has M-profile DSP extension */
> > diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
> > index cd75465c2b..37f43b4165 100644
> > --- a/hw/arm/aspeed_ast2600.c
> > +++ b/hw/arm/aspeed_ast2600.c
> > @@ -309,6 +309,8 @@ static void aspeed_soc_ast2600_realize(DeviceState 
> > *dev, Error **errp)
> >   _abort);
> >   object_property_set_bool(OBJECT(>cpu[i]), "neon", false,
> >   _abort);
> > +object_property_set_bool(OBJECT(>cpu[i]), "vfp-d32", false,
> > +_abort);
> >   object_property_set_link(OBJECT(>cpu[i]), "memory",
> >OBJECT(s->memory), _abort);
> >
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 5f63316dbf..ad90de75fb 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -1269,6 +1269,9 @@ static Property arm_cpu_cfgend_property =
> >   static Property arm_cpu_has_vfp_property =
> >   DEFINE_PROP_BOOL("vfp", ARMCPU, has_vfp, true);
> >
> > +static Property arm_cpu_has_vfp_d32_property =
> > +DEFINE_PROP_BOOL("vfp-d32", ARMCPU, has_vfp_d32, true);
> > +
> >   static Property arm_cpu_has_neon_property =
> >   DEFINE_PROP_BOOL("neon", ARMCPU, has_neon, true);
> >
> > @@ -1400,6 +1403,14 @@ void arm_cpu_post_init(Object *obj)
> >   }
> >   }
> >
> > +if (cpu->has_vfp && cpu_isar_feature(aa32_simd_r32, cpu)) {
> > +cpu->has_vfp_d32 = true;
> > +if (!kvm_enabled()) {
> > +qdev_property_add_static(DEVICE(obj),
> > + _cpu_has_vfp_d32_property);
> > +}
> > +}
> > +
> >   if (arm_feature(>env, ARM_FEATURE_NEON)) {
> >   cpu->has_neon = true;
> >   if (!kvm_enabled()) {
> > @@ -1661,6 +1672,26 @@ static void arm_cpu_realizefn(DeviceState *dev, 
> > Error **errp)
> >   return;
> >   }
> >
> > +if (!cpu->has_vfp_d32 &&
> > +arm_feature(env, ARM_FEATURE_V8) &&
> > +!arm_feature(env, ARM_FEATURE_M)) {
> > +error_setg(errp, "ARMv8A CPUs must have VFP32");

I wonder if we should instead not create the property for v8A CPUs?
That way there's no way for has_vfp_d32 to get set to false.
But checking after the fact is fine too. (One day we might move away
from the current setup where properties get set up per-object for
Arm CPUs and towards a "the class of the object determines the
properties it has" setup; the way we do it today is a bit odd for
historical reasons.)

> > +return;
> > +}
>
> The specs say that the permitted values of the SIMDReg bits [3:0]
> in Armv8-A are 0b and 0b0010. In Armv8-M, the possible values
> of the field are b0001. All other values are reserved.
>
> This should be better :
>
> if (arm_feature(env, ARM_FEATURE_V8)) {
> if (arm_feature(env, ARM_FEATURE_M)) {
> if (cpu->has_vfp_d32) {

This is unreachable code, because we only set has_vfp_d32 to
true and set up the property if the CPU has the aa32_simd_r32
feature; and v8M CPUs will never have that feature, so
has_vfp_d32 must be false for them here.

(Put another way, we set up the property so we can only
"downgrade" from 32 to 16 dregs, so there is no need to
check whether a CPU that can't have 32 dregs still doesn't have
32 dregs.)

> error_setg(errp, "ARMv8-M CPUs do not support VFP-D32");
> return;
> }
> } else if (!cpu->has_vfp_d32) {
> error_setg(errp, "ARMv8-A CPUs must have VFP-D32");
> return;
> }
> }
>
> ?

thanks
-- PMM



Re: [PATCH 3/4] semihosting: add semihosting section to the docs

2023-01-20 Thread Alex Bennée


Peter Maydell  writes:

> On Fri, 13 Jan 2023 at 13:39, Alex Bennée  wrote:
>>

>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 3aa3a2f5a3..de3a368f58 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -4633,10 +4633,13 @@ DEF("semihosting", 0, QEMU_OPTION_semihosting,
>>  QEMU_ARCH_MIPS | QEMU_ARCH_NIOS2 | QEMU_ARCH_RISCV)
>>  SRST
>>  ``-semihosting``
>> -Enable semihosting mode (ARM, M68K, Xtensa, MIPS, Nios II, RISC-V only).
>> +Enable :ref:`Semihosting` mode (ARM, M68K, Xtensa, MIPS, Nios II, 
>> RISC-V only).
>>
>> -Note that this allows guest direct access to the host filesystem, so
>> -should only be used with a trusted guest OS.
>> +.. warning::
>> +  Note that this allows guest direct access to the host filesystem, so
>> +  should only be used with a trusted guest OS.
>> +
>> +.. _Semihosting Options:
>
> Does this render OK in the manpage version of this text ?

Seems to yes.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH 3/2] MAINTAINERS: Cover tpm.c again

2023-01-20 Thread Markus Armbruster
Commit 800d4deda0 "softmmu: move more files to softmmu/" (v5.2.0)
updated MAINTAINERS for all moved files but one.  Fix that.

Fixes: 800d4deda04be016a95fbbf397c830a2d14ff9f6
Signed-off-by: Markus Armbruster 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c888ccaf7a..27a2dac726 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3068,7 +3068,7 @@ T: git https://github.com/stefanha/qemu.git tracing
 TPM
 M: Stefan Berger 
 S: Maintained
-F: tpm.c
+F: softmmu/tpm.c
 F: hw/tpm/*
 F: include/hw/acpi/tpm.h
 F: include/sysemu/tpm*
-- 
2.39.0




Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Stefano Stabellini
On Tue, 17 Jan 2023, Chuck Zmudzinski wrote:
> On 1/17/2023 6:04 AM, Igor Mammedov wrote:
> > On Mon, 16 Jan 2023 13:00:53 -0500
> > Chuck Zmudzinski  wrote:
> >
> > > On 1/16/23 10:33, Igor Mammedov wrote:
> > > > On Fri, 13 Jan 2023 16:31:26 -0500
> > > > Chuck Zmudzinski  wrote:
> > > >   
> > > >> On 1/13/23 4:33 AM, Igor Mammedov wrote:  
> > > >> > On Thu, 12 Jan 2023 23:14:26 -0500
> > > >> > Chuck Zmudzinski  wrote:
> > > >> > 
> > > >> >> On 1/12/23 6:03 PM, Michael S. Tsirkin wrote:
> > > >> >> > On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow wrote: 
> > > >> >> >  
> > > >> >> >> I think the change Michael suggests is very minimalistic: Move 
> > > >> >> >> the if
> > > >> >> >> condition around xen_igd_reserve_slot() into the function itself 
> > > >> >> >> and
> > > >> >> >> always call it there unconditionally -- basically turning three 
> > > >> >> >> lines
> > > >> >> >> into one. Since xen_igd_reserve_slot() seems very problem 
> > > >> >> >> specific,
> > > >> >> >> Michael further suggests to rename it to something more general. 
> > > >> >> >> All
> > > >> >> >> in all no big changes required.  
> > > >> >> > 
> > > >> >> > yes, exactly.
> > > >> >> >   
> > > >> >> 
> > > >> >> OK, got it. I can do that along with the other suggestions.
> > > >> > 
> > > >> > have you considered instead of reservation, putting a slot check in 
> > > >> > device model
> > > >> > and if it's intel igd being passed through, fail at realize time  if 
> > > >> > it can't take
> > > >> > required slot (with a error directing user to fix command line)?
> > > >> 
> > > >> Yes, but the core pci code currently already fails at realize time
> > > >> with a useful error message if the user tries to use slot 2 for the
> > > >> igd, because of the xen platform device which has slot 2. The user
> > > >> can fix this without patching qemu, but having the user fix it on
> > > >> the command line is not the best way to solve the problem, primarily
> > > >> because the user would need to hotplug the xen platform device via a
> > > >> command line option instead of having the xen platform device added by
> > > >> pc_xen_hvm_init functions almost immediately after creating the pci
> > > >> bus, and that delay in adding the xen platform device degrades
> > > >> startup performance of the guest.
> > > >>   
> > > >> > That could be less complicated than dealing with slot reservations 
> > > >> > at the cost of
> > > >> > being less convenient.
> > > >> 
> > > >> And also a cost of reduced startup performance  
> > > > 
> > > > Could you clarify how it affects performance (and how much).
> > > > (as I see, setup done at board_init time is roughly the same
> > > > as with '-device foo' CLI options, modulo time needed to parse
> > > > options which should be negligible. and both ways are done before
> > > > guest runs)  
> > > 
> > > I preface my answer by saying there is a v9, but you don't
> > > need to look at that. I will answer all your questions here.
> > > 
> > > I am going by what I observe on the main HDMI display with the
> > > different approaches. With the approach of not patching Qemu
> > > to fix this, which requires adding the Xen platform device a
> > > little later, the length of time it takes to fully load the
> > > guest is increased. I also noticed with Linux guests that use
> > > the grub bootloader, the grub vga driver cannot display the
> > > grub boot menu at the native resolution of the display, which
> > > in the tested case is 1920x1080, when the Xen platform device
> > > is added via a command line option instead of by the
> > > pc_xen_hvm_init_pci function in pc_piix.c, but with this patch
> > > to Qemu, the grub menu is displayed at the full, 1920x1080
> > > native resolution of the display. Once the guest fully loads,
> > > there is no noticeable difference in performance. It is mainly
> > > a degradation in startup performance, not performance once
> > > the guest OS is fully loaded.
> >
> > Looking at igd-assign.txt, it recommends to add IGD using '-device' CLI
> > option, and actually drop at least graphics defaults explicitly.
> > So it is expected to work fine even when IGD is constructed with
> > '-device'.
> >
> > Could you provide full CLI current xen starts QEMU with and then
> > a CLI you used (with explicit -device for IGD) that leads
> > to reduced performance?
> >
> > CCing vfio folks who might have an idea what could be wrong based
> > on vfio experience.
> 
> Actually, the igd is not added with an explicit -device option using Xen:
> 
>    1573 ?    Ssl    0:42 /usr/bin/qemu-system-i386 -xen-domid 1 
> -no-shutdown -chardev 
> socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait -mon 
> chardev=libxl-cmd,mode=control -chardev 
> socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait 
> -mon chardev=libxenstat-cmd,mode=control -nodefaults -no-user-config -name 
> windows -vnc none -display none -serial pty 

Re: [PULL 0/5] tcg patch queue

2023-01-20 Thread Alex Bennée


Thomas Huth  writes:

> On 20/01/2023 11.53, Ilya Leoshkevich wrote:
>> On Fri, 2023-01-20 at 10:41 +0100, Thomas Huth wrote:
>>> On 16/01/2023 23.36, Richard Henderson wrote:
 The following changes since commit
 fb7e7990342e59cf67dbd895c1a1e3fb1741df7a:

     tests/qtest/qom-test: Do not print tested properties by default
 (2023-01-16 15:00:57 +)

 are available in the Git repository at:

     https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116

 for you to fetch changes up to
 61710a7e23a63546da0071ea32adb96476fa5d07:

     accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12
 -1000)

 
 - Reorg cpu_tb_exec around setjmp.
 - Use __attribute__((target)) for buffer_is_zero.
 - Add perfmap and jitdump for perf support.

 
 Ilya Leoshkevich (3):
     linux-user: Clean up when exiting due to a signal
     accel/tcg: Add debuginfo support
     tcg: add perfmap and jitdump

 Richard Henderson (2):
     util/bufferiszero: Use __attribute__((target)) for
 avx2/avx512
     accel/tcg: Split out cpu_exec_{setjmp,loop}
>>>
>>>    Hi Richard, hi Ilya,
>>>
>>> with the recent QEMU master branch (commit 701ed34), I'm now seeing
>>> failures
>>> in Travis:
>>>
>>>    https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411
>>>
>>> Everything was still fine a couple of days ago (commit fb7e799):
>>>
>>>    https://app.travis-ci.com/github/huth/qemu/builds/259755664
>>>
>>> ... so it seems this is likely related to this pull request. Could
>>> you
>>> please have a look?
>>>
>>>    Thanks,
>>>     Thomas
>>>
>> I would expect this to be (temporarily) fixed by [1], but we
>> probably
>> don't set GITLAB_CI in Travis. Would it make sense to set it? It looks
>> as if this variable is currently used only to skip certain tests.
>> If not, then maybe split it into QEMU_CI, GITLAB_CI and TRAVIS_CI?
>> https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg04438.html
>
> Ah, ok, so this test has issues in gitlab, too!

*sigh* yeah the test is flaky but this is a subtly different failure
 mode. All the gitlab failures I saw were the test triggering the abort
 rather than the assert catch we have here.


>
> For Travis, I think we should either check the CI or TRAVIS
> environment variables:
>
>
> https://docs.travis-ci.com/user/environment-variables/#default-environment-variables
>
>  Thomas


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr()

2023-01-20 Thread Eric Auger
Hi Jean, Robin,

On 1/20/23 17:23, Jean-Philippe Brucker wrote:
> On Fri, Jan 20, 2023 at 03:50:18PM +, Robin Murphy wrote:
>> On 2023-01-20 15:28, Jean-Philippe Brucker wrote:
>>
>> For some reason this came through as blank mail with a text attachment,
> Ugh sorry about that, looks like I hit ^D in mutt before sending
>
>> so apologies for the lack of quoting, but for reference this is a
>> long-standing known issue:
>>
>> https://lore.kernel.org/linux-iommu/9625faf4-48ef-2dd3-d82f-931d9cf26...@huawei.com/
>>
>> In summary, yes, hanging {of,acpi}_iommu_configure() off driver probe is
>> massively wrong, and pulling that step into iommu_probe_device() in a sane
>> and robust manner is one of the next things to start on after getting the
>> bus ops out of the way.
> Right, thanks for confirming

Thank you very much for your support and confirmation of the probe issue!

Eric
>
> Thanks,
> Jean
>




Re: [PATCH v3 2/7] hw/acpi/acpi_dev_interface: Resolve AcpiDeviceIfClass::madt_cpu

2023-01-20 Thread Igor Mammedov
On Thu, 19 Jan 2023 14:47:41 +
Bernhard Beschow  wrote:

> Am 18. Januar 2023 14:59:05 UTC schrieb Igor Mammedov :
> >On Tue, 17 Jan 2023 00:30:23 +
> >Bernhard Beschow  wrote:
> >  
> >> Am 16. Januar 2023 16:29:30 UTC schrieb Igor Mammedov 
> >> :  
> >> >On Mon, 16 Jan 2023 16:29:03 +0100
> >> >Bernhard Beschow  wrote:
> >> >
> >> >> This class attribute was always set to pc_madt_cpu_entry().
> >> >> pc_madt_cpu_entry() is architecture dependent and was assigned to the
> >> >> attribute even in architecture agnostic code such as in hw/acpi/piix4.c
> >> >> and hw/isa/lpc_ich9. Not having to set madt_cpu there resolves the
> >> >> assumption that these device models are only ever used with ACPI on x86
> >> >> targets.
> >> >> 
> >> >> The only target independent location where madt_cpu was called was hw/
> >> >> acpi/cpu.c. Here a function pointer can be passed via an argument
> >> >> instead. The other locations where it was called was in x86-specific 
> >> >> code
> >> >> where pc_madt_cpu_entry() can be used directly.
> >> >>
> >> >> While at it, move pc_madt_cpu_entry() from the public include/hw/i386/
> >> >> pc.h to the private hw/i386/acpi-common where it is also implemented.   
> >> >>  
> >> >
> >> >I'm not sure about this approach,
> >> >the callback is intend to be used not only by x86 but also in
> >> >the end by ARM (it's just that arm/virt CPU hotplug patches are
> >> >still work in progress and haven't been merged).
> >> 
> >> IIUC one would call the target-independent build_cpus_aml() from the 
> >> ARM-specific build_madt(). There, one could pass a function pointer to an 
> >> ARM-specific madt_cpu_fn. Shouldn't that get us covered?  
> >
> >it will work in this case, but I don't like going back to random
> >function pointer argument approach instead of using QOM and
> >interfaces which is much better in describing expectations/contract 
> >for interfaces and objects it's attached to.
> >
> >However, that is not the reason why I'm against this approach; see below.
> >  
> >> >So I'd prefer to keep AcpiDeviceIfClass::madt_cpu.
> >> >
> >> >What's the end goal you are trying to achieve by getting
> >> >rid of this callback?
> >> 
> >> ACPI controllers are in principle not x86-specific, yet our PIIX4 and ICH9 
> >> device models always assign an x86-specific function to 
> >> AcpiDeviceIfClass::madt_cpu. That doesn't seem right because the device 
> >> models make assumptions about their usage contexts which they ideally 
> >> shouldn't.
> >> 
> >> In addition, it seems to me that ACPI controllers and AML generation 
> >> should not be mixed: An ACPI controller deals with (hardware) events while 
> >> AML generation is usually a BIOS task to inject code into an OS. 
> >> Therefore, ACPI controllers don't seem to be the right place to assign 
> >> AML-generating functions because that doesn't match how reality works.  
> >
> >It would be nice to have pristine hardware-only device models
> >and firmware composing ACPI tables like baremetal systems do
> >(they have a luxury of fixed hardware which makes it way simpler),
> >however it was not practical nor sustainable in mainstream virt case.
> >That's the reason why ACPI tables (firmware) were moved into QEMU (hardware).
> >  
> >> To solve both issues, one could factor out e.g. an AmlDeviceIf from 
> >> AcpiDeviceIf, so one wouldn't force the device models to provide an 
> >> madt_cpu. 
> >> Then one could assign a target-specific AmlDeviceIfClass::madt_cpu e.g. in 
> >> board code.  
> >ACPI tables in QEMU started this way, It stopped being maintainable
> >at certain point, see below for more.
> >  
> >>At the moment I don't see a need for a new interface, however, since the 
> >>function pointer works just fine: It frees the device models from having to 
> >>provide it and it is used in code which already deals with AML generation.  
> >When the move happened, the bulk of ACPI code was as you suggest made
> >as machine specific and it's still the case for the most of it.
> >Then, with ease of adding new related features, it exploded into
> >I would say hard to maintain mess. Hence before adding more and
> >making it worse, I introduced  call_dev_aml_func()/AcpiDevAmlIf
> >and started refactoring PCI AML code. That simplified main PCI bus
> >enumeration quite a bit.
> >
> >The new interface however does exactly opposite of what you are doing,
> >i.e. it pushes device specific AML generation into corresponding device
> >model. It's not ideal as we have to provide stubs for targets where it's
> >not needed. But stubs for target independent code is typical and proven
> >approach we use in QEMU for such cases and is relatively easy to maintain.
> >So I'd call it a reasonable compromise.
> >
> >Recently posted series (that untangles generic PCI slots AML from
> >ACPI PCI hotplug frankenstein) continues with AcpiDevAmlIf approach 
> >and pushes more AML into target-independent device models.  
> 
> I fully agree with the 

[PATCH] migration: Fix migration crash when target psize larger than host

2023-01-20 Thread Peter Xu
Commit d9e474ea56 overlooked the case where the target psize is even larger
than the host psize.  One example is Alpha, which has an 8K page size, so
migration will start to crash the source QEMU when running Alpha migration on x86.

Fix it by detecting that case and set host start/end just to cover the
single page to be migrated.

This will slightly optimize the common case where the host psize equals the
guest psize, so we don't even need to do the roundups, but that's trivial.

Cc: qemu-sta...@nongnu.org
Reported-by: Thomas Huth 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1456
Fixes: d9e474ea56 ("migration: Teach PSS about host page")
Signed-off-by: Peter Xu 
---
 migration/ram.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..68a45338e3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2319,8 +2319,25 @@ static void pss_host_page_prepare(PageSearchStatus *pss)
 size_t guest_pfns = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
 
 pss->host_page_sending = true;
-pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
-pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
+if (guest_pfns <= 1) {
+/*
+ * This covers both when guest psize == host psize, or when guest
+ * has larger psize than the host (guest_pfns==0).
+ *
+ * For the latter, we always send one whole guest page per
+ * iteration of the host page (example: an Alpha VM on x86 host
+ * will have guest psize 8K while host psize 4K).
+ */
+pss->host_page_start = pss->page;
+pss->host_page_end = pss->page + 1;
+} else {
+/*
+ * The host page spans over multiple guest pages, we send them
+ * within the same host page iteration.
+ */
+pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
+pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
+}
 }
 
 /*
-- 
2.37.3
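[Editor's note] The start/end computation in the patch can be modelled in a few lines of Python. This is an illustrative sketch only, not QEMU code: the function name, the integer division in place of ROUND_DOWN/ROUND_UP, and the concrete page sizes are all assumptions made here for clarity.

```python
def host_page_range(page, host_psize, guest_psize):
    """Toy model of pss_host_page_prepare()'s range logic (illustrative)."""
    # guest_pfns: how many guest pages fit in one host page; this is 0 when
    # the guest page (e.g. Alpha's 8K) is larger than the host page (x86's 4K).
    guest_pfns = host_psize // guest_psize
    if guest_pfns <= 1:
        # Guest psize >= host psize: send exactly one guest page per iteration.
        return page, page + 1
    # Host page spans several guest pages: align the range to the host page.
    start = (page // guest_pfns) * guest_pfns          # ROUND_DOWN(page, guest_pfns)
    end = ((page + guest_pfns) // guest_pfns) * guest_pfns  # ROUND_UP(page + 1, ...)
    return start, end
```

With 8K guest pages on a 4K host, guest_pfns is 0 and the range degenerates to a single page, which is exactly the case the old code mishandled.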




Re: [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr()

2023-01-20 Thread Jean-Philippe Brucker
On Fri, Jan 20, 2023 at 03:50:18PM +, Robin Murphy wrote:
> On 2023-01-20 15:28, Jean-Philippe Brucker wrote:
> 
> For some reason this came through as blank mail with a text attachment,

Ugh sorry about that, looks like I hit ^D in mutt before sending

> so apologies for the lack of quoting, but for reference this is a
> long-standing known issue:
> 
> https://lore.kernel.org/linux-iommu/9625faf4-48ef-2dd3-d82f-931d9cf26...@huawei.com/
> 
> In summary, yes, hanging {of,acpi}_iommu_configure() off driver probe is
> massively wrong, and pulling that step into iommu_probe_device() in a sane
> and robust manner is one of the next things to start on after getting the
> bus ops out of the way.

Right, thanks for confirming

Thanks,
Jean



Re: [PULL 00/12] Header cleanup patches for 2023-01-20

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 07:01, Markus Armbruster  wrote:
>
> The following changes since commit 7ec8aeb6048018680c06fb9205c01ca6bda08846:
>
>   Merge tag 'pull-tpm-2023-01-17-1' of 
> https://github.com/stefanberger/qemu-tpm into staging (2023-01-17 15:47:53 
> +)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/armbru.git tags/pull-include-2023-01-20
>
> for you to fetch changes up to b6c80037ed3ba275eea2b33bc17e36af2b89813a:
>
>   include/hw/ppc include/hw/pci-host: Drop extra typedefs (2023-01-20 
> 07:25:22 +0100)
>
> 
> Header cleanup patches for 2023-01-20
>
> 
> Markus Armbruster (12):
>   coroutine: Clean up superfluous inclusion of qemu/coroutine.h
>   coroutine: Move coroutine_fn to qemu/osdep.h, trim includes
>   coroutine: Clean up superfluous inclusion of qemu/lockable.h
>   coroutine: Split qemu/coroutine-core.h off qemu/coroutine.h
>   coroutine: Use Coroutine typedef name instead of structure tag
>   include/block: Untangle inclusion loops
>   hw/sparc64/niagara: Use blk_name() instead of open-coding it
>   include/hw/block: Include hw/block/block.h where needed
>   include/hw/ppc: Split pnv_chip.h off pnv.h
>   include/hw/ppc: Supply a few missing includes
>   include/hw/ppc: Don't include hw/pci-host/pnv_phb.h from pnv.h
>   include/hw/ppc include/hw/pci-host: Drop extra typedefs
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Igor Mammedov
On Tue, 17 Jan 2023 09:50:23 -0500
Chuck Zmudzinski  wrote:

> On 1/17/2023 5:35 AM, Igor Mammedov wrote:
> > On Mon, 16 Jan 2023 13:00:53 -0500
> > Chuck Zmudzinski  wrote:
> >  
> > > On 1/16/23 10:33, Igor Mammedov wrote:  
> > > > On Fri, 13 Jan 2023 16:31:26 -0500
> > > > Chuck Zmudzinski  wrote:
> > > > 
> > > >> On 1/13/23 4:33 AM, Igor Mammedov wrote:
> > > >> > On Thu, 12 Jan 2023 23:14:26 -0500
> > > >> > Chuck Zmudzinski  wrote:
> > > >> >   
> > > >> >> On 1/12/23 6:03 PM, Michael S. Tsirkin wrote:  
> > > >> >> > On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow wrote: 
> > > >> >> >
> > > >> >> >> I think the change Michael suggests is very minimalistic: Move 
> > > >> >> >> the if
> > > >> >> >> condition around xen_igd_reserve_slot() into the function itself 
> > > >> >> >> and
> > > >> >> >> always call it there unconditionally -- basically turning three 
> > > >> >> >> lines
> > > >> >> >> into one. Since xen_igd_reserve_slot() seems very problem 
> > > >> >> >> specific,
> > > >> >> >> Michael further suggests to rename it to something more general. 
> > > >> >> >> All
> > > >> >> >> in all no big changes required.
> > > >> >> > 
> > > >> >> > yes, exactly.
> > > >> >> > 
> > > >> >> 
> > > >> >> OK, got it. I can do that along with the other suggestions.  
> > > >> > 
> > > >> > have you considered instead of reservation, putting a slot check in 
> > > >> > device model
> > > >> > and if it's intel igd being passed through, fail at realize time  if 
> > > >> > it can't take
> > > >> > required slot (with a error directing user to fix command line)? 
> > > >> >  
> > > >> 
> > > >> Yes, but the core pci code currently already fails at realize time
> > > >> with a useful error message if the user tries to use slot 2 for the
> > > >> igd, because of the xen platform device which has slot 2. The user
> > > >> can fix this without patching qemu, but having the user fix it on
> > > >> the command line is not the best way to solve the problem, primarily
> > > >> because the user would need to hotplug the xen platform device via a
> > > >> command line option instead of having the xen platform device added by
> > > >> pc_xen_hvm_init functions almost immediately after creating the pci
> > > >> bus, and that delay in adding the xen platform device degrades
> > > >> startup performance of the guest.
> > > >> 
> > > >> > That could be less complicated than dealing with slot reservations 
> > > >> > at the cost of
> > > >> > being less convenient.  
> > > >> 
> > > >> And also a cost of reduced startup performance
> > > > 
> > > > Could you clarify how it affects performance (and how much).
> > > > (as I see, setup done at board_init time is roughly the same
> > > > as with '-device foo' CLI options, modulo time needed to parse
> > > > options which should be negligible. and both ways are done before
> > > > guest runs)
> > > 
> > > I preface my answer by saying there is a v9, but you don't
> > > need to look at that. I will answer all your questions here.
> > > 
> > > I am going by what I observe on the main HDMI display with the
> > > different approaches. With the approach of not patching Qemu
> > > to fix this, which requires adding the Xen platform device a
> > > little later, the length of time it takes to fully load the
> > > guest is increased. I also noticed with Linux guests that use
> > > the grub bootloader, the grub vga driver cannot display the
> > > grub boot menu at the native resolution of the display, which
> > > in the tested case is 1920x1080, when the Xen platform device
> > > is added via a command line option instead of by the
> > > pc_xen_hvm_init_pci function in pc_piix.c, but with this patch
> > > to Qemu, the grub menu is displayed at the full, 1920x1080
> > > native resolution of the display. Once the guest fully loads,
> > > there is no noticeable difference in performance. It is mainly
> > > a degradation in startup performance, not performance once
> > > the guest OS is fully loaded.  
> > above hints on presence of bug[s] in igd-passthru implementation,
> > and this patch effectively hides problem instead of trying to figure
> > out what's wrong and fixing it.
> >  
> 
> I agree that the with the current Qemu there is a bug in the igd-passthru
> implementation. My proposed patch fixes it. So why not support the
> proposed patch with a Reviewed-by tag?
I see the patch as a workaround (it might be a reasonable one) and
it's up to the Xen maintainers to give a Reviewed-by/accept it.

Though I'd add to commit message something about performance issues
you are experiencing when trying to passthrough IGD manually
as one of justifications for workaround (it might push Xen folks
to investigate the issue and it won't be lost/forgotten on mail-list).




[PATCH v3 1/2] target/arm: implement DBGCLAIM registers

2023-01-20 Thread Evgeny Iakovlev
The architecture does not define any functionality for the CLAIM tag bits.
So we will just keep the raw bits, as per spec.

Signed-off-by: Evgeny Iakovlev 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.h  |  1 +
 target/arm/debug_helper.c | 33 +
 2 files changed, 34 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bf2bce046d..d1ad0939ca 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -495,6 +495,7 @@ typedef struct CPUArchState {
 uint64_t dbgbcr[16]; /* breakpoint control registers */
 uint64_t dbgwvr[16]; /* watchpoint value registers */
 uint64_t dbgwcr[16]; /* watchpoint control registers */
+uint64_t dbgclaim;   /* DBGCLAIM bits */
 uint64_t mdscr_el1;
 uint64_t oslsr_el1; /* OS Lock Status */
 uint64_t osdlr_el1; /* OS DoubleLock status */
diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index 2f6ddc0da5..f95a73329d 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -632,6 +632,24 @@ static void osdlr_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 }
 }
 
+static void dbgclaimset_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+env->cp15.dbgclaim |= (value & 0xFF);
+}
+
+static uint64_t dbgclaimset_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+/* CLAIM bits are RAO */
+return 0xFF;
+}
+
+static void dbgclaimclr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+env->cp15.dbgclaim &= ~(value & 0xFF);
+}
+
 static const ARMCPRegInfo debug_cp_reginfo[] = {
 /*
  * DBGDRAR, DBGDSAR: always RAZ since we don't implement memory mapped
@@ -715,6 +733,21 @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
   .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 0,
   .access = PL1_RW, .accessfn = access_tda,
   .type = ARM_CP_NOP },
+/*
+ * Dummy DBGCLAIM registers.
+ * "The architecture does not define any functionality for the CLAIM tag 
bits.",
+ * so we only keep the raw bits
+ */
+{ .name = "DBGCLAIMSET_EL1", .state = ARM_CP_STATE_BOTH,
+  .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 6,
+  .type = ARM_CP_ALIAS,
+  .access = PL1_RW, .accessfn = access_tda,
+  .writefn = dbgclaimset_write, .readfn = dbgclaimset_read },
+{ .name = "DBGCLAIMCLR_EL1", .state = ARM_CP_STATE_BOTH,
+  .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 6,
+  .access = PL1_RW, .accessfn = access_tda,
+  .writefn = dbgclaimclr_write, .raw_writefn = raw_write,
+  .fieldoffset = offsetof(CPUARMState, cp15.dbgclaim) },
 };
 
 static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
-- 
2.34.1
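[Editor's note] The set/clear register pair in this patch follows a common hardware idiom: write-1-to-set on DBGCLAIMSET, write-1-to-clear on DBGCLAIMCLR, with SET reading as-ones (RAO) and CLR reading back the raw tag bits. A toy Python model (not QEMU code; the class and method names are invented here) makes the behaviour easy to see:

```python
class DbgClaim:
    """Illustrative model of the DBGCLAIMSET/DBGCLAIMCLR pair from the patch."""
    def __init__(self):
        self.bits = 0  # corresponds to env->cp15.dbgclaim

    def claimset_write(self, value):
        self.bits |= value & 0xFF     # write-1-to-set

    def claimset_read(self):
        return 0xFF                   # CLAIM bits are RAO on the SET register

    def claimclr_write(self, value):
        self.bits &= ~(value & 0xFF)  # write-1-to-clear

    def claimclr_read(self):
        return self.bits              # raw stored tag bits
```

A debugger sets a tag via SET, software reads the current tags via CLR, and either side clears tags by writing them back to CLR.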




[PATCH v3 2/2] target/arm: provide stubs for more external debug registers

2023-01-20 Thread Evgeny Iakovlev
QEMU doesn't implement the Debug Communication Channel, nor the rest of
the external debug interface. However, Microsoft Hyper-V tries to
access some of those registers during an EL2 context switch.

Since there is no architectural way to not advertise support for external
debug, provide RAZ/WI stubs for OSDTRRX_EL1, OSDTRTX_EL1 and OSECCR_EL1
registers in the same way the rest of DCM is currently done. Do account
for access traps though with access_tda.

Signed-off-by: Evgeny Iakovlev 
Reviewed-by: Peter Maydell 
---
 target/arm/debug_helper.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index f95a73329d..cced3f168d 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -682,6 +682,27 @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
   .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 1, .opc2 = 0,
   .access = PL0_R, .accessfn = access_tda,
   .type = ARM_CP_CONST, .resetvalue = 0 },
+/*
+ * OSDTRRX_EL1/OSDTRTX_EL1 are used for save and restore of DBGDTRRX_EL0.
+ * It is a component of the Debug Communications Channel, which is not 
implemented.
+ */
+{ .name = "OSDTRRX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
+  .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 2,
+  .access = PL1_RW, .accessfn = access_tda,
+  .type = ARM_CP_CONST, .resetvalue = 0 },
+{ .name = "OSDTRTX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
+  .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2,
+  .access = PL1_RW, .accessfn = access_tda,
+  .type = ARM_CP_CONST, .resetvalue = 0 },
+/*
+ * OSECCR_EL1 provides a mechanism for an operating system
+ * to access the contents of EDECCR. EDECCR is not implemented though,
+ * as is the rest of external device mechanism.
+ */
+{ .name = "OSECCR_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
+  .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
+  .access = PL1_RW, .accessfn = access_tda,
+  .type = ARM_CP_CONST, .resetvalue = 0 },
 /*
  * DBGDSCRint[15,12,5:2] map to MDSCR_EL1[15,12,5:2].  Map all bits as
  * it is unlikely a guest will care.
-- 
2.34.1




[PATCH v3 0/2] various aarch64 fixes for running Hyper-V on TCG

2023-01-20 Thread Evgeny Iakovlev
Small series of changes to aarch64 emulation to better support running
Hyper-V as a TCG guest with EL3 firmware.

v3:
* Replaced the manual write function with the generic raw_write handler

v2:
* DBGCLAIM now implements a (trivial) raw_write handler
* Added comments around ignored external debug registers
* Patch 3 is dropped because it was manually picked into target-arm.next

Evgeny Iakovlev (2):
  target/arm: implement DBGCLAIM registers
  target/arm: provide stubs for more external debug registers

 target/arm/cpu.h  |  1 +
 target/arm/debug_helper.c | 54 +++
 2 files changed, 55 insertions(+)

-- 
2.34.1




[PATCH v3 2/5] hw/char/pl011: add post_load hook for backwards-compatibility

2023-01-20 Thread Evgeny Iakovlev
The previous change slightly modified the way we handle data writes when
the FIFO is disabled. Previously we kept incrementing read_pos and stored
data at that position, even though we effectively have a
single-register-deep FIFO in that mode. Then we changed it to always
store data at position 0.

If the guest disables the FIFO and then proceeds to read data, it will work out
fine, because we read from current read_pos before setting it to 0.

However, to make the code less fragile, introduce a post_load hook for
PL011State and fix up the read FIFO state there when the FIFO is disabled. Since
we are introducing a post_load hook, also do some sanity checking on
untrusted incoming input state.

Signed-off-by: Evgeny Iakovlev 
---
 hw/char/pl011.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 3fa3b75d04..4df649a064 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -352,10 +352,35 @@ static const VMStateDescription vmstate_pl011_clock = {
 }
 };
 
+static int pl011_post_load(void *opaque, int version_id)
+{
+PL011State* s = opaque;
+
+/* Sanity-check input state */
+if (s->read_pos >= ARRAY_SIZE(s->read_fifo) ||
+s->read_count > ARRAY_SIZE(s->read_fifo)) {
+return -1;
+}
+
+if (version_id < 3 && !pl011_is_fifo_enabled(s)) {
+/*
+ * Older versions of PL011 didn't ensure that the single
+ * character in the FIFO in FIFO-disabled mode is in
+ * element 0 of the array; convert to follow the current
+ * code's assumptions.
+ */
+s->read_fifo[0] = s->read_fifo[s->read_pos];
+s->read_pos = 0;
+}
+
+return 0;
+}
+
 static const VMStateDescription vmstate_pl011 = {
 .name = "pl011",
-.version_id = 2,
+.version_id = 3,
 .minimum_version_id = 2,
+.post_load = pl011_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(readbuff, PL011State),
 VMSTATE_UINT32(flags, PL011State),
-- 
2.34.1
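[Editor's note] The two jobs of this post_load hook — rejecting out-of-range untrusted input, then applying a version-gated fixup for old streams — can be sketched in Python. This is a toy model only; the function name and return convention are invented here, and QEMU's real hook operates on PL011State in C:

```python
FIFO_SIZE = 16  # matches the read_fifo array size in PL011State

def pl011_fixup(read_pos, read_count, fifo, version_id, fifo_enabled):
    """Illustrative model of pl011_post_load()'s checks and fixup."""
    # Incoming migration state is untrusted: reject out-of-range indices
    # (the real hook returns -1 to fail the migration).
    if read_pos >= FIFO_SIZE or read_count > FIFO_SIZE:
        return None
    if version_id < 3 and not fifo_enabled:
        # Pre-v3 streams may keep the single character at any position;
        # current code assumes it lives in element 0, so move it there.
        fifo = fifo[:]          # avoid mutating the caller's list
        fifo[0] = fifo[read_pos]
        read_pos = 0
    return read_pos, fifo
```

Note the order matters: the bounds check must run first, otherwise the fixup itself would index the array with an untrusted read_pos.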




[PATCH v3 1/5] hw/char/pl011: refactor FIFO depth handling code

2023-01-20 Thread Evgeny Iakovlev
PL011 can be in either of two modes depending on guest config: FIFO and
single register. The latter mode can be viewed as a 1-element-deep FIFO.

Current code open-codes a bunch of depth-dependent logic. Refactor the
FIFO depth handling code to isolate the calculation of the current depth.

One functional (albeit guest-invisible) side-effect of this change is
that previously we would always increment s->read_pos in UARTDR read
handler even if FIFO was disabled, now we are limiting read_pos to not
exceed FIFO depth (read_pos itself is reset to 0 if user disables FIFO).

Signed-off-by: Evgeny Iakovlev 
---
 hw/char/pl011.c | 30 ++
 include/hw/char/pl011.h |  5 -
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index c076813423..3fa3b75d04 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -81,6 +81,17 @@ static void pl011_update(PL011State *s)
 }
 }
 
+static bool pl011_is_fifo_enabled(PL011State *s)
+{
+return (s->lcr & 0x10) != 0;
+}
+
+static inline unsigned pl011_get_fifo_depth(PL011State *s)
+{
+/* Note: FIFO depth is expected to be power-of-2 */
+return pl011_is_fifo_enabled(s) ? PL011_FIFO_DEPTH : 1;
+}
+
 static uint64_t pl011_read(void *opaque, hwaddr offset,
unsigned size)
 {
@@ -94,8 +105,7 @@ static uint64_t pl011_read(void *opaque, hwaddr offset,
 c = s->read_fifo[s->read_pos];
 if (s->read_count > 0) {
 s->read_count--;
-if (++s->read_pos == 16)
-s->read_pos = 0;
+s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1);
 }
 if (s->read_count == 0) {
 s->flags |= PL011_FLAG_RXFE;
@@ -273,11 +283,7 @@ static int pl011_can_receive(void *opaque)
 PL011State *s = (PL011State *)opaque;
 int r;
 
-if (s->lcr & 0x10) {
-r = s->read_count < 16;
-} else {
-r = s->read_count < 1;
-}
+r = s->read_count < pl011_get_fifo_depth(s);
 trace_pl011_can_receive(s->lcr, s->read_count, r);
 return r;
 }
@@ -286,15 +292,15 @@ static void pl011_put_fifo(void *opaque, uint32_t value)
 {
 PL011State *s = (PL011State *)opaque;
 int slot;
+unsigned pipe_depth;
 
-slot = s->read_pos + s->read_count;
-if (slot >= 16)
-slot -= 16;
+pipe_depth = pl011_get_fifo_depth(s);
+slot = (s->read_pos + s->read_count) & (pipe_depth - 1);
 s->read_fifo[slot] = value;
 s->read_count++;
 s->flags &= ~PL011_FLAG_RXFE;
 trace_pl011_put_fifo(value, s->read_count);
-if (!(s->lcr & 0x10) || s->read_count == 16) {
+if (s->read_count == pipe_depth) {
 trace_pl011_put_fifo_full();
 s->flags |= PL011_FLAG_RXFF;
 }
@@ -359,7 +365,7 @@ static const VMStateDescription vmstate_pl011 = {
 VMSTATE_UINT32(dmacr, PL011State),
 VMSTATE_UINT32(int_enabled, PL011State),
 VMSTATE_UINT32(int_level, PL011State),
-VMSTATE_UINT32_ARRAY(read_fifo, PL011State, 16),
+VMSTATE_UINT32_ARRAY(read_fifo, PL011State, PL011_FIFO_DEPTH),
 VMSTATE_UINT32(ilpr, PL011State),
 VMSTATE_UINT32(ibrd, PL011State),
 VMSTATE_UINT32(fbrd, PL011State),
diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
index dc2c90eedc..926322e242 100644
--- a/include/hw/char/pl011.h
+++ b/include/hw/char/pl011.h
@@ -27,6 +27,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(PL011State, PL011)
 /* This shares the same struct (and cast macro) as the base pl011 device */
 #define TYPE_PL011_LUMINARY "pl011_luminary"
 
+/* Depth of UART FIFO in bytes, when FIFO mode is enabled (else depth == 1) */
+#define PL011_FIFO_DEPTH 16
+
 struct PL011State {
 SysBusDevice parent_obj;
 
@@ -39,7 +42,7 @@ struct PL011State {
 uint32_t dmacr;
 uint32_t int_enabled;
 uint32_t int_level;
-uint32_t read_fifo[16];
+uint32_t read_fifo[PL011_FIFO_DEPTH];
 uint32_t ilpr;
 uint32_t ibrd;
 uint32_t fbrd;
-- 
2.34.1
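[Editor's note] The refactoring relies on the depth being a power of 2 (16 or 1), so that "index & (depth - 1)" wraps the position without a compare-and-reset. A small Python model of the read-side indexing — illustrative only, not the QEMU device model — shows why the same masking works for both modes:

```python
PL011_FIFO_DEPTH = 16  # must be a power of 2 for the masking below

class ReadFifo:
    """Toy model of the patch's read-FIFO indexing (illustrative)."""
    def __init__(self, fifo_enabled):
        # A disabled FIFO behaves as a 1-element-deep FIFO: depth - 1 == 0,
        # so the mask pins every index to slot 0.
        self.depth = PL011_FIFO_DEPTH if fifo_enabled else 1
        self.buf = [0] * PL011_FIFO_DEPTH
        self.read_pos = 0
        self.read_count = 0

    def put(self, value):
        slot = (self.read_pos + self.read_count) & (self.depth - 1)
        self.buf[slot] = value
        self.read_count += 1

    def get(self):
        c = self.buf[self.read_pos]
        if self.read_count > 0:
            self.read_count -= 1
            self.read_pos = (self.read_pos + 1) & (self.depth - 1)
        return c
```

In the depth-1 case read_pos can never leave 0, which is the guest-invisible side effect the commit message describes.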




[PATCH v3 3/5] hw/char/pl011: implement a reset method

2023-01-20 Thread Evgeny Iakovlev
PL011 currently lacks a reset method. Implement it.

Signed-off-by: Evgeny Iakovlev 
---
 hw/char/pl011.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 4df649a064..f9413f3703 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -427,11 +427,6 @@ static void pl011_init(Object *obj)
 s->clk = qdev_init_clock_in(DEVICE(obj), "clk", pl011_clock_update, s,
 ClockUpdate);
 
-s->read_trigger = 1;
-s->ifl = 0x12;
-s->cr = 0x300;
-s->flags = 0x90;
-
 s->id = pl011_id_arm;
 }
 
@@ -443,11 +438,32 @@ static void pl011_realize(DeviceState *dev, Error **errp)
  pl011_event, NULL, s, NULL, true);
 }
 
+static void pl011_reset(DeviceState *dev)
+{
+PL011State *s = PL011(dev);
+
+s->lcr = 0;
+s->rsr = 0;
+s->dmacr = 0;
+s->int_enabled = 0;
+s->int_level = 0;
+s->ilpr = 0;
+s->ibrd = 0;
+s->fbrd = 0;
+s->read_pos = 0;
+s->read_count = 0;
+s->read_trigger = 1;
+s->ifl = 0x12;
+s->cr = 0x300;
+s->flags = 0x90;
+}
+
 static void pl011_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = pl011_realize;
+dc->reset = pl011_reset;
 dc->vmsd = _pl011;
 device_class_set_props(dc, pl011_properties);
 }
-- 
2.34.1




[PATCH v3 5/5] hw/char/pl011: check if UART is enabled before RX or TX operation

2023-01-20 Thread Evgeny Iakovlev
The UART should be enabled in general, and have RX enabled specifically, to
be able to receive data from the peripheral device. The same goes for
transmitting data to the peripheral device and the TXE flag.

Check if UART CR register has EN and RXE or TXE bits enabled before
trying to receive or transmit data.

Signed-off-by: Evgeny Iakovlev 
Reviewed-by: Peter Maydell 
---
 hw/char/pl011.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index c72fbb7d50..dd20b76609 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -54,6 +54,11 @@
 #define INT_E (INT_OE | INT_BE | INT_PE | INT_FE)
 #define INT_MS (INT_RI | INT_DSR | INT_DCD | INT_CTS)
 
+/* UARTCR bits */
+#define PL011_CR_UARTEN (1 << 0)
+#define PL011_CR_TXE(1 << 8)
+#define PL011_CR_RXE(1 << 9)
+
 static const unsigned char pl011_id_arm[8] =
   { 0x11, 0x10, 0x14, 0x00, 0x0d, 0xf0, 0x05, 0xb1 };
 static const unsigned char pl011_id_luminary[8] =
@@ -211,6 +216,11 @@ static void pl011_trace_baudrate_change(const PL011State 
*s)
 s->ibrd, s->fbrd);
 }
 
+static inline bool pl011_can_transmit(PL011State *s)
+{
+return s->cr & PL011_CR_UARTEN && s->cr & PL011_CR_TXE;
+}
+
 static void pl011_write(void *opaque, hwaddr offset,
 uint64_t value, unsigned size)
 {
@@ -221,7 +231,9 @@ static void pl011_write(void *opaque, hwaddr offset,
 
 switch (offset >> 2) {
 case 0: /* UARTDR */
-/* ??? Check if transmitter is enabled.  */
+if (!pl011_can_transmit(s)) {
+break;
+}
 ch = value;
 /* XXX this blocks entire thread. Rewrite to use
  * qemu_chr_fe_write and background I/O callbacks */
@@ -292,7 +304,11 @@ static int pl011_can_receive(void *opaque)
 PL011State *s = (PL011State *)opaque;
 int r;
 
-r = s->read_count < pl011_get_fifo_depth(s);
+if (!(s->cr & PL011_CR_UARTEN) || !(s->cr & PL011_CR_RXE)) {
+r = 0;
+} else {
+r = s->read_count < pl011_get_fifo_depth(s);
+}
 trace_pl011_can_receive(s->lcr, s->read_count, r);
 return r;
 }
@@ -461,7 +477,7 @@ static void pl011_reset(DeviceState *dev)
 s->fbrd = 0;
 s->read_trigger = 1;
 s->ifl = 0x12;
-s->cr = 0x300;
+s->cr = PL011_CR_RXE | PL011_CR_TXE;
 s->flags = 0;
 pl011_reset_fifo(s);
 }
-- 
2.34.1




[PATCH v3 0/5] Series of fixes for PL011 char device

2023-01-20 Thread Evgeny Iakovlev
v3:
* Introduced a post_load hook for PL011State migration for
  backwards-compatibility due to some input state fragility.
* No longer touching irq lines in reset method
* Minor changes based on review feedback.

v2:
* Moved FIFO depth refactoring part of FIFO flags change into its own
  commit.
* Added a reset method for PL011

Evgeny Iakovlev (5):
  hw/char/pl011: refactor FIFO depth handling code
  hw/char/pl011: add post_load hook for backwards-compatibility
  hw/char/pl011: implement a reset method
  hw/char/pl011: better handling of FIFO flags on LCR reset
  hw/char/pl011: check if UART is enabled before RX or TX operation

 hw/char/pl011.c | 110 +---
 include/hw/char/pl011.h |   5 +-
 2 files changed, 95 insertions(+), 20 deletions(-)

-- 
2.34.1




[PATCH v3 4/5] hw/char/pl011: better handling of FIFO flags on LCR reset

2023-01-20 Thread Evgeny Iakovlev
The current FIFO handling code does not reset the RXFE/RXFF flags when the
guest resets the FIFO by writing to the UARTLCR register, even though the
internal FIFO state is reset to a read count of 0. The guest-visible flag
update happens only on the next data read or write attempt. As a result,
any guest that expects the RXFE flag to be set (and RXFF to be cleared)
immediately after resetting the FIFO will never see that happen.

Signed-off-by: Evgeny Iakovlev 
---
 hw/char/pl011.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index f9413f3703..c72fbb7d50 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -92,6 +92,16 @@ static inline unsigned pl011_get_fifo_depth(PL011State *s)
 return pl011_is_fifo_enabled(s) ? PL011_FIFO_DEPTH : 1;
 }
 
+static inline void pl011_reset_fifo(PL011State *s)
+{
+s->read_count = 0;
+s->read_pos = 0;
+
+/* Reset FIFO flags */
+s->flags &= ~(PL011_FLAG_RXFF | PL011_FLAG_TXFF);
+s->flags |= PL011_FLAG_RXFE | PL011_FLAG_TXFE;
+}
+
 static uint64_t pl011_read(void *opaque, hwaddr offset,
unsigned size)
 {
@@ -239,8 +249,7 @@ static void pl011_write(void *opaque, hwaddr offset,
 case 11: /* UARTLCR_H */
 /* Reset the FIFO state on FIFO enable or disable */
 if ((s->lcr ^ value) & 0x10) {
-s->read_count = 0;
-s->read_pos = 0;
+pl011_reset_fifo(s);
 }
 if ((s->lcr ^ value) & 0x1) {
 int break_enable = value & 0x1;
@@ -450,12 +459,11 @@ static void pl011_reset(DeviceState *dev)
 s->ilpr = 0;
 s->ibrd = 0;
 s->fbrd = 0;
-s->read_pos = 0;
-s->read_count = 0;
 s->read_trigger = 1;
 s->ifl = 0x12;
 s->cr = 0x300;
-s->flags = 0x90;
+s->flags = 0;
+pl011_reset_fifo(s);
 }
 
 static void pl011_class_init(ObjectClass *oc, void *data)
-- 
2.34.1




Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-20 Thread Igor Mammedov
On Tue, 17 Jan 2023 09:43:52 -0500
Chuck Zmudzinski  wrote:

> On 1/17/2023 5:35 AM, Igor Mammedov wrote:
> > On Mon, 16 Jan 2023 13:00:53 -0500
> > Chuck Zmudzinski  wrote:
> >  
> > > On 1/16/23 10:33, Igor Mammedov wrote:  
> > > > On Fri, 13 Jan 2023 16:31:26 -0500
> > > > Chuck Zmudzinski  wrote:
> > > > 
> > > >> On 1/13/23 4:33 AM, Igor Mammedov wrote:
> > > >> > On Thu, 12 Jan 2023 23:14:26 -0500
> > > >> > Chuck Zmudzinski  wrote:
> > > >> >   
> > > >> >> On 1/12/23 6:03 PM, Michael S. Tsirkin wrote:  
> > > >> >> > On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow wrote: 
> > > >> >> >
> > > >> >> >> I think the change Michael suggests is very minimalistic: Move 
> > > >> >> >> the if
> > > >> >> >> condition around xen_igd_reserve_slot() into the function itself 
> > > >> >> >> and
> > > >> >> >> always call it there unconditionally -- basically turning three 
> > > >> >> >> lines
> > > >> >> >> into one. Since xen_igd_reserve_slot() seems very problem 
> > > >> >> >> specific,
> > > >> >> >> Michael further suggests to rename it to something more general. 
> > > >> >> >> All
> > > >> >> >> in all no big changes required.
> > > >> >> > 
> > > >> >> > yes, exactly.
> > > >> >> > 
> > > >> >> 
> > > >> >> OK, got it. I can do that along with the other suggestions.  
> > > >> > 
> > > >> > have you considered instead of reservation, putting a slot check in 
> > > >> > device model
> > > >> > and if it's intel igd being passed through, fail at realize time  if 
> > > >> > it can't take
> > > >> > required slot (with a error directing user to fix command line)? 
> > > >> >  
> > > >> 
> > > >> Yes, but the core pci code currently already fails at realize time
> > > >> with a useful error message if the user tries to use slot 2 for the
> > > >> igd, because of the xen platform device which has slot 2. The user
> > > >> can fix this without patching qemu, but having the user fix it on
> > > >> the command line is not the best way to solve the problem, primarily
> > > >> because the user would need to hotplug the xen platform device via a
> > > >> command line option instead of having the xen platform device added by
> > > >> pc_xen_hvm_init functions almost immediately after creating the pci
> > > >> bus, and that delay in adding the xen platform device degrades
> > > >> startup performance of the guest.
> > > >> 
> > > >> > That could be less complicated than dealing with slot reservations 
> > > >> > at the cost of
> > > >> > being less convenient.  
> > > >> 
> > > >> And also a cost of reduced startup performance
> > > > 
> > > > Could you clarify how it affects performance (and how much).
> > > > (as I see, setup done at board_init time is roughly the same
> > > > as with '-device foo' CLI options, modulo time needed to parse
> > > > options which should be negligible. and both ways are done before
> > > > guest runs)
> > > 
> > > I preface my answer by saying there is a v9, but you don't
> > > need to look at that. I will answer all your questions here.
> > > 
> > > I am going by what I observe on the main HDMI display with the
> > > different approaches. With the approach of not patching Qemu
> > > to fix this, which requires adding the Xen platform device a
> > > little later, the length of time it takes to fully load the
> > > guest is increased. I also noticed with Linux guests that use
> > > the grub bootoader, the grub vga driver cannot display the
> > > grub boot menu at the native resolution of the display, which
> > > in the tested case is 1920x1080, when the Xen platform device
> > > is added via a command line option instead of by the
> > > >> pc_xen_hvm_init_pci function in pc_piix.c, but with this patch
> > > to Qemu, the grub menu is displayed at the full, 1920x1080
> > > native resolution of the display. Once the guest fully loads,
> > > there is no noticeable difference in performance. It is mainly
> > > a degradation in startup performance, not performance once
> > > the guest OS is fully loaded.  
> > above hints on presence of bug[s] in igd-passthru implementation,
> > and this patch effectively hides problem instead of trying to figure
> > out what's wrong and fixing it.
> >  
> 
> Why did you delete the rest of my answers to your other observations and
> not respond to them?

They are preserved on the mailing list in your previous email.
It's usually fine to trim non-relevant parts and keep only the part/context
that is being replied to.

I didn't want to argue the point that it's usually the job of the management
layer to arrange correct IGD passthrough for QEMU, as Alex pointed out.
Hence those parts got trimmed.


> 




Re: [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr()

2023-01-20 Thread Robin Murphy

On 2023-01-20 15:28, Jean-Philippe Brucker wrote:

For some reason this came through as a blank mail with a text attachment, 
so apologies for the lack of quoting, but for reference this is a 
long-standing known issue:


https://lore.kernel.org/linux-iommu/9625faf4-48ef-2dd3-d82f-931d9cf26...@huawei.com/

In summary, yes, hanging {of,acpi}_iommu_configure() off driver probe is 
massively wrong, and pulling that step into iommu_probe_device() in a 
sane and robust manner is one of the next things to start on after 
getting the bus ops out of the way.


Thanks,
Robin.



Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group

2023-01-20 Thread Jean-Philippe Brucker
On Wed, Jan 18, 2023 at 11:28:32AM -0700, Alex Williamson wrote:
> The VT-d spec[2](8.3.1) has a more elegant solution using a path
> described in a device scope, based on a root bus number (not
> susceptible to OS renumbering) and a sequence of devfns to uniquely
> describe a hierarchy or endpoint, invariant of OS bus renumbering.

That's a good idea, we could describe the hierarchy using only devfns.
I think I based VIOT mostly on IORT and device-tree which don't provide
that as far as I know, but could have studied DMAR better. One problem is
that for virtio-iommu we'd need to update both device-tree and VIOT (and
neither are easy to change).

But it's worth thinking about because it would solve a problem we
currently have, that a virtio-iommu using the virtio-pci transport cannot
be placed behind a bridge, including a root port, because the firmware
tables cannot refer to it.

Thanks,
Jean



Re: [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr()

2023-01-20 Thread Jean-Philippe Brucker
Hi Eric,

On Mon, Jan 16, 2023 at 07:47:09AM -0500, Eric Auger wrote:
[...]
> once we attempt to plug such devices downstream to the pcie-to-pci
> bridge, those devices are put in a singleton group. The pcie-to-pci
> bridge disappears from the radar (not attached to any group), and the
> pcie root port the pcie-to-pci bridge is plugged onto is put in a
> separate singleton group. So the topology is wrong and definitively
> different from the one observed with the intel iommu.
> 
> I suspect some other issue in the VIOT/PCI probing on the guest kernel
> side. I would appreciate feedback on that kernel part.
> 
> qemu command excerpt:
> for x86_64:
> 
> -device ioh3420,bus=pcie.0,chassis=1,id=pcie.1,addr=0x2.0x0
> -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1
> -netdev tap,id=mynet0,ifname=tap0,script=qemu-ifup,downscript=qemu-ifdown
> -device e1000,netdev=mynet0,bus=pcie_pci_bridge1
> 
> guest pci topology:
> 
> -[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
>+-01.0  Device 1234:
>+-02.0-[01-02]----00.0-[02]----00.0  Intel Corporation 82540EM 
> Gigabit Ethernet Controller
> 
> wrong guest topology:
> /sys/kernel/iommu_groups/2/devices/0000:00:02.0 (pcie root port)
> /sys/kernel/iommu_groups/1/devices/0000:02:00.0 (E1000)
> no group for 0000:01:00.0 (the pcie-to-pci bridge)

This highlights several problems in the kernel:


(1) The pcie-to-pci bridge doesn't get an IOMMU group. That's a bug in the
VIOT driver, where we use the wrong struct device when dealing with PCI
aliases. I'll send a fix.


(2) e1000 gets a different group than the pcie-to-pci bridge, and than
other devices on that bus. This wrongly enables assigning aliased devices
to different VMs.

It's not specific to virtio-iommu, I can get the same result with SMMUv3,
both DT and ACPI:

qemu-system-aarch64 -M virt,gic-version=3,its=on,iommu=smmuv3 -cpu max -m 4G 
-smp 8 -kernel Image -initrd initrd -append console=ttyAMA0 -nographic \
-device pcie-root-port,chassis=1,id=pcie.1,bus=pcie.0 \
-device pcie-pci-bridge,id=pcie.2,bus=pcie.1 \
-device e1000,netdev=net0,bus=pcie.2,addr=1.0 -netdev user,id=net0 \
-device e1000,netdev=net1,bus=pcie.2,addr=2.0 -netdev user,id=net1

I think the logic in pci_device_group() expects the devices to be probed
in a specific top-down order, so that when we get to a device, all its
parents already have a group. Starting with arbitrary devices would
require walking down and across the tree to find aliases which is too
complex.

As you noted Intel IOMMU doesn't have this problem, because probing
happens in the expected order. The driver loads at the right time and
bus_iommu_probe() ends up calling pci_device_group() in the right order.
For virtio-iommu and SMMU, the drivers are subject to probe deferral, and
device drivers are bound kinda randomly by the driver core. In that case
it's acpi/of_iommu_configure() that calls pci_device_group().

I'm not sure what the best approach is to fix this. It seems wrong to rely
on previous driver probes in pci_device_group() at all, because even if
you probe in the right order, you could build a kernel without the PCIe
port driver and the aliased endpoints will still be put in different
groups. Maybe pci_device_group(), when creating a new group, should
immediately assign that group to all parents that alias the device?
Or maybe we can reuse the RID alias infrastructure used by quirks.


(3) with the SMMU, additional devices on the PCI bus can't get an IOMMU
group:

[2.019587] e1000 0000:02:01.0: Adding to iommu group 0
[2.389750] e1000 0000:02:02.0: stream 512 already in tree

Due to cdf315f907d4 ("iommu/arm-smmu-v3: Maintain a SID->device
structure"), the SMMUv3 driver doesn't support RID aliasing anymore.
The structure needs to be extended with multiple devices per SID, in which
case the fault handler will choose an arbitrary device associated to the
faulting SID.


(4) When everything else works, on Arm ACPI the PCIe root port is put in a
separate group due to ACS being enabled, but on other platforms (including
Arm DT), the root port is in the same group. I haven't looked into that.


> with intel iommu we get the following topology:
> /sys/kernel/iommu_groups/2/devices/0000:02:00.0
> /sys/kernel/iommu_groups/2/devices/0000:01:00.0
> /sys/kernel/iommu_groups/2/devices/0000:00:02.0
> 
> on aarch64 we get the same using those devices:
> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1,addr=0x2.0x0
> -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1
> -netdev tap,id=mynet0,ifname=tap0,script=qemu-ifup,downscript=qemu-ifdown
> -device e1000,netdev=mynet0,bus=pcie_pci_bridge1
> 
> Signed-off-by: Eric Auger 
> ---
>  hw/virtio/virtio-iommu.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 23c470977e..58c367b744 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ 

[PATCH v1 2/2] target/riscv: redirect XVentanaCondOps to use the Zicond functions

2023-01-20 Thread Philipp Tomsich
The Zicond standard extension implements the same instruction
semantics as XVentanaCondOps, although using different mnemonics and
opcodes.

Point XVentanaCondOps to the (newly implemented) Zicond implementation
to reduce the future maintenance burden.

Also update MAINTAINERS, as trans_xventanacondops.c.inc will not see
active maintenance going forward.

Signed-off-by: Philipp Tomsich 
---

 MAINTAINERS|  2 +-
 .../insn_trans/trans_xventanacondops.c.inc | 18 +++---
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ca914c42fa..293a9d1c8c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -305,7 +305,7 @@ F: target/riscv/insn_trans/trans_zicond.c.inc
 RISC-V XVentanaCondOps extension
 M: Philipp Tomsich 
 L: qemu-ri...@nongnu.org
-S: Supported
+S: Odd Fixes
 F: target/riscv/XVentanaCondOps.decode
 F: target/riscv/insn_trans/trans_xventanacondops.c.inc
 
diff --git a/target/riscv/insn_trans/trans_xventanacondops.c.inc 
b/target/riscv/insn_trans/trans_xventanacondops.c.inc
index 16849e6d4e..d6dbe89a77 100644
--- a/target/riscv/insn_trans/trans_xventanacondops.c.inc
+++ b/target/riscv/insn_trans/trans_xventanacondops.c.inc
@@ -1,7 +1,7 @@
 /*
  * RISC-V translation routines for the XVentanaCondOps extension.
  *
- * Copyright (c) 2021-2022 VRULL GmbH.
+ * Copyright (c) 2021-2023 VRULL GmbH.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -16,24 +16,12 @@
 * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
-static bool gen_vt_condmask(DisasContext *ctx, arg_r *a, TCGCond cond)
-{
-TCGv dest = dest_gpr(ctx, a->rd);
-TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
-TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
-
-tcg_gen_movcond_tl(cond, dest, src2, ctx->zero, src1, ctx->zero);
-
-gen_set_gpr(ctx, a->rd, dest);
-return true;
-}
-
 static bool trans_vt_maskc(DisasContext *ctx, arg_r *a)
 {
-return gen_vt_condmask(ctx, a, TCG_COND_NE);
+return trans_czero_eqz(ctx, a);
 }
 
 static bool trans_vt_maskcn(DisasContext *ctx, arg_r *a)
 {
-return gen_vt_condmask(ctx, a, TCG_COND_EQ);
+return trans_czero_nez(ctx, a);
 }
-- 
2.34.1




[PATCH v1 1/2] target/riscv: add Zicond as an experimental extension

2023-01-20 Thread Philipp Tomsich
This implements the Zicond (conditional integer operations) extension,
as of version 1.0-draft-20230120 as an experimental extension in QEMU
("x-zicond").

The Zicond extension acts as a building block for branchless sequences
including conditional-{arithmetic,logic,select,move}.  Refer to the
specification for usage scenarios and application guidance.

The following instructions constitute Zicond:
  - czero.eqz rd, rs1, rs2  =>  rd = (rs2 == 0) ? 0 : rs1
  - czero.nez rd, rs1, rs2  =>  rd = (rs2 != 0) ? 0 : rs1

See
  
https://github.com/riscv/riscv-zicond/releases/download/v1.0-draft-20230120/riscv-zicond_1.0-draft-20230120.pdf
for the (current version of the) Zicond specification and usage details.

Signed-off-by: Philipp Tomsich 
---

 MAINTAINERS  |  7 
 target/riscv/cpu.c   |  4 ++
 target/riscv/cpu.h   |  1 +
 target/riscv/insn32.decode   |  4 ++
 target/riscv/insn_trans/trans_rvzicond.c.inc | 44 
 target/riscv/translate.c |  1 +
 6 files changed, 61 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzicond.c.inc

diff --git a/MAINTAINERS b/MAINTAINERS
index 08ad1e5341..ca914c42fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -295,6 +295,13 @@ F: include/hw/riscv/
 F: linux-user/host/riscv32/
 F: linux-user/host/riscv64/
 
+RISC-V Zicond extension
+M: Philipp Tomsich 
+M: Christoph Muellner 
+L: qemu-ri...@nongnu.org
+S: Supported
+F: target/riscv/insn_trans/trans_zicond.c.inc
+
 RISC-V XVentanaCondOps extension
 M: Philipp Tomsich 
 L: qemu-ri...@nongnu.org
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index cc75ca7667..f214b03e95 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -73,6 +73,7 @@ struct isa_ext_data {
 static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
 ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
+ISA_EXT_DATA_ENTRY(zicond, true, PRIV_VERSION_1_12_0, ext_zicond),
 ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
 ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
 ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
ext_zihintpause),
@@ -1080,6 +1081,9 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("x-smaia", RISCVCPU, cfg.ext_smaia, false),
 DEFINE_PROP_BOOL("x-ssaia", RISCVCPU, cfg.ext_ssaia, false),
 
+/* Zicond 1.0-draft-20230120 */
+DEFINE_PROP_BOOL("x-zicond", RISCVCPU, cfg.ext_zicond, false),
+
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index f5609b62a2..4b6ff800d3 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -446,6 +446,7 @@ struct RISCVCPUConfig {
 bool ext_zkt;
 bool ext_ifencei;
 bool ext_icsr;
+bool ext_zicond;
 bool ext_zihintpause;
 bool ext_smstateen;
 bool ext_sstc;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b7e7613ea2..ca812c2f7a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -890,3 +890,7 @@ sm3p1   00 01000 01001 . 001 . 0010011 @r2
 # *** RV32 Zksed Standard Extension ***
 sm4ed   .. 11000 . . 000 . 0110011 @k_aes
 sm4ks   .. 11010 . . 000 . 0110011 @k_aes
+
+# *** Zicond Standard Extension ***
+czero_eqz   111 . . 101 . 0110011 @r
+czero_nez   111 . . 111 . 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvzicond.c.inc 
b/target/riscv/insn_trans/trans_rvzicond.c.inc
new file mode 100644
index 00..ba35a76f1e
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzicond.c.inc
@@ -0,0 +1,44 @@
+/*
+ * RISC-V translation routines for the Zicond extension.
+ *
+ * Copyright (c) 2022 VRULL GmbH.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* Emits "$rd = ($rs2 <cond> $zero) ? $zero : $rs1" */
+static inline void gen_czero(TCGv dest, TCGv src1, TCGv src2, TCGCond cond)
+{
+TCGv zero = tcg_constant_tl(0);
+tcg_gen_movcond_tl(cond, dest, src2, zero, zero, src1);
+}
+
+static inline void gen_czero_eqz(TCGv dest, TCGv src1, TCGv src2)
+{
+gen_czero(dest, src1, src2, TCG_COND_EQ);
+}
+
+static inline void gen_czero_nez(TCGv dest, TCGv src1, TCGv src2)

[PATCH v1] target/riscv: update disas.c for xnor/orn/andn and slli.uw

2023-01-20 Thread Philipp Tomsich
The decoding of the following instructions from Zb[abcs] currently
contains decoding/printing errors:
 * xnor,orn,andn: the rs2 operand is not being printed
 * slli.uw: decodes and prints the immediate shift-amount as a
register (e.g. 'shift-by-2' becomes 'sp') instead of
interpreting this as an immediate

This commit updates the instruction descriptions to use the
appropriate decoding/printing formats.

Signed-off-by: Philipp Tomsich 

---

 disas/riscv.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index d216b9c39b..ddda687c13 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -1626,9 +1626,9 @@ const rv_opcode_data opcode_data[] = {
 { "cpop", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
 { "sext.h", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
 { "sext.b", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
-{ "xnor", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
-{ "orn", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
-{ "andn", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "xnor", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "orn", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "andn", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
 { "rol", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
 { "ror", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
 { "sh1add", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
@@ -1647,7 +1647,7 @@ const rv_opcode_data opcode_data[] = {
 { "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
 { "ctzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
 { "cpopw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
-{ "slli.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "slli.uw", rv_codec_i_sh5, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
 { "add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
 { "rolw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
 { "rorw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
-- 
2.34.1




Re: [PATCH 2/2] Revert "vhost-user: Introduce nested event loop in vhost_user_read()"

2023-01-20 Thread Maxime Coquelin




On 1/19/23 18:24, Greg Kurz wrote:

This reverts commit a7f523c7d114d445c5d83aecdba3efc038e5a692.

The nested event loop is broken by design. Its only user was removed.
Drop the code as well so that nobody ever tries to use it again.

I had to fix a couple of trivial conflicts around return values because
of 025faa872bcf ("vhost-user: stick to -errno error return convention").

Signed-off-by: Greg Kurz 
---
  hw/virtio/vhost-user.c | 65 --
  1 file changed, 5 insertions(+), 60 deletions(-)



Acked-by: Maxime Coquelin 

Thanks,
Maxime




Re: [PATCH v2 2/4] hw/char/pl011: implement a reset method

2023-01-20 Thread eiakovlev




On 1/20/23 12:45 PM, Peter Maydell  wrote:

On Thu, 19 Jan 2023 at 21:57, Evgeny Iakovlev
 wrote:
>
>
> On 1/19/2023 14:27, Peter Maydell wrote:
>> On Tue, 17 Jan 2023 at 22:05, Evgeny Iakovlev
>>  wrote:
>>> PL011 currently lacks a reset method. Implement it.
>>>
>>> Signed-off-by: Evgeny Iakovlev 
>>> ---
>>>hw/char/pl011.c | 31 ++-
>>>1 file changed, 26 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/char/pl011.c b/hw/char/pl011.c
>>> index 329cc6926d..404d52a3b8 100644
>>> --- a/hw/char/pl011.c
>>> +++ b/hw/char/pl011.c
>>> @@ -397,11 +397,6 @@ static void pl011_init(Object *obj)
>>>s->clk = qdev_init_clock_in(DEVICE(obj), "clk", pl011_clock_update, 
s,
>>>ClockUpdate);
>>>
>>> -s->read_trigger = 1;
>>> -s->ifl = 0x12;
>>> -s->cr = 0x300;
>>> -s->flags = 0x90;
>>> -
>>>s->id = pl011_id_arm;
>>>}
>>>
>>> @@ -413,11 +408,37 @@ static void pl011_realize(DeviceState *dev, Error 
**errp)
>>> pl011_event, NULL, s, NULL, true);
>>>}
>>>
>>> +static void pl011_reset(DeviceState *dev)
>>> +{
>>> +PL011State *s = PL011(dev);
>>> +int i;
>>> +
>>> +s->lcr = 0;
>>> +s->rsr = 0;
>>> +s->dmacr = 0;
>>> +s->int_enabled = 0;
>>> +s->int_level = 0;
>>> +s->ilpr = 0;
>>> +s->ibrd = 0;
>>> +s->fbrd = 0;
>>> +s->read_pos = 0;
>>> +s->read_count = 0;
>>> +s->read_trigger = 1;
>>> +s->ifl = 0x12;
>>> +s->cr = 0x300;
>>> +s->flags = 0x90;
>>> +
>>> +for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
>>> +qemu_irq_lower(s->irq[i]);
>>> +}
>> Reset should never touch outbound qemu_irq lines.
>> (The other end of the line will also reset and will end
>> up in the correct "as if the input is 0" state.)
>
>
> Really? I saw this reset happening on other devices in hw/char (don't
> remember which ones specifically), so I thought it was needed.

A few devices mess with their IRQ line in a reset handler;
this is a bug but usually one you can get away with. Some
devices use the newer "three phase reset" approach which
does allow you to change IRQ line state in the 'hold' phase.
But generally the standard is not to touch the IRQ line if
its reset state would be 'low'. You only need to do odd
things for the rare case where an IRQ line is supposed to
be taken high on reset.

(The reason for the "no touching IRQs in reset" rule is that
there's no ordering on device reset, so if you raise an
IRQ line in your reset handler, you don't know if the
device on the other end has already reset and thus will
correctly deal with the 0->1 transition it sees, or if
it has not yet reset and is about to undo the effects of
that 0->1 transition. 3-phase reset lets devices split
their reset handling up, so you know that if you do something
with an IRQ line in the 'hold' phase that the device you're
talking to has definitely already done its 'enter' phase.
Though 3-phase reset is pretty new, so a lot of devices
don't use it yet, and I have a fairly strong suspicion
that there are still some issues with the design that we
will need to iron out as we make more use of it.)


I see. Thanks a lot for explaining.



thanks
-- PMM





Re: [PATCH 1/2] Revert "vhost-user: Monitor slave channel in vhost_user_read()"

2023-01-20 Thread Maxime Coquelin




On 1/19/23 18:24, Greg Kurz wrote:

This reverts commit db8a3772e300c1a656331a92da0785d81667dc81.

Motivation: this is breaking vhost-user with DPDK, as reported in [0].

Received unexpected msg type. Expected 22 received 40
Fail to update device iotlb
Received unexpected msg type. Expected 40 received 22
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 1 ring restore failed: -71: Protocol error (71)
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 0 ring restore failed: -71: Protocol error (71)
unable to start vhost net: 71: falling back on userspace virtio

The failing sequence that leads to the first error is :
- QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master
   socket
- QEMU starts a nested event loop in order to wait for the
   VHOST_USER_GET_STATUS response and to be able to process messages from
   the slave channel
- DPDK sends a couple of legitimate IOTLB miss messages on the slave
   channel
- QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22)
   updates on the master socket
- QEMU assumes to receive a response for the latest VHOST_USER_IOTLB_MSG
   but it gets the response for the VHOST_USER_GET_STATUS instead

The subsequent errors have the same root cause: the nested event loop
breaks the ordering by design. It lures QEMU into expecting the response
to the latest message sent on the master socket to arrive first.

Since this was only needed for DAX enablement which is still not merged
upstream, just drop the code for now. A working solution will have to
be merged later on. Likely protect the master socket with a mutex
and service the slave channel with a separate thread, as discussed with
Maxime in the mail thread below.

[0] 
https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de...@redhat.com/

Reported-by: Yanghang Liu 
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173
Signed-off-by: Greg Kurz 
---
  hw/virtio/vhost-user.c | 35 +++
  1 file changed, 3 insertions(+), 32 deletions(-)



Acked-by: Maxime Coquelin 

Thanks,
Maxime




Re: [PATCH 4/4] tests/tcg/multiarch: add vma-pthread.c

2023-01-20 Thread Peter Maydell
On Fri, 13 Jan 2023 at 17:10, Alex Bennée  wrote:
>
>
> Peter Maydell  writes:
>
> > On Sat, 24 Dec 2022 at 15:19, Richard Henderson
> >  wrote:
> >>
> >> From: Ilya Leoshkevich 
> >>
> >> Add a test that locklessly changes and exercises page protection bits
> >> from various threads. This helps catch race conditions in the VMA
> >> handling.
> >>
> >> Signed-off-by: Ilya Leoshkevich 
> >> Message-Id: <20221223120252.513319-1-...@linux.ibm.com>
> >> Signed-off-by: Richard Henderson 
> >
> > I've noticed that this newly added vma-pthread test seems to
> > be flaky. Here's an example from a clang-user job:
> > https://gitlab.com/qemu-project/qemu/-/jobs/3600385176
> >
> > TEST vma-pthread-with-libbb.so on aarch64
> > fail indirect write 0x5500b1eff0 (Bad address)
> > timeout: the monitored command dumped core
> > Aborted
> > make[1]: *** [Makefile:173: run-plugin-vma-pthread-with-libbb.so] Error 134
> >
> > and another from a few days earlier:
> > https://gitlab.com/qemu-project/qemu/-/jobs/3572970612
> >
> > TEST vma-pthread-with-libsyscall.so on s390x
> > fail indirect read 0x4000999000 (Bad address)
> > timeout: the monitored command dumped core
> > Aborted
> > make[1]: *** [Makefile:173: run-plugin-vma-pthread-with-libsyscall.so] 
> > Error 134
> >
> > thanks
> > -- PMM
>
> Interesting those are both with plugins. I wonder if the tsan plugin
> fixes in my latest tree help?

I think this is a failure in the non-plugin case:
https://gitlab.com/qemu-project/qemu/-/jobs/3636082364

-- PMM



proposed 8.0 cycle freeze/release dates

2023-01-20 Thread Peter Maydell
Here's my proposal for 8.0 cycle freeze and release dates:

https://wiki.qemu.org/Planning/8.0

2022-12-14 : Beginning of development phase
2023-03-07 : Soft feature freeze. Only bug fixes after this point.
 All feature changes must be already in a submaintainer
 tree and all pull requests from submaintainers must have
 been sent to the list by this date.
2023-03-14: Hard feature freeze. Tag rc0
2023-03-21: Tag rc1
2023-03-28: Tag rc2
2023-04-04: Tag rc3
2023-04-11: Release; or tag rc4 if needed
2023-04-18: Release if we needed an rc4

thanks
-- PMM



Re: [PATCH v4 1/3] memory: prevent dma-reentrancy issues

2023-01-20 Thread Peter Maydell
On Fri, 20 Jan 2023 at 14:42, Darren Kenny  wrote:
> Generally, this looks good, but I do have a comment below...
>
> On Thursday, 2023-01-19 at 02:00:02 -05, Alexander Bulekov wrote:
> > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > This flag is set/checked prior to calling a device's MemoryRegion
> > handlers, and set when device code initiates DMA.  The purpose of this
> > flag is to prevent two types of DMA-based reentrancy issues:

> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index e05332d07f..90ffaaa4f5 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr 
> > addr,
> >  uint64_t access_mask;
> >  unsigned access_size;
> >  unsigned i;
> > +DeviceState *dev = NULL;
> >  MemTxResult r = MEMTX_OK;
> >
> >  if (!access_size_min) {
> > @@ -542,6 +543,17 @@ static MemTxResult access_with_adjusted_size(hwaddr 
> > addr,
> >  access_size_max = 4;
> >  }
> >
> > +/* Do not allow more than one simultaneous access to a device's IO 
> > Regions */
> > +if (mr->owner &&
> > +!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> > +dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
>
> I don't know how likely this is to happen, but according to:
>
> - https://qemu-project.gitlab.io/qemu/devel/qom.html#c.object_dynamic_cast
>
> it is possible for the object_dynamic_cast() function to return NULL,
> so it might make sense to wrap the subsequent calls in a test of dev !=
> NULL.

Yes. This came up in a previous version of this:
https://lore.kernel.org/qemu-devel/CAFEAcA8E4nDoAWcj-v-dED-0hDtXGjJNSp3A=kdgf8uocw0...@mail.gmail.com/

It's generally a bug to call object_dynamic_cast() and then not check
the return value.

thanks
-- PMM



Re: [PATCH v4 1/3] memory: prevent dma-reentrancy issues

2023-01-20 Thread Darren Kenny
Hi Alex,

Generally, this looks good, but I do have a comment below...

On Thursday, 2023-01-19 at 02:00:02 -05, Alexander Bulekov wrote:
> Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> This flag is set/checked prior to calling a device's MemoryRegion
> handlers, and set when device code initiates DMA.  The purpose of this
> flag is to prevent two types of DMA-based reentrancy issues:
>
> 1.) mmio -> dma -> mmio case
> 2.) bh -> dma write -> mmio case
>
> These issues have led to problems such as stack-exhaustion and
> use-after-frees.
>
> Summary of the problem from Peter Maydell:
> https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> Signed-off-by: Alexander Bulekov 
> ---
>  include/hw/qdev-core.h |  7 +++
>  softmmu/memory.c   | 15 +++
>  softmmu/trace-events   |  1 +
>  3 files changed, 23 insertions(+)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 35fddb19a6..8858195262 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -162,6 +162,10 @@ struct NamedClockList {
>  QLIST_ENTRY(NamedClockList) node;
>  };
>  
> +typedef struct {
> +bool engaged_in_io;
> +} MemReentrancyGuard;
> +
>  /**
>   * DeviceState:
>   * @realized: Indicates whether the device has been fully constructed.
> @@ -194,6 +198,9 @@ struct DeviceState {
>  int alias_required_for_version;
>  ResettableState reset;
>  GSList *unplug_blockers;
> +
> +/* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy 
> */
> +MemReentrancyGuard mem_reentrancy_guard;
>  };
>  
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index e05332d07f..90ffaaa4f5 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>  uint64_t access_mask;
>  unsigned access_size;
>  unsigned i;
> +DeviceState *dev = NULL;
>  MemTxResult r = MEMTX_OK;
>  
>  if (!access_size_min) {
> @@ -542,6 +543,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>  access_size_max = 4;
>  }
>  
> +/* Do not allow more than one simultaneous access to a device's IO 
> Regions */
> +if (mr->owner &&
> +!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> +dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);

I don't know how likely this is to happen, but according to:

- https://qemu-project.gitlab.io/qemu/devel/qom.html#c.object_dynamic_cast

it is possible for the object_dynamic_cast() function to return NULL,
so it might make sense to wrap the subsequent calls in a test of dev !=
NULL.

Thanks,

Darren.

> +if (dev->mem_reentrancy_guard.engaged_in_io) {
> +trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, 
> size);
> +return MEMTX_ERROR;
> +}
> +dev->mem_reentrancy_guard.engaged_in_io = true;
> +}
> +
>  /* FIXME: support unaligned access? */
>  access_size = MAX(MIN(size, access_size_max), access_size_min);
>  access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> @@ -556,6 +568,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>  access_mask, attrs);
>  }
>  }
> +if (dev) {
> +dev->mem_reentrancy_guard.engaged_in_io = false;
> +}
>  return r;
>  }
>  
> diff --git a/softmmu/trace-events b/softmmu/trace-events
> index 22606dc27b..62d04ea9a7 100644
> --- a/softmmu/trace-events
> +++ b/softmmu/trace-events
> @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t 
> addr, uint64_t value, u
>  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t 
> value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 
> 0x%"PRIx64" size %u name '%s'"
>  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, 
> uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 
> 0x%"PRIx64" size %u"
>  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, 
> uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 
> 0x%"PRIx64" size %u"
> +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, 
> unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
>  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, 
> uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 
> 0x%"PRIx64" size %u"
>  memory_region_ram_device_write(int cpu_index, void *mr, 

Re: [PATCH v14 06/11] s390x/cpu topology: interception of PTF instruction

2023-01-20 Thread Pierre Morel




On 1/16/23 19:24, Nina Schoetterl-Glausch wrote:

On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:

When the host supports the CPU topology facility, the PTF
instruction with function code 2 is interpreted by the SIE,
provided that the userland hypervisor activates the interpretation
by using the KVM_CAP_S390_CPU_TOPOLOGY KVM extension.

The PTF instructions with function code 0 and 1 are intercepted
and must be emulated by the userland hypervisor.

During RESET all CPUs of the configuration are placed in
horizontal polarity.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h|  3 +
  include/hw/s390x/s390-virtio-ccw.h |  6 ++
  target/s390x/cpu.h |  1 +
  hw/s390x/cpu-topology.c| 92 ++
  target/s390x/cpu-sysemu.c  | 16 ++
  target/s390x/kvm/kvm.c | 11 
  6 files changed, 129 insertions(+)

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
index 9571aa70e5..33e23d78b9 100644
--- a/include/hw/s390x/cpu-topology.h
+++ b/include/hw/s390x/cpu-topology.h
@@ -55,11 +55,13 @@ typedef struct S390Topology {
  QTAILQ_HEAD(, S390TopologyEntry) list;
  uint8_t *sockets;
  CpuTopology *smp;
+int polarity;
  } S390Topology;
  
  #ifdef CONFIG_KVM

  bool s390_has_topology(void);
  void s390_topology_set_cpu(MachineState *ms, S390CPU *cpu, Error **errp);
+void s390_topology_set_polarity(int polarity);
  #else
  static inline bool s390_has_topology(void)
  {
@@ -68,6 +70,7 @@ static inline bool s390_has_topology(void)
  static inline void s390_topology_set_cpu(MachineState *ms,
   S390CPU *cpu,
   Error **errp) {}
+static inline void s390_topology_set_polarity(int polarity) {}
  #endif
  extern S390Topology s390_topology;
  
diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h

index 9bba21a916..c1d46e78af 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -30,6 +30,12 @@ struct S390CcwMachineState {
  uint8_t loadparm[8];
  };
  
+#define S390_PTF_REASON_NONE (0x00 << 8)
+#define S390_PTF_REASON_DONE (0x01 << 8)
+#define S390_PTF_REASON_BUSY (0x02 << 8)
+#define S390_TOPO_FC_MASK 0xffUL
+void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra);
+
  struct S390CcwMachineClass {
  /*< private >*/
  MachineClass parent_class;
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 01ade07009..5da4041576 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -864,6 +864,7 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
  void s390_cpu_topology_reset(void);
+void s390_cpu_topology_set(void);


I don't like this name much, it's nondescript.
s390_cpu_topology_set_modified ?


  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 438055c612..e6b4692581 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -97,6 +97,98 @@ static s390_topology_id s390_topology_from_cpu(S390CPU *cpu)
  }
  
  /**

+ * s390_topology_set_polarity
+ * @polarity: horizontal or vertical
+ *
+ * Changes the polarity of all the CPUs in the configuration.
+ *
+ * If the dedicated CPU modifier attribute is set, a vertical
+ * polarization is always high (per the architecture).
+ * Otherwise we decide to set it as medium.
+ *
+ * Once done, advertise a topology change.
+ */
+void s390_topology_set_polarity(int polarity)


I don't like that this function ignores what kind of vertical polarization
is passed; it's confusing.
That seems like a further reason to split horizontal/vertical from the 
entitlement.



Yes.
I will also build the ordering of the TLEs at the last moment instead of 
doing it during hotplug or on a QAPI command.
This is something Cedric proposed some time ago and I think it is right: 
it will be easier to do it once in STSI than on every change of topology
(hotplug, QAPI, or polarization change with PTF).

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen



[PATCH v2 7/7] hw/mem/cxl_type3: Add CXL RAS Error Injection Support

2023-01-20 Thread Jonathan Cameron via
CXL uses PCI AER Internal errors to signal to the host that an error has
occurred. The host can then read more detailed status from the CXL RAS
capability.

For uncorrectable errors: support multiple injection in one operation
as this is needed to reliably test multiple header logging support in an
OS. The equivalent feature doesn't exist for correctable errors, so only
one error need be injected at a time.

Note:
 - Header content needs to be manually specified in a fashion that
   matches the specification for what can be in the header for each
   error type.

Injection via QMP:
{ "execute": "qmp_capabilities" }
...
{ "execute": "cxl-inject-uncorrectable-errors",
  "arguments": {
"path": "/machine/peripheral/cxl-pmem0",
"errors": [
{
"type": "cache-address-parity",
"header": [ 3, 4]
},
{
"type": "cache-data-parity",
"header": 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
},
{
"type": "internal",
"header": [ 1, 2, 4]
}
]
  }}
...
{ "execute": "cxl-inject-correctable-error",
"arguments": {
"path": "/machine/peripheral/cxl-pmem0",
"type": "physical",
"header": [ 3, 4]
} }

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-component-utils.c   |   4 +-
 hw/mem/cxl_type3.c | 290 +
 hw/mem/cxl_type3_stubs.c   |  10 ++
 hw/mem/meson.build |   2 +
 include/hw/cxl/cxl_component.h |  26 +++
 include/hw/cxl/cxl_device.h|  11 ++
 qapi/cxl.json  | 113 +
 qapi/meson.build   |   1 +
 qapi/qapi-schema.json  |   1 +
 9 files changed, 457 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 3edd303a33..02fb6c17b9 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -142,16 +142,18 @@ static void ras_init_common(uint32_t *reg_state, uint32_t 
*write_msk)
  * be handled as RO.
  */
 reg_state[R_CXL_RAS_UNC_ERR_STATUS] = 0;
+write_msk[R_CXL_RAS_UNC_ERR_STATUS] = 0x1cfff;
 /* Bits 12-13 and 17-31 reserved in CXL 2.0 */
 reg_state[R_CXL_RAS_UNC_ERR_MASK] = 0x1cfff;
 write_msk[R_CXL_RAS_UNC_ERR_MASK] = 0x1cfff;
 reg_state[R_CXL_RAS_UNC_ERR_SEVERITY] = 0x1cfff;
 write_msk[R_CXL_RAS_UNC_ERR_SEVERITY] = 0x1cfff;
 reg_state[R_CXL_RAS_COR_ERR_STATUS] = 0;
+write_msk[R_CXL_RAS_COR_ERR_STATUS] = 0x7f;
 reg_state[R_CXL_RAS_COR_ERR_MASK] = 0x7f;
 write_msk[R_CXL_RAS_COR_ERR_MASK] = 0x7f;
 /* CXL switches and devices must set */
-reg_state[R_CXL_RAS_ERR_CAP_CTRL] = 0x00;
+reg_state[R_CXL_RAS_ERR_CAP_CTRL] = 0x200;
 }
 
 static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk,
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 6cdd988d1d..41ae19bf66 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1,6 +1,7 @@
 #include "qemu/osdep.h"
 #include "qemu/units.h"
 #include "qemu/error-report.h"
+#include "qapi/qapi-commands-cxl.h"
 #include "hw/mem/memory-device.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/pci/pci.h"
@@ -323,6 +324,66 @@ static void hdm_decoder_commit(CXLType3Dev *ct3d, int 
which)
 ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
 }
 
+static int ct3d_qmp_uncor_err_to_cxl(CxlUncorErrorType qmp_err)
+{
+switch (qmp_err) {
+case CXL_UNCOR_ERROR_TYPE_CACHE_DATA_PARITY:
+return CXL_RAS_UNC_ERR_CACHE_DATA_PARITY;
+case CXL_UNCOR_ERROR_TYPE_CACHE_ADDRESS_PARITY:
+return CXL_RAS_UNC_ERR_CACHE_ADDRESS_PARITY;
+case CXL_UNCOR_ERROR_TYPE_CACHE_BE_PARITY:
+return CXL_RAS_UNC_ERR_CACHE_BE_PARITY;
+case CXL_UNCOR_ERROR_TYPE_CACHE_DATA_ECC:
+return CXL_RAS_UNC_ERR_CACHE_DATA_ECC;
+case CXL_UNCOR_ERROR_TYPE_MEM_DATA_PARITY:
+return CXL_RAS_UNC_ERR_MEM_DATA_PARITY;
+case CXL_UNCOR_ERROR_TYPE_MEM_ADDRESS_PARITY:
+return CXL_RAS_UNC_ERR_MEM_ADDRESS_PARITY;
+case CXL_UNCOR_ERROR_TYPE_MEM_BE_PARITY:
+return CXL_RAS_UNC_ERR_MEM_BE_PARITY;
+case CXL_UNCOR_ERROR_TYPE_MEM_DATA_ECC:
+return CXL_RAS_UNC_ERR_MEM_DATA_ECC;
+case CXL_UNCOR_ERROR_TYPE_REINIT_THRESHOLD:
+return CXL_RAS_UNC_ERR_REINIT_THRESHOLD;
+case CXL_UNCOR_ERROR_TYPE_RSVD_ENCODING:
+return CXL_RAS_UNC_ERR_RSVD_ENCODING;
+case CXL_UNCOR_ERROR_TYPE_POISON_RECEIVED:
+return CXL_RAS_UNC_ERR_POISON_RECEIVED;
+case CXL_UNCOR_ERROR_TYPE_RECEIVER_OVERFLOW:
+return CXL_RAS_UNC_ERR_RECEIVER_OVERFLOW;
+case CXL_UNCOR_ERROR_TYPE_INTERNAL:
+return CXL_RAS_UNC_ERR_INTERNAL;
+case CXL_UNCOR_ERROR_TYPE_CXL_IDE_TX:
+return CXL_RAS_UNC_ERR_CXL_IDE_TX;
+case CXL_UNCOR_ERROR_TYPE_CXL_IDE_RX:
+return CXL_RAS_UNC_ERR_CXL_IDE_RX;
+default:
+return -EINVAL;
+}
+}
+
+static int 

[PATCH v2 6/7] hw/pci/aer: Make PCIE AER error injection facility available for other emulation to use.

2023-01-20 Thread Jonathan Cameron via
This infrastructure will be reused for CXL RAS error injection
in patches that follow.

Signed-off-by: Jonathan Cameron 
---
 hw/pci/pci-internal.h | 1 -
 include/hw/pci/pcie_aer.h | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci-internal.h b/hw/pci/pci-internal.h
index 2ea356bdf5..a7d6d8a732 100644
--- a/hw/pci/pci-internal.h
+++ b/hw/pci/pci-internal.h
@@ -20,6 +20,5 @@ void pcibus_dev_print(Monitor *mon, DeviceState *dev, int 
indent);
 
 int pcie_aer_parse_error_string(const char *error_name,
 uint32_t *status, bool *correctable);
-int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err);
 
 #endif
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index 65e71d98fe..1234fdc4e2 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -100,4 +100,5 @@ void pcie_aer_root_write_config(PCIDevice *dev,
 uint32_t addr, uint32_t val, int len,
 uint32_t root_cmd_prev);
 
+int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err);
 #endif /* QEMU_PCIE_AER_H */
-- 
2.37.2




[PATCH v2 5/7] hw/mem/cxl-type3: Add AER extended capability

2023-01-20 Thread Jonathan Cameron via
This enables AER error injection to function as expected.
It is intended as a building block in enabling CXL RAS error injection
in the following patches.

Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 217a5e639b..6cdd988d1d 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -250,6 +250,7 @@ static void ct3d_config_write(PCIDevice *pci_dev, uint32_t 
addr, uint32_t val,
 
 pcie_doe_write_config(&ct3d->doe_cdat, addr, val, size);
 pci_default_write_config(pci_dev, addr, val, size);
+pcie_aer_write_config(pci_dev, addr, val, size);
 }
 
 /*
@@ -452,8 +453,19 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
 cxl_cstate->cdat.private = ct3d;
 cxl_doe_cdat_init(cxl_cstate, errp);
+
+pcie_cap_deverr_init(pci_dev);
+/* Leave a bit of room for expansion */
+rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, NULL);
+if (rc) {
+goto err_release_cdat;
+}
+
 return;
 
+err_release_cdat:
+cxl_doe_cdat_release(cxl_cstate);
+g_free(regs->special_ops);
 err_address_space_free:
 address_space_destroy(&ct3d->hostmem_as);
 return;
@@ -465,6 +477,7 @@ static void ct3_exit(PCIDevice *pci_dev)
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
 ComponentRegisters *regs = &cxl_cstate->crb;
 
+pcie_aer_exit(pci_dev);
 cxl_doe_cdat_release(cxl_cstate);
 g_free(regs->special_ops);
 address_space_destroy(&ct3d->hostmem_as);
-- 
2.37.2




[PATCH v2 4/7] hw/pci-bridge/cxl_root_port: Wire up MSI

2023-01-20 Thread Jonathan Cameron via
Done to avoid fixing ACPI route description of traditional PCI interrupts on q35
and because we should probably move with the times anyway.

Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/cxl_root_port.c | 61 +++
 1 file changed, 61 insertions(+)

diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
index 00195257f7..7dfd20aa67 100644
--- a/hw/pci-bridge/cxl_root_port.c
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -22,6 +22,7 @@
 #include "qemu/range.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_port.h"
+#include "hw/pci/msi.h"
 #include "hw/qdev-properties.h"
 #include "hw/sysbus.h"
 #include "qapi/error.h"
@@ -29,6 +30,10 @@
 
 #define CXL_ROOT_PORT_DID 0x7075
 
+#define CXL_RP_MSI_OFFSET   0x60
+#define CXL_RP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
+#define CXL_RP_MSI_NR_VECTOR2
+
 /* Copied from the gen root port which we derive */
 #define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
 #define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
@@ -47,6 +52,49 @@ typedef struct CXLRootPort {
 #define TYPE_CXL_ROOT_PORT "cxl-rp"
 DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
 
+/*
+ * If two MSI vectors are allocated, the Advanced Error Interrupt Message
+ * Number is 1, otherwise 0.
+ * 17.12.5.10 RPERRSTS, 32:27 bit Advanced Error Interrupt Message Number.
+ */
+static uint8_t cxl_rp_aer_vector(const PCIDevice *d)
+{
+switch (msi_nr_vectors_allocated(d)) {
+case 1:
+return 0;
+case 2:
+return 1;
+case 4:
+case 8:
+case 16:
+case 32:
+default:
+break;
+}
+abort();
+return 0;
+}
+
+static int cxl_rp_interrupts_init(PCIDevice *d, Error **errp)
+{
+int rc;
+
+rc = msi_init(d, CXL_RP_MSI_OFFSET, CXL_RP_MSI_NR_VECTOR,
+  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
+  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
+  errp);
+if (rc < 0) {
+assert(rc == -ENOTSUP);
+}
+
+return rc;
+}
+
+static void cxl_rp_interrupts_uninit(PCIDevice *d)
+{
+msi_uninit(d);
+}
+
 static void latch_registers(CXLRootPort *crp)
 {
 uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
@@ -183,6 +231,15 @@ static void cxl_rp_dvsec_write_config(PCIDevice *dev, 
uint32_t addr,
 }
 }
 
+static void cxl_rp_aer_vector_update(PCIDevice *d)
+{
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
+
+if (rpc->aer_vector) {
+pcie_aer_root_set_vector(d, rpc->aer_vector(d));
+}
+}
+
 static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
 int len)
 {
@@ -192,6 +249,7 @@ static void cxl_rp_write_config(PCIDevice *d, uint32_t 
address, uint32_t val,
 
 pcie_cap_slot_get(d, &slt_ctl, &slt_sta);
 pci_bridge_write_config(d, address, val, len);
+cxl_rp_aer_vector_update(d);
 pcie_cap_flr_write_config(d, address, val, len);
 pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
 pcie_aer_write_config(d, address, val, len);
@@ -220,6 +278,9 @@ static void cxl_root_port_class_init(ObjectClass *oc, void 
*data)
 
 rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
 rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
+rpc->aer_vector = cxl_rp_aer_vector;
+rpc->interrupts_init = cxl_rp_interrupts_init;
+rpc->interrupts_uninit = cxl_rp_interrupts_uninit;
 
 dc->hotpluggable = false;
 }
-- 
2.37.2




[PATCH v2 3/7] hw/pci-bridge/cxl_root_port: Wire up AER

2023-01-20 Thread Jonathan Cameron via
We are missing necessary config write handling for AER emulation in
the CXL root port. Add it based on pcie_root_port.c

Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/cxl_root_port.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
index 6664783974..00195257f7 100644
--- a/hw/pci-bridge/cxl_root_port.c
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -187,12 +187,15 @@ static void cxl_rp_write_config(PCIDevice *d, uint32_t 
address, uint32_t val,
 int len)
 {
 uint16_t slt_ctl, slt_sta;
+uint32_t root_cmd =
+pci_get_long(d->config + d->exp.aer_cap + PCI_ERR_ROOT_COMMAND);
 
 pcie_cap_slot_get(d, &slt_ctl, &slt_sta);
 pci_bridge_write_config(d, address, val, len);
 pcie_cap_flr_write_config(d, address, val, len);
 pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
 pcie_aer_write_config(d, address, val, len);
+pcie_aer_root_write_config(d, address, val, len, root_cmd);
 
 cxl_rp_dvsec_write_config(d, address, val, len);
 }
-- 
2.37.2




Re: [PATCH v2 1/2] tests/avocado: Factor file_truncate() helper out

2023-01-20 Thread Cédric Le Goater

On 1/20/23 14:43, Philippe Mathieu-Daudé wrote:

Factor file_truncate() helper out of image_pow2ceil_expand()
for reuse.

Signed-off-by: Philippe Mathieu-Daudé 
---
  tests/avocado/boot_linux_console.py | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/avocado/boot_linux_console.py 
b/tests/avocado/boot_linux_console.py
index 8c1d981586..8a598be966 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -30,15 +30,16 @@
  def pow2ceil(x):
  return 1 if x == 0 else 2**(x - 1).bit_length()
  
+def file_truncate(path, size):
+    if size != os.path.getsize(path):
+        with open(path, 'ab+') as fd:
+            fd.truncate(size)
+
  """
  Expand file size to next power of 2
  """
-def image_pow2ceil_expand(path):
-    size = os.path.getsize(path)
-    size_aligned = pow2ceil(size)
-    if size != size_aligned:
-        with open(path, 'ab+') as fd:
-            fd.truncate(size_aligned)
+def image_pow2ceil_expand(path, size):


The image_pow2ceil_expand() callers should be changed to add 'size' argument.

C.



+    file_truncate(path, pow2ceil(size))
  
  class LinuxKernelTest(QemuSystemTest):

  KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '





[PATCH v2 2/7] hw/pci/aer: Add missing routing for AER errors

2023-01-20 Thread Jonathan Cameron via
PCIe r6.0 Figure 6-3 "Pseudo Logic Diagram for Selected Error Message Control
and Status Bits" includes a right hand branch under "All PCI Express devices"
that allows for messages to be generated or sent onwards without SERR#
being set as long as the appropriate per error class bit in the PCIe
Device Control Register is set.

Implement that branch, thus enabling routing of ERR_COR, ERR_NONFATAL
and ERR_FATAL under OSes that set these bits appropriately (e.g. Linux).

Signed-off-by: Jonathan Cameron 
---
 hw/pci/pcie_aer.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 909e027d99..103667c368 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -192,8 +192,16 @@ static void pcie_aer_update_uncor_status(PCIDevice *dev)
 static bool
 pcie_aer_msg_alldev(PCIDevice *dev, const PCIEAERMsg *msg)
 {
+uint16_t devctl = pci_get_word(dev->config + dev->exp.exp_cap +
+   PCI_EXP_DEVCTL);
 if (!(pcie_aer_msg_is_uncor(msg) &&
-  (pci_get_word(dev->config + PCI_COMMAND) & PCI_COMMAND_SERR))) {
+  (pci_get_word(dev->config + PCI_COMMAND) & PCI_COMMAND_SERR)) &&
+!((msg->severity == PCI_ERR_ROOT_CMD_NONFATAL_EN) &&
+  (devctl & PCI_EXP_DEVCTL_NFERE)) &&
+!((msg->severity == PCI_ERR_ROOT_CMD_COR_EN) &&
+  (devctl & PCI_EXP_DEVCTL_CERE)) &&
+!((msg->severity == PCI_ERR_ROOT_CMD_FATAL_EN) &&
+  (devctl & PCI_EXP_DEVCTL_FERE))) {
 return false;
 }
 
-- 
2.37.2




[PATCH v2 1/7] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register

2023-01-20 Thread Jonathan Cameron via
This register in AER should be both writeable and should
have a default value with a couple of the errors masked,
including the Uncorrectable Internal Error used by CXL for
its error reporting.

Signed-off-by: Jonathan Cameron 
---
 hw/pci/pcie_aer.c  | 4 
 include/hw/pci/pcie_regs.h | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 9a19be44ae..909e027d99 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, 
uint16_t offset,
 
 pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
  PCI_ERR_UNC_SUPPORTED);
+pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+ PCI_ERR_UNC_MASK_DEFAULT);
+pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+ PCI_ERR_UNC_SUPPORTED);
 
 pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
  PCI_ERR_UNC_SEVERITY_DEFAULT);
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 963dc2e170..6ec4785448 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -155,6 +155,9 @@ typedef enum PCIExpLinkWidth {
  PCI_ERR_UNC_ATOP_EBLOCKED |\
  PCI_ERR_UNC_TLP_PRF_BLOCKED)
 
+#define PCI_ERR_UNC_MASK_DEFAULT(PCI_ERR_UNC_INTN | \
+ PCI_ERR_UNC_TLP_PRF_BLOCKED)
+
 #define PCI_ERR_UNC_SEVERITY_DEFAULT(PCI_ERR_UNC_DLP |  \
  PCI_ERR_UNC_SDN |  \
  PCI_ERR_UNC_FCP |  \
-- 
2.37.2




[PATCH v2 0/7] hw/cxl: RAS error emulation and injection

2023-01-20 Thread Jonathan Cameron via
v2: Thanks to Mike Maslenkin for review.
- Fix wrong parameter type to ct3d_qmp_cor_err_to_cxl()
- Rework use of CXLError local variable in ct3d_reg_write() to improve
  code readability.

CXL error reporting is complex. This series only covers the protocol
related errors reported via PCIE AER - Ira Weiny has posted support for
Event log based injection and I will post an update of Poison list injection
shortly. My proposal is to upstream this one first, followed by Ira's Event
Log series, then finally the Poison List handling. That is based on likely
order of Linux kernel support (the support for this type of error reporting
went in during the recent merge window, the others are still under review).
Note we may propose other non error related features in between!
The current revisions of all the error injection can be found at:
https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-11

In order to test the kernel support for RAS error handling, I previously
provided this series via gitlab, enabling David Jiang's kernel patches
to be tested.

Now that Linux kernel support is upstream, this series is proposing the
support for upstream inclusion in QEMU. Note that support for Multiple
Header Recording has been added to QEMU in the meantime and a kernel
patch to use that feature sent out.

https://lore.kernel.org/linux-cxl/20230113154058.16227-1-jonathan.came...@huawei.com/T/#t

There are two generic PCI AER precursor feature additions.
1) The PCI_ERR_UCOR_MASK register has not been implemented until now
   and is necessary for correct emulation.
2) The routing for AER errors, via existing AER error injection, only
   covered one of two paths given in the PCIe base specification,
   unfortunately not the one used by the Linux kernel CXL support.

The use of MSI for the CXL root ports, both makes sense from the point
of view of how it may well be implemented, and works around the documented
lack of PCI interrupt routing in i386/q35. I have a hack that lets
us correctly route those interrupts but don't currently plan to post it.

The actual CXL error injection uses a new QMP interface as documented
in the final patch description. The existing AER error injection
internals are reused, though its HMP interface is not.

Injection via QMP:
{ "execute": "qmp_capabilities" }
...
{ "execute": "cxl-inject-uncorrectable-errors",
  "arguments": {
"path": "/machine/peripheral/cxl-pmem0",
"errors": [
{
"type": "cache-address-parity",
"header": [ 3, 4]
},
{
"type": "cache-data-parity",
"header": 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
},
{
"type": "internal",
"header": [ 1, 2, 4]
}
]
  }}
...
{ "execute": "cxl-inject-correctable-error",
"arguments": {
"path": "/machine/peripheral/cxl-pmem0",
"type": "physical",
"header": [ 3, 4]
} }

Based on top of:
https://lore.kernel.org/all/20230112102644.27830-1-jonathan.came...@huawei.com/
[PATCH v2 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

Jonathan Cameron (7):
  hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
  hw/pci/aer: Add missing routing for AER errors
  hw/pci-bridge/cxl_root_port: Wire up AER
  hw/pci-bridge/cxl_root_port: Wire up MSI
  hw/mem/cxl-type3: Add AER extended capability
  hw/pci/aer: Make PCIE AER error injection facility available for other
emulation to use.
  hw/mem/cxl_type3: Add CXL RAS Error Injection Support.

 hw/cxl/cxl-component-utils.c   |   4 +-
 hw/mem/cxl_type3.c | 303 +
 hw/mem/cxl_type3_stubs.c   |  10 ++
 hw/mem/meson.build |   2 +
 hw/pci-bridge/cxl_root_port.c  |  64 +++
 hw/pci/pci-internal.h  |   1 -
 hw/pci/pcie_aer.c  |  14 +-
 include/hw/cxl/cxl_component.h |  26 +++
 include/hw/cxl/cxl_device.h|  11 ++
 include/hw/pci/pcie_aer.h  |   1 +
 include/hw/pci/pcie_regs.h |   3 +
 qapi/cxl.json  | 113 
 qapi/meson.build   |   1 +
 qapi/qapi-schema.json  |   1 +
 14 files changed, 551 insertions(+), 3 deletions(-)
 create mode 100644 hw/mem/cxl_type3_stubs.c
 create mode 100644 qapi/cxl.json

-- 
2.37.2




Re: [PATCH v14 10/11] qapi/s390/cpu topology: POLARITY_CHANGE qapi event

2023-01-20 Thread Pierre Morel




On 1/20/23 12:56, Thomas Huth wrote:

On 18/01/2023 18.09, Pierre Morel wrote:


On 1/12/23 12:52, Thomas Huth wrote:

On 05/01/2023 15.53, Pierre Morel wrote:

...>>> +#


OK


+# Emitted when the guest asks to change the polarity.
+#
+# @polarity: polarity specified by the guest


Please elaborate: Where does the value come from (the PTF 
instruction)? Which values are possible?


Yes what about:

# @polarity: the guest can specify with the PTF instruction a horizontal
#    or a vertical polarity.


Maybe something like: "The guest can tell the host (via the PTF 
instruction) whether a CPU should have horizontal or vertical polarity." ?


Yes thanks, much better.




# On horizontal polarity the host is expected to provision
#    the vCPU equally.


Maybe: "all vCPUs equally" ?
Or: "each vCPU equally" ?


yes, thx.





#    On vertical polarity the host can provision each vCPU
#    differently
#    The guest can get information on the provisioning with
#    the STSI(15) instruction.


  Thomas



I'll make the changes.

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH v2 01/12] block: Improve empty format-specific info dump

2023-01-20 Thread Kevin Wolf
Am 20.01.2023 um 14:35 hat Hanna Czenczek geschrieben:
> On 19.01.23 15:00, Kevin Wolf wrote:
> > Am 20.06.2022 um 18:26 hat Hanna Reitz geschrieben:
> > > When a block driver supports obtaining format-specific information, but
> > > that object only contains optional fields, it is possible that none of
> > > them are present, so that dump_qobject() (called by
> > > bdrv_image_info_specific_dump()) will not print anything.
> > > 
> > > The callers of bdrv_image_info_specific_dump() put a header above this
> > > information ("Format specific information:\n"), which will look strange
> > > when there is nothing below.  Modify bdrv_image_info_specific_dump() to
> > > print this header instead of its callers, and only if there is indeed
> > > something to be printed.
> > > 
> > > Signed-off-by: Hanna Reitz 
> > > diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> > > index 2f0d8ac25a..084ec44d3b 100644
> > > --- a/qemu-io-cmds.c
> > > +++ b/qemu-io-cmds.c
> > > @@ -1819,8 +1819,8 @@ static int info_f(BlockBackend *blk, int argc, char **argv)
> > >   return -EIO;
> > >   }
> > >   if (spec_info) {
> > > -printf("Format specific information:\n");
> > > -bdrv_image_info_specific_dump(spec_info);
> > > +bdrv_image_info_specific_dump(spec_info,
> > > +  "Format specific information:\n");
> > >   qapi_free_ImageInfoSpecific(spec_info);
> > >   }
> > Interesting observation here: That qemu-io uses printf() instead of
> > qemu_printf() for the top level, but then dump_qobject() (which results
> > in qemu_printf()) for the format specific information, means that if you
> > use the 'qemu-io' HMP command, you'll get half of the output on stdout
> > and the other half in the monitor.
> 
> Hu.  I can’t find a single instance of qemu_printf() in qemu-io-cmds.c, but
> then I assume all printf()s there should really be qemu_printf()?

That would probably be the most consistent way.

I expect it would change the output of some qemu-iotests cases, but we
can explicitly print whatever was returned in QMP to keep the same
information in the test output.

Kevin




[PATCH 0/2] vhost-user: Remove the nested event loop to unbreak the DPDK use case

2023-01-20 Thread Greg Kurz
The nested event loop was introduced in QEMU 6.0 to allow servicing
of requests coming from the slave channel while waiting for an ack
from the back-end on the master socket. It turns out this is fragile
and breaks if the servicing of the slave channel causes a new message
to be sent on the master socket. This is exactly what happens when
using DPDK as reported in [0].

The only identified user for the nested loop is DAX enablement that
isn't upstream yet. Just drop the code for now. Some more clever
solution should be designed when the need to service concurrent
requests from both channels arises again.
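The failure mode can be sketched with a toy model (pure illustration, not QEMU code): if the nested loop services a slave request while waiting for an ack on the master socket, and that handler itself sends on the master socket, the wait is re-entered and the inner wait consumes the ack that belonged to the outer request:

```python
from collections import deque

master_in = deque()   # acks arriving on the master socket
slave_in = deque()    # requests arriving on the slave channel

def send_on_master(req):
    # the back-end immediately acks each master request (tagged for clarity)
    master_in.append(req)

def send_and_wait(req):
    send_on_master(req)
    # nested event loop: service slave requests while waiting for the ack
    while slave_in:
        service_slave(slave_in.popleft())
    return master_in.popleft()   # which request does this ack belong to?

def service_slave(_req):
    # servicing the slave request itself talks on the master socket,
    # re-entering send_and_wait() -- the DPDK scenario described above
    send_and_wait("slave-triggered-request")

slave_in.append("incoming-slave-request")
ack_for = send_and_wait("set-mem-table")
# The nested, inner call consumed the ack for "set-mem-table"; the outer
# call is left holding the ack for "slave-triggered-request" -- a mismatch.
```

With the nested loop removed, the outer wait pops its own ack unconditionally, which is why the revert unbreaks this case.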

Greg Kurz (2):
  Revert "vhost-user: Monitor slave channel in vhost_user_read()"
  Revert "vhost-user: Introduce nested event loop in vhost_user_read()"

 hw/virtio/vhost-user.c | 100 -
 1 file changed, 8 insertions(+), 92 deletions(-)

-- 
2.39.0





Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-20 Thread Michael S. Tsirkin
On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:
> On 19/01/2023 14:51, Michael S. Tsirkin wrote:
> > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> > > Now any vhost-user-fs device makes VM unmigratable, that also prevents
> > > qemu update without stopping the VM. In most cases that makes sense
> > > because qemu has no way to transfer FUSE session state.
> > > 
> > > But we can give an option to orchestrator to override this if it can
> > > guarantee that state will be preserved (e.g. it uses migration to
> > > update qemu and dst will run on the same host as src and use the same
> > > socket endpoints).
> > > 
> > > This patch keeps default behavior that prevents migration with such 
> > > devices
> > > but adds migration capability 'vhost-user-fs' to explicitly allow 
> > > migration.
> > > 
> > > Signed-off-by: Anton Kuchin 
> > > ---
> > >   hw/virtio/vhost-user-fs.c | 25 -
> > >   qapi/migration.json   |  7 ++-
> > >   2 files changed, 30 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > index f5049735ac..13d920423e 100644
> > > --- a/hw/virtio/vhost-user-fs.c
> > > +++ b/hw/virtio/vhost-user-fs.c
> > > @@ -24,6 +24,7 @@
> > >   #include "hw/virtio/vhost-user-fs.h"
> > >   #include "monitor/monitor.h"
> > >   #include "sysemu/sysemu.h"
> > > +#include "migration/migration.h"
> > >   static const int user_feature_bits[] = {
> > >   VIRTIO_F_VERSION_1,
> > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> > >   return &fs->vhost_dev;
> > >   }
> > > +static int vhost_user_fs_pre_save(void *opaque)
> > > +{
> > > +MigrationState *s = migrate_get_current();
> > > +
> > > +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> > > +error_report("Migration of vhost-user-fs devices requires internal FUSE "
> > > + "state of backend to be preserved. If orchestrator can "
> > > + "guarantee this (e.g. dst connects to the same backend "
> > > + "instance or backend state is migrated) set 'vhost-user-fs' "
> > > + "migration capability to true to enable migration.");
> > Isn't it possible that some backends are same and some are not?
> > Shouldn't this be a device property then?
> If some are not the same it is not guaranteed that correct FUSE
> state is present there, so orchestrator shouldn't set the capability
> because this can result in destination devices being broken (they'll
> be fine after the remount in guest, but this is guest visible and is
> not acceptable).
> 
> I can imagine smart orchestrator and backend that can transfer
> internal FUSE state, but we are not there yet, and this would be
> their responsibility then to ensure endpoint compatibility between src
> and dst and set the capability (that's why I put "e.g." and "or" in
> the error description).

So instead of relying on the orchestrator how about making it a device
property?


> > 
> > 
> > 
> > > +return -1;
> > > +}
> > > +
> > > +return 0;
> > > +}
> > > +
> > >   static const VMStateDescription vuf_vmstate = {
> > >   .name = "vhost-user-fs",
> > > -.unmigratable = 1,
> > > +.minimum_version_id = 0,
> > > +.version_id = 0,
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_VIRTIO_DEVICE,
> > > +VMSTATE_END_OF_LIST()
> > > +},
> > > +   .pre_save = vhost_user_fs_pre_save,
> > >   };
> > >   static Property vuf_properties[] = {
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 88ecf86ac8..9a229ea884 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -477,6 +477,11 @@
> > > #will be handled faster.  This is a performance feature and
> > > #should not affect the correctness of postcopy migration.
> > >   #(since 7.1)
> > > +# @vhost-user-fs: If enabled, the migration process will allow migration of
> > > +# vhost-user-fs devices, this should be enabled only when
> > > +# backend can preserve local FUSE state e.g. for qemu update
> > > +# when dst reconnects to the same endpoints after migration.
> > > +# (since 8.0)
> > >   #
> > >   # Features:
> > >   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> > > @@ -492,7 +497,7 @@
> > >  'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> > >  { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> > >  'validate-uuid', 'background-snapshot',
> > > -   'zero-copy-send', 'postcopy-preempt'] }
> > > +   'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> > I kind of dislike that it's such a specific flag. Is only 

Re: MSYS2 and libfdt

2023-01-20 Thread Marc-André Lureau
Hi Thomas

On Fri, Jan 20, 2023 at 12:31 PM Thomas Huth  wrote:
>
> On 19/01/2023 09.56, Marc-André Lureau wrote:
> > Hi
> >
> > On Thu, Jan 19, 2023 at 12:31 PM Thomas Huth  wrote:
> >>
> >>
> >>Hi all,
> >>
> >> in some spare minutes, I started playing with a patch to try to remove the
> >> dtc submodule from the QEMU git repository - according to
> >> https://repology.org/project/dtc/versions our supported build platforms
> >> should now all provide the minimum required version.
> >>
> >> However, I'm hitting a problem with Windows / MSYS2 in the CI jobs: The
> >> libfdt is packaged as part of the dtc package there:
> >>
> >>https://packages.msys2.org/package/dtc
> >>
> >> ... meaning that it is added with a usr/include and usr/lib path prefix
> >> instead of mingw64/include and mingw64/lib like other packages are using
> >> (see e.g.
> >> https://packages.msys2.org/package/mingw-w64-x86_64-zlib?repo=mingw64). 
> >> Thus
> >> the compiler does not find the library there. Also there does not seem to 
> >> be
> >> a difference between a i686 (32-bit) and x86_64 (64-bit) variant available
> >> here? Does anybody know how libfdt is supposed to be used with MSYS2 ?
> >
> > The msys environment is a bit special, it's not an environment for a
> > particular build target, my understanding is that it holds common
> > files/tools.
> >
> > dtc should be added to https://github.com/msys2/MINGW-packages for it
> > to be available as a target dependency.
>
> Do you have already any experience in requesting a new package there? Could
> you maybe do it? ... since I don't have a proper MinGW installation here, it
> would be very cumbersome for me right now.
>

Here you go (although let's see what CI has to say):
https://github.com/msys2/MINGW-packages/pull/15168

The msys2 maintainers are usually very quick and helpful, in my experience.

(I sometime use a windev evaluation VM, that I import with the help of
https://github.com/elmarco/virt-install-windev)

-- 
Marc-André Lureau



[PATCH 3/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-20 Thread Daniil Tatianin
Use the new qemu_prealloc_mem_with_timeout api so that we can limit the
maximum amount of time to be spent preallocating guest RAM. We also emit
a warning from the timeout handler detailing the current prealloc
progress and letting the user know that it was exceeded.

The timeout is set to zero (no timeout) by default, and can be
configured via the new 'prealloc-timeout' property.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c   | 48 ++--
 include/sysemu/hostmem.h |  2 ++
 qapi/qom.json|  4 
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 842bfa9eb7..be9af7515e 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -34,6 +34,19 @@ QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
 #endif
 
+static void
+host_memory_on_prealloc_timeout(void *opaque,
+const PreallocStats *stats)
+{
+HostMemoryBackend *backend = opaque;
+
+backend->prealloc_did_timeout = true;
+warn_report("HostMemory preallocation timeout %"PRIu64"s exceeded, "
+"allocated %zu/%zu (%zu byte) pages (%d threads)",
+(uint64_t)stats->seconds_elapsed, stats->allocated_pages,
+stats->total_pages, stats->page_size, stats->threads);
+}
+
 char *
 host_memory_backend_get_name(HostMemoryBackend *backend)
 {
@@ -223,8 +236,26 @@ static bool do_prealloc_mr(HostMemoryBackend *backend, Error **errp)
 void *ptr = memory_region_get_ram_ptr(&backend->mr);
 uint64_t sz = memory_region_size(&backend->mr);
 
-qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
-  backend->prealloc_context, &local_err);
+if (backend->prealloc_timeout) {
+PreallocTimeout timeout = {
+.seconds = (time_t)backend->prealloc_timeout,
+.user = backend,
+.on_timeout = host_memory_on_prealloc_timeout,
+};
+
+qemu_prealloc_mem_with_timeout(fd, ptr, sz, backend->prealloc_threads,
+   backend->prealloc_context, &timeout,
+   &local_err);
+if (local_err && backend->prealloc_did_timeout) {
+error_free(local_err);
+local_err = NULL;
+}
+} else {
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+  backend->prealloc_context, &local_err);
+}
+
+
 if (local_err) {
 error_propagate(errp, local_err);
 return false;
@@ -277,6 +308,13 @@ static void host_memory_backend_set_prealloc_threads(Object *obj, Visitor *v,
 backend->prealloc_threads = value;
 }
 
+static void host_memory_backend_get_set_prealloc_timeout(Object *obj,
+Visitor *v, const char *name, void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+visit_type_uint32(v, name, &backend->prealloc_timeout, errp);
+}
+
 static void host_memory_backend_init(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -516,6 +554,12 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 object_property_allow_set_link, OBJ_PROP_LINK_STRONG);
 object_class_property_set_description(oc, "prealloc-context",
 "Context to use for creating CPU threads for preallocation");
+object_class_property_add(oc, "prealloc-timeout", "int",
+host_memory_backend_get_set_prealloc_timeout,
+host_memory_backend_get_set_prealloc_timeout,
+NULL, NULL);
+object_class_property_set_description(oc, "prealloc-timeout",
+"Maximum memory preallocation timeout in seconds");
 object_class_property_add(oc, "size", "int",
 host_memory_backend_get_size,
 host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 39326f1d4f..21910f3b45 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -66,7 +66,9 @@ struct HostMemoryBackend {
 uint64_t size;
 bool merge, dump, use_canonical_path;
 bool prealloc, is_mapped, share, reserve;
+bool prealloc_did_timeout;
 uint32_t prealloc_threads;
+uint32_t prealloc_timeout;
 ThreadContext *prealloc_context;
 DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
 HostMemPolicy policy;
diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..9149c064b8 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -581,6 +581,9 @@
 # @prealloc-context: thread context to use for creation of preallocation 
threads
 #(default: none) (since 7.2)
 #
+# @prealloc-timeout: Maximum memory preallocation timeout in seconds
+#(default: 0) (since 7.3)
+#
 # @share: if false, the memory is private to QEMU; if true, it is shared
 # (default: false)
 #
@@ -612,6 +615,7 @@
 '*prealloc': 'bool',
 '*prealloc-threads': 'uint32',
 '*prealloc-context': 

[PATCH 2/4] backends/hostmem: move memory region preallocation logic into a helper

2023-01-20 Thread Daniil Tatianin
...so that we don't have to duplicate it in multiple places throughout
the file.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c0..842bfa9eb7 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -216,10 +226,26 @@ static bool host_memory_backend_get_prealloc(Object *obj, Error **errp)
 return backend->prealloc;
 }
 
+static bool do_prealloc_mr(HostMemoryBackend *backend, Error **errp)
+{
+Error *local_err = NULL;
+int fd = memory_region_get_fd(&backend->mr);
+void *ptr = memory_region_get_ram_ptr(&backend->mr);
+uint64_t sz = memory_region_size(&backend->mr);
+
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+  backend->prealloc_context, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return false;
+}
+
+return true;
+}
+
 static void host_memory_backend_set_prealloc(Object *obj, bool value,
  Error **errp)
 {
-Error *local_err = NULL;
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
 if (!backend->reserve && value) {
@@ -233,17 +249,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
 }
 
 if (value && !backend->prealloc) {
-int fd = memory_region_get_fd(&backend->mr);
-void *ptr = memory_region_get_ram_ptr(&backend->mr);
-uint64_t sz = memory_region_size(&backend->mr);
-
-qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
-  backend->prealloc_context, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-backend->prealloc = true;
+backend->prealloc = do_prealloc_mr(backend, errp);
 }
 }
 
@@ -399,12 +405,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
  * specified NUMA policy in place.
  */
 if (backend->prealloc) {
-qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-  backend->prealloc_threads,
-  backend->prealloc_context, &local_err);
-if (local_err) {
-goto out;
-}
+do_prealloc_mr(backend, errp);
+return;
 }
 }
 out:
-- 
2.25.1




[PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-20 Thread Daniil Tatianin
This series introduces a new qemu_prealloc_mem_with_timeout() API,
which allows limiting the maximum amount of time to be spent on memory
preallocation. It also adds prealloc statistics collection that is
exposed via an optional timeout handler.

This new api is then utilized by hostmem for guest RAM preallocation
controlled via new object properties called 'prealloc-timeout' and
'prealloc-timeout-fatal'.

This is useful for limiting VM startup time on systems with
unpredictable page allocation delays due to memory fragmentation or the
backing storage. The timeout can be configured to either simply emit a
warning and continue VM startup without having preallocated the entire
guest RAM or just abort startup entirely if that is not acceptable for
a specific use case.

Daniil Tatianin (4):
  oslib: introduce new qemu_prealloc_mem_with_timeout() api
  backends/hostmem: move memory region preallocation logic into a helper
  backends/hostmem: add an ability to specify prealloc timeout
  backends/hostmem: add an ability to make prealloc timeout fatal

 backends/hostmem.c   | 112 +++---
 include/qemu/osdep.h |  19 +++
 include/sysemu/hostmem.h |   3 ++
 qapi/qom.json|   8 +++
 util/oslib-posix.c   | 114 +++
 util/oslib-win32.c   |   9 
 6 files changed, 238 insertions(+), 27 deletions(-)

-- 
2.25.1




[PATCH 4/4] backends/hostmem: add an ability to make prealloc timeout fatal

2023-01-20 Thread Daniil Tatianin
This is controlled via the new 'prealloc-timeout-fatal' property and can
be useful for cases when we cannot afford to not preallocate all guest
pages while being time constrained.

Signed-off-by: Daniil Tatianin 
---
 backends/hostmem.c   | 38 ++
 include/sysemu/hostmem.h |  1 +
 qapi/qom.json|  4 
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index be9af7515e..0808dc6951 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -39,12 +39,21 @@ host_memory_on_prealloc_timeout(void *opaque,
 const PreallocStats *stats)
 {
 HostMemoryBackend *backend = opaque;
+const char *msg = "HostMemory preallocation timeout %"PRIu64"s exceeded, "
+  "allocated %zu/%zu (%zu byte) pages (%d threads)";
+
+if (backend->prealloc_timeout_fatal) {
+error_report(msg, (uint64_t)stats->seconds_elapsed,
+ stats->allocated_pages, stats->total_pages,
+ stats->page_size, stats->threads);
+exit(1);
+
+}
 
 backend->prealloc_did_timeout = true;
-warn_report("HostMemory preallocation timeout %"PRIu64"s exceeded, "
-"allocated %zu/%zu (%zu byte) pages (%d threads)",
-(uint64_t)stats->seconds_elapsed, stats->allocated_pages,
-stats->total_pages, stats->page_size, stats->threads);
+warn_report(msg, (uint64_t)stats->seconds_elapsed,
+stats->allocated_pages, stats->total_pages,
+stats->page_size, stats->threads);
 }
 
 char *
@@ -315,6 +324,22 @@ static void host_memory_backend_get_set_prealloc_timeout(Object *obj,
 visit_type_uint32(v, name, &backend->prealloc_timeout, errp);
 }
 
+static bool host_memory_backend_get_prealloc_timeout_fatal(
+Object *obj, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+return backend->prealloc_timeout_fatal;
+}
+
+static void host_memory_backend_set_prealloc_timeout_fatal(
+Object *obj, bool value, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+backend->prealloc_timeout_fatal = value;
+}
+
 static void host_memory_backend_init(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -560,6 +585,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 NULL, NULL);
 object_class_property_set_description(oc, "prealloc-timeout",
 "Maximum memory preallocation timeout in seconds");
+object_class_property_add_bool(oc, "prealloc-timeout-fatal",
+host_memory_backend_get_prealloc_timeout_fatal,
+host_memory_backend_set_prealloc_timeout_fatal);
+object_class_property_set_description(oc, "prealloc-timeout-fatal",
+"Consider preallocation timeout a fatal error");
 object_class_property_add(oc, "size", "int",
 host_memory_backend_get_size,
 host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 21910f3b45..b501b5eff2 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -67,6 +67,7 @@ struct HostMemoryBackend {
 bool merge, dump, use_canonical_path;
 bool prealloc, is_mapped, share, reserve;
 bool prealloc_did_timeout;
+bool prealloc_timeout_fatal;
 uint32_t prealloc_threads;
 uint32_t prealloc_timeout;
 ThreadContext *prealloc_context;
diff --git a/qapi/qom.json b/qapi/qom.json
index 9149c064b8..70644d714b 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -584,6 +584,9 @@
 # @prealloc-timeout: Maximum memory preallocation timeout in seconds
 #(default: 0) (since 7.3)
 #
+# @prealloc-timeout-fatal: Consider preallocation timeout a fatal error
+#  (default: false) (since 7.3)
+#
 # @share: if false, the memory is private to QEMU; if true, it is shared
 # (default: false)
 #
@@ -616,6 +619,7 @@
 '*prealloc-threads': 'uint32',
 '*prealloc-context': 'str',
 '*prealloc-timeout': 'uint32',
+'*prealloc-timeout-fatal': 'bool',
 '*share': 'bool',
 '*reserve': 'bool',
 'size': 'size',
-- 
2.25.1




[PATCH 1/4] oslib: introduce new qemu_prealloc_mem_with_timeout() api

2023-01-20 Thread Daniil Tatianin
This helper allows limiting the maximum amount of time to be spent
preallocating a block of memory, which is important on systems that
might have unpredictable page allocation delays because of possible
fragmentation or other reasons specific to the backend.

It also exposes a way to register a callback that is invoked in case the
specified timeout is exceeded. The callback is provided with a
PreallocStats structure that includes a bunch of statistics about the
progress including total & allocated number of pages, as well as page
size and number of allocation threads.

The win32 implementation is currently a stub that just calls into the
old qemu_prealloc_mem api.
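
The enforcement pattern is a single absolute deadline shared by all the preallocation-thread joins, so the total wait is bounded by the timeout regardless of thread count. A rough Python analogue of that pattern (illustration only; the series itself uses pthread_timedjoin_np() against one timespec):

```python
import threading
import time

def join_all_with_deadline(threads, seconds):
    # Join threads against one shared absolute deadline; return how
    # many finished in time. Each join gets only the remaining budget.
    deadline = time.monotonic() + seconds
    joined = 0
    for t in threads:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        t.join(remaining)
        if t.is_alive():
            break  # this thread (and any later ones) missed the deadline
        joined += 1
    return joined

# four quick workers comfortably beat a 5 second budget
fast = [threading.Thread(target=time.sleep, args=(0.01,)) for _ in range(4)]
for t in fast:
    t.start()
all_joined = join_all_with_deadline(fast, 5.0)
```

Threads that miss the deadline are then reported through the on_timeout callback (and cancelled, in the C implementation below).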

Signed-off-by: Daniil Tatianin 
---
 include/qemu/osdep.h |  19 
 util/oslib-posix.c   | 114 +++
 util/oslib-win32.c   |   9 
 3 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index bd23a08595..21757e5144 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -595,6 +595,25 @@ typedef struct ThreadContext ThreadContext;
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
ThreadContext *tc, Error **errp);
 
+typedef struct PreallocStats {
+size_t page_size;
+size_t total_pages;
+size_t allocated_pages;
+int threads;
+time_t seconds_elapsed;
+} PreallocStats;
+
+typedef struct PreallocTimeout {
+time_t seconds;
+void *user;
+void (*on_timeout)(void *user, const PreallocStats *stats);
+} PreallocTimeout;
+
+void qemu_prealloc_mem_with_timeout(int fd, char *area, size_t sz,
+int max_threads, ThreadContext *tc,
+const PreallocTimeout *timeout,
+Error **errp);
+
 /**
  * qemu_get_pid_name:
  * @pid: pid of a process
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 59a891b6a8..570fca601f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -74,6 +74,7 @@ typedef struct MemsetContext {
 bool any_thread_failed;
 struct MemsetThread *threads;
 int num_threads;
+PreallocStats stats;
 } MemsetContext;
 
 struct MemsetThread {
@@ -83,6 +84,7 @@ struct MemsetThread {
 QemuThread pgthread;
 sigjmp_buf env;
 MemsetContext *context;
+size_t touched_pages;
 };
 typedef struct MemsetThread MemsetThread;
 
@@ -373,6 +375,7 @@ static void *do_touch_pages(void *arg)
  */
 *(volatile char *)addr = *addr;
 addr += hpagesize;
+qatomic_inc(&memset_args->touched_pages);
 }
 }
 pthread_sigmask(SIG_SETMASK, , NULL);
@@ -396,6 +399,11 @@ static void *do_madv_populate_write_pages(void *arg)
 if (size && qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE)) {
 ret = -errno;
 }
+
+if (!ret) {
+qatomic_set(&memset_args->touched_pages, memset_args->numpages);
+}
+
 return (void *)(uintptr_t)ret;
 }
 
@@ -418,8 +426,68 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
 return ret;
 }
 
+static int do_join_memset_threads_with_timeout(MemsetContext *context,
+   time_t seconds)
+{
+struct timespec ts;
+int i = 0;
+
+if (clock_gettime(CLOCK_REALTIME, &ts) < 0) {
+return i;
+}
+ts.tv_sec += seconds;
+
+for (; i < context->num_threads; ++i) {
+if (pthread_timedjoin_np(context->threads[i].pgthread.thread,
+ NULL, &ts)) {
+break;
+}
+}
+
+return i;
+}
+
+static void memset_stats_count_pages(MemsetContext *context)
+{
+int i;
+
+for (i = 0; i < context->num_threads; ++i) {
+size_t pages = qatomic_load_acquire(
+&context->threads[i].touched_pages);
+context->stats.allocated_pages += pages;
+}
+}
+
+static int timed_join_memset_threads(MemsetContext *context,
+ const PreallocTimeout *timeout)
+{
+int i, off;
+PreallocStats *stats = &context->stats;
+off = do_join_memset_threads_with_timeout(context, timeout->seconds);
+
+if (off != context->num_threads && timeout->on_timeout) {
+memset_stats_count_pages(context);
+
+/*
+ * Guard against possible races if preallocation finishes right
+ * after the timeout is exceeded.
+ */
+if (stats->allocated_pages < stats->total_pages) {
+stats->seconds_elapsed = timeout->seconds;
+timeout->on_timeout(timeout->user, stats);
+}
+}
+
+for (i = off; i < context->num_threads; ++i) {
+pthread_cancel(context->threads[i].pgthread.thread);
+}
+
+return off;
+}
+
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
int max_threads, ThreadContext *tc,
+   const PreallocTimeout *timeout,

Re: [PATCH v2 00/12] qemu-img info: Show protocol-level information

2023-01-20 Thread Hanna Czenczek

On 19.01.23 21:12, Kevin Wolf wrote:

Am 08.12.2022 um 13:24 hat Hanna Reitz geschrieben:

On 20.06.22 18:26, Hanna Reitz wrote:

Hi,

This series is a v2 to:

https://lists.nongnu.org/archive/html/qemu-block/2022-05/msg00042.html

Ping, it looks like this still applies (to the master branch and kevin’s
block-next branch at least).

Not any more. :-)

But the conflicts seemed obvious enough, so I rebased it (including
changing the "Since: 7.1" occurrences to 8.0) and applied it to my block
branch.


Ah, yes.  That I should have fixed.


Thanks!


Thank you! :)

Hanna




[PATCH v2 1/2] tests/avocado: Factor file_truncate() helper out

2023-01-20 Thread Philippe Mathieu-Daudé
Factor file_truncate() helper out of image_pow2ceil_expand()
for reuse.

Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/avocado/boot_linux_console.py | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py
index 8c1d981586..8a598be966 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -30,15 +30,16 @@
 def pow2ceil(x):
 return 1 if x == 0 else 2**(x - 1).bit_length()
 
+def file_truncate(path, size):
+if size != os.path.getsize(path):
+with open(path, 'ab+') as fd:
+fd.truncate(size)
+
 """
 Expand file size to next power of 2
 """
-def image_pow2ceil_expand(path):
-size = os.path.getsize(path)
-size_aligned = pow2ceil(size)
-if size != size_aligned:
-with open(path, 'ab+') as fd:
-fd.truncate(size_aligned)
+def image_pow2ceil_expand(path, size):
+file_truncate(path, pow2ceil(size))
 
 class LinuxKernelTest(QemuSystemTest):
 KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
-- 
2.38.1
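
As a quick sanity check of the pow2ceil() helper the refactored code builds on (the test values here are chosen for illustration, not taken from the patch):

```python
def pow2ceil(x):
    # round x up to the next power of two; 0 maps to 1
    # (copied from tests/avocado/boot_linux_console.py above)
    return 1 if x == 0 else 2 ** (x - 1).bit_length()

assert pow2ceil(0) == 1
assert pow2ceil(1) == 1
assert pow2ceil(5) == 8
assert pow2ceil(64) == 64
assert pow2ceil(65) == 128
```

Exact powers of two are left unchanged, which is why file_truncate() can skip the truncate when the size already matches.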



