date:20230912

Re: [PULL v1 0/1] Merge tpm 2023/09/12 v1

2023-09-12 Thread Philippe Mathieu-Daudé


On 12/9/23 23:41, Stefan Berger wrote:

Hello!

   This PR contains a fix for the case where the TPM file descriptor is >= 1024
and the select() call cannot be used.

Regards,
Stefan

The following changes since commit 9ef497755afc252fb8e060c9ea6b0987abfd20b6:

   Merge tag 'pull-vfio-20230911' of https://github.com/legoater/qemu into staging (2023-09-11 09:13:08 -0400)

are available in the Git repository at:

   https://github.com/stefanberger/qemu-tpm.git tags/pull-tpm-2023-09-12-1

for you to fetch changes up to 8557de964dfaae5c6eea09d488f85f4aa6cb3ce7:

   tpm: fix crash when FD >= 1024 (2023-09-12 17:30:12 -0400)


Marc-Andr޸ Lureau (1):


UTF-8 mojibake :/


   tpm: fix crash when FD >= 1024

  backends/tpm/tpm_util.c | 11 ++-
  1 file changed, 2 insertions(+), 9 deletions(-)
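
A note on why the descriptor value matters: select() stores descriptors in a fixed-size fd_set bitmap, so a descriptor at or above FD_SETSIZE (commonly 1024) cannot be represented. One common way around the limit is poll(), which takes each descriptor by value. The sketch below only illustrates that idea; wait_readable() is a hypothetical helper, not the function changed in backends/tpm/tpm_util.c, and the actual patch may use a different call.

```c
#include <poll.h>

/*
 * Hypothetical helper (not QEMU code): wait until fd is readable or
 * timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error or any other event.
 * poll() has no FD_SETSIZE limit, so fd may be >= 1024.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0) {
        return -1;              /* poll() itself failed */
    }
    if (n == 0) {
        return 0;               /* timed out, nothing to read */
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

For example, wait_readable(fd, 1000) waits up to one second and returns 1 once data can be read from fd.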

[PATCH] ppc/xive: Fix uint32_t overflow

2023-09-12 Thread Cédric Le Goater

As reported by Coverity, "idx << xive->pc_shift" is evaluated using
32-bit arithmetic, and then used in a context expecting a "uint64_t".
Add a uint64_t cast.

Fixes: Coverity CID 1519049
Fixes: b68147b7a5bf ("ppc/xive: Add support for the PC MMIOs")
Signed-off-by: Cédric Le Goater 
---
 hw/intc/pnv_xive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index 9b10e905195a..a36b3bf08c92 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -210,7 +210,7 @@ static uint64_t pnv_xive_vst_addr_remote(PnvXive *xive, uint32_t type,
 return 0;
 }
 
-remote_addr |= idx << xive->pc_shift;
+remote_addr |= ((uint64_t) idx) << xive->pc_shift;
 
 vst_addr = address_space_ldq_be(&address_space_memory, remote_addr,
 MEMTXATTRS_UNSPECIFIED, );
-- 
2.41.0
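
The overflow described in the commit message can be shown in isolation. This is a minimal sketch, not the QEMU code: addr_narrow() and addr_wide() are hypothetical names, and the index and shift values used with them are chosen only for illustration.

```c
#include <stdint.h>

/*
 * Buggy shape: idx is 32 bits wide, so the shift is evaluated in 32-bit
 * arithmetic and any bits shifted past bit 31 are lost before the result
 * is widened for the OR.
 */
static uint64_t addr_narrow(uint64_t base, uint32_t idx, unsigned shift)
{
    return base | (idx << shift);
}

/* Fixed shape: widen idx first, so the shift happens in 64 bits. */
static uint64_t addr_wide(uint64_t base, uint32_t idx, unsigned shift)
{
    return base | ((uint64_t)idx << shift);
}
```

With idx = 0x10 and shift = 28, addr_narrow(0, 0x10, 28) yields 0 because bit 32 is discarded, while addr_wide(0, 0x10, 28) yields 0x100000000.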

Re: [PATCH 3/4] gitlab: make Cirrus CI timeout explicit

2023-09-12 Thread Philippe Mathieu-Daudé


On 12/9/23 20:41, Daniel P. Berrangé wrote:

On the GitLab side we're invoking the Cirrus CI job using the
cirrus-run tool which speaks to the Cirrus REST API. Cirrus
sometimes takes 5-10 minutes to actually schedule the task,
and thus the execution time of 'cirrus-run' inside GitLab will
be slightly longer than the execution time of the Cirrus CI
task.

Setting the timeout in the GitLab CI job should thus be done
in relation to the timeout set for the Cirrus CI job. While
Cirrus CI defaults to 60 minutes, it is better to set this
explicitly, and make the relationship between the jobs
explicit.

Signed-off-by: Daniel P. Berrangé 
---
  .gitlab-ci.d/cirrus.yml   | 3 +++
  .gitlab-ci.d/cirrus/build.yml | 2 ++
  2 files changed, 5 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 00/10] Adds CPU hot-plug support to Loongarch

2023-09-12 Thread lixianglai




Hi, Salil Mehta :

Hi Xianglai,


From: qemu-devel-bounces+salil.mehta=huawei@nongnu.org  On Behalf Of xianglai li
Sent: Tuesday, September 12, 2023 3:12 AM
To: qemu-devel@nongnu.org
Cc: Salil Mehta ; Xiaojuan Yang
; Song Gao ; Michael S.
Tsirkin ; Igor Mammedov ; Ani Sinha
; Paolo Bonzini ; Richard
Henderson ; Eduardo Habkost
; Marcel Apfelbaum ;
Philippe Mathieu-Daudé ; wangyanan (Y)
; Daniel P. Berrangé ; Peter
Xu ; David Hildenbrand ; Bibo Mao
; Xianglai li 
Subject: [PATCH v2 00/10] Adds CPU hot-plug support to Loongarch

Hello everyone, We refer to the implementation of ARM CPU
Hot-Plug to add GED-based CPU Hot-Plug support to Loongarch.

The first 4 patches are changes to the QEMU common code,
including adding GED support for CPU Hot-Plug, updating
the ACPI table creation process, and adding qdev_disconnect_gpio_out_named
and cpu_address_space_destroy interfaces to release resources
when a CPU is unplugged.

For the modification of the public part of the code, we refer to the
arm-related patch, and the link address of the corresponding patch is
as follows:
https://lore.kernel.org/all/20200613213629.21984-1-salil.me...@huawei.com/

In order to respect the work of "Salil Mehta", we will rebase the first
4 patches in the final patch, which are referenced here to ensure that
the loongarch cpu hotplug can work properly.


Just to let you know RFC V2 for above link is undergoing internal review
process and I will be posting the patches to community soon.

Also, I am planning to post RFC V2 as one complete patch-set initially.
(This is required to reflect the clear change from RFC V1)
This will have patches which are ARM specific and architecture common.
Later patches can be cherry picked and compiled independently.

After RFC V2 has been posted, and you have confirmed that architecture
common patches work well with your changes, I will split the RFC V2
further into 2 patch-sets,
1. Architecture common patch-set (This will come with no RFC)
2. ARM specific patch-set  (This will continue as RFC V3)

This will help patch-set 1 getting absorbed earlier in this Qemu
cycle if everything goes well.


Thanks
Salil.


That's great, I'm looking forward to your patch.

However, I suggest that you pay attention to the community's feedback
on patch 3 and patch 4 in the patch series I sent.
I think it may be helpful for your later patch submission.

I'm still working on how to reply to the community.


The last 6 patches are Loongarch architecture-related,
and the modifications include the definition of the hook
function related to the CPU Hot-(UN)Plug, the allocation
and release of CPU resources when CPU Hot-(UN)Plug,
the creation process of updating the ACPI table,
and finally the custom switch for the CPU Hot-Plug.

V2:
- Fix formatting and spelling errors
- Split large patches into smaller patches
   - Split the original patch
 <> into
 <>
 <>
 <>.
   - Split the original patch
 <> into
 <>
 <>
- Added loongarch cpu topology calculation method.
- Change the position of the cpu topology patch.
- Change unreasonable variable and function names.



xianglai li (10):
   Update ACPI GED framework to support vcpu hot-(un)plug
   Update CPUs AML with cpu-(ctrl)dev change
   make qdev_disconnect_gpio_out_named() public
   Introduce the CPU address space destruction function
   Added CPU topology support for Loongarch
   Optimize loongarch_irq_init function implementation
   Add basic CPU hot-(un)plug support for Loongarch
   Add support of *unrealize* for Loongarch cpu
   Add generic event device for Loongarch
   Update the ACPI table for the Loongarch CPU

  .../devices/loongarch64-softmmu/default.mak   |   1 +
  docs/system/loongarch/virt.rst|  31 ++
  hw/acpi/acpi-cpu-hotplug-stub.c   |  15 +
  hw/acpi/cpu.c |  27 +-
  hw/acpi/generic_event_device.c|  33 ++
  hw/core/gpio.c|   4 +-
  hw/i386/acpi-build.c  |   2 +-
  hw/loongarch/acpi-build.c |  33 +-
  hw/loongarch/generic_event_device_loongarch.c |  36 ++
  hw/loongarch/meson.build  |   2 +-
  hw/loongarch/virt.c   | 424 +++---
  include/exec/cpu-common.h |   8 +
  include/hw/acpi/cpu.h |   5 +-
  include/hw/acpi/cpu_hotplug.h |  10 +
  include/hw/acpi/generic_event_device.h|   6 +
  include/hw/core/cpu.h |   1 +
  include/hw/loongarch/virt.h   |  10 +-
  include/hw/qdev-core.h|   2 +
  softmmu/physmem.c |  24 +
  target/loongarch/cpu.c|  35 +-
  target/loongarch/cpu.h|  13 +-
  21 files changed, 635 insertions(+), 87 deletions(-)
  create mode 100644 hw/loongarch/generic_event_device_loongarch.c

Cc: "Salil Mehta" 
Cc: Xiaojuan

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-12 Thread Gupta, Pankaj


From: William Roche 

AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
as it panics the VM kernel. We filter this event and provide a
warning message.

Signed-off-by: William Roche 
---
v3:
   - New patch
v4:
   - Remove redundant check for AO errors
---
  target/i386/kvm/kvm.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5fce74aac5..7e9fc0cac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
  mcg_status |= MCG_STATUS_RIPV;
  }
  } else {
+if (code == BUS_MCEERR_AO) {
+/* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+return;
+}
  mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
  }
  
@@ -668,8 +672,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)

  addr, paddr, "BUS_MCEERR_AR");
  } else {
   warn_report("Guest MCE Memory Error at QEMU addr %p and "
- "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
- addr, paddr, "BUS_MCEERR_AO");
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+ addr, paddr, "BUS_MCEERR_AO",
+ IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
  }
  
  return;


Reviewed-by: Pankaj Gupta
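
The filtering rule in this patch can be restated as a small standalone sketch. Everything here is hypothetical except the two message strings, which are taken from the diff above: the enum stands in for the BUS_MCEERR_AR / BUS_MCEERR_AO codes, and is_amd stands in for the IS_AMD_CPU(env) check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BUS_MCEERR_AR and BUS_MCEERR_AO. */
enum mce_kind { MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

/*
 * Returns true when the event should be injected into the guest.
 * Only the "action optional" event on an AMD guest is filtered out.
 */
static bool mce_should_inject(bool is_amd, enum mce_kind kind)
{
    if (is_amd && kind == MCE_ACTION_OPTIONAL) {
        return false;
    }
    return true;
}

/* Suffix for the warning printed for an "action optional" event. */
static const char *mce_ao_outcome(bool is_amd)
{
    return is_amd ? "ignored on AMD guest" : "injected";
}
```

The point of the sketch is that the warning is still emitted in both cases; only its wording and the injection itself depend on the CPU vendor.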

Re: [PATCH 0/4] net: avoid variable length arrays

2023-09-12 Thread Jason Wang

On Tue, Sep 12, 2023 at 10:20 PM Peter Maydell  wrote:
>
> Hi, Jason. This patchset has been reviewed -- do you want to
> pick it up via the net tree?

Yes, I've queued this.

Thanks

>
> thanks
> -- PMM
>
> On Thu, 24 Aug 2023 at 16:32, Peter Maydell  wrote:
> >
> > This patchset removes the use of variable length arrays in a couple
> > of network devices and the net/ core code.  In one case we can switch
> > to a fixed-sized array on the stack; in the other three we have to
> > use a heap allocation.
> >
> > The codebase has very few VLAs, and if we can get rid of them all we
> > can make the compiler error on new additions.  This is a defensive
> > measure against security bugs where an on-stack dynamic allocation
> > isn't correctly size-checked (e.g.  CVE-2021-3527).
> >
> > Philippe had a go at these in a patch in 2021:
> > https://patchew.org/QEMU/20210505211047.1496765-1-phi...@redhat.com/20210505211047.1496765-16-phi...@redhat.com/
> > but these are re-implementations, mostly.
> >
> > Usual disclaimer: I have tested these patches only with
> > "make check" and "make check-avocado".
> >
> > thanks
> > -- PMM
> >
> > Peter Maydell (4):
> >   hw/net/fsl_etsec/rings.c: Avoid variable length array
> >   hw/net/rocker: Avoid variable length array
> >   net/dump: Avoid variable length array
> >   net/tap: Avoid variable-length array
> >
> >  hw/net/fsl_etsec/rings.c  | 12 ++--
> >  hw/net/rocker/rocker_of_dpa.c |  2 +-
> >  net/dump.c|  2 +-
> >  net/tap.c |  3 ++-
> >  4 files changed, 14 insertions(+), 5 deletions(-)
>
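
To illustrate the kind of conversion the cover letter describes, here is a minimal sketch of replacing an on-stack variable length array with a heap allocation. It is not taken from the series: sum_bytes_copy() is a hypothetical function, and plain malloc()/free() are used so the example is self-contained, whereas QEMU code would normally use its GLib-based allocators.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example: copy len bytes into a scratch buffer and sum them.
 * Returns 0 on success and stores the sum in *out, or -1 if allocation fails.
 */
static int sum_bytes_copy(const unsigned char *src, size_t len,
                          unsigned long *out)
{
    /*
     * The VLA form would be "unsigned char buf[len];", which grows the
     * stack by an amount the caller controls. The heap form below fails
     * cleanly instead when len is too large.
     */
    unsigned char *buf = malloc(len ? len : 1);
    unsigned long sum = 0;

    if (!buf) {
        return -1;
    }
    memcpy(buf, src, len);
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    free(buf);
    *out = sum;
    return 0;
}
```

Once no VLAs remain, GCC and Clang can flag new ones with the -Wvla warning, which is presumably what "make the compiler error on new additions" refers to.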

Re: qemu-riscv32 usermode still broken?

2023-09-12 Thread LIU Zhiwei




On 2023/9/13 6:31, Andreas K. Huettel wrote:

Dear all,

I've once more tried to build up a riscv32 linux install in a qemu-riscv32
usermode systemd-nspawn, and am running into the same problems as some time
ago...

https://dev.gentoo.org/~dilfridge/riscv32/riscv32.tar.xz   (220M)

The problems manifest themselves mostly in bash; if I replace /bin/bash
with a static x86-64 binary (in the tarball as /bin/bash.amd64), bypassing
qemu, I can make the chroot rebuild itself completely.

https://lists.gnu.org/archive/html/bug-bash/2023-09/msg00119.html
^ Here I'm trying to find out more.

Bash tests apparently indicate that argv[0] is overwritten, and that
reading through a pipe or from /dev/tty fails or loses data.

Apart from the bash testsuite failing, symptoms are as follows:

* Something seems wrong in the signal handling (?):


If the problem is in signal handling on 32-bit, I guess it may be
fixed by this patch:


https://www.mail-archive.com/qemu-devel@nongnu.org/msg981238.html

This patch was merged into the master branch yesterday.


Maybe you can give the master branch a try.

Thanks,
Zhiwei


--- our package manager (bash/python combo, there bash) hangs reproducibly at
one point.
--- when I run a console program and try to background it with ctl-z, it hangs
 (only the first time per bash instance, it seems)
 repeated ctl-c gets me back to the shell, then the program is in the
background

riscv32 ~ # python
Python 3.11.5 (main, Aug 31 2023, 21:56:30) [GCC 13.2.1 20230826] on linux
Type "help", "copyright", "credits" or "license" for more information.
[1]+  Stopped python
^C^C^C^C^C^C^C
riscv32 ~ # ^C
riscv32 ~ #
riscv32 ~ # jobs
[1]+  Stopped python
riscv32 ~ # fg
python


--- make, when building something, seems to always start only one job in
parallel

Any advice or debugging would be appreciated.

If we get this running then I can set up regular riscv32 Gentoo stage builds
within a week. [*]

Thanks in advance,
Andreas

PS.
huettel@pinacolada ~ $ /var/lib/machines/riscv32/usr/bin/qemu-riscv32 -version
qemu-riscv32 version 8.1.0
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers


[*] https://www.gentoo.org/downloads/#riscv

Re: [PATCH v11 0/9] rutabaga_gfx + gfxstream

2023-09-12 Thread Gurchetan Singh

On Tue, Sep 12, 2023 at 1:53 AM Alyssa Ross  wrote:

> Gurchetan Singh  writes:
>
> > On Fri, Aug 25, 2023 at 12:37 PM Alyssa Ross  wrote:
> >
> >> Alyssa Ross  writes:
> >>
> >> > Gurchetan Singh  writes:
> >> >
> >> >> On Fri, Aug 25, 2023 at 12:11 AM Alyssa Ross  wrote:
> >> >>
> >> >>> Gurchetan Singh  writes:
> >> >>>
> >> >>> > On Wed, Aug 23, 2023 at 4:07 AM Alyssa Ross  wrote:
> >> >>> >
> >> >>> >> Gurchetan Singh  writes:
> >> >>> >>
> >> >>> >> > - Official "release commits" issued for rutabaga_gfx_ffi,
> >> >>> >> >   gfxstream, aemu-base.  For example, see crrev.com/c/4778941
> >> >>> >> >
> >> >>> >> > - The release commits can make packaging easier, though once
> >> >>> >> >   again all known users will likely just build from sources
> >> >>> >> >   anyways
> >> >>> >>
> >> >>> >> It's a small thing, but could there be actual tags, rather than
> just
> >> >>> >> blessed commits?  It'd just make them easier to find, and save a
> >> bit of
> >> >>> >> time in review for packages.
> >> >>> >>
> >> >>> >
> >> >>> > I added:
> >> >>> >
> >> >>> >
> >> >>>
> >>
> https://crosvm.dev/book/appendix/rutabaga_gfx.html#latest-releases-for-potential-packaging
> >> >>> >
> >> >>> > Tags are possible, but I want to clarify the use case before
> >> packaging.
> >> >>> > Where are you thinking of packaging it for (Debian??)? Are you
> mostly
> >> >>> > interested in Wayland passthrough (my guess) or gfxstream too?
> >> Depending on
> >> >>> > your use case, we may be able to minimize the work involved.
> >> >>>
> >> >>> Packaging for Nixpkgs (where I already maintain what to my
> knowledge is
> >> >>> the only crosvm distro package).  I'm personally mostly interested
> in
> >> >>> Wayland passthrough, but I wouldn't be surprised if others are
> >> interested
> >> >>> in gfxstream.  The packaging work is already done, I've just been
> >> >>> holding off actually pushing the packages waiting for the stable
> >> >>> releases.
> >> >>>
> >> >>> The reason that tags would be useful is that it allows a reviewer of
> >> the
> >> >>> package to see at a glance that the package is built from a stable
> >> >>> release.  If it's just built from a commit hash, they have to go and
> >> >>> verify that it's a stable release, which is mildly annoying and
> >> >>> unconventional.
> >> >>>
> >> >>
> >> >> Understood.  Request to have gfxstream and AEMU v0.1.2 release tags
> >> made.
> >> >>
> >> >> For rutabaga_gfx_ffi, is the crates.io upload sufficient?
> >> >>
> >> >> https://crates.io/crates/rutabaga_gfx_ffi
> >> >>
> >> >> Debian, for example, treats crates.io as the source of truth and
> builds
> >> >> tooling around that.  I wonder if Nixpkgs has similar tooling around
> >> >> crates.io.
> >> >
> >> > We do, and I'll use the crates.io release for the package — good
> >> > suggestion, but it's still useful to also have a tag in a git repo.
> It
> >> > makes it easier if I need to do a bisect, for example.  As a distro
> >> > developer, I'm frequently jumping across codebases I am not very
> >> > familiar with to try to track down regressions, etc., and it's much
> >> > easier when I don't have to learn some special quirk of the package
> like
> >> > not having git tags.
> >>
> >> Aha, trying to switch my package over to it has revealed that there is
> >> actually a reason not to use the crates.io release.  It doesn't include
> >> a Cargo.lock, which would mean we'd have to obtain one from elsewhere.
> >> Either from the crosvm git repo, at which point we might just get all
> >> the sources from there, or by vendoring a Cargo.lock into our own git
> >> tree for packages, which we try to avoid because when you have a lot of
> >> them, they become quite a large proportion of the overall size of the
> >> repo.
> >>
> >
> > Ack.  Request to have a rutabaga release tag in crosvm also made, should
> be
> > complete in a few days.
>
> Thanks!  I've found the rutabaga tag, but I still don't see any relevant
> tags for aemu or gfxstream.  Any news there?
>

It's harder to get the attention of the Android build team than the Chrome
build team.  Though, there are a few issues with AEMU/gfxstream packaging
we also need to figure out -- see "[PATCH v13 0/9] rutabaga_gfx +
gfxstream" for details -- interested in your opinion on the matter!

Re: [PATCH v13 0/9] rutabaga_gfx + gfxstream

2023-09-12 Thread Gurchetan Singh

On Tue, Sep 12, 2023 at 6:59 AM Marc-André Lureau <
marcandre.lur...@gmail.com> wrote:

> Hi Gurchetan
>
> On Wed, Sep 6, 2023 at 5:22 AM Gurchetan Singh
>  wrote:
> >
> >
> >
> > On Wed, Aug 30, 2023 at 7:26 PM Huang Rui  wrote:
> >>
> >> On Tue, Aug 29, 2023 at 08:36:20AM +0800, Gurchetan Singh wrote:
> >> > From: Gurchetan Singh 
> >> >
> >> > Changes since v12:
> >> > - Added r-b tags from Antonio Caggiano and Akihiko Odaki
> >> > - Removed review version from commit messages
> >> > - I think we're good to merge since we've had multiple people test
> and review this series??
> >> >
> >> > How to build both rutabaga and gfxstream guest/host libs:
> >> >
> >> > https://crosvm.dev/book/appendix/rutabaga_gfx.html
> >> >
> >> > Branch containing this patch series:
> >> >
> >> > https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v13
> >> >
> >> > Antonio Caggiano (2):
> >> >   virtio-gpu: CONTEXT_INIT feature
> >> >   virtio-gpu: blob prep
> >> >
> >> > Dr. David Alan Gilbert (1):
> >> >   virtio: Add shared memory capability
> >> >
> >> > Gerd Hoffmann (1):
> >> >   virtio-gpu: hostmem
> >>
> >> Patch 1 -> 4 are
> >>
> >> Acked-and-Tested-by: Huang Rui 
> >
> >
> > Thanks Ray, I've rebased
> https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v13 and
> added the additional acks in the commit message.
> >
> > UI/gfx maintainers, since everything is reviewed and there hasn't been
> any additional review comments, may we merge the gfxstream + rutabaga_gfx
> series?  Thank you!
> >
> >
>
> Packaging aemu and gfxstream is a bit problematic. I have some WIP
> Fedora packages.
>
> AEMU:
> - installs files under /usr/include/host-common and
> /usr/include/snapshot. Can this be moved under /usr/include/aemu
> instead?
> - builds only static versions of libaemu-host-common.a and
> liblogging-base.a (distros don't like static libs much)
> - could liblogging-base(.a,.so,..) also have "aemu" name in it?
> - libaemu-base.so is not versioned
> - I can't find a release tarball, nor the (v0.1.2) release tag
> - could have a README file
>
> I am not very familiar with cmake, so it's not obvious how to make the
> required changes. Would you like me to open an issue (where?) or try
> to make some patches?
>

I filed an internal bug with all the issues you listed: Android side should
fix this internally.

I see a few options for packaging:

1) Punt on gfxstream/AEMU packaging, just do rutabaga

gfxstream is mostly useful for Android guests, and I didn't expect anyone
to actually package it at this point since most here are interested in
Linux guests (where gfxstream VK is headless only right now).  Plus
ioctl-forwarding > API forwarding for security and performance, so I'm not
sure if it'll have any sticking power even if everything is supported
(outside of a few niche use cases).

Though, I sense interest in Wayland passthrough for dual Linux use cases.
I put up:

crrev.com/c/4860836 

that'll allow packaging on rutabaga_gfx and even CI testing without
gfxstream, since it is designed to function without it.  We could issue
another rutabaga-release tag, or you can simply add a patch (a common
packaging practice) on the Fedora package with the "UPSTREAM" label.

2) Actually package gfxstream only

Probably an intermediate solution that doesn't introduce versioning/static
library issues would be just to have a copy of AEMU in the gfxstream repo,
and link it statically.  Will need another release tag/commit of
gfxstream.

3) Don't package at all

For my particular use case, since we have to build QEMU from sources, this is
fine.  If upstream breaks virtio-gpu-rutabaga.c, we'll send a patch and fix
it.  Being in-tree is most important.

Let me know what you prefer!

>
> gfxstream:
> - libgfxstream_backend.so is not versioned
> - I can't find a release tarball, nor the (v0.1.2) release tag
>

https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2749095

>
>
> (packaging is important so we can build the new code in CI too!)
>
> thanks
>
> --
> Marc-André Lureau
>

[RFC PATCH 1/3] target/ppc: Change CR registers from i32 to tl

2023-09-12 Thread Nicholas Piggin

tl is more convenient to work with because it matches most other
registers.

Change the type to tl. Keep generated code changes to a minimum with
trivial conversions (e.g., tcg_gen_trunc_tl_i32 -> tcg_gen_mov_tl).
Optimisation is done with a subsequent change.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h |   2 +-
 target/ppc/cpu_init.c|   2 +-
 target/ppc/dfp_helper.c  |  12 +-
 target/ppc/fpu_helper.c  |  38 ++--
 target/ppc/helper.h  | 124 +-
 target/ppc/int_helper.c  |  58 ++---
 target/ppc/machine.c |   2 +-
 target/ppc/translate.c   | 224 +--
 target/ppc/translate/fixedpoint-impl.c.inc   |   2 +-
 target/ppc/translate/fp-impl.c.inc   |  26 +--
 target/ppc/translate/spe-impl.c.inc  |  18 +-
 target/ppc/translate/storage-ctrl-impl.c.inc |   4 +-
 target/ppc/translate/vmx-impl.c.inc  |  24 +-
 target/ppc/translate/vsx-impl.c.inc  |   6 +-
 14 files changed, 271 insertions(+), 271 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 173e4c351a..2cc3622148 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1107,7 +1107,7 @@ struct CPUArchState {
 target_ulong gprh[32]; /* storage for GPR MSB, used by the SPE extension */
 target_ulong lr;
 target_ulong ctr;
-uint32_t crf[8];   /* condition register */
+target_ulong crf[8];   /* condition register */
 #if defined(TARGET_PPC64)
 target_ulong cfar;
 #endif
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 7ab5ee92d9..f94dcf7de6 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7481,7 +7481,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 }
 qemu_fprintf(f, "CR ");
 for (i = 0; i < 8; i++)
-qemu_fprintf(f, "%01x", env->crf[i]);
+qemu_fprintf(f, "%01x", (uint32_t)env->crf[i]);
 qemu_fprintf(f, "  [");
 for (i = 0; i < 8; i++) {
 char a = '-';
diff --git a/target/ppc/dfp_helper.c b/target/ppc/dfp_helper.c
index 5967ea07a9..822bb28877 100644
--- a/target/ppc/dfp_helper.c
+++ b/target/ppc/dfp_helper.c
@@ -493,7 +493,7 @@ DFP_HELPER_TAB(DDIV, decNumberDivide, DIV_PPs, 64)
 DFP_HELPER_TAB(DDIVQ, decNumberDivide, DIV_PPs, 128)
 
 #define DFP_HELPER_BF_AB(op, dnop, postprocs, size) \
-uint32_t helper_##op(CPUPPCState *env, ppc_fprp_t *a, ppc_fprp_t *b) \
+target_ulong helper_##op(CPUPPCState *env, ppc_fprp_t *a, ppc_fprp_t *b) \
 { \
 struct PPC_DFP dfp; \
 dfp_prepare_decimal##size(&dfp, a, b, env); \
@@ -525,7 +525,7 @@ DFP_HELPER_BF_AB(DCMPO, decNumberCompare, CMPO_PPs, 64)
 DFP_HELPER_BF_AB(DCMPOQ, decNumberCompare, CMPO_PPs, 128)
 
 #define DFP_HELPER_TSTDC(op, size)   \
-uint32_t helper_##op(CPUPPCState *env, ppc_fprp_t *a, uint32_t dcm)  \
+target_ulong helper_##op(CPUPPCState *env, ppc_fprp_t *a, uint32_t dcm)  \
 {\
 struct PPC_DFP dfp;  \
 int match = 0;   \
@@ -553,7 +553,7 @@ DFP_HELPER_TSTDC(DTSTDC, 64)
 DFP_HELPER_TSTDC(DTSTDCQ, 128)
 
 #define DFP_HELPER_TSTDG(op, size)   \
-uint32_t helper_##op(CPUPPCState *env, ppc_fprp_t *a, uint32_t dcm)  \
+target_ulong helper_##op(CPUPPCState *env, ppc_fprp_t *a, uint32_t dcm)  \
 {\
 struct PPC_DFP dfp;  \
 int minexp, maxexp, nzero_digits, nzero_idx, is_negative, is_zero,   \
@@ -608,7 +608,7 @@ DFP_HELPER_TSTDG(DTSTDG, 64)
 DFP_HELPER_TSTDG(DTSTDGQ, 128)
 
 #define DFP_HELPER_TSTEX(op, size)   \
-uint32_t helper_##op(CPUPPCState *env, ppc_fprp_t *a, ppc_fprp_t *b) \
+target_ulong helper_##op(CPUPPCState *env, ppc_fprp_t *a, ppc_fprp_t *b) \
 {\
 struct PPC_DFP dfp;  \
 int expa, expb, a_is_special, b_is_special;  \
@@ -640,7 +640,7 @@ DFP_HELPER_TSTEX(DTSTEX, 64)
 DFP_HELPER_TSTEX(DTSTEXQ, 128)
 
 #define DFP_HELPER_TSTSF(op, size)   \
-uint32_t helper_##op(CPUPPCState *env, ppc_fprp_t *a, ppc_fprp_t *b) \
+target_ulong helper_##op(CPUPPCState *env, ppc_fprp_t *a, ppc_fprp_t *b) \
 {\
 struct PPC_DFP dfp;

[RFC PATCH 3/3] target/ppc: Optimise after CR register tl conversion

2023-09-12 Thread Nicholas Piggin

After changing CR registers from i32 to tl, a number of places that
previously did type conversion are now redundant moves between
variables that can be removed.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/translate.c | 97 +-
 target/ppc/translate/fixedpoint-impl.c.inc |  3 +-
 target/ppc/translate/fp-impl.c.inc | 17 +---
 4 files changed, 46 insertions(+), 73 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3472697b30..8fdc3f3546 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1471,21 +1471,18 @@ static opc_handler_t invalid_handler = {
 
 static inline void gen_op_cmp(TCGv arg0, TCGv arg1, int s, int crf)
 {
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-TCGv t = tcg_temp_new();
+TCGv tmp = tcg_temp_new();
+TCGv cr = cpu_crf[crf];
 
-tcg_gen_movi_tl(t0, CRF_EQ);
-tcg_gen_movi_tl(t1, CRF_LT);
+tcg_gen_movi_tl(cr, CRF_EQ);
+tcg_gen_movi_tl(tmp, CRF_LT);
 tcg_gen_movcond_tl((s ? TCG_COND_LT : TCG_COND_LTU),
-   t0, arg0, arg1, t1, t0);
-tcg_gen_movi_tl(t1, CRF_GT);
+   cr, arg0, arg1, tmp, cr);
+tcg_gen_movi_tl(tmp, CRF_GT);
 tcg_gen_movcond_tl((s ? TCG_COND_GT : TCG_COND_GTU),
-   t0, arg0, arg1, t1, t0);
+   cr, arg0, arg1, tmp, cr);
 
-tcg_gen_mov_tl(t, t0);
-tcg_gen_mov_tl(cpu_crf[crf], cpu_so);
-tcg_gen_or_tl(cpu_crf[crf], cpu_crf[crf], t);
+tcg_gen_or_tl(cr, cr, cpu_so);
 }
 
 static inline void gen_op_cmpi(TCGv arg0, target_ulong arg1, int s, int crf)
@@ -1531,19 +1528,16 @@ static void gen_cmprb(DisasContext *ctx)
 TCGv src2 = tcg_temp_new();
 TCGv src2lo = tcg_temp_new();
 TCGv src2hi = tcg_temp_new();
-TCGv crf = cpu_crf[crfD(ctx->opcode)];
-
-tcg_gen_mov_tl(src1, cpu_gpr[rA(ctx->opcode)]);
-tcg_gen_mov_tl(src2, cpu_gpr[rB(ctx->opcode)]);
+TCGv cr = cpu_crf[crfD(ctx->opcode)];
 
-tcg_gen_andi_tl(src1, src1, 0xFF);
-tcg_gen_ext8u_tl(src2lo, src2);
-tcg_gen_shri_tl(src2, src2, 8);
+tcg_gen_andi_tl(src1, cpu_gpr[rA(ctx->opcode)], 0xFF);
+tcg_gen_ext8u_tl(src2lo, cpu_gpr[rB(ctx->opcode)]);
+tcg_gen_shri_tl(src2, cpu_gpr[rB(ctx->opcode)], 8);
 tcg_gen_ext8u_tl(src2hi, src2);
 
 tcg_gen_setcond_tl(TCG_COND_LEU, src2lo, src2lo, src1);
 tcg_gen_setcond_tl(TCG_COND_LEU, src2hi, src1, src2hi);
-tcg_gen_and_tl(crf, src2lo, src2hi);
+tcg_gen_and_tl(cr, src2lo, src2hi);
 
 if (ctx->opcode & 0x0020) {
 tcg_gen_shri_tl(src2, src2, 8);
@@ -1553,9 +1547,9 @@ static void gen_cmprb(DisasContext *ctx)
 tcg_gen_setcond_tl(TCG_COND_LEU, src2lo, src2lo, src1);
 tcg_gen_setcond_tl(TCG_COND_LEU, src2hi, src1, src2hi);
 tcg_gen_and_tl(src2lo, src2lo, src2hi);
-tcg_gen_or_tl(crf, crf, src2lo);
+tcg_gen_or_tl(cr, cr, src2lo);
 }
-tcg_gen_shli_tl(crf, crf, CRF_GT_BIT);
+tcg_gen_shli_tl(cr, cr, CRF_GT_BIT);
 }
 
 #if defined(TARGET_PPC64)
@@ -1572,11 +1566,11 @@ static void gen_isel(DisasContext *ctx)
 {
 uint32_t bi = rC(ctx->opcode);
 uint32_t mask = 0x08 >> (bi & 0x03);
+TCGv cr = cpu_crf[bi >> 2];
 TCGv t0 = tcg_temp_new();
 TCGv zr;
 
-tcg_gen_mov_tl(t0, cpu_crf[bi >> 2]);
-tcg_gen_andi_tl(t0, t0, mask);
+tcg_gen_andi_tl(t0, cr, mask);
 
 zr = tcg_constant_tl(0);
 tcg_gen_movcond_tl(TCG_COND_NE, cpu_gpr[rD(ctx->opcode)], t0, zr,
@@ -3806,13 +3800,12 @@ static void gen_conditional_store(DisasContext *ctx, MemOp memop)
 {
 TCGLabel *lfail;
 TCGv EA;
-TCGv cr0;
+TCGv cr0 = cpu_crf[0];
 TCGv t0;
 int rs = rS(ctx->opcode);
 
 lfail = gen_new_label();
 EA = tcg_temp_new();
-cr0 = tcg_temp_new();
 t0 = tcg_temp_new();
 
 tcg_gen_mov_tl(cr0, cpu_so);
@@ -3829,7 +3822,6 @@ static void gen_conditional_store(DisasContext *ctx, MemOp memop)
 tcg_gen_or_tl(cr0, cr0, t0);
 
 gen_set_label(lfail);
-tcg_gen_mov_tl(cpu_crf[0], cr0);
 tcg_gen_movi_tl(cpu_reserve, -1);
 }
 
@@ -3885,7 +3877,7 @@ static void gen_stqcx_(DisasContext *ctx)
 {
 TCGLabel *lfail;
 TCGv EA, t0, t1;
-TCGv cr0;
+TCGv cr0 = cpu_crf[0];
 TCGv_i128 cmp, val;
 int rs = rS(ctx->opcode);
 
@@ -3896,7 +3888,6 @@ static void gen_stqcx_(DisasContext *ctx)
 
 lfail = gen_new_label();
 EA = tcg_temp_new();
-cr0 = tcg_temp_new();
 
 tcg_gen_mov_tl(cr0, cpu_so);
 gen_set_access_type(ctx, ACCESS_RES);
@@ -3928,7 +3919,6 @@ static void gen_stqcx_(DisasContext *ctx)
 tcg_gen_or_tl(cr0, cr0, t0);
 
 gen_set_label(lfail);
-tcg_gen_mov_tl(cpu_crf[0], cr0);
 tcg_gen_movi_tl(cpu_reserve, -1);
 }
 #endif /* defined(TARGET_PPC64) */
@@ -4680,34 +4670,30 @@ static void gen_mcrxrx(DisasContext *ctx)
 /* mfcr mfocrf */
 static void gen_mfcr(DisasContext *ctx)
 {
+TCGv dst = cpu_gpr[rD(ctx->opcode)];
 uint32_t crm, crn;
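
As a reading aid for the gen_op_cmp() and gen_isel() hunks above, the same logic can be written as plain C operating on values instead of TCG ops. This is only a sketch: the function names are hypothetical, and the CRF_* values are an assumption based on the PowerPC CR field layout (LT, GT, EQ, SO from most to least significant bit), not copied from the QEMU headers.

```c
#include <stdint.h>

/* Assumed 4-bit CR field layout: LT GT EQ SO. */
enum { CRF_SO = 1, CRF_EQ = 2, CRF_GT = 4, CRF_LT = 8 };

/*
 * Mirrors gen_op_cmp(): start from EQ, override with LT or GT according
 * to a signed or unsigned comparison, then OR in the summary overflow
 * bit (so is 0 or 1).
 */
static uint64_t cr_from_cmp(int64_t a, int64_t b, int is_signed, uint64_t so)
{
    uint64_t cr = CRF_EQ;
    int lt = is_signed ? (a < b) : ((uint64_t)a < (uint64_t)b);
    int gt = is_signed ? (a > b) : ((uint64_t)a > (uint64_t)b);

    if (lt) {
        cr = CRF_LT;
    }
    if (gt) {
        cr = CRF_GT;
    }
    return cr | so;
}

/* Mirrors gen_isel(): CR bit number bi (0..31) selects field bi >> 2 ... */
static unsigned cr_field_index(uint32_t bi)
{
    return bi >> 2;
}

/* ... and this mask within that 4-bit field. */
static uint32_t cr_bit_mask(uint32_t bi)
{
    return 0x08 >> (bi & 0x03);
}
```

For instance, CR bit 6 lives in field 1 with mask 0x2, which is the EQ bit of CR1 under the layout assumed here.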

[RFC PATCH 2/3] target/ppc: Use FP CR1 update helper more widely

2023-09-12 Thread Nicholas Piggin

Several places open-code this FP CR1 update. Move them to call
gen_set_cr1_from_fpscr().

FPSCR_OX = 28 so move that to the symbolic constant while we are here.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/translate/fp-impl.c.inc | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target/ppc/translate/fp-impl.c.inc b/target/ppc/translate/fp-impl.c.inc
index 4e355cb379..9f71c039ce 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -20,12 +20,12 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 {
 TCGv tmp = tcg_temp_new();
 tcg_gen_mov_tl(tmp, cpu_fpscr);
-tcg_gen_shri_tl(cpu_crf[1], tmp, 28);
+tcg_gen_shri_tl(cpu_crf[1], tmp, FPSCR_OX);
 }
 #else
 static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 {
-tcg_gen_shri_tl(cpu_crf[1], cpu_fpscr, 28);
+tcg_gen_shri_tl(cpu_crf[1], cpu_fpscr, FPSCR_OX);
 }
 #endif
 
@@ -694,8 +694,7 @@ static void gen_mtfsb0(DisasContext *ctx)
 gen_helper_fpscr_clrbit(cpu_env, tcg_constant_i32(crb));
 }
 if (unlikely(Rc(ctx->opcode) != 0)) {
-tcg_gen_mov_tl(cpu_crf[1], cpu_fpscr);
-tcg_gen_shri_tl(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+gen_set_cr1_from_fpscr(ctx);
 }
 }
 
@@ -714,8 +713,7 @@ static void gen_mtfsb1(DisasContext *ctx)
 gen_helper_fpscr_setbit(cpu_env, tcg_constant_i32(crb));
 }
 if (unlikely(Rc(ctx->opcode) != 0)) {
-tcg_gen_mov_tl(cpu_crf[1], cpu_fpscr);
-tcg_gen_shri_tl(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+gen_set_cr1_from_fpscr(ctx);
 }
 /* We can raise a deferred exception */
 gen_helper_fpscr_check_status(cpu_env);
@@ -750,8 +748,7 @@ static void gen_mtfsf(DisasContext *ctx)
 get_fpr(t1, rB(ctx->opcode));
 gen_helper_store_fpscr(cpu_env, t1, t0);
 if (unlikely(Rc(ctx->opcode) != 0)) {
-tcg_gen_mov_tl(cpu_crf[1], cpu_fpscr);
-tcg_gen_shri_tl(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+gen_set_cr1_from_fpscr(ctx);
 }
 /* We can raise a deferred exception */
 gen_helper_fpscr_check_status(cpu_env);
@@ -779,8 +776,7 @@ static void gen_mtfsfi(DisasContext *ctx)
 t1 = tcg_constant_i32(1 << sh);
 gen_helper_store_fpscr(cpu_env, t0, t1);
 if (unlikely(Rc(ctx->opcode) != 0)) {
-tcg_gen_mov_tl(cpu_crf[1], cpu_fpscr);
-tcg_gen_shri_tl(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+gen_set_cr1_from_fpscr(ctx);
 }
 /* We can raise a deferred exception */
 gen_helper_fpscr_check_status(cpu_env);
-- 
2.40.1
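
What the shared helper computes can be shown on plain integers. This is a sketch under stated assumptions: cr1_from_fpscr() is a hypothetical value-level function, FPSCR_OX = 28 is taken from the commit message, and the FPSCR is treated as a 32-bit value whose top four bits are copied into CR1.

```c
#include <stdint.h>

#define FPSCR_OX 28  /* bit position, as stated in the commit message */

/*
 * Shift the top four FPSCR bits (bits 31..28) down into a 4-bit CR field.
 * With a 32-bit input the mask changes nothing; it only documents that
 * the result is four bits wide.
 */
static uint32_t cr1_from_fpscr(uint32_t fpscr)
{
    return (fpscr >> FPSCR_OX) & 0xF;
}
```

With fpscr = 0x90000000 the result is 0x9, that is, bits 31 and 28 of the FPSCR end up as the highest and lowest bits of CR1.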

[RFC PATCH 0/3] target/ppc: Change CR registers from i32 to tl

2023-09-12 Thread Nicholas Piggin

This is a bit of churn so I might leave it for later in the cycle (or
defer if we get a lot of other changes) since it's a relatively
mechanical change. So don't spend time reviewing details, I'm just
wondering about concept and general approach.

I'm not sure of the history of why these are 32-bit, maybe better code gen
on 32-bit host emulating 64-bit? If so, that shouldn't be so important
now that most people use 64-bit systems to develop and test with.

Thanks,
Nick

Nicholas Piggin (3):
  target/ppc: Change CR registers from i32 to tl
  target/ppc: Use FP CR1 update helper more widely
  target/ppc: Optimise after CR register tl conversion

 target/ppc/cpu.h |   2 +-
 target/ppc/cpu_init.c|   2 +-
 target/ppc/dfp_helper.c  |  12 +-
 target/ppc/fpu_helper.c  |  38 +--
 target/ppc/helper.h  | 124 +-
 target/ppc/int_helper.c  |  58 ++---
 target/ppc/machine.c |   2 +-
 target/ppc/translate.c   | 239 +--
 target/ppc/translate/fixedpoint-impl.c.inc   |   3 +-
 target/ppc/translate/fp-impl.c.inc   |  31 +--
 target/ppc/translate/spe-impl.c.inc  |  18 +-
 target/ppc/translate/storage-ctrl-impl.c.inc |   4 +-
 target/ppc/translate/vmx-impl.c.inc  |  24 +-
 target/ppc/translate/vsx-impl.c.inc  |   6 +-
 15 files changed, 267 insertions(+), 298 deletions(-)

-- 
2.40.1

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery

2023-09-12 Thread Peter Xu

On Tue, Sep 12, 2023 at 07:49:37PM -0300, Fabiano Rosas wrote:
> I figured out what is going on here (test #1). At postcopy_pause_incoming()
> the state transition is ACTIVE -> PAUSED, but when the first recovery
> fails on the incoming side, the transition would have to be RECOVER ->
> PAUSED.
> 
> Could you add that change to this patch?

Yes, and actually, see:

https://lore.kernel.org/qemu-devel/2023091145.731099-11-pet...@redhat.com/

> > -bool migration_postcopy_is_alive(void)
> > +bool migration_postcopy_is_alive(int state)
> >  {
> >  MigrationState *s = migrate_get_current();
> >  
> 
> Note there's a missing hunk here to actually use the 'state'.

Yes.. I fixed it in the version I just posted, here:

https://lore.kernel.org/qemu-devel/2023091145.731099-10-pet...@redhat.com/

+bool migration_postcopy_is_alive(int state)
+{
+switch (state) {
+case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+case MIGRATION_STATUS_POSTCOPY_RECOVER:
+return true;
+default:
+return false;
+}
+}

[...]

> >> Here, with this patch the migration gets stuck unless we call
> >> migrate_pause() one more time. After another round of migrate_pause +
> >> recover, it finishes properly.
> 
> Here (test #2), the issue is that the sockets are unpaired, so there's
> no G_IO_IN to trigger the qio_channel watch callback. The incoming side
> never calls fd_accept_incoming_migration() and the RP hangs waiting for
> something. I think there's no other way to unblock aside from the
> explicit qmp_migrate_pause().

Exactly, that's the "trick" I mentioned. :)

Sorry when replying just now I seem to have jumped over some sections.
See:

https://lore.kernel.org/qemu-devel/2023091145.731099-12-pet...@redhat.com/

I put a rich comment for that:

+/*
+ * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
+ * emulate the 1st byte of a real recovery, but stop there to
+ * keep dest QEMU in RECOVER.  This is needed so that we can kick off
+ * the recover process on dest QEMU (by triggering the G_IO_IN event).
+ *
+ * NOTE: this trick is not needed on src QEMUs, because src doesn't
+ * rely on a pre-existing G_IO_IN event, so it will always trigger the
+ * upcoming recovery anyway even if it can read nothing.
+ */
+#define QEMU_VM_COMMAND  0x08
+c = QEMU_VM_COMMAND;
+ret = send(pair2[1], &c, 1, 0);
+g_assert_cmpint(ret, ==, 1);
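
The effect of that one byte can be reproduced outside QEMU. The sketch below is only an illustration under stated assumptions: poll() with POLLIN stands in for the GLib G_IO_IN watch, the socket pair is a plain AF_UNIX pair, and fd_is_readable() and one_byte_wakes_reader() are hypothetical names. It shows that the receiving end reports nothing to read until the peer sends a single byte.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define QEMU_VM_COMMAND 0x08

/* Return 1 if fd has data to read right now, 0 if not, -1 on error. */
int fd_is_readable(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, 0);   /* timeout 0: do not block */

    if (ret < 0) {
        return -1;
    }
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Hypothetical demo: returns 1 when the receiving end is idle before
 * the write and becomes readable after exactly one byte is sent.
 */
int one_byte_wakes_reader(void)
{
    int pair[2];
    unsigned char c = QEMU_VM_COMMAND;
    int before, after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) {
        return 0;
    }

    before = fd_is_readable(pair[0]);      /* nothing sent yet */
    if (send(pair[1], &c, 1, 0) != 1) {
        close(pair[0]);
        close(pair[1]);
        return 0;
    }
    after = fd_is_readable(pair[0]);       /* one byte is queued */

    close(pair[0]);
    close(pair[1]);
    return before == 0 && after == 1;
}
```

Calling one_byte_wakes_reader() returns 1 when the idle socket only becomes readable after the write, which is the condition the dest side needs before its watch callback can run.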

> We could give them both separate files and the result would be more
> predictable.

Please have a look at the test patch I posted (note!  it's still under your
name but I changed it quite a lot with my sign-off).  I used your 2nd
method to create socket pairs, and hopefully that provides a very reliable
way to put both src/dst sides into RECOVER state, then kick them out of it
using qmp migrate-pause on both sides.

> You can take it. Or drop it if it ends up being too artificial.

I like your suggestion on having the test case, and I hope the new version
in above link I posted isn't so artificial; the only part I don't like
about that was the "write 1 byte" trick for dest qemu, but that seems still
okay.  Feel free to go and have a look.

Thanks a lot,

-- 
Peter Xu

[PATCH v3 1/5] block: remove AIOCBInfo->get_aio_context()

2023-09-12 Thread Stefan Hajnoczi

The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.

It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.

Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.

The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 17 -
 block/io.c | 23 ---
 hw/nvme/ctrl.c |  7 ---
 softmmu/dma-helpers.c  |  8 
 util/thread-pool.c |  8 
 8 files changed, 10 insertions(+), 57 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 32042e8905..bcc165c974 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -31,7 +31,6 @@ typedef void BlockCompletionFunc(void *opaque, int ret);
 
 typedef struct AIOCBInfo {
 void (*cancel_async)(BlockAIOCB *acb);
-AioContext *(*get_aio_context)(BlockAIOCB *acb);
 size_t aiocb_size;
 } AIOCBInfo;
 
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index f347199bff..ac2a605ef5 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -185,6 +185,8 @@ void bdrv_drain_all_begin_nopoll(void);
 void bdrv_drain_all_end(void);
 void bdrv_drain_all(void);
 
+void bdrv_aio_cancel(BlockAIOCB *acb);
+
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 BlockDriverState *bdrv_find_node(const char *node_name);
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 6db48f2d35..f1c796a1ce 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -101,7 +101,6 @@ bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 
 /* async block I/O */
-void bdrv_aio_cancel(BlockAIOCB *acb);
 void bdrv_aio_cancel_async(BlockAIOCB *acb);
 
 /* sg packet commands */
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..a77295a198 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -33,8 +33,6 @@
 
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
-
 typedef struct BlockBackendAioNotifier {
 void (*attached_aio_context)(AioContext *new_context, void *opaque);
 void (*detach_aio_context)(void *opaque);
@@ -103,7 +101,6 @@ typedef struct BlockBackendAIOCB {
 } BlockBackendAIOCB;
 
 static const AIOCBInfo block_backend_aiocb_info = {
-.get_aio_context = blk_aiocb_get_aio_context,
 .aiocb_size = sizeof(BlockBackendAIOCB),
 };
 
@@ -1545,16 +1542,8 @@ typedef struct BlkAioEmAIOCB {
 bool has_returned;
 } BlkAioEmAIOCB;
 
-static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
-{
-BlkAioEmAIOCB *acb = container_of(acb_, BlkAioEmAIOCB, common);
-
-return blk_get_aio_context(acb->rwco.blk);
-}
-
 static const AIOCBInfo blk_aio_em_aiocb_info = {
 .aiocb_size = sizeof(BlkAioEmAIOCB),
-.get_aio_context= blk_aio_em_aiocb_get_aio_context,
 };
 
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
@@ -2434,12 +2423,6 @@ AioContext *blk_get_aio_context(BlockBackend *blk)
 return blk->ctx;
 }
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
-{
-BlockBackendAIOCB *blk_acb = DO_UPCAST(BlockBackendAIOCB, common, acb);
-return blk_get_aio_context(blk_acb->blk);
-}
-
 int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
 Error **errp)
 {
diff --git a/block/io.c b/block/io.c
index ba23a9bcd3..209a6da0c8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2950,25 +2950,18 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
 /**/
 /* async I/Os */
 
+/**
+ * Synchronously cancels an acb. Must be called with the BQL held and the acb
+ * must be processed with the BQL held too (IOThreads are not allowed).
+ *
+ * Use bdrv_aio_cancel_async() instead when possible.
+ */
 void bdrv_aio_cancel(BlockAIOCB *acb)
 {
-IO_CODE();
+GLOBAL_STATE_CODE();
 qemu_aio_ref(acb);
 bdrv_aio_cancel_async(acb);
-while (acb->refcnt > 1) {
-if (acb->aiocb_info->get_aio_context) {
-aio_poll(acb->aiocb_info->get_aio_context(acb), true);
-} else if (acb->bs) {
-/* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
- * assert that we're not using an I/O thread.  Thread-safe
- * code should use bdrv_aio_cancel_async exclusively.
- */
-assert(bdrv_get_aio_context(acb->bs) ==

[PATCH v3 4/5] block-backend: process zoned requests in the current AioContext

2023-09-12 Thread Stefan Hajnoczi

Process zoned requests in the current thread's AioContext instead of in
the BlockBackend's AioContext.

There is no need to use the BlockBackend's AioContext thanks to CoMutex
bs->wps->colock, which protects zone metadata.

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4863be5691..427ebcc0e4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1890,11 +1890,11 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1931,11 +1931,11 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1971,10 +1971,10 @@ BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0

[PATCH v3 0/5] block-backend: process I/O in the current AioContext

2023-09-12 Thread Stefan Hajnoczi

v3
- Add Patch 2 to fix a race condition in test-bdrv-drain. This was the CI
  failure that bumped this patch series from Kevin's pull request.
- Add missing 051.pc.out file. I tried qemu-system-aarch64 to see if 051.out
  also needs to be updated, but no changes were necessary. [Kevin]
v2
- Add patch to remove AIOCBInfo->get_aio_context() [Kevin]
- Add patch to use qemu_get_current_aio_context() in block-coroutine-wrapper so
  that the wrappers use the current AioContext instead of
  bdrv_get_aio_context().

Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This change
will allow devices to process I/O in multiple IOThreads in the future.
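
To illustrate the idea of "the current AioContext" as opposed to a context owned by the BlockBackend, here is a toy model in plain C. It is a simplification under stated assumptions, not QEMU's implementation: every Toy* type and toy_* function is hypothetical, and the fallback to the main-loop context ignores the BQL rules the real qemu_get_current_aio_context() has to respect.

```c
#include <stddef.h>

/* Toy stand-in for an event loop context; not QEMU's AioContext. */
typedef struct ToyAioContext {
    const char *name;
} ToyAioContext;

static ToyAioContext toy_main_ctx = { "main-loop" };
static ToyAioContext toy_iothread_ctx = { "iothread0" };

/* Each thread remembers the context it is running, if any. */
static _Thread_local ToyAioContext *toy_thread_ctx;

ToyAioContext *toy_get_main_ctx(void)
{
    return &toy_main_ctx;
}

ToyAioContext *toy_get_iothread_ctx(void)
{
    return &toy_iothread_ctx;
}

/* Called by a thread when it starts running an event loop. */
void toy_set_thread_ctx(ToyAioContext *ctx)
{
    toy_thread_ctx = ctx;
}

/*
 * Model of "current context": the calling thread's own context if it
 * has one, otherwise the main-loop context (simplifying assumption).
 */
ToyAioContext *toy_get_current_ctx(void)
{
    return toy_thread_ctx ? toy_thread_ctx : &toy_main_ctx;
}
```

In this model a request submitted from a thread that called toy_set_thread_ctx() is processed in that thread's context, whichever context the device was originally attached to.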

The final patch requires my QIOChannel AioContext series to pass
tests/qemu-iotests/check -qcow2 281 because the nbd block driver is now
accessed from the main loop thread in addition to the IOThread:
https://lore.kernel.org/qemu-devel/20230823234504.1387239-1-stefa...@redhat.com/T/#t

Based-on: 20230823234504.1387239-1-stefa...@redhat.com

Stefan Hajnoczi (5):
  block: remove AIOCBInfo->get_aio_context()
  test-bdrv-drain: avoid race with BH in IOThread drain test
  block-backend: process I/O in the current AioContext
  block-backend: process zoned requests in the current AioContext
  block-coroutine-wrapper: use qemu_get_current_aio_context()

 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 35 --
 block/io.c | 23 +++-
 hw/nvme/ctrl.c |  7 --
 softmmu/dma-helpers.c  |  8 ---
 tests/unit/test-bdrv-drain.c   |  8 +++
 util/thread-pool.c |  8 ---
 scripts/block-coroutine-wrapper.py |  6 ++---
 tests/qemu-iotests/051.pc.out  |  4 ++--
 11 files changed, 31 insertions(+), 72 deletions(-)

-- 
2.41.0

[PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test

2023-09-12 Thread Stefan Hajnoczi

This patch fixes a race condition in test-bdrv-drain that is difficult
to reproduce. test-bdrv-drain sometimes fails without an error message
on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
to reproduce it locally and found that "block-backend: process I/O in
the current AioContext" (in this patch series) is the first commit where
it reproduces.

I do not know why "block-backend: process I/O in the current AioContext"
exposes this bug. It might be related to the fact that the test's preadv
request runs in the main thread instead of IOThread a after my commit.
That might simply change the timing of the test.

Now on to the race condition in test-bdrv-drain. The main thread
schedules a BH in IOThread a and then drains the BDS:

  aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);

  /* The request is running on the IOThread a. Draining its block device
   * will make sure that it has completed as far as the BDS is concerned,
   * but the drain in this thread can continue immediately after
   * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
   * later. */
  do_drain_begin(drain_type, bs);

If the BH completes before do_drain_begin() then there is nothing to
worry about.

If the BH invokes bdrv_flush() before do_drain_begin(), then
do_drain_begin() waits for it to complete.

The problematic case is when do_drain_begin() runs before the BH enters
bdrv_flush(). Then do_drain_begin() misses the BH and the drain
mechanism has failed in quiescing I/O.

Fix this by incrementing the in_flight counter so that do_drain_begin()
waits for test_iothread_main_thread_bh().
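
The three cases above can be modelled without threads. The following is a deliberately simplified, single-threaded sketch, not QEMU code: ToyBDS and the toy_* functions are hypothetical, and "drain" is reduced to running pending work while the in-flight counter is non-zero. It shows why a BH that is not counted as in flight is missed by the drain, and why taking the count before scheduling closes the gap.

```c
#include <stdbool.h>

/* Toy model: one pending bottom half and one in-flight counter. */
typedef struct {
    int in_flight;      /* requests the drain must wait for */
    bool bh_pending;    /* BH scheduled but not yet run */
    bool flush_done;    /* set by the BH, stands in for bdrv_flush() */
} ToyBDS;

/* Stands in for test_iothread_main_thread_bh(). */
void toy_bh(ToyBDS *bs, bool counted)
{
    bs->flush_done = true;
    if (counted) {
        bs->in_flight--;    /* pairs with the increment in toy_schedule() */
    }
    bs->bh_pending = false;
}

/* Schedule the BH; with 'counted' set, take the in-flight count first. */
void toy_schedule(ToyBDS *bs, bool counted)
{
    if (counted) {
        bs->in_flight++;
    }
    bs->bh_pending = true;
}

/* Drain: run pending work only while something is still in flight. */
void toy_drain_begin(ToyBDS *bs, bool counted)
{
    while (bs->in_flight > 0) {
        if (bs->bh_pending) {
            toy_bh(bs, counted);
        } else {
            break;          /* nothing left to run; avoid spinning */
        }
    }
}

/* Returns whether the BH's work had finished when the drain returned. */
bool toy_flush_done_after_drain(bool counted)
{
    ToyBDS bs = { 0, false, false };

    toy_schedule(&bs, counted);
    toy_drain_begin(&bs, counted);
    return bs.flush_done;
}
```

toy_flush_done_after_drain(false) corresponds to the problematic ordering (the drain returns with the BH still pending), while toy_flush_done_after_drain(true) corresponds to the fix.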

Signed-off-by: Stefan Hajnoczi 
---
 tests/unit/test-bdrv-drain.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index ccc453c29e..67a79aa3f0 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
  * executed during drain, otherwise this would deadlock. */
 aio_context_acquire(bdrv_get_aio_context(data->bs));
 bdrv_flush(data->bs);
+bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
 aio_context_release(bdrv_get_aio_context(data->bs));
 }
 
@@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
 aio_context_acquire(ctx_a);
 }
 
+/*
+ * Increment in_flight so that do_drain_begin() waits for
+ * test_iothread_main_thread_bh(). This prevents the race between
+ * test_iothread_main_thread_bh() in IOThread a and do_drain_begin() in
+ * this thread. test_iothread_main_thread_bh() decrements in_flight.
+ */
+bdrv_inc_in_flight(bs);
 aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
 
 /* The request is running on the IOThread a. Draining its block device
-- 
2.41.0

[PATCH v3 5/5] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-09-12 Thread Stefan Hajnoczi

Use qemu_get_current_aio_context() in mixed wrappers and coroutine
wrappers so that code runs in the caller's AioContext instead of moving
to the BlockDriverState's AioContext. This change is necessary for the
multi-queue block layer where any thread can call into the block layer.

Most wrappers are IO_CODE where it's safe to use the current AioContext
nowadays. BlockDrivers and the core block layer use their own locks and
no longer depend on the AioContext lock for thread-safety.

The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
AioContext is safe because this code is only called with the BQL held
from the main loop thread.

The output of qemu-iotests 051 is sensitive to event loop activity.
Update the output because the monitor BH runs at a different time,
causing prompts to be printed differently in the output.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/block-coroutine-wrapper.py | 6 ++
 tests/qemu-iotests/051.pc.out  | 4 ++--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index d4a183db61..f93fe154c3 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -88,8 +88,6 @@ def __init__(self, wrapper_type: str, return_type: str, name: str,
 raise ValueError(f"no_co function can't be rdlock: {self.name}")
 self.target_name = f'{subsystem}_{subname}'
 
-self.ctx = self.gen_ctx()
-
 self.get_result = 's->ret = '
 self.ret = 'return s.ret;'
 self.co_ret = 'return '
@@ -162,7 +160,7 @@ def create_mixed_wrapper(func: FuncDecl) -> str:
 {func.co_ret}{name}({ func.gen_list('{name}') });
 }} else {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
@@ -186,7 +184,7 @@ def create_co_wrapper(func: FuncDecl) -> str:
 {func.return_type} {func.name}({ func.gen_list('{decl}') })
 {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 4d4af5a486..650cfed8e2 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -177,11 +177,11 @@ QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) QEMU_PROG: -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on: HBA does not support iothreads
+QEMU_PROG: -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on: HBA does not support iothreads
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: Cannot change iothread of active block backend
+QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: Cannot change iothread of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-- 
2.41.0

[PATCH v3 3/5] block-backend: process I/O in the current AioContext

2023-09-12 Thread Stefan Hajnoczi

Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This
change will allow devices to process I/O in multiple IOThreads in the
future.

I audited existing blk_aio_*() callers:
- migration/block.c: blk_mig_lock() protects the data accessed by the
  completion callback.
- The remaining emulated devices and exports run with
  qemu_get_aio_context() == blk_get_aio_context().

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index a77295a198..4863be5691 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1530,7 +1530,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 acb->blk = blk;
 acb->ret = ret;
 
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  error_callback_bh, acb);
 return &acb->common;
 }
@@ -1584,11 +1584,11 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(co_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0

CI container image interference between staging and staging-7.2

2023-09-12 Thread Stefan Hajnoczi

Hi,
TL;DR Michael: Please check that the staging-7.2 branch has Dan's
commit e28112d00703abd136e2411d23931f4f891c9244 ("gitlab: stable
staging branches publish containers in a separate tag").

I couldn't explain a check-cfi-x86_64 failure
(https://gitlab.com/qemu-project/qemu/-/jobs/5072006964), so I reran
build-cfi-x86_64 to see if it has an effect on its dependencies.

To my surprise the rerun of build-cfi-x86_64 failed:
https://gitlab.com/qemu-project/qemu/-/jobs/5072087783

The first run was successful:
https://gitlab.com/qemu-project/qemu/-/jobs/5071532799

Diffing the output shows that the software versions are different. The
successful run has Python 3.11.5 and meson 1.0.1 while the failed run
has Python 3.10.8 and meson 0.63.3.

I think staging and staging-7.2 pipelines are interfering with each
other. My understanding is that build-cfi-x86_64 uses
registry.gitlab.com/qemu-project/qemu/qemu/fedora:latest and that
should be built from fedora:38. Python 3.10.8 is what Fedora 35 uses.
The staging-7.2 branch's fedora.docker file uses fedora:35.

Stefan

Re: [PATCH v7 14/18] cpu: Call plugin hooks only when ready

2023-09-12 Thread Akihiko Odaki


On 2023/09/12 17:46, Philippe Mathieu-Daudé wrote:

Hi Akihiko,

On 12/9/23 09:12, Akihiko Odaki wrote:

The initialization and exit hooks will not affect the state of vCPU,


What about:

  qemu_plugin_vcpu_init_hook()
    -> plugin_cpu_update__locked()
       -> plugin_cpu_update__async()
          -> bitmap_copy(cpu->plugin_mask, ...)
             tcg_flush_jmp_cache(cpu)
             -> qatomic_set(&cpu->tb_jmp_cache->array[i].tb, ...)

?


Hi,

bitmap_copy(cpu->plugin_mask, ...) is contained in the plugin 
infrastructure and shouldn't matter.


TCG has not started filling the caches at that point, so tcg_flush_jmp_cache() is 
effectively a nop, though that is not clearly stated.


By the way, I found that plugin_cpu_update__locked() will not synchronously 
call plugin_cpu_update__async() after this change because cpu->created 
will always be true for system emulation. For user space emulation, 
it is already broken: it *always* synchronously calls the 
function since cpu->created is not set.


I wrote a change to replace cpu->created with DEVICE(cpu)->realized and 
added to the base patch series ("[PATCH v3 03/12] plugins: Check if vCPU 
is realized" in "[PATCH v3 00/12] gdbstub and TCG plugin improvements").


Regards,
Akihiko Odaki

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery

2023-09-12 Thread Fabiano Rosas

Peter Xu  writes:

>> Scenario 1:
>> /x86_64/migration/postcopy/recovery/fail-twice
>> 
>> the stacks are:
>> 
>> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
>>  qemu_sem_wait
>>  ram_dirty_bitmap_sync_all
>>  ram_resume_prepare
>>  qemu_savevm_state_resume_prepare
>>  postcopy_do_resume
>>  postcopy_pause
>>  migration_detect_error
>>  migration_thread
>> 
>> Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
>>  qemu_sem_wait
>>  postcopy_pause_return_path_thread
>>  source_return_path_thread
>
> I guess this is because the path below triggers:
>
> if (len > 0) {
> f->buf_size += len;
> f->total_transferred += len;
> } else if (len == 0) {
> qemu_file_set_error_obj(f, -EIO, local_error); <---
> } else {
> qemu_file_set_error_obj(f, len, local_error);
> }
>
> So the src can always write anything into the tmp file, but any read will
> return 0 immediately because file offset is always pointing to the file
> size.

Yes, a 0 return would mean EOF indeed.

>> 
>> This patch seems to fix it, although we cannot call qmp_migrate_recover
>> a second time because the mis state is now in RECOVER:
>> 
>>   "Migrate recover can only be run when postcopy is paused."
>> 
>> Do we maybe need to return the state to PAUSED, or allow
>> qmp_migrate_recover to run in RECOVER, like you did on the src side?

I figured out what is going on here (test #1). At postcopy_pause_incoming()
the state transition is ACTIVE -> PAUSED, but when the first recovery
fails on the incoming side, the transition would have to be RECOVER ->
PAUSED.

Could you add that change to this patch?

>
> Ouch, I just noticed that my patch was wrong.
>
> I probably need this:
>
> ===8<===
> From 8c2fb7b4c7488002283c7fb6a5e2aae81b21e04b Mon Sep 17 00:00:00 2001
> From: Peter Xu 
> Date: Tue, 12 Sep 2023 15:49:54 -0400
> Subject: [PATCH] fixup! migration/postcopy: Allow network to fail even during
>  recovery
>
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.h | 2 +-
>  migration/migration.c | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index e7f48e736e..7e61e2ece7 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -482,7 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
>  bool migration_is_blocked(Error **errp);
>  /* True if outgoing migration has entered postcopy phase */
>  bool migration_in_postcopy(void);
> -bool migration_postcopy_is_alive(void);
> +bool migration_postcopy_is_alive(int state);
>  MigrationState *migrate_get_current(void);
>  
>  uint64_t ram_get_total_transferred_pages(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index de2146c6fc..a9d381886c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1349,7 +1349,7 @@ bool migration_in_postcopy(void)
>  }
>  }
>  
> -bool migration_postcopy_is_alive(void)
> +bool migration_postcopy_is_alive(int state)
>  {
>  MigrationState *s = migrate_get_current();
>  

Note there's a missing hunk here to actually use the 'state'.

> @@ -1569,7 +1569,7 @@ void qmp_migrate_pause(Error **errp)
>  MigrationIncomingState *mis = migration_incoming_get_current();
>  int ret;
>  
> -if (migration_postcopy_is_alive()) {
> +if (migration_postcopy_is_alive(ms->state)) {
>  /* Source side, during postcopy */
>  Error *error = NULL;
>  
> @@ -1593,7 +1593,7 @@ void qmp_migrate_pause(Error **errp)
>  return;
>  }
>  
> -if (migration_postcopy_is_alive()) {
> +if (migration_postcopy_is_alive(mis->state)) {
>  ret = qemu_file_shutdown(mis->from_src_file);
>  if (ret) {
>  error_setg(errp, "Failed to pause destination migration");
> -- 
> 2.41.0
> ===8<===
>> 
>> 
>> Scenario 2:
>> /x86_64/migration/postcopy/recovery/fail-twice/rp
>> 
>> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30456) "live_migration"):
>>  qemu_sem_wait
>>  ram_dirty_bitmap_sync_all
>>  ram_resume_prepare
>>  qemu_savevm_state_resume_prepare
>>  postcopy_do_resume
>>  postcopy_pause
>>  migration_detect_error
>>  migration_thread
>> 
>> Thread 7 (Thread 0x7fffd67ff700 (LWP 30455) "return path"):
>>  recvmsg
>>  qio_channel_socket_readv
>>  qio_channel_readv_full
>>  qio_channel_read
>>  qemu_fill_buffer
>>  qemu_peek_byte
>>  qemu_get_byte
>>  qemu_get_be16
>>  source_return_path_thread
>> 
>> Here, with this patch the migration gets stuck unless we call
>> migrate_pause() one more time. After another round of migrate_pause +
>> recover, it finishes properly.

Here (test #2), the issue is that the sockets are unpaired, so there's
no G_IO_IN to trigger the qio_channel watch callback. The incoming side
never calls fd_accept_incoming_migration() and the RP hangs waiting for
something. I think there's no other way to unblock aside from the
explicit qmp_migrate_pause().

>> 
>> 
>> 1- hacked-together test:
>> -->8--
>> From

[PATCH v3 04/12] contrib/plugins: Use GRWLock in execlog

2023-09-12 Thread Akihiko Odaki

execlog had the following comment:
> As we could have multiple threads trying to do this we need to
> serialise the expansion under a lock. Threads accessing already
> created entries can continue without issue even if the ptr array
> gets reallocated during resize.

However, when the ptr array gets reallocated, the other threads may have
a stale reference to the old buffer. This results in use-after-free.

Use GRWLock to properly fix this issue.

Fixes: 3d7caf145e ("contrib/plugins: add execlog to log instruction execution and memory access")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
---
 contrib/plugins/execlog.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
index 7129d526f8..82dc2f584e 100644
--- a/contrib/plugins/execlog.c
+++ b/contrib/plugins/execlog.c
@@ -19,7 +19,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
 /* Store last executed instruction on each vCPU as a GString */
 static GPtrArray *last_exec;
-static GMutex expand_array_lock;
+static GRWLock expand_array_lock;
 
 static GPtrArray *imatches;
 static GArray *amatches;
@@ -28,18 +28,16 @@ static GArray *amatches;
  * Expand last_exec array.
  *
  * As we could have multiple threads trying to do this we need to
- * serialise the expansion under a lock. Threads accessing already
- * created entries can continue without issue even if the ptr array
- * gets reallocated during resize.
+ * serialise the expansion under a lock.
  */
 static void expand_last_exec(int cpu_index)
 {
-g_mutex_lock(&expand_array_lock);
+g_rw_lock_writer_lock(&expand_array_lock);
 while (cpu_index >= last_exec->len) {
 GString *s = g_string_new(NULL);
 g_ptr_array_add(last_exec, s);
 }
-g_mutex_unlock(&expand_array_lock);
+g_rw_lock_writer_unlock(&expand_array_lock);
 }
 
 /**
@@ -51,8 +49,10 @@ static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t info,
 GString *s;
 
 /* Find vCPU in array */
+g_rw_lock_reader_lock(&expand_array_lock);
 g_assert(cpu_index < last_exec->len);
 s = g_ptr_array_index(last_exec, cpu_index);
+g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Indicate type of memory access */
 if (qemu_plugin_mem_is_store(info)) {
@@ -80,10 +80,14 @@ static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
 GString *s;
 
 /* Find or create vCPU in array */
+g_rw_lock_reader_lock(_array_lock);
 if (cpu_index >= last_exec->len) {
+g_rw_lock_reader_unlock(_array_lock);
 expand_last_exec(cpu_index);
+g_rw_lock_reader_lock(_array_lock);
 }
 s = g_ptr_array_index(last_exec, cpu_index);
+g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Print previous instruction in cache */
 if (s->len) {
-- 
2.42.0

[PATCH v3 10/12] target/ppc: Remove references to gdb_has_xml

2023-09-12 Thread Akihiko Odaki

GDB has had XML support since 6.7, which was released in 2007.
It's time to remove support for old GDB versions without XML support.

Signed-off-by: Akihiko Odaki 
---
 target/ppc/gdbstub.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index 778ef73bd7..ec5731e5d6 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -54,12 +54,6 @@ static int ppc_gdb_register_len(int n)
 case 0 ... 31:
 /* gprs */
 return sizeof(target_ulong);
-case 32 ... 63:
-/* fprs */
-if (gdb_has_xml()) {
-return 0;
-}
-return 8;
 case 66:
 /* cr */
 case 69:
@@ -74,12 +68,6 @@ static int ppc_gdb_register_len(int n)
 case 68:
 /* ctr */
 return sizeof(target_ulong);
-case 70:
-/* fpscr */
-if (gdb_has_xml()) {
-return 0;
-}
-return sizeof(target_ulong);
 default:
 return 0;
 }
@@ -132,9 +120,6 @@ int ppc_cpu_gdb_read_register(CPUState *cs, GByteArray 
*buf, int n)
 if (n < 32) {
 /* gprs */
 gdb_get_regl(buf, env->gpr[n]);
-} else if (n < 64) {
-/* fprs */
-gdb_get_reg64(buf, *cpu_fpr_ptr(env, n - 32));
 } else {
 switch (n) {
 case 64:
@@ -158,9 +143,6 @@ int ppc_cpu_gdb_read_register(CPUState *cs, GByteArray 
*buf, int n)
 case 69:
 gdb_get_reg32(buf, cpu_read_xer(env));
 break;
-case 70:
-gdb_get_reg32(buf, env->fpscr);
-break;
 }
 }
 mem_buf = buf->data + buf->len - r;
-- 
2.42.0

[PATCH v3 09/12] target/arm: Remove references to gdb_has_xml

2023-09-12 Thread Akihiko Odaki

GDB has had XML support since 6.7, which was released in 2007.
It's time to remove support for old GDB versions without XML support.

Signed-off-by: Akihiko Odaki 
Acked-by: Alex Bennée 
---
 target/arm/gdbstub.c | 32 ++--
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index 8fc8351df7..b7ace24bfc 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -46,21 +46,7 @@ int arm_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 /* Core integer register.  */
 return gdb_get_reg32(mem_buf, env->regs[n]);
 }
-if (n < 24) {
-/* FPA registers.  */
-if (gdb_has_xml()) {
-return 0;
-}
-return gdb_get_zeroes(mem_buf, 12);
-}
-switch (n) {
-case 24:
-/* FPA status register.  */
-if (gdb_has_xml()) {
-return 0;
-}
-return gdb_get_reg32(mem_buf, 0);
-case 25:
+if (n == 25) {
 /* CPSR, or XPSR for M-profile */
 if (arm_feature(env, ARM_FEATURE_M)) {
 return gdb_get_reg32(mem_buf, xpsr_read(env));
@@ -100,21 +86,7 @@ int arm_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 env->regs[n] = tmp;
 return 4;
 }
-if (n < 24) { /* 16-23 */
-/* FPA registers (ignored).  */
-if (gdb_has_xml()) {
-return 0;
-}
-return 12;
-}
-switch (n) {
-case 24:
-/* FPA status register (ignored).  */
-if (gdb_has_xml()) {
-return 0;
-}
-return 4;
-case 25:
+if (n == 25) {
 /* CPSR, or XPSR for M-profile */
 if (arm_feature(env, ARM_FEATURE_M)) {
 /*
-- 
2.42.0

[PATCH v3 07/12] hw/core/cpu: Return static value with gdb_arch_name()

2023-09-12 Thread Akihiko Odaki

All implementations of gdb_arch_name() return dynamic duplicates of
static strings. It's also unlikely that there will be an implementation
of gdb_arch_name() that returns a truly dynamic value, since the
function returns well-known identifiers. Qualify the return value of
gdb_arch_name() with const and make all of its implementations return
static strings.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
---
 include/hw/core/cpu.h  | 2 +-
 target/ppc/internal.h  | 2 +-
 gdbstub/gdbstub.c  | 3 +--
 target/arm/cpu.c   | 6 +++---
 target/arm/cpu64.c | 4 ++--
 target/i386/cpu.c  | 6 +++---
 target/loongarch/cpu.c | 8 
 target/ppc/gdbstub.c   | 6 +++---
 target/riscv/cpu.c | 6 +++---
 target/s390x/cpu.c | 4 ++--
 target/tricore/cpu.c   | 4 ++--
 11 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fdcbe87352..4f5c7eb04e 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -164,7 +164,7 @@ struct CPUClass {
 vaddr (*gdb_adjust_breakpoint)(CPUState *cpu, vaddr addr);
 
 const char *gdb_core_xml_file;
-gchar * (*gdb_arch_name)(CPUState *cpu);
+const gchar * (*gdb_arch_name)(CPUState *cpu);
 const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
 
 void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 57acb3212c..974b37aa60 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -221,7 +221,7 @@ void destroy_ppc_opcodes(PowerPCCPU *cpu);
 
 /* gdbstub.c */
 void ppc_gdb_init(CPUState *cs, PowerPCCPUClass *ppc);
-gchar *ppc_gdb_arch_name(CPUState *cs);
+const gchar *ppc_gdb_arch_name(CPUState *cs);
 
 /**
  * prot_for_access_type:
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 12f4d07046..9db4af41c1 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -380,10 +380,9 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 "");
 
 if (cc->gdb_arch_name) {
-g_autofree gchar *arch = cc->gdb_arch_name(cpu);
 g_string_append_printf(xml,
"%s",
-   arch);
+   cc->gdb_arch_name(cpu));
 }
 g_string_append(xml, "gdb_core_xml_file);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6ff6ff2d55..a13c609249 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2316,15 +2316,15 @@ static Property arm_cpu_properties[] = {
 DEFINE_PROP_END_OF_LIST()
 };
 
-static gchar *arm_gdb_arch_name(CPUState *cs)
+static const gchar *arm_gdb_arch_name(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
 CPUARMState *env = &cpu->env;
 
 if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
-return g_strdup("iwmmxt");
+return "iwmmxt";
 }
-return g_strdup("arm");
+return "arm";
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 96158093cc..6b91aab6b7 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -743,9 +743,9 @@ static void aarch64_cpu_finalizefn(Object *obj)
 {
 }
 
-static gchar *aarch64_gdb_arch_name(CPUState *cs)
+static const gchar *aarch64_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("aarch64");
+return "aarch64";
 }
 
 static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00f913b638..5678b52472 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5914,12 +5914,12 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel 
*model)
 memset(&env->user_features, 0, sizeof(env->user_features));
 }
 
-static gchar *x86_gdb_arch_name(CPUState *cs)
+static const gchar *x86_gdb_arch_name(CPUState *cs)
 {
 #ifdef TARGET_X86_64
-return g_strdup("i386:x86-64");
+return "i386:x86-64";
 #else
-return g_strdup("i386");
+return "i386";
 #endif
 }
 
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 27fc6e1f33..f88cfa93ce 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -763,9 +763,9 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 #endif
 }
 
-static gchar *loongarch32_gdb_arch_name(CPUState *cs)
+static const gchar *loongarch32_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("loongarch32");
+return "loongarch32";
 }
 
 static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
@@ -777,9 +777,9 @@ static void loongarch32_cpu_class_init(ObjectClass *c, void 
*data)
 cc->gdb_arch_name = loongarch32_gdb_arch_name;
 }
 
-static gchar *loongarch64_gdb_arch_name(CPUState *cs)
+static const gchar *loongarch64_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("loongarch64");
+return "loongarch64";
 }
 
 static void loongarch64_cpu_class_init(ObjectClass *c, void *data)
diff --git a/target/ppc/gdbstub.c
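
As a rough illustration of the change described in this patch, the sketch below returns string literals through a const-qualified pointer, so the caller neither copies nor frees anything. DemoCPU, demo_arch_name() and demo_format_arch() are hypothetical names used only for this example, and the "arch=%s" format is made up.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical CPU type with a single feature flag. */
typedef struct DemoCPU {
    bool has_iwmmxt;
} DemoCPU;

/*
 * Return a pointer to a string literal. The const qualifier tells the
 * caller that the string must not be modified or freed.
 */
static const char *demo_arch_name(const DemoCPU *cpu)
{
    return cpu->has_iwmmxt ? "iwmmxt" : "arm";
}

/* The caller uses the value directly; no temporary copy is needed. */
static int demo_format_arch(char *buf, size_t size, const DemoCPU *cpu)
{
    return snprintf(buf, size, "arch=%s", demo_arch_name(cpu));
}
```

Compared with returning a freshly duplicated string, this removes one allocation per call and a class of leaks when a caller forgets to free the result.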

[PATCH v3 06/12] target/arm: Move the reference to arm-core.xml

2023-09-12 Thread Akihiko Odaki

Some subclasses overwrite gdb_core_xml_file member but others don't.
Always initialize the member in the subclasses for consistency.

This especially helps for AArch64; in a following change, the file
specified by gdb_core_xml_file is always looked up even if it's going to
be overwritten later. Looking up arm-core.xml results in an error as
it will not be embedded in the AArch64 build.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 0bb0585441..6ff6ff2d55 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2389,7 +2389,6 @@ static void arm_cpu_class_init(ObjectClass *oc, void 
*data)
 cc->sysemu_ops = &arm_sysemu_ops;
 #endif
 cc->gdb_num_core_regs = 26;
-cc->gdb_core_xml_file = "arm-core.xml";
 cc->gdb_arch_name = arm_gdb_arch_name;
 cc->gdb_get_dynamic_xml = arm_gdb_get_dynamic_xml;
 cc->gdb_stop_before_watchpoint = true;
@@ -2411,8 +2410,10 @@ static void arm_cpu_instance_init(Object *obj)
 static void cpu_register_class_init(ObjectClass *oc, void *data)
 {
 ARMCPUClass *acc = ARM_CPU_CLASS(oc);
+CPUClass *cc = CPU_CLASS(acc);
 
 acc->info = data;
+cc->gdb_core_xml_file = "arm-core.xml";
 }
 
 void arm_cpu_register(const ARMCPUInfo *info)
-- 
2.42.0

[PATCH v3 11/12] gdbstub: Remove gdb_has_xml variable

2023-09-12 Thread Akihiko Odaki

GDB has had XML support since 6.7, which was released in 2007.
It's time to remove support for old GDB versions without XML support.

Signed-off-by: Akihiko Odaki 
---
 gdbstub/internals.h|  2 --
 include/exec/gdbstub.h |  8 
 gdbstub/gdbstub.c  | 15 ---
 3 files changed, 25 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index fee243081f..7128c4aa85 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -32,8 +32,6 @@ enum {
 typedef struct GDBProcess {
 uint32_t pid;
 bool attached;
-
-/* If gdb sends qXfer:features:read:target.xml this will be populated */
 char *target_xml;
 } GDBProcess;
 
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 705be2c5d7..1a01c35f8e 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -45,14 +45,6 @@ int gdbserver_start(const char *port_or_device);
 
 void gdb_set_stop_cpu(CPUState *cpu);
 
-/**
- * gdb_has_xml() - report of gdb supports modern target descriptions
- *
- * This will report true if the gdb negotiated qXfer:features:read
- * target descriptions.
- */
-bool gdb_has_xml(void);
-
 /* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
 extern const GDBFeature gdb_static_features[];
 
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index a4f2bf3723..177dce9ba2 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -349,11 +349,6 @@ static CPUState *gdb_get_cpu(uint32_t pid, uint32_t tid)
 }
 }
 
-bool gdb_has_xml(void)
-{
-return !!gdb_get_cpu_process(gdbserver_state.g_cpu)->target_xml;
-}
-
 static const char *get_feature_xml(const char *p, const char **newp,
GDBProcess *process)
 {
@@ -1086,11 +1081,6 @@ static void handle_set_reg(GArray *params, void 
*user_ctx)
 {
 int reg_size;
 
-if (!gdb_get_cpu_process(gdbserver_state.g_cpu)->target_xml) {
-gdb_put_packet("");
-return;
-}
-
 if (params->len != 2) {
 gdb_put_packet("E22");
 return;
@@ -1107,11 +1097,6 @@ static void handle_get_reg(GArray *params, void 
*user_ctx)
 {
 int reg_size;
 
-if (!gdb_get_cpu_process(gdbserver_state.g_cpu)->target_xml) {
-gdb_put_packet("");
-return;
-}
-
 if (!params->len) {
 gdb_put_packet("E14");
 return;
-- 
2.42.0

[PATCH v3 01/12] gdbstub: Fix target_xml initialization

2023-09-12 Thread Akihiko Odaki

target_xml is no longer a fixed-length array but a pointer to
variable-length memory.

Fixes: 56e534bd11 ("gdbstub: refactor get_feature_xml")
Signed-off-by: Akihiko Odaki 
---
 gdbstub/softmmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index 9f0b8b5497..42645d2220 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -292,7 +292,7 @@ static int find_cpu_clusters(Object *child, void *opaque)
 assert(cluster->cluster_id != UINT32_MAX);
 process->pid = cluster->cluster_id + 1;
 process->attached = false;
-process->target_xml[0] = '\0';
+process->target_xml = NULL;
 
 return 0;
 }
-- 
2.42.0

[PATCH v3 12/12] gdbstub: Replace gdb_regs with an array

2023-09-12 Thread Akihiko Odaki

An array is a more appropriate data structure than a list for gdb_regs
since it is initialized only with append operations and is read-only after
initialization.

Signed-off-by: Akihiko Odaki 
---
 include/hw/core/cpu.h |  2 +-
 gdbstub/gdbstub.c | 34 --
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 4f5c7eb04e..c84c631242 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -375,7 +375,7 @@ struct CPUState {
 
 CPUJumpCache *tb_jmp_cache;
 
-struct GDBRegisterState *gdb_regs;
+GArray *gdb_regs;
 int gdb_num_regs;
 int gdb_num_g_regs;
 QTAILQ_ENTRY(CPUState) node;
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 177dce9ba2..9810d15278 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -51,7 +51,6 @@ typedef struct GDBRegisterState {
 gdb_get_reg_cb get_reg;
 gdb_set_reg_cb set_reg;
 const char *xml;
-struct GDBRegisterState *next;
 } GDBRegisterState;
 
 GDBState gdbserver_state;
@@ -386,7 +385,8 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 xml,
 g_markup_printf_escaped("",
 cc->gdb_core_xml_file));
-for (r = cpu->gdb_regs; r; r = r->next) {
+for (guint i = 0; i < cpu->gdb_regs->len; i++) {
+r = &g_array_index(cpu->gdb_regs, GDBRegisterState, i);
 g_ptr_array_add(
 xml,
 g_markup_printf_escaped("",
@@ -430,7 +430,8 @@ static int gdb_read_register(CPUState *cpu, GByteArray 
*buf, int reg)
 return cc->gdb_read_register(cpu, buf, reg);
 }
 
-for (r = cpu->gdb_regs; r; r = r->next) {
+for (guint i = 0; i < cpu->gdb_regs->len; i++) {
+r = &g_array_index(cpu->gdb_regs, GDBRegisterState, i);
 if (r->base_reg <= reg && reg < r->base_reg + r->num_regs) {
 return r->get_reg(env, buf, reg - r->base_reg);
 }
@@ -448,7 +449,8 @@ static int gdb_write_register(CPUState *cpu, uint8_t 
*mem_buf, int reg)
 return cc->gdb_write_register(cpu, mem_buf, reg);
 }
 
-for (r = cpu->gdb_regs; r; r = r->next) {
+for (guint i = 0; i < cpu->gdb_regs->len; i++) {
+r = &g_array_index(cpu->gdb_regs, GDBRegisterState, i);
 if (r->base_reg <= reg && reg < r->base_reg + r->num_regs) {
 return r->set_reg(env, mem_buf, reg - r->base_reg);
 }
@@ -461,17 +463,22 @@ void gdb_register_coprocessor(CPUState *cpu,
   int num_regs, const char *xml, int g_pos)
 {
 GDBRegisterState *s;
-GDBRegisterState **p;
-
-p = &cpu->gdb_regs;
-while (*p) {
-/* Check for duplicates.  */
-if (strcmp((*p)->xml, xml) == 0)
-return;
-p = &(*p)->next;
+guint i;
+
+if (cpu->gdb_regs) {
+for (i = 0; i < cpu->gdb_regs->len; i++) {
+/* Check for duplicates.  */
+s = &g_array_index(cpu->gdb_regs, GDBRegisterState, i);
+if (strcmp(s->xml, xml) == 0)
+return;
+}
+} else {
+cpu->gdb_regs = g_array_new(false, false, sizeof(GDBRegisterState));
+i = 0;
 }
 
-s = g_new0(GDBRegisterState, 1);
+g_array_set_size(cpu->gdb_regs, i + 1);
+s = &g_array_index(cpu->gdb_regs, GDBRegisterState, i);
 s->base_reg = cpu->gdb_num_regs;
 s->num_regs = num_regs;
 s->get_reg = get_reg;
@@ -480,7 +487,6 @@ void gdb_register_coprocessor(CPUState *cpu,
 
 /* Add to end of list.  */
 cpu->gdb_num_regs += num_regs;
-*p = s;
 if (g_pos) {
 if (g_pos != s->base_reg) {
 error_report("Error: Bad gdb register numbering for '%s', "
-- 
2.42.0
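
The list-to-array conversion above can be sketched without GLib as follows. This is only an illustration under assumptions: a plain realloc()-grown array stands in for GArray, and DemoRegSet, DemoCPU, demo_register_set() and demo_find_set() are hypothetical names. Registration rejects duplicates and appends; lookup is a linear scan by register number.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of one registered register set. */
typedef struct DemoRegSet {
    int base_reg;
    int num_regs;
    const char *xml;
} DemoRegSet;

typedef struct DemoCPU {
    DemoRegSet *sets;   /* contiguous array, append-only */
    size_t nsets;
    int num_regs;       /* next free register number */
} DemoCPU;

/* Append a register set. Returns 1 if added, 0 if it was a duplicate. */
static int demo_register_set(DemoCPU *cpu, int num_regs, const char *xml)
{
    DemoRegSet *grown, *s;

    for (size_t i = 0; i < cpu->nsets; i++) {
        if (strcmp(cpu->sets[i].xml, xml) == 0) {
            return 0;
        }
    }

    grown = realloc(cpu->sets, (cpu->nsets + 1) * sizeof(*grown));
    if (!grown) {
        abort();
    }
    cpu->sets = grown;

    s = &cpu->sets[cpu->nsets++];
    s->base_reg = cpu->num_regs;
    s->num_regs = num_regs;
    s->xml = xml;
    cpu->num_regs += num_regs;
    return 1;
}

/* Find the set that covers register number `reg`, or NULL. */
static const DemoRegSet *demo_find_set(const DemoCPU *cpu, int reg)
{
    for (size_t i = 0; i < cpu->nsets; i++) {
        const DemoRegSet *s = &cpu->sets[i];
        if (s->base_reg <= reg && reg < s->base_reg + s->num_regs) {
            return s;
        }
    }
    return NULL;
}
```

Pointers into the array are only stable once registration has finished, which matches the "append during initialization, read-only afterwards" usage the commit message describes.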

[PATCH v3 00/12] gdbstub and TCG plugin improvements

2023-09-12 Thread Akihiko Odaki

This series extracts fixes and refactorings that can be applied
independently from "[PATCH RESEND v5 00/26] plugins: Allow to read
registers" as suggested by Nicholas Piggin.

Patch "target/ppc: Remove references to gdb_has_xml" is also updated to
remove some dead code I missed earlier and thus the Reviewed-by tag is
dropped.

V2 -> V3:
  Added patch "plugins: Check if vCPU is realized".

V1 -> V2:
  Rebased.
  Added patch "gdbstub: Fix target_xml initialization".
  Added patch "gdbstub: Fix target.xml response".
  Added patch "gdbstub: Replace gdb_regs with an array".

Akihiko Odaki (12):
  gdbstub: Fix target_xml initialization
  gdbstub: Fix target.xml response
  plugins: Check if vCPU is realized
  contrib/plugins: Use GRWLock in execlog
  gdbstub: Introduce GDBFeature structure
  target/arm: Move the reference to arm-core.xml
  hw/core/cpu: Return static value with gdb_arch_name()
  gdbstub: Use g_markup_printf_escaped()
  target/arm: Remove references to gdb_has_xml
  target/ppc: Remove references to gdb_has_xml
  gdbstub: Remove gdb_has_xml variable
  gdbstub: Replace gdb_regs with an array

 MAINTAINERS   |  2 +-
 meson.build   |  2 +-
 gdbstub/internals.h   |  2 -
 include/exec/gdbstub.h| 17 +++
 include/hw/core/cpu.h |  4 +-
 target/ppc/internal.h |  2 +-
 contrib/plugins/execlog.c | 16 ---
 gdbstub/gdbstub.c | 94 +++
 gdbstub/softmmu.c |  2 +-
 plugins/core.c|  2 +-
 stubs/gdbstub.c   |  6 +--
 target/arm/cpu.c  |  9 ++--
 target/arm/cpu64.c|  4 +-
 target/arm/gdbstub.c  | 32 +
 target/i386/cpu.c |  6 +--
 target/loongarch/cpu.c|  8 ++--
 target/ppc/gdbstub.c  | 24 ++
 target/riscv/cpu.c|  6 +--
 target/s390x/cpu.c|  4 +-
 target/tricore/cpu.c  |  4 +-
 scripts/feature_to_c.py   | 48 
 scripts/feature_to_c.sh   | 69 
 22 files changed, 146 insertions(+), 217 deletions(-)
 create mode 100755 scripts/feature_to_c.py
 delete mode 100644 scripts/feature_to_c.sh

-- 
2.42.0

[PATCH v3 08/12] gdbstub: Use g_markup_printf_escaped()

2023-09-12 Thread Akihiko Odaki

g_markup_printf_escaped() is a safer alternative to simple printf() as
it automatically escapes values.

Signed-off-by: Akihiko Odaki 
---
 gdbstub/gdbstub.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 9db4af41c1..a4f2bf3723 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -373,28 +373,34 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 if (strncmp(p, "target.xml", len) == 0) {
 if (!process->target_xml) {
 GDBRegisterState *r;
-GString *xml = g_string_new("");
+g_autoptr(GPtrArray) xml = g_ptr_array_new_with_free_func(g_free);
 
-g_string_append(xml,
-""
-"");
+g_ptr_array_add(
+xml,
+g_strdup(""
+ ""
+ ""));
 
 if (cc->gdb_arch_name) {
-g_string_append_printf(xml,
-   "%s",
-   cc->gdb_arch_name(cpu));
+g_ptr_array_add(
+xml,
+g_markup_printf_escaped("%s",
+cc->gdb_arch_name(cpu)));
 }
-g_string_append(xml, "gdb_core_xml_file);
-g_string_append(xml, "\"/>");
+g_ptr_array_add(
+xml,
+g_markup_printf_escaped("",
+cc->gdb_core_xml_file));
 for (r = cpu->gdb_regs; r; r = r->next) {
-g_string_append(xml, "xml);
-g_string_append(xml, "\"/>");
+g_ptr_array_add(
+xml,
+g_markup_printf_escaped("",
+r->xml));
 }
-g_string_append(xml, "");
+g_ptr_array_add(xml, g_strdup(""));
+g_ptr_array_add(xml, NULL);
 
-process->target_xml = g_string_free(xml, false);
+process->target_xml = g_strjoinv(NULL, (void *)xml->pdata);
 }
 return process->target_xml;
 }
-- 
2.42.0
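
To show why escaping matters when a value is spliced into XML, here is a small hand-written escaper. It is a sketch under assumptions, not the GLib implementation: demo_xml_escape() is a hypothetical helper that replaces the five characters with predefined XML entities and returns a newly allocated string.

```c
#include <stdlib.h>
#include <string.h>

/* Map one character to its XML entity, or NULL if it needs no escaping. */
static const char *demo_entity_for(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\'': return "&apos;";
    default:   return NULL;
    }
}

/* Return a newly allocated, escaped copy of `in`. The caller frees it. */
static char *demo_xml_escape(const char *in)
{
    size_t len = 0;
    char *out, *q;

    /* First pass: compute the escaped length. */
    for (const char *p = in; *p; p++) {
        const char *rep = demo_entity_for(*p);
        len += rep ? strlen(rep) : 1;
    }

    out = malloc(len + 1);
    if (!out) {
        abort();
    }

    /* Second pass: copy, substituting entities. */
    q = out;
    for (const char *p = in; *p; p++) {
        const char *rep = demo_entity_for(*p);
        if (rep) {
            size_t n = strlen(rep);
            memcpy(q, rep, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

A file name such as arm-core.xml passes through unchanged, while a value containing a quote or angle bracket can no longer terminate the attribute or element it is placed in.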

[PATCH v3 02/12] gdbstub: Fix target.xml response

2023-09-12 Thread Akihiko Odaki

It was failing to return target.xml after the first request.

Fixes: 56e534bd11 ("gdbstub: refactor get_feature_xml")
Signed-off-by: Akihiko Odaki 
---
 gdbstub/gdbstub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 349d348c7b..384191bcb0 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -396,8 +396,8 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 g_string_append(xml, "");
 
 process->target_xml = g_string_free(xml, false);
-return process->target_xml;
 }
+return process->target_xml;
 }
 /* Is it dynamically generated by the target? */
 if (cc->gdb_get_dynamic_xml) {
-- 
2.42.0
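
The bug fixed here is a common one with lazily built caches: the return statement sat inside the "build it" branch, so only the first request got an answer. A minimal sketch of the corrected shape, using hypothetical names (DemoProcess, demo_get_target_xml) and a placeholder payload, might look like this:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of the per-process cache. */
typedef struct DemoProcess {
    char *target_xml;   /* NULL until the first request builds it */
    int builds;         /* counts how often the text was generated */
} DemoProcess;

static const char *demo_get_target_xml(DemoProcess *p)
{
    if (!p->target_xml) {
        /* Placeholder payload; the real text is generated elsewhere. */
        const char *text = "generated description";

        p->target_xml = malloc(strlen(text) + 1);
        if (!p->target_xml) {
            abort();
        }
        strcpy(p->target_xml, text);
        p->builds++;
    }
    /* Return outside the branch so cached requests are served too. */
    return p->target_xml;
}
```

Initializing the pointer to NULL, as patch 01/12 in this series does for target_xml, is what makes the "not built yet" check reliable.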

[PATCH v3 03/12] plugins: Check if vCPU is realized

2023-09-12 Thread Akihiko Odaki

The created member of CPUState tells if the vCPU thread is started, and
will always be false for user space emulation, which manages threads
independently. Use the realized member of DeviceState, which is valid
for both system and user space emulation.

Fixes: 54cb65d858 ("plugin: add core code")
Signed-off-by: Akihiko Odaki 
---
 plugins/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/core.c b/plugins/core.c
index 3c4e26c7ed..fcd33a2bff 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -64,7 +64,7 @@ static void plugin_cpu_update__locked(gpointer k, gpointer v, 
gpointer udata)
 CPUState *cpu = container_of(k, CPUState, cpu_index);
 run_on_cpu_data mask = RUN_ON_CPU_HOST_ULONG(*plugin.mask);
 
-if (cpu->created) {
+if (DEVICE(cpu)->realized) {
 async_run_on_cpu(cpu, plugin_cpu_update__async, mask);
 } else {
 plugin_cpu_update__async(cpu, mask);
-- 
2.42.0

[PATCH v3 05/12] gdbstub: Introduce GDBFeature structure

2023-09-12 Thread Akihiko Odaki

Before this change, the information from an XML file was stored in an
array that is not descriptive. Introduce a dedicated structure type to
make it easier to understand and to extend with more fields.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 MAINTAINERS |  2 +-
 meson.build |  2 +-
 include/exec/gdbstub.h  |  9 --
 gdbstub/gdbstub.c   |  6 ++--
 stubs/gdbstub.c |  6 ++--
 scripts/feature_to_c.py | 48 
 scripts/feature_to_c.sh | 69 -
 7 files changed, 63 insertions(+), 79 deletions(-)
 create mode 100755 scripts/feature_to_c.py
 delete mode 100644 scripts/feature_to_c.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..61ff9234d3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2826,7 +2826,7 @@ F: include/exec/gdbstub.h
 F: include/gdbstub/*
 F: gdb-xml/
 F: tests/tcg/multiarch/gdbstub/
-F: scripts/feature_to_c.sh
+F: scripts/feature_to_c.py
 F: scripts/probe-gdb-support.py
 
 Memory API
diff --git a/meson.build b/meson.build
index 98e68ef0b1..5c633f7e01 100644
--- a/meson.build
+++ b/meson.build
@@ -3683,7 +3683,7 @@ common_all = static_library('common',
 dependencies: common_all.dependencies(),
 name_suffix: 'fa')
 
-feature_to_c = find_program('scripts/feature_to_c.sh')
+feature_to_c = find_program('scripts/feature_to_c.py')
 
 if targetos == 'darwin'
   entitlement = find_program('scripts/entitlement.sh')
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 16a139043f..705be2c5d7 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -10,6 +10,11 @@
 #define GDB_WATCHPOINT_READ  3
 #define GDB_WATCHPOINT_ACCESS4
 
+typedef struct GDBFeature {
+const char *xmlname;
+const char *xml;
+} GDBFeature;
+
 
 /* Get or set a register.  Returns the size of the register.  */
 typedef int (*gdb_get_reg_cb)(CPUArchState *env, GByteArray *buf, int reg);
@@ -48,7 +53,7 @@ void gdb_set_stop_cpu(CPUState *cpu);
  */
 bool gdb_has_xml(void);
 
-/* in gdbstub-xml.c, generated by scripts/feature_to_c.sh */
-extern const char *const xml_builtin[][2];
+/* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
+extern const GDBFeature gdb_static_features[];
 
 #endif
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 384191bcb0..12f4d07046 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -408,11 +408,11 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 }
 }
 /* Is it one of the encoded gdb-xml/ files? */
-for (int i = 0; xml_builtin[i][0]; i++) {
-const char *name = xml_builtin[i][0];
+for (int i = 0; gdb_static_features[i].xmlname; i++) {
+const char *name = gdb_static_features[i].xmlname;
 if ((strncmp(name, p, len) == 0) &&
 strlen(name) == len) {
-return xml_builtin[i][1];
+return gdb_static_features[i].xml;
 }
 }
 
diff --git a/stubs/gdbstub.c b/stubs/gdbstub.c
index 2b7aee50d3..580e20702b 100644
--- a/stubs/gdbstub.c
+++ b/stubs/gdbstub.c
@@ -1,6 +1,6 @@
 #include "qemu/osdep.h"
-#include "exec/gdbstub.h"   /* xml_builtin */
+#include "exec/gdbstub.h"   /* gdb_static_features */
 
-const char *const xml_builtin[][2] = {
-  { NULL, NULL }
+const GDBFeature gdb_static_features[] = {
+  { NULL }
 };
diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
new file mode 100755
index 00..bcbcb83beb
--- /dev/null
+++ b/scripts/feature_to_c.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os, sys
+
+def writeliteral(indent, bytes):
+sys.stdout.write(' ' * indent)
+sys.stdout.write('"')
+quoted = True
+
+for c in bytes:
+if not quoted:
+sys.stdout.write('\n')
+sys.stdout.write(' ' * indent)
+sys.stdout.write('"')
+quoted = True
+
+if c == b'"'[0]:
+sys.stdout.write('\\"')
+elif c == b'\\'[0]:
+sys.stdout.write('\\\\')
+elif c == b'\n'[0]:
+sys.stdout.write('\\n"')
+quoted = False
+elif c >= 32 and c < 127:
+sys.stdout.write(c.to_bytes(1, 'big').decode())
+else:
+sys.stdout.write(f'\\{c:03o}')
+
+if quoted:
+sys.stdout.write('"')
+
+sys.stdout.write('#include "qemu/osdep.h"\n' \
+ '#include "exec/gdbstub.h"\n' \
+ '\n'
+ 'const GDBFeature gdb_static_features[] = {\n')
+
+for input in sys.argv[1:]:
+with open(input, 'rb') as file:
+read = file.read()
+
+sys.stdout.write('{\n')
+writeliteral(8, bytes(os.path.basename(input), 'utf-8'))
+sys.stdout.write(',\n')
+writeliteral(8, read)
+sys.stdout.write('\n},\n')
+
+sys.stdout.write('{ NULL
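
The lookup over the new structure can be sketched as below. The table contents and the names DemoFeature, demo_features and demo_find_feature() are hypothetical; only the shape (a name/content pair per entry, terminated by a NULL name, matched against a length-delimited request) follows the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of the name/content pair introduced above. */
typedef struct DemoFeature {
    const char *xmlname;
    const char *xml;
} DemoFeature;

/* Made-up table; the last entry has a NULL name as terminator. */
static const DemoFeature demo_features[] = {
    { "core.xml", "<core/>" },
    { "fpu.xml",  "<fpu/>" },
    { NULL, NULL }
};

/*
 * Look up a feature by a name that is not NUL-terminated: `p` points
 * into a longer request and only the first `len` bytes are the name.
 */
static const char *demo_find_feature(const DemoFeature *table,
                                     const char *p, size_t len)
{
    for (size_t i = 0; table[i].xmlname; i++) {
        const char *name = table[i].xmlname;
        if (strncmp(name, p, len) == 0 && strlen(name) == len) {
            return table[i].xml;
        }
    }
    return NULL;
}
```

The extra strlen() check is what prevents a request for a prefix such as "core" from matching "core.xml".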

qemu-riscv32 usermode still broken?

2023-09-12 Thread Andreas K. Huettel

Dear all, 

I've once more tried to build up a riscv32 linux install in a qemu-riscv32
usermode systemd-nspawn, and am running into the same problems as some time
ago...

https://dev.gentoo.org/~dilfridge/riscv32/riscv32.tar.xz   (220M)

The problems manifest themselves mostly in bash; if I replace /bin/bash 
with a static x86-64 binary (in the tarball as /bin/bash.amd64), bypassing
qemu, I can make the chroot rebuild itself completely.

https://lists.gnu.org/archive/html/bug-bash/2023-09/msg00119.html
^ Here I'm trying to find out more. 

Bash tests apparently indicate that argv[0] is overwritten, and that
reading through a pipe or from /dev/tty fails or loses data.

Apart from the bash testsuite failing, symptoms are as follows:

* Something seems wrong in the signal handling (?):
--- our package manager (bash/python combo, there bash) hangs reproducibly at 
one point.
--- when I run a console program and try to background it with ctl-z, it hangs
(only the first time per bash instance, it seems)
repeated ctl-c gets me back to the shell, then the program is in the 
background

riscv32 ~ # python
Python 3.11.5 (main, Aug 31 2023, 21:56:30) [GCC 13.2.1 20230826] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
[1]+  Stopped python
^C^C^C^C^C^C^C
riscv32 ~ # ^C
riscv32 ~ # 
riscv32 ~ # jobs
[1]+  Stopped python
riscv32 ~ # fg
python


>>> 

--- make, when building something, seems to always start only one job in 
parallel

Any advice or debugging would be appreciated. 

If we get this running then I can set up regular riscv32 Gentoo stage builds
within a week. [*]

Thanks in advance,
Andreas

PS.
huettel@pinacolada ~ $ /var/lib/machines/riscv32/usr/bin/qemu-riscv32 -version
qemu-riscv32 version 8.1.0
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers


[*] https://www.gentoo.org/downloads/#riscv

-- 
Andreas K. Hüttel
dilfri...@gentoo.org
Gentoo Linux developer
(council, toolchain, base-system, perl, libreoffice)

signature.asc
Description: This is a digitally signed message part.

[PATCH v2 02/11] migration: Let migrate_set_error() take ownership

2023-09-12 Thread Peter Xu

migrate_set_error() used error_copy(), so it always copied the error it
was given. However, that is not the major use case: most callers want to
pass the error to migrate_set_error() without touching it afterwards.

This is evident from the fact that most callers free the error explicitly
right afterwards.  There are a few outliers where the caller still needs
the error; those can use error_copy() explicitly.

Drop the error_report_err() calls at three call sites where they followed
migrate_set_error(): otherwise we would need to do error_copy() for them.
Since the errors are already stored in MigrationState.error, reporting
them again was slightly redundant.

Signed-off-by: Peter Xu 
---
 migration/migration.h|  4 ++--
 migration/channel.c  |  1 -
 migration/migration.c| 25 -
 migration/multifd.c  | 10 --
 migration/postcopy-ram.c |  1 -
 migration/ram.c  |  1 -
 6 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index c390500604..1eefa563c4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
-void migrate_set_error(MigrationState *s, const Error *error);
+void migrate_set_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
@@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void 
*opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(const Error *error);
+void migration_cancel(Error *error);
 
 void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
diff --git a/migration/channel.c b/migration/channel.c
index ca3319a309..48b3f6abd6 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
 }
 }
 migrate_fd_connect(s, error);
-error_free(error);
 }
 
 
diff --git a/migration/migration.c b/migration/migration.c
index 61e91f61af..4b4dba5b12 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -162,7 +162,7 @@ void migration_object_init(void)
 dirty_bitmap_mig_init();
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel(Error *error)
 {
 if (error) {
 migrate_set_error(current_migration, error);
@@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
 object_unref(OBJECT(s));
 }
 
-void migrate_set_error(MigrationState *s, const Error *error)
+/*
+ * Set error for current migration state.  The `error' ownership will be
+ * moved from the caller to MigrationState, so the caller doesn't need to
+ * free the error.
+ *
+ * If the caller still needs to reference the `error' passed in, one should
+ * use error_copy() explicitly.
+ */
+void migrate_set_error(MigrationState *s, Error *error)
 {
 QEMU_LOCK_GUARD(&s->error_mutex);
 if (!s->error) {
-s->error = error_copy(error);
+/* Record the first error triggered */
+s->error = error;
+} else {
+error_free(error);
 }
 }
 
@@ -1235,7 +1246,7 @@ static void migrate_error_free(MigrationState *s)
 }
 }
 
-static void migrate_fd_error(MigrationState *s, const Error *error)
+static void migrate_fd_error(MigrationState *s, Error *error)
 {
 trace_migrate_fd_error(error_get_pretty(error));
 assert(s->to_dst_file == NULL);
@@ -1714,7 +1725,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (!resume_requested) {
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
-migrate_fd_error(s, local_err);
+migrate_fd_error(s, error_copy(local_err));
 error_propagate(errp, local_err);
 return;
 }
@@ -2637,7 +2648,6 @@ static MigThrError migration_detect_error(MigrationState 
*s)
 
 if (local_error) {
 migrate_set_error(s, local_error);
-error_free(local_error);
 }
 
 if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
@@ -2789,7 +2799,6 @@ static MigIterateState 
migration_iteration_run(MigrationState *s)
 qatomic_read(&s->start_postcopy)) {
 if (postcopy_start(s, &local_err)) {
 migrate_set_error(s, local_err);
-error_report_err(local_err);
 }
 return MIG_ITERATE_SKIP;
 }
@@ -3283,7 +3292,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 error_setg(&local_err, "Unable to open return-path for postcopy");
 migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
 migrate_set_error(s, local_err);
-error_report_err(local_err);
 migrate_fd_cleanup(s);
 return;
 }
@@ -3308,7 +3316,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 
 if

[PATCH v2 07/11] migration: Remember num of ramblocks to sync during recovery

2023-09-12 Thread Peter Xu

Instead of only relying on the count of rp_sem, make the counter be part of
RAMState so it can be used in both threads to synchronize on the process.

rp_sem will be further reused as a way to kick the main thread, e.g., on
recovery failures.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/ram.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 814c59c17b..a9541c60b4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -394,6 +394,14 @@ struct RAMState {
 /* Queue of outstanding page requests from the destination */
 QemuMutex src_page_req_mutex;
 QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+/*
+ * This is only used when postcopy is in recovery phase, to communicate
+ * between the migration thread and the return path thread on dirty
+ * bitmap synchronizations.  This field is unused in other stages of
+ * RAM migration.
+ */
+unsigned int postcopy_bmap_sync_requested;
 };
 typedef struct RAMState RAMState;
 
@@ -4135,20 +4143,20 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, 
RAMState *rs)
 {
 RAMBlock *block;
 QEMUFile *file = s->to_dst_file;
-int ramblock_count = 0;
 
 trace_ram_dirty_bitmap_sync_start();
 
+qatomic_set(&rs->postcopy_bmap_sync_requested, 0);
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 qemu_savevm_send_recv_bitmap(file, block->idstr);
 trace_ram_dirty_bitmap_request(block->idstr);
-ramblock_count++;
+qatomic_inc(&rs->postcopy_bmap_sync_requested);
 }
 
 trace_ram_dirty_bitmap_sync_wait();
 
 /* Wait until all the ramblocks' dirty bitmap synced */
-while (ramblock_count--) {
+while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
 qemu_sem_wait(&s->rp_state.rp_sem);
 }
 
@@ -4175,6 +4183,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block, Error **errp)
 unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
 uint64_t local_size = DIV_ROUND_UP(nbits, 8);
 uint64_t size, end_mark;
+RAMState *rs = ram_state;
 
 trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
@@ -4240,6 +4249,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block, Error **errp)
 /* We'll recalculate migration_dirty_pages in ram_state_resume_prepare(). 
*/
 trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
+qatomic_dec(&rs->postcopy_bmap_sync_requested);
+
 /*
  * We succeeded to sync bitmap for current ramblock. If this is
  * the last one to sync, we need to notify the main send thread.
-- 
2.41.0
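
The counter-plus-semaphore scheme described in this patch can be pictured with a small single-threaded model. This is only a sketch under assumptions: SyncState, request_bitmap(), bitmap_reloaded() and wait_all() are hypothetical names, the semaphore is modelled as a plain integer, and the return path thread is played by a callback that runs whenever the waiter would otherwise block. It is not the QEMU code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the counter in RAMState plus rp_sem. */
typedef struct {
    atomic_uint sync_requested; /* bitmaps requested but not yet reloaded */
    unsigned int sem;           /* models a counting semaphore: pending posts */
} SyncState;

/* Migration thread side: one more bitmap has been requested. */
static void request_bitmap(SyncState *st)
{
    atomic_fetch_add(&st->sync_requested, 1);
}

/* Return path side: one bitmap was reloaded; decrement, then always kick. */
static void bitmap_reloaded(SyncState *st)
{
    atomic_fetch_sub(&st->sync_requested, 1);
    st->sem++; /* models a semaphore post */
}

/*
 * Migration thread side: wait until the counter drops to zero.  Whenever
 * the modelled semaphore is empty (a real thread would block here), run
 * other_side() once so the "return path" makes progress.
 * Returns the number of wakeups consumed.
 */
static unsigned int wait_all(SyncState *st, void (*other_side)(SyncState *))
{
    unsigned int wakeups = 0;

    while (atomic_load(&st->sync_requested)) {
        while (st->sem == 0) {
            other_side(st);
        }
        assert(st->sem > 0);
        st->sem--; /* models a semaphore wait returning */
        wakeups++;
    }
    return wakeups;
}
```

The point of the design is that the waiter re-checks the shared counter after every wakeup, so the exit condition lives in shared state rather than in how many times the semaphore happened to be posted.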

[PATCH v2 03/11] migration: Introduce migrate_has_error()

2023-09-12 Thread Peter Xu

Introduce a helper to detect whether MigrationState.error is set for
whatever reason.  Strictly speaking, the error_mutex is not needed here
because we neither dereference nor modify the pointer; the helper still
takes it to follow the locking rule, and a comment says so.

This is preparation work for any thread (e.g. the source return path
thread) to set up errors in a unified way in MigrationState, rather than
relying on its own way of flagging errors (mark_source_rp_bad()).

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/migration.h | 1 +
 migration/migration.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 1eefa563c4..b50e97a098 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -466,6 +466,7 @@ bool  migration_has_all_channels(void);
 uint64_t migrate_max_downtime(void);
 
 void migrate_set_error(MigrationState *s, Error *error);
+bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
diff --git a/migration/migration.c b/migration/migration.c
index 4b4dba5b12..7bd056a4b5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1237,6 +1237,13 @@ void migrate_set_error(MigrationState *s, Error *error)
 }
 }
 
+bool migrate_has_error(MigrationState *s)
+{
+/* The lock is not helpful here, but still follow the rule */
+QEMU_LOCK_GUARD(&s->error_mutex);
+return qatomic_read(&s->error);
+}
+
 static void migrate_error_free(MigrationState *s)
 {
 QEMU_LOCK_GUARD(&s->error_mutex);
-- 
2.41.0

[PATCH v2 04/11] migration: Refactor error handling in source return path

2023-09-12 Thread Peter Xu

rp_state.error was a boolean used to show that an error happened in the
return path thread.  That not only duplicates error reporting
(migrate_set_error), but is also not good enough: we only call
error_report() and set the flag to true, so we can never keep a history of
the exact error and show it in query-migrate.

To make this better, a few things are done:

  - Use error_setg() rather than error_report() across the whole lifecycle
    of the return path thread, keeping the error in an Error*.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when an error occurred in this thread.

  - With the above, there is no need for mark_source_rp_bad(); remove it,
    along with rp_state.error itself.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/migration.h  |   1 -
 migration/ram.h|   5 +-
 migration/migration.c  | 122 +
 migration/ram.c|  41 +++---
 migration/trace-events |   2 +-
 5 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index b50e97a098..48322e909e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -297,7 +297,6 @@ struct MigrationState {
 /* Protected by qemu_file_lock */
 QEMUFile *from_dst_file;
 QemuThread rp_thread;
-bool  error;
 /*
  * We can also check non-zero of rp_thread, but there's no "official"
  * way to do this, so this bool makes it slightly more elegant.
diff --git a/migration/ram.h b/migration/ram.h
index 145c915ca7..14ed666d58 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -51,7 +51,8 @@ uint64_t ram_bytes_total(void);
 void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+ Error **errp);
 void ram_postcopy_migrated_memory_release(MigrationState *ms);
 /* For outgoing discard bitmap */
 void ram_postcopy_send_discard_bitmap(MigrationState *ms);
@@ -71,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
   const char *block_name);
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 7bd056a4b5..825d8a71d4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1431,7 +1431,6 @@ int migrate_init(MigrationState *s, Error **errp)
 s->to_dst_file = NULL;
 s->state = MIGRATION_STATUS_NONE;
 s->rp_state.from_dst_file = NULL;
-s->rp_state.error = false;
 s->mbps = 0.0;
 s->pages_per_second = 0.0;
 s->downtime = 0;
@@ -1754,14 +1753,14 @@ void qmp_migrate_continue(MigrationStatus state, Error 
**errp)
 qemu_sem_post(&s->pause_sem);
 }
 
-/* migration thread support */
-/*
- * Something bad happened to the RP stream, mark an error
- * The caller shall print or trace something to indicate why
- */
-static void mark_source_rp_bad(MigrationState *s)
+void migration_rp_wait(MigrationState *s)
 {
-s->rp_state.error = true;
+qemu_sem_wait(&s->rp_state.rp_sem);
+}
+
+void migration_rp_kick(MigrationState *s)
+{
+qemu_sem_post(&s->rp_state.rp_sem);
 }
 
 static struct rp_cmd_args {
@@ -1785,7 +1784,7 @@ static struct rp_cmd_args {
  * and we don't need to send pages that have already been sent.
  */
 static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
-   ram_addr_t start, size_t len)
+ram_addr_t start, size_t len, Error 
**errp)
 {
 long our_host_ps = qemu_real_host_page_size();
 
@@ -1797,15 +1796,12 @@ static void migrate_handle_rp_req_pages(MigrationState 
*ms, const char* rbname,
  */
 if (!QEMU_IS_ALIGNED(start, our_host_ps) ||
 !QEMU_IS_ALIGNED(len, our_host_ps)) {
-error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
- " len: %zd", __func__, start, len);
-mark_source_rp_bad(ms);
+error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, 
start:"
+   RAM_ADDR_FMT " len: %zd", start, len);
 return;
 }
 
-if (ram_save_queue_pages(rbname, start, len)) {
-mark_source_rp_bad(ms);
-}
+ram_save_queue_pages(rbname, start, len, errp);
 }
 
 /* Return true to retry, false to quit */
@@ -1820,26 +1816,28 @@ static bool 
postcopy_pause_return_path_thread(MigrationState *s)

[PATCH v2 10/11] migration: Allow RECOVER->PAUSED convertion for dest qemu

2023-09-12 Thread Peter Xu

There's a bug on dest: if a double fault is triggered on the dest QEMU (a
network issue during postcopy-recover), we won't set PAUSED correctly
because we assumed we always came from ACTIVE.

Fix that by always overwriting the state to PAUSED.

We could also check for these two states explicitly, but that may be
overkill.  We do the same on the src QEMU, which unconditionally switches
to PAUSED anyway.

Signed-off-by: Peter Xu 
---
 migration/savevm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool 
postcopy_pause_incoming(MigrationIncomingState *mis)
 qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
 }
 
-migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+/* Current state can be either ACTIVE or RECOVER */
+migrate_set_state(&mis->state, mis->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
 /* Notify the fault thread for the invalidated file handle */
-- 
2.41.0
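
Why the old call silently did nothing in the double-fault case can be sketched as follows, assuming that migrate_set_state() behaves like a compare-and-swap on the state field (the patch itself does not show its body). The enum values and set_state() below are hypothetical stand-ins, not QEMU definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the migration states, for illustration only. */
enum {
    ST_POSTCOPY_ACTIVE,
    ST_POSTCOPY_PAUSED,
    ST_POSTCOPY_RECOVER,
};

/*
 * Move *state to new_state only if it currently equals old_state.
 * Returns true if the transition happened.
 */
static bool set_state(atomic_int *state, int old_state, int new_state)
{
    assert(state != NULL);
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}
```

With the state at RECOVER, a call that names ACTIVE as the expected old state fails and leaves the state untouched, while a call that passes the current state as the old state always succeeds. That is the effect of passing mis->state in the patch.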

Re: [PATCH 3/3] iotests: distinguish 'skipped' and 'not run' states

2023-09-12 Thread Denis V. Lunev


On 9/12/23 22:03, Vladimir Sementsov-Ogievskiy wrote:

On 06.09.23 17:09, Denis V. Lunev wrote:

Each particular testcase could be skipped intentionally or accidentally.
For example, the test is not designed for a particular image format, or
it is not run due to a missing library.

The latter case is unwanted in reality. However, the discussion has
revealed that failing the test in such a case would be bad. Thus the
patch tries a different approach: it adds an additional status for
the test case, 'skipped', and binds the intentional cases to that state.


Hmm. Am I missing something, or does this patch only split them, without
making "not run" produce an error? So ./check still reports success
when some tests are "not run"?


The split itself looks correct to me.


The original discussion was about avoiding failures of such tests.
If we let them fail, that could be done much
faster and without dances.

Thus such tests are still counted as skipped, just in another way.
But I would definitely like to concentrate my attention
on something abnormal, i.e. things that should be
run but are not run.

These tests could be missed due to an error (see patch 1)
or due to a missing package. And the reason for the skip
is barely visible within a list of 100+ tests, while
1-2-3 "should not be skipped" tests are clearly in
focus.

Den

[PATCH v2 06/11] qemufile: Always return a verbose error

2023-09-12 Thread Peter Xu

There are a lot of cases where we only have an errno set in last_error but
no detailed error description.  When this happens, try to generate
an error that contains the errno as a descriptive error.

This will be helpful in cases where one relies on the Error*.  E.g., the
migration state only caches an Error* in MigrationState.error.  With this,
we'll display correct error messages in e.g. query-migrate when the error
was only set by qemu_file_set_error().

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/qemu-file.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index eea7171192..3e64e900c9 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -142,15 +142,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks 
*hooks)
  *
  * Return negative error value if there has been an error on previous
  * operations, return 0 if no error happened.
- * Optional, it returns Error* in errp, but it may be NULL even if return value
- * is not 0.
  *
+ * If errp is specified, a verbose error message will be copied over.
  */
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
+if (!f->last_error) {
+return 0;
+}
+
+/* There is an error */
 if (errp) {
-*errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
+if (f->last_error_obj) {
+*errp = error_copy(f->last_error_obj);
+} else {
+error_setg_errno(errp, -f->last_error, "Channel error");
+}
 }
+
 return f->last_error;
 }
 
-- 
2.41.0

[PATCH v2 05/11] migration: Deliver return path file error to migrate state too

2023-09-12 Thread Peter Xu

We already do this for most of the return path thread errors, but not
yet for the I/O errors that happen on the return path qemufile.  Do that too.

Remember to always reset "err", because we no longer own it; otherwise
we're prone to a use-after-free later, after recovery.

Re-export qemu_file_get_error_obj().

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/qemu-file.h | 1 +
 migration/migration.c | 7 +++
 migration/qemu-file.c | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 47015f5201..bc6edc5c39 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -129,6 +129,7 @@ void qemu_file_skip(QEMUFile *f, int size);
 void qemu_file_credit_transfer(QEMUFile *f, size_t size);
 int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
 void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
+int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index 825d8a71d4..216d0e871f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2038,6 +2038,13 @@ out:
 
 res = qemu_file_get_error(rp);
 if (res) {
+/* We have forwarded any error in "err" already, reuse "error" */
+assert(err == NULL);
+/* Try to deliver this file error to migration state */
+qemu_file_get_error_obj(rp, &err);
+migrate_set_error(ms, err);
+err = NULL;
+
 if (res && migration_in_postcopy()) {
 /*
  * Maybe there is something we can do: it looks like a
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 19c33c9985..eea7171192 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -146,7 +146,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks 
*hooks)
  * is not 0.
  *
  */
-static int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
+int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
 if (errp) {
 *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
-- 
2.41.0
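
The ownership rule this patch relies on (the setter takes over the error, so the caller must forget its pointer) can be sketched in isolation. Err, State, err_new(), err_free(), state_set_error() and deliver_error() are hypothetical stand-ins for Error, MigrationState and the QEMU helpers; locking is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *msg; } Err;    /* hypothetical stand-in for Error */
typedef struct { Err *error; } State; /* hypothetical stand-in for MigrationState */

static Err *err_new(const char *msg)
{
    Err *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->msg = malloc(strlen(msg) + 1);
    assert(e->msg != NULL);
    strcpy(e->msg, msg);
    return e;
}

static void err_free(Err *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Takes ownership of err: keep the first error, free any later one. */
static void state_set_error(State *s, Err *err)
{
    if (!s->error) {
        s->error = err;
    } else {
        err_free(err);
    }
}

/* Caller pattern: hand the error over, then forget the local pointer. */
static void deliver_error(State *s, Err **errp)
{
    state_set_error(s, *errp);
    *errp = NULL; /* no longer ours; avoids a later use-after-free */
}
```

Clearing the local pointer right after the hand-over is what keeps a later cleanup path from freeing or reading an object that now belongs to the state, or that was already freed because an earlier error won.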

[PATCH v2 01/11] migration: Display error in query-migrate irrelevant of status

2023-09-12 Thread Peter Xu

Display the error whenever it is set, regardless of the FAILED status.  E.g.,
it may also apply to the PAUSED stage of postcopy, to provide a hint on what
has gone wrong.

The error_mutex seems to have been overlooked when referencing the error;
take it to be safe.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 qapi/migration.json   | 5 ++---
 migration/migration.c | 8 +---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..c241b6d318 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -230,9 +230,8 @@
 # throttled during auto-converge.  This is only present when
 # auto-converge has started throttling guest cpus.  (Since 2.7)
 #
-# @error-desc: the human readable error description string, when
-# @status is 'failed'. Clients should not attempt to parse the
-# error strings.  (Since 2.7)
+# @error-desc: the human readable error description string. Clients
+# should not attempt to parse the error strings.  (Since 2.7)
 #
 # @postcopy-blocktime: total time when all vCPU were blocked during
 # postcopy live migration.  This is only present when the
diff --git a/migration/migration.c b/migration/migration.c
index d61e572742..61e91f61af 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1052,9 +1052,6 @@ static void fill_source_migration_info(MigrationInfo 
*info)
 break;
 case MIGRATION_STATUS_FAILED:
 info->has_status = true;
-if (s->error) {
-info->error_desc = g_strdup(error_get_pretty(s->error));
-}
 break;
 case MIGRATION_STATUS_CANCELLED:
 info->has_status = true;
@@ -1064,6 +1061,11 @@ static void fill_source_migration_info(MigrationInfo 
*info)
 break;
 }
 info->status = state;
+
+QEMU_LOCK_GUARD(&s->error_mutex);
+if (s->error) {
+info->error_desc = g_strdup(error_get_pretty(s->error));
+}
 }
 
 static void fill_destination_migration_info(MigrationInfo *info)
-- 
2.41.0

[PATCH v2 11/11] tests/migration-test: Add a test for postcopy hangs during RECOVER

2023-09-12 Thread Peter Xu

From: Fabiano Rosas 

To do so, create two socket pairs, but make them not provide real data.
Feed those fake sockets to the src/dst QEMUs for recovery so that they enter
the RECOVER stage and stay there.  Test that we can always kick them out and
recover again with the right ports.

This patch is based on Fabiano's version here:

https://lore.kernel.org/r/877cowmdu0@suse.de

Signed-off-by: Fabiano Rosas 
[peterx: write commit message, remove case 1, fix bugs, and more]
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 1b43df5ca7..6105c2da65 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -695,6 +695,7 @@ typedef struct {
 /* Postcopy specific fields */
 void *postcopy_data;
 bool postcopy_preempt;
+bool postcopy_recovery_test_fail;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1357,6 +1358,78 @@ static void test_postcopy_preempt_tls_psk(void)
 }
 #endif
 
+static void wait_for_postcopy_status(QTestState *one, const char *status)
+{
+wait_for_migration_status(one, status,
+  (const char * []) { "failed", "active",
+  "completed", NULL });
+}
+
+static void postcopy_recover_fail(QTestState *from, QTestState *to)
+{
+int ret, pair1[2], pair2[2];
+char c;
+
+/* Create two unrelated socketpairs */
+ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
+g_assert_cmpint(ret, ==, 0);
+
+ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
+g_assert_cmpint(ret, ==, 0);
+
+/*
+ * Give the guests unpaired ends of the sockets, so they'll all be blocked
+ * at reading.  This mimics a wrong channel being established.
+ */
+qtest_qmp_fds_assert_success(from, &pair1[0], 1,
+ "{ 'execute': 'getfd',"
+ "  'arguments': { 'fdname': 'fd-mig' }}");
+qtest_qmp_fds_assert_success(to, &pair2[0], 1,
+ "{ 'execute': 'getfd',"
+ "  'arguments': { 'fdname': 'fd-mig' }}");
+
+/*
+ * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
+ * emulate the 1st byte of a real recovery, but stops from there to
+ * keep dest QEMU in RECOVER.  This is needed so that we can kick off
+ * the recover process on dest QEMU (by triggering the G_IO_IN event).
+ *
+ * NOTE: this trick is not needed on src QEMUs, because src doesn't
+ * rely on an pre-existing G_IO_IN event, so it will always trigger the
+ * upcoming recovery anyway even if it can read nothing.
+ */
+#define QEMU_VM_COMMAND  0x08
+c = QEMU_VM_COMMAND;
+ret = send(pair2[1], &c, 1, 0);
+g_assert_cmpint(ret, ==, 1);
+
+migrate_recover(to, "fd:fd-mig");
+migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
+
+/*
+ * Make sure both QEMU instances will go into RECOVER stage, then test
+ * kicking them out using migrate-pause.
+ */
+wait_for_postcopy_status(from, "postcopy-recover");
+wait_for_postcopy_status(to, "postcopy-recover");
+
+/*
+ * This would be issued by the admin upon noticing the hang, we should
+ * make sure we're able to kick this out.
+ */
+migrate_pause(from);
+wait_for_postcopy_status(from, "postcopy-paused");
+
+/* Do the same test on dest */
+migrate_pause(to);
+wait_for_postcopy_status(to, "postcopy-paused");
+
+close(pair1[0]);
+close(pair1[1]);
+close(pair2[0]);
+close(pair2[1]);
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
 QTestState *from, *to;
@@ -1396,6 +1469,15 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
   (const char * []) { "failed", "active",
   "completed", NULL });
 
+if (args->postcopy_recovery_test_fail) {
+/*
+ * Test when a wrong socket specified for recover, and then the
+ * ability to kick it out, and continue with a correct socket.
+ */
+postcopy_recover_fail(from, to);
+/* continue with a good recovery */
+}
+
 /*
  * Create a new socket to emulate a new channel that is different
  * from the broken migration channel; tell the destination to
@@ -1435,6 +1517,15 @@ static void test_postcopy_recovery_compress(void)
 test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_recovery_double_fail(void)
+{
+MigrateCommon args = {
+.postcopy_recovery_test_fail = true,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
 {
@@ -2825,6 +2916,9 @@ int main(int argc, char **argv)

[PATCH v2 09/11] migration: Allow network to fail even during recovery

2023-09-12 Thread Peter Xu

Normally the postcopy recover phase should only exist for a very short
period: the time during which QEMU is trying to recover from an
interrupted postcopy migration, when a handshake is carried out
to continue the procedure with state changes from PAUSED -> RECOVER ->
POSTCOPY_ACTIVE again.

Here the RECOVER phase should be very short; it happens right after the
admin specifies a new, working network link for QEMU to reconnect to the
dest QEMU.

However, there can still be cases where the channel breaks in this small
RECOVER window.

If that happens, with the current code there's no way for the src QEMU to
get kicked out of the RECOVER stage.  There's no way either to retry the
recovery on another channel once established.

This patch allows the RECOVER phase to fail too.  We're mostly
ready, with just some small things missing, e.g. properly kicking the main
migration thread out when it sleeps on rp_sem and we find that we're at the
RECOVER stage.  When this happens, the RECOVER itself fails, and we
roll back to the PAUSED stage.  Then the user can retry another round of
recovery.

To make it even stronger, teach the QMP command migrate-pause to explicitly
kick the src/dst QEMU out when needed, so even if for some reason the
migration thread didn't get kicked out already by a failing return-path
thread, the admin can also kick it out.

This will be a super, super corner case, but still try to cover that.

One can try to test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:1
  (b) socat tcp-listen:1,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

  (a)  (b)
  src -> /tmp/src.sock -> tcp:1 -> /tmp/dst.sock -> dst

Then to make QEMU hang at the RECOVER stage, one can do the following:

  (1) stop the postcopy using QMP command migrate-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) leaves the src QEMU stuck in the RECOVER stage,
without a way to kick QEMU out or continue the postcopy again.  After
this patch, (4) quickly fails and QEMU bounces back to the PAUSED stage.

The admin can also kick QEMU from (4) into PAUSED using
migrate-pause when needed.

After bouncing back to PAUSED stage, one can recover again.
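
The state changes described in this message can be summarized as a tiny transition table. This is a sketch under assumptions: PcState, pc_is_alive(), pc_transition_allowed() and pc_move() are hypothetical names, and only the three postcopy states named in the text are modelled; pc_is_alive() mirrors the migration_postcopy_is_alive() helper that the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the postcopy states named in the text. */
typedef enum { PC_ACTIVE, PC_PAUSED, PC_RECOVER } PcState;

/* True while postcopy is running or trying to resume. */
static bool pc_is_alive(PcState s)
{
    return s == PC_ACTIVE || s == PC_RECOVER;
}

/* Transitions described in the commit message, including the new one. */
static bool pc_transition_allowed(PcState from, PcState to)
{
    switch (from) {
    case PC_ACTIVE:
        return to == PC_PAUSED;                    /* network failure or pause */
    case PC_PAUSED:
        return to == PC_RECOVER;                   /* admin asks to recover */
    case PC_RECOVER:
        return to == PC_ACTIVE || to == PC_PAUSED; /* success, or fail again */
    }
    return false;
}

/* Apply a transition, insisting that it is an allowed one. */
static PcState pc_move(PcState from, PcState to)
{
    assert(pc_transition_allowed(from, to));
    return to;
}
```

The new edge is RECOVER -> PAUSED: a failed recovery goes back to the paused state, from which another recovery attempt can start.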

Reported-by: Xiaohui Li 
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Signed-off-by: Peter Xu 
---
 migration/migration.h |  8 --
 migration/migration.c | 62 +++
 migration/ram.c   |  4 ++-
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 311334c701..7e61e2ece7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,6 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
@@ -522,8 +523,11 @@ void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
-/* Migration thread waiting for return path thread. */
-void migration_rp_wait(MigrationState *s);
+/*
+ * Migration thread waiting for return path thread.  Return non-zero if an
+ * error is detected.
+ */
+int migration_rp_wait(MigrationState *s);
 /*
  * Kick the migration thread waiting for return path messages.  NOTE: the
  * name can be slightly confusing (when read as "kick the rp thread"), just
diff --git a/migration/migration.c b/migration/migration.c
index b958ac8743..97d4b234d2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,6 +1349,17 @@ bool migration_in_postcopy(void)
 }
 }
 
+bool migration_postcopy_is_alive(int state)
+{
+switch (state) {
+case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+case MIGRATION_STATUS_POSTCOPY_RECOVER:
+return true;
+default:
+return false;
+}
+}
+
 bool migration_in_postcopy_after_devices(MigrationState *s)
 {
 return migration_in_postcopy() && s->postcopy_after_devices;
@@ -1556,18 +1567,31 @@ void qmp_migrate_pause(Error **errp)
 MigrationIncomingState *mis = migration_incoming_get_current();
 int ret;
 
-if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+if (migration_postcopy_is_alive(ms->state)) {
 /* Source side, during postcopy */
+Error *error = NULL;
+
+/* Tell the core migration that we're pausing */
+error_setg(&error, "Postcopy migration is paused by the user");
+migrate_set_error(ms, error);
+
 qemu_mutex_lock(&ms->qemu_file_lock);
 ret

[PATCH v2 08/11] migration: Add migration_rp_wait|kick()

2023-09-12 Thread Peter Xu

Add simple wrappers for rp_sem, one for wait() and one for kick(), to make
it clearer how the semaphore is used.  They are prepared to be reused for
other things as well.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/migration.h | 15 +++
 migration/migration.c |  4 ++--
 migration/ram.c   | 16 +++-
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 48322e909e..311334c701 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -304,6 +304,12 @@ struct MigrationState {
  * be cleared in the rp_thread!
  */
 bool  rp_thread_created;
+/*
+ * Used to synchronize between migration main thread and return
+ * path thread.  The migration thread can wait() on this sem, while
+ * other threads (e.g., return path thread) can kick it using a
+ * post().
+ */
 QemuSemaphore rp_sem;
 /*
  * We post to this when we got one PONG from dest. So far it's an
@@ -516,4 +522,13 @@ void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
+/* Migration thread waiting for return path thread. */
+void migration_rp_wait(MigrationState *s);
+/*
+ * Kick the migration thread waiting for return path messages.  NOTE: the
+ * name can be slightly confusing (when read as "kick the rp thread"), just
+ * to remember the target is always the migration thread.
+ */
+void migration_rp_kick(MigrationState *s);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 216d0e871f..b958ac8743 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1846,7 +1846,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
 /* Notify send thread that time to continue send pages */
-qemu_sem_post(&s->rp_state.rp_sem);
+migration_rp_kick(s);
 
 return 0;
 }
@@ -2514,7 +2514,7 @@ static int postcopy_resume_handshake(MigrationState *s)
 qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
 while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-qemu_sem_wait(&s->rp_state.rp_sem);
+migration_rp_wait(s);
 }
 
 if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index a9541c60b4..b5f6d65d84 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4157,7 +4157,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, 
RAMState *rs)
 
 /* Wait until all the ramblocks' dirty bitmap synced */
 while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-qemu_sem_wait(&s->rp_state.rp_sem);
+migration_rp_wait(s);
 }
 
 trace_ram_dirty_bitmap_sync_complete();
@@ -4165,11 +4165,6 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, 
RAMState *rs)
 return 0;
 }
 
-static void ram_dirty_bitmap_reload_notify(MigrationState *s)
-{
-qemu_sem_post(&s->rp_state.rp_sem);
-}
-
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -4252,10 +4247,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block, Error **errp)
 qatomic_dec(&rs->postcopy_bmap_sync_requested);
 
 /*
- * We succeeded to sync bitmap for current ramblock. If this is
- * the last one to sync, we need to notify the main send thread.
+ * We succeeded to sync bitmap for current ramblock. Always kick the
+ * migration thread to check whether all requested bitmaps are
+ * reloaded.  NOTE: it's racy to only kick when requested==0, because
+ * we don't know whether the migration thread may still be increasing
+ * it.
  */
-ram_dirty_bitmap_reload_notify(s);
+migration_rp_kick(s);
 
 ret = 0;
 out:
-- 
2.41.0

[PATCH v2 00/11] migration: Better error handling in rp thread, allow failures in recover

2023-09-12 Thread Peter Xu

v2:
- Patch "migration: Let migrate_set_error() take ownership"
  - Fix three new call sites that use migrate_set_error(), by dropping the
    error_report_err() later on.  [Fabiano]
- Patch "migration: Allow network to fail even during recovery"
  - Fixed wrong check for dest QEMU
- Patch "migration: Allow RECOVER->PAUSED convertion for dest qemu"
  - Newly added
- Patch "tests/migration-test: Add a test for postcopy hangs during RECOVER"
  - Newly added, based on Fabiano's test case provided

v1: https://lore.kernel.org/r/20230829214235.69309-1-pet...@redhat.com

Again, if this conflicts with anything I can rebase.

This series allows better error handling in the postcopy return path thread.
It also enables double failures to happen during postcopy recovery, IOW,
one can fail again right during the RECOVER phase on both sides.

Big thanks to Fabiano for providing a base test case for the double
failure case.

Please have a look, thanks.

Fabiano Rosas (1):
  tests/migration-test: Add a test for postcopy hangs during RECOVER

Peter Xu (10):
  migration: Display error in query-migrate irrelevant of status
  migration: Let migrate_set_error() take ownership
  migration: Introduce migrate_has_error()
  migration: Refactor error handling in source return path
  migration: Deliver return path file error to migrate state too
  qemufile: Always return a verbose error
  migration: Remember num of ramblocks to sync during recovery
  migration: Add migration_rp_wait|kick()
  migration: Allow network to fail even during recovery
  migration: Allow RECOVER->PAUSED convertion for dest qemu

 qapi/migration.json  |   5 +-
 migration/migration.h|  25 +++-
 migration/qemu-file.h|   1 +
 migration/ram.h  |   5 +-
 migration/channel.c  |   1 -
 migration/migration.c| 231 +++
 migration/multifd.c  |  10 +-
 migration/postcopy-ram.c |   1 -
 migration/qemu-file.c|  17 ++-
 migration/ram.c  |  77 +++-
 migration/savevm.c   |   3 +-
 tests/qtest/migration-test.c |  94 ++
 migration/trace-events   |   2 +-
 13 files changed, 342 insertions(+), 130 deletions(-)

-- 
2.41.0

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery

2023-09-12 Thread Peter Xu

On Tue, Sep 12, 2023 at 04:05:27PM -0400, Peter Xu wrote:
> Thanks for contributing the test case!
> 
> Do you want me to pick this patch up (with modifications) and repost
> together with this series?  It'll also work if you want to send a separate
> test patch.  Let me know!

It turns out I found more bugs while reworking that test case based on
yours.  E.g., currently we'll crash the dest QEMU if we really fail during
recovery, because we are missing this change:

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool 
postcopy_pause_incoming(MigrationIncomingState *mis)
 qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
 }

-migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+/* Current state can be either ACTIVE or RECOVER */
+migrate_set_state(&mis->state, mis->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);

 /* Notify the fault thread for the invalidated file handle */

So in the double-failure case we will not move from RECOVER to PAUSED, and
we'll crash right afterwards because we skip the semaphore wait:

while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {  <--- not true, continue
    qemu_sem_wait(&mis->postcopy_pause_sem_dst);
}

Now within the new test case I am 100% sure I can kick both sides into
RECOVER state (one trick still needed along the way; the test patch will
tell soon), then kick them back, then proceed with a successful migration.

Let me just repost everything with the new test case.

Thanks,

-- 
Peter Xu
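
For readers following the thread, the effect of the hard-coded old state
described above can be shown with a small standalone model. This is only a
sketch: it assumes that migrate_set_state() switches the state only when the
current state equals the expected old state (a compare-and-swap), and the
enum values and function names below are hypothetical stand-ins, not the
QEMU definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of the incoming-side migration states. */
enum mig_state {
    STATE_POSTCOPY_ACTIVE,
    STATE_POSTCOPY_PAUSED,
    STATE_POSTCOPY_RECOVER,
};

/*
 * Assumed semantics: switch to new_state only if the current state equals
 * old_state. Returns true when the switch happened.
 */
static bool set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/* Old behaviour: always expects ACTIVE, so a failure in RECOVER is a no-op. */
static bool pause_expect_active(_Atomic int *state)
{
    return set_state(state, STATE_POSTCOPY_ACTIVE, STATE_POSTCOPY_PAUSED);
}

/* Behaviour after the change: use the current state as the expected one. */
static bool pause_from_current(_Atomic int *state)
{
    return set_state(state, atomic_load(state), STATE_POSTCOPY_PAUSED);
}
```

Starting from STATE_POSTCOPY_RECOVER, pause_expect_active() leaves the state
untouched, so a later "while state == PAUSED" loop would fall through at
once; pause_from_current() reaches PAUSED from either starting state.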

[PATCH 4/4] target/ppc: Add migration support for BHRB

2023-09-12 Thread Glenn Miles

Adds migration support for Branch History Rolling
Buffer (BHRB) internal state.

Signed-off-by: Glenn Miles 
---
 target/ppc/machine.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index b195fb4dc8..89146969c8 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -314,6 +314,7 @@ static int cpu_post_load(void *opaque, int version_id)
 
 if (tcg_enabled()) {
 pmu_mmcr01a_updated(env);
+hreg_bhrb_filter_update(env);
 }
 
 return 0;
@@ -670,6 +671,27 @@ static const VMStateDescription vmstate_compat = {
 }
 };
 
+#ifdef TARGET_PPC64
+static bool bhrb_needed(void *opaque)
+{
+PowerPCCPU *cpu = opaque;
+return (cpu->env.flags & POWERPC_FLAG_BHRB) != 0;
+}
+
+static const VMStateDescription vmstate_bhrb = {
+.name = "cpu/bhrb",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = bhrb_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINTTL(env.bhrb_num_entries, PowerPCCPU),
+VMSTATE_UINTTL(env.bhrb_offset, PowerPCCPU),
+VMSTATE_UINT64_ARRAY(env.bhrb, PowerPCCPU, BHRB_MAX_NUM_ENTRIES),
+VMSTATE_END_OF_LIST()
+}
+};
+#endif
+
 const VMStateDescription vmstate_ppc_cpu = {
 .name = "cpu",
 .version_id = 5,
@@ -716,6 +738,7 @@ const VMStateDescription vmstate_ppc_cpu = {
 #ifdef TARGET_PPC64
 &vmstate_tm,
 &vmstate_slb,
+&vmstate_bhrb,
 #endif /* TARGET_PPC64 */
 &vmstate_tlb6xx,
 &vmstate_tlbemb,
-- 
2.31.1

[PULL v1 1/1] tpm: fix crash when FD >= 1024

2023-09-12 Thread Stefan Berger

From: Marc-André Lureau

Replace select() with poll() to fix a crash when QEMU has a large number
of FDs.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2020133

Cc: qemu-sta...@nongnu.org
Fixes: 56a3c24ffc ("tpm: Probe for connected TPM 1.2 or TPM 2")
Signed-off-by: Marc-André Lureau
Reviewed-by: Michael Tokarev 
Reviewed-by: Stefan Berger 
Signed-off-by: Stefan Berger 
---
 backends/tpm/tpm_util.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/backends/tpm/tpm_util.c b/backends/tpm/tpm_util.c
index a6e6d3e72f..1856589c3b 100644
--- a/backends/tpm/tpm_util.c
+++ b/backends/tpm/tpm_util.c
@@ -112,12 +112,8 @@ static int tpm_util_request(int fd,
 void *response,
 size_t responselen)
 {
-fd_set readfds;
+GPollFD fds[1] = { {.fd = fd, .events = G_IO_IN } };
 int n;
-struct timeval tv = {
-.tv_sec = 1,
-.tv_usec = 0,
-};
 
 n = write(fd, request, requestlen);
 if (n < 0) {
@@ -127,11 +123,8 @@ static int tpm_util_request(int fd,
 return -EFAULT;
 }
 
-FD_ZERO(&readfds);
-FD_SET(fd, &readfds);
-
 /* wait for a second */
-n = select(fd + 1, &readfds, NULL, NULL, &tv);
+n = RETRY_ON_EINTR(g_poll(fds, 1, 1000));
 if (n != 1) {
 return -errno;
 }
-- 
2.41.0
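
As background for the patch above: select() tracks descriptors in an fd_set
bitmap of FD_SETSIZE bits (commonly 1024), so FD_SET() on a larger descriptor
writes out of bounds, whereas poll() takes the descriptor by value. The patch
itself uses GLib's g_poll() with RETRY_ON_EINTR; the sketch below shows the
same idea with plain POSIX poll() instead. The helper name wait_readable() is
hypothetical and not part of QEMU.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait up to timeout_ms for an event on fd.
 * Returns 1 if an event is pending (data, hang-up or error), 0 on timeout,
 * and -errno on failure. There is no FD_SETSIZE limit on the value of fd.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    /* Retry when a signal interrupts the call (the timeout starts over). */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return -errno;
    }
    return n;
}
```

A caller would write its request to fd, call wait_readable(fd, 1000) and only
read the response when the result is 1.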

[PULL v1 0/1] Merge tpm 2023/09/12 v1

2023-09-12 Thread Stefan Berger

Hello!

  This PR contains a fix for the case where the TPM file descriptor is >= 1024
and the select() call cannot be used.

Regards,
   Stefan

The following changes since commit 9ef497755afc252fb8e060c9ea6b0987abfd20b6:

  Merge tag 'pull-vfio-20230911' of https://github.com/legoater/qemu into 
staging (2023-09-11 09:13:08 -0400)

are available in the Git repository at:

  https://github.com/stefanberger/qemu-tpm.git tags/pull-tpm-2023-09-12-1

for you to fetch changes up to 8557de964dfaae5c6eea09d488f85f4aa6cb3ce7:

  tpm: fix crash when FD >= 1024 (2023-09-12 17:30:12 -0400)


Marc-André Lureau (1):
  tpm: fix crash when FD >= 1024

 backends/tpm/tpm_util.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

-- 
2.41.0

Re: [PATCH v2 4/4] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-09-12 Thread Stefan Hajnoczi

On Fri, Sep 01, 2023 at 07:01:37PM +0200, Kevin Wolf wrote:
> On 24.08.2023 at 01:59, Stefan Hajnoczi wrote:
> > Use qemu_get_current_aio_context() in mixed wrappers and coroutine
> > wrappers so that code runs in the caller's AioContext instead of moving
> > to the BlockDriverState's AioContext. This change is necessary for the
> > multi-queue block layer where any thread can call into the block layer.
> > 
> > Most wrappers are IO_CODE where it's safe to use the current AioContext
> > nowadays. BlockDrivers and the core block layer use their own locks and
> > no longer depend on the AioContext lock for thread-safety.
> > 
> > The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
> > AioContext is safe because this code is only called with the BQL held
> > from the main loop thread.
> > 
> > The output of qemu-iotests 051 is sensitive to event loop activity.
> > Update the output because the monitor BH runs at a different time,
> > causing prompts to be printed differently in the output.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> The update for 051 is actually missing from this patch, and so the test
> fails.
> 
> I missed the dependency on your qio_channel series, so 281 ran into an
> abort() for me (see below for the stack trace). I expect that the other
> series actually fixes this, but this kind of interaction wasn't really
> obvious. How did you make sure that there aren't other places that don't
> like this change?

Only by running qemu-iotests.

Stefan

> 
> Kevin
> 
> (gdb) bt
> #0  0x7f8ef0d2fe5c in __pthread_kill_implementation () at /lib64/libc.so.6
> #1  0x7f8ef0cdfa76 in raise () at /lib64/libc.so.6
> #2  0x7f8ef0cc97fc in abort () at /lib64/libc.so.6
> #3  0x7f8ef0cc971b in _nl_load_domain.cold () at /lib64/libc.so.6
> #4  0x7f8ef0cd8656 in  () at /lib64/libc.so.6
> #5  0x55fd19da6af3 in qio_channel_yield (ioc=0x7f8eeb70, 
> condition=G_IO_IN) at ../io/channel.c:583
> #6  0x55fd19e0382f in nbd_read_eof (bs=0x55fd1b681350, 
> ioc=0x7f8eeb70, buffer=0x55fd1b680da0, size=4, errp=0x0) at 
> ../nbd/client.c:1454
> #7  0x55fd19e03612 in nbd_receive_reply (bs=0x55fd1b681350, 
> ioc=0x7f8eeb70, reply=0x55fd1b680da0, errp=0x0) at ../nbd/client.c:1491
> #8  0x55fd19e40575 in nbd_receive_replies (s=0x55fd1b680b00, cookie=1) at 
> ../block/nbd.c:461
> #9  0x55fd19e3fec4 in nbd_co_do_receive_one_chunk
> (s=0x55fd1b680b00, cookie=1, only_structured=true, 
> request_ret=0x7f8ee8bff91c, qiov=0x7f8ee8bfff10, payload=0x7f8ee8bff9d0, 
> errp=0x7f8ee8bff910) at ../block/nbd.c:844
> #10 0x55fd19e3fd55 in nbd_co_receive_one_chunk
> (s=0x55fd1b680b00, cookie=1, only_structured=true, 
> request_ret=0x7f8ee8bff91c, qiov=0x7f8ee8bfff10, reply=0x7f8ee8bff9f0, 
> payload=0x7f8ee8bff9d0, errp=0x7f8ee8bff910)
> at ../block/nbd.c:925
> #11 0x55fd19e3f7b5 in nbd_reply_chunk_iter_receive (s=0x55fd1b680b00, 
> iter=0x7f8ee8bff9d8, cookie=1, qiov=0x7f8ee8bfff10, reply=0x7f8ee8bff9f0, 
> payload=0x7f8ee8bff9d0)
> at ../block/nbd.c:1008
> #12 0x55fd19e3ecf7 in nbd_co_receive_cmdread_reply (s=0x55fd1b680b00, 
> cookie=1, offset=0, qiov=0x7f8ee8bfff10, request_ret=0x7f8ee8bffad4, 
> errp=0x7f8ee8bffac8) at ../block/nbd.c:1074
> #13 0x55fd19e3c804 in nbd_client_co_preadv (bs=0x55fd1b681350, offset=0, 
> bytes=131072, qiov=0x7f8ee8bfff10, flags=0) at ../block/nbd.c:1258
> #14 0x55fd19e33547 in bdrv_driver_preadv (bs=0x55fd1b681350, offset=0, 
> bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
> ../block/io.c:1005
> #15 0x55fd19e2c8bb in bdrv_aligned_preadv (child=0x55fd1c282d90, 
> req=0x7f8ee8bffd90, offset=0, bytes=131072, align=1, qiov=0x7f8ee8bfff10, 
> qiov_offset=0, flags=0) at ../block/io.c:1398
> #16 0x55fd19e2bf7d in bdrv_co_preadv_part (child=0x55fd1c282d90, 
> offset=0, bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
> ../block/io.c:1815
> #17 0x55fd19e176bd in blk_co_do_preadv_part (blk=0x55fd1c269c00, 
> offset=0, bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
> ../block/block-backend.c:1344
> #18 0x55fd19e17588 in blk_co_preadv (blk=0x55fd1c269c00, offset=0, 
> bytes=131072, qiov=0x7f8ee8bfff10, flags=0) at ../block/block-backend.c:1369
> #19 0x55fd19e17514 in blk_co_pread (blk=0x55fd1c269c00, offset=0, 
> bytes=131072, buf=0x55fd1c16d000, flags=0) at ../block/block-backend.c:1358
> #20 0x55fd19ddcc91 in blk_co_pread_entry (opaque=0x7ffc4bbdd9a0) at 
> block/block-gen.c:1519
> #21 0x55fd19feb2a1 in coroutine_trampoline (i0=460835072, i1=22013) at 
> ../util/coroutine-ucontext.c:177
> #22 0x7f8ef0cf5c20 in __start_context () at /lib64/libc.so.6
> 
> io/channel.c:583 is this:
> 
> 577 void coroutine_fn qio_channel_yield(QIOChannel *ioc,
> 578 GIOCondition condition)
> 579 {
> 580 AioContext *ioc_ctx = ioc->ctx ?: qemu_get_aio_context();
> 581
> 582 assert(qemu_in_coroutine());
> 583

[PATCH 2/4] target/ppc: Add recording of taken branches to BHRB

2023-09-12 Thread Glenn Miles

This commit continues adding support for the Branch History
Rolling Buffer (BHRB) as is provided starting with the P8
processor and continuing with its successors.  This commit
is limited to the recording and filtering of taken branches.

The following changes were made:

  - Added a BHRB buffer for storing branch instruction and
target addresses for taken branches
  - Renamed gen_update_cfar to gen_update_branch_history and
added a 'target' parameter to hold the branch target
address and 'inst_type' parameter to use for filtering
  - Added a combination of jit-time and run-time checks to
gen_update_branch_history for determining if a branch
should be recorded
  - Added TCG code to gen_update_branch_history that stores
data to the BHRB and updates the BHRB offset.
  - Added BHRB resource initialization and reset functions
  - Enabled functionality for P8, P9 and P10 processors.

Signed-off-by: Glenn Miles 
---
 target/ppc/cpu.h   |  18 +++-
 target/ppc/cpu_init.c  |  41 -
 target/ppc/helper_regs.c   |  32 +++
 target/ppc/helper_regs.h   |   1 +
 target/ppc/power8-pmu.c|   2 +
 target/ppc/power8-pmu.h|   7 ++
 target/ppc/translate.c | 114 +++--
 target/ppc/translate/branch-impl.c.inc |   2 +-
 8 files changed, 205 insertions(+), 12 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 20ae1466a5..bda1afb700 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -454,8 +454,9 @@ FIELD(MSR, LE, MSR_LE, 1)
 #define MMCR2_UREG_MASK (MMCR2_FC1P0 | MMCR2_FC2P0 | MMCR2_FC3P0 | \
  MMCR2_FC4P0 | MMCR2_FC5P0 | MMCR2_FC6P0)
 
-#define MMCRA_BHRBRDPPC_BIT(26)/* BHRB Recording Disable */
-
+#define MMCRA_BHRBRDPPC_BIT(26) /* BHRB Recording Disable */
+#define MMCRA_IFM_MASK  PPC_BITMASK(32, 33) /* BHRB Instruction Filtering */
+#define MMCRA_IFM_SHIFT PPC_BIT_NR(33)
 
 #define MMCR1_EVT_SIZE 8
 /* extract64() does a right shift before extracting */
@@ -682,6 +683,8 @@ enum {
 POWERPC_FLAG_SMT  = 0x0040,
 /* Using "LPAR per core" mode  (as opposed to per-thread)*/
 POWERPC_FLAG_SMT_1LPAR = 0x0080,
+/* Has BHRB */
+POWERPC_FLAG_BHRB  = 0x0100,
 };
 
 /*
@@ -1110,6 +1113,9 @@ DEXCR_ASPECT(PHIE, 6)
 #define PPC_CPU_OPCODES_LEN  0x40
 #define PPC_CPU_INDIRECT_OPCODES_LEN 0x20
 
+#define BHRB_MAX_NUM_ENTRIES_LOG2 (5)
+#define BHRB_MAX_NUM_ENTRIES  (1 << BHRB_MAX_NUM_ENTRIES_LOG2)
+
 struct CPUArchState {
 /* Most commonly used resources during translated code execution first */
 target_ulong gpr[32];  /* general purpose registers */
@@ -1196,6 +1202,14 @@ struct CPUArchState {
 int dcache_line_size;
 int icache_line_size;
 
+/* Branch History Rolling Buffer (BHRB) resources */
+target_ulong bhrb_num_entries;
+target_ulong bhrb_base;
+target_ulong bhrb_filter;
+target_ulong bhrb_offset;
+target_ulong bhrb_offset_mask;
+uint64_t bhrb[BHRB_MAX_NUM_ENTRIES];
+
 /* These resources are used during exception processing */
 /* CPU model definition */
 target_ulong msr_mask;
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 568f9c3b88..19d7505a73 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6100,6 +6100,28 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 pcc->l1_icache_size = 0x8000;
 }
 
+static void bhrb_init_state(CPUPPCState *env, target_long num_entries_log2)
+{
+if (env->flags & POWERPC_FLAG_BHRB) {
+if (num_entries_log2 > BHRB_MAX_NUM_ENTRIES_LOG2) {
+num_entries_log2 = BHRB_MAX_NUM_ENTRIES_LOG2;
+}
+env->bhrb_num_entries = 1 << num_entries_log2;
+env->bhrb_base = (target_long)&env->bhrb[0];
+env->bhrb_offset_mask = (env->bhrb_num_entries * sizeof(uint64_t)) - 1;
+}
+}
+
+static void bhrb_reset_state(CPUPPCState *env)
+{
+if (env->flags & POWERPC_FLAG_BHRB) {
+env->bhrb_offset = 0;
+env->bhrb_filter = 0;
+memset(env->bhrb, 0, sizeof(env->bhrb));
+}
+}
+
+#define POWER8_BHRB_ENTRIES_LOG2 5
 static void init_proc_POWER8(CPUPPCState *env)
 {
 /* Common Registers */
@@ -6141,6 +6163,8 @@ static void init_proc_POWER8(CPUPPCState *env)
 env->dcache_line_size = 128;
 env->icache_line_size = 128;
 
+bhrb_init_state(env, POWER8_BHRB_ENTRIES_LOG2);
+
 /* Allocate hardware IRQ controller */
 init_excp_POWER8(env);
 ppcPOWER7_irq_init(env_archcpu(env));
@@ -6241,7 +6265,8 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
  POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
  POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR |
- POWERPC_FLAG_VSX | POWERPC_FLAG_TM;
+ POWERPC_FLAG_VSX | POWERPC_FLAG_TM |
+
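
The commit message above describes storing branch data into the BHRB and
then updating the BHRB offset. The patch does this with generated TCG ops;
the plain C sketch below only illustrates the underlying power-of-two ring
buffer with a byte-offset mask. It assumes one 64-bit entry per recorded
value, and the struct and function names are hypothetical, not the ones used
in target/ppc.

```c
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES_LOG2 5
#define RING_ENTRIES      (1u << RING_ENTRIES_LOG2)

/* Hypothetical stand-in for the BHRB fields added to CPUArchState. */
struct branch_ring {
    uint64_t entries[RING_ENTRIES];
    uint64_t offset;       /* byte offset of the next slot to write */
    uint64_t offset_mask;  /* (number of entries * entry size) - 1 */
};

static void ring_init(struct branch_ring *r)
{
    memset(r, 0, sizeof(*r));
    r->offset_mask = RING_ENTRIES * sizeof(uint64_t) - 1;
}

/* Store one value, then advance the byte offset and wrap it with the mask. */
static void ring_record(struct branch_ring *r, uint64_t value)
{
    r->entries[r->offset / sizeof(uint64_t)] = value;
    r->offset = (r->offset + sizeof(uint64_t)) & r->offset_mask;
}
```

Because the buffer size in bytes is a power of two, the wrap-around needs a
single AND and no comparison, which keeps the generated code short.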

[RFC v6 5/9] target/i386: Add support for native library calls

2023-09-12 Thread Yeqi Fu

This commit introduces support for native library calls on the
i386 target. When encountering special instructions reserved
for native calls, this commit extracts the function name and
generates the corresponding native call.

Signed-off-by: Yeqi Fu 
---
 configs/targets/i386-linux-user.mak   |  1 +
 configs/targets/x86_64-linux-user.mak |  1 +
 target/i386/tcg/translate.c   | 38 +++
 3 files changed, 40 insertions(+)

diff --git a/configs/targets/i386-linux-user.mak 
b/configs/targets/i386-linux-user.mak
index 5b2546a430..2d8bca8f93 100644
--- a/configs/targets/i386-linux-user.mak
+++ b/configs/targets/i386-linux-user.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=i386
 TARGET_SYSTBL_ABI=i386
 TARGET_SYSTBL=syscall_32.tbl
 TARGET_XML_FILES= gdb-xml/i386-32bit.xml
+CONFIG_NATIVE_CALL=y
diff --git a/configs/targets/x86_64-linux-user.mak 
b/configs/targets/x86_64-linux-user.mak
index 9ceefbb615..a53b017454 100644
--- a/configs/targets/x86_64-linux-user.mak
+++ b/configs/targets/x86_64-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_BASE_ARCH=i386
 TARGET_SYSTBL_ABI=common,64
 TARGET_SYSTBL=syscall_64.tbl
 TARGET_XML_FILES= gdb-xml/i386-64bit.xml
+CONFIG_NATIVE_CALL=y
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 90c7b32f36..a7a8377832 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -33,6 +33,7 @@
 #include "helper-tcg.h"
 
 #include "exec/log.h"
+#include "native/native.h"
 
 #define HELPER_H "helper.h"
 #include "exec/helper-info.c.inc"
@@ -3075,6 +3076,37 @@ static void gen_cmpxchg16b(DisasContext *s, CPUX86State 
*env, int modrm)
 }
 #endif
 
+static void gen_native_call(CPUState *cpu, DisasContext *s, CPUX86State *env)
+{
+#ifdef CONFIG_USER_ONLY
+uint32_t func_tmp;
+char *func_name;
+TCGv ret = tcg_temp_new();
+TCGv arg1 = tcg_temp_new();
+TCGv arg2 = tcg_temp_new();
+TCGv arg3 = tcg_temp_new();
+x86_ldub_code(env, s);
+func_tmp = x86_ldl_code(env, s);
+func_name = g2h(cpu, s->pc + func_tmp);
+#ifdef TARGET_X86_64
+tcg_gen_mov_tl(arg1, cpu_regs[R_EDI]);
+tcg_gen_mov_tl(arg2, cpu_regs[R_ESI]);
+tcg_gen_mov_tl(arg3, cpu_regs[R_EDX]);
+#else
+tcg_gen_addi_tl(arg1, cpu_regs[R_ESP], 4);
+gen_op_ld_v(s, MO_UL, arg1, arg1);
+tcg_gen_addi_tl(arg2, cpu_regs[R_ESP], 8);
+gen_op_ld_v(s, MO_UL, arg2, arg2);
+tcg_gen_addi_tl(arg3, cpu_regs[R_ESP], 12);
+gen_op_ld_v(s, MO_UL, arg3, arg3);
+#endif
+if (!gen_native_call_tl(func_name, ret, arg1, arg2, arg3)) {
+gen_illegal_opcode(s);
+}
+tcg_gen_mov_tl(cpu_regs[R_EAX], ret);
+#endif
+}
+
 /* convert one instruction. s->base.is_jmp is set if the translation must
be stopped. Return the next pc value */
 static bool disas_insn(DisasContext *s, CPUState *cpu)
@@ -6810,6 +6842,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0x1d0 ... 0x1fe:
 disas_insn_new(s, cpu, b);
 break;
+case 0x1ff:
+if (native_bypass_enabled()) {
+gen_native_call(cpu, s, env);
+break;
+}
+goto unknown_op;
 default:
 goto unknown_op;
 }
-- 
2.34.1

[RFC v6 2/9] build: Implement libnative library and the build machinery for libnative

2023-09-12 Thread Yeqi Fu

This commit implements a shared library, where native functions are
rewritten as special instructions. At runtime, user programs load
the shared library, and special instructions are executed when
native functions are called.

Signed-off-by: Yeqi Fu 
---
 Makefile|  2 ++
 common-user/native/Makefile.include |  8 +
 common-user/native/Makefile.target  | 22 +
 common-user/native/libnative.S  | 51 +
 configure   | 39 ++
 5 files changed, 122 insertions(+)
 create mode 100644 common-user/native/Makefile.include
 create mode 100644 common-user/native/Makefile.target
 create mode 100644 common-user/native/libnative.S

diff --git a/Makefile b/Makefile
index 5d48dfac18..6f6147b40f 100644
--- a/Makefile
+++ b/Makefile
@@ -182,6 +182,8 @@ SUBDIR_MAKEFLAGS=$(if $(V),,--no-print-directory --quiet)
 
 include $(SRC_PATH)/tests/Makefile.include
 
+include $(SRC_PATH)/common-user/native/Makefile.include
+
 all: recurse-all
 
 ROMS_RULES=$(foreach t, all clean distclean, $(addsuffix /$(t), $(ROMS)))
diff --git a/common-user/native/Makefile.include 
b/common-user/native/Makefile.include
new file mode 100644
index 00..65b10fddac
--- /dev/null
+++ b/common-user/native/Makefile.include
@@ -0,0 +1,8 @@
+.PHONY: build-native
+build-native: $(NATIVE_TARGETS:%=build-native-library-%)
+$(NATIVE_TARGETS:%=build-native-library-%): build-native-library-%:
+   $(call quiet-command, \
+   $(MAKE) -C common-user/native/$* $(SUBDIR_MAKEFLAGS), \
+   "BUILD","$* native library")
+
+all: build-native
diff --git a/common-user/native/Makefile.target 
b/common-user/native/Makefile.target
new file mode 100644
index 00..65d90102e2
--- /dev/null
+++ b/common-user/native/Makefile.target
@@ -0,0 +1,22 @@
+# -*- Mode: makefile -*-
+#
+# Library for native calls
+#
+
+all:
+-include ../../../config-host.mak
+-include config-target.mak
+
+CFLAGS+=-shared -D $(TARGET_NAME)
+LDFLAGS+=
+
+SRC = $(SRC_PATH)/common-user/native/libnative.S
+LIBNATIVE = libnative.so
+
+all: $(LIBNATIVE)
+
+$(LIBNATIVE): $(SRC)
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(EXTRA_NATIVE_CALL_FLAGS) $< -o $@ 
$(LDFLAGS)
+
+clean:
+   rm -f $(LIBNATIVE)
diff --git a/common-user/native/libnative.S b/common-user/native/libnative.S
new file mode 100644
index 00..bc51dabedf
--- /dev/null
+++ b/common-user/native/libnative.S
@@ -0,0 +1,51 @@
+.macro special_instr sym
+#if defined(__i386__)
+ ud0 \sym-1f, %eax; 1:
+#elif defined(__x86_64__)
+ ud0 \sym(%rip), %eax
+#elif defined(__arm__) || defined(__aarch64__)
+ hlt 0x
+1:  .word   \sym - 1b
+#elif defined(__mips__)
+ syscall 0x
+1:  .word   \sym - 1b
+#else
+# error
+#endif
+.endm
+
+.macro ret_instr
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
+ ret
+#elif defined(__arm__)
+ bx lr
+#elif defined(__mips__)
+ jr $ra
+#else
+# error
+#endif
+.endm
+
+/* Symbols of native functions */
+
+.macro define_function name
+ .text
+\name:
+ special_instr 9f
+ ret_instr
+ .globl \name
+ .type \name, %function
+ .size \name, . - \name
+
+ .section .rodata
+9:  .asciz  "\name"
+.endm
+
+define_function memcmp
+define_function memcpy
+define_function memset
+define_function strcat
+define_function strcmp
+define_function strcpy
+define_function strncmp
+define_function strncpy
diff --git a/configure b/configure
index 7a1e463d9c..de533b27a2 100755
--- a/configure
+++ b/configure
@@ -1826,6 +1826,45 @@ if test "$tcg" = "enabled"; then
 fi
 )
 
+# common-user/native configuration
+(mkdir -p common-user/native
+
+native_targets=
+for target in $target_list; do
+  case $target in
+*-softmmu)
+continue
+;;
+  esac
+
+  # native call is only supported on these architectures
+  arch=${target%%-*}
+  config_target_mak=common-user/native/${target}/config-target.mak
+  case $arch in
+i386|x86_64|arm|aarch64|mips|mips64)
+  if test -f cross-build/${target}/config-target.mak; then
+mkdir -p "common-user/native/${target}"
+ln -srf cross-build/${target}/config-target.mak "$config_target_mak"
+if test $arch = arm; then
+  echo "EXTRA_NATIVE_CALL_FLAGS=-marm" >> "$config_target_mak"
+fi
+if test $arch = $cpu || \
+  { test $arch = i386 && test $cpu = x86_64; } || \
+  { test $arch = arm && test $cpu = aarch64; } || \
+  { test $arch = mips && test $cpu = mips64; }; then
+  echo "LD_PREFIX=/" >> "$config_target_mak"
+fi
+echo "LIBNATIVE=$PWD/common-user/native/${target}/libnative.so" >> 
"$config_target_mak"
+ln -sf ${source_path}/common-user/native/Makefile.target 
common-user/native/${target}/Makefile
+native_targets="$native_targets ${target}"
+  fi
+;;
+  esac
+done
+
+echo

[RFC v6 9/9] docs/user: Add doc for native library calls

2023-09-12 Thread Yeqi Fu

Signed-off-by: Yeqi Fu 
---
 docs/user/index.rst|  1 +
 docs/user/native_calls.rst | 91 ++
 2 files changed, 92 insertions(+)
 create mode 100644 docs/user/native_calls.rst

diff --git a/docs/user/index.rst b/docs/user/index.rst
index 782d27cda2..d3fc9b7af1 100644
--- a/docs/user/index.rst
+++ b/docs/user/index.rst
@@ -12,3 +12,4 @@ processes compiled for one CPU on another CPU.
:maxdepth: 2
 
main
+   native_calls
diff --git a/docs/user/native_calls.rst b/docs/user/native_calls.rst
new file mode 100644
index 00..0f8c2273a3
--- /dev/null
+++ b/docs/user/native_calls.rst
@@ -0,0 +1,91 @@
+Native Library Calls Optimization
+=================================
+
+Description
+-----------
+
+Executing a program under QEMU's user mode subjects the entire
+program, including all library calls, to translation. It's important
+to understand that many of these library functions are optimized
+specifically for the guest architecture. Therefore, their
+translation might not yield the most efficient execution.
+
+When the semantics of a library function are well defined, we can
+capitalize on this by substituting the translated version with a call
+to the native equivalent function.
+
+To achieve tangible results, focus should be given to functions such
+as memory-related ('mem*') and string-related ('str*') functions.
+These subsets of functions often have the most significant effect
+on overall performance, making them optimal candidates for
+optimization.
+
+Implementation
+--------------
+
+By writing the name of a specific library into the /etc/ld.so.preload
+file, the dynamic linker will prioritize loading this library before
+any others. If this library contains functions that share names with
+functions in other libraries, the ones in the specified library will
+take precedence.
+
+In order to bypass certain native libraries, we have developed a
+shared library and re-implemented the native functions within it
+as a special instruction sequence. By making dynamic modifications
+to the /etc/ld.so.preload file, the shared library is loaded into
+the user program. Consequently, when the user program calls a native
+function, it executes the corresponding special instruction sequence.
+During execution, the QEMU translator identifies these special
+instructions and executes the corresponding native functions
+accordingly.
+
+These special instructions are implemented using
+architecture-specific unused or invalid opcodes, ensuring that they
+do not conflict with existing instructions.
+
+
+i386 and x86_64
+---------------
+An unused instruction is utilized to mark a native call.
+
+arm and aarch64
+---------------
+HLT is an invalid instruction for userspace programs, and is used to
+mark a native call.
+
+mips and mips64
+---------------
+The syscall instruction contains 20 unused bits, which are typically
+set to 0. These bits can be used to store non-zero data,
+distinguishing them from a regular syscall instruction.
+
+Usage
+-----
+
+1. Install cross-compilation tools
+
+Cross-compilation tools are required to build the shared libraries
+that can hook the necessary library functions. For example, a viable
+command on Ubuntu is:
+
+::
+
+apt install gcc-arm-linux-gnueabihf gcc-aarch64-linux-gnu \
+gcc-mips-linux-gnu gcc-mips64-linux-gnuabi64
+
+
+2. Locate the compiled libnative.so
+
+After compilation, the libnative.so file can be found in the
+``./build/common-user/native/-linux-user`` directory.
+
+3. Run the program with the ``--native-bypass`` option
+
+To run your program with native library bypass, use the
+``--native-bypass`` option to import libnative.so:
+
+::
+
+qemu- --native-bypass \
+./build/common-user/native/-linux-user/libnative.so ./program
+
-- 
2.34.1
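
The "mips and mips64" section of the document above relies on the 20-bit
code field of the MIPS SYSCALL instruction (bits 25..6, between the SPECIAL
opcode and the 0x0c function field). The sketch below shows how such a field
can be packed and tested. The helper names are hypothetical, and the rule
"non-zero code means native call" is an assumption made for illustration;
the value actually used by the series is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIPS_SYSCALL_FUNCT 0x0000000cu  /* SPECIAL opcode 0, funct 0x0c */
#define MIPS_SYSCALL_FIXED 0xfc00003fu  /* opcode and funct bit fields */
#define MIPS_CODE_SHIFT    6
#define MIPS_CODE_MASK     0xfffffu     /* 20-bit code field */

/* Build a SYSCALL instruction word carrying the given code value. */
static uint32_t mips_syscall_encode(uint32_t code)
{
    return ((code & MIPS_CODE_MASK) << MIPS_CODE_SHIFT) | MIPS_SYSCALL_FUNCT;
}

/* True if the fixed opcode and funct fields match SYSCALL. */
static bool mips_is_syscall(uint32_t insn)
{
    return (insn & MIPS_SYSCALL_FIXED) == MIPS_SYSCALL_FUNCT;
}

/* Extract the 20-bit code field. */
static uint32_t mips_syscall_code(uint32_t insn)
{
    return (insn >> MIPS_CODE_SHIFT) & MIPS_CODE_MASK;
}

/* Assumed marker rule: a SYSCALL whose code field is non-zero. */
static bool mips_is_marked_syscall(uint32_t insn)
{
    return mips_is_syscall(insn) && mips_syscall_code(insn) != 0;
}
```

A regular syscall assembles to 0x0000000c, so it is never mistaken for a
marked one under this rule.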

[RFC v6 3/9] linux-user: Implement native-bypass option support

2023-09-12 Thread Yeqi Fu

This commit implements support for the native-bypass option
in linux-user. By utilizing this functionality, the specified
shared library can be loaded into the user program. This is
achieved by dynamically modifying the /etc/ld.so.preload file,
enabling the user program to load the shared library effortlessly.

Signed-off-by: Yeqi Fu 
---
 include/native/native.h |  7 ++
 linux-user/main.c   | 20 +++
 linux-user/syscall.c| 55 +
 3 files changed, 82 insertions(+)
 create mode 100644 include/native/native.h

diff --git a/include/native/native.h b/include/native/native.h
new file mode 100644
index 00..12462a261e
--- /dev/null
+++ b/include/native/native.h
@@ -0,0 +1,7 @@
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_NATIVE_CALL)
+extern char *native_lib_path;
+/* Check if the native-bypass option is enabled. */
+#define native_bypass_enabled() (native_lib_path != NULL)
+#else
+#define native_bypass_enabled() false
+#endif
diff --git a/linux-user/main.c b/linux-user/main.c
index dba67ffa36..4c1d515944 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -60,6 +60,11 @@
 #include "semihosting/semihost.h"
 #endif
 
+#if defined(CONFIG_NATIVE_CALL)
+#include "native/native.h"
+char *native_lib_path;
+#endif
+
 #ifndef AT_FLAGS_PRESERVE_ARGV0
 #define AT_FLAGS_PRESERVE_ARGV0_BIT 0
 #define AT_FLAGS_PRESERVE_ARGV0 (1 << AT_FLAGS_PRESERVE_ARGV0_BIT)
@@ -293,6 +298,17 @@ static void handle_arg_set_env(const char *arg)
 free(r);
 }
 
+#if defined(CONFIG_NATIVE_CALL)
+static void handle_arg_native_bypass(const char *arg)
+{
+if (!g_file_test(arg, G_FILE_TEST_IS_REGULAR)) {
+fprintf(stderr, "native library %s does not exist\n", arg);
+exit(EXIT_FAILURE);
+}
+native_lib_path = g_strdup(arg);
+}
+#endif
+
 static void handle_arg_unset_env(const char *arg)
 {
 char *r, *p, *token;
@@ -522,6 +538,10 @@ static const struct qemu_argument arg_table[] = {
  "",   "Generate a /tmp/perf-${pid}.map file for perf"},
 {"jitdump","QEMU_JITDUMP", false, handle_arg_jitdump,
  "",   "Generate a jit-${pid}.dump file for perf"},
+#if defined(CONFIG_NATIVE_CALL)
+{"native-bypass", "QEMU_NATIVE_BYPASS", true, handle_arg_native_bypass,
+ "",   "native bypass for library calls"},
+#endif
 {NULL, NULL, false, NULL, NULL, NULL}
 };
 
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..7034f58373 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -143,6 +143,7 @@
 #include "fd-trans.h"
 #include "tcg/tcg.h"
 #include "cpu_loop-common.h"
+#include "native/native.h"
 
 #ifndef CLONE_IO
 #define CLONE_IO0x8000  /* Clone io context */
@@ -8503,6 +8504,40 @@ static int open_hardware(CPUArchState *cpu_env, int fd)
 }
 #endif
 
+#if defined(CONFIG_NATIVE_CALL)
+static int is_ld_so_preload(const char *filename, const char *entry)
+{
+if (native_bypass_enabled() && !strcmp(filename, entry)) {
+return 1;
+}
+return 0;
+}
+
+/* This function is only called when the "native-bypass" option is provided. */
+static int open_ld_so_preload(CPUArchState *cpu_env, int fd)
+{
+FILE *fp;
+char *line = NULL;
+size_t len = 0;
+ssize_t read;
+
+dprintf(fd, "%s\n", native_lib_path);
+fp = fopen("/etc/ld.so.preload", "r");
+if (fp == NULL) {
+return 0;
+}
+
+while ((read = getline(&line, &len, fp)) != -1) {
+dprintf(fd, "%s", line);
+}
+
+free(line);
+fclose(fp);
+
+return 0;
+}
+#endif
+
 int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
 int flags, mode_t mode, bool safe)
 {
@@ -8527,6 +8562,9 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
const char *pathname,
 #endif
 #if defined(TARGET_M68K)
 { "/proc/hardware", open_hardware, is_proc },
+#endif
+#if defined(CONFIG_NATIVE_CALL)
+{ "/etc/ld.so.preload", open_ld_so_preload, is_ld_so_preload },
 #endif
 { NULL, NULL, NULL }
 };
@@ -9523,6 +9561,11 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 return -TARGET_EFAULT;
 }
 ret = get_errno(access(path(p), arg2));
+if (ret != 0 && native_bypass_enabled()) {
+if (strcmp(p, "/etc/ld.so.preload") == 0 && arg2 == R_OK) {
+ret = 0;
+}
+}
 unlock_user(p, arg1, 0);
 return ret;
 #endif
@@ -9532,6 +9575,12 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 return -TARGET_EFAULT;
 }
 ret = get_errno(faccessat(arg1, p, arg3, 0));
+if (ret != 0 && native_bypass_enabled()) {
+if (strcmp(p, "/etc/ld.so.preload") == 0 && arg1 == AT_FDCWD &&
+arg3 == R_OK) {
+ret = 0;
+}
+}
 unlock_user(p, arg2, 0);
 return ret;
 #endif
@@

[PATCH 4/4] target/ppc: Add migration support for BHRB

2023-09-12 Thread Glenn Miles

Add migration support for the internal state of the Branch History
Rolling Buffer (BHRB).

Signed-off-by: Glenn Miles 
---
 target/ppc/machine.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index b195fb4dc8..89146969c8 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -314,6 +314,7 @@ static int cpu_post_load(void *opaque, int version_id)
 
 if (tcg_enabled()) {
 pmu_mmcr01a_updated(env);
+hreg_bhrb_filter_update(env);
 }
 
 return 0;
@@ -670,6 +671,27 @@ static const VMStateDescription vmstate_compat = {
 }
 };
 
+#ifdef TARGET_PPC64
+static bool bhrb_needed(void *opaque)
+{
+PowerPCCPU *cpu = opaque;
+return (cpu->env.flags & POWERPC_FLAG_BHRB) != 0;
+}
+
+static const VMStateDescription vmstate_bhrb = {
+.name = "cpu/bhrb",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = bhrb_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINTTL(env.bhrb_num_entries, PowerPCCPU),
+VMSTATE_UINTTL(env.bhrb_offset, PowerPCCPU),
+VMSTATE_UINT64_ARRAY(env.bhrb, PowerPCCPU, BHRB_MAX_NUM_ENTRIES),
+VMSTATE_END_OF_LIST()
+}
+};
+#endif
+
 const VMStateDescription vmstate_ppc_cpu = {
 .name = "cpu",
 .version_id = 5,
@@ -716,6 +738,7 @@ const VMStateDescription vmstate_ppc_cpu = {
 #ifdef TARGET_PPC64
 &vmstate_tm,
 &vmstate_slb,
+&vmstate_bhrb,
 #endif /* TARGET_PPC64 */
 &vmstate_tlb6xx,
 &vmstate_tlbemb,
-- 
2.31.1

[RFC v6 8/9] tests/tcg/multiarch: Add nativecall.c test

2023-09-12 Thread Yeqi Fu

Introduce a new test that checks native calls work as expected.
The test cases are cross-compiled as dynamically linked binaries,
so running them requires passing the appropriate ELF interpreter
prefix to QEMU.

Signed-off-by: Yeqi Fu 
---
 tests/tcg/multiarch/Makefile.target |  32 ++
 tests/tcg/multiarch/native/nativecall.c | 132 
 2 files changed, 164 insertions(+)
 create mode 100644 tests/tcg/multiarch/native/nativecall.c

diff --git a/tests/tcg/multiarch/Makefile.target b/tests/tcg/multiarch/Makefile.target
index 43bddeaf21..8bad8ac0d5 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -12,7 +12,9 @@ VPATH+= $(MULTIARCH_SRC)
 MULTIARCH_SRCS =  $(notdir $(wildcard $(MULTIARCH_SRC)/*.c))
 ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET))
 VPATH += $(MULTIARCH_SRC)/linux
+VPATH  += $(MULTIARCH_SRC)/native
 MULTIARCH_SRCS += $(notdir $(wildcard $(MULTIARCH_SRC)/linux/*.c))
+MULTIARCH_SRCS += $(notdir $(wildcard $(MULTIARCH_SRC)/native/*.c))
 endif
 MULTIARCH_TESTS = $(MULTIARCH_SRCS:.c=)
 
@@ -138,5 +140,35 @@ run-plugin-semiconsole-with-%:
 TESTS += semihosting semiconsole
 endif
 
+nativecall: LDFLAGS+=-ldl
+nativecall: CFLAGS+=-D_GNU_SOURCE -fPIE
+nativecall: nativecall.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(filter-out -static,$(LDFLAGS))
+
+ifneq ($(LD_PREFIX),)
+ifneq ($(LIBNATIVE),)
+run-nativecall: nativecall
+   $(call run-test, $<, $(QEMU) -L $(LD_PREFIX) \
+   --native-bypass $(LIBNATIVE) $<, "nativecall")
+
+run-plugin-nativecall-with-%:
+   $(call run-test, $@, $(QEMU) $(QEMU_OPTS) \
+   -L $(LD_PREFIX) --native-bypass $(LIBNATIVE) \
+   -plugin $(PLUGIN_LIB)/$(call extract-plugin,$@)$(PLUGIN_ARGS) \
+$(call strip-plugin,$<) 2> $<.err, \
+$< with $*)
+else
+run-nativecall: nativecall
+   $(call skip-test, $<, "no native library found")
+run-plugin-nativecall-with-%:
+   $(call skip-test, $<, "no native library found")
+endif
+else
+run-nativecall: nativecall
+   $(call skip-test, $<, "no elf interpreter prefix found")
+run-plugin-nativecall-with-%:
+   $(call skip-test, $<, "no elf interpreter prefix found")
+endif
+
 # Update TESTS
 TESTS += $(MULTIARCH_TESTS)
diff --git a/tests/tcg/multiarch/native/nativecall.c b/tests/tcg/multiarch/native/nativecall.c
new file mode 100644
index 00..de18718c61
--- /dev/null
+++ b/tests/tcg/multiarch/native/nativecall.c
@@ -0,0 +1,132 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void compare_memory(const void *a, const void *b, size_t n)
+{
+const unsigned char *p1 = a;
+const unsigned char *p2 = b;
+for (size_t i = 0; i < n; i++) {
+assert(p1[i] == p2[i]);
+}
+}
+
+void test_memcpy(char *src)
+{
+char dest[2000];
+memcpy(dest, src, 2000);
+compare_memory(dest, src, 2000);
+}
+
+void test_strncpy(char *src)
+{
+char dest[2000];
+strncpy(dest, src, 2000);
+compare_memory(dest, src, 2000);
+}
+
+void test_strcpy(char *src)
+{
+char dest[2000];
+strcpy(dest, src);
+compare_memory(dest, src, 2000);
+}
+
+void test_strcat()
+{
+char src[30] = "Hello, ";
+char dest[] = "world!";
+char str[] = "Hello, world!";
+strcat(src, dest);
+compare_memory(src, str, 13);
+}
+
+void test_memcmp(char *str1, char *str2, char *str3)
+{
+int result1 = memcmp(str1, str2, 3);
+int result2 = memcmp(str1, str3, 3);
+int result3 = memcmp(str3, str1, 3);
+assert(result1 == 0);
+assert(result2 < 0);
+assert(result3 > 0);
+}
+
+void test_strncmp(char *str1, char *str2, char *str3)
+{
+int result1 = strncmp(str1, str2, 3);
+int result2 = strncmp(str1, str3, 3);
+int result3 = strncmp(str3, str1, 3);
+assert(result1 == 0);
+assert(result2 < 0);
+assert(result3 > 0);
+}
+
+void test_strcmp(char *str1, char *str2, char *str3)
+{
+int result1 = strcmp(str1, str2);
+int result2 = strcmp(str1, str3);
+int result3 = strcmp(str3, str1);
+assert(result1 == 0);
+assert(result2 < 0);
+assert(result3 > 0);
+}
+
+void test_memset(char *buffer)
+{
+memset(buffer, 'A', 2000 - 1);
+for (int i = 0; i < 2000 - 1; i++) {
+assert(buffer[i] == 'A');
+}
+}
+
+void test_libnative()
+{
+Dl_info info;
+void *memcpy_addr = (void *)memcpy;
+if (dladdr(memcpy_addr, &info) != 0) {
+assert(strstr(info.dli_fname, "libnative.so") != NULL);
+}
+}
+
+/*
+ * When executing execv, an error may occur stating that the shared library
+ * cannot be preloaded.
+ */
+void test_execv(const char *cmd)
+{
+char *argv[4];
+argv[0] = (char *)"/bin/sh";
+argv[1] = (char *)"-c";
+argv[2] = (char *)cmd;
+argv[3] = NULL;
+execv("/bin/sh", argv);
+}
+
+int main()
+{
+char buf[2000];
+for (int i = 0; i < 2000 -

[RFC v6 6/9] target/mips: Add support for native library calls

2023-09-12 Thread Yeqi Fu

This commit introduces support for native library calls on the
mips target. When encountering special instructions reserved
for native calls, this commit extracts the function name and
generates the corresponding native call.
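
The name extraction described above can be sketched in plain C. Going by gen_native_call in the diff below, the word that follows the reserved instruction holds an offset, and adding that offset to the word's own address gives the location of the function-name string. The sketch is only an illustration under stated assumptions: `load_u32`, `native_name_at` and the flat `image` buffer are hypothetical stand-ins for the translator load and g2h(), and the word is read in host byte order rather than guest byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit word stored in host byte order. */
static uint32_t load_u32(const uint8_t *buf, size_t pos)
{
    uint32_t v;
    memcpy(&v, buf + pos, sizeof(v));
    return v;
}

/*
 * Hypothetical helper: 'pc' is the position of the word that follows the
 * reserved instruction. The word is an offset relative to its own
 * position, and the NUL-terminated function name lives at pc + offset.
 */
static const char *native_name_at(const uint8_t *image, size_t pc)
{
    uint32_t rel = load_u32(image, pc);
    return (const char *)(image + pc + rel);
}
```

With an image whose word at position 4 holds the value 8 and whose bytes from position 12 hold "memcpy", `native_name_at(image, 4)` points at that string.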

Signed-off-by: Yeqi Fu 
---
 configs/targets/mips-linux-user.mak   |  1 +
 configs/targets/mips64-linux-user.mak |  1 +
 target/mips/tcg/translate.c   | 30 ++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/configs/targets/mips-linux-user.mak b/configs/targets/mips-linux-user.mak
index b4569a9893..fa005d487a 100644
--- a/configs/targets/mips-linux-user.mak
+++ b/configs/targets/mips-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_ABI_MIPSO32=y
 TARGET_SYSTBL_ABI=o32
 TARGET_SYSTBL=syscall_o32.tbl
 TARGET_BIG_ENDIAN=y
+CONFIG_NATIVE_CALL=y
diff --git a/configs/targets/mips64-linux-user.mak b/configs/targets/mips64-linux-user.mak
index d2ff509a11..ecfe6bcf73 100644
--- a/configs/targets/mips64-linux-user.mak
+++ b/configs/targets/mips64-linux-user.mak
@@ -4,3 +4,4 @@ TARGET_BASE_ARCH=mips
 TARGET_SYSTBL_ABI=n64
 TARGET_SYSTBL=syscall_n64.tbl
 TARGET_BIG_ENDIAN=y
+CONFIG_NATIVE_CALL=y
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 74af91e4f5..b2d60e83d9 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -31,6 +31,7 @@
 #include "trace.h"
 #include "disas/disas.h"
 #include "fpu_helper.h"
+#include "native/native.h"
 
 #define HELPER_H "helper.h"
 #include "exec/helper-info.c.inc"
@@ -13484,10 +13485,32 @@ static void decode_opc_special_legacy(CPUMIPSState *env, DisasContext *ctx)
 }
 }
 
+static void gen_native_call(DisasContext *ctx, CPUMIPSState *env)
+{
+#ifdef CONFIG_USER_ONLY
+char *func_name;
+uint32_t func_tmp;
+TCGv arg1 = tcg_temp_new();
+TCGv arg2 = tcg_temp_new();
+TCGv arg3 = tcg_temp_new();
+TCGv ret = tcg_temp_new();
+tcg_gen_mov_tl(arg1, cpu_gpr[4]);
+tcg_gen_mov_tl(arg2, cpu_gpr[5]);
+tcg_gen_mov_tl(arg3, cpu_gpr[6]);
+ctx->base.pc_next += 4;
+func_tmp = translator_ldl(env, &ctx->base, ctx->base.pc_next);
+func_name = g2h(env_cpu(env), ctx->base.pc_next + func_tmp);
+if (!gen_native_call_tl(func_name, ret, arg1, arg2, arg3)) {
+gen_reserved_instruction(ctx);
+}
+tcg_gen_mov_tl(cpu_gpr[2], ret);
+#endif
+}
+
 static void decode_opc_special(CPUMIPSState *env, DisasContext *ctx)
 {
 int rs, rt, rd, sa;
-uint32_t op1;
+uint32_t op1, sig;
 
 rs = (ctx->opcode >> 21) & 0x1f;
 rt = (ctx->opcode >> 16) & 0x1f;
@@ -13583,6 +13606,11 @@ static void decode_opc_special(CPUMIPSState *env, DisasContext *ctx)
 #endif
 break;
 case OPC_SYSCALL:
+sig = (ctx->opcode) >> 6;
+if ((sig == 0x) && native_bypass_enabled()) {
+gen_native_call(ctx, env);
+break;
+}
 generate_exception_end(ctx, EXCP_SYSCALL);
 break;
 case OPC_BREAK:
-- 
2.34.1

[RFC v6 7/9] target/arm: Add support for native library calls

2023-09-12 Thread Yeqi Fu

This commit introduces support for native library calls on the
arm target. When encountering special instructions reserved
for native calls, this commit extracts the function name and
generates the corresponding native call.

Signed-off-by: Yeqi Fu 
---
 configs/targets/aarch64-linux-user.mak |  1 +
 configs/targets/arm-linux-user.mak |  1 +
 target/arm/tcg/translate-a64.c | 32 ++
 target/arm/tcg/translate.c | 29 +++
 target/arm/tcg/translate.h |  5 
 5 files changed, 68 insertions(+)

diff --git a/configs/targets/aarch64-linux-user.mak b/configs/targets/aarch64-linux-user.mak
index ba8bc5fe3f..5a8fd98cd9 100644
--- a/configs/targets/aarch64-linux-user.mak
+++ b/configs/targets/aarch64-linux-user.mak
@@ -4,3 +4,4 @@ TARGET_XML_FILES= gdb-xml/aarch64-core.xml 
gdb-xml/aarch64-fpu.xml gdb-xml/aarch
 TARGET_HAS_BFLT=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
+CONFIG_NATIVE_CALL=y
diff --git a/configs/targets/arm-linux-user.mak b/configs/targets/arm-linux-user.mak
index 7f5d65794c..f934fb82da 100644
--- a/configs/targets/arm-linux-user.mak
+++ b/configs/targets/arm-linux-user.mak
@@ -5,3 +5,4 @@ TARGET_XML_FILES= gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml 
gdb-xml/arm-vfp3.xml
 TARGET_HAS_BFLT=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
+CONFIG_NATIVE_CALL=y
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3baab6aa60..00b69e9c24 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -25,6 +25,7 @@
 #include "arm_ldst.h"
 #include "semihosting/semihost.h"
 #include "cpregs.h"
+#include "native/native.h"
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
@@ -2400,6 +2401,10 @@ static bool trans_HLT(DisasContext *s, arg_i *a)
  * it is required for halting debug disabled: it will UNDEF.
  * Secondly, "HLT 0xf000" is the A64 semihosting syscall instruction.
  */
+if (native_bypass_enabled() && (a->imm == 0x)) {
+s->native_call_status = true;
+return true;
+}
 if (semihosting_enabled(s->current_el == 0) && a->imm == 0xf000) {
 gen_exception_internal_insn(s, EXCP_SEMIHOST);
 } else {
@@ -13392,6 +13397,26 @@ void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &op);
 }
 
+static void gen_native_call(CPUState *cpu, DisasContext *s, CPUARMState *env)
+{
+#ifdef CONFIG_USER_ONLY
+TCGv_i64 arg1 = tcg_temp_new_i64();
+TCGv_i64 arg2 = tcg_temp_new_i64();
+TCGv_i64 arg3 = tcg_temp_new_i64();
+TCGv_i64 ret = tcg_temp_new_i64();
+uint32_t func_tmp = translator_ldl_swap(env, &s->base, s->base.pc_next,
+bswap_code(s->sctlr_b));
+char *func_name = g2h(cpu, s->base.pc_next + func_tmp);
+tcg_gen_mov_i64(arg1, cpu_reg(s, 0));
+tcg_gen_mov_i64(arg2, cpu_reg(s, 1));
+tcg_gen_mov_i64(arg3, cpu_reg(s, 2));
+if (!gen_native_call_i64(func_name, ret, arg1, arg2, arg3)) {
+unallocated_encoding(s);
+}
+tcg_gen_mov_i64(cpu_reg(s, 0), ret);
+#endif
+}
+
 /* Crypto three-reg SHA512
  *  31   21 20  16 15  14  13 12  11  10  95 40
  * +---+--+---+---+-++--+--+
@@ -13950,6 +13975,13 @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
 uint64_t pc = s->base.pc_next;
 uint32_t insn;
 
+if (native_bypass_enabled() && s->native_call_status) {
+gen_native_call(cpu, s, env);
+s->base.pc_next = pc + 4;
+s->native_call_status = false;
+return;
+}
+
 /* Singlestep exceptions have the highest priority. */
 if (s->ss_active && !s->pstate_ss) {
 /* Singlestep state is Active-pending.
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 13c88ba1b9..7a5a0f9444 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -28,6 +28,7 @@
 #include "semihosting/semihost.h"
 #include "cpregs.h"
 #include "exec/helper-proto.h"
+#include "native/native.h"
 
 #define HELPER_H "helper.h"
 #include "exec/helper-info.c.inc"
@@ -1125,6 +1126,23 @@ void gen_lookup_tb(DisasContext *s)
 s->base.is_jmp = DISAS_EXIT;
 }
 
+static void gen_native_call(CPUState *cpu, DisasContext *dc, CPUARMState *env)
+{
+#ifdef CONFIG_USER_ONLY
+TCGv_i32 arg1 = load_reg(dc, 0);
+TCGv_i32 arg2 = load_reg(dc, 1);
+TCGv_i32 arg3 = load_reg(dc, 2);
+TCGv_i32 ret = tcg_temp_new_i32();
+uint32_t func_tmp =
+arm_ldl_code(env, &dc->base, dc->base.pc_next, dc->sctlr_b);
+char *func_name = g2h(cpu, dc->base.pc_next + func_tmp);
+if (!gen_native_call_i32(func_name, ret, arg1, arg2, arg3)) {
+unallocated_encoding(dc);
+}
+store_reg(dc, 0, ret);
+#endif
+}
+
 static inline void gen_hlt(DisasContext *s, int imm)
 {
 /* HLT. This has two purposes.
@@

[RFC v6 4/9] tcg: Add tcg opcodes and helpers for native library calls

2023-09-12 Thread Yeqi Fu

This commit implements TCG opcodes and helpers for native library
calls. A table stores the parameter types and return type of each
native library function. Only three kinds of value matter: the two
base sizes, int and intptr_t, and pointers, for which tcg_gen_g2h
and tcg_gen_h2g perform the address conversion.
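
A minimal sketch of the argument handling described above, written as ordinary C rather than TCG ops. All names here (`ArgKind`, `guest_ram`, `arg_to_host`, `ret_to_guest`, `call_memcpy`) are hypothetical and not taken from the series; a small array stands in for the guest address space so that the guest-to-host and host-to-guest steps are visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of a value: the two base sizes, or a pointer. */
typedef enum { KIND_INT, KIND_IPTR, KIND_PTR } ArgKind;

/* Stand-in for the guest address space; guest address 0 maps to guest_ram[0]. */
static uint8_t guest_ram[64];

/* Convert one guest argument into the value the native function receives. */
static uintptr_t arg_to_host(ArgKind kind, uint64_t v)
{
    switch (kind) {
    case KIND_PTR:
        return (uintptr_t)(guest_ram + v);   /* guest-to-host address */
    case KIND_INT:
        return (uintptr_t)(uint32_t)v;       /* keep the low 32 bits */
    default:
        return (uintptr_t)v;                 /* pointer-sized integer */
    }
}

/* Convert the native return value back into a guest value. */
static uint64_t ret_to_guest(ArgKind kind, uintptr_t v)
{
    if (kind == KIND_PTR) {
        return (uint64_t)((uint8_t *)v - guest_ram);   /* host-to-guest */
    }
    return v;
}

/* memcpy takes (pointer, pointer, size) and returns a pointer. */
static uint64_t call_memcpy(uint64_t dst, uint64_t src, uint64_t len)
{
    void *r = memcpy((void *)arg_to_host(KIND_PTR, dst),
                     (const void *)arg_to_host(KIND_PTR, src),
                     (size_t)arg_to_host(KIND_IPTR, len));
    return ret_to_guest(KIND_PTR, (uintptr_t)r);
}
```

The value returned to the guest is again a guest address, which is why the return type needs the same classification as the arguments.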

Signed-off-by: Yeqi Fu 
---
 accel/tcg/tcg-runtime.c  |  66 +++
 accel/tcg/tcg-runtime.h  |  12 +++
 include/exec/helper-head.h   |   1 +
 include/native/native-defs.h |  41 ++
 include/tcg/tcg-op-common.h  |  13 +++
 include/tcg/tcg-op.h |   2 +
 include/tcg/tcg.h|   8 ++
 tcg/tcg-op.c |  36 
 tcg/tcg.c| 154 +++
 9 files changed, 333 insertions(+)
 create mode 100644 include/native/native-defs.h

diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 9fa539ad3d..764ca631d5 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -152,3 +152,69 @@ void HELPER(exit_atomic)(CPUArchState *env)
 {
 cpu_loop_exit_atomic(env_cpu(env), GETPC());
 }
+
+#ifdef CONFIG_USER_ONLY
+int HELPER(nc_memcmp)(void *s1, void *s2, void *len)
+{
+set_helper_retaddr(GETPC());
+int r = memcmp(s1, s2, (size_t)len);
+clear_helper_retaddr();
+return r;
+}
+
+void *HELPER(nc_memcpy)(void *dst, void *src, void *len)
+{
+set_helper_retaddr(GETPC());
+void *r = memcpy(dst, src, (size_t)len);
+clear_helper_retaddr();
+return r;
+}
+
+void *HELPER(nc_memset)(void *b, int c, void *len)
+{
+set_helper_retaddr(GETPC());
+void *r = memset(b, c, (size_t)len);
+clear_helper_retaddr();
+return r;
+}
+
+void *HELPER(nc_strcat)(void *dst, void *src)
+{
+set_helper_retaddr(GETPC());
+void *r = strcat(dst, src);
+clear_helper_retaddr();
+return r;
+}
+
+int HELPER(nc_strcmp)(void *s1, void *s2)
+{
+set_helper_retaddr(GETPC());
+int r = strcmp(s1, s2);
+clear_helper_retaddr();
+return r;
+}
+
+void *HELPER(nc_strcpy)(void *dst, void *src)
+{
+set_helper_retaddr(GETPC());
+void *r = strcpy(dst, src);
+clear_helper_retaddr();
+return r;
+}
+
+int HELPER(nc_strncmp)(void *s1, void *s2, void *len)
+{
+set_helper_retaddr(GETPC());
+int r = strncmp(s1, s2, (size_t)len);
+clear_helper_retaddr();
+return r;
+}
+
+void *HELPER(nc_strncpy)(void *dst, void *src, void *len)
+{
+set_helper_retaddr(GETPC());
+void *r = strncpy(dst, src, (size_t)len);
+clear_helper_retaddr();
+return r;
+}
+#endif
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 39e68007f9..7330124c0b 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -298,3 +298,15 @@ DEF_HELPER_FLAGS_4(gvec_leu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_bitsel, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+#ifdef CONFIG_USER_ONLY
+/* Helpers for native library calls */
+DEF_HELPER_FLAGS_3(nc_memcmp, TCG_CALL_NO_RWG, int, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_3(nc_memcpy, TCG_CALL_NO_RWG, ptr, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_3(nc_memset, TCG_CALL_NO_RWG, ptr, ptr, int, ptr)
+DEF_HELPER_FLAGS_2(nc_strcat, TCG_CALL_NO_RWG, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_2(nc_strcmp, TCG_CALL_NO_RWG, int, ptr, ptr)
+DEF_HELPER_FLAGS_2(nc_strcpy, TCG_CALL_NO_RWG, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_3(nc_strncmp, TCG_CALL_NO_RWG, int, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_3(nc_strncpy, TCG_CALL_NO_RWG, ptr, ptr, ptr, ptr)
+#endif
diff --git a/include/exec/helper-head.h b/include/exec/helper-head.h
index 28ceab0a46..438c74e2ac 100644
--- a/include/exec/helper-head.h
+++ b/include/exec/helper-head.h
@@ -84,6 +84,7 @@
 
 #define dh_typecode_void 0
 #define dh_typecode_noreturn 0
+#define dh_typecode_iptr 1
 #define dh_typecode_i32 2
 #define dh_typecode_s32 3
 #define dh_typecode_i64 4
diff --git a/include/native/native-defs.h b/include/native/native-defs.h
new file mode 100644
index 00..b193882698
--- /dev/null
+++ b/include/native/native-defs.h
@@ -0,0 +1,41 @@
+/*
+ * Argument encoding. We only really care about 3 types. The two base
+ * sizes (int and intptr_t) and if the value is a pointer (in which
+ * case we need to adjust it g2h before passing to the native
+ * function).
+ */
+#include "exec/helper-head.h"
+
+#define TYPE_AAAP  \
+(dh_typemask(ptr, 0) | dh_typemask(ptr, 1) | dh_typemask(ptr, 2) | \
+ dh_typemask(iptr, 3))
+#define TYPE_IAAP  \
+(dh_typemask(int, 0) | dh_typemask(ptr, 1) | dh_typemask(ptr, 2) | \
+ dh_typemask(iptr, 3))
+#define TYPE_AAIP  \
+(dh_typemask(ptr, 0) | dh_typemask(ptr, 1) | dh_typemask(int, 2) | \
+ dh_typemask(iptr, 3))
+#define TYPE_AAA

[RFC v6 1/9] build: Implement logic for sharing cross-building config files

2023-09-12 Thread Yeqi Fu

Since both TCG tests and libnative libraries require cross-building,
the config files for cross-building, config_target_mak, are now saved
in the cross-build directory for sharing. This allows TCG tests and
libnative libraries to use these config files through symbolic links
when cross-building configuration is needed.

Since config_host_mak essentially contains all the information from
the original tests/tcg/config-host.mak, the original config-host.mak
has been deleted and replaced with a symbolic link to config_host_mak.

Signed-off-by: Yeqi Fu 
---
 configure | 61 ++-
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/configure b/configure
index 2b41c49c0d..7a1e463d9c 100755
--- a/configure
+++ b/configure
@@ -1751,32 +1751,23 @@ if test "$ccache_cpp2" = "yes"; then
   echo "export CCACHE_CPP2=y" >> $config_host_mak
 fi
 
-# tests/tcg configuration
-(config_host_mak=tests/tcg/config-host.mak
-mkdir -p tests/tcg
-echo "# Automatically generated by configure - do not modify" > $config_host_mak
-echo "SRC_PATH=$source_path" >> $config_host_mak
-echo "HOST_CC=$host_cc" >> $config_host_mak
+# Prepare the config files for cross building.
+# This process generates 'cross-build//config-target.mak' files.
+# These files are then symlinked to the directories that need them,
+# including the TCG tests (tests/tcg/) and the libnative library
+# for linux-user (common/native//).
+mkdir -p cross-build
 
-# versioned checked in the main config_host.mak above
-if test -n "$gdb_bin"; then
-echo "HAVE_GDB_BIN=$gdb_bin" >> $config_host_mak
-fi
-if test "$plugins" = "yes" ; then
-echo "CONFIG_PLUGIN=y" >> $config_host_mak
-fi
-
-tcg_tests_targets=
 for target in $target_list; do
   arch=${target%%-*}
-
   case $target in
 xtensa*-linux-user)
-  # the toolchain is not complete with headers, only build softmmu tests
+  # the toolchain for tests/tcg is not complete with headers
   continue
   ;;
 *-softmmu)
-      test -f "$source_path/tests/tcg/$arch/Makefile.softmmu-target" || continue
+  # skip installing config-target.mak when we have no tests to build
+      test -f "${source_path}/tests/tcg/${arch}/Makefile.softmmu-target" || continue
   qemu="qemu-system-$arch"
   ;;
 *-linux-user|*-bsd-user)
@@ -1786,22 +1777,46 @@ for target in $target_list; do
 
   if probe_target_compiler $target || test -n "$container_image"; then
   test -n "$container_image" && build_static=y
-  mkdir -p "tests/tcg/$target"
-  config_target_mak=tests/tcg/$target/config-target.mak
-      ln -sf "$source_path/tests/tcg/Makefile.target" "tests/tcg/$target/Makefile"
+  mkdir -p "cross-build/${target}"
+  config_target_mak=cross-build/${target}/config-target.mak
   echo "# Automatically generated by configure - do not modify" > 
"$config_target_mak"
   echo "TARGET_NAME=$arch" >> "$config_target_mak"
   echo "TARGET=$target" >> "$config_target_mak"
-  write_target_makefile "build-tcg-tests-$target" >> "$config_target_mak"
+  write_target_makefile "$target" >> "$config_target_mak"
   echo "BUILD_STATIC=$build_static" >> "$config_target_mak"
   echo "QEMU=$PWD/$qemu" >> "$config_target_mak"
 
+      # get the interpreter prefix and the path of libnative required for native call tests
+      if test -n "$target_cc" && [ -d "/usr/$(echo "$target_cc" | sed 's/-gcc//')" ]; then
+          echo "LD_PREFIX=/usr/$(echo "$target_cc" | sed 's/-gcc//')" >> "$config_target_mak"
+  fi
+
   # will GDB work with these binaries?
   if test "${gdb_arches#*$arch}" != "$gdb_arches"; then
   echo "HOST_GDB_SUPPORTS_ARCH=y" >> "$config_target_mak"
   fi
+  fi
+done
+
+# tests/tcg configuration
+(mkdir -p tests/tcg
+# create a symlink to the config-host.mak file in the tests/tcg
+ln -srf $config_host_mak tests/tcg/config-host.mak
+echo "HOST_CC=$host_cc" >> $config_host_mak
+
+tcg_tests_targets=
+for target in $target_list; do
+  case $target in
+*-softmmu)
+      test -f "${source_path}/tests/tcg/${arch}/Makefile.softmmu-target" || continue
+  ;;
+  esac
 
-  echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> Makefile.prereqs
+  if test -f cross-build/${target}/config-target.mak; then
+  mkdir -p "tests/tcg/${target}"
+      ln -srf cross-build/${target}/config-target.mak tests/tcg/${target}/config-target.mak
+      ln -sf ${source_path}/tests/tcg/Makefile.target tests/tcg/${target}/Makefile
+  echo "run-tcg-tests-${target}: $qemu\$(EXESUF)" >> Makefile.prereqs
   tcg_tests_targets="$tcg_tests_targets $target"
   fi
 done
-- 
2.34.1

[RFC v6 0/9] Native Library Calls

2023-09-12 Thread Yeqi Fu

Executing a program under QEMU's user mode subjects the entire
program, including all library calls, to translation. It's important
to understand that many of these library functions are optimized
specifically for the guest architecture. Therefore, their
translation might not yield the most efficient execution.

When the semantics of a library function are well defined, we can
capitalize on this by substituting the translated version with a call
to the native equivalent function.

To achieve tangible results, the focus is on the memory-related
('mem*') and string-related ('str*') functions. These often have
the most significant effect on overall performance, which makes
them the best candidates for this optimization.
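
As a rough illustration of the substitution idea, the sketch below looks a function name up in a fixed list of supported 'mem*' and 'str*' functions. The list mirrors the helpers added in this series; the table layout and the `native_lookup` name are hypothetical and only show the decision being made: a known name can be routed to the host function, anything else keeps being translated as before.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Functions with well-defined semantics that are candidates for bypass. */
static const char *const native_names[] = {
    "memcmp", "memcpy", "memset", "strcat",
    "strcmp", "strcpy", "strncmp", "strncpy",
};

/*
 * Hypothetical lookup: return the table index of a supported function,
 * or -1 when the name is unknown and the call must stay translated.
 */
static int native_lookup(const char *name)
{
    size_t n = sizeof(native_names) / sizeof(native_names[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(native_names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Falling back on an unknown name keeps the behaviour safe: only functions listed explicitly are ever replaced.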

Yeqi Fu (9):
  build: Implement logic for sharing cross-building config files
  build: Implement libnative library and the build machinery for
libnative
  linux-user: Implement native-bypass option support
  tcg: Add tcg opcodes and helpers for native library calls
  target/i386: Add support for native library calls
  target/mips: Add support for native library calls
  target/arm: Add support for native library calls
  tests/tcg/multiarch: Add nativecall.c test
  docs/user: Add doc for native library calls

 Makefile|   2 +
 accel/tcg/tcg-runtime.c |  66 ++
 accel/tcg/tcg-runtime.h |  12 ++
 common-user/native/Makefile.include |   8 ++
 common-user/native/Makefile.target  |  22 
 common-user/native/libnative.S  |  51 
 configs/targets/aarch64-linux-user.mak  |   1 +
 configs/targets/arm-linux-user.mak  |   1 +
 configs/targets/i386-linux-user.mak |   1 +
 configs/targets/mips-linux-user.mak |   1 +
 configs/targets/mips64-linux-user.mak   |   1 +
 configs/targets/x86_64-linux-user.mak   |   1 +
 configure   | 100 +++
 docs/user/index.rst |   1 +
 docs/user/native_calls.rst  |  91 ++
 include/exec/helper-head.h  |   1 +
 include/native/native-defs.h|  41 +++
 include/native/native.h |   7 ++
 include/tcg/tcg-op-common.h |  13 ++
 include/tcg/tcg-op.h|   2 +
 include/tcg/tcg.h   |   8 ++
 linux-user/main.c   |  20 +++
 linux-user/syscall.c|  55 +
 target/arm/tcg/translate-a64.c  |  32 +
 target/arm/tcg/translate.c  |  29 +
 target/arm/tcg/translate.h  |   5 +
 target/i386/tcg/translate.c |  38 ++
 target/mips/tcg/translate.c |  30 -
 tcg/tcg-op.c|  36 ++
 tcg/tcg.c   | 154 
 tests/tcg/multiarch/Makefile.target |  32 +
 tests/tcg/multiarch/native/nativecall.c | 132 
 32 files changed, 970 insertions(+), 24 deletions(-)
 create mode 100644 common-user/native/Makefile.include
 create mode 100644 common-user/native/Makefile.target
 create mode 100644 common-user/native/libnative.S
 create mode 100644 docs/user/native_calls.rst
 create mode 100644 include/native/native-defs.h
 create mode 100644 include/native/native.h
 create mode 100644 tests/tcg/multiarch/native/nativecall.c

-- 
2.34.1

[PATCH 0/4] Add BHRB Facility Support

2023-09-12 Thread Glenn Miles

This is a series of patches for adding support for the Branch History
Rolling Buffer (BHRB) facility.  This was added to the Power ISA
starting with version 2.07.  Changes were subsequently made in version
3.1 to limit BHRB recording to instructions run in problem state only
and to add a control bit to disable recording (MMCRA[BHRBRD]).


Glenn Miles (4):
  target/ppc: Add new hflags to support BHRB
  target/ppc: Add recording of taken branches to BHRB
  target/ppc: Add clrbhrb and mfbhrbe instructions
  target/ppc: Add migration support for BHRB

 target/ppc/cpu.h   |  24 +
 target/ppc/cpu_init.c  |  45 -
 target/ppc/helper.h|   5 +
 target/ppc/helper_regs.c   |  44 
 target/ppc/helper_regs.h   |   1 +
 target/ppc/machine.c   |  25 -
 target/ppc/misc_helper.c   |  43 
 target/ppc/power8-pmu-regs.c.inc   |   5 +
 target/ppc/power8-pmu.c|  17 +++-
 target/ppc/power8-pmu.h|  11 +-
 target/ppc/spr_common.h|   1 +
 target/ppc/translate.c | 133 +++--
 target/ppc/translate/branch-impl.c.inc |   2 +-
 13 files changed, 337 insertions(+), 19 deletions(-)

-- 
2.31.1

[PATCH 3/4] target/ppc: Add clrbhrb and mfbhrbe instructions

2023-09-12 Thread Glenn Miles

Add support for the clrbhrb and mfbhrbe instructions.

Since neither instruction is believed to be critical to
performance, both instructions were implemented using helper
functions.

Access to both instructions is controlled by bits in the
HFSCR (for privileged state) and MMCR0 (for problem state).
A new function, helper_mmcr0_facility_check, was added for
checking MMCR0[BHRBA] and raising a facility_unavailable exception
if required.

Signed-off-by: Glenn Miles 
---
 target/ppc/cpu.h |  1 +
 target/ppc/helper.h  |  4 
 target/ppc/misc_helper.c | 43 
 target/ppc/translate.c   | 13 
 4 files changed, 61 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index bda1afb700..ee81ede4ee 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -541,6 +541,7 @@ FIELD(MSR, LE, MSR_LE, 1)
 
 /* HFSCR bits */
 #define HFSCR_MSGP PPC_BIT(53) /* Privileged Message Send Facilities */
+#define HFSCR_BHRB PPC_BIT(59) /* BHRB Instructions */
 #define HFSCR_IC_MSGP  0xA
 
 #define DBCR0_ICMP (1 << 27)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 1a3d9a7e57..bbc32ff114 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -816,3 +816,7 @@ DEF_HELPER_4(DSCLIQ, void, env, fprp, fprp, i32)
 
 DEF_HELPER_1(tbegin, void, env)
 DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
+
+DEF_HELPER_1(clrbhrb, void, env)
+DEF_HELPER_FLAGS_2(mfbhrbe, TCG_CALL_NO_WG, i64, env, i32)
+
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 692d058665..45abe04f66 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -139,6 +139,17 @@ void helper_fscr_facility_check(CPUPPCState *env, uint32_t bit,
 #endif
 }
 
+static void helper_mmcr0_facility_check(CPUPPCState *env, uint32_t bit,
+ uint32_t sprn, uint32_t cause)
+{
+#ifdef TARGET_PPC64
+if (FIELD_EX64(env->msr, MSR, PR) &&
+!(env->spr[SPR_POWER_MMCR0] & (1ULL << bit))) {
+raise_fu_exception(env, bit, sprn, cause, GETPC());
+}
+#endif
+}
+
 void helper_msr_facility_check(CPUPPCState *env, uint32_t bit,
uint32_t sprn, uint32_t cause)
 {
@@ -351,3 +362,35 @@ void helper_fixup_thrm(CPUPPCState *env)
 env->spr[i] = v;
 }
 }
+
+void helper_clrbhrb(CPUPPCState *env)
+{
+helper_hfscr_facility_check(env, HFSCR_BHRB, "clrbhrb", FSCR_IC_BHRB);
+
+helper_mmcr0_facility_check(env, MMCR0_BHRBA, 0, FSCR_IC_BHRB);
+
+memset(env->bhrb, 0, sizeof(env->bhrb));
+}
+
+uint64_t helper_mfbhrbe(CPUPPCState *env, uint32_t bhrbe)
+{
+unsigned int index;
+
+helper_hfscr_facility_check(env, HFSCR_BHRB, "mfbhrbe", FSCR_IC_BHRB);
+
+helper_mmcr0_facility_check(env, MMCR0_BHRBA, 0, FSCR_IC_BHRB);
+
+if ((bhrbe >= env->bhrb_num_entries) ||
+   (env->spr[SPR_POWER_MMCR0] & MMCR0_PMAE)) {
+return 0;
+}
+
+/*
+ * Note: bhrb_offset is the byte offset for writing the
+ * next entry (over the oldest entry), which is why we
+ * must offset bhrbe by 1 to get to the 0th entry.
+ */
+index = ((env->bhrb_offset / sizeof(uint64_t)) - (bhrbe + 1)) %
+env->bhrb_num_entries;
+return env->bhrb[index];
+}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 7824475f54..b330871793 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6549,12 +6549,25 @@ static void gen_brh(DisasContext *ctx)
 }
 #endif
 
+static void gen_clrbhrb(DisasContext *ctx)
+{
+gen_helper_clrbhrb(cpu_env);
+}
+
+static void gen_mfbhrbe(DisasContext *ctx)
+{
+TCGv_i32 bhrbe = tcg_constant_i32(_SPR(ctx->opcode));
+gen_helper_mfbhrbe(cpu_gpr[rD(ctx->opcode)], cpu_env, bhrbe);
+}
+
 static opcode_t opcodes[] = {
 #if defined(TARGET_PPC64)
 GEN_HANDLER_E(brd, 0x1F, 0x1B, 0x05, 0xF801, PPC_NONE, PPC2_ISA310),
 GEN_HANDLER_E(brw, 0x1F, 0x1B, 0x04, 0xF801, PPC_NONE, PPC2_ISA310),
 GEN_HANDLER_E(brh, 0x1F, 0x1B, 0x06, 0xF801, PPC_NONE, PPC2_ISA310),
 #endif
+GEN_HANDLER_E(clrbhrb, 0x1F, 0x0E, 0x0D, 0x3FFF801, PPC_NONE, PPC2_ISA207S),
+GEN_HANDLER_E(mfbhrbe, 0x1F, 0x0E, 0x09, 0x001, PPC_NONE, PPC2_ISA207S),
 GEN_HANDLER(invalid, 0x00, 0x00, 0x00, 0x, PPC_NONE),
 #if defined(TARGET_PPC64)
 GEN_HANDLER_E(cmpeqb, 0x1F, 0x00, 0x07, 0x0060, PPC_NONE, PPC2_ISA300),
-- 
2.31.1
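
The index arithmetic in helper_mfbhrbe above can be tried out in isolation. The sketch below is a simplified model under stated assumptions, not the QEMU code: the `Bhrb` struct, `bhrb_record`, `bhrb_read` and the 8-entry size are hypothetical, and the unsigned subtraction wraps to the right slot only because the entry count is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ENTRIES 8u   /* hypothetical size; assumed to be a power of two */

typedef struct {
    uint64_t entries[NUM_ENTRIES];
    uint32_t offset;     /* byte offset of the next slot to be written */
} Bhrb;

/* Record a branch address over the oldest entry and advance the offset. */
static void bhrb_record(Bhrb *b, uint64_t addr)
{
    b->entries[b->offset / sizeof(uint64_t)] = addr;
    b->offset = (uint32_t)((b->offset + sizeof(uint64_t)) %
                           (NUM_ENTRIES * sizeof(uint64_t)));
}

/* Read entry n, where n == 0 is the most recently recorded branch. */
static uint64_t bhrb_read(const Bhrb *b, uint32_t n)
{
    uint32_t next, index;

    if (n >= NUM_ENTRIES) {
        return 0;
    }
    next = (uint32_t)(b->offset / sizeof(uint64_t));
    /* next - (n + 1) may wrap below zero; the modulo folds it back. */
    index = (next - (n + 1)) % NUM_ENTRIES;
    return b->entries[index];
}
```

After ten records into the eight-entry buffer, entry 0 is the tenth address and entry 7 is the third, since the first two have been overwritten.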

[PATCH 3/4] target/ppc: Add clrbhrb and mfbhrbe instructions

2023-09-12 Thread Glenn Miles

Add support for the clrbhrb and mfbhrbe instructions.

Since neither instruction is believed to be critical to
performance, both instructions were implemented using helper
functions.

Access to both instructions is controlled by bits in the
HFSCR (for privileged state) and MMCR0 (for problem state).
A new function, helper_mmcr0_facility_check, was added for
checking MMCR0[BHRBA] and raising a facility_unavailable exception
if required.
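
The read-side index arithmetic used by mfbhrbe can be looked at in
isolation. The following is a minimal sketch, assuming the number of
entries is a power of two (as with the 32-entry buffer configured for
P8 in patch 2/4); bhrb_read_index is a hypothetical name used only for
illustration and is not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the index computation in
 * helper_mfbhrbe(). write_offset is the byte offset of the next entry
 * to be written; bhrbe == 0 selects the most recently written entry.
 */
static unsigned int bhrb_read_index(uint64_t write_offset, uint32_t bhrbe,
                                    unsigned int num_entries)
{
    unsigned int next_slot;

    /* The unsigned wrap-around below is only correct for powers of two. */
    assert(num_entries != 0 && (num_entries & (num_entries - 1)) == 0);

    next_slot = (unsigned int)(write_offset / sizeof(uint64_t));

    /* Step back (bhrbe + 1) slots from the next write position. */
    return (next_slot - (bhrbe + 1)) % num_entries;
}
```

With a write offset of 0 and bhrbe of 0, the subtraction wraps and the
result is slot 31 of a 32-entry buffer, i.e. the slot written just
before the offset rolled over.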

Signed-off-by: Glenn Miles 
---
 target/ppc/cpu.h |  1 +
 target/ppc/helper.h  |  4 
 target/ppc/misc_helper.c | 43 
 target/ppc/translate.c   | 13 
 4 files changed, 61 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index bda1afb700..ee81ede4ee 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -541,6 +541,7 @@ FIELD(MSR, LE, MSR_LE, 1)
 
 /* HFSCR bits */
 #define HFSCR_MSGP PPC_BIT(53) /* Privileged Message Send Facilities */
+#define HFSCR_BHRB PPC_BIT(59) /* BHRB Instructions */
 #define HFSCR_IC_MSGP  0xA
 
 #define DBCR0_ICMP (1 << 27)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 1a3d9a7e57..bbc32ff114 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -816,3 +816,7 @@ DEF_HELPER_4(DSCLIQ, void, env, fprp, fprp, i32)
 
 DEF_HELPER_1(tbegin, void, env)
 DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
+
+DEF_HELPER_1(clrbhrb, void, env)
+DEF_HELPER_FLAGS_2(mfbhrbe, TCG_CALL_NO_WG, i64, env, i32)
+
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 692d058665..45abe04f66 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -139,6 +139,17 @@ void helper_fscr_facility_check(CPUPPCState *env, uint32_t 
bit,
 #endif
 }
 
+static void helper_mmcr0_facility_check(CPUPPCState *env, uint32_t bit,
+ uint32_t sprn, uint32_t cause)
+{
+#ifdef TARGET_PPC64
+if (FIELD_EX64(env->msr, MSR, PR) &&
+!(env->spr[SPR_POWER_MMCR0] & (1ULL << bit))) {
+raise_fu_exception(env, bit, sprn, cause, GETPC());
+}
+#endif
+}
+
 void helper_msr_facility_check(CPUPPCState *env, uint32_t bit,
uint32_t sprn, uint32_t cause)
 {
@@ -351,3 +362,35 @@ void helper_fixup_thrm(CPUPPCState *env)
 env->spr[i] = v;
 }
 }
+
+void helper_clrbhrb(CPUPPCState *env)
+{
+helper_hfscr_facility_check(env, HFSCR_BHRB, "clrbhrb", FSCR_IC_BHRB);
+
+helper_mmcr0_facility_check(env, MMCR0_BHRBA, 0, FSCR_IC_BHRB);
+
+memset(env->bhrb, 0, sizeof(env->bhrb));
+}
+
+uint64_t helper_mfbhrbe(CPUPPCState *env, uint32_t bhrbe)
+{
+unsigned int index;
+
+helper_hfscr_facility_check(env, HFSCR_BHRB, "mfbhrbe", FSCR_IC_BHRB);
+
+helper_mmcr0_facility_check(env, MMCR0_BHRBA, 0, FSCR_IC_BHRB);
+
+if ((bhrbe >= env->bhrb_num_entries) ||
+   (env->spr[SPR_POWER_MMCR0] & MMCR0_PMAE)) {
+return 0;
+}
+
+/*
+ * Note: bhrb_offset is the byte offset for writing the
+ * next entry (over the oldest entry), which is why we
+ * must offset bhrbe by 1 to get to the 0th entry.
+ */
+index = ((env->bhrb_offset / sizeof(uint64_t)) - (bhrbe + 1)) %
+env->bhrb_num_entries;
+return env->bhrb[index];
+}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 7824475f54..b330871793 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6549,12 +6549,25 @@ static void gen_brh(DisasContext *ctx)
 }
 #endif
 
+static void gen_clrbhrb(DisasContext *ctx)
+{
+gen_helper_clrbhrb(cpu_env);
+}
+
+static void gen_mfbhrbe(DisasContext *ctx)
+{
+TCGv_i32 bhrbe = tcg_constant_i32(_SPR(ctx->opcode));
+gen_helper_mfbhrbe(cpu_gpr[rD(ctx->opcode)], cpu_env, bhrbe);
+}
+
 static opcode_t opcodes[] = {
 #if defined(TARGET_PPC64)
 GEN_HANDLER_E(brd, 0x1F, 0x1B, 0x05, 0xF801, PPC_NONE, PPC2_ISA310),
 GEN_HANDLER_E(brw, 0x1F, 0x1B, 0x04, 0xF801, PPC_NONE, PPC2_ISA310),
 GEN_HANDLER_E(brh, 0x1F, 0x1B, 0x06, 0xF801, PPC_NONE, PPC2_ISA310),
 #endif
+GEN_HANDLER_E(clrbhrb, 0x1F, 0x0E, 0x0D, 0x3FFF801, PPC_NONE, PPC2_ISA207S),
+GEN_HANDLER_E(mfbhrbe, 0x1F, 0x0E, 0x09, 0x001, PPC_NONE, PPC2_ISA207S),
 GEN_HANDLER(invalid, 0x00, 0x00, 0x00, 0xFFFFFFFF, PPC_NONE),
 #if defined(TARGET_PPC64)
 GEN_HANDLER_E(cmpeqb, 0x1F, 0x00, 0x07, 0x0060, PPC_NONE, PPC2_ISA300),
-- 
2.31.1

Re: [PATCH 4/4] docs/cxl: Change to lowercase as others

2023-09-12 Thread Fan Ni

On Mon, Sep 04, 2023 at 02:28:06PM +0100, Jonathan Cameron wrote:
> From: Li Zhijian 
> 
> Using the same style as elsewhere for topology / topo
> 
> Signed-off-by: Li Zhijian 
> Link: https://lore.kernel.org/r/20230519085802.2106900-2-lizhijian@cn.fujitsu.com
> Signed-off-by: Jonathan Cameron 
> ---

Reviewed-by: Fan Ni 

>  docs/system/devices/cxl.rst | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
> index f12011e230..b742120657 100644
> --- a/docs/system/devices/cxl.rst
> +++ b/docs/system/devices/cxl.rst
> @@ -157,7 +157,7 @@ responsible for allocating appropriate ranges from within 
> the CFMWs
>  and exposing those via normal memory configurations as would be done
>  for system RAM.
>  
> -Example system Topology. x marks the match in each decoder level::
> +Example system topology. x marks the match in each decoder level::
>  
>|<--SYSTEM PHYSICAL ADDRESS MAP (1)->|
>|__   __   __|
> @@ -187,8 +187,8 @@ Example system Topology. x marks the match in each 
> decoder level::
> ___|___   __|__   __|_   ___|_
> (3)|  Root Port 0  | | Root Port 1 | | Root Port 2| | Root Port 3 |
>|  Appears in   | | Appears in  | | Appears in | | Appear in   |
> -  |  PCI topology | | PCI Topology| | PCI Topo   | | PCI Topo|
> -  |  As 0c:00.0   | | as 0c:01.0  | | as de:00.0 | | as de:01.0  |
> +  |  PCI topology | | PCI topology| | PCI topo   | | PCI topo|
> +  |  as 0c:00.0   | | as 0c:01.0  | | as de:00.0 | | as de:01.0  |
>|___| |_| || |_|
>  |  |   |  |
>  |  |   |  |
> @@ -272,7 +272,7 @@ Example topology involving a switch::
>|  Root Port 0  |
>|  Appears in   |
>|  PCI topology |
> -  |  As 0c:00.0   |
> +  |  as 0c:00.0   |
>|___x___|
>|
>|
> -- 
> 2.39.2
>

[PATCH v4 0/3] Fix MCE handling on AMD hosts

2023-09-12 Thread John Allen

In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shut down. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

v3:
  - Reorder series. Only enable SUCCOR after bugs have been fixed.
  - Introduce new patch ignoring AO errors.

v4:
  - Remove redundant check for AO errors.

John Allen (2):
  i386: Fix MCE support for AMD hosts
  i386: Add support for SUCCOR feature

William Roche (1):
  i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 28 
 4 files changed, 45 insertions(+), 9 deletions(-)

-- 
2.39.3

Re: [PATCH 2/4] hw/pci-bridge/cxl_upstream: Fix bandwidth entry base unit for SSLBIS

2023-09-12 Thread Fan Ni

On Mon, Sep 04, 2023 at 02:28:04PM +0100, Jonathan Cameron wrote:
> From: Dave Jiang 
> 
> According to ACPI spec 6.5 5.2.28.4 System Locality Latency and Bandwidth
> Information Structure, if the "Entry Base Unit" is 1024 for BW and the
> matrix entry has the value of 100, the BW is 100 GB/s. So the
> entry_base_unit should be changed from 1000 to 1024 given the comment notes
> it's 16GB/s for .latency_bandwidth.
> 
> Fixes: 882877fc359d ("hw/pci-bridge/cxl-upstream: Add a CDAT table access 
> DOE")
> Signed-off-by: Dave Jiang 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Fan Ni 
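
The arithmetic behind the change can be written out as a small sketch.
It assumes, as the commit message describes, that the effective
bandwidth in MB/s is the matrix entry multiplied by the entry base
unit; sslbis_bandwidth_mbps is a hypothetical helper, not a QEMU
function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: effective bandwidth in MB/s for one matrix
 * entry, assuming bandwidth = entry_base_unit * entry.
 */
static uint64_t sslbis_bandwidth_mbps(uint64_t entry_base_unit, uint16_t entry)
{
    assert(entry_base_unit != 0);
    return entry_base_unit * (uint64_t)entry;
}
```

Under that assumption an entry of 16 gives 16000 MB/s with a base unit
of 1000 and 16384 MB/s (16 * 1024) with a base unit of 1024, which is
what the "16GB/s" comment expects.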

> ---
>  hw/pci-bridge/cxl_upstream.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
> index 9159f48a8c..2b9cf0cc97 100644
> --- a/hw/pci-bridge/cxl_upstream.c
> +++ b/hw/pci-bridge/cxl_upstream.c
> @@ -262,7 +262,7 @@ static int build_cdat_table(CDATSubHeader ***cdat_table, 
> void *priv)
>  .length = sslbis_size,
>  },
>  .data_type = HMATLB_DATA_TYPE_ACCESS_BANDWIDTH,
> -.entry_base_unit = 1000,
> +.entry_base_unit = 1024,
>  },
>  };
>  
> -- 
> 2.39.2
>

Re: [PATCH 1/4] hw/cxl: Fix CFMW config memory leak

2023-09-12 Thread Fan Ni

On Mon, Sep 04, 2023 at 02:28:03PM +0100, Jonathan Cameron wrote:

> From: Li Zhijian 
> 
> Allocate targets and targets[n] resources when all sanity checks are
> passed to avoid memory leaks.
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Fan Ni 
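
The pattern the fix applies (run every sanity check before allocating
anything, so the early returns have nothing to free) can be sketched
outside QEMU. This is an illustration only: struct window,
window_config, window_free and dup_name are hypothetical names, and
plain malloc/calloc stand in for the GLib allocators used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define WINDOW_ALIGN (256ULL * 1024 * 1024) /* 256 MiB */

/* Hypothetical stand-in for the fixed memory window state. */
struct window {
    size_t num_targets;
    char **targets;
};

static void window_free(struct window *w)
{
    assert(w != NULL);
    for (size_t i = 0; i < w->num_targets; i++) {
        free(w->targets[i]);
    }
    free(w->targets);
    w->targets = NULL;
    w->num_targets = 0;
}

/* Copy a string with malloc(); returns NULL on allocation failure. */
static char *dup_name(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

static bool window_config(struct window *w, unsigned long long size,
                          const char *const *names, size_t num_names)
{
    assert(w != NULL && w->targets == NULL && w->num_targets == 0);

    /* Sanity checks first: nothing has been allocated yet. */
    if (num_names == 0 || size % WINDOW_ALIGN != 0) {
        return false;
    }

    /* Only now allocate the target array and stash the names. */
    w->targets = calloc(num_names, sizeof(*w->targets));
    if (w->targets == NULL) {
        return false;
    }
    for (size_t i = 0; i < num_names; i++) {
        w->targets[i] = dup_name(names[i]);
        if (w->targets[i] == NULL) {
            window_free(w); /* releases the copies made so far */
            return false;
        }
        w->num_targets = i + 1;
    }
    return true;
}
```

A caller that passes a size that is not a multiple of 256 MiB gets
false back with w->targets still NULL, so there is nothing to clean up
on that path.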

> ---
>  hw/cxl/cxl-host.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index 034c7805b3..f0920da956 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -39,12 +39,6 @@ static void cxl_fixed_memory_window_config(CXLState 
> *cxl_state,
>  return;
>  }
>  
> -fw->targets = g_malloc0_n(fw->num_targets, sizeof(*fw->targets));
> -for (i = 0, target = object->targets; target; i++, target = 
> target->next) {
> -/* This link cannot be resolved yet, so stash the name for now */
> -fw->targets[i] = g_strdup(target->value);
> -}
> -
>  if (object->size % (256 * MiB)) {
>  error_setg(errp,
> "Size of a CXL fixed memory window must be a multiple of 
> 256MiB");
> @@ -64,6 +58,12 @@ static void cxl_fixed_memory_window_config(CXLState 
> *cxl_state,
>  fw->enc_int_gran = 0;
>  }
>  
> +fw->targets = g_malloc0_n(fw->num_targets, sizeof(*fw->targets));
> +for (i = 0, target = object->targets; target; i++, target = 
> target->next) {
> +/* This link cannot be resolved yet, so stash the name for now */
> +fw->targets[i] = g_strdup(target->value);
> +}
> +
>  cxl_state->fixed_windows = g_list_append(cxl_state->fixed_windows,
>   g_steal_pointer(&fw));
>  
> -- 
> 2.39.2
>

[PATCH v4 1/3] i386: Fix MCE support for AMD hosts

2023-09-12 Thread John Allen

For the most part, AMD hosts can use the same MCE injection code as Intel, but
there are instances where the qemu implementation is Intel specific. First, MCE
delivery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.
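
The resulting bit selection can be sketched as a pure function. This is
a simplified illustration of the logic in the diff that follows, not
the QEMU code itself: mce_compose, struct mce_bits and enum mce_code are
hypothetical names, and the bit positions are assumed from the x86 MCA
register layout and should be checked against target/i386/cpu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (MCi_STATUS and MCG_STATUS). */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_S     (1ULL << 56)
#define MCI_STATUS_AR    (1ULL << 55)
#define MCG_STATUS_RIPV  (1ULL << 0)
#define MCG_STATUS_EIPV  (1ULL << 1)
#define MCG_STATUS_MCIP  (1ULL << 2)

/* Hypothetical stand-ins for BUS_MCEERR_AR / BUS_MCEERR_AO. */
enum mce_code { MCE_AR, MCE_AO };

struct mce_bits {
    uint64_t status;     /* value for the bank's MCi_STATUS */
    uint64_t mcg_status; /* value for MCG_STATUS */
};

static struct mce_bits mce_compose(bool is_amd, enum mce_code code)
{
    struct mce_bits b = {
        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                  MCI_STATUS_MISCV | MCI_STATUS_ADDRV,
        .mcg_status = MCG_STATUS_MCIP,
    };

    assert(code == MCE_AR || code == MCE_AO);

    if (!is_amd) {
        /* Intel-specific bits: S, AR and the error codes. */
        b.status |= MCI_STATUS_S;
        if (code == MCE_AR) {
            b.status |= MCI_STATUS_AR | 0x134;
            b.mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
        } else {
            b.status |= 0xc0;
            b.mcg_status |= MCG_STATUS_RIPV;
        }
    } else {
        /* AMD: none of the Intel-specific status bits are set. */
        b.mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
    }
    return b;
}
```

On the AMD path the low 16 bits of the status value stay zero and the S
and AR bits are never set, which is the behavioural difference the
commit message describes.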

Reported-by: William Roche 
Signed-off-by: John Allen 
---
v3:
  - Update to latest qemu code that introduces using MCG_STATUS_RIPV in the
case of a BUS_MCEERR_AR on a non-AMD machine.
---
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 89aa696c6d..9547e2b09d 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, &family, &model);
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 639a242ad8..5fce74aac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -590,16 +590,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 CPUState *cs = CPU(cpu);
 CPUX86State *env = &cpu->env;
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
 uint64_t mcg_status = MCG_STATUS_MCIP;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+mcg_status |= MCG_STATUS_RIPV;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
 flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3

[PATCH v4 3/3] i386: Add support for SUCCOR feature

2023-09-12 Thread John Allen

Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.
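
For reference, the guest-visible check amounts to testing bit 1 of EBX
from CPUID leaf 0x80000007. A minimal sketch, assuming the caller has
already obtained that EBX value; cpuid_has_succor is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the one added to target/i386/cpu.h below. */
#define CPUID_8000_0007_EBX_SUCCOR (1U << 1)

static_assert(CPUID_8000_0007_EBX_SUCCOR == 0x2, "SUCCOR is EBX bit 1");

/* ebx is the EBX value returned by CPUID leaf 0x80000007. */
static bool cpuid_has_succor(uint32_t ebx)
{
    return (ebx & CPUID_8000_0007_EBX_SUCCOR) != 0;
}
```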

Reported-by: William Roche 
Reviewed-by: Joao Martins 
Signed-off-by: John Allen 

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
---
 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00f913b638..d90d3a9489 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1029,6 +1029,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .tcg_features = TCG_APM_FEATURES,
 .unmigratable_flags = CPUID_APM_INVTSC,
 },
+[FEAT_8000_0007_EBX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, "succor", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = { .eax = 0x80000007, .reg = R_EBX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_8000_0008_EBX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6554,7 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x80000007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX];
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a6000e93bd..f5afc5e4fd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -598,6 +598,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
@@ -942,6 +943,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7e9fc0cac5..15a642a894 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -477,6 +477,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+} else if (function == 0x80000007 && reg == R_EBX) {
+ret |= CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.39.3

[PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-12 Thread John Allen

From: William Roche 

AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
as it panics the VM kernel. We filter this event and provide a
warning message.

Signed-off-by: William Roche 
---
v3:
  - New patch
v4:
  - Remove redundant check for AO errors
---
 target/i386/kvm/kvm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5fce74aac5..7e9fc0cac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 mcg_status |= MCG_STATUS_RIPV;
 }
 } else {
+if (code == BUS_MCEERR_AO) {
+/* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+return;
+}
 mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
@@ -668,8 +672,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 addr, paddr, "BUS_MCEERR_AR");
 } else {
  warn_report("Guest MCE Memory Error at QEMU addr %p and "
- "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
- addr, paddr, "BUS_MCEERR_AO");
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+ addr, paddr, "BUS_MCEERR_AO",
+ IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
 }
 
 return;
-- 
2.39.3

Re: [PATCH v2 2/3] hw/cxl: Add QTG _DSM support for ACPI0017 device

2023-09-12 Thread Fan Ni

On Mon, Sep 04, 2023 at 05:18:46PM +0100, Jonathan Cameron wrote:

> From: Dave Jiang 
> 
> Add a simple _DSM call support for the ACPI0017 device to return a fake QTG
> ID value of 0 in all cases. The enabling is for _DSM plumbing testing
> from the OS.
> 
> Following edited for readability only
> 
> Device (CXLM)
> {
> Name (_HID, "ACPI0017")  // _HID: Hardware ID
> ...
> Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
> {
> If ((Arg0 == ToUUID ("f365f9a6-a7de-4071-a66a-b40c0b4f8e52")))
> {
> If ((Arg2 == Zero))
> {
> Return (Buffer (One) { 0x01 })
> }
> 
> If ((Arg2 == One))
> {
> Return (Package (0x02)
> {
> Buffer (0x02)
> { 0x01, 0x00 },
> Package (0x01)
> {
> Buffer (0x02)
> { 0x00, 0x00 }
> }
> })
> }
> }
> }
> 
> Signed-off-by: Dave Jiang 
> Signed-off-by: Jonathan Cameron 

Looks good to me. One minor comment inline.
> 
> --
> v2: Minor edit to drop reference to switches in patch description.
> ---
>  include/hw/acpi/cxl.h |  1 +
>  hw/acpi/cxl.c | 57 +++
>  hw/i386/acpi-build.c  |  1 +
>  3 files changed, 59 insertions(+)
> 
> diff --git a/include/hw/acpi/cxl.h b/include/hw/acpi/cxl.h
> index acf4418886..8f22c71530 100644
> --- a/include/hw/acpi/cxl.h
> +++ b/include/hw/acpi/cxl.h
> @@ -25,5 +25,6 @@ void cxl_build_cedt(GArray *table_offsets, GArray 
> *table_data,
>  BIOSLinker *linker, const char *oem_id,
>  const char *oem_table_id, CXLState *cxl_state);
>  void build_cxl_osc_method(Aml *dev);
> +void build_cxl_dsm_method(Aml *dev);
>  
>  #endif
> diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
> index 92b46bc932..5e9039785a 100644
> --- a/hw/acpi/cxl.c
> +++ b/hw/acpi/cxl.c
> @@ -30,6 +30,63 @@
>  #include "qapi/error.h"
>  #include "qemu/uuid.h"
>  
> +void build_cxl_dsm_method(Aml *dev)

Not a concern for now, I think, but do we need to check the revision
field?

Fan
> +{
> +Aml *method, *ifctx, *ifctx2;
> +
> +method = aml_method("_DSM", 4, AML_SERIALIZED);
> +{
> +Aml *function, *uuid;
> +
> +uuid = aml_arg(0);
> +function = aml_arg(2);
> +/* CXL spec v3.0 9.17.3.1 *, QTG ID _DSM */
> +ifctx = aml_if(aml_equal(
> +uuid, aml_touuid("F365F9A6-A7DE-4071-A66A-B40C0B4F8E52")));
> +
> +/* Function 0, standard DSM query function */
> +ifctx2 = aml_if(aml_equal(function, aml_int(0)));
> +{
> +uint8_t byte_list[1] = { 0x01 }; /* functions 1 only */
> +
> +aml_append(ifctx2,
> +   aml_return(aml_buffer(sizeof(byte_list), byte_list)));
> +}
> +aml_append(ifctx, ifctx2);
> +
> +/*
> + * Function 1
> + * A return value of {1, {0}} indicates that the
> + * max supported QTG ID is 1 and the recommended QTG is 0.
> + * The values here are faked to simplify emulation.
> + */
> +ifctx2 = aml_if(aml_equal(function, aml_int(1)));
> +{
> +uint16_t word_list[1] = { 0x01 };
> +uint16_t word_list2[1] = { 0 };
> +uint8_t *byte_list = (uint8_t *)word_list;
> +uint8_t *byte_list2 = (uint8_t *)word_list2;
> +Aml *pak, *pak1;
> +
> +/*
> + * The return package is a package of a WORD and another package.
> + * The embedded package contains 0 or more WORDs for the
> + * recommended QTG IDs.
> + */
> +pak1 = aml_package(1);
> +aml_append(pak1, aml_buffer(sizeof(uint16_t), byte_list2));
> +pak = aml_package(2);
> +aml_append(pak, aml_buffer(sizeof(uint16_t), byte_list));
> +aml_append(pak, pak1);
> +
> +aml_append(ifctx2, aml_return(pak));
> +}
> +aml_append(ifctx, ifctx2);
> +}
> +aml_append(method, ifctx);
> +aml_append(dev, method);
> +}
> +
>  static void cedt_build_chbs(GArray *table_data, PXBCXLDev *cxl)
>  {
>  PXBDev *pxb = PXB_DEV(cxl);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index bb12b0ad43..d3bc5875eb 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1422,6 +1422,7 @@ static void build_acpi0017(Aml *table)
>  method = aml_method("_STA", 0, AML_NOTSERIALIZED);
>  aml_append(method, aml_return(aml_int(0x01)));
>  aml_append(dev, method);
> +build_cxl_dsm_method(dev);
>  
>  aml_append(scope, dev);
>  aml_append(table, scope);
> -- 
> 2.39.2
> 
>

Re: [PATCH v2 1/3] tests/acpi: Allow update of DSDT.cxl

2023-09-12 Thread Fan Ni

On Mon, Sep 04, 2023 at 05:18:45PM +0100, Jonathan Cameron wrote:

> Addition of QTG in following patch requires an update to the test
> data.
> 
> Signed-off-by: Jonathan Cameron 
> ---

Reviewed-by: Fan Ni 

>  tests/qtest/bios-tables-test-allowed-diff.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index dfb8523c8b..9ce0f596cc 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1 +1,2 @@
>  /* List of comma-separated changed AML files to ignore */
> +"tests/data/acpi/q35/DSDT.cxl",
> -- 
> 2.39.2
> 
>

Re: [PATCH v2 3/3] tests/acpi: Update DSDT.cxl with QTG DSM

2023-09-12 Thread Fan Ni

On Mon, Sep 04, 2023 at 05:18:47PM +0100, Jonathan Cameron wrote:

> Description of change in previous patch.
> 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Fan Ni 

> ---
>  tests/qtest/bios-tables-test-allowed-diff.h |   1 -
>  tests/data/acpi/q35/DSDT.cxl| Bin 9655 -> 9723 bytes
>  2 files changed, 1 deletion(-)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index 9ce0f596cc..dfb8523c8b 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1,2 +1 @@
>  /* List of comma-separated changed AML files to ignore */
> -"tests/data/acpi/q35/DSDT.cxl",
> diff --git a/tests/data/acpi/q35/DSDT.cxl b/tests/data/acpi/q35/DSDT.cxl
> index 
> ee16a861c2de7b7caaf11d91c50fcdf308815233..d4272e87c00e010a6805b6a276fcc87d9b6ead17
>  100644
> GIT binary patch
> delta 129
> zcmdn){o9+%CDZ1KTP@zG5VY|arrz8vu$o-VwO zA{_C-A}77)2ae;$4D$c@|hs?J;#_4B>ug$~QIw(xNK_XREBoSen5M39-0
> gae?^cEXE~5f=q&}Tuh7%LL7`B1_Q(9{fa-B0lXk1>i_@%
> 
> delta 61
> zcmezEz1^G3CD5k9^g@gANoypGNRo(2Yn<_sbn
> R@xdXE3`-a{Gb{aI1^_O_5bOW|
> 
> -- 
> 2.39.2
> 
>

[PATCH 2/4] target/ppc: Add recording of taken branches to BHRB

2023-09-12 Thread Glenn Miles

This commit continues adding support for the Branch History
Rolling Buffer (BHRB) as is provided starting with the P8
processor and continuing with its successors.  This commit
is limited to the recording and filtering of taken branches.

The following changes were made:

  - Added a BHRB buffer for storing branch instruction and
target addresses for taken branches
  - Renamed gen_update_cfar to gen_update_branch_history and
added a 'target' parameter to hold the branch target
address and 'inst_type' parameter to use for filtering
  - Added a combination of jit-time and run-time checks to
gen_update_branch_history for determining if a branch
should be recorded
  - Added TCG code to gen_update_branch_history that stores
data to the BHRB and updates the BHRB offset.
  - Added BHRB resource initialization and reset functions
  - Enabled functionality for P8, P9 and P10 processors.
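
The store-and-advance step described above can be sketched in plain C.
This is a simplified model, assuming one 64-bit entry per recorded
value and the power-of-two offset mask set up by bhrb_init_state in the
diff below; struct bhrb, bhrb_init and bhrb_record are hypothetical
names, and the real patch emits equivalent TCG ops rather than calling
a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BHRB_MAX_NUM_ENTRIES_LOG2 5
#define BHRB_MAX_NUM_ENTRIES      (1 << BHRB_MAX_NUM_ENTRIES_LOG2)

/* Hypothetical stand-alone model of the BHRB state. */
struct bhrb {
    uint64_t entries[BHRB_MAX_NUM_ENTRIES];
    uint64_t offset;      /* byte offset of the next slot to write */
    uint64_t offset_mask; /* (num_entries * sizeof(uint64_t)) - 1 */
};

static void bhrb_init(struct bhrb *b, unsigned int num_entries_log2)
{
    assert(b != NULL);

    /* Clamp to the maximum size, as bhrb_init_state() does. */
    if (num_entries_log2 > BHRB_MAX_NUM_ENTRIES_LOG2) {
        num_entries_log2 = BHRB_MAX_NUM_ENTRIES_LOG2;
    }
    b->offset = 0;
    b->offset_mask = ((1ULL << num_entries_log2) * sizeof(uint64_t)) - 1;
    for (size_t i = 0; i < BHRB_MAX_NUM_ENTRIES; i++) {
        b->entries[i] = 0;
    }
}

/* Store one value and advance the write offset, wrapping via the mask. */
static void bhrb_record(struct bhrb *b, uint64_t value)
{
    assert(b != NULL);
    b->entries[b->offset / sizeof(uint64_t)] = value;
    b->offset = (b->offset + sizeof(uint64_t)) & b->offset_mask;
}
```

Because the offset is kept in bytes and masked rather than compared
against a limit, the wrap costs a single AND, which is presumably why
the mask is precomputed at init time.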

Signed-off-by: Glenn Miles 
---
 target/ppc/cpu.h   |  18 +++-
 target/ppc/cpu_init.c  |  41 -
 target/ppc/helper_regs.c   |  32 +++
 target/ppc/helper_regs.h   |   1 +
 target/ppc/power8-pmu.c|   2 +
 target/ppc/power8-pmu.h|   7 ++
 target/ppc/translate.c | 114 +++--
 target/ppc/translate/branch-impl.c.inc |   2 +-
 8 files changed, 205 insertions(+), 12 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 20ae1466a5..bda1afb700 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -454,8 +454,9 @@ FIELD(MSR, LE, MSR_LE, 1)
 #define MMCR2_UREG_MASK (MMCR2_FC1P0 | MMCR2_FC2P0 | MMCR2_FC3P0 | \
  MMCR2_FC4P0 | MMCR2_FC5P0 | MMCR2_FC6P0)
 
-#define MMCRA_BHRBRD PPC_BIT(26) /* BHRB Recording Disable */
-
+#define MMCRA_BHRBRD PPC_BIT(26) /* BHRB Recording Disable */
+#define MMCRA_IFM_MASK  PPC_BITMASK(32, 33) /* BHRB Instruction Filtering */
+#define MMCRA_IFM_SHIFT PPC_BIT_NR(33)
 
 #define MMCR1_EVT_SIZE 8
 /* extract64() does a right shift before extracting */
@@ -682,6 +683,8 @@ enum {
 POWERPC_FLAG_SMT  = 0x0040,
 /* Using "LPAR per core" mode  (as opposed to per-thread)*/
 POWERPC_FLAG_SMT_1LPAR = 0x0080,
+/* Has BHRB */
+POWERPC_FLAG_BHRB  = 0x0100,
 };
 
 /*
@@ -1110,6 +1113,9 @@ DEXCR_ASPECT(PHIE, 6)
 #define PPC_CPU_OPCODES_LEN  0x40
 #define PPC_CPU_INDIRECT_OPCODES_LEN 0x20
 
+#define BHRB_MAX_NUM_ENTRIES_LOG2 (5)
+#define BHRB_MAX_NUM_ENTRIES  (1 << BHRB_MAX_NUM_ENTRIES_LOG2)
+
 struct CPUArchState {
 /* Most commonly used resources during translated code execution first */
 target_ulong gpr[32];  /* general purpose registers */
@@ -1196,6 +1202,14 @@ struct CPUArchState {
 int dcache_line_size;
 int icache_line_size;
 
+/* Branch History Rolling Buffer (BHRB) resources */
+target_ulong bhrb_num_entries;
+target_ulong bhrb_base;
+target_ulong bhrb_filter;
+target_ulong bhrb_offset;
+target_ulong bhrb_offset_mask;
+uint64_t bhrb[BHRB_MAX_NUM_ENTRIES];
+
 /* These resources are used during exception processing */
 /* CPU model definition */
 target_ulong msr_mask;
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 568f9c3b88..19d7505a73 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6100,6 +6100,28 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 pcc->l1_icache_size = 0x8000;
 }
 
+static void bhrb_init_state(CPUPPCState *env, target_long num_entries_log2)
+{
+if (env->flags & POWERPC_FLAG_BHRB) {
+if (num_entries_log2 > BHRB_MAX_NUM_ENTRIES_LOG2) {
+num_entries_log2 = BHRB_MAX_NUM_ENTRIES_LOG2;
+}
+env->bhrb_num_entries = 1 << num_entries_log2;
+env->bhrb_base = (target_long)&env->bhrb[0];
+env->bhrb_offset_mask = (env->bhrb_num_entries * sizeof(uint64_t)) - 1;
+}
+}
+
+static void bhrb_reset_state(CPUPPCState *env)
+{
+if (env->flags & POWERPC_FLAG_BHRB) {
+env->bhrb_offset = 0;
+env->bhrb_filter = 0;
+memset(env->bhrb, 0, sizeof(env->bhrb));
+}
+}
+
+#define POWER8_BHRB_ENTRIES_LOG2 5
 static void init_proc_POWER8(CPUPPCState *env)
 {
 /* Common Registers */
@@ -6141,6 +6163,8 @@ static void init_proc_POWER8(CPUPPCState *env)
 env->dcache_line_size = 128;
 env->icache_line_size = 128;
 
+bhrb_init_state(env, POWER8_BHRB_ENTRIES_LOG2);
+
 /* Allocate hardware IRQ controller */
 init_excp_POWER8(env);
 ppcPOWER7_irq_init(env_archcpu(env));
@@ -6241,7 +6265,8 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
  POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
  POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR |
- POWERPC_FLAG_VSX | POWERPC_FLAG_TM;
+ POWERPC_FLAG_VSX | POWERPC_FLAG_TM |
+

[PATCH 1/4] target/ppc: Add new hflags to support BHRB

2023-09-12 Thread Glenn Miles

This commit is preparatory to the addition of Branch History
Rolling Buffer (BHRB) functionality, which is provided by the
P8 processor and its successors.

BHRB uses several SPR register fields to control whether or not
a branch instruction's address (and sometimes target address)
should be recorded.  Checking each of these fields with each
branch instruction using jitted code would lead to a significant
decrease in performance.

Therefore, it was decided that BHRB configuration bits that are
not expected to change frequently should have their state stored in
hflags so that the amount of checking done by jitted code can
be reduced.

This commit contains the changes for storing the state of the
following register fields as hflags:

MMCR0[FCP] - Determines if BHRB recording is frozen in the
 problem state

MMCR0[FCPC] - A modifier for MMCR0[FCP]

MMCRA[BHRBRD] - Disables all BHRB recording for a thread
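
As a rough illustration of the caching idea above (not the code from this patch), the three fields can be folded into a flags word once, when the registers change, so that per-branch code only tests a cached value. The SKETCH_* names are made up for this example, and SKETCH_BIT() assumes that PPC_BIT() uses MSB-0 numbering on a 64-bit register. The bit positions and hflag numbers are the ones defined in the diff below.

```c
#include <stdint.h>

/* Assumption: MSB-0 bit numbering on a 64-bit register, like PPC_BIT(). */
#define SKETCH_BIT(n)         (0x8000000000000000ULL >> (n))

#define SKETCH_MMCR0_FCP      SKETCH_BIT(34) /* freeze BHRB if PR=1 */
#define SKETCH_MMCR0_FCPC     SKETCH_BIT(51) /* condition for FCP */
#define SKETCH_MMCRA_BHRBRD   SKETCH_BIT(26) /* BHRB recording disable */

enum {
    SKETCH_HFLAGS_FCPC = 20,
    SKETCH_HFLAGS_FCP = 21,
    SKETCH_HFLAGS_BHRBRD = 22,
};

/* Recompute the cached flag bits from the current register values. */
uint32_t sketch_bhrb_hflags(uint64_t mmcr0, uint64_t mmcra)
{
    uint32_t hflags = 0;

    if (mmcr0 & SKETCH_MMCR0_FCP) {
        hflags |= 1u << SKETCH_HFLAGS_FCP;
    }
    if (mmcr0 & SKETCH_MMCR0_FCPC) {
        hflags |= 1u << SKETCH_HFLAGS_FCPC;
    }
    if (mmcra & SKETCH_MMCRA_BHRBRD) {
        hflags |= 1u << SKETCH_HFLAGS_BHRBRD;
    }
    return hflags;
}
```

The function would only need to run when MMCR0 or MMCRA is written, which is the point of keeping rarely-changing bits in hflags.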

Signed-off-by: Glenn Miles 
---
 target/ppc/cpu.h |  9 +
 target/ppc/cpu_init.c|  4 ++--
 target/ppc/helper.h  |  1 +
 target/ppc/helper_regs.c | 12 
 target/ppc/machine.c |  2 +-
 target/ppc/power8-pmu-regs.c.inc |  5 +
 target/ppc/power8-pmu.c  | 15 +++
 target/ppc/power8-pmu.h  |  4 ++--
 target/ppc/spr_common.h  |  1 +
 target/ppc/translate.c   |  6 ++
 10 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 25fac9577a..20ae1466a5 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -439,6 +439,9 @@ FIELD(MSR, LE, MSR_LE, 1)
 #define MMCR0_FC56   PPC_BIT(59) /* PMC Freeze Counters 5-6 bit */
 #define MMCR0_PMC1CE PPC_BIT(48) /* MMCR0 PMC1 Condition Enabled */
 #define MMCR0_PMCjCE PPC_BIT(49) /* MMCR0 PMCj Condition Enabled */
+#define MMCR0_BHRBA  PPC_BIT_NR(42)  /* BHRB Available */
+#define MMCR0_FCP    PPC_BIT(34) /* Freeze Counters/BHRB if PR=1 */
+#define MMCR0_FCPC   PPC_BIT(51) /* Condition for FCP bit */
 /* MMCR0 userspace r/w mask */
 #define MMCR0_UREG_MASK (MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE)
 /* MMCR2 userspace r/w mask */
@@ -451,6 +454,9 @@ FIELD(MSR, LE, MSR_LE, 1)
 #define MMCR2_UREG_MASK (MMCR2_FC1P0 | MMCR2_FC2P0 | MMCR2_FC3P0 | \
  MMCR2_FC4P0 | MMCR2_FC5P0 | MMCR2_FC6P0)
 
+#define MMCRA_BHRBRD    PPC_BIT(26)    /* BHRB Recording Disable */
+
+
 #define MMCR1_EVT_SIZE 8
 /* extract64() does a right shift before extracting */
 #define MMCR1_PMC1SEL_START 32
@@ -703,6 +709,9 @@ enum {
 HFLAGS_PMCJCE = 17, /* MMCR0 PMCjCE bit */
 HFLAGS_PMC_OTHER = 18, /* PMC other than PMC5-6 is enabled */
 HFLAGS_INSN_CNT = 19, /* PMU instruction count enabled */
+HFLAGS_FCPC = 20,   /* MMCR0 FCPC bit */
+HFLAGS_FCP = 21,/* MMCR0 FCP bit */
+HFLAGS_BHRBRD = 22, /* MMCRA BHRBRD bit */
 HFLAGS_VSX = 23, /* MSR_VSX if cpu has VSX */
 HFLAGS_VR = 25,  /* MSR_VR if cpu has VRE */
 
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 02b7aad9b0..568f9c3b88 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5152,7 +5152,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState *env)
  KVM_REG_PPC_MMCR1, 0x);
 spr_register_kvm(env, SPR_POWER_MMCRA, "MMCRA",
  SPR_NOACCESS, SPR_NOACCESS,
- _read_generic, _write_generic,
+ _read_generic, _write_MMCRA,
  KVM_REG_PPC_MMCRA, 0x);
 spr_register_kvm(env, SPR_POWER_PMC1, "PMC1",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -7152,7 +7152,7 @@ static void ppc_cpu_reset_hold(Object *obj)
 if (env->mmu_model != POWERPC_MMU_REAL) {
 ppc_tlb_invalidate_all(env);
 }
-pmu_mmcr01_updated(env);
+pmu_mmcr01a_updated(env);
 }
 
 /* clean any pending stop state */
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index abec6fe341..1a3d9a7e57 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -27,6 +27,7 @@ DEF_HELPER_2(store_lpcr, void, env, tl)
 DEF_HELPER_2(store_pcr, void, env, tl)
 DEF_HELPER_2(store_mmcr0, void, env, tl)
 DEF_HELPER_2(store_mmcr1, void, env, tl)
+DEF_HELPER_2(store_mmcrA, void, env, tl)
 DEF_HELPER_3(store_pmc, void, env, i32, i64)
 DEF_HELPER_2(read_pmc, tl, env, i32)
 DEF_HELPER_2(insns_inc, void, env, i32)
diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index f380342d4d..4ff054063d 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -61,6 +61,15 @@ static uint32_t hreg_compute_pmu_hflags_value(CPUPPCState *env)
 if (env->spr[SPR_POWER_MMCR0] & MMCR0_PMCjCE) {
 hflags |= 1 << HFLAGS_PMCJCE;
 }
+if (env->spr[SPR_POWER_MMCR0] & MMCR0_FCP) {
+hflags |= 1 << HFLAGS_FCP;
+}
+if (env->spr[SPR_POWER_MMCR0] & MMCR0_FCPC) {
+

Re: [PATCH v2] tpm: fix crash when FD >= 1024

2023-09-12 Thread Michael Tokarev


12.09.2023 03:08, Stefan Berger wrote:


On 9/11/23 09:25, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau

Replace select() with poll() to fix a crash when QEMU has a large number
of FDs.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2020133


Fixes:  ca64b08638 ("tpm: Move backend code under the 'backends/' directory")


Heh. I noticed this only now.  No, this is not the commit which introduced
the breakage.  It is either

commit 56a3c24ffc11955ddc7bb21362ca8069a3fc8c55
Author: Stefan Berger 
Date:   Tue May 26 16:51:06 2015 -0400

tpm: Probe for connected TPM 1.2 or TPM 2

which introduced select() in the first place (provided a similar select()
hasn't been used in there before), or some other commit somewhere else
which allowed a large number of file descriptors (provided that wasn't
possible before).  But definitely not a commit which just moved a file into
another subdir :)
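
For reference, the failure mode under discussion: select() works on an fd_set, which can only represent descriptors below FD_SETSIZE (commonly 1024), whereas poll() takes the descriptor number directly. A minimal sketch of a poll()-based wait follows. wait_readable() is a hypothetical helper written for this note, not the function in backends/tpm/tpm_util.c.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error or hangup.
 * Works for any descriptor value, including fd >= 1024.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    /* Retry if a signal interrupts the call. */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return -1;
    }
    if (n == 0) {
        return 0;
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```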

/mjt

Re: [PATCH 1/4] target/ppc: Add new hflags to support BHRB

2023-09-12 Thread Cédric Le Goater


On 9/12/23 22:00, Glenn Miles wrote:

Sorry, this is my first attempt at sending out a patch and it looks like only 
part of the patch made it.  Until I can figure out what I did wrong, please 
ignore this patch.


I didn't get patches 2-4. Patch 1 looked good though. Please resend.

Thanks,

C.

Re: [PATCH 2/9] migration: Let migrate_set_error() take ownership

2023-09-12 Thread Peter Xu

On Tue, Sep 12, 2023 at 04:40:14PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > migrate_set_error() used one error_copy() so it always copy an error.
> > However that's not the major use case - the major use case is one would
> > like to pass the error to migrate_set_error() without further touching the
> > error.
> >
> > It can be proved if we see most of the callers are freeing the error
> > explicitly right afterwards.  There're a few outliers (only if when the
> > caller) where we can use error_copy() explicitly there.
> >
> > Reviewed-by: Fabiano Rosas 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/migration.h|  4 ++--
> >  migration/channel.c  |  1 -
> >  migration/migration.c| 22 --
> >  migration/multifd.c  | 10 --
> >  migration/postcopy-ram.c |  1 -
> >  migration/ram.c  |  1 -
> >  6 files changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 6eea18db36..76e35a5ecf 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
> >  
> >  uint64_t migrate_max_downtime(void);
> >  
> > -void migrate_set_error(MigrationState *s, const Error *error);
> > +void migrate_set_error(MigrationState *s, Error *error);
> >  
> >  void migrate_fd_connect(MigrationState *s, Error *error_in);
> >  
> > @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, 
> > void *opaque);
> >  void migration_make_urgent_request(void);
> >  void migration_consume_urgent_request(void);
> >  bool migration_rate_limit(void);
> > -void migration_cancel(const Error *error);
> > +void migration_cancel(Error *error);
> >  
> >  void populate_vfio_info(MigrationInfo *info);
> >  void reset_vfio_bytes_transferred(void);
> > diff --git a/migration/channel.c b/migration/channel.c
> > index ca3319a309..48b3f6abd6 100644
> > --- a/migration/channel.c
> > +++ b/migration/channel.c
> > @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
> >  }
> >  }
> >  migrate_fd_connect(s, error);
> > -error_free(error);
> >  }
> >  
> >  
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c60064d48e..0f3ca168ed 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -162,7 +162,7 @@ void migration_object_init(void)
> >  dirty_bitmap_mig_init();
> >  }
> >  
> > -void migration_cancel(const Error *error)
> > +void migration_cancel(Error *error)
> >  {
> >  if (error) {
> >  migrate_set_error(current_migration, error);
> > @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
> >  object_unref(OBJECT(s));
> >  }
> >  
> > -void migrate_set_error(MigrationState *s, const Error *error)
> > +/*
> > + * Set error for current migration state.  The `error' ownership will be
> > + * moved from the caller to MigrationState, so the caller doesn't need to
> > + * free the error.
> > + *
> > + * If the caller still needs to reference the `error' passed in, one should
> > + * use error_copy() explicitly.
> > + */
> > +void migrate_set_error(MigrationState *s, Error *error)
> >  {
> >  QEMU_LOCK_GUARD(&s->error_mutex);
> >  if (!s->error) {
> > -s->error = error_copy(error);
> > +/* Record the first error triggered */
> > +s->error = error;
> > +} else {
> > +error_free(error);
> 
> This will conflict logically with 908927db28 ("migration: Update error
> description whenever migration fails") which does:
> 
> +migrate_set_error(s, local_err);
> +error_report_err(local_err);
> 
> both functions may now try to free the error.

Indeed, thanks for spotting this.  Perhaps I should just drop the
error_report_err() if we've set the error already anyway.
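
To make the double-free concern concrete, here is a toy model of the ownership rule. The Toy* types and functions are invented for this sketch and are not QEMU's Error API; locking is left out.

```c
#include <stdlib.h>
#include <string.h>

typedef struct ToyError { char *msg; } ToyError;
typedef struct ToyMigState { ToyError *error; } ToyMigState;

ToyError *toy_error_new(const char *msg)
{
    size_t len = strlen(msg) + 1;
    ToyError *e = malloc(sizeof(*e));

    if (!e) {
        return NULL;
    }
    e->msg = malloc(len);
    if (!e->msg) {
        free(e);
        return NULL;
    }
    memcpy(e->msg, msg, len);
    return e;
}

void toy_error_free(ToyError *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

ToyError *toy_error_copy(const ToyError *e)
{
    return toy_error_new(e->msg);
}

/* Takes ownership of err: keep the first error, free any later one. */
void toy_set_error(ToyMigState *s, ToyError *err)
{
    if (!s->error) {
        s->error = err;
    } else {
        toy_error_free(err);
    }
}

/* A caller that still needs err afterwards hands over a copy instead. */
void toy_set_and_keep(ToyMigState *s, ToyError *err)
{
    toy_set_error(s, toy_error_copy(err));
}
```

With this rule, calling toy_set_error(s, err) and then another function that also frees err would free it twice, which is the conflict pointed out above.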

> 
> 
> I'm working on top of this series to try to get rid of all of those
> qemu_file_set_error() we have. I'm trying to use migrate_set_error()
> whenever possible and only set f->last_error at the very bottom IO
> functions.

I'll read when it comes.

-- 
Peter Xu

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery

2023-09-12 Thread Peter Xu

On Mon, Sep 11, 2023 at 09:31:51PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> Hi, sorry it took me so long to get to this.

Not a problem.

> 
> > Normally the postcopy recover phase should only exist for a super short
> > period, that's the duration when QEMU is trying to recover from an
> > interrupted postcopy migration, during which handshake will be carried out
> > for continuing the procedure with state changes from PAUSED -> RECOVER ->
> > POSTCOPY_ACTIVE again.
> >
> > Here RECOVER phase should be super small, that happens right after the
> > admin specified a new but working network link for QEMU to reconnect to
> > dest QEMU.
> >
> > However there can still be case where the channel is broken in this small
> > RECOVER window.
> >
> > If it happens, with current code there's no way the src QEMU can get kicked
> > out of RECOVER stage. No way either to retry the recover in another channel
> > when established.
> >
> > This patch allows the RECOVER phase to fail itself too - we're mostly
> > ready, just some small things missing, e.g. properly kick the main
> > migration thread out when sleeping on rp_sem when we found that we're at
> > RECOVER stage.  When this happens, it fails the RECOVER itself, and
> > rollback to PAUSED stage.  Then the user can retry another round of
> > recovery.
> >
> > To make it even stronger, teach QMP command migrate-pause to explicitly
> > kick src/dst QEMU out when needed, so even if for some reason the migration
> > thread didn't get kicked out already by a failing return-path thread, the
> > admin can also kick it out.
> >
> > This will be a super, super corner case, but still try to cover that.
> 
> It would be nice to have a test for this. Being such a corner case, it
> will be hard to keep this scenario working.

Yes makes sense.

> 
> I wrote two tests[1] that do the recovery each using a different URI:
> 1) fd: using a freshly opened file,
> 2) fd: using a socketpair that simply has nothing on the other end.
> 
> These might be too far from the original bug, but it seems to exercise
> some of the same paths:
> 
> Scenario 1:
> /x86_64/migration/postcopy/recovery/fail-twice
> 
> the stacks are:
> 
> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
>  qemu_sem_wait
>  ram_dirty_bitmap_sync_all
>  ram_resume_prepare
>  qemu_savevm_state_resume_prepare
>  postcopy_do_resume
>  postcopy_pause
>  migration_detect_error
>  migration_thread
> 
> Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
>  qemu_sem_wait
>  postcopy_pause_return_path_thread
>  source_return_path_thread

I guess this is because below path triggers:

if (len > 0) {
f->buf_size += len;
f->total_transferred += len;
} else if (len == 0) {
qemu_file_set_error_obj(f, -EIO, local_error); <---
} else {
qemu_file_set_error_obj(f, len, local_error);
}

So the src can always write anything into the tmp file, but any read will
return 0 immediately because file offset is always pointing to the file
size.
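
The behaviour is easy to reproduce outside QEMU. The sketch below writes to a fresh temporary file and then reads from the same descriptor without seeking; read() reports end-of-file (0) because the offset already sits at the file size. read_after_write() and the /tmp template are just choices made for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes to a fresh temporary file, then read from the same
 * descriptor without seeking back.  Returns the result of read(),
 * or -2 if the setup itself failed.
 */
long read_after_write(const char *data, size_t len)
{
    char path[] = "/tmp/eof-sketch-XXXXXX";
    char buf[16];
    long n;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -2;
    }
    unlink(path); /* the file disappears once fd is closed */

    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -2;
    }

    /* The file offset now equals the file size, so this hits EOF. */
    n = (long)read(fd, buf, sizeof(buf));
    close(fd);
    return n;
}
```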

> 
> This patch seems to fix it, although we cannot call qmp_migrate_recover
> a second time because the mis state is now in RECOVER:
> 
>   "Migrate recover can only be run when postcopy is paused."
> 
> Do we maybe need to return the state to PAUSED, or allow
> qmp_migrate_recover to run in RECOVER, like you did on the src side?

Ouch, I just noticed that my patch was wrong.

I probably need this:

===8<===
From 8c2fb7b4c7488002283c7fb6a5e2aae81b21e04b Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Tue, 12 Sep 2023 15:49:54 -0400
Subject: [PATCH] fixup! migration/postcopy: Allow network to fail even during
 recovery

Signed-off-by: Peter Xu 
---
 migration/migration.h | 2 +-
 migration/migration.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index e7f48e736e..7e61e2ece7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,7 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
-bool migration_postcopy_is_alive(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
diff --git a/migration/migration.c b/migration/migration.c
index de2146c6fc..a9d381886c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,7 +1349,7 @@ bool migration_in_postcopy(void)
 }
 }
 
-bool migration_postcopy_is_alive(void)
+bool migration_postcopy_is_alive(int state)
 {
 MigrationState *s = migrate_get_current();
 
@@ -1569,7 +1569,7 @@ void qmp_migrate_pause(Error **errp)
 MigrationIncomingState *mis = migration_incoming_get_current();
 int ret;
 
-if (migration_postcopy_is_alive()) {
+if (migration_postcopy_is_alive(ms->state)) {
 /* Source side, during postcopy

Re: [PATCH v2] tpm: fix crash when FD >= 1024

2023-09-12 Thread Michael Tokarev


12.09.2023 03:08, Stefan Berger wrote:


On 9/11/23 09:25, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau

Replace select() with poll() to fix a crash when QEMU has a large number
of FDs.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2020133


For backporting I think we should also add this tag here:

Fixes:  ca64b08638 ("tpm: Move backend code under the 'backends/' directory")


It's nice to have Fixes tags generally.

Yes, it helps backporting a little bit, but it is mostly about choosing
which changes might be appropriate when there's no to-stable/to-backport
markers/tags whatsoever.  If you already know for sure some change should
be picked up for stable, it's better to add Cc: qemu-stable@.  With Fixes
also in place, besides its usefulness for other purposes, it helps me to
see which older versions needs this, but usually it's relatively easy to
determine even without Fixes: tag.  Many changes picked up for stable do
not have such tag just because there's no single commit which introduced
an issue, or some other situation.


Though RETRY_ON_EINTR was only introduced in 8.0.0-rc0. What's the right tag 
for backporting then?


There's no such tag.  If you already know there's a possible issue with older
versions (and this is exactly the case), any comment about this might help
for sure.  This note of yours saved me a compile (which would fail for sure),
after which I would find

commit v7.2.0-538-g8b6aa69365
Author: Nikita Ivanov 
Date:   Sun Oct 23 12:04:21 2022 +0300

Refactoring: refactor TFR() macro to RETRY_ON_EINTR()

the same way I did now.
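
For readers who have not met the macro: the idea is to repeat a call for as long as it fails with EINTR.  The sketch below shows that idea only; SKETCH_RETRY_ON_EINTR is not QEMU's definition, it relies on the GNU statement-expression extension (GCC/Clang), and flaky_call() is a stand-in invented for the demonstration.

```c
#include <errno.h>

/* Re-evaluate expr while it returns -1 with errno set to EINTR. */
#define SKETCH_RETRY_ON_EINTR(expr)                         \
    (__extension__({                                        \
        __typeof__(expr) result_;                           \
        do {                                                \
            result_ = (expr);                               \
        } while (result_ == -1 && errno == EINTR);          \
        result_;                                            \
    }))

/* Stand-in for a syscall: fails with EINTR *remaining times, then succeeds. */
int flaky_call(int *remaining)
{
    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 42;
}
```

A fix that depends on such a helper cannot be cherry-picked to a branch that predates it unless the helper is backported as well, which is the situation described here.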

If you're trying to find a way to make this new fix be "more backportable",
maybe by avoiding using a feature designed especially for this, - I think
this is not productive, the priority is definitely to have better "master",
and think about what to do with earlier versions only after that.

In this case, and in about 5 other examples from today, the thought about
stable releases best be done when introducing wide changes, like this commit
above which replaced TFR with RETRY_ON_EINTR.  Since this new macro will be
used everywhere for sure, the best way would be to split that single patch
into 3: first one introducing the new RETRY_ON_EINTR(), second converting all
users of TFR to RETRY_ON_EINTR, and 3rd (which can be folded into second)
removing TFR which is now unused.  This way I can cherry-pick just the first
patch easily if needed.  But once again, the priority should be master, not
backports.

Thanks,

/mjt

Re: [PATCH 3/3] iotests: distinguish 'skipped' and 'not run' states

2023-09-12 Thread Vladimir Sementsov-Ogievskiy


On 06.09.23 17:09, Denis V. Lunev wrote:

Each particular testcase could be skipped intentionally or accidentally.
For example, the test is not designed for a particular image format or
is not run due to a missing library.

The latter case is unwanted in reality. Though the discussion has
revealed that failing the test in such a case would be bad. Thus the
patch tries to do a different thing. It adds an additional status for
the test case - 'skipped' - and binds intentional cases to that state.


Hmm. Do I miss something, or in this patch you only split them, not making "not run" 
produce an error? So ./check still reports success when some tests are "not run"?

The split itself looks correct to me.

--
Best regards,
Vladimir

Re: [PATCH 0/4] ci: fix hang of FreeBSD CI jobs

2023-09-12 Thread Thomas Huth


On 12/09/2023 20.41, Daniel P. Berrangé wrote:

This addresses

   https://gitlab.com/qemu-project/qemu/-/issues/1882

Which turned out to be a genuine flaw which we missed during merge
as the patch hitting master coincided with the FreeBSD CI job
having a temporary outage due to a changed release image version.

Daniel P. Berrangé (4):
   microbit: add missing qtest_quit() call
   qtest: kill orphaned qtest QEMU processes on FreeBSD
   gitlab: make Cirrus CI timeout explicit
   gitlab: make Cirrus CI jobs gating

  .gitlab-ci.d/cirrus.yml   | 4 +++-
  .gitlab-ci.d/cirrus/build.yml | 2 ++
  tests/qtest/libqtest.c| 7 +++
  tests/qtest/microbit-test.c   | 2 ++
  4 files changed, 14 insertions(+), 1 deletion(-)



Series
Reviewed-by: Thomas Huth 

Alex, will you pick these up or shall I take them for my next PR?
Or Stefan, do you want to apply these directly as a CI fix?

 Thomas

Re: [PATCH 1/4] target/ppc: Add new hflags to support BHRB

2023-09-12 Thread Glenn Miles

Sorry, this is my first attempt at sending out a patch and it looks like 
only part of the patch made it.  Until I can figure out what I did 
wrong, please ignore this patch.


Thanks,
Glenn Miles

Re: [PATCH] gitlab: remove unreliable avocado CI jobs

2023-09-12 Thread Thomas Huth


On 12/09/2023 17.06, Stefan Hajnoczi wrote:

The avocado-system-alpine, avocado-system-fedora, and
avocado-system-ubuntu jobs are unreliable. I identified them while
looking over CI failures from the past week:
https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
https://gitlab.com/qemu-project/qemu/-/jobs/5030428571

Thomas Huth suggested on IRC today that there may be a legitimate failure
in there:

   th_huth: f4bug, yes, seems like it does not start at all correctly on
   alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
   noticed this by now, this is worrying

It crept in because the jobs were already unreliable.

I don't know how to interpret the job output, so all I can do is to
propose removing these jobs. A useful CI job has two outcomes: pass or
fail. Timeouts and other in-between states are not useful because they
require constant triaging by someone who understands the details of the
tests and they can occur when run against pull requests that have
nothing to do with the area covered by the test.

Hopefully test owners will be able to identify the root causes and solve
them so that these jobs can stay. In their current state the jobs are
not useful since I cannot tell whether job failures are real or
just intermittent when merging qemu.git pull requests.

If you are a test owner, please take a look.

It is likely that other avocado-system-* CI jobs have similar failures
from time to time, but I'll leave them as long as they are passing.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
Signed-off-by: Stefan Hajnoczi 
---
  .gitlab-ci.d/buildtest.yml | 27 ---
  1 file changed, 27 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index aee9101507..83ce448c4d 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -22,15 +22,6 @@ check-system-alpine:
  IMAGE: alpine
  MAKE_CHECK_ARGS: check-unit check-qtest
  
-avocado-system-alpine:

-  extends: .avocado_test_job_template
-  needs:
-- job: build-system-alpine
-  artifacts: true
-  variables:
-IMAGE: alpine
-MAKE_CHECK_ARGS: check-avocado


Please don't remove the whole job! Just disable the failing tests within the 
job, e.g.:

diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, kernel_path_xz):
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
 
+@skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab')

 def test_mips_malta32el_nanomips_4k(self):
 """
 :avocado: tags=arch:mipsel
@@ -517,6 +518,7 @@ def test_mips_malta32el_nanomips_4k(self):
 kernel_path_xz = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 self.do_test_mips_malta32el_nanomips(kernel_path_xz)
 
+@skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab')

 def test_mips_malta32el_nanomips_16k_up(self):
 """
 :avocado: tags=arch:mipsel
@@ -531,6 +533,7 @@ def test_mips_malta32el_nanomips_16k_up(self):
 kernel_path_xz = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 self.do_test_mips_malta32el_nanomips(kernel_path_xz)
 
+@skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab')

 def test_mips_malta32el_nanomips_64k_dbg(self):
 """
 :avocado: tags=arch:mipsel

Re: [PATCH 2/9] migration: Let migrate_set_error() take ownership

2023-09-12 Thread Fabiano Rosas

Peter Xu  writes:

> migrate_set_error() used one error_copy() so it always copy an error.
> However that's not the major use case - the major use case is one would
> like to pass the error to migrate_set_error() without further touching the
> error.
>
> It can be proved if we see most of the callers are freeing the error
> explicitly right afterwards.  There're a few outliers (only if when the
> caller) where we can use error_copy() explicitly there.
>
> Reviewed-by: Fabiano Rosas 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.h|  4 ++--
>  migration/channel.c  |  1 -
>  migration/migration.c| 22 --
>  migration/multifd.c  | 10 --
>  migration/postcopy-ram.c |  1 -
>  migration/ram.c  |  1 -
>  6 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 6eea18db36..76e35a5ecf 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
>  
>  uint64_t migrate_max_downtime(void);
>  
> -void migrate_set_error(MigrationState *s, const Error *error);
> +void migrate_set_error(MigrationState *s, Error *error);
>  
>  void migrate_fd_connect(MigrationState *s, Error *error_in);
>  
> @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void 
> *opaque);
>  void migration_make_urgent_request(void);
>  void migration_consume_urgent_request(void);
>  bool migration_rate_limit(void);
> -void migration_cancel(const Error *error);
> +void migration_cancel(Error *error);
>  
>  void populate_vfio_info(MigrationInfo *info);
>  void reset_vfio_bytes_transferred(void);
> diff --git a/migration/channel.c b/migration/channel.c
> index ca3319a309..48b3f6abd6 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
>  }
>  }
>  migrate_fd_connect(s, error);
> -error_free(error);
>  }
>  
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index c60064d48e..0f3ca168ed 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -162,7 +162,7 @@ void migration_object_init(void)
>  dirty_bitmap_mig_init();
>  }
>  
> -void migration_cancel(const Error *error)
> +void migration_cancel(Error *error)
>  {
>  if (error) {
>  migrate_set_error(current_migration, error);
> @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
>  object_unref(OBJECT(s));
>  }
>  
> -void migrate_set_error(MigrationState *s, const Error *error)
> +/*
> + * Set error for current migration state.  The `error' ownership will be
> + * moved from the caller to MigrationState, so the caller doesn't need to
> + * free the error.
> + *
> + * If the caller still needs to reference the `error' passed in, one should
> + * use error_copy() explicitly.
> + */
> +void migrate_set_error(MigrationState *s, Error *error)
>  {
>  QEMU_LOCK_GUARD(&s->error_mutex);
>  if (!s->error) {
> -s->error = error_copy(error);
> +/* Record the first error triggered */
> +s->error = error;
> +} else {
> +error_free(error);

This will conflict logically with 908927db28 ("migration: Update error
description whenever migration fails") which does:

+migrate_set_error(s, local_err);
+error_report_err(local_err);

both functions may now try to free the error.


I'm working on top of this series to try to get rid of all of those
qemu_file_set_error() we have. I'm trying to use migrate_set_error()
whenever possible and only set f->last_error at the very bottom IO
functions.

Re: [PATCH 2/3] iotests: improve 'not run' message for nbd-multiconn test

2023-09-12 Thread Vladimir Sementsov-Ogievskiy


On 06.09.23 17:09, Denis V. Lunev wrote:

The test actually requires Python bindings to libnbd rather than libnbd
itself. Clarify that inside the message.

Signed-off-by: Denis V. Lunev
CC: Kevin Wolf
CC: Hanna Reitz
CC: Eric Blake


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH 05/11] accel/tcg: Modifies memory access functions to use CPUState

2023-09-12 Thread Richard Henderson


On 9/12/23 08:34, Anton Johansson wrote:

do_[ld|st]*() and mmu_lookup*() are changed to use CPUState over
CPUArchState, moving the target-dependence to the target-facing
cpu_[ld|st] functions.

Signed-off-by: Anton Johansson 
---
  accel/tcg/cputlb.c | 324 ++---
  1 file changed, 161 insertions(+), 163 deletions(-)


So... what's your ultimate plan here?

At the moment through patches 5-11, all you do is take CPUArchState, discard knowledge of 
it via CPUState, and then recover knowledge of it via cpu->tlb_ptr.


I agree that *something* has to happen in order to allow these entry points to be used by 
multiple cpu types simultaneously, but there must be a plan.


Is it to have tcg generated code perform env_cpu() inline before the call?  That's just 
pointer arithmetic, so it's certainly an easy option.
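
As an illustration of why that conversion is cheap: when the arch state is embedded in a larger per-CPU structure, getting from one to the other is a constant offset.  The Toy* layout below is invented for this sketch and is not QEMU's actual structure layout.

```c
#include <stddef.h>

typedef struct ToyCPUState { int cpu_index; } ToyCPUState;
typedef struct ToyArchState { unsigned long gpr[4]; } ToyArchState;

/* Hypothetical container: common state first, target state after it. */
typedef struct ToyArchCPU {
    ToyCPUState parent;
    ToyArchState env;
} ToyArchCPU;

/* Recover the common state from a pointer to the embedded arch state. */
ToyCPUState *toy_env_cpu(ToyArchState *env)
{
    ToyArchCPU *cpu = (ToyArchCPU *)((char *)env - offsetof(ToyArchCPU, env));

    return &cpu->parent;
}
```

Since the offset is known at compile time, generated code could apply it as a single subtraction before the call.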



r~

Re: [PATCH 1/2] blockdev: qmp_transaction: harden transaction properties for bitmaps

2023-09-12 Thread Vladimir Sementsov-Ogievskiy


On 04.09.23 11:31, Andrey Zhadchenko wrote:

Unlike other transaction commands, bitmap operations do not drain target
bds. If we have an IOThread, this may result in some inconsistencies, as
bitmap content may change during transaction command.
Add bdrv_drained_begin()/end() to bitmap operations.

Signed-off-by: Andrey Zhadchenko


Hi!

First, please always include cover letter when more than 1 patch.

Next. Hmm. Good idea, but I'm afraid that's still not enough.

Assume you have two BSs A and B in two different iothreads. So, the sequence 
may be like this:

1. drain_begin A

2. do operation with bitmap in A

3. guest writes to B, B is modified and bitmap in B is modified as well

4. drain_begin B

5. do operation with bitmap in B

6. drain_end B

7. drain_end A

The user may expect that all the operations are done atomically in relation to any
guest IO operations. And if the operations are dependent, the intermediate write
[3] may break the result.

So, we should drain all participating drives for the whole transaction. The
simplest solution is a bdrv_drain_all_begin() / bdrv_drain_all_end() pair in
qmp_transaction(); could we start with that?
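
The difference between the two orderings can be shown with a toy model.  Everything below is invented for illustration and is not the block layer API; in particular, a write to a drained disk is simply refused here, whereas a real drained section would hold the request back.

```c
#include <stdbool.h>

typedef struct ToyDisk {
    int drained; /* nesting depth of drained sections */
    int dirty;   /* writes that reached the disk and dirtied its bitmap */
} ToyDisk;

void toy_drain_begin(ToyDisk *d) { d->drained++; }
void toy_drain_end(ToyDisk *d) { d->drained--; }

/* A guest write only lands while the disk is not drained. */
bool toy_guest_write(ToyDisk *d)
{
    if (d->drained > 0) {
        return false;
    }
    d->dirty++;
    return true;
}

/* Per-disk draining, as in steps 1-7: returns true if the write to B raced. */
bool toy_per_disk_sequence(ToyDisk *a, ToyDisk *b)
{
    bool raced;

    toy_drain_begin(a);
    /* bitmap operation on A would happen here */
    raced = toy_guest_write(b); /* step 3: B is not drained yet */
    toy_drain_begin(b);
    /* bitmap operation on B would happen here */
    toy_drain_end(b);
    toy_drain_end(a);
    return raced;
}

/* Drain every participant for the whole transaction. */
bool toy_drain_all_sequence(ToyDisk *a, ToyDisk *b)
{
    bool raced;

    toy_drain_begin(a);
    toy_drain_begin(b);
    raced = toy_guest_write(b); /* same write, now held off */
    toy_drain_end(b);
    toy_drain_end(a);
    return raced;
}
```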

--
Best regards,
Vladimir

[PATCH 0/4] Add BHRB Facility Support

2023-09-12 Thread Glenn Miles

This is a series of patches for adding support for the Branch History
Rolling Buffer (BHRB) facility.  This was added to the Power ISA
starting with version 2.07.  Changes were subsequently made in version
3.1 to limit BHRB recording to instructions run in problem state only
and to add a control bit to disable recording (MMCRA[BHRBRD]).
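
For readers unfamiliar with the "rolling" part: the buffer is a fixed-size ring in which the newest entry overwrites the oldest.  The sketch below shows one way to get the wrap-around from a power-of-two size and a byte-offset mask.  It is an illustration under those assumptions with made-up Sketch* names, not the recording code from this series; the 32-entry size follows the POWER8 value used in patch 1.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_ENTRIES_LOG2 5                       /* 32 entries */
#define SKETCH_ENTRIES      (1u << SKETCH_ENTRIES_LOG2)

typedef struct SketchBhrb {
    uint64_t entries[SKETCH_ENTRIES];
    uint32_t offset;        /* byte offset of the next slot to write */
    uint32_t offset_mask;   /* buffer size in bytes, minus one */
} SketchBhrb;

void sketch_bhrb_init(SketchBhrb *b)
{
    memset(b, 0, sizeof(*b));
    b->offset_mask = (uint32_t)(SKETCH_ENTRIES * sizeof(uint64_t)) - 1;
}

/* Store one branch address and advance, wrapping via the mask. */
void sketch_bhrb_record(SketchBhrb *b, uint64_t branch_addr)
{
    b->entries[b->offset / sizeof(uint64_t)] = branch_addr;
    b->offset = (b->offset + (uint32_t)sizeof(uint64_t)) & b->offset_mask;
}
```

Because the size is a power of two, the wrap needs only an AND, which keeps the per-branch cost low.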


Glenn Miles (4):
  target/ppc: Add new hflags to support BHRB
  target/ppc: Add recording of taken branches to BHRB
  target/ppc: Add clrbhrb and mfbhrbe instructions
  target/ppc: Add migration support for BHRB

 target/ppc/cpu.h   |  24 +
 target/ppc/cpu_init.c  |  45 -
 target/ppc/helper.h|   5 +
 target/ppc/helper_regs.c   |  44 
 target/ppc/helper_regs.h   |   1 +
 target/ppc/machine.c   |  25 -
 target/ppc/misc_helper.c   |  43 
 target/ppc/power8-pmu-regs.c.inc   |   5 +
 target/ppc/power8-pmu.c|  17 +++-
 target/ppc/power8-pmu.h|  11 +-
 target/ppc/spr_common.h|   1 +
 target/ppc/translate.c | 133 +++--
 target/ppc/translate/branch-impl.c.inc |   2 +-
 13 files changed, 337 insertions(+), 19 deletions(-)

-- 
2.31.1

Re: [PATCH 1/4] microbit: add missing qtest_quit() call

2023-09-12 Thread Richard Henderson


On 9/12/23 11:41, Daniel P. Berrangé wrote:

Without this call, the QEMU process is being left running which on
FreeBSD 13.2 at least, makes meson think the test is still running,
and thus execution of "make check" continues forever.

This fixes the regression introduced in:

   commit a9c9bbee855877293683012942d3485d50f286af
   Author: Chris Laplante
   Date:   Tue Aug 22 17:31:02 2023 +0100

 qtest: microbit-test: add tests for nRF51 DETECT

Fixes:https://gitlab.com/qemu-project/qemu/-/issues/1882
Signed-off-by: Daniel P. Berrangé
---
  tests/qtest/microbit-test.c | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Richard Henderson 

But I think that it's unfortunate that we have to remember this for each test.


r~

Re: [PATCH 2/4] qtest: kill orphaned qtest QEMU processes on FreeBSD

2023-09-12 Thread Richard Henderson


On 9/12/23 11:41, Daniel P. Berrangé wrote:

On Linux we use PR_SET_PDEATHSIG to kill orphaned QEMU processes
if we fail to call qtest_quit(), or the test program aborts/segvs.
This prevents meson from hanging forever due to the orphaned
process keeping stdout open.

On FreeBSD we can achieve the same using PROC_PDEATHSIG_CTL, which
gives us the equivalent protection against hangs.

Signed-off-by: Daniel P. Berrangé 
---
  tests/qtest/libqtest.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 34b9c14b75..b1eba71ffe 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -24,6 +24,9 @@
  #ifdef __linux__
  #include 
  #endif /* __linux__ */
+#ifdef __FreeBSD__
+#include 
+#endif /* __FreeBSD__ */
  
  #include "libqtest.h"

  #include "libqmp.h"
@@ -414,6 +417,10 @@ static QTestState *G_GNUC_PRINTF(1, 2) 
qtest_spawn_qemu(const char *fmt, ...)
   */
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  #endif /* __linux__ */
+#ifdef __FreeBSD__
+int sig = SIGKILL;
+procctl(P_PID, getpid(), PROC_PDEATHSIG_CTL, &sig);


We could use 0 for "current process", but this works too.
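
A standalone sketch of the two variants side by side, using an id of 0 on FreeBSD as mentioned above.  kill_me_when_parent_dies() is a name made up for this example; the prctl() and procctl() calls are the ones from the patch.

```c
#include <signal.h>
#include <sys/types.h>
#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#endif

/*
 * Ask the kernel to send SIGKILL to the calling process when its
 * parent dies.  Returns 0 on success, -1 on failure or when the
 * platform has no equivalent in this sketch.
 */
int kill_me_when_parent_dies(void)
{
#if defined(__linux__)
    return prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
#elif defined(__FreeBSD__)
    int sig = SIGKILL;

    /* An id of 0 refers to the current process. */
    return procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &sig);
#else
    return -1;
#endif
}
```

It would be called in the child after fork() and before exec, as qtest_spawn_qemu() does.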

Reviewed-by: Richard Henderson 


r~
