date:20230215

Re: [RFC v3 14/18] backends/iommufd: Introduce the iommufd object

2023-02-15 Thread Eric Auger

Hi Nicolin,

On 2/16/23 00:48, Nicolin Chen wrote:
> Hi Eric,
>
> On Tue, Jan 31, 2023 at 09:53:01PM +0100, Eric Auger wrote:
>
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> new file mode 100644
>> index 0000000000..06a866d1bd
>> --- /dev/null
>> +++ b/include/sysemu/iommufd.h
>> @@ -0,0 +1,47 @@
>> +#ifndef SYSEMU_IOMMUFD_H
>> +#define SYSEMU_IOMMUFD_H
>> +
>> +#include "qom/object.h"
>> +#include "qemu/thread.h"
>> +#include "exec/hwaddr.h"
>> +#include "exec/ram_addr.h"
> After rebasing nesting patches on top of this, I see a build error:
>
> 
> [47/876] Compiling C object libcommon.fa.p/hw_arm_smmu-common.c.o
> FAILED: libcommon.fa.p/hw_arm_smmu-common.c.o 
> cc -Ilibcommon.fa.p -I../src/3rdparty/qemu/dtc/libfdt -I/usr/include/pixman-1 
> -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/glib-2.0 
> -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -I/usr/include/gio-unix-2.0 
> -fdiagnostics-color=auto -Wall -Winvalid-pch -std=gnu11 -O2 -g -isystem 
> /src/3rdparty/qemu/linux-headers -isystem linux-headers -iquote . -iquote 
> /src/3rdparty/qemu -iquote /src/3rdparty/qemu/include -iquote 
> /src/3rdparty/qemu/tcg/aarch64 -pthread -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 
> -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing 
> -fno-common -fwrapv -Wundef -Wwrite-strings -Wmissing-prototypes 
> -Wstrict-prototypes -Wredundant-decls -Wold-style-declaration 
> -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k 
> -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
> -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wmissing-format-attribute 
> -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi 
> -fstack-protector-strong -fPIE -MD -MQ libcommon.fa.p/hw_arm_smmu-common.c.o 
> -MF libcommon.fa.p/hw_arm_smmu-common.c.o.d -o 
> libcommon.fa.p/hw_arm_smmu-common.c.o -c 
> ../src/3rdparty/qemu/hw/arm/smmu-common.c
> In file included from /src/3rdparty/qemu/include/sysemu/iommufd.h:7,
>  from ../src/3rdparty/qemu/hw/arm/smmu-common.c:29:
> /src/3rdparty/qemu/include/exec/ram_addr.h:23:10: fatal error: cpu.h: No such 
> file or directory
>    23 | #include "cpu.h"
>       |          ^~~~~~~
> compilation terminated.
> 
>
> I guess it results from the module inter-dependency. Though our
> nesting patches aren't finalized yet, the possibility of including
> iommufd.h is still there. Meanwhile, ram_addr.h is included here
> for the "ram_addr_t" type, I think. So, could we include "cpu-common.h"
> instead, where the "ram_addr_t" type is actually defined?

Sure. We will fix that in the next iteration.

Eric

>
> The build error is gone after this replacement:
> 
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> index 45540de63986..86d370c221b3 100644
> --- a/include/sysemu/iommufd.h
> +++ b/include/sysemu/iommufd.h
> @@ -4,7 +4,7 @@
>  #include "qom/object.h"
>  #include "qemu/thread.h"
>  #include "exec/hwaddr.h"
> -#include "exec/ram_addr.h"
> +#include "exec/cpu-common.h"
>  #include 
>  
>  #define TYPE_IOMMUFD_BACKEND "iommufd"
> 
>
> Thanks
> Nic
>

Re: [PATCH 07/12] testing: update ubuntu2004 to ubuntu2204

2023-02-15 Thread Thomas Huth


On 15/02/2023 20.25, Alex Bennée wrote:

The 22.04 LTS release has been out for almost a year now so it's time
to update all the remaining images to the current LTS. We can also
drop some hacks we need for older clang TSAN support.

Signed-off-by: Alex Bennée 
---
  docs/devel/testing.rst|  4 ++--
  .gitlab-ci.d/buildtest.yml| 22 +--
  .gitlab-ci.d/containers.yml   |  4 ++--
  .../{ubuntu2004.docker => ubuntu2204.docker}  | 16 +-
  tests/docker/test-tsan|  2 +-
  tests/lcitool/refresh | 10 +
  6 files changed, 23 insertions(+), 35 deletions(-)
  rename tests/docker/dockerfiles/{ubuntu2004.docker => ubuntu2204.docker} (91%)


Reviewed-by: Thomas Huth

Re: [PATCH v2 05/15] linux-user/sparc: Tidy window spill/fill traps

2023-02-15 Thread Philippe Mathieu-Daudé


On 16/2/23 06:45, Richard Henderson wrote:

Add some macros to localize the hw difference between v9 and pre-v9.

Signed-off-by: Richard Henderson 
---
  linux-user/sparc/cpu_loop.c | 23 +--
  1 file changed, 13 insertions(+), 10 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 04/15] linux-user/sparc: Use TT_TRAP for flush windows

2023-02-15 Thread Philippe Mathieu-Daudé


On 16/2/23 06:45, Richard Henderson wrote:

The v9 and pre-v9 code can be unified with this macro.

Signed-off-by: Richard Henderson 
---
  linux-user/sparc/cpu_loop.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 02/15] linux-user/sparc: Tidy syscall trap

2023-02-15 Thread Philippe Mathieu-Daudé


On 16/2/23 06:45, Richard Henderson wrote:

Use TT_TRAP.

For sparc32, 0x88 is the "Slowaris" system call, currently BAD_TRAP
in the kernel's ttable_32.S.  For sparc64, 0x110 is tl0_linux32, the
sparc32 trap, now folded into the TARGET_ABI32 case via TT_TRAP.

For sparc64, there does still exist trap 0x111 as tl0_oldlinux64,
which was replaced by 0x16d as tl0_linux64 in 1998.  Since no one
has noticed, don't bother implementing it now.

Signed-off-by: Richard Henderson 
---
  linux-user/sparc/cpu_loop.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 0/4] target/arm: Cache ARMVAParameters

2023-02-15 Thread Philippe Mathieu-Daudé


Hi Richard,

On 2/2/23 08:52, Richard Henderson wrote:


Richard Henderson (4):
   target/arm: Flush only required tlbs for TCR_EL[12]
   target/arm: Store tbi for both insns and data in ARMVAParameters
   target/arm: Use FIELD for ARMVAParameters
   target/arm: Cache ARMVAParameters


Applying: target/arm: Flush only required tlbs for TCR_EL[12]
error: patch failed: target/arm/helper.c:4151
error: target/arm/helper.c: patch does not apply
Patch failed at 0001 target/arm: Flush only required tlbs for TCR_EL[12]

What is this series' base commit?

Re: [PATCH 06/12] gitlab: extend custom runners with base_job_template

2023-02-15 Thread Thomas Huth


On 15/02/2023 20.25, Alex Bennée wrote:

The base job template is responsible for controlling how we kick off
testing on our various branches. Rename and extend the
custom_runner_template so we can take advantage of all that control.

Signed-off-by: Alex Bennée 
---
  .gitlab-ci.d/custom-runners.yml  |  3 ++-
  .gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml   | 10 +-
  .gitlab-ci.d/custom-runners/ubuntu-22.04-aarch32.yml |  2 +-
  .gitlab-ci.d/custom-runners/ubuntu-22.04-aarch64.yml | 10 +-
  4 files changed, 13 insertions(+), 12 deletions(-)


Reviewed-by: Thomas Huth

Re: [PATCH 05/12] gitlab: reduce default verbosity of cirrus run

2023-02-15 Thread Thomas Huth


On 15/02/2023 20.25, Alex Bennée wrote:

We also truncate the echoing of the test log if we fail. Ideally we
would want the build artefact to be available to gitlab but so far
how to do this eludes me.

Signed-off-by: Alex Bennée 
Cc: Daniel P. Berrangé 
---
  .gitlab-ci.d/cirrus/build.yml | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/cirrus/build.yml b/.gitlab-ci.d/cirrus/build.yml
index 7ef6af8d33..6563ff3c7a 100644
--- a/.gitlab-ci.d/cirrus/build.yml
+++ b/.gitlab-ci.d/cirrus/build.yml
@@ -32,6 +32,6 @@ build_task:
  - $MAKE -j$(sysctl -n hw.ncpu)
  - for TARGET in $TEST_TARGETS ;
do
-$MAKE -j$(sysctl -n hw.ncpu) $TARGET V=1
-|| { cat meson-logs/testlog.txt; exit 1; } ;
+$MAKE -j$(sysctl -n hw.ncpu) $TARGET
+|| { tail -n 200 meson-logs/testlog.txt; exit 1; } ;
done


I think it should be OK to publish the artifacts on cirrus-ci.com instead - 
you have to click a little bit more often, but you can still get the 
artifacts there, see:


 https://lore.kernel.org/qemu-devel/20230215142503.90660-1-th...@redhat.com/

 Thomas

Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start

2023-02-15 Thread Eugenio Perez Martin

On Thu, Feb 16, 2023 at 3:15 AM Si-Wei Liu  wrote:
>
>
>
> On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
> > On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
> >>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu  wrote:
> 
>  On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> > Only create iova_tree if and when it is needed.
> >
> > The cleanup remains responsible for the last VQ, but this change allows
> > merging both cleanup functions.
> >
> > Signed-off-by: Eugenio Pérez 
> > Acked-by: Jason Wang 
> > ---
> > net/vhost-vdpa.c | 99 
> > ++--
> > 1 file changed, 71 insertions(+), 28 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index de5ed8ff22..a9e6c8f28e 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -178,13 +178,9 @@ err_init:
> > static void vhost_vdpa_cleanup(NetClientState *nc)
> > {
> > VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > -struct vhost_dev *dev = &s->vhost_net->dev;
> >
> > qemu_vfree(s->cvq_cmd_out_buffer);
> > qemu_vfree(s->status);
> > -if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > -g_clear_pointer(&s->vhost_vdpa.iova_tree, 
> > vhost_iova_tree_delete);
> > -}
> > if (s->vhost_net) {
> > vhost_net_cleanup(s->vhost_net);
> > g_free(s->vhost_net);
> > @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState 
> > *nc, const uint8_t *buf,
> > return size;
> > }
> >
> > +/** From any vdpa net client, get the netclient of first queue pair */
> > +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> > +{
> > +NICState *nic = qemu_get_nic(s->nc.peer);
> > +NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> > +
> > +return DO_UPCAST(VhostVDPAState, nc, nc0);
> > +}
> > +
> > +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> > +{
> > +struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +if (v->shadow_vqs_enabled) {
> > +v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > +   v->iova_range.last);
> > +}
> > +}
> > +
> > +static int vhost_vdpa_net_data_start(NetClientState *nc)
> > +{
> > +VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +if (v->index == 0) {
> > +vhost_vdpa_net_data_start_first(s);
> > +return 0;
> > +}
> > +
> > +if (v->shadow_vqs_enabled) {
> > +VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +v->iova_tree = s0->vhost_vdpa.iova_tree;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> > +{
> > +VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +struct vhost_dev *dev;
> > +
> > +assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +dev = s->vhost_vdpa.dev;
> > +if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > +g_clear_pointer(&s->vhost_vdpa.iova_tree, 
> > vhost_iova_tree_delete);
> > +}
> > +}
> > +
> > static NetClientInfo net_vhost_vdpa_info = {
> > .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> > .size = sizeof(VhostVDPAState),
> > .receive = vhost_vdpa_receive,
> > +.start = vhost_vdpa_net_data_start,
> > +.stop = vhost_vdpa_net_client_stop,
> > .cleanup = vhost_vdpa_cleanup,
> > .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> > .has_ufo = vhost_vdpa_has_ufo,
> > @@ -351,7 +401,7 @@ dma_map_err:
> >
> > static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > {
> > -VhostVDPAState *s;
> > +VhostVDPAState *s, *s0;
> > struct vhost_vdpa *v;
> > uint64_t backend_features;
> > int64_t cvq_group;
> > @@ -425,6 +475,15 @@ out:
> > return 0;
> > }
> >
> > +s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +if (s0->vhost_vdpa.iova_tree) {
> > +/* SVQ is already configured for all virtqueues */
> > +v->iova_tree = s0->vhost_vdpa.iova_tree;
> > +} else {
> > +v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > +   v->iova_range.last);
>  I wonder how this case could happen,

Re: [PATCH 2/4] target/arm: Store tbi for both insns and data in ARMVAParameters

2023-02-15 Thread Philippe Mathieu-Daudé


On 2/2/23 08:52, Richard Henderson wrote:

This is slightly more work on the consumer side, but means
we will be able to compute this once for multiple uses.

Signed-off-by: Richard Henderson 
---
  target/arm/internals.h|  5 +++--
  target/arm/helper.c   | 18 +-
  target/arm/pauth_helper.c | 29 -
  target/arm/ptw.c  |  6 +++---
  4 files changed, 31 insertions(+), 27 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 04/12] tests: be a bit more strict cleaning up fifos

2023-02-15 Thread Thomas Huth


On 15/02/2023 20.25, Alex Bennée wrote:

When we re-factored we dropped the unlink() step which turns out to be
required for rmdir to do its thing. If we had been checking the return
value we would have noticed so let's do that with this fix.

Fixes: 68406d1085 (tests/unit: cleanups for test-io-channel-command)
Signed-off-by: Alex Bennée 
Suggested-by: Philippe Mathieu-Daudé 
---
  tests/unit/test-io-channel-command.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)


Reviewed-by: Thomas Huth
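
For illustration, the cleanup ordering described in this patch can be sketched as a small standalone program fragment. This is not the code from test-io-channel-command.c: the helper name fifo_dir_roundtrip() and the /tmp path template are assumptions made for the example. It shows that rmdir() fails while the directory still holds a fifo, and that checking the return value is what exposes the missing unlink().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): create a scratch directory
 * holding one fifo, then tear it down. If unlink_first is non-zero the
 * fifo is removed before rmdir(); otherwise rmdir() is attempted on the
 * still-populated directory. Returns the result of that rmdir() call,
 * or -1 if the setup itself failed.
 */
static int fifo_dir_roundtrip(int unlink_first)
{
    char dir[] = "/tmp/fifo-demo-XXXXXX";   /* assumed scratch location */
    char fifo[64];
    int ret;

    if (mkdtemp(dir) == NULL) {
        return -1;
    }
    snprintf(fifo, sizeof(fifo), "%s/test.fifo", dir);
    if (mkfifo(fifo, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    if (unlink_first && unlink(fifo) != 0) {
        return -1;
    }

    /* The return value is the whole point: an unchecked rmdir() hides this. */
    ret = rmdir(dir);
    if (ret != 0) {
        /* Do not leave debris behind after demonstrating the failure. */
        unlink(fifo);
        rmdir(dir);
    }
    return ret;
}
```

With this sketch, fifo_dir_roundtrip(1) should return 0, while fifo_dir_roundtrip(0) should return a non-zero value because rmdir() refuses to remove a non-empty directory.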

Re: [PATCH 01/12] gitlab: tweak and filter ninja output to reduce build noise

2023-02-15 Thread Thomas Huth


On 15/02/2023 20.25, Alex Bennée wrote:

A significant portion of our CI logs are just enumerating each
successfully built object file. The current widespread versions of
ninja don't have a quiet option so we use NINJA_STATUS to add a fixed
string to the ninja output which we then filter with grep. If there
are any errors in the output we get them from the compiler.

Signed-off-by: Alex Bennée 
---
  .gitlab-ci.d/buildtest-template.yml | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/buildtest-template.yml 
b/.gitlab-ci.d/buildtest-template.yml
index 73ecfabb8d..3af51846cd 100644
--- a/.gitlab-ci.d/buildtest-template.yml
+++ b/.gitlab-ci.d/buildtest-template.yml
@@ -21,7 +21,7 @@
then
  ../meson/meson.py configure . -Dbackend_max_links="$LD_JOBS" ;
fi || exit 1;
-- make -j"$JOBS"
+- env NINJA_STATUS="[ninja][%f/%t] " make -j"$JOBS" | grep -v 
"\[ninja\]\[.*[123456789]/"
  - if test -n "$MAKE_CHECK_ARGS";
then
  make -j"$JOBS" $MAKE_CHECK_ARGS ;


Not meant as a veto, but just for the record: I still don't like the idea. 
Having a log of the files that got compiled is still sometimes useful for 
me, e.g. when I want to check whether a certain file has been compiled at 
all or not (when e.g. debugging meson.build problems). So I'm still in 
favour of dropping this patch.


IMHO if you want to shorten the build log in the CI, please get those chatty 
softfloat tests fixed instead.


 Thomas

Re: [PATCH] target/i386: Fix 32-bit AD[CO]X insns in 64-bit mode

2023-02-15 Thread Philippe Mathieu-Daudé


On 15/1/23 02:21, Richard Henderson wrote:

Failure to truncate the inputs results in garbage for the carry-out.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1373
Signed-off-by: Richard Henderson 
---
  tests/tcg/x86_64/adox.c  | 69 
  target/i386/tcg/emit.c.inc   |  2 +
  tests/tcg/x86_64/Makefile.target |  3 ++
  3 files changed, 74 insertions(+)
  create mode 100644 tests/tcg/x86_64/adox.c


Reviewed-by: Philippe Mathieu-Daudé
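
To make the failure mode concrete, here is a plain-C model of the 32-bit path, under the assumption that the sum is formed in 64-bit arithmetic and the carry-out is read by shifting the sum right by 32. It is only a sketch and not the TCG code itself; the function name add32_carry_out() and the truncate flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a 32-bit add-with-carry performed in 64-bit registers.
 * The carry-out is taken as everything above bit 31 of the sum.
 * With truncate == 0 the upper halves of the inputs leak into the sum,
 * so the value returned is not a clean 0 or 1.
 */
static uint64_t add32_carry_out(uint64_t a, uint64_t b,
                                uint64_t carry_in, int truncate)
{
    uint64_t sum;

    if (truncate) {
        /* Zero-extend from 32 bits, as the fix does for both inputs. */
        a = (uint32_t)a;
        b = (uint32_t)b;
    }
    sum = a + b + carry_in;
    return sum >> 32;
}
```

For inputs whose upper 32 bits are non-zero, such as 0x100000000 + 0, the untruncated variant reports a carry of 1 although the 32-bit operands are both zero; with truncation the result is 0.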

Re: [PATCH 02/27] accel/tcg: Pass max_insn to gen_intermediate_code by pointer

2023-02-15 Thread Philippe Mathieu-Daudé


On 30/1/23 21:59, Richard Henderson wrote:

In preparation for returning the number of insns generated
via the same pointer.  Adjust only the prototypes so far.

Signed-off-by: Richard Henderson 
---
  include/exec/translator.h | 4 ++--
  accel/tcg/translate-all.c | 2 +-
  accel/tcg/translator.c| 4 ++--
  target/alpha/translate.c  | 2 +-
  target/arm/translate.c| 2 +-
  target/avr/translate.c| 2 +-
  target/cris/translate.c   | 2 +-
  target/hexagon/translate.c| 2 +-
  target/hppa/translate.c   | 2 +-
  target/i386/tcg/translate.c   | 2 +-
  target/loongarch/translate.c  | 2 +-
  target/m68k/translate.c   | 2 +-
  target/microblaze/translate.c | 2 +-
  target/mips/tcg/translate.c   | 2 +-
  target/nios2/translate.c  | 2 +-
  target/openrisc/translate.c   | 2 +-
  target/ppc/translate.c| 2 +-
  target/riscv/translate.c  | 2 +-
  target/rx/translate.c | 2 +-
  target/s390x/tcg/translate.c  | 2 +-
  target/sh4/translate.c| 2 +-
  target/sparc/translate.c  | 2 +-
  target/tricore/translate.c| 2 +-
  target/xtensa/translate.c | 2 +-
  24 files changed, 26 insertions(+), 26 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
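
The calling convention this patch prepares for can be sketched in isolation: one pointer carries the instruction budget in and the number of instructions actually generated out. The function below is a hypothetical stand-in, not gen_intermediate_code(); the name translate_block() and the insns_available parameter are assumptions for the example.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a translator loop.
 * On entry, *max_insns is the maximum number of instructions allowed.
 * On return, *max_insns holds how many were actually "translated".
 */
static void translate_block(int *max_insns, int insns_available)
{
    int n = 0;

    while (n < *max_insns && n < insns_available) {
        n++;    /* pretend to translate one instruction */
    }
    *max_insns = n;
}
```

A caller that passes a budget of 8 with only 3 instructions available reads back 3 from the same variable, so no separate return channel is needed.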

Re: [PATCH 12/27] accel/tcg/plugin: Use tcg_temp_ebb_*

2023-02-15 Thread Philippe Mathieu-Daudé


On 30/1/23 21:59, Richard Henderson wrote:

All of these uses have quite local scope.
Avoid tcg_const_*, because we haven't added a corresponding
interface for TEMP_EBB.  Use explicit tcg_gen_movi_* instead.

Signed-off-by: Richard Henderson 
---
  accel/tcg/plugin-gen.c | 24 ++--
  1 file changed, 14 insertions(+), 10 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 10/27] tcg: Add tcg_gen_movi_ptr

2023-02-15 Thread Philippe Mathieu-Daudé


On 30/1/23 21:59, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg-op.h | 5 +
  1 file changed, 5 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 21/27] target/i386: Don't use tcg_temp_local_new

2023-02-15 Thread Philippe Mathieu-Daudé


On 30/1/23 21:59, Richard Henderson wrote:

Since tcg_temp_new is now identical, use that.
In some cases we can avoid a copy from A0 or T0.

Signed-off-by: Richard Henderson 
---
  target/i386/tcg/translate.c | 27 +--
  1 file changed, 9 insertions(+), 18 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] target/microblaze: Add gdbstub xml

2023-02-15 Thread Edgar E. Iglesias

On Thu, Feb 16, 2023 at 12:56 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> Alex, Edgar, this has been reviewed.  Will either of you take it with your
> trees, or shall
> I just queue it through tcg-next?
>
>

Hi Richard, yeah if you don't mind, please take it through your tree!

Thanks,
Edgar



> r~
>
> On 12/30/22 06:24, Richard Henderson wrote:
> > Mirroring the upstream gdb xml files, the two stack boundary
> > registers are separated out.
> >
> > Signed-off-by: Richard Henderson 
> > ---
> >
> > I did this thinking I would be fixing:
> >
> >    TEST    basic gdbstub support on microblaze
> >    Truncated register 35 in remote 'g' packet
> >    Traceback (most recent call last):
> >      File "/home/rth/qemu/src/tests/tcg/multiarch/gdbstub/sha1.py", line 71, in <module>
> >        if gdb.parse_and_eval('$pc') == 0:
> >    gdb.error: No registers.
> >
> > but in the end it turned out that the gdb-multiarch supplied
> > by ubuntu 22.04 simply doesn't support MicroBlaze, as can be
> > seen with the "set architecture" command within gdb.
> >
> > (I built gdb from source, to try and debug why this still wasn't
> > working, only to find that it did.  :-P)
> >
> > Alex, any way to modify our gdb test to fail gracefully here?
> >
> > Regardless, having proper xml for all of our targets seems
> > like the correct way forward.
> >
> >
> > r~
> >
> > Cc: Alex Bennée 
> > Cc: Edgar E. Iglesias 
> > ---
> >   configs/targets/microblaze-linux-user.mak   |  1 +
> >   configs/targets/microblaze-softmmu.mak  |  1 +
> >   configs/targets/microblazeel-linux-user.mak |  1 +
> >   configs/targets/microblazeel-softmmu.mak|  1 +
> >   target/microblaze/cpu.h |  2 +
> >   target/microblaze/cpu.c |  7 ++-
> >   target/microblaze/gdbstub.c | 51 +++-
> >   gdb-xml/microblaze-core.xml | 67 +
> >   gdb-xml/microblaze-stack-protect.xml| 12 
> >   9 files changed, 128 insertions(+), 15 deletions(-)
> >   create mode 100644 gdb-xml/microblaze-core.xml
> >   create mode 100644 gdb-xml/microblaze-stack-protect.xml
> >
> > diff --git a/configs/targets/microblaze-linux-user.mak
> b/configs/targets/microblaze-linux-user.mak
> > index 4249a37f65..0a2322c249 100644
> > --- a/configs/targets/microblaze-linux-user.mak
> > +++ b/configs/targets/microblaze-linux-user.mak
> > @@ -3,3 +3,4 @@ TARGET_SYSTBL_ABI=common
> >   TARGET_SYSTBL=syscall.tbl
> >   TARGET_BIG_ENDIAN=y
> >   TARGET_HAS_BFLT=y
> > +TARGET_XML_FILES=gdb-xml/microblaze-core.xml
> gdb-xml/microblaze-stack-protect.xml
> > diff --git a/configs/targets/microblaze-softmmu.mak
> b/configs/targets/microblaze-softmmu.mak
> > index 8385e2d333..e84c0cc728 100644
> > --- a/configs/targets/microblaze-softmmu.mak
> > +++ b/configs/targets/microblaze-softmmu.mak
> > @@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
> >   TARGET_BIG_ENDIAN=y
> >   TARGET_SUPPORTS_MTTCG=y
> >   TARGET_NEED_FDT=y
> > +TARGET_XML_FILES=gdb-xml/microblaze-core.xml
> gdb-xml/microblaze-stack-protect.xml
> > diff --git a/configs/targets/microblazeel-linux-user.mak
> b/configs/targets/microblazeel-linux-user.mak
> > index d0e775d840..270743156a 100644
> > --- a/configs/targets/microblazeel-linux-user.mak
> > +++ b/configs/targets/microblazeel-linux-user.mak
> > @@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
> >   TARGET_SYSTBL_ABI=common
> >   TARGET_SYSTBL=syscall.tbl
> >   TARGET_HAS_BFLT=y
> > +TARGET_XML_FILES=gdb-xml/microblaze-core.xml
> gdb-xml/microblaze-stack-protect.xml
> > diff --git a/configs/targets/microblazeel-softmmu.mak
> b/configs/targets/microblazeel-softmmu.mak
> > index af40391f2f..9b688036bd 100644
> > --- a/configs/targets/microblazeel-softmmu.mak
> > +++ b/configs/targets/microblazeel-softmmu.mak
> > @@ -1,3 +1,4 @@
> >   TARGET_ARCH=microblaze
> >   TARGET_SUPPORTS_MTTCG=y
> >   TARGET_NEED_FDT=y
> > +TARGET_XML_FILES=gdb-xml/microblaze-core.xml
> gdb-xml/microblaze-stack-protect.xml
> > diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h
> > index 1e84dd8f47..e541fbb0b3 100644
> > --- a/target/microblaze/cpu.h
> > +++ b/target/microblaze/cpu.h
> > @@ -367,6 +367,8 @@ hwaddr mb_cpu_get_phys_page_attrs_debug(CPUState
> *cpu, vaddr addr,
> >   MemTxAttrs *attrs);
> >   int mb_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
> >   int mb_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
> > +int mb_cpu_gdb_read_stack_protect(CPUArchState *cpu, GByteArray *buf,
> int reg);
> > +int mb_cpu_gdb_write_stack_protect(CPUArchState *cpu, uint8_t *buf, int
> reg);
> >
> >   static inline uint32_t mb_cpu_read_msr(const CPUMBState *env)
> >   {
> > diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
> > index 817681f9b2..a2d2f5c340 100644
> > --- a/target/microblaze/cpu.c
> > +++ b/target/microblaze/cpu.c
> > @@ -28,6 +28,7 @@
> >   #include "qemu/module.h"
> >   #include "hw/qdev-properties.h"
> >

Re: [PATCH] target/microblaze: Add gdbstub xml

2023-02-15 Thread Richard Henderson

Alex, Edgar, this has been reviewed.  Will either of you take it with your trees, or shall 
I just queue it through tcg-next?


r~

On 12/30/22 06:24, Richard Henderson wrote:

Mirroring the upstream gdb xml files, the two stack boundary
registers are separated out.

Signed-off-by: Richard Henderson 
---

I did this thinking I would be fixing:

   TEST    basic gdbstub support on microblaze
   Truncated register 35 in remote 'g' packet
   Traceback (most recent call last):
     File "/home/rth/qemu/src/tests/tcg/multiarch/gdbstub/sha1.py", line 71, in <module>
       if gdb.parse_and_eval('$pc') == 0:
   gdb.error: No registers.

but in the end it turned out that the gdb-multiarch supplied
by ubuntu 22.04 simply doesn't support MicroBlaze, as can be
seen with the "set architecture" command within gdb.

(I built gdb from source, to try and debug why this still wasn't
working, only to find that it did.  :-P)

Alex, any way to modify our gdb test to fail gracefully here?

Regardless, having proper xml for all of our targets seems
like the correct way forward.


r~

Cc: Alex Bennée 
Cc: Edgar E. Iglesias 
---
  configs/targets/microblaze-linux-user.mak   |  1 +
  configs/targets/microblaze-softmmu.mak  |  1 +
  configs/targets/microblazeel-linux-user.mak |  1 +
  configs/targets/microblazeel-softmmu.mak|  1 +
  target/microblaze/cpu.h |  2 +
  target/microblaze/cpu.c |  7 ++-
  target/microblaze/gdbstub.c | 51 +++-
  gdb-xml/microblaze-core.xml | 67 +
  gdb-xml/microblaze-stack-protect.xml| 12 
  9 files changed, 128 insertions(+), 15 deletions(-)
  create mode 100644 gdb-xml/microblaze-core.xml
  create mode 100644 gdb-xml/microblaze-stack-protect.xml

diff --git a/configs/targets/microblaze-linux-user.mak 
b/configs/targets/microblaze-linux-user.mak
index 4249a37f65..0a2322c249 100644
--- a/configs/targets/microblaze-linux-user.mak
+++ b/configs/targets/microblaze-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_SYSTBL_ABI=common
  TARGET_SYSTBL=syscall.tbl
  TARGET_BIG_ENDIAN=y
  TARGET_HAS_BFLT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/configs/targets/microblaze-softmmu.mak 
b/configs/targets/microblaze-softmmu.mak
index 8385e2d333..e84c0cc728 100644
--- a/configs/targets/microblaze-softmmu.mak
+++ b/configs/targets/microblaze-softmmu.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
  TARGET_BIG_ENDIAN=y
  TARGET_SUPPORTS_MTTCG=y
  TARGET_NEED_FDT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/configs/targets/microblazeel-linux-user.mak 
b/configs/targets/microblazeel-linux-user.mak
index d0e775d840..270743156a 100644
--- a/configs/targets/microblazeel-linux-user.mak
+++ b/configs/targets/microblazeel-linux-user.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
  TARGET_SYSTBL_ABI=common
  TARGET_SYSTBL=syscall.tbl
  TARGET_HAS_BFLT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/configs/targets/microblazeel-softmmu.mak 
b/configs/targets/microblazeel-softmmu.mak
index af40391f2f..9b688036bd 100644
--- a/configs/targets/microblazeel-softmmu.mak
+++ b/configs/targets/microblazeel-softmmu.mak
@@ -1,3 +1,4 @@
  TARGET_ARCH=microblaze
  TARGET_SUPPORTS_MTTCG=y
  TARGET_NEED_FDT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h
index 1e84dd8f47..e541fbb0b3 100644
--- a/target/microblaze/cpu.h
+++ b/target/microblaze/cpu.h
@@ -367,6 +367,8 @@ hwaddr mb_cpu_get_phys_page_attrs_debug(CPUState *cpu, 
vaddr addr,
  MemTxAttrs *attrs);
  int mb_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
  int mb_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+int mb_cpu_gdb_read_stack_protect(CPUArchState *cpu, GByteArray *buf, int reg);
+int mb_cpu_gdb_write_stack_protect(CPUArchState *cpu, uint8_t *buf, int reg);
  
  static inline uint32_t mb_cpu_read_msr(const CPUMBState *env)

  {
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 817681f9b2..a2d2f5c340 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -28,6 +28,7 @@
  #include "qemu/module.h"
  #include "hw/qdev-properties.h"
  #include "exec/exec-all.h"
+#include "exec/gdbstub.h"
  #include "fpu/softfloat-helpers.h"
  
  static const struct {

@@ -294,6 +295,9 @@ static void mb_cpu_initfn(Object *obj)
  CPUMBState *env = &cpu->env;
  
  cpu_set_cpustate_pointers(cpu);
+gdb_register_coprocessor(CPU(cpu), mb_cpu_gdb_read_stack_protect,
+ mb_cpu_gdb_write_stack_protect, 2,
+ "microblaze-stack-protect.xml", 0);
  
  set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
  
@@ -422,7 +426,8 @@ static void mb_cpu_class_init(ObjectClass *oc, void

Re: [PATCH 0/4] target/arm: Cache ARMVAParameters

2023-02-15 Thread Richard Henderson


Ping.

r~

On 2/1/23 21:52, Richard Henderson wrote:

Hi Anders,

I'm not well versed in tuxrun, and how to make that work with a qemu
binary outside of the container, so I'm not sure if I'm comparing
apples to bananas.  Can you look and see if this fixes the kselftest
slowdown you reported?

Anyway, for a boot and shutdown of your rootfs, I see:

Before:
 11.13%  [.] aa64_va_parameters
  8.38%  [.] helper_lookup_tb_ptr
  7.37%  [.] pauth_computepac
  3.79%  [.] qht_lookup_custom

After:
  9.17%  [.] helper_lookup_tb_ptr
  8.05%  [.] pauth_computepac
  4.22%  [.] qht_lookup_custom
  3.68%  [.] pauth_addpac
  ...
  1.67%  [.] aa64_va_parameters


This is all due to the heavy use pauth makes of aa64_va_parameters.
It "only" needs 2 parameters, tsz and tbi, but tsz is probably the
most expensive part of aa64_va_parameters -- do anything about that
and we might as well cache the whole thing.

The change from struct+bitfields to uint32_t+FIELD is meant to combat
some really ugly code that gcc produced.  Seems like they should have
compiled to the same thing, more or less, but alas.
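
As a rough illustration of the difference, the sketch below packs two parameters into a single uint32_t with explicit shifts and masks instead of C bitfields. It does not use QEMU's FIELD macros; the helper names and the bit layout (tsz in bits [7:0], tbi in bits [9:8]) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define VAP_TSZ_SHIFT 0
#define VAP_TSZ_LEN   8
#define VAP_TBI_SHIFT 8
#define VAP_TBI_LEN   2

/* Extract a field of 'len' bits (len < 32) starting at bit 'shift'. */
static inline uint32_t field_extract(uint32_t word, unsigned shift,
                                     unsigned len)
{
    return (word >> shift) & ((1u << len) - 1);
}

/* Return 'word' with the field replaced by 'val' (excess bits dropped). */
static inline uint32_t field_deposit(uint32_t word, unsigned shift,
                                     unsigned len, uint32_t val)
{
    uint32_t mask = ((1u << len) - 1) << shift;

    return (word & ~mask) | ((val << shift) & mask);
}
```

Because the packed value is an ordinary integer, it can be stored, compared and cached as one word, and each accessor compiles to a shift and a mask.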


r~


Richard Henderson (4):
   target/arm: Flush only required tlbs for TCR_EL[12]
   target/arm: Store tbi for both insns and data in ARMVAParameters
   target/arm: Use FIELD for ARMVAParameters
   target/arm: Cache ARMVAParameters

  target/arm/cpu.h  |  30 +++
  target/arm/internals.h|  21 +
  target/arm/helper.c   | 177 --
  target/arm/pauth_helper.c |  39 +
  target/arm/ptw.c  |  57 ++--
  5 files changed, 217 insertions(+), 107 deletions(-)

Re: [PATCH 0/1] accel/tcg: Allow the second page of an instruction to be MMIO

2023-02-15 Thread Richard Henderson


On 2/6/23 09:38, Richard Henderson wrote:

Curious but true: two independent reports of the same issue within
24 hours, one with an x86 guest and one with an arm guest.

Neither report included instructions for reproduction (and both seem
to be with complex setup), therefore this is untested, but seems simple
enough to be the proper fix.  It matches up with

 /*
  * If the TB is not associated with a physical RAM page then it must be
  * a temporary one-insn TB, and we have nothing left to do. Return early
  * before attempting to link to other TBs or add to the lookup table.
  */
 if (tb_page_addr0(tb) == -1) {
 return tb;
 }

in tb_gen_code().


r~


Richard Henderson (1):
   accel/tcg: Allow the second page of an instruction to be MMIO

  accel/tcg/translator.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)



Queued to tcg-next.


r~

Re: [PATCH] target/i386: Fix 32-bit AD[CO]X insns in 64-bit mode

2023-02-15 Thread Richard Henderson


Ping.

Paolo, I see you've queued a fix for a different ADCOX bug in your latest pull.  You could 
probably adjust your new test for this case, but this problem is exclusively x86_64.


r~


On 1/14/23 15:21, Richard Henderson wrote:

Failure to truncate the inputs results in garbage for the carry-out.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1373
Signed-off-by: Richard Henderson 
---
  tests/tcg/x86_64/adox.c  | 69 
  target/i386/tcg/emit.c.inc   |  2 +
  tests/tcg/x86_64/Makefile.target |  3 ++
  3 files changed, 74 insertions(+)
  create mode 100644 tests/tcg/x86_64/adox.c

diff --git a/tests/tcg/x86_64/adox.c b/tests/tcg/x86_64/adox.c
new file mode 100644
index 0000000000..36be644c8b
--- /dev/null
+++ b/tests/tcg/x86_64/adox.c
@@ -0,0 +1,69 @@
+/* See if ADOX give expected results */
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+static uint64_t adoxq(bool *c_out, uint64_t a, uint64_t b, bool c)
+{
+asm ("addl $0x7fffffff, %k1\n\t"
+ "adoxq %2, %0\n\t"
+ "seto %b1"
+ : "+r"(a), "=&r"(c) : "r"(b), "1"((int)c));
+*c_out = c;
+return a;
+}
+
+static uint64_t adoxl(bool *c_out, uint64_t a, uint64_t b, bool c)
+{
+asm ("addl $0x7fffffff, %k1\n\t"
+ "adoxl %k2, %k0\n\t"
+ "seto %b1"
+ : "+r"(a), "=&r"(c) : "r"(b), "1"((int)c));
+*c_out = c;
+return a;
+}
+
+int main()
+{
+uint64_t r;
+bool c;
+
+r = adoxq(&c, 0, 0, 0);
+assert(r == 0);
+assert(c == 0);
+
+r = adoxl(&c, 0, 0, 0);
+assert(r == 0);
+assert(c == 0);
+
+r = adoxl(&c, 0x100000000, 0, 0);
+assert(r == 0);
+assert(c == 0);
+
+r = adoxq(&c, 0, 0, 1);
+assert(r == 1);
+assert(c == 0);
+
+r = adoxl(&c, 0, 0, 1);
+assert(r == 1);
+assert(c == 0);
+
+r = adoxq(&c, -1, -1, 0);
+assert(r == -2);
+assert(c == 1);
+
+r = adoxl(&c, -1, -1, 0);
+assert(r == 0xfffffffe);
+assert(c == 1);
+
+r = adoxq(&c, -1, -1, 1);
+assert(r == -1);
+assert(c == 1);
+
+r = adoxl(&c, -1, -1, 1);
+assert(r == 0xffffffff);
+assert(c == 1);
+
+return 0;
+}
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 1eace1231a..d44c51209d 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1042,6 +1042,8 @@ static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op)
  #ifdef TARGET_X86_64
  case MO_32:
  /* If TL is 64-bit just do everything in 64-bit arithmetic.  */
+tcg_gen_ext32u_tl(s->T0, s->T0);
+tcg_gen_ext32u_tl(s->T1, s->T1);
  tcg_gen_add_i64(s->T0, s->T0, s->T1);
  tcg_gen_add_i64(s->T0, s->T0, carry_in);
  tcg_gen_shri_i64(carry_out, s->T0, 32);
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index 4eac78293f..e64aab1b81 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -12,11 +12,14 @@ ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET))
  X86_64_TESTS += vsyscall
  X86_64_TESTS += noexec
  X86_64_TESTS += cmpxchg
+X86_64_TESTS += adox
  TESTS=$(MULTIARCH_TESTS) $(X86_64_TESTS) test-x86_64
  else
  TESTS=$(MULTIARCH_TESTS)
  endif
  
+adox: CFLAGS=-O2
+
  run-test-i386-ssse3: QEMU_OPTS += -cpu max
  run-plugin-test-i386-ssse3-%: QEMU_OPTS += -cpu max

Re: [PATCH] target/i386: Fix BZHI instruction

2023-02-15 Thread Richard Henderson


Ping.

r~

On 1/14/23 13:32, Richard Henderson wrote:

We did not correctly handle N >= operand size.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1374
Signed-off-by: Richard Henderson 
---
  tests/tcg/i386/test-i386-bmi2.c |  3 +++
  target/i386/tcg/emit.c.inc  | 14 +++---
  2 files changed, 10 insertions(+), 7 deletions(-)
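
As a reading aid, here is a small C reference model of the behaviour the patch is aiming for. It is a sketch only: `bzhi64_model` is a made-up name, and the model assumes the usual description of BZHI, where only the low 8 bits of the index are used, bits at positions >= N are cleared, and CF is set when N exceeds the operand size minus one.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference model of 64-bit BZHI, not QEMU code.
 * Only the low 8 bits of the index operand are used.  Bits of src at
 * positions >= n are cleared; if n is 64 or more, src is returned
 * unchanged and CF is set.
 */
static uint64_t bzhi64_model(uint64_t src, uint64_t index, bool *cf)
{
    unsigned n = index & 0xff;

    if (n > 63) {
        *cf = true;
        return src;
    }

    *cf = false;
    /* n is in 0..63 here, so the shift is well defined. */
    return src & ~(~UINT64_C(0) << n);
}
```

The new test case in the patch corresponds to `bzhi64_model(mask, 0x40, &cf)` returning `mask` with CF set.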

diff --git a/tests/tcg/i386/test-i386-bmi2.c b/tests/tcg/i386/test-i386-bmi2.c
index 982d4abda4..0244df7987 100644
--- a/tests/tcg/i386/test-i386-bmi2.c
+++ b/tests/tcg/i386/test-i386-bmi2.c
@@ -123,6 +123,9 @@ int main(int argc, char *argv[]) {
  result = bzhiq(mask, 0x1f);
  assert(result == (mask & ~(-1 << 30)));
  
+result = bzhiq(mask, 0x40);
+assert(result == mask);
+
  result = rorxq(0x2132435465768798, 8);
  assert(result == 0x9821324354657687);
  
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 4d7702c106..1eace1231a 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1143,20 +1143,20 @@ static void gen_BLSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
  static void gen_BZHI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
  {
  MemOp ot = decode->op[0].ot;
-TCGv bound;
+TCGv bound = tcg_constant_tl(ot == MO_64 ? 63 : 31);
+TCGv zero = tcg_constant_tl(0);
+TCGv mone = tcg_constant_tl(-1);
  
-tcg_gen_ext8u_tl(s->T1, cpu_regs[s->vex_v]);
-bound = tcg_constant_tl(ot == MO_64 ? 63 : 31);
+tcg_gen_ext8u_tl(s->T1, s->T1);
  
  /*
   * Note that since we're using BMILG (in order to get O
   * cleared) we need to store the inverse into C.
   */
-tcg_gen_setcond_tl(TCG_COND_LT, cpu_cc_src, s->T1, bound);
-tcg_gen_movcond_tl(TCG_COND_GT, s->T1, s->T1, bound, bound, s->T1);
+tcg_gen_setcond_tl(TCG_COND_LEU, cpu_cc_src, s->T1, bound);
  
-tcg_gen_movi_tl(s->A0, -1);
-tcg_gen_shl_tl(s->A0, s->A0, s->T1);
+tcg_gen_shl_tl(s->A0, mone, s->T1);
+tcg_gen_movcond_tl(TCG_COND_LEU, s->A0, s->T1, bound, s->A0, zero);
  tcg_gen_andc_tl(s->T0, s->T0, s->A0);
  
  gen_op_update1_cc(s);

[PATCH v11 31/59] hw/xen: Implement EVTCHNOP_unmask

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 175 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 +++
 3 files changed, 189 insertions(+)
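
To make the delivery sequence easier to follow, here is a standalone sketch of the two-level scheme in portable C11. It is not the QEMU implementation: the `demo_*` types and names are hypothetical, the layout is simplified to 64 words of 64 bits, it uses `<stdatomic.h>` rather than the qatomic helpers, and it reads the pending word with a plain atomic load. It returns true where the real code would go on to inject the callback.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WORD_BITS 64
#define DEMO_NR_WORDS  64

/* Hypothetical, simplified stand-ins for shared_info and vcpu_info. */
struct demo_shared_info {
    _Atomic uint64_t pending[DEMO_NR_WORDS];
    _Atomic uint64_t mask[DEMO_NR_WORDS];
};

struct demo_vcpu_info {
    _Atomic uint64_t pending_sel;
    _Atomic uint8_t upcall_pending;
};

/* Returns true if the caller should inject the event callback. */
static bool demo_unmask_port(struct demo_shared_info *sh,
                             struct demo_vcpu_info *vi, unsigned port)
{
    unsigned idx = port / DEMO_WORD_BITS;
    uint64_t bit = UINT64_C(1) << (port % DEMO_WORD_BITS);

    if (idx >= DEMO_NR_WORDS) {
        return false;
    }

    /* Clear the mask bit; if it was already clear, nothing to do. */
    if (!(atomic_fetch_and(&sh->mask[idx], ~bit) & bit)) {
        return false;
    }

    /* If the event was not pending, we are done. */
    if (!(atomic_load(&sh->pending[idx]) & bit)) {
        return false;
    }

    /* Set the per-vCPU selector bit for this word. */
    bit = UINT64_C(1) << idx;
    if (atomic_fetch_or(&vi->pending_sel, bit) & bit) {
        return false;
    }

    /* Set the upcall flag; only the first setter needs to inject. */
    if (atomic_fetch_or(&vi->upcall_pending, 1)) {
        return false;
    }

    return true;
}
```

Each step returns early when it finds the bit already set (or, for the mask, already clear), so at most one caller ends up responsible for the injection.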

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 08c6fac357..deea7de027 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -224,6 +224,13 @@ int xen_evtchn_set_callback_param(uint64_t param)
 return ret;
 }
 
+static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
+{
+int type = s->callback_param >> CALLBACK_VIA_TYPE_SHIFT;
+
+kvm_xen_inject_vcpu_callback_vector(vcpu, type);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -294,6 +301,152 @@ int xen_evtchn_status_op(struct evtchn_status *status)
 return 0;
 }
 
+/*
+ * Never thought I'd hear myself say this, but C++ templates would be
+ * kind of nice here.
+ *
+ * template<class T> static int do_unmask_port(T *shinfo, ...);
+ */
+static int do_unmask_port_lm(XenEvtchnState *s, evtchn_port_t port,
+ bool do_unmask, struct shared_info *shinfo,
+ struct vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+if (do_unmask) {
+/*
+ * If this is a true unmask operation, clear the mask bit. If
+ * it was already unmasked, we have nothing further to do.
+ */
+if (!((qatomic_fetch_and(&shinfo->evtchn_mask[idx], ~mask) & mask))) {
+return 0;
+}
+} else {
+/*
+ * This is a pseudo-unmask for affinity changes. We don't
+ * change the mask bit, and if it's *masked* we have nothing
+ * else to do.
+ */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+}
+
+/* If the event was not pending, we're done. */
+if (!(qatomic_fetch_or(&shinfo->evtchn_pending[idx], 0) & mask)) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int do_unmask_port_compat(XenEvtchnState *s, evtchn_port_t port,
+ bool do_unmask,
+ struct compat_shared_info *shinfo,
+ struct compat_vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+if (do_unmask) {
+/*
+ * If this is a true unmask operation, clear the mask bit. If
+ * it was already unmasked, we have nothing further to do.
+ */
+if (!((qatomic_fetch_and(&shinfo->evtchn_mask[idx], ~mask) & mask))) {
+return 0;
+}
+} else {
+/*
+ * This is a pseudo-unmask for affinity changes. We don't
+ * change the mask bit, and if it's *masked* we have nothing
+ * else to do.
+ */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+}
+
+/* If the event was not pending, we're done. */
+if (!(qatomic_fetch_or(&shinfo->evtchn_pending[idx], 0) & mask)) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int unmask_port(XenEvtchnState *s, evtchn_port_t port, bool do_unmask)
+{
+void *vcpu_info, *shinfo;
+
+if (s->port_table[port].type ==

[PATCH v11 54/59] i386/xen: Implement HYPERVISOR_physdev_op

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Just hook up the basic hypercalls to stubs in xen_evtchn.c for now.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c |  25 
 hw/i386/kvm/xen_evtchn.h |  11 
 target/i386/kvm/xen-compat.h |  19 ++
 target/i386/kvm/xen-emu.c| 118 +++
 4 files changed, 173 insertions(+)
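
The compat handling for PHYSDEVOP_map_pirq relies on the 32-bit and 64-bit layouts agreeing on everything up to entry_nr and differing only in where table_base lands. The following standalone sketch illustrates that idea. The `demo_*` structs and `demo_map_pirq_from_compat` are hypothetical stand-ins written for this note, the code assumes a 4-byte int and a compiler that supports `__attribute__((packed))`, and unlike the patch it converts between two separate buffers instead of fixing up one buffer in place.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the native (64-bit) layout. */
struct demo_map_pirq {
    uint16_t domid;
    int type;
    int index;
    int pirq;
    int bus;
    int devfn;
    int entry_nr;
    uint64_t table_base;    /* naturally aligned in the native layout */
};

/* Hypothetical stand-in for the 32-bit guest layout. */
struct demo_compat_map_pirq {
    uint16_t domid;
    uint16_t pad;
    int type;
    int index;
    int pirq;
    int bus;
    int devfn;
    int entry_nr;
    uint64_t table_base;    /* follows entry_nr directly, no padding */
} __attribute__((packed));

_Static_assert(sizeof(struct demo_compat_map_pirq) == 36, "compat size");
_Static_assert(offsetof(struct demo_map_pirq, entry_nr) ==
               offsetof(struct demo_compat_map_pirq, entry_nr),
               "fields up to entry_nr must line up");

/* Copy the common prefix verbatim, then move table_base on its own. */
static void demo_map_pirq_from_compat(struct demo_map_pirq *dst,
                                      const struct demo_compat_map_pirq *src)
{
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, offsetof(struct demo_compat_map_pirq, table_base));
    dst->table_base = src->table_base;
}
```

The two static assertions play the same role as the qemu_build_assert() calls in the patch: if the layouts ever stop lining up, the build fails instead of the copy silently going wrong.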

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 7412139154..ca9f15698f 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1347,6 +1347,31 @@ int xen_evtchn_set_port(uint16_t port)
 return ret;
 }
 
+int xen_physdev_map_pirq(struct physdev_map_pirq *map)
+{
+return -ENOTSUP;
+}
+
+int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
+{
+return -ENOTSUP;
+}
+
+int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
+{
+return -ENOTSUP;
+}
+
+int xen_physdev_query_pirq(struct physdev_irq_status_query *query)
+{
+return -ENOTSUP;
+}
+
+int xen_physdev_get_free_pirq(struct physdev_get_free_pirq *get)
+{
+return -ENOTSUP;
+}
+
 struct xenevtchn_handle *xen_be_evtchn_open(void)
 {
 struct xenevtchn_handle *xc = g_new0(struct xenevtchn_handle, 1);
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 5a71ffb753..352c875976 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -62,4 +62,15 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
 int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
 int xen_evtchn_reset_op(struct evtchn_reset *reset);
 
+struct physdev_map_pirq;
+struct physdev_unmap_pirq;
+struct physdev_eoi;
+struct physdev_irq_status_query;
+struct physdev_get_free_pirq;
+int xen_physdev_map_pirq(struct physdev_map_pirq *map);
+int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap);
+int xen_physdev_eoi_pirq(struct physdev_eoi *eoi);
+int xen_physdev_query_pirq(struct physdev_irq_status_query *query);
+int xen_physdev_get_free_pirq(struct physdev_get_free_pirq *get);
+
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
index 448336de92..7f30180cc2 100644
--- a/target/i386/kvm/xen-compat.h
+++ b/target/i386/kvm/xen-compat.h
@@ -48,4 +48,23 @@ struct compat_xen_add_to_physmap_batch {
 COMPAT_HANDLE(int) errs;
 };
 
+struct compat_physdev_map_pirq {
+domid_t domid;
+uint16_t pad;
+/* IN */
+int type;
+/* IN (ignored for ..._MULTI_MSI) */
+int index;
+/* IN or OUT */
+int pirq;
+/* IN - high 16 bits hold segment for ..._MSI_SEG and ..._MULTI_MSI */
+int bus;
+/* IN */
+int devfn;
+/* IN (also OUT for ..._MULTI_MSI) */
+int entry_nr;
+/* IN */
+uint64_t table_base;
+} __attribute__((packed));
+
 #endif /* QEMU_I386_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 389acd0c42..e8e7092c66 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1517,6 +1517,121 @@ static bool kvm_xen_hcall_gnttab_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_physdev_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+int err;
+
+switch (cmd) {
+case PHYSDEVOP_map_pirq: {
+struct physdev_map_pirq map;
+
+if (hypercall_compat32(exit->u.hcall.longmode)) {
+struct compat_physdev_map_pirq *map32 = (void *)&map;
+
+if (kvm_copy_from_gva(cs, arg, map32, sizeof(*map32))) {
+return -EFAULT;
+}
+
+/*
+ * The only thing that's different is the alignment of the
+ * uint64_t table_base at the end, which gets padding to make
+ * it 64-bit aligned in the 64-bit version.
+ */
+qemu_build_assert(sizeof(*map32) == 36);
+qemu_build_assert(offsetof(struct physdev_map_pirq, entry_nr) ==
+  offsetof(struct compat_physdev_map_pirq, entry_nr));
+memmove(&map.table_base, &map32->table_base, sizeof(map.table_base));
+} else {
+if (kvm_copy_from_gva(cs, arg, &map, sizeof(map))) {
+err = -EFAULT;
+break;
+}
+}
+err = xen_physdev_map_pirq(&map);
+/*
+ * Since table_base is an IN parameter and won't be changed, just
+ * copy the size of the compat structure back to the guest.
+ */
+if (!err && kvm_copy_to_gva(cs, arg, &map,
+sizeof(struct compat_physdev_map_pirq))) {
+err = -EFAULT;
+}
+break;
+}
+case PHYSDEVOP_unmap_pirq: {
+struct physdev_unmap_pirq unmap;
+
+qemu_build_assert(sizeof(unmap) == 8);
+if (kvm_copy_from_gva(cs, arg, &unmap, sizeof(unmap))) {
+err = -EFAULT;
+break;
+}
+
+err =

[PATCH v11 07/59] xen-platform: exclude vfio-pci from the PCI platform unplug

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This allows PCI passthrough devices to work for Xen emulated guests.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/xen/xen_platform.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 66e6de31a6..d601a5509d 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -109,12 +109,25 @@ static void log_writeb(PCIXenPlatformState *s, char val)
 #define _UNPLUG_NVME_DISKS 3
 #define UNPLUG_NVME_DISKS (1u << _UNPLUG_NVME_DISKS)
 
+static bool pci_device_is_passthrough(PCIDevice *d)
+{
+if (!strcmp(d->name, "xen-pci-passthrough")) {
+return true;
+}
+
+if (xen_mode == XEN_EMULATE && !strcmp(d->name, "vfio-pci")) {
+return true;
+}
+
+return false;
+}
+
 static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
 {
 /* We have to ignore passthrough devices */
 if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
 PCI_CLASS_NETWORK_ETHERNET
-&& strcmp(d->name, "xen-pci-passthrough") != 0) {
+&& !pci_device_is_passthrough(d)) {
 object_unparent(OBJECT(d));
 }
 }
@@ -187,9 +200,8 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
 !(flags & UNPLUG_IDE_SCSI_DISKS);
 
 /* We have to ignore passthrough devices */
-if (!strcmp(d->name, "xen-pci-passthrough")) {
+if (pci_device_is_passthrough(d))
 return;
-}
 
 switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
 case PCI_CLASS_STORAGE_IDE:
-- 
2.39.0

[PATCH v11 38/59] hw/xen: Implement EVTCHNOP_reset

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 30 ++
 hw/i386/kvm/xen_evtchn.h  |  3 +++
 target/i386/kvm/xen-emu.c | 17 +
 3 files changed, 50 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index f87b6a3b23..9b1fb47e85 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
+#include "qemu/lockable.h"
 #include "qemu/main-loop.h"
 #include "qemu/log.h"
 #include "qapi/error.h"
@@ -745,6 +746,35 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 return 0;
 }
 
+int xen_evtchn_soft_reset(void)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int i;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+assert(qemu_mutex_iothread_locked());
+
+QEMU_LOCK_GUARD(&s->port_lock);
+
+for (i = 0; i < s->nr_ports; i++) {
+close_port(s, i);
+}
+
+return 0;
+}
+
+int xen_evtchn_reset_op(struct evtchn_reset *reset)
+{
+if (reset->dom != DOMID_SELF && reset->dom != xen_domid) {
+return -ESRCH;
+}
+
+return xen_evtchn_soft_reset();
+}
+
 int xen_evtchn_close_op(struct evtchn_close *close)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 486b031c82..5d3e03553f 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -13,6 +13,7 @@
 #define QEMU_XEN_EVTCHN_H
 
 void xen_evtchn_create(void);
+int xen_evtchn_soft_reset(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
 struct evtchn_status;
@@ -24,6 +25,7 @@ struct evtchn_send;
 struct evtchn_alloc_unbound;
 struct evtchn_bind_interdomain;
 struct evtchn_bind_vcpu;
+struct evtchn_reset;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -33,5 +35,6 @@ int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
 int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
+int xen_evtchn_reset_op(struct evtchn_reset *reset);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index ec7aefadfc..96261c10a0 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -961,6 +961,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 err = xen_evtchn_bind_vcpu_op();
 break;
 }
+case EVTCHNOP_reset: {
+struct evtchn_reset reset;
+
+qemu_build_assert(sizeof(reset) == 2);
+if (kvm_copy_from_gva(cs, arg, &reset, sizeof(reset))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_reset_op(&reset);
+break;
+}
 default:
 return false;
 }
@@ -978,6 +990,11 @@ int kvm_xen_soft_reset(void)
 
 trace_kvm_xen_soft_reset();
 
+err = xen_evtchn_soft_reset();
+if (err) {
+return err;
+}
+
 /*
  * Zero is the reset/startup state for HVM_PARAM_CALLBACK_IRQ. Strictly,
  * it maps to HVM_PARAM_CALLBACK_TYPE_GSI with GSI#0, but Xen refuses to
-- 
2.39.0

[PATCH v11 08/59] xen-platform: allow its creation with XEN_EMULATE mode

2023-02-15 Thread David Woodhouse

From: Joao Martins 

The only thing we need to fix to make this build is the PIO hack which
sets the BIOS memory areas to R/W vs. R/O. Theoretically we could hook
that up to the PAM registers on the emulated PIIX, but in practice
nobody cares, so just leave it doing nothing.

Now that it builds without actual Xen, move it to CONFIG_XEN_BUS to include
it in the KVM-only builds.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/xen/meson.build|  5 -
 hw/i386/xen/xen_platform.c | 39 +-
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index 2fcc46e6ca..3dc4c4f106 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,9 @@
 i386_ss.add(when: 'CONFIG_XEN', if_true: files(
   'xen-hvm.c',
   'xen_apic.c',
-  'xen_platform.c',
   'xen_pvdevice.c',
 ))
+
+i386_ss.add(when: 'CONFIG_XEN_BUS', if_true: files(
+  'xen_platform.c',
+))
diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index d601a5509d..319049d80c 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -28,9 +28,9 @@
 #include "hw/ide.h"
 #include "hw/ide/pci.h"
 #include "hw/pci/pci.h"
-#include "hw/xen/xen_common.h"
 #include "migration/vmstate.h"
-#include "hw/xen/xen-legacy-backend.h"
+#include "hw/xen/xen.h"
+#include "net/net.h"
 #include "trace.h"
 #include "sysemu/xen.h"
 #include "sysemu/block-backend.h"
@@ -38,6 +38,11 @@
 #include "qemu/module.h"
 #include "qom/object.h"
 
+#ifdef CONFIG_XEN
+#include "hw/xen/xen_common.h"
+#include "hw/xen/xen-legacy-backend.h"
+#endif
+
 //#define DEBUG_PLATFORM
 
 #ifdef DEBUG_PLATFORM
@@ -280,18 +285,26 @@ static void platform_fixed_ioport_writeb(void *opaque, uint32_t addr, uint32_t v
 PCIXenPlatformState *s = opaque;
 
 switch (addr) {
-case 0: /* Platform flags */ {
-hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
-HVMMEM_ram_ro : HVMMEM_ram_rw;
-if (xen_set_mem_type(xen_domid, mem_type, 0xc0, 0x40)) {
-DPRINTF("unable to change ro/rw state of ROM memory area!\n");
-} else {
+case 0: /* Platform flags */
+if (xen_mode == XEN_EMULATE) {
+/* XX: Use i440gx/q35 PAM setup to do this? */
 s->flags = val & PFFLAG_ROM_LOCK;
-DPRINTF("changed ro/rw state of ROM memory area. now is %s state.\n",
-(mem_type == HVMMEM_ram_ro ? "ro":"rw"));
+#ifdef CONFIG_XEN
+} else {
+hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
+HVMMEM_ram_ro : HVMMEM_ram_rw;
+
+if (xen_set_mem_type(xen_domid, mem_type, 0xc0, 0x40)) {
+DPRINTF("unable to change ro/rw state of ROM memory area!\n");
+} else {
+s->flags = val & PFFLAG_ROM_LOCK;
+DPRINTF("changed ro/rw state of ROM memory area. now is %s state.\n",
+(mem_type == HVMMEM_ram_ro ? "ro" : "rw"));
+}
+#endif
 }
 break;
-}
+
 case 2:
 log_writeb(s, val);
 break;
@@ -509,8 +522,8 @@ static void xen_platform_realize(PCIDevice *dev, Error **errp)
 uint8_t *pci_conf;
 
 /* Device will crash on reset if xen is not initialized */
-if (!xen_enabled()) {
-error_setg(errp, "xen-platform device requires the Xen accelerator");
+if (xen_mode == XEN_DISABLED) {
+error_setg(errp, "xen-platform device requires a Xen guest");
 return;
 }
 
-- 
2.39.0

[PATCH v11 09/59] i386/xen: handle guest hypercalls

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins 
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/kvm.c|  5 
 target/i386/kvm/trace-events |  3 +++
 target/i386/kvm/xen-emu.c| 44 
 target/i386/kvm/xen-emu.h|  1 +
 4 files changed, 53 insertions(+)
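
The control flow introduced here (a dispatcher that reports whether it handled the call, with the caller filling in -ENOSYS otherwise) can be sketched in isolation as follows. Everything in this block is hypothetical and exists only for illustration: `struct demo_hcall`, `DEMO_HCALL_NOP` and the function names are not QEMU or KVM definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down view of a hypercall exit. */
struct demo_hcall {
    uint64_t input;     /* hypercall number */
    uint8_t cpl;        /* privilege level of the caller */
    int64_t result;     /* value handed back to the guest */
};

#define DEMO_HCALL_NOP 1    /* made-up hypercall number */

/* Returns true if the hypercall was recognised and result was set. */
static bool demo_dispatch(struct demo_hcall *h)
{
    /* Refuse hypercalls from guest userspace outright. */
    if (h->cpl > 0) {
        h->result = -EPERM;
        return true;
    }

    switch (h->input) {
    case DEMO_HCALL_NOP:
        h->result = 0;
        return true;
    default:
        return false;
    }
}

static void demo_handle_exit(struct demo_hcall *h)
{
    if (!demo_dispatch(h)) {
        /* Unexpected hypercall: report it as unimplemented. */
        h->result = -ENOSYS;
    }
}
```

Keeping the "handled" flag separate from the result lets a handler return -ENOSYS on purpose without that being logged as an unexpected hypercall.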

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 165fa5232d..a7ba3476ac 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5478,6 +5478,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER);
 ret = kvm_handle_wrmsr(cpu, run);
 break;
+#ifdef CONFIG_XEN_EMU
+case KVM_EXIT_XEN:
+ret = kvm_xen_handle_exit(cpu, &run->xen);
+break;
+#endif
 default:
 fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
 ret = -1;
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 7c369db1e1..cd6f842b1f 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -5,3 +5,6 @@ kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %"
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
+
+# xen-emu.c
+kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 4883b95d9d..476f464ee2 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -10,10 +10,12 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/kvm_xen.h"
 #include "kvm/kvm_i386.h"
 #include "xen-emu.h"
+#include "trace.h"
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 {
@@ -84,3 +86,45 @@ uint32_t kvm_xen_get_caps(void)
 {
 return kvm_state->xen_caps;
 }
+
+static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
+{
+uint16_t code = exit->u.hcall.input;
+
+if (exit->u.hcall.cpl > 0) {
+exit->u.hcall.result = -EPERM;
+return true;
+}
+
+switch (code) {
+default:
+return false;
+}
+}
+
+int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
+{
+if (exit->type != KVM_EXIT_XEN_HCALL) {
+return -1;
+}
+
+if (!do_kvm_xen_handle_exit(cpu, exit)) {
+/*
+ * Some hypercalls will be deliberately "implemented" by returning
+ * -ENOSYS. This case is for hypercalls which are unexpected.
+ */
+exit->u.hcall.result = -ENOSYS;
+qemu_log_mask(LOG_UNIMP, "Unimplemented Xen hypercall %"
+  PRId64 " (0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 ")\n",
+  (uint64_t)exit->u.hcall.input,
+  (uint64_t)exit->u.hcall.params[0],
+  (uint64_t)exit->u.hcall.params[1],
+  (uint64_t)exit->u.hcall.params[2]);
+}
+
+trace_kvm_xen_hypercall(CPU(cpu)->cpu_index, exit->u.hcall.cpl,
+exit->u.hcall.input, exit->u.hcall.params[0],
+exit->u.hcall.params[1], exit->u.hcall.params[2],
+exit->u.hcall.result);
+return 0;
+}
diff --git a/target/i386/kvm/xen-emu.h b/target/i386/kvm/xen-emu.h
index d62f1d8ed8..21faf6bf38 100644
--- a/target/i386/kvm/xen-emu.h
+++ b/target/i386/kvm/xen-emu.h
@@ -25,5 +25,6 @@
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr);
 int kvm_xen_init_vcpu(CPUState *cs);
+int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit);
 
 #endif /* QEMU_I386_KVM_XEN_EMU_H */
-- 
2.39.0

[PATCH v11 12/59] i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 4ed833656f..ebea27caf6 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -234,6 +234,19 @@ static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 err = schedop_shutdown(cs, arg);
 break;
 
+case SCHEDOP_poll:
+/*
+ * Linux will panic if this doesn't work. Just yield; it's not
+ * worth overthinking it because with event channel handling
+ * in KVM, the kernel will intercept this and it will never
+ * reach QEMU anyway. The semantics of the hypercall explicitly
+ * permit spurious wakeups.
+ */
+case SCHEDOP_yield:
+sched_yield();
+err = 0;
+break;
+
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 02/59] xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to
run the gauntlet of struct layout compatibility, errno mapping, and the
rest of that fun).

It will also cover the support for architecture-independent grant table
and event channel support which will be added in hw/i386/kvm/ (on the
basis that the non-KVM support is very theoretical and making it not use
KVM directly seems like gratuitous overengineering at this point).

The XEN_BUS option is for the xenfv platform support, which will now be
used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XEN_BUS over time as the direct dependencies on
Xen runtime libraries are eliminated. The Xen PCI platform device will
also reside under CONFIG_XEN_BUS.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/Kconfig  | 1 +
 hw/i386/Kconfig | 5 +
 hw/xen/Kconfig  | 3 +++
 meson.build | 1 +
 4 files changed, 10 insertions(+)
 create mode 100644 hw/xen/Kconfig

diff --git a/hw/Kconfig b/hw/Kconfig
index 38233bbb0f..ba62ff6417 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source xen/Kconfig
 source watchdog/Kconfig
 
 # arch Kconfig
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 9fbfe748b5..d40802d83f 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -136,3 +136,8 @@ config VMPORT
 config VMMOUSE
 bool
 depends on VMPORT
+
+config XEN_EMU
+bool
+default y
+depends on KVM && (I386 || X86_64)
diff --git a/hw/xen/Kconfig b/hw/xen/Kconfig
new file mode 100644
index 00..3467efb986
--- /dev/null
+++ b/hw/xen/Kconfig
@@ -0,0 +1,3 @@
+config XEN_BUS
+bool
+default y if (XEN || XEN_EMU)
diff --git a/meson.build b/meson.build
index 3f08bceba0..12071688cd 100644
--- a/meson.build
+++ b/meson.build
@@ -3853,6 +3853,7 @@ if have_system
   if xen.found()
 summary_info += {'xen ctrl version':  xen.version()}
   endif
+  summary_info += {'Xen emulation': config_all.has_key('CONFIG_XEN_EMU')}
 endif
 summary_info += {'TCG support':   config_all.has_key('CONFIG_TCG')}
 if config_all.has_key('CONFIG_TCG')
-- 
2.39.0

[PATCH v11 23/59] i386/xen: handle VCPUOP_register_runstate_memory_area

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Allow the guest to set up the vcpu runstate area, which is used as the
steal clock.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.h |  1 +
 target/i386/kvm/xen-emu.c | 57 +++
 target/i386/machine.c |  1 +
 3 files changed, 59 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 96c2d0d5cb..bf44a87ddb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1791,6 +1791,7 @@ typedef struct CPUArchState {
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
+uint64_t xen_vcpu_runstate_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 0b3bd0b889..f5c8b6d20c 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -160,6 +160,7 @@ int kvm_xen_init_vcpu(CPUState *cs)
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
+env->xen_vcpu_runstate_gpa = INVALID_GPA;
 
 return 0;
 }
@@ -254,6 +255,17 @@ static void do_set_vcpu_time_info_gpa(CPUState *cs, run_on_cpu_data data)
   env->xen_vcpu_time_info_gpa);
 }
 
+static void do_set_vcpu_runstate_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_runstate_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+  env->xen_vcpu_runstate_gpa);
+}
+
 static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -262,10 +274,14 @@ static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
+env->xen_vcpu_runstate_gpa = INVALID_GPA;
 
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
   INVALID_GPA);
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+  INVALID_GPA);
+
 }
 
 static int xen_set_shared_info(uint64_t gfn)
@@ -517,6 +533,35 @@ static int vcpuop_register_vcpu_time_info(CPUState *cs, CPUState *target,
 return 0;
 }
 
+static int vcpuop_register_runstate_info(CPUState *cs, CPUState *target,
+ uint64_t arg)
+{
+struct vcpu_register_runstate_memory_area rma;
+uint64_t gpa;
+size_t len;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(rma) == 8);
+/* The runstate area actually does change size, but Linux copes. */
+
+if (!target) {
+return -ENOENT;
+}
+
+if (kvm_copy_from_gva(cs, arg, &rma, sizeof(rma))) {
+return -EFAULT;
+}
+
+/* As with vcpu_time_info, Xen actually uses the GVA but KVM doesn't. */
+if (!kvm_gva_to_gpa(cs, rma.addr.p, &gpa, &len, false)) {
+return -EFAULT;
+}
+
+async_run_on_cpu(target, do_set_vcpu_runstate_gpa,
+ RUN_ON_CPU_HOST_ULONG(gpa));
+return 0;
+}
+
 static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
   int cmd, int vcpu_id, uint64_t arg)
 {
@@ -525,6 +570,9 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 int err;
 
 switch (cmd) {
+case VCPUOP_register_runstate_memory_area:
+err = vcpuop_register_runstate_info(cs, dest, arg);
+break;
 case VCPUOP_register_vcpu_time_memory_area:
 err = vcpuop_register_vcpu_time_info(cs, dest, arg);
 break;
@@ -730,6 +778,15 @@ int kvm_put_xen_state(CPUState *cs)
 }
 }
 
+gpa = env->xen_vcpu_runstate_gpa;
+if (gpa != INVALID_GPA) {
+ret = kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+gpa);
+if (ret < 0) {
+return ret;
+}
+}
+
 return 0;
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index eb657907ca..3f3d436aaa 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1273,6 +1273,7 @@ static const VMStateDescription vmstate_xen_vcpu = {
 VMSTATE_UINT64(env.xen_vcpu_info_gpa, X86CPU),
 VMSTATE_UINT64(env.xen_vcpu_info_default_gpa, X86CPU),
 VMSTATE_UINT64(env.xen_vcpu_time_info_gpa, X86CPU),
+VMSTATE_UINT64(env.xen_vcpu_runstate_gpa, X86CPU),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.39.0

[PATCH v11 29/59] hw/xen: Implement EVTCHNOP_status

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 104 ++
 hw/i386/kvm/xen_evtchn.h  |   3 ++
 target/i386/kvm/xen-emu.c |  20 +++-
 3 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 9d6f4076ad..8bed33890f 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -22,6 +22,7 @@
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
 #include "xen_evtchn.h"
+#include "xen_overlay.h"
 
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_xen.h"
@@ -33,6 +34,22 @@
 #define TYPE_XEN_EVTCHN "xen-evtchn"
 OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
 
+typedef struct XenEvtchnPort {
+uint32_t vcpu;  /* Xen/ACPI vcpu_id */
+uint16_t type;  /* EVTCHNSTAT_ */
+uint16_t type_val;  /* pirq# / virq# / remote port according to type */
+} XenEvtchnPort;
+
+#define COMPAT_EVTCHN_2L_NR_CHANNELS 1024
+
+/*
+ * For unbound/interdomain ports there are only two possible remote
+ * domains; self and QEMU. Use a single high bit in type_val for that,
+ * and the low bits for the remote port number (or 0 for unbound).
+ */
+#define PORT_INFO_TYPEVAL_REMOTE_QEMU   0x8000
+#define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK  0x7FFF
+
 struct XenEvtchnState {
 /*< private >*/
 SysBusDevice busdev;
@@ -42,6 +59,8 @@ struct XenEvtchnState {
 bool evtchn_in_kernel;
 
 QemuMutex port_lock;
+uint32_t nr_ports;
+XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -65,6 +84,18 @@ static bool xen_evtchn_is_needed(void *opaque)
 return xen_mode == XEN_EMULATE;
 }
 
+static const VMStateDescription xen_evtchn_port_vmstate = {
+.name = "xen_evtchn_port",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(vcpu, XenEvtchnPort),
+VMSTATE_UINT16(type, XenEvtchnPort),
+VMSTATE_UINT16(type_val, XenEvtchnPort),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription xen_evtchn_vmstate = {
 .name = "xen_evtchn",
 .version_id = 1,
@@ -73,6 +104,9 @@ static const VMStateDescription xen_evtchn_vmstate = {
 .post_load = xen_evtchn_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(callback_param, XenEvtchnState),
+VMSTATE_UINT32(nr_ports, XenEvtchnState),
+VMSTATE_STRUCT_VARRAY_UINT32(port_table, XenEvtchnState, nr_ports, 1,
+ xen_evtchn_port_vmstate, XenEvtchnPort),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -153,3 +187,73 @@ int xen_evtchn_set_callback_param(uint64_t param)
 
 return ret;
 }
+
+static bool valid_port(evtchn_port_t port)
+{
+if (!port) {
+return false;
+}
+
+if (xen_is_long_mode()) {
+return port < EVTCHN_2L_NR_CHANNELS;
+} else {
+return port < COMPAT_EVTCHN_2L_NR_CHANNELS;
+}
+}
+
+int xen_evtchn_status_op(struct evtchn_status *status)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (status->dom != DOMID_SELF && status->dom != xen_domid) {
+return -ESRCH;
+}
+
+if (!valid_port(status->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[status->port];
+
+status->status = p->type;
+status->vcpu = p->vcpu;
+
+switch (p->type) {
+case EVTCHNSTAT_unbound:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+status->u.unbound.dom = DOMID_QEMU;
+} else {
+status->u.unbound.dom = xen_domid;
+}
+break;
+
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+status->u.interdomain.dom = DOMID_QEMU;
+} else {
+status->u.interdomain.dom = xen_domid;
+}
+
+status->u.interdomain.port = p->type_val &
+PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
+break;
+
+case EVTCHNSTAT_pirq:
+status->u.pirq = p->type_val;
+break;
+
+case EVTCHNSTAT_virq:
+status->u.virq = p->type_val;
+break;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+return 0;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index c9b7f9d11f..76467636ee 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -15,4 +15,7 @@
 void xen_evtchn_create(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
+struct evtchn_status;
+int xen_evtchn_status_op(struct evtchn_status *status);
+
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 4513f07c68..3811153724 100644
--- a/target/i386/kvm/xen-emu.c
+++
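
The type_val encoding described in the xen_evtchn.c hunk above (one high bit to say the remote end is QEMU, the low 15 bits for the remote port, 0 meaning unbound) can be sketched in isolation. This is only an illustration: the two constants are taken from the patch, but the typeval_* helper names are hypothetical and do not appear in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as defined in the patch. */
#define PORT_INFO_TYPEVAL_REMOTE_QEMU      0x8000
#define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK 0x7FFF

/* Hypothetical helper: pack a remote port plus the "remote is QEMU" flag. */
static inline uint16_t typeval_pack(uint16_t remote_port, bool remote_is_qemu)
{
    /* The port number must fit in the low 15 bits. */
    assert(remote_port <= PORT_INFO_TYPEVAL_REMOTE_PORT_MASK);
    return (uint16_t)(remote_port |
                      (remote_is_qemu ? PORT_INFO_TYPEVAL_REMOTE_QEMU : 0));
}

/* Hypothetical helper: true if the remote end is QEMU rather than the guest. */
static inline bool typeval_remote_is_qemu(uint16_t type_val)
{
    return (type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) != 0;
}

/* Hypothetical helper: remote port number, or 0 for an unbound port. */
static inline uint16_t typeval_remote_port(uint16_t type_val)
{
    return type_val & PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
}
```

xen_evtchn_status_op() performs the same two mask operations inline when it fills in status->u.unbound and status->u.interdomain.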

[PATCH v11 40/59] hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part, kind of.

There's a slight complexity in that we need to hold the BQL before we
can call qemu_set_irq(), and we definitely can't do that while holding
our own port_lock (because we'll need to take that from the qemu-side
functions that the PV backend drivers will call). So if we end up
wanting to set the IRQ in a context where we *don't* already hold the
BQL, defer to a BH.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 97 +++
 hw/i386/kvm/xen_evtchn.h  |  4 ++
 hw/i386/pc.c  |  6 +++
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c | 11 +
 target/i386/kvm/xen-emu.c | 40 
 target/i386/kvm/xen-emu.h |  1 +
 8 files changed, 161 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index fa54d185cd..ecc93da172 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -27,6 +27,8 @@
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
+#include "hw/i386/x86.h"
+#include "hw/irq.h"
 
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
@@ -100,9 +102,12 @@ struct XenEvtchnState {
 uint64_t callback_param;
 bool evtchn_in_kernel;
 
+QEMUBH *gsi_bh;
+
 QemuMutex port_lock;
 uint32_t nr_ports;
 XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
+qemu_irq gsis[GSI_NUM_PINS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -167,13 +172,42 @@ static const TypeInfo xen_evtchn_info = {
 .class_init= xen_evtchn_class_init,
 };
 
+static void gsi_assert_bh(void *opaque)
+{
+struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
+if (vi) {
+xen_evtchn_set_callback_level(!!vi->evtchn_upcall_pending);
+}
+}
+
 void xen_evtchn_create(void)
 {
 XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
 -1, NULL));
+int i;
+
 xen_evtchn_singleton = s;
 
 qemu_mutex_init(&s->port_lock);
+s->gsi_bh = aio_bh_new(qemu_get_aio_context(), gsi_assert_bh, s);
+
+for (i = 0; i < GSI_NUM_PINS; i++) {
+sysbus_init_irq(SYS_BUS_DEVICE(s), &s->gsis[i]);
+}
+}
+
+void xen_evtchn_connect_gsis(qemu_irq *system_gsis)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int i;
+
+if (!s) {
+return;
+}
+
+for (i = 0; i < GSI_NUM_PINS; i++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(s), i, system_gsis[i]);
+}
 }
 
 static void xen_evtchn_register_types(void)
@@ -183,6 +217,64 @@ static void xen_evtchn_register_types(void)
 
 type_init(xen_evtchn_register_types)
 
+void xen_evtchn_set_callback_level(int level)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint32_t param;
+
+if (!s) {
+return;
+}
+
+/*
+ * We get to this function in a number of ways:
+ *
+ *  • From I/O context, via PV backend drivers sending a notification to
+ *the guest.
+ *
+ *  • From guest vCPU context, via loopback interdomain event channels
+ *(or theoretically even IPIs but guests don't use those with GSI
+ *delivery because that's pointless. We don't want a malicious guest
+ *to be able to trigger a deadlock though, so we can't rule it out.)
+ *
+ *  • From guest vCPU context when the HVM_PARAM_CALLBACK_IRQ is being
+ *configured.
+ *
+ *  • From guest vCPU context in the KVM exit handler, if the upcall
+ *pending flag has been cleared and the GSI needs to be deasserted.
+ *
+ *  • Maybe in future, in an interrupt ack/eoi notifier when the GSI has
+ *
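
The level-triggered behaviour described in the commit message above can be modelled without any of the QEMU plumbing. The sketch below is a toy state machine, not the code from this patch: struct gsi_model and the model_* functions are hypothetical names, and the BQL/BH deferral is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state involved. */
struct gsi_model {
    bool upcall_pending; /* models vcpu_info->evtchn_upcall_pending on vCPU0 */
    bool gsi_asserted;   /* current level of the callback GSI */
};

/* An event is delivered to vCPU0: set the flag and raise the line. */
static inline void model_deliver_event(struct gsi_model *m)
{
    m->upcall_pending = true;
    m->gsi_asserted = true;
}

/* The guest sets the flag by itself: the line is NOT raised. */
static inline void model_guest_sets_pending(struct gsi_model *m)
{
    m->upcall_pending = true;
}

/* The guest clears the flag once it has handled the upcall. */
static inline void model_guest_clears_pending(struct gsi_model *m)
{
    m->upcall_pending = false;
}

/* Checked on every vmexit while the GSI is known to be asserted. */
static inline void model_on_vmexit(struct gsi_model *m)
{
    if (m->gsi_asserted && !m->upcall_pending) {
        m->gsi_asserted = false;
    }
}
```

The point of the model is that the line only drops at the next vmexit after the guest clears the flag, which is the polling behaviour the commit message describes in place of an EOI-driven resample.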

[PATCH v11 33/59] hw/xen: Implement EVTCHNOP_bind_ipi

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 69 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 15 +
 3 files changed, 86 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index da2f5711dd..d8527483b9 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -13,6 +13,7 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "qapi/error.h"
 #include "qom/object.h"
 #include "exec/target_page.h"
@@ -231,6 +232,43 @@ static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
 kvm_xen_inject_vcpu_callback_vector(vcpu, type);
 }
 
+static void deassign_kernel_port(evtchn_port_t port)
+{
+struct kvm_xen_hvm_attr ha;
+int ret;
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.flags = KVM_XEN_EVTCHN_DEASSIGN;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "Failed to unbind kernel port %d: %s\n",
+  port, strerror(-ret));
+}
+}
+
+static int assign_kernel_port(uint16_t type, evtchn_port_t port,
+  uint32_t vcpu_id)
+{
+CPUState *cpu = qemu_get_cpu(vcpu_id);
+struct kvm_xen_hvm_attr ha;
+
+if (!cpu) {
+return -ENOENT;
+}
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.type = type;
+ha.u.evtchn.flags = 0;
+ha.u.evtchn.deliver.port.port = port;
+ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
+ha.u.evtchn.deliver.port.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -549,6 +587,12 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
   p->type_val, 0);
 break;
 
+case EVTCHNSTAT_ipi:
+if (s->evtchn_in_kernel) {
+deassign_kernel_port(port);
+}
+break;
+
 default:
 break;
 }
@@ -638,3 +682,28 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
 
 return ret;
 }
+
+int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_vcpu(ipi->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, ipi->vcpu, EVTCHNSTAT_ipi, 0, &ipi->port);
+if (!ret && s->evtchn_in_kernel) {
+assign_kernel_port(EVTCHNSTAT_ipi, ipi->port, ipi->vcpu);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 0ea13dda3a..107f420848 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -19,9 +19,11 @@ struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
 struct evtchn_bind_virq;
+struct evtchn_bind_ipi;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
+int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 0c4988ad63..4a20ccdf78 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -891,6 +891,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_ipi: {
+struct evtchn_bind_ipi ipi;
+
+qemu_build_assert(sizeof(ipi) == 8);
+if (kvm_copy_from_gva(cs, arg, &ipi, sizeof(ipi))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_ipi_op(&ipi);
+if (!err && kvm_copy_to_gva(cs, arg, &ipi, sizeof(ipi))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 06/59] i386/hvm: Set Xen vCPU ID in KVM

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/kvm.c |  5 +
 target/i386/kvm/xen-emu.c | 28 
 target/i386/kvm/xen-emu.h |  1 +
 3 files changed, 34 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2b3daabf7b..165fa5232d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1869,6 +1869,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
+r = kvm_xen_init_vcpu(cs);
+if (r) {
+return r;
+}
+
 kvm_base += 0x100;
 #else /* CONFIG_XEN_EMU */
 /* This should never happen as kvm_arch_init() would have died first. */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 34d5bc1bc9..4883b95d9d 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -52,6 +52,34 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 return 0;
 }
 
+int kvm_xen_init_vcpu(CPUState *cs)
+{
+int err;
+
+/*
+ * The kernel needs to know the Xen/ACPI vCPU ID because that's
+ * what the guest uses in hypercalls such as timers. It doesn't
+ * match the APIC ID which is generally used for talking to the
+ * kernel about vCPUs. And if vCPU threads race with creating
+ * their KVM vCPUs out of order, it doesn't necessarily match
+ * with the kernel's internal vCPU indices either.
+ */
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+struct kvm_xen_vcpu_attr va = {
+.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID,
+.u.vcpu_id = cs->cpu_index,
+};
+err = kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &va);
+if (err) {
+error_report("kvm: Failed to set Xen vCPU ID attribute: %s",
+ strerror(-err));
+return err;
+}
+}
+
+return 0;
+}
+
 uint32_t kvm_xen_get_caps(void)
 {
 return kvm_state->xen_caps;
diff --git a/target/i386/kvm/xen-emu.h b/target/i386/kvm/xen-emu.h
index 2101df0182..d62f1d8ed8 100644
--- a/target/i386/kvm/xen-emu.h
+++ b/target/i386/kvm/xen-emu.h
@@ -24,5 +24,6 @@
 #define XEN_VERSION(maj, min) ((maj) << 16 | (min))
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr);
+int kvm_xen_init_vcpu(CPUState *cs);
 
 #endif /* QEMU_I386_KVM_XEN_EMU_H */
-- 
2.39.0
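
As an aside on the three number spaces in the commit message above, the sketch below shows why a lookup keyed on the Xen/ACPI vCPU ID cannot simply index by KVM creation order. Everything here is hypothetical illustration: struct vcpu_ids and find_by_xen_id() are not QEMU or kernel names, and the ID values used with it are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the three identifiers one vCPU carries. */
struct vcpu_ids {
    uint32_t kvm_index;   /* order in which the kernel created the vCPU */
    uint32_t apic_id;     /* stable; used by MSIs and the KVM Xen APIs */
    uint32_t xen_vcpu_id; /* ACPI/MADT/Xen number, cs->cpu_index in QEMU */
};

/* Find a vCPU by the Xen/ACPI ID that the guest passes in hypercalls. */
static inline const struct vcpu_ids *
find_by_xen_id(const struct vcpu_ids *tbl, size_t n, uint32_t xen_vcpu_id)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].xen_vcpu_id == xen_vcpu_id) {
            return &tbl[i];
        }
    }
    return NULL; /* no such vCPU */
}
```

With a table such as { {0, 2, 1}, {1, 0, 0} }, where the vCPU threads raced and Xen vCPU 1 was created first, Xen vCPU 0 ends up with KVM index 1. That mismatch is why the patch tells the kernel the Xen vCPU ID explicitly through KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID.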

[PATCH v11 51/59] hw/xen: Add xen_xenstore device for xenstore emulation

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Just the basic shell, with the event channel hookup. It only dumps the
buffer for now; a real ring implementation will come in a subsequent patch.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/meson.build|   1 +
 hw/i386/kvm/xen_evtchn.c   |   1 +
 hw/i386/kvm/xen_xenstore.c | 248 +
 hw/i386/kvm/xen_xenstore.h |  20 +++
 hw/i386/pc.c   |   2 +
 target/i386/kvm/xen-emu.c  |  12 ++
 6 files changed, 284 insertions(+)
 create mode 100644 hw/i386/kvm/xen_xenstore.c
 create mode 100644 hw/i386/kvm/xen_xenstore.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index e02449e4d4..6d6981fced 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -8,6 +8,7 @@ i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
   'xen_overlay.c',
   'xen_evtchn.c',
   'xen_gnttab.c',
+  'xen_xenstore.c',
   ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 519b8e0600..7412139154 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -34,6 +34,7 @@
 
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
+#include "xen_xenstore.h"
 
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_xen.h"
diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
new file mode 100644
index 00..702f417633
--- /dev/null
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -0,0 +1,248 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+#include "xen_evtchn.h"
+#include "xen_xenstore.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+
+#include "hw/xen/interface/io/xs_wire.h"
+#include "hw/xen/interface/event_channel.h"
+
+#define TYPE_XEN_XENSTORE "xen-xenstore"
+OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+#define ENTRIES_PER_FRAME_V1 (XEN_PAGE_SIZE / sizeof(grant_entry_v1_t))
+#define ENTRIES_PER_FRAME_V2 (XEN_PAGE_SIZE / sizeof(grant_entry_v2_t))
+
+#define XENSTORE_HEADER_SIZE ((unsigned int)sizeof(struct xsd_sockmsg))
+
+struct XenXenstoreState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+MemoryRegion xenstore_page;
+struct xenstore_domain_interface *xs;
+uint8_t req_data[XENSTORE_HEADER_SIZE + XENSTORE_PAYLOAD_MAX];
+uint8_t rsp_data[XENSTORE_HEADER_SIZE + XENSTORE_PAYLOAD_MAX];
+uint32_t req_offset;
+uint32_t rsp_offset;
+bool rsp_pending;
+bool fatal_error;
+
+evtchn_port_t guest_port;
+evtchn_port_t be_port;
+struct xenevtchn_handle *eh;
+};
+
+struct XenXenstoreState *xen_xenstore_singleton;
+
+static void xen_xenstore_event(void *opaque);
+
+static void xen_xenstore_realize(DeviceState *dev, Error **errp)
+{
+XenXenstoreState *s = XEN_XENSTORE(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen xenstore support is for Xen emulation");
+return;
+}
+memory_region_init_ram(&s->xenstore_page, OBJECT(dev), "xen:xenstore_page",
+   XEN_PAGE_SIZE, &error_abort);
+memory_region_set_enabled(&s->xenstore_page, true);
+s->xs = memory_region_get_ram_ptr(&s->xenstore_page);
+memset(s->xs, 0, XEN_PAGE_SIZE);
+
+/* We can't map it this early as KVM isn't ready */
+xen_xenstore_singleton = s;
+
+s->eh = xen_be_evtchn_open();
+if (!s->eh) {
+error_setg(errp, "Xenstore evtchn port init failed");
+return;
+}
+aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
+   xen_xenstore_event, NULL, NULL, NULL, s);
+}
+
+static bool xen_xenstore_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static int xen_xenstore_pre_save(void *opaque)
+{
+XenXenstoreState *s = opaque;
+
+if (s->eh) {
+s->guest_port = xen_be_evtchn_get_guest_port(s->eh);
+}
+return 0;
+}
+
+static int xen_xenstore_post_load(void *opaque, int ver)
+{
+XenXenstoreState *s = opaque;
+
+/*
+ * As qemu/dom0, rebind to the guest's port. The Windows drivers may
+ * unbind the XenStore evtchn and rebind to it, having obtained the
+ * "remote" port through EVTCHNOP_status. In the case that migration
+ * occurs while it's unbound, the "remote" port needs to be the same
+ * as before so that the guest can find it, but should remain

[PATCH v11 39/59] i386/xen: add monitor commands to test event injection

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Specifically add listing, injection of event channels.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Acked-by: Dr. David Alan Gilbert 
Reviewed-by: Paul Durrant 
---
 hmp-commands.hx  |  29 +
 hw/i386/kvm/xen_evtchn.c | 137 +++
 include/monitor/hmp.h|   2 +
 qapi/misc-target.json| 116 +
 4 files changed, 284 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index fbb5daf09b..b87c250e23 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1815,3 +1815,32 @@ SRST
   Dump the FDT in dtb format to *filename*.
 ERST
 #endif
+
+#if defined(CONFIG_XEN_EMU)
+{
+.name   = "xen-event-inject",
+.args_type  = "port:i",
+.params = "port",
+.help   = "inject event channel",
+.cmd= hmp_xen_event_inject,
+},
+
+SRST
+``xen-event-inject`` *port*
+  Notify guest via event channel on port *port*.
+ERST
+
+
+{
+.name   = "xen-event-list",
+.args_type  = "",
+.params = "",
+.help   = "list event channel state",
+.cmd= hmp_xen_event_list,
+},
+
+SRST
+``xen-event-list``
+  List event channels in the guest
+ERST
+#endif
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 9b1fb47e85..fa54d185cd 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -15,7 +15,11 @@
 #include "qemu/lockable.h"
 #include "qemu/main-loop.h"
 #include "qemu/log.h"
+#include "monitor/monitor.h"
+#include "monitor/hmp.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-misc-target.h"
+#include "qapi/qmp/qdict.h"
 #include "qom/object.h"
 #include "exec/target_page.h"
 #include "exec/address-spaces.h"
@@ -1067,3 +1071,136 @@ int xen_evtchn_send_op(struct evtchn_send *send)
 return ret;
 }
 
+EvtchnInfoList *qmp_xen_event_list(Error **errp)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+EvtchnInfoList *head = NULL, **tail = &head;
+void *shinfo, *pending, *mask;
+int i;
+
+if (!s) {
+error_setg(errp, "Xen event channel emulation not enabled");
+return NULL;
+}
+
+shinfo = xen_overlay_get_shinfo_ptr();
+if (!shinfo) {
+error_setg(errp, "Xen shared info page not allocated");
+return NULL;
+}
+
+if (xen_is_long_mode()) {
+pending = shinfo + offsetof(struct shared_info, evtchn_pending);
+mask = shinfo + offsetof(struct shared_info, evtchn_mask);
+} else {
+pending = shinfo + offsetof(struct compat_shared_info, evtchn_pending);
+mask = shinfo + offsetof(struct compat_shared_info, evtchn_mask);
+}
+
+QEMU_LOCK_GUARD(&s->port_lock);
+
+for (i = 0; i < s->nr_ports; i++) {
+XenEvtchnPort *p = &s->port_table[i];
+EvtchnInfo *info;
+
+if (p->type == EVTCHNSTAT_closed) {
+continue;
+}
+
+info = g_new0(EvtchnInfo, 1);
+
+info->port = i;
+qemu_build_assert(EVTCHN_PORT_TYPE_CLOSED == EVTCHNSTAT_closed);
+qemu_build_assert(EVTCHN_PORT_TYPE_UNBOUND == EVTCHNSTAT_unbound);
+qemu_build_assert(EVTCHN_PORT_TYPE_INTERDOMAIN == EVTCHNSTAT_interdomain);
+qemu_build_assert(EVTCHN_PORT_TYPE_PIRQ == EVTCHNSTAT_pirq);
+qemu_build_assert(EVTCHN_PORT_TYPE_VIRQ == EVTCHNSTAT_virq);
+qemu_build_assert(EVTCHN_PORT_TYPE_IPI == EVTCHNSTAT_ipi);
+
+info->type = p->type;
+if (p->type == EVTCHNSTAT_interdomain) {
+info->remote_domain = g_strdup((p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) ?
+   "qemu" : "loopback");
+info->target = p->type_val & PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
+} else {
+info->target = p->type_val;
+}
+info->vcpu = p->vcpu;
+info->pending = test_bit(i, pending);
+info->masked = test_bit(i, mask);
+
+QAPI_LIST_APPEND(tail, info);
+}
+
+return head;
+}
+
+void qmp_xen_event_inject(uint32_t port, Error **errp)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+
+if (!s) {
+error_setg(errp, "Xen event channel emulation not enabled");
+return;
+}
+
+if (!valid_port(port)) {
+error_setg(errp, "Invalid port %u", port);
+return;
+}
+
+QEMU_LOCK_GUARD(&s->port_lock);
+
+if (set_port_pending(s, port)) {
+error_setg(errp, "Failed to set port %u", port);
+return;
+}
+}
+
+void hmp_xen_event_list(Monitor *mon, const QDict *qdict)
+{
+EvtchnInfoList *iter, *info_list;
+Error *err = NULL;
+
+info_list = qmp_xen_event_list(&err);
+if (err) {
+hmp_handle_error(mon, err);
+return;
+}
+
+for (iter = info_list; iter; iter = iter->next) {
+EvtchnInfo *info = iter->value;
+
+monitor_printf(mon, "port %4lu: vcpu: %ld %s", info->port, info->vcpu,
+   EvtchnPortType_str(info->type));
+

[PATCH v11 44/59] hw/xen: Support mapping grant frames

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_gnttab.c  | 73 ++-
 hw/i386/kvm/xen_overlay.c |  2 +-
 hw/i386/kvm/xen_overlay.h |  2 ++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index ef8857e50c..72e87aea6a 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -37,13 +37,26 @@ OBJECT_DECLARE_SIMPLE_TYPE(XenGnttabState, XEN_GNTTAB)
 #define XEN_PAGE_SHIFT 12
 #define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
 
+#define ENTRIES_PER_FRAME_V1 (XEN_PAGE_SIZE / sizeof(grant_entry_v1_t))
+
 struct XenGnttabState {
 /*< private >*/
 SysBusDevice busdev;
 /*< public >*/
 
+QemuMutex gnt_lock;
+
 uint32_t nr_frames;
 uint32_t max_frames;
+
+union {
+grant_entry_v1_t *v1;
+/* Theoretically, v2 support could be added here. */
+} entries;
+
+MemoryRegion gnt_frames;
+MemoryRegion *gnt_aliases;
+uint64_t *gnt_frame_gpas;
 };
 
 struct XenGnttabState *xen_gnttab_singleton;
@@ -51,6 +64,7 @@ struct XenGnttabState *xen_gnttab_singleton;
 static void xen_gnttab_realize(DeviceState *dev, Error **errp)
 {
 XenGnttabState *s = XEN_GNTTAB(dev);
+int i;
 
 if (xen_mode != XEN_EMULATE) {
 error_setg(errp, "Xen grant table support is for Xen emulation");
@@ -58,6 +72,38 @@ static void xen_gnttab_realize(DeviceState *dev, Error **errp)
 }
 s->nr_frames = 0;
 s->max_frames = kvm_xen_get_gnttab_max_frames();
+memory_region_init_ram(&s->gnt_frames, OBJECT(dev), "xen:grant_table",
+   XEN_PAGE_SIZE * s->max_frames, &error_abort);
+memory_region_set_enabled(&s->gnt_frames, true);
+s->entries.v1 = memory_region_get_ram_ptr(&s->gnt_frames);
+memset(s->entries.v1, 0, XEN_PAGE_SIZE * s->max_frames);
+
+/* Create individual page-sized aliases for overlays */
+s->gnt_aliases = (void *)g_new0(MemoryRegion, s->max_frames);
+s->gnt_frame_gpas = (void *)g_new(uint64_t, s->max_frames);
+for (i = 0; i < s->max_frames; i++) {
+memory_region_init_alias(&s->gnt_aliases[i], OBJECT(dev),
+ NULL, &s->gnt_frames,
+ i * XEN_PAGE_SIZE, XEN_PAGE_SIZE);
+s->gnt_frame_gpas[i] = INVALID_GPA;
+}
+
+qemu_mutex_init(&s->gnt_lock);
+
+xen_gnttab_singleton = s;
+}
+
+static int xen_gnttab_post_load(void *opaque, int version_id)
+{
+XenGnttabState *s = XEN_GNTTAB(opaque);
+uint32_t i;
+
+for (i = 0; i < s->nr_frames; i++) {
+if (s->gnt_frame_gpas[i] != INVALID_GPA) {
+xen_overlay_do_map_page(&s->gnt_aliases[i], s->gnt_frame_gpas[i]);
+}
+}
+return 0;
 }
 
 static bool xen_gnttab_is_needed(void *opaque)
@@ -70,8 +116,11 @@ static const VMStateDescription xen_gnttab_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_gnttab_is_needed,
+.post_load = xen_gnttab_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(nr_frames, XenGnttabState),
+VMSTATE_VARRAY_UINT32(gnt_frame_gpas, XenGnttabState, nr_frames, 0,
+  vmstate_info_uint64, uint64_t),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -106,6 +155,28 @@ type_init(xen_gnttab_register_types)
 
 int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 {
-return -ENOSYS;
+XenGnttabState *s = xen_gnttab_singleton;
+uint64_t gpa = gfn << XEN_PAGE_SHIFT;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (idx >= s->max_frames) {
+return -EINVAL;
+}
+
+QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_LOCK_GUARD(&s->gnt_lock);
+
+xen_overlay_do_map_page(&s->gnt_aliases[idx], gpa);
+
+s->gnt_frame_gpas[idx] = gpa;
+
+if (s->nr_frames <= idx) {
+s->nr_frames = idx + 1;
+}
+
+return 0;
 }
 
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
index 8685d87959..39fda1b72c 100644
--- a/hw/i386/kvm/xen_overlay.c
+++ b/hw/i386/kvm/xen_overlay.c
@@ -49,7 +49,7 @@ struct XenOverlayState {
 
 struct XenOverlayState *xen_overlay_singleton;
 
-static void xen_overlay_do_map_page(MemoryRegion *page, uint64_t gpa)
+void xen_overlay_do_map_page(MemoryRegion *page, uint64_t gpa)
 {
 /*
  * Xen allows guests to map the same page as many times as it likes
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
index 5c46a0b036..75ecb6b359 100644
--- a/hw/i386/kvm/xen_overlay.h
+++ b/hw/i386/kvm/xen_overlay.h
@@ -21,4 +21,6 @@ int xen_sync_long_mode(void);
 int xen_set_long_mode(bool long_mode);
 bool xen_is_long_mode(void);
 
+void xen_overlay_do_map_page(MemoryRegion *page, uint64_t gpa);
+
 #endif /* QEMU_XEN_OVERLAY_H */
-- 
2.39.0
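
The bookkeeping in xen_gnttab_map_page() above (bounds check, gfn to gpa conversion, and nr_frames used as a high-water mark) can be tried standalone. This is a sketch under assumptions, not the patch's code: struct gnttab_model, the gnttab_model_* functions and the MODEL_* constants are hypothetical, MODEL_INVALID_GPA merely stands in for QEMU's INVALID_GPA, and the memory-region mapping and locking are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XEN_PAGE_SHIFT    12
#define MODEL_MAX_FRAMES  64          /* same as the default max_frames */
#define MODEL_INVALID_GPA UINT64_MAX  /* hypothetical stand-in for INVALID_GPA */

struct gnttab_model {
    uint32_t nr_frames;                    /* highest mapped index + 1 */
    uint64_t frame_gpas[MODEL_MAX_FRAMES]; /* guest address of each frame */
};

/* Start with no frames mapped anywhere. */
static inline void gnttab_model_init(struct gnttab_model *g)
{
    g->nr_frames = 0;
    for (uint32_t i = 0; i < MODEL_MAX_FRAMES; i++) {
        g->frame_gpas[i] = MODEL_INVALID_GPA;
    }
}

/* Record that grant frame idx is mapped at guest frame number gfn. */
static inline int gnttab_model_map(struct gnttab_model *g, uint64_t idx,
                                   uint64_t gfn)
{
    if (idx >= MODEL_MAX_FRAMES) {
        return -EINVAL;
    }
    g->frame_gpas[idx] = gfn << XEN_PAGE_SHIFT;
    if (g->nr_frames <= idx) {
        g->nr_frames = (uint32_t)idx + 1;
    }
    return 0;
}
```

Because nr_frames only records the highest index seen, entries below it may never have been mapped, which is why xen_gnttab_post_load() tests each gnt_frame_gpas[i] against INVALID_GPA before remapping.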

[PATCH v11 26/59] i386/xen: implement HVMOP_set_param

2023-02-15 Thread David Woodhouse

From: Ankur Arora 

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora 
Signed-off-by: Joao Martins 
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 55dc2ac012..67c5832d09 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -489,6 +489,36 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool handle_set_param(struct kvm_xen_exit *exit, X86CPU *cpu,
+ uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+struct xen_hvm_param hp;
+int err = 0;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(hp) == 16);
+
+if (kvm_copy_from_gva(cs, arg, &hp, sizeof(hp))) {
+err = -EFAULT;
+goto out;
+}
+
+if (hp.domid != DOMID_SELF && hp.domid != xen_domid) {
+err = -ESRCH;
+goto out;
+}
+
+switch (hp.index) {
+default:
+return false;
+}
+
+out:
+exit->u.hcall.result = err;
+return true;
+}
+
 static int kvm_xen_hcall_evtchn_upcall_vector(struct kvm_xen_exit *exit,
   X86CPU *cpu, uint64_t arg)
 {
@@ -530,6 +560,9 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 ret = -ENOSYS;
 break;
 
+case HVMOP_set_param:
+return handle_set_param(exit, cpu, arg);
+
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 58/59] kvm/i386: Add xen-evtchn-max-pirq property

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI
devices. Allow it to be increased if the user desires.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 accel/kvm/kvm-all.c   |  1 +
 hw/i386/kvm/xen_evtchn.c  | 21 +++--
 include/sysemu/kvm_int.h  |  1 +
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/kvm/kvm.c | 34 ++
 target/i386/kvm/xen-emu.c |  6 ++
 6 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index dc5b0bb434..3b7881e949 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3705,6 +3705,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->notify_window = 0;
 s->xen_version = 0;
 s->xen_gnttab_max_frames = 64;
+s->xen_evtchn_max_pirq = 256;
 }
 
 /**
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 4ec0c7af75..3f60461e5c 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -302,17 +302,18 @@ void xen_evtchn_create(void)
 }
 
 /*
- * We could parameterise the number of PIRQs available if needed,
- * but for now limit it to 256. The Xen scheme for encoding PIRQ#
- * into an MSI message is not compatible with 32-bit MSI, as it
- * puts the high bits of the PIRQ# into the high bits of the MSI
- * message address, instead of using the Extended Destination ID
- * in address bits 4-11 which perhaps would have been a better
- * choice. So to keep life simple, just stick with 256 as the
- * default, which conveniently doesn't need to set anything
- * outside the low 32 bits of the address.
+ * The Xen scheme for encoding PIRQ# into an MSI message is not
+ * compatible with 32-bit MSI, as it puts the high bits of the
+ * PIRQ# into the high bits of the MSI message address, instead of
+ * using the Extended Destination ID in address bits 4-11 which
+ * perhaps would have been a better choice.
+ *
+ * To keep life simple, kvm_accel_instance_init() initialises the
+ * default to 256, which conveniently doesn't need to set anything
+ * outside the low 32 bits of the address. It can be increased by
+ * setting the xen-evtchn-max-pirq property.
  */
-s->nr_pirqs = 256;
+s->nr_pirqs = kvm_xen_get_evtchn_max_pirq();
 
 s->nr_pirq_inuse_words = DIV_ROUND_UP(s->nr_pirqs, 64);
 s->pirq_inuse_bitmap = g_new0(uint64_t, s->nr_pirq_inuse_words);
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 39ce4d36f6..a641c974ea 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -121,6 +121,7 @@ struct KVMState
 uint32_t xen_version;
 uint32_t xen_caps;
 uint16_t xen_gnttab_max_frames;
+uint16_t xen_evtchn_max_pirq;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 0b63bb81df..400aaa1490 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -26,6 +26,7 @@ void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
 void kvm_xen_set_callback_asserted(void);
 int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
 uint16_t kvm_xen_get_gnttab_max_frames(void);
+uint16_t kvm_xen_get_evtchn_max_pirq(void);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b497225fbd..4decd2559b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5907,6 +5907,33 @@ static void kvm_arch_set_xen_gnttab_max_frames(Object *obj, Visitor *v,
 s->xen_gnttab_max_frames = value;
 }
 
+static void kvm_arch_get_xen_evtchn_max_pirq(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint16_t value = s->xen_evtchn_max_pirq;
+
+visit_type_uint16(v, name, &value, errp);
+}
+
+static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint16_t value;
+
+visit_type_uint16(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->xen_evtchn_max_pirq = value;
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@ -5939,6 +5966,13 @@ void kvm_arch_accel_class_init(ObjectClass *oc)
   NULL, NULL);
 object_class_property_set_description(oc, "xen-gnttab-max-frames",
   "Maximum number of grant

[PATCH v11 45/59] i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_version

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_gnttab.c  | 31 
 hw/i386/kvm/xen_gnttab.h  |  5 
 target/i386/kvm/xen-emu.c | 60 +++
 3 files changed, 96 insertions(+)

diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 72e87aea6a..b54a94e2bd 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -180,3 +180,34 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 return 0;
 }
 
+int xen_gnttab_set_version_op(struct gnttab_set_version *set)
+{
+int ret;
+
+switch (set->version) {
+case 1:
+ret = 0;
+break;
+
+case 2:
+/* Behave as before set_version was introduced. */
+ret = -ENOSYS;
+break;
+
+default:
+ret = -EINVAL;
+}
+
+set->version = 1;
+return ret;
+}
+
+int xen_gnttab_get_version_op(struct gnttab_get_version *get)
+{
+if (get->dom != DOMID_SELF && get->dom != xen_domid) {
+return -ESRCH;
+}
+
+get->version = 1;
+return 0;
+}
diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h
index a7caa94c83..79579677ba 100644
--- a/hw/i386/kvm/xen_gnttab.h
+++ b/hw/i386/kvm/xen_gnttab.h
@@ -15,4 +15,9 @@
 void xen_gnttab_create(void);
 int xen_gnttab_map_page(uint64_t idx, uint64_t gfn);
 
+struct gnttab_set_version;
+struct gnttab_get_version;
+int xen_gnttab_set_version_op(struct gnttab_set_version *set);
+int xen_gnttab_get_version_op(struct gnttab_get_version *get);
+
 #endif /* QEMU_XEN_GNTTAB_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 41976e85af..e35b2d5557 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -34,6 +34,7 @@
 #include "hw/xen/interface/hvm/params.h"
 #include "hw/xen/interface/vcpu.h"
 #include "hw/xen/interface/event_channel.h"
+#include "hw/xen/interface/grant_table.h"
 
 #include "xen-compat.h"
 
@@ -1166,6 +1167,61 @@ static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_gnttab_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+int cmd, uint64_t arg, int count)
+{
+CPUState *cs = CPU(cpu);
+int err;
+
+switch (cmd) {
+case GNTTABOP_set_version: {
+struct gnttab_set_version set;
+
+qemu_build_assert(sizeof(set) == 4);
+if (kvm_copy_from_gva(cs, arg, &set, sizeof(set))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_gnttab_set_version_op(&set);
+if (!err && kvm_copy_to_gva(cs, arg, &set, sizeof(set))) {
+err = -EFAULT;
+}
+break;
+}
+case GNTTABOP_get_version: {
+struct gnttab_get_version get;
+
+qemu_build_assert(sizeof(get) == 8);
+if (kvm_copy_from_gva(cs, arg, &get, sizeof(get))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_gnttab_get_version_op(&get);
+if (!err && kvm_copy_to_gva(cs, arg, &get, sizeof(get))) {
+err = -EFAULT;
+}
+break;
+}
+case GNTTABOP_query_size:
+case GNTTABOP_setup_table:
+case GNTTABOP_copy:
+case GNTTABOP_map_grant_ref:
+case GNTTABOP_unmap_grant_ref:
+case GNTTABOP_swap_grant_ref:
+return false;
+
+default:
+/* Xen explicitly returns -ENOSYS to HVM guests for all others */
+err = -ENOSYS;
+break;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -1176,6 +1232,10 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_grant_table_op:
+return kvm_xen_hcall_gnttab_op(exit, cpu, exit->u.hcall.params[0],
+   exit->u.hcall.params[1],
+   exit->u.hcall.params[2]);
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
-- 
2.39.0

[PATCH v11 35/59] hw/xen: Implement EVTCHNOP_alloc_unbound

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 32 
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 15 +++
 3 files changed, 49 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index a97d6ba61d..9dc5a98d94 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -835,6 +835,38 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 return ret;
 }
 
+int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint16_t type_val;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (alloc->dom != DOMID_SELF && alloc->dom != xen_domid) {
+return -ESRCH;
+}
+
+if (alloc->remote_dom == DOMID_QEMU) {
+type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+} else if (alloc->remote_dom == DOMID_SELF ||
+   alloc->remote_dom == xen_domid) {
+type_val = 0;
+} else {
+return -EPERM;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val, &alloc->port);
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 int xen_evtchn_send_op(struct evtchn_send *send)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 500fdbe8b8..fc080138e3 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -21,11 +21,13 @@ struct evtchn_unmask;
 struct evtchn_bind_virq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
+struct evtchn_alloc_unbound;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
+int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 5299614d3c..e186dec9a9 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -918,6 +918,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 err = xen_evtchn_send_op(&send);
 break;
 }
+case EVTCHNOP_alloc_unbound: {
+struct evtchn_alloc_unbound alloc;
+
+qemu_build_assert(sizeof(alloc) == 8);
+if (kvm_copy_from_gva(cs, arg, &alloc, sizeof(alloc))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_alloc_unbound_op(&alloc);
+if (!err && kvm_copy_to_gva(cs, arg, &alloc, sizeof(alloc))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 21/59] i386/xen: handle VCPUOP_register_vcpu_info

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.
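
As a rough illustration of the "default vcpu_info" wiring described above, the following sketch computes where the default slot for a given vCPU would live. It assumes that the vcpu_info[] array starts at offset 0 of the shared_info page and that each slot is 64 bytes on x86; the sketch_* names and the INVALID_GPA stand-in are hypothetical and not part of this series.

```c
#include <stdint.h>

/* Assumptions for this sketch, not taken verbatim from the patch: */
#define SKETCH_LEGACY_MAX_VCPUS 32          /* "first 32 vCPUs" in the commit message */
#define SKETCH_VCPU_INFO_SIZE   64          /* bytes per vcpu_info slot on x86 */
#define SKETCH_INVALID_GPA      UINT64_MAX  /* stand-in for QEMU's INVALID_GPA */

/*
 * Return the guest physical address of the default vcpu_info slot for
 * vcpu_id, assuming vcpu_info[] sits at offset 0 of the shared_info page.
 * vCPUs beyond the legacy limit have no default slot.
 */
static uint64_t sketch_default_vcpu_info_gpa(uint64_t shinfo_gpa,
                                             uint32_t vcpu_id)
{
    if (shinfo_gpa == SKETCH_INVALID_GPA ||
        vcpu_id >= SKETCH_LEGACY_MAX_VCPUS) {
        return SKETCH_INVALID_GPA;
    }
    return shinfo_gpa + (uint64_t)vcpu_id * SKETCH_VCPU_INFO_SIZE;
}

/* An explicitly registered vcpu_info takes precedence over the default. */
static uint64_t sketch_effective_vcpu_info_gpa(uint64_t explicit_gpa,
                                               uint64_t default_gpa)
{
    return explicit_gpa != SKETCH_INVALID_GPA ? explicit_gpa : default_gpa;
}
```

With a shared_info page at GPA 0x1000, vCPU 3 would default to 0x10c0 under these assumptions, while vCPU 32 and above would have to register a vcpu_info explicitly.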

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.h|   2 +
 target/i386/kvm/kvm.c|  17 
 target/i386/kvm/trace-events |   1 +
 target/i386/kvm/xen-emu.c| 152 ++-
 target/i386/kvm/xen-emu.h|   2 +
 target/i386/machine.c|  19 +
 6 files changed, 190 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c6c57baed5..109b2e5669 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1788,6 +1788,8 @@ typedef struct CPUArchState {
 #endif
 #if defined(CONFIG_KVM)
 struct kvm_nested_state *nested_state;
+uint64_t xen_vcpu_info_gpa;
+uint64_t xen_vcpu_info_default_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a7ba3476ac..766a757bae 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4735,6 +4735,15 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 kvm_arch_set_tsc_khz(cpu);
 }
 
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE && level == KVM_PUT_FULL_STATE) {
+ret = kvm_put_xen_state(cpu);
+if (ret < 0) {
+return ret;
+}
+}
+#endif
+
 ret = kvm_getput_regs(x86_cpu, 1);
 if (ret < 0) {
 return ret;
@@ -4834,6 +4843,14 @@ int kvm_arch_get_registers(CPUState *cs)
 if (ret < 0) {
 goto out;
 }
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE) {
+ret = kvm_get_xen_state(cs);
+if (ret < 0) {
+goto out;
+}
+}
+#endif
 ret = 0;
  out:
 cpu_sync_bndcs_hflags(&cpu->env);
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 8e9f269f56..a840e0333d 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -10,3 +10,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
 kvm_xen_soft_reset(void) ""
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
+kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index e5ae0a9a38..1cec8566ec 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -119,6 +119,8 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 
 int kvm_xen_init_vcpu(CPUState *cs)
 {
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
 int err;
 
 /*
@@ -142,6 +144,9 @@ int kvm_xen_init_vcpu(CPUState *cs)
 }
 }
 
+env->xen_vcpu_info_gpa = INVALID_GPA;
+env->xen_vcpu_info_default_gpa = INVALID_GPA;
+
 return 0;
 }
 
@@ -187,10 +192,58 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static int kvm_xen_set_vcpu_attr(CPUState *cs, uint16_t type, uint64_t gpa)
+{
+struct kvm_xen_vcpu_attr xhsi;
+
+xhsi.type = type;
+xhsi.u.gpa = gpa;
+
+trace_kvm_xen_set_vcpu_attr(cs->cpu_index, type, gpa);
+
+return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &xhsi);
+}
+
+static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_default_gpa = data.host_ulong;
+
+/* Changing the default does nothing if a vcpu_info was explicitly set. */
+if (env->xen_vcpu_info_gpa == INVALID_GPA) {
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
+  env->xen_vcpu_info_default_gpa);
+}
+}
+
+static void do_set_vcpu_info_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
+  env->xen_vcpu_info_gpa);
+}
+
+static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_gpa = INVALID_GPA;
+env->xen_vcpu_info_default_gpa = INVALID_GPA;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
+}
+
 static int xen_set_shared_info(uint64_t gfn)
 {
 uint64_t gpa = gfn << TARGET_PAGE_BITS;

[PATCH v11 43/59] hw/xen: Add xen_gnttab device for grant table emulation

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_gnttab.c  | 111 ++
 hw/i386/kvm/xen_gnttab.h  |  18 +++
 hw/i386/pc.c  |   2 +
 target/i386/kvm/xen-emu.c |   3 ++
 5 files changed, 135 insertions(+)
 create mode 100644 hw/i386/kvm/xen_gnttab.c
 create mode 100644 hw/i386/kvm/xen_gnttab.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index cab64df339..e02449e4d4 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -7,6 +7,7 @@ i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
 i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
   'xen_overlay.c',
   'xen_evtchn.c',
+  'xen_gnttab.c',
   ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
new file mode 100644
index 00..ef8857e50c
--- /dev/null
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -0,0 +1,111 @@
+/*
+ * QEMU Xen emulation: Grant table support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/lockable.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+#include "xen_gnttab.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+
+#include "hw/xen/interface/memory.h"
+#include "hw/xen/interface/grant_table.h"
+
+#define TYPE_XEN_GNTTAB "xen-gnttab"
+OBJECT_DECLARE_SIMPLE_TYPE(XenGnttabState, XEN_GNTTAB)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenGnttabState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+uint32_t nr_frames;
+uint32_t max_frames;
+};
+
+struct XenGnttabState *xen_gnttab_singleton;
+
+static void xen_gnttab_realize(DeviceState *dev, Error **errp)
+{
+XenGnttabState *s = XEN_GNTTAB(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen grant table support is for Xen emulation");
+return;
+}
+s->nr_frames = 0;
+s->max_frames = kvm_xen_get_gnttab_max_frames();
+}
+
+static bool xen_gnttab_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_gnttab_vmstate = {
+.name = "xen_gnttab",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xen_gnttab_is_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(nr_frames, XenGnttabState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void xen_gnttab_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = xen_gnttab_realize;
+dc->vmsd = &xen_gnttab_vmstate;
+}
+
+static const TypeInfo xen_gnttab_info = {
+.name  = TYPE_XEN_GNTTAB,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XenGnttabState),
+.class_init= xen_gnttab_class_init,
+};
+
+void xen_gnttab_create(void)
+{
+xen_gnttab_singleton = XEN_GNTTAB(sysbus_create_simple(TYPE_XEN_GNTTAB,
+   -1, NULL));
+}
+
+static void xen_gnttab_register_types(void)
+{
+type_register_static(&xen_gnttab_info);
+}
+
+type_init(xen_gnttab_register_types)
+
+int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
+{
+return -ENOSYS;
+}
+
diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h
new file mode 100644
index 00..a7caa94c83
--- /dev/null
+++ b/hw/i386/kvm/xen_gnttab.h
@@ -0,0 +1,18 @@
+/*
+ * QEMU Xen emulation: Grant table support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_XEN_GNTTAB_H
+#define QEMU_XEN_GNTTAB_H
+
+void xen_gnttab_create(void);
+int xen_gnttab_map_page(uint64_t idx, uint64_t gfn);
+
+#endif /* QEMU_XEN_GNTTAB_H */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2d3f316d10..ae1d50e084 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -91,6 +91,7 @@
 #include "hw/virtio/virtio-mem-pci.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
+#include "hw/i386/kvm/xen_gnttab.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "target/i386/cpu.h"
@@ -1858,6 +1859,7 @@ int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
 if (xen_mode == XEN_EMULATE) {
 xen_overlay_create();

[PATCH v11 15/59] i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is the only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/pc.c | 11 +++
 include/hw/i386/pc.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6e592bd969..9169305f4f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -89,6 +89,7 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/virtio/virtio-pmem-pci.h"
 #include "hw/virtio/virtio-mem-pci.h"
+#include "hw/i386/kvm/xen_overlay.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "target/i386/cpu.h"
@@ -1844,6 +1845,16 @@ static void pc_machine_initfn(Object *obj)
 cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
+int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
+{
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE) {
+xen_overlay_create();
+}
+#endif
+return 0;
+}
+
 static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
 CPUState *cs;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 66e3d059ef..740497a961 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -291,12 +291,15 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
+extern int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
+
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
 MachineClass *mc = MACHINE_CLASS(oc); \
 optsfn(mc); \
 mc->init = initfn; \
+mc->kvm_type = pc_machine_kvm_type; \
 } \
 static const TypeInfo pc_machine_type_##suffix = { \
 .name   = namestr TYPE_MACHINE_SUFFIX, \
-- 
2.39.0

[PATCH v11 30/59] hw/xen: Implement EVTCHNOP_close

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be in
error paths after allocation.
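
To make the close_port()/free_port() split described above easier to follow, here is a small stand-alone sketch of the same bookkeeping. It is only an approximation: the sketch_* types are hypothetical, the port table is reduced to a type byte per port, and the pending bitmap is a single non-atomic 64-bit word rather than the guest's shared_info page.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 64
#define SKETCH_CLOSED    0   /* stand-in for EVTCHNSTAT_closed */

struct sketch_evtchn {
    uint8_t  type[SKETCH_MAX_PORTS]; /* per-port type, SKETCH_CLOSED if free */
    uint32_t nr_ports;               /* one past the highest port in use */
    uint64_t pending;                /* one pending bit per port */
};

/*
 * Mark a port closed, shrink nr_ports past any trailing closed ports,
 * and clear the pending bit so a later re-bind starts clean.
 */
static void sketch_free_port(struct sketch_evtchn *s, uint32_t port)
{
    s->type[port] = SKETCH_CLOSED;

    if (s->nr_ports == port + 1) {
        do {
            s->nr_ports--;
        } while (s->nr_ports &&
                 s->type[s->nr_ports - 1] == SKETCH_CLOSED);
    }

    s->pending &= ~(UINT64_C(1) << port);
}

/* Returns 0 on success, -EINVAL for a bad port, -ENOENT if already closed. */
static int sketch_close_port(struct sketch_evtchn *s, uint32_t port)
{
    if (port >= SKETCH_MAX_PORTS) {
        return -EINVAL;
    }
    if (s->type[port] == SKETCH_CLOSED) {
        return -ENOENT;
    }
    sketch_free_port(s, port);
    return 0;
}
```

Closing the highest open port lets nr_ports drop past every closed port below it in one go, while closing a port in the middle leaves nr_ports untouched.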

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 121 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 
 3 files changed, 135 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 8bed33890f..08c6fac357 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -21,6 +21,7 @@
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
+
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
 
@@ -40,6 +41,41 @@ typedef struct XenEvtchnPort {
 uint16_t type_val;  /* pirq# / virq# / remote port according to type */
 } XenEvtchnPort;
 
+/* 32-bit compatibility definitions, also used natively in 32-bit build */
+struct compat_arch_vcpu_info {
+unsigned int cr2;
+unsigned int pad[5];
+};
+
+struct compat_vcpu_info {
+uint8_t evtchn_upcall_pending;
+uint8_t evtchn_upcall_mask;
+uint16_t pad;
+uint32_t evtchn_pending_sel;
+struct compat_arch_vcpu_info arch;
+struct vcpu_time_info time;
+}; /* 64 bytes (x86) */
+
+struct compat_arch_shared_info {
+unsigned int max_pfn;
+unsigned int pfn_to_mfn_frame_list_list;
+unsigned int nmi_reason;
+unsigned int p2m_cr3;
+unsigned int p2m_vaddr;
+unsigned int p2m_generation;
+uint32_t wc_sec_hi;
+};
+
+struct compat_shared_info {
+struct compat_vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS];
+uint32_t evtchn_pending[32];
+uint32_t evtchn_mask[32];
+uint32_t wc_version;  /* Version counter: see vcpu_time_info_t. */
+uint32_t wc_sec;
+uint32_t wc_nsec;
+struct compat_arch_shared_info arch;
+};
+
 #define COMPAT_EVTCHN_2L_NR_CHANNELS1024
 
 /*
@@ -257,3 +293,88 @@ int xen_evtchn_status_op(struct evtchn_status *status)
 qemu_mutex_unlock(&s->port_lock);
 return 0;
 }
+
+static int clear_port_pending(XenEvtchnState *s, evtchn_port_t port)
+{
+void *p = xen_overlay_get_shinfo_ptr();
+if (!p)
+return -ENOTSUP;
+
+if (xen_is_long_mode()) {
+struct shared_info *shinfo = p;
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+qatomic_fetch_and(&shinfo->evtchn_pending[idx], ~mask);
+} else {
+struct compat_shared_info *shinfo = p;
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+qatomic_fetch_and(&shinfo->evtchn_pending[idx], ~mask);
+}
+return 0;
+}
+
+static void free_port(XenEvtchnState *s, evtchn_port_t port)
+{
+s->port_table[port].type = EVTCHNSTAT_closed;
+s->port_table[port].type_val = 0;
+s->port_table[port].vcpu = 0;
+
+if (s->nr_ports == port + 1) {
+do {
+s->nr_ports--;
+} while (s->nr_ports &&
+ s->port_table[s->nr_ports - 1].type == EVTCHNSTAT_closed);
+}
+
+/* Clear pending event to avoid unexpected behavior on re-bind. */
+clear_port_pending(s, port);
+}
+
+static int close_port(XenEvtchnState *s, evtchn_port_t port)
+{
+XenEvtchnPort *p = &s->port_table[port];
+
+switch (p->type) {
+case EVTCHNSTAT_closed:
+return -ENOENT;
+
+default:
+break;
+}
+
+free_port(s, port);
+return 0;
+}
+
+int xen_evtchn_close_op(struct evtchn_close *close)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(close->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = close_port(s, close->port);
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 76467636ee..cb3924941a 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -16,6 +16,8 @@ void xen_evtchn_create(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
 struct evtchn_status;
+struct evtchn_close;
 int xen_evtchn_status_op(struct evtchn_status *status);
+int xen_evtchn_close_op(struct evtchn_close *close);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 3811153724..c54372700a 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -802,6 +802,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit,

[PATCH v11 00/59] Xen HVM support under KVM

2023-02-15 Thread David Woodhouse

Updated to base it on the incoming Arm Xen PVH support which at least
yesterday was in the staging branch, and a couple of tweaks from Paul's
review feedback.

Most of the changes we've actually been making are in the XenStore part
which we're keeping out of this patch set as it's large enough already.
As ever it can be seen in all its glory, even running guests with PV
disk now, at 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv

v11: 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-11

 • Rebase on Arm PVH support.

 • Fix 32-bit set_timer_op hypercall.

 • Drop references to grant table v2 which might imply imminent support.

v10: 
https://lore.kernel.org/qemu-devel/20230201143148.1744093-1-dw...@infradead.org/
 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-10

 • Move imported Xen headers to include/hw/xen/interface/.

 • Allow --xen-domid to be set, and default to non-zero.

 • Update documentation to include xen-evtchn-max-pirq and
   xen-gnttab-max-frames properties.

 • Explicitly include "qemu/lockable.h" in xen_evtchn.c to fix build.

v9: 
https://lore.kernel.org/qemu-devel/20230128081113.1615111-1-dw...@infradead.org/

https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-9

 • Fix race in GSI deassertion. I still hate this and want to fix it to
   happen on EOI at the irqchip and fix VFIO too, but we can do that in
   a separate series rather than piling it into this one. At least this
   one is nicer than the VFIO one that already exists.

 • Fix user builds by not including xen-stubs.c in those.

 • On rebasing, add some explicit includes needed after header cleanups.

v8: 
https://lore.kernel.org/qemu-devel/20230120131343.1441939-1-dw...@infradead.org/

https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-8

 • Instantiate xen pci-platform device automatically.

 • Add documentation.

 • Rename (newly-added) CONFIG_XENFV_PLATFORM to CONFIG_XEN_EMU. That's
   basically what it enables now that the dust is settling on the rest
   of the patch set that comes next.

 • Shift QMP commands to qapi/misc-target.json, other review feedback.

 • Clear upcall vector on soft reset.

 • Wire up soft reset to occur on qemu_devices_reset() (e.g. reboot).

 • Locking tweaks largely resulting from doing soft reset with the BQL.

 • Poll for deassertion of event channel GSI from kvm_arch_post_run()
   instead of kvm_arch_handle_exit().

 • Add PIRQ support.

v7: 
https://lore.kernel.org/qemu-devel/20230116215805.1123514-1-dw...@infradead.org/

https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-7

 • Trivial review feedback and collected ack/review tags.

 • Only call qemu_set_irq() under the BQL, which means doing so from a BH
   in some circumstances.

v6: 
https://lore.kernel.org/qemu-devel/20230110122042.1562155-1-dw...@infradead.org/

https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-6

 • Require split irqchip to ensure the GSI handling works correctly.

 • Rework monitor commands to be QMP-based.

 • Cache vcpu_info hva to avoid MemoryRegion refcount leaks.

 • Pull in more Xen headers to allow for later PV backend work.

 • Define __XEN_TOOLS__ in hw/xen/xen.h instead of littering C files with
   separate definitions of __XEN_INTERFACE_VERSION__.

 • Drop debugging hexdump from xenstore processing.

 • Minor fixes in event channel backend handling.

 • Drop "Refactor xen_be_init()" patch. It turns out we're going to do that
   all quite differently, so it's neither necessary nor sufficient.

v5: 
https://lore.kernel.org/qemu-devel/20221230121235.1282915-1-dw...@infradead.org/

https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-5

 • Add backend implementation of event channel support, to parallel the
   libxenevtchn API used by existing backend drivers.

 • Add basic XenStore ring implementation, test migration and kexec.

 • Some kexec/soft reset fixes (clear port pending bits, kernel timer virq).
 
 • Fix race with setting the xen_callback_asserted flag before actually
   doing so, which could lead to it being *cleared* again before we even
   assert it... and leave it asserted for ever.

v4: 
https://lore.kernel.org/qemu-devel/20221221010623.1000191-1-dw...@infradead.org/

https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-4

 • Add soft reset support near the beginning and thread it through the
   rest of the feature enablement.

 • Add PV timer support and advertise XENFEAT_safe_hvm_pvclock.

 • Add basic grant table mapping and [gs]et_version / query_size support.

 • Make xen_platform device build (and work) without CONFIG_XEN.

 • Fix Xen HVM mode not to require --xen-attach.

v3: 
https://lore.kernel.org/qemu-devel/20221216004117.862106-1-dw...@infradead.org/

[PATCH v11 57/59] hw/xen: Support MSI mapping to PIRQ

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The way that Xen handles MSI PIRQs is kind of awful.

There is a special MSI message which targets a PIRQ. The vector in the
low bits of data must be zero. The low 8 bits of the PIRQ# are in the
destination ID field, the extended destination ID field is unused, and
instead the high bits of the PIRQ# are in the high 32 bits of the address.

Using the high bits of the address means that we can't intercept and
translate these messages in kvm_send_msi(), because they won't be caught
by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range.
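
The encoding described above can be sketched as a small decoder. The exact bit positions are an assumption on my part (destination ID in address bits 12-19, and bits 8-31 of the upper address word carrying the rest of the PIRQ#); the sketch_* function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode the PIRQ# from a Xen-style MSI message.
 * Assumed layout: the destination ID field (address bits 12-19) carries
 * PIRQ bits 0-7, and bits 8-31 of the upper 32 address bits carry PIRQ
 * bits 8-31. Returns 0 (no PIRQ) if the vector in the low 8 bits of
 * data is non-zero.
 */
static uint32_t sketch_msi_pirq_target(uint64_t addr, uint32_t data)
{
    uint32_t pirq;

    if (data & 0xff) {
        return 0;
    }

    pirq = (uint32_t)((addr & 0xff000) >> 12);
    pirq |= (uint32_t)(addr >> 32) & 0xffffff00u;
    return pirq;
}

/* True if a PIRQ# can be encoded without using the upper address word. */
static bool sketch_pirq_fits_32bit_msi(uint32_t pirq)
{
    return pirq < 256;
}
```

Under these assumptions the example address 0x1000fee46000 decodes to PIRQ 0x1046, and any PIRQ below 256 leaves the upper 32 address bits clear, which is why a limit of 256 is safe for 32-bit MSI devices.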

So we catch them in pci_msi_trigger() instead, and deliver the event
channel directly.

That isn't even the worst part. The worst part is that Xen snoops on
writes to devices' MSI vectors while they are *masked*. When an MSI
message is written which looks like it targets a PIRQ, it remembers
the device and vector for later.

When the guest makes a hypercall to bind that PIRQ# (snooped from a
masked MSI vector) to an event channel port, Xen *unmasks* that MSI
vector on the device. Xen guests using PIRQ delivery of MSI don't
ever actually unmask the MSI for themselves.

Now that this is working we can finally enable XENFEAT_hvm_pirqs and
let the guest use it all.

Tested with passthrough igb and emulated e1000e + AHCI.

           CPU0       CPU1
  0:         65          0   IO-APIC          2-edge          timer
  1:          0         14   xen-pirq         1-ioapic-edge   i8042
  4:          0        846   xen-pirq         4-ioapic-edge   ttyS0
  8:          1          0   xen-pirq         8-ioapic-edge   rtc0
  9:          0          0   xen-pirq         9-ioapic-level  acpi
 12:        257          0   xen-pirq        12-ioapic-edge   i8042
 24:       9600          0   xen-percpu-virq  timer0
 25:       2758          0   xen-percpu-ipi   resched0
 26:          0          0   xen-percpu-ipi   callfunc0
 27:          0          0   xen-percpu-virq  debug0
 28:       1526          0   xen-percpu-ipi   callfuncsingle0
 29:          0          0   xen-percpu-ipi   spinlock0
 30:          0       8608   xen-percpu-virq  timer1
 31:          0        874   xen-percpu-ipi   resched1
 32:          0          0   xen-percpu-ipi   callfunc1
 33:          0          0   xen-percpu-virq  debug1
 34:          0       1617   xen-percpu-ipi   callfuncsingle1
 35:          0          0   xen-percpu-ipi   spinlock1
 36:          8          0   xen-dyn-event    xenbus
 37:          0       6046   xen-pirq-msi     ahci[:00:03.0]
 38:          1          0   xen-pirq-msi-x   ens4
 39:          0         73   xen-pirq-msi-x   ens4-rx-0
 40:         14          0   xen-pirq-msi-x   ens4-rx-1
 41:          0         32   xen-pirq-msi-x   ens4-tx-0
 42:         47          0   xen-pirq-msi-x   ens4-tx-1

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/meson.build|   7 +
 hw/i386/kvm/trace-events   |   1 +
 hw/i386/kvm/xen-stubs.c|  27 
 hw/i386/kvm/xen_evtchn.c   | 261 -
 hw/i386/kvm/xen_evtchn.h   |   8 ++
 hw/pci/msi.c   |  11 ++
 hw/pci/msix.c  |   7 +
 hw/pci/pci.c   |  17 +++
 include/hw/pci/msi.h   |   1 +
 target/i386/kvm/kvm.c  |  19 ++-
 target/i386/kvm/kvm_i386.h |   2 +
 target/i386/kvm/xen-emu.c  |   3 +-
 12 files changed, 354 insertions(+), 10 deletions(-)
 create mode 100644 hw/i386/kvm/xen-stubs.c

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 6d6981fced..82dd6ae7c6 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -12,3 +12,10 @@ i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
   ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
+
+xen_stubs_ss = ss.source_set()
+xen_stubs_ss.add(when: 'CONFIG_XEN_EMU', if_false: files(
+  'xen-stubs.c',
+))
+
+specific_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: xen_stubs_ss)
diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events
index 04e60c5bb8..b83c3eb965 100644
--- a/hw/i386/kvm/trace-events
+++ b/hw/i386/kvm/trace-events
@@ -2,3 +2,4 @@ kvm_xen_map_pirq(int pirq, int gsi) "pirq %d gsi %d"
 kvm_xen_unmap_pirq(int pirq, int gsi) "pirq %d gsi %d"
 kvm_xen_get_free_pirq(int pirq, int type) "pirq %d type %d"
 kvm_xen_bind_pirq(int pirq, int port) "pirq %d port %d"
+kvm_xen_unmask_pirq(int pirq, char *dev, int vector) "pirq %d dev %s vector %d"
diff --git a/hw/i386/kvm/xen-stubs.c b/hw/i386/kvm/xen-stubs.c
new file mode 100644
index 0000000000..a95964bbac
--- /dev/null
+++ b/hw/i386/kvm/xen-stubs.c
@@ -0,0 +1,27 @@
+/*
+ * QEMU Xen emulation: QMP stubs
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include

[PATCH v11 55/59] hw/xen: Implement emulated PIRQ hypercall support

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

This wires up the basic infrastructure but the actual interrupts aren't
there yet, so don't advertise it to the guest.
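
For orientation, the in-use bitmap that the diff below declares (one bit per PIRQ, 64 PIRQs per uint64_t word) can be exercised with a small standalone sketch. The struct and function names here are hypothetical stand-ins, not the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator state; only the bitmap matters. */
struct demo_pirq_bitmap {
    uint32_t nr_pirqs;
    uint64_t *words; /* one bit per PIRQ, 64 PIRQs per word */
};

/* Bit within a word that represents 'pirq'. */
static uint64_t demo_pirq_bit(uint32_t pirq)
{
    return 1ULL << (pirq & 63);
}

/* Test whether 'pirq' has been allocated. */
static bool demo_pirq_in_use(const struct demo_pirq_bitmap *b, uint32_t pirq)
{
    assert(pirq < b->nr_pirqs);
    return (b->words[pirq / 64] & demo_pirq_bit(pirq)) != 0;
}

/* Mark 'pirq' as allocated. */
static void demo_pirq_mark_in_use(struct demo_pirq_bitmap *b, uint32_t pirq)
{
    assert(pirq < b->nr_pirqs);
    b->words[pirq / 64] |= demo_pirq_bit(pirq);
}
```

With a 128-entry bitmap, marking PIRQ 70 sets bit 6 of the second word and leaves the first word untouched.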

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/trace-events  |   4 +
 hw/i386/kvm/trace.h   |   1 +
 hw/i386/kvm/xen_evtchn.c  | 300 +-
 hw/i386/kvm/xen_evtchn.h  |   2 +
 meson.build   |   1 +
 target/i386/kvm/xen-emu.c |  15 ++
 6 files changed, 318 insertions(+), 5 deletions(-)
 create mode 100644 hw/i386/kvm/trace-events
 create mode 100644 hw/i386/kvm/trace.h

diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events
new file mode 100644
index 0000000000..04e60c5bb8
--- /dev/null
+++ b/hw/i386/kvm/trace-events
@@ -0,0 +1,4 @@
+kvm_xen_map_pirq(int pirq, int gsi) "pirq %d gsi %d"
+kvm_xen_unmap_pirq(int pirq, int gsi) "pirq %d gsi %d"
+kvm_xen_get_free_pirq(int pirq, int type) "pirq %d type %d"
+kvm_xen_bind_pirq(int pirq, int port) "pirq %d port %d"
diff --git a/hw/i386/kvm/trace.h b/hw/i386/kvm/trace.h
new file mode 100644
index 0000000000..e55d0812fd
--- /dev/null
+++ b/hw/i386/kvm/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_i386_kvm.h"
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index ca9f15698f..f5e835ff70 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -24,6 +24,7 @@
 #include "exec/target_page.h"
 #include "exec/address-spaces.h"
 #include "migration/vmstate.h"
+#include "trace.h"
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
@@ -105,6 +106,21 @@ struct xenevtchn_handle {
 #define PORT_INFO_TYPEVAL_REMOTE_QEMU   0x8000
 #define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK  0x7FFF
 
+/*
+ * These 'emuirq' values are used by Xen in the LM stream... and yes, I am
+ * insane enough to think about guest-transparent live migration from actual
+ * Xen to QEMU, and ensuring that we can convert/consume the stream.
+ */
+#define IRQ_UNBOUND -1
+#define IRQ_PT -2
+#define IRQ_MSI_EMU -3
+
+
+struct pirq_info {
+int gsi;
+uint16_t port;
+};
+
 struct XenEvtchnState {
 /*< private >*/
 SysBusDevice busdev;
@@ -122,8 +138,25 @@ struct XenEvtchnState {
 qemu_irq gsis[GSI_NUM_PINS];
 
 struct xenevtchn_handle *be_handles[EVTCHN_2L_NR_CHANNELS];
+
+uint32_t nr_pirqs;
+
+/* Bitmap of allocated PIRQs (serialized) */
+uint16_t nr_pirq_inuse_words;
+uint64_t *pirq_inuse_bitmap;
+
+/* GSI → PIRQ mapping (serialized) */
+uint16_t gsi_pirq[GSI_NUM_PINS];
+
+/* Per-PIRQ information (rebuilt on migration) */
+struct pirq_info *pirq;
 };
 
+#define pirq_inuse_word(s, pirq) (s->pirq_inuse_bitmap[((pirq) / 64)])
+#define pirq_inuse_bit(pirq) (1ULL << ((pirq) & 63))
+
+#define pirq_inuse(s, pirq) (pirq_inuse_word(s, pirq) & pirq_inuse_bit(pirq))
+
 struct XenEvtchnState *xen_evtchn_singleton;
 
 /* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
@@ -138,17 +171,45 @@ static int xen_evtchn_pre_load(void *opaque)
 /* Unbind all the backend-side ports; they need to rebind */
 unbind_backend_ports(s);
 
+/* It'll be leaked otherwise. */
+g_free(s->pirq_inuse_bitmap);
+s->pirq_inuse_bitmap = NULL;
+
 return 0;
 }
 
 static int xen_evtchn_post_load(void *opaque, int version_id)
 {
 XenEvtchnState *s = opaque;
+uint32_t i;
 
 if (s->callback_param) {
 xen_evtchn_set_callback_param(s->callback_param);
 }
 
+/* Rebuild s->pirq[].port mapping */
+for (i = 0; i < s->nr_ports; i++) {
+XenEvtchnPort *p = &s->port_table[i];
+
+if (p->type == EVTCHNSTAT_pirq) {
+assert(p->type_val);
+assert(p->type_val < s->nr_pirqs);
+
+/*
+ * Set the gsi to IRQ_UNBOUND; it may be changed to an actual
+ * GSI# below, or to IRQ_MSI_EMU when the MSI table snooping
+ * catches up with it.
+ */
+s->pirq[p->type_val].gsi = IRQ_UNBOUND;
+s->pirq[p->type_val].port = i;
+}
+}
+/* Rebuild s->pirq[].gsi mapping */
+for (i = 0; i < GSI_NUM_PINS; i++) {
+if (s->gsi_pirq[i]) {
+s->pirq[s->gsi_pirq[i]].gsi = i;
+}
+}
 return 0;
 }
 
@@ -181,6 +242,10 @@ static const VMStateDescription xen_evtchn_vmstate = {
 VMSTATE_UINT32(nr_ports, XenEvtchnState),
 VMSTATE_STRUCT_VARRAY_UINT32(port_table, XenEvtchnState, nr_ports, 1,
  xen_evtchn_port_vmstate, XenEvtchnPort),
+VMSTATE_UINT16_ARRAY(gsi_pirq, XenEvtchnState, GSI_NUM_PINS),
+VMSTATE_VARRAY_UINT16_ALLOC(pirq_inuse_bitmap, XenEvtchnState,
+nr_pirq_inuse_words, 0,
+vmstate_info_uint64, uint64_t),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -221,6 +286,23 @@ void xen_evtchn_create(void)
 for (i = 0; i < GSI_NUM_PINS; i++) {
 sysbus_init_irq(SYS_BUS_DEVICE(s), &s->gsis[i]);
 }
+
+

[PATCH v11 52/59] hw/xen: Add basic ring handling to xenstore

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Extract requests, return ENOSYS to all of them. This is enough to allow
older Linux guests to boot, as they need *something* back but it doesn't
matter much what.
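
A minimal sketch of what "return ENOSYS to all of them" amounts to on the wire, assuming the usual XenStore message header of four 32-bit fields and XS_ERROR having the value 16. The struct is mirrored locally and the helper name is invented for illustration; it is not the code in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the XenStore wire header (assumed layout). */
struct demo_xsd_sockmsg {
    uint32_t type;
    uint32_t req_id;
    uint32_t tx_id;
    uint32_t len;
};

#define DEMO_XS_ERROR 16 /* assumed value of the XS_ERROR message type */

/*
 * Build an error reply for 'req' into 'buf': the header echoes the
 * request's req_id and tx_id, and the payload is the string "ENOSYS"
 * including its trailing NUL. Returns the total number of bytes written.
 */
static size_t demo_build_enosys_reply(const struct demo_xsd_sockmsg *req,
                                      uint8_t *buf, size_t buflen)
{
    static const char enosys[] = "ENOSYS";
    struct demo_xsd_sockmsg rsp = {
        .type = DEMO_XS_ERROR,
        .req_id = req->req_id,
        .tx_id = req->tx_id,
        .len = sizeof(enosys),
    };

    assert(buflen >= sizeof(rsp) + sizeof(enosys));
    memcpy(buf, &rsp, sizeof(rsp));
    memcpy(buf + sizeof(rsp), enosys, sizeof(enosys));
    return sizeof(rsp) + sizeof(enosys);
}
```

The reply is 23 bytes: a 16-byte header followed by the 7-byte payload.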

A full implementation of a single-tenant internal XenStore copy-on-write
tree with transactions and watches is waiting in the wings to be sent in
a subsequent round of patches along with hooking up the actual PV disk
back end in qemu, but this is enough to get guests booting for now.
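
The ring handling itself follows the usual pattern of free-running producer and consumer indices over a power-of-two buffer. The sketch below is a single-threaded simplification with made-up names and an assumed ring size of 1024 bytes; it leaves out the memory barriers and atomic accesses that a ring shared with a guest needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RING_SIZE 1024u /* assumed; must be a power of two */
#define DEMO_RING_MASK(idx) ((idx) & (DEMO_RING_SIZE - 1))

struct demo_ring {
    uint8_t data[DEMO_RING_SIZE];
    uint32_t prod; /* free-running producer index */
    uint32_t cons; /* free-running consumer index */
};

/* Copy up to 'len' bytes out of the ring; returns the bytes copied. */
static unsigned int demo_copy_from_ring(struct demo_ring *r, uint8_t *dst,
                                        unsigned int len)
{
    unsigned int copied = 0;

    /* The producer may never be more than one ring ahead. */
    assert(r->prod - r->cons <= DEMO_RING_SIZE);

    while (len) {
        unsigned int avail = r->prod - r->cons;
        unsigned int offset = DEMO_RING_MASK(r->cons);
        unsigned int chunk = avail < len ? avail : len;

        if (chunk == 0) {
            break; /* ring is empty */
        }
        /* Do not run past the end of the buffer; wrap on the next pass. */
        if (chunk > DEMO_RING_SIZE - offset) {
            chunk = DEMO_RING_SIZE - offset;
        }

        memcpy(dst, &r->data[offset], chunk);
        dst += chunk;
        len -= chunk;
        copied += chunk;
        r->cons += chunk;
    }

    return copied;
}
```

A read that straddles the end of the buffer is served in two chunks, which is why the copy sits in a loop.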

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_xenstore.c | 223 -
 1 file changed, 220 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 702f417633..2388842d15 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -188,18 +188,235 @@ uint16_t xen_xenstore_get_port(void)
 return s->guest_port;
 }
 
+static bool req_pending(XenXenstoreState *s)
+{
+struct xsd_sockmsg *req = (struct xsd_sockmsg *)s->req_data;
+
+return s->req_offset == XENSTORE_HEADER_SIZE + req->len;
+}
+
+static void reset_req(XenXenstoreState *s)
+{
+memset(s->req_data, 0, sizeof(s->req_data));
+s->req_offset = 0;
+}
+
+static void reset_rsp(XenXenstoreState *s)
+{
+s->rsp_pending = false;
+
+memset(s->rsp_data, 0, sizeof(s->rsp_data));
+s->rsp_offset = 0;
+}
+
+static void process_req(XenXenstoreState *s)
+{
+struct xsd_sockmsg *req = (struct xsd_sockmsg *)s->req_data;
+struct xsd_sockmsg *rsp = (struct xsd_sockmsg *)s->rsp_data;
+const char enosys[] = "ENOSYS";
+
+assert(req_pending(s));
+   assert(!s->rsp_pending);
+
+rsp->type = XS_ERROR;
+rsp->req_id = req->req_id;
+rsp->tx_id = req->tx_id;
+rsp->len = sizeof(enosys);
+memcpy((void *)&rsp[1], enosys, sizeof(enosys));
+
+s->rsp_pending = true;
+reset_req(s);
+}
+
+static unsigned int copy_from_ring(XenXenstoreState *s, uint8_t *ptr, unsigned int len)
+{
+if (!len)
+return 0;
+
+XENSTORE_RING_IDX prod = qatomic_read(&s->xs->req_prod);
+XENSTORE_RING_IDX cons = qatomic_read(&s->xs->req_cons);
+unsigned int copied = 0;
+
+smp_mb();
+
+while (len) {
+unsigned int avail = prod - cons;
+unsigned int offset = MASK_XENSTORE_IDX(cons);
+unsigned int copylen = avail;
+
+if (avail > XENSTORE_RING_SIZE) {
+error_report("XenStore ring handling error");
+s->fatal_error = true;
+break;
+} else if (avail == 0)
+break;
+
+if (copylen > len) {
+copylen = len;
+}
+if (copylen > XENSTORE_RING_SIZE - offset) {
+copylen = XENSTORE_RING_SIZE - offset;
+}
+
+memcpy(ptr, &s->xs->req[offset], copylen);
+copied += copylen;
+
+ptr += copylen;
+len -= copylen;
+
+cons += copylen;
+}
+
+smp_mb();
+
+qatomic_set(&s->xs->req_cons, cons);
+
+return copied;
+}
+
+static unsigned int copy_to_ring(XenXenstoreState *s, uint8_t *ptr, unsigned int len)
+{
+if (!len)
+return 0;
+
+XENSTORE_RING_IDX cons = qatomic_read(&s->xs->rsp_cons);
+XENSTORE_RING_IDX prod = qatomic_read(&s->xs->rsp_prod);
+unsigned int copied = 0;
+
+smp_mb();
+
+while (len) {
+unsigned int avail = cons + XENSTORE_RING_SIZE - prod;
+unsigned int offset = MASK_XENSTORE_IDX(prod);
+unsigned int copylen = len;
+
+if (avail > XENSTORE_RING_SIZE) {
+error_report("XenStore ring handling error");
+s->fatal_error = true;
+break;
+} else if (avail == 0)
+break;
+
+if (copylen > avail) {
+copylen = avail;
+}
+if (copylen > XENSTORE_RING_SIZE - offset) {
+copylen = XENSTORE_RING_SIZE - offset;
+}
+
+
+memcpy(&s->xs->rsp[offset], ptr, copylen);
+copied += copylen;
+
+ptr += copylen;
+len -= copylen;
+
+prod += copylen;
+}
+
+smp_mb();
+
+qatomic_set(&s->xs->rsp_prod, prod);
+
+return copied;
+}
+
+static unsigned int get_req(XenXenstoreState *s)
+{
+unsigned int copied = 0;
+
+if (s->fatal_error)
+return 0;
+
+assert(!req_pending(s));
+
+if (s->req_offset < XENSTORE_HEADER_SIZE) {
+void *ptr = s->req_data + s->req_offset;
+unsigned int len = XENSTORE_HEADER_SIZE;
+unsigned int copylen = copy_from_ring(s, ptr, len);
+
+copied += copylen;
+s->req_offset += copylen;
+}
+
+if (s->req_offset >= XENSTORE_HEADER_SIZE) {
+struct xsd_sockmsg *req = (struct xsd_sockmsg *)s->req_data;
+
+if (req->len > (uint32_t)XENSTORE_PAYLOAD_MAX) {
+error_report("Illegal XenStore request");
+s->fatal_error = true;
+return 0;
+}
+
+void *ptr = s->req_data + s->req_offset;
+unsigned int len = XENSTORE_HEADER_SIZE + req->len -

[PATCH v11 36/59] hw/xen: Implement EVTCHNOP_bind_interdomain

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 78 +++
 hw/i386/kvm/xen_evtchn.h  |  2 +
 target/i386/kvm/xen-emu.c | 16 
 3 files changed, 96 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 9dc5a98d94..3e6f7afcbc 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -720,6 +720,23 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 }
 break;
 
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+/* Not yet implemented. This can't happen! */
+} else {
+/* Loopback interdomain */
+XenEvtchnPort *rp = &s->port_table[p->type_val];
+if (!valid_port(p->type_val) || rp->type_val != port ||
+rp->type != EVTCHNSTAT_interdomain) {
+error_report("Inconsistent state for interdomain unbind");
+} else {
+/* Set the other end back to unbound */
+rp->type = EVTCHNSTAT_unbound;
+rp->type_val = 0;
+}
+}
+break;
+
 default:
 break;
 }
@@ -835,6 +852,67 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 return ret;
 }
 
+int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint16_t type_val;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (interdomain->remote_dom == DOMID_QEMU) {
+type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+} else if (interdomain->remote_dom == DOMID_SELF ||
+   interdomain->remote_dom == xen_domid) {
+type_val = 0;
+} else {
+return -ESRCH;
+}
+
+if (!valid_port(interdomain->remote_port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+/* The newly allocated port starts out as unbound */
+ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val,
+&interdomain->local_port);
+if (ret) {
+goto out;
+}
+
+if (interdomain->remote_dom == DOMID_QEMU) {
+/* We haven't hooked up QEMU's PV drivers to this yet */
+ret = -ENOSYS;
+} else {
+/* Loopback */
+XenEvtchnPort *rp = &s->port_table[interdomain->remote_port];
+XenEvtchnPort *lp = &s->port_table[interdomain->local_port];
+
+if (rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) {
+/* It's a match! */
+rp->type = EVTCHNSTAT_interdomain;
+rp->type_val = interdomain->local_port;
+
+lp->type = EVTCHNSTAT_interdomain;
+lp->type_val = interdomain->remote_port;
+} else {
+ret = -EINVAL;
+}
+}
+
+if (ret) {
+free_port(s, interdomain->local_port);
+}
+ out:
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+
+}
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index fc080138e3..1ebc7580eb 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -22,6 +22,7 @@ struct evtchn_bind_virq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
+struct evtchn_bind_interdomain;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -29,5 +30,6 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
+int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index e186dec9a9..a07d1d39f3 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -933,6 +933,22 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_interdomain: {
+struct evtchn_bind_interdomain interdomain;
+
+qemu_build_assert(sizeof(interdomain) == 12);
+if (kvm_copy_from_gva(cs, arg, &interdomain, sizeof(interdomain))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_interdomain_op(&interdomain);
+if (!err &&
+kvm_copy_to_gva(cs, arg, &interdomain, sizeof(interdomain))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 47/59] i386/xen: handle PV timer hypercalls

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Introduce support for one shot and periodic mode of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:

1) set_timer_op hypercall (only oneshot)
2) vcpu_op hypercall for {set,stop}_{singleshot,periodic}_timer
hypercalls
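
To make the two modes concrete, here is a small standalone model of a one-shot deadline and a periodic timer feeding a single "raise the timer virq" decision. All names are hypothetical, and the choice to coalesce missed periods into a single event is an assumption of this sketch rather than something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU timer state; field names are illustrative only. */
struct demo_vcpu_timers {
    uint64_t singleshot_deadline_ns; /* absolute deadline, 0 = disarmed */
    uint64_t periodic_period_ns;     /* period, 0 = disarmed */
    uint64_t periodic_next_ns;       /* next periodic expiry */
};

/* One-shot mode: arm for an absolute deadline. */
static void demo_set_singleshot(struct demo_vcpu_timers *t,
                                uint64_t deadline_ns)
{
    t->singleshot_deadline_ns = deadline_ns;
}

/* Periodic mode: first expiry is one period from now. */
static void demo_set_periodic(struct demo_vcpu_timers *t, uint64_t now_ns,
                              uint64_t period_ns)
{
    assert(period_ns != 0);
    t->periodic_period_ns = period_ns;
    t->periodic_next_ns = now_ns + period_ns;
}

/* Returns true if the timer virq should be raised at 'now_ns'. */
static bool demo_timers_poll(struct demo_vcpu_timers *t, uint64_t now_ns)
{
    bool raise = false;

    if (t->singleshot_deadline_ns && t->singleshot_deadline_ns <= now_ns) {
        t->singleshot_deadline_ns = 0; /* one-shot disarms itself */
        raise = true;
    }

    if (t->periodic_period_ns && t->periodic_next_ns <= now_ns) {
        /* Skip over any periods that were missed entirely. */
        uint64_t missed = (now_ns - t->periodic_next_ns) /
                          t->periodic_period_ns;
        t->periodic_next_ns += (missed + 1) * t->periodic_period_ns;
        raise = true;
    }

    return raise;
}
```

The one-shot timer disarms itself after firing, while the periodic timer re-arms for the next period boundary after the current time.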

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  |  31 +
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/cpu.h |   5 +
 target/i386/kvm/xen-emu.c | 252 +-
 target/i386/machine.c |   1 +
 5 files changed, 289 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 5d5996641d..06572b3e10 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1220,6 +1220,37 @@ int xen_evtchn_send_op(struct evtchn_send *send)
 return ret;
 }
 
+int xen_evtchn_set_port(uint16_t port)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = -EINVAL;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[port];
+
+/* QEMU has no business sending to anything but these */
+if (p->type == EVTCHNSTAT_virq ||
+(p->type == EVTCHNSTAT_interdomain &&
+ (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU))) {
+set_port_pending(s, port);
+ret = 0;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 EvtchnInfoList *qmp_xen_event_list(Error **errp)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b03c3108bc..24611478b8 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -20,6 +20,8 @@ int xen_evtchn_set_callback_param(uint64_t param);
 void xen_evtchn_connect_gsis(qemu_irq *system_gsis);
 void xen_evtchn_set_callback_level(int level);
 
+int xen_evtchn_set_port(uint16_t port);
+
 struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e8718c31e5..b579f0f0f8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -26,6 +26,7 @@
 #include "exec/cpu-defs.h"
 #include "qapi/qapi-types-common.h"
 #include "qemu/cpu-float.h"
+#include "qemu/timer.h"
 
 #define XEN_NR_VIRQS 24
 
@@ -1800,6 +1801,10 @@ typedef struct CPUArchState {
 bool xen_callback_asserted;
 uint16_t xen_virq[XEN_NR_VIRQS];
 uint64_t xen_singleshot_timer_ns;
+QEMUTimer *xen_singleshot_timer;
+uint64_t xen_periodic_timer_period;
+QEMUTimer *xen_periodic_timer;
+QemuMutex xen_timers_lock;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 44fa0de784..4781b1fa97 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -38,6 +38,9 @@
 
 #include "xen-compat.h"
 
+static void xen_vcpu_singleshot_timer_event(void *opaque);
+static void xen_vcpu_periodic_timer_event(void *opaque);
+
 #ifdef TARGET_X86_64
 #define hypercall_compat32(longmode) (!(longmode))
 #else
@@ -201,6 +204,23 @@ int kvm_xen_init_vcpu(CPUState *cs)
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
 env->xen_vcpu_runstate_gpa = INVALID_GPA;
 
+qemu_mutex_init(&env->xen_timers_lock);
+env->xen_singleshot_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+ xen_vcpu_singleshot_timer_event,
+ cpu);
+if (!env->xen_singleshot_timer) {
+return -ENOMEM;
+}
+env->xen_singleshot_timer->opaque = cs;
+
+env->xen_periodic_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+   xen_vcpu_periodic_timer_event,
+   cpu);
+if (!env->xen_periodic_timer) {
+return -ENOMEM;
+}
+env->xen_periodic_timer->opaque = cs;
+
 return 0;
 }
 
@@ -232,7 +252,8 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
  1 << XENFEAT_writable_descriptor_tables |
  1 << XENFEAT_auto_translated_physmap |
  1 << XENFEAT_supervisor_mode_kernel |
- 1 << XENFEAT_hvm_callback_vector;
+ 1 << XENFEAT_hvm_callback_vector |
+ 1 << XENFEAT_hvm_safe_pvclock;
 }
 
 err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
@@ -875,13 +896,192 @@ static int vcpuop_register_runstate_info(CPUState *cs, CPUState *target,
 return 0;
 }
 
+static uint64_t kvm_get_current_ns(void)
+{
+struct kvm_clock_data data;
+int ret;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+abort();
+}
+
+return data.clock;
+}
+
+static void

[PATCH v11 42/59] kvm/i386: Add xen-gnttab-max-frames property

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 accel/kvm/kvm-all.c   |  1 +
 include/sysemu/kvm_int.h  |  1 +
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/kvm/kvm.c | 34 ++
 target/i386/kvm/xen-emu.c |  6 ++
 5 files changed, 43 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f242e36316..dc5b0bb434 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3704,6 +3704,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
 s->xen_version = 0;
+s->xen_gnttab_max_frames = 64;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 7f945bc763..39ce4d36f6 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -120,6 +120,7 @@ struct KVMState
 uint32_t notify_window;
 uint32_t xen_version;
 uint32_t xen_caps;
+uint16_t xen_gnttab_max_frames;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 1edff29541..7fee28dec7 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -25,6 +25,7 @@ void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
 void kvm_xen_set_callback_asserted(void);
 int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
+uint16_t kvm_xen_get_gnttab_max_frames(void);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f6ae70c831..6d112ccddd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5865,6 +5865,33 @@ static void kvm_arch_set_xen_version(Object *obj, Visitor *v,
 }
 }
 
+static void kvm_arch_get_xen_gnttab_max_frames(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint16_t value = s->xen_gnttab_max_frames;
+
+visit_type_uint16(v, name, &value, errp);
+}
+
+static void kvm_arch_set_xen_gnttab_max_frames(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint16_t value;
+
+visit_type_uint16(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->xen_gnttab_max_frames = value;
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@ -5890,6 +5917,13 @@ void kvm_arch_accel_class_init(ObjectClass *oc)
   "Xen version to be emulated "
   "(in XENVER_version form "
   "e.g. 0x4000a for 4.10)");
+
+object_class_property_add(oc, "xen-gnttab-max-frames", "uint16",
+  kvm_arch_get_xen_gnttab_max_frames,
+  kvm_arch_set_xen_gnttab_max_frames,
+  NULL, NULL);
+object_class_property_set_description(oc, "xen-gnttab-max-frames",
+  "Maximum number of grant table frames");
 }
 
 void kvm_set_max_apic_id(uint32_t max_apic_id)
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index ec82170261..c57620ca51 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1235,6 +1235,12 @@ int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 return 0;
 }
 
+uint16_t kvm_xen_get_gnttab_max_frames(void)
+{
+KVMState *s = KVM_STATE(current_accel());
+return s->xen_gnttab_max_frames;
+}
+
 int kvm_put_xen_state(CPUState *cs)
 {
 X86CPU *cpu = X86_CPU(cs);
-- 
2.39.0

[PATCH v11 48/59] i386/xen: Reserve Xen special pages for console, xenstore rings

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Xen has eight frames at 0xfeff8000 for this; we only really need two for
now and KVM puts the identity map at 0xfeffc000, so limit ourselves to
four.
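
The arithmetic behind "limit ourselves to four" can be checked with a short sketch. The addresses are the ones quoted above, the console and xenstore page indices 0 and 1 follow the defines in the diff below, and the 4 KiB page size is an assumption; the DEMO_ names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_BITS 12                     /* assumed 4 KiB pages */
#define DEMO_SPECIAL_AREA_ADDR 0xfeff8000ULL  /* start of Xen's special frames */
#define DEMO_SPECIAL_AREA_SIZE 0x4000ULL      /* four frames */

enum demo_special_page {
    DEMO_PAGE_CONSOLE = 0,
    DEMO_PAGE_XENSTORE = 1,
};

/* Page frame number of one of the reserved special pages. */
static uint64_t demo_special_pfn(enum demo_special_page page)
{
    uint64_t nr_frames = DEMO_SPECIAL_AREA_SIZE >> DEMO_PAGE_BITS;

    assert((uint64_t)page < nr_frames);
    return (DEMO_SPECIAL_AREA_ADDR >> DEMO_PAGE_BITS) + (uint64_t)page;
}

/* First address past the reserved area. */
static uint64_t demo_special_area_end(void)
{
    return DEMO_SPECIAL_AREA_ADDR + DEMO_SPECIAL_AREA_SIZE;
}
```

Four frames starting at 0xfeff8000 end exactly at 0xfeffc000, where KVM's identity map begins, and the console and xenstore pages land on PFNs 0xfeff8 and 0xfeff9.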

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 include/sysemu/kvm_xen.h  |  8 
 target/i386/kvm/xen-emu.c | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 7fee28dec7..0b63bb81df 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -30,4 +30,12 @@ uint16_t kvm_xen_get_gnttab_max_frames(void);
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
 
+#define XEN_SPECIAL_AREA_ADDR 0xfeff8000UL
+#define XEN_SPECIAL_AREA_SIZE 0x4000UL
+
+#define XEN_SPECIALPAGE_CONSOLE 0
+#define XEN_SPECIALPAGE_XENSTORE 1
+
+#define XEN_SPECIAL_PFN(x) ((XEN_SPECIAL_AREA_ADDR >> TARGET_PAGE_BITS) + XEN_SPECIALPAGE_##x)
+
 #endif /* QEMU_SYSEMU_KVM_XEN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 4781b1fa97..f55ab08959 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -23,6 +23,7 @@
 
 #include "hw/pci/msi.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
 #include "hw/i386/kvm/xen_gnttab.h"
@@ -169,6 +170,15 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 }
 
 s->xen_caps = xen_caps;
+
+/* Tell fw_cfg to notify the BIOS to reserve the range. */
+ret = e820_add_entry(XEN_SPECIAL_AREA_ADDR, XEN_SPECIAL_AREA_SIZE,
+ E820_RESERVED);
+if (ret < 0) {
+fprintf(stderr, "e820_add_entry() table is full\n");
+return ret;
+}
+
 return 0;
 }
 
-- 
2.39.0

[PATCH v11 17/59] i386/xen: implement HYPERVISOR_memory_op

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins 
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/trace-events |   1 +
 target/i386/kvm/xen-compat.h |  27 
 target/i386/kvm/xen-emu.c| 116 ++-
 3 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 target/i386/kvm/xen-compat.h

diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index bb732e1da8..8e9f269f56 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -9,3 +9,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
 kvm_xen_soft_reset(void) ""
+kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
new file mode 100644
index 0000000000..2d852e2a28
--- /dev/null
+++ b/target/i386/kvm/xen-compat.h
@@ -0,0 +1,27 @@
+/*
+ * Xen HVM emulation support in KVM
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_I386_KVM_XEN_COMPAT_H
+#define QEMU_I386_KVM_XEN_COMPAT_H
+
+#include "hw/xen/interface/memory.h"
+
+typedef uint32_t compat_pfn_t;
+typedef uint32_t compat_ulong_t;
+
+struct compat_xen_add_to_physmap {
+domid_t domid;
+uint16_t size;
+unsigned int space;
+compat_ulong_t idx;
+compat_pfn_t gpfn;
+};
+
+#endif /* QEMU_I386_KVM_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index be6d85f2cb..5d79827128 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
+#include "hw/xen/xen.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/kvm_xen.h"
 #include "kvm/kvm_i386.h"
@@ -24,6 +25,15 @@
 
 #include "hw/xen/interface/version.h"
 #include "hw/xen/interface/sched.h"
+#include "hw/xen/interface/memory.h"
+
+#include "xen-compat.h"
+
+#ifdef TARGET_X86_64
+#define hypercall_compat32(longmode) (!(longmode))
+#else
+#define hypercall_compat32(longmode) (false)
+#endif
 
 static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
   bool is_write)
@@ -175,13 +185,114 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static int xen_set_shared_info(uint64_t gfn)
+{
+uint64_t gpa = gfn << TARGET_PAGE_BITS;
+int err;
+
+QEMU_IOTHREAD_LOCK_GUARD();
+
+/*
+ * The xen_overlay device tells KVM about it too, since it had to
+ * do that on migration load anyway (unless we're going to jump
+ * through lots of hoops to maintain the fiction that this isn't
+ * KVM-specific.
+ */
+err = xen_overlay_map_shinfo_page(gpa);
+if (err) {
+return err;
+}
+
+trace_kvm_xen_set_shared_info(gfn);
+
+return err;
+}
+
+static int add_to_physmap_one(uint32_t space, uint64_t idx, uint64_t gfn)
+{
+switch (space) {
+case XENMAPSPACE_shared_info:
+if (idx > 0) {
+return -EINVAL;
+}
+return xen_set_shared_info(gfn);
+
+case XENMAPSPACE_grant_table:
+case XENMAPSPACE_gmfn:
+case XENMAPSPACE_gmfn_range:
+return -ENOTSUP;
+
+case XENMAPSPACE_gmfn_foreign:
+case XENMAPSPACE_dev_mmio:
+return -EPERM;
+
+default:
+return -EINVAL;
+}
+}
+
+static int do_add_to_physmap(struct kvm_xen_exit *exit, X86CPU *cpu,
+ uint64_t arg)
+{
+struct xen_add_to_physmap xatp;
+CPUState *cs = CPU(cpu);
+
+if (hypercall_compat32(exit->u.hcall.longmode)) {
+struct compat_xen_add_to_physmap xatp32;
+
+qemu_build_assert(sizeof(struct compat_xen_add_to_physmap) == 16);
+if (kvm_copy_from_gva(cs, arg, &xatp32, sizeof(xatp32))) {
+return -EFAULT;
+}
+xatp.domid = xatp32.domid;
+xatp.size = xatp32.size;
+xatp.space = xatp32.space;
+xatp.idx = xatp32.idx;
+xatp.gpfn = xatp32.gpfn;
+} else {
+if (kvm_copy_from_gva(cs, arg, &xatp, sizeof(xatp))) {
+return -EFAULT;
+}
+}
+
+if (xatp.domid != DOMID_SELF && xatp.domid != xen_domid) {
+return -ESRCH;
+}
+
+return add_to_physmap_one(xatp.space, xatp.idx, xatp.gpfn);
+}
+
+static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+   int

[PATCH v11 37/59] hw/xen: Implement EVTCHNOP_bind_vcpu

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 40 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 12 
 3 files changed, 54 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 3e6f7afcbc..f87b6a3b23 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -789,6 +789,46 @@ int xen_evtchn_unmask_op(struct evtchn_unmask *unmask)
 return ret;
 }
 
+int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = -EINVAL;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(vcpu->port)) {
+return -EINVAL;
+}
+
+if (!valid_vcpu(vcpu->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[vcpu->port];
+
+if (p->type == EVTCHNSTAT_interdomain ||
+p->type == EVTCHNSTAT_unbound ||
+p->type == EVTCHNSTAT_pirq ||
+(p->type == EVTCHNSTAT_virq && virq_is_global(p->type_val))) {
+/*
+ * unmask_port() with do_unmask==false will just raise the event
+ * on the new vCPU if the port was already pending.
+ */
+p->vcpu = vcpu->vcpu;
+unmask_port(s, vcpu->port, false);
+ret = 0;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 1ebc7580eb..486b031c82 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -23,6 +23,7 @@ struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
 struct evtchn_bind_interdomain;
+struct evtchn_bind_vcpu;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -31,5 +32,6 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
+int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index a07d1d39f3..ec7aefadfc 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -949,6 +949,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_vcpu: {
+struct evtchn_bind_vcpu vcpu;
+
+qemu_build_assert(sizeof(vcpu) == 8);
+if (kvm_copy_from_gva(cs, arg, &vcpu, sizeof(vcpu))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_vcpu_op(&vcpu);
+break;
+}
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 34/59] hw/xen: Implement EVTCHNOP_send

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 180 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 +++
 3 files changed, 194 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d8527483b9..a97d6ba61d 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -490,6 +490,133 @@ static int unmask_port(XenEvtchnState *s, evtchn_port_t port, bool do_unmask)
 }
 }
 
+static int do_set_port_lm(XenEvtchnState *s, evtchn_port_t port,
+  struct shared_info *shinfo,
+  struct vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+/* Update the pending bit itself. If it was already set, we're done. */
+if (qatomic_fetch_or(&shinfo->evtchn_pending[idx], mask) & mask) {
+return 0;
+}
+
+/* Check if it's masked. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int do_set_port_compat(XenEvtchnState *s, evtchn_port_t port,
+  struct compat_shared_info *shinfo,
+  struct compat_vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+/* Update the pending bit itself. If it was already set, we're done. */
+if (qatomic_fetch_or(&shinfo->evtchn_pending[idx], mask) & mask) {
+return 0;
+}
+
+/* Check if it's masked. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int set_port_pending(XenEvtchnState *s, evtchn_port_t port)
+{
+void *vcpu_info, *shinfo;
+
+if (s->port_table[port].type == EVTCHNSTAT_closed) {
+return -EINVAL;
+}
+
+if (s->evtchn_in_kernel) {
+XenEvtchnPort *p = &s->port_table[port];
+CPUState *cpu = qemu_get_cpu(p->vcpu);
+struct kvm_irq_routing_xen_evtchn evt;
+
+if (!cpu) {
+return 0;
+}
+
+evt.port = port;
+evt.vcpu = kvm_arch_vcpu_id(cpu);
+evt.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_EVTCHN_SEND, &evt);
+}
+
+shinfo = xen_overlay_get_shinfo_ptr();
+if (!shinfo) {
+return -ENOTSUP;
+}
+
+vcpu_info = kvm_xen_get_vcpu_info_hva(s->port_table[port].vcpu);
+if (!vcpu_info) {
+return -EINVAL;
+}
+
+if (xen_is_long_mode()) {
+return do_set_port_lm(s, port, shinfo, vcpu_info);
+} else {
+return do_set_port_compat(s, port, shinfo, vcpu_info);
+}
+}
+
 static int clear_port_pending(XenEvtchnState *s, evtchn_port_t port)
 {
 void *p = xen_overlay_get_shinfo_ptr();
@@ -707,3 +834,56 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 
 return ret;
 }
+
+int xen_evtchn_send_op(struct evtchn_send *send)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = 0;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(send->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[send->port];
+
+switch (p->type) {
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+/*
+ * This is an event from the guest to qemu itself, which is
+ * serving as the driver domain. Not yet implemented; it will
+ * be hooked up to the qemu

[PATCH v11 16/59] i386/xen: manage and save/restore Xen guest long_mode setting

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a hypercall.
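
As a rough illustration of the resync logic described above, here is a
minimal sketch. The kernel's latched flag is modelled by a plain variable
(kernel_long_mode), which is an assumption made for the example; in the
patch itself the value is fetched with KVM_XEN_HVM_GET_ATTR. All names in
the sketch are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the long_mode flag latched by the kernel. */
static bool kernel_long_mode;

/* Userspace's cached copy, playing the role of XenOverlayState.long_mode. */
static bool cached_long_mode;

/* Counts resyncs so the behaviour can be observed; illustration only. */
static int resync_count;

/* Copy the kernel's idea of the guest mode into the cached copy. */
static void sync_long_mode(void)
{
    cached_long_mode = kernel_long_mode;
    resync_count++;
}

/* Run on every hypercall exit; longmode is the mode the call was made in. */
static void on_hypercall(bool longmode)
{
    if (longmode != cached_long_mode) {
        sync_long_mode();
    }
}

/* Run right before serialization, in case no hypercall has been made yet. */
static void pre_save(void)
{
    sync_long_mode();
}
```

The pre_save() path is what covers a guest that has filled in the
hypercall page but not yet made a hypercall when migration starts.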

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_overlay.c | 62 +++
 hw/i386/kvm/xen_overlay.h |  4 +++
 target/i386/kvm/xen-emu.c | 12 
 3 files changed, 78 insertions(+)

diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
index a2441e2b4e..8685d87959 100644
--- a/hw/i386/kvm/xen_overlay.c
+++ b/hw/i386/kvm/xen_overlay.c
@@ -44,6 +44,7 @@ struct XenOverlayState {
 MemoryRegion shinfo_mem;
 void *shinfo_ptr;
 uint64_t shinfo_gpa;
+bool long_mode;
 };
 
 struct XenOverlayState *xen_overlay_singleton;
@@ -96,9 +97,21 @@ static void xen_overlay_realize(DeviceState *dev, Error **errp)
 
 s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
 s->shinfo_gpa = INVALID_GPA;
+s->long_mode = false;
 memset(s->shinfo_ptr, 0, XEN_PAGE_SIZE);
 }
 
+static int xen_overlay_pre_save(void *opaque)
+{
+/*
+ * Fetch the kernel's idea of long_mode to avoid the race condition
+ * where the guest has set the hypercall page up in 64-bit mode but
+ * not yet made a hypercall by the time migration happens, so qemu
+ * hasn't yet noticed.
+ */
+return xen_sync_long_mode();
+}
+
 static int xen_overlay_post_load(void *opaque, int version_id)
 {
 XenOverlayState *s = opaque;
@@ -107,6 +120,9 @@ static int xen_overlay_post_load(void *opaque, int version_id)
 xen_overlay_do_map_page(&s->shinfo_mem, s->shinfo_gpa);
 xen_overlay_set_be_shinfo(s->shinfo_gpa >> XEN_PAGE_SHIFT);
 }
+if (s->long_mode) {
+xen_set_long_mode(true);
+}
 
 return 0;
 }
@@ -121,9 +137,11 @@ static const VMStateDescription xen_overlay_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_overlay_is_needed,
+.pre_save = xen_overlay_pre_save,
 .post_load = xen_overlay_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(shinfo_gpa, XenOverlayState),
+VMSTATE_BOOL(long_mode, XenOverlayState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -208,3 +226,47 @@ void *xen_overlay_get_shinfo_ptr(void)
 
 return s->shinfo_ptr;
 }
+
+int xen_sync_long_mode(void)
+{
+int ret;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+};
+
+if (!xen_overlay_singleton) {
+return -ENOENT;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_GET_ATTR, &xa);
+if (!ret) {
+xen_overlay_singleton->long_mode = xa.u.long_mode;
+}
+
+return ret;
+}
+
+int xen_set_long_mode(bool long_mode)
+{
+int ret;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+.u.long_mode = long_mode,
+};
+
+if (!xen_overlay_singleton) {
+return -ENOENT;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+if (!ret) {
+xen_overlay_singleton->long_mode = xa.u.long_mode;
+}
+
+return ret;
+}
+
+bool xen_is_long_mode(void)
+{
+return xen_overlay_singleton && xen_overlay_singleton->long_mode;
+}
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
index 00cff05bb0..5c46a0b036 100644
--- a/hw/i386/kvm/xen_overlay.h
+++ b/hw/i386/kvm/xen_overlay.h
@@ -17,4 +17,8 @@ void xen_overlay_create(void);
 int xen_overlay_map_shinfo_page(uint64_t gpa);
 void *xen_overlay_get_shinfo_ptr(void);
 
+int xen_sync_long_mode(void);
+int xen_set_long_mode(bool long_mode);
+bool xen_is_long_mode(void);
+
 #endif /* QEMU_XEN_OVERLAY_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index ebea27caf6..be6d85f2cb 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -20,6 +20,8 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 
+#include "hw/i386/kvm/xen_overlay.h"
+
 #include "hw/xen/interface/version.h"
 #include "hw/xen/interface/sched.h"
 
@@ -282,6 +284,16 @@ int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 return -1;
 }
 
+/*
+ * The kernel latches the guest 32/64 mode when the MSR is used to fill
+ * the hypercall page. So if we see a hypercall in a mode that doesn't
+ * match our own idea of the guest mode, fetch the kernel's idea of the
+ * "long mode" to remain in sync.
+ */
+if (exit->u.hcall.longmode != xen_is_long_mode()) {
+xen_sync_long_mode();
+}
+
 if

[PATCH v11 20/59] i386/xen: implement HYPERVISOR_vcpu_op

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This handles the case where the guest tries to register a vcpu_info.
Since vcpu_info placement is optional in the minimum ABI, we can just
fail with -ENOSYS for now.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 4002b1b797..e5ae0a9a38 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -27,6 +27,7 @@
 #include "hw/xen/interface/sched.h"
 #include "hw/xen/interface/memory.h"
 #include "hw/xen/interface/hvm/hvm_op.h"
+#include "hw/xen/interface/vcpu.h"
 
 #include "xen-compat.h"
 
@@ -363,6 +364,25 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 }
 
+static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+  int cmd, int vcpu_id, uint64_t arg)
+{
+int err;
+
+switch (cmd) {
+case VCPUOP_register_vcpu_info:
+/* no vcpu info placement for now */
+err = -ENOSYS;
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 int kvm_xen_soft_reset(void)
 {
 int err;
@@ -464,6 +484,11 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
+case __HYPERVISOR_vcpu_op:
+return kvm_xen_hcall_vcpu_op(exit, cpu,
+ exit->u.hcall.params[0],
+ exit->u.hcall.params[1],
+ exit->u.hcall.params[2]);
 case __HYPERVISOR_hvm_op:
 return kvm_xen_hcall_hvm_op(exit, cpu, exit->u.hcall.params[0],
 exit->u.hcall.params[1]);
-- 
2.39.0

[PATCH v11 27/59] hw/xen: Add xen_evtchn device for event channel emulation

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.
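
For illustration, a minimal sketch of how the callback parameter is split
into a type and a vector, mirroring the shift and cast used in
xen_evtchn_set_callback_param() in this patch. The helper and struct names
are hypothetical, and the numeric value used for the vector type (2) is
assumed to match HVM_PARAM_CALLBACK_TYPE_VECTOR in Xen's public headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top bits of the parameter carry the delivery type. */
#define CALLBACK_VIA_TYPE_SHIFT 56

/* Assumed to equal HVM_PARAM_CALLBACK_TYPE_VECTOR from hvm/params.h. */
#define CALLBACK_TYPE_VECTOR 2

/* Hypothetical result type for the decode helper below. */
struct callback_info {
    bool is_vector;  /* true if the "global vector" method was requested */
    uint8_t vector;  /* only meaningful when is_vector is true */
};

/* Split a HVM_PARAM_CALLBACK_IRQ value into its type and vector. */
static struct callback_info decode_callback_param(uint64_t param)
{
    struct callback_info ci = { false, 0 };

    if ((param >> CALLBACK_VIA_TYPE_SHIFT) == CALLBACK_TYPE_VECTOR) {
        ci.is_vector = true;
        ci.vector = (uint8_t)param;  /* the low byte is the vector */
    }
    return ci;
}
```

Any other type is accepted without error here, matching the comment in the
patch that Xen does not return an error for bogus values.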

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/meson.build   |   5 +-
 hw/i386/kvm/xen_evtchn.c  | 155 ++
 hw/i386/kvm/xen_evtchn.h  |  18 +
 hw/i386/pc.c  |   2 +
 target/i386/kvm/xen-emu.c |  15 
 5 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/kvm/xen_evtchn.c
 create mode 100644 hw/i386/kvm/xen_evtchn.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 6165cbf019..cab64df339 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,6 +4,9 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
-i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
+  'xen_overlay.c',
+  'xen_evtchn.c',
+  ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
new file mode 100644
index 00..9d6f4076ad
--- /dev/null
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -0,0 +1,155 @@
+/*
+ * QEMU Xen emulation: Event channel support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_evtchn.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+#include 
+
+#include "hw/xen/interface/memory.h"
+#include "hw/xen/interface/hvm/params.h"
+
+#define TYPE_XEN_EVTCHN "xen-evtchn"
+OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
+
+struct XenEvtchnState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+uint64_t callback_param;
+bool evtchn_in_kernel;
+
+QemuMutex port_lock;
+};
+
+struct XenEvtchnState *xen_evtchn_singleton;
+
+/* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
+#define CALLBACK_VIA_TYPE_SHIFT 56
+
+static int xen_evtchn_post_load(void *opaque, int version_id)
+{
+XenEvtchnState *s = opaque;
+
+if (s->callback_param) {
+xen_evtchn_set_callback_param(s->callback_param);
+}
+
+return 0;
+}
+
+static bool xen_evtchn_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_evtchn_vmstate = {
+.name = "xen_evtchn",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xen_evtchn_is_needed,
+.post_load = xen_evtchn_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(callback_param, XenEvtchnState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void xen_evtchn_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = &xen_evtchn_vmstate;
+}
+
+static const TypeInfo xen_evtchn_info = {
+.name  = TYPE_XEN_EVTCHN,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XenEvtchnState),
+.class_init= xen_evtchn_class_init,
+};
+
+void xen_evtchn_create(void)
+{
+XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
+-1, NULL));
+xen_evtchn_singleton = s;
+
+qemu_mutex_init(&s->port_lock);
+}
+
+static void xen_evtchn_register_types(void)
+{
+type_register_static(&xen_evtchn_info);
+}
+
+type_init(xen_evtchn_register_types)
+
+int xen_evtchn_set_callback_param(uint64_t param)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_UPCALL_VECTOR,
+.u.vector = 0,
+};
+bool in_kernel = false;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+switch (param >> CALLBACK_VIA_TYPE_SHIFT) {
+case HVM_PARAM_CALLBACK_TYPE_VECTOR: {
+xa.u.vector = (uint8_t)param;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+if (!ret && kvm_xen_has_cap(EVTCHN_SEND)) {
+in_kernel = true;
+}
+break;
+}
+default:
+/* Xen doesn't return error even if you set something bogus */
+ret = 0;
+break;
+}
+
+if (!ret)

[PATCH v11 03/59] xen: Add XEN_DISABLED mode and make it default

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Also set XEN_ATTACH mode in xen_init() to reflect the truth; not that
anyone ever cared before. It was *only* ever checked in xen_init_pv()
before.

Suggested-by: Paolo Bonzini 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 accel/xen/xen-all.c  | 2 ++
 include/hw/xen/xen.h | 5 +++--
 softmmu/globals.c| 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 69aa7d018b..2329556595 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c
@@ -181,6 +181,8 @@ static int xen_init(MachineState *ms)
  * opt out of system RAM being allocated by generic code
  */
 mc->default_ram_id = NULL;
+
+xen_mode = XEN_ATTACH;
 return 0;
 }
 
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 4d412fd4b2..b3873c581b 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -22,8 +22,9 @@
 
 /* xen-machine.c */
 enum xen_mode {
-XEN_EMULATE = 0,  // xen emulation, using xenner (default)
-XEN_ATTACH// attach to xen domain created by libxl
+XEN_DISABLED = 0, // xen support disabled (default)
+XEN_ATTACH,   // attach to xen domain created by libxl
+XEN_EMULATE,
 };
 
 extern uint32_t xen_domid;
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 527edbefdd..0a4405614e 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -63,5 +63,5 @@ QemuUUID qemu_uuid;
 bool qemu_uuid_set;
 
 uint32_t xen_domid;
-enum xen_mode xen_mode = XEN_EMULATE;
+enum xen_mode xen_mode = XEN_DISABLED;
 bool xen_domid_restrict;
-- 
2.39.0

[PATCH v11 25/59] i386/xen: implement HVMOP_set_evtchn_upcall_vector

2023-02-15 Thread David Woodhouse

From: Ankur Arora 

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.
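
A minimal sketch of the precedence rule described above, assuming that a
per-vCPU vector of 0 means "not set" (the patch resets the field to 0 on
soft reset and rejects vectors below 0x10). The struct and function names
are hypothetical, and the real selection happens wherever the event is
delivered rather than in a helper like this.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; 0 is taken to mean "no per-vCPU vector". */
struct vcpu_upcall {
    uint8_t vector;
};

/* Mirror the validation in the hypercall handler: vectors < 0x10 are invalid. */
static int set_upcall_vector(struct vcpu_upcall *v, uint8_t vector)
{
    if (vector < 0x10) {
        return -EINVAL;
    }
    v->vector = vector;
    return 0;
}

/* The per-vCPU vector, when set, wins over the system-wide one. */
static uint8_t effective_vector(const struct vcpu_upcall *v,
                                uint8_t global_vector)
{
    return v->vector ? v->vector : global_vector;
}
```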

Signed-off-by: Ankur Arora 
Signed-off-by: Joao Martins 
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.h|  1 +
 target/i386/kvm/trace-events |  1 +
 target/i386/kvm/xen-emu.c| 84 ++--
 target/i386/machine.c|  1 +
 4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index bf44a87ddb..938a1b9c8b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1792,6 +1792,7 @@ typedef struct CPUArchState {
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
 uint64_t xen_vcpu_runstate_gpa;
+uint8_t xen_vcpu_callback_vector;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index a840e0333d..b365a8e8e2 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -11,3 +11,4 @@ kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1
 kvm_xen_soft_reset(void) ""
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
 kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
+kvm_xen_set_vcpu_callback(int cpu, int vector) "callback vcpu %d vector %d"
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 0bca370ea4..55dc2ac012 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -27,6 +27,7 @@
 #include "hw/xen/interface/sched.h"
 #include "hw/xen/interface/memory.h"
 #include "hw/xen/interface/hvm/hvm_op.h"
+#include "hw/xen/interface/hvm/params.h"
 #include "hw/xen/interface/vcpu.h"
 #include "hw/xen/interface/event_channel.h"
 
@@ -193,7 +194,8 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
 fi.submap |= 1 << XENFEAT_writable_page_tables |
  1 << XENFEAT_writable_descriptor_tables |
  1 << XENFEAT_auto_translated_physmap |
- 1 << XENFEAT_supervisor_mode_kernel;
+ 1 << XENFEAT_supervisor_mode_kernel |
+ 1 << XENFEAT_hvm_callback_vector;
 }
 
 err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
@@ -220,6 +222,31 @@ static int kvm_xen_set_vcpu_attr(CPUState *cs, uint16_t type, uint64_t gpa)
 return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, );
 }
 
+static int kvm_xen_set_vcpu_callback_vector(CPUState *cs)
+{
+uint8_t vector = X86_CPU(cs)->env.xen_vcpu_callback_vector;
+struct kvm_xen_vcpu_attr xva;
+
+xva.type = KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR;
+xva.u.vector = vector;
+
+trace_kvm_xen_set_vcpu_callback(cs->cpu_index, vector);
+
+return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &xva);
+}
+
+static void do_set_vcpu_callback_vector(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_callback_vector = data.host_int;
+
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+kvm_xen_set_vcpu_callback_vector(cs);
+}
+}
+
 static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -276,12 +303,16 @@ static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
 env->xen_vcpu_runstate_gpa = INVALID_GPA;
+env->xen_vcpu_callback_vector = 0;
 
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
   INVALID_GPA);
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
   INVALID_GPA);
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+kvm_xen_set_vcpu_callback_vector(cs);
+}
 
 }
 
@@ -458,17 +489,53 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static int kvm_xen_hcall_evtchn_upcall_vector(struct kvm_xen_exit *exit,
+  X86CPU *cpu, uint64_t arg)
+{
+struct xen_hvm_evtchn_upcall_vector up;
+CPUState *target_cs;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(up) == 8);
+
+if (kvm_copy_from_gva(CPU(cpu), arg, &up, sizeof(up))) {
+return -EFAULT;
+}
+
+if (up.vector < 0x10) {
+return -EINVAL;
+}
+
+target_cs = qemu_get_cpu(up.vcpu);
+

[PATCH v11 13/59] hw/xen: Add xen_overlay device for emulating shared xenheap pages

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.
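
To make the "one mapping per page" policy concrete, here is a minimal
sketch that models an overlay page as a single tracked GPA. It is not the
QEMU memory API: the struct and helper are hypothetical, and INVALID_GPA is
assumed to be an all-ones sentinel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel meaning "not mapped anywhere". */
#define INVALID_GPA UINT64_MAX

/* Hypothetical model of one overlay page: at most one mapping is tracked. */
struct overlay_page {
    bool mapped;
    uint64_t gpa;
};

/*
 * Map the page at gpa. A new mapping implicitly replaces the old one, and
 * INVALID_GPA unmaps the page entirely.
 */
static void overlay_map(struct overlay_page *p, uint64_t gpa)
{
    if (gpa == INVALID_GPA) {
        p->mapped = false;
        p->gpa = INVALID_GPA;
    } else {
        /* Either a fresh mapping or a move of the existing one. */
        p->mapped = true;
        p->gpa = gpa;
    }
}
```

Because only one GPA is ever recorded, restoring the mapping after
migration needs just that single value.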

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_overlay.c | 210 ++
 hw/i386/kvm/xen_overlay.h |  20 
 include/sysemu/kvm_xen.h  |   7 ++
 4 files changed, 238 insertions(+)
 create mode 100644 hw/i386/kvm/xen_overlay.c
 create mode 100644 hw/i386/kvm/xen_overlay.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 95467f1ded..6165cbf019 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,5 +4,6 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
new file mode 100644
index 00..a2441e2b4e
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.c
@@ -0,0 +1,210 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+#include 
+
+#include "hw/xen/interface/memory.h"
+
+
+#define TYPE_XEN_OVERLAY "xen-overlay"
+OBJECT_DECLARE_SIMPLE_TYPE(XenOverlayState, XEN_OVERLAY)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenOverlayState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+MemoryRegion shinfo_mem;
+void *shinfo_ptr;
+uint64_t shinfo_gpa;
+};
+
+struct XenOverlayState *xen_overlay_singleton;
+
+static void xen_overlay_do_map_page(MemoryRegion *page, uint64_t gpa)
+{
+/*
+ * Xen allows guests to map the same page as many times as they like
+ * into guest physical frames. We don't, because it would be hard
+ * to track and restore them all. One mapping of each page is
+ * perfectly sufficient for all known guests... and we've tested
+ * that theory on a few now in other implementations. dwmw2.
+ */
+if (memory_region_is_mapped(page)) {
+if (gpa == INVALID_GPA) {
+memory_region_del_subregion(get_system_memory(), page);
+} else {
+/* Just move it */
+memory_region_set_address(page, gpa);
+}
+} else if (gpa != INVALID_GPA) {
+memory_region_add_subregion_overlap(get_system_memory(), gpa, page, 0);
+}
+}
+
+/* KVM is the only existing back end for now. Let's not overengineer it yet. */
+static int xen_overlay_set_be_shinfo(uint64_t gfn)
+{
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+.u.shared_info.gfn = gfn,
+};
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+}
+
+
+static void xen_overlay_realize(DeviceState *dev, Error **errp)
+{
+XenOverlayState *s = XEN_OVERLAY(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen overlay page support is for Xen emulation");
+return;
+}
+
+memory_region_init_ram(&s->shinfo_mem, OBJECT(dev), "xen:shared_info",
+   XEN_PAGE_SIZE, &error_abort);
+memory_region_set_enabled(&s->shinfo_mem, true);
+
+s->shinfo_ptr =

[PATCH v11 14/59] xen: Permit --xen-domid argument when accel is KVM

2023-02-15 Thread David Woodhouse

From: Paul Durrant 

Signed-off-by: Paul Durrant 
Signed-off-by: David Woodhouse 
---
 softmmu/vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index b2ee3fee3f..2b071159c5 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3359,7 +3359,7 @@ void qemu_init(int argc, char **argv)
 has_defaults = 0;
 break;
 case QEMU_OPTION_xen_domid:
-if (!(accel_find("xen"))) {
+if (!(accel_find("xen")) && !(accel_find("kvm"))) {
 error_report("Option not supported for this target");
 exit(1);
 }
-- 
2.39.0

[PATCH v11 49/59] i386/xen: handle HVMOP_get_param

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This is used to fetch the xenstore PFN and port to be used by the
guest. These are preallocated by the toolstack, so the guest can
just read them and use them straight away.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index f55ab08959..36e60bd2a5 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -762,6 +762,42 @@ out:
 return true;
 }
 
+static bool handle_get_param(struct kvm_xen_exit *exit, X86CPU *cpu,
+ uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+struct xen_hvm_param hp;
+int err = 0;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(hp) == 16);
+
+if (kvm_copy_from_gva(cs, arg, &hp, sizeof(hp))) {
+err = -EFAULT;
+goto out;
+}
+
+if (hp.domid != DOMID_SELF && hp.domid != xen_domid) {
+err = -ESRCH;
+goto out;
+}
+
+switch (hp.index) {
+case HVM_PARAM_STORE_PFN:
+hp.value = XEN_SPECIAL_PFN(XENSTORE);
+break;
+default:
+return false;
+}
+
+if (kvm_copy_to_gva(cs, arg, &hp, sizeof(hp))) {
+err = -EFAULT;
+}
+out:
+exit->u.hcall.result = err;
+return true;
+}
+
 static int kvm_xen_hcall_evtchn_upcall_vector(struct kvm_xen_exit *exit,
   X86CPU *cpu, uint64_t arg)
 {
@@ -806,6 +842,9 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 case HVMOP_set_param:
 return handle_set_param(exit, cpu, arg);
 
+case HVMOP_get_param:
+return handle_get_param(exit, cpu, arg);
+
 default:
 return false;
 }
-- 
2.39.0

[PATCH v11 18/59] i386/xen: implement XENMEM_add_to_physmap_batch

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-compat.h | 24 +
 target/i386/kvm/xen-emu.c| 69 
 2 files changed, 93 insertions(+)

diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
index 2d852e2a28..448336de92 100644
--- a/target/i386/kvm/xen-compat.h
+++ b/target/i386/kvm/xen-compat.h
@@ -15,6 +15,20 @@
 
 typedef uint32_t compat_pfn_t;
 typedef uint32_t compat_ulong_t;
+typedef uint32_t compat_ptr_t;
+
+#define __DEFINE_COMPAT_HANDLE(name, type)  \
+typedef struct {\
+compat_ptr_t c; \
+type *_[0] __attribute__((packed));   \
+} __compat_handle_ ## name; \
+
+#define DEFINE_COMPAT_HANDLE(name) __DEFINE_COMPAT_HANDLE(name, name)
+#define COMPAT_HANDLE(name) __compat_handle_ ## name
+
+DEFINE_COMPAT_HANDLE(compat_pfn_t);
+DEFINE_COMPAT_HANDLE(compat_ulong_t);
+DEFINE_COMPAT_HANDLE(int);
 
 struct compat_xen_add_to_physmap {
 domid_t domid;
@@ -24,4 +38,14 @@ struct compat_xen_add_to_physmap {
 compat_pfn_t gpfn;
 };
 
+struct compat_xen_add_to_physmap_batch {
+domid_t domid;
+uint16_t space;
+uint16_t size;
+uint16_t extra;
+COMPAT_HANDLE(compat_ulong_t) idxs;
+COMPAT_HANDLE(compat_pfn_t) gpfns;
+COMPAT_HANDLE(int) errs;
+};
+
 #endif /* QEMU_I386_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 5d79827128..2b235e7b27 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -262,6 +262,71 @@ static int do_add_to_physmap(struct kvm_xen_exit *exit, X86CPU *cpu,
 return add_to_physmap_one(xatp.space, xatp.idx, xatp.gpfn);
 }
 
+static int do_add_to_physmap_batch(struct kvm_xen_exit *exit, X86CPU *cpu,
+   uint64_t arg)
+{
+struct xen_add_to_physmap_batch xatpb;
+unsigned long idxs_gva, gpfns_gva, errs_gva;
+CPUState *cs = CPU(cpu);
+size_t op_sz;
+
+if (hypercall_compat32(exit->u.hcall.longmode)) {
+struct compat_xen_add_to_physmap_batch xatpb32;
+
+qemu_build_assert(sizeof(struct compat_xen_add_to_physmap_batch) == 20);
+if (kvm_copy_from_gva(cs, arg, &xatpb32, sizeof(xatpb32))) {
+return -EFAULT;
+}
+xatpb.domid = xatpb32.domid;
+xatpb.space = xatpb32.space;
+xatpb.size = xatpb32.size;
+
+idxs_gva = xatpb32.idxs.c;
+gpfns_gva = xatpb32.gpfns.c;
+errs_gva = xatpb32.errs.c;
+op_sz = sizeof(uint32_t);
+} else {
+if (kvm_copy_from_gva(cs, arg, &xatpb, sizeof(xatpb))) {
+return -EFAULT;
+}
+op_sz = sizeof(unsigned long);
+idxs_gva = (unsigned long)xatpb.idxs.p;
+gpfns_gva = (unsigned long)xatpb.gpfns.p;
+errs_gva = (unsigned long)xatpb.errs.p;
+}
+
+if (xatpb.domid != DOMID_SELF && xatpb.domid != xen_domid) {
+return -ESRCH;
+}
+
+/* Explicitly invalid for the batch op. Not that we implement it anyway. */
+if (xatpb.space == XENMAPSPACE_gmfn_range) {
+return -EINVAL;
+}
+
+while (xatpb.size--) {
+unsigned long idx = 0;
+unsigned long gpfn = 0;
+int err;
+
+/* For 32-bit compat this only copies the low 32 bits of each */
+if (kvm_copy_from_gva(cs, idxs_gva, &idx, op_sz) ||
+kvm_copy_from_gva(cs, gpfns_gva, &gpfn, op_sz)) {
+return -EFAULT;
+}
+idxs_gva += op_sz;
+gpfns_gva += op_sz;
+
+err = add_to_physmap_one(xatpb.space, idx, gpfn);
+
+if (kvm_copy_to_gva(cs, errs_gva, &err, sizeof(err))) {
+return -EFAULT;
+}
+errs_gva += sizeof(err);
+}
+return 0;
+}
+
 static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
int cmd, uint64_t arg)
 {
@@ -272,6 +337,10 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 err = do_add_to_physmap(exit, cpu, arg);
 break;
 
+case XENMEM_add_to_physmap_batch:
+err = do_add_to_physmap_batch(exit, cpu, arg);
+break;
+
 default:
 return false;
 }
-- 
2.39.0
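
The 32-bit compat handling in do_add_to_physmap_batch() above relies on
reading each array element with a width of op_sz into a variable that was
zeroed beforehand. A minimal sketch of that idea, outside QEMU, might look
like the following. The helper name read_elem() is hypothetical, it reads
from an ordinary host buffer rather than guest memory, and it assumes a
little-endian host such as x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): fetch element `index` from a
 * packed array whose elements are `op_sz` bytes wide. The destination is
 * zeroed first, so a 4-byte read leaves the upper 32 bits clear.
 * Assumes a little-endian host, as on x86.
 */
static uint64_t read_elem(const void *array, size_t index, size_t op_sz)
{
    uint64_t val = 0;

    memcpy(&val, (const uint8_t *)array + index * op_sz, op_sz);
    return val;
}
```

With op_sz set to 4 the same loop can walk an array supplied by a 32-bit
guest, and with op_sz set to 8 it walks the native 64-bit layout.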

[PATCH v11 04/59] i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.
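
As a rough illustration of how a kvm_xen_has_cap()-style macro can be used,
the sketch below pastes a short capability name onto a common prefix and
tests the resulting bit. All DEMO_/demo_ names and the flag values are
stand-ins chosen for this example; they are not the real KVM definitions.

```c
#include <stdint.h>

/* Stand-in flag names and values, for illustration only. */
#define DEMO_CONFIG_HYPERCALL_MSR   (1u << 0)
#define DEMO_CONFIG_SHARED_INFO     (1u << 2)

/* In real code this would be filled in once, when support is initialised. */
static uint32_t demo_caps;

static uint32_t demo_get_caps(void)
{
    return demo_caps;
}

/* Paste the short name onto the common prefix, then test that bit. */
#define demo_has_cap(cap) (!!(demo_get_caps() & DEMO_CONFIG_ ## cap))
```

The double negation normalises the result to 0 or 1, so callers can compare
or store it without caring which bit position the flag occupies.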

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 accel/kvm/kvm-all.c |  1 +
 include/sysemu/kvm_int.h|  2 ++
 include/sysemu/kvm_xen.h| 20 +
 target/i386/kvm/kvm.c   | 59 +
 target/i386/kvm/meson.build |  2 ++
 target/i386/kvm/xen-emu.c   | 58 
 target/i386/kvm/xen-emu.h   | 19 
 7 files changed, 161 insertions(+)
 create mode 100644 include/sysemu/kvm_xen.h
 create mode 100644 target/i386/kvm/xen-emu.c
 create mode 100644 target/i386/kvm/xen-emu.h

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..f242e36316 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3703,6 +3703,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->kvm_dirty_ring_size = 0;
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
+s->xen_version = 0;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 60b520a13e..7f945bc763 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -118,6 +118,8 @@ struct KVMState
 struct KVMDirtyRingReaper reaper;
 NotifyVmexitOption notify_vmexit;
 uint32_t notify_window;
+uint32_t xen_version;
+uint32_t xen_caps;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
new file mode 100644
index 0000000000..296533f2d5
--- /dev/null
+++ b/include/sysemu/kvm_xen.h
@@ -0,0 +1,20 @@
+/*
+ * Xen HVM emulation support in KVM
+ *
+ * Copyright © 2019 Oracle and/or its affiliates. All rights reserved.
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_SYSEMU_KVM_XEN_H
+#define QEMU_SYSEMU_KVM_XEN_H
+
+uint32_t kvm_xen_get_caps(void);
+
+#define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
+ KVM_XEN_HVM_CONFIG_ ## cap))
+
+#endif /* QEMU_SYSEMU_KVM_XEN_H */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5870301991..aa6eac7cad 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -31,6 +31,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -42,6 +43,7 @@
 #include "qemu/error-report.h"
 #include "qemu/memalign.h"
 #include "hw/i386/x86.h"
+#include "hw/i386/pc.h"
 #include "hw/i386/apic.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
@@ -49,6 +51,8 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/e820_memory_layout.h"
 
+#include "hw/xen/xen.h"
+
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
@@ -2514,6 +2518,22 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+if (s->xen_version) {
+#ifdef CONFIG_XEN_EMU
+if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
+error_report("kvm: Xen support only available in PC machine");
+return -ENOTSUP;
+}
+ret = kvm_xen_init(s);
+if (ret < 0) {
+return ret;
+}
+#else
+error_report("kvm: Xen support not enabled in qemu");
+return -ENOTSUP;
+#endif
+}
+
 ret = kvm_get_supported_msrs(s);
 if (ret < 0) {
 return ret;
@@ -5704,6 +5724,36 @@ static void kvm_arch_set_notify_window(Object *obj, Visitor *v,
 s->notify_window = value;
 }
 
+static void kvm_arch_get_xen_version(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint32_t value = s->xen_version;
+
+visit_type_uint32(v, name, &value, errp);
+}
+
+static void kvm_arch_set_xen_version(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint32_t value;
+
+visit_type_uint32(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->xen_version = value;
+if (value && xen_mode == XEN_DISABLED) {
+xen_mode = XEN_EMULATE;
+}
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@

[PATCH v11 28/59] i386/xen: Add support for Xen event channel delivery to vCPU

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 include/sysemu/kvm_xen.h  |  2 +
 target/i386/cpu.h |  2 +
 target/i386/kvm/xen-emu.c | 91 ---
 3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 0c3a273549..0c0efbe699 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -21,6 +21,8 @@
 
 int kvm_xen_soft_reset(void);
 uint32_t kvm_xen_get_caps(void);
+void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
+void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 938a1b9c8b..c9b12e7476 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1788,6 +1788,8 @@ typedef struct CPUArchState {
 #endif
 #if defined(CONFIG_KVM)
 struct kvm_nested_state *nested_state;
+MemoryRegion *xen_vcpu_info_mr;
+void *xen_vcpu_info_hva;
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index e80de809fc..4513f07c68 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -21,6 +21,8 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 
+#include "hw/pci/msi.h"
+#include "hw/i386/apic-msidef.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
 
@@ -248,6 +250,40 @@ static void do_set_vcpu_callback_vector(CPUState *cs, run_on_cpu_data data)
 }
 }
 
+static int set_vcpu_info(CPUState *cs, uint64_t gpa)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+MemoryRegionSection mrs = { .mr = NULL };
+void *vcpu_info_hva = NULL;
+int ret;
+
+ret = kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, gpa);
+if (ret || gpa == INVALID_GPA) {
+goto out;
+}
+
+mrs = memory_region_find(get_system_memory(), gpa,
+ sizeof(struct vcpu_info));
+if (!mrs.mr) {
+ret = -EINVAL;
+} else if (!mrs.mr->ram_block || mrs.size < sizeof(struct vcpu_info) ||
+   !(vcpu_info_hva = qemu_map_ram_ptr(mrs.mr->ram_block,
+  mrs.offset_within_region))) {
+ret = -EINVAL;
+memory_region_unref(mrs.mr);
+mrs.mr = NULL;
+}
+
+ out:
+if (env->xen_vcpu_info_mr) {
+memory_region_unref(env->xen_vcpu_info_mr);
+}
+env->xen_vcpu_info_hva = vcpu_info_hva;
+env->xen_vcpu_info_mr = mrs.mr;
+return ret;
+}
+
 static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -257,8 +293,7 @@ static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
 
 /* Changing the default does nothing if a vcpu_info was explicitly set. */
 if (env->xen_vcpu_info_gpa == INVALID_GPA) {
-kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
-  env->xen_vcpu_info_default_gpa);
+set_vcpu_info(cs, env->xen_vcpu_info_default_gpa);
 }
 }
 
@@ -269,8 +304,52 @@ static void do_set_vcpu_info_gpa(CPUState *cs, run_on_cpu_data data)
 
 env->xen_vcpu_info_gpa = data.host_ulong;
 
-kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
-  env->xen_vcpu_info_gpa);
+set_vcpu_info(cs, env->xen_vcpu_info_gpa);
+}
+
+void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id)
+{
+CPUState *cs = qemu_get_cpu(vcpu_id);
+if (!cs) {
+return NULL;
+}
+
+return X86_CPU(cs)->env.xen_vcpu_info_hva;
+}
+
+void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type)
+{
+CPUState *cs = qemu_get_cpu(vcpu_id);
+uint8_t vector;
+
+if (!cs) {
+return;
+}
+
+vector = X86_CPU(cs)->env.xen_vcpu_callback_vector;
+if (vector) {
+/*
+ * The per-vCPU callback vector injected via lapic. Just
+ * deliver it as an MSI.
+ */
+MSIMessage msg = {
+.address = APIC_DEFAULT_ADDRESS | X86_CPU(cs)->apic_id,
+.data = vector | (1UL << MSI_DATA_LEVEL_SHIFT),
+};
+kvm_irqchip_send_msi(kvm_state, msg);
+return;
+}
+
+switch (type) {
+case HVM_PARAM_CALLBACK_TYPE_VECTOR:
+/*
+ * If the evtchn_upcall_pending field in the vcpu_info is set, then
+ * KVM will automatically deliver the

[PATCH v11 32/59] hw/xen: Implement EVTCHNOP_bind_virq

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).
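
The rule that global VIRQs start out on vCPU 0 can be sketched in isolation
as below. This is only an illustration: the demo_ names are hypothetical,
the VIRQ numbers are assumed to match Xen's public headers and should be
checked against them, and the real code returns errno values rather than -1.

```c
#include <stdbool.h>
#include <stdint.h>

/* VIRQ numbers assumed to match Xen's public xen.h; verify before reuse. */
#define DEMO_VIRQ_TIMER     0
#define DEMO_VIRQ_DEBUG     1
#define DEMO_VIRQ_CONSOLE   2
#define DEMO_VIRQ_XENOPROF  7
#define DEMO_VIRQ_XENPMU    13

/* Per-vCPU VIRQs are listed explicitly; everything else is global. */
static bool demo_virq_is_global(uint32_t virq)
{
    switch (virq) {
    case DEMO_VIRQ_TIMER:
    case DEMO_VIRQ_DEBUG:
    case DEMO_VIRQ_XENOPROF:
    case DEMO_VIRQ_XENPMU:
        return false;
    default:
        return true;
    }
}

/* Returns 0 if the bind request is acceptable, -1 otherwise. */
static int demo_check_bind(uint32_t virq, uint32_t vcpu)
{
    if (demo_virq_is_global(virq) && vcpu != 0) {
        return -1;
    }
    return 0;
}
```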

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 85 
 hw/i386/kvm/xen_evtchn.h  |  2 +
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/cpu.h |  4 ++
 target/i386/kvm/xen-emu.c | 91 +++
 target/i386/machine.c |  2 +
 6 files changed, 185 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index deea7de027..da2f5711dd 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -244,6 +244,11 @@ static bool valid_port(evtchn_port_t port)
 }
 }
 
+static bool valid_vcpu(uint32_t vcpu)
+{
+return !!qemu_get_cpu(vcpu);
+}
+
 int xen_evtchn_status_op(struct evtchn_status *status)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -494,6 +499,43 @@ static void free_port(XenEvtchnState *s, evtchn_port_t port)
 clear_port_pending(s, port);
 }
 
+static int allocate_port(XenEvtchnState *s, uint32_t vcpu, uint16_t type,
+ uint16_t val, evtchn_port_t *port)
+{
+evtchn_port_t p = 1;
+
+for (p = 1; valid_port(p); p++) {
+if (s->port_table[p].type == EVTCHNSTAT_closed) {
+s->port_table[p].vcpu = vcpu;
+s->port_table[p].type = type;
+s->port_table[p].type_val = val;
+
+*port = p;
+
+if (s->nr_ports < p + 1) {
+s->nr_ports = p + 1;
+}
+
+return 0;
+}
+}
+return -ENOSPC;
+}
+
+static bool virq_is_global(uint32_t virq)
+{
+switch (virq) {
+case VIRQ_TIMER:
+case VIRQ_DEBUG:
+case VIRQ_XENOPROF:
+case VIRQ_XENPMU:
+return false;
+
+default:
+return true;
+}
+}
+
 static int close_port(XenEvtchnState *s, evtchn_port_t port)
 {
 XenEvtchnPort *p = &s->port_table[port];
@@ -502,6 +544,11 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 case EVTCHNSTAT_closed:
 return -ENOENT;
 
+case EVTCHNSTAT_virq:
+kvm_xen_set_vcpu_virq(virq_is_global(p->type_val) ? 0 : p->vcpu,
+  p->type_val, 0);
+break;
+
 default:
 break;
 }
@@ -553,3 +600,41 @@ int xen_evtchn_unmask_op(struct evtchn_unmask *unmask)
 
 return ret;
 }
+
+int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (virq->virq >= NR_VIRQS) {
+return -EINVAL;
+}
+
+/* Global VIRQ must be allocated on vCPU0 first */
+if (virq_is_global(virq->virq) && virq->vcpu != 0) {
+return -EINVAL;
+}
+
+if (!valid_vcpu(virq->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, virq->vcpu, EVTCHNSTAT_virq, virq->virq,
+&virq->port);
+if (!ret) {
+ret = kvm_xen_set_vcpu_virq(virq->vcpu, virq->virq, virq->port);
+if (ret) {
+free_port(s, virq->port);
+}
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 69c6b0d743..0ea13dda3a 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -18,8 +18,10 @@ int xen_evtchn_set_callback_param(uint64_t param);
 struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
+struct evtchn_bind_virq;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
+int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 0c0efbe699..297630cd87 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -23,6 +23,7 @@ int kvm_xen_soft_reset(void);
 uint32_t kvm_xen_get_caps(void);
 void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
+int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c9b12e7476..dba8732fc6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -27,6 +27,8 @@
 #include "qapi/qapi-types-common.h"
 #include "qemu/cpu-float.h"
 
+#define

[PATCH v11 05/59] i386/kvm: handle Xen HVM cpuid leaves

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.
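
For readers unfamiliar with how a hypervisor signature leaf is laid out, the
sketch below shows one way the 12-byte string "XenVMMXenVMM" can be split
across EBX, ECX and EDX. The struct and function names are hypothetical and
exist only for this example; they are not the KVM structures used in the
patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register set, used only for this illustration. */
struct demo_cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Split a 12-byte hypervisor signature across EBX, ECX and EDX. */
static void demo_fill_signature(struct demo_cpuid_regs *r, const char *sig)
{
    uint32_t words[3];

    /* Copy raw bytes, so the guest sees them in the same order. */
    memcpy(words, sig, sizeof(words));
    r->ebx = words[0];
    r->ecx = words[1];
    r->edx = words[2];
}
```

A guest reading the three registers back to back, in that order, recovers
the original string and can identify the hypervisor from it.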

Signed-off-by: Joao Martins 
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.c |  1 +
 target/i386/cpu.h |  2 +
 target/i386/kvm/kvm.c | 77 ++-
 target/i386/kvm/xen-emu.c |  4 +-
 target/i386/kvm/xen-emu.h | 13 ++-
 5 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 4d2b8d0444..eb5a466d4e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7070,6 +7070,7 @@ static Property x86_cpu_properties[] = {
  * own cache information (see x86_cpu_load_def()).
  */
 DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, true),
+DEFINE_PROP_BOOL("xen-vapic", X86CPU, xen_vapic, false),
 
 /*
  * From "Requirements for Implementing the Microsoft
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4bc19577a..c6c57baed5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1964,6 +1964,8 @@ struct ArchCPU {
 int32_t thread_id;
 
 int32_t hv_max_vps;
+
+bool xen_vapic;
 };
 
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index aa6eac7cad..2b3daabf7b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -22,6 +22,7 @@
 
 #include 
 #include "standard-headers/asm-x86/kvm_para.h"
+#include "hw/xen/interface/arch-x86/cpuid.h"
 
 #include "cpu.h"
 #include "host-cpu.h"
@@ -1804,7 +1805,77 @@ int kvm_arch_init_vcpu(CPUState *cs)
 has_msr_hv_hypercall = true;
 }
 
-if (cpu->expose_kvm) {
+if (cs->kvm_state->xen_version) {
+#ifdef CONFIG_XEN_EMU
+struct kvm_cpuid_entry2 *xen_max_leaf;
+
+memcpy(signature, "XenVMMXenVMM", 12);
+
+xen_max_leaf = c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_SIGNATURE;
+c->eax = kvm_base + XEN_CPUID_TIME;
+c->ebx = signature[0];
+c->ecx = signature[1];
+c->edx = signature[2];
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_VENDOR;
+c->eax = cs->kvm_state->xen_version;
+c->ebx = 0;
+c->ecx = 0;
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_HVM_MSR;
+/* Number of hypercall-transfer pages */
+c->eax = 1;
+/* Hypercall MSR base address */
+if (hyperv_enabled(cpu)) {
+c->ebx = XEN_HYPERCALL_MSR_HYPERV;
+kvm_xen_init(cs->kvm_state, c->ebx);
+} else {
+c->ebx = XEN_HYPERCALL_MSR;
+}
+c->ecx = 0;
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_TIME;
+c->eax = ((!!tsc_is_stable_and_known(env) << 1) |
+(!!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP) << 2));
+/* default=0 (emulate if necessary) */
+c->ebx = 0;
+/* guest tsc frequency */
+c->ecx = env->user_tsc_khz;
+/* guest tsc incarnation (migration count) */
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_HVM;
+xen_max_leaf->eax = kvm_base + XEN_CPUID_HVM;
+if (cs->kvm_state->xen_version >= XEN_VERSION(4, 5)) {
+c->function = kvm_base + XEN_CPUID_HVM;
+
+if (cpu->xen_vapic) {
+c->eax |= XEN_HVM_CPUID_APIC_ACCESS_VIRT;
+c->eax |= XEN_HVM_CPUID_X2APIC_VIRT;
+}
+
+c->eax |= XEN_HVM_CPUID_IOMMU_MAPPINGS;
+
+if (cs->kvm_state->xen_version >= XEN_VERSION(4, 6)) {
+c->eax |= XEN_HVM_CPUID_VCPU_ID_PRESENT;
+c->ebx = cs->cpu_index;
+}
+}
+
+kvm_base += 0x100;
+#else /* CONFIG_XEN_EMU */
+/* This should never happen as kvm_arch_init() would have died first. */
+fprintf(stderr, "Cannot enable Xen CPUID without Xen support\n");
+abort();
+#endif
+} else if (cpu->expose_kvm) {
 memcpy(signature, "KVMKVMKVM\0\0\0", 12);
 c = &cpuid_data.entries[cpuid_i++];
 c->function = KVM_CPUID_SIGNATURE | kvm_base;
@@ -2524,7 +2595,9 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 error_report("kvm: Xen support only available in PC machine");
 return -ENOTSUP;
 }
-ret = kvm_xen_init(s);
+/* hyperv_enabled() doesn't work yet. */
+uint32_t msr = XEN_HYPERCALL_MSR;
+ret = kvm_xen_init(s, msr);
 if (ret < 0) {
 return ret;
 }
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index b556d903aa..34d5bc1bc9 100644
--- a/target/i386/kvm/xen-emu.c
+++

[PATCH v11 59/59] i386/xen: Document Xen HVM emulation

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 docs/system/i386/xen.rst| 76 +
 docs/system/target-i386.rst |  1 +
 2 files changed, 77 insertions(+)
 create mode 100644 docs/system/i386/xen.rst

diff --git a/docs/system/i386/xen.rst b/docs/system/i386/xen.rst
new file mode 100644
index 0000000000..a00523b492
--- /dev/null
+++ b/docs/system/i386/xen.rst
@@ -0,0 +1,76 @@
+Xen HVM guest support
+=====================
+
+
+Description
+-----------
+
+KVM has support for hosting Xen guests, intercepting Xen hypercalls and event
+channel (Xen PV interrupt) delivery. This allows guests which expect to be
+run under Xen to be hosted in QEMU under Linux/KVM instead.
+
+Setup
+-----
+
+Xen mode is enabled by setting the ``xen-version`` property of the KVM
+accelerator, for example for Xen 4.10:
+
+.. parsed-literal::
+
+  |qemu_system| --accel kvm,xen-version=0x4000a
+
+Additionally, virtual APIC support can be advertised to the guest through the
+``xen-vapic`` CPU flag:
+
+.. parsed-literal::
+
+  |qemu_system| --accel kvm,xen-version=0x4000a --cpu host,+xen_vapic
+
+When Xen support is enabled, QEMU changes hypervisor identification (CPUID
+0x40000000..0x4000000A) to Xen. The KVM identification and features are not
+advertised to a Xen guest. If Hyper-V is also enabled, the Xen identification
+moves to leaves 0x40000100..0x4000010A.
+
+The Xen platform device is enabled automatically for a Xen guest. This allows
+a guest to unplug all emulated devices, in order to use Xen PV block and network
+drivers instead. Note that until the Xen PV device back ends are enabled to work
+with Xen mode in QEMU, that is unlikely to cause significant joy. Linux guests
+can be dissuaded from this by adding 'xen_emul_unplug=never' on their command
+line, and it can also be noted that AHCI disk controllers are exempt from being
+unplugged, as are passthrough VFIO PCI devices.
+
+Properties
+----------
+
+The following properties exist on the KVM accelerator object:
+
+``xen-version``
+  This property contains the Xen version in ``XENVER_version`` form, with the
+  major version in the top 16 bits and the minor version in the low 16 bits.
+  Setting this property enables the Xen guest support.
+
+``xen-evtchn-max-pirq``
+  Xen PIRQs represent an emulated physical interrupt, either GSI or MSI, which
+  can be routed to an event channel instead of to the emulated I/O or local
+  APIC. By default, QEMU permits only 256 PIRQs because this allows maximum
+  compatibility with 32-bit MSI where the higher bits of the PIRQ# would need
+  to be in the upper 64 bits of the MSI message. For guests with large numbers
+  of PCI devices (and none which are limited to 32-bit addressing) it may be
+  desirable to increase this value.
+
+``xen-gnttab-max-frames``
+  Xen grant tables are the means by which a Xen guest grants access to its
+  memory for PV back ends (disk, network, etc.). Since QEMU only supports v1
+  grant tables which are 8 bytes in size, each page (each frame) of the grant
+  table can reference 512 pages of guest memory. The default number of frames
+  is 64, allowing for 32768 pages of guest memory to be accessed by PV backends
+  through simultaneous grants. For guests with large numbers of PV devices and
+  high throughput, it may be desirable to increase this value.
+
+OS requirements
+---------------
+
+The minimal Xen support in the KVM accelerator requires the host to be running
+Linux v5.12 or newer. Later versions add optimisations: Linux v5.17 added
+acceleration of interrupt delivery via the Xen PIRQ mechanism, and Linux v5.19
+accelerated Xen PV timers and inter-processor interrupts (IPIs).
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
index e64c013077..77c2f3b979 100644
--- a/docs/system/target-i386.rst
+++ b/docs/system/target-i386.rst
@@ -27,6 +27,7 @@ Architectural features
 
i386/cpu
i386/hyperv
+   i386/xen
i386/kvm-pv
i386/sgx
i386/amd-memory-encryption
-- 
2.39.0
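
The two pieces of arithmetic in the documentation above (the xen-version
encoding and the grant table capacity) can be checked with a small sketch.
The demo_ helper names are hypothetical, and the 4096-byte page size is an
assumption that matches the "512 pages per frame" figure given in the text.

```c
#include <stdint.h>

/* Major version in the top 16 bits, minor version in the low 16 bits. */
static uint32_t demo_xen_version(uint16_t major, uint16_t minor)
{
    return ((uint32_t)major << 16) | minor;
}

#define DEMO_PAGE_SIZE          4096u   /* assumed x86 page size */
#define DEMO_GRANT_V1_ENTRY_SZ  8u      /* v1 grant entry size, per the doc */

/* Number of guest pages that nr_frames grant-table frames can reference. */
static uint32_t demo_grant_capacity(uint32_t nr_frames)
{
    return nr_frames * (DEMO_PAGE_SIZE / DEMO_GRANT_V1_ENTRY_SZ);
}
```

For Xen 4.10 the encoding gives 0x4000a, the value used in the command lines
above, and the default of 64 frames gives 32768 grantable pages.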

[PATCH v11 56/59] hw/xen: Support GSI mapping to PIRQ

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

If I advertise XENFEAT_hvm_pirqs then a guest now boots successfully as
long as I tell it 'pci=nomsi'.

[root@localhost ~]# cat /proc/interrupts
   CPU0
  0: 52   IO-APIC   2-edge  timer
  1: 16  xen-pirq   1-ioapic-edge  i8042
  4:   1534  xen-pirq   4-ioapic-edge  ttyS0
  8:  1  xen-pirq   8-ioapic-edge  rtc0
  9:  0  xen-pirq   9-ioapic-level  acpi
 11:   5648  xen-pirq  11-ioapic-level  ahci[:00:04.0]
 12:257  xen-pirq  12-ioapic-edge  i8042
...

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c | 56 +++-
 hw/i386/kvm/xen_evtchn.h |  2 ++
 hw/i386/x86.c| 16 
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index f5e835ff70..8df95742a7 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -148,6 +148,9 @@ struct XenEvtchnState {
 /* GSI → PIRQ mapping (serialized) */
 uint16_t gsi_pirq[GSI_NUM_PINS];
 
+/* Per-GSI assertion state (serialized) */
+uint32_t pirq_gsi_set;
+
 /* Per-PIRQ information (rebuilt on migration) */
 struct pirq_info *pirq;
 };
@@ -246,6 +249,7 @@ static const VMStateDescription xen_evtchn_vmstate = {
 VMSTATE_VARRAY_UINT16_ALLOC(pirq_inuse_bitmap, XenEvtchnState,
 nr_pirq_inuse_words, 0,
 vmstate_info_uint64, uint64_t),
+VMSTATE_UINT32(pirq_gsi_set, XenEvtchnState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -1506,6 +1510,51 @@ static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
 return pirq;
 }
 
+bool xen_evtchn_set_gsi(int gsi, int level)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int pirq;
+
+assert(qemu_mutex_iothread_locked());
+
+if (!s || gsi < 0 || gsi > GSI_NUM_PINS) {
+return false;
+}
+
+/*
+ * Check that it *isn't* the event channel GSI, and thus
+ * that we are not recursing and it's safe to take s->port_lock.
+ *
+ * Locking aside, it's perfectly sane to bail out early for that
+ * special case, as it would make no sense for the event channel
+ * GSI to be routed back to event channels, when the delivery
+ * method is to raise the GSI... that recursion wouldn't *just*
+ * be a locking issue.
+ */
+if (gsi && gsi == s->callback_gsi) {
+return false;
+}
+
+QEMU_LOCK_GUARD(&s->port_lock);
+
+pirq = s->gsi_pirq[gsi];
+if (!pirq) {
+return false;
+}
+
+if (level) {
+int port = s->pirq[pirq].port;
+
+s->pirq_gsi_set |= (1U << gsi);
+if (port) {
+set_port_pending(s, port);
+}
+} else {
+s->pirq_gsi_set &= ~(1U << gsi);
+}
+return true;
+}
+
 int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -1612,8 +1661,13 @@ int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
 if (gsi < 0) {
 return -EINVAL;
 }
+if (s->pirq_gsi_set & (1U << gsi)) {
+int port = s->pirq[pirq].port;
+if (port) {
+set_port_pending(s, port);
+}
+}
 
-// XX: Reassert a level IRQ if needed */
 return 0;
 }
 
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index a7383f760c..95400b7fbf 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -24,6 +24,8 @@ void xen_evtchn_set_callback_level(int level);
 
 int xen_evtchn_set_port(uint16_t port);
 
+bool xen_evtchn_set_gsi(int gsi, int level);
+
 /*
  * These functions mirror the libxenevtchn library API, providing the QEMU
  * backend side of "interdomain" event channels.
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index eaff4227bd..594fd25c55 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -62,6 +62,11 @@
 #include CONFIG_DEVICES
 #include "kvm/kvm_i386.h"
 
+#ifdef CONFIG_XEN_EMU
+#include "hw/xen/xen.h"
+#include "hw/i386/kvm/xen_evtchn.h"
+#endif
+
 /* Physical Address of PVH entry point read from kernel ELF NOTE */
 static size_t pvh_start_addr;
 
@@ -609,6 +614,17 @@ void gsi_handler(void *opaque, int n, int level)
 }
 /* fall through */
 case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
+#ifdef CONFIG_XEN_EMU
+/*
+ * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
+ * routing actually works properly under Xen). And then to
+ * *either* the PIRQ handling or the I/OAPIC depending on
+ * whether the former wants it.
+ */
+if (xen_mode == XEN_EMULATE && xen_evtchn_set_gsi(n, level)) {
+break;
+}
+#endif
 qemu_set_irq(s->ioapic_irq[n], level);
 break;
 case IO_APIC_SECONDARY_IRQBASE
-- 
2.39.0
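
The level tracking added by this patch boils down to one bit per GSI that is
set while the line is asserted and consulted again on EOI. A self-contained
sketch of that bookkeeping follows. The demo_ names and the pin count of 24
are hypothetical, and the sketch deliberately rejects gsi equal to the pin
count, since that index would be one past the end of a table of that size.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_PINS 24   /* hypothetical pin count, must be <= 32 */

/* One bit per GSI: set while the line is asserted. */
static uint32_t demo_gsi_set;

/* Record the new level. Returns false if the GSI is out of range. */
static bool demo_set_gsi(int gsi, int level)
{
    if (gsi < 0 || gsi >= DEMO_NUM_PINS) {
        return false;
    }
    if (level) {
        demo_gsi_set |= (1U << gsi);
    } else {
        demo_gsi_set &= ~(1U << gsi);
    }
    return true;
}

/* On EOI: a line that is still asserted must be delivered again. */
static bool demo_eoi_needs_reassert(int gsi)
{
    if (gsi < 0 || gsi >= DEMO_NUM_PINS) {
        return false;
    }
    return (demo_gsi_set & (1U << gsi)) != 0;
}
```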

[PATCH v11 53/59] hw/xen: Automatically add xen-platform PCI device for emulated Xen guests

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

It isn't strictly mandatory but Linux guests at least will only map
their grant tables over the dummy BAR that it provides, and don't have
sufficient wit to map them in any other unused part of their guest
address space. So include it by default for minimal surprise factor.

As I come to document "how to run a Xen guest in QEMU", this means one
fewer thing to tell the user about, according to the mantra of "if it
needs documenting, fix it first, then document what remains".

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/pc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index a12a7a67e9..5ec3518b9e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1313,6 +1313,9 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 #ifdef CONFIG_XEN_EMU
 if (xen_mode == XEN_EMULATE) {
 xen_evtchn_connect_gsis(gsi);
+if (pcms->bus) {
+pci_create_simple(pcms->bus, -1, "xen-platform");
+}
 }
 #endif
 
-- 
2.39.0

[PATCH v11 24/59] i386/xen: implement HYPERVISOR_event_channel_op

2023-02-15 Thread David Woodhouse

From: Joao Martins 

Signed-off-by: Joao Martins 
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index f5c8b6d20c..0bca370ea4 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -28,6 +28,7 @@
 #include "hw/xen/interface/memory.h"
 #include "hw/xen/interface/hvm/hvm_op.h"
 #include "hw/xen/interface/vcpu.h"
+#include "hw/xen/interface/event_channel.h"
 
 #include "xen-compat.h"
 
@@ -588,6 +589,27 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit,
+int cmd, uint64_t arg)
+{
+int err = -ENOSYS;
+
+switch (cmd) {
+case EVTCHNOP_init_control:
+case EVTCHNOP_expand_array:
+case EVTCHNOP_set_priority:
+/* We do not support FIFO channels at this point */
+err = -ENOSYS;
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 int kvm_xen_soft_reset(void)
 {
 CPUState *cpu;
@@ -694,6 +716,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
+case __HYPERVISOR_event_channel_op:
+return kvm_xen_hcall_evtchn_op(exit, exit->u.hcall.params[0],
+   exit->u.hcall.params[1]);
 case __HYPERVISOR_vcpu_op:
 return kvm_xen_hcall_vcpu_op(exit, cpu,
  exit->u.hcall.params[0],
-- 
2.39.0
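
The handler above distinguishes two outcomes: a hypercall that is recognised
but refused (result set to -ENOSYS, return true) and one that is not handled
at all (return false, leaving the caller to decide). A minimal sketch of that
convention follows. The struct, function and command numbers are made up for
the example and do not correspond to the Xen ABI.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up command numbers, for illustration only. */
enum {
    DEMO_OP_INIT_CONTROL = 11,
    DEMO_OP_UNKNOWN = 99,
};

/* Hypothetical stand-in for the exit structure carrying the result. */
struct demo_exit {
    int64_t result;
};

/* Returns true if the command was handled and ex->result was written. */
static bool demo_evtchn_op(struct demo_exit *ex, int cmd)
{
    switch (cmd) {
    case DEMO_OP_INIT_CONTROL:
        /* Recognised, but deliberately unsupported. */
        ex->result = -ENOSYS;
        return true;
    default:
        /* Not ours: leave the result untouched. */
        return false;
    }
}
```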

[PATCH v11 50/59] hw/xen: Add backend implementation of interdomain event channel support

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

This provides the QEMU side of interdomain event channels, allowing events
to be sent to/from the guest.

The API mirrors libxenevtchn, and in time both this and the real Xen one
will be available through ops structures so that the PV backend drivers
can use the correct one as appropriate.

For now, this implementation can be used directly by our XenStore which
will be for emulated mode only.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c | 340 ++-
 hw/i386/kvm/xen_evtchn.h |  19 +++
 2 files changed, 352 insertions(+), 7 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 06572b3e10..519b8e0600 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -38,6 +38,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_xen.h"
 #include 
+#include 
 
 #include "hw/xen/interface/memory.h"
 #include "hw/xen/interface/hvm/params.h"
@@ -88,6 +89,13 @@ struct compat_shared_info {
 
 #define COMPAT_EVTCHN_2L_NR_CHANNELS1024
 
+/* Local private implementation of struct xenevtchn_handle */
+struct xenevtchn_handle {
+evtchn_port_t be_port;
+evtchn_port_t guest_port; /* Or zero for unbound */
+int fd;
+};
+
 /*
  * For unbound/interdomain ports there are only two possible remote
  * domains; self and QEMU. Use a single high bit in type_val for that,
@@ -111,6 +119,8 @@ struct XenEvtchnState {
 uint32_t nr_ports;
 XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
 qemu_irq gsis[GSI_NUM_PINS];
+
+struct xenevtchn_handle *be_handles[EVTCHN_2L_NR_CHANNELS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -118,6 +128,18 @@ struct XenEvtchnState *xen_evtchn_singleton;
 /* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
 #define CALLBACK_VIA_TYPE_SHIFT 56
 
+static void unbind_backend_ports(XenEvtchnState *s);
+
+static int xen_evtchn_pre_load(void *opaque)
+{
+XenEvtchnState *s = opaque;
+
+/* Unbind all the backend-side ports; they need to rebind */
+unbind_backend_ports(s);
+
+return 0;
+}
+
 static int xen_evtchn_post_load(void *opaque, int version_id)
 {
 XenEvtchnState *s = opaque;
@@ -151,6 +173,7 @@ static const VMStateDescription xen_evtchn_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_evtchn_is_needed,
+.pre_load = xen_evtchn_pre_load,
 .post_load = xen_evtchn_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(callback_param, XenEvtchnState),
@@ -423,6 +446,20 @@ static int assign_kernel_port(uint16_t type, evtchn_port_t port,
 return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
 }
 
+static int assign_kernel_eventfd(uint16_t type, evtchn_port_t port, int fd)
+{
+struct kvm_xen_hvm_attr ha;
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.type = type;
+ha.u.evtchn.flags = 0;
+ha.u.evtchn.deliver.eventfd.port = 0;
+ha.u.evtchn.deliver.eventfd.fd = fd;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -441,6 +478,32 @@ static bool valid_vcpu(uint32_t vcpu)
 return !!qemu_get_cpu(vcpu);
 }
 
+static void unbind_backend_ports(XenEvtchnState *s)
+{
+XenEvtchnPort *p;
+int i;
+
+for (i = 1; i < s->nr_ports; i++) {
+p = &s->port_table[i];
+if (p->type == EVTCHNSTAT_interdomain &&
+(p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) ) {
+evtchn_port_t be_port = p->type_val & PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
+
+if (s->be_handles[be_port]) {
+/* This part will be overwritten on the load anyway. */
+p->type = EVTCHNSTAT_unbound;
+p->type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+
+/* Leave the backend port open and unbound too. */
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+deassign_kernel_port(i);
+}
+s->be_handles[be_port]->guest_port = 0;
+}
+}
+}
+}
+
 int xen_evtchn_status_op(struct evtchn_status *status)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -876,7 +939,14 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 
 case EVTCHNSTAT_interdomain:
 if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
-/* Not yet implemented. This can't happen! */
+uint16_t be_port = p->type_val & ~PORT_INFO_TYPEVAL_REMOTE_QEMU;
+struct xenevtchn_handle *xc = s->be_handles[be_port];
+if (xc) {
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+deassign_kernel_port(port);
+}
+xc->guest_port = 0;
+}
 } else {
 /* Loopback interdomain */
 XenEvtchnPort *rp = &s->port_table[p->type_val];
@@ -1108,8 +1178,27 @@ int
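
For illustration, the type_val encoding that unbind_backend_ports() and close_port() rely on (one high bit marks "remote end is QEMU", the remaining bits carry the backend port) can be sketched as below. This is a minimal sketch: the SKETCH_* values and helper names are assumptions, not the definitions of PORT_INFO_TYPEVAL_REMOTE_QEMU and PORT_INFO_TYPEVAL_REMOTE_PORT_MASK in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: top bit flags a QEMU backend, low 15 bits hold the port. */
#define SKETCH_TYPEVAL_REMOTE_QEMU      0x8000u
#define SKETCH_TYPEVAL_REMOTE_PORT_MASK 0x7fffu

/* Build a type_val for a guest port whose remote end is QEMU port be_port. */
static uint16_t sketch_pack_qemu_port(uint16_t be_port)
{
    assert(be_port <= SKETCH_TYPEVAL_REMOTE_PORT_MASK);
    return (uint16_t)(SKETCH_TYPEVAL_REMOTE_QEMU | be_port);
}

/* True when the remote end of the port is QEMU rather than the guest. */
static bool sketch_remote_is_qemu(uint16_t type_val)
{
    return (type_val & SKETCH_TYPEVAL_REMOTE_QEMU) != 0;
}

/* Extract the remote (backend) port number, dropping the flag bit. */
static uint16_t sketch_remote_port(uint16_t type_val)
{
    return (uint16_t)(type_val & SKETCH_TYPEVAL_REMOTE_PORT_MASK);
}
```

Under this layout, a type_val holding only the flag bit decodes to remote port 0, matching the "unbound, reserved for QEMU" state that unbind_backend_ports() writes back.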

[PATCH v11 10/59] i386/xen: implement HYPERVISOR_xen_version

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This is just meant to serve as an example of how we can implement
hypercalls. xen_version specifically, since QEMU does all kinds of
feature control, so handling it here seems appropriate.

Signed-off-by: Joao Martins 
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 86 +++
 1 file changed, 86 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 476f464ee2..56b80a7880 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -14,9 +14,55 @@
 #include "sysemu/kvm_int.h"
 #include "sysemu/kvm_xen.h"
 #include "kvm/kvm_i386.h"
+#include "exec/address-spaces.h"
 #include "xen-emu.h"
 #include "trace.h"
 
+#include "hw/xen/interface/version.h"
+
+static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
+  bool is_write)
+{
+uint8_t *buf = (uint8_t *)_buf;
+int ret;
+
+while (sz) {
+struct kvm_translation tr = {
+.linear_address = gva,
+};
+
+size_t len = TARGET_PAGE_SIZE - (tr.linear_address & ~TARGET_PAGE_MASK);
+if (len > sz) {
+len = sz;
+}
+
+ret = kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr);
+if (ret || !tr.valid || (is_write && !tr.writeable)) {
+return -EFAULT;
+}
+
+cpu_physical_memory_rw(tr.physical_address, buf, len, is_write);
+
+buf += len;
+sz -= len;
+gva += len;
+}
+
+return 0;
+}
+
+static inline int kvm_copy_from_gva(CPUState *cs, uint64_t gva, void *buf,
+size_t sz)
+{
+return kvm_gva_rw(cs, gva, buf, sz, false);
+}
+
+static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
+  size_t sz)
+{
+return kvm_gva_rw(cs, gva, buf, sz, true);
+}
+
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 {
 const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
@@ -87,6 +133,43 @@ uint32_t kvm_xen_get_caps(void)
 return kvm_state->xen_caps;
 }
 
+static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+int err = 0;
+
+switch (cmd) {
+case XENVER_get_features: {
+struct xen_feature_info fi;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(fi) == 8);
+
+err = kvm_copy_from_gva(CPU(cpu), arg, &fi, sizeof(fi));
+if (err) {
+break;
+}
+
+fi.submap = 0;
+if (fi.submap_idx == 0) {
+fi.submap |= 1 << XENFEAT_writable_page_tables |
+ 1 << XENFEAT_writable_descriptor_tables |
+ 1 << XENFEAT_auto_translated_physmap |
+ 1 << XENFEAT_supervisor_mode_kernel;
+}
+
+err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
+break;
+}
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -97,6 +180,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_xen_version:
+return kvm_xen_hcall_xen_version(exit, cpu, exit->u.hcall.params[0],
+ exit->u.hcall.params[1]);
 default:
 return false;
 }
-- 
2.39.0
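
For illustration, the per-page splitting that kvm_gva_rw() performs (translate one page at a time, copy at most up to the end of that page) can be sketched standalone as below. This is a minimal sketch assuming a 4 KiB page; SKETCH_PAGE_SIZE, sketch_chunk_len() and sketch_count_chunks() are hypothetical names, and the translation and copy steps are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; the patch uses TARGET_PAGE_SIZE / TARGET_PAGE_MASK. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Bytes that can be copied starting at gva without leaving its page. */
static size_t sketch_chunk_len(uint64_t gva, size_t remaining)
{
    size_t len = (size_t)(SKETCH_PAGE_SIZE - (gva & ~SKETCH_PAGE_MASK));

    return len > remaining ? remaining : len;
}

/* Number of per-page chunks (and hence translations) a copy would need. */
static unsigned int sketch_count_chunks(uint64_t gva, size_t sz)
{
    unsigned int chunks = 0;

    while (sz) {
        size_t len = sketch_chunk_len(gva, sz);

        /* A real implementation would translate gva and copy len bytes. */
        gva += len;
        sz -= len;
        chunks++;
    }
    return chunks;
}
```

A 100-byte access starting 16 bytes before a page boundary is split into a 16-byte chunk and an 84-byte chunk, each needing its own translation because consecutive virtual pages need not be physically contiguous.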

[PATCH v11 19/59] i386/xen: implement HYPERVISOR_hvm_op

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This is for when the guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 2b235e7b27..4002b1b797 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -26,6 +26,7 @@
 #include "hw/xen/interface/version.h"
 #include "hw/xen/interface/sched.h"
 #include "hw/xen/interface/memory.h"
+#include "hw/xen/interface/hvm/hvm_op.h"
 
 #include "xen-compat.h"
 
@@ -349,6 +350,19 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+switch (cmd) {
+case HVMOP_pagetable_dying:
+exit->u.hcall.result = -ENOSYS;
+return true;
+
+default:
+return false;
+}
+}
+
 int kvm_xen_soft_reset(void)
 {
 int err;
@@ -450,6 +464,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
+case __HYPERVISOR_hvm_op:
+return kvm_xen_hcall_hvm_op(exit, cpu, exit->u.hcall.params[0],
+exit->u.hcall.params[1]);
 case __HYPERVISOR_memory_op:
 return kvm_xen_hcall_memory_op(exit, cpu, exit->u.hcall.params[0],
exit->u.hcall.params[1]);
-- 
2.39.0

[PATCH v11 22/59] i386/xen: handle VCPUOP_register_vcpu_time_info

2023-02-15 Thread David Woodhouse

From: Joao Martins 

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.h |   1 +
 target/i386/kvm/xen-emu.c | 100 +-
 target/i386/machine.c |   1 +
 3 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 109b2e5669..96c2d0d5cb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1790,6 +1790,7 @@ typedef struct CPUArchState {
 struct kvm_nested_state *nested_state;
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
+uint64_t xen_vcpu_time_info_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 1cec8566ec..0b3bd0b889 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -37,28 +37,41 @@
 #define hypercall_compat32(longmode) (false)
 #endif
 
-static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
-  bool is_write)
+static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
+   size_t *len, bool is_write)
 {
-uint8_t *buf = (uint8_t *)_buf;
-int ret;
-
-while (sz) {
 struct kvm_translation tr = {
 .linear_address = gva,
 };
 
-size_t len = TARGET_PAGE_SIZE - (tr.linear_address & ~TARGET_PAGE_MASK);
-if (len > sz) {
-len = sz;
+if (len) {
+*len = TARGET_PAGE_SIZE - (gva & ~TARGET_PAGE_MASK);
+}
+
+if (kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr) || !tr.valid ||
+(is_write && !tr.writeable)) {
+return false;
 }
+*gpa = tr.physical_address;
+return true;
+}
+
+static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
+  bool is_write)
+{
+uint8_t *buf = (uint8_t *)_buf;
+uint64_t gpa;
+size_t len;
 
-ret = kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr);
-if (ret || !tr.valid || (is_write && !tr.writeable)) {
+while (sz) {
+if (!kvm_gva_to_gpa(cs, gva, &gpa, &len, is_write)) {
 return -EFAULT;
 }
+if (len > sz) {
+len = sz;
+}
 
-cpu_physical_memory_rw(tr.physical_address, buf, len, is_write);
+cpu_physical_memory_rw(gpa, buf, len, is_write);
 
 buf += len;
 sz -= len;
@@ -146,6 +159,7 @@ int kvm_xen_init_vcpu(CPUState *cs)
 
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
+env->xen_vcpu_time_info_gpa = INVALID_GPA;
 
 return 0;
 }
@@ -229,6 +243,17 @@ static void do_set_vcpu_info_gpa(CPUState *cs, run_on_cpu_data data)
   env->xen_vcpu_info_gpa);
 }
 
+static void do_set_vcpu_time_info_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_time_info_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
+  env->xen_vcpu_time_info_gpa);
+}
+
 static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -236,8 +261,11 @@ static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
+env->xen_vcpu_time_info_gpa = INVALID_GPA;
 
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
+  INVALID_GPA);
 }
 
 static int xen_set_shared_info(uint64_t gfn)
@@ -453,6 +481,42 @@ static int vcpuop_register_vcpu_info(CPUState *cs, CPUState *target,
 return 0;
 }
 
+static int vcpuop_register_vcpu_time_info(CPUState *cs, CPUState *target,
+  uint64_t arg)
+{
+struct vcpu_register_time_memory_area tma;
+uint64_t gpa;
+size_t len;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(tma) == 8);
+qemu_build_assert(sizeof(struct vcpu_time_info) == 32);
+
+if (!target) {
+return -ENOENT;
+}
+
+if (kvm_copy_from_gva(cs, arg, &tma, sizeof(tma))) {
+return -EFAULT;
+}
+
+/*
+ * Xen actually uses the GVA and does the translation through the guest
+ * page tables each time. But Linux/KVM uses the GPA, on the assumption
+ * that guests only ever use *global* addresses (kernel virtual addresses)
+ * for it. If Linux is changed to redo the GVA→GPA translation each time,
+ * it will offer a new vCPU attribute for that, and we'll use it instead.
+ */
+if (!kvm_gva_to_gpa(cs, tma.addr.p, &gpa, &len, false) ||
+len < sizeof(struct vcpu_time_info)) {
+return -EFAULT;
+}
+
+async_run_on_cpu(target,

[PATCH v11 41/59] hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get to get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_evtchn.c  | 80 ---
 target/i386/kvm/xen-emu.c | 34 +
 2 files changed, 100 insertions(+), 14 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index ecc93da172..5d5996641d 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -28,6 +28,8 @@
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
 #include "hw/i386/x86.h"
+#include "hw/i386/pc.h"
+#include "hw/pci/pci.h"
 #include "hw/irq.h"
 
 #include "xen_evtchn.h"
@@ -101,6 +103,7 @@ struct XenEvtchnState {
 
 uint64_t callback_param;
 bool evtchn_in_kernel;
+uint32_t callback_gsi;
 
 QEMUBH *gsi_bh;
 
@@ -217,11 +220,41 @@ static void xen_evtchn_register_types(void)
 
 type_init(xen_evtchn_register_types)
 
+static int set_callback_pci_intx(XenEvtchnState *s, uint64_t param)
+{
+PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+uint8_t pin = param & 3;
+uint8_t devfn = (param >> 8) & 0xff;
+uint16_t bus = (param >> 16) & 0xffff;
+uint16_t domain = (param >> 32) & 0xffff;
+PCIDevice *pdev;
+PCIINTxRoute r;
+
+if (domain || !pcms) {
+return 0;
+}
+
+pdev = pci_find_device(pcms->bus, bus, devfn);
+if (!pdev) {
+return 0;
+}
+
+r = pci_device_route_intx_to_irq(pdev, pin);
+if (r.mode != PCI_INTX_ENABLED) {
+return 0;
+}
+
+/*
+ * Hm, can we be notified of INTX routing changes? Not without
+ * *owning* the device and being allowed to overwrite its own
+ * ->intx_routing_notifier, AFAICT. So let's not.
+ */
+return r.irq;
+}
+
 void xen_evtchn_set_callback_level(int level)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
-uint32_t param;
-
 if (!s) {
 return;
 }
@@ -260,18 +293,12 @@ void xen_evtchn_set_callback_level(int level)
 return;
 }
 
-param = (uint32_t)s->callback_param;
-
-switch (s->callback_param >> CALLBACK_VIA_TYPE_SHIFT) {
-case HVM_PARAM_CALLBACK_TYPE_GSI:
-if (param < GSI_NUM_PINS) {
-qemu_set_irq(s->gsis[param], level);
-if (level) {
-/* Ensure the vCPU polls for deassertion */
-kvm_xen_set_callback_asserted();
-}
+if (s->callback_gsi && s->callback_gsi < GSI_NUM_PINS) {
+qemu_set_irq(s->gsis[s->callback_gsi], level);
+if (level) {
+/* Ensure the vCPU polls for deassertion */
+kvm_xen_set_callback_asserted();
 }
-break;
 }
 }
 
@@ -283,15 +310,22 @@ int xen_evtchn_set_callback_param(uint64_t param)
 .u.vector = 0,
 };
 bool in_kernel = false;
+uint32_t gsi = 0;
+int type = param >> CALLBACK_VIA_TYPE_SHIFT;
 int ret;
 
 if (!s) {
 return -ENOTSUP;
 }
 
+/*
+ * We need the BQL because set_callback_pci_intx() may call into PCI code,
+ * and because we may need to manipulate the old and new GSI levels.
+ */
+assert(qemu_mutex_iothread_locked());
 qemu_mutex_lock(&s->port_lock);
 
-switch (param >> CALLBACK_VIA_TYPE_SHIFT) {
+switch (type) {
 case HVM_PARAM_CALLBACK_TYPE_VECTOR: {
 xa.u.vector = (uint8_t)param,
 
@@ -299,10 +333,17 @@ int xen_evtchn_set_callback_param(uint64_t param)
 if (!ret && kvm_xen_has_cap(EVTCHN_SEND)) {
 in_kernel = true;
 }
+gsi = 0;
 break;
 }
 
+case HVM_PARAM_CALLBACK_TYPE_PCI_INTX:
+gsi = set_callback_pci_intx(s, param);
+ret = gsi ? 0 : -EINVAL;
+break;
+
 case HVM_PARAM_CALLBACK_TYPE_GSI:
+gsi = (uint32_t)param;
 ret = 0;
 break;
 
@@ -320,6 +361,17 @@ int xen_evtchn_set_callback_param(uint64_t param)
 }
 s->callback_param = param;
 s->evtchn_in_kernel = in_kernel;
+
+if (gsi != s->callback_gsi) {
+struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
+
+xen_evtchn_set_callback_level(0);
+s->callback_gsi = gsi;
+
+if (gsi && vi && vi->evtchn_upcall_pending) {
+kvm_xen_inject_vcpu_callback_vector(0, type);
+

[PATCH v11 11/59] i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

2023-02-15 Thread David Woodhouse

From: Joao Martins 

This allows the guest to shut itself down via hypercall, for any of 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing the SCHEDOP_shutdown sub-op lets us handle crashes gracefully
rather than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins 
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 include/sysemu/kvm_xen.h |  1 +
 target/i386/kvm/trace-events |  1 +
 target/i386/kvm/xen-emu.c| 75 
 3 files changed, 77 insertions(+)

diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 296533f2d5..5dffcc0542 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -12,6 +12,7 @@
 #ifndef QEMU_SYSEMU_KVM_XEN_H
 #define QEMU_SYSEMU_KVM_XEN_H
 
+int kvm_xen_soft_reset(void);
 uint32_t kvm_xen_get_caps(void);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index cd6f842b1f..bb732e1da8 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -8,3 +8,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
+kvm_xen_soft_reset(void) ""
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 56b80a7880..4ed833656f 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -11,14 +11,17 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/main-loop.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/kvm_xen.h"
 #include "kvm/kvm_i386.h"
 #include "exec/address-spaces.h"
 #include "xen-emu.h"
 #include "trace.h"
+#include "sysemu/runstate.h"
 
 #include "hw/xen/interface/version.h"
+#include "hw/xen/interface/sched.h"
 
 static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
   bool is_write)
@@ -170,6 +173,75 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+int kvm_xen_soft_reset(void)
+{
+assert(qemu_mutex_iothread_locked());
+
+trace_kvm_xen_soft_reset();
+
+/* Nothing to reset... yet. */
+return 0;
+}
+
+static int schedop_shutdown(CPUState *cs, uint64_t arg)
+{
+struct sched_shutdown shutdown;
+int ret = 0;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(shutdown) == 4);
+
+if (kvm_copy_from_gva(cs, arg, &shutdown, sizeof(shutdown))) {
+return -EFAULT;
+}
+
+switch (shutdown.reason) {
+case SHUTDOWN_crash:
+cpu_dump_state(cs, stderr, CPU_DUMP_CODE);
+qemu_system_guest_panicked(NULL);
+break;
+
+case SHUTDOWN_reboot:
+qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+break;
+
+case SHUTDOWN_poweroff:
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+break;
+
+case SHUTDOWN_soft_reset:
+qemu_mutex_lock_iothread();
+ret = kvm_xen_soft_reset();
+qemu_mutex_unlock_iothread();
+break;
+
+default:
+ret = -EINVAL;
+break;
+}
+
+return ret;
+}
+
+static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+   int cmd, uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+int err = -ENOSYS;
+
+switch (cmd) {
+case SCHEDOP_shutdown:
+err = schedop_shutdown(cs, arg);
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -180,6 +252,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_sched_op:
+return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
+  exit->u.hcall.params[1]);
 case __HYPERVISOR_xen_version:
 return kvm_xen_hcall_xen_version(exit, cpu, exit->u.hcall.params[0],
  exit->u.hcall.params[1]);
-- 
2.39.0
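
For illustration, the way schedop_shutdown() maps a guest-supplied reason onto a host-side action, rejecting anything it does not know, can be sketched as below. This is a minimal sketch: the sketch_reason and sketch_action enums are hypothetical and their numeric values are not the SHUTDOWN_* values from Xen's sched.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reason codes supplied by the guest. */
enum sketch_reason {
    SKETCH_REASON_POWEROFF,
    SKETCH_REASON_REBOOT,
    SKETCH_REASON_CRASH,
    SKETCH_REASON_SOFT_RESET,
};

/* Hypothetical host-side actions, standing in for the QEMU requests. */
enum sketch_action {
    SKETCH_ACT_NONE,
    SKETCH_ACT_SHUTDOWN,    /* power the guest off */
    SKETCH_ACT_RESET,       /* full system reset */
    SKETCH_ACT_PANIC,       /* report a guest panic */
    SKETCH_ACT_SOFT_RESET,  /* reset Xen state only, e.g. for kexec */
};

/* Returns 0 and sets *action, or -EINVAL for an unknown reason. */
static int sketch_shutdown(unsigned int reason, enum sketch_action *action)
{
    assert(action != NULL);
    *action = SKETCH_ACT_NONE;

    switch (reason) {
    case SKETCH_REASON_CRASH:
        *action = SKETCH_ACT_PANIC;
        return 0;
    case SKETCH_REASON_REBOOT:
        *action = SKETCH_ACT_RESET;
        return 0;
    case SKETCH_REASON_POWEROFF:
        *action = SKETCH_ACT_SHUTDOWN;
        return 0;
    case SKETCH_REASON_SOFT_RESET:
        *action = SKETCH_ACT_SOFT_RESET;
        return 0;
    default:
        return -EINVAL;
    }
}
```

Only the soft-reset case takes the iothread lock in the patch, since kvm_xen_soft_reset() asserts that it is held.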

[PATCH v11 46/59] hw/xen: Implement GNTTABOP_query_size

2023-02-15 Thread David Woodhouse

From: David Woodhouse 

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/kvm/xen_gnttab.c  | 19 +++
 hw/i386/kvm/xen_gnttab.h  |  2 ++
 target/i386/kvm/xen-emu.c | 16 +++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index b54a94e2bd..1e691ded32 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -211,3 +211,22 @@ int xen_gnttab_get_version_op(struct gnttab_get_version *get)
 get->version = 1;
 return 0;
 }
+
+int xen_gnttab_query_size_op(struct gnttab_query_size *size)
+{
+XenGnttabState *s = xen_gnttab_singleton;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (size->dom != DOMID_SELF && size->dom != xen_domid) {
+size->status = GNTST_bad_domain;
+return 0;
+}
+
+size->status = GNTST_okay;
+size->nr_frames = s->nr_frames;
+size->max_nr_frames = s->max_frames;
+return 0;
+}
diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h
index 79579677ba..3bdbe96191 100644
--- a/hw/i386/kvm/xen_gnttab.h
+++ b/hw/i386/kvm/xen_gnttab.h
@@ -17,7 +17,9 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn);
 
 struct gnttab_set_version;
 struct gnttab_get_version;
+struct gnttab_query_size;
 int xen_gnttab_set_version_op(struct gnttab_set_version *set);
 int xen_gnttab_get_version_op(struct gnttab_get_version *get);
+int xen_gnttab_query_size_op(struct gnttab_query_size *size);
 
 #endif /* QEMU_XEN_GNTTAB_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index e35b2d5557..44fa0de784 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1204,7 +1204,21 @@ static bool kvm_xen_hcall_gnttab_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
-case GNTTABOP_query_size:
+case GNTTABOP_query_size: {
+struct gnttab_query_size size;
+
+qemu_build_assert(sizeof(size) == 16);
+if (kvm_copy_from_gva(cs, arg, &size, sizeof(size))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_gnttab_query_size_op(&size);
+if (!err && kvm_copy_to_gva(cs, arg, &size, sizeof(size))) {
+err = -EFAULT;
+}
+break;
+}
 case GNTTABOP_setup_table:
 case GNTTABOP_copy:
 case GNTTABOP_map_grant_ref:
-- 
2.39.0
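
For illustration, the copy-in, operate, copy-out shape of the GNTTABOP_query_size handler can be sketched standalone as below. This is a minimal sketch: guest memory is modelled as a small byte array, and every sketch_* name and SKETCH_* value is hypothetical rather than taken from QEMU or the Xen headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DOMID_SELF    0x7ff0u  /* assumed "this domain" id */
#define SKETCH_ST_OKAY       0
#define SKETCH_ST_BAD_DOMAIN (-2)     /* assumed status code */

/* Hypothetical argument struct, shaped like struct gnttab_query_size. */
struct sketch_query_size {
    uint16_t dom;
    uint32_t nr_frames;
    uint32_t max_nr_frames;
    int16_t  status;
};

/* Toy "guest memory"; any access outside it fails with -EFAULT. */
static uint8_t sketch_guest_mem[64];

static int sketch_copy_from_guest(uint64_t addr, void *buf, size_t sz)
{
    if (addr > sizeof(sketch_guest_mem) ||
        sz > sizeof(sketch_guest_mem) - addr) {
        return -EFAULT;
    }
    memcpy(buf, sketch_guest_mem + addr, sz);
    return 0;
}

static int sketch_copy_to_guest(uint64_t addr, const void *buf, size_t sz)
{
    if (addr > sizeof(sketch_guest_mem) ||
        sz > sizeof(sketch_guest_mem) - addr) {
        return -EFAULT;
    }
    memcpy(sketch_guest_mem + addr, buf, sz);
    return 0;
}

/* The operation itself: per-request errors go in ->status, not the return. */
static int sketch_query_size_op(struct sketch_query_size *q)
{
    if (q->dom != SKETCH_DOMID_SELF) {
        q->status = SKETCH_ST_BAD_DOMAIN;
        return 0;
    }
    q->status = SKETCH_ST_OKAY;
    q->nr_frames = 1;
    q->max_nr_frames = 64;
    return 0;
}

/* Hypercall wrapper: copy in, run the op, copy the result back out. */
static int sketch_hcall_query_size(uint64_t arg)
{
    struct sketch_query_size q;
    int err;

    if (sketch_copy_from_guest(arg, &q, sizeof(q))) {
        return -EFAULT;
    }
    err = sketch_query_size_op(&q);
    if (!err && sketch_copy_to_guest(arg, &q, sizeof(q))) {
        err = -EFAULT;
    }
    return err;
}
```

Note the two error channels: a bad guest pointer fails the hypercall with -EFAULT, while a bad domain id is reported through the status field of a hypercall that otherwise succeeds.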

Re: [PATCH 00/27] tcg: Simplify temporary usage

2023-02-15 Thread Richard Henderson


On 2/10/23 02:35, Emilio Cota wrote:

I ran yesterday linux-user SPEC06 benchmarks from your tcg-life branch.
I do see perf regressions for two workloads (sjeng and xalancbmk).
With perf(1) I see liveness_pass* are at 0.00%, so I wonder: is it
possible that the emitted code isn't quite the same?


Everything that I checked by hand was the same, but it's possible.
It's a tedious process.  You'd definitely want to turn off ASLR.

My current branch has __attribute__((noinline)) added to all of the liveness passes, so 
that they don't get folded into tcg_gen_code.  But I still would expect 0%.


r~

Re: [PATCH 00/27] tcg: Simplify temporary usage

2023-02-15 Thread Richard Henderson


Ping for the 9 patches lacking review.

r~

On 1/30/23 10:59, Richard Henderson wrote:

Based-on: 20230126043824.54819-1-richard.hender...@linaro.org
("[PATCH v5 00/36] tcg: Support for Int128 with helpers")

The biggest pitfall for new users of TCG is the fact that "normal"
temporaries die at branches, and we must therefore use a different
"local" temporary in that case.

The following patch set changes that, so that the "normal" temporary
is the one that lives across branches, and there is a special temporary
that dies at the end of the extended basic block, and this special
case is reserved for tcg internals.

TEMP_LOCAL is renamed TEMP_TB, which I believe to be more explicit and
less confusing.  TEMP_NORMAL is removed entirely.

I thought about putting in a proper full-power liveness analysis pass.
This would have eliminated the differences between all non-global
temporaries, and would have noticed when TEMP_LOCAL finally dies
within a translation and avoid any final writeback.
But I came to the conclusion that it was too expensive in runtime,
and so retaining some distinction in the types was required.

In addition, I found that the usage of temps within plugin-gen.c
(9 per guest memory operation) meant that we *must* have some form
of temp that can be re-used.  (There is one x86 instruction which
generates 62 memory operations; 62 * 9 == 558, which is larger than
our current TCG_MAX_TEMPS.)
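
The arithmetic above can be checked with a small sketch. The value used for SKETCH_TCG_MAX_TEMPS is an assumption (the text only says that 558 exceeds the current limit), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limit; the text only states that 558 exceeds it. */
#define SKETCH_TCG_MAX_TEMPS 512u

/* Temps consumed if every memory operation allocates fresh temporaries. */
static unsigned int sketch_temps_needed(unsigned int mem_ops,
                                        unsigned int temps_per_op)
{
    return mem_ops * temps_per_op;
}

/* True when a translation would run out of temporaries without re-use. */
static bool sketch_exceeds_limit(unsigned int needed)
{
    return needed > SKETCH_TCG_MAX_TEMPS;
}
```

With 62 memory operations at 9 temps each the requirement is 558, which is why some re-usable kind of temporary has to remain.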

However I did add a new liveness pass which, with a single pass over
the opcode stream, can see that a TEMP_LOCAL is only live within a
single extended basic block, and thus may be transformed to TEMP_EBB.

With this, and by not recycling TEMP_LOCAL, we can get identical code
out of the backend even when the front end translators are
adjusted to use TEMP_LOCAL for everything.

Benchmarking one test case, qemu-arm linux-test, the new liveness pass
comes in at about 1.6% on perf, but I can't see any difference in
wall clock time before and after the patch set.


r~


Richard Henderson (27):
   tcg: Adjust TCGContext.temps_in_use check
   accel/tcg: Pass max_insn to gen_intermediate_code by pointer
   accel/tcg: Use more accurate max_insns for tb_overflow
   tcg: Remove branch-to-next regardless of reference count
   tcg: Rename TEMP_LOCAL to TEMP_TB
   tcg: Add liveness_pass_0
   tcg: Remove TEMP_NORMAL
   tcg: Pass TCGTempKind to tcg_temp_new_internal
   tcg: Add tcg_temp_ebb_new_{i32,i64,ptr}
   tcg: Add tcg_gen_movi_ptr
   tcg: Use tcg_temp_ebb_new_* in tcg/
   accel/tcg/plugin: Use tcg_temp_ebb_*
   accel/tcg/plugin: Tidy plugin_gen_disable_mem_helpers
   tcg: Don't re-use TEMP_TB temporaries
   tcg: Change default temp lifetime to TEMP_TB
   target/arm: Drop copies in gen_sve_{ldr,str}
   target/arm: Don't use tcg_temp_local_new_*
   target/cris: Don't use tcg_temp_local_new
   target/hexagon: Don't use tcg_temp_local_new_*
   target/hppa: Don't use tcg_temp_local_new
   target/i386: Don't use tcg_temp_local_new
   target/mips: Don't use tcg_temp_local_new
   target/ppc: Don't use tcg_temp_local_new
   target/xtensa: Don't use tcg_temp_local_new_*
   exec/gen-icount: Don't use tcg_temp_local_new_i32
   tcg: Remove tcg_temp_local_new_*, tcg_const_local_*
   tcg: Update docs/devel/tcg-ops.rst for temporary changes

  docs/devel/tcg-ops.rst  | 103 
  target/hexagon/idef-parser/README.rst   |   4 +-
  include/exec/gen-icount.h   |   8 +-
  include/exec/translator.h   |   4 +-
  include/tcg/tcg-op.h|   7 +-
  include/tcg/tcg.h   |  64 ++---
  target/arm/translate-a64.h  |   1 -
  target/hexagon/gen_tcg.h|   4 +-
  accel/tcg/plugin-gen.c  |  33 +--
  accel/tcg/translate-all.c   |   2 +-
  accel/tcg/translator.c  |   6 +-
  target/alpha/translate.c|   2 +-
  target/arm/translate-a64.c  |   6 -
  target/arm/translate-sve.c  |  38 +--
  target/arm/translate.c  |   8 +-
  target/avr/translate.c  |   2 +-
  target/cris/translate.c |   8 +-
  target/hexagon/genptr.c |  16 +-
  target/hexagon/idef-parser/parser-helpers.c |   4 +-
  target/hexagon/translate.c  |   4 +-
  target/hppa/translate.c |   5 +-
  target/i386/tcg/translate.c |  29 +--
  target/loongarch/translate.c|   2 +-
  target/m68k/translate.c |   2 +-
  target/microblaze/translate.c   |   2 +-
  target/mips/tcg/translate.c |  59 ++---
  target/nios2/translate.c|   2 +-
  target/openrisc/translate.c |   2 +-
  target/ppc/translate.c  |   8 +-
  target/riscv/translate.c|   2 +-
  target/rx/translate.c   |   2 +-

Re: [PATCH v4 0/4] Fix deadlock when dying because of a signal

2023-02-15 Thread Richard Henderson


On 2/14/23 04:08, Ilya Leoshkevich wrote:

Based-on:<20230202005204.2055899-1-richard.hender...@linaro.org>
("[PATCH 00/14] linux-user/sparc: Handle missing traps")

v3:https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03534.html
v3 -> v4: Add printfs to the test in order to make the uncaught signals
   less scary:

   $ build/x86_64-linux-user/qemu-x86_64 build/tests/tcg/x86_64-linux-user/linux-fork-trap
   about to trigger fault...
   qemu: uncaught target signal 4 (Illegal instruction) - core dumped
   faulting thread exited cleanly


Queued to tcg-next, thanks.


r~

Re: [PATCH v2 01/15] linux-user/sparc: Raise SIGILL for all unhandled software traps

2023-02-15 Thread Richard Henderson


On 2/15/23 19:45, Richard Henderson wrote:

The linux kernel's trap tables vector all unassigned trap
numbers to BAD_TRAP, which then raises SIGILL.

Tested-by: Ilya Leoshkevich 
Reported-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
  linux-user/sparc/cpu_loop.c | 8 
  1 file changed, 8 insertions(+)


I'll queue this to tcg-next, along with Ilya's other start_exclusive patches.


r~

Re: [PATCH v1 RFC Zisslpcfi 7/9] target/riscv: Tracking indirect branches (fcfi) using TCG

2023-02-15 Thread Richard Henderson


On 2/8/23 20:24, Deepak Gupta wrote:

+if (cpu->cfg.ext_cfi) {
+/*
+ * For Forward CFI, only the expectation of a lpcll at
+ * the start of the block is tracked (which can only happen
+ * when FCFI is enabled for the current processor mode). A jump
+ * or call at the end of the previous TB will have updated
+ * env->elp to indicate the expectation.
+ */
+flags = FIELD_DP32(flags, TB_FLAGS, FCFI_LP_EXPECTED,
+   env->elp != NO_LP_EXPECTED);


You should also check cpu_fcfien here.  We can completely ignore elp if the feature is 
disabled.  Which means that the tb flag will be set if and only if we require a landing pad.



  static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu)
  {
+DisasContext *ctx = container_of(db, DisasContext, base);
+
+if (ctx->fcfi_lp_expected) {
+/*
+ * Since we can't look ahead to confirm that the first
+ * instruction is a legal landing pad instruction, emit
+ * compare-and-branch sequence that will be fixed-up in
+ * riscv_tr_tb_stop() to either statically hit or skip an
+ * illegal instruction exception depending on whether the
+ * flag was lowered by translation of a CJLP or JLP as
+ * the first instruction in the block.


You can "look ahead" by deferring this to riscv_tr_translate_insn.
Compare target/arm/translate-a64.c, btype_destination_ok and uses thereof.
Note that risc-v does not have the same "guarded page" bit that aa64 does.


r~

RE: [PATCH] Adding ability to change disassembler syntax in TCG plugins

2023-02-15 Thread Mikhail Tyutin

> On 2/15/23 19:04, Mikhail Tyutin wrote:
> >> On 2/15/23 18:17, Mikhail Tyutin wrote:
> >>> ping
> >>>
> >>> patchew link:
> >>> https://patchew.org/QEMU/7d17f0cbb5ed4c90bbadd39924290...@yadro.com/
> >>>
> >>> 10.02.2023 18:24, Mikhail Tyutin wrote:
>  This patch adds a new function, qemu_plugin_insn_disas_with_syntax(), that allows TCG
>  plugins to get a disassembly string with non-default syntax if they want to.
> 
>  Signed-off-by: Mikhail Tyutin 
> >>
> >> Why?
> >>
> >> It's certainly not very generic, exposing a disassembly quirk for exactly one guest
> >> architecture.  I mean, you could just as easily link your plugin directly to libcapstone
> >> via qemu_plugin_insn_data().
> >>
> >>
> >> r~
> >
> > I agree it can be done outside of QEMU using another disassembler library. However,
> > there are a few reasons to do it in QEMU from an architecture standpoint:
> >
> > 1. To have a single place for instruction decoding logic. TCG has to decode guest
> > instructions anyway. If plugins add another decoder, it causes double work and is
> > prone to errors (although the current implementation does the decode work twice
> > anyway). For example, TCG might support a new instruction which is not available in
> > the external decoder yet.
> >
> > 2. Under the hood QEMU uses different decoder implementations (in addition to
> > capstone) which are not exposed in the public interface. If there is a need to
> > configure their output, the proposed API allows that as well.
> >
> > 3. If multiple plugins want to use another disassembler syntax, they have to share
> > the implementation as a utility function.
> 
> What's all this got to do with preferring intel over at&t syntax?
> I still think it's a generally useless switch.
> 
> 
> r~

The Linux world prefers AT&T style and the Windows world prefers Intel style for the
x86_64 ISA. That causes a lot of pain for developers and tools that have to compare and
parse assembler text. If you have to work on different hosts, you are better off using
one style for both.

[PATCH v2 09/15] linux-user/sparc: Handle getcc, setcc, getpsr traps

2023-02-15 Thread Richard Henderson

These are really only meaningful for sparc32, but they're
still present for backward compatibility for sparc64.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 62 +++--
 1 file changed, 59 insertions(+), 3 deletions(-)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index e04c842867..a3edb353f6 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -149,6 +149,51 @@ static void flush_windows(CPUSPARCState *env)
 #endif
 }
 
+static void next_instruction(CPUSPARCState *env)
+{
+env->pc = env->npc;
+env->npc = env->npc + 4;
+}
+
+static uint32_t do_getcc(CPUSPARCState *env)
+{
+#ifdef TARGET_SPARC64
+return cpu_get_ccr(env) & 0xf;
+#else
+return extract32(cpu_get_psr(env), 20, 4);
+#endif
+}
+
+static void do_setcc(CPUSPARCState *env, uint32_t icc)
+{
+#ifdef TARGET_SPARC64
+cpu_put_ccr(env, (cpu_get_ccr(env) & 0xf0) | (icc & 0xf));
+#else
+cpu_put_psr(env, deposit32(cpu_get_psr(env), 20, 4, icc));
+#endif
+}
+
+static uint32_t do_getpsr(CPUSPARCState *env)
+{
+#ifdef TARGET_SPARC64
+const uint64_t TSTATE_CWP = 0x1f;
+const uint64_t TSTATE_ICC = 0xfull << 32;
+const uint64_t TSTATE_XCC = 0xfull << 36;
+const uint32_t PSR_S  = 0x00000080u;
+const uint32_t PSR_V8PLUS = 0xff000000u;
+uint64_t tstate = sparc64_tstate(env);
+
+/* See , tstate_to_psr. */
+return ((tstate & TSTATE_CWP)   |
+PSR_S   |
+((tstate & TSTATE_ICC) >> 12)   |
+((tstate & TSTATE_XCC) >> 20)   |
+PSR_V8PLUS);
+#else
+return (cpu_get_psr(env) & (PSR_ICC | PSR_CWP)) | PSR_S;
+#endif
+}
+
 /* Avoid ifdefs below for the abi32 and abi64 paths. */
 #ifdef TARGET_ABI32
 #define TARGET_TT_SYSCALL  (TT_TRAP + 0x10) /* t_linux */
@@ -218,9 +263,20 @@ void cpu_loop (CPUSPARCState *env)
 
 case TT_TRAP + 0x03: /* flush windows */
 flush_windows(env);
-/* next instruction */
-env->pc = env->npc;
-env->npc = env->npc + 4;
+next_instruction(env);
+break;
+
+case TT_TRAP + 0x20: /* getcc */
+env->gregs[1] = do_getcc(env);
+next_instruction(env);
+break;
+case TT_TRAP + 0x21: /* setcc */
+do_setcc(env, env->gregs[1]);
+next_instruction(env);
+break;
+case TT_TRAP + 0x22: /* getpsr */
+env->gregs[1] = do_getpsr(env);
+next_instruction(env);
 break;
 
 #ifdef TARGET_SPARC64
-- 
2.34.1
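
The sparc32 paths of do_getcc() and do_setcc() above are plain bit-field reads and writes on the PSR. A minimal standalone sketch, using local re-implementations (the `demo_` names are made up here; QEMU's own helpers are extract32() and deposit32()), and assuming a field length between 1 and 32:

```c
#include <stdint.h>

/* Read 'length' bits of 'value' starting at bit 'start' (1 <= length <= 32). */
static uint32_t demo_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Return 'value' with the same bit range replaced by 'field'. */
static uint32_t demo_deposit32(uint32_t value, int start, int length,
                               uint32_t field)
{
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* sparc32: the icc flags (N, Z, V, C) occupy PSR bits 23..20. */
static uint32_t demo_getcc(uint32_t psr)
{
    return demo_extract32(psr, 20, 4);
}

static uint32_t demo_setcc(uint32_t psr, uint32_t icc)
{
    return demo_deposit32(psr, 20, 4, icc);
}
```

Only the four icc bits change on a set; any extra bits in the supplied value are masked off, which matches the intent of the `icc & 0xf` in the sparc64 path.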

[PATCH v2 15/15] linux-user/sparc: Handle tag overflow traps

2023-02-15 Thread Richard Henderson

This trap is raised by taddcctv and tsubcctv insns.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/target_signal.h | 2 +-
 linux-user/syscall_defs.h| 5 +
 linux-user/sparc/cpu_loop.c  | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/linux-user/sparc/target_signal.h b/linux-user/sparc/target_signal.h
index 87757f0c4e..f223eb4af6 100644
--- a/linux-user/sparc/target_signal.h
+++ b/linux-user/sparc/target_signal.h
@@ -8,7 +8,7 @@
 #define TARGET_SIGTRAP   5
 #define TARGET_SIGABRT   6
 #define TARGET_SIGIOT6
-#define TARGET_SIGSTKFLT 7 /* actually EMT */
+#define TARGET_SIGEMT7
 #define TARGET_SIGFPE8
 #define TARGET_SIGKILL   9
 #define TARGET_SIGBUS   10
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 77864de57f..614a1cbc8e 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -717,6 +717,11 @@ typedef struct target_siginfo {
 #define TARGET_TRAP_HWBKPT  (4) /* hardware breakpoint/watchpoint */
 #define TARGET_TRAP_UNK (5) /* undiagnosed trap */
 
+/*
+ * SIGEMT si_codes
+ */
+#define TARGET_EMT_TAGOVF  1   /* tag overflow */
+
 #include "target_resource.h"
 
 struct target_pollfd {
diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index 5a8a71e976..b36bb2574b 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -328,6 +328,9 @@ void cpu_loop (CPUSPARCState *env)
 case TT_PRIV_INSN:
 force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVOPC, env->pc);
 break;
+case TT_TOVF:
+force_sig_fault(TARGET_SIGEMT, TARGET_EMT_TAGOVF, env->pc);
+break;
 #ifdef TARGET_SPARC64
 case TT_PRIV_ACT:
 /* Note do_privact defers to do_privop. */
-- 
2.34.1
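
For readers unfamiliar with tagged arithmetic, here is a small sketch of when taddcctv takes the tag overflow trap, following the SPARC V8 definition (either operand has a nonzero low two-bit tag, or the 32-bit add overflows). It is not QEMU's helper, and the function name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

static bool demo_tadd_overflows(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;

    /* Tag check: the low two bits of both operands must be zero. */
    bool tagged = ((a | b) & 3) != 0;

    /* Signed overflow: operands share a sign that the result lacks. */
    bool arith = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;

    return tagged || arith;
}
```

In linux-user, the resulting TT_TOVF exception is what the hunk above turns into SIGEMT with si_code EMT_TAGOVF.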

[PATCH v2 14/15] linux-user/sparc: Handle floating-point exceptions

2023-02-15 Thread Richard Henderson

Raise SIGFPE for ieee exceptions.

The other types, such as FSR_FTT_UNIMPFPOP, should not appear,
because we enable normal emulation of missing insns at the
start of sparc_cpu_realizefn().

Signed-off-by: Richard Henderson 
---
 target/sparc/cpu.h  |  3 +--
 linux-user/sparc/cpu_loop.c | 22 ++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/target/sparc/cpu.h b/target/sparc/cpu.h
index e478c5eb16..ae8de606d5 100644
--- a/target/sparc/cpu.h
+++ b/target/sparc/cpu.h
@@ -197,8 +197,7 @@ enum {
 #define FSR_FTT2   (1ULL << 16)
 #define FSR_FTT1   (1ULL << 15)
 #define FSR_FTT0   (1ULL << 14)
-//gcc warns about constant overflow for ~FSR_FTT_MASK
-//#define FSR_FTT_MASK (FSR_FTT2 | FSR_FTT1 | FSR_FTT0)
+#define FSR_FTT_MASK (FSR_FTT2 | FSR_FTT1 | FSR_FTT0)
 #ifdef TARGET_SPARC64
 #define FSR_FTT_NMASK  0xfffe3fffULL
 #define FSR_FTT_CEXC_NMASK 0xfffe3fe0ULL
diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index 093358a39a..5a8a71e976 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -297,6 +297,28 @@ void cpu_loop (CPUSPARCState *env)
 restore_window(env);
 break;
 
+case TT_FP_EXCP:
+{
+int code = TARGET_FPE_FLTUNK;
+target_ulong fsr = env->fsr;
+
+if ((fsr & FSR_FTT_MASK) == FSR_FTT_IEEE_EXCP) {
+if (fsr & FSR_NVC) {
+code = TARGET_FPE_FLTINV;
+} else if (fsr & FSR_OFC) {
+code = TARGET_FPE_FLTOVF;
+} else if (fsr & FSR_UFC) {
+code = TARGET_FPE_FLTUND;
+} else if (fsr & FSR_DZC) {
+code = TARGET_FPE_FLTDIV;
+} else if (fsr & FSR_NXC) {
+code = TARGET_FPE_FLTRES;
+}
+}
+force_sig_fault(TARGET_SIGFPE, code, env->pc);
+}
+break;
+
 case EXCP_INTERRUPT:
 /* just indicate that signals should be handled asap */
 break;
-- 
2.34.1
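
The priority order in the TT_FP_EXCP case can be checked in isolation. The sketch below assumes the usual SPARC FSR layout (cexc in bits 4..0, ftt in bits 16..14) and the generic Linux FPE_* si_code values; all `DEMO_` names are local to this example and not taken from the patch:

```c
#include <stdint.h>

/* Assumed bit layout, following the SPARC FSR definition. */
enum {
    DEMO_FSR_NXC = 1u << 0,        /* inexact */
    DEMO_FSR_DZC = 1u << 1,        /* division by zero */
    DEMO_FSR_UFC = 1u << 2,        /* underflow */
    DEMO_FSR_OFC = 1u << 3,        /* overflow */
    DEMO_FSR_NVC = 1u << 4,        /* invalid operation */
    DEMO_FSR_FTT_MASK = 7u << 14,
    DEMO_FSR_FTT_IEEE = 1u << 14   /* ftt == IEEE_754_exception */
};

/* Assumed si_code values, as in the generic Linux siginfo definitions. */
enum {
    DEMO_FPE_FLTDIV = 3,
    DEMO_FPE_FLTOVF = 4,
    DEMO_FPE_FLTUND = 5,
    DEMO_FPE_FLTRES = 6,
    DEMO_FPE_FLTINV = 7,
    DEMO_FPE_FLTUNK = 14
};

/* Map the current-exception bits to a SIGFPE si_code, first match wins. */
static int demo_fpe_code(uint32_t fsr)
{
    if ((fsr & DEMO_FSR_FTT_MASK) != DEMO_FSR_FTT_IEEE) {
        return DEMO_FPE_FLTUNK;
    }
    if (fsr & DEMO_FSR_NVC) return DEMO_FPE_FLTINV;
    if (fsr & DEMO_FSR_OFC) return DEMO_FPE_FLTOVF;
    if (fsr & DEMO_FSR_UFC) return DEMO_FPE_FLTUND;
    if (fsr & DEMO_FSR_DZC) return DEMO_FPE_FLTDIV;
    if (fsr & DEMO_FSR_NXC) return DEMO_FPE_FLTRES;
    return DEMO_FPE_FLTUNK;
}
```

Because inexact is tested last, an overflow that also sets the inexact bit is reported as FLTOVF rather than FLTRES.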

[PATCH v2 04/15] linux-user/sparc: Use TT_TRAP for flush windows

2023-02-15 Thread Richard Henderson

The v9 and pre-v9 code can be unified with this macro.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index 051a292ce5..e1d08ff204 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -196,15 +196,14 @@ void cpu_loop (CPUSPARCState *env)
 env->pc = env->npc;
 env->npc = env->npc + 4;
 break;
-case 0x83: /* flush windows */
-#ifdef TARGET_ABI32
-case 0x103:
-#endif
+
+case TT_TRAP + 0x03: /* flush windows */
 flush_windows(env);
 /* next instruction */
 env->pc = env->npc;
 env->npc = env->npc + 4;
 break;
+
 #ifndef TARGET_SPARC64
 case TT_WIN_OVF: /* window overflow */
 save_window(env);
-- 
2.34.1

Re: [PATCH 1/2] configure: Add 'mkdir build' check

2023-02-15 Thread Dinah B

*ping*

Patch series:
https://lore.kernel.org/qemu-devel/20230208233111.398577-1-dinahbaum...@gmail.com/

-Dinah

On Wed, Feb 8, 2023 at 6:31 PM Dinah Baum  wrote:

> The QEMU configure script goes into an infinite error-printing loop
> when run in a read-only directory, because the 'build' dir is never created.
>
> Checking whether 'mkdir build' succeeds prevents this error.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/321
> ---
>  configure | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/configure b/configure
> index 64960c6000..3b384914ce 100755
> --- a/configure
> +++ b/configure
> @@ -31,10 +31,11 @@ then
>  fi
>  fi
>
> -mkdir build
> -touch $MARKER
> +if mkdir build
> +then
> +touch $MARKER
>
> -cat > GNUmakefile <<'EOF'
> +cat > GNUmakefile <<'EOF'
>  # This file is auto-generated by configure to support in-source tree
>  # 'make' command invocation
>
> @@ -56,8 +57,12 @@ force: ;
>  GNUmakefile: ;
>
>  EOF
> -cd build
> -exec "$source_path/configure" "$@"
> +cd build
> +exec "$source_path/configure" "$@"
> +else
> +echo "ERROR: Unable to use ./build dir, try using a ../qemu/configure build"
> +exit 1
> +fi
>  fi
>
>  # Temporary directory used for files created while
> --
> 2.30.2
>
>

[PATCH v2 05/15] linux-user/sparc: Tidy window spill/fill traps

2023-02-15 Thread Richard Henderson

Add some macros to localize the hw difference between v9 and pre-v9.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index e1d08ff204..2bcf32590f 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -158,6 +158,15 @@ static void flush_windows(CPUSPARCState *env)
 #define syscall_cc xcc
 #endif
 
+/* Avoid ifdefs below for the v9 and pre-v9 hw traps. */
+#ifdef TARGET_SPARC64
+#define TARGET_TT_SPILL  TT_SPILL
+#define TARGET_TT_FILL   TT_FILL
+#else
+#define TARGET_TT_SPILL  TT_WIN_OVF
+#define TARGET_TT_FILL   TT_WIN_UNF
+#endif
+
 void cpu_loop (CPUSPARCState *env)
 {
 CPUState *cs = env_cpu(env);
@@ -204,20 +213,14 @@ void cpu_loop (CPUSPARCState *env)
 env->npc = env->npc + 4;
 break;
 
-#ifndef TARGET_SPARC64
-case TT_WIN_OVF: /* window overflow */
+case TARGET_TT_SPILL: /* window overflow */
 save_window(env);
 break;
-case TT_WIN_UNF: /* window underflow */
-restore_window(env);
-break;
-#else
-case TT_SPILL: /* window overflow */
-save_window(env);
-break;
-case TT_FILL: /* window underflow */
+case TARGET_TT_FILL:  /* window underflow */
 restore_window(env);
 break;
+
+#ifdef TARGET_SPARC64
 #ifndef TARGET_ABI32
 case 0x16e:
 flush_windows(env);
-- 
2.34.1

[PATCH v2 13/15] linux-user/sparc: Handle unimplemented flush trap

2023-02-15 Thread Richard Henderson

For sparc64, TT_UNIMP_FLUSH == TT_ILL_INSN, so this is
already handled.  For sparc32, the kernel uses SKIP_TRAP.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index bf7e10216f..093358a39a 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -315,6 +315,9 @@ void cpu_loop (CPUSPARCState *env)
 case TT_NCP_INSN:
 force_sig_fault(TARGET_SIGILL, TARGET_ILL_COPROC, env->pc);
 break;
+case TT_UNIMP_FLUSH:
+next_instruction(env);
+break;
 #endif
 case EXCP_ATOMIC:
 cpu_exec_step_atomic(cs);
-- 
2.34.1

[PATCH v2 07/15] linux-user/sparc: Handle software breakpoint trap

2023-02-15 Thread Richard Henderson

This is 'ta 1' for both v9 and pre-v9.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index edbc4f3bdc..c14eaea163 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -206,6 +206,11 @@ void cpu_loop (CPUSPARCState *env)
 env->npc = env->npc + 4;
 break;
 
+case TT_TRAP + 0x01: /* breakpoint */
+case EXCP_DEBUG:
+force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->pc);
+break;
+
 case TT_TRAP + 0x03: /* flush windows */
 flush_windows(env);
 /* next instruction */
@@ -237,9 +242,6 @@ void cpu_loop (CPUSPARCState *env)
 case TT_ILL_INSN:
 force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->pc);
 break;
-case EXCP_DEBUG:
-force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->pc);
-break;
 case EXCP_ATOMIC:
 cpu_exec_step_atomic(cs);
 break;
-- 
2.34.1

[PATCH v2 12/15] linux-user/sparc: Handle coprocessor disabled trap

2023-02-15 Thread Richard Henderson

Since qemu does not implement a sparc coprocessor, all such
instructions raise this trap.  Because of that, we never raise
the coprocessor exception trap, which would be vector 0x28.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index 43f19fbd91..bf7e10216f 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -311,6 +311,10 @@ void cpu_loop (CPUSPARCState *env)
 /* Note do_privact defers to do_privop. */
 force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVOPC, env->pc);
 break;
+#else
+case TT_NCP_INSN:
+force_sig_fault(TARGET_SIGILL, TARGET_ILL_COPROC, env->pc);
+break;
 #endif
 case EXCP_ATOMIC:
 cpu_exec_step_atomic(cs);
-- 
2.34.1

[PATCH v2 03/15] linux-user/sparc: Tidy syscall error return

2023-02-15 Thread Richard Henderson

Reduce ifdefs with #define syscall_cc.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index d31ea057db..051a292ce5 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -149,10 +149,13 @@ static void flush_windows(CPUSPARCState *env)
 #endif
 }
 
+/* Avoid ifdefs below for the abi32 and abi64 paths. */
 #ifdef TARGET_ABI32
 #define TARGET_TT_SYSCALL  (TT_TRAP + 0x10) /* t_linux */
+#define syscall_cc psr
 #else
 #define TARGET_TT_SYSCALL  (TT_TRAP + 0x6d) /* tl0_linux64 */
+#define syscall_cc xcc
 #endif
 
 void cpu_loop (CPUSPARCState *env)
@@ -183,18 +186,10 @@ void cpu_loop (CPUSPARCState *env)
 break;
 }
 if ((abi_ulong)ret >= (abi_ulong)(-515)) {
-#if defined(TARGET_SPARC64) && !defined(TARGET_ABI32)
-env->xcc |= PSR_CARRY;
-#else
-env->psr |= PSR_CARRY;
-#endif
+env->syscall_cc |= PSR_CARRY;
 ret = -ret;
 } else {
-#if defined(TARGET_SPARC64) && !defined(TARGET_ABI32)
-env->xcc &= ~PSR_CARRY;
-#else
-env->psr &= ~PSR_CARRY;
-#endif
+env->syscall_cc &= ~PSR_CARRY;
 }
 env->regwptr[0] = ret;
 /* next instruction */
-- 
2.34.1
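
The `#define syscall_cc` trick works because the macro expands to a struct member name, so `env->syscall_cc` picks the right field at compile time. A self-contained sketch of the same pattern, with a made-up struct, a made-up DEMO_ABI64 switch, and an assumed carry-bit position:

```c
#include <stdint.h>

#define DEMO_PSR_CARRY (1u << 20)   /* assumed position of the carry flag */

struct demo_env {
    uint32_t psr;   /* 32-bit ABI condition codes */
    uint32_t xcc;   /* 64-bit ABI condition codes */
    uint64_t o0;    /* syscall return register */
};

/* Select the condition-code field once; define DEMO_ABI64 to switch. */
#ifdef DEMO_ABI64
#define demo_syscall_cc xcc
#else
#define demo_syscall_cc psr
#endif

static void demo_set_syscall_result(struct demo_env *env, long ret)
{
    /* Values in [-515, -1] are errors: set carry and return the errno. */
    if ((unsigned long)ret >= (unsigned long)(-515L)) {
        env->demo_syscall_cc |= DEMO_PSR_CARRY;
        ret = -ret;
    } else {
        env->demo_syscall_cc &= ~DEMO_PSR_CARRY;
    }
    env->o0 = (uint64_t)ret;
}
```

The unsigned comparison folds the two-sided range test into a single compare, the same idiom the patched code uses with abi_ulong.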

[PATCH v2 06/15] linux-user/sparc: Fix sparc64_{get, set}_context traps

2023-02-15 Thread Richard Henderson

These traps are present for sparc64 with ilp32, aka sparc32plus.
Enabling them means adjusting the defines over in signal.c,
and fixing an incorrect usage of abi_ulong when we really meant
the full register, target_ulong.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 23 +++
 linux-user/sparc/signal.c   | 36 +++-
 2 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index 2bcf32590f..edbc4f3bdc 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -213,6 +213,17 @@ void cpu_loop (CPUSPARCState *env)
 env->npc = env->npc + 4;
 break;
 
+#ifdef TARGET_SPARC64
+case TT_TRAP + 0x6e:
+flush_windows(env);
+sparc64_get_context(env);
+break;
+case TT_TRAP + 0x6f:
+flush_windows(env);
+sparc64_set_context(env);
+break;
+#endif
+
 case TARGET_TT_SPILL: /* window overflow */
 save_window(env);
 break;
@@ -220,18 +231,6 @@ void cpu_loop (CPUSPARCState *env)
 restore_window(env);
 break;
 
-#ifdef TARGET_SPARC64
-#ifndef TARGET_ABI32
-case 0x16e:
-flush_windows(env);
-sparc64_get_context(env);
-break;
-case 0x16f:
-flush_windows(env);
-sparc64_set_context(env);
-break;
-#endif
-#endif
 case EXCP_INTERRUPT:
 /* just indicate that signals should be handled asap */
 break;
diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index b501750fe0..2be9000b9e 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -503,7 +503,23 @@ long do_rt_sigreturn(CPUSPARCState *env)
 return -QEMU_ESIGRETURN;
 }
 
-#if defined(TARGET_SPARC64) && !defined(TARGET_ABI32)
+#ifdef TARGET_ABI32
+void setup_sigtramp(abi_ulong sigtramp_page)
+{
+uint32_t *tramp = lock_user(VERIFY_WRITE, sigtramp_page, 2 * 8, 0);
+assert(tramp != NULL);
+
+default_sigreturn = sigtramp_page;
+install_sigtramp(tramp, TARGET_NR_sigreturn);
+
+default_rt_sigreturn = sigtramp_page + 8;
+install_sigtramp(tramp + 2, TARGET_NR_rt_sigreturn);
+
+unlock_user(tramp, sigtramp_page, 2 * 8);
+}
+#endif
+
+#ifdef TARGET_SPARC64
 #define SPARC_MC_TSTATE 0
 #define SPARC_MC_PC 1
 #define SPARC_MC_NPC 2
@@ -575,7 +591,7 @@ void sparc64_set_context(CPUSPARCState *env)
 struct target_ucontext *ucp;
 target_mc_gregset_t *grp;
 target_mc_fpu_t *fpup;
-abi_ulong pc, npc, tstate;
+target_ulong pc, npc, tstate;
 unsigned int i;
 unsigned char fenab;
 
@@ -773,18 +789,4 @@ do_sigsegv:
 unlock_user_struct(ucp, ucp_addr, 1);
 force_sig(TARGET_SIGSEGV);
 }
-#else
-void setup_sigtramp(abi_ulong sigtramp_page)
-{
-uint32_t *tramp = lock_user(VERIFY_WRITE, sigtramp_page, 2 * 8, 0);
-assert(tramp != NULL);
-
-default_sigreturn = sigtramp_page;
-install_sigtramp(tramp, TARGET_NR_sigreturn);
-
-default_rt_sigreturn = sigtramp_page + 8;
-install_sigtramp(tramp + 2, TARGET_NR_rt_sigreturn);
-
-unlock_user(tramp, sigtramp_page, 2 * 8);
-}
-#endif
+#endif /* TARGET_SPARC64 */
-- 
2.34.1
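
To see why abi_ulong was the wrong type in sparc64_set_context(), consider sparc32plus, where the ABI word is 32 bits but the registers are 64 bits wide. A small sketch with stand-in typedefs (the `demo_` names are hypothetical, chosen to mirror that configuration):

```c
#include <stdint.h>

typedef uint32_t demo_abi_ulong;     /* ilp32 ABI word */
typedef uint64_t demo_target_ulong;  /* full sparc64 register */

/* Storing a register value through the ABI type drops the upper half. */
static uint64_t demo_load_via_abi(uint64_t reg)
{
    demo_abi_ulong v = (demo_abi_ulong)reg;
    return v;
}

/* Storing it through the register-width type keeps every bit. */
static uint64_t demo_load_via_target(uint64_t reg)
{
    demo_target_ulong v = reg;
    return v;
}
```

A TSTATE value carries its icc and xcc fields at bit 32 and above, so a 32-bit temporary would silently discard the condition codes.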

[PATCH v2 08/15] linux-user/sparc: Handle division by zero traps

2023-02-15 Thread Richard Henderson

In addition to the hw trap vector, there is a software trap
assigned for older sparc without hw division instructions.

Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index c14eaea163..e04c842867 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -211,6 +211,11 @@ void cpu_loop (CPUSPARCState *env)
 force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->pc);
 break;
 
+case TT_TRAP + 0x02: /* div0 */
+case TT_DIV_ZERO:
+force_sig_fault(TARGET_SIGFPE, TARGET_FPE_INTDIV, env->pc);
+break;
+
 case TT_TRAP + 0x03: /* flush windows */
 flush_windows(env);
 /* next instruction */
-- 
2.34.1
