Re: [PATCH v4 1/2] tpm: convert tpmdev options processing to new visitor format

2022-12-30 Thread James Bottomley
On Fri, 2022-12-30 at 12:01 -0500, Stefan Berger wrote:
> On 12/30/22 10:24, James Bottomley wrote:
[...]
> > @@ -2906,9 +2893,7 @@ void qemu_init(int argc, char **argv)
> >   break;
> >   #ifdef CONFIG_TPM
> >   case QEMU_OPTION_tpmdev:
> > -    if (tpm_config_parse(qemu_find_opts("tpmdev"),
> > optarg) < 0) {
> > -    exit(1);
> > -    }
> > +    tpm_config_parse(optarg);
> 
> The patches don't apply to upstream's master.

I think it depends on how you apply them.  If you use git, they do apply,
except for a minor merge conflict in tpm_passthrough.c.

More seriously, there's now a compile failure in tpm_mssim.c because of
the lost has_X for X pointer options, but it's also easily fixable.

> This used to exit() on failure but doesn't do this anymore, though it
> probably should.

Actually it still does.  I converted it to the standard &error_fatal
way of doing this, which will cause an exit(1) if we get an error.  The
error_fatal construct seems to have been done precisely to cure this
type of return value threading.
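
A minimal sketch of the pattern described above, assuming a much-reduced model of the error object. Every name starting with sketch_ is hypothetical and is not QEMU's API; the only point illustrated is that a callee given the fatal sentinel exits on error, so the caller has no return value to check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an error object. */
typedef struct Error {
    const char *msg;
} Error;

/* Sentinel destination: reporting an error here terminates the program. */
static Error *sketch_error_fatal;

/* Report an error through errp, following the convention described above. */
static void sketch_error_set(Error **errp, const char *msg)
{
    if (errp == &sketch_error_fatal) {
        fprintf(stderr, "%s\n", msg);
        exit(1);                /* nothing to thread back to the caller */
    }
    if (errp != NULL) {
        Error *err = malloc(sizeof(*err));
        assert(err != NULL);
        err->msg = msg;
        *errp = err;            /* the caller inspects and frees it */
    }
}

/* Hypothetical parser: rejects a missing or empty option string. */
static void sketch_config_parse(const char *optarg, Error **errp)
{
    if (optarg == NULL || optarg[0] == '\0') {
        sketch_error_set(errp, "empty option string");
    }
}
```

Calling sketch_config_parse(optarg, &sketch_error_fatal) never returns on failure, whereas passing the address of a local Error pointer leaves the decision to the caller.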

James




Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin

On 12/30/22 17:06, H. Peter Anvin wrote:

> The 62 MB limit mentioned in boot.rst is unrelated, and only applies to
> very, very old kernels that used INT 15h, AH=88h to probe memory.




I am 88% sure this was fixed long before setup_data was created, as it 
was created originally to carry e820 info for more than 128(!) memory 
segments. However, as we see here, it is never certain that bugs didn't 
creep in in the meantime...


-hpa




Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin




On 12/30/22 14:10, Jason A. Donenfeld wrote:
> On Fri, Dec 30, 2022 at 01:58:39PM -0800, H. Peter Anvin wrote:
> > See the other thread fork. They have identified the problem already.
>
> Not sure I follow. Is there another thread where somebody worked out why
> this 62meg limit was happening?
>
> Note that I sent v2/v3, to fix the original problem in a different way,
> and if that looks good to the QEMU maintainers, then we can all be happy
> with that. But I *haven't* addressed and still don't fully understand
> why the 62meg limit applied to my v1 in the way it does. Did you find a
> bug there to fix? If so, please do CC me.

Yes, you yourself posted the problem:

> Then build qemu. Run it with `-kernel bzImage`, based on the kernel
> built with the .config I attached.
>
> You'll see that the CPU triple faults when hitting this line:
>
>     sd = (struct setup_data *)boot_params->hdr.setup_data;
>     while (sd) {
>             unsigned long sd_addr = (unsigned long)sd;
>
>             kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);  <----
>             sd = (struct setup_data *)sd->next;
>     }
>
> , because it dereferences *sd. This does not happen if the decompressed
> size of the kernel is < 62 megs.
>
> So that's the "big and pretty serious" bug that might be worthy of
> investigation.


This needs to be something like:

    kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd));
    kernel_add_identity_map(sd_addr + sizeof(*sd),
                            sd_addr + sizeof(*sd) + sd->len);
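
A self-contained sketch of that two-step mapping, under the assumption that the fault comes from reading sd->len before the header is mapped. The structure and the mapping hook are simplified stand-ins (all sketch_ names are hypothetical); the hook only records the ranges it is asked to map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for a setup_data header (illustrative only). */
struct sketch_setup_data {
    struct sketch_setup_data *next;
    uint32_t type;
    uint32_t len;               /* size of the payload after the header */
};

/* Hypothetical mapping hook: records each [start, end) range it is given. */
#define SKETCH_MAX_RANGES 16
static uintptr_t sketch_ranges[SKETCH_MAX_RANGES][2];
static size_t sketch_nranges;

static void sketch_add_identity_map(uintptr_t start, uintptr_t end)
{
    assert(sketch_nranges < SKETCH_MAX_RANGES);
    sketch_ranges[sketch_nranges][0] = start;
    sketch_ranges[sketch_nranges][1] = end;
    sketch_nranges++;
}

/* Walk the list, mapping each header before any of its fields is read. */
static void sketch_map_setup_data(struct sketch_setup_data *sd)
{
    while (sd) {
        uintptr_t sd_addr = (uintptr_t)sd;

        /* Step 1: map only the fixed-size header. */
        sketch_add_identity_map(sd_addr, sd_addr + sizeof(*sd));
        /* Step 2: sd->len can now be read; map the payload. */
        sketch_add_identity_map(sd_addr + sizeof(*sd),
                                sd_addr + sizeof(*sd) + sd->len);
        sd = sd->next;
    }
}
```

In the single-call form, sd->len is evaluated as an argument before any mapping exists; splitting the call orders the read after the header mapping.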


The 62 MB limit mentioned in boot.rst is unrelated, and only applies to
very, very old kernels that used INT 15h, AH=88h to probe memory.


-hpa



Re: [PATCH v4] hw/rtc/mc146818rtc: Make this rtc device target independent

2022-12-30 Thread Bernhard Beschow



On 29 December 2022 10:58:48 UTC, Thomas Huth wrote:
>The only reason for this code being target dependent is the apic-related
>code in rtc_policy_slew_deliver_irq(). Since these apic functions are rather
>simple, we can easily move them into a new, separate file (apic_irqcount.c)
>which will always be compiled and linked if either APIC or the mc146818 device
>are required. This way we can get rid of the #ifdef TARGET_I386 switches in
>mc146818rtc.c and declare it in the softmmu_ss instead of specific_ss, so
>that the code only gets compiled once for all targets.
>
>Reviewed-by: Philippe Mathieu-Daudé 
>Reviewed-by: Mark Cave-Ayland 
>Signed-off-by: Thomas Huth 
>---
> v4: Check for QEMU_ARCH_I386 instead of looking for an APIC

Can we find a more appropriate name for the helpers than "apic" then? If the 
slew tick policy is a workaround for (x86-) KVM I propose to do s/apic/kvm/ 
while still compiling for every target.

>
> include/hw/i386/apic.h  |  1 +
> include/hw/i386/apic_internal.h |  1 -
> include/hw/rtc/mc146818rtc.h|  1 +
> hw/intc/apic_common.c   | 27 -
> hw/intc/apic_irqcount.c | 53 +
> hw/rtc/mc146818rtc.c| 26 +---
> hw/intc/meson.build |  6 +++-
> hw/rtc/meson.build  |  3 +-
> 8 files changed, 69 insertions(+), 49 deletions(-)
> create mode 100644 hw/intc/apic_irqcount.c
>
>diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
>index da1d2fe155..96b9939425 100644
>--- a/include/hw/i386/apic.h
>+++ b/include/hw/i386/apic.h
>@@ -9,6 +9,7 @@ int apic_accept_pic_intr(DeviceState *s);
> void apic_deliver_pic_intr(DeviceState *s, int level);
> void apic_deliver_nmi(DeviceState *d);
> int apic_get_interrupt(DeviceState *s);
>+void apic_report_irq_delivered(int delivered);
> void apic_reset_irq_delivered(void);
> int apic_get_irq_delivered(void);
> void cpu_set_apic_base(DeviceState *s, uint64_t val);
>diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
>index c175e7e718..e61ad04769 100644
>--- a/include/hw/i386/apic_internal.h
>+++ b/include/hw/i386/apic_internal.h
>@@ -199,7 +199,6 @@ typedef struct VAPICState {
> 
> extern bool apic_report_tpr_access;
> 
>-void apic_report_irq_delivered(int delivered);
> bool apic_next_timer(APICCommonState *s, int64_t current_time);
> void apic_enable_tpr_access_reporting(DeviceState *d, bool enable);
> void apic_enable_vapic(DeviceState *d, hwaddr paddr);
>diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
>index 1db0fcee92..45bcd6f040 100644
>--- a/include/hw/rtc/mc146818rtc.h
>+++ b/include/hw/rtc/mc146818rtc.h
>@@ -55,5 +55,6 @@ ISADevice *mc146818_rtc_init(ISABus *bus, int base_year,
>  qemu_irq intercept_irq);
> void rtc_set_memory(ISADevice *dev, int addr, int val);
> int rtc_get_memory(ISADevice *dev, int addr);
>+void qmp_rtc_reset_reinjection(Error **errp);
> 
> #endif /* HW_RTC_MC146818RTC_H */
>diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
>index 2a20982066..b0f85f9384 100644
>--- a/hw/intc/apic_common.c
>+++ b/hw/intc/apic_common.c
>@@ -33,7 +33,6 @@
> #include "hw/sysbus.h"
> #include "migration/vmstate.h"
> 
>-static int apic_irq_delivered;
> bool apic_report_tpr_access;
> 
> void cpu_set_apic_base(DeviceState *dev, uint64_t val)
>@@ -122,32 +121,6 @@ void apic_handle_tpr_access_report(DeviceState *dev, 
>target_ulong ip,
> vapic_report_tpr_access(s->vapic, CPU(s->cpu), ip, access);
> }
> 
>-void apic_report_irq_delivered(int delivered)
>-{
>-apic_irq_delivered += delivered;
>-
>-trace_apic_report_irq_delivered(apic_irq_delivered);
>-}
>-
>-void apic_reset_irq_delivered(void)
>-{
>-/* Copy this into a local variable to encourage gcc to emit a plain
>- * register for a sys/sdt.h marker.  For details on this workaround, see:
>- * https://sourceware.org/bugzilla/show_bug.cgi?id=13296
>- */
>-volatile int a_i_d = apic_irq_delivered;
>-trace_apic_reset_irq_delivered(a_i_d);
>-
>-apic_irq_delivered = 0;
>-}
>-
>-int apic_get_irq_delivered(void)
>-{
>-trace_apic_get_irq_delivered(apic_irq_delivered);
>-
>-return apic_irq_delivered;
>-}
>-
> void apic_deliver_nmi(DeviceState *dev)
> {
> APICCommonState *s = APIC_COMMON(dev);
>diff --git a/hw/intc/apic_irqcount.c b/hw/intc/apic_irqcount.c
>new file mode 100644
>index 0000000000..0aadef1cb5
>--- /dev/null
>+++ b/hw/intc/apic_irqcount.c
>@@ -0,0 +1,53 @@
>+/*
>+ * APIC support - functions for counting the delivered IRQs.
>+ * (this code is in a separate file since it is used from the
>+ * mc146818rtc code on targets without APIC, too)
>+ *
>+ *  Copyright (c) 2011  Jan Kiszka, Siemens AG
>+ *
>+ * This library is free software; you can redistribute it and/or
>+ * modify it under the terms of the GNU Lesser General Public
>+ * License as published by the Free Software Foundation; either
>+ * version 2.1 of the License, or (at your 

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
On Fri, Dec 30, 2022 at 01:58:39PM -0800, H. Peter Anvin wrote:
> See the other thread fork. They have identified the problem already.

Not sure I follow. Is there another thread where somebody worked out why
this 62meg limit was happening?

Note that I sent v2/v3, to fix the original problem in a different way,
and if that looks good to the QEMU maintainers, then we can all be happy
with that. But I *haven't* addressed and still don't fully understand
why the 62meg limit applied to my v1 in the way it does. Did you find a
bug there to fix? If so, please do CC me.

Jason



[PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.
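
The clobbering condition can be sketched as a plain interval-overlap test. This assumes the conventional load address 0x100000 and decompression address 0x1000000; the function and macro names are hypothetical, and l1, l2, l3 follow the diagram (compressed size, setup_data size, decompressed size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conventional x86 addresses, assumed here for illustration. */
#define SKETCH_LOAD_ADDR       UINT64_C(0x100000)   /* compressed image */
#define SKETCH_DECOMPRESS_ADDR UINT64_C(0x1000000)  /* decompressed kernel */

/*
 * setup_data occupies [LOAD + l1, LOAD + l1 + l2) and the decompressed
 * kernel occupies [DECOMPRESS, DECOMPRESS + l3). Returns true if the two
 * half-open ranges intersect.
 */
static bool sketch_setup_data_clobbered(uint64_t l1, uint64_t l2, uint64_t l3)
{
    uint64_t sd_start = SKETCH_LOAD_ADDR + l1;
    uint64_t sd_end = sd_start + l2;
    uint64_t k_start = SKETCH_DECOMPRESS_ADDR;
    uint64_t k_end = k_start + l3;

    return l2 > 0 && l3 > 0 && sd_start < k_end && k_start < sd_end;
}
```

With an 8 MiB compressed image the ranges stay apart; with a 16 MiB one, setup_data starts inside the decompression target and is overwritten.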

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.
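
A simplified sketch of the reserve-then-update-in-place idea, not the code from this patch. It assumes the command-line buffer was allocated with a fixed number of spare bytes after the original string; SKETCH_RESERVED_LEN and sketch_append_cmdline are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical number of bytes reserved after the original command line. */
#define SKETCH_RESERVED_LEN 64

/*
 * buf holds a NUL-terminated command line followed by at least
 * SKETCH_RESERVED_LEN spare bytes. Append extra in place if it fits.
 */
static bool sketch_append_cmdline(char *buf, const char *extra)
{
    size_t old_len = strlen(buf);
    size_t extra_len = strlen(extra);

    if (extra_len > SKETCH_RESERVED_LEN) {
        return false;           /* would overrun the reservation; skip */
    }
    memcpy(buf + old_len, extra, extra_len + 1);  /* copies the NUL too */
    return true;
}
```

Because the buffer never moves or grows, anything appended after the reservation keeps its address, which is the property the in-place update is meant to preserve.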

Cc: x...@kernel.org
Cc: Philippe Mathieu-Daudé 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Eric Biggers 
Signed-off-by: Jason A. Donenfeld 
---
Changes v2->v3:
- Fix mistakes in string handling.
Changes v1->v2:
- Append setup_data to cmdline instead of kernel image.

 hw/i386/microvm.c | 13 ++
 hw/i386/x86.c | 50 +++
 hw/nvram/fw_cfg.c |  9 +++
 include/hw/i386/microvm.h |  5 ++--
 include/hw/nvram/fw_cfg.h |  9 +++
 5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..1b19d28c02 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
     BusState *bus;
     BusChild *kid;
-    char *cmdline;
+    char *cmdline, *existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA);
+    size_t len;
 
     /*
      * Find MMIO transports with attached devices, and add them to the kernel
@@ -387,7 +388,7 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
      * Yes, this is a hack, but one that heavily improves the UX without
      * introducing any significant issues.
      */
-    cmdline = g_strdup(machine->kernel_cmdline);
+    cmdline = g_strdup(existing_cmdline);
     bus = sysbus_get_default();
     QTAILQ_FOREACH(kid, &bus->children, sibling) {
         DeviceState *dev = kid->child;
@@ -411,9 +412,11 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
         }
     }
 
-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
-
+    len = strlen(cmdline);
+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline))
+        fprintf(stderr, "qemu: virtio mmio cmdline too large, skipping\n");
+    else
+        memcpy(existing_cmdline, cmdline, len + 1);
     g_free(cmdline);
 }
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..b57a993596 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -50,6 +50,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "hw/i386/microvm.h"
 
 #include 

Re: [PATCH qemu v2] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
On Fri, Dec 30, 2022 at 07:38:19PM +0100, Jason A. Donenfeld wrote:
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.

This is a little derpy. I'll send a v3 in a second to clean this up.



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin
On December 30, 2022 11:54:11 AM PST, Borislav Petkov  wrote:
>On Fri, Dec 30, 2022 at 06:07:24PM +0100, Jason A. Donenfeld wrote:
>> Look closer at the boot process. The compressed image is initially at
>> 0x100000, but it gets relocated to a safer area at the end of
>> startup_64:
>
>That is the address we're executing here from, rip here looks like 0x100xxx.
>
>> /*
>>  * Copy the compressed kernel to the end of our buffer
>>  * where decompression in place becomes safe.
>>  */
>> pushq   %rsi
>> leaq    (_bss-8)(%rip), %rsi
>> leaq    rva(_bss-8)(%rbx), %rdi
>
>when you get to here, it looks something like this:
>
>leaq    (_bss-8)(%rip), %rsi       # 0x9e7ff8
>leaq    rva(_bss-8)(%rbx), %rdi    # 0xc6eeff8
>
>so the source address is that _bss thing and we copy...
>
>> movl    $(_bss - startup_32), %ecx
>> shrl    $3, %ecx
>> std
>
>... backwards since DF=1.
>
>Up to:
>
># rsi = 0x8
># rdi = 0xbe06ff8
>
>Ok, so the source address is 0x100000. Good.
>
>> HOWEVER, qemu currently appends setup_data to the end of the
>> compressed kernel image,
>
>Yeah, you mean the kernel which starts executing at 0x100000, i.e., that part
>which is compressed/head_64.S and which does the above and the relocation etc.
>
>> and this part isn't moved, and setup_data links aren't walked/relocated. So
>> that means the original address remains, of 0x100000.
>
>See above: when it starts copying the kernel image backwards to a higher
>address, that last byte is at 0x9e7ff8 so I'm guessing qemu has put setup_data
>*after* that address. And that doesn't get copied ofc.
>
>So far, so good.
>
>Now later, we extract the compressed kernel created with the mkpiggy magic:
>
>input_data:
>.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
>input_data_end:
>
>by doing
>
>/*
> * Do the extraction, and jump to the new kernel..
> */
>
>pushq   %rsi                      /* Save the real mode argument */      0x13d00
>movq    %rsi, %rdi                /* real mode address */                0x13d00
>leaq    boot_heap(%rip), %rsi     /* malloc area for uncompression */    0xc6ef000
>leaq    input_data(%rip), %rdx    /* input_data */                       0xbe073a8
>movl    input_len(%rip), %ecx     /* input_len */                        0x8cfe13
>movq    %rbp, %r8                 /* output target address */            0x1000000
>movl    output_len(%rip), %r9d    /* decompressed length, end of relocs */
>call    extract_kernel            /* returns kernel location in %rax */
>popq    %rsi
>
>(actual addresses at the end.)
>
>Now, when you say you triplefault somewhere in initialize_identity_maps() when
>trying to access setup_data, then if you look a couple of lines before that 
>call
>we do
>
>   call load_stage2_idt
>
>which sets up a boottime #PF handler do_boot_page_fault() and it actually does
>call kernel_add_identity_map() so *actually* it should map any unmapped
>setup_data addresses.
>
>So why doesn't it do that and why do you triplefault?
>
>Hmmm.
>

See the other thread fork. They have identified the problem already.



Re: [PATCH] target/microblaze: Add gdbstub xml

2022-12-30 Thread Edgar E. Iglesias
On Fri, Dec 30, 2022 at 08:24:19AM -0800, Richard Henderson wrote:
> Mirroring the upstream gdb xml files, the two stack boundary
> registers are separated out.

Reviewed-by: Edgar E. Iglesias 



> 
> Signed-off-by: Richard Henderson 
> ---
> 
> I did this thinking I would be fixing:
> 
>   TEST    basic gdbstub support on microblaze
>   Truncated register 35 in remote 'g' packet
>   Traceback (most recent call last):
>     File "/home/rth/qemu/src/tests/tcg/multiarch/gdbstub/sha1.py", line 71, in <module>
>       if gdb.parse_and_eval('$pc') == 0:
>   gdb.error: No registers.
> 
> but in the end it turned out that the gdb-multiarch supplied
> by ubuntu 22.04 simply doesn't support MicroBlaze, as can be
> seen with the "set architecture" command within gdb.
> 
> (I built gdb from source, to try and debug why this still wasn't
> working, only to find that it did.  :-P)
> 
> Alex, any way to modify our gdb test to fail gracefully here?
> 
> Regardless, having proper xml for all of our targets seems
> like the correct way forward.
> 
> 
> r~
> 
> Cc: Alex Bennée 
> Cc: Edgar E. Iglesias 
> ---
>  configs/targets/microblaze-linux-user.mak   |  1 +
>  configs/targets/microblaze-softmmu.mak  |  1 +
>  configs/targets/microblazeel-linux-user.mak |  1 +
>  configs/targets/microblazeel-softmmu.mak|  1 +
>  target/microblaze/cpu.h |  2 +
>  target/microblaze/cpu.c |  7 ++-
>  target/microblaze/gdbstub.c | 51 +++-
>  gdb-xml/microblaze-core.xml | 67 +
>  gdb-xml/microblaze-stack-protect.xml| 12 
>  9 files changed, 128 insertions(+), 15 deletions(-)
>  create mode 100644 gdb-xml/microblaze-core.xml
>  create mode 100644 gdb-xml/microblaze-stack-protect.xml
> 
> diff --git a/configs/targets/microblaze-linux-user.mak 
> b/configs/targets/microblaze-linux-user.mak
> index 4249a37f65..0a2322c249 100644
> --- a/configs/targets/microblaze-linux-user.mak
> +++ b/configs/targets/microblaze-linux-user.mak
> @@ -3,3 +3,4 @@ TARGET_SYSTBL_ABI=common
>  TARGET_SYSTBL=syscall.tbl
>  TARGET_BIG_ENDIAN=y
>  TARGET_HAS_BFLT=y
> +TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
> gdb-xml/microblaze-stack-protect.xml
> diff --git a/configs/targets/microblaze-softmmu.mak 
> b/configs/targets/microblaze-softmmu.mak
> index 8385e2d333..e84c0cc728 100644
> --- a/configs/targets/microblaze-softmmu.mak
> +++ b/configs/targets/microblaze-softmmu.mak
> @@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
>  TARGET_BIG_ENDIAN=y
>  TARGET_SUPPORTS_MTTCG=y
>  TARGET_NEED_FDT=y
> +TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
> gdb-xml/microblaze-stack-protect.xml
> diff --git a/configs/targets/microblazeel-linux-user.mak 
> b/configs/targets/microblazeel-linux-user.mak
> index d0e775d840..270743156a 100644
> --- a/configs/targets/microblazeel-linux-user.mak
> +++ b/configs/targets/microblazeel-linux-user.mak
> @@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
>  TARGET_SYSTBL_ABI=common
>  TARGET_SYSTBL=syscall.tbl
>  TARGET_HAS_BFLT=y
> +TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
> gdb-xml/microblaze-stack-protect.xml
> diff --git a/configs/targets/microblazeel-softmmu.mak 
> b/configs/targets/microblazeel-softmmu.mak
> index af40391f2f..9b688036bd 100644
> --- a/configs/targets/microblazeel-softmmu.mak
> +++ b/configs/targets/microblazeel-softmmu.mak
> @@ -1,3 +1,4 @@
>  TARGET_ARCH=microblaze
>  TARGET_SUPPORTS_MTTCG=y
>  TARGET_NEED_FDT=y
> +TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
> gdb-xml/microblaze-stack-protect.xml
> diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h
> index 1e84dd8f47..e541fbb0b3 100644
> --- a/target/microblaze/cpu.h
> +++ b/target/microblaze/cpu.h
> @@ -367,6 +367,8 @@ hwaddr mb_cpu_get_phys_page_attrs_debug(CPUState *cpu, 
> vaddr addr,
>  MemTxAttrs *attrs);
>  int mb_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>  int mb_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
> +int mb_cpu_gdb_read_stack_protect(CPUArchState *cpu, GByteArray *buf, int 
> reg);
> +int mb_cpu_gdb_write_stack_protect(CPUArchState *cpu, uint8_t *buf, int reg);
>  
>  static inline uint32_t mb_cpu_read_msr(const CPUMBState *env)
>  {
> diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
> index 817681f9b2..a2d2f5c340 100644
> --- a/target/microblaze/cpu.c
> +++ b/target/microblaze/cpu.c
> @@ -28,6 +28,7 @@
>  #include "qemu/module.h"
>  #include "hw/qdev-properties.h"
>  #include "exec/exec-all.h"
> +#include "exec/gdbstub.h"
>  #include "fpu/softfloat-helpers.h"
>  
>  static const struct {
> @@ -294,6 +295,9 @@ static void mb_cpu_initfn(Object *obj)
>  CPUMBState *env = &cpu->env;
>  
>  cpu_set_cpustate_pointers(cpu);
> +gdb_register_coprocessor(CPU(cpu), mb_cpu_gdb_read_stack_protect,
> + mb_cpu_gdb_write_stack_protect, 2,
> + "microblaze-stack-protect.xml", 0);
>  
>  

[PATCH] net: Increase L2TPv3 buffer to fit jumboframes

2022-12-30 Thread Christian Svensson
Increase the allocated buffer size to fit larger packets.
Given that jumbo frames can commonly be up to 9000 bytes, the closest suitable
value seems to be 16 KiB.

Tested by running qemu towards a Linux L2TPv3 endpoint and pushing
jumbo-frame traffic through the interfaces.
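
One way to read "closest suitable value" is the next power of two at or above the largest expected frame. That reading is an assumption, not something the patch states, and sketch_next_pow2 is a hypothetical helper used only to show the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= n, for 1 <= n <= SIZE_MAX / 2 + 1. */
static size_t sketch_next_pow2(size_t n)
{
    size_t p = 1;

    assert(n >= 1 && n <= (SIZE_MAX >> 1) + 1);
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

For a 9000-byte jumbo frame this rounds up to 16384, since 8192 is too small; any encapsulation overhead of a few dozen bytes rounds to the same value.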

Signed-off-by: Christian Svensson 
---
 net/l2tpv3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index 5852e42738..3d5c6d11d3 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -42,7 +42,7 @@
  */
 
 #define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
-#define BUFFER_SIZE 2048
+#define BUFFER_SIZE 16384
 #define IOVSIZE 2
 #define MAX_L2TPV3_MSGCNT 64
 #define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
-- 
2.36.2




Re: [PATCH 6/9] hw/arm/aspeed_ast10x0: Map HACE peripheral

2022-12-30 Thread Peter Delevoryas
On Fri, Dec 30, 2022 at 09:13:29AM +0100, Philippe Mathieu-Daudé wrote:
> On 29/12/22 21:52, Peter Delevoryas wrote:
> > On Thu, Dec 29, 2022 at 04:23:22PM +0100, Philippe Mathieu-Daudé wrote:
> > > Since I don't have access to the datasheet, the relevant
> > > values were found in:
> > > https://github.com/AspeedTech-BMC/zephyr/blob/v00.01.08/dts/arm/aspeed/ast10x0.dtsi
> > > 
> > > Before on Zephyr:
> > > 
> > >uart:~$ crypto aes256_cbc_vault
> > >aes256_cbc vault key 1
> > >[00:00:06.699,000]  hace_global: aspeed_crypto_session_setup
> > >[00:00:06.699,000]  hace_global: data->cmd: 1c2098
> > >[00:00:06.699,000]  hace_global: crypto_data_src: 93340
> > >[00:00:06.699,000]  hace_global: crypto_data_dst: 93348
> > >[00:00:06.699,000]  hace_global: crypto_ctx_base: 93300
> > >[00:00:06.699,000]  hace_global: crypto_data_len: 8040
> > >[00:00:06.699,000]  hace_global: crypto_cmd_reg:  11c2098
> > >[00:00:09.743,000]  hace_global: HACE_STS: 0
> > >[00:00:09.743,000]  hace_global: HACE poll timeout
> > >[00:00:09.743,000]  crypto: CBC mode ENCRYPT - Failed
> > >[00:00:09.743,000]  hace_global: aspeed_crypto_session_free
> > >uart:~$
> > > 
> > > After:
> > > 
> > >uart:~$ crypto aes256_cbc_vault
> > >aes256_cbc vault key 1
> > >Was waiting for:
> > >6b c1 be e2 2e 40 9f 96 e9 3d 7e 11 73 93 17 2a
> > >ae 2d 8a 57 1e 03 ac 9c 9e b7 6f ac 45 af 8e 51
> > >30 c8 1c 46 a3 5c e4 11 e5 fb c1 19 1a 0a 52 ef
> > >f6 9f 24 45 df 4f 9b 17 ad 2b 41 7b e6 6c 37 10
> > > 
> > > But got:
> > >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 
> > >[00:00:05.771,000]  hace_global: aspeed_crypto_session_setup
> > >[00:00:05.772,000]  hace_global: data->cmd: 1c2098
> > >[00:00:05.772,000]  hace_global: crypto_data_src: 93340
> > >[00:00:05.772,000]  hace_global: crypto_data_dst: 93348
> > >[00:00:05.772,000]  hace_global: crypto_ctx_base: 93300
> > >[00:00:05.772,000]  hace_global: crypto_data_len: 8040
> > >[00:00:05.772,000]  hace_global: crypto_cmd_reg:  11c2098
> > >[00:00:05.772,000]  hace_global: HACE_STS: 1000
> > >[00:00:05.772,000]  crypto: Output length (encryption): 80
> > >[00:00:05.772,000]  hace_global: aspeed_crypto_session_free
> > >[00:00:05.772,000]  hace_global: aspeed_crypto_session_setup
> > >[00:00:05.772,000]  hace_global: data->cmd: 1c2018
> > >[00:00:05.772,000]  hace_global: crypto_data_src: 93340
> > >[00:00:05.772,000]  hace_global: crypto_data_dst: 93348
> > >[00:00:05.772,000]  hace_global: crypto_ctx_base: 93300
> > >[00:00:05.772,000]  hace_global: crypto_data_len: 8040
> > >[00:00:05.772,000]  hace_global: crypto_cmd_reg:  11c2018
> > >[00:00:05.772,000]  hace_global: HACE_STS: 1000
> > >[00:00:05.772,000]  crypto: Output length (decryption): 64
> > >[00:00:05.772,000]  crypto: CBC mode DECRYPT - Mismatch between 
> > > plaintext and decrypted cipher text
> > >[00:00:05.774,000]  hace_global: aspeed_crypto_session_free
> > >uart:~$
> > > 
> > > Signed-off-by: Philippe Mathieu-Daudé 
> > 
> > Awesome!
> > 
> > Reviewed-by: Peter Delevoryas 
> > 
> > > ---
> > > Should we rename HACE 'dram' as 'secram' / 'secure-ram'?
> > 
> > Sure, sounds good to me.
> > 
> > > ---
> > >   hw/arm/aspeed_ast10x0.c | 15 +++
> > >   1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/hw/arm/aspeed_ast10x0.c b/hw/arm/aspeed_ast10x0.c
> > > index 21a2e62345..02636705b6 100644
> > > --- a/hw/arm/aspeed_ast10x0.c
> > > +++ b/hw/arm/aspeed_ast10x0.c
> > > @@ -29,6 +29,7 @@ static const hwaddr aspeed_soc_ast1030_memmap[] = {
> > >   [ASPEED_DEV_SPI1]  = 0x7E630000,
> > >   [ASPEED_DEV_SPI2]  = 0x7E640000,
> > >   [ASPEED_DEV_UDC]   = 0x7E6A2000,
> > > +[ASPEED_DEV_HACE]  = 0x7E6D0000,
> > >   [ASPEED_DEV_SCU]   = 0x7E6E2000,
> > >   [ASPEED_DEV_JTAG0] = 0x7E6E4000,
> > >   [ASPEED_DEV_JTAG1] = 0x7E6E4100,
> > > @@ -166,6 +167,9 @@ static void aspeed_soc_ast1030_init(Object *obj)
> > >   snprintf(typename, sizeof(typename), "aspeed.gpio-%s", socname);
> > >   object_initialize_child(obj, "gpio", &s->gpio, typename);
> > > +snprintf(typename, sizeof(typename), "aspeed.hace-%s", socname);
> > > +object_initialize_child(obj, "hace", &s->hace, typename);
> > > +
> > >   object_initialize_child(obj, "iomem", &s->iomem, 
> > > TYPE_UNIMPLEMENTED_DEVICE);
> > >   object_initialize_child(obj, "sbc-unimplemented", 
> > > &s->sbc_unimplemented,
> > >   TYPE_UNIMPLEMENTED_DEVICE);
> > > @@ -359,6 +363,17 @@ static void aspeed_soc_ast1030_realize(DeviceState 
> > > *dev_soc, Error **errp)
> > >   }
> > >   aspeed_mmio_map(s, SYS_BUS_DEVICE(&s->sbc), 0, 

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Borislav Petkov
On Fri, Dec 30, 2022 at 06:07:24PM +0100, Jason A. Donenfeld wrote:
> Look closer at the boot process. The compressed image is initially at
> 0x100000, but it gets relocated to a safer area at the end of
> startup_64:

That is the address we're executing here from, rip here looks like 0x100xxx.

> /*
>  * Copy the compressed kernel to the end of our buffer
>  * where decompression in place becomes safe.
>  */
> pushq   %rsi
> leaq    (_bss-8)(%rip), %rsi
> leaq    rva(_bss-8)(%rbx), %rdi

when you get to here, it looks something like this:

leaq    (_bss-8)(%rip), %rsi        # 0x9e7ff8
leaq    rva(_bss-8)(%rbx), %rdi     # 0xc6eeff8

so the source address is that _bss thing and we copy...

> movl    $(_bss - startup_32), %ecx
> shrl    $3, %ecx
> std

... backwards since DF=1.

Up to:

# rsi = 0x8
# rdi = 0xbe06ff8

Ok, so the source address is 0x100000. Good.
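
As an illustration of why the copy runs backwards, here is a small C sketch of a descending word copy. It assumes the move proceeds in 8-byte units (suggested by the shrl $3 above) and that the destination lies above the source; sketch_copy_backwards is a hypothetical name, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nwords 8-byte words from src to dst, starting at the last word and
 * moving towards lower addresses. When dst is above src and the regions
 * overlap, each source word is read before it can be overwritten.
 */
static void sketch_copy_backwards(uint64_t *dst, const uint64_t *src,
                                  size_t nwords)
{
    while (nwords > 0) {
        nwords--;
        dst[nwords] = src[nwords];
    }
}
```

A forward copy over the same overlapping regions would overwrite the tail of the source before reading it, which is the hazard the descending direction avoids.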

> HOWEVER, qemu currently appends setup_data to the end of the
> compressed kernel image,

Yeah, you mean the kernel which starts executing at 0x100000, i.e., that part
which is compressed/head_64.S and which does the above and the relocation etc.

> and this part isn't moved, and setup_data links aren't walked/relocated. So
> that means the original address remains, of 0x100000.

See above: when it starts copying the kernel image backwards to a higher
address, that last byte is at 0x9e7ff8 so I'm guessing qemu has put setup_data
*after* that address. And that doesn't get copied ofc.

So far, so good.

Now later, we extract the compressed kernel created with the mkpiggy magic:

input_data:
.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
input_data_end:

by doing

/*
 * Do the extraction, and jump to the new kernel..
 */

pushq   %rsi                      /* Save the real mode argument */      0x13d00
movq    %rsi, %rdi                /* real mode address */                0x13d00
leaq    boot_heap(%rip), %rsi     /* malloc area for uncompression */    0xc6ef000
leaq    input_data(%rip), %rdx    /* input_data */                       0xbe073a8
movl    input_len(%rip), %ecx     /* input_len */                        0x8cfe13
movq    %rbp, %r8                 /* output target address */            0x1000000
movl    output_len(%rip), %r9d    /* decompressed length, end of relocs */
call    extract_kernel            /* returns kernel location in %rax */
popq    %rsi

(actual addresses at the end.)

Now, when you say you triplefault somewhere in initialize_identity_maps() when
trying to access setup_data, then if you look a couple of lines before that call
we do

call load_stage2_idt

which sets up a boottime #PF handler do_boot_page_fault() and it actually does
call kernel_add_identity_map() so *actually* it should map any unmapped
setup_data addresses.

So why doesn't it do that and why do you triplefault?

Hmmm.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin
On December 30, 2022 7:59:30 AM PST, "Jason A. Donenfeld"  
wrote:
>Hi,
>
>On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
>> On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld"  
>> wrote:
>> >Hi,
>> >
>> >Read this message in a fixed width text editor with a lot of columns.
>> >
>> >On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
>> >> Glad you asked.
>> >> 
>> >> So the kernel load addresses are parameterized in the kernel image
>> >> setup header. One of the things that are so parameterized are the size
>> >> and possible realignment of the kernel image in memory.
>> >> 
>> >> I'm very confused where you are getting the 64 MB number from. There
>> >> should not be any such limitation.
>> >
>> >Currently, QEMU appends it to the kernel image, not to the initramfs as
>> >you suggest below. So, that winds up looking, currently, like:
>> >
>> >  kernel imagesetup_data
>> >   |--|||
>> >0x10  0x10+l1 0x10+l1+l2
>> >
>> >The problem is that this decompresses to 0x100 (one more zero). So
>> >if l1 is > (0x100-0x10), then this winds up looking like:
>> >
>> >  kernel imagesetup_data
>> >   |--|||
>> >0x10  0x10+l1 0x10+l1+l2
>> >
>> > d e c o m p r e s s e d   k e r n e l
>> > 
>> > |-|
>> >0x100   
>> >   0x100+l3 
>> >
>> >The decompressed kernel seemingly overwriting the compressed kernel
>> >image isn't a problem, because that gets relocated to a higher address
>> >early on in the boot process. setup_data, however, stays in the same
>> >place, since those links are self referential and nothing fixes them up.
>> >So the decompressed kernel clobbers it.
>> >
>> >The solution in this commit adds a bunch of padding between the kernel
>> >image and setup_data to avoid this. That looks like this:
>> >
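>> >          kernel image                          padding                      setup_data
>> >   |--------------------------||----------------------------------------||----------------|
>> >0x100000                  0x100000+l1                             0x1000000+l3  0x1000000+l3+l2
>> >
>> >                            d e c o m p r e s s e d   k e r n e l
>> >                     |--------------------------------------------------|
>> >                 0x1000000                                        0x1000000+l3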
>> >
>> >This way, the decompressed kernel doesn't clobber setup_data.
>> >
>> >The problem is that if 0x1000000+l3-0x100000 is around 62 megabytes,
>> >then the bootloader crashes when trying to dereference setup_data's
>> >->len param at the end of initialize_identity_maps() in ident_map_64.c.
>> >I don't know why it does this. If I could remove the 62 megabyte
>> >restriction, then I could keep with this technique and all would be
>> >well.
>> >
>> >> In general, setup_data should be able to go anywhere the initrd can
>> >> go, and so is subject to the same address cap (896 MB for old kernels,
>> >> 4 GB on newer ones; this address too is enumerated in the header.)
>> >
>> >It would be theoretically possible to attach it to the initrd image
>> >instead of to the kernel image. As a last resort, I guess I can look
>> >into doing that. However, that's going to require some serious rework
>> >and plumbing of a lot of different components. So if I can make it work
>> >as is, that'd be ideal. However, I need to figure out this weird 62 meg
>> >limitation.
>> >
>> >Any ideas on that?
>> >
>> >Jason
>> 
>> As far as a crash... that sounds like a bug and a pretty serious one at that.
>> 
>> Could you let me know what kernel you are using and how *exactly* you are 
>> booting it?
>
>I'll attach a .config file. Apply the patch at the top of this thread to
>qemu, except make one modification:
>
>diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>index 628fd2b2e9..a61ee23e13 100644
>--- a/hw/i386/x86.c
>+++ b/hw/i386/x86.c
>@@ -1097,7 +1097,7 @@ void x86_load_linux(X86MachineState *x86ms,
> 
> /* The early stage can't address past around 64 MB from the 
> original
>  * mapping, so just give up in that case. */
>-if (padded_size < 62 * 1024 * 1024)
>+if (true || padded_size < 62 * 1024 * 1024)
> kernel_size = padded_size;
> else {
> fprintf(stderr, "qemu: Kernel image too large to hold 
> setup_data\n");
>
>Then build qemu. Run it with `-kernel bzImage`, based on the kernel
>built with the .config I attached.
>
>You'll see that the CPU triple faults when hitting this line:
>
>sd = (struct setup_data 

[PATCH qemu v2] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

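          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                            d e c o m p r e s s e d   k e r n e l
                     |--------------------------------------------------|
                 0x1000000                                        0x1000000+l3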

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.
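
The overlap condition described above can be checked with a few lines of
C. This is only a sketch and not code from the patch: ranges_overlap()
and setup_data_clobbered() are hypothetical helpers, and the load and
decompression addresses are assumed to be the usual 1 MiB (0x100000) and
16 MiB (0x1000000) values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed typical x86 addresses; not read from any real setup header. */
#define KERNEL_LOAD_ADDR   0x100000ULL   /* compressed image is loaded here */
#define KERNEL_DECOMP_ADDR 0x1000000ULL  /* decompression output starts here */

/* Half-open interval overlap test: [a, a+alen) against [b, b+blen). */
static bool ranges_overlap(uint64_t a, uint64_t alen,
                           uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/*
 * Hypothetical helper: returns true if setup_data (l2 bytes), appended
 * directly after a compressed image of l1 bytes, would be overwritten by
 * a decompressed kernel that occupies l3 bytes.
 */
static bool setup_data_clobbered(uint64_t l1, uint64_t l2, uint64_t l3)
{
    uint64_t setup_data_addr = KERNEL_LOAD_ADDR + l1;

    return ranges_overlap(setup_data_addr, l2, KERNEL_DECOMP_ADDR, l3);
}
```

With these assumptions, a compressed image of about 9 MiB keeps
setup_data below the decompression zone, while a 16 MiB image puts it
inside that zone.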

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Cc: x...@kernel.org
Cc: Philippe Mathieu-Daudé 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Eric Biggers 
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/microvm.c | 10 
 hw/i386/x86.c | 50 +++
 hw/nvram/fw_cfg.c |  9 +++
 include/hw/i386/microvm.h |  5 ++--
 include/hw/nvram/fw_cfg.h |  9 +++
 5 files changed, 51 insertions(+), 32 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..1f3686d90d 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState 
*machine)
 MicrovmMachineState *mms = MICROVM_MACHINE(machine);
 BusState *bus;
 BusChild *kid;
-char *cmdline;
+void *existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg, 
FW_CFG_CMDLINE_DATA);
+char *cmdline, *insertion_point;
 
 /*
  * Find MMIO transports with attached devices, and add them to the kernel
@@ -387,7 +388,7 @@ static void microvm_fix_kernel_cmdline(MachineState 
*machine)
  * Yes, this is a hack, but one that heavily improves the UX without
  * introducing any significant issues.
  */
-cmdline = g_strdup(machine->kernel_cmdline);
+cmdline = g_strdup(existing_cmdline);
 bus = sysbus_get_default();
 QTAILQ_FOREACH(kid, &bus->children, sibling) {
 DeviceState *dev = kid->child;
@@ -411,9 +412,8 @@ static void microvm_fix_kernel_cmdline(MachineState 
*machine)
 }
 }
 
-fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
-fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
-
+insertion_point = existing_cmdline + strlen(existing_cmdline);
+memcpy(insertion_point, cmdline, strlen(cmdline) + 1);
 g_free(cmdline);
 }
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..b57a993596 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -50,6 +50,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "hw/i386/microvm.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -802,7 +803,7 @@ void x86_load_linux(X86MachineState *x86ms,
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, 

Re: [PATCH v2 02/11] hw/watchdog/wdt_aspeed: Extend MMIO range to cover more registers

2022-12-30 Thread Peter Delevoryas
On Fri, Dec 30, 2022 at 01:31:35PM +0100, Philippe Mathieu-Daudé wrote:
> On 30/12/22 12:34, Philippe Mathieu-Daudé wrote:
> > When booting the Zephyr demo in [1] we get:
> > 
> >aspeed.io: unimplemented device write (size 4, offset 0x185128, value 
> > 0x030f1ff1) <--
> >aspeed.io: unimplemented device write (size 4, offset 0x18512c, value 
> > 0x03f1)
> > 
> > This corresponds to this Zephyr code [2]:
> > 
> >static int aspeed_wdt_init(const struct device *dev)
> >{
> >  const struct aspeed_wdt_config *config = dev->config;
> >  struct aspeed_wdt_data *const data = dev->data;
> >  uint32_t reg_val;
> > 
> >  /* disable WDT by default */
> >  reg_val = sys_read32(config->ctrl_base + WDT_CTRL_REG);
> >  reg_val &= ~WDT_CTRL_ENABLE;
> >  sys_write32(reg_val, config->ctrl_base + WDT_CTRL_REG);
> > 
> >  sys_write32(data->rst_mask1,
> >  config->ctrl_base + WDT_SW_RESET_MASK1_REG);   <--
> >  sys_write32(data->rst_mask2,
> >  config->ctrl_base + WDT_SW_RESET_MASK2_REG);
> > 
> >  return 0;
> >}
> > 
> > The register definitions are [3]:
> > 
> >#define WDT_RELOAD_VAL_REG  0x0004
> >#define WDT_RESTART_REG 0x0008
> >#define WDT_CTRL_REG0x000C
> >#define WDT_TIMEOUT_STATUS_REG  0x0010
> >#define WDT_TIMEOUT_STATUS_CLR_REG  0x0014
> >#define WDT_RESET_MASK1_REG 0x001C
> >#define WDT_RESET_MASK2_REG 0x0020
> >#define WDT_SW_RESET_MASK1_REG  0x0028   <--
> >#define WDT_SW_RESET_MASK2_REG  0x002C
> >#define WDT_SW_RESET_CTRL_REG   0x0024
> > 
> > Currently QEMU only covers an MMIO region of size 0x20:
> > 
> >#define ASPEED_WDT_REGS_MAX(0x20 / 4)
> > 
> > Change to map the whole 'iosize' which might be bigger, covering
> > the other registers. The MemoryRegionOps read/write handlers will
> > report the accesses as out-of-bounds guest-errors, but the next
> > commit will report them as unimplemented.

Ah, I see, this makes perfect sense now, thanks for the detail!

> 
> I'll amend here for clarity:
> 
> ---
> 
> Memory layout before this change:
> 
>   (qemu) info mtree -f
> ...
> 7e785000-7e78501f (prio 0, i/o): aspeed.wdt
> 7e785020-7e78507f (prio -1000, i/o): aspeed.io
> @00185020
> 7e785080-7e78509f (prio 0, i/o): aspeed.wdt
> 7e7850a0-7e7850ff (prio -1000, i/o): aspeed.io
> @001850a0
> 7e785100-7e78511f (prio 0, i/o): aspeed.wdt
> 7e785120-7e78517f (prio -1000, i/o): aspeed.io
> @00185120
> 7e785180-7e78519f (prio 0, i/o): aspeed.wdt
> 7e7851a0-7e788fff (prio -1000, i/o): aspeed.io
> @001851a0
> 
> After:
> 
>   (qemu) info mtree -f
> ...
> 7e785000-7e78507f (prio 0, i/o): aspeed.wdt
> 7e785080-7e7850ff (prio 0, i/o): aspeed.wdt
> 7e785100-7e78517f (prio 0, i/o): aspeed.wdt
> 7e785180-7e7851ff (prio 0, i/o): aspeed.wdt
> 7e785200-7e788fff (prio -1000, i/o): aspeed.io
> @00185200
> ---
> 
> > [1] https://github.com/AspeedTech-BMC/zephyr/releases/tag/v00.01.07
> > [2] https://github.com/AspeedTech-BMC/zephyr/commit/2e99f10ac27b
> > [3] 
> > https://github.com/AspeedTech-BMC/zephyr/blob/v00.01.08/drivers/watchdog/wdt_aspeed.c#L31
> > 
> > Reviewed-by: Peter Delevoryas 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >   hw/watchdog/wdt_aspeed.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
> > index 958725a1b5..eefca31ae4 100644
> > --- a/hw/watchdog/wdt_aspeed.c
> > +++ b/hw/watchdog/wdt_aspeed.c
> > @@ -260,6 +260,7 @@ static void aspeed_wdt_realize(DeviceState *dev, Error 
> > **errp)
> >   {
> >   SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
> >   AspeedWDTState *s = ASPEED_WDT(dev);
> > +AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(dev);
> >   assert(s->scu);
> > @@ -271,7 +272,7 @@ static void aspeed_wdt_realize(DeviceState *dev, Error 
> > **errp)
> >   s->pclk_freq = PCLK_HZ;
> >   memory_region_init_io(&s->iomem, OBJECT(s), &aspeed_wdt_ops, s,
> > -  TYPE_ASPEED_WDT, ASPEED_WDT_REGS_MAX * 4);
> > +  TYPE_ASPEED_WDT, awc->iosize);
> >   sysbus_init_mmio(sbd, &s->iomem);
> >   }
> 
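
The offsets in the "unimplemented device write" log lines can be decoded
by hand, and the sketch below does the same arithmetic in C. It assumes,
based on the mtree dump above, that the first watchdog sits at offset
0x185000 inside aspeed.io and that instances are spaced 0x80 bytes apart.
The struct and function names are hypothetical and do not exist in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDT_BLOCK_OFFSET 0x185000u /* first aspeed.wdt, relative to aspeed.io */
#define WDT_STRIDE       0x80u     /* spacing between watchdog instances */
#define WDT_OLD_SIZE     0x20u     /* MMIO size before the change */

struct wdt_access {
    unsigned index; /* which watchdog instance, counting from 0 */
    uint32_t reg;   /* byte offset of the register inside that instance */
};

/* Split an aspeed.io offset into (watchdog instance, register offset). */
static struct wdt_access decode_wdt_offset(uint32_t io_offset)
{
    uint32_t rel = io_offset - WDT_BLOCK_OFFSET;
    struct wdt_access a = { rel / WDT_STRIDE, rel % WDT_STRIDE };

    return a;
}

/* A register is handled by the watchdog model only if it lies in its region. */
static bool reg_covered(uint32_t reg, uint32_t region_size)
{
    return reg < region_size;
}
```

Offset 0x185128 decodes to instance 2, register 0x28
(WDT_SW_RESET_MASK1_REG). That register lies outside a 0x20-byte region
but inside a 0x80-byte one.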



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
On Wed, Dec 28, 2022 at 03:38:30PM +0100, Jason A. Donenfeld wrote:
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x100000, setup_data lives at
> `0x100000 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x1000000 (note: there's one more zero there than the decompressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x100000 + compressed_size` that extends into
> the decompressed zone at 0x1000000. In other words, if compressed_size
> is larger than `0x1000000 - 0x100000`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Fix this by detecting that possibility, and if it occurs, put setup_data
> *after* the end of the decompressed kernel, so that it doesn't get
> clobbered.
> 
> One caveat is that this only works for images less than around 64
> megabytes, so just bail out in that case. This is unfortunate, but I
> don't currently have a way of fixing it.
 
I've got a different solution now that tries to piggy back on cmdline.
I'll send a v2. It avoids the 62MiB crash situation of this one and
seems to work fine.



Re: [PATCH v2 03/11] hw/watchdog/wdt_aspeed: Log unimplemented registers as UNIMP level

2022-12-30 Thread Peter Delevoryas
On Fri, Dec 30, 2022 at 12:34:56PM +0100, Philippe Mathieu-Daudé wrote:
> Add more Aspeed watchdog registers from [*].
> 
> Since guests can legitimately access them, log the access at
> 'unimplemented' level instead of 'guest-errors'.
> 
> [*] 
> https://github.com/AspeedTech-BMC/zephyr/blob/v00.01.08/drivers/watchdog/wdt_aspeed.c#L31
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Delevoryas 

> ---
>  hw/watchdog/wdt_aspeed.c | 13 +
>  include/hw/watchdog/wdt_aspeed.h |  2 +-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
> index eefca31ae4..d267aa185c 100644
> --- a/hw/watchdog/wdt_aspeed.c
> +++ b/hw/watchdog/wdt_aspeed.c
> @@ -42,6 +42,11 @@
>  #define WDT_PUSH_PULL_MAGIC (0xA8 << 24)
>  #define WDT_OPEN_DRAIN_MAGIC(0x8A << 24)
>  #define WDT_RESET_MASK1 (0x1c / 4)
> +#define WDT_RESET_MASK2 (0x20 / 4)
> +
> +#define WDT_SW_RESET_CTRL   (0x24 / 4)
> +#define WDT_SW_RESET_MASK1  (0x28 / 4)
> +#define WDT_SW_RESET_MASK2  (0x2c / 4)
>  
>  #define WDT_TIMEOUT_STATUS  (0x10 / 4)
>  #define WDT_TIMEOUT_CLEAR   (0x14 / 4)
> @@ -83,6 +88,10 @@ static uint64_t aspeed_wdt_read(void *opaque, hwaddr 
> offset, unsigned size)
>  return s->regs[WDT_RESET_MASK1];
>  case WDT_TIMEOUT_STATUS:
>  case WDT_TIMEOUT_CLEAR:
> +case WDT_RESET_MASK2:
> +case WDT_SW_RESET_CTRL:
> +case WDT_SW_RESET_MASK1:
> +case WDT_SW_RESET_MASK2:
>  qemu_log_mask(LOG_UNIMP,
>"%s: uninmplemented read at offset 0x%" HWADDR_PRIx 
> "\n",
>__func__, offset);
> @@ -190,6 +199,10 @@ static void aspeed_wdt_write(void *opaque, hwaddr 
> offset, uint64_t data,
>  
>  case WDT_TIMEOUT_STATUS:
>  case WDT_TIMEOUT_CLEAR:
> +case WDT_RESET_MASK2:
> +case WDT_SW_RESET_CTRL:
> +case WDT_SW_RESET_MASK1:
> +case WDT_SW_RESET_MASK2:
>  qemu_log_mask(LOG_UNIMP,
>"%s: uninmplemented write at offset 0x%" HWADDR_PRIx 
> "\n",
>__func__, offset);
> diff --git a/include/hw/watchdog/wdt_aspeed.h 
> b/include/hw/watchdog/wdt_aspeed.h
> index db91ee6b51..e90ef86651 100644
> --- a/include/hw/watchdog/wdt_aspeed.h
> +++ b/include/hw/watchdog/wdt_aspeed.h
> @@ -21,7 +21,7 @@ OBJECT_DECLARE_TYPE(AspeedWDTState, AspeedWDTClass, 
> ASPEED_WDT)
>  #define TYPE_ASPEED_2600_WDT TYPE_ASPEED_WDT "-ast2600"
>  #define TYPE_ASPEED_1030_WDT TYPE_ASPEED_WDT "-ast1030"
>  
> -#define ASPEED_WDT_REGS_MAX(0x20 / 4)
> +#define ASPEED_WDT_REGS_MAX(0x30 / 4)
>  
>  struct AspeedWDTState {
>  /*< private >*/
> -- 
> 2.38.1
> 
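
The register macros in this patch are word indices (byte offset divided
by four) into the regs[] array, which is why ASPEED_WDT_REGS_MAX has to
grow along with them. A minimal sketch of that relationship follows. The
helper names and the OLD_/NEW_ macro names are hypothetical; only the
offsets and the two sizes come from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

#define OLD_WDT_REGS_MAX (0x20 / 4) /* array size before the patch */
#define NEW_WDT_REGS_MAX (0x30 / 4) /* array size after the patch */

/* Convert a register byte offset into an index into a uint32_t array. */
static unsigned wdt_reg_index(uint32_t byte_offset)
{
    return byte_offset >> 2;
}

/* True if the index can be used with an array of regs_max entries. */
static bool wdt_index_valid(unsigned index, unsigned regs_max)
{
    return index < regs_max;
}
```

WDT_SW_RESET_MASK2 at byte offset 0x2c maps to index 11. That index is
out of range for the old array of 8 entries and in range for the new one
of 12.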



Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success

2022-12-30 Thread Vitaly Chikunov
Alexander,

On Fri, Dec 30, 2022 at 06:44:14PM +0100, Alexander Graf wrote:
> Hi Vitaly,
> 
> This is a kvm kernel bug and should be fixed with the latest stable releases. 
> Which kernel version are you running?

This is on latest v6.0 stable - 6.0.15.

Maybe there could be a workaround for such situations? (Or maybe it's
possible to make this error non-fatal?) We use qemu+kvm for testing and
now we cannot test on x86.

Thanks,


> 
> Thanks,
> 
> Alex
> 
> 
> > Am 30.12.2022 um 15:30 schrieb Vitaly Chikunov :
> > 
> > Hi,
> > 
> > QEMU 7.2.0 when run on 32-bit x86 architecture fails with:
> > 
> >  i586$ qemu-system-i386 -enable-kvm
> >  qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
> >  i586$ qemu-system-x86_64 -enable-kvm
> >  qemu-system-x86_64: Could not install MSR_CORE_THREAD_COUNT handler: 
> > Success
> > 
> > Minimal reproducer is `qemu-system-i386 -enable-kvm'. And this only
> > happens on x86 (linux32 personality and binaries on x86_64 host):
> > 
> >  i586$ file /usr/bin/qemu-system-i386
> >  /usr/bin/qemu-system-i386: ELF 32-bit LSB pie executable, Intel 80386, 
> > version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, 
> > BuildID[sha1]=0ba1d953bcb7a691014255954f060ff404c8df90, for GNU/Linux 
> > 3.2.0, stripped
> >  i586$ /usr/bin/qemu-system-i386 --version
> >  QEMU emulator version 7.2.0 (qemu-7.2.0-alt1)
> >  Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
> > 
> > Thanks,
> > 
> 



Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success

2022-12-30 Thread Alexander Graf
Hi Vitaly,

This is a kvm kernel bug and should be fixed with the latest stable releases. 
Which kernel version are you running?

Thanks,

Alex


> Am 30.12.2022 um 15:30 schrieb Vitaly Chikunov :
> 
> Hi,
> 
> QEMU 7.2.0 when run on 32-bit x86 architecture fails with:
> 
>  i586$ qemu-system-i386 -enable-kvm
>  qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
>  i586$ qemu-system-x86_64 -enable-kvm
>  qemu-system-x86_64: Could not install MSR_CORE_THREAD_COUNT handler: Success
> 
> Minimal reproducer is `qemu-system-i386 -enable-kvm'. And this only
> happens on x86 (linux32 personality and binaries on x86_64 host):
> 
>  i586$ file /usr/bin/qemu-system-i386
>  /usr/bin/qemu-system-i386: ELF 32-bit LSB pie executable, Intel 80386, 
> version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, 
> BuildID[sha1]=0ba1d953bcb7a691014255954f060ff404c8df90, for GNU/Linux 3.2.0, 
> stripped
>  i586$ /usr/bin/qemu-system-i386 --version
>  QEMU emulator version 7.2.0 (qemu-7.2.0-alt1)
>  Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
> 
> Thanks,
> 



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
On Fri, Dec 30, 2022 at 6:01 PM Borislav Petkov  wrote:
>
> On Fri, Dec 30, 2022 at 04:54:27PM +0100, Jason A. Donenfeld wrote:
> > > Right, with CONFIG_X86_VERBOSE_BOOTUP=y in a guest here, it says:
> > >
> > > early console in extract_kernel
> > > input_data: 0x000000000be073a8
> > > input_len: 0x00000000008cfc43
> > > output: 0x0000000001000000
> > > output_len: 0x000000000b600a98
> > > kernel_total_size: 0x000000000ac26000
> > > needed_size: 0x000000000b800000
> > > trampoline_32bit: 0x000000000009d000
> > >
> > > so that's a ~9M kernel which gets decompressed at 0x1000000 and the
> > > output len is, what, ~180M which looks like plenty to me...
> >
> > I think you might have misunderstood the thread.
>
> Maybe...
>
> I've been trying to make sense of all the decompression dancing we're doing 
> and
> the diagrams you've drawn and there's a comment over extract_kernel() which
> basically says that the compressed image is at the back of the memory buffer
>
> input_data: 0x000000000be073a8
>
> and we decompress to a smaller address
>
> output: 0x0000000001000000
>
> And, it *looks* like setup_data is at an address somewhere >= 0x1000000 so when
> we start decompressing, we overwrite it.
>
> I guess extract_kernel() could also dump the setup_data addresses so that we 
> can
> verify that...
>
> > First, to reproduce the bug that this patch fixes, you need a kernel with a
> > compressed size of around 16 megs, not 9.
>
> Not only that - you need setup_data to be placed somewhere just so that it 
> gets
> overwritten during decompression. Also, see below.
>
> > Secondly, that crash is well understood and doesn't need to be reproduced;
>
> Is it?
>
> Where do you get the 0x100000 as the starting address of the kernel image?
>
> Because input_data above says that the input address is somewhere higher...

Look closer at the boot process. The compressed image is initially at
0x100000, but it gets relocated to a safer area at the end of
startup_64:

        /*
         * Copy the compressed kernel to the end of our buffer
         * where decompression in place becomes safe.
         */
        pushq   %rsi
        leaq    (_bss-8)(%rip), %rsi
        leaq    rva(_bss-8)(%rbx), %rdi
        movl    $(_bss - startup_32), %ecx
        shrl    $3, %ecx
        std
        rep     movsq
        cld
        popq    %rsi

        /*
         * The GDT may get overwritten either during the copy we just did or
         * during extract_kernel below. To avoid any issues, repoint the GDTR
         * to the new copy of the GDT.
         */
        leaq    rva(gdt64)(%rbx), %rax
        leaq    rva(gdt)(%rbx), %rdx
        movq    %rdx, 2(%rax)
        lgdt    (%rax)

        /*
         * Jump to the relocated address.
         */
        leaq    rva(.Lrelocated)(%rbx), %rax
        jmp     *%rax

And that address winds up being calculated with a combination of the
image size and the init_size param, so it's safe. Decompression won't
overwrite the compressed kernel.

HOWEVER, qemu currently appends setup_data to the end of the
compressed kernel image, and this part isn't moved, and setup_data
links aren't walked/relocated. So that means the original address
remains, of 0x100000.
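
To make the "links aren't walked/relocated" point concrete, here is a
small, self-contained simulation. It is only a sketch: the header layout
is modeled on the kernel's struct setup_data (a 64-bit next pointer, then
type and len), memory is a plain byte array, "physical addresses" are
offsets into that array, and sum_setup_data_len() is a hypothetical
helper rather than kernel or QEMU code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header modeled on struct setup_data; payload bytes would follow it. */
struct setup_data_hdr {
    uint64_t next; /* absolute address of the next node, 0 ends the chain */
    uint32_t type;
    uint32_t len;  /* payload length in bytes */
};

/*
 * Walk the chain starting at 'head' inside simulated memory and add up
 * the payload lengths. Because each node stores an absolute address,
 * the walk always reads from the original locations.
 */
static uint64_t sum_setup_data_len(const uint8_t *mem, size_t mem_size,
                                   uint64_t head)
{
    uint64_t total = 0;
    unsigned guard = 64; /* arbitrary cap so a corrupted chain terminates */

    while (head != 0 && guard-- > 0 &&
           mem_size >= sizeof(struct setup_data_hdr) &&
           head <= mem_size - sizeof(struct setup_data_hdr)) {
        struct setup_data_hdr hdr;

        memcpy(&hdr, mem + head, sizeof(hdr));
        total += hdr.len;
        head = hdr.next;
    }
    return total;
}
```

If something later overwrites the bytes at a node's original address, as
the decompressed kernel does here, the walk silently picks up whatever
was written there, since nothing updated the next pointers.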

Jason



Re: [PATCH v4 1/2] tpm: convert tpmdev options processing to new visitor format

2022-12-30 Thread Stefan Berger




On 12/30/22 10:24, James Bottomley wrote:

From: James Bottomley 

Instead of processing the tpmdev options using the old qemu options,
convert to the new visitor format which also allows the passing of
json on the command line.

Signed-off-by: James Bottomley 

---
v4: add TpmConfiOptions
---
  backends/tpm/tpm_emulator.c| 24 -
  backends/tpm/tpm_passthrough.c | 27 +++---
  include/sysemu/tpm.h   |  4 +-
  include/sysemu/tpm_backend.h   |  2 +-
  qapi/tpm.json  | 19 +++
  softmmu/tpm.c  | 90 ++
  softmmu/vl.c   | 19 +--
  7 files changed, 76 insertions(+), 109 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index 49cc3d749d..cb6bf9d7c2 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -584,33 +584,28 @@ err_exit:
  return -1;
  }
  
-static int tpm_emulator_handle_device_opts(TPMEmulator *tpm_emu, QemuOpts *opts)

+static int tpm_emulator_handle_device_opts(TPMEmulator *tpm_emu, 
TpmCreateOptions *opts)
  {
-const char *value;
  Error *err = NULL;
  Chardev *dev;
  
-value = qemu_opt_get(opts, "chardev");

-if (!value) {
-error_report("tpm-emulator: parameter 'chardev' is missing");
-goto err;
-}
+tpm_emu->options = QAPI_CLONE(TPMEmulatorOptions, &opts->u.emulator);
+tpm_emu->data_ioc = NULL;
  
-dev = qemu_chr_find(value);

+dev = qemu_chr_find(opts->u.emulator.chardev);
  if (!dev) {
-error_report("tpm-emulator: tpm chardev '%s' not found", value);
+error_report("tpm-emulator: tpm chardev '%s' not found",
+opts->u.emulator.chardev);
  goto err;
  }
  
  if (!qemu_chr_fe_init(&tpm_emu->ctrl_chr, dev, &err)) {

  error_prepend(&err, "tpm-emulator: No valid chardev found at '%s':",
-  value);
+  opts->u.emulator.chardev);
  error_report_err(err);
  goto err;
  }
  
-tpm_emu->options->chardev = g_strdup(value);

-
  if (tpm_emulator_prepare_data_fd(tpm_emu) < 0) {
  goto err;
  }
@@ -649,7 +644,7 @@ err:
  return -1;
  }
  
-static TPMBackend *tpm_emulator_create(QemuOpts *opts)

+static TPMBackend *tpm_emulator_create(TpmCreateOptions *opts)
  {
  TPMBackend *tb = TPM_BACKEND(object_new(TYPE_TPM_EMULATOR));
  
@@ -972,7 +967,6 @@ static void tpm_emulator_inst_init(Object *obj)
  
  trace_tpm_emulator_inst_init();
  
-tpm_emu->options = g_new0(TPMEmulatorOptions, 1);

  tpm_emu->cur_locty_number = ~0;
  qemu_mutex_init(&tpm_emu->mutex);
  tpm_emu->vmstate =
@@ -990,7 +984,7 @@ static void tpm_emulator_shutdown(TPMEmulator *tpm_emu)
  {
  ptm_res res;
  
-if (!tpm_emu->options->chardev) {

+if (!tpm_emu->data_ioc) {
  /* was never properly initialized */
  return;
  }
diff --git a/backends/tpm/tpm_passthrough.c b/backends/tpm/tpm_passthrough.c
index 5a2f74db1b..f9771eaf1f 100644
--- a/backends/tpm/tpm_passthrough.c
+++ b/backends/tpm/tpm_passthrough.c
@@ -252,23 +252,11 @@ static int 
tpm_passthrough_open_sysfs_cancel(TPMPassthruState *tpm_pt)
  }
  
  static int

-tpm_passthrough_handle_device_opts(TPMPassthruState *tpm_pt, QemuOpts *opts)
+tpm_passthrough_handle_device_opts(TPMPassthruState *tpm_pt, TpmCreateOptions 
*opts)
  {
-const char *value;
+tpm_pt->options = QAPI_CLONE(TPMPassthroughOptions, &opts->u.passthrough);
  
-value = qemu_opt_get(opts, "cancel-path");

-if (value) {
-tpm_pt->options->cancel_path = g_strdup(value);
-tpm_pt->options->has_cancel_path = true;
-}
-
-value = qemu_opt_get(opts, "path");
-if (value) {
-tpm_pt->options->has_path = true;
-tpm_pt->options->path = g_strdup(value);
-}
-
-tpm_pt->tpm_dev = value ? value : TPM_PASSTHROUGH_DEFAULT_DEVICE;
+tpm_pt->tpm_dev = opts->u.passthrough.has_path ? opts->u.passthrough.path 
: TPM_PASSTHROUGH_DEFAULT_DEVICE;
  tpm_pt->tpm_fd = qemu_open_old(tpm_pt->tpm_dev, O_RDWR);
  if (tpm_pt->tpm_fd < 0) {
  error_report("Cannot access TPM device using '%s': %s",
@@ -290,11 +278,11 @@ tpm_passthrough_handle_device_opts(TPMPassthruState 
*tpm_pt, QemuOpts *opts)
  return 0;
  }
  
-static TPMBackend *tpm_passthrough_create(QemuOpts *opts)

+static TPMBackend *tpm_passthrough_create(TpmCreateOptions *tco)
  {
  Object *obj = object_new(TYPE_TPM_PASSTHROUGH);
  
-if (tpm_passthrough_handle_device_opts(TPM_PASSTHROUGH(obj), opts)) {

+if (tpm_passthrough_handle_device_opts(TPM_PASSTHROUGH(obj), tco)) {
  object_unref(obj);
  return NULL;
  }
@@ -321,8 +309,8 @@ static TpmTypeOptions 
*tpm_passthrough_get_tpm_options(TPMBackend *tb)
  TpmTypeOptions *options = g_new0(TpmTypeOptions, 1);
  
  options->type = TPM_TYPE_PASSTHROUGH;

-options->u.passthrough.data = QAPI_CLONE(TPMPassthroughOptions,
-  

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Borislav Petkov
On Fri, Dec 30, 2022 at 04:54:27PM +0100, Jason A. Donenfeld wrote:
> > Right, with CONFIG_X86_VERBOSE_BOOTUP=y in a guest here, it says:
> > 
> > early console in extract_kernel
> > input_data: 0x000000000be073a8
> > input_len: 0x00000000008cfc43
> > output: 0x0000000001000000
> > output_len: 0x000000000b600a98
> > kernel_total_size: 0x000000000ac26000
> > needed_size: 0x000000000b800000
> > trampoline_32bit: 0x000000000009d000
> > 
> > so that's a ~9M kernel which gets decompressed at 0x1000000 and the
> > output len is, what, ~180M which looks like plenty to me...
> 
> I think you might have misunderstood the thread.

Maybe...

I've been trying to make sense of all the decompression dancing we're doing and
the diagrams you've drawn and there's a comment over extract_kernel() which
basically says that the compressed image is at the back of the memory buffer

input_data: 0x000000000be073a8

and we decompress to a smaller address

output: 0x0000000001000000

And, it *looks* like setup_data is at an address somewhere >= 0x1000000 so when
we start decompressing, we overwrite it.

I guess extract_kernel() could also dump the setup_data addresses so that we can
verify that...

> First, to reproduce the bug that this patch fixes, you need a kernel with a
> compressed size of around 16 megs, not 9.

Not only that - you need setup_data to be placed somewhere just so that it gets
overwritten during decompression. Also, see below.
 
> Secondly, that crash is well understood and doesn't need to be reproduced;

Is it?

Where do you get the 0x100000 as the starting address of the kernel image?

Because input_data above says that the input address is somewhere higher...

Btw, here's an allyesconfig:

early console in setup code
No EFI environment detected.
early console in extract_kernel
input_data: 0x000000001bd003a8
input_len: 0x000000000cde2963
output: 0x0000000001000000
output_len: 0x0000000027849d74
kernel_total_size: 0x0000000025830000
needed_size: 0x0000000027a00000
trampoline_32bit: 0x000000000009d000
Physical KASLR using RDRAND RDTSC...
Virtual KASLR using RDRAND RDTSC...

That kernel has compressed size of ~205M and the output buffer is of size ~632M.
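
The sizes quoted in this message follow directly from the input_len and
output_len values in the two logs. As a quick check, the conversion can
be written as below. bytes_to_mib() is a hypothetical helper that rounds
down to whole MiB.

```c
#include <stdint.h>

/* Convert a byte count to whole mebibytes (1 MiB = 2^20 bytes). */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

For the allyesconfig log, input_len 0x0cde2963 gives 205 MiB and
output_len 0x27849d74 gives 632 MiB. For the earlier guest kernel,
input_len 0x008cfc43 gives 8 MiB (just under 9) and output_len
0x0b600a98 gives 182 MiB.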

> this patch fixes it. Rather, the question now is how to improve this patch
> to remove the 62 meg limit.

Yeah, I'm wondering what that 62M limit is too, actually.

But maybe I'm misunderstanding...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



[PATCH] target/microblaze: Add gdbstub xml

2022-12-30 Thread Richard Henderson
Mirroring the upstream gdb xml files, the two stack boundary
registers are separated out.

Signed-off-by: Richard Henderson 
---

I did this thinking I would be fixing:

  TEST    basic gdbstub support on microblaze
  Truncated register 35 in remote 'g' packet
  Traceback (most recent call last):
    File "/home/rth/qemu/src/tests/tcg/multiarch/gdbstub/sha1.py", line 71, in <module>
      if gdb.parse_and_eval('$pc') == 0:
  gdb.error: No registers.

but in the end it turned out that the gdb-multiarch supplied
by ubuntu 22.04 simply doesn't support MicroBlaze, as can be
seen with the "set architecture" command within gdb.

(I built gdb from source, to try and debug why this still wasn't
working, only to find that it did.  :-P)

Alex, any way to modify our gdb test to fail gracefully here?

Regardless, having proper xml for all of our targets seems
like the correct way forward.


r~

Cc: Alex Bennée 
Cc: Edgar E. Iglesias 
---
 configs/targets/microblaze-linux-user.mak   |  1 +
 configs/targets/microblaze-softmmu.mak  |  1 +
 configs/targets/microblazeel-linux-user.mak |  1 +
 configs/targets/microblazeel-softmmu.mak|  1 +
 target/microblaze/cpu.h |  2 +
 target/microblaze/cpu.c |  7 ++-
 target/microblaze/gdbstub.c | 51 +++-
 gdb-xml/microblaze-core.xml | 67 +
 gdb-xml/microblaze-stack-protect.xml| 12 
 9 files changed, 128 insertions(+), 15 deletions(-)
 create mode 100644 gdb-xml/microblaze-core.xml
 create mode 100644 gdb-xml/microblaze-stack-protect.xml

diff --git a/configs/targets/microblaze-linux-user.mak 
b/configs/targets/microblaze-linux-user.mak
index 4249a37f65..0a2322c249 100644
--- a/configs/targets/microblaze-linux-user.mak
+++ b/configs/targets/microblaze-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_SYSTBL_ABI=common
 TARGET_SYSTBL=syscall.tbl
 TARGET_BIG_ENDIAN=y
 TARGET_HAS_BFLT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/configs/targets/microblaze-softmmu.mak 
b/configs/targets/microblaze-softmmu.mak
index 8385e2d333..e84c0cc728 100644
--- a/configs/targets/microblaze-softmmu.mak
+++ b/configs/targets/microblaze-softmmu.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
 TARGET_BIG_ENDIAN=y
 TARGET_SUPPORTS_MTTCG=y
 TARGET_NEED_FDT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/configs/targets/microblazeel-linux-user.mak 
b/configs/targets/microblazeel-linux-user.mak
index d0e775d840..270743156a 100644
--- a/configs/targets/microblazeel-linux-user.mak
+++ b/configs/targets/microblazeel-linux-user.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=microblaze
 TARGET_SYSTBL_ABI=common
 TARGET_SYSTBL=syscall.tbl
 TARGET_HAS_BFLT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/configs/targets/microblazeel-softmmu.mak 
b/configs/targets/microblazeel-softmmu.mak
index af40391f2f..9b688036bd 100644
--- a/configs/targets/microblazeel-softmmu.mak
+++ b/configs/targets/microblazeel-softmmu.mak
@@ -1,3 +1,4 @@
 TARGET_ARCH=microblaze
 TARGET_SUPPORTS_MTTCG=y
 TARGET_NEED_FDT=y
+TARGET_XML_FILES=gdb-xml/microblaze-core.xml 
gdb-xml/microblaze-stack-protect.xml
diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h
index 1e84dd8f47..e541fbb0b3 100644
--- a/target/microblaze/cpu.h
+++ b/target/microblaze/cpu.h
@@ -367,6 +367,8 @@ hwaddr mb_cpu_get_phys_page_attrs_debug(CPUState *cpu, 
vaddr addr,
 MemTxAttrs *attrs);
 int mb_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int mb_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+int mb_cpu_gdb_read_stack_protect(CPUArchState *cpu, GByteArray *buf, int reg);
+int mb_cpu_gdb_write_stack_protect(CPUArchState *cpu, uint8_t *buf, int reg);
 
 static inline uint32_t mb_cpu_read_msr(const CPUMBState *env)
 {
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 817681f9b2..a2d2f5c340 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -28,6 +28,7 @@
 #include "qemu/module.h"
 #include "hw/qdev-properties.h"
 #include "exec/exec-all.h"
+#include "exec/gdbstub.h"
 #include "fpu/softfloat-helpers.h"
 
 static const struct {
@@ -294,6 +295,9 @@ static void mb_cpu_initfn(Object *obj)
 CPUMBState *env = &cpu->env;
 
 cpu_set_cpustate_pointers(cpu);
+gdb_register_coprocessor(CPU(cpu), mb_cpu_gdb_read_stack_protect,
+ mb_cpu_gdb_write_stack_protect, 2,
+ "microblaze-stack-protect.xml", 0);
 
 set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
 
@@ -422,7 +426,8 @@ static void mb_cpu_class_init(ObjectClass *oc, void *data)
 cc->sysemu_ops = &mb_sysemu_ops;
 #endif
 device_class_set_props(dc, mb_properties);
-cc->gdb_num_core_regs = 32 + 27;
+cc->gdb_num_core_regs = 32 + 25;
+cc->gdb_core_xml_file = "microblaze-core.xml";
 
 

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
Er, .config attached now.


.config
Description: Binary data


Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
Hi,

On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
> On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld"  
> wrote:
> >Hi,
> >
> >Read this message in a fixed width text editor with a lot of columns.
> >
> >On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
> >> Glad you asked.
> >> 
> >> So the kernel load addresses are parameterized in the kernel image
> >> setup header. One of the things that are so parameterized are the size
> >> and possible realignment of the kernel image in memory.
> >> 
> >> I'm very confused where you are getting the 64 MB number from. There
> >> should not be any such limitation.
> >
> >Currently, QEMU appends it to the kernel image, not to the initramfs as
> >you suggest below. So, that winds up looking, currently, like:
> >
> >           kernel image            setup_data
> >   |--------------------------||----------------|
> >0x100000                 0x100000+l1     0x100000+l1+l2
> >
> >The problem is that this decompresses to 0x1000000 (one more zero). So
> >if l1 is > (0x1000000-0x100000), then this winds up looking like:
> >
> >           kernel image            setup_data
> >   |--------------------------||----------------|
> >0x100000                 0x100000+l1     0x100000+l1+l2
> >
> >                                  d e c o m p r e s s e d   k e r n e l
> >                     |-------------------------------------------------------------|
> >                 0x1000000                                                   0x1000000+l3
> >
> >The decompressed kernel seemingly overwriting the compressed kernel
> >image isn't a problem, because that gets relocated to a higher address
> >early on in the boot process. setup_data, however, stays in the same
> >place, since those links are self referential and nothing fixes them up.
> >So the decompressed kernel clobbers it.
> >
> >The solution in this commit adds a bunch of padding between the kernel
> >image and setup_data to avoid this. That looks like this:
> >
> >           kernel image                               padding                           setup_data
> >   |--------------------------||---------------------------------------------------||----------------|
> >0x100000                 0x100000+l1                                          0x1000000+l3    0x1000000+l3+l2
> >
> >                                  d e c o m p r e s s e d   k e r n e l
> >                     |-------------------------------------------------------------|
> >                 0x1000000                                                   0x1000000+l3
> >
> >This way, the decompressed kernel doesn't clobber setup_data.
> >
> >The problem is that if 0x1000000+l3-0x100000 is around 62 megabytes,
> >then the bootloader crashes when trying to dereference setup_data's
> >->len param at the end of initialize_identity_maps() in ident_map_64.c.
> >I don't know why it does this. If I could remove the 62 megabyte
> >restriction, then I could keep with this technique and all would be
> >well.
> >
> >> In general, setup_data should be able to go anywhere the initrd can
> >> go, and so is subject to the same address cap (896 MB for old kernels,
> >> 4 GB on newer ones; this address too is enumerated in the header.)
> >
> >It would be theoretically possible to attach it to the initrd image
> >instead of to the kernel image. As a last resort, I guess I can look
> >into doing that. However, that's going to require some serious rework
> >and plumbing of a lot of different components. So if I can make it work
> >as is, that'd be ideal. However, I need to figure out this weird 62 meg
> >limitation.
> >
> >Any ideas on that?
> >
> >Jason
> 
> As far as a crash... that sounds like a bug and a pretty serious one at that.
> 
> Could you let me know what kernel you are using and how *exactly* you are 
> booting it?

I'll attach a .config file. Apply the patch at the top of this thread to
qemu, except make one modification:

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 628fd2b2e9..a61ee23e13 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1097,7 +1097,7 @@ void x86_load_linux(X86MachineState *x86ms,
 
 /* The early stage can't address past around 64 MB from the 
original
  * mapping, so just give up in that case. */
-if (padded_size < 62 * 1024 * 1024)
+if (true || padded_size < 62 * 1024 * 1024)
 kernel_size = padded_size;
 else {
 fprintf(stderr, "qemu: Kernel image too large to hold 
setup_data\n");

Then build qemu. Run it with `-kernel bzImage`, based on the kernel
built with the .config I attached.

You'll see that the CPU triple faults when hitting this line:

sd = (struct setup_data *)boot_params->hdr.setup_data;
while (sd) {
unsigned long sd_addr = (unsigned long)sd;

kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);  <----
 

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread Jason A. Donenfeld
On Thu, Dec 29, 2022 at 01:47:49PM +0100, Borislav Petkov wrote:
> On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
> > As far as a crash... that sounds like a bug and a pretty serious one at
> > that.
> > 
> > Could you let me know what kernel you are using and how *exactly* you are 
> > booting it?
> 
> Right, with CONFIG_X86_VERBOSE_BOOTUP=y in a guest here, it says:
> 
> early console in extract_kernel
> input_data: 0x000000000be073a8
> input_len: 0x00000000008cfc43
> output: 0x0000000001000000
> output_len: 0x000000000b600a98
> kernel_total_size: 0x000000000ac26000
> needed_size: 0x000000000b800000
> trampoline_32bit: 0x000000000009d000
> 
> so that's a ~9M kernel which gets decompressed at 0x1000000 and the
> output len is, what, ~180M which looks like plenty to me...

I think you might have misunderstood the thread. First, to reproduce the
bug that this patch fixes, you need a kernel with a compressed size of
around 16 megs, not 9. Secondly, that crash is well understood and
doesn't need to be reproduced; this patch fixes it. Rather, the question
now is how to improve this patch to remove the 62 meg limit. I'll follow
up with hpa's request for reproduction info.

Jason
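
For readers following the layout discussion, the overlap condition can be sketched in a few lines of C. This is only an illustration, not QEMU or kernel code: the function names are made up here, and the 1 MiB load address and 16 MiB decompression target are assumed conventional defaults. l1, l2 and l3 stand for the lengths of the compressed image, setup_data and the decompressed kernel, as in the description earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed conventional addresses; not taken from QEMU's source. */
#define KERNEL_LOAD_ADDR UINT64_C(0x100000)   /* 1 MiB: compressed image */
#define DECOMPRESS_ADDR  UINT64_C(0x1000000)  /* 16 MiB: decompression target */

/* True if the half-open ranges [a, a+alen) and [b, b+blen) intersect. */
static bool ranges_overlap(uint64_t a, uint64_t alen,
                           uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* setup_data appended directly after the compressed image (length l1). */
static uint64_t setup_data_addr_unpadded(uint64_t l1)
{
    return KERNEL_LOAD_ADDR + l1;
}

/*
 * setup_data pushed past the end of the decompressed kernel (length l3),
 * but never placed before the end of the compressed image.
 */
static uint64_t setup_data_addr_padded(uint64_t l1, uint64_t l3)
{
    uint64_t after_image = KERNEL_LOAD_ADDR + l1;
    uint64_t after_decompressed = DECOMPRESS_ADDR + l3;

    return after_decompressed > after_image ? after_decompressed : after_image;
}

/* Would the decompressed kernel overwrite setup_data (length l2) at addr? */
static bool setup_data_clobbered(uint64_t addr, uint64_t l2, uint64_t l3)
{
    return ranges_overlap(addr, l2, DECOMPRESS_ADDR, l3);
}
```

With a 16 MiB compressed image the unpadded placement falls inside the decompression range, while a 9 MiB image does not, which is consistent with the observation that a ~9M kernel does not reproduce the original clobbering. The sketch says nothing about the separate 62 MB question, which concerns what the early boot code can address.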



Re: [PATCH v2 2/2] hw/arm: Add Olimex H405

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 15:57, Felipe Balbi wrote:

Olimex makes a series of low-cost STM32 boards. This commit introduces
the minimum setup to support STM32-H405. See [1] for details.

[1] https://www.olimex.com/Products/ARM/ST/STM32-H405/

Signed-off-by: Felipe Balbi 
---

Changes since v1:
- Add a note in stm32.rst
- Initialize default_cpu_type to cortex-m4
- 0-initialize default_ram_size

  MAINTAINERS |  6 +++
  configs/devices/arm-softmmu/default.mak |  1 +
  docs/system/arm/stm32.rst   |  1 +
  hw/arm/Kconfig  |  4 ++
  hw/arm/meson.build  |  1 +
  hw/arm/olimex-stm32-h405.c  | 69 +
  6 files changed, 82 insertions(+)
  create mode 100644 hw/arm/olimex-stm32-h405.c


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v4 2/2] tpm: add backend for mssim

2022-12-30 Thread James Bottomley
From: James Bottomley 

The Microsoft Simulator (mssim) is the reference emulation platform
for the TCG TPM 2.0 specification.

https://github.com/Microsoft/ms-tpm-20-ref.git

It exports a fairly simple network socket based protocol on two
sockets, one for command (default 2321) and one for control (default
2322).  This patch adds a simple backend that can speak the mssim
protocol over the network.  It also allows the two sockets to be
specified on the command line.  The benefits are twofold: firstly, it
gives us a backend that actually speaks a standard TPM emulation
protocol instead of the Linux-specific TPM driver format of the
current emulated TPM backend; secondly, using the Microsoft
protocol, the end point of the emulator can be anywhere on the
network, facilitating the cloud use case where a central TPM service
can be used over a control network.

The implementation does basic control commands like power off/on, but
doesn't implement cancellation or startup.  The former because
cancellation is pretty much useless on a fast operating TPM emulator
and the latter because this emulator is designed to be used with OVMF
which itself does TPM startup and I wanted to validate that.

To run this, simply download an emulator based on the MS specification
(package ibmswtpm2 on openSUSE) and run it, then add these two lines
to the qemu command and it will use the emulator.

-tpmdev mssim,id=tpm0 \
-device tpm-crb,tpmdev=tpm0 \

To use a remote emulator, replace the first line with

-tpmdev "{'type':'mssim','id':'tpm0','command':{'type':'inet','host':'remote','port':'2321'}}"

tpm-tis also works as the backend.

Signed-off-by: James Bottomley 

---

v2: convert to SocketAddr json and use qio_channel_socket_connect_sync()
v3: gate control power off by migration state keep control socket disconnected
to test outside influence and add docs.
---
 MAINTAINERS  |   6 +
 backends/tpm/Kconfig |   5 +
 backends/tpm/meson.build |   1 +
 backends/tpm/tpm_mssim.c | 265 +++
 backends/tpm/tpm_mssim.h |  43 +++
 docs/specs/tpm.rst   |  35 ++
 monitor/hmp-cmds.c   |   7 ++
 qapi/tpm.json|  28 -
 8 files changed, 386 insertions(+), 4 deletions(-)
 create mode 100644 backends/tpm/tpm_mssim.c
 create mode 100644 backends/tpm/tpm_mssim.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6966490c94..b0f5cceda1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3043,9 +3043,15 @@ F: include/hw/acpi/tpm.h
 F: include/sysemu/tpm*
 F: qapi/tpm.json
 F: backends/tpm/
+X: backends/tpm/tpm_mssim.*
 F: tests/qtest/*tpm*
 T: git https://github.com/stefanberger/qemu-tpm.git tpm-next
 
+MSSIM TPM Backend
+M: James Bottomley 
+S: Maintained
+F: backends/tpm/tpm_mssim.*
+
 Checkpatch
 S: Odd Fixes
 F: scripts/checkpatch.pl
diff --git a/backends/tpm/Kconfig b/backends/tpm/Kconfig
index 5d91eb89c2..d6d6fa53e9 100644
--- a/backends/tpm/Kconfig
+++ b/backends/tpm/Kconfig
@@ -12,3 +12,8 @@ config TPM_EMULATOR
 bool
 default y
 depends on TPM_BACKEND
+
+config TPM_MSSIM
+bool
+default y
+depends on TPM_BACKEND
diff --git a/backends/tpm/meson.build b/backends/tpm/meson.build
index 7f2503f84e..c7c3c79125 100644
--- a/backends/tpm/meson.build
+++ b/backends/tpm/meson.build
@@ -3,4 +3,5 @@ if have_tpm
   softmmu_ss.add(files('tpm_util.c'))
   softmmu_ss.add(when: 'CONFIG_TPM_PASSTHROUGH', if_true: 
files('tpm_passthrough.c'))
   softmmu_ss.add(when: 'CONFIG_TPM_EMULATOR', if_true: files('tpm_emulator.c'))
+  softmmu_ss.add(when: 'CONFIG_TPM_MSSIM', if_true: files('tpm_mssim.c'))
 endif
diff --git a/backends/tpm/tpm_mssim.c b/backends/tpm/tpm_mssim.c
new file mode 100644
index 00..e83cef
--- /dev/null
+++ b/backends/tpm/tpm_mssim.c
@@ -0,0 +1,265 @@
+/*
+ * Emulator TPM driver which connects over the mssim protocol
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2022
+ * Author: James Bottomley 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/sockets.h"
+
+#include "qapi/clone-visitor.h"
+#include "qapi/qapi-visit-tpm.h"
+
+#include "io/channel-socket.h"
+
+#include "sysemu/runstate.h"
+#include "sysemu/tpm_backend.h"
+#include "sysemu/tpm_util.h"
+
+#include "qom/object.h"
+
+#include "tpm_int.h"
+#include "tpm_mssim.h"
+
+#define ERROR_PREFIX "TPM mssim Emulator: "
+
+#define TYPE_TPM_MSSIM "tpm-mssim"
+OBJECT_DECLARE_SIMPLE_TYPE(TPMmssim, TPM_MSSIM)
+
+struct TPMmssim {
+TPMBackend parent;
+
+TPMmssimOptions opts;
+
+QIOChannelSocket *cmd_qc, *ctrl_qc;
+};
+
+static int tpm_send_ctrl(TPMmssim *t, uint32_t cmd, Error **errp)
+{
+int ret;
+
+qio_channel_socket_connect_sync(t->ctrl_qc, t->opts.control, errp);
+cmd = htonl(cmd);
+ret = qio_channel_write_all(QIO_CHANNEL(t->ctrl_qc), (char *)&cmd, sizeof(cmd), errp);
+if (ret != 0)
+goto out;
+ret = qio_channel_read_all(QIO_CHANNEL(t->ctrl_qc), (char *)&cmd, sizeof(cmd), errp);
+if (ret != 
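
The fragment above converts the 32-bit control command with htonl() before writing it and reads a 32-bit word back into the same variable. As a rough, socket-free sketch of that wire encoding (the helper names below are hypothetical and not part of the patch; how the reply is interpreted is not visible in the truncated fragment and is not modelled here):

```c
#include <stdint.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read a 32-bit value stored in network (big-endian) byte order. */
static uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Packing bytes explicitly gives the same on-wire result as htonl() followed by writing the four bytes of the integer, without depending on host endianness.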

[PATCH v4 1/2] tpm: convert tpmdev options processing to new visitor format

2022-12-30 Thread James Bottomley
From: James Bottomley 

Instead of processing the tpmdev options using the old qemu options,
convert to the new visitor format which also allows the passing of
json on the command line.

Signed-off-by: James Bottomley 

---
v4: add TpmConfiOptions
---
 backends/tpm/tpm_emulator.c| 24 -
 backends/tpm/tpm_passthrough.c | 27 +++---
 include/sysemu/tpm.h   |  4 +-
 include/sysemu/tpm_backend.h   |  2 +-
 qapi/tpm.json  | 19 +++
 softmmu/tpm.c  | 90 ++
 softmmu/vl.c   | 19 +--
 7 files changed, 76 insertions(+), 109 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index 49cc3d749d..cb6bf9d7c2 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -584,33 +584,28 @@ err_exit:
 return -1;
 }
 
-static int tpm_emulator_handle_device_opts(TPMEmulator *tpm_emu, QemuOpts 
*opts)
+static int tpm_emulator_handle_device_opts(TPMEmulator *tpm_emu, 
TpmCreateOptions *opts)
 {
-const char *value;
 Error *err = NULL;
 Chardev *dev;
 
-value = qemu_opt_get(opts, "chardev");
-if (!value) {
-error_report("tpm-emulator: parameter 'chardev' is missing");
-goto err;
-}
+tpm_emu->options = QAPI_CLONE(TPMEmulatorOptions, &opts->u.emulator);
+tpm_emu->data_ioc = NULL;
 
-dev = qemu_chr_find(value);
+dev = qemu_chr_find(opts->u.emulator.chardev);
 if (!dev) {
-error_report("tpm-emulator: tpm chardev '%s' not found", value);
+error_report("tpm-emulator: tpm chardev '%s' not found",
+opts->u.emulator.chardev);
 goto err;
 }
 
 if (!qemu_chr_fe_init(&tpm_emu->ctrl_chr, dev, &err)) {
 error_prepend(&err, "tpm-emulator: No valid chardev found at '%s':",
-  value);
+  opts->u.emulator.chardev);
 error_report_err(err);
 goto err;
 }
 
-tpm_emu->options->chardev = g_strdup(value);
-
 if (tpm_emulator_prepare_data_fd(tpm_emu) < 0) {
 goto err;
 }
@@ -649,7 +644,7 @@ err:
 return -1;
 }
 
-static TPMBackend *tpm_emulator_create(QemuOpts *opts)
+static TPMBackend *tpm_emulator_create(TpmCreateOptions *opts)
 {
 TPMBackend *tb = TPM_BACKEND(object_new(TYPE_TPM_EMULATOR));
 
@@ -972,7 +967,6 @@ static void tpm_emulator_inst_init(Object *obj)
 
 trace_tpm_emulator_inst_init();
 
-tpm_emu->options = g_new0(TPMEmulatorOptions, 1);
 tpm_emu->cur_locty_number = ~0;
 qemu_mutex_init(&tpm_emu->mutex);
 tpm_emu->vmstate =
@@ -990,7 +984,7 @@ static void tpm_emulator_shutdown(TPMEmulator *tpm_emu)
 {
 ptm_res res;
 
-if (!tpm_emu->options->chardev) {
+if (!tpm_emu->data_ioc) {
 /* was never properly initialized */
 return;
 }
diff --git a/backends/tpm/tpm_passthrough.c b/backends/tpm/tpm_passthrough.c
index 5a2f74db1b..f9771eaf1f 100644
--- a/backends/tpm/tpm_passthrough.c
+++ b/backends/tpm/tpm_passthrough.c
@@ -252,23 +252,11 @@ static int 
tpm_passthrough_open_sysfs_cancel(TPMPassthruState *tpm_pt)
 }
 
 static int
-tpm_passthrough_handle_device_opts(TPMPassthruState *tpm_pt, QemuOpts *opts)
+tpm_passthrough_handle_device_opts(TPMPassthruState *tpm_pt, TpmCreateOptions 
*opts)
 {
-const char *value;
+tpm_pt->options = QAPI_CLONE(TPMPassthroughOptions, &opts->u.passthrough);
 
-value = qemu_opt_get(opts, "cancel-path");
-if (value) {
-tpm_pt->options->cancel_path = g_strdup(value);
-tpm_pt->options->has_cancel_path = true;
-}
-
-value = qemu_opt_get(opts, "path");
-if (value) {
-tpm_pt->options->has_path = true;
-tpm_pt->options->path = g_strdup(value);
-}
-
-tpm_pt->tpm_dev = value ? value : TPM_PASSTHROUGH_DEFAULT_DEVICE;
+tpm_pt->tpm_dev = opts->u.passthrough.has_path ? opts->u.passthrough.path 
: TPM_PASSTHROUGH_DEFAULT_DEVICE;
 tpm_pt->tpm_fd = qemu_open_old(tpm_pt->tpm_dev, O_RDWR);
 if (tpm_pt->tpm_fd < 0) {
 error_report("Cannot access TPM device using '%s': %s",
@@ -290,11 +278,11 @@ tpm_passthrough_handle_device_opts(TPMPassthruState 
*tpm_pt, QemuOpts *opts)
 return 0;
 }
 
-static TPMBackend *tpm_passthrough_create(QemuOpts *opts)
+static TPMBackend *tpm_passthrough_create(TpmCreateOptions *tco)
 {
 Object *obj = object_new(TYPE_TPM_PASSTHROUGH);
 
-if (tpm_passthrough_handle_device_opts(TPM_PASSTHROUGH(obj), opts)) {
+if (tpm_passthrough_handle_device_opts(TPM_PASSTHROUGH(obj), tco)) {
 object_unref(obj);
 return NULL;
 }
@@ -321,8 +309,8 @@ static TpmTypeOptions 
*tpm_passthrough_get_tpm_options(TPMBackend *tb)
 TpmTypeOptions *options = g_new0(TpmTypeOptions, 1);
 
 options->type = TPM_TYPE_PASSTHROUGH;
-options->u.passthrough.data = QAPI_CLONE(TPMPassthroughOptions,
- TPM_PASSTHROUGH(tb)->options);
+
+options->u.passthrough.data = 

[PATCH v4 0/2] tpm: add mssim backend

2022-12-30 Thread James Bottomley
From: James Bottomley 

The requested feedback was to convert the tpmdev handler to being json
based, which requires rethreading all the backends.  The good news is
this reduced quite a bit of code (especially as I converted it to
error_fatal handling as well, which removes the return status
threading).  The bad news is I can't test any of the conversions.
swtpm still isn't building on opensuse and, apparently, passthrough
doesn't like my native TPM because it doesn't allow cancellation.

v3 pulls out more unneeded code in the visitor conversion, makes
migration work on external state preservation of the simulator and
adds documentation

v4 puts back the wrapper options (but doesn't add any for mssim since
it post dates the necessity)

James

---

James Bottomley (2):
  tpm: convert tpmdev options processing to new visitor format
  tpm: add backend for mssim

 MAINTAINERS|   6 +
 backends/tpm/Kconfig   |   5 +
 backends/tpm/meson.build   |   1 +
 backends/tpm/tpm_emulator.c|  24 ++-
 backends/tpm/tpm_mssim.c   | 265 +
 backends/tpm/tpm_mssim.h   |  43 ++
 backends/tpm/tpm_passthrough.c |  27 +---
 docs/specs/tpm.rst |  35 +
 include/sysemu/tpm.h   |   4 +-
 include/sysemu/tpm_backend.h   |   2 +-
 monitor/hmp-cmds.c |   7 +
 qapi/tpm.json  |  45 +-
 softmmu/tpm.c  |  90 +--
 softmmu/vl.c   |  19 +--
 14 files changed, 461 insertions(+), 112 deletions(-)
 create mode 100644 backends/tpm/tpm_mssim.c
 create mode 100644 backends/tpm/tpm_mssim.h

-- 
2.35.3




[PATCH v2 0/2] hw/arm: Add support for STM32 H405 and fix STM32F405 memory layout

2022-12-30 Thread Felipe Balbi
Hi,

The following patches pass checkpatch.pl and have been tested against
55745005e90a.

Felipe Balbi (2):
  hw/arm/stm32f405: correctly describe the memory layout
  hw/arm: Add Olimex H405

 MAINTAINERS |  6 +++
 configs/devices/arm-softmmu/default.mak |  1 +
 docs/system/arm/stm32.rst   |  1 +
 hw/arm/Kconfig  |  4 ++
 hw/arm/meson.build  |  1 +
 hw/arm/olimex-stm32-h405.c  | 69 +
 hw/arm/stm32f405_soc.c  |  8 +++
 include/hw/arm/stm32f405_soc.h  |  5 +-
 8 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/olimex-stm32-h405.c

-- 
2.39.0




[PATCH v2 1/2] hw/arm/stm32f405: correctly describe the memory layout

2022-12-30 Thread Felipe Balbi
STM32F405 has 128K of SRAM and another 64K of CCM (Core-coupled
Memory) at a different base address. Correctly describe the memory
layout to give existing FW images a chance to run unmodified.

Reviewed-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Felipe Balbi 
---

Changes since v1:
- None

 hw/arm/stm32f405_soc.c | 8 
 include/hw/arm/stm32f405_soc.h | 5 -
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/arm/stm32f405_soc.c b/hw/arm/stm32f405_soc.c
index c07947d9f8b1..cef23d7ee41a 100644
--- a/hw/arm/stm32f405_soc.c
+++ b/hw/arm/stm32f405_soc.c
@@ -139,6 +139,14 @@ static void stm32f405_soc_realize(DeviceState *dev_soc, 
Error **errp)
 }
 memory_region_add_subregion(system_memory, SRAM_BASE_ADDRESS, &s->sram);
 
+memory_region_init_ram(&s->ccm, NULL, "STM32F405.ccm", CCM_SIZE,
+   &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+memory_region_add_subregion(system_memory, CCM_BASE_ADDRESS, &s->ccm);
+
 armv7m = DEVICE(&s->armv7m);
 qdev_prop_set_uint32(armv7m, "num-irq", 96);
 qdev_prop_set_string(armv7m, "cpu-type", s->cpu_type);
diff --git a/include/hw/arm/stm32f405_soc.h b/include/hw/arm/stm32f405_soc.h
index 5bb0c8d56979..249ab5434ec7 100644
--- a/include/hw/arm/stm32f405_soc.h
+++ b/include/hw/arm/stm32f405_soc.h
@@ -46,7 +46,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(STM32F405State, STM32F405_SOC)
 #define FLASH_BASE_ADDRESS 0x08000000
 #define FLASH_SIZE (1024 * 1024)
 #define SRAM_BASE_ADDRESS 0x20000000
-#define SRAM_SIZE (192 * 1024)
+#define SRAM_SIZE (128 * 1024)
+#define CCM_BASE_ADDRESS 0x10000000
+#define CCM_SIZE (64 * 1024)
 
 struct STM32F405State {
 /*< private >*/
@@ -65,6 +67,7 @@ struct STM32F405State {
 STM32F2XXADCState adc[STM_NUM_ADCS];
 STM32F2XXSPIState spi[STM_NUM_SPIS];
 
+MemoryRegion ccm;
 MemoryRegion sram;
 MemoryRegion flash;
 MemoryRegion flash_alias;
-- 
2.39.0
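
To make the effect of the split concrete, here is a small stand-alone sketch of the resulting RAM map. It is not QEMU code: the struct and function names are hypothetical, the sizes come from the patch above, and the base addresses are the usual STM32F405 ones (SRAM at 0x20000000, CCM at 0x10000000).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RAM region. */
struct mem_region {
    const char *name;
    uint32_t base;
    uint32_t size;
};

/* RAM layout after the patch: 64K of CCM plus 128K of SRAM. */
static const struct mem_region stm32f405_ram[] = {
    { "ccm",  UINT32_C(0x10000000),  64 * 1024 },
    { "sram", UINT32_C(0x20000000), 128 * 1024 },
};

/* Return the name of the RAM region containing addr, or NULL if none. */
static const char *ram_region_for(uint32_t addr)
{
    size_t n = sizeof(stm32f405_ram) / sizeof(stm32f405_ram[0]);

    for (size_t i = 0; i < n; i++) {
        const struct mem_region *r = &stm32f405_ram[i];

        if (addr >= r->base && addr - r->base < r->size) {
            return r->name;
        }
    }
    return NULL;
}
```

Under the old single 192K SRAM region, an access at 0x10000000 would have hit no RAM at all, while addresses from 0x20020000 upwards would wrongly have been backed by memory; firmware that places its stack or data in CCM depends on the corrected layout.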




[PATCH v2 2/2] hw/arm: Add Olimex H405

2022-12-30 Thread Felipe Balbi
Olimex makes a series of low-cost STM32 boards. This commit introduces
the minimum setup to support STM32-H405. See [1] for details.

[1] https://www.olimex.com/Products/ARM/ST/STM32-H405/

Signed-off-by: Felipe Balbi 
---

Changes since v1:
- Add a note in stm32.rst
- Initialize default_cpu_type to cortex-m4
- 0-initialize default_ram_size

 MAINTAINERS |  6 +++
 configs/devices/arm-softmmu/default.mak |  1 +
 docs/system/arm/stm32.rst   |  1 +
 hw/arm/Kconfig  |  4 ++
 hw/arm/meson.build  |  1 +
 hw/arm/olimex-stm32-h405.c  | 69 +
 6 files changed, 82 insertions(+)
 create mode 100644 hw/arm/olimex-stm32-h405.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bd433b65a55..e37846df0071 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1026,6 +1026,12 @@ L: qemu-...@nongnu.org
 S: Maintained
 F: hw/arm/netduinoplus2.c
 
+Olimex STM32 H405
+M: Felipe Balbi 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/arm/olimex-stm32-h405.c
+
 SmartFusion2
 M: Subbaraya Sundeep 
 M: Peter Maydell 
diff --git a/configs/devices/arm-softmmu/default.mak 
b/configs/devices/arm-softmmu/default.mak
index 6985a25377a0..1b49a7830c7e 100644
--- a/configs/devices/arm-softmmu/default.mak
+++ b/configs/devices/arm-softmmu/default.mak
@@ -30,6 +30,7 @@ CONFIG_COLLIE=y
 CONFIG_ASPEED_SOC=y
 CONFIG_NETDUINO2=y
 CONFIG_NETDUINOPLUS2=y
+CONFIG_OLIMEX_STM32_H405=y
 CONFIG_MPS2=y
 CONFIG_RASPI=y
 CONFIG_DIGIC=y
diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
index 508b92cf862b..d7265b763d47 100644
--- a/docs/system/arm/stm32.rst
+++ b/docs/system/arm/stm32.rst
@@ -20,6 +20,7 @@ The STM32F4 series is based on ARM Cortex-M4F core. This 
series is pin-to-pin
 compatible with STM32F2 series. The following machines are based on this chip :
 
 - ``netduinoplus2`` Netduino Plus 2 board with STM32F405RGT6 
microcontroller
+- ``olimex-stm32-h405`` Olimex STM32 H405 board with STM32F405RGT6 
microcontroller
 
 There are many other STM32 series that are currently not supported by QEMU.
 
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 17fcde8e1ccc..9143533ef792 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -119,6 +119,10 @@ config NETDUINOPLUS2
 bool
 select STM32F405_SOC
 
+config OLIMEX_STM32_H405
+bool
+select STM32F405_SOC
+
 config NSERIES
 bool
 select OMAP
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 92f9f6e000ea..76d4d650e42e 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -12,6 +12,7 @@ arm_ss.add(when: 'CONFIG_MICROBIT', if_true: 
files('microbit.c'))
 arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c'))
 arm_ss.add(when: 'CONFIG_NETDUINO2', if_true: files('netduino2.c'))
 arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
+arm_ss.add(when: 'CONFIG_OLIMEX_STM32_H405', if_true: 
files('olimex-stm32-h405.c'))
 arm_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx.c', 
'npcm7xx_boards.c'))
 arm_ss.add(when: 'CONFIG_NSERIES', if_true: files('nseries.c'))
 arm_ss.add(when: 'CONFIG_SX1', if_true: files('omap_sx1.c'))
diff --git a/hw/arm/olimex-stm32-h405.c b/hw/arm/olimex-stm32-h405.c
new file mode 100644
index 000000000000..3aa61c91b759
--- /dev/null
+++ b/hw/arm/olimex-stm32-h405.c
@@ -0,0 +1,69 @@
+/*
+ * ST STM32VLDISCOVERY machine
+ * Olimex STM32-H405 machine
+ *
+ * Copyright (c) 2022 Felipe Balbi 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "qemu/error-report.h"
+#include "hw/arm/stm32f405_soc.h"
+#include "hw/arm/boot.h"
+
+/* olimex-stm32-h405 implementation is derived from netduinoplus2 */
+
+/* Main SYSCLK frequency in Hz (168MHz) */
+#define SYSCLK_FRQ 168000000ULL
+
+static void 

[PATCH RFC 4/4] vdagent: remove migration blocker

2022-12-30 Thread dengpc12
From: "dengp...@chinatelecom.cn" 

Now that migration is supported, remove the blocker.

Signed-off-by: dengp...@chinatelecom.cn 
Signed-off-by: liuy...@chinatelecom.cn 
---
 ui/vdagent.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/ui/vdagent.c b/ui/vdagent.c
index 1193abe348..f0a7fd5093 100644
--- a/ui/vdagent.c
+++ b/ui/vdagent.c
@@ -5,7 +5,6 @@
 #include "qemu/option.h"
 #include "qemu/units.h"
 #include "hw/qdev-core.h"
-#include "migration/blocker.h"
 #include "migration/vmstate.h"
 #include "ui/clipboard.h"
 #include "ui/console.h"
@@ -32,9 +31,6 @@
 struct VDAgentChardev {
 Chardev parent;
 
-/* TODO: migration isn't yet supported */
-Error *migration_blocker;
-
 /* config */
 bool mouse;
 bool clipboard;
@@ -675,10 +671,6 @@ static void vdagent_chr_open(Chardev *chr,
 return;
 #endif
 
-if (migrate_add_blocker(vd->migration_blocker, errp) != 0) {
-return;
-}
-
 vd->mouse = VDAGENT_MOUSE_DEFAULT;
 if (cfg->has_mouse) {
 vd->mouse = cfg->mouse;
@@ -950,18 +942,14 @@ static void vdagent_chr_init(Object *obj)
 buffer_init(&vd->outbuf, "vdagent-outbuf");
 
 vmstate_register(NULL, 0, &vmstate_vdagent, vd);
-error_setg(&vd->migration_blocker,
-   "The vdagent chardev doesn't yet support migration");
 }
 
 static void vdagent_chr_fini(Object *obj)
 {
 VDAgentChardev *vd = QEMU_VDAGENT_CHARDEV(obj);
 
-migrate_del_blocker(vd->migration_blocker);
 vdagent_disconnect(vd);
 buffer_free(&vd->outbuf);
-error_free(vd->migration_blocker);
 }
 
 static const TypeInfo vdagent_chr_type_info = {
-- 
2.27.0




[PATCH RFC 0/4] vdagent: support live migration

2022-12-30 Thread dengpc12
From: "dengp...@chinatelecom.cn" 

1. After live migration, copy/paste with VNC does not work. This is because:
1) vd->caps is not saved, so the wrong clipboard type is parsed in
vdagent_clipboard_recv_grab;
2) vdagent isn't registered to qemu-clipboard, so vdagent cannot
send/receive messages from qemu-clipboard, thus copy
Validated on Win2016 Datacenter Evaluation with
spice-guest-tools-latest (spice-guest-tools-0.141) and on Fedora 37 with spice-vd

2. A possible memory leak:
we call qemu_input_handler_register, but never call
qemu_input_handler_unregister;
qemu_input_handler_register allocates memory.

We are posting this patchset as an RFC; any corrections and suggestions about
the implementation are much appreciated!

Please review, thanks !

Best Regards !

dengp...@chinatelecom.cn (4):
  vdagent: fix memory leak when vdagent_disconnect is called
  vdagent: refactor vdagent_chr_recv_caps function
  vdagent: add live migration support
  vdagent: remove migration blocker

 ui/trace-events |  2 ++
 ui/vdagent.c| 62 +
 2 files changed, 44 insertions(+), 20 deletions(-)

-- 
2.27.0




[PATCH RFC 2/4] vdagent: refactor vdagent_chr_recv_caps function

2022-12-30 Thread dengpc12
From: "dengp...@chinatelecom.cn" 

Factor the vdagent registration logic out into
vdagent_register_to_qemu_clipboard.

Also add a trace event, vdagent_recv_caps.

Signed-off-by: dengp...@chinatelecom.cn 
Signed-off-by: liuy...@chinatelecom.cn 
---
 ui/trace-events |  1 +
 ui/vdagent.c| 20 +---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/ui/trace-events b/ui/trace-events
index 977577fbba..5e50b60da5 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -143,6 +143,7 @@ vdagent_cb_grab_selection(const char *name) "selection %s"
 vdagent_cb_grab_discard(const char *name, int cur, int recv) "selection %s, 
cur:%d recv:%d"
 vdagent_cb_grab_type(const char *name) "type %s"
 vdagent_cb_serial_discard(uint32_t current, uint32_t received) "current=%u, 
received=%u"
+vdagent_recv_caps(uint32_t caps) "received caps %u"
 
 # dbus.c
 dbus_registered_listener(const char *bus_name) "peer %s"
diff --git a/ui/vdagent.c b/ui/vdagent.c
index 645383b4ec..38061d5b38 100644
--- a/ui/vdagent.c
+++ b/ui/vdagent.c
@@ -696,6 +696,16 @@ static void vdagent_chr_open(Chardev *chr,
 *be_opened = true;
 }
 
+static void vdagent_register_to_qemu_clipboard(VDAgentChardev *vd)
+{
+if (have_clipboard(vd) && vd->cbpeer.notifier.notify == NULL) {
+vd->cbpeer.name = "vdagent";
+vd->cbpeer.notifier.notify = vdagent_clipboard_notify;
+vd->cbpeer.request = vdagent_clipboard_request;
+qemu_clipboard_peer_register(&vd->cbpeer);
+}
+}
+
 static void vdagent_chr_recv_caps(VDAgentChardev *vd, VDAgentMessage *msg)
 {
 VDAgentAnnounceCapabilities *caps = (void *)msg->data;
@@ -720,14 +730,10 @@ static void vdagent_chr_recv_caps(VDAgentChardev *vd, 
VDAgentMessage *msg)
 qemu_input_handler_activate(vd->mouse_hs);
 }
 
-memset(vd->last_serial, 0, sizeof(vd->last_serial));
+trace_vdagent_recv_caps(vd->caps);
 
-if (have_clipboard(vd) && vd->cbpeer.notifier.notify == NULL) {
-vd->cbpeer.name = "vdagent";
-vd->cbpeer.notifier.notify = vdagent_clipboard_notify;
-vd->cbpeer.request = vdagent_clipboard_request;
-qemu_clipboard_peer_register(&vd->cbpeer);
-}
+memset(vd->last_serial, 0, sizeof(vd->last_serial));
+vdagent_register_to_qemu_clipboard(vd);
 }
 
 static void vdagent_chr_recv_msg(VDAgentChardev *vd, VDAgentMessage *msg)
-- 
2.27.0




[PATCH RFC 3/4] vdagent: add live migration support

2022-12-30 Thread dengpc12
From: "dengp...@chinatelecom.cn" 

To support live migration, we made the following two modifications:
1. Save the caps field of VDAgentChardev.
2. Register vdagent to qemu-clipboard after the
   VM device state has been reloaded during live migration.

Signed-off-by: dengp...@chinatelecom.cn 
Signed-off-by: liuy...@chinatelecom.cn 
---
 ui/trace-events |  1 +
 ui/vdagent.c| 28 
 2 files changed, 29 insertions(+)

diff --git a/ui/trace-events b/ui/trace-events
index 5e50b60da5..ccacd867d1 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -144,6 +144,7 @@ vdagent_cb_grab_discard(const char *name, int cur, int 
recv) "selection %s, cur:
 vdagent_cb_grab_type(const char *name) "type %s"
 vdagent_cb_serial_discard(uint32_t current, uint32_t received) "current=%u, 
received=%u"
 vdagent_recv_caps(uint32_t caps) "received caps %u"
+vdagent_migration_caps(uint32_t caps) "migrated caps %u"
 
 # dbus.c
 dbus_registered_listener(const char *bus_name) "peer %s"
diff --git a/ui/vdagent.c b/ui/vdagent.c
index 38061d5b38..1193abe348 100644
--- a/ui/vdagent.c
+++ b/ui/vdagent.c
@@ -6,6 +6,7 @@
 #include "qemu/units.h"
 #include "hw/qdev-core.h"
 #include "migration/blocker.h"
+#include "migration/vmstate.h"
 #include "ui/clipboard.h"
 #include "ui/console.h"
 #include "ui/input.h"
@@ -906,6 +907,31 @@ static void vdagent_chr_parse(QemuOpts *opts, 
ChardevBackend *backend,
 
 /* -- */
 
+static int vdagent_post_load(void *opaque, int version_id)
+{
+VDAgentChardev *vd = QEMU_VDAGENT_CHARDEV(opaque);
+
+trace_vdagent_migration_caps(vd->caps);
+
+if (vd->caps) {
+vdagent_register_to_qemu_clipboard(vd);
+qemu_input_handler_activate(vd->mouse_hs);
+}
+
+return 0;
+}
+
+static const VMStateDescription vmstate_vdagent = {
+.name = "vdagent",
+.version_id = 1,
+.minimum_version_id = 1,
+.post_load = vdagent_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(caps, VDAgentChardev),
+VMSTATE_END_OF_LIST()
+},
+};
+
 static void vdagent_chr_class_init(ObjectClass *oc, void *data)
 {
 ChardevClass *cc = CHARDEV_CLASS(oc);
@@ -922,6 +948,8 @@ static void vdagent_chr_init(Object *obj)
 VDAgentChardev *vd = QEMU_VDAGENT_CHARDEV(obj);
 
 buffer_init(&vd->outbuf, "vdagent-outbuf");
+
+vmstate_register(NULL, 0, &vmstate_vdagent, vd);
 error_setg(&vd->migration_blocker,
"The vdagent chardev doesn't yet support migration");
 }
-- 
2.27.0




[PATCH RFC 1/4] vdagent: fix memory leak when vdagent_disconnect is called

2022-12-30 Thread dengpc12
From: "dengp...@chinatelecom.cn" 

The mouse input handler's memory should be freed in vdagent_disconnect()
with qemu_input_handler_unregister(), so use that instead of
qemu_input_handler_deactivate().

Signed-off-by: dengp...@chinatelecom.cn 
Signed-off-by: liuy...@chinatelecom.cn 
---
 ui/vdagent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ui/vdagent.c b/ui/vdagent.c
index 4bf50f0c4d..645383b4ec 100644
--- a/ui/vdagent.c
+++ b/ui/vdagent.c
@@ -863,7 +863,7 @@ static void vdagent_disconnect(VDAgentChardev *vd)
 vdagent_reset_bufs(vd);
 vd->caps = 0;
 if (vd->mouse_hs) {
-qemu_input_handler_deactivate(vd->mouse_hs);
+qemu_input_handler_unregister(vd->mouse_hs);
 }
 if (vd->cbpeer.notifier.notify) {
 qemu_clipboard_peer_unregister(&vd->cbpeer);
-- 
2.27.0




qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success

2022-12-30 Thread Vitaly Chikunov
Hi,

QEMU 7.2.0 when run on 32-bit x86 architecture fails with:

  i586$ qemu-system-i386 -enable-kvm
  qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
  i586$ qemu-system-x86_64 -enable-kvm
  qemu-system-x86_64: Could not install MSR_CORE_THREAD_COUNT handler: Success

The minimal reproducer is `qemu-system-i386 -enable-kvm'. This only
happens on 32-bit x86 (linux32 personality and binaries on an x86_64 host):

  i586$ file /usr/bin/qemu-system-i386
  /usr/bin/qemu-system-i386: ELF 32-bit LSB pie executable, Intel 80386, 
version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, 
BuildID[sha1]=0ba1d953bcb7a691014255954f060ff404c8df90, for GNU/Linux 3.2.0, 
stripped
  i586$ /usr/bin/qemu-system-i386 --version
  QEMU emulator version 7.2.0 (qemu-7.2.0-alt1)
  Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

Thanks,




Re: [PATCH 10/11] alsaaudio: change default playback settings

2022-12-30 Thread Christian Schoenebeck
On Friday, December 30, 2022 10:01:47 AM CET Volker Rümelin wrote:
> Am 28.12.22 um 14:52 schrieb Christian Schoenebeck:
> > On Monday, December 26, 2022 4:08:37 PM CET Volker Rümelin wrote:
> >> Am 21.12.22 um 12:03 schrieb Christian Schoenebeck:
> >>> On Sunday, December 18, 2022 6:15:38 PM CET Volker Rümelin wrote:
>  The currently used default playback settings in the ALSA audio
>  backend are a bit unfortunate. With a few emulated audio devices,
>  audio playback does not work properly. Here is a short part of
>  the debug log while audio is playing (elapsed time in seconds).
> >>> Which emulated devices are these?
> >> The hda device and sb16. When I wrote this patch two months ago ac97
> >> also had occasional dropouts, but at the moment ac97 works without issues.
> >>
>  audio: Elapsed since last alsa run (running): 0.046244
>  audio: Elapsed since last alsa run (running): 0.023137
>  audio: Elapsed since last alsa run (running): 0.023170
>  audio: Elapsed since last alsa run (running): 0.023650
>  audio: Elapsed since last alsa run (running): 0.060802
>  audio: Elapsed since last alsa run (running): 0.031931
> 
>  For some audio devices the time of more than 23ms between updates
>  is too long.
> 
>  Set the period time to 5.8ms so that the maximum time between
>  two updates typically does not exceed 11ms. This roughly matches
>  the 10ms period time when doing playback with the audio timer.
>  After this patch the debug log looks like this.
> >>> And what about dynamically adapting that value instead of reducing period 
> >>> time
> >>> for everyone by default?
> >> It seems this would be only needed for the ALSA backend. All other
> >> backends with the exception of OSS are fine with a 10ms period, and the
> >> ALSA audio backend also uses 10ms with -audiodev
> >> alsa,out.try-poll=off,in.try-poll=off.
> > OK, but all it would need was adjusting dev->timer_period appropriately 
> > either
> > in audio_validate_opts() [audio/audio.c, line 2126] to handle it generalized
> > or at the end of alsa_audio_init() [audio/alsaaudio.c, line 944] if
> > specifically for ALSA only, no?
> 
> I think this should be the other way around. If period-length wasn't 
> specified on the command line, it should be calculated from 
> timer-period. With timer based playback or recording, the guest should 
> be able to write or read new audio frames every timer-period 
> microseconds. To have a high probability that the guest can write or 
> read new frames every timer-period, the asynchronous updates of the 
> audio backend have to be faster than timer-period e.g. 75-80% of 
> timer-period. But that's a different patch.

Probably in both directions, depending on what the user supplied.
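
To make the "derive period-length from timer-period" direction concrete, here
is a rough sketch. The helper name and its parameters are hypothetical and not
existing QEMU code; the percentage is simply passed in, with 75 being the lower
end of the range suggested above:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: pick a backend period length
 * in microseconds. A value supplied by the user always wins; otherwise
 * the period is a fraction (percent) of the timer period.
 */
static uint32_t period_len_from_timer(uint32_t timer_period_us,
                                      uint32_t user_period_us,
                                      uint32_t percent)
{
    assert(percent > 0 && percent <= 100);

    if (user_period_us != 0) {
        /* explicit command line setting, keep it untouched */
        return user_period_us;
    }

    /* 64-bit intermediate so that large timer periods cannot overflow */
    return (uint32_t)(((uint64_t)timer_period_us * percent) / 100);
}
```

With the default 10ms timer period and 75%, this would yield a 7.5ms period,
while a period given on the command line is passed through unchanged.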

I still have this feeling that this patch might just move the problem to other
users (dropouts) until properly addressed, but anyway, for the time being:

Acked-by: Christian Schoenebeck 

> >>> 23ms is usually a good trade off between low latency, CPU load and 
> >>> potential
> >>> for audio dropouts.
> >> Quite often it's longer than 23ms. For the rest of the audio backends a
> >> timer period of 10ms was selected as a good trade off between CPU load
> >> and audio dropouts. But you are right, this patch increases the CPU load.
> >>
> >> On my system the CPU load is increased by 0.9%. This was measured with a
> >> Linux guest using rhythmbox for audio playback. The guest was configured
> >> to use pulseaudio as sound server. The measurement was done with top -b
> >> -d 10 -n 14 over a period of two minutes. The first and last measurement
> >> was dropped. The average QEMU CPU load was 10.7% with and 9.8% without
> >> this patch.
> >>
> >> I would prefer a system with a 0.9% increased CPU load where audio just
> >> works over a system where you have to fine tune audio parameters.
> >>





Re: [PATCH] ui/cocoa: user friendly characters for release mouse

2022-12-30 Thread Christian Schoenebeck
On Thursday, December 29, 2022 1:31:09 PM CET Philippe Mathieu-Daudé wrote:
> On 27/12/22 17:15, Christian Schoenebeck wrote:
> > While mouse is grabbed, window title contains a hint for the user what
> > keyboard keys to press to release the mouse. Make that hint text a bit
> > more user friendly for a Mac user:
> > 
> >   - Replace "Ctrl" and "Alt" by appropriate symbols for those keyboard
> > keys typically displayed for them on a Mac (encode those symbols by
> > using UTF-8 characters).
> > 
> >   - Drop " + " in between the keys, as that's not common on macOS for
> > documenting keyboard shortcuts.
> > 
> >   - Convert lower case "g" to upper case "G", as that's common on macOS.
> > 
> >   - Add one additional space at start and end of key stroke set, to
> > visually separate the key strokes from the rest of the text.
> > 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> >   ui/cocoa.m | 7 +--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/ui/cocoa.m b/ui/cocoa.m
> > index e915c344a8..289a2b193e 100644
> > --- a/ui/cocoa.m
> > +++ b/ui/cocoa.m
> > @@ -72,6 +72,9 @@
> >   
> >   #define cgrect(nsrect) (*(CGRect *)&(nsrect))
> >   
> > +#define UC_CTRL_KEY "\xe2\x8c\x83"
> > +#define UC_ALT_KEY "\xe2\x8c\xa5"
> > +
> >   typedef struct {
> >   int width;
> >   int height;
> > @@ -1135,9 +1138,9 @@ - (void) grabMouse
> >   
> >   if (!isFullscreen) {
> >   if (qemu_name)
> > -[normalWindow setTitle:[NSString stringWithFormat:@"QEMU %s - 
> > (Press ctrl + alt + g to release Mouse)", qemu_name]];
> > +[normalWindow setTitle:[NSString stringWithFormat:@"QEMU %s - 
> > (Press  " UC_CTRL_KEY " " UC_ALT_KEY " G  to release Mouse)", qemu_name]];
> 
> I was a bit confused by the control symbol at first and thought
> I had to press '^'; thus I'm not sure keeping 'ctrl' wouldn't
> be clearer. The UC_ALT_KEY symbol certainly is. Anyhow,

It _should_ be clear to Mac users that "^" == "control" key on a Mac keyboard,
as that's how it is presented for keyboard shortcuts in all Mac menus. But if
it's really irritating then we can also revert that to either "Ctrl" or
"control". It's a quick change.

I just got itched by the "Alt" hint actually, which is not printed on any Mac
keyboard.

> Tested-by: Philippe Mathieu-Daudé 
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks!

> >   else
> > -[normalWindow setTitle:@"QEMU - (Press ctrl + alt + g to 
> > release Mouse)"];
> > +[normalWindow setTitle:@"QEMU - (Press  " UC_CTRL_KEY " " 
> > UC_ALT_KEY " G  to release Mouse)"];
> >   }
> >   [self hideCursor];
> >   CGAssociateMouseAndMouseCursorPosition(isAbsoluteEnabled);
> 
> 
> 





Re: [PATCH v3 07/17] hw/9pfs: Support getting current directory offset for Windows

2022-12-30 Thread Christian Schoenebeck
On Thursday, December 29, 2022 7:03:54 AM CET Shi, Guohuai wrote:
> 
> > -Original Message-
> > From: Christian Schoenebeck 
> > Sent: Wednesday, December 28, 2022 19:51
> > To: Greg Kurz ; qemu-devel@nongnu.org
> > Cc: Meng, Bin ; Shi, Guohuai
> > 
> > Subject: Re: [PATCH v3 07/17] hw/9pfs: Support getting current directory
> > offset for Windows
> > 
> > CAUTION: This email comes from a non Wind River email account!
> > Do not click links or open attachments unless you recognize the sender and
> > know the content is safe.
> > 
> > On Wednesday, December 21, 2022 7:02:43 PM CET Shi, Guohuai wrote:
> > >
> > > > -Original Message-
> > > > From: Christian Schoenebeck 
> > > > Sent: Wednesday, December 21, 2022 22:48
> > > > To: Greg Kurz ; qemu-devel@nongnu.org
> > > > Cc: Shi, Guohuai ; Meng, Bin
> > > > 
> > > > Subject: Re: [PATCH v3 07/17] hw/9pfs: Support getting current
> > > > directory offset for Windows
> > > >
> > > > CAUTION: This email comes from a non Wind River email account!
> > > > Do not click links or open attachments unless you recognize the
> > > > sender and know the content is safe.
> > > >
> > > > On Monday, December 19, 2022 11:20:11 AM CET Bin Meng wrote:
> > > > > From: Guohuai Shi 
> > > > >
> > > > > On Windows 'struct dirent' does not have current directory offset.
> > > > > Update qemu_dirent_off() to support Windows.
> > > > >
> > > > > While we are here, add a build time check to error out if a new
> > > > > host does not implement this helper.
> > > > >
> > > > > Signed-off-by: Guohuai Shi 
> > > > > Signed-off-by: Bin Meng 
> > > > > ---
> > > > >
> > > > > (no changes since v1)
> > > > >
> > > > >  hw/9pfs/9p-util.h   | 11 ---
> > > > >  hw/9pfs/9p-util-win32.c |  7 +++
> > > > >  hw/9pfs/9p.c|  4 ++--
> > > > >  hw/9pfs/codir.c |  2 +-
> > > > >  4 files changed, 18 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h index
> > > > > 90420a7578..e395936b30 100644
> > > > > --- a/hw/9pfs/9p-util.h
> > > > > +++ b/hw/9pfs/9p-util.h
> > > > > @@ -127,6 +127,7 @@ int unlinkat_win32(int dirfd, const char
> > > > > *pathname, int flags);  int statfs_win32(const char *root_path,
> > > > > struct statfs *stbuf);  int openat_dir(int dirfd, const char
> > > > > *name);  int openat_file(int dirfd, const char *name, int flags,
> > > > > mode_t mode);
> > > > > +off_t qemu_dirent_off_win32(void *s, void *fs);
> > > > >  #endif
> > > > >
> > > > >  static inline void close_preserve_errno(int fd) @@ -200,12
> > > > > +201,16 @@ ssize_t fremovexattrat_nofollow(int dirfd, const char
> > *filename,
> > > > >   * so ensure it is manually injected earlier and call here when
> > > > >   * needed.
> > > > >   */
> > > > > -static inline off_t qemu_dirent_off(struct dirent *dent)
> > > > > +static inline off_t qemu_dirent_off(struct dirent *dent, void *s,
> > > > > +void *fs)
> > > > >  {
> > > >
> > > > Not sure why you chose void* here.
> > >
> > > It is "V9fsState *" here, but 9p-util.h may be included by some other
> > > source file which does not have the definition of "V9fsState".
> > > So I changed it to "void *" to avoid unnecessary type declarations.
> > 
> > You can anonymously declare the struct to avoid cyclic dependencies (e.g.
> > like in 9p.h):
> > 
> > typedef struct V9fsState V9fsState;
> > 
> > Avoid the void.
> > 
> 
> Got it, understood.
> 
> > > >
> > > > > -#ifdef CONFIG_DARWIN
> > > > > +#if defined(CONFIG_DARWIN)
> > > > >  return dent->d_seekoff;
> > > > > -#else
> > > > > +#elif defined(CONFIG_LINUX)
> > > > >  return dent->d_off;
> > > > > +#elif defined(CONFIG_WIN32)
> > > > > +return qemu_dirent_off_win32(s, fs); #else #error Missing
> > > > > +qemu_dirent_off() implementation for this host system
> > > > >  #endif
> > > > >  }
> > > > >
> > > > > diff --git a/hw/9pfs/9p-util-win32.c b/hw/9pfs/9p-util-win32.c
> > > > > index 7a270a7bd5..3592e057ce 100644
> > > > > --- a/hw/9pfs/9p-util-win32.c
> > > > > +++ b/hw/9pfs/9p-util-win32.c
> > > > > @@ -929,3 +929,10 @@ int qemu_mknodat(int dirfd, const char
> > > > > *filename,
> > > > mode_t mode, dev_t dev)
> > > > >  errno = ENOTSUP;
> > > > >  return -1;
> > > > >  }
> > > > > +
> > > > > +off_t qemu_dirent_off_win32(void *s, void *fs) {
> > > > > +V9fsState *v9fs = s;
> > > > > +
> > > > > +return v9fs->ops->telldir(&v9fs->ctx, (V9fsFidOpenState
> > > > > + *)fs); }
> > > >
> > > > That's a bit tricky. So far (i.e. for Linux host, macOS host) we are
> > > > storing the directory offset directly to the dirent struct. We are
> > > > not using
> > > > telldir() as the stream might have mutated in the meantime, hence
> > > > calling
> > > > telldir() might return a value that no longer correlates to the
> > > > dirent in question.
> > > >
> > > > Hence my idea was to use the same hack for Windows as we did for
> > > > macOS, where we simply misuse a dirent field known to be not used,
> > > > on macOS that's 

Re: [PULL 46/47] accel/tcg: Handle false negative lookup in page_check_range

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 01:02, Richard Henderson wrote:

As in page_get_flags, we need to try again with the mmap
lock held if we fail a page lookup.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
  accel/tcg/user-exec.c | 41 ++---
  1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 2c5c10d2e6..a8eb63ab96 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -525,6 +525,8 @@ void page_set_flags(target_ulong start, target_ulong end, 
int flags)
  int page_check_range(target_ulong start, target_ulong len, int flags)
  {
  target_ulong last;
+int locked;  /* tri-state: =0: unlocked, +1: global, -1: local */


Thanks :)




Re: [PATCH v4 07/11] hw/riscv: write bootargs 'chosen' FDT after riscv_load_kernel()

2022-12-30 Thread Philippe Mathieu-Daudé

On 29/12/22 19:11, Daniel Henrique Barboza wrote:

The sifive_u, spike and virt machines are writing the 'bootargs' FDT
node during their respective create_fdt().

Given that bootargs is written only when '-append' is used, and this
option is only allowed with the '-kernel' option, which in turn is
already being checked before executing riscv_load_kernel(), write
'bootargs' in the same code path as riscv_load_kernel().

Cc: Palmer Dabbelt 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Bin Meng 
---
  hw/riscv/sifive_u.c | 11 +--
  hw/riscv/spike.c|  9 +
  hw/riscv/virt.c | 11 +--
  3 files changed, 15 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 06/11] hw/riscv: write initrd 'chosen' FDT inside riscv_load_initrd()

2022-12-30 Thread Philippe Mathieu-Daudé

On 29/12/22 19:11, Daniel Henrique Barboza wrote:

riscv_load_initrd() returns the initrd end addr while also writing a
'start' var to mark the addr start. This information is being used
just to write the initrd FDT node. Every existing caller of
riscv_load_initrd() is writing the FDT in the same manner.

We can simplify things by writing the FDT inside riscv_load_initrd(),
sparing callers from having to manage start/end addrs to write the FDT
themselves.

An 'if (fdt)' check is already inserted at the end of the function
because we'll end up using it later on with other boards that don't
have an FDT.

Cc: Palmer Dabbelt 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Bin Meng 
---
  hw/riscv/boot.c| 18 --
  hw/riscv/microchip_pfsoc.c | 10 ++
  hw/riscv/sifive_u.c| 10 ++
  hw/riscv/spike.c   | 10 ++
  hw/riscv/virt.c| 10 ++
  include/hw/riscv/boot.h|  4 ++--
  6 files changed, 22 insertions(+), 40 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 01/11] tests/avocado: add RISC-V OpenSBI boot test

2022-12-30 Thread Philippe Mathieu-Daudé

On 29/12/22 19:11, Daniel Henrique Barboza wrote:

This test is used to do a quick sanity check to ensure that we're able
to run the existing QEMU FW image.

'sifive_u', 'spike' and 'virt' riscv64 machines, and 'sifive_u' and
'virt' 32 bit machines are able to run the default RISCV64_BIOS_BIN |
RISCV32_BIOS_BIN firmware with minimal options.

The riscv32 'spike' machine isn't bootable at this moment, requiring an
OpenSBI fix [1] and QEMU side changes [2]. We could just leave it at that
or add a 'skip' test to remind us about it. To work as a reminder that
we have a riscv32 'spike' test that should be enabled as soon as OpenSBI
QEMU rom receives the fix, we're adding a 'skip' test:

(06/18) tests/avocado/riscv_opensbi.py:RiscvOpenSBI.test_riscv32_spike:
 SKIP: requires OpenSBI fix to work

[1] 
https://patchwork.ozlabs.org/project/opensbi/patch/20221226033603.1860569-1-bm...@tinylab.org/
[2] https://patchwork.ozlabs.org/project/qemu-devel/list/?series=334159

Cc: Cleber Rosa 
Cc: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Tested-by: Bin Meng 
Signed-off-by: Daniel Henrique Barboza 
---
  tests/avocado/riscv_opensbi.py | 65 ++
  1 file changed, 65 insertions(+)
  create mode 100644 tests/avocado/riscv_opensbi.py


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 04/11] hw/riscv/boot.c: exit early if filename is NULL in load_(kernel|initrd)

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 09:58, Bin Meng wrote:

On Fri, Dec 30, 2022 at 2:21 AM Daniel Henrique Barboza
 wrote:


riscv_load_kernel() and riscv_load_initrd() work under the assumption
that 'kernel_filename' and 'filename' are not NULL.


We should do the same in riscv_load_firmware()


Can be done on top IMHO.


This is currently the case since all callers of both functions are
checking for NULL before calling them. Put an assert in both to make
sure that a NULL value for both cases would be considered a bug.

Suggested-by: Alex Bennée 
Signed-off-by: Daniel Henrique Barboza 
---
  hw/riscv/boot.c | 4 
  1 file changed, 4 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 3/3] hw/intc/loongarch_pch: Change default irq number of pch irq controller

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 10:59, Tianrui Zhao wrote:

Change the default irq number of pch pic to 32, so that the irq
number of pch msi is 224(256 - 32), and move the 'PCH_PIC_IRQ_NUM'
macro to pci-host/ls7a.h and add prefix 'VIRT' on it to keep standard
format.

Signed-off-by: Tianrui Zhao 
---
  hw/intc/loongarch_pch_pic.c | 3 ++-
  hw/loongarch/virt.c | 2 +-
  include/hw/intc/loongarch_pch_msi.h | 6 +++---
  include/hw/intc/loongarch_pch_pic.h | 1 -
  include/hw/pci-host/ls7a.h  | 1 +
  5 files changed, 7 insertions(+), 6 deletions(-)




diff --git a/include/hw/intc/loongarch_pch_msi.h 
b/include/hw/intc/loongarch_pch_msi.h
index c5a52bc327..a1172f56ff 100644
--- a/include/hw/intc/loongarch_pch_msi.h
+++ b/include/hw/intc/loongarch_pch_msi.h
@@ -8,10 +8,10 @@
  #define TYPE_LOONGARCH_PCH_MSI "loongarch_pch_msi"
  OBJECT_DECLARE_SIMPLE_TYPE(LoongArchPCHMSI, LOONGARCH_PCH_MSI)
  
-/* Msi irq start start from 64 to 255 */

-#define PCH_MSI_IRQ_START   64
+/* Msi irq start start from 32 to 255 */


s/Msi/MSI/

Otherwise,

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2] i386: Deprecate the -no-hpet QEMU command line option

2022-12-30 Thread Philippe Mathieu-Daudé

On 29/12/22 12:49, Thomas Huth wrote:

The HPET setting has been turned into a machine property a while ago
already, so we should finally do the next step and deprecate the
legacy CLI option, too.

Signed-off-by: Thomas Huth 
---
  v2:
  - Rebased to current version from master branch / adjusted version info
  - Dropped the description in hw/i386/pc.c (already done via another patch)

  Note: The "hpet" property should now show up in the output of the
  "query-command-line-options" QMP command since commit 2f129fc107b4a, so
  it should be feasible for libvirt now to properly probe for the property.

  docs/about/deprecated.rst | 6 ++
  softmmu/vl.c  | 1 +
  qemu-options.hx   | 2 +-
  3 files changed, 8 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




[RFC PATCH v5 15/52] i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

The xen_overlay device (and later similar devices for event channels and
grant tables) need to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is the only way to support Xen emulation
for now.

Signed-off-by: David Woodhouse 
---
 hw/i386/pc.c | 11 +++
 include/hw/i386/pc.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 546b703cb4..d8c5b1814e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -89,6 +89,7 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/virtio/virtio-pmem-pci.h"
 #include "hw/virtio/virtio-mem-pci.h"
+#include "hw/i386/kvm/xen_overlay.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "target/i386/cpu.h"
@@ -1842,6 +1843,16 @@ static void pc_machine_initfn(Object *obj)
 cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
+int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
+{
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE) {
+xen_overlay_create();
+}
+#endif
+return 0;
+}
+
 static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
 CPUState *cs;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c95333514e..e82224857e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -290,12 +290,15 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
+extern int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
+
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
 MachineClass *mc = MACHINE_CLASS(oc); \
 optsfn(mc); \
 mc->init = initfn; \
+mc->kvm_type = pc_machine_kvm_type; \
 } \
 static const TypeInfo pc_machine_type_##suffix = { \
 .name   = namestr TYPE_MACHINE_SUFFIX, \
-- 
2.35.3




[RFC PATCH v5 14/52] hw/xen: Add xen_overlay device for emulating shared xenheap pages

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_overlay.c | 200 ++
 hw/i386/kvm/xen_overlay.h |  20 
 include/sysemu/kvm_xen.h  |   4 +
 4 files changed, 225 insertions(+)
 create mode 100644 hw/i386/kvm/xen_overlay.c
 create mode 100644 hw/i386/kvm/xen_overlay.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 95467f1ded..6165cbf019 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,5 +4,6 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
new file mode 100644
index 00..331dea6b8b
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.c
@@ -0,0 +1,200 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+#include 
+
+#include "standard-headers/xen/memory.h"
+
+
+#define TYPE_XEN_OVERLAY "xen-overlay"
+OBJECT_DECLARE_SIMPLE_TYPE(XenOverlayState, XEN_OVERLAY)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenOverlayState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+MemoryRegion shinfo_mem;
+void *shinfo_ptr;
+uint64_t shinfo_gpa;
+};
+
+struct XenOverlayState *xen_overlay_singleton;
+
+static void xen_overlay_map_page_locked(MemoryRegion *page, uint64_t gpa)
+{
+/*
+ * Xen allows guests to map the same page as many times as it likes
+ * into guest physical frames. We don't, because it would be hard
+ * to track and restore them all. One mapping of each page is
+ * perfectly sufficient for all known guests... and we've tested
+ * that theory on a few now in other implementations. dwmw2.
+ */
+if (memory_region_is_mapped(page)) {
+if (gpa == INVALID_GPA) {
+memory_region_del_subregion(get_system_memory(), page);
+} else {
+/* Just move it */
+memory_region_set_address(page, gpa);
+}
+} else if (gpa != INVALID_GPA) {
+memory_region_add_subregion_overlap(get_system_memory(), gpa, page, 0);
+}
+}
+
+/* KVM is the only existing back end for now. Let's not overengineer it yet. */
+static int xen_overlay_set_be_shinfo(uint64_t gfn)
+{
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+.u.shared_info.gfn = gfn,
+};
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+}
+
+
+static void xen_overlay_realize(DeviceState *dev, Error **errp)
+{
+XenOverlayState *s = XEN_OVERLAY(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen overlay page support is for Xen emulation");
+return;
+}
+
+memory_region_init_ram(&s->shinfo_mem, OBJECT(dev), "xen:shared_info",
+   XEN_PAGE_SIZE, &error_abort);
+memory_region_set_enabled(&s->shinfo_mem, true);
+
+s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
+ 

[RFC PATCH v5 39/52] i386/xen: add monitor commands to test event injection

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Specifically, add listing and injection of event channels.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hmp-commands.hx  | 29 ++
 hw/i386/kvm/xen_evtchn.c | 83 
 hw/i386/kvm/xen_evtchn.h |  3 ++
 monitor/misc.c   |  4 ++
 4 files changed, 119 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 673e39a697..fd77c432c0 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1815,3 +1815,32 @@ SRST
   Dump the FDT in dtb format to *filename*.
 ERST
 #endif
+
+#if defined(CONFIG_XEN_EMU)
+{
+.name   = "xen-event-inject",
+.args_type  = "port:i",
+.params = "port",
+.help   = "inject event channel",
+.cmd= hmp_xen_event_inject,
+},
+
+SRST
+``xen-event-inject`` *port*
+  Notify guest via event channel on port *port*.
+ERST
+
+
+{
+.name   = "xen-event-list",
+.args_type  = "",
+.params = "",
+.help   = "list event channel state",
+.cmd= hmp_xen_event_list,
+},
+
+SRST
+``xen-event-list``
+  List event channels in the guest
+ERST
+#endif
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index cdf7d86da8..87136334f5 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -19,6 +19,8 @@
 #include "exec/target_page.h"
 #include "exec/address-spaces.h"
 #include "migration/vmstate.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
@@ -1061,3 +1063,84 @@ int xen_evtchn_send_op(struct evtchn_send *send)
 return ret;
 }
 
+static const char *type_names[] = {
+"closed",
+"unbound",
+"interdomain",
+"pirq",
+"virq",
+"ipi"
+};
+
+void hmp_xen_event_list(Monitor *mon, const QDict *qdict)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+void *shinfo, *pending, *mask;
+int i;
+
+if (!s) {
+monitor_printf(mon, "Xen event channel emulation not enabled\n");
+return;
+}
+
+shinfo = xen_overlay_get_shinfo_ptr();
+if (!shinfo) {
+monitor_printf(mon, "Xen shared info page not allocated\n");
+return;
+}
+if (xen_is_long_mode()) {
+pending = shinfo + offsetof(struct shared_info, evtchn_pending);
+mask = shinfo + offsetof(struct shared_info, evtchn_mask);
+} else {
+pending = shinfo + offsetof(struct compat_shared_info, evtchn_pending);
+mask = shinfo + offsetof(struct compat_shared_info, evtchn_mask);
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+for (i = 0; i < s->nr_ports; i++) {
+XenEvtchnPort *p = &s->port_table[i];
+
+if (p->type == EVTCHNSTAT_closed) {
+continue;
+}
+
+monitor_printf(mon, "port %4u %s/%d vcpu:%d pending:%d mask:%d\n", i,
+   type_names[p->type], p->type_val, p->vcpu,
+   test_bit(i, pending), test_bit(i, mask));
+}
+
+qemu_mutex_unlock(&s->port_lock);
+}
+
+void hmp_xen_event_inject(Monitor *mon, const QDict *qdict)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int port = qdict_get_int(qdict, "port");
+XenEvtchnPort *p;
+
+if (!s) {
+monitor_printf(mon, "Xen event channel emulation not enabled\n");
+return;
+}
+
+if (!valid_port(port)) {
+monitor_printf(mon, "Invalid port %d\n", port);
+return;
+}
+p = &s->port_table[port];
+
+qemu_mutex_lock(&s->port_lock);
+
+monitor_printf(mon, "port %4u %s/%d vcpu:%d\n", port,
+   type_names[p->type], p->type_val, p->vcpu);
+
+if (set_port_pending(s, port)) {
+monitor_printf(mon, "Failed to set port %d\n", port);
+} else {
+monitor_printf(mon, "Delivered port %d\n", port);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+}
+
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 5d3e03553f..1de53daa52 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -16,6 +16,9 @@ void xen_evtchn_create(void);
 int xen_evtchn_soft_reset(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
+void hmp_xen_event_list(Monitor *mon, const QDict *qdict);
+void hmp_xen_event_inject(Monitor *mon, const QDict *qdict);
+
 struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
diff --git a/monitor/misc.c b/monitor/misc.c
index 205487e2b9..2b11c0f86a 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -88,6 +88,10 @@
 /* Make devices configuration available for use in hmp-commands*.hx templates */
 #include CONFIG_DEVICES
 
+#ifdef CONFIG_XEN_EMU
+#include "hw/i386/kvm/xen_evtchn.h"
+#endif
+
 /* file descriptors passed via SCM_RIGHTS */
 typedef struct mon_fd_t mon_fd_t;
 struct mon_fd_t {
-- 
2.35.3




[RFC PATCH v5 48/52] i386/xen: Reserve Xen special pages for console, xenstore rings

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Xen has eight frames at 0xfeff8000 for this; we only really need two for
now and KVM puts the identity map at 0xfeffc000, so limit ourselves to
four.
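
The arithmetic behind that limit can be checked with a small standalone
sketch. The two addresses are the ones quoted above; the 4 KiB page size
and the helper names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses quoted in the commit message above. */
#define XEN_SPECIAL_AREA_ADDR 0xfeff8000UL
#define KVM_IDENTITY_MAP_ADDR 0xfeffc000UL  /* illustrative name */

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Number of whole pages that fit between the start of the Xen special
 * area and the KVM identity map.
 */
static unsigned long special_pages_available(void)
{
    assert(KVM_IDENTITY_MAP_ADDR > XEN_SPECIAL_AREA_ADDR);
    return (KVM_IDENTITY_MAP_ADDR - XEN_SPECIAL_AREA_ADDR)
           >> SKETCH_PAGE_SHIFT;
}

/* Page frame number of the n-th special page (0 = console, 1 = xenstore). */
static uint64_t special_pfn(unsigned int n)
{
    return (XEN_SPECIAL_AREA_ADDR >> SKETCH_PAGE_SHIFT) + n;
}
```

Four pages fit below the identity map, which matches the 0x4000 size
reserved in the e820 table by this patch.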

Signed-off-by: David Woodhouse 
---
 include/sysemu/kvm_xen.h  |  8 
 target/i386/kvm/xen-emu.c | 15 +++
 2 files changed, 23 insertions(+)

diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 4999a0cff4..6e374a06ee 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -25,4 +25,12 @@ uint16_t kvm_xen_get_gnttab_max_frames(void);
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
 
+#define XEN_SPECIAL_AREA_ADDR 0xfeff8000UL
+#define XEN_SPECIAL_AREA_SIZE 0x4000UL
+
+#define XEN_SPECIALPAGE_CONSOLE 0
+#define XEN_SPECIALPAGE_XENSTORE 1
+
+#define XEN_SPECIAL_PFN(x) ((XEN_SPECIAL_AREA_ADDR >> TARGET_PAGE_BITS) + XEN_SPECIALPAGE_##x)
+
 #endif /* QEMU_SYSEMU_KVM_XEN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 7f13ae6a8b..83ab96724e 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -24,6 +24,7 @@
 
 #include "hw/pci/msi.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
 #include "hw/i386/kvm/xen_gnttab.h"
@@ -137,7 +138,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 return ret;
 }
 
+/* If called a second time, don't repeat the rest of the setup. */
+if (s->xen_caps) {
+return 0;
+}
+
 s->xen_caps = xen_caps;
+
+/* Tell fw_cfg to notify the BIOS to reserve the range. */
+ret = e820_add_entry(XEN_SPECIAL_AREA_ADDR, XEN_SPECIAL_AREA_SIZE,
+ E820_RESERVED);
+if (ret < 0) {
+fprintf(stderr, "e820_add_entry() table is full\n");
+return ret;
+}
+
 return 0;
 }
 
-- 
2.35.3




Re: [PATCH v2 02/11] hw/watchdog/wdt_aspeed: Extend MMIO range to cover more registers

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 12:34, Philippe Mathieu-Daudé wrote:

When booting the Zephyr demo in [1] we get:

   aspeed.io: unimplemented device write (size 4, offset 0x185128, value 0x030f1ff1) <--
   aspeed.io: unimplemented device write (size 4, offset 0x18512c, value 0x03f1)

This corresponds to this Zephyr code [2]:

   static int aspeed_wdt_init(const struct device *dev)
   {
 const struct aspeed_wdt_config *config = dev->config;
 struct aspeed_wdt_data *const data = dev->data;
 uint32_t reg_val;

 /* disable WDT by default */
 reg_val = sys_read32(config->ctrl_base + WDT_CTRL_REG);
 reg_val &= ~WDT_CTRL_ENABLE;
 sys_write32(reg_val, config->ctrl_base + WDT_CTRL_REG);

 sys_write32(data->rst_mask1,
 config->ctrl_base + WDT_SW_RESET_MASK1_REG);   <--
 sys_write32(data->rst_mask2,
 config->ctrl_base + WDT_SW_RESET_MASK2_REG);

 return 0;
   }

The register definitions are [3]:

   #define WDT_RELOAD_VAL_REG          0x0004
   #define WDT_RESTART_REG             0x0008
   #define WDT_CTRL_REG                0x000C
   #define WDT_TIMEOUT_STATUS_REG      0x0010
   #define WDT_TIMEOUT_STATUS_CLR_REG  0x0014
   #define WDT_RESET_MASK1_REG         0x001C
   #define WDT_RESET_MASK2_REG         0x0020
   #define WDT_SW_RESET_MASK1_REG      0x0028   <--
   #define WDT_SW_RESET_MASK2_REG      0x002C
   #define WDT_SW_RESET_CTRL_REG       0x0024

Currently QEMU only covers an MMIO region of size 0x20:

   #define ASPEED_WDT_REGS_MAX (0x20 / 4)

Change to map the whole 'iosize' which might be bigger, covering
the other registers. The MemoryRegionOps read/write handlers will
report the accesses as out-of-bounds guest-errors, but the next
commit will report them as unimplemented.


I'll amend here for clarity:

---

Memory layout before this change:

  (qemu) info mtree -f
...
7e785000-7e78501f (prio 0, i/o): aspeed.wdt
7e785020-7e78507f (prio -1000, i/o): aspeed.io @00185020
7e785080-7e78509f (prio 0, i/o): aspeed.wdt
7e7850a0-7e7850ff (prio -1000, i/o): aspeed.io @001850a0
7e785100-7e78511f (prio 0, i/o): aspeed.wdt
7e785120-7e78517f (prio -1000, i/o): aspeed.io @00185120
7e785180-7e78519f (prio 0, i/o): aspeed.wdt
7e7851a0-7e788fff (prio -1000, i/o): aspeed.io @001851a0


After:

  (qemu) info mtree -f
...
7e785000-7e78507f (prio 0, i/o): aspeed.wdt
7e785080-7e7850ff (prio 0, i/o): aspeed.wdt
7e785100-7e78517f (prio 0, i/o): aspeed.wdt
7e785180-7e7851ff (prio 0, i/o): aspeed.wdt
7e785200-7e788fff (prio -1000, i/o): aspeed.io @00185200

---
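
To tie the "unimplemented device write" offsets back to a watchdog
register, here is a rough standalone sketch. The aspeed.io base of
0x7e600000 and the 0x80 stride between watchdogs are read off the
'info mtree' output above (0x7e785020 minus the @00185020 offset); the
struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions derived from the 'info mtree' listings above. */
#define ASPEED_IO_BASE   0x7e600000u
#define WDT_BASE         0x7e785000u
#define WDT_STRIDE       0x80u
#define WDT_OLD_MMIO_LEN 0x20u   /* ASPEED_WDT_REGS_MAX * 4 */

struct wdt_access {
    unsigned int index;   /* which watchdog instance was hit */
    uint32_t reg;         /* register offset inside that instance */
};

/*
 * Translate an offset reported against aspeed.io into a watchdog
 * instance and a register offset within it.
 */
static struct wdt_access decode_io_offset(uint32_t io_offset)
{
    uint32_t addr = ASPEED_IO_BASE + io_offset;
    struct wdt_access a;

    assert(addr >= WDT_BASE);
    a.index = (addr - WDT_BASE) / WDT_STRIDE;
    a.reg = (addr - WDT_BASE) % WDT_STRIDE;
    return a;
}

/* True if the register lies outside the old 0x20-byte MMIO window. */
static int missed_by_old_window(uint32_t reg)
{
    return reg >= WDT_OLD_MMIO_LEN;
}
```

Under those assumptions, offset 0x185128 decodes to register 0x28
(WDT_SW_RESET_MASK1_REG) of the third watchdog, which the old 0x20-byte
window did not cover.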


[1] https://github.com/AspeedTech-BMC/zephyr/releases/tag/v00.01.07
[2] https://github.com/AspeedTech-BMC/zephyr/commit/2e99f10ac27b
[3] https://github.com/AspeedTech-BMC/zephyr/blob/v00.01.08/drivers/watchdog/wdt_aspeed.c#L31

Reviewed-by: Peter Delevoryas 
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/watchdog/wdt_aspeed.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
index 958725a1b5..eefca31ae4 100644
--- a/hw/watchdog/wdt_aspeed.c
+++ b/hw/watchdog/wdt_aspeed.c
@@ -260,6 +260,7 @@ static void aspeed_wdt_realize(DeviceState *dev, Error **errp)
  {
  SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
  AspeedWDTState *s = ASPEED_WDT(dev);
+AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(dev);
  
  assert(s->scu);
  
@@ -271,7 +272,7 @@ static void aspeed_wdt_realize(DeviceState *dev, Error **errp)
  s->pclk_freq = PCLK_HZ;
  
  memory_region_init_io(&s->iomem, OBJECT(s), &aspeed_wdt_ops, s,
-  TYPE_ASPEED_WDT, ASPEED_WDT_REGS_MAX * 4);
+  TYPE_ASPEED_WDT, awc->iosize);
  sysbus_init_mmio(sbd, &s->iomem);
  }
  





[RFC PATCH v5 36/52] hw/xen: Implement EVTCHNOP_bind_interdomain

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 78 +++
 hw/i386/kvm/xen_evtchn.h  |  2 +
 target/i386/kvm/xen-emu.c | 16 
 3 files changed, 96 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 5b93d52e6b..9146e61c66 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -715,6 +715,23 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 }
 break;
 
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+/* Not yet implemented. This can't happen! */
+} else {
+/* Loopback interdomain */
+XenEvtchnPort *rp = &s->port_table[p->type_val];
+if (!valid_port(p->type_val) || rp->type_val != port ||
+rp->type != EVTCHNSTAT_interdomain) {
+error_report("Inconsistent state for interdomain unbind");
+} else {
+/* Set the other end back to unbound */
+rp->type = EVTCHNSTAT_unbound;
+rp->type_val = 0;
+}
+}
+break;
+
 default:
 break;
 }
@@ -830,6 +847,67 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 return ret;
 }
 
+int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint16_t type_val;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (interdomain->remote_dom == DOMID_QEMU) {
+type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+} else if (interdomain->remote_dom == DOMID_SELF ||
+   interdomain->remote_dom == xen_domid) {
+type_val = 0;
+} else {
+return -ESRCH;
+}
+
+if (!valid_port(interdomain->remote_port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+/* The newly allocated port starts out as unbound */
+ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val,
+&interdomain->local_port);
+if (ret) {
+goto out;
+}
+
+if (interdomain->remote_dom == DOMID_QEMU) {
+/* We haven't hooked up QEMU's PV drivers to this yet */
+ret = -ENOSYS;
+} else {
+/* Loopback */
+XenEvtchnPort *rp = &s->port_table[interdomain->remote_port];
+XenEvtchnPort *lp = &s->port_table[interdomain->local_port];
+
+if (rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) {
+/* It's a match! */
+rp->type = EVTCHNSTAT_interdomain;
+rp->type_val = interdomain->local_port;
+
+lp->type = EVTCHNSTAT_interdomain;
+lp->type_val = interdomain->remote_port;
+} else {
+ret = -EINVAL;
+}
+}
+
+if (ret) {
+free_port(s, interdomain->local_port);
+}
+ out:
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+
+}
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index fc080138e3..1ebc7580eb 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -22,6 +22,7 @@ struct evtchn_bind_virq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
+struct evtchn_bind_interdomain;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -29,5 +30,6 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
+int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index ba3b32b67b..8a2a8c6291 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -918,6 +918,22 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_interdomain: {
+struct evtchn_bind_interdomain interdomain;
+
+qemu_build_assert(sizeof(interdomain) == 12);
+if (kvm_copy_from_gva(cs, arg, &interdomain, sizeof(interdomain))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_interdomain_op(&interdomain);
+if (!err &&
+kvm_copy_to_gva(cs, arg, &interdomain, sizeof(interdomain))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




Re: [PATCH v4 2/3] hw/intc/loongarch_pch_pic: add irq number property

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 10:59, Tianrui Zhao wrote:

According to the LoongArch 7A1000 manual, the number of supported irqs can
be set in the PCH_PIC_INT_ID_HI register. This patch adds an irq number
property to loongarch_pch_pic, so that the virt machine can set a different
irq number when the pch_pic intc is added.

Signed-off-by: Tianrui Zhao 
---
  hw/intc/loongarch_pch_pic.c | 33 +
  hw/loongarch/virt.c |  8 ---
  include/hw/intc/loongarch_pch_pic.h |  5 ++---
  3 files changed, 36 insertions(+), 10 deletions(-)




@@ -78,7 +80,12 @@ static uint64_t loongarch_pch_pic_low_readw(void *opaque, hwaddr addr,
  val = PCH_PIC_INT_ID_VAL;
  break;
  case PCH_PIC_INT_ID_HI:
-val = PCH_PIC_INT_ID_NUM;
+/*
+ * With 7A1000 manual
+ *   bit  0-15 pch irqchip version
+ *   bit 16-31 irq number supported with pch irqchip
+ */
+val = PCH_PIC_INT_ID_VER + ((s->irq_num - 1) << 16);


We usually use the '|' operator instead of '+', but you can also
use deposit32() from "qemu/bitops.h":

  val = deposit32(PCH_PIC_INT_ID_VER, 16, 15, s->irq_num - 1);
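
To illustrate the suggestion, here is a standalone sketch. It uses a local
stand-in, deposit32_sketch(), written to mirror my understanding of
deposit32() semantics (value, start bit, field length, field value) rather
than the real helper from "qemu/bitops.h". The field length of 16 is an
assumption taken from the "bit 16-31" comment in the quoted hunk, and
int_id_hi() is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for deposit32(): write 'fieldval' into the 'length' bits of
 * 'value' that start at bit 'start', leaving all other bits untouched.
 */
static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

#define PCH_PIC_INT_ID_VER 0x1u

/* Bits 0-15: irqchip version; bits 16-31: highest supported irq. */
static uint32_t int_id_hi(unsigned int irq_num)
{
    return deposit32_sketch(PCH_PIC_INT_ID_VER, 16, 16, irq_num - 1);
}
```

With irq_num = 64 this gives 0x3f0001, the value of the PCH_PIC_INT_ID_NUM
constant the patch removes. Unlike '+', the masked form cannot carry into
or out of the version field.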


  break;
  case PCH_PIC_INT_MASK_LO:
  val = (uint32_t)s->int_mask;
@@ -365,6 +372,19 @@ static void loongarch_pch_pic_reset(DeviceState *d)
  s->int_polarity = 0x0;
  }




diff --git a/include/hw/intc/loongarch_pch_pic.h b/include/hw/intc/loongarch_pch_pic.h
index 2d4aa9ed6f..ba3a47fa88 100644
--- a/include/hw/intc/loongarch_pch_pic.h
+++ b/include/hw/intc/loongarch_pch_pic.h
@@ -9,11 +9,9 @@
  #define PCH_PIC_NAME(name) TYPE_LOONGARCH_PCH_PIC#name
  OBJECT_DECLARE_SIMPLE_TYPE(LoongArchPCHPIC, LOONGARCH_PCH_PIC)
  
-#define PCH_PIC_IRQ_START   0

-#define PCH_PIC_IRQ_END 63
  #define PCH_PIC_IRQ_NUM 64
  #define PCH_PIC_INT_ID_VAL  0x700UL
-#define PCH_PIC_INT_ID_NUM  0x3f0001UL
+#define PCH_PIC_INT_ID_VER  0x1UL
  
  #define PCH_PIC_INT_ID_LO   0x00

  #define PCH_PIC_INT_ID_HI   0x04
@@ -66,4 +64,5 @@ struct LoongArchPCHPIC {
  MemoryRegion iomem32_low;
  MemoryRegion iomem32_high;
  MemoryRegion iomem8;
+unsigned int  irq_num;


There is one extra space here, otherwise LGTM.


  };





[RFC PATCH v5 30/52] hw/xen: Implement EVTCHNOP_close

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will also be used
in error paths after allocation.
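
As a rough model of what free_port() does to the table's high-water mark,
consider the sketch below. The table size, the state constants and the
struct layout are simplifications invented for the example; only the
trimming loop follows the shape of the one in the patch.

```c
#include <assert.h>

enum { PORT_CLOSED = 0, PORT_UNBOUND = 1 };   /* illustrative states */
#define SKETCH_MAX_PORTS 16                    /* illustrative size */

struct port_table {
    int type[SKETCH_MAX_PORTS];
    unsigned int nr_ports;   /* one past the highest port in use */
};

static void free_port_sketch(struct port_table *t, unsigned int port)
{
    assert(port < SKETCH_MAX_PORTS);
    t->type[port] = PORT_CLOSED;

    /*
     * If the highest port was just freed, walk nr_ports down past any
     * closed ports below it, so listings stop at the last live port.
     */
    if (t->nr_ports == port + 1) {
        do {
            t->nr_ports--;
        } while (t->nr_ports &&
                 t->type[t->nr_ports - 1] == PORT_CLOSED);
    }
}
```

Freeing a port in the middle of the table leaves nr_ports alone; only
freeing the top port shrinks it.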

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 121 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 
 3 files changed, 135 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 0f1abfb760..f732821d5b 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -21,6 +21,7 @@
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
+
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
 
@@ -40,6 +41,41 @@ typedef struct XenEvtchnPort {
 uint16_t type_val;  /* pirq# / virq# / remote port according to type */
 } XenEvtchnPort;
 
+/* 32-bit compatibility definitions, also used natively in 32-bit build */
+struct compat_arch_vcpu_info {
+unsigned int cr2;
+unsigned int pad[5];
+};
+
+struct compat_vcpu_info {
+uint8_t evtchn_upcall_pending;
+uint8_t evtchn_upcall_mask;
+uint16_t pad;
+uint32_t evtchn_pending_sel;
+struct compat_arch_vcpu_info arch;
+struct vcpu_time_info time;
+}; /* 64 bytes (x86) */
+
+struct compat_arch_shared_info {
+unsigned int max_pfn;
+unsigned int pfn_to_mfn_frame_list_list;
+unsigned int nmi_reason;
+unsigned int p2m_cr3;
+unsigned int p2m_vaddr;
+unsigned int p2m_generation;
+uint32_t wc_sec_hi;
+};
+
+struct compat_shared_info {
+struct compat_vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS];
+uint32_t evtchn_pending[32];
+uint32_t evtchn_mask[32];
+uint32_t wc_version;  /* Version counter: see vcpu_time_info_t. */
+uint32_t wc_sec;
+uint32_t wc_nsec;
+struct compat_arch_shared_info arch;
+};
+
 #define COMPAT_EVTCHN_2L_NR_CHANNELS 1024
 
 /*
@@ -252,3 +288,88 @@ int xen_evtchn_status_op(struct evtchn_status *status)
 qemu_mutex_unlock(&s->port_lock);
 return 0;
 }
+
+static int clear_port_pending(XenEvtchnState *s, evtchn_port_t port)
+{
+void *p = xen_overlay_get_shinfo_ptr();
+if (!p)
+return -ENOTSUP;
+
+if (xen_is_long_mode()) {
+struct shared_info *shinfo = p;
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+qatomic_fetch_and(&shinfo->evtchn_pending[idx], ~mask);
+} else {
+struct compat_shared_info *shinfo = p;
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+qatomic_fetch_and(&shinfo->evtchn_pending[idx], ~mask);
+}
+return 0;
+}
+
+static void free_port(XenEvtchnState *s, evtchn_port_t port)
+{
+s->port_table[port].type = EVTCHNSTAT_closed;
+s->port_table[port].type_val = 0;
+s->port_table[port].vcpu = 0;
+
+if (s->nr_ports == port + 1) {
+do {
+s->nr_ports--;
+} while (s->nr_ports &&
+ s->port_table[s->nr_ports - 1].type == EVTCHNSTAT_closed);
+}
+
+/* Clear pending event to avoid unexpected behavior on re-bind. */
+clear_port_pending(s, port);
+}
+
+static int close_port(XenEvtchnState *s, evtchn_port_t port)
+{
+XenEvtchnPort *p = &s->port_table[port];
+
+switch (p->type) {
+case EVTCHNSTAT_closed:
+return -ENOENT;
+
+default:
+break;
+}
+
+free_port(s, port);
+return 0;
+}
+
+int xen_evtchn_close_op(struct evtchn_close *close)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(close->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = close_port(s, close->port);
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 76467636ee..cb3924941a 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -16,6 +16,8 @@ void xen_evtchn_create(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
 struct evtchn_status;
+struct evtchn_close;
 int xen_evtchn_status_op(struct evtchn_status *status);
+int xen_evtchn_close_op(struct evtchn_close *close);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 6f3f8ae834..a95063434e 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -787,6 +787,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
  

[RFC PATCH v5 16/52] i386/xen: manage and save/restore Xen guest long_mode setting

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a hypercall.
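
The resync rule described above can be modelled without KVM at all. In the
sketch below, 'kernel_long_mode' stands in for the flag the kernel latches
and 'cached_long_mode' for qemu's copy; the struct, the function names and
the 'syncs' counter are all hypothetical and exist only to make the
behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>

struct long_mode_model {
    bool kernel_long_mode;   /* what the kernel latched */
    bool cached_long_mode;   /* qemu's idea of the guest mode */
    unsigned int syncs;      /* how often we asked the "kernel" */
};

/* Refresh the cached copy from the (modelled) kernel state. */
static void sync_long_mode(struct long_mode_model *m)
{
    assert(m);
    m->cached_long_mode = m->kernel_long_mode;
    m->syncs++;
}

/* On each hypercall exit: resync only if the modes disagree. */
static void on_hypercall(struct long_mode_model *m, bool hcall_longmode)
{
    if (hcall_longmode != m->cached_long_mode) {
        sync_long_mode(m);
    }
}

/* Right before serialization: always resync. */
static void pre_save(struct long_mode_model *m)
{
    sync_long_mode(m);
}
```

The unconditional sync in pre_save() is what covers the window where the
guest has written the MSR but not yet issued any hypercall.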

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_overlay.c | 65 +++
 hw/i386/kvm/xen_overlay.h |  4 +++
 target/i386/kvm/xen-emu.c | 12 
 3 files changed, 81 insertions(+)

diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
index 331dea6b8b..7046bd55f1 100644
--- a/hw/i386/kvm/xen_overlay.c
+++ b/hw/i386/kvm/xen_overlay.c
@@ -44,6 +44,7 @@ struct XenOverlayState {
 MemoryRegion shinfo_mem;
 void *shinfo_ptr;
 uint64_t shinfo_gpa;
+bool long_mode;
 };
 
 struct XenOverlayState *xen_overlay_singleton;
@@ -96,9 +97,21 @@ static void xen_overlay_realize(DeviceState *dev, Error **errp)
 
 s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
 s->shinfo_gpa = INVALID_GPA;
+s->long_mode = false;
 memset(s->shinfo_ptr, 0, XEN_PAGE_SIZE);
 }
 
+static int xen_overlay_pre_save(void *opaque)
+{
+/*
+ * Fetch the kernel's idea of long_mode to avoid the race condition
+ * where the guest has set the hypercall page up in 64-bit mode but
+ * not yet made a hypercall by the time migration happens, so qemu
+ * hasn't yet noticed.
+ */
+return xen_sync_long_mode();
+}
+
 static int xen_overlay_post_load(void *opaque, int version_id)
 {
 XenOverlayState *s = opaque;
@@ -107,6 +120,9 @@ static int xen_overlay_post_load(void *opaque, int version_id)
 xen_overlay_map_page_locked(&s->shinfo_mem, s->shinfo_gpa);
 xen_overlay_set_be_shinfo(s->shinfo_gpa >> XEN_PAGE_SHIFT);
 }
+if (s->long_mode) {
+xen_set_long_mode(true);
+}
 
 return 0;
 }
@@ -121,9 +137,11 @@ static const VMStateDescription xen_overlay_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_overlay_is_needed,
+.pre_save = xen_overlay_pre_save,
 .post_load = xen_overlay_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(shinfo_gpa, XenOverlayState),
+VMSTATE_BOOL(long_mode, XenOverlayState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -198,3 +216,50 @@ void *xen_overlay_get_shinfo_ptr(void)
 
 return s->shinfo_ptr;
 }
+
+int xen_sync_long_mode(void)
+{
+int ret;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+};
+
+if (!xen_overlay_singleton) {
+return -ENOENT;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_GET_ATTR, &xa);
+if (!ret) {
+xen_overlay_singleton->long_mode = xa.u.long_mode;
+}
+
+return ret;
+}
+
+int xen_set_long_mode(bool long_mode)
+{
+int ret;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+.u.long_mode = long_mode,
+};
+
+if (!xen_overlay_singleton) {
+return -ENOENT;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+if (!ret) {
+xen_overlay_singleton->long_mode = xa.u.long_mode;
+}
+
+return ret;
+}
+
+bool xen_is_long_mode(void)
+{
+if (xen_overlay_singleton) {
+return xen_overlay_singleton->long_mode;
+}
+return false;
+}
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
index 00cff05bb0..5c46a0b036 100644
--- a/hw/i386/kvm/xen_overlay.h
+++ b/hw/i386/kvm/xen_overlay.h
@@ -17,4 +17,8 @@ void xen_overlay_create(void);
 int xen_overlay_map_shinfo_page(uint64_t gpa);
 void *xen_overlay_get_shinfo_ptr(void);
 
+int xen_sync_long_mode(void);
+int xen_set_long_mode(bool long_mode);
+bool xen_is_long_mode(void);
+
 #endif /* QEMU_XEN_OVERLAY_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 80005ea527..80f09f33df 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -19,6 +19,8 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 
+#include "hw/i386/kvm/xen_overlay.h"
+
 #include "standard-headers/xen/version.h"
 #include "standard-headers/xen/sched.h"
 
@@ -274,6 +276,16 @@ int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 return -1;
 }
 
+/*
+ * The kernel latches the guest 32/64 mode when the MSR is used to fill
+ * the hypercall page. So if we see a hypercall in a mode that doesn't
+ * match our own idea of the guest mode, fetch the kernel's idea of the
+ * "long mode" to remain in sync.
+ */
+if (exit->u.hcall.longmode != xen_is_long_mode()) {
+xen_sync_long_mode();
+}
+
 

[RFC PATCH v5 25/52] i386/xen: implement HVMOP_set_evtchn_upcall_vector

2022-12-30 Thread David Woodhouse
From: Ankur Arora 

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.
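
The precedence rule can be pictured with a toy selector. This is only a
model of the rule as stated above: the real delivery happens in the
kernel, and the enum, struct and function names here are invented. It
treats a vector of 0 as "unset", as the soft-reset path in this patch
does.

```c
#include <assert.h>
#include <stdint.h>

enum upcall_method {
    UPCALL_NONE,
    UPCALL_GLOBAL_CALLBACK,    /* HVM_PARAM_CALLBACK_IRQ */
    UPCALL_PER_VCPU_VECTOR     /* HVMOP_set_evtchn_upcall_vector */
};

struct vcpu_model {
    uint8_t callback_vector;   /* 0 means no per-vCPU vector set */
};

/* Pick the delivery method for one vCPU: per-vCPU vector wins. */
static enum upcall_method pick_upcall(const struct vcpu_model *v,
                                      uint64_t global_callback_param)
{
    assert(v);
    if (v->callback_vector) {
        return UPCALL_PER_VCPU_VECTOR;
    }
    if (global_callback_param) {
        return UPCALL_GLOBAL_CALLBACK;
    }
    return UPCALL_NONE;
}
```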

Signed-off-by: Ankur Arora 
Signed-off-by: Joao Martins 
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.h|  1 +
 target/i386/kvm/trace-events |  1 +
 target/i386/kvm/xen-emu.c| 84 ++--
 target/i386/machine.c|  1 +
 4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index bf44a87ddb..938a1b9c8b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1792,6 +1792,7 @@ typedef struct CPUArchState {
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
 uint64_t xen_vcpu_runstate_gpa;
+uint8_t xen_vcpu_callback_vector;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 14e54dfca5..6133f6dd9e 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -10,3 +10,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
 kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
+kvm_xen_set_vcpu_callback(int cpu, int vector) "callback vcpu %d vector %d"
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 9cc939b41e..9f0ade9325 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -28,6 +28,7 @@
 #include "standard-headers/xen/sched.h"
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/hvm_op.h"
+#include "standard-headers/xen/hvm/params.h"
 #include "standard-headers/xen/vcpu.h"
 #include "standard-headers/xen/event_channel.h"
 
@@ -194,7 +195,8 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
 fi.submap |= 1 << XENFEAT_writable_page_tables |
  1 << XENFEAT_writable_descriptor_tables |
  1 << XENFEAT_auto_translated_physmap |
- 1 << XENFEAT_supervisor_mode_kernel;
+ 1 << XENFEAT_supervisor_mode_kernel |
+ 1 << XENFEAT_hvm_callback_vector;
 }
 
 err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
@@ -221,6 +223,31 @@ static int kvm_xen_set_vcpu_attr(CPUState *cs, uint16_t type, uint64_t gpa)
 return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, );
 }
 
+static int kvm_xen_set_vcpu_callback_vector(CPUState *cs)
+{
+uint8_t vector = X86_CPU(cs)->env.xen_vcpu_callback_vector;
+struct kvm_xen_vcpu_attr xva;
+
+xva.type = KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR;
+xva.u.vector = vector;
+
+trace_kvm_xen_set_vcpu_callback(cs->cpu_index, vector);
+
+return kvm_vcpu_ioctl(cs, KVM_XEN_HVM_SET_ATTR, &xva);
+}
+
+static void do_set_vcpu_callback_vector(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_callback_vector = data.host_int;
+
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+kvm_xen_set_vcpu_callback_vector(cs);
+}
+}
+
 static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -277,12 +304,16 @@ static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
 env->xen_vcpu_runstate_gpa = INVALID_GPA;
+env->xen_vcpu_callback_vector = 0;
 
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
   INVALID_GPA);
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
   INVALID_GPA);
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+kvm_xen_set_vcpu_callback_vector(cs);
+}
 
 }
 
@@ -457,17 +488,53 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static int kvm_xen_hcall_evtchn_upcall_vector(struct kvm_xen_exit *exit,
+  X86CPU *cpu, uint64_t arg)
+{
+struct xen_hvm_evtchn_upcall_vector up;
+CPUState *target_cs;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(up) == 8);
+
+

[RFC PATCH v5 24/52] i386/xen: implement HYPERVISOR_event_channel_op

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Additionally set XEN_INTERFACE_VERSION to most recent in order to
exercise the "new" event_channel_op.

Signed-off-by: Joao Martins 
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 6521df0461..9cc939b41e 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -29,6 +29,7 @@
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/hvm_op.h"
 #include "standard-headers/xen/vcpu.h"
+#include "standard-headers/xen/event_channel.h"
 
 #include "xen-compat.h"
 
@@ -587,6 +588,24 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit,
+int cmd, uint64_t arg)
+{
+int err = -ENOSYS;
+
+switch (cmd) {
+case EVTCHNOP_init_control:
+err = -ENOSYS;
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static int kvm_xen_soft_reset(void)
 {
 CPUState *cpu;
@@ -686,6 +705,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
+case __HYPERVISOR_event_channel_op:
+return kvm_xen_hcall_evtchn_op(exit, exit->u.hcall.params[0],
+   exit->u.hcall.params[1]);
 case __HYPERVISOR_vcpu_op:
 return kvm_xen_hcall_vcpu_op(exit, cpu,
  exit->u.hcall.params[0],
-- 
2.35.3




Re: [PATCH v4 1/3] hw/intc/loongarch_pch_msi: add irq number property

2022-12-30 Thread Philippe Mathieu-Daudé

On 30/12/22 10:59, Tianrui Zhao wrote:

This patch adds an irq number property for the loongarch msi interrupt
controller, and removes the hard-coded irq number macro.

Signed-off-by: Tianrui Zhao 
---
  hw/intc/loongarch_pch_msi.c | 33 ++---
  hw/loongarch/virt.c | 13 +++-
  include/hw/intc/loongarch_pch_msi.h |  3 ++-
  include/hw/pci-host/ls7a.h  |  1 -
  4 files changed, 40 insertions(+), 10 deletions(-)




+static void loongarch_pch_msi_realize(DeviceState *dev, Error **errp)
+{
+LoongArchPCHMSI *s = LOONGARCH_PCH_MSI(dev);
+
+if (!s->irq_num || s->irq_num  > PCH_MSI_IRQ_NUM) {


Here you check for s->irq_num != 0, ...


+error_setg(errp, "Invalid 'msi_irq_num'");
+return;
+}
+
+s->pch_msi_irq = g_new(qemu_irq, s->irq_num);
+if (!s->pch_msi_irq) {


... so this check is unreachable / pointless: g_new() will never
return NULL but abort.
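
A sketch of the shape the function could take once the dead check is
dropped. It deliberately avoids glib: alloc_or_abort() is a hypothetical
stand-in that only mimics the "abort instead of returning NULL" behaviour
described above, and alloc_irqs() and its 'max' parameter are likewise
invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Stand-in for an allocator that aborts on failure instead of returning
 * NULL, which is the behaviour the comment above attributes to g_new().
 */
static void *alloc_or_abort(size_t n, size_t size)
{
    void *p;

    assert(n > 0 && size > 0);
    p = calloc(n, size);
    if (!p) {
        fputs("out of memory\n", stderr);
        abort();
    }
    return p;
}

/* The only check the caller needs: a non-zero count within range. */
static int irq_num_is_valid(unsigned int irq_num, unsigned int max)
{
    return irq_num != 0 && irq_num <= max;
}

/* Returns NULL only for an invalid count, never for allocation failure. */
static int *alloc_irqs(unsigned int irq_num, unsigned int max)
{
    if (!irq_num_is_valid(irq_num, max)) {
        return NULL;
    }
    return alloc_or_abort(irq_num, sizeof(int));
}
```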


+error_report("loongarch_pch_msi: fail to alloc memory");
+exit(1);
+}
+
+qdev_init_gpio_out(dev, s->pch_msi_irq, s->irq_num);
+qdev_init_gpio_in(dev, pch_msi_irq_handler, s->irq_num);
+}



diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 958be74fa1..1e58346aeb 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -496,7 +496,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
  LoongArchCPU *lacpu;
  CPULoongArchState *env;
  CPUState *cpu_state;
-int cpu, pin, i;
+int cpu, pin, i, start, num;
  
  ipi = qdev_new(TYPE_LOONGARCH_IPI);

  sysbus_realize_and_unref(SYS_BUS_DEVICE(ipi), &error_fatal);
@@ -576,14 +576,17 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
  }
  
  pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);

-qdev_prop_set_uint32(pch_msi, "msi_irq_base", PCH_MSI_IRQ_START);
+start   =  PCH_PIC_IRQ_NUM;


Maybe name this 'msi_irq_base'


+num = EXTIOI_IRQS - start;


and 'msi_irq_num' for clarity?


+qdev_prop_set_uint32(pch_msi, "msi_irq_base", start);
+qdev_prop_set_uint32(pch_msi, "msi_irq_num", num);
  d = SYS_BUS_DEVICE(pch_msi);
  sysbus_realize_and_unref(d, &error_fatal);
  sysbus_mmio_map(d, 0, VIRT_PCH_MSI_ADDR_LOW);
-for (i = 0; i < PCH_MSI_IRQ_NUM; i++) {
-/* Connect 192 pch_msi irqs to extioi */
+for (i = 0; i < num; i++) {
+/* Connect pch_msi irqs to extioi */
  qdev_connect_gpio_out(DEVICE(d), i,
-  qdev_get_gpio_in(extioi, i + PCH_MSI_IRQ_START));
+  qdev_get_gpio_in(extioi, i + start));
  }


Removing the unreachable check:
Reviewed-by: Philippe Mathieu-Daudé 




[RFC PATCH v5 27/52] hw/xen: Add xen_evtchn device for event channel emulation

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.
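
For orientation, this is roughly how such a callback parameter breaks
down. The shift of 56 matches CALLBACK_VIA_TYPE_SHIFT in this patch; the
numeric value 2 for HVM_PARAM_CALLBACK_TYPE_VECTOR and the vector living
in the low 8 bits are assumptions based on my reading of the Xen public
headers, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define CALLBACK_VIA_TYPE_SHIFT 56
/* Assumed value of HVM_PARAM_CALLBACK_TYPE_VECTOR. */
#define SKETCH_CALLBACK_TYPE_VECTOR 2

/* The top byte of the parameter selects the delivery type. */
static unsigned int callback_type(uint64_t param)
{
    return (unsigned int)(param >> CALLBACK_VIA_TYPE_SHIFT);
}

/* For the vector type, the low byte is assumed to hold the vector. */
static uint8_t callback_vector(uint64_t param)
{
    assert(callback_type(param) == SKETCH_CALLBACK_TYPE_VECTOR);
    return (uint8_t)param;
}

/* Build a parameter requesting global vector delivery. */
static uint64_t make_vector_param(uint8_t vector)
{
    return ((uint64_t)SKETCH_CALLBACK_TYPE_VECTOR
            << CALLBACK_VIA_TYPE_SHIFT) | vector;
}
```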

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/meson.build   |   5 +-
 hw/i386/kvm/xen_evtchn.c  | 148 ++
 hw/i386/kvm/xen_evtchn.h  |  18 +
 hw/i386/pc.c  |   2 +
 target/i386/kvm/xen-emu.c |  10 +++
 5 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/kvm/xen_evtchn.c
 create mode 100644 hw/i386/kvm/xen_evtchn.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 6165cbf019..cab64df339 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,6 +4,9 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
-i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
+  'xen_overlay.c',
+  'xen_evtchn.c',
+  ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
new file mode 100644
index 00..018f4ef4da
--- /dev/null
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -0,0 +1,148 @@
+/*
+ * QEMU Xen emulation: Event channel support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_evtchn.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+#include <linux/kvm.h>
+
+#include "standard-headers/xen/memory.h"
+#include "standard-headers/xen/hvm/params.h"
+
+#define TYPE_XEN_EVTCHN "xen-evtchn"
+OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
+
+struct XenEvtchnState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+uint64_t callback_param;
+bool evtchn_in_kernel;
+
+QemuMutex port_lock;
+};
+
+struct XenEvtchnState *xen_evtchn_singleton;
+
+/* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
+#define CALLBACK_VIA_TYPE_SHIFT 56
+
+static int xen_evtchn_post_load(void *opaque, int version_id)
+{
+XenEvtchnState *s = opaque;
+
+if (s->callback_param) {
+xen_evtchn_set_callback_param(s->callback_param);
+}
+
+return 0;
+}
+
+static bool xen_evtchn_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_evtchn_vmstate = {
+.name = "xen_evtchn",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xen_evtchn_is_needed,
+.post_load = xen_evtchn_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(callback_param, XenEvtchnState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void xen_evtchn_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = &xen_evtchn_vmstate;
+}
+
+static const TypeInfo xen_evtchn_info = {
+.name  = TYPE_XEN_EVTCHN,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XenEvtchnState),
+.class_init= xen_evtchn_class_init,
+};
+
+void xen_evtchn_create(void)
+{
+XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
+-1, NULL));
+xen_evtchn_singleton = s;
+
+qemu_mutex_init(&s->port_lock);
+}
+
+static void xen_evtchn_register_types(void)
+{
+type_register_static(&xen_evtchn_info);
+}
+
+type_init(xen_evtchn_register_types)
+
+int xen_evtchn_set_callback_param(uint64_t param)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+bool in_kernel = false;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+switch (param >> CALLBACK_VIA_TYPE_SHIFT) {
+case HVM_PARAM_CALLBACK_TYPE_VECTOR: {
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_UPCALL_VECTOR,
+.u.vector = (uint8_t)param,
+};
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+if (!ret && kvm_xen_has_cap(EVTCHN_SEND)) {
+in_kernel = true;
+}
+break;
+}
+default:
+ret = -ENOSYS;
+break;
+}
+
+if (!ret) {
+s->callback_param = param;
+s->evtchn_in_kernel = in_kernel;
+}
+
+
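
As a rough illustration of how the callback parameter is decoded in the
switch above: the top byte selects the delivery method and, for the vector
method, the low 8 bits carry the vector. This is a sketch only. The helper
names are hypothetical, and the type value 2 is an assumption that should be
checked against the Xen public headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Top byte of the parameter selects the delivery method. */
#define CALLBACK_VIA_TYPE_SHIFT 56
/* Assumed to match HVM_PARAM_CALLBACK_TYPE_VECTOR; verify before use. */
#define CALLBACK_TYPE_VECTOR 2

/* Hypothetical helper: build a parameter requesting vector delivery. */
static uint64_t callback_param_vector(uint8_t vector)
{
    return ((uint64_t)CALLBACK_TYPE_VECTOR << CALLBACK_VIA_TYPE_SHIFT) |
           vector;
}

/* Hypothetical helper: true (and *vector set) only for the vector type. */
static bool callback_param_get_vector(uint64_t param, uint8_t *vector)
{
    if ((param >> CALLBACK_VIA_TYPE_SHIFT) != CALLBACK_TYPE_VECTOR) {
        return false;
    }
    *vector = (uint8_t)param;  /* low 8 bits, as in the patch */
    return true;
}
```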

Re: [PATCH v4 10/11] hw/riscv/boot.c: introduce riscv_load_kernel_and_initrd()

2022-12-30 Thread Bin Meng
On Fri, Dec 30, 2022 at 8:04 PM Daniel Henrique Barboza
 wrote:
>
>
>
> On 12/30/22 06:05, Bin Meng wrote:
> > On Fri, Dec 30, 2022 at 2:47 AM Daniel Henrique Barboza
> >  wrote:
> >> The microchip_icicle_kit, sifive_u, spike and virt boards are now doing
> >> the same steps when '-kernel' is used:
> >>
> >> - execute load_kernel()
> >> - load init_rd()
> >> - write kernel_cmdline in the fdt
> >>
> >> Let's fold everything inside riscv_load_kernel() to avoid code
> >> repetition. Every other board that uses riscv_load_kernel() will have
> >> this same behavior, including boards that don't have a valid FDT, so
> >> we need to take care to not do FDT operations without checking it first.
> >>
> >> Since we're now doing way more than just loading the kernel, rename
> >> riscv_load_kernel() to riscv_load_kernel_and_initrd().
> >>
> >> Cc: Palmer Dabbelt 
> >> Reviewed-by: Philippe Mathieu-Daudé 
> >> Signed-off-by: Daniel Henrique Barboza 
> >> ---
> >>   hw/riscv/boot.c| 27 +--
> >>   hw/riscv/microchip_pfsoc.c | 12 ++--
> >>   hw/riscv/opentitan.c   |  2 +-
> >>   hw/riscv/sifive_e.c|  3 ++-
> >>   hw/riscv/sifive_u.c| 12 ++--
> >>   hw/riscv/spike.c   | 14 +++---
> >>   hw/riscv/virt.c| 12 ++--
> >>   include/hw/riscv/boot.h|  6 +++---
> >>   8 files changed, 36 insertions(+), 52 deletions(-)
> >>
> >> diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
> >> index 5606ea6f43..d18262c302 100644
> >> --- a/hw/riscv/boot.c
> >> +++ b/hw/riscv/boot.c
> >> @@ -171,12 +171,13 @@ target_ulong riscv_load_firmware(const char *firmware_filename,
> >>   exit(1);
> >>   }
> >>
> >> -target_ulong riscv_load_kernel(MachineState *machine,
> >> -   target_ulong kernel_start_addr,
> >> -   symbol_fn_t sym_cb)
> >> +target_ulong riscv_load_kernel_and_initrd(MachineState *machine,
> >> +  target_ulong kernel_start_addr,
> >> +  symbol_fn_t sym_cb)
> > How about we keep the riscv_load_kernel() API, in case there is a need
> > for machines that just want to load kernel only?
> >
> > Then introduce a new API riscv_load_kernel_and_initrd() to do both
> > kernel and initrd loading?
>
> What about an extra flag to riscv_load_kernel(), e.g:
>
> riscv_load_kernel(MachineState *machine,  target_ulong kernel_start_addr,
>  bool load_initrd, symbol_fn_t sym_cb)
>
>
> And then machines that don't want to load initrd can opt out by using
> load_initrd = false. The name of the flag would also make our intentions with
> the API self-explanatory (i.e. load_initrd makes load_kernel() load the
> initrd as well).
>

That sounds like a good idea!

Regards,
Bin
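
For reference, the opt-out flag discussed above could behave roughly like
this. It is a toy model, not the QEMU code: struct boot_request,
struct boot_result and load_kernel() are hypothetical stand-ins for
MachineState and riscv_load_kernel(), and nothing is actually loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of MachineState used here. */
struct boot_request {
    const char *kernel_filename;
    const char *initrd_filename;  /* may be NULL */
};

/* Hypothetical record of what the loader did, for illustration. */
struct boot_result {
    bool kernel_loaded;
    bool initrd_loaded;
};

/*
 * Always handle the kernel; only touch the initrd when the board asked
 * for it via load_initrd and an initrd was actually supplied.
 */
static struct boot_result load_kernel(const struct boot_request *req,
                                      bool load_initrd)
{
    struct boot_result res = { false, false };

    if (!req->kernel_filename) {
        return res;
    }
    res.kernel_loaded = true;

    if (load_initrd && req->initrd_filename) {
        res.initrd_loaded = true;
    }
    return res;
}
```

A board that wants the old kernel-only behaviour would pass
load_initrd = false and keep a single entry point.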



[RFC PATCH v5 33/52] hw/xen: Implement EVTCHNOP_bind_ipi

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 69 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 15 +
 3 files changed, 86 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index f8e7170a84..ae715c4af9 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -13,6 +13,7 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "qapi/error.h"
 #include "qom/object.h"
 #include "exec/target_page.h"
@@ -226,6 +227,43 @@ static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
 kvm_xen_inject_vcpu_callback_vector(vcpu, type);
 }
 
+static void deassign_kernel_port(evtchn_port_t port)
+{
+struct kvm_xen_hvm_attr ha;
+int ret;
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.flags = KVM_XEN_EVTCHN_DEASSIGN;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "Failed to unbind kernel port %d: %s\n",
+  port, strerror(-ret));
+}
+}
+
+static int assign_kernel_port(uint16_t type, evtchn_port_t port,
+  uint32_t vcpu_id)
+{
+CPUState *cpu = qemu_get_cpu(vcpu_id);
+struct kvm_xen_hvm_attr ha;
+
+if (!cpu) {
+return -ENOENT;
+}
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.type = type;
+ha.u.evtchn.flags = 0;
+ha.u.evtchn.deliver.port.port = port;
+ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
+ha.u.evtchn.deliver.port.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -544,6 +582,12 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
   p->type_val, 0);
 break;
 
+case EVTCHNSTAT_ipi:
+if (s->evtchn_in_kernel) {
+deassign_kernel_port(port);
+}
+break;
+
 default:
 break;
 }
@@ -633,3 +677,28 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
 
 return ret;
 }
+
+int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_vcpu(ipi->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, ipi->vcpu, EVTCHNSTAT_ipi, 0, &ipi->port);
+if (!ret && s->evtchn_in_kernel) {
+assign_kernel_port(EVTCHNSTAT_ipi, ipi->port, ipi->vcpu);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 0ea13dda3a..107f420848 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -19,9 +19,11 @@ struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
 struct evtchn_bind_virq;
+struct evtchn_bind_ipi;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
+int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index f3d456f4b8..04f672c0ca 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -876,6 +876,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_ipi: {
+struct evtchn_bind_ipi ipi;
+
+qemu_build_assert(sizeof(ipi) == 8);
+if (kvm_copy_from_gva(cs, arg, &ipi, sizeof(ipi))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_ipi_op(&ipi);
+if (!err && kvm_copy_to_gva(cs, arg, &ipi, sizeof(ipi))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.35.3
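
The bind operation above relies on allocate_port() handing out the lowest
closed port. A self-contained sketch of that first-free scan follows. It
assumes a tiny fixed-size table, and the STAT_* values and struct port_table
are hypothetical stand-ins for EVTCHNSTAT_* and the real port table:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_PORTS    8   /* small table, for illustration only */
#define STAT_CLOSED 0   /* hypothetical stand-in for EVTCHNSTAT_closed */
#define STAT_IPI    5   /* hypothetical stand-in for EVTCHNSTAT_ipi */

struct port_table {
    uint16_t type[NR_PORTS];
    uint32_t vcpu[NR_PORTS];
    uint32_t nr_ports;  /* one past the highest port ever used */
};

/* Port 0 is never valid, so the scan starts at 1. */
static int allocate_port(struct port_table *t, uint32_t vcpu, uint16_t type,
                         uint32_t *port)
{
    uint32_t p;

    for (p = 1; p < NR_PORTS; p++) {
        if (t->type[p] == STAT_CLOSED) {
            t->type[p] = type;
            t->vcpu[p] = vcpu;
            *port = p;

            if (t->nr_ports < p + 1) {
                t->nr_ports = p + 1;
            }
            return 0;
        }
    }
    return -ENOSPC;
}
```

Closing a port and binding again reuses the freed slot, so port numbers stay
dense, and nr_ports only ever grows.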




[RFC PATCH v5 43/52] hw/xen: Add xen_gnttab device for grant table emulation

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_gnttab.c  | 110 ++
 hw/i386/kvm/xen_gnttab.h  |  18 +++
 hw/i386/pc.c  |   2 +
 target/i386/kvm/xen-emu.c |   3 ++
 5 files changed, 134 insertions(+)
 create mode 100644 hw/i386/kvm/xen_gnttab.c
 create mode 100644 hw/i386/kvm/xen_gnttab.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index cab64df339..e02449e4d4 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -7,6 +7,7 @@ i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
 i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
   'xen_overlay.c',
   'xen_evtchn.c',
+  'xen_gnttab.c',
   ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
new file mode 100644
index 00..7a441445cd
--- /dev/null
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -0,0 +1,110 @@
+/*
+ * QEMU Xen emulation: Grant table support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+#include "xen_gnttab.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+
+#include "standard-headers/xen/memory.h"
+#include "standard-headers/xen/grant_table.h"
+
+#define TYPE_XEN_GNTTAB "xen-gnttab"
+OBJECT_DECLARE_SIMPLE_TYPE(XenGnttabState, XEN_GNTTAB)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenGnttabState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+uint32_t nr_frames;
+uint32_t max_frames;
+};
+
+struct XenGnttabState *xen_gnttab_singleton;
+
+static void xen_gnttab_realize(DeviceState *dev, Error **errp)
+{
+XenGnttabState *s = XEN_GNTTAB(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen grant table support is for Xen emulation");
+return;
+}
+s->nr_frames = 0;
+s->max_frames = kvm_xen_get_gnttab_max_frames();
+}
+
+static bool xen_gnttab_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_gnttab_vmstate = {
+.name = "xen_gnttab",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xen_gnttab_is_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(nr_frames, XenGnttabState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void xen_gnttab_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = xen_gnttab_realize;
+dc->vmsd = &xen_gnttab_vmstate;
+}
+
+static const TypeInfo xen_gnttab_info = {
+.name  = TYPE_XEN_GNTTAB,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XenGnttabState),
+.class_init= xen_gnttab_class_init,
+};
+
+void xen_gnttab_create(void)
+{
+xen_gnttab_singleton = XEN_GNTTAB(sysbus_create_simple(TYPE_XEN_GNTTAB,
+   -1, NULL));
+}
+
+static void xen_gnttab_register_types(void)
+{
+type_register_static(&xen_gnttab_info);
+}
+
+type_init(xen_gnttab_register_types)
+
+int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
+{
+return -ENOSYS;
+}
+
diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h
new file mode 100644
index 00..a7caa94c83
--- /dev/null
+++ b/hw/i386/kvm/xen_gnttab.h
@@ -0,0 +1,18 @@
+/*
+ * QEMU Xen emulation: Grant table support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_XEN_GNTTAB_H
+#define QEMU_XEN_GNTTAB_H
+
+void xen_gnttab_create(void);
+int xen_gnttab_map_page(uint64_t idx, uint64_t gfn);
+
+#endif /* QEMU_XEN_GNTTAB_H */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1c4941de8f..4f044bc7da 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -91,6 +91,7 @@
 #include "hw/virtio/virtio-mem-pci.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
+#include "hw/i386/kvm/xen_gnttab.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "target/i386/cpu.h"
@@ -1856,6 +1857,7 @@ int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
 if (xen_mode == XEN_EMULATE) {
 xen_overlay_create();
 xen_evtchn_create();
+xen_gnttab_create();
 }
 

[RFC PATCH v5 11/52] i386/xen: implement HYPERVISOR_xen_version

2022-12-30 Thread David Woodhouse
From: Joao Martins 

This is meant to serve as an example of how we can implement hypercalls.
xen_version is chosen specifically because QEMU controls which features
are exposed to the guest, so handling it here seems appropriate.

Signed-off-by: Joao Martins 
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/xen-emu.c | 86 +++
 1 file changed, 86 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 476f464ee2..1dea6feb90 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -14,9 +14,55 @@
 #include "sysemu/kvm_int.h"
 #include "sysemu/kvm_xen.h"
 #include "kvm/kvm_i386.h"
+#include "exec/address-spaces.h"
 #include "xen-emu.h"
 #include "trace.h"
 
+#include "standard-headers/xen/version.h"
+
+static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
+  bool is_write)
+{
+uint8_t *buf = (uint8_t *)_buf;
+int ret;
+
+while (sz) {
+struct kvm_translation tr = {
+.linear_address = gva,
+};
+
+size_t len = TARGET_PAGE_SIZE - (tr.linear_address & ~TARGET_PAGE_MASK);
+if (len > sz) {
+len = sz;
+}
+
+ret = kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr);
+if (ret || !tr.valid || (is_write && !tr.writeable)) {
+return -EFAULT;
+}
+
+cpu_physical_memory_rw(tr.physical_address, buf, len, is_write);
+
+buf += len;
+sz -= len;
+gva += len;
+}
+
+return 0;
+}
+
+static inline int kvm_copy_from_gva(CPUState *cs, uint64_t gva, void *buf,
+size_t sz)
+{
+return kvm_gva_rw(cs, gva, buf, sz, false);
+}
+
+static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
+  size_t sz)
+{
+return kvm_gva_rw(cs, gva, buf, sz, true);
+}
+
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 {
 const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
@@ -87,6 +133,43 @@ uint32_t kvm_xen_get_caps(void)
 return kvm_state->xen_caps;
 }
 
+static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+int err = 0;
+
+switch (cmd) {
+case XENVER_get_features: {
+struct xen_feature_info fi;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(fi) == 8);
+
+err = kvm_copy_from_gva(CPU(cpu), arg, &fi, sizeof(fi));
+if (err) {
+break;
+}
+
+fi.submap = 0;
+if (fi.submap_idx == 0) {
+fi.submap |= 1 << XENFEAT_writable_page_tables |
+ 1 << XENFEAT_writable_descriptor_tables |
+ 1 << XENFEAT_auto_translated_physmap |
+ 1 << XENFEAT_supervisor_mode_kernel;
+}
+
+err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
+break;
+}
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -97,6 +180,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_xen_version:
+return kvm_xen_hcall_xen_version(exit, cpu, exit->u.hcall.params[0],
+ exit->u.hcall.params[1]);
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v5 44/52] hw/xen: Support mapping grant frames

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_gnttab.c  | 83 ++-
 hw/i386/kvm/xen_overlay.c |  2 +-
 hw/i386/kvm/xen_overlay.h |  2 +
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 7a441445cd..00627648ef 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -9,6 +9,8 @@
  * See the COPYING file in the top-level directory.
  */
 
+#define __XEN_INTERFACE_VERSION__ 0x00040e00
+
 #include "qemu/osdep.h"
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
@@ -36,20 +38,36 @@ OBJECT_DECLARE_SIMPLE_TYPE(XenGnttabState, XEN_GNTTAB)
 #define XEN_PAGE_SHIFT 12
 #define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
 
+#define ENTRIES_PER_FRAME_V1 (XEN_PAGE_SIZE / sizeof(grant_entry_v1_t))
+#define ENTRIES_PER_FRAME_V2 (XEN_PAGE_SIZE / sizeof(grant_entry_v2_t))
+
 struct XenGnttabState {
 /*< private >*/
 SysBusDevice busdev;
 /*< public >*/
 
+QemuMutex gnt_lock;
+
 uint32_t nr_frames;
 uint32_t max_frames;
+
+union {
+grant_entry_v1_t *v1;
+grant_entry_v2_t *v2;
+} entries;
+
+MemoryRegion gnt_frames;
+MemoryRegion *gnt_aliases;
+uint64_t *gnt_frame_gpas;
 };
 
 struct XenGnttabState *xen_gnttab_singleton;
 
+
 static void xen_gnttab_realize(DeviceState *dev, Error **errp)
 {
 XenGnttabState *s = XEN_GNTTAB(dev);
+int i;
 
 if (xen_mode != XEN_EMULATE) {
 error_setg(errp, "Xen grant table support is for Xen emulation");
@@ -57,6 +75,39 @@ static void xen_gnttab_realize(DeviceState *dev, Error **errp)
 }
 s->nr_frames = 0;
 s->max_frames = kvm_xen_get_gnttab_max_frames();
+memory_region_init_ram(&s->gnt_frames, OBJECT(dev), "xen:grant_table",
+   XEN_PAGE_SIZE * s->max_frames, &error_abort);
+memory_region_set_enabled(&s->gnt_frames, true);
+s->entries.v1 = memory_region_get_ram_ptr(&s->gnt_frames);
+memset(s->entries.v1, 0, XEN_PAGE_SIZE * s->max_frames);
+
+/* Create individual page-sized aliases for overlays */
+s->gnt_aliases = (void *)g_new0(MemoryRegion, s->max_frames);
+s->gnt_frame_gpas = (void *)g_new(uint64_t, s->max_frames);
+for (i = 0; i < s->max_frames; i++) {
+memory_region_init_alias(&s->gnt_aliases[i], OBJECT(dev),
+ NULL, &s->gnt_frames,
+ i * XEN_PAGE_SIZE, XEN_PAGE_SIZE);
+s->gnt_frame_gpas[i] = INVALID_GPA;
+}
+
+qemu_mutex_init(&s->gnt_lock);
+
+xen_gnttab_singleton = s;
+}
+
+static int xen_gnttab_post_load(void *opaque, int version_id)
+{
+XenGnttabState *s = XEN_GNTTAB(opaque);
+uint32_t i;
+
+for (i = 0; i < s->nr_frames; i++) {
+if (s->gnt_frame_gpas[i] != INVALID_GPA) {
+xen_overlay_map_page_locked(&s->gnt_aliases[i],
+s->gnt_frame_gpas[i]);
+}
+}
+return 0;
 }
 
 static bool xen_gnttab_is_needed(void *opaque)
@@ -69,8 +120,11 @@ static const VMStateDescription xen_gnttab_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_gnttab_is_needed,
+.post_load = xen_gnttab_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(nr_frames, XenGnttabState),
+VMSTATE_VARRAY_UINT32(gnt_frame_gpas, XenGnttabState, nr_frames, 0,
+  vmstate_info_uint64, uint64_t),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -105,6 +159,33 @@ type_init(xen_gnttab_register_types)
 
 int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 {
-return -ENOSYS;
+XenGnttabState *s = xen_gnttab_singleton;
+uint64_t gpa = gfn << XEN_PAGE_SHIFT;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (idx >= s->max_frames) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->gnt_lock);
+
+qemu_mutex_lock_iothread();
+
+xen_overlay_map_page_locked(&s->gnt_aliases[idx], gpa);
+
+qemu_mutex_unlock_iothread();
+
+s->gnt_frame_gpas[idx] = gpa;
+
+if (s->nr_frames <= idx) {
+s->nr_frames = idx + 1;
+}
+
+qemu_mutex_unlock(&s->gnt_lock);
+
+return 0;
 }
 
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
index 7046bd55f1..2b88d846ac 100644
--- a/hw/i386/kvm/xen_overlay.c
+++ b/hw/i386/kvm/xen_overlay.c
@@ -49,7 +49,7 @@ struct XenOverlayState {
 
 struct XenOverlayState *xen_overlay_singleton;
 
-static void xen_overlay_map_page_locked(MemoryRegion *page, uint64_t gpa)
+void xen_overlay_map_page_locked(MemoryRegion *page, uint64_t gpa)
 {
 /*
  * Xen allows guests to map the same page as many times as it likes
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
index 5c46a0b036..594d2cba59 100644
--- a/hw/i386/kvm/xen_overlay.h
+++ b/hw/i386/kvm/xen_overlay.h
@@ -21,4 +21,6 @@ int xen_sync_long_mode(void);
 int xen_set_long_mode(bool long_mode);
 bool xen_is_long_mode(void);
 
+void 
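
The sizing and bookkeeping in this patch can be sanity-checked with a small
sketch. The entry layouts below are stand-ins whose sizes are assumed to
match grant_entry_v1_t (8 bytes) and grant_entry_v2_t (16 bytes). They are
not the Xen definitions, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define XEN_PAGE_SHIFT 12
#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)

/* Stand-in with the size assumed for a v1 grant entry (8 bytes). */
struct entry_v1 {
    uint16_t flags;
    uint16_t domid;
    uint32_t frame;
};

/* Stand-in with the size assumed for a v2 grant entry (16 bytes). */
union entry_v2 {
    struct {
        uint16_t flags;
        uint16_t domid;
    } hdr;
    uint32_t pad[4];
};

/* Guest frame number to guest physical address, as in map_page. */
static uint64_t gfn_to_gpa(uint64_t gfn)
{
    return gfn << XEN_PAGE_SHIFT;
}

/* nr_frames tracks one past the highest frame index mapped so far. */
static uint32_t nr_frames_after_map(uint32_t nr_frames, uint64_t idx)
{
    return nr_frames <= idx ? (uint32_t)idx + 1 : nr_frames;
}
```

Under these assumptions a 4 KiB frame holds 512 v1 entries or 256 v2
entries.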

[RFC PATCH v5 05/52] i386/kvm: handle Xen HVM cpuid leaves

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so expose the Xen leaves
unconditionally when the xen-version machine property is set.

Signed-off-by: Joao Martins 
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/cpu.c |  1 +
 target/i386/cpu.h |  2 +
 target/i386/kvm/kvm.c | 77 ++-
 target/i386/kvm/xen-emu.c |  4 +-
 target/i386/kvm/xen-emu.h | 13 ++-
 5 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 22b681ca37..50aa95f134 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7069,6 +7069,7 @@ static Property x86_cpu_properties[] = {
  * own cache information (see x86_cpu_load_def()).
  */
 DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, true),
+DEFINE_PROP_BOOL("xen-vapic", X86CPU, xen_vapic, false),
 
 /*
  * From "Requirements for Implementing the Microsoft
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4bc19577a..c6c57baed5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1964,6 +1964,8 @@ struct ArchCPU {
 int32_t thread_id;
 
 int32_t hv_max_vps;
+
+bool xen_vapic;
 };
 
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b097de0524..893e350a10 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -22,6 +22,7 @@
 
 #include <linux/kvm.h>
 #include "standard-headers/asm-x86/kvm_para.h"
+#include "standard-headers/xen/arch-x86/cpuid.h"
 
 #include "cpu.h"
 #include "host-cpu.h"
@@ -1803,7 +1804,77 @@ int kvm_arch_init_vcpu(CPUState *cs)
 has_msr_hv_hypercall = true;
 }
 
-if (cpu->expose_kvm) {
+if (cs->kvm_state->xen_version) {
+#ifdef CONFIG_XEN_EMU
+struct kvm_cpuid_entry2 *xen_max_leaf;
+
+memcpy(signature, "XenVMMXenVMM", 12);
+
+xen_max_leaf = c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_SIGNATURE;
+c->eax = kvm_base + XEN_CPUID_TIME;
+c->ebx = signature[0];
+c->ecx = signature[1];
+c->edx = signature[2];
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_VENDOR;
+c->eax = cs->kvm_state->xen_version;
+c->ebx = 0;
+c->ecx = 0;
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_HVM_MSR;
+/* Number of hypercall-transfer pages */
+c->eax = 1;
+/* Hypercall MSR base address */
+if (hyperv_enabled(cpu)) {
+c->ebx = XEN_HYPERCALL_MSR_HYPERV;
+kvm_xen_init(cs->kvm_state, c->ebx);
+} else {
+c->ebx = XEN_HYPERCALL_MSR;
+}
+c->ecx = 0;
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_TIME;
+c->eax = ((!!tsc_is_stable_and_known(env) << 1) |
+(!!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP) << 2));
+/* default=0 (emulate if necessary) */
+c->ebx = 0;
+/* guest tsc frequency */
+c->ecx = env->user_tsc_khz;
+/* guest tsc incarnation (migration count) */
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_HVM;
+xen_max_leaf->eax = kvm_base + XEN_CPUID_HVM;
+if (cs->kvm_state->xen_version >= XEN_VERSION(4, 5)) {
+c->function = kvm_base + XEN_CPUID_HVM;
+
+if (cpu->xen_vapic) {
+c->eax |= XEN_HVM_CPUID_APIC_ACCESS_VIRT;
+c->eax |= XEN_HVM_CPUID_X2APIC_VIRT;
+}
+
+c->eax |= XEN_HVM_CPUID_IOMMU_MAPPINGS;
+
+if (cs->kvm_state->xen_version >= XEN_VERSION(4, 6)) {
+c->eax |= XEN_HVM_CPUID_VCPU_ID_PRESENT;
+c->ebx = cs->cpu_index;
+}
+}
+
+kvm_base += 0x100;
+#else /* CONFIG_XEN_EMU */
+/* This should never happen as kvm_arch_init() would have died first. */
+fprintf(stderr, "Cannot enable Xen CPUID without Xen support\n");
+abort();
+#endif
+} else if (cpu->expose_kvm) {
 memcpy(signature, "KVMKVMKVM\0\0\0", 12);
 c = &cpuid_data.entries[cpuid_i++];
 c->function = KVM_CPUID_SIGNATURE | kvm_base;
@@ -2523,7 +2594,9 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 error_report("kvm: Xen support only available in PC machine");
 return -ENOTSUP;
 }
-ret = kvm_xen_init(s);
+/* hyperv_enabled() doesn't work yet. */
+uint32_t msr = XEN_HYPERCALL_MSR;
+ret = kvm_xen_init(s, msr);
 if (ret < 0) {
 return ret;
 }
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index b556d903aa..34d5bc1bc9 100644
--- a/target/i386/kvm/xen-emu.c
+++ 
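
The signature leaf above works by copying a 12-byte string into three 32-bit
registers. A standalone sketch of that packing, and of how a guest would
read it back, is below. struct sig_regs and both helpers are hypothetical
names used only for this illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the three registers carrying the signature. */
struct sig_regs {
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
};

/* Copy a 12-byte signature into ebx/ecx/edx in memory order. */
static struct sig_regs pack_signature(const char sig12[12])
{
    uint32_t words[3];
    struct sig_regs r;

    memcpy(words, sig12, sizeof(words));
    r.ebx = words[0];
    r.ecx = words[1];
    r.edx = words[2];
    return r;
}

/* Recover the NUL-terminated string, as a guest probing the leaf would. */
static void unpack_signature(const struct sig_regs *r, char out[13])
{
    uint32_t words[3] = { r->ebx, r->ecx, r->edx };

    memcpy(out, words, sizeof(words));
    out[12] = '\0';
}
```

Going through memcpy() rather than shifting bytes by hand keeps the round
trip independent of host byte order.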

[RFC PATCH v5 10/52] i386/xen: handle guest hypercalls

2022-12-30 Thread David Woodhouse
From: Joao Martins 

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins 
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/kvm.c|  5 
 target/i386/kvm/trace-events |  3 +++
 target/i386/kvm/xen-emu.c| 44 
 target/i386/kvm/xen-emu.h|  1 +
 4 files changed, 53 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index be5e50b4fc..f365e56fcc 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5477,6 +5477,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER);
 ret = kvm_handle_wrmsr(cpu, run);
 break;
+#ifdef CONFIG_XEN_EMU
+case KVM_EXIT_XEN:
+ret = kvm_xen_handle_exit(cpu, &run->xen);
+break;
+#endif
 default:
 fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
 ret = -1;
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 7c369db1e1..cd6f842b1f 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -5,3 +5,6 @@ kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %"
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
+
+# xen-emu.c
+kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 4883b95d9d..476f464ee2 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -10,10 +10,12 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/kvm_xen.h"
 #include "kvm/kvm_i386.h"
 #include "xen-emu.h"
+#include "trace.h"
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 {
@@ -84,3 +86,45 @@ uint32_t kvm_xen_get_caps(void)
 {
 return kvm_state->xen_caps;
 }
+
+static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
+{
+uint16_t code = exit->u.hcall.input;
+
+if (exit->u.hcall.cpl > 0) {
+exit->u.hcall.result = -EPERM;
+return true;
+}
+
+switch (code) {
+default:
+return false;
+}
+}
+
+int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
+{
+if (exit->type != KVM_EXIT_XEN_HCALL) {
+return -1;
+}
+
+if (!do_kvm_xen_handle_exit(cpu, exit)) {
+/*
+ * Some hypercalls will be deliberately "implemented" by returning
+ * -ENOSYS. This case is for hypercalls which are unexpected.
+ */
+exit->u.hcall.result = -ENOSYS;
+qemu_log_mask(LOG_UNIMP, "Unimplemented Xen hypercall %"
+  PRId64 " (0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 ")\n",
+  (uint64_t)exit->u.hcall.input,
+  (uint64_t)exit->u.hcall.params[0],
+  (uint64_t)exit->u.hcall.params[1],
+  (uint64_t)exit->u.hcall.params[2]);
+}
+
+trace_kvm_xen_hypercall(CPU(cpu)->cpu_index, exit->u.hcall.cpl,
+exit->u.hcall.input, exit->u.hcall.params[0],
+exit->u.hcall.params[1], exit->u.hcall.params[2],
+exit->u.hcall.result);
+return 0;
+}
diff --git a/target/i386/kvm/xen-emu.h b/target/i386/kvm/xen-emu.h
index d62f1d8ed8..21faf6bf38 100644
--- a/target/i386/kvm/xen-emu.h
+++ b/target/i386/kvm/xen-emu.h
@@ -25,5 +25,6 @@
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr);
 int kvm_xen_init_vcpu(CPUState *cs);
+int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit);
 
 #endif /* QEMU_I386_KVM_XEN_EMU_H */
-- 
2.35.3




[RFC PATCH v5 32/52] hw/xen: Implement EVTCHNOP_bind_virq

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 85 
 hw/i386/kvm/xen_evtchn.h  |  2 +
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/cpu.h |  4 ++
 target/i386/kvm/xen-emu.c | 91 +++
 target/i386/machine.c |  2 +
 6 files changed, 185 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index bfdcc8de67..f8e7170a84 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -239,6 +239,11 @@ static bool valid_port(evtchn_port_t port)
 }
 }
 
+static bool valid_vcpu(uint32_t vcpu)
+{
+return !!qemu_get_cpu(vcpu);
+}
+
 int xen_evtchn_status_op(struct evtchn_status *status)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -489,6 +494,43 @@ static void free_port(XenEvtchnState *s, evtchn_port_t port)
 clear_port_pending(s, port);
 }
 
+static int allocate_port(XenEvtchnState *s, uint32_t vcpu, uint16_t type,
+ uint16_t val, evtchn_port_t *port)
+{
+evtchn_port_t p = 1;
+
+for (p = 1; valid_port(p); p++) {
+if (s->port_table[p].type == EVTCHNSTAT_closed) {
+s->port_table[p].vcpu = vcpu;
+s->port_table[p].type = type;
+s->port_table[p].type_val = val;
+
+*port = p;
+
+if (s->nr_ports < p + 1) {
+s->nr_ports = p + 1;
+}
+
+return 0;
+}
+}
+return -ENOSPC;
+}
+
+static bool virq_is_global(uint32_t virq)
+{
+switch (virq) {
+case VIRQ_TIMER:
+case VIRQ_DEBUG:
+case VIRQ_XENOPROF:
+case VIRQ_XENPMU:
+return false;
+
+default:
+return true;
+}
+}
+
 static int close_port(XenEvtchnState *s, evtchn_port_t port)
 {
 XenEvtchnPort *p = &s->port_table[port];
@@ -497,6 +539,11 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 case EVTCHNSTAT_closed:
 return -ENOENT;
 
+case EVTCHNSTAT_virq:
+kvm_xen_set_vcpu_virq(virq_is_global(p->type_val) ? 0 : p->vcpu,
+  p->type_val, 0);
+break;
+
 default:
 break;
 }
@@ -548,3 +595,41 @@ int xen_evtchn_unmask_op(struct evtchn_unmask *unmask)
 
 return ret;
 }
+
+int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (virq->virq >= NR_VIRQS) {
+return -EINVAL;
+}
+
+/* Global VIRQ must be allocated on vCPU0 first */
+if (virq_is_global(virq->virq) && virq->vcpu != 0) {
+return -EINVAL;
+}
+
+if (!valid_vcpu(virq->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, virq->vcpu, EVTCHNSTAT_virq, virq->virq,
+&virq->port);
+if (!ret) {
+ret = kvm_xen_set_vcpu_virq(virq->vcpu, virq->virq, virq->port);
+if (ret) {
+free_port(s, virq->port);
+}
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 69c6b0d743..0ea13dda3a 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -18,8 +18,10 @@ int xen_evtchn_set_callback_param(uint64_t param);
 struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
+struct evtchn_bind_virq;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
+int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index ee53294deb..864772514f 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -19,6 +19,7 @@
 uint32_t kvm_xen_get_caps(void);
 void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
+int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 938a1b9c8b..000ed2fed9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -27,6 +27,8 @@
 #include "qapi/qapi-types-common.h"
 #include "qemu/cpu-float.h"
 
+#define XEN_NR_VIRQS 24
+
 /* The x86 has a strong memory model 

[RFC PATCH v5 46/52] hw/xen: Implement GNTTABOP_query_size

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_gnttab.c  | 19 +++
 hw/i386/kvm/xen_gnttab.h  |  2 ++
 target/i386/kvm/xen-emu.c | 16 +++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 121c39f6e2..79805b9f74 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -220,3 +220,22 @@ int xen_gnttab_get_version_op(struct gnttab_get_version *get)
 get->version = 1;
 return 0;
 }
+
+int xen_gnttab_query_size_op(struct gnttab_query_size *size)
+{
+XenGnttabState *s = xen_gnttab_singleton;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (size->dom != DOMID_SELF && size->dom != xen_domid) {
+size->status = GNTST_bad_domain;
+return 0;
+}
+
+size->status = GNTST_okay;
+size->nr_frames = s->nr_frames;
+size->max_nr_frames = s->max_frames;
+return 0;
+}
diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h
index 79579677ba..3bdbe96191 100644
--- a/hw/i386/kvm/xen_gnttab.h
+++ b/hw/i386/kvm/xen_gnttab.h
@@ -17,7 +17,9 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn);
 
 struct gnttab_set_version;
 struct gnttab_get_version;
+struct gnttab_query_size;
 int xen_gnttab_set_version_op(struct gnttab_set_version *set);
 int xen_gnttab_get_version_op(struct gnttab_get_version *get);
+int xen_gnttab_query_size_op(struct gnttab_query_size *size);
 
 #endif /* QEMU_XEN_GNTTAB_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index cd5786e4a7..0c47859980 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1129,7 +1129,21 @@ static bool kvm_xen_hcall_gnttab_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
-case GNTTABOP_query_size:
+case GNTTABOP_query_size: {
+struct gnttab_query_size size;
+
+qemu_build_assert(sizeof(size) == 16);
+if (kvm_copy_from_gva(cs, arg, &size, sizeof(size))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_gnttab_query_size_op(&size);
+if (!err && kvm_copy_to_gva(cs, arg, &size, sizeof(size))) {
+err = -EFAULT;
+}
+break;
+}
 case GNTTABOP_setup_table:
 case GNTTABOP_copy:
 case GNTTABOP_map_grant_ref:
-- 
2.35.3
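
A note on the query_size semantics in the patch above: an unknown domain
ID is reported through the status field of the argument structure, while
the hypercall itself still returns 0. Below is a minimal standalone sketch
of that pattern. It is not the QEMU code: the struct, the SKETCH_*
constants (and their values) and the function name are all invented for
illustration, and the real definitions live in Xen's public headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder values, chosen only for this sketch. */
#define SKETCH_DOMID_SELF        0x7FF0u
#define SKETCH_GNTST_OKAY        0
#define SKETCH_GNTST_BAD_DOMAIN  (-2)

/* Hypothetical stand-in for the guest-visible argument structure. */
struct sketch_query_size {
    uint16_t dom;            /* IN: domain being queried */
    uint32_t nr_frames;      /* OUT: frames currently in use */
    uint32_t max_nr_frames;  /* OUT: upper limit on frames */
    int16_t status;          /* OUT: per-operation status code */
};

/*
 * Returns the hypercall result. A bad domain is not a hypercall failure:
 * it is reported in q->status and the function still returns 0, so the
 * caller copies the structure back to the guest either way.
 */
int sketch_query_size(struct sketch_query_size *q, uint16_t own_domid,
                      uint32_t nr_frames, uint32_t max_frames)
{
    assert(q != NULL);

    if (q->dom != SKETCH_DOMID_SELF && q->dom != own_domid) {
        q->status = SKETCH_GNTST_BAD_DOMAIN;
        return 0;
    }

    q->status = SKETCH_GNTST_OKAY;
    q->nr_frames = nr_frames;
    q->max_nr_frames = max_frames;
    return 0;
}
```

Keeping the two error channels separate means the guest has to check both
the hypercall return value and the status field.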




[RFC PATCH v5 50/52] hw/xen: Add backend implementation of interdomain event channel support

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

This provides the QEMU side of interdomain event channels, allowing events
to be sent to/from the guest.

The API mirrors libxenevtchn, and in time both this and the real Xen one
will be available through ops structures so that the PV backend drivers
can use the correct one as appropriate.

For now, this implementation can be used directly by our XenStore which
will be for emulated mode only.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c | 342 +--
 hw/i386/kvm/xen_evtchn.h |  20 +++
 2 files changed, 353 insertions(+), 9 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 34c5199421..c0f6ef9dff 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -35,6 +35,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_xen.h"
 #include 
+#include 
 
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/params.h"
@@ -85,6 +86,13 @@ struct compat_shared_info {
 
 #define COMPAT_EVTCHN_2L_NR_CHANNELS1024
 
+/* Local private implementation of struct xenevtchn_handle */
+struct xenevtchn_handle {
+evtchn_port_t be_port;
+evtchn_port_t guest_port; /* Or zero for unbound */
+int fd;
+};
+
 /*
  * For unbound/interdomain ports there are only two possible remote
  * domains; self and QEMU. Use a single high bit in type_val for that,
@@ -93,8 +101,6 @@ struct compat_shared_info {
 #define PORT_INFO_TYPEVAL_REMOTE_QEMU   0x8000
 #define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK  0x7FFF
 
-#define DOMID_QEMU  0
-
 struct XenEvtchnState {
 /*< private >*/
 SysBusDevice busdev;
@@ -108,6 +114,8 @@ struct XenEvtchnState {
 uint32_t nr_ports;
 XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
 qemu_irq gsis[GSI_NUM_PINS];
+
+struct xenevtchn_handle *be_handles[EVTCHN_2L_NR_CHANNELS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -115,6 +123,18 @@ struct XenEvtchnState *xen_evtchn_singleton;
 /* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
 #define CALLBACK_VIA_TYPE_SHIFT 56
 
+static void unbind_backend_ports(XenEvtchnState *s);
+
+static int xen_evtchn_pre_load(void *opaque)
+{
+XenEvtchnState *s = opaque;
+
+/* Unbind all the backend-side ports; they need to rebind */
+unbind_backend_ports(s);
+
+return 0;
+}
+
 static int xen_evtchn_post_load(void *opaque, int version_id)
 {
 XenEvtchnState *s = opaque;
@@ -148,6 +168,7 @@ static const VMStateDescription xen_evtchn_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_evtchn_is_needed,
+.pre_load = xen_evtchn_pre_load,
 .post_load = xen_evtchn_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(callback_param, XenEvtchnState),
@@ -362,6 +383,20 @@ static int assign_kernel_port(uint16_t type, evtchn_port_t port,
 return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, );
 }
 
+static int assign_kernel_eventfd(uint16_t type, evtchn_port_t port, int fd)
+{
+struct kvm_xen_hvm_attr ha;
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.type = type;
+ha.u.evtchn.flags = 0;
+ha.u.evtchn.deliver.eventfd.port = 0;
+ha.u.evtchn.deliver.eventfd.fd = fd;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -380,6 +415,32 @@ static bool valid_vcpu(uint32_t vcpu)
 return !!qemu_get_cpu(vcpu);
 }
 
+static void unbind_backend_ports(XenEvtchnState *s)
+{
+XenEvtchnPort *p;
+int i;
+
+for (i = 1; i <= s->nr_ports; i++) {
+p = &s->port_table[i];
+if (p->type == EVTCHNSTAT_interdomain &&
+(p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) ) {
+evtchn_port_t be_port = p->type_val & PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
+
+if (s->be_handles[be_port]) {
+/* This part will be overwritten on the load anyway. */
+p->type = EVTCHNSTAT_unbound;
+p->type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+
+/* Leave the backend port open and unbound too. */
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+deassign_kernel_port(i);
+}
+s->be_handles[be_port]->guest_port = 0;
+}
+}
+}
+}
+
 int xen_evtchn_status_op(struct evtchn_status *status)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -815,7 +876,14 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
 
 case EVTCHNSTAT_interdomain:
 if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
-/* Not yet implemented. This can't happen! */
+uint16_t be_port = p->type_val & ~PORT_INFO_TYPEVAL_REMOTE_QEMU;
+struct xenevtchn_handle *xc = s->be_handles[be_port];
+if (xc) {
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+

[RFC PATCH v5 42/52] kvm/i386: Add xen-gnttab-max-frames property

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 accel/kvm/kvm-all.c   |  1 +
 include/sysemu/kvm_int.h  |  1 +
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/kvm/kvm.c | 34 ++
 target/i386/kvm/xen-emu.c |  6 ++
 5 files changed, 43 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 568bb09c09..54930e88e4 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3621,6 +3621,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
 s->xen_version = 0;
+s->xen_gnttab_max_frames = 64;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 9a8c062609..e6eb99030f 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -112,6 +112,7 @@ struct KVMState
 uint32_t notify_window;
 uint32_t xen_version;
 uint32_t xen_caps;
+uint16_t xen_gnttab_max_frames;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 864772514f..4999a0cff4 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -20,6 +20,7 @@ uint32_t kvm_xen_get_caps(void);
 void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
 int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
+uint16_t kvm_xen_get_gnttab_max_frames(void);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2488773e1e..8fbdc7c064 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5869,6 +5869,33 @@ static void kvm_arch_set_xen_version(Object *obj, Visitor *v,
 }
 }
 
+static void kvm_arch_get_xen_gnttab_max_frames(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint16_t value = s->xen_gnttab_max_frames;
+
+visit_type_uint16(v, name, &value, errp);
+}
+
+static void kvm_arch_set_xen_gnttab_max_frames(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint16_t value;
+
+visit_type_uint16(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->xen_gnttab_max_frames = value;
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@ -5894,6 +5921,13 @@ void kvm_arch_accel_class_init(ObjectClass *oc)
   "Xen version to be emulated "
   "(in XENVER_version form "
   "e.g. 0x4000a for 4.10)");
+
+object_class_property_add(oc, "xen-gnttab-max-frames", "uint16",
+  kvm_arch_get_xen_gnttab_max_frames,
+  kvm_arch_set_xen_gnttab_max_frames,
+  NULL, NULL);
+object_class_property_set_description(oc, "xen-gnttab-max-frames",
+  "Maximum number of grant table frames");
 }
 
 void kvm_set_max_apic_id(uint32_t max_apic_id)
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 9a8eb0f1f9..fd3e2464ee 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1160,6 +1160,12 @@ int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 return 0;
 }
 
+uint16_t kvm_xen_get_gnttab_max_frames(void)
+{
+KVMState *s = KVM_STATE(current_accel());
+return s->xen_gnttab_max_frames;
+}
+
 int kvm_put_xen_state(CPUState *cs)
 {
 X86CPU *cpu = X86_CPU(cs);
-- 
2.35.3




[RFC PATCH v5 47/52] i386/xen: handle PV timer hypercalls

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Introduce support for the one-shot and periodic modes of Xen PV timers,
whereby timer interrupts come through a special virq event channel
with deadlines being set through:

1) set_timer_op hypercall (one-shot only)
2) vcpu_op hypercall with the {set,stop}_{singleshot,periodic}_timer
sub-operations

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  |  31 +
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/cpu.h |   5 +
 target/i386/kvm/xen-emu.c | 245 +-
 target/i386/machine.c |   1 +
 5 files changed, 282 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 49caf69dc0..34c5199421 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1159,6 +1159,37 @@ int xen_evtchn_send_op(struct evtchn_send *send)
 return ret;
 }
 
+int xen_evtchn_set_port(uint16_t port)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = -EINVAL;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[port];
+
+/* QEMU has no business sending to anything but these */
+if (p->type == EVTCHNSTAT_virq ||
+(p->type == EVTCHNSTAT_interdomain &&
+ (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU))) {
+set_port_pending(s, port);
+ret = 0;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 static const char *type_names[] = {
 "closed",
 "unbound",
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 86f154e359..eb6ba16f72 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -20,6 +20,8 @@ int xen_evtchn_set_callback_param(uint64_t param);
 void xen_evtchn_connect_gsis(qemu_irq *system_gsis);
 void xen_evtchn_set_callback_level(int level);
 
+int xen_evtchn_set_port(uint16_t port);
+
 void hmp_xen_event_list(Monitor *mon, const QDict *qdict);
 void hmp_xen_event_inject(Monitor *mon, const QDict *qdict);
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 144b1c3038..1a411e1b49 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -26,6 +26,7 @@
 #include "exec/cpu-defs.h"
 #include "qapi/qapi-types-common.h"
 #include "qemu/cpu-float.h"
+#include "qemu/timer.h"
 
 #define XEN_NR_VIRQS 24
 
@@ -1798,6 +1799,10 @@ typedef struct CPUArchState {
 bool xen_callback_asserted;
 uint16_t xen_virq[XEN_NR_VIRQS];
 uint64_t xen_singleshot_timer_ns;
+QEMUTimer *xen_singleshot_timer;
+uint64_t xen_periodic_timer_period;
+QEMUTimer *xen_periodic_timer;
+QemuMutex xen_timers_lock;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 0c47859980..7f13ae6a8b 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -39,6 +39,9 @@
 
 #include "xen-compat.h"
 
+static void xen_vcpu_singleshot_timer_event(void *opaque);
+static void xen_vcpu_periodic_timer_event(void *opaque);
+
 #ifdef TARGET_X86_64
 #define hypercall_compat32(longmode) (!(longmode))
 #else
@@ -170,6 +173,23 @@ int kvm_xen_init_vcpu(CPUState *cs)
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
 env->xen_vcpu_runstate_gpa = INVALID_GPA;
 
+qemu_mutex_init(&env->xen_timers_lock);
+env->xen_singleshot_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+ xen_vcpu_singleshot_timer_event,
+ cpu);
+if (!env->xen_singleshot_timer) {
+return -ENOMEM;
+}
+env->xen_singleshot_timer->opaque = cs;
+
+env->xen_periodic_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+   xen_vcpu_periodic_timer_event,
+   cpu);
+if (!env->xen_periodic_timer) {
+return -ENOMEM;
+}
+env->xen_periodic_timer->opaque = cs;
+
 return 0;
 }
 
@@ -201,7 +221,8 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
  1 << XENFEAT_writable_descriptor_tables |
  1 << XENFEAT_auto_translated_physmap |
  1 << XENFEAT_supervisor_mode_kernel |
- 1 << XENFEAT_hvm_callback_vector;
+ 1 << XENFEAT_hvm_callback_vector |
+ 1 << XENFEAT_hvm_safe_pvclock;
 }
 
 err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
@@ -815,13 +836,192 @@ static int vcpuop_register_runstate_info(CPUState *cs, CPUState *target,
 return 0;
 }
 
+static uint64_t kvm_get_current_ns(void)
+{
+struct kvm_clock_data data;
+int ret;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+abort();
+}
+
+

[RFC PATCH v5 13/52] i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 5f2b55ef10..80005ea527 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -227,6 +227,18 @@ static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 err = schedop_shutdown(cs, arg);
 break;
 
+case SCHEDOP_poll:
+/*
+ * Linux will panic if this doesn't work. Just yield; it's not
+ * worth overthinking it because with event channel handling
+ * in KVM, the kernel will intercept this and it will never
+ * reach QEMU anyway.
+ */
+case SCHEDOP_yield:
+sched_yield();
+err = 0;
+break;
+
 default:
 return false;
 }
-- 
2.35.3
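
For readers skimming the hunk above: SCHEDOP_poll simply falls through
into the SCHEDOP_yield case, so both end up calling sched_yield() and
reporting success. Here is a standalone sketch of that dispatch shape,
assuming a POSIX host. The enum names and values and the function name
are invented for the example and are not taken from the Xen headers or
from QEMU.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command numbers, used only by this sketch. */
enum {
    SKETCH_SCHEDOP_YIELD = 0,
    SKETCH_SCHEDOP_POLL = 3,
};

/*
 * Returns true if the command was handled here (with the hypercall
 * result stored in *result), false if the caller should treat it as
 * unimplemented.
 */
bool sketch_sched_op(int cmd, int *result)
{
    assert(result != NULL);

    switch (cmd) {
    case SKETCH_SCHEDOP_POLL:
        /* Treated like a yield: fall through on purpose. */
    case SKETCH_SCHEDOP_YIELD:
        sched_yield();  /* give up the host CPU for a moment */
        *result = 0;
        return true;

    default:
        return false;
    }
}
```

The trade-off is that a poll with a timeout returns immediately instead
of blocking. The commit message accepts this on the grounds that a kernel
with event channel support intercepts the call before it reaches
userspace.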




[RFC PATCH v5 45/52] i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_version

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_gnttab.c  | 31 
 hw/i386/kvm/xen_gnttab.h  |  5 
 target/i386/kvm/xen-emu.c | 60 +++
 3 files changed, 96 insertions(+)

diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 00627648ef..121c39f6e2 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -189,3 +189,34 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 return 0;
 }
 
+int xen_gnttab_set_version_op(struct gnttab_set_version *set)
+{
+int ret;
+
+switch (set->version) {
+case 1:
+ret = 0;
+break;
+
+case 2:
+/* Behave as before set_version was introduced. */
+ret = -ENOSYS;
+break;
+
+default:
+ret = -EINVAL;
+}
+
+set->version = 1;
+return ret;
+}
+
+int xen_gnttab_get_version_op(struct gnttab_get_version *get)
+{
+if (get->dom != DOMID_SELF && get->dom != xen_domid) {
+return -ESRCH;
+}
+
+get->version = 1;
+return 0;
+}
diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h
index a7caa94c83..79579677ba 100644
--- a/hw/i386/kvm/xen_gnttab.h
+++ b/hw/i386/kvm/xen_gnttab.h
@@ -15,4 +15,9 @@
 void xen_gnttab_create(void);
 int xen_gnttab_map_page(uint64_t idx, uint64_t gfn);
 
+struct gnttab_set_version;
+struct gnttab_get_version;
+int xen_gnttab_set_version_op(struct gnttab_set_version *set);
+int xen_gnttab_get_version_op(struct gnttab_get_version *get);
+
 #endif /* QEMU_XEN_GNTTAB_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 931e74670b..cd5786e4a7 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -35,6 +35,7 @@
 #include "standard-headers/xen/hvm/params.h"
 #include "standard-headers/xen/vcpu.h"
 #include "standard-headers/xen/event_channel.h"
+#include "standard-headers/xen/grant_table.h"
 
 #include "xen-compat.h"
 
@@ -1091,6 +1092,61 @@ static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_gnttab_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+int cmd, uint64_t arg, int count)
+{
+CPUState *cs = CPU(cpu);
+int err;
+
+switch (cmd) {
+case GNTTABOP_set_version: {
+struct gnttab_set_version set;
+
+qemu_build_assert(sizeof(set) == 4);
+if (kvm_copy_from_gva(cs, arg, &set, sizeof(set))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_gnttab_set_version_op(&set);
+if (!err && kvm_copy_to_gva(cs, arg, &set, sizeof(set))) {
+err = -EFAULT;
+}
+break;
+}
+case GNTTABOP_get_version: {
+struct gnttab_get_version get;
+
+qemu_build_assert(sizeof(get) == 8);
+if (kvm_copy_from_gva(cs, arg, &get, sizeof(get))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_gnttab_get_version_op(&get);
+if (!err && kvm_copy_to_gva(cs, arg, &get, sizeof(get))) {
+err = -EFAULT;
+}
+break;
+}
+case GNTTABOP_query_size:
+case GNTTABOP_setup_table:
+case GNTTABOP_copy:
+case GNTTABOP_map_grant_ref:
+case GNTTABOP_unmap_grant_ref:
+case GNTTABOP_swap_grant_ref:
+return false;
+
+default:
+/* Xen explicitly returns -ENOSYS to HVM guests for all others */
+err = -ENOSYS;
+break;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -1101,6 +1157,10 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_grant_table_op:
+return kvm_xen_hcall_gnttab_op(exit, cpu, exit->u.hcall.params[0],
+   exit->u.hcall.params[1],
+   exit->u.hcall.params[2]);
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
-- 
2.35.3
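
The version negotiation in xen_gnttab_set_version_op() above boils down
to: accept 1, refuse 2 with -ENOSYS, reject anything else with -EINVAL,
and always leave 1 in the version field. The following is a standalone
sketch of just that decision, with a plain uint32_t standing in for the
real argument structure. The function name is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the version negotiation: only grant table version 1 is
 * offered. *version is rewritten to 1 in every case; the return value
 * says whether the request itself was acceptable.
 */
int sketch_set_version(uint32_t *version)
{
    int ret;

    assert(version != NULL);

    switch (*version) {
    case 1:
        ret = 0;        /* the only supported version */
        break;
    case 2:
        ret = -ENOSYS;  /* a real version, but not implemented here */
        break;
    default:
        ret = -EINVAL;  /* not a grant table version at all */
        break;
    }

    *version = 1;
    return ret;
}
```

Note that in the patch the hypercall handler copies the structure back to
the guest only when the operation returned 0, so the guest sees the
rewritten field only on success.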




[RFC PATCH v5 02/52] xen: add CONFIG_XENFV_MACHINE and CONFIG_XEN_EMU options for Xen emulation

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators. It will also cover
the support for architecture-independent grant table and event channel
support which will be added in hw/i386/kvm/ (on the basis that the
non-KVM support is very theoretical and making it not use KVM directly
seems like gratuitous overengineering at this point).

The XENFV_MACHINE option is for the xenfv platform support, which will
now be used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XENFV_MACHINE over time.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/Kconfig  | 1 +
 hw/i386/Kconfig | 5 +
 hw/xen/Kconfig  | 3 +++
 meson.build | 1 +
 4 files changed, 10 insertions(+)
 create mode 100644 hw/xen/Kconfig

diff --git a/hw/Kconfig b/hw/Kconfig
index 38233bbb0f..ba62ff6417 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source xen/Kconfig
 source watchdog/Kconfig
 
 # arch Kconfig
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b9..c9fd577997 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -137,3 +137,8 @@ config VMPORT
 config VMMOUSE
 bool
 depends on VMPORT
+
+config XEN_EMU
+bool
+default y
+depends on KVM && (I386 || X86_64)
diff --git a/hw/xen/Kconfig b/hw/xen/Kconfig
new file mode 100644
index 00..755c8b1faf
--- /dev/null
+++ b/hw/xen/Kconfig
@@ -0,0 +1,3 @@
+config XENFV_MACHINE
+bool
+default y if (XEN || XEN_EMU)
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..9348cf572c 100644
--- a/meson.build
+++ b/meson.build
@@ -3828,6 +3828,7 @@ if have_system
   if xen.found()
 summary_info += {'xen ctrl version':  xen.version()}
   endif
+  summary_info += {'Xen emulation': config_all.has_key('CONFIG_XEN_EMU')}
 endif
 summary_info += {'TCG support':   config_all.has_key('CONFIG_TCG')}
 if config_all.has_key('CONFIG_TCG')
-- 
2.35.3




[RFC PATCH v5 37/52] hw/xen: Implement EVTCHNOP_bind_vcpu

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 40 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 12 
 3 files changed, 54 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 9146e61c66..1623922dfc 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -784,6 +784,46 @@ int xen_evtchn_unmask_op(struct evtchn_unmask *unmask)
 return ret;
 }
 
+int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = -EINVAL;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(vcpu->port)) {
+return -EINVAL;
+}
+
+if (!valid_vcpu(vcpu->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[vcpu->port];
+
+if (p->type == EVTCHNSTAT_interdomain ||
+p->type == EVTCHNSTAT_unbound ||
+p->type == EVTCHNSTAT_pirq ||
+(p->type == EVTCHNSTAT_virq && virq_is_global(p->type_val))) {
+/*
+ * unmask_port() with do_unmask==false will just raise the event
+ * on the new vCPU if the port was already pending.
+ */
+p->vcpu = vcpu->vcpu;
+unmask_port(s, vcpu->port, false);
+ret = 0;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 1ebc7580eb..486b031c82 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -23,6 +23,7 @@ struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
 struct evtchn_bind_interdomain;
+struct evtchn_bind_vcpu;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -31,5 +32,6 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
+int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 8a2a8c6291..960999198a 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -934,6 +934,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_vcpu: {
+struct evtchn_bind_vcpu vcpu;
+
+qemu_build_assert(sizeof(vcpu) == 8);
+if (kvm_copy_from_gva(cs, arg, &vcpu, sizeof(vcpu))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_vcpu_op(&vcpu);
+break;
+}
 default:
 return false;
 }
-- 
2.35.3
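
The eligibility test in xen_evtchn_bind_vcpu_op() above can be read as a
small predicate: interdomain, unbound and pirq ports may always be moved
to another vCPU, a virq port may be moved only if the VIRQ is global, and
everything else is refused. The following is a standalone sketch of that
predicate. The enum names and numeric values are invented for the example
and do not match the Xen headers. The split between per-vCPU and global
VIRQs follows virq_is_global() from the bind_virq patch in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_VIRQS 24

/* Hypothetical port types, numbered only for this sketch. */
enum sketch_port_type {
    SKETCH_PORT_CLOSED,
    SKETCH_PORT_UNBOUND,
    SKETCH_PORT_INTERDOMAIN,
    SKETCH_PORT_PIRQ,
    SKETCH_PORT_VIRQ,
    SKETCH_PORT_IPI,
};

/* Hypothetical VIRQ numbers; the first four are the per-vCPU ones. */
enum sketch_virq {
    SKETCH_VIRQ_TIMER,
    SKETCH_VIRQ_DEBUG,
    SKETCH_VIRQ_XENOPROF,
    SKETCH_VIRQ_XENPMU,
    SKETCH_VIRQ_CONSOLE,  /* example of a global VIRQ */
};

/* Per-vCPU VIRQs are tied to one vCPU; every other VIRQ is global. */
bool sketch_virq_is_global(uint32_t virq)
{
    assert(virq < SKETCH_NR_VIRQS);

    switch (virq) {
    case SKETCH_VIRQ_TIMER:
    case SKETCH_VIRQ_DEBUG:
    case SKETCH_VIRQ_XENOPROF:
    case SKETCH_VIRQ_XENPMU:
        return false;
    default:
        return true;
    }
}

/* May a port of this type (and VIRQ, if any) be rebound to another vCPU? */
bool sketch_can_rebind(enum sketch_port_type type, uint32_t virq)
{
    return type == SKETCH_PORT_INTERDOMAIN ||
           type == SKETCH_PORT_UNBOUND ||
           type == SKETCH_PORT_PIRQ ||
           (type == SKETCH_PORT_VIRQ && sketch_virq_is_global(virq));
}
```

IPI ports and per-vCPU VIRQs stay pinned, which is why the patch returns
-EINVAL for them.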




[RFC PATCH v5 22/52] i386/xen: handle VCPUOP_register_vcpu_time_info

2022-12-30 Thread David Woodhouse
From: Joao Martins 

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h |   1 +
 target/i386/kvm/xen-emu.c | 100 +-
 target/i386/machine.c |   1 +
 3 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 109b2e5669..96c2d0d5cb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1790,6 +1790,7 @@ typedef struct CPUArchState {
 struct kvm_nested_state *nested_state;
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
+uint64_t xen_vcpu_time_info_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index aa06588c07..ebb5d1296a 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -38,28 +38,41 @@
 #define hypercall_compat32(longmode) (false)
 #endif
 
-static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
-  bool is_write)
+static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
+   size_t *len, bool is_write)
 {
-uint8_t *buf = (uint8_t *)_buf;
-int ret;
-
-while (sz) {
 struct kvm_translation tr = {
 .linear_address = gva,
 };
 
-size_t len = TARGET_PAGE_SIZE - (tr.linear_address & ~TARGET_PAGE_MASK);
-if (len > sz) {
-len = sz;
+if (len) {
+*len = TARGET_PAGE_SIZE - (gva & ~TARGET_PAGE_MASK);
+}
+
+if (kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr) || !tr.valid ||
+(is_write && !tr.writeable)) {
+return false;
 }
+*gpa = tr.physical_address;
+return true;
+}
+
+static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
+  bool is_write)
+{
+uint8_t *buf = (uint8_t *)_buf;
+uint64_t gpa;
+size_t len;
 
-ret = kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr);
-if (ret || !tr.valid || (is_write && !tr.writeable)) {
+while (sz) {
+if (!kvm_gva_to_gpa(cs, gva, &gpa, &len, is_write)) {
 return -EFAULT;
 }
+if (len > sz) {
+len = sz;
+}
 
-cpu_physical_memory_rw(tr.physical_address, buf, len, is_write);
+cpu_physical_memory_rw(gpa, buf, len, is_write);
 
 buf += len;
 sz -= len;
@@ -147,6 +160,7 @@ int kvm_xen_init_vcpu(CPUState *cs)
 
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
+env->xen_vcpu_time_info_gpa = INVALID_GPA;
 
 return 0;
 }
@@ -230,6 +244,17 @@ static void do_set_vcpu_info_gpa(CPUState *cs, run_on_cpu_data data)
   env->xen_vcpu_info_gpa);
 }
 
+static void do_set_vcpu_time_info_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = >env;
+
+env->xen_vcpu_time_info_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
+  env->xen_vcpu_time_info_gpa);
+}
+
 static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -237,8 +262,11 @@ static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
+env->xen_vcpu_time_info_gpa = INVALID_GPA;
 
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
+  INVALID_GPA);
 }
 
 static int xen_set_shared_info(uint64_t gfn)
@@ -452,6 +480,42 @@ static int vcpuop_register_vcpu_info(CPUState *cs, CPUState *target,
 return 0;
 }
 
+static int vcpuop_register_vcpu_time_info(CPUState *cs, CPUState *target,
+  uint64_t arg)
+{
+struct vcpu_register_time_memory_area tma;
+uint64_t gpa;
+size_t len;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(tma) == 8);
+qemu_build_assert(sizeof(struct vcpu_time_info) == 32);
+
+if (!target) {
+return -ENOENT;
+}
+
+if (kvm_copy_from_gva(cs, arg, &tma, sizeof(tma))) {
+return -EFAULT;
+}
+
+/*
+ * Xen actually uses the GVA and does the translation through the guest
+ * page tables each time. But Linux/KVM uses the GPA, on the assumption
+ * that guests only ever use *global* addresses (kernel virtual addresses)
+ * for it. If Linux is changed to redo the GVA→GPA translation each time,
+ * it will offer a new vCPU attribute for that, and we'll use it instead.
+ */
+if (!kvm_gva_to_gpa(cs, tma.addr.p, &gpa, &len, false) ||
+len < sizeof(struct vcpu_time_info)) {
+return -EFAULT;
+}
+
+async_run_on_cpu(target, do_set_vcpu_time_info_gpa,
+ 

[RFC PATCH v5 21/52] i386/xen: handle VCPUOP_register_vcpu_info

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Handle the hypercall to set a per-vCPU vcpu_info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.
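
As a rough illustration of the two addresses tracked per vCPU, here is a
minimal sketch of how the default slot and the precedence rule could be
computed. It is not code from this patch: the helper names are hypothetical,
and it assumes that the vcpu_info array sits at the start of the shared_info
page, that each entry is 64 bytes, and that INVALID_GPA is UINT64_MAX.

```c
#include <stdint.h>

/* Assumptions for illustration only; none of these values come from the patch. */
#define INVALID_GPA       UINT64_MAX  /* assumed sentinel for "not set" */
#define LEGACY_MAX_VCPUS  32          /* the "first 32 vCPUs" mentioned above */
#define VCPU_INFO_SIZE    64          /* assumed sizeof(struct vcpu_info) */

/*
 * Hypothetical helper: GPA of the default vcpu_info slot inside the
 * shared_info page, or INVALID_GPA if this vCPU has no default slot.
 */
static uint64_t default_vcpu_info_gpa(uint64_t shared_info_gpa,
                                      uint32_t vcpu_id)
{
    if (shared_info_gpa == INVALID_GPA || vcpu_id >= LEGACY_MAX_VCPUS) {
        return INVALID_GPA;
    }
    return shared_info_gpa + (uint64_t)vcpu_id * VCPU_INFO_SIZE;
}

/*
 * Hypothetical helper: an explicitly registered vcpu_info wins over the
 * default one, so changing the default has no effect once it is set.
 */
static uint64_t effective_vcpu_info_gpa(uint64_t explicit_gpa,
                                        uint64_t default_gpa)
{
    return explicit_gpa != INVALID_GPA ? explicit_gpa : default_gpa;
}
```

With a shared_info page at 0x1000, vCPU 2 would get a default slot at 0x1080
under these assumptions, and vCPU 32 and above would get none until the guest
registers one explicitly.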

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h|   2 +
 target/i386/kvm/kvm.c|  17 
 target/i386/kvm/trace-events |   1 +
 target/i386/kvm/xen-emu.c| 152 ++-
 target/i386/kvm/xen-emu.h|   2 +
 target/i386/machine.c|  19 +
 6 files changed, 190 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c6c57baed5..109b2e5669 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1788,6 +1788,8 @@ typedef struct CPUArchState {
 #endif
 #if defined(CONFIG_KVM)
 struct kvm_nested_state *nested_state;
+uint64_t xen_vcpu_info_gpa;
+uint64_t xen_vcpu_info_default_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f365e56fcc..52d69e87e7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4734,6 +4734,15 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 kvm_arch_set_tsc_khz(cpu);
 }
 
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE && level == KVM_PUT_FULL_STATE) {
+ret = kvm_put_xen_state(cpu);
+if (ret < 0) {
+return ret;
+}
+}
+#endif
+
 ret = kvm_getput_regs(x86_cpu, 1);
 if (ret < 0) {
 return ret;
@@ -4833,6 +4842,14 @@ int kvm_arch_get_registers(CPUState *cs)
 if (ret < 0) {
 goto out;
 }
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE) {
+ret = kvm_get_xen_state(cs);
+if (ret < 0) {
+goto out;
+}
+}
+#endif
 ret = 0;
  out:
 cpu_sync_bndcs_hflags(&cpu->env);
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 0a47c26e80..14e54dfca5 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -9,3 +9,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t 
a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " 
a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
+kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type 
%d gpa 0x%" PRIx64
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 0e9ae481d8..aa06588c07 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -120,6 +120,8 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 
 int kvm_xen_init_vcpu(CPUState *cs)
 {
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
 int err;
 
 /*
@@ -143,6 +145,9 @@ int kvm_xen_init_vcpu(CPUState *cs)
 }
 }
 
+env->xen_vcpu_info_gpa = INVALID_GPA;
+env->xen_vcpu_info_default_gpa = INVALID_GPA;
+
 return 0;
 }
 
@@ -188,10 +193,58 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+static int kvm_xen_set_vcpu_attr(CPUState *cs, uint16_t type, uint64_t gpa)
+{
+struct kvm_xen_vcpu_attr xhsi;
+
+xhsi.type = type;
+xhsi.u.gpa = gpa;
+
+trace_kvm_xen_set_vcpu_attr(cs->cpu_index, type, gpa);
+
+return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &xhsi);
+}
+
+static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_default_gpa = data.host_ulong;
+
+/* Changing the default does nothing if a vcpu_info was explicitly set. */
+if (env->xen_vcpu_info_gpa == INVALID_GPA) {
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
+  env->xen_vcpu_info_default_gpa);
+}
+}
+
+static void do_set_vcpu_info_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
+  env->xen_vcpu_info_gpa);
+}
+
+static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_gpa = INVALID_GPA;
+env->xen_vcpu_info_default_gpa = INVALID_GPA;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
+}
+
 static int xen_set_shared_info(uint64_t gfn)
 {
 uint64_t gpa = gfn << TARGET_PAGE_BITS;
-int err;
+int i, err;
 
 /*

[RFC PATCH v5 40/52] hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.
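
The assert-then-poll scheme described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
in this patch: struct upcall_gsi and the three functions are hypothetical
stand-ins, and set_gsi_level() takes the place of whatever call drives the
real interrupt line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state tracked for the callback GSI. */
struct upcall_gsi {
    bool asserted;       /* is the level-triggered line currently raised? */
    int level_changes;   /* counts line transitions, for illustration only */
};

/* Stand-in for driving the real interrupt line. */
static void set_gsi_level(struct upcall_gsi *g, bool level)
{
    if (g->asserted != level) {
        g->asserted = level;
        g->level_changes++;
    }
}

/* We delivered an event to vCPU0 ourselves: mark it pending, raise the line. */
static void on_event_delivered(struct upcall_gsi *g,
                               uint8_t *evtchn_upcall_pending)
{
    *evtchn_upcall_pending = 1;
    set_gsi_level(g, true);
}

/*
 * Run on every vmexit of vCPU0. It only ever lowers the line, so a guest
 * that sets evtchn_upcall_pending on its own does not get the GSI asserted.
 */
static void on_vmexit(struct upcall_gsi *g,
                      const uint8_t *evtchn_upcall_pending)
{
    if (g->asserted && !*evtchn_upcall_pending) {
        set_gsi_level(g, false);
    }
}
```

The check in on_vmexit() is a single flag test when the line is not asserted,
which is what keeps per-vmexit polling tolerable until a proper resample hook
on EOI exists.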

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 45 ++
 hw/i386/kvm/xen_evtchn.h  |  4 
 hw/i386/pc.c  |  6 +
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c | 13 +++
 target/i386/kvm/xen-emu.c | 46 ++-
 target/i386/kvm/xen-emu.h |  1 +
 7 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 87136334f5..29f5e60172 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -24,6 +24,8 @@
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
+#include "hw/i386/x86.h"
+#include "hw/irq.h"
 
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
@@ -102,6 +104,7 @@ struct XenEvtchnState {
 QemuMutex port_lock;
 uint32_t nr_ports;
 XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
+qemu_irq gsis[GSI_NUM_PINS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -170,9 +173,29 @@ void xen_evtchn_create(void)
 {
 XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
 -1, NULL));
+int i;
+
 xen_evtchn_singleton = s;
 
 qemu_mutex_init(&s->port_lock);
+
+for (i = 0; i < GSI_NUM_PINS; i++) {
+sysbus_init_irq(SYS_BUS_DEVICE(s), &s->gsis[i]);
+}
+}
+
+void xen_evtchn_connect_gsis(qemu_irq *system_gsis)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int i;
+
+if (!s) {
+return;
+}
+
+for (i = 0; i < GSI_NUM_PINS; i++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(s), i, system_gsis[i]);
+}
 }
 
 static void xen_evtchn_register_types(void)
@@ -182,6 +205,23 @@ static void xen_evtchn_register_types(void)
 
 type_init(xen_evtchn_register_types)
 
+void xen_evtchn_set_callback_level(int level)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+
+if (s) {
+uint32_t param = (uint32_t)s->callback_param;
+
+switch (s->callback_param >> CALLBACK_VIA_TYPE_SHIFT) {
+case HVM_PARAM_CALLBACK_TYPE_GSI:
+if (param < GSI_NUM_PINS) {
+qemu_set_irq(s->gsis[param], level);
+}
+break;
+}
+}
+}
+
 int xen_evtchn_set_callback_param(uint64_t param)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -207,6 +247,11 @@ int xen_evtchn_set_callback_param(uint64_t param)
 }
 break;
 }
+
+case HVM_PARAM_CALLBACK_TYPE_GSI:
+ret = 0;
+break;
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 1de53daa52..86f154e359 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -12,9 +12,13 @@
 #ifndef QEMU_XEN_EVTCHN_H
 #define QEMU_XEN_EVTCHN_H
 
+#include "hw/sysbus.h"
+
 void xen_evtchn_create(void);
 int xen_evtchn_soft_reset(void);
 int xen_evtchn_set_callback_param(uint64_t param);
+void xen_evtchn_connect_gsis(qemu_irq *system_gsis);
+void xen_evtchn_set_callback_level(int level);
 
 void hmp_xen_event_list(Monitor *mon, const QDict *qdict);
 void hmp_xen_event_inject(Monitor *mon, const QDict *qdict);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c1328a779d..1c4941de8f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1303,6 +1303,12 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 }
 *rtc_state = mc146818_rtc_init(isa_bus, 2000, rtc_irq);
 
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE) {
+xen_evtchn_connect_gsis(gsi);
+}
+#endif
+
 qemu_register_boot_set(pc_boot_set, *rtc_state);
 
 if (!xen_enabled() &&
diff 

[RFC PATCH v5 29/52] hw/xen: Implement EVTCHNOP_status

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.
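
For readers skimming the port table layout, here is a small sketch of how the
16-bit type_val can carry both the remote domain and the remote port for
unbound and interdomain ports. The two constants are the ones defined in this
patch; the helper functions are hypothetical and exist only to show the
packing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants as defined in the patch. */
#define PORT_INFO_TYPEVAL_REMOTE_QEMU       0x8000
#define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK  0x7FFF

/* Hypothetical helper: pack "remote end is QEMU" plus the remote port. */
static uint16_t pack_type_val(bool remote_is_qemu, uint16_t remote_port)
{
    uint16_t v = remote_port & PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;

    if (remote_is_qemu) {
        v |= PORT_INFO_TYPEVAL_REMOTE_QEMU;
    }
    return v;
}

/* Hypothetical helper: does the high bit mark QEMU as the remote domain? */
static bool type_val_remote_is_qemu(uint16_t type_val)
{
    return (type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) != 0;
}

/* Hypothetical helper: remote port number, 0 meaning "still unbound". */
static uint16_t type_val_remote_port(uint16_t type_val)
{
    return type_val & PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
}
```

This works because there are only two possible remote domains (self and
QEMU), so one bit is enough and the remaining 15 bits cover the port number.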

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 106 ++
 hw/i386/kvm/xen_evtchn.h  |   3 ++
 target/i386/kvm/xen-emu.c |  20 ++-
 3 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 018f4ef4da..0f1abfb760 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -22,6 +22,7 @@
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
 #include "xen_evtchn.h"
+#include "xen_overlay.h"
 
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_xen.h"
@@ -33,6 +34,24 @@
 #define TYPE_XEN_EVTCHN "xen-evtchn"
 OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
 
+typedef struct XenEvtchnPort {
+uint32_t vcpu;  /* Xen/ACPI vcpu_id */
+uint16_t type;  /* EVTCHNSTAT_ */
+uint16_t type_val;  /* pirq# / virq# / remote port according to type */
+} XenEvtchnPort;
+
+#define COMPAT_EVTCHN_2L_NR_CHANNELS 1024
+
+/*
+ * For unbound/interdomain ports there are only two possible remote
+ * domains; self and QEMU. Use a single high bit in type_val for that,
+ * and the low bits for the remote port number (or 0 for unbound).
+ */
+#define PORT_INFO_TYPEVAL_REMOTE_QEMU   0x8000
+#define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK  0x7FFF
+
+#define DOMID_QEMU  0
+
 struct XenEvtchnState {
 /*< private >*/
 SysBusDevice busdev;
@@ -42,6 +61,8 @@ struct XenEvtchnState {
 bool evtchn_in_kernel;
 
 QemuMutex port_lock;
+uint32_t nr_ports;
+XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -65,6 +86,18 @@ static bool xen_evtchn_is_needed(void *opaque)
 return xen_mode == XEN_EMULATE;
 }
 
+static const VMStateDescription xen_evtchn_port_vmstate = {
+.name = "xen_evtchn_port",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(vcpu, XenEvtchnPort),
+VMSTATE_UINT16(type, XenEvtchnPort),
+VMSTATE_UINT16(type_val, XenEvtchnPort),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription xen_evtchn_vmstate = {
 .name = "xen_evtchn",
 .version_id = 1,
@@ -73,6 +106,9 @@ static const VMStateDescription xen_evtchn_vmstate = {
 .post_load = xen_evtchn_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(callback_param, XenEvtchnState),
+VMSTATE_UINT32(nr_ports, XenEvtchnState),
+VMSTATE_STRUCT_VARRAY_UINT32(port_table, XenEvtchnState, nr_ports, 1,
+ xen_evtchn_port_vmstate, XenEvtchnPort),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -146,3 +182,73 @@ int xen_evtchn_set_callback_param(uint64_t param)
 
 return ret;
 }
+
+static bool valid_port(evtchn_port_t port)
+{
+if (!port) {
+return false;
+}
+
+if (xen_is_long_mode()) {
+return port < EVTCHN_2L_NR_CHANNELS;
+} else {
+return port < COMPAT_EVTCHN_2L_NR_CHANNELS;
+}
+}
+
+int xen_evtchn_status_op(struct evtchn_status *status)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (status->dom != DOMID_SELF && status->dom != xen_domid) {
+return -ESRCH;
+}
+
+if (!valid_port(status->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[status->port];
+
+status->status = p->type;
+status->vcpu = p->vcpu;
+
+switch (p->type) {
+case EVTCHNSTAT_unbound:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+status->u.unbound.dom = DOMID_QEMU;
+} else {
+status->u.unbound.dom = xen_domid;
+}
+break;
+
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+status->u.interdomain.dom = DOMID_QEMU;
+} else {
+status->u.interdomain.dom = xen_domid;
+}
+
+status->u.interdomain.port = p->type_val &
+PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
+break;
+
+case EVTCHNSTAT_pirq:
+status->u.pirq = p->type_val;
+break;
+
+case EVTCHNSTAT_virq:
+status->u.virq = p->type_val;
+break;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+return 0;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index c9b7f9d11f..76467636ee 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -15,4 +15,7 @@
 void xen_evtchn_create(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
+struct evtchn_status;
+int xen_evtchn_status_op(struct evtchn_status *status);
+
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 11a6fe4164..6f3f8ae834 100644
--- a/target/i386/kvm/xen-emu.c
+++ 

[RFC PATCH v5 38/52] hw/xen: Implement EVTCHNOP_reset

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 29 +
 hw/i386/kvm/xen_evtchn.h  |  3 +++
 target/i386/kvm/xen-emu.c | 17 +
 3 files changed, 49 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 1623922dfc..cdf7d86da8 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -740,6 +740,35 @@ static int close_port(XenEvtchnState *s, evtchn_port_t 
port)
 return 0;
 }
 
+int xen_evtchn_soft_reset(void)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int i;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+for (i = 0; i < s->nr_ports; i++) {
+close_port(s, i);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return 0;
+}
+
+int xen_evtchn_reset_op(struct evtchn_reset *reset)
+{
+if (reset->dom != DOMID_SELF && reset->dom != xen_domid) {
+return -ESRCH;
+}
+
+return xen_evtchn_soft_reset();
+}
+
 int xen_evtchn_close_op(struct evtchn_close *close)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 486b031c82..5d3e03553f 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -13,6 +13,7 @@
 #define QEMU_XEN_EVTCHN_H
 
 void xen_evtchn_create(void);
+int xen_evtchn_soft_reset(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
 struct evtchn_status;
@@ -24,6 +25,7 @@ struct evtchn_send;
 struct evtchn_alloc_unbound;
 struct evtchn_bind_interdomain;
 struct evtchn_bind_vcpu;
+struct evtchn_reset;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -33,5 +35,6 @@ int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain 
*interdomain);
 int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
+int xen_evtchn_reset_op(struct evtchn_reset *reset);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 960999198a..e4104b67c3 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -946,6 +946,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 err = xen_evtchn_bind_vcpu_op();
 break;
 }
+case EVTCHNOP_reset: {
+struct evtchn_reset reset;
+
+qemu_build_assert(sizeof(reset) == 2);
+if (kvm_copy_from_gva(cs, arg, &reset, sizeof(reset))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_reset_op(&reset);
+break;
+}
 default:
 return false;
 }
@@ -959,6 +971,11 @@ static int kvm_xen_soft_reset(void)
 CPUState *cpu;
 int err;
 
+err = xen_evtchn_soft_reset();
+if (err) {
+return err;
+}
+
 err = xen_evtchn_set_callback_param(0);
 if (err) {
 return err;
-- 
2.35.3




[RFC PATCH v5 28/52] i386/xen: Add support for Xen event channel delivery to vCPU

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.
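
The delivery choice described above boils down to a small decision. The
sketch below shows it in isolation; the enums and choose_delivery() are
hypothetical names used for illustration and are not part of this patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the callback types discussed in this series. */
enum callback_type {
    CB_GSI,
    CB_PCI_INTX,
    CB_VECTOR,
};

/* Hypothetical result of the decision. */
enum delivery {
    DELIVER_NONE,   /* nothing to do at this stage of the series */
    DELIVER_MSI,    /* per-vCPU local APIC vector, sent as an MSI */
    DELIVER_KICK,   /* kick the vCPU so KVM injects the global vector */
};

/*
 * A non-zero per-vCPU vector always takes priority. Otherwise only the
 * global vector callback is handled here; GSI and PCI_INTX come later.
 */
static enum delivery choose_delivery(uint8_t per_vcpu_vector,
                                     enum callback_type type)
{
    if (per_vcpu_vector) {
        return DELIVER_MSI;
    }
    if (type == CB_VECTOR) {
        return DELIVER_KICK;
    }
    return DELIVER_NONE;
}
```

The kick is enough in the global-vector case because KVM itself delivers the
vector on the next entry if evtchn_upcall_pending is set in the vcpu_info.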

Signed-off-by: David Woodhouse 
---
 include/sysemu/kvm_xen.h  |  2 ++
 target/i386/kvm/xen-emu.c | 68 +++
 2 files changed, 70 insertions(+)

diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 3e43cd7843..ee53294deb 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -17,6 +17,8 @@
 #define INVALID_GFN UINT64_MAX
 
 uint32_t kvm_xen_get_caps(void);
+void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
+void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
 
 #define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
  KVM_XEN_HVM_CONFIG_ ## cap))
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 87fcfc636c..11a6fe4164 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -22,6 +22,8 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 
+#include "hw/pci/msi.h"
+#include "hw/i386/apic-msidef.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
 
@@ -274,6 +276,72 @@ static void do_set_vcpu_info_gpa(CPUState *cs, 
run_on_cpu_data data)
   env->xen_vcpu_info_gpa);
 }
 
+static void *gpa_to_hva(uint64_t gpa)
+{
+MemoryRegionSection mrs;
+
+mrs = memory_region_find(get_system_memory(), gpa, 1);
+return !mrs.mr ? NULL : qemu_map_ram_ptr(mrs.mr->ram_block,
+ mrs.offset_within_region);
+}
+
+void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id)
+{
+CPUState *cs = qemu_get_cpu(vcpu_id);
+CPUX86State *env;
+uint64_t gpa;
+
+if (!cs) {
+return NULL;
+}
+env = &X86_CPU(cs)->env;
+
+gpa = env->xen_vcpu_info_gpa;
+if (gpa == INVALID_GPA) {
+gpa = env->xen_vcpu_info_default_gpa;
+}
+if (gpa == INVALID_GPA) {
+return NULL;
+}
+
+return gpa_to_hva(gpa);
+}
+
+void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type)
+{
+CPUState *cs = qemu_get_cpu(vcpu_id);
+uint8_t vector;
+
+if (!cs) {
+return;
+}
+
+vector = X86_CPU(cs)->env.xen_vcpu_callback_vector;
+if (vector) {
+/*
+ * The per-vCPU callback vector injected via lapic. Just
+ * deliver it as an MSI.
+ */
+MSIMessage msg = {
+.address = APIC_DEFAULT_ADDRESS | X86_CPU(cs)->apic_id,
+.data = vector | (1UL << MSI_DATA_LEVEL_SHIFT),
+};
+kvm_irqchip_send_msi(kvm_state, msg);
+return;
+}
+
+switch (type) {
+case HVM_PARAM_CALLBACK_TYPE_VECTOR:
+/*
+ * If the evtchn_upcall_pending field in the vcpu_info is set, then
+ * KVM will automatically deliver the vector on entering the vCPU
+ * so all we have to do is kick it out.
+ */
+qemu_cpu_kick(cs);
+break;
+}
+}
+
 static void do_set_vcpu_time_info_gpa(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
-- 
2.35.3




[RFC PATCH v5 19/52] i386/xen: implement HYPERVISOR_hvm_op

2022-12-30 Thread David Woodhouse
From: Joao Martins 

This is invoked when the guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 5a969be4cc..6b4d40397c 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -27,6 +27,7 @@
 #include "standard-headers/xen/version.h"
 #include "standard-headers/xen/sched.h"
 #include "standard-headers/xen/memory.h"
+#include "standard-headers/xen/hvm/hvm_op.h"
 
 #include "xen-compat.h"
 
@@ -348,6 +349,19 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+switch (cmd) {
+case HVMOP_pagetable_dying:
+exit->u.hcall.result = -ENOSYS;
+return true;
+
+default:
+return false;
+}
+}
+
 static int kvm_xen_soft_reset(void)
 {
 int err;
@@ -442,6 +456,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct 
kvm_xen_exit *exit)
 case __HYPERVISOR_sched_op:
 return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
   exit->u.hcall.params[1]);
+case __HYPERVISOR_hvm_op:
+return kvm_xen_hcall_hvm_op(exit, cpu, exit->u.hcall.params[0],
+exit->u.hcall.params[1]);
 case __HYPERVISOR_memory_op:
 return kvm_xen_hcall_memory_op(exit, cpu, exit->u.hcall.params[0],
exit->u.hcall.params[1]);
-- 
2.35.3




[RFC PATCH v5 35/52] hw/xen: Implement EVTCHNOP_alloc_unbound

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 32 
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 15 +++
 3 files changed, 49 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 1ce60f7b1d..5b93d52e6b 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -830,6 +830,38 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 return ret;
 }
 
+int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint16_t type_val;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (alloc->dom != DOMID_SELF && alloc->dom != xen_domid) {
+return -ESRCH;
+}
+
+if (alloc->remote_dom == DOMID_QEMU) {
+type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+} else if (alloc->remote_dom == DOMID_SELF ||
+   alloc->remote_dom == xen_domid) {
+type_val = 0;
+} else {
+return -EPERM;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val, &alloc->port);
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 int xen_evtchn_send_op(struct evtchn_send *send)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 500fdbe8b8..fc080138e3 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -21,11 +21,13 @@ struct evtchn_unmask;
 struct evtchn_bind_virq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
+struct evtchn_alloc_unbound;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
+int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index c14891748a..ba3b32b67b 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -903,6 +903,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 err = xen_evtchn_send_op();
 break;
 }
+case EVTCHNOP_alloc_unbound: {
+struct evtchn_alloc_unbound alloc;
+
+qemu_build_assert(sizeof(alloc) == 8);
+if (kvm_copy_from_gva(cs, arg, &alloc, sizeof(alloc))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_alloc_unbound_op(&alloc);
+if (!err && kvm_copy_to_gva(cs, arg, &alloc, sizeof(alloc))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v5 09/52] hw/xen_backend: refactor xen_be_init()

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/xen/xen-legacy-backend.c | 40 +
 include/hw/xen/xen-legacy-backend.h |  3 +++
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c
index 085fd31ef7..694e7bbc54 100644
--- a/hw/xen/xen-legacy-backend.c
+++ b/hw/xen/xen-legacy-backend.c
@@ -676,17 +676,40 @@ void xenstore_update_fe(char *watch, struct 
XenLegacyDevice *xendev)
 }
 /*  */
 
-int xen_be_init(void)
+int xen_be_xenstore_open(void)
 {
-xengnttab_handle *gnttabdev;
-
 xenstore = xs_daemon_open();
 if (!xenstore) {
-xen_pv_printf(NULL, 0, "can't connect to xenstored\n");
 return -1;
 }
 
 qemu_set_fd_handler(xs_fileno(xenstore), xenstore_update, NULL, NULL);
+return 0;
+}
+
+void xen_be_xenstore_close(void)
+{
+qemu_set_fd_handler(xs_fileno(xenstore), NULL, NULL, NULL);
+xs_daemon_close(xenstore);
+xenstore = NULL;
+}
+
+void xen_be_sysdev_init(void)
+{
+xen_sysdev = qdev_new(TYPE_XENSYSDEV);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(xen_sysdev), &error_fatal);
+xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus");
+qbus_set_bus_hotplug_handler(xen_sysbus);
+}
+
+int xen_be_init(void)
+{
+xengnttab_handle *gnttabdev;
+
+if (xen_be_xenstore_open()) {
+xen_pv_printf(NULL, 0, "can't connect to xenstored\n");
+return -1;
+}
 
 if (xen_xc == NULL || xen_fmem == NULL) {
 /* Check if xen_init() have been called */
@@ -701,17 +724,12 @@ int xen_be_init(void)
 xengnttab_close(gnttabdev);
 }
 
-xen_sysdev = qdev_new(TYPE_XENSYSDEV);
-sysbus_realize_and_unref(SYS_BUS_DEVICE(xen_sysdev), &error_fatal);
-xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus");
-qbus_set_bus_hotplug_handler(xen_sysbus);
+xen_be_sysdev_init();
 
 return 0;
 
 err:
-qemu_set_fd_handler(xs_fileno(xenstore), NULL, NULL, NULL);
-xs_daemon_close(xenstore);
-xenstore = NULL;
+xen_be_xenstore_close();
 
 return -1;
 }
diff --git a/include/hw/xen/xen-legacy-backend.h 
b/include/hw/xen/xen-legacy-backend.h
index be281e1f38..0aa171f6c2 100644
--- a/include/hw/xen/xen-legacy-backend.h
+++ b/include/hw/xen/xen-legacy-backend.h
@@ -42,6 +42,9 @@ int xenstore_read_fe_uint64(struct XenLegacyDevice *xendev, 
const char *node,
 void xen_be_check_state(struct XenLegacyDevice *xendev);
 
 /* xen backend driver bits */
+int xen_be_xenstore_open(void);
+void xen_be_xenstore_close(void);
+void xen_be_sysdev_init(void);
 int xen_be_init(void);
 void xen_be_register_common(void);
 int xen_be_register(const char *type, struct XenDevOps *ops);
-- 
2.35.3




[RFC PATCH v5 04/52] i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 accel/kvm/kvm-all.c |  1 +
 include/sysemu/kvm_int.h|  2 ++
 include/sysemu/kvm_xen.h| 20 +
 target/i386/kvm/kvm.c   | 59 +
 target/i386/kvm/meson.build |  2 ++
 target/i386/kvm/xen-emu.c   | 58 
 target/i386/kvm/xen-emu.h   | 19 
 7 files changed, 161 insertions(+)
 create mode 100644 include/sysemu/kvm_xen.h
 create mode 100644 target/i386/kvm/xen-emu.c
 create mode 100644 target/i386/kvm/xen-emu.h

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f99b0becd8..568bb09c09 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3620,6 +3620,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->kvm_dirty_ring_size = 0;
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
+s->xen_version = 0;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 3b4adcdc10..9a8c062609 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -110,6 +110,8 @@ struct KVMState
 struct KVMDirtyRingReaper reaper;
 NotifyVmexitOption notify_vmexit;
 uint32_t notify_window;
+uint32_t xen_version;
+uint32_t xen_caps;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
new file mode 100644
index 00..296533f2d5
--- /dev/null
+++ b/include/sysemu/kvm_xen.h
@@ -0,0 +1,20 @@
+/*
+ * Xen HVM emulation support in KVM
+ *
+ * Copyright © 2019 Oracle and/or its affiliates. All rights reserved.
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_SYSEMU_KVM_XEN_H
+#define QEMU_SYSEMU_KVM_XEN_H
+
+uint32_t kvm_xen_get_caps(void);
+
+#define kvm_xen_has_cap(cap) (!!(kvm_xen_get_caps() &   \
+ KVM_XEN_HVM_CONFIG_ ## cap))
+
+#endif /* QEMU_SYSEMU_KVM_XEN_H */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a213209379..b097de0524 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -31,6 +31,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -41,6 +42,7 @@
 #include "qemu/error-report.h"
 #include "qemu/memalign.h"
 #include "hw/i386/x86.h"
+#include "hw/i386/pc.h"
 #include "hw/i386/apic.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
@@ -48,6 +50,8 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/e820_memory_layout.h"
 
+#include "hw/xen/xen.h"
+
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
@@ -2513,6 +2517,22 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+if (s->xen_version) {
+#ifdef CONFIG_XEN_EMU
+if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
+error_report("kvm: Xen support only available in PC machine");
+return -ENOTSUP;
+}
+ret = kvm_xen_init(s);
+if (ret < 0) {
+return ret;
+}
+#else
+error_report("kvm: Xen support not enabled in qemu");
+return -ENOTSUP;
+#endif
+}
+
 ret = kvm_get_supported_msrs(s);
 if (ret < 0) {
 return ret;
@@ -5706,6 +5726,36 @@ static void kvm_arch_set_notify_window(Object *obj, 
Visitor *v,
 s->notify_window = value;
 }
 
+static void kvm_arch_get_xen_version(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint32_t value = s->xen_version;
+
+visit_type_uint32(v, name, &value, errp);
+}
+
+static void kvm_arch_set_xen_version(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint32_t value;
+
+visit_type_uint32(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->xen_version = value;
+if (value && xen_mode == XEN_DISABLED) {
+xen_mode = XEN_EMULATE;
+}
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@ 

[RFC PATCH v5 00/52] Xen support under KVM

2022-12-30 Thread David Woodhouse
Round 5, in which it gains a XenStore implementation. This just returns
ENOSYS to every request for now, but that's enough to let older Linux
guests boot, and let the XTF tests run.

As noted, I'd like to hook that up to a real xenstored via the UNIX
socket and an XS_SU command to let that connection be seen as the client
domain. If I can just get xenstored to be a little less crashy when run
without actual Xen, that is... 

Having even this much XenStore allows the GSI and PCI_INTX event 
delivery to be tested properly, and it's working OK with userspace 
IOAPIC and the hack to poll for deassertion on vmexit. I do still want 
to clean that (and VFIO) up with a hook on EOI in the PIC/IOAPIC instead 
of the excessive polling, but that's a pre-existing problem and I can 
live with what we have for now.

The in-kernel PIC/IOAPIC *do* have the right notifiers for acked GSIs
but they're deprecated, so I don't think I want to add code to the Xen
support in the kernel to support them; I think I'll just declare that
Xen support requires split-irqchip mode.

XenStore also means I can do some more kexec testing (since I'm lazy and
the kernels *inside* the disk images I'm using are old enough to need
it). So there are some fixes for SHUTDOWN_soft_reset in there too.

I think this is mostly ready to merge as it is, as an implementation of
the basic platform support.

Next steps: with an actual xenstore hooked up, I can work on a set of 
gnttab backend operations like the evtchn ones, then select the right 
set of ops according to whether we're running on true Xen or under KVM. 
The existing back end drivers ought to work after that without much 
further effort (except perhaps for migration).

v5: 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-kvm-5
 • Add backend implementation of event channel support, to parallel the
   libxenevtchn API used by existing backend drivers.

 • Add basic XenStore ring implementation, test migration and kexec.

 • Some kexec/soft reset fixes (clear port pending bits, kernel timer virq).
 
 • Fix race with setting the xen_callback_asserted flag before actually
   doing so, which could lead to it being *cleared* again before we even
   assert it... and leave it asserted for ever.

v4: 
https://lore.kernel.org/qemu-devel/20221221010623.1000191-1-dw...@infradead.org/
 • Add soft reset support near the beginning and thread it through the
   rest of the feature enablement.

 • Add PV timer support and advertise XENFEAT_safe_hvm_pvclock.

 • Add basic grant table mapping and [gs]et_version / query_size support.

 • Make xen_platform device build (and work) without CONFIG_XEN.

 • Fix Xen HVM mode not to require --xen-attach.

v3: 
https://lore.kernel.org/qemu-devel/20221216004117.862106-1-dw...@infradead.org/

 • Switch back to xen-version as KVM accelerator property, other review
   feedback and bug fixes.

 • Fix Hyper-V coexistence (ick, calling kvm_xen_init() again because
   hyperv_enabled() doesn't return the right answer the first time).

 • Implement event channel support, including GSI/PCI_INTX callback.

 • Implement 32-bit guest support.

v2: 
https://lore.kernel.org/qemu-devel/20221209095612.689243-1-dw...@infradead.org/

 • Attempt to implement migration support; every Xen enlightenment is
   now recorded either from vmstate_x86_cpu or from a new sysdev device
   created for that purpose. And — I believe — correctly restored, in
   the right order, on vmload.

 • The shared_info page is created as a proper overlay instead of abusing
   the underlying guest page. This is important because Windows doesn't
   even select a GPA which had RAM behind it beforehand. This will be
   extended to handle the grant frames too, in the fullness of time.

 • Set vCPU attributes from the correct vCPU thread to avoid deadlocks.

 • Carefully copy the entire hypercall argument structure from userspace
   instead of assuming that it's contiguous in HVA space.

 • Distinguish between "handled but intentionally returns -ENOSYS" and
   "no idea what that was" in hypercalls, allowing us to emit a
   GUEST_ERROR (actually, shouldn't that change to UNIMP?) on the
   latter. Experience shows that we'll end up having to intentionally
   return -ENOSYS to a bunch of weird crap that ancient guests still
   attempt to use, including XenServer local hacks that nobody even
   remembers what they were (hvmop 0x101, anyone? Some old Windows
   PV driver appears to be trying to use it...).

 • Drop the '+xen' CPU property and present Xen CPUID instead of KVM
   unconditionally when running in Xen mode. Make the Xen CPUID coexist
   with Hyper-V CPUID as it should, though.

 • Add XEN_EMU and XENFV_MACHINE (the latter to be XEN_EMU||XEN) config
   options. Some more work on this, and the incestuous relationships
   between the KVM target code and the 'platform' code, is going to be
   required but it's probably better to get on with implementing the
   real code so we can see those 

[RFC PATCH v5 23/52] i386/xen: handle VCPUOP_register_runstate_memory_area

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Allow the guest to set up the vCPU runstate area, which is used as a
steal clock.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h |  1 +
 target/i386/kvm/xen-emu.c | 57 +++
 target/i386/machine.c |  1 +
 3 files changed, 59 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 96c2d0d5cb..bf44a87ddb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1791,6 +1791,7 @@ typedef struct CPUArchState {
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
+uint64_t xen_vcpu_runstate_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index ebb5d1296a..6521df0461 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -161,6 +161,7 @@ int kvm_xen_init_vcpu(CPUState *cs)
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
+env->xen_vcpu_runstate_gpa = INVALID_GPA;
 
 return 0;
 }
@@ -255,6 +256,17 @@ static void do_set_vcpu_time_info_gpa(CPUState *cs, run_on_cpu_data data)
   env->xen_vcpu_time_info_gpa);
 }
 
+static void do_set_vcpu_runstate_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_runstate_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+  env->xen_vcpu_runstate_gpa);
+}
+
 static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -263,10 +275,14 @@ static void do_vcpu_soft_reset(CPUState *cs, run_on_cpu_data data)
 env->xen_vcpu_info_gpa = INVALID_GPA;
 env->xen_vcpu_info_default_gpa = INVALID_GPA;
 env->xen_vcpu_time_info_gpa = INVALID_GPA;
+env->xen_vcpu_runstate_gpa = INVALID_GPA;
 
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, INVALID_GPA);
 kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
   INVALID_GPA);
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+  INVALID_GPA);
+
 }
 
 static int xen_set_shared_info(uint64_t gfn)
@@ -516,6 +532,35 @@ static int vcpuop_register_vcpu_time_info(CPUState *cs, CPUState *target,
 return 0;
 }
 
+static int vcpuop_register_runstate_info(CPUState *cs, CPUState *target,
+ uint64_t arg)
+{
+struct vcpu_register_runstate_memory_area rma;
+uint64_t gpa;
+size_t len;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(rma) == 8);
+/* The runstate area actually does change size, but Linux copes. */
+
+if (!target) {
+return -ENOENT;
+}
+
+if (kvm_copy_from_gva(cs, arg, &rma, sizeof(rma))) {
+return -EFAULT;
+}
+
+/* As with vcpu_time_info, Xen actually uses the GVA but KVM doesn't. */
+if (!kvm_gva_to_gpa(cs, rma.addr.p, &gpa, &len, false)) {
+return -EFAULT;
+}
+
+async_run_on_cpu(target, do_set_vcpu_runstate_gpa,
+ RUN_ON_CPU_HOST_ULONG(gpa));
+return 0;
+}
+
 static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
   int cmd, int vcpu_id, uint64_t arg)
 {
@@ -524,6 +569,9 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 int err;
 
 switch (cmd) {
+case VCPUOP_register_runstate_memory_area:
+err = vcpuop_register_runstate_info(cs, dest, arg);
+break;
 case VCPUOP_register_vcpu_time_memory_area:
 err = vcpuop_register_vcpu_time_info(cs, dest, arg);
 break;
@@ -722,6 +770,15 @@ int kvm_put_xen_state(CPUState *cs)
 }
 }
 
+gpa = env->xen_vcpu_runstate_gpa;
+if (gpa != INVALID_GPA) {
+ret = kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+gpa);
+if (ret < 0) {
+return ret;
+}
+}
+
 return 0;
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index eb657907ca..3f3d436aaa 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1273,6 +1273,7 @@ static const VMStateDescription vmstate_xen_vcpu = {
 VMSTATE_UINT64(env.xen_vcpu_info_gpa, X86CPU),
 VMSTATE_UINT64(env.xen_vcpu_info_default_gpa, X86CPU),
 VMSTATE_UINT64(env.xen_vcpu_time_info_gpa, X86CPU),
+VMSTATE_UINT64(env.xen_vcpu_runstate_gpa, X86CPU),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.35.3
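
For readers skimming the diff, the convention the patch follows for the
runstate address can be sketched in isolation: a sentinel marks "not
registered", and state is pushed back to the kernel only when the guest
actually registered an area. This is an illustrative sketch only, not code
from the patch. The names demo_vcpu, DEMO_INVALID_GPA, demo_put_state and
demo_record_attr are hypothetical stand-ins for CPUArchState, INVALID_GPA,
kvm_put_xen_state() and kvm_xen_set_vcpu_attr(), and the sketch leaves out
the kernel call that the real soft reset also makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "the guest has not registered an area". */
#define DEMO_INVALID_GPA UINT64_MAX

/* Hypothetical per-vCPU state; stands in for the new CPUArchState field. */
struct demo_vcpu {
    uint64_t runstate_gpa;
};

/* Callback type standing in for the call that hands an address to the kernel. */
typedef int (*demo_set_attr_fn)(uint64_t gpa);

/* Test double: remembers the last address pushed and how many pushes happened. */
static uint64_t demo_last_gpa = DEMO_INVALID_GPA;
static int demo_push_count;

int demo_record_attr(uint64_t gpa)
{
    demo_last_gpa = gpa;
    demo_push_count++;
    return 0;
}

/* Forget the registration, as a soft reset would (kernel call omitted). */
void demo_soft_reset(struct demo_vcpu *v)
{
    v->runstate_gpa = DEMO_INVALID_GPA;
}

/* Push the address back only if the guest registered one. */
int demo_put_state(const struct demo_vcpu *v, demo_set_attr_fn set_attr)
{
    assert(v != NULL && set_attr != NULL);
    if (v->runstate_gpa == DEMO_INVALID_GPA) {
        return 0;
    }
    return set_attr(v->runstate_gpa);
}
```

Keeping the "nothing registered" case as a no-op means a guest that never
issued the hypercall costs nothing extra on restore.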




[RFC PATCH v5 06/52] i386/hvm: Set Xen vCPU ID in KVM

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 target/i386/kvm/kvm.c |  5 +
 target/i386/kvm/xen-emu.c | 28 
 target/i386/kvm/xen-emu.h |  1 +
 3 files changed, 34 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 893e350a10..be5e50b4fc 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1868,6 +1868,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
+r = kvm_xen_init_vcpu(cs);
+if (r) {
+return r;
+}
+
 kvm_base += 0x100;
 #else /* CONFIG_XEN_EMU */
 /* This should never happen as kvm_arch_init() would have died first. */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 34d5bc1bc9..4883b95d9d 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -52,6 +52,34 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 return 0;
 }
 
+int kvm_xen_init_vcpu(CPUState *cs)
+{
+int err;
+
+/*
+ * The kernel needs to know the Xen/ACPI vCPU ID because that's
+ * what the guest uses in hypercalls such as timers. It doesn't
+ * match the APIC ID which is generally used for talking to the
+ * kernel about vCPUs. And if vCPU threads race with creating
+ * their KVM vCPUs out of order, it doesn't necessarily match
+ * with the kernel's internal vCPU indices either.
+ */
+if (kvm_xen_has_cap(EVTCHN_SEND)) {
+struct kvm_xen_vcpu_attr va = {
+.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID,
+.u.vcpu_id = cs->cpu_index,
+};
+err = kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &va);
+if (err) {
+error_report("kvm: Failed to set Xen vCPU ID attribute: %s",
+ strerror(-err));
+return err;
+}
+}
+
+return 0;
+}
+
 uint32_t kvm_xen_get_caps(void)
 {
 return kvm_state->xen_caps;
diff --git a/target/i386/kvm/xen-emu.h b/target/i386/kvm/xen-emu.h
index 2101df0182..d62f1d8ed8 100644
--- a/target/i386/kvm/xen-emu.h
+++ b/target/i386/kvm/xen-emu.h
@@ -24,5 +24,6 @@
 #define XEN_VERSION(maj, min) ((maj) << 16 | (min))
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr);
+int kvm_xen_init_vcpu(CPUState *cs);
 
 #endif /* QEMU_I386_KVM_XEN_EMU_H */
-- 
2.35.3
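
To make the three number spaces from the commit message concrete, here is a
small sketch in which one vCPU carries all three IDs and a lookup is done by
the Xen/ACPI number, which is the one a guest passes in timer hypercalls.
Everything here (demo_vcpu, demo_find_by_xen_id, the field names) is
hypothetical and exists only for illustration; it is not how QEMU or KVM
store this information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one vCPU; none of these names come from QEMU or KVM. */
struct demo_vcpu {
    int kvm_index;      /* order in which the kernel created the vCPU */
    uint32_t apic_id;   /* what MSIs and most KVM Xen APIs refer to */
    int xen_vcpu_id;    /* ACPI/MADT/Xen number (cs->cpu_index in the patch) */
};

/* Find the vCPU a guest means when it passes a Xen vCPU ID in a hypercall. */
const struct demo_vcpu *demo_find_by_xen_id(const struct demo_vcpu *v,
                                            size_t n, int xen_id)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (v[i].xen_vcpu_id == xen_id) {
            return &v[i];
        }
    }
    return NULL;
}
```

If the vCPU threads race at creation time, the array order (the KVM index)
need not match either of the other two IDs, which is why the kernel has to
be told the Xen vCPU ID explicitly.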




[RFC PATCH v5 12/52] i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Allow the guest to shut itself down via hypercall with any of these 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing the SCHEDOP_shutdown sub-op lets us handle crashes gracefully;
if it remains unimplemented, a guest crash leads to a triple fault instead.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins 
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 1dea6feb90..5f2b55ef10 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -17,8 +17,10 @@
 #include "exec/address-spaces.h"
 #include "xen-emu.h"
 #include "trace.h"
+#include "sysemu/runstate.h"
 
 #include "standard-headers/xen/version.h"
+#include "standard-headers/xen/sched.h"
 
 static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
   bool is_write)
@@ -170,6 +172,69 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static int kvm_xen_soft_reset(void)
+{
+/* Nothing to reset... yet. */
+return 0;
+}
+
+static int schedop_shutdown(CPUState *cs, uint64_t arg)
+{
+struct sched_shutdown shutdown;
+int ret = 0;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(shutdown) == 4);
+
+if (kvm_copy_from_gva(cs, arg, &shutdown, sizeof(shutdown))) {
+return -EFAULT;
+}
+
+switch (shutdown.reason) {
+case SHUTDOWN_crash:
+cpu_dump_state(cs, stderr, CPU_DUMP_CODE);
+qemu_system_guest_panicked(NULL);
+break;
+
+case SHUTDOWN_reboot:
+qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+break;
+
+case SHUTDOWN_poweroff:
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+break;
+
+case SHUTDOWN_soft_reset:
+ret = kvm_xen_soft_reset();
+break;
+
+default:
+ret = -EINVAL;
+break;
+}
+
+return ret;
+}
+
+static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+   int cmd, uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+int err = -ENOSYS;
+
+switch (cmd) {
+case SCHEDOP_shutdown:
+err = schedop_shutdown(cs, arg);
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -180,6 +245,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_sched_op:
+return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
+  exit->u.hcall.params[1]);
 case __HYPERVISOR_xen_version:
 return kvm_xen_hcall_xen_version(exit, cpu, exit->u.hcall.params[0],
  exit->u.hcall.params[1]);
-- 
2.35.3
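
The dispatch in schedop_shutdown() can be read as a plain mapping from a
shutdown reason to an action, with -EINVAL for anything unrecognised. The
sketch below models only that mapping, plus the compile-time size check that
justifies skipping 32/64-bit compat handling. It is an assumption-laden
illustration: the demo_* enums and struct are hypothetical, and the enum
values are not the real SHUTDOWN_* constants from Xen's public headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical reason codes; the real values live in Xen's public sched.h. */
enum demo_reason {
    DEMO_REASON_POWEROFF,
    DEMO_REASON_REBOOT,
    DEMO_REASON_CRASH,
    DEMO_REASON_SOFT_RESET,
    DEMO_REASON_UNKNOWN,
};

/* Hypothetical actions standing in for the QEMU calls made by the patch. */
enum demo_action {
    DEMO_ACTION_SHUTDOWN,
    DEMO_ACTION_RESET,
    DEMO_ACTION_PANIC,
    DEMO_ACTION_SOFT_RESET,
};

/* Hypothetical mirror of the hypercall argument: a single 32-bit field. */
struct demo_sched_shutdown {
    uint32_t reason;
};

/* Same layout for 32-bit and 64-bit guests, so no compat translation. */
static_assert(sizeof(struct demo_sched_shutdown) == 4,
              "argument must be 4 bytes");

/* Return the action for a reason, or -EINVAL if the reason is unknown. */
int demo_shutdown_action(const struct demo_sched_shutdown *arg)
{
    switch (arg->reason) {
    case DEMO_REASON_CRASH:
        return DEMO_ACTION_PANIC;
    case DEMO_REASON_REBOOT:
        return DEMO_ACTION_RESET;
    case DEMO_REASON_POWEROFF:
        return DEMO_ACTION_SHUTDOWN;
    case DEMO_REASON_SOFT_RESET:
        return DEMO_ACTION_SOFT_RESET;
    default:
        return -EINVAL;
    }
}
```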




[RFC PATCH v5 18/52] i386/xen: implement XENMEM_add_to_physmap_batch

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-compat.h | 24 +
 target/i386/kvm/xen-emu.c| 69 
 2 files changed, 93 insertions(+)

diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
index 0b7088662a..ff5d20e901 100644
--- a/target/i386/kvm/xen-compat.h
+++ b/target/i386/kvm/xen-compat.h
@@ -15,6 +15,20 @@
 
 typedef uint32_t compat_pfn_t;
 typedef uint32_t compat_ulong_t;
+typedef uint32_t compat_ptr_t;
+
+#define __DEFINE_COMPAT_HANDLE(name, type)  \
+typedef struct {\
+compat_ptr_t c; \
+type *_[0] __attribute__((packed));   \
+} __compat_handle_ ## name; \
+
+#define DEFINE_COMPAT_HANDLE(name) __DEFINE_COMPAT_HANDLE(name, name)
+#define COMPAT_HANDLE(name) __compat_handle_ ## name
+
+DEFINE_COMPAT_HANDLE(compat_pfn_t);
+DEFINE_COMPAT_HANDLE(compat_ulong_t);
+DEFINE_COMPAT_HANDLE(int);
 
 struct compat_xen_add_to_physmap {
 domid_t domid;
@@ -24,4 +38,14 @@ struct compat_xen_add_to_physmap {
 compat_pfn_t gpfn;
 };
 
+struct compat_xen_add_to_physmap_batch {
+domid_t domid;
+uint16_t space;
+uint16_t size;
+uint16_t extra;
+COMPAT_HANDLE(compat_ulong_t) idxs;
+COMPAT_HANDLE(compat_pfn_t) gpfns;
+COMPAT_HANDLE(int) errs;
+};
+
 #endif /* QEMU_I386_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index c8543d2153..5a969be4cc 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -261,6 +261,71 @@ static int do_add_to_physmap(struct kvm_xen_exit *exit, X86CPU *cpu,
 return add_to_physmap_one(xatp.space, xatp.idx, xatp.gpfn);
 }
 
+static int do_add_to_physmap_batch(struct kvm_xen_exit *exit, X86CPU *cpu,
+   uint64_t arg)
+{
+struct xen_add_to_physmap_batch xatpb;
+unsigned long idxs_gva, gpfns_gva, errs_gva;
+CPUState *cs = CPU(cpu);
+size_t op_sz;
+
+if (hypercall_compat32(exit->u.hcall.longmode)) {
+struct compat_xen_add_to_physmap_batch xatpb32;
+
+qemu_build_assert(sizeof(struct compat_xen_add_to_physmap_batch) == 20);
+if (kvm_copy_from_gva(cs, arg, &xatpb32, sizeof(xatpb32))) {
+return -EFAULT;
+}
+xatpb.domid = xatpb32.domid;
+xatpb.space = xatpb32.space;
+xatpb.size = xatpb32.size;
+
+idxs_gva = xatpb32.idxs.c;
+gpfns_gva = xatpb32.gpfns.c;
+errs_gva = xatpb32.errs.c;
+op_sz = sizeof(uint32_t);
+} else {
+if (kvm_copy_from_gva(cs, arg, &xatpb, sizeof(xatpb))) {
+return -EFAULT;
+}
+op_sz = sizeof(unsigned long);
+idxs_gva = (unsigned long)xatpb.idxs.p;
+gpfns_gva = (unsigned long)xatpb.gpfns.p;
+errs_gva = (unsigned long)xatpb.errs.p;
+}
+
+if (xatpb.domid != DOMID_SELF && xatpb.domid != xen_domid) {
+return -ESRCH;
+}
+
+/* Explicitly invalid for the batch op. Not that we implement it anyway. */
+if (xatpb.space == XENMAPSPACE_gmfn_range) {
+return -EINVAL;
+}
+
+while (xatpb.size--) {
+unsigned long idx = 0;
+unsigned long gpfn = 0;
+int err;
+
+/* For 32-bit compat this only copies the low 32 bits of each */
+if (kvm_copy_from_gva(cs, idxs_gva, &idx, op_sz) ||
+kvm_copy_from_gva(cs, gpfns_gva, &gpfn, op_sz)) {
+return -EFAULT;
+}
+idxs_gva += op_sz;
+gpfns_gva += op_sz;
+
+err = add_to_physmap_one(xatpb.space, idx, gpfn);
+
+if (kvm_copy_to_gva(cs, errs_gva, &err, sizeof(err))) {
+return -EFAULT;
+}
+errs_gva += sizeof(err);
+}
+return 0;
+}
+
 static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
int cmd, uint64_t arg)
 {
@@ -271,6 +336,10 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 err = do_add_to_physmap(exit, cpu, arg);
 break;
 
+case XENMEM_add_to_physmap_batch:
+err = do_add_to_physmap_batch(exit, cpu, arg);
+break;
+
 default:
 return false;
 }
-- 
2.35.3
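
The batch loop above relies on one trick: idx and gpfn start at zero and only
op_sz bytes are copied into them, so a 32-bit guest's 4-byte elements end up
zero-extended. A minimal sketch of that element read follows, under the
assumption that guest memory is just a byte buffer. demo_read_elem is a
hypothetical helper, not something from the patch, and it assembles the value
byte by byte so that it reads little-endian guest data on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read element i of a guest array whose elements are op_sz bytes wide
 * (4 for a 32-bit guest, 8 for a 64-bit guest), zero-extending the
 * result to 64 bits. The guest data is treated as little-endian.
 */
uint64_t demo_read_elem(const uint8_t *array, size_t i, size_t op_sz)
{
    const uint8_t *p;
    uint64_t val = 0;

    assert(array != NULL);
    assert(op_sz == 4 || op_sz == 8);

    p = array + i * op_sz;
    for (size_t b = 0; b < op_sz; b++) {
        val |= (uint64_t)p[b] << (8 * b);
    }
    return val;
}
```

With the same 8-byte buffer, a 32-bit guest sees two elements and a 64-bit
guest sees one, which is why the loop advances idxs_gva and gpfns_gva by
op_sz rather than by sizeof(unsigned long).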




[RFC PATCH v5 52/52] hw/xen: Add basic ring handling to xenstore

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Extract requests, return ENOSYS to all of them. This is enough to allow
older Linux guests to boot, as they need *something* back but it doesn't
matter much what.

In the first instance we're likely to wire this up over a UNIX socket to
an actual xenstored implementation, but in the fullness of time it would
be nice to have a fully single-tenant "virtual" xenstore within qemu.
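
As background for the ring handling this patch adds, the index arithmetic can
be sketched on its own. The producer and consumer indices are free-running
and only masked when used as offsets, so "prod - cons" gives the number of
bytes available even across wraparound. This is a simplified sketch under
stated assumptions: demo_ring, DEMO_RING_SIZE and demo_ring_read are
hypothetical, the ring is tiny, and the memory barriers and atomic accesses
that a ring shared with a guest needs are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ring size; must be a power of two for the mask to work. */
#define DEMO_RING_SIZE 8u
#define DEMO_MASK(idx) ((idx) & (DEMO_RING_SIZE - 1))

struct demo_ring {
    uint8_t buf[DEMO_RING_SIZE];
    uint32_t prod;  /* free-running producer index */
    uint32_t cons;  /* free-running consumer index */
};

/* Copy up to len bytes out of the ring; returns how many were copied. */
unsigned int demo_ring_read(struct demo_ring *r, uint8_t *dst,
                            unsigned int len)
{
    unsigned int copied = 0;

    assert(r != NULL && dst != NULL);

    while (len) {
        uint32_t avail = r->prod - r->cons;  /* unsigned wrap is intended */
        uint32_t offset = DEMO_MASK(r->cons);
        uint32_t chunk = avail;

        if (avail == 0 || avail > DEMO_RING_SIZE) {
            break;  /* empty, or indices are inconsistent */
        }
        if (chunk > len) {
            chunk = len;
        }
        if (chunk > DEMO_RING_SIZE - offset) {
            chunk = DEMO_RING_SIZE - offset;  /* stop at the end of buf */
        }

        memcpy(dst, &r->buf[offset], chunk);
        dst += chunk;
        len -= chunk;
        copied += chunk;
        r->cons += chunk;
    }
    return copied;
}
```

A read that straddles the end of the buffer simply takes two trips round the
loop, one for each contiguous piece.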

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_xenstore.c | 227 -
 1 file changed, 224 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 63530059fa..219291866f 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -188,18 +188,239 @@ uint16_t xen_xenstore_get_port(void)
 return s->guest_port;
 }
 
+static bool req_pending(XenXenstoreState *s)
+{
+struct xsd_sockmsg *req = (struct xsd_sockmsg *)s->req_data;
+
+return s->req_offset == XENSTORE_HEADER_SIZE + req->len;
+}
+
+static void reset_req(XenXenstoreState *s)
+{
+memset(s->req_data, 0, sizeof(s->req_data));
+s->req_offset = 0;
+}
+
+static void reset_rsp(XenXenstoreState *s)
+{
+s->rsp_pending = false;
+
+memset(s->rsp_data, 0, sizeof(s->rsp_data));
+s->rsp_offset = 0;
+}
+
+static void process_req(XenXenstoreState *s)
+{
+struct xsd_sockmsg *req = (struct xsd_sockmsg *)s->req_data;
+struct xsd_sockmsg *rsp = (struct xsd_sockmsg *)s->rsp_data;
+const char enosys[] = "ENOSYS";
+
+assert(req_pending(s));
+   assert(!s->rsp_pending);
+
+qemu_hexdump(stdout, "Xenstore req:", req, sizeof(req) + req->len);
+
+rsp->type = XS_ERROR;
+rsp->req_id = req->req_id;
+rsp->tx_id = req->tx_id;
+rsp->len = sizeof(enosys);
+memcpy((void *)&rsp[1], enosys, sizeof(enosys));
+
+qemu_hexdump(stdout, "XenStore rsp:", rsp, sizeof(rsp) + rsp->len);
+
+s->rsp_pending = true;
+reset_req(s);
+}
+
+static unsigned int copy_from_ring(XenXenstoreState *s, uint8_t *ptr, unsigned int len)
+{
+if (!len)
+return 0;
+
+XENSTORE_RING_IDX prod = qatomic_read(&s->xs->req_prod);
+XENSTORE_RING_IDX cons = qatomic_read(&s->xs->req_cons);
+unsigned int copied = 0;
+
+smp_mb();
+
+while (len) {
+unsigned int avail = prod - cons;
+unsigned int offset = MASK_XENSTORE_IDX(cons);
+unsigned int copylen = avail;
+
+if (avail > XENSTORE_RING_SIZE) {
+error_report("XenStore ring handling error");
+s->fatal_error = true;
+break;
+} else if (avail == 0)
+break;
+
+if (copylen > len) {
+copylen = len;
+}
+if (copylen > XENSTORE_RING_SIZE - offset) {
+copylen = XENSTORE_RING_SIZE - offset;
+}
+
+memcpy(ptr, &s->xs->req[offset], copylen);
+copied += copylen;
+
+ptr += copylen;
+len -= copylen;
+
+cons += copylen;
+}
+
+smp_mb();
+
+qatomic_set(&s->xs->req_cons, cons);
+
+return copied;
+}
+
+static unsigned int copy_to_ring(XenXenstoreState *s, uint8_t *ptr, unsigned int len)
+{
+if (!len)
+return 0;
+
+XENSTORE_RING_IDX cons = qatomic_read(&s->xs->rsp_cons);
+XENSTORE_RING_IDX prod = qatomic_read(&s->xs->rsp_prod);
+unsigned int copied = 0;
+
+smp_mb();
+
+while (len) {
+unsigned int avail = cons + XENSTORE_RING_SIZE - prod;
+unsigned int offset = MASK_XENSTORE_IDX(prod);
+unsigned int copylen = len;
+
+if (avail > XENSTORE_RING_SIZE) {
+error_report("XenStore ring handling error");
+s->fatal_error = true;
+break;
+} else if (avail == 0)
+break;
+
+if (copylen > avail) {
+copylen = avail;
+}
+if (copylen > XENSTORE_RING_SIZE - offset) {
+copylen = XENSTORE_RING_SIZE - offset;
+}
+
+
+memcpy(&s->xs->rsp[offset], ptr, copylen);
+copied += copylen;
+
+ptr += copylen;
+len -= copylen;
+
+prod += copylen;
+}
+
+smp_mb();
+
+qatomic_set(&s->xs->rsp_prod, prod);
+
+return copied;
+}
+
+static unsigned int get_req(XenXenstoreState *s)
+{
+unsigned int copied = 0;
+
+if (s->fatal_error)
+return 0;
+
+assert(!req_pending(s));
+
+if (s->req_offset < XENSTORE_HEADER_SIZE) {
+void *ptr = s->req_data + s->req_offset;
+unsigned int len = XENSTORE_HEADER_SIZE;
+unsigned int copylen = copy_from_ring(s, ptr, len);
+
+copied += copylen;
+s->req_offset += copylen;
+}
+
+if (s->req_offset >= XENSTORE_HEADER_SIZE) {
+struct xsd_sockmsg *req = (struct xsd_sockmsg *)s->req_data;
+
+if (req->len > (uint32_t)XENSTORE_PAYLOAD_MAX) {
+error_report("Illegal XenStore request");
+s->fatal_error = true;
+return 0;
+}
+
+void *ptr = s->req_data + s->req_offset;
+

[RFC PATCH v5 51/52] hw/xen: Add xen_xenstore device for xenstore emulation

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

The hookup to event channel is a bit of a special case hack right now; as
we make this work for real PV driver back ends, that will be implemented
for the general case of Dom0 ports binding to DomU.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/meson.build|   1 +
 hw/i386/kvm/xen_evtchn.c   |   1 +
 hw/i386/kvm/xen_xenstore.c | 248 +
 hw/i386/kvm/xen_xenstore.h |  20 +++
 hw/i386/pc.c   |   2 +
 target/i386/kvm/xen-emu.c  |  12 ++
 6 files changed, 284 insertions(+)
 create mode 100644 hw/i386/kvm/xen_xenstore.c
 create mode 100644 hw/i386/kvm/xen_xenstore.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index e02449e4d4..6d6981fced 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -8,6 +8,7 @@ i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
   'xen_overlay.c',
   'xen_evtchn.c',
   'xen_gnttab.c',
+  'xen_xenstore.c',
   ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index c0f6ef9dff..0653cad3bb 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -31,6 +31,7 @@
 
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
+#include "xen_xenstore.h"
 
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_xen.h"
diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
new file mode 100644
index 00..63530059fa
--- /dev/null
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -0,0 +1,248 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+#include "xen_evtchn.h"
+#include "xen_xenstore.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+
+#include "standard-headers/xen/io/xs_wire.h"
+#include "standard-headers/xen/event_channel.h"
+
+#define TYPE_XEN_XENSTORE "xen-xenstore"
+OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+#define ENTRIES_PER_FRAME_V1 (XEN_PAGE_SIZE / sizeof(grant_entry_v1_t))
+#define ENTRIES_PER_FRAME_V2 (XEN_PAGE_SIZE / sizeof(grant_entry_v2_t))
+
+#define XENSTORE_HEADER_SIZE ((unsigned int)sizeof(struct xsd_sockmsg))
+
+struct XenXenstoreState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+MemoryRegion xenstore_page;
+struct xenstore_domain_interface *xs;
+uint8_t req_data[XENSTORE_HEADER_SIZE + XENSTORE_PAYLOAD_MAX];
+uint8_t rsp_data[XENSTORE_HEADER_SIZE + XENSTORE_PAYLOAD_MAX];
+uint32_t req_offset;
+uint32_t rsp_offset;
+bool rsp_pending;
+bool fatal_error;
+
+evtchn_port_t guest_port;
+evtchn_port_t be_port;
+struct xenevtchn_handle *eh;
+};
+
+struct XenXenstoreState *xen_xenstore_singleton;
+
+static void xen_xenstore_event(void *opaque);
+
+static void xen_xenstore_realize(DeviceState *dev, Error **errp)
+{
+XenXenstoreState *s = XEN_XENSTORE(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen xenstore support is for Xen emulation");
+return;
+}
+memory_region_init_ram(&s->xenstore_page, OBJECT(dev), "xen:xenstore_page",
+   XEN_PAGE_SIZE, &error_abort);
+memory_region_set_enabled(&s->xenstore_page, true);
+s->xs = memory_region_get_ram_ptr(&s->xenstore_page);
+memset(s->xs, 0, XEN_PAGE_SIZE);
+
+/* We can't map it this early as KVM isn't ready */
+xen_xenstore_singleton = s;
+
+s->eh = xen_be_evtchn_open(NULL, 0);
+if (!s->eh) {
+error_setg(errp, "Xenstore evtchn port init failed");
+return;
+}
+aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
+   xen_xenstore_event, NULL, NULL, NULL, s);
+}
+
+static bool xen_xenstore_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static int xen_xenstore_pre_save(void *opaque)
+{
+XenXenstoreState *s = opaque;
+
+if (s->eh) {
+s->guest_port = xen_be_evtchn_get_guest_port(s->eh);
+}
+return 0;
+}
+
+static int xen_xenstore_post_load(void *opaque, int ver)
+{
+XenXenstoreState *s = opaque;
+
+/*
+ * As qemu/dom0, rebind to the guest's port. The Windows drivers may
+ * unbind the XenStore evtchn and rebind to it, having obtained the
+ * "remote" port through EVTCHNOP_status. In the case that migration
+ * occurs while it's unbound, the "remote" port needs to be the same
+ * as before so that the 

[RFC PATCH v5 07/52] xen-platform: exclude vfio-pci from the PCI platform unplug

2022-12-30 Thread David Woodhouse
From: Joao Martins 

Exclude vfio-pci devices from the unplug so that PCI passthrough devices
work for Xen emulated guests.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/i386/xen/xen_platform.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index a64265cca0..a6f0fb478a 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -109,12 +109,25 @@ static void log_writeb(PCIXenPlatformState *s, char val)
 #define _UNPLUG_NVME_DISKS 3
 #define UNPLUG_NVME_DISKS (1u << _UNPLUG_NVME_DISKS)
 
+static bool pci_device_is_passthrough(PCIDevice *d)
+{
+if (!strcmp(d->name, "xen-pci-passthrough")) {
+return true;
+}
+
+if (xen_mode == XEN_EMULATE && !strcmp(d->name, "vfio-pci")) {
+return true;
+}
+
+return false;
+}
+
 static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
 {
 /* We have to ignore passthrough devices */
 if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
 PCI_CLASS_NETWORK_ETHERNET
-&& strcmp(d->name, "xen-pci-passthrough") != 0) {
+&& !pci_device_is_passthrough(d)) {
 object_unparent(OBJECT(d));
 }
 }
@@ -187,9 +200,8 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
 !(flags & UNPLUG_IDE_SCSI_DISKS);
 
 /* We have to ignore passthrough devices */
-if (!strcmp(d->name, "xen-pci-passthrough")) {
+if (pci_device_is_passthrough(d))
 return;
-}
 
 switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
 case PCI_CLASS_STORAGE_IDE:
-- 
2.35.3
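
The predicate introduced by this patch is small enough to model standalone,
which makes its truth table easy to check. The sketch below is a hypothetical
rewrite that takes the device name and the mode as plain arguments instead of
a PCIDevice and the global xen_mode; demo_is_passthrough and its parameters
are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a device must be left alone by the platform unplug.
 * name:         the device type name
 * xen_emulated: true when Xen is being emulated rather than running for real
 */
bool demo_is_passthrough(const char *name, bool xen_emulated)
{
    assert(name != NULL);

    /* Real Xen passthrough devices are always excluded. */
    if (strcmp(name, "xen-pci-passthrough") == 0) {
        return true;
    }

    /* Under emulation, vfio-pci plays the passthrough role instead. */
    if (xen_emulated && strcmp(name, "vfio-pci") == 0) {
        return true;
    }

    return false;
}
```

Note that vfio-pci counts as passthrough only in the emulated case, which
matches the xen_mode == XEN_EMULATE test in the patch.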




[RFC PATCH v5 26/52] i386/xen: implement HVMOP_set_param

2022-12-30 Thread David Woodhouse
From: Ankur Arora 

This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora 
Signed-off-by: Joao Martins 
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 9f0ade9325..4ef0b05278 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -488,6 +488,36 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 return true;
 }
 
+static bool handle_set_param(struct kvm_xen_exit *exit, X86CPU *cpu,
+ uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+struct xen_hvm_param hp;
+int err = 0;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(hp) == 16);
+
+if (kvm_copy_from_gva(cs, arg, &hp, sizeof(hp))) {
+err = -EFAULT;
+goto out;
+}
+
+if (hp.domid != DOMID_SELF && hp.domid != xen_domid) {
+err = -ESRCH;
+goto out;
+}
+
+switch (hp.index) {
+default:
+return false;
+}
+
+out:
+exit->u.hcall.result = err;
+return true;
+}
+
 static int kvm_xen_hcall_evtchn_upcall_vector(struct kvm_xen_exit *exit,
   X86CPU *cpu, uint64_t arg)
 {
@@ -529,6 +559,9 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
 ret = -ENOSYS;
 break;
 
+case HVMOP_set_param:
+return handle_set_param(exit, cpu, arg);
+
 default:
 return false;
 }
-- 
2.35.3
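
The two checks at the top of handle_set_param() recur throughout this series:
a compile-time assertion that the argument structure has the same size for
32-bit and 64-bit guests, and a domain ID check that accepts only "self" or
the guest's own ID. A rough standalone sketch follows. The structure layout,
DEMO_DOMID_SELF and demo_check_domid are stand-ins invented for this example
and should not be taken as the definitions in Xen's public headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for Xen's "the calling domain" ID; the value is illustrative. */
#define DEMO_DOMID_SELF 0x7ff0u

/* Hypothetical argument layout with explicit padding and fixed-width fields. */
struct demo_hvm_param {
    uint16_t domid;
    uint16_t pad;
    uint32_t index;
    uint64_t value;
};

/* Fixed-width fields give one layout for all guests: no compat shim needed. */
static_assert(sizeof(struct demo_hvm_param) == 16,
              "argument must be 16 bytes");

/* Accept only requests aimed at the calling guest itself. */
int demo_check_domid(uint16_t domid, uint16_t own_domid)
{
    if (domid != DEMO_DOMID_SELF && domid != own_domid) {
        return -ESRCH;
    }
    return 0;
}
```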




[RFC PATCH v5 08/52] xen-platform: allow its creation with XEN_EMULATE mode

2022-12-30 Thread David Woodhouse
From: Joao Martins 

The only thing we need to handle on the KVM side is changing the
pfn from R/W to R/O.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hw/i386/xen/meson.build|  5 -
 hw/i386/xen/xen_platform.c | 39 +-
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index be84130300..79d75cc927 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -2,6 +2,9 @@ i386_ss.add(when: 'CONFIG_XEN', if_true: files(
   'xen-hvm.c',
   'xen-mapcache.c',
   'xen_apic.c',
-  'xen_platform.c',
   'xen_pvdevice.c',
 ))
+
+i386_ss.add(when: 'CONFIG_XENFV_MACHINE', if_true: files(
+  'xen_platform.c',
+))
diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index a6f0fb478a..123c1d 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -28,9 +28,9 @@
 #include "hw/ide.h"
 #include "hw/ide/pci.h"
 #include "hw/pci/pci.h"
-#include "hw/xen/xen_common.h"
 #include "migration/vmstate.h"
-#include "hw/xen/xen-legacy-backend.h"
+#include "hw/xen/xen.h"
+#include "net/net.h"
 #include "trace.h"
 #include "sysemu/xen.h"
 #include "sysemu/block-backend.h"
@@ -38,6 +38,11 @@
 #include "qemu/module.h"
 #include "qom/object.h"
 
+#ifdef CONFIG_XEN
+#include "hw/xen/xen_common.h"
+#include "hw/xen/xen-legacy-backend.h"
+#endif
+
 //#define DEBUG_PLATFORM
 
 #ifdef DEBUG_PLATFORM
@@ -280,18 +285,26 @@ static void platform_fixed_ioport_writeb(void *opaque, uint32_t addr, uint32_t v
 PCIXenPlatformState *s = opaque;
 
 switch (addr) {
-case 0: /* Platform flags */ {
-hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
-HVMMEM_ram_ro : HVMMEM_ram_rw;
-if (xen_set_mem_type(xen_domid, mem_type, 0xc0, 0x40)) {
-DPRINTF("unable to change ro/rw state of ROM memory area!\n");
-} else {
+case 0: /* Platform flags */
+if (xen_mode == XEN_EMULATE) {
+/* XX: Use i440gx/q35 PAM setup to do this? */
 s->flags = val & PFFLAG_ROM_LOCK;
-DPRINTF("changed ro/rw state of ROM memory area. now is %s state.\n",
-(mem_type == HVMMEM_ram_ro ? "ro":"rw"));
+#ifdef CONFIG_XEN
+} else {
+hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
+HVMMEM_ram_ro : HVMMEM_ram_rw;
+
+if (xen_set_mem_type(xen_domid, mem_type, 0xc0, 0x40)) {
+DPRINTF("unable to change ro/rw state of ROM memory area!\n");
+} else {
+s->flags = val & PFFLAG_ROM_LOCK;
+DPRINTF("changed ro/rw state of ROM memory area. now is %s state.\n",
+(mem_type == HVMMEM_ram_ro ? "ro" : "rw"));
+}
+#endif
 }
 break;
-}
+
 case 2:
 log_writeb(s, val);
 break;
@@ -509,8 +522,8 @@ static void xen_platform_realize(PCIDevice *dev, Error **errp)
 uint8_t *pci_conf;
 
 /* Device will crash on reset if xen is not initialized */
-if (!xen_enabled()) {
-error_setg(errp, "xen-platform device requires the Xen accelerator");
+if (xen_mode == XEN_DISABLED) {
+error_setg(errp, "xen-platform device requires a Xen guest");
 return;
 }
 
-- 
2.35.3




[RFC PATCH v5 34/52] hw/xen: Implement EVTCHNOP_send

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 180 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 +++
 3 files changed, 194 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index ae715c4af9..1ce60f7b1d 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -485,6 +485,133 @@ static int unmask_port(XenEvtchnState *s, evtchn_port_t port, bool do_unmask)
 }
 }
 
+static int do_set_port_lm(XenEvtchnState *s, evtchn_port_t port,
+  struct shared_info *shinfo,
+  struct vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+/* Update the pending bit itself. If it was already set, we're done. */
+if (qatomic_fetch_or(&shinfo->evtchn_pending[idx], mask) & mask) {
+return 0;
+}
+
+/* Check if it's masked. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int do_set_port_compat(XenEvtchnState *s, evtchn_port_t port,
+  struct compat_shared_info *shinfo,
+  struct compat_vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+/* Update the pending bit itself. If it was already set, we're done. */
+if (qatomic_fetch_or(&shinfo->evtchn_pending[idx], mask) & mask) {
+return 0;
+}
+
+/* Check if it's masked. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int set_port_pending(XenEvtchnState *s, evtchn_port_t port)
+{
+void *vcpu_info, *shinfo;
+
+if (s->port_table[port].type == EVTCHNSTAT_closed) {
+return -EINVAL;
+}
+
+if (s->evtchn_in_kernel) {
+XenEvtchnPort *p = &s->port_table[port];
+CPUState *cpu = qemu_get_cpu(p->vcpu);
+struct kvm_irq_routing_xen_evtchn evt;
+
+if (!cpu) {
+return 0;
+}
+
+evt.port = port;
+evt.vcpu = kvm_arch_vcpu_id(cpu);
+evt.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_EVTCHN_SEND, &evt);
+}
+
+shinfo = xen_overlay_get_shinfo_ptr();
+if (!shinfo) {
+return -ENOTSUP;
+}
+
+vcpu_info = kvm_xen_get_vcpu_info_hva(s->port_table[port].vcpu);
+if (!vcpu_info) {
+return -EINVAL;
+}
+
+if (xen_is_long_mode()) {
+return do_set_port_lm(s, port, shinfo, vcpu_info);
+} else {
+return do_set_port_compat(s, port, shinfo, vcpu_info);
+}
+}
+
 static int clear_port_pending(XenEvtchnState *s, evtchn_port_t port)
 {
 void *p = xen_overlay_get_shinfo_ptr();
@@ -702,3 +829,56 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 
 return ret;
 }
+
+int xen_evtchn_send_op(struct evtchn_send *send)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = 0;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(send->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[send->port];
+
+switch (p->type) {
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+/*
+ * This is an event from the guest to qemu itself, which is
+ * serving as the driver domain. Not yet implemented; it will
+ * be hooked up to the qemu implementation of xenstore,
+ 

[RFC PATCH v5 31/52] hw/xen: Implement EVTCHNOP_unmask

2022-12-30 Thread David Woodhouse
From: David Woodhouse 

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.
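
For readers new to the 2-level event channel ABI, the ordering described
above can be sketched in standalone C11. This is only an illustration under
stated assumptions, not the code from this patch: the demo_* types, the
WORD_BITS/NR_WORDS constants and demo_set_pending() are hypothetical
stand-ins for the shared_info/vcpu_info fields, and the function returns 1
where the real code would go on to inject the callback.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define WORD_BITS 64u   /* assumed: bits per pending/mask word (64-bit guest) */
#define NR_WORDS  64u   /* assumed: number of words in the first-level bitmap */

/* Hypothetical stand-in for the pending/mask bitmaps in shared_info. */
struct demo_shinfo {
    _Atomic uint64_t pending[NR_WORDS];
    _Atomic uint64_t mask[NR_WORDS];
};

/* Hypothetical stand-in for the per-vCPU selector and upcall flag. */
struct demo_vcpu_info {
    _Atomic uint64_t pending_sel;
    _Atomic uint8_t upcall_pending;
};

/*
 * Mark a port pending. Returns 1 if the caller should now inject the
 * callback, 0 if some earlier stage shows nothing more is needed, and
 * -EINVAL if the port is out of range.
 */
static int demo_set_pending(struct demo_shinfo *sh,
                            struct demo_vcpu_info *vi,
                            unsigned int port)
{
    unsigned int idx = port / WORD_BITS;
    uint64_t bit = UINT64_C(1) << (port % WORD_BITS);
    uint64_t sel = UINT64_C(1) << (idx % WORD_BITS);

    if (idx >= NR_WORDS) {
        return -EINVAL;
    }

    /* 1. Set the pending bit; if it was already set, we are done. */
    if (atomic_fetch_or(&sh->pending[idx], bit) & bit) {
        return 0;
    }

    /* 2. A masked port stays pending but triggers no upcall. */
    if (atomic_load(&sh->mask[idx]) & bit) {
        return 0;
    }

    /* 3. Set the word index in the per-vCPU selector. */
    if (atomic_fetch_or(&vi->pending_sel, sel) & sel) {
        return 0;
    }

    /* 4. Raise the upcall flag; only the 0 -> 1 transition injects. */
    if (atomic_fetch_or(&vi->upcall_pending, 1)) {
        return 0;
    }

    return 1;
}
```

Each stage is a fetch-and-or that bails out early if the bit was already
set, so concurrent senders cause at most one injection per 0 -> 1
transition of the upcall flag.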

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 175 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 +++
 3 files changed, 189 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index f732821d5b..bfdcc8de67 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -219,6 +219,13 @@ int xen_evtchn_set_callback_param(uint64_t param)
 return ret;
 }
 
+static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
+{
+int type = s->callback_param >> CALLBACK_VIA_TYPE_SHIFT;
+
+kvm_xen_inject_vcpu_callback_vector(vcpu, type);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -289,6 +296,152 @@ int xen_evtchn_status_op(struct evtchn_status *status)
 return 0;
 }
 
+/*
+ * Never thought I'd hear myself say this, but C++ templates would be
+ * kind of nice here.
+ *
+ * template<class T> static int do_unmask_port(T *shinfo, ...);
+ */
+static int do_unmask_port_lm(XenEvtchnState *s, evtchn_port_t port,
+ bool do_unmask, struct shared_info *shinfo,
+ struct vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+if (do_unmask) {
+/*
+ * If this is a true unmask operation, clear the mask bit. If
+ * it was already unmasked, we have nothing further to do.
+ */
+if (!((qatomic_fetch_and(&shinfo->evtchn_mask[idx], ~mask) & mask))) {
+return 0;
+}
+} else {
+/*
+ * This is a pseudo-unmask for affinity changes. We don't
+ * change the mask bit, and if it's *masked* we have nothing
+ * else to do.
+ */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+}
+
+/* If the event was not pending, we're done. */
+if (!(qatomic_fetch_or(&shinfo->evtchn_pending[idx], 0) & mask)) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int do_unmask_port_compat(XenEvtchnState *s, evtchn_port_t port,
+ bool do_unmask,
+ struct compat_shared_info *shinfo,
+ struct compat_vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+if (do_unmask) {
+/*
+ * If this is a true unmask operation, clear the mask bit. If
+ * it was already unmasked, we have nothing further to do.
+ */
+if (!((qatomic_fetch_and(&shinfo->evtchn_mask[idx], ~mask) & mask))) {
+return 0;
+}
+} else {
+/*
+ * This is a pseudo-unmask for affinity changes. We don't
+ * change the mask bit, and if it's *masked* we have nothing
+ * else to do.
+ */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+}
+
+/* If the event was not pending, we're done. */
+if (!(qatomic_fetch_or(&shinfo->evtchn_pending[idx], 0) & mask)) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+inject_callback(s, s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int unmask_port(XenEvtchnState *s, evtchn_port_t port, bool do_unmask)
+{
+void *vcpu_info, *shinfo;
+
+if (s->port_table[port].type == EVTCHNSTAT_closed) {
+   
