[linux-linus test] 183010: regressions - FAIL

2023-09-15 Thread osstest service owner
flight 183010 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183010/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 8 xen-boot fail REGR. vs. 182531
 test-amd64-coresched-amd64-xl  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host fail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host fail REGR. vs. 182531
 test-amd64-amd64-xl-xsm   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qcow2  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-vhd   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvshim  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-debianhvm-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start fail REGR. vs. 182531
 test-amd64-amd64-xl-shadow  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-credit1   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-uefi  8 reboot fail REGR. vs. 182531
 test-amd64-amd64-pair 12 xen-boot/src_host fail REGR. vs. 182531
 test-amd64-amd64-pair 13 xen-boot/dst_host fail REGR. vs. 182531
 test-amd64-amd64-xl-multivcpu  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-amd  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-amd  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 182531
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd11-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-pygrub   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-bios  8 reboot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine  8 reboot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 182531

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  8 xen-boot fail REGR. vs. 182531

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 182531
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 182531
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 182531
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl  15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 

Re: [PATCH v6 08/13] xen/arm: Fold mmu_init_secondary_cpu() to head.S

2023-09-15 Thread Henry Wang
Hi Julien,

> On Sep 16, 2023, at 05:58, Julien Grall  wrote:
> 
> Hi Henry,
> 
> On 28/08/2023 02:32, Henry Wang wrote:
>> Currently mmu_init_secondary_cpu() only enforces that the page tables
>> do not contain mappings that are both Writable and eXecutable after
>> boot. To ease the arch/arm/mm.c split work, fold this function into
>> head.S.
>> Introduce the assembly macro pt_enforce_wxn for both arm32 and arm64.
>> For arm64, the macro is called at the end of enable_secondary_cpu_mm().
>> For arm32, the macro is called before secondary CPUs jump into
>> the C world.
>> Signed-off-by: Henry Wang 
>> ---
>> v6:
>> - New patch.
>> ---
>>  xen/arch/arm/arm32/head.S | 20 
>>  xen/arch/arm/arm64/mmu/head.S | 21 +
>>  xen/arch/arm/include/asm/mm.h |  2 --
>>  xen/arch/arm/mm.c |  6 --
>>  xen/arch/arm/smpboot.c|  2 --
>>  5 files changed, 41 insertions(+), 10 deletions(-)
>> diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
>> index 33b038e7e0..39218cf15f 100644
>> --- a/xen/arch/arm/arm32/head.S
>> +++ b/xen/arch/arm/arm32/head.S
>> @@ -83,6 +83,25 @@
>>  isb
>>  .endm
>>  +/*
>> + * Enforce Xen page-tables do not contain mapping that are both
>> + * Writable and eXecutables.
>> + *
>> + * This should be called on each secondary CPU.
>> + */
>> +.macro pt_enforce_wxn tmp
>> +mrc   CP32(\tmp, HSCTLR)
>> +orr   \tmp, \tmp, #SCTLR_Axx_ELx_WXN
>> +dsb
>> +mcr   CP32(\tmp, HSCTLR)
>> +/*
>> + * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
>> + * before flushing the TLBs.
>> + */
>> +isb
>> +flush_xen_tlb_local \tmp
>> +.endm
>> +
>>  /*
>>   * Common register usage in this file:
>>   *   r0  -
>> @@ -254,6 +273,7 @@ secondary_switched:
>>  /* Use a virtual address to access the UART. */
>>  mov_w r11, EARLY_UART_VIRTUAL_ADDRESS
>>  #endif
>> +pt_enforce_wxn r0
>>  PRINT("- Ready -\r\n")
>>  /* Jump to C world */
>>  mov_w r2, start_secondary
>> diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
>> index a5271e3880..25028bdf07 100644
>> --- a/xen/arch/arm/arm64/mmu/head.S
>> +++ b/xen/arch/arm/arm64/mmu/head.S
>> @@ -31,6 +31,25 @@
>>  isb
>>  .endm
>>  +/*
>> + * Enforce Xen page-tables do not contain mapping that are both
>> + * Writable and eXecutables.
>> + *
>> + * This should be called on each secondary CPU.
>> + */
>> +.macro pt_enforce_wxn tmp
>> +   mrs   \tmp, SCTLR_EL2
>> +   orr   \tmp, \tmp, #SCTLR_Axx_ELx_WXN
>> +   dsb   sy
>> +   msr   SCTLR_EL2, \tmp
>> +   /*
>> +* The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
>> +* before flushing the TLBs.
>> +*/
>> +   isb
>> +   flush_xen_tlb_local
>> +.endm
>> +
> 
> It would be preferable if we can set the flag right when the MMU is 
> configured. This would avoid the extra TLB flush and SCTLR dance. How 
> about the following (not compiled/cleaned) code:

Thank you for the detailed information. Sure, I will try the code below and
keep you updated on whether it works. I will update the patch accordingly.

> 
> diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
> index a5271e388071..6b19d15ff89f 100644
> --- a/xen/arch/arm/arm64/mmu/head.S
> +++ b/xen/arch/arm/arm64/mmu/head.S
> @@ -264,10 +264,11 @@ ENDPROC(create_page_tables)
>  * Inputs:
>  *   x0 : Physical address of the page tables.
>  *
> - * Clobbers x0 - x4
> + * Clobbers x0 - x6
>  */
> enable_mmu:
> mov   x4, x0
> +mov   x5, x1
> PRINT("- Turning on paging -\r\n")
> 
> /*
> @@ -283,6 +284,7 @@ enable_mmu:
> mrs   x0, SCTLR_EL2
> orr   x0, x0, #SCTLR_Axx_ELx_M  /* Enable MMU */
> orr   x0, x0, #SCTLR_Axx_ELx_C  /* Enable D-cache */
> +orr   x0, x0, x5/* Enable extra flags */
> dsb   sy /* Flush PTE writes and finish reads */
> msr   SCTLR_EL2, x0  /* now paging is enabled */
> isb  /* Now, flush the icache */
> @@ -297,16 +299,17 @@ ENDPROC(enable_mmu)
>  * Inputs:
>  *   lr : Virtual address to return to.
>  *
> - * Clobbers x0 - x5
> + * Clobbers x0 - x6
>  */
> ENTRY(enable_secondary_cpu_mm)
> -mov   x5, lr
> +mov   x6, lr
> 
> load_paddr x0, init_ttbr
> ldr   x0, [x0]
> 
> +mov   x1, #SCTLR_Axx_ELx_WXN/* Enable WxN from the start */
> blenable_mmu
> -mov   lr, x5
> +mov   lr, x6
> 
> /* Return to the virtual address requested by the caller. */
> ret
> @@ -320,16 +323,17 @@ ENDPROC(enable_secondary_cpu_mm)
>  * Inputs:
>  *   lr : Virtual address to return to.
>  *
> - * Clobbers x0 - x5
> + * Clobbers x0 - x6
>  */
> ENTRY(enable_boot_cpu_mm)
> -mov   x5, lr
> +mov   x6, lr

[PATCH] xen/arm64: head.S: Fix wrong enable_boot_cpu_mm() code movement

2023-09-15 Thread Henry Wang
Some previously addressed review comments on enable_boot_cpu_mm() were
reintroduced during the code movement from arm64/head.S to arm64/mmu/head.S.
Drop the unreachable code, move the 'mov lr, x5' closer to
'b remove_identity_mapping' so it is clearer that it will return,
and update the in-code comment accordingly.

Fixes: 6734327d76be ("xen/arm64: Split and move MMU-specific head.S to mmu/head.S")
Reported-by: Julien Grall 
Signed-off-by: Henry Wang 
---
 xen/arch/arm/arm64/mmu/head.S | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index a5271e3880..88075ef083 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -329,7 +329,6 @@ ENTRY(enable_boot_cpu_mm)
 load_paddr x0, boot_pgtable
 
 blenable_mmu
-mov   lr, x5
 
 /*
  * The MMU is turned on and we are in the 1:1 mapping. Switch
@@ -338,19 +337,15 @@ ENTRY(enable_boot_cpu_mm)
 ldr   x0, =1f
 brx0
 1:
+mov   lr, x5
 /*
  * The 1:1 map may clash with other parts of the Xen virtual memory
  * layout. As it is not used anymore, remove it completely to
  * avoid having to worry about replacing existing mapping
- * afterwards. Function will return to primary_switched.
+ * afterwards. Function will return to the virtual address requested
+ * by the caller.
  */
 b remove_identity_mapping
-
-/*
- * Below is supposed to be unreachable code, as "ret" in
- * remove_identity_mapping will use the return address in LR in 
advance.
- */
-b fail
 ENDPROC(enable_boot_cpu_mm)
 
 /*
-- 
2.25.1




[xen-unstable test] 183009: tolerable FAIL - PUSHED

2023-09-15 Thread osstest service owner
flight 183009 xen-unstable real [real]
flight 183012 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/183009/
http://logs.test-lab.xenproject.org/osstest/logs/183012/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail pass in 183012-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail in 183012 like 183005
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183005
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183005
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183005
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 183005
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183005
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183005
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 183005
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183005
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183005
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 183005
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183005
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 xen  b5a601093d1f9d5e96eb74b692f1b6252a2598a2
baseline version:
 xen  6aa25c32180ab59081c73bae4c568367d9133a1f

Last test of basis   183005  2023-09-15 01:53:36 

[xen-unstable-smoke test] 183011: tolerable all pass - PUSHED

2023-09-15 Thread osstest service owner
flight 183011 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183011/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl  15 migrate-support-check fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail never pass

version targeted for testing:
 xen  290f82375d828ef93f831a5ef028f1283aa1ea47
baseline version:
 xen  b5a601093d1f9d5e96eb74b692f1b6252a2598a2

 Last test of basis   183006  2023-09-15 02:00:28 Z    0 days
 Testing same since   183011  2023-09-15 22:01:58 Z    0 days    1 attempts


People who touched revisions under test:
  Julien Grall 
  Michal Orzel 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   b5a601093d..290f82375d  290f82375d828ef93f831a5ef028f1283aa1ea47 -> smoke



Re: [PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain

2023-09-15 Thread Henry Wang
Hi Julien,

> On Sep 15, 2023, at 20:52, Julien Grall  wrote:
> 
> From: Julien Grall 
> 
> Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
> a PCI device is attached (see pci_add_dm_done()) for all domain types.
> However, the permissions are only revoked for non-HVM domains (see
> do_pci_remove()).
> 
> This means that HVM domains will be left with extra permissions. While
> this looks bad on paper, the IRQ permissions should be revoked
> when the Device Model calls xc_physdev_unmap_pirq(), and such domains
> cannot directly map I/O port and IOMEM regions. Instead, this has to
> be done by a Device Model.
> 
> The Device Model can only run in dom0 or a PV stubdomain (upstream libxl
> doesn't have support for HVM/PVH stubdomains).
> 
> For a PV/PVH stubdomain, the permissions are properly revoked, so there is
> no security concern.
> 
> This leaves dom0. There are two cases:
>  1) Privileged: Anyone gaining access to the Device Model would already
> have large control on the host.
>  2) Deprivileged: PCI passthrough requires PHYSDEV operations which
> are not accessible when the Device Model is restricted.
> 
> So overall, it is believed that the extra permissions cannot be exploited.
> 
> Rework the code so the permissions are all removed for HVM domains.
> This needs to happen after QEMU has detached the device, so
> the revocation is now moved to pci_remove_detached().
> 
> Also add a comment on top of the error message when the PIRQ cannot
> be unbound, to explain that this could be a spurious error as QEMU may
> have already done it.
> 
> Signed-off-by: Julien Grall 
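
[Editorial sketch] The reordering the patch describes can be illustrated schematically. The state and function names below are invented for illustration (only pci_add_dm_done(), do_pci_remove() and pci_remove_detached() are real libxl identifiers, per the message above); this is not the actual libxl code:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented per-domain state, for illustration only. */
struct dom_state {
    bool dm_attached;    /* device model still holds the PCI device */
    bool perms_granted;  /* IOMEM/I/O port/IRQ permissions granted */
};

/* Old flow: permissions were revoked only for non-HVM domains. */
static void remove_old(struct dom_state *d, bool is_hvm)
{
    d->dm_attached = false;        /* QEMU detaches the device */
    if (!is_hvm)
        d->perms_granted = false;  /* HVM domains were skipped */
}

/* New flow: revoke for every domain type, after the detach. */
static void remove_new(struct dom_state *d)
{
    d->dm_attached = false;        /* detach first... */
    d->perms_granted = false;      /* ...then always revoke */
}
```

With the old flow an HVM domain finishes the remove path with its permissions still granted; the new flow clears them unconditionally, and only once the device model has released the device.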

As discussed in v1, it is agreed that this patch should be included in
4.18. Although technically my release-ack tag only takes effect after the
code freeze, I am still providing it now to avoid possible confusion:

Release-acked-by: Henry Wang 

Kind regards,
Henry




Re: [PATCH v6 03/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S

2023-09-15 Thread Henry Wang
Hi,

> On Sep 16, 2023, at 07:17, Henry Wang  wrote:
> 
> Hi Julien,
> 
>> On Sep 16, 2023, at 05:41, Julien Grall  wrote:
>> 
>> Hi Henry,
>> 
>> I realize that this was already committed. But something went wrong during 
>> the code movement.
>> 
>> On 28/08/2023 02:32, Henry Wang wrote:
>>> +/*
>>> + * Enable mm (turn on the data cache and the MMU) for the boot CPU.
>>> + * The function will return to the virtual address provided in LR (e.g. the
>>> + * runtime mapping).
>>> + *
>>> + * Inputs:
>>> + *   lr : Virtual address to return to.
>>> + *
>>> + * Clobbers x0 - x5
>>> + */
>>> +ENTRY(enable_boot_cpu_mm)
>>> +mov   x5, lr
>>> +
>>> +blcreate_page_tables
>>> +load_paddr x0, boot_pgtable
>>> +
>>> +blenable_mmu
>>> +mov   lr, x5
>>> +
>>> +/*
>>> + * The MMU is turned on and we are in the 1:1 mapping. Switch
>>> + * to the runtime mapping.
>>> + */
>>> +ldr   x0, =1f
>>> +brx0
>>> +1:
>>> +/*
>>> + * The 1:1 map may clash with other parts of the Xen virtual memory
>>> + * layout. As it is not used anymore, remove it completely to
>>> + * avoid having to worry about replacing existing mapping
>>> + * afterwards. Function will return to primary_switched.
>>> + */
>>> +b remove_identity_mapping
>>> +
>>> +/*
>>> + * Below is supposed to be unreachable code, as "ret" in
>>> + * remove_identity_mapping will use the return address in LR in 
>>> advance.
>>> + */
>>> +b fail
>> 
>> The "b fail" didn't exist in head.S. I guess this was due to a wrong 
>> rebase? Can you check if there is something else that went missing?
> 
> Please correct me if I am wrong but I think the “b fail” of
> enable_boot_cpu_mm() is in the mmu head.S, see line 348 [1].
> 
> [1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/arm64/mmu/head.S;h=d71fdc69a531780501387fc5c717b4c41bb1b66a;hb=6734327d76be38d20f280ecc96392e385fbc1d8b#l348

Oh, I realized we agreed to remove this line as it is unreachable; I will
send a patch. Sorry about it.

Kind regards,
Henry


> 
> Kind regards,
> Henry
> 
> 
>> 
>> Cheers,
>> 
>> -- 
>> Julien Grall
> 



Re: [PATCH v6 03/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S

2023-09-15 Thread Henry Wang
Hi Julien,

> On Sep 16, 2023, at 05:41, Julien Grall  wrote:
> 
> Hi Henry,
> 
> I realize that this was already committed. But something went wrong during 
> the code movement.
> 
> On 28/08/2023 02:32, Henry Wang wrote:
>> +/*
>> + * Enable mm (turn on the data cache and the MMU) for the boot CPU.
>> + * The function will return to the virtual address provided in LR (e.g. the
>> + * runtime mapping).
>> + *
>> + * Inputs:
>> + *   lr : Virtual address to return to.
>> + *
>> + * Clobbers x0 - x5
>> + */
>> +ENTRY(enable_boot_cpu_mm)
>> +mov   x5, lr
>> +
>> +blcreate_page_tables
>> +load_paddr x0, boot_pgtable
>> +
>> +blenable_mmu
>> +mov   lr, x5
>> +
>> +/*
>> + * The MMU is turned on and we are in the 1:1 mapping. Switch
>> + * to the runtime mapping.
>> + */
>> +ldr   x0, =1f
>> +brx0
>> +1:
>> +/*
>> + * The 1:1 map may clash with other parts of the Xen virtual memory
>> + * layout. As it is not used anymore, remove it completely to
>> + * avoid having to worry about replacing existing mapping
>> + * afterwards. Function will return to primary_switched.
>> + */
>> +b remove_identity_mapping
>> +
>> +/*
>> + * Below is supposed to be unreachable code, as "ret" in
>> + * remove_identity_mapping will use the return address in LR in 
>> advance.
>> + */
>> +b fail
> 
> The "b fail" didn't exist in head.S. I guess this was due to a wrong 
> rebase? Can you check if there is something else that went missing?

Please correct me if I am wrong but I think the “b fail” of
enable_boot_cpu_mm() is in the mmu head.S, see line 348 [1].

[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/arm64/mmu/head.S;h=d71fdc69a531780501387fc5c717b4c41bb1b66a;hb=6734327d76be38d20f280ecc96392e385fbc1d8b#l348

Kind regards,
Henry


> 
> Cheers,
> 
> -- 
> Julien Grall



Re: [PATCH v6 08/13] xen/arm: Fold mmu_init_secondary_cpu() to head.S

2023-09-15 Thread Julien Grall

Hi Henry,

On 28/08/2023 02:32, Henry Wang wrote:

Currently mmu_init_secondary_cpu() only enforces that the page tables
do not contain mappings that are both Writable and eXecutable after
boot. To ease the arch/arm/mm.c split work, fold this function into
head.S.

Introduce the assembly macro pt_enforce_wxn for both arm32 and arm64.
For arm64, the macro is called at the end of enable_secondary_cpu_mm().
For arm32, the macro is called before secondary CPUs jump into
the C world.

Signed-off-by: Henry Wang 
---
v6:
- New patch.
---
  xen/arch/arm/arm32/head.S | 20 
  xen/arch/arm/arm64/mmu/head.S | 21 +
  xen/arch/arm/include/asm/mm.h |  2 --
  xen/arch/arm/mm.c |  6 --
  xen/arch/arm/smpboot.c|  2 --
  5 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 33b038e7e0..39218cf15f 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -83,6 +83,25 @@
  isb
  .endm
  
+/*

+ * Enforce Xen page-tables do not contain mapping that are both
+ * Writable and eXecutables.
+ *
+ * This should be called on each secondary CPU.
+ */
+.macro pt_enforce_wxn tmp
+mrc   CP32(\tmp, HSCTLR)
+orr   \tmp, \tmp, #SCTLR_Axx_ELx_WXN
+dsb
+mcr   CP32(\tmp, HSCTLR)
+/*
+ * The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
+ * before flushing the TLBs.
+ */
+isb
+flush_xen_tlb_local \tmp
+.endm
+
  /*
   * Common register usage in this file:
   *   r0  -
@@ -254,6 +273,7 @@ secondary_switched:
  /* Use a virtual address to access the UART. */
  mov_w r11, EARLY_UART_VIRTUAL_ADDRESS
  #endif
+pt_enforce_wxn r0
  PRINT("- Ready -\r\n")
  /* Jump to C world */
  mov_w r2, start_secondary
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index a5271e3880..25028bdf07 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -31,6 +31,25 @@
  isb
  .endm
  
+/*

+ * Enforce Xen page-tables do not contain mapping that are both
+ * Writable and eXecutables.
+ *
+ * This should be called on each secondary CPU.
+ */
+.macro pt_enforce_wxn tmp
+   mrs   \tmp, SCTLR_EL2
+   orr   \tmp, \tmp, #SCTLR_Axx_ELx_WXN
+   dsb   sy
+   msr   SCTLR_EL2, \tmp
+   /*
+* The TLBs may cache SCTLR_EL2.WXN. So ensure it is synchronized
+* before flushing the TLBs.
+*/
+   isb
+   flush_xen_tlb_local
+.endm
+


It would be preferable if we can set the flag right when the MMU is 
configured. This would avoid the extra TLB flush and SCTLR dance. How 
about the following (not compiled/cleaned) code:


diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index a5271e388071..6b19d15ff89f 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -264,10 +264,11 @@ ENDPROC(create_page_tables)
  * Inputs:
  *   x0 : Physical address of the page tables.
  *
- * Clobbers x0 - x4
+ * Clobbers x0 - x6
  */
 enable_mmu:
 mov   x4, x0
+mov   x5, x1
 PRINT("- Turning on paging -\r\n")

 /*
@@ -283,6 +284,7 @@ enable_mmu:
 mrs   x0, SCTLR_EL2
 orr   x0, x0, #SCTLR_Axx_ELx_M  /* Enable MMU */
 orr   x0, x0, #SCTLR_Axx_ELx_C  /* Enable D-cache */
+orr   x0, x0, x5/* Enable extra flags */
 dsb   sy /* Flush PTE writes and finish reads */
 msr   SCTLR_EL2, x0  /* now paging is enabled */
 isb  /* Now, flush the icache */
@@ -297,16 +299,17 @@ ENDPROC(enable_mmu)
  * Inputs:
  *   lr : Virtual address to return to.
  *
- * Clobbers x0 - x5
+ * Clobbers x0 - x6
  */
 ENTRY(enable_secondary_cpu_mm)
-mov   x5, lr
+mov   x6, lr

 load_paddr x0, init_ttbr
 ldr   x0, [x0]

+mov   x1, #SCTLR_Axx_ELx_WXN/* Enable WxN from the start */
 blenable_mmu
-mov   lr, x5
+mov   lr, x6

 /* Return to the virtual address requested by the caller. */
 ret
@@ -320,16 +323,17 @@ ENDPROC(enable_secondary_cpu_mm)
  * Inputs:
  *   lr : Virtual address to return to.
  *
- * Clobbers x0 - x5
+ * Clobbers x0 - x6
  */
 ENTRY(enable_boot_cpu_mm)
-mov   x5, lr
+mov   x6, lr

 blcreate_page_tables
 load_paddr x0, boot_pgtable

+mov   x1, #0/* No extra SCTLR flags */
 blenable_mmu
-mov   lr, x5
+mov   lr, x6

 /*
  * The MMU is turned on and we are in the 1:1 mapping. Switch

The same logic could be used for arm32.

Cheers,

--
Julien Grall



Re: [PATCH 3/3] x86/xen: allow nesting of same lazy mode

2023-09-15 Thread Boris Ostrovsky




On 9/13/23 7:38 AM, Juergen Gross wrote:

When running as a paravirtualized guest under Xen, Linux is using
"lazy mode" for issuing hypercalls which don't need to take immediate
effect in order to improve performance (examples are e.g. multiple
PTE changes).

There are two different lazy modes defined: MMU and CPU lazy mode.
Today it is not possible to nest multiple lazy mode sections, even if
they are of the same kind. A recent change in memory management added
nesting of MMU lazy mode sections, resulting in a regression when
running as Xen PV guest.

Technically there is no reason why nesting of multiple sections of the
same kind of lazy mode shouldn't be allowed. So add support for that
for fixing the regression.

Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
Signed-off-by: Juergen Gross 
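
[Editorial sketch] The nesting support described above boils down to replacing a boolean "in lazy mode" flag with a per-mode nesting depth, so that re-entering the same kind of lazy mode is a no-op and only the outermost leave flushes the batched work. A minimal sketch with invented names (not the actual Linux paravirt code):

```c
#include <assert.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

static enum lazy_mode cur_mode = LAZY_NONE;
static unsigned int depth;    /* nesting depth within cur_mode */
static unsigned int flushes;  /* times the batched work was flushed */

static void lazy_enter(enum lazy_mode m)
{
    /* Nesting is only allowed for the same kind of lazy mode. */
    assert(cur_mode == LAZY_NONE || cur_mode == m);
    cur_mode = m;
    depth++;
}

static void lazy_leave(void)
{
    assert(depth > 0);
    if (--depth == 0) {
        flushes++;            /* issue the batched hypercalls here */
        cur_mode = LAZY_NONE;
    }
}
```

The inner enter/leave pair becomes a counter increment/decrement; the batched hypercalls are issued exactly once, when the outermost section ends.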


For patches 2 and 3

Reviewed-by: Boris Ostrovsky 



Re: [PATCH v6 03/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S

2023-09-15 Thread Julien Grall

Hi Henry,

I realize that this was already committed. But something went wrong 
during the code movement.


On 28/08/2023 02:32, Henry Wang wrote:

+/*
+ * Enable mm (turn on the data cache and the MMU) for the boot CPU.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+ENTRY(enable_boot_cpu_mm)
+mov   x5, lr
+
+blcreate_page_tables
+load_paddr x0, boot_pgtable
+
+blenable_mmu
+mov   lr, x5
+
+/*
+ * The MMU is turned on and we are in the 1:1 mapping. Switch
+ * to the runtime mapping.
+ */
+ldr   x0, =1f
+brx0
+1:
+/*
+ * The 1:1 map may clash with other parts of the Xen virtual memory
+ * layout. As it is not used anymore, remove it completely to
+ * avoid having to worry about replacing existing mapping
+ * afterwards. Function will return to primary_switched.
+ */
+b remove_identity_mapping
+
+/*
+ * Below is supposed to be unreachable code, as "ret" in
+ * remove_identity_mapping will use the return address in LR in 
advance.
+ */
+b fail


The "b fail" didn't exist in head.S. I guess this was due to a wrong 
rebase? Can you check if there is something else that went missing?


Cheers,

--
Julien Grall



Re: [PATCH v2] xen/arm: Skip Xen specific nodes/properties from hwdom /chosen node

2023-09-15 Thread Julien Grall

Hi Henry,

On 13/09/2023 01:49, Henry Wang wrote:

Hi Stefano,


On Sep 13, 2023, at 01:46, Stefano Stabellini  wrote:

On Tue, 12 Sep 2023, Michal Orzel wrote:

Skip the following Xen specific host device tree nodes/properties
from being included into hardware domain /chosen node:
- xen,static-heap: this property informs Xen about memory regions
   reserved exclusively as static heap,
- xen,domain-shared-memory-v1: node with this compatible informs Xen
   about static shared memory region for a domain. Xen exposes a different
   node (under /reserved-memory with compatible "xen,shared-memory-v1") to
   let domain know about the shared region,
- xen,evtchn-v1: node with this compatible informs Xen about static
   event channel configuration for a domain. Xen does not expose
   information about static event channels to domUs and dom0 case was
   overlooked (by default nodes from host dt are copied to dom0 fdt unless
   explicitly marked to be skipped), since the author's idea was not to
   expose it (refer docs/misc/arm/device-tree/booting.txt, "Static Event
   Channel"). Even if we wanted to expose the static event channel
   information, the current node is in the wrong format (i.e. contains
   phandle to domU node not visible by dom0). Lastly, this feature is
   marked as tech-preview and there is no Linux dt binding in place.

Signed-off-by: Michal Orzel 
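
[Editorial sketch] The skip logic amounts to a deny-list check while host device tree nodes and properties are copied into the hardware domain's tree. Sketched here with invented helper names (the compatible/property strings are the real ones from the patch description); this is not the actual Xen domain-builder code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Xen-specific content that must not be exposed to the hardware domain. */
static const char *const skip_compatibles[] = {
    "xen,domain-shared-memory-v1",
    "xen,evtchn-v1",
};
static const char *const skip_properties[] = {
    "xen,static-heap",
};

/* Should a host DT node be omitted from the hwdom tree? */
static bool skip_node(const char *compatible)
{
    for (size_t i = 0; i < sizeof(skip_compatibles) / sizeof(*skip_compatibles); i++)
        if (strcmp(compatible, skip_compatibles[i]) == 0)
            return true;
    return false;
}

/* Should a host /chosen property be omitted? */
static bool skip_property(const char *name)
{
    for (size_t i = 0; i < sizeof(skip_properties) / sizeof(*skip_properties); i++)
        if (strcmp(name, skip_properties[i]) == 0)
            return true;
    return false;
}
```

By default, nodes from the host device tree are copied into the dom0 FDT; the patch makes these checks return true for the Xen-specific entries so they are filtered out during the copy.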


Reviewed-by: Stefano Stabellini 

Do we need Henry's explicit approval on bug fixes at this point?


I think it is a bit too early; we can wait for the code freeze for the
release-ack. Before the code freeze, maintainers/committers can push the
patch as usual.

Anyway, I agree this patch definitely qualifies to be included in 4.18, so
feel free to add the tag below if you want.

Release-acked-by: Henry Wang 


Thanks. It is now committed.

Cheers,

--
Julien Grall



Re: [PATCH v2 for-4.18?] x86: support data operand independent timing mode

2023-09-15 Thread Julien Grall

Hi Jan,

On 14/09/2023 12:01, Jan Beulich wrote:

On 14.09.2023 11:18, Julien Grall wrote:

On 14/09/2023 10:11, Jan Beulich wrote:

On 14.09.2023 11:04, Julien Grall wrote:

On 14/09/2023 07:32, Jan Beulich wrote:

On 13.09.2023 19:56, Julien Grall wrote:

If not, I think we should taint Xen and/or print a message if the admin
requested to use DIT. This would make clear that the admin is trying to
do something that doesn't work.


Tainting Xen on all hardware not exposing the bit would seem entirely
unreasonable to me. If we learned of specific cases where the vendor
promises are broken, we could taint there, sure. I guess that would be
individual XSAs / CVEs then for each instance.


... we need to somehow let the user know that the HW it is running on
may not honor DIT. Tainting might be a bit too harsh, but I think
printing a
message with warning_add() is necessary.


But Intel's claim is that with the bit unavailable hardware behaves as
if DIT was in effect.


I am confused. Above, you suggested it cannot be guaranteed. So I
interpreted we don't know what's happening on any processor.


Right. We can trust vendors, or not.


So were you
referring to...


   Therefore you're still suggesting to emit a

warning on most of Intel's hardware and on all non-Intel ones.


... non-Intel HW?


That's
going too far, imo.


We could restrict the warning to non-Intel platforms.


That still goes too far imo. I'm not convinced we should cover for
vendor uncertainty here. We can't make a system any safer than its
hardware is, in this regard. We simply have no (or not enough)
influence on the internal workings of their pipelines.

Fair enough. I would still prefer a message in the log but ...



What I have done is add a sentence to the command line option's
documentation:

"Note that enabling this option cannot guarantee anything beyond what
  underlying hardware guarantees (with, where available and known to Xen,
  respective tweaks applied)."

Plus I've further qualified the option:

### dit (x86/Intel)


... I am also happy with these two changes.

Cheers,

--
Julien Grall



[PATCH 2/7] x86/emul: Fix and extend #DB trap handling

2023-09-15 Thread Andrew Cooper
Lots of this is very very broken, but we need to start somewhere.

First, the bugfix.  Hooks which use X86EMUL_DONE to skip the general emulation
still need to evaluate singlestep as part of completing the instruction.
Defer the logic until X86EMUL_DONE has been converted to X86EMUL_OKAY.

Second, the improvement.  PENDING_DBG, INTERRUPTIBILITY and ACTIVITY are
internal pipeline state which Intel exposed to software in the VMCS, and AMD
exposed a subset of in the VMCB.  Importantly, bits set in PENDING_DBG can
survive across multiple instruction boundaries if e.g. delivery of #DB is
delayed by a MovSS.

For now, introduce a full pending_dbg field into the retire union.  This keeps
the sh_page_fault() and init_context() paths working but in due course the
field will want to lose the "retire" infix.

In addition, set singlestep into pending_dbg as appropriate.  Leave the old
singlestep bitfield in place until we can adjust the callers to the new
scheme.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * Only evaluate singlestep on X86EMUL_OKAY, but do so after X86EMUL_DONE has
   been adjusted to X86EMUL_OKAY.
 * Adjust comments in light of X86EMUL_DONE not getting back to callers.
---
 xen/arch/x86/x86_emulate/x86_emulate.c | 20 +---
 xen/arch/x86/x86_emulate/x86_emulate.h | 14 +++---
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index 94caec1d142c..de7f99500e3f 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -8379,13 +8379,6 @@ x86_emulate(
 if ( !mode_64bit() )
 _regs.r(ip) = (uint32_t)_regs.r(ip);
 
-/* Should a singlestep #DB be raised? */
-if ( rc == X86EMUL_OKAY && singlestep && !ctxt->retire.mov_ss )
-{
-ctxt->retire.singlestep = true;
-ctxt->retire.sti = false;
-}
-
 if ( rc != X86EMUL_DONE )
 *ctxt->regs = _regs;
 else
@@ -8394,6 +8387,19 @@ x86_emulate(
 rc = X86EMUL_OKAY;
 }
 
+/* Should a singlestep #DB be raised? */
+if ( rc == X86EMUL_OKAY && singlestep )
+{
+ctxt->retire.pending_dbg |= X86_DR6_BS;
+
+/* BROKEN - TODO, merge into pending_dbg. */
+if ( !ctxt->retire.mov_ss )
+{
+ctxt->retire.singlestep = true;
+ctxt->retire.sti = false;
+}
+}
+
 ctxt->regs->eflags &= ~X86_EFLAGS_RF;
 
  done:
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 698750267a90..fbc023c37e34 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -588,15 +588,23 @@ struct x86_emulate_ctxt
 /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
 unsigned int opcode;
 
-/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
+/*
+ * Retirement state, set by the emulator (valid only on X86EMUL_OKAY).
+ *
+ * TODO: all this state should be input/output from the VMCS PENDING_DBG,
+ * INTERRUPTIBILITY and ACTIVITY fields.
+ */
 union {
-uint8_t raw;
+unsigned long raw;
 struct {
+/* Accumulated %dr6 trap bits, positive polarity. */
+unsigned int pending_dbg;
+
 bool hlt:1;  /* Instruction HLTed. */
 bool mov_ss:1;   /* Instruction sets MOV-SS irq shadow. */
 bool sti:1;  /* Instruction sets STI irq shadow. */
 bool unblock_nmi:1;  /* Instruction clears NMI blocking. */
-bool singlestep:1;   /* Singlestepping was active. */
+bool singlestep:1;   /* Singlestepping was active. (TODO, merge into pending_dbg) */
 };
 } retire;
 
-- 
2.30.2




[PATCH 7/7] x86/pv: Rewrite %dr6 handling

2023-09-15 Thread Andrew Cooper
All #DB exceptions result in an update of %dr6, but this isn't handled
properly by Xen for any guest type.

Remove all ad-hoc dr6 handling, leaving it to pv_inject_event() in most cases
and using the new x86_merge_dr6() helper.

In do_debug(), swap the dr6 to pending_dbg in order to operate entirely with
positive polarity.  Among other things, this helps spot RTM/BLD in the
diagnostic message.

Drop the unconditional v->arch.dr6 adjustment.  pv_inject_event() performs the
adjustment in the common case, but retain the prior behaviour if a debugger is
attached.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * Split pieces out into an earlier patch.
 * Extend do_debug() to use pending_dbg entirely, rather than have the same
   variable used with different polarity at different times.
---
 xen/arch/x86/pv/emul-priv-op.c  |  5 +
 xen/arch/x86/pv/emulate.c   |  9 +++--
 xen/arch/x86/pv/ro-page-fault.c |  4 ++--
 xen/arch/x86/pv/traps.c | 17 +
 xen/arch/x86/traps.c| 23 +++
 5 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 2f73ed0682e1..fe2011f28e34 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1364,10 +1364,7 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
 ASSERT(!curr->arch.pv.trap_bounce.flags);
 
 if ( ctxt.ctxt.retire.pending_dbg )
-{
-curr->arch.dr6 |= ctxt.ctxt.retire.pending_dbg | DR_STATUS_RESERVED_ONE;
-pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
-}
+pv_inject_DB(ctxt.ctxt.retire.pending_dbg);
 
 /* fall through */
 case X86EMUL_RETRY:
diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index e7a1c0a2cc4f..29425a0a00ec 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -71,10 +71,15 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip)
 {
 regs->rip = rip;
 regs->eflags &= ~X86_EFLAGS_RF;
+
 if ( regs->eflags & X86_EFLAGS_TF )
 {
-current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
-pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+/*
+ * TODO, this should generally use TF from the start of the
+ * instruction.  It's only a latent bug for now, as this path isn't
+ * used for any instruction which modifies eflags.
+ */
+pv_inject_DB(X86_DR6_BS);
 }
 }
 
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index cad28ef928ad..f6bb33556e72 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -389,8 +389,8 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
 
 /* Fallthrough */
 case X86EMUL_OKAY:
-if ( ctxt.retire.singlestep )
-pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+if ( ctxt.retire.pending_dbg )
+pv_inject_DB(ctxt.retire.pending_dbg);
 
 /* Fallthrough */
 case X86EMUL_RETRY:
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 74f333da7e1c..553b04bca956 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -13,6 +13,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -50,9 +51,9 @@ void pv_inject_event(const struct x86_event *event)
 tb->cs= ti->cs;
 tb->eip   = ti->address;
 
-if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
- vector == X86_EXC_PF )
+switch ( vector | -(event->type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
+case X86_EXC_PF:
 curr->arch.pv.ctrlreg[2] = event->cr2;
 arch_set_cr2(curr, event->cr2);
 
@@ -62,9 +63,17 @@ void pv_inject_event(const struct x86_event *event)
 error_code |= PFEC_user_mode;
 
 trace_pv_page_fault(event->cr2, error_code);
-}
-else
+break;
+
+case X86_EXC_DB:
+curr->arch.dr6 = x86_merge_dr6(curr->domain->arch.cpu_policy,
+   curr->arch.dr6, event->pending_dbg);
+/* Fallthrough */
+
+default:
 trace_pv_trap(vector, regs->rip, use_error_code, error_code);
+break;
+}
 
 if ( use_error_code )
 {
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index dead728ce329..447edc827b3a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1887,11 +1887,11 @@ void do_device_not_available(struct cpu_user_regs *regs)
 /* SAF-1-safe */
 void do_debug(struct cpu_user_regs *regs)
 {
-unsigned long dr6;
+unsigned long pending_dbg;
 struct vcpu *v = current;
 
-/* Stash dr6 as early as possible. */
-dr6 = read_debugreg(6);
+/* Stash dr6 as early as possible, operating with positive polarity. */
+pending_dbg = read_debugreg(6) ^ X86_DR6_DEFAULT;
 
 /*
 

[PATCH 5/7] x86: Introduce x86_merge_dr6()

2023-09-15 Thread Andrew Cooper
The current logic used to update %dr6 when injecting #DB is buggy.

The SDM/APM documentation on %dr6 updates is far from ideal, but does at least
make clear that it's non-trivial.  The actual behaviour is to overwrite
B{0..3} and accumulate all other bits.

Introduce x86_merge_dr6() to perform the operation properly.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * Extend commit message
---
 xen/arch/x86/debug.c | 20 
 xen/arch/x86/include/asm/debugreg.h  |  7 +++
 xen/arch/x86/include/asm/x86-defns.h |  7 +++
 3 files changed, 34 insertions(+)

diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index 127fe83021cd..bfcd83ea4d0b 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2023 XenServer.
  */
 #include 
+#include 
 
 #include 
 
@@ -28,6 +29,25 @@ unsigned int x86_adj_dr6_rsvd(const struct cpu_policy *p, unsigned int dr6)
 return dr6;
 }
 
+unsigned int x86_merge_dr6(const struct cpu_policy *p, unsigned int dr6,
+   unsigned int new)
+{
+/* Flip dr6 to have positive polarity. */
+dr6 ^= X86_DR6_DEFAULT;
+
+/* Sanity check that only known values are passed in. */
+ASSERT(!(dr6 & ~X86_DR6_KNOWN_MASK));
+ASSERT(!(new & ~X86_DR6_KNOWN_MASK));
+
+/* Breakpoint matches are overridden.  All other bits accumulate. */
+dr6 = (dr6 & ~X86_DR6_BP_MASK) | new;
+
+/* Flip dr6 back to having default polarity. */
+dr6 ^= X86_DR6_DEFAULT;
+
+return x86_adj_dr6_rsvd(p, dr6);
+}
+
 unsigned int x86_adj_dr7_rsvd(const struct cpu_policy *p, unsigned int dr7)
 {
 unsigned int zeros = X86_DR7_ZEROS;
diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
index 39ba312b84ee..e98a9ce977fa 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -89,4 +89,11 @@ struct cpu_policy;
 unsigned int x86_adj_dr6_rsvd(const struct cpu_policy *p, unsigned int dr6);
 unsigned int x86_adj_dr7_rsvd(const struct cpu_policy *p, unsigned int dr7);
 
+/*
+ * Merge new bits into dr6.  'new' is always given in positive polarity,
+ * matching the Intel VMCS PENDING_DBG semantics.
+ */
+unsigned int x86_merge_dr6(const struct cpu_policy *p, unsigned int dr6,
+   unsigned int new);
+
 #endif /* _X86_DEBUGREG_H */
diff --git a/xen/arch/x86/include/asm/x86-defns.h b/xen/arch/x86/include/asm/x86-defns.h
index 5838631ef634..edfecc89bd08 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -116,6 +116,13 @@
 #define X86_DR6_BT  (_AC(1, UL) << 15)   /* Task switch */
 #define X86_DR6_RTM (_AC(1, UL) << 16)   /* #DB/#BP in RTM region (INV) */
 
+#define X86_DR6_BP_MASK \
+(X86_DR6_B0 | X86_DR6_B1 | X86_DR6_B2 | X86_DR6_B3)
+
+#define X86_DR6_KNOWN_MASK  \
+(X86_DR6_BP_MASK | X86_DR6_BLD | X86_DR6_BD | X86_DR6_BS |  \
+ X86_DR6_BT | X86_DR6_RTM)
+
 #define X86_DR6_ZEROS   _AC(0x1000, UL)  /* %dr6 bits forced to 0 */
 #define X86_DR6_DEFAULT _AC(0x0ff0, UL)  /* Default %dr6 value */
 
-- 
2.30.2




[PATCH 4/7] x86/pv: Drop priv_op_ctxt.bpmatch and use pending_dbg instead

2023-09-15 Thread Andrew Cooper
With a full pending_dbg field in x86_emulate_ctxt, use it rather than using a
local bpmatch field.

This simplifies the X86EMUL_OKAY path as singlestep is already accumulated by
x86_emulate() when appropriate.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * Tweak commit message to avoid referencing X86EMUL_DONE.
---
 xen/arch/x86/pv/emul-priv-op.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 0d9f84f458ba..2f73ed0682e1 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -34,7 +34,6 @@ struct priv_op_ctxt {
 unsigned long base, limit;
 } cs;
 char *io_emul_stub;
-unsigned int bpmatch;
 };
 
 /* I/O emulation helpers.  Use non-standard calling conventions. */
@@ -367,7 +366,8 @@ static int cf_check read_io(
 if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
 return X86EMUL_UNHANDLEABLE;
 
-poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+poc->ctxt.retire.pending_dbg |=
+check_guest_io_breakpoint(curr, port, bytes);
 
 if ( admin_io_okay(port, bytes, currd) )
 {
@@ -472,7 +472,8 @@ static int cf_check write_io(
 if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
 return X86EMUL_UNHANDLEABLE;
 
-poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+poc->ctxt.retire.pending_dbg |=
+check_guest_io_breakpoint(curr, port, bytes);
 
 if ( admin_io_okay(port, bytes, currd) )
 {
@@ -636,7 +637,8 @@ static int cf_check rep_ins(
 return X86EMUL_EXCEPTION;
 }
 
-poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+poc->ctxt.retire.pending_dbg |=
+check_guest_io_breakpoint(curr, port, bytes_per_rep);
 
 while ( *reps < goal )
 {
@@ -658,7 +660,7 @@ static int cf_check rep_ins(
 
 ++*reps;
 
-if ( poc->bpmatch || hypercall_preempt_check() )
+if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
 break;
 
 /* x86_emulate() clips the repetition count to ensure we don't wrap. */
@@ -703,7 +705,8 @@ static int cf_check rep_outs(
 return X86EMUL_EXCEPTION;
 }
 
-poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+poc->ctxt.retire.pending_dbg |=
+check_guest_io_breakpoint(curr, port, bytes_per_rep);
 
 while ( *reps < goal )
 {
@@ -726,7 +729,7 @@ static int cf_check rep_outs(
 
 ++*reps;
 
-if ( poc->bpmatch || hypercall_preempt_check() )
+if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
 break;
 
 /* x86_emulate() clips the repetition count to ensure we don't wrap. */
@@ -1360,12 +1363,9 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
 case X86EMUL_OKAY:
 ASSERT(!curr->arch.pv.trap_bounce.flags);
 
-if ( ctxt.ctxt.retire.singlestep )
-ctxt.bpmatch |= DR_STEP;
-
-if ( ctxt.bpmatch )
+if ( ctxt.ctxt.retire.pending_dbg )
 {
-curr->arch.dr6 |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
-curr->arch.dr6 |= ctxt.ctxt.retire.pending_dbg | DR_STATUS_RESERVED_ONE;
 pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
 }
 
-- 
2.30.2




[PATCH v2 0/7] x86/pv: #DB vs %dr6 fixes, part 2

2023-09-15 Thread Andrew Cooper
This time with a bit of sanity testing.  See patches for details.

Andrew Cooper (7):
  x86/emul: ASSERT that X86EMUL_DONE doesn't escape to callers
  x86/emul: Fix and extend #DB trap handling
  x86/pv: Fix the determination of whether to inject #DB
  x86/pv: Drop priv_op_ctxt.bpmatch and use pending_dbg instead
  x86: Introduce x86_merge_dr6()
  x86: Extend x86_event with a pending_dbg field
  x86/pv: Rewrite %dr6 handling

 xen/arch/x86/debug.c   | 20 +
 xen/arch/x86/include/asm/debugreg.h|  7 ++
 xen/arch/x86/include/asm/domain.h  | 18 ++--
 xen/arch/x86/include/asm/hvm/hvm.h |  3 ++-
 xen/arch/x86/include/asm/x86-defns.h   |  7 ++
 xen/arch/x86/pv/emul-priv-op.c | 30 +-
 xen/arch/x86/pv/emulate.c  |  9 ++--
 xen/arch/x86/pv/ro-page-fault.c|  4 ++--
 xen/arch/x86/pv/traps.c| 17 +++
 xen/arch/x86/traps.c   | 23 ++--
 xen/arch/x86/x86_emulate/x86_emulate.c | 26 --
 xen/arch/x86/x86_emulate/x86_emulate.h | 19 
 12 files changed, 134 insertions(+), 49 deletions(-)

-- 
2.30.2




[PATCH 3/7] x86/pv: Fix the determination of whether to inject #DB

2023-09-15 Thread Andrew Cooper
We long ago fixed the emulator to not inject exceptions behind our back.
Therefore, assert that a PV event (including interrupts, because that
would be buggy too) isn't pending, rather than skipping the #DB injection if
one is.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * Drop X86EMUL_DONE adjustment.
---
 xen/arch/x86/pv/emul-priv-op.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 142bc4818cb5..0d9f84f458ba 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1358,14 +1358,17 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
 switch ( rc )
 {
 case X86EMUL_OKAY:
+ASSERT(!curr->arch.pv.trap_bounce.flags);
+
 if ( ctxt.ctxt.retire.singlestep )
 ctxt.bpmatch |= DR_STEP;
+
 if ( ctxt.bpmatch )
 {
 curr->arch.dr6 |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
-if ( !(curr->arch.pv.trap_bounce.flags & TBF_EXCEPTION) )
-pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
 }
+
 /* fall through */
 case X86EMUL_RETRY:
 return EXCRET_fault_fixed;
-- 
2.30.2




[PATCH 1/7] x86/emul: ASSERT that X86EMUL_DONE doesn't escape to callers

2023-09-15 Thread Andrew Cooper
This property is far from clear.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * New
---
 xen/arch/x86/x86_emulate/x86_emulate.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index e88245eae9fb..94caec1d142c 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -8651,6 +8651,12 @@ int x86_emulate_wrapper(
 
 rc = x86_emulate(ctxt, ops);
 
+/*
+ * X86EMUL_DONE is an internal signal in the emulator, and is not expected
+ * to ever escape out to callers.
+ */
+ASSERT(rc != X86EMUL_DONE);
+
 /*
  * Most retire flags should only be set for successful instruction
  * emulation.
-- 
2.30.2




[PATCH 6/7] x86: Extend x86_event with a pending_dbg field

2023-09-15 Thread Andrew Cooper
... using the Intel VMCS PENDING_DBG semantics, and sharing storage with cr2.

This requires working around anonymous union bugs in obsolete versions of GCC,
which in turn needs to drop unnecessary const qualifiers.

Also introduce a pv_inject_DB() wrapper to use this field nicely.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jinoh Kang 

v2:
 * Split out of prior patch.
---
 xen/arch/x86/include/asm/domain.h  | 18 --
 xen/arch/x86/include/asm/hvm/hvm.h |  3 ++-
 xen/arch/x86/x86_emulate/x86_emulate.h |  5 -
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index c2d9fc333be5..fd1f306222be 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -729,15 +729,29 @@ static inline void pv_inject_hw_exception(unsigned int vector, int errcode)
 pv_inject_event(&event);
 }
 
+static inline void pv_inject_DB(unsigned long pending_dbg)
+{
+struct x86_event event = {
+.vector  = X86_EXC_DB,
+.type= X86_EVENTTYPE_HW_EXCEPTION,
+.error_code  = X86_EVENT_NO_EC,
+};
+
+event.pending_dbg = pending_dbg;
+
+pv_inject_event(&event);
+}
+
 static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
 {
-const struct x86_event event = {
+struct x86_event event = {
 .vector = X86_EXC_PF,
 .type = X86_EVENTTYPE_HW_EXCEPTION,
 .error_code = errcode,
-.cr2 = cr2,
 };
 
+event.cr2 = cr2;
+
 pv_inject_event(&event);
 }
 
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 6d53713fc3a9..ea966f4429f9 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -532,9 +532,10 @@ static inline void hvm_inject_page_fault(int errcode, unsigned long cr2)
 .vector = X86_EXC_PF,
 .type = X86_EVENTTYPE_HW_EXCEPTION,
 .error_code = errcode,
-.cr2 = cr2,
 };
 
+event.cr2 = cr2;
+
 hvm_inject_event(&event);
 }
 
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index fbc023c37e34..e567a9b635d9 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -78,7 +78,10 @@ struct x86_event {
 uint8_t   type; /* X86_EVENTTYPE_* */
 uint8_t   insn_len; /* Instruction length */
 int32_t   error_code;   /* X86_EVENT_NO_EC if n/a */
-unsigned long cr2;  /* Only for X86_EXC_PF h/w exception */
+union {
+unsigned long cr2; /* #PF */
+unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) */
+};
 };
 
 /*
-- 
2.30.2




Re: [PATCH 0/5] x86/pv: #DB vs %dr6 fixes, part 2

2023-09-15 Thread Andrew Cooper
On 13/09/2023 12:21 am, Andrew Cooper wrote:
> Slightly RFC.  This is the next chunk of debug fixes from the bug that Jinoh
> reported.

I was trying to do a bit of due diligence before posting v2, and have
made some discoveries.

pv/emul-priv-op SingleStep vs Branch Step
  https://gitlab.com/xen-project/xen/-/work_items/170

HVM IO Breakpoints:
  https://gitlab.com/xen-project/xen/-/work_items/171


A third, which I'm on the fence about, concerns PV guests and General
Detect.  We firmly prohibit PV guests from setting DR7.GD, but we let them
play with the DR6.GD bit as if it had been asserted.

It would be easy to put GD back into the set of reserved bits for DR6,
but it also wouldn't be hard to handle GD via dr7_emul and let the PV
guest have a more-normal looking set of debug functionality.

Thoughts?

~Andrew



[linux-linus test] 183008: regressions - FAIL

2023-09-15 Thread osstest service owner
flight 183008 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183008/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-coresched-amd64-xl  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 182531
 test-amd64-amd64-xl-xsm   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 182531
 test-amd64-amd64-xl-vhd   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 182531
 test-amd64-amd64-xl-shadow8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-credit1   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-pair12 xen-boot/src_hostfail REGR. vs. 182531
 test-amd64-amd64-pair13 xen-boot/dst_hostfail REGR. vs. 182531
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 182531
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd11-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-pygrub   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 
182531
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
182531

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  8 xen-boot fail REGR. vs. 182531

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182531
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182531
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182531
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 

Re: [PATCH] x86/shutdown: change default reboot method preference

2023-09-15 Thread Marek Marczykowski-Górecki
On Fri, Sep 15, 2023 at 09:11:50AM +0200, Roger Pau Monné wrote:
> On Thu, Sep 14, 2023 at 06:42:03PM +0100, Andrew Cooper wrote:
> > On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
> > > The current logic to choose the preferred reboot method is based on the
> > > mode Xen has been booted into, so if the box is booted from UEFI, the
> > > preferred reboot method will be to use the ResetSystem() run time
> > > service call.
> > >
> > > However, that call seems to be widely untested once the firmware has
> > > exited boot services, and quite often leads to a result similar to:
> > 
> > I don't know how true this is.  What Xen does differently to most of the
> > rest of the world is not calling SetVirtualAddressMap()
> 
> Hm, but that doesn't seem to affect the rest of the run-time services
> (at least on this box), it's (just?) ResetSystem() that misbehaves.

I've seen (too) many firmwares where also GetVariable crashes without
SetVirtualAddressMap(). That's why we have a build-time option to
actually call SetVirtualAddressMap().

Anyway, I agree that preferring ACPI reboot even when booted through
UEFI is a good idea.

> I can remove that sentence however, it is more of a guess rather than
> a fact.
> 
> > 
> > >
> > > Hardware Dom0 shutdown: rebooting machine
> > > [ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C]
> > > CPU:0
> > > RIP:e008:[<0017>] 0017
> > > RFLAGS: 00010202   CONTEXT: hypervisor
> > > [...]
> > > Xen call trace:
> > >[<0017>] R 0017
> > >[] S 83207eff7b50
> > >[] F machine_restart+0x1da/0x261
> > >[] F apic_wait_icr_idle+0/0x37
> > >[] F smp_call_function_interrupt+0xc7/0xcb
> > >[] F call_function_interrupt+0x20/0x34
> > >[] F do_IRQ+0x150/0x6f3
> > >[] F common_interrupt+0x132/0x140
> > >[] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> > >[] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> > >[] F arch/x86/domain.c#idle_loop+0xec/0xee
> > >
> > > 
> > > Panic on CPU 0:
> > > FATAL TRAP: vector = 6 (invalid opcode)
> > > 
> > >
> > > Which in most cases does lead to a reboot, however that's unreliable.
> > >
> > > Change the default reboot preference to prefer ACPI over UEFI if
> > > available and not in reduced hardware mode.
> > >
> > > This is in line with what Linux does, so it's unlikely to cause issues
> > > on current and future hardware, since there's a much higher chance of
> > > vendors testing hardware with Linux rather than Xen.
> > >
> > > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > > properly implemented ResetSystem() methods.
> > >
> > > Signed-off-by: Roger Pau Monné 
> > 
> > Wording aside, Reviewed-by: Andrew Cooper 
> > 
> > This is a giant embarrassment to Xen, and needs fixing.
> 
> Looking at Linux I think we need to add an override for Acer
> TravelMate X514-51T to force it to use UEFI, will send v2 with it.
> 
> I do wonder if we need to keep the reboot_dmi_table[] entries that
> force systems to use ACPI, as it would now be the default?  My
> thinking is that it's likely better to keep it just in case we change
> the default again in the future.
> 
> Thanks, Roger.
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: [PATCH v5 2/5] xen/ppc: Implement bitops.h

2023-09-15 Thread Shawn Anastasio
On 9/15/23 1:50 AM, Jan Beulich wrote:
> On 14.09.2023 20:15, Shawn Anastasio wrote:
>> On 9/13/23 2:29 AM, Jan Beulich wrote:
>>> On 12.09.2023 20:35, Shawn Anastasio wrote:
 --- a/xen/arch/ppc/include/asm/bitops.h
 +++ b/xen/arch/ppc/include/asm/bitops.h
 @@ -1,9 +1,335 @@
 +/* SPDX-License-Identifier: GPL-2.0-or-later */
 +/*
 + * Adapted from Linux's arch/powerpc/include/asm/bitops.h.
 + *
 + * Merged version by David Gibson .
 + * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don
 + * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard.  They
 + * originally took it from the ppc32 code.
 + */
  #ifndef _ASM_PPC_BITOPS_H
  #define _ASM_PPC_BITOPS_H

 +#include 
 +
 +#define __set_bit(n, p) set_bit(n, p)
 +#define __clear_bit(n, p)   clear_bit(n, p)
 +
 +#define BITOP_BITS_PER_WORD 32
 +#define BITOP_MASK(nr)  (1U << ((nr) % BITOP_BITS_PER_WORD))
 +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
 +#define BITS_PER_BYTE   8
 +
  /* PPC bit number conversion */
 -#define PPC_BITLSHIFT(be) (BITS_PER_LONG - 1 - (be))
 -#define PPC_BIT(bit)  (1UL << PPC_BITLSHIFT(bit))
 -#define PPC_BITMASK(bs, be)   ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
 +#define PPC_BITLSHIFT(be)    (BITS_PER_LONG - 1 - (be))
 +#define PPC_BIT(bit) (1UL << PPC_BITLSHIFT(bit))
 +#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
 +
 +/* Macro for generating the ***_bits() functions */
 +#define DEFINE_BITOP(fn, op, prefix)                                       \
 +static inline void fn(unsigned int mask,                                   \
 +                      volatile unsigned int *p_)                           \
 +{                                                                          \
 +    unsigned int old;                                                      \
 +    unsigned int *p = (unsigned int *)p_;                                  \
>>>
>>> What use is this, when ...
>>>
 +    asm volatile ( prefix                                                  \
 +                   "1: lwarx %0,0,%3,0\n"                                  \
 +                   #op "%I2 %0,%0,%2\n"                                    \
 +                   "stwcx. %0,0,%3\n"                                      \
 +                   "bne- 1b\n"                                             \
 +                   : "=&r" (old), "+m" (*p)                                \
>>>
>>> ... the "+m" operand isn't used and ...
>>>
 +                   : "rK" (mask), "r" (p)                                  \
 +                   : "cc", "memory" );                                     \
>>>
>>> ... there's a memory clobber anyway?
>>>
>>
>> I see what you're saying, and I'm not sure why it was written this way
>> in Linux. That said, this is the kind of thing that I'm hesitant to
>> change without knowing the rationale of the original author. If you are
>> confident that this can be dropped given that there is already a
>> memory clobber, I could drop it. Otherwise I'm inclined to leave it in a
>> state that matches Linux.
> 
> This being an arch-independent property, I am confident. Yet still you're
> the maintainer, so if you want to keep it like this initially, that'll be
> okay. If I feel bothered enough, I could always send a patch afterwards.
>

Okay, for now then I will leave it as-is.

> Jan

Thanks,
Shawn



Re: [QEMU PATCH v5 06/13] virtio-gpu: Support context init feature with virglrenderer

2023-09-15 Thread Akihiko Odaki

On 2023/09/15 20:11, Huang Rui wrote:

Patch "virtio-gpu: CONTEXT_INIT feature" has added the context_init
feature flags.
We would like to enable the feature with virglrenderer, so add support for
creating a virgl renderer context with flags, using context_id when valid.

Originally-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
 - Inverted patch 5 and 6 because we should configure
   HAVE_VIRGL_CONTEXT_INIT firstly. (Philippe)

  hw/display/virtio-gpu-virgl.c | 13 +++--
  hw/display/virtio-gpu.c   |  2 ++
  2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 8bb7a2c21f..312953ec16 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -106,8 +106,17 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
  trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
  cc.debug_name);
  
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+#ifdef HAVE_VIRGL_CONTEXT_INIT
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+return;
+#endif
+}
+
+virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
  }
  
  static void virgl_cmd_context_destroy(VirtIOGPU *g,

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3e658f1fef..a66cbd9930 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1506,6 +1506,8 @@ static Property virtio_gpu_properties[] = {
  DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
  VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
  DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
+DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
  DEFINE_PROP_END_OF_LIST(),
  };
  


I think it's more convenient if this feature is enabled by default.



Re: [QEMU PATCH v5 10/13] virtio-gpu: Resource UUID

2023-09-15 Thread Akihiko Odaki

On 2023/09/15 20:11, Huang Rui wrote:

From: Antonio Caggiano 

Enable resource UUID feature and implement command resource assign UUID.
This is done by introducing a hash table to map resource IDs to their
UUIDs.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
 - Add virtio migration handling for uuid (Akihiko)
 - Adjust sequence to allocate gpu resource before virglrender resource
   creation (Akihiko)
 - Clean up (Akihiko)

  hw/display/trace-events|  1 +
  hw/display/virtio-gpu-base.c   |  2 ++
  hw/display/virtio-gpu-virgl.c  | 21 
  hw/display/virtio-gpu.c| 58 ++
  include/hw/virtio/virtio-gpu.h |  6 
  5 files changed, 88 insertions(+)

diff --git a/hw/display/trace-events b/hw/display/trace-events
index 2336a0ca15..54d6894c59 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -41,6 +41,7 @@ virtio_gpu_cmd_res_create_blob(uint32_t res, uint64_t size) "res 
0x%x, size %" P
  virtio_gpu_cmd_res_unref(uint32_t res) "res 0x%x"
  virtio_gpu_cmd_res_back_attach(uint32_t res) "res 0x%x"
  virtio_gpu_cmd_res_back_detach(uint32_t res) "res 0x%x"
+virtio_gpu_cmd_res_assign_uuid(uint32_t res) "res 0x%x"
  virtio_gpu_cmd_res_xfer_toh_2d(uint32_t res) "res 0x%x"
  virtio_gpu_cmd_res_xfer_toh_3d(uint32_t res) "res 0x%x"
  virtio_gpu_cmd_res_xfer_fromh_3d(uint32_t res) "res 0x%x"
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..f44388715c 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -236,6 +236,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
  features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
  }
  
 +features |= (1 << VIRTIO_GPU_F_RESOURCE_UUID);
 +
  return features;
  }
  
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c

index 563a6f2f58..8a017dbeb4 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -36,11 +36,20 @@ static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
  {
  struct virtio_gpu_resource_create_2d c2d;
  struct virgl_renderer_resource_create_args args;
+struct virtio_gpu_simple_resource *res;
  
  VIRTIO_GPU_FILL_CMD(c2d);

  trace_virtio_gpu_cmd_res_create_2d(c2d.resource_id, c2d.format,
 c2d.width, c2d.height);
  
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+if (!res) {
+cmd->error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
+}
+res->resource_id = c2d.resource_id;
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+


The struct virtio_gpu_simple_resource for a resource created with 
virgl_cmd_create_resource_2d() and virgl_resource_attach_backing() will 
not have iov and iov_cnt set, which is inconsistent with 
virgl_cmd_resource_create_blob().



  args.handle = c2d.resource_id;
  args.target = 2;
  args.format = c2d.format;
@@ -60,11 +69,20 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
  {
  struct virtio_gpu_resource_create_3d c3d;
  struct virgl_renderer_resource_create_args args;
+struct virtio_gpu_simple_resource *res;
  
  VIRTIO_GPU_FILL_CMD(c3d);

  trace_virtio_gpu_cmd_res_create_3d(c3d.resource_id, c3d.format,
 c3d.width, c3d.height, c3d.depth);
  
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+if (!res) {
+cmd->error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
+}
+res->resource_id = c3d.resource_id;
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+
  args.handle = c3d.resource_id;
  args.target = c3d.target;
  args.format = c3d.format;
@@ -682,6 +700,9 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
  /* TODO add security */
  virgl_cmd_ctx_detach_resource(g, cmd);
  break;
+case VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID:
+virtio_gpu_resource_assign_uuid(g, cmd);
+break;
  case VIRTIO_GPU_CMD_GET_CAPSET_INFO:
  virgl_cmd_get_capset_info(g, cmd);
  break;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index cc4c1f81bb..44414c1c5e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -966,6 +966,38 @@ virtio_gpu_resource_detach_backing(VirtIOGPU *g,
  virtio_gpu_cleanup_mapping(g, res);
  }
  
+void virtio_gpu_resource_assign_uuid(VirtIOGPU *g,
+                                     struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_assign_uuid assign;
+struct virtio_gpu_resp_resource_uuid resp;
+QemuUUID *uuid;
+
+VIRTIO_GPU_FILL_CMD(assign);
+virtio_gpu_bswap_32(&assign, sizeof(assign));
+trace_virtio_gpu_cmd_res_assign_uuid(assign.resource_id);
+
+res = virtio_gpu_find_check_resource(g, assign.resource_id, false,
+                                     __func__, &cmd->error);
+if (!res) {
+return;
+}
+

Re: [QEMU PATCH v5 09/13] virtio-gpu: Handle resource blob commands

2023-09-15 Thread Akihiko Odaki

On 2023/09/16 1:04, Akihiko Odaki wrote:

On 2023/09/15 20:11, Huang Rui wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

V4 -> V5:
 - Use memory_region_init_ram_ptr() instead of
   memory_region_init_ram_device_ptr() (Akihiko)

  hw/display/virtio-gpu-virgl.c  | 213 +
  hw/display/virtio-gpu.c    |   4 +-
  include/hw/virtio/virtio-gpu.h |   5 +
  meson.build    |   4 +
  4 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c 
b/hw/display/virtio-gpu-virgl.c

index 312953ec16..563a6f2f58 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -17,6 +17,7 @@
  #include "trace.h"
  #include "hw/virtio/virtio.h"
  #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
  #include "ui/egl-helpers.h"
@@ -78,9 +79,24 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
   virgl_renderer_resource_create(&args, NULL, 0);
  }
+static void virgl_resource_destroy(VirtIOGPU *g,
+                                   struct virtio_gpu_simple_resource *res)
+{
+    if (!res)
+    return;
+
+    QTAILQ_REMOVE(&g->reslist, res, next);
+
+    virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
+    g_free(res->addrs);
+
+    g_free(res);
+}
+
   static void virgl_cmd_resource_unref(VirtIOGPU *g,
                                        struct virtio_gpu_ctrl_command *cmd)
   {
+    struct virtio_gpu_simple_resource *res;
  struct virtio_gpu_resource_unref unref;
  struct iovec *res_iovs = NULL;
  int num_iovs = 0;
@@ -88,13 +104,22 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(unref);
  trace_virtio_gpu_cmd_res_unref(unref.resource_id);
+    res = virtio_gpu_find_resource(g, unref.resource_id);
+
   virgl_renderer_resource_detach_iov(unref.resource_id,
                                      &res_iovs,
                                      &num_iovs);
  if (res_iovs != NULL && num_iovs != 0) {
  virtio_gpu_cleanup_mapping_iov(g, res_iovs, num_iovs);
+    if (res) {
+    res->iov = NULL;
+    res->iov_cnt = 0;
+    }
  }
+
  virgl_renderer_resource_unref(unref.resource_id);
+
+    virgl_resource_destroy(g, res);


This may leak memory region.



Re: [QEMU PATCH v5 09/13] virtio-gpu: Handle resource blob commands

2023-09-15 Thread Akihiko Odaki

On 2023/09/15 20:11, Huang Rui wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

V4 -> V5:
 - Use memory_region_init_ram_ptr() instead of
   memory_region_init_ram_device_ptr() (Akihiko)

  hw/display/virtio-gpu-virgl.c  | 213 +
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   5 +
  meson.build|   4 +
  4 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 312953ec16..563a6f2f58 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -17,6 +17,7 @@
  #include "trace.h"
  #include "hw/virtio/virtio.h"
  #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
  
  #include "ui/egl-helpers.h"
  
@@ -78,9 +79,24 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,

   virgl_renderer_resource_create(&args, NULL, 0);
  }
  
+static void virgl_resource_destroy(VirtIOGPU *g,
+                                   struct virtio_gpu_simple_resource *res)
+{
+if (!res)
+return;
+
+QTAILQ_REMOVE(&g->reslist, res, next);
+
+virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
+g_free(res->addrs);
+
+g_free(res);
+}
+
  static void virgl_cmd_resource_unref(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
+struct virtio_gpu_simple_resource *res;
  struct virtio_gpu_resource_unref unref;
  struct iovec *res_iovs = NULL;
  int num_iovs = 0;
@@ -88,13 +104,22 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(unref);
  trace_virtio_gpu_cmd_res_unref(unref.resource_id);
  
+res = virtio_gpu_find_resource(g, unref.resource_id);
+
   virgl_renderer_resource_detach_iov(unref.resource_id,
                                      &res_iovs,
                                      &num_iovs);
  if (res_iovs != NULL && num_iovs != 0) {
  virtio_gpu_cleanup_mapping_iov(g, res_iovs, num_iovs);
+if (res) {
+res->iov = NULL;
+res->iov_cnt = 0;
+}
  }
+
  virgl_renderer_resource_unref(unref.resource_id);
+
+virgl_resource_destroy(g, res);
  }
  
  static void virgl_cmd_context_create(VirtIOGPU *g,

@@ -426,6 +451,183 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
  g_free(resp);
  }
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+
+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_create_blob cblob;
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+int ret;
+
+VIRTIO_GPU_FILL_CMD(cblob);
+virtio_gpu_create_blob_bswap(&cblob);
+trace_virtio_gpu_cmd_res_create_blob(cblob.resource_id, cblob.size);
+
+if (cblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_find_resource(g, cblob.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, cblob.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+if (!res) {
+cmd->error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
+}
+
+res->resource_id = cblob.resource_id;
+res->blob_size = cblob.size;
+
+if (cblob.blob_mem != VIRTIO_GPU_BLOB_MEM_HOST3D) {
+ret = virtio_gpu_create_mapping_iov(g, cblob.nr_entries, sizeof(cblob),
+                                    cmd, &res->addrs, &res->iov,
+                                    &res->iov_cnt);
+if (!ret) {
+g_free(res);
+cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
+return;
+}
+}
+
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+
+virgl_args.res_handle = cblob.resource_id;
+virgl_args.ctx_id = cblob.hdr.ctx_id;
+virgl_args.blob_mem = cblob.blob_mem;
+virgl_args.blob_id = cblob.blob_id;
+virgl_args.blob_flags = cblob.blob_flags;
+virgl_args.size = cblob.size;
+virgl_args.iovecs = res->iov;
+virgl_args.num_iovs = res->iov_cnt;
+
+ret = virgl_renderer_resource_create_blob(&virgl_args);
+if (ret) {
+virgl_resource_destroy(g, res);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: virgl blob create error: %s\n",
+  __func__, 

Re: [QEMU PATCH v5 06/13] virtio-gpu: Support context init feature with virglrenderer

2023-09-15 Thread Akihiko Odaki

On 2023/09/15 20:11, Huang Rui wrote:

Patch "virtio-gpu: CONTEXT_INIT feature" has added the context_init
feature flags.
We would like to enable the feature with virglrenderer, so add support for
creating a virgl renderer context with flags, using context_id when valid.

Originally-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
 - Inverted patch 5 and 6 because we should configure
   HAVE_VIRGL_CONTEXT_INIT firstly. (Philippe)

  hw/display/virtio-gpu-virgl.c | 13 +++--
  hw/display/virtio-gpu.c   |  2 ++
  2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 8bb7a2c21f..312953ec16 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -106,8 +106,17 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
  trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
  cc.debug_name);
  
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+#ifdef HAVE_VIRGL_CONTEXT_INIT
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+return;
+#endif


This should deal with the case when context_init is set while 
HAVE_VIRGL_CONTEXT_INIT is not defined.




Re: [QEMU PATCH v5 07/13] softmmu/memory: enable automatic deallocation of memory regions

2023-09-15 Thread Akihiko Odaki

On 2023/09/15 20:11, Huang Rui wrote:

From: Xenia Ragiadakou 

When a memory region has a different life-cycle from that of its parent, it
could be automatically released, once it has been unparented and all of its
references have gone away, via the object's free callback.

However, currently, references to the memory region are held by its owner
without first incrementing the memory region object's reference count.
As a result, the automatic deallocation of the object, not taking into
account those references, results in use-after-free memory corruption.

This patch increases the reference count of an owned memory region object
on each memory_region_ref() and decreases it on each memory_region_unref().

Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

V4 -> V5:
 - ref/unref only owned memory regions (Akihiko)

  softmmu/memory.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..15e1699750 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1800,6 +1800,9 @@ void memory_region_ref(MemoryRegion *mr)
  /* MMIO callbacks most likely will access data that belongs
   * to the owner, hence the need to ref/unref the owner whenever
   * the memory region is in use.
+ * Likewise, the owner keeps references to the memory region,
+ * hence the need to ref/unref the memory region object to prevent
+ * its automatic deallocation while still referenced by its owner.


This comment does not make sense. Traditionally no such automatic 
deallocation happens, so the owner has always been required to free the 
memory region when it gets finalized.


"[QEMU PATCH v5 09/13] virtio-gpu: Handle resource blob commands" 
introduces a different kind of memory region, which can be freed anytime 
before the device gets finalized. Even in this case, the owner removes 
its reference to the memory region by doing res->region = NULL;


Regards,
Akihiko Odaki



[PATCH 9/9] x86/spec-ctrl: Mitigate the Zen1 DIV leakage

2023-09-15 Thread Andrew Cooper
In the Zen1 microarchitecture, there is one divider in the pipeline which
services uops from both threads.  In the case of #DE, the latched result from
the previous DIV to execute will be forwarded speculatively.

This is an interesting covert channel that allows two threads to communicate
without any system calls.  It also allows userspace to obtain the result of
the most recent DIV instruction executed (even speculatively) in the core,
which can be from a higher privilege context.

Scrub the result from the divider by executing a non-faulting divide.  This
needs performing on the exit-to-guest paths, and ist_exit-to-Xen.

This is XSA-439 / CVE-2023-20588.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

No embargo - this is already public.  XSA paperwork to follow.

v2:
 * Rebase over the introduction of is_zen1_uarch().
 * Fix the SC_DIV bit not to alias SC_VERW_IDLE.
 * Extend comments.
---
 docs/misc/xen-command-line.pandoc|  6 ++-
 xen/arch/x86/hvm/svm/entry.S |  1 +
 xen/arch/x86/include/asm/cpufeatures.h   |  2 +-
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 17 
 xen/arch/x86/spec_ctrl.c | 49 +++-
 5 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index f88e6a70aed6..7acd68885656 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2353,7 +2353,7 @@ By default SSBD will be mitigated at runtime (i.e 
`ssbd=runtime`).
 >  {msr-sc,rsb,md-clear,ibpb-entry}=|{pv,hvm}=,
 >  bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >  eager-fpu,l1d-flush,branch-harden,srb-lock,
->  unpriv-mmio,gds-mit}= ]`
+>  unpriv-mmio,gds-mit,div-scrub}= ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2475,6 +2475,10 @@ has elected not to lock the configuration, Xen will use 
GDS_CTRL to mitigate
 GDS with.  Otherwise, Xen will mitigate by disabling AVX, which blocks the use
 of the AVX2 Gather instructions.
 
+On all hardware, the `div-scrub=` option can be used to force or prevent Xen
+from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
+DIV-leakage on hardware believed to be vulnerable.
+
 ### sync_console
 > `= `
 
diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 9effd2199ba0..c52528fed4cf 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -72,6 +72,7 @@ __UNLIKELY_END(nsvm_hap)
 1:  /* No Spectre v1 concerns.  Execution will hit VMRUN imminently. */
 .endm
 ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
+ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
 
 pop  %r15
 pop  %r14
diff --git a/xen/arch/x86/include/asm/cpufeatures.h 
b/xen/arch/x86/include/asm/cpufeatures.h
index da0593de8542..c3aad21c3b43 100644
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -35,7 +35,7 @@ XEN_CPUFEATURE(SC_RSB_HVM,X86_SYNTH(19)) /* RSB 
overwrite needed for HVM
 XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen 
itself */
 XEN_CPUFEATURE(SC_MSR_IDLE,   X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on 
idle */
 XEN_CPUFEATURE(XEN_LBR,   X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR 
*/
-/* Bits 23 unused. */
+XEN_CPUFEATURE(SC_DIV,X86_SYNTH(23)) /* DIV scrub needed */
 XEN_CPUFEATURE(SC_RSB_IDLE,   X86_SYNTH(24)) /* RSB overwrite needed for 
idle. */
 XEN_CPUFEATURE(SC_VERW_IDLE,  X86_SYNTH(25)) /* VERW used by Xen for idle 
*/
 XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks 
*/
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 9a27e3170347..5c5b3b6f5324 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -165,6 +165,19 @@
 .L\@_verw_skip:
 .endm
 
+.macro DO_SPEC_CTRL_DIV
+/*
+ * Requires nothing
+ * Clobbers %rax
+ *
+ * Issue a DIV for its flushing side effect (Zen1 uarch specific).  Any
+ * non-faulting DIV will do; a byte DIV has least latency, and doesn't clobber
+ * %rdx.
+ */
+mov $1, %eax
+div %al
+.endm
+
 .macro DO_SPEC_CTRL_ENTRY maybexen:req
 /*
  * Requires %rsp=regs (also cpuinfo if !maybexen)
@@ -267,6 +280,8 @@
 ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
 
 DO_SPEC_CTRL_COND_VERW
+
+ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
 .endm
 
 /*
@@ -379,6 +394,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
 verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
 .L\@_skip_verw:
 
+ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
+
 .L\@_skip_ist_exit:
 .endm
 
diff --git a/xen/arch/x86/spec_ctrl.c 

[PATCH 7/9] x86/spec-ctrl: Issue VERW during IST exit to Xen

2023-09-15 Thread Andrew Cooper
There is a corner case where e.g. an NMI hitting an exit-to-guest path after
SPEC_CTRL_EXIT_TO_* would have run the entire NMI handler *after* the VERW
flush to scrub potentially sensitive data from uarch buffers.

In order to compensate, issue VERW when exiting to Xen from an IST entry.

SPEC_CTRL_EXIT_TO_XEN already has two reads of spec_ctrl_flags off the stack,
and we're about to add a third.  Load the field into %ebx, and list the
register as clobbered.

%r12 has been arranged to be the ist_exit signal, so add this as an input
dependency and use it to identify when to issue a VERW.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

Note to reviewers:  .L\@_skip_verw and .L\@_skip_ist_exit are separate to
reduce the churn in the following patch.

v2:
 * Rename .L\@_skip_verw
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 20 +++-
 xen/arch/x86/x86_64/entry.S  |  2 +-
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index b696033240e4..9a27e3170347 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -345,10 +345,12 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  */
 .macro SPEC_CTRL_EXIT_TO_XEN
 /*
- * Requires %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
+ * Requires %r12=ist_exit, %r14=stack_end
+ * Clobbers %rax, %rbx, %rcx, %rdx
  */
-testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
+
+testb $SCF_ist_sc_msr, %bl
 jz .L\@_skip_sc_msr
 
 /*
@@ -359,7 +361,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  */
 xor %edx, %edx
 
-testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+testb $SCF_use_shadow, %bl
 jz .L\@_skip_sc_msr
 
 mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
@@ -368,8 +370,16 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
 
 .L\@_skip_sc_msr:
 
-/* TODO VERW */
+test %r12, %r12
+jz .L\@_skip_ist_exit
+
+/* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo 
dependency */
+testb $SCF_verw, %bl
+jz .L\@_skip_verw
+verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
+.L\@_skip_verw:
 
+.L\@_skip_ist_exit:
 .endm
 
 #endif /* __ASSEMBLY__ */
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index e5055e5bbf9f..988ef6cbc628 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -687,7 +687,7 @@ UNLIKELY_START(ne, exit_cr3)
 UNLIKELY_END(exit_cr3)
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-SPEC_CTRL_EXIT_TO_XEN /* Req: %r14=end, Clob: acd */
+SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end, Clob: abcd */
 
 RESTORE_ALL adj=8
 iretq
-- 
2.30.2




[PATCH 2/9] x86/spec-ctrl: Fold DO_SPEC_CTRL_EXIT_TO_XEN into its single user

2023-09-15 Thread Andrew Cooper
With the SPEC_CTRL_EXIT_TO_XEN{,_IST} confusion fixed, it's now obvious that
there's only a single EXIT_TO_XEN path.  Fold DO_SPEC_CTRL_EXIT_TO_XEN into
SPEC_CTRL_EXIT_TO_XEN to simplify further fixes.

When merging labels, switch the name to .L\@_skip_sc_msr as "skip" on its own
is going to be too generic shortly.

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 40 ++--
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index cfba35560333..72e7046f70d6 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -199,27 +199,6 @@
 wrmsr
 .endm
 
-.macro DO_SPEC_CTRL_EXIT_TO_XEN
-/*
- * Requires %rbx=stack_end
- * Clobbers %rax, %rcx, %rdx
- *
- * When returning to Xen context, look to see whether SPEC_CTRL shadowing is
- * in effect, and reload the shadow value.  This covers race conditions which
- * exist with an NMI/MCE/etc hitting late in the return-to-guest path.
- */
-xor %edx, %edx
-
-testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-jz .L\@_skip
-
-mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
-mov $MSR_SPEC_CTRL, %ecx
-wrmsr
-
-.L\@_skip:
-.endm
-
 .macro DO_SPEC_CTRL_EXIT_TO_GUEST
 /*
  * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
@@ -328,11 +307,24 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  * Clobbers %rax, %rcx, %rdx
  */
 testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-jz .L\@_skip
+jz .L\@_skip_sc_msr
 
-DO_SPEC_CTRL_EXIT_TO_XEN
+/*
+ * When returning to Xen context, look to see whether SPEC_CTRL shadowing
+ * is in effect, and reload the shadow value.  This covers race conditions
+ * which exist with an NMI/MCE/etc hitting late in the return-to-guest
+ * path.
+ */
+xor %edx, %edx
 
-.L\@_skip:
+testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+jz .L\@_skip_sc_msr
+
+mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
+mov $MSR_SPEC_CTRL, %ecx
+wrmsr
+
+.L\@_skip_sc_msr:
 .endm
 
 #endif /* __ASSEMBLY__ */
-- 
2.30.2




[PATCH 8/9] x86/amd: Introduce is_zen{1,2}_uarch() predicates

2023-09-15 Thread Andrew Cooper
We already have 3 cases using STIBP as a Zen1/2 heuristic, and are about to
introduce a 4th.  Wrap the heuristic into a pair of predicates rather than
open-coding it, and the explanation of the heuristic, at each usage site.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * New
---
 xen/arch/x86/cpu/amd.c | 18 --
 xen/arch/x86/include/asm/amd.h | 11 +++
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index bbf7887f2e1d..4f27187f92ec 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -882,15 +882,13 @@ void amd_set_legacy_ssbd(bool enable)
  * non-branch instructions to be ignored.  It is to be set unilaterally in
  * newer microcode.
  *
- * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
- * simple model number comparison, so use STIBP as a heuristic to separate the
- * two uarches in Fam17h(AMD)/18h(Hygon).
+ * This chickenbit is something unrelated on Zen1.
  */
 void amd_init_spectral_chicken(void)
 {
uint64_t val, chickenbit = 1 << 1;
 
-   if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+   if (cpu_has_hypervisor || !is_zen2_uarch())
return;
 
if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
@@ -939,11 +937,8 @@ void amd_check_zenbleed(void)
 * With the Fam17h check above, most parts getting here are
 * Zen1.  They're not affected.  Assume Zen2 ones making it
 * here are affected regardless of microcode version.
-*
-* Zen1 vs Zen2 isn't a simple model number comparison, so use
-* STIBP as a heuristic to distinguish.
 */
-   if (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
+   if (is_zen1_uarch())
return;
good_rev = ~0U;
break;
@@ -1298,12 +1293,7 @@ static int __init cf_check zen2_c6_errata_check(void)
 */
s_time_t delta;
 
-   /*
-* Zen1 vs Zen2 isn't a simple model number comparison, so use STIBP as
-* a heuristic to separate the two uarches in Fam17h.
-*/
-   if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 ||
-   !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+   if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 || !is_zen2_uarch())
return 0;
 
/*
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index 09ee52dc1c09..d862cb7972a1 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -140,6 +140,17 @@
AMD_MODEL_RANGE(0x11, 0x0, 0x0, 0xff, 0xf), \
AMD_MODEL_RANGE(0x12, 0x0, 0x0, 0xff, 0xf))
 
+/*
+ * The Zen1 and Zen2 microarchitectures are implemented by AMD (Fam17h) and
+ * Hygon (Fam18h) but without simple model number rules.  Instead, use STIBP
+ * as a heuristic that distinguishes the two.
+ *
+ * The caller is required to perform the appropriate vendor/family checks
+ * first.
+ */
+#define is_zen1_uarch() (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
+#define is_zen2_uarch()   boot_cpu_has(X86_FEATURE_AMD_STIBP)
+
 struct cpuinfo_x86;
 int cpu_has_amd_erratum(const struct cpuinfo_x86 *, int, ...);
 
-- 
2.30.2




[PATCH 3/9] x86/spec-ctrl: Turn the remaining SPEC_CTRL_{ENTRY,EXIT}_* into asm macros

2023-09-15 Thread Andrew Cooper
These have grown more complex over time, with some already having been
converted.

Provide full Requires/Clobbers comments, otherwise missing at this level of
indirection.

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 37 ++--
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 72e7046f70d6..f768b0f48a0b 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -219,26 +219,45 @@
 .endm
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
-#define SPEC_CTRL_ENTRY_FROM_PV \
+.macro SPEC_CTRL_ENTRY_FROM_PV
+/*
+ * Requires %rsp=regs/cpuinfo, %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ */
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0), \
-X86_FEATURE_IBPB_ENTRY_PV;  \
-ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;\
+X86_FEATURE_IBPB_ENTRY_PV
+
+ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV
+
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0), \
 X86_FEATURE_SC_MSR_PV
+.endm
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
-#define SPEC_CTRL_ENTRY_FROM_INTR   \
+.macro SPEC_CTRL_ENTRY_FROM_INTR
+/*
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ */
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1), \
-X86_FEATURE_IBPB_ENTRY_PV;  \
-ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;\
+X86_FEATURE_IBPB_ENTRY_PV
+
+ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV
+
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
 X86_FEATURE_SC_MSR_PV
+.endm
 
 /* Use when exiting to PV guest context. */
-#define SPEC_CTRL_EXIT_TO_PV\
-ALTERNATIVE "", \
-DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;  \
+.macro SPEC_CTRL_EXIT_TO_PV
+/*
+ * Requires %rax=spec_ctrl, %rsp=regs/info
+ * Clobbers %rcx, %rdx
+ */
+ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+
 DO_SPEC_CTRL_COND_VERW
+.endm
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
-- 
2.30.2




[PATCH 5/9] x86/entry: Adjust restore_all_xen to hold stack_end in %r14

2023-09-15 Thread Andrew Cooper
All other SPEC_CTRL_{ENTRY,EXIT}_* helpers hold stack_end in %r14.  Adjust it
for consistency.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 8 
 xen/arch/x86/x86_64/entry.S  | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 8996fe3fc0ef..b696033240e4 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -345,10 +345,10 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  */
 .macro SPEC_CTRL_EXIT_TO_XEN
 /*
- * Requires %rbx=stack_end
+ * Requires %r14=stack_end
  * Clobbers %rax, %rcx, %rdx
  */
-testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 jz .L\@_skip_sc_msr
 
 /*
@@ -359,10 +359,10 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  */
 xor %edx, %edx
 
-testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 jz .L\@_skip_sc_msr
 
-mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
+mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
 mov $MSR_SPEC_CTRL, %ecx
 wrmsr
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index a1c860f56949..525877e97330 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -665,15 +665,15 @@ restore_all_xen:
  * Check whether we need to switch to the per-CPU page tables, in
  * case we return to late PV exit code (from an NMI or #MC).
  */
-GET_STACK_END(bx)
-cmpb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
+GET_STACK_END(14)
+cmpb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
 UNLIKELY_START(ne, exit_cr3)
-mov   STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax
+mov   STACK_CPUINFO_FIELD(pv_cr3)(%r14), %rax
 mov   %rax, %cr3
 UNLIKELY_END(exit_cr3)
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-SPEC_CTRL_EXIT_TO_XEN /* Req: %rbx=end, Clob: acd */
+SPEC_CTRL_EXIT_TO_XEN /* Req: %r14=end, Clob: acd */
 
 RESTORE_ALL adj=8
 iretq
-- 
2.30.2




[PATCH 1/9] x86/spec-ctrl: Fix confusion between SPEC_CTRL_EXIT_TO_XEN{,_IST}

2023-09-15 Thread Andrew Cooper
c/s 3fffaf9c13e9 ("x86/entry: Avoid using alternatives in NMI/#MC paths")
dropped the only user, leaving behind the (incorrect) implication that Xen had
split exit paths.

Delete the unused SPEC_CTRL_EXIT_TO_XEN and rename SPEC_CTRL_EXIT_TO_XEN_IST
to SPEC_CTRL_EXIT_TO_XEN for consistency.

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Tweak comment.
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 10 ++
 xen/arch/x86/x86_64/entry.S  |  2 +-
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index f48f9e75e8dc..cfba35560333 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -67,7 +67,6 @@
  *  - SPEC_CTRL_ENTRY_FROM_PV
  *  - SPEC_CTRL_ENTRY_FROM_INTR
  *  - SPEC_CTRL_ENTRY_FROM_INTR_IST
- *  - SPEC_CTRL_EXIT_TO_XEN_IST
  *  - SPEC_CTRL_EXIT_TO_XEN
  *  - SPEC_CTRL_EXIT_TO_PV
  *
@@ -256,11 +255,6 @@
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
 X86_FEATURE_SC_MSR_PV
 
-/* Use when exiting to Xen context. */
-#define SPEC_CTRL_EXIT_TO_XEN   \
-ALTERNATIVE "", \
-DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_SC_MSR_PV
-
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV\
 ALTERNATIVE "", \
@@ -327,8 +321,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
 UNLIKELY_END(\@_serialise)
 .endm
 
-/* Use when exiting to Xen in IST context. */
-.macro SPEC_CTRL_EXIT_TO_XEN_IST
+/* Use when exiting to Xen context. */
+.macro SPEC_CTRL_EXIT_TO_XEN
 /*
  * Requires %rbx=stack_end
  * Clobbers %rax, %rcx, %rdx
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 81dd2c74b876..a1c860f56949 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -673,7 +673,7 @@ UNLIKELY_START(ne, exit_cr3)
 UNLIKELY_END(exit_cr3)
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-SPEC_CTRL_EXIT_TO_XEN_IST /* Req: %rbx=end, Clob: acd */
+SPEC_CTRL_EXIT_TO_XEN /* Req: %rbx=end, Clob: acd */
 
 RESTORE_ALL adj=8
 iretq
-- 
2.30.2




[PATCH 4/9] x86/spec-ctrl: Improve all SPEC_CTRL_{ENTER,EXIT}_* comments

2023-09-15 Thread Andrew Cooper
... to better explain how they're used.

Doing so highlights that SPEC_CTRL_EXIT_TO_XEN is missing a VERW flush for the
corner case when e.g. an NMI hits late in an exit-to-guest path.

Leave a TODO, which will be addressed in subsequent patches which arrange for
DO_COND_VERW to be safe within SPEC_CTRL_EXIT_TO_XEN.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

This was decided not to be XSA-worthy, as guests can't usefully control when
IST events occur.

v2:
 * Rewrite.
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 36 
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index f768b0f48a0b..8996fe3fc0ef 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -218,7 +218,10 @@
 wrmsr
 .endm
 
-/* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
+/*
+ * Used after an entry from PV context: SYSCALL, SYSENTER, INT,
+ * etc.  There is always a guest speculation state in context.
+ */
 .macro SPEC_CTRL_ENTRY_FROM_PV
 /*
  * Requires %rsp=regs/cpuinfo, %rdx=0
@@ -233,7 +236,11 @@
 X86_FEATURE_SC_MSR_PV
 .endm
 
-/* Use in interrupt/exception context.  May interrupt Xen or PV context. */
+/*
+ * Used after an exception or maskable interrupt, hitting Xen or PV context.
+ * There will either be a guest speculation context, or (barring fatal
+ * exceptions) a well-formed Xen speculation context.
+ */
 .macro SPEC_CTRL_ENTRY_FROM_INTR
 /*
  * Requires %rsp=regs, %r14=stack_end, %rdx=0
@@ -248,7 +255,10 @@
 X86_FEATURE_SC_MSR_PV
 .endm
 
-/* Use when exiting to PV guest context. */
+/*
+ * Used when exiting from any entry context, back to PV context.  This
+ * includes from an IST entry which moved onto the primary stack.
+ */
 .macro SPEC_CTRL_EXIT_TO_PV
 /*
  * Requires %rax=spec_ctrl, %rsp=regs/info
@@ -260,7 +270,13 @@
 .endm
 
 /*
- * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
+ * Used after an IST entry hitting Xen or PV context.  Special care is needed,
+ * because when hitting Xen context, there may not be a well-formed speculation
+ * context.  (i.e. it can hit in the middle of SPEC_CTRL_{ENTRY,EXIT}_*
+ * regions.)
+ *
+ * An IST entry which hits PV context moves onto the primary stack and leaves
+ * via SPEC_CTRL_EXIT_TO_PV, *not* SPEC_CTRL_EXIT_TO_XEN.
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
@@ -319,7 +335,14 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
 UNLIKELY_END(\@_serialise)
 .endm
 
-/* Use when exiting to Xen context. */
+/*
+ * Use when exiting from any entry context, back to Xen context.  This
+ * includes returning to other SPEC_CTRL_{ENTRY,EXIT}_* regions with an
+ * incomplete speculation context.
+ *
+ * Because we might have interrupted Xen beyond SPEC_CTRL_EXIT_TO_$GUEST, we
+ * need to treat this as if it were an EXIT_TO_$GUEST case too.
+ */
 .macro SPEC_CTRL_EXIT_TO_XEN
 /*
  * Requires %rbx=stack_end
@@ -344,6 +367,9 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
 wrmsr
 
 .L\@_skip_sc_msr:
+
+/* TODO VERW */
+
 .endm
 
 #endif /* __ASSEMBLY__ */
-- 
2.30.2




[PATCH 6/9] x86/entry: Track the IST-ness of an entry for the exit paths

2023-09-15 Thread Andrew Cooper
Use %r12 to hold an ist_exit boolean.  This register is zero elsewhere in the
entry/exit asm, so it only needs setting in the IST path.

As this is subtle and fragile, add check_ist_exit() to be used in debugging
builds to cross-check that the ist_exit boolean matches the entry vector.

Write check_ist_exit() in C, because it's debug-only and the logic is more
complicated than I care to maintain in asm.

For now, we only need to use this signal in the exit-to-Xen path, but some
exit-to-guest paths happen in IST context too.  Check the correctness in all
exit paths to avoid the logic bitrotting.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * %r12 -> %r12d
 * Extend commit message
 * Tweak surrounding context
---
 xen/arch/x86/traps.c   | 13 +
 xen/arch/x86/x86_64/compat/entry.S |  9 -
 xen/arch/x86/x86_64/entry.S| 22 --
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index dead728ce329..0a005f088bca 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2259,6 +2259,19 @@ void asm_domain_crash_synchronous(unsigned long addr)
 do_softirq();
 }
 
+#ifdef CONFIG_DEBUG
+void check_ist_exit(const struct cpu_user_regs *regs, bool ist_exit)
+{
+const unsigned int ist_mask =
+(1U << X86_EXC_NMI) | (1U << X86_EXC_DB) |
+(1U << X86_EXC_DF)  | (1U << X86_EXC_MC);
+uint8_t ev = regs->entry_vector;
+bool is_ist = (ev < X86_EXC_NUM) && ((1U << ev) & ist_mask);
+
+ASSERT(is_ist == ist_exit);
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/compat/entry.S 
b/xen/arch/x86/x86_64/compat/entry.S
index bd5abd8040bd..7504bfb4f326 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -117,8 +117,15 @@ compat_process_trap:
 call  compat_create_bounce_frame
 jmp   compat_test_all_events
 
-/* %rbx: struct vcpu, interrupts disabled */
+/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
 ENTRY(compat_restore_all_guest)
+
+#ifdef CONFIG_DEBUG
+mov   %rsp, %rdi
+mov   %r12, %rsi
+call  check_ist_exit
+#endif
+
 ASSERT_INTERRUPTS_DISABLED
 mov   $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11d
 and   UREGS_eflags(%rsp),%r11d
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 525877e97330..e5055e5bbf9f 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -142,8 +142,15 @@ process_trap:
 
 .section .text.entry, "ax", @progbits
 
-/* %rbx: struct vcpu, interrupts disabled */
+/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
 restore_all_guest:
+
+#ifdef CONFIG_DEBUG
+mov   %rsp, %rdi
+mov   %r12, %rsi
+call  check_ist_exit
+#endif
+
 ASSERT_INTERRUPTS_DISABLED
 
 /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
@@ -659,8 +666,15 @@ ENTRY(early_page_fault)
 .section .text.entry, "ax", @progbits
 
 ALIGN
-/* No special register assumptions. */
+/* %r12=ist_exit */
 restore_all_xen:
+
+#ifdef CONFIG_DEBUG
+mov   %rsp, %rdi
+mov   %r12, %rsi
+call  check_ist_exit
+#endif
+
 /*
  * Check whether we need to switch to the per-CPU page tables, in
  * case we return to late PV exit code (from an NMI or #MC).
@@ -1087,6 +1101,10 @@ handle_ist_exception:
 .L_ist_dispatch_done:
 mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
 mov   %bl, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
+
+/* This is an IST exit */
+mov   $1, %r12d
+
 cmpb  $X86_EXC_NMI, UREGS_entry_vector(%rsp)
 jne   ret_from_intr
 
-- 
2.30.2




[PATCH 0/9] x86/spec-ctrl: AMD DIV fix, and VERW prerequisite bugfixes

2023-09-15 Thread Andrew Cooper
Patch 9 is the XSA-439 fix for the AMD DIV issue, which was disclosed with too
little lead time ahead of August 8th for us to prepare a fix under the embargo.

Patches 1 thru 8 are prerequisites, identified while trying to write patch 9.

All 9 patches are for all security trees.

Andrew Cooper (9):
  x86/spec-ctrl: Fix confusion between SPEC_CTRL_EXIT_TO_XEN{,_IST}
  x86/spec-ctrl: Fold DO_SPEC_CTRL_EXIT_TO_XEN into it's single user
  x86/spec-ctrl: Turn the remaining SPEC_CTRL_{ENTRY,EXIT}_* into asm
macros
  x86/spec-ctrl: Improve all SPEC_CTRL_{ENTER,EXIT}_* comments
  x86/entry: Adjust restore_all_xen to hold stack_end in %r14
  x86/entry: Track the IST-ness of an entry for the exit paths
  x86/spec-ctrl: Issue VERW during IST exit to Xen
  x86/amd: Introduce is_zen{1,2}_uarch() predicates
  x86/spec-ctrl: Mitigate the Zen1 DIV leakage

 docs/misc/xen-command-line.pandoc|   6 +-
 xen/arch/x86/cpu/amd.c   |  18 +--
 xen/arch/x86/hvm/svm/entry.S |   1 +
 xen/arch/x86/include/asm/amd.h   |  11 ++
 xen/arch/x86/include/asm/cpufeatures.h   |   2 +-
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 152 ---
 xen/arch/x86/spec_ctrl.c |  49 +++-
 xen/arch/x86/traps.c |  13 ++
 xen/arch/x86/x86_64/compat/entry.S   |   9 +-
 xen/arch/x86/x86_64/entry.S  |  30 -
 10 files changed, 220 insertions(+), 71 deletions(-)

-- 
2.30.2




Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-15 Thread Andrew Cooper
On 15/09/2023 3:24 pm, Jinoh Kang wrote:
> On 9/15/23 21:20, Jinoh Kang wrote:
>> On 9/13/23 08:21, Andrew Cooper wrote:
>>> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h 
>>> b/xen/arch/x86/x86_emulate/x86_emulate.h
>>> index 698750267a90..f0e74d23c378 100644
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>>> @@ -588,15 +588,26 @@ struct x86_emulate_ctxt
>>>  /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
>>>  unsigned int opcode;
>>>  
>>> -/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). 
>>> */
>>> +/*
>>> + * Retirement state, set by the emulator (valid only on 
>>> X86EMUL_OKAY/DONE).
>>> + *
>>> + * TODO: all this state should be input/output from the VMCS 
>>> PENDING_DBG,
>>> + * INTERRUPTIBILITY and ACTIVITIY fields.
>>> + */
>>>  union {
>>> -uint8_t raw;
>>> +unsigned long raw;
>> Minor nit: this should be uint64_t for clarity.  Otherwise, it's not at all
>> clear that the raw field covers the entire union, unless you remind myself
>> that Xen does not support 32-bit host.
> you remind yourself*.  What a weird typo to make :-(

For better or worse, this is form preferred by the Xen coding style.

We deleted the 32-bit build of Xen more than a decade ago, and have
been 64-bit-only ever since.

~Andrew



Re: [PATCH 1/2] xen/x86: io_apic: Introduce a command line option to skip timer check

2023-09-15 Thread Julien Grall

Hi Roger,

On 15/09/2023 14:54, Roger Pau Monné wrote:

On Fri, Aug 18, 2023 at 02:44:40PM +0100, Julien Grall wrote:

From: Julien Grall 

Currently, Xen will spend ~100ms to check if the timer works. If the
Admin knows their platform has a working timer, then it would be
handy to be able to bypass the check.


I'm of the opinion that the current code is at least dubious.

Specifically attempts to test the PIT timer, even when the hypervisor
might be using a different timer.  At which point it mostly attempts
to test that the interrupt routing from the PIT (or it's replacement)
is correct, and that Xen can receive such interrupts.

We go to great lengths in order to attempt to please this piece of
code, even when no PIT is available, at which point we switch the HPET
to legacy replacement mode in order to satisfy the checks.

I think the current code is not useful, and I would be fine if it was
just removed.


I am afraid I don't know the code well enough to be able to say whether it 
can be removed.


I also have no idea how common such platforms are nowadays (I suspect the 
number is rather small), which is why I went with a command line option.






Introduce a command line option 'no_timer_check' (the name is
matching the Linux parameter) for this purpose.

Signed-off-by: Julien Grall 
---
  docs/misc/xen-command-line.pandoc |  7 +++
  xen/arch/x86/io_apic.c| 11 +++
  2 files changed, 18 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 4872b9098e83..1f9d3106383f 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1896,6 +1896,13 @@ This option is ignored in **pv-shim** mode.
  ### nr_irqs (x86)
  > `= `
  
+### no_timer_works (x86)


I find the naming confusing, and I would rather avoid negative named
booleans.

Maybe it should be `check_pit_intr` and default to enabled for the
time being?

Jan suggested using timer-irq-works. Would you be happy with that name?




+> `=`
+
+> Default: `true`


I think the default is wrong here?  AFAICT no_timer_check will default
to false.


Ah yes. In the downstream version, I went with setting it to true by 
default as we don't support any platform with a broken timer. I forgot to 
update the patch before sending.





+
+Disables the code which tests for broken timer IRQ sources.


Hm, yes, it does check for one specific source, which doesn't need to
be the currently selected timer.

"Disables the code which tests interrupt injection from the legacy
i8259 timer."


I can update the comment.



Even that is not fully true, as it would resort to testing the HPET
legacy mode if PIT doesn't work, but one could argue the HPET legacy
mode is just a replacement for the i8254.


+
  ### irq-max-guests (x86)
  > `= `
  
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c

index b3afef8933d7..4875bb97003f 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
  int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
  int __read_mostly nr_ioapics;
  
+/*

+ * The logic to check if the timer is working is expensive. So allow
+ * the admin to bypass it if they know their platform doesn't have
+ * a buggy timer.


I would mention i8254 here, as said this is quite likely not testing
the currently in use timer.


Sure.


Note that if you don't want to run the test in timer_irq_works() you
can possibly avoid the code in preinit_pit(), as there no need to
setup channel 0 in periodic mode then.


The channel also seems to be used as a fallback method for calibrating 
the APIC (see calibrate_APIC_clock()). AFAICT, the fallback method 
should only be used when the PIT is enabled.


I think it would still be feasible to avoid running the IRQ tests even 
when the PIT is used. So it means we cannot skip preinit_pit().


As a side note, I noticed that preinit_pit() is called during resume. 
But I can't find any use of channel 0 after boot. So maybe the call 
could be removed?


Cheers,

--
Julien Grall



Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-15 Thread Jinoh Kang
On 9/15/23 21:20, Jinoh Kang wrote:
> On 9/13/23 08:21, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h 
>> b/xen/arch/x86/x86_emulate/x86_emulate.h
>> index 698750267a90..f0e74d23c378 100644
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>> @@ -588,15 +588,26 @@ struct x86_emulate_ctxt
>>  /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
>>  unsigned int opcode;
>>  
>> -/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). 
>> */
>> +/*
>> + * Retirement state, set by the emulator (valid only on 
>> X86EMUL_OKAY/DONE).
>> + *
>> + * TODO: all this state should be input/output from the VMCS 
>> PENDING_DBG,
>> + * INTERRUPTIBILITY and ACTIVITIY fields.
>> + */
>>  union {
>> -uint8_t raw;
>> +unsigned long raw;
> 
> Minor nit: this should be uint64_t for clarity.  Otherwise, it's not at all
> clear that the raw field covers the entire union, unless you remind myself
> that Xen does not support 32-bit host.

you remind yourself*.  What a weird typo to make :-(

-- 
Sincerely,
Jinoh Kang




[libvirt test] 183007: tolerable all pass - PUSHED

2023-09-15 Thread osstest service owner
flight 183007 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183007/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183001
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183001
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183001
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  f26c0018ba44730404c0daf2131dc144d1345672
baseline version:
 libvirt  1f85f0967b12521f3d7321cb8469393e72d0ce7a

Last test of basis   183001  2023-09-14 04:18:45 Z1 days
Testing same since   183007  2023-09-15 04:20:23 Z0 days1 attempts


People who touched revisions under test:
  Boris Fiuczynski 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is 

Re: [PATCH 2/2] xen/x86: ioapic: Bail out from timer_irq_works() as soon as possible

2023-09-15 Thread Julien Grall

Hi Jan,

On 07/09/2023 15:28, Jan Beulich wrote:

On 18.08.2023 15:44, Julien Grall wrote:

From: Julien Grall 

Currently timer_irq_works() will wait the full 100ms before checking
that pit0_ticks has been incremented at least 4 times.

However, most BIOSes/platforms should not have a buggy timer.
So waiting for the full 100ms is a bit harsh.

Rework the logic to only wait until 100ms passed or we saw more than
4 ticks. So now, in the good case, this will reduce the wait time
to ~50ms.

Signed-off-by: Julien Grall 


In principle this is all fine. There's a secondary aspect though which
may call for a slight rework of the patch.


--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1509,6 +1509,8 @@ static void __init setup_ioapic_ids_from_mpc(void)
  static int __init timer_irq_works(void)
  {
  unsigned long t1, flags;
+/* Wait for maximum 10 ticks */
+unsigned long msec = (10 * 1000) / HZ;


(Minor remark: I don't think this needs to be unsigned long; unsigned
int will suffice.)


You are right. I can switch to unsigned int.




@@ -1517,19 +1519,25 @@ static int __init timer_irq_works(void)
  
  local_save_flags(flags);

  local_irq_enable();
-/* Let ten ticks pass... */
-mdelay((10 * 1000) / HZ);
-local_irq_restore(flags);
  
-/*

- * Expect a few ticks at least, to be sure some possible
- * glue logic does not lock up after one or two first
- * ticks in a non-ExtINT mode.  Also the local APIC
- * might have cached one ExtINT interrupt.  Finally, at
- * least one tick may be lost due to delays.
- */
-if ( (ACCESS_ONCE(pit0_ticks) - t1) > 4 )
+while ( msec-- )
+{
+mdelay(1);
+/*
+ * Expect a few ticks at least, to be sure some possible
+ * glue logic does not lock up after one or two first
+ * ticks in a non-ExtINT mode.  Also the local APIC
+ * might have cached one ExtINT interrupt.  Finally, at
+ * least one tick may be lost due to delays.
+ */
+if ( (ACCESS_ONCE(pit0_ticks) - t1) <= 4 )
+continue;
+
+local_irq_restore(flags);
  return 1;
+}
+
+local_irq_restore(flags);
  
  return 0;

  }


While Andrew has a patch pending (not sure why it didn't go in yet)
to simplify local_irq_restore(), and while further it shouldn't really
need using here (local_irq_disable() ought to be fine)


Skimming through the code, the last call of timer_irq_works() in 
check_timer() happens after the interrupt masking state has been restored:


local_irq_restore(flags);

if ( timer_irq_works() )
  ...


So I think timer_irq_works() can be called with interrupts enabled and 
therefore we can't use local_irq_disable().



I can see that
you don't want to make such an adjustment here. But then I'd prefer if
we got away with just a single instance, adjusting the final return
statement accordingly (easiest would likely be to go from the value of
"msec").


I was thinking of using 'msec > 0' as the condition to determine whether 
the test passed. However, that would report a failure if it took the full 
10ms for the test to pass.


I will see if I can rework the loop.

Cheers,

--
Julien Grall



Re: [PATCH 1/2] xen/x86: io_apic: Introduce a command line option to skip timer check

2023-09-15 Thread Roger Pau Monné
On Fri, Aug 18, 2023 at 02:44:40PM +0100, Julien Grall wrote:
> From: Julien Grall 
> 
> Currently, Xen will spend ~100ms to check if the timer works. If the
> Admin knows their platform has a working timer, then it would be
> handy to be able to bypass the check.

I'm of the opinion that the current code is at least dubious.

Specifically attempts to test the PIT timer, even when the hypervisor
might be using a different timer.  At which point it mostly attempts
to test that the interrupt routing from the PIT (or it's replacement)
is correct, and that Xen can receive such interrupts.

We go to great lengths in order to attempt to please this piece of
code, even when no PIT is available, at which point we switch the HPET
to legacy replacement mode in order to satisfy the checks.

I think the current code is not useful, and I would be fine if it was
just removed.

> 
> Introduce a command line option 'no_timer_check' (the name is
> matching the Linux parameter) for this purpose.
> 
> Signed-off-by: Julien Grall 
> ---
>  docs/misc/xen-command-line.pandoc |  7 +++
>  xen/arch/x86/io_apic.c| 11 +++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/docs/misc/xen-command-line.pandoc 
> b/docs/misc/xen-command-line.pandoc
> index 4872b9098e83..1f9d3106383f 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1896,6 +1896,13 @@ This option is ignored in **pv-shim** mode.
>  ### nr_irqs (x86)
>  > `= `
>  
> +### no_timer_works (x86)

I find the naming confusing, and I would rather avoid negative named
booleans.

Maybe it should be `check_pit_intr` and default to enabled for the
time being?

> +> `=`
> +
> +> Default: `true`

I think the default is wrong here?  AFAICT no_timer_check will default
to false.

> +
> +Disables the code which tests for broken timer IRQ sources.

Hm, yes, it does check for one specific source, which doesn't need to
be the currently selected timer.

"Disables the code which tests interrupt injection from the legacy
i8259 timer."

Even that is not fully true, as it would resort to testing the HPET
legacy mode if PIT doesn't work, but one could argue the HPET legacy
mode is just a replacement for the i8254.

> +
>  ### irq-max-guests (x86)
>  > `= `
>  
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index b3afef8933d7..4875bb97003f 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
>  int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
>  int __read_mostly nr_ioapics;
>  
> +/*
> + * The logic to check if the timer is working is expensive. So allow
> + * the admin to bypass it if they know their platform doesn't have
> + * a buggy timer.

I would mention i8254 here, as said this is quite likely not testing
the currently in use timer.

Note that if you don't want to run the test in timer_irq_works() you
can possibly avoid the code in preinit_pit(), as there no need to
setup channel 0 in periodic mode then.

Thanks, Roger.



Re: [PATCH 1/2] xen/x86: io_apic: Introduce a command line option to skip timer check

2023-09-15 Thread George Dunlap
On Fri, Sep 15, 2023 at 2:18 PM Julien Grall  wrote:

> I don't mind too much about using - over _ but it is never clear why you
> strongly push for it (and whether the others agree). Is this documented
> somewhere? If not, can you do it so everyone can apply it consistently?
> (At least I would not remember to ask for it because I am happy with the _).

FWIW I prefer `-` over `_`; I'd Ack a patch that documented it
somewhere (if it's not documented already).

 -George



Re: [PATCH 1/2] xen/x86: io_apic: Introduce a command line option to skip timer check

2023-09-15 Thread Julien Grall

Hi,

On 07/09/2023 15:09, Jan Beulich wrote:

On 18.08.2023 15:44, Julien Grall wrote:

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1896,6 +1896,13 @@ This option is ignored in **pv-shim** mode.
  ### nr_irqs (x86)
  > `= <integer>`
  
+### no_timer_works (x86)

+> `= <boolean>`
+
+> Default: `true`
+
+Disables the code which tests for broken timer IRQ sources.


In description and code it's "check", but here it's "works". Likely
just a typo. But I'd prefer if we didn't introduce any new "no*"
options which then can be negated to "no-no*". Make it "timer-check"
(also avoiding the underscore, no matter that Linux uses it), or
alternatively make it a truly positive option, e.g. "timer-irq-works".


I don't mind too much about using - over _ but it is never clear why you 
strongly push for it (and whether the others agree). Is this documented 
somewhere? If not, can you do it so everyone can apply it consistently? 
(At least I would not remember to ask for it because I am happy with the _).


I will go for 'timer-irq-works'.



I also think it wants emphasizing that if this option is used and then
something doesn't work, people are on their own.


I will do that.

Note that I will only resend a new version after the tree has branched 
because this is not meant for 4.18.


Cheers,

--
Julien Grall



[PATCH for-4.18 v2] tools/light: Revoke permissions when a PCI detach for HVM domain

2023-09-15 Thread Julien Grall
From: Julien Grall 

Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
a PCI device is attached (see pci_add_dm_done()) for all domain types.
However, the permissions are only revoked for non-HVM domains (see
do_pci_remove()).

This means that HVM domains will be left with extra permissions. While
this looks bad on paper, the IRQ permissions should be revoked when the
Device Model calls xc_physdev_unmap_pirq(), and such domains cannot
directly map I/O port and IOMEM regions. Instead, this has to be done
by a Device Model.

The Device Model can only run in dom0 or a PV stubdomain (upstream libxl
doesn't have support for HVM/PVH stubdomains).

For PV/PVH stubdomains, the permissions are properly revoked, so there
is no security concern.

This leaves dom0. There are two cases:
  1) Privileged: Anyone gaining access to the Device Model would already
 have large control over the host.
  2) Deprivileged: PCI passthrough requires PHYSDEV operations which
 are not accessible when the Device Model is restricted.

So overall, it is believed that the extra permissions cannot be exploited.

Rework the code so the permissions are all removed for HVM domains.
This needs to happen after QEMU has detached the device, so the
revocation is now moved to pci_remove_detached().

Also add a comment on top of the error message when the PIRQ cannot
be unbound, to explain this could be a spurious error as QEMU may have
already done it.

Signed-off-by: Julien Grall 

---

Changes since v1:
* Move the code to revoke in pci_remove_detached()
* Add a comment on top of the PIRQ unbind error path
* Use goto to deal with errors.
---
 tools/libs/light/libxl_pci.c | 137 ---
 1 file changed, 77 insertions(+), 60 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7f5f170e6eb0..96cb4da0794e 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1968,7 +1968,6 @@ static void do_pci_remove(libxl__egc *egc, 
pci_remove_state *prs)
 goto out_fail;
 }
 
-rc = ERROR_FAIL;
 if (type == LIBXL_DOMAIN_TYPE_HVM) {
 prs->hvm = true;
 switch (libxl__device_model_version_running(gc, domid)) {
@@ -1989,65 +1988,7 @@ static void do_pci_remove(libxl__egc *egc, 
pci_remove_state *prs)
 rc = ERROR_INVAL;
 goto out_fail;
 }
-} else {
-char *sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/resource", 
pci->domain,
- pci->bus, pci->dev, pci->func);
-FILE *f = fopen(sysfs_path, "r");
-unsigned int start = 0, end = 0, flags = 0, size = 0;
-int irq = 0;
-int i;
-
-if (f == NULL) {
-LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-goto skip1;
-}
-for (i = 0; i < PROC_PCI_NUM_RESOURCES; i++) {
-if (fscanf(f, "0x%x 0x%x 0x%x\n", &start, &end, &flags) != 3)
-continue;
-size = end - start + 1;
-if (start) {
-if (flags & PCI_BAR_IO) {
-rc = xc_domain_ioport_permission(ctx->xch, domid, start, 
size, 0);
-if (rc < 0)
-LOGED(ERROR, domid,
-  "xc_domain_ioport_permission error 0x%x/0x%x",
-  start,
-  size);
-} else {
-rc = xc_domain_iomem_permission(ctx->xch, domid, 
start>>XC_PAGE_SHIFT,
-
(size+(XC_PAGE_SIZE-1))>>XC_PAGE_SHIFT, 0);
-if (rc < 0)
-LOGED(ERROR, domid,
-  "xc_domain_iomem_permission error 0x%x/0x%x",
-  start,
-  size);
-}
-}
-}
-fclose(f);
-skip1:
-if (!pci_supp_legacy_irq())
-goto skip_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
-   pci->bus, pci->dev, pci->func);
-f = fopen(sysfs_path, "r");
-if (f == NULL) {
-LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-goto skip_irq;
-}
-if ((fscanf(f, "%u", &irq) == 1) && irq) {
-rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
-if (rc < 0) {
-LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
-}
-rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
-if (rc < 0) {
-LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
-}
-}
-fclose(f);
 }
-skip_irq:
 rc = 0;
 out_fail:
 pci_remove_detached(egc, prs, rc); /* must be last */
@@ -2226,7 +2167,11 @@ static void pci_remove_detached(libxl__egc *egc,
 int rc)
 {
 

[xen-unstable test] 183005: tolerable FAIL

2023-09-15 Thread osstest service owner
flight 183005 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183005/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183000
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183000
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183000
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183000
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183000
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183000
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183000
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183000
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183000
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183000
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183000
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 183000
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  6aa25c32180ab59081c73bae4c568367d9133a1f
baseline version:
 xen  6aa25c32180ab59081c73bae4c568367d9133a1f

Last test of basis   183005  2023-09-15 01:53:36 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 

Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-15 Thread Jinoh Kang
On 9/13/23 08:21, Andrew Cooper wrote:
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h 
> b/xen/arch/x86/x86_emulate/x86_emulate.h
> index 698750267a90..f0e74d23c378 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -588,15 +588,26 @@ struct x86_emulate_ctxt
>  /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
>  unsigned int opcode;
>  
> -/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
> +/*
> + * Retirement state, set by the emulator (valid only on 
> X86EMUL_OKAY/DONE).
> + *
> + * TODO: all this state should be input/output from the VMCS PENDING_DBG,
> + * INTERRUPTIBILITY and ACTIVITIY fields.
> + */
>  union {
> -uint8_t raw;
> +unsigned long raw;

Minor nit: this should be uint64_t for clarity.  Otherwise, it's not at all
clear that the raw field covers the entire union, unless one remembers
that Xen does not support 32-bit hosts.

>  struct {
> +/*
> + * Accumulated %dr6 trap bits, positive polarity.  Should only be
> + * interpreted in the case of X86EMUL_OKAY/DONE.
> + */
> +unsigned int pending_dbg;
> +
>  bool hlt:1;  /* Instruction HLTed. */
>  bool mov_ss:1;   /* Instruction sets MOV-SS irq shadow. */
>  bool sti:1;  /* Instruction sets STI irq shadow. */
>  bool unblock_nmi:1;  /* Instruction clears NMI blocking. */
> -bool singlestep:1;   /* Singlestepping was active. */
> +bool singlestep:1;   /* Singlestepping was active. (TODO, merge 
> into pending_dbg) */
>  };
>  } retire;
>  

-- 
Sincerely,
Jinoh Kang




Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests"

2023-09-15 Thread Borislav Petkov
On Thu, Sep 14, 2023 at 10:02:05AM -0700, Elliott Mitchell wrote:
> Indeed.  At what point is the lack of information and response long
> enough to simply commit a revert due to those lacks?

At no point.

> Even with the commit message having been rewritten and the link to:
> https://lkml.kernel.org/r/20210628172740.245689-1-smita.koralahallichannabasa...@amd.com
> added, this still reads as roughly:
> 
> "A hypothetical bug on a hypothetivisor"

If "Hypervisors likely do not expose the SMCA feature to the guest"
doesn't explain to you what the problem is this commit is fixing, then
I can't help you.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



[QEMU PATCH v5 11/13] virtio-gpu: Support Venus capset

2023-09-15 Thread Huang Rui
From: Antonio Caggiano 

Add support for the Venus capset, which enables Vulkan support through
the Venus Vulkan driver for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS and will use
  another patch to sync up linux headers. (Akihiko)
- https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.hu...@amd.com/

 hw/display/virtio-gpu-virgl.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 8a017dbeb4..7f95490e90 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -437,6 +437,11 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
 virgl_renderer_get_cap_set(resp.capset_id,
                            &resp.capset_max_version,
                            &resp.capset_max_size);
+} else if (info.capset_index == 2) {
+resp.capset_id = VIRTIO_GPU_CAPSET_VENUS;
+virgl_renderer_get_cap_set(resp.capset_id,
+                           &resp.capset_max_version,
+                           &resp.capset_max_size);
 } else {
 resp.capset_max_version = 0;
 resp.capset_max_size = 0;
@@ -901,10 +906,18 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
 {
-uint32_t capset2_max_ver, capset2_max_size;
+uint32_t capset2_max_ver, capset2_max_size, num_capsets;
+num_capsets = 1;
+
 virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
-                           &capset2_max_ver,
-                           &capset2_max_size);
+                           &capset2_max_ver,
+                           &capset2_max_size);
+num_capsets += capset2_max_ver ? 1 : 0;
+
+virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VENUS,
+                           &capset2_max_ver,
+                           &capset2_max_size);
+num_capsets += capset2_max_size ? 1 : 0;
 
-return capset2_max_ver ? 2 : 1;
+return num_capsets;
 }
-- 
2.34.1




[QEMU PATCH v5 08/13] virtio-gpu: Don't require udmabuf when blobs and virgl are enabled

2023-09-15 Thread Huang Rui
From: Dmitry Osipenko 

The udmabuf usage is mandatory when virgl is disabled and the blobs feature
is enabled in the QEMU machine configuration. If virgl and blobs are both
enabled, then the udmabuf requirement is optional. Since udmabuf isn't
widely supported by popular Linux distros today, let's relax the udmabuf
requirement for blobs=on,virgl=on. Now, full-featured virtio-gpu
acceleration is available to QEMU users without needing udmabuf to be
available in the system.

Reviewed-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Huang Rui 
---
 hw/display/virtio-gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index a66cbd9930..5b7a7eab4f 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1361,7 +1361,8 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 VirtIOGPU *g = VIRTIO_GPU(qdev);
 
 if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
-if (!virtio_gpu_have_udmabuf()) {
+if (!virtio_gpu_virgl_enabled(g->parent_obj.conf) &&
+!virtio_gpu_have_udmabuf()) {
 error_setg(errp, "cannot enable blob resources without udmabuf");
 return;
 }
-- 
2.34.1




[QEMU PATCH v5 13/13] virtio-gpu: Enable virglrenderer render server flag for venus

2023-09-15 Thread Huang Rui
Venus in virglrenderer requires render server support.

Signed-off-by: Huang Rui 
---
 hw/display/virtio-gpu-virgl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 39c04d730c..65ffce85a8 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -888,7 +888,7 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 #endif
 
 #ifdef VIRGL_RENDERER_VENUS
-flags |= VIRGL_RENDERER_VENUS;
+flags |= VIRGL_RENDERER_VENUS | VIRGL_RENDERER_RENDER_SERVER;
 #endif
 
 ret = virgl_renderer_init(g, flags, &virtio_gpu_3d_cbs);
-- 
2.34.1




[QEMU PATCH v5 09/13] virtio-gpu: Handle resource blob commands

2023-09-15 Thread Huang Rui
From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- Use memory_region_init_ram_ptr() instead of
  memory_region_init_ram_device_ptr() (Akihiko)

 hw/display/virtio-gpu-virgl.c  | 213 +
 hw/display/virtio-gpu.c|   4 +-
 include/hw/virtio/virtio-gpu.h |   5 +
 meson.build|   4 +
 4 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 312953ec16..563a6f2f58 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
 
 #include "ui/egl-helpers.h"
 
@@ -78,9 +79,24 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
 virgl_renderer_resource_create(&args, NULL, 0);
 }
 
+static void virgl_resource_destroy(VirtIOGPU *g,
+   struct virtio_gpu_simple_resource *res)
+{
+if (!res)
+return;
+
+QTAILQ_REMOVE(&g->reslist, res, next);
+
+virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
+g_free(res->addrs);
+
+g_free(res);
+}
+
 static void virgl_cmd_resource_unref(VirtIOGPU *g,
  struct virtio_gpu_ctrl_command *cmd)
 {
+struct virtio_gpu_simple_resource *res;
 struct virtio_gpu_resource_unref unref;
 struct iovec *res_iovs = NULL;
 int num_iovs = 0;
@@ -88,13 +104,22 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
 VIRTIO_GPU_FILL_CMD(unref);
 trace_virtio_gpu_cmd_res_unref(unref.resource_id);
 
+res = virtio_gpu_find_resource(g, unref.resource_id);
+
 virgl_renderer_resource_detach_iov(unref.resource_id,
                                    &res_iovs,
                                    &num_iovs);
 if (res_iovs != NULL && num_iovs != 0) {
 virtio_gpu_cleanup_mapping_iov(g, res_iovs, num_iovs);
+if (res) {
+res->iov = NULL;
+res->iov_cnt = 0;
+}
 }
+
 virgl_renderer_resource_unref(unref.resource_id);
+
+virgl_resource_destroy(g, res);
 }
 
 static void virgl_cmd_context_create(VirtIOGPU *g,
@@ -426,6 +451,183 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
 g_free(resp);
 }
 
+#ifdef HAVE_VIRGL_RESOURCE_BLOB
+
+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_create_blob cblob;
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+int ret;
+
+VIRTIO_GPU_FILL_CMD(cblob);
+virtio_gpu_create_blob_bswap();
+trace_virtio_gpu_cmd_res_create_blob(cblob.resource_id, cblob.size);
+
+if (cblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_find_resource(g, cblob.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, cblob.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+if (!res) {
+cmd->error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
+}
+
+res->resource_id = cblob.resource_id;
+res->blob_size = cblob.size;
+
+if (cblob.blob_mem != VIRTIO_GPU_BLOB_MEM_HOST3D) {
+ret = virtio_gpu_create_mapping_iov(g, cblob.nr_entries, sizeof(cblob),
+                                    cmd, &res->addrs, &res->iov,
+                                    &res->iov_cnt);
+if (!ret) {
+g_free(res);
+cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
+return;
+}
+}
+
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+
+virgl_args.res_handle = cblob.resource_id;
+virgl_args.ctx_id = cblob.hdr.ctx_id;
+virgl_args.blob_mem = cblob.blob_mem;
+virgl_args.blob_id = cblob.blob_id;
+virgl_args.blob_flags = cblob.blob_flags;
+virgl_args.size = cblob.size;
+virgl_args.iovecs = res->iov;
+virgl_args.num_iovs = res->iov_cnt;
+
+ret = virgl_renderer_resource_create_blob(&virgl_args);
+if (ret) {
+virgl_resource_destroy(g, res);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: virgl blob create error: %s\n",
+  __func__, strerror(-ret));
+cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
+}
+}
+
+static 

[QEMU PATCH v5 12/13] virtio-gpu: Initialize Venus

2023-09-15 Thread Huang Rui
From: Antonio Caggiano 

Request Venus when initializing VirGL.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- Add meson check to make sure unstable APIs defined from 0.9.0. (Antonio)

 hw/display/virtio-gpu-virgl.c | 4 
 meson.build   | 5 +
 2 files changed, 9 insertions(+)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 7f95490e90..39c04d730c 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -887,6 +887,10 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 }
 #endif
 
+#ifdef VIRGL_RENDERER_VENUS
+flags |= VIRGL_RENDERER_VENUS;
+#endif
+
 ret = virgl_renderer_init(g, flags, &virtio_gpu_3d_cbs);
 if (ret != 0) {
 error_report("virgl could not be initialized: %d", ret);
diff --git a/meson.build b/meson.build
index f7b744ab82..e4004d05b1 100644
--- a/meson.build
+++ b/meson.build
@@ -1076,6 +1076,11 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
cc.has_function('virgl_renderer_resource_create_blob',
prefix: '#include <virglrenderer.h>',
dependencies: virgl))
+  if virgl.version().version_compare('>= 0.9.0') and 
virgl.version().version_compare('< 1.0.0')
+message('Enabling virglrenderer unstable APIs')
+virgl = declare_dependency(compile_args: '-DVIRGL_RENDERER_UNSTABLE_APIS',
+   dependencies: virgl)
+  endif
 endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
-- 
2.34.1




[QEMU PATCH v5 10/13] virtio-gpu: Resource UUID

2023-09-15 Thread Huang Rui
From: Antonio Caggiano 

Enable resource UUID feature and implement command resource assign UUID.
This is done by introducing a hash table to map resource IDs to their
UUIDs.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- Add virtio migration handling for uuid (Akihiko)
- Adjust sequence to allocate gpu resource before virglrender resource
  creation (Akihiko)
- Clean up (Akihiko)

 hw/display/trace-events|  1 +
 hw/display/virtio-gpu-base.c   |  2 ++
 hw/display/virtio-gpu-virgl.c  | 21 
 hw/display/virtio-gpu.c| 58 ++
 include/hw/virtio/virtio-gpu.h |  6 
 5 files changed, 88 insertions(+)

diff --git a/hw/display/trace-events b/hw/display/trace-events
index 2336a0ca15..54d6894c59 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -41,6 +41,7 @@ virtio_gpu_cmd_res_create_blob(uint32_t res, uint64_t size) 
"res 0x%x, size %" P
 virtio_gpu_cmd_res_unref(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_back_attach(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_back_detach(uint32_t res) "res 0x%x"
+virtio_gpu_cmd_res_assign_uuid(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_xfer_toh_2d(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_xfer_toh_3d(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_xfer_fromh_3d(uint32_t res) "res 0x%x"
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..f44388715c 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -236,6 +236,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
 }
 
+features |= (1 << VIRTIO_GPU_F_RESOURCE_UUID);
+
 return features;
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 563a6f2f58..8a017dbeb4 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -36,11 +36,20 @@ static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
 {
 struct virtio_gpu_resource_create_2d c2d;
 struct virgl_renderer_resource_create_args args;
+struct virtio_gpu_simple_resource *res;
 
 VIRTIO_GPU_FILL_CMD(c2d);
 trace_virtio_gpu_cmd_res_create_2d(c2d.resource_id, c2d.format,
c2d.width, c2d.height);
 
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+if (!res) {
+cmd->error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
+}
+res->resource_id = c2d.resource_id;
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+
 args.handle = c2d.resource_id;
 args.target = 2;
 args.format = c2d.format;
@@ -60,11 +69,20 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
 {
 struct virtio_gpu_resource_create_3d c3d;
 struct virgl_renderer_resource_create_args args;
+struct virtio_gpu_simple_resource *res;
 
 VIRTIO_GPU_FILL_CMD(c3d);
 trace_virtio_gpu_cmd_res_create_3d(c3d.resource_id, c3d.format,
c3d.width, c3d.height, c3d.depth);
 
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+if (!res) {
+cmd->error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
+}
+res->resource_id = c3d.resource_id;
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+
 args.handle = c3d.resource_id;
 args.target = c3d.target;
 args.format = c3d.format;
@@ -682,6 +700,9 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
 /* TODO add security */
 virgl_cmd_ctx_detach_resource(g, cmd);
 break;
+case VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID:
+virtio_gpu_resource_assign_uuid(g, cmd);
+break;
 case VIRTIO_GPU_CMD_GET_CAPSET_INFO:
 virgl_cmd_get_capset_info(g, cmd);
 break;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index cc4c1f81bb..44414c1c5e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -966,6 +966,38 @@ virtio_gpu_resource_detach_backing(VirtIOGPU *g,
 virtio_gpu_cleanup_mapping(g, res);
 }
 
+void virtio_gpu_resource_assign_uuid(VirtIOGPU *g,
+ struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_assign_uuid assign;
+struct virtio_gpu_resp_resource_uuid resp;
+QemuUUID *uuid;
+
+VIRTIO_GPU_FILL_CMD(assign);
+virtio_gpu_bswap_32(&assign, sizeof(assign));
+trace_virtio_gpu_cmd_res_assign_uuid(assign.resource_id);
+
+res = virtio_gpu_find_check_resource(g, assign.resource_id, false,
+ __func__, &cmd->error);
+if (!res) {
+return;
+}
+
+memset(&resp, 0, sizeof(resp));
+resp.hdr.type = VIRTIO_GPU_RESP_OK_RESOURCE_UUID;
+
+uuid = g_hash_table_lookup(g->resource_uuids, 
GUINT_TO_POINTER(assign.resource_id));
+if (!uuid) {
+uuid = g_new(QemuUUID, 1);
+qemu_uuid_generate(uuid);
+g_hash_table_insert(g->resource_uuids, 

[QEMU PATCH v5 07/13] softmmu/memory: enable automatic deallocation of memory regions

2023-09-15 Thread Huang Rui
From: Xenia Ragiadakou 

When a memory region has a different life-cycle from that of its parent,
it can be automatically released, once it has been unparented and all of
its references have gone away, via the object's free callback.

However, currently, references to the memory region are held by its owner
without first incrementing the memory region object's reference count.
As a result, the automatic deallocation of the object, which does not take
those references into account, results in use-after-free memory corruption.

This patch increases the reference count of an owned memory region object
on each memory_region_ref() and decreases it on each memory_region_unref().

Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- ref/unref only owned memory regions (Akihiko)

 softmmu/memory.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..15e1699750 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1800,6 +1800,9 @@ void memory_region_ref(MemoryRegion *mr)
 /* MMIO callbacks most likely will access data that belongs
  * to the owner, hence the need to ref/unref the owner whenever
  * the memory region is in use.
+ * Likewise, the owner keeps references to the memory region,
+ * hence the need to ref/unref the memory region object to prevent
+ * its automatic deallocation while still referenced by its owner.
  *
  * The memory region is a child of its owner.  As long as the
  * owner doesn't call unparent itself on the memory region,
@@ -1808,6 +1811,7 @@ void memory_region_ref(MemoryRegion *mr)
  * we do not ref/unref them because it slows down DMA sensibly.
  */
 if (mr && mr->owner) {
+object_ref(OBJECT(mr));
 object_ref(mr->owner);
 }
 }
@@ -1816,6 +1820,7 @@ void memory_region_unref(MemoryRegion *mr)
 {
 if (mr && mr->owner) {
 object_unref(mr->owner);
+object_unref(OBJECT(mr));
 }
 }
 
-- 
2.34.1




[QEMU PATCH v5 06/13] virtio-gpu: Support context init feature with virglrenderer

2023-09-15 Thread Huang Rui
Patch "virtio-gpu: CONTEXT_INIT feature" has added the context_init
feature flag.
We would like to enable the feature with virglrenderer, so create the
virgl renderer context with flags, using the context_id, when valid.

Originally-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- Inverted patch 5 and 6 because we should configure
  HAVE_VIRGL_CONTEXT_INIT firstly. (Philippe)

 hw/display/virtio-gpu-virgl.c | 13 +++--
 hw/display/virtio-gpu.c   |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 8bb7a2c21f..312953ec16 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -106,8 +106,17 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
 cc.debug_name);
 
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+#ifdef HAVE_VIRGL_CONTEXT_INIT
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+return;
+#endif
+}
+
+virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
 }
 
 static void virgl_cmd_context_destroy(VirtIOGPU *g,
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3e658f1fef..a66cbd9930 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1506,6 +1506,8 @@ static Property virtio_gpu_properties[] = {
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
 DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
+DEFINE_PROP_BIT("context_init", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1




[QEMU PATCH v5 05/13] virtio-gpu: Configure context init for virglrenderer

2023-09-15 Thread Huang Rui
Configure context init feature flag for virglrenderer.

Originally-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
---

V4 -> V5:
- Inverted patch 5 and 6 because we should configure
  HAVE_VIRGL_CONTEXT_INIT firstly. (Philippe)

 meson.build | 4 
 1 file changed, 4 insertions(+)

diff --git a/meson.build b/meson.build
index 98e68ef0b1..ff20d3c249 100644
--- a/meson.build
+++ b/meson.build
@@ -1068,6 +1068,10 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
   prefix: '#include <virglrenderer.h>',
dependencies: virgl))
   endif
+  config_host_data.set('HAVE_VIRGL_CONTEXT_INIT',
+   
cc.has_function('virgl_renderer_context_create_with_flags',
+   prefix: '#include <virglrenderer.h>',
+   dependencies: virgl))
 endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
-- 
2.34.1




[QEMU PATCH v5 04/13] virtio-gpu: blob prep

2023-09-15 Thread Huang Rui
From: Antonio Caggiano 

This adds preparatory functions needed to:

 - decode blob cmds
 - track iovecs

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Tested-by: Akihiko Odaki 
Tested-by: Huang Rui 
Acked-by: Huang Rui 
Reviewed-by: Emmanouil Pitsidianakis 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Huang Rui 
---

This patch is already under review as part of the "rutabaga_gfx + gfxstream"
series (already in v13) and has been included here because of dependency.

 hw/display/virtio-gpu.c  | 10 +++---
 include/hw/virtio/virtio-gpu-bswap.h | 15 +++
 include/hw/virtio/virtio-gpu.h   |  5 +
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 48ef0d9fad..3e658f1fef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -33,15 +33,11 @@
 
 #define VIRTIO_GPU_VM_VERSION 1
 
-static struct virtio_gpu_simple_resource*
-virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 static struct virtio_gpu_simple_resource *
 virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id,
bool require_backing,
const char *caller, uint32_t *error);
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res);
 static void virtio_gpu_reset_bh(void *opaque);
 
 void virtio_gpu_update_cursor_data(VirtIOGPU *g,
@@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct 
virtio_gpu_update_cursor *cursor)
   cursor->resource_id ? 1 : 0);
 }
 
-static struct virtio_gpu_simple_resource *
+struct virtio_gpu_simple_resource *
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
 {
 struct virtio_gpu_simple_resource *res;
@@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 g_free(iov);
 }
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res)
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res)
 {
 virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
 res->iov = NULL;
diff --git a/include/hw/virtio/virtio-gpu-bswap.h 
b/include/hw/virtio/virtio-gpu-bswap.h
index 637a0585d0..dd1975e2d4 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -70,6 +70,21 @@ virtio_gpu_create_blob_bswap(struct 
virtio_gpu_resource_create_blob *cblob)
le64_to_cpus(&cblob->size);
 }
 
+static inline void
+virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob)
+{
+virtio_gpu_ctrl_hdr_bswap(&mblob->hdr);
+le32_to_cpus(&mblob->resource_id);
+le64_to_cpus(&mblob->offset);
+}
+
+static inline void
+virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob)
+{
+virtio_gpu_ctrl_hdr_bswap(&ublob->hdr);
+le32_to_cpus(&ublob->resource_id);
+}
+
 static inline void
 virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb)
 {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index de4f624e94..55973e112f 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
 void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout,
struct virtio_gpu_resp_edid *edid);
 /* virtio-gpu.c */
+struct virtio_gpu_simple_resource *
+virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
+
 void virtio_gpu_ctrl_response(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd,
   struct virtio_gpu_ctrl_hdr *resp,
@@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.34.1




[QEMU PATCH v5 02/13] virtio-gpu: CONTEXT_INIT feature

2023-09-15 Thread Huang Rui
From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Akihiko Odaki 
Tested-by: Huang Rui 
Acked-by: Huang Rui 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Huang Rui 
---

This patch is already under review as part of the "rutabaga_gfx + gfxstream"
series (already in v13) and has been included here because of dependency.

 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index ca1fb7b16f..4f2b0ba1f3 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8377c365ef 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.34.1




[QEMU PATCH v5 03/13] virtio-gpu: hostmem

2023-09-15 Thread Huang Rui
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Tested-by: Alyssa Ross 
Tested-by: Akihiko Odaki 
Tested-by: Huang Rui 
Acked-by: Huang Rui 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Huang Rui 
---

This patch is already under review as part of the "rutabaga_gfx + gfxstream"
series (already in v13) and has been included here because of dependency.

 hw/display/virtio-gpu-pci.c| 14 ++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..da6a99f038 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 virtio_pci_force_virtio_1(vpci_dev);
if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index bbd5c6561a..48ef0d9fad 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1509,6 +1509,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index e6fb0aa876..c8552ff760 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
pci_register_bar(&vpci_dev->pci_dev, 0,
 PCI_BASE_ADDRESS_MEM_PREFETCH, &g->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 8377c365ef..de4f624e94 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -137,6 +140,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct 

[QEMU PATCH v5 01/13] virtio: Add shared memory capability

2023-09-15 Thread Huang Rui
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Huang Rui 
Tested-by: Akihiko Odaki 
Acked-by: Huang Rui 
Reviewed-by: Gurchetan Singh 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Huang Rui 
---

This patch is already under review as part of the "rutabaga_gfx + gfxstream"
series (already in v13) and has been included here because of dependency.

 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index edbc0daa18..da8c9ea12d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned 
fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue 
*vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.34.1




[QEMU PATCH v5 00/13] Support blob memory and venus on qemu

2023-09-15 Thread Huang Rui
Hi all,

Antonio Caggiano brought up Venus with QEMU on the KVM platform last
September[1]. This series is inherited from his original work to support
the features of context init, hostmem, resource uuid, and blob resources
for Venus.
In March of this year we sent out the V1 version[2] for review, but that
series included both Xen and virtio-gpu patches. Now we would like to
divide it into two parts: one continues Antonio's work to upstream
virtio-gpu support for blob memory and Venus, and the other upstreams the
Xen-specific patches. This series focuses on virtio-gpu, so we are
marking it as V4 here to continue Antonio's patches[1]. We will send the
Xen-specific patches separately, because they are hypervisor specific.
Besides QEMU, this support also required changes to virglrenderer[3][4]
and mesa[5][6]; the virglrenderer and mesa parts have all been accepted
upstream. In this QEMU version we try to address the concerns around
improper cleanup during blob resource unmap and unref. We would
appreciate any comments.

[1] 
https://lore.kernel.org/qemu-devel/20220926142422.22325-1-antonio.caggi...@collabora.com/
[2] V1: 
https://lore.kernel.org/qemu-devel/20230312092244.451465-1-ray.hu...@amd.com
[3] https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1068
[4] https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1180
[5] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22108
[6] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23680

Please note that the first four patches (1 -> 4) are included in this
series because the series depends on them, not because we want them to be
reviewed here; they are already in the process of review through the
"rutabaga_gfx + gfxstream" series.
- 
https://lore.kernel.org/qemu-devel/20230829003629.410-1-gurchetansi...@chromium.org/

V4: 
https://lore.kernel.org/qemu-devel/20230831093252.2461282-1-ray.hu...@amd.com

Changes from V4 (virtio gpu V4) to V5

- Swapped patches 5 and 6 because HAVE_VIRGL_CONTEXT_INIT should be
  configured first.

- Validate owner of memory region to avoid slowing down DMA.

- Use memory_region_init_ram_ptr() instead of
  memory_region_init_ram_device_ptr().

- Adjust the sequence to allocate the gpu resource before virglrenderer
  resource creation.

- Add virtio migration handling for uuid.

- Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS.
  https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.hu...@amd.com/

- Add a meson check to make sure the unstable APIs are defined from 0.9.0.

Changes from V1 to V2 (virtio gpu V4)

- Remove unused #include "hw/virtio/virtio-iommu.h"

- Add a local function, called virgl_resource_destroy(), that is used
  to release a vgpu resource on error paths and in resource_unref.

- Remove virtio_gpu_virgl_resource_unmap from
  virtio_gpu_cleanup_mapping(),
  since this function won't be called on blob resources and also because
  blob resources are unmapped via virgl_cmd_resource_unmap_blob().

- In virgl_cmd_resource_create_blob(), do proper cleanup in error paths
  and move QTAILQ_INSERT_HEAD(>reslist, res, next) after the resource
  has been fully initialized.

- The memory region has a different life-cycle from virtio gpu resources,
  i.e. it cannot be released synchronously along with the vgpu resource.
  So the field "region" was changed to a pointer that is allocated
  dynamically when the blob is mapped.
  Also, since the pointer can be used to indicate whether the blob
  is mapped, the explicit field "mapped" was removed.

- In virgl_cmd_resource_map_blob(), add a check on the value of
  res->region, to prevent it being called twice on the same resource.

- Add a patch to enable automatic deallocation of memory regions to resolve
  use-after-free memory corruption with a reference.

References

Demo with Venus:
- 
https://static.sched.com/hosted_files/xen2023/3f/xen_summit_2023_virtgpu_demo.mp4
QEMU repository:
- https://gitlab.freedesktop.org/rui/qemu-xen/-/commits/upstream-for-virtio-gpu

Thanks,
Ray

Antonio Caggiano (6):
  virtio-gpu: CONTEXT_INIT feature
  virtio-gpu: blob prep
  virtio-gpu: Handle resource blob commands
  virtio-gpu: Resource UUID
  virtio-gpu: Support Venus capset
  virtio-gpu: Initialize Venus

Dmitry Osipenko (1):
  virtio-gpu: Don't require udmabuf when blobs and virgl are enabled

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem

Huang Rui (3):
  virtio-gpu: Configure context init for virglrenderer
  virtio-gpu: Support context init feature with virglrenderer
  virtio-gpu: Enable virglrenderer render server flag for venus

Xenia Ragiadakou (1):
  softmmu/memory: enable automatic deallocation of memory regions

 hw/display/trace-events  |   1 +
 hw/display/virtio-gpu-base.c |   5 +
 hw/display/virtio-gpu-pci.c  |  14 ++
 hw/display/virtio-gpu-virgl.c| 272 ++-
 hw/display/virtio-gpu.c  |  78 

Re: [XEN PATCH] x86/ACPI: Ignore entries with invalid APIC IDs when parsing MADT

2023-09-15 Thread George Dunlap
On Thu, Sep 14, 2023 at 12:18 AM Stefano Stabellini
 wrote:
>
> On Wed, 13 Sep 2023, George Dunlap wrote:
> > On Tue, Sep 12, 2023 at 8:57 AM Thomas Gleixner  wrote:
> > >
> > > On Mon, Sep 11 2023 at 19:24, Andrew Cooper wrote:
> > > > Furthermore, cursory testing that Thomas did for the Linux topology work
> > > > demonstrates that it is broken anyway for reasons unrelated to ACPI 
> > > > parsing.
> > > >
> > > > Even furthermore, it's an area of the Xen / dom0 boundary which is
> > > > fundamentally broken for non-PV cases, and undocumented for the PV case,
> > > > hence why it's broken in Linux.
> > > >
> > > > Physical CPU Hotplug does not pass the bar for being anything more than
> > > > experimental.  It's absolutely not tech-preview level because the only
> > > > demo it has had in an environment (admittedly virtual) which does
> > > > implement the spec in a usable way demonstrates that it doesn't 
> > > > function.
> > > >
> > > > The fact no-one has noticed until now shows that the feature isn't used,
> > > > which comes back around full circle to the fact that Intel never made it
> > > > work and never shipped it.
> > >
> > > OTOH it _is_ used in virtualization. KVM supports it and it just
> > > works. That's how I found out that XEN explodes in colourful ways :)
> >
> > It should be pointed out that there's currently a start-up selling a
> > product that specifically runs Xen under cloud providers -- Exostellar
> > (was Exotanium) [1].  If cloud providers do use ACPI hotplug to allow
> > on-the-fly adjustments of the number of vcpus, Exostellar will
> > probably want support at some point.  (Perhaps it would be good to
> > rope them into testing / maintaining it.)
>
> Supporting CPU hotplug in a nested virtualization setting is a different
> proposition compared to supporting Physical CPU Hotplug. Typically
> virtual firmware (hypervisor-provided firmware) has less unexpected
> behaviors compared to baremetal firmware.
>
> Could you make the distinction in SUPPORT.md?

People say at the moment it doesn't actually work, even under QEMU; so
"Experimental" is probably the right status.

But if someone got it working, we might add "supported, with caveats"
and specify things in a comment.

 -George



Re: [PATCH v2] timer: fix NR_CPUS=1 build with gcc13

2023-09-15 Thread George Dunlap
On Thu, Sep 14, 2023 at 3:32 PM Jan Beulich  wrote:
>
> Gcc13 apparently infers from "if ( old_cpu < new_cpu )" that "new_cpu"
> is >= 1, and then (on x86) complains about "per_cpu(timers, new_cpu)"
> exceeding __per_cpu_offset[]'s bounds (being an array of 1 in such a
> configuration). Make the code conditional upon there being at least 2
> CPUs configured (otherwise there simply is nothing to migrate [to]).
>
> Signed-off-by: Jan Beulich 

I would still have preferred the code be more robust for the NR_CPUS >
1 case, but this is an improvement, so in any case:

Acked-by: George Dunlap 



Re: [QEMU PATCH v4 10/13] virtio-gpu: Resource UUID

2023-09-15 Thread Albert Esteve
On Thu, Sep 14, 2023 at 6:56 PM Akihiko Odaki 
wrote:

> On 2023/09/14 17:29, Albert Esteve wrote:
> >
> >
> > On Thu, Sep 14, 2023 at 9:17 AM Akihiko Odaki wrote:
> >
> >     On 2023/09/13 23:18, Albert Esteve wrote:
> >      > On Wed, Sep 13, 2023 at 3:43 PM Akihiko Odaki wrote:
> >      >
> >      >     On 2023/09/13 21:58, Albert Esteve wrote:
> >      >      > On Wed, Sep 13, 2023 at 2:22 PM Akihiko Odaki wrote:
> >      >      >
> >      >      >     On 2023/09/13 20:34, Albert Esteve wrote:
> >      >      >      > On Wed, Sep 13, 2023 at 12:34 PM Akihiko Odaki wrote:
> >      >      >      >
> >      >      >      >     On 2023/09/13 16:55, Albert Esteve wrote:
> >      >      >      >      > Hi Antonio,
> >      >      >      >      >
> >      >      >      >      > If I'm not mistaken, this patch is related with:
> >      >      >      >      > https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg01853.html
> >      >      >      >      >
> >      >      >      >      > IMHO, ideally, virtio-gpu and vhost-user-gpu both, would use the

Re: [PATCH 6/8] x86/entry: Track the IST-ness of an entry for the exit paths

2023-09-15 Thread Andrew Cooper
On 15/09/2023 8:13 am, Jan Beulich wrote:
> On 14.09.2023 21:44, Andrew Cooper wrote:
>> On 14/09/2023 10:32 am, Jan Beulich wrote:
>>> On 13.09.2023 22:27, Andrew Cooper wrote:
 --- a/xen/arch/x86/x86_64/entry.S
 +++ b/xen/arch/x86/x86_64/entry.S
 @@ -142,10 +142,16 @@ process_trap:
  
  .section .text.entry, "ax", @progbits
  
 -/* %rbx: struct vcpu, interrupts disabled */
 +/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
  restore_all_guest:
 -ASSERT_INTERRUPTS_DISABLED
  
 +#ifdef CONFIG_DEBUG
 +mov   %rsp, %rdi
 +mov   %r12, %rsi
 +call  check_ist_exit
 +#endif
 +
 +ASSERT_INTERRUPTS_DISABLED
  /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
  mov VCPU_arch_msrs(%rbx), %rdx
  mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
>>> Even here I don't think I see a need for the addition. Plus if the check
>>> is warranted here, is it really necessary for it to live ahead of the
>>> interrupts-disabled check?
>> What makes you think there is a relevance to the order of two assertions
>> in fully irqs-off code?
> You explicitly making it more churn than strictly needed. IOW I was
> simply wondering whether I was overlooking some aspect.

That was just the diff algorithm after accidentally removing the newline
after the ASSERT.  I've undone that, and the hunk is simple additions
for the check, like it is in the other two hunks.

~Andrew



Re: [PATCH 4/8] x86/spec-ctrl: Extend all SPEC_CTRL_{ENTER,EXIT}_* comments

2023-09-15 Thread Andrew Cooper
On 15/09/2023 8:07 am, Jan Beulich wrote:
> On 14.09.2023 21:23, Andrew Cooper wrote:
>> On 14/09/2023 8:58 am, Jan Beulich wrote:
>>> On 13.09.2023 22:27, Andrew Cooper wrote:
 @@ -319,7 +334,14 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  UNLIKELY_END(\@_serialise)
  .endm
  
 -/* Use when exiting to Xen in IST context. */
 +/*
 + * Use when exiting from any entry context, back to Xen context.  This
 + * includes returning to other SPEC_CTRL_{ENTRY,EXIT}_* regions with
 + * unsanitised state.
 + *
 + * Because we might have interrupted Xen beyond SPEC_CTRL_EXIT_TO_$GUEST, 
 we
 + * must treat this as if it were an EXIT_TO_$GUEST case too.
 + */
  .macro SPEC_CTRL_EXIT_TO_XEN
  /*
   * Requires %rbx=stack_end
>>> Is it really "must"? At least in theory there are ways to recognize that
>>> exit is back to Xen context outside of interrupted entry/exit regions
>>> (simply by evaluating how far below stack top %rsp is).
>> Yes, it is must - it's about how Xen behaves right now, not about some
>> theoretical future with different tracking mechanism.
> Well, deleting "must" does exactly that

Nonsense.

*When* someone changes the logic such that there's an alternative route,
the comment necessarily needs updating.  And until that point, the logic
*must* behave in this way to be correct.

~Andrew



Re: [PATCH 6/7] block: Clean up local variable shadowing

2023-09-15 Thread Kevin Wolf
Am 31.08.2023 um 15:25 hat Markus Armbruster geschrieben:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block.c  |  7 ---
>  block/rbd.c  |  2 +-
>  block/stream.c   |  1 -
>  block/vvfat.c| 34 +-
>  hw/block/xen-block.c |  6 +++---
>  5 files changed, 25 insertions(+), 25 deletions(-)

I wonder why you made vdi a separate patch, but not vvfat, even though
that has more changes. (Of course, my selfish motivation for asking this
is that I could have given a R-b for it and wouldn't have to look at it
again in a v2 :-))

> diff --git a/block.c b/block.c
> index a307c151a8..7f0003d8ac 100644
> --- a/block.c
> +++ b/block.c
> @@ -3001,7 +3001,8 @@ static BdrvChild 
> *bdrv_attach_child_common(BlockDriverState *child_bs,
> BdrvChildRole child_role,
> uint64_t perm, uint64_t 
> shared_perm,
> void *opaque,
> -   Transaction *tran, Error **errp)
> +   Transaction *transaction,
> +   Error **errp)
>  {
>  BdrvChild *new_child;
>  AioContext *parent_ctx, *new_child_ctx;
> @@ -3088,7 +3089,7 @@ static BdrvChild 
> *bdrv_attach_child_common(BlockDriverState *child_bs,
>  .old_parent_ctx = parent_ctx,
>  .old_child_ctx = child_ctx,
>  };
> -tran_add(tran, &bdrv_attach_child_common_drv, s);
> +tran_add(transaction, &bdrv_attach_child_common_drv, s);
>  
>  if (new_child_ctx != child_ctx) {
>  aio_context_release(new_child_ctx);

I think I would resolve this one the other way around. 'tran' is the
typical name for the parameter and it is the transaction that this
function should add things to.

The other one that shadows it is a local transaction that is completed
within the function. I think it's better if that one has a different
name.

As usual, being more specific than just 'tran' vs. 'transaction' would
be nice. Maybe 'aio_ctx_tran' for the nested one?

The rest looks okay.

Kevin




Re: [PATCH 4/7] block/dirty-bitmap: Clean up local variable shadowing

2023-09-15 Thread Kevin Wolf
Am 31.08.2023 um 15:25 hat Markus Armbruster geschrieben:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/monitor/bitmap-qmp-cmds.c | 2 +-
>  block/qcow2-bitmap.c| 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
> index 55f778f5af..4d018423d8 100644
> --- a/block/monitor/bitmap-qmp-cmds.c
> +++ b/block/monitor/bitmap-qmp-cmds.c
> @@ -276,7 +276,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char 
> *node, const char *target,
>  
>  for (lst = bms; lst; lst = lst->next) {
>  switch (lst->value->type) {
> -const char *name, *node;
> +const char *name;
>  case QTYPE_QSTRING:
>  name = lst->value->u.local;
>  src = bdrv_find_dirty_bitmap(bs, name);

The names in this function are all over the place... A more ambitious
patch could rename the parameters to dst_node/dst_bitmap and these
variables to src_node/src_bitmap to get some more consistency (both with
each other and with the existing src/dst variables).

Preexisting, so I'm not insisting that you should do this.

> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 037fa2d435..ffd5cd3b23 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -1555,7 +1555,6 @@ bool 
> qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>  FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
>  const char *name = bdrv_dirty_bitmap_name(bitmap);
>  uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
> -Qcow2Bitmap *bm;
>  
>  if (!bdrv_dirty_bitmap_get_persistence(bitmap) ||
>  bdrv_dirty_bitmap_inconsistent(bitmap)) {
> @@ -1625,7 +1624,7 @@ bool 
> qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>  
>  /* allocate clusters and store bitmaps */
>  QSIMPLEQ_FOREACH(bm, bm_list, entry) {
> -BdrvDirtyBitmap *bitmap = bm->dirty_bitmap;
> +bitmap = bm->dirty_bitmap;
>  
>  if (bitmap == NULL || bdrv_dirty_bitmap_readonly(bitmap)) {
>  continue;

Reviewed-by: Kevin Wolf 




[PATCH v2] x86/shutdown: change default reboot method preference

2023-09-15 Thread Roger Pau Monne
The current logic to choose the preferred reboot method is based on the mode Xen
has been booted into, so if the box is booted from UEFI, the preferred reboot
method will be to use the ResetSystem() run time service call.

However, that method seems to be widely untested, and quite often leads to a
result similar to:

Hardware Dom0 shutdown: rebooting machine
[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C]
CPU:0
RIP:e008:[<0017>] 0017
RFLAGS: 00010202   CONTEXT: hypervisor
[...]
Xen call trace:
   [<0017>] R 0017
   [] S 83207eff7b50
   [] F machine_restart+0x1da/0x261
   [] F apic_wait_icr_idle+0/0x37
   [] F smp_call_function_interrupt+0xc7/0xcb
   [] F call_function_interrupt+0x20/0x34
   [] F do_IRQ+0x150/0x6f3
   [] F common_interrupt+0x132/0x140
   [] F 
arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
   [] F 
arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
   [] F arch/x86/domain.c#idle_loop+0xec/0xee


Panic on CPU 0:
FATAL TRAP: vector = 6 (invalid opcode)


This in most cases does lead to a reboot; however, that's unreliable.

Change the default reboot preference to prefer ACPI over UEFI if available and
not in reduced hardware mode.

This is in line with what Linux does, so it's unlikely to cause issues on current
and future hardware, since there's a much higher chance of vendors testing
hardware with Linux rather than Xen.

Add a special case for one Acer model that does require being rebooted using
ResetSystem().  See Linux commit 0082517fa4bce for rationale.

I'm not aware of ACPI reboot causing issues on boxes that do have
properly implemented ResetSystem() methods.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Add special case for Acer model to use UEFI reboot.
 - Adjust commit message.
---
 xen/arch/x86/shutdown.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index 7619544d14da..3816ede1afe5 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -150,19 +150,20 @@ static void default_reboot_type(void)
 
 if ( xen_guest )
 reboot_type = BOOT_XEN;
+else if ( !acpi_disabled && !acpi_gbl_reduced_hardware )
+reboot_type = BOOT_ACPI;
 else if ( efi_enabled(EFI_RS) )
 reboot_type = BOOT_EFI;
-else if ( acpi_disabled )
-reboot_type = BOOT_KBD;
 else
-reboot_type = BOOT_ACPI;
+reboot_type = BOOT_KBD;
 }
 
 static int __init cf_check override_reboot(const struct dmi_system_id *d)
 {
 enum reboot_type type = (long)d->driver_data;
 
-if ( type == BOOT_ACPI && acpi_disabled )
+if ( (type == BOOT_ACPI && acpi_disabled) ||
+ (type == BOOT_EFI && !efi_enabled(EFI_RS)) )
 type = BOOT_KBD;
 
 if ( reboot_type != type )
@@ -172,6 +173,7 @@ static int __init cf_check override_reboot(const struct 
dmi_system_id *d)
 [BOOT_KBD]  = "keyboard controller",
 [BOOT_ACPI] = "ACPI",
 [BOOT_CF9]  = "PCI",
+[BOOT_EFI]  = "UEFI",
 };
 
 reboot_type = type;
@@ -530,6 +532,15 @@ static const struct dmi_system_id __initconstrel 
reboot_dmi_table[] = {
 DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R740"),
 },
 },
+{/* Handle problems with rebooting on Acer TravelMate X514-51T. */
+.callback = override_reboot,
+.driver_data = (void *)(long)BOOT_EFI,
+.ident = "Acer TravelMate X514-51T",
+.matches = {
+DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
+DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate X514-51T"),
+},
+},
 { }
 };
 
-- 
2.42.0
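For readers skimming the diff: the post-patch selection in default_reboot_type() boils down to the following precedence. This is a minimal standalone C sketch; the helper name and boolean parameters are illustrative, and only the BOOT_* ordering is taken from the patch above.

```c
#include <stdbool.h>

enum reboot_type { BOOT_KBD, BOOT_ACPI, BOOT_CF9, BOOT_EFI, BOOT_XEN };

/* Post-patch preference: Xen guest first, then ACPI (when enabled and
 * not in reduced hardware mode), then UEFI ResetSystem(), with the
 * keyboard controller as the last resort. */
static enum reboot_type pick_reboot(bool xen_guest, bool acpi_usable,
                                    bool efi_rs)
{
    if ( xen_guest )
        return BOOT_XEN;
    if ( acpi_usable )   /* !acpi_disabled && !acpi_gbl_reduced_hardware */
        return BOOT_ACPI;
    if ( efi_rs )        /* efi_enabled(EFI_RS) */
        return BOOT_EFI;
    return BOOT_KBD;
}
```

Note that on an EFI-booted box with working ACPI this now selects BOOT_ACPI where the old code selected BOOT_EFI; the DMI quirk table still lets individual machines (like the Acer TravelMate X514-51T) override the default.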




Re: [PATCH 2/5] x86: Introduce x86_merge_dr6()

2023-09-15 Thread Jan Beulich
On 14.09.2023 20:03, Andrew Cooper wrote:
> On 14/09/2023 3:53 pm, Jan Beulich wrote:
>> On 13.09.2023 01:21, Andrew Cooper wrote:
>>> The current logic used to update %dr6 when injecting #DB is buggy.  The
>>> architectural behaviour is to overwrite B{0..3} and accumulate all other 
>>> bits.
>> While I consider this behavior plausible, forever since the introduction of
>> debug registers in i386 I have been missing a description in the manuals of
>> how %dr6 updating works. Can you point me at where the above is actually
>> spelled out?
> 
> The documentation is very poor.  The comment in the code is based on my
> conversations with architects.

Especially in such a case, can you please make very explicit in the
description what the origin of the information is? Otherwise it's
simply impossible for anyone to review properly without having had
the same conversations.

Jan

> APM Vol2 13.1.1.3 Debug-Status Register (DR6) says
> 
> "Bits 15:13 of the DR6 register are not cleared by the processor and
> must be cleared by software after the contents have been read."
> 
> although this is buggy given the addition of BLD in the latest
> revision.  I've asked AMD to correct it.
> 
> 
> SDM Vol3 18.2.3 Debug Status Register (DR6) says
> 
> "Certain debug exceptions may clear bits 0-3. The remaining contents of
> the DR6 register are never cleared by the processor."
> 
> ~Andrew
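The merge rule under discussion (overwrite B0..B3, accumulate the remaining status bits until software clears them) can be modelled in a few lines. This is an illustrative sketch only: the function name is hypothetical, and bits with inverted polarity such as DR6.RTM are deliberately ignored.

```c
#include <stdint.h>

/* B0..B3 in %dr6 report which breakpoint fired for the current event. */
#define DR6_BP_MASK 0xfUL

/* On #DB injection: B0..B3 reflect only the new event, while the other
 * status bits (BD, BS, BT, ...) are sticky and accumulate until
 * software explicitly clears them. */
static uint64_t merge_dr6(uint64_t old, uint64_t pending)
{
    return (old & ~DR6_BP_MASK) | pending;
}
```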




Re: [PATCH 5/7] block/vdi: Clean up local variable shadowing

2023-09-15 Thread Kevin Wolf
Am 31.08.2023 um 15:25 hat Markus Armbruster geschrieben:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 

> @@ -700,7 +699,7 @@ nonallocating_write:
>  /* One or more new blocks were allocated. */
>  VdiHeader *header;
>  uint8_t *base;
> -uint64_t offset;
> +uint64_t offs;
>  uint32_t n_sectors;
>  
>  g_free(block);
> @@ -723,11 +722,11 @@ nonallocating_write:
>  bmap_first /= (SECTOR_SIZE / sizeof(uint32_t));
>  bmap_last /= (SECTOR_SIZE / sizeof(uint32_t));
>  n_sectors = bmap_last - bmap_first + 1;
> -offset = s->bmap_sector + bmap_first;
> +offs = s->bmap_sector + bmap_first;
>  base = ((uint8_t *)&s->bmap[0]) + bmap_first * SECTOR_SIZE;
>  logout("will write %u block map sectors starting from entry %u\n",
> n_sectors, bmap_first);
> -ret = bdrv_co_pwrite(bs->file, offset * SECTOR_SIZE,
> +ret = bdrv_co_pwrite(bs->file, offs * SECTOR_SIZE,
>   n_sectors * SECTOR_SIZE, base, 0);
>  }

Having two variables 'offset' and 'offs' doesn't really help with
clarity either. Can we be more specific and use something like
'bmap_offset' here?

Kevin




Re: 4.8-unstable: building firmware/hvmloader errors in Bookworm with gcc-12.2

2023-09-15 Thread Jan Beulich
On 14.09.2023 22:53, Pry Mar wrote:
> First attempt to build xen-4.18-unstable in deb12 (Bookworm):
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.asl
>   9851: Name ( SLT, 0x0 )
> Remark   2173 -  Creation of named objects within a method is
> highly inefficient, use globals or method local variables instead ^
>  (\_GPE._L03)
> 
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.asl
>   9852: Name ( EVT, 0x0 )
> Remark   2173 -  Creation of named objects within a method is
> highly inefficient, use globals or method local variables instead ^
>  (\_GPE._L03)

Two remarks here, and ...

> ASL Input:
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.asl
> -  397476 bytes   8140 keywords  11139 source lines
> AML Output:
>  
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.aml
> -   73130 bytes   5541 opcodes2599 named objects
> Hex Dump:
>  
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.hex
> -  686160 bytes
> 
> Compilation successful. 0 Errors, 129 Warnings, 2 Remarks, 2762
> Optimizations

... no errors here.

> sed -e 's/AmlCode/dsdt_anycpu/g' -e 's/_aml_code//g'
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.hex
>> /home/mockbuild/pbdeps/xen-4.18~rc0/debi>
> echo "int dsdt_anycpu_len=sizeof(dsdt_anycpu);" >>
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.c.tmp
> mv -f
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.c.tmp
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmlo>
> rm -f
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.aml
> /home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmload>
> make[9]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/libacpi'
> make[8]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader'
> make[7]: ***
> [/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/../../tools/Rules.mk:206:
> subdir-all-hvmloader] Error 2

And then a sudden failure here with no indication what has failed. Did you
supply an insufficient fragment of overall output, or is there indeed
something failing without any error indication? Of course it also doesn't
help that your email is hopelessly damaged by undue line wrapping.

Jan

> make[7]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware'
> make[6]: ***
> [/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/../../tools/Rules.mk:201:
> subdirs-all] Error 2
> make[6]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware'
> make[5]: *** [Makefile:37: all] Error 2
> make[5]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware'
> make[4]: ***
> [/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/../tools/Rules.mk:206:
> subdir-all-firmware] Error 2
> make[4]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools'
> make[3]: ***
> [/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/../tools/Rules.mk:201:
> subdirs-all] Error 2
> make[3]: Leaving directory
> '/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools'
> make[2]: *** [debian/rules.real:218: debian/stamps/build-utils_amd64] Error
> 2
> make[2]: Leaving directory '/home/mockbuild/pbdeps/xen-4.18~rc0'
> make[1]: *** [debian/rules.gen:57: build-arch_amd64_real] Error 2
> make[1]: Leaving directory '/home/mockbuild/pbdeps/xen-4.18~rc0'
> make: *** [debian/rules:24: build-arch] Error 2
> dpkg-buildpackage: error: debian/rules build subprocess returned exit
> status 2
> 
> any help is appreciated,
> PryMar56
> 




Re: [PATCH 7/8] x86/spec-ctrl: Issue VERW during IST exit to Xen

2023-09-15 Thread Jan Beulich
On 14.09.2023 21:49, Andrew Cooper wrote:
> On 14/09/2023 11:01 am, Jan Beulich wrote:
>> On 13.09.2023 22:27, Andrew Cooper wrote:
>>> There is a corner case where e.g. an NMI hitting an exit-to-guest path after
>>> SPEC_CTRL_EXIT_TO_* would have run the entire NMI handler *after* the VERW
>>> flush to scrub potentially sensitive data from uarch buffers.
>>>
>>> In order to compensate, issue VERW when exiting to Xen from an IST entry.
>>>
>>> SPEC_CTRL_EXIT_TO_XEN already has two reads of spec_ctrl_flags off the 
>>> stack,
>>> and we're about to add a third.  Load the field into %ebx, and list the
>>> register as clobbered.
>>>
>>> %r12 has been arranged to be the ist_exit signal, so add this as an input
>>> dependency and use it to identify when to issue a VERW.
>>>
>>> Signed-off-by: Andrew Cooper 
>> While looking technically okay, I still have a couple of remarks:
>>
>>> --- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
>>> +++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
>>> @@ -344,10 +344,12 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>>   */
>>>  .macro SPEC_CTRL_EXIT_TO_XEN
>>>  /*
>>> - * Requires %r14=stack_end
>>> - * Clobbers %rax, %rcx, %rdx
>>> + * Requires %r12=ist_exit, %r14=stack_end
>>> + * Clobbers %rax, %rbx, %rcx, %rdx
>> While I'd generally be a little concerned of the growing dependency and
>> clobber lists, there being a single use site makes this pretty okay. The
>> macro invocation being immediately followed by RESTORE_ALL effectively
>> means we can clobber almost everything here.
>>
>> As to register usage and my comment on patch 5: There's no real need
>> to switch %rbx to %r14 there if I'm not mistaken
> 
> As said, it's about consistency of the helpers.

While I'm not entirely happy with this, ...

>>> @@ -367,8 +369,16 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>>  
>>>  .L\@_skip_sc_msr:
>>>  
>>> -/* TODO VERW */
>>> +test %r12, %r12
>>> +jz .L\@_skip_ist_exit
>>> +
>>> +/* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo 
>>> dependency */
>>> +testb $SCF_verw, %bl
>>> +jz .L\@_verw_skip
>>> +verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
>>> +.L\@_verw_skip:
>> Nit: .L\@_skip_verw would be more consistent with the other label names.
> 
> So it would.  I'll tweak.

... then
Reviewed-by: Jan Beulich 

Jan



Re: [PATCH 6/8] x86/entry: Track the IST-ness of an entry for the exit paths

2023-09-15 Thread Jan Beulich
On 14.09.2023 21:44, Andrew Cooper wrote:
> On 14/09/2023 10:32 am, Jan Beulich wrote:
>> On 13.09.2023 22:27, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/x86_64/compat/entry.S
>>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>>> @@ -117,8 +117,15 @@ compat_process_trap:
>>>  call  compat_create_bounce_frame
>>>  jmp   compat_test_all_events
>>>  
>>> -/* %rbx: struct vcpu, interrupts disabled */
>>> +/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
>>>  ENTRY(compat_restore_all_guest)
>>> +
>>> +#ifdef CONFIG_DEBUG
>>> +mov   %rsp, %rdi
>>> +mov   %r12, %rsi
>>> +call  check_ist_exit
>>> +#endif
>>> +
>>>  ASSERT_INTERRUPTS_DISABLED
>>>  mov   $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11d
>>>  and   UREGS_eflags(%rsp),%r11d
>> Without having peeked ahead, is there any use of %r12 going to appear
>> on this path? I thought it's only going to be restore_all_xen?
> 
> For now, we only need to change behaviour based on ist_exit in
> restore_all_xen.
> 
> But, we do get here in IST context, and I'm not interested in having to
> re-do the analysis to determine if this is safe.  ist_exit is a global
> property of exiting Xen, so should be kept correct from the outset.

Would be nice to mention this just-in-case aspect in the description.

>>> --- a/xen/arch/x86/x86_64/entry.S
>>> +++ b/xen/arch/x86/x86_64/entry.S
>>> @@ -142,10 +142,16 @@ process_trap:
>>>  
>>>  .section .text.entry, "ax", @progbits
>>>  
>>> -/* %rbx: struct vcpu, interrupts disabled */
>>> +/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
>>>  restore_all_guest:
>>> -ASSERT_INTERRUPTS_DISABLED
>>>  
>>> +#ifdef CONFIG_DEBUG
>>> +mov   %rsp, %rdi
>>> +mov   %r12, %rsi
>>> +call  check_ist_exit
>>> +#endif
>>> +
>>> +ASSERT_INTERRUPTS_DISABLED
>>>  /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
>>>  mov VCPU_arch_msrs(%rbx), %rdx
>>>  mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
>> Even here I don't think I see a need for the addition. Plus if the check
>> is warranted here, is it really necessary for it to live ahead of the
>> interrupts-disabled check?
> 
> What makes you think there is a relevance to the order of two assertions
> in fully irqs-off code?

You explicitly making it more churn than strictly needed. IOW I was
simply wondering whether I was overlooking some aspect.

> The checks are in the same order as the comment stating the invariants.

If that's the only criteria, then okay (but still slightly odd to
see more churn than necessary).

Jan



Re: [PATCH] x86/shutdown: change default reboot method preference

2023-09-15 Thread Roger Pau Monné
On Thu, Sep 14, 2023 at 06:42:03PM +0100, Andrew Cooper wrote:
> On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
> > The current logic to choose the preferred reboot method is based on the mode 
> > Xen
> > has been booted into, so if the box is booted from UEFI, the preferred 
> > reboot
> > method will be to use the ResetSystem() run time service call.
> >
> > However, that call seems to be widely untested once the firmware has exited
> > boot services, and quite often leads to a result similar to:
> 
> I don't know how true this is.  What Xen does differently to most of the
> rest of the world is not calling SetVirtualAddressMap()

Hm, but that doesn't seem to affect the rest of the run-time services
(at least on this box), it's (just?) ResetSystem() that misbehaves.

I can remove that sentence however, it is more of a guess rather than
a fact.

> 
> >
> > Hardware Dom0 shutdown: rebooting machine
> > [ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C]
> > CPU:0
> > RIP:e008:[<0017>] 0017
> > RFLAGS: 00010202   CONTEXT: hypervisor
> > [...]
> > Xen call trace:
> >[<0017>] R 0017
> >[] S 83207eff7b50
> >[] F machine_restart+0x1da/0x261
> >[] F apic_wait_icr_idle+0/0x37
> >[] F smp_call_function_interrupt+0xc7/0xcb
> >[] F call_function_interrupt+0x20/0x34
> >[] F do_IRQ+0x150/0x6f3
> >[] F common_interrupt+0x132/0x140
> >[] F 
> > arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> >[] F 
> > arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> >[] F arch/x86/domain.c#idle_loop+0xec/0xee
> >
> > 
> > Panic on CPU 0:
> > FATAL TRAP: vector = 6 (invalid opcode)
> > 
> >
> > Which in most cases does lead to a reboot, however that's unreliable.
> >
> > Change the default reboot preference to prefer ACPI over UEFI if available 
> > and
> > not in reduced hardware mode.
> >
> > This is in line to what Linux does, so it's unlikely to cause issues on 
> > current
> > and future hardware, since there's a much higher chance of vendors testing
> > hardware with Linux rather than Xen.
> >
> > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > properly implemented ResetSystem() methods.
> >
> > Signed-off-by: Roger Pau Monné 
> 
> Wording aside, Reviewed-by: Andrew Cooper 
> 
> This is a giant embarrassment to Xen, and needs fixing.

Looking at Linux I think we need to add an override for Acer
TravelMate X514-51T to force it to use UEFI, will send v2 with it.

I do wonder if we need to keep the reboot_dmi_table[] entries that
force systems to use ACPI, as it would now be the default?  My
thinking is that it's likely better to keep it just in case we change
the default again in the future.

Thanks, Roger.



Re: [PATCH 4/8] x86/spec-ctrl: Extend all SPEC_CTRL_{ENTER,EXIT}_* comments

2023-09-15 Thread Jan Beulich
On 14.09.2023 21:23, Andrew Cooper wrote:
> On 14/09/2023 8:58 am, Jan Beulich wrote:
>> On 13.09.2023 22:27, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
>>> +++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
>>> @@ -218,7 +218,10 @@
>>>  wrmsr
>>>  .endm
>>>  
>>> -/* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). 
>>> */
>>> +/*
>>> + * Used after a synchronous entry from PV context.  SYSCALL, SYSENTER, INT,
>>> + * etc.  Will always interrupt a guest speculation context.
>>> + */
>>>  .macro SPEC_CTRL_ENTRY_FROM_PV
>>>  /*
>>>   * Requires %rsp=regs/cpuinfo, %rdx=0
>> For the entry point comments - not being a native speaker -, the use of
>> "{will,may} interrupt" reads odd. You're describing the macros here,
>> not the events that led to their invocation. Therefore it would seem
>> to me that "{will,may} have interrupted" would be more appropriate.
> 
> The salient information is what the speculation state looks like when
> we're running the asm in these macros.
> 
> Sync and Async perhaps aren't the best terms.  For PV context at least,
> it boils down to:
> 
> 1) CPL>0 => you were in fully-good guest speculation context
> 2) CPL=0 => you were in fully-good Xen speculation context
> 3) IST && CPL=0 => Here be dragons.
> 
> HVM is more of a challenge.  VT-x behaves like an IST path, while SVM
> does allow us to atomically switch to good Xen state.
> 
> Really, this needs to be a separate doc, with diagrams...
> 
>>> @@ -319,7 +334,14 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>>  UNLIKELY_END(\@_serialise)
>>>  .endm
>>>  
>>> -/* Use when exiting to Xen in IST context. */
>>> +/*
>>> + * Use when exiting from any entry context, back to Xen context.  This
>>> + * includes returning to other SPEC_CTRL_{ENTRY,EXIT}_* regions with
>>> + * unsanitised state.
>>> + *
>>> + * Because we might have interrupted Xen beyond SPEC_CTRL_EXIT_TO_$GUEST, 
>>> we
>>> + * must treat this as if it were an EXIT_TO_$GUEST case too.
>>> + */
>>>  .macro SPEC_CTRL_EXIT_TO_XEN
>>>  /*
>>>   * Requires %rbx=stack_end
>> Is it really "must"? At least in theory there are ways to recognize that
>> exit is back to Xen context outside of interrupted entry/exit regions
>> (simply by evaluating how far below stack top %rsp is).
> 
> Yes, it is must - it's about how Xen behaves right now, not about some
> theoretical future with different tracking mechanism.

Well, deleting "must" does exactly that: Describe what we currently do,
without imposing that it necessarily has to be that way. That's at least
to me, as an - as said - non-native speaker.

> Checking the stack is very fragile and we've had bugs doing this in the
> past.  It would break completely if we were to take things such as the
> recursive-NMI fix (not that we're liable to at this point with FRED on
> the horizon.)
> 
> A perhaps less fragile option would be to have .text.entry.spec_suspect
> section and check %rip being in that.
> 
> But neither of these are good options.  It's adding complexity (latency)
> to a fastpath to avoid a small hit in a rare case, so is a concrete
> anti-optimisation.

I fully accept all of this, and I wasn't meaning to suggest we do what
might be possible. I merely dislike stronger than necessary statements,
as such are liable to cause confusion down the road.

That said, I certainly won't refuse to eventually ack this patch just
because of this one word.

Jan



Re: [PATCH v4 4/8] Arm: annotate entry points with type and size

2023-09-15 Thread Jan Beulich
On 14.09.2023 23:25, Julien Grall wrote:
> On 04/08/2023 07:28, Jan Beulich wrote:
>> Use the generic framework in xen/linkage.h. No change in generated code
>> except for the changed padding value (noticeable when config.gz isn't a
>> multiple of 4 in size). Plus of course the converted symbols change to
>> be hidden ones.
>>
>> Note that ASM_INT() is switched to DATA(), not DATA_LOCAL(), as the only
>> use site wants the symbol global anyway.
>>
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Julien Grall 

Thanks.

>> ---
>> Only one each of the assembly files is being converted for now. More
>> could be done right here or as follow-on in separate patches.
> 
> I don't have a strong preference. Are you planning to do follow-up? (I 
> am ok if it is no).

Well, I certainly can, but I wasn't expecting this series to remain pending
for this long, so time's running out for 4.18.

Jan



Re: [PATCH v5 2/5] xen/ppc: Implement bitops.h

2023-09-15 Thread Jan Beulich
On 14.09.2023 20:15, Shawn Anastasio wrote:
> On 9/13/23 2:29 AM, Jan Beulich wrote:
>> On 12.09.2023 20:35, Shawn Anastasio wrote:
>>> --- a/xen/arch/ppc/include/asm/bitops.h
>>> +++ b/xen/arch/ppc/include/asm/bitops.h
>>> @@ -1,9 +1,335 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>>> +/*
>>> + * Adapted from Linux's arch/powerpc/include/asm/bitops.h.
>>> + *
>>> + * Merged version by David Gibson .
>>> + * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don
>>> + * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard.  They
>>> + * originally took it from the ppc32 code.
>>> + */
>>>  #ifndef _ASM_PPC_BITOPS_H
>>>  #define _ASM_PPC_BITOPS_H
>>>
>>> +#include 
>>> +
>>> +#define __set_bit(n, p) set_bit(n, p)
>>> +#define __clear_bit(n, p)   clear_bit(n, p)
>>> +
>>> +#define BITOP_BITS_PER_WORD 32
>>> +#define BITOP_MASK(nr)  (1U << ((nr) % BITOP_BITS_PER_WORD))
>>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>>> +#define BITS_PER_BYTE   8
>>> +
>>>  /* PPC bit number conversion */
>>> -#define PPC_BITLSHIFT(be)  (BITS_PER_LONG - 1 - (be))
>>> -#define PPC_BIT(bit)   (1UL << PPC_BITLSHIFT(bit))
>>> -#define PPC_BITMASK(bs, be)((PPC_BIT(bs) - PPC_BIT(be)) | 
>>> PPC_BIT(bs))
>>> +#define PPC_BITLSHIFT(be)(BITS_PER_LONG - 1 - (be))
>>> +#define PPC_BIT(bit) (1UL << PPC_BITLSHIFT(bit))
>>> +#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>>> +
>>> +/* Macro for generating the ***_bits() functions */
>>> +#define DEFINE_BITOP(fn, op, prefix)   
>>> \
>>> +static inline void fn(unsigned int mask,   
>>>\
>>> +  volatile unsigned int *p_)   
>>> \
>>> +{  
>>> \
>>> +unsigned int old;  
>>>\
>>> +unsigned int *p = (unsigned int *)p_;  
>>> \
>>
>> What use is this, when ...
>>
>>> +asm volatile ( prefix  
>>> \
>>> +   "1: lwarx %0,0,%3,0\n"  
>>> \
>>> +   #op "%I2 %0,%0,%2\n"
>>> \
>>> +   "stwcx. %0,0,%3\n"  
>>> \
>>> +   "bne- 1b\n" 
>>> \
>>> +   : "=&r" (old), "+m" (*p)  
>>> \
>>
>> ... the "+m" operand isn't used and ...
>>
>>> +   : "rK" (mask), "r" (p)  
>>> \
>>> +   : "cc", "memory" ); 
>>> \
>>
>> ... there's a memory clobber anyway?
>>
> 
> I see what you're saying, and I'm not sure why it was written this way
> in Linux. That said, this is the kind of thing that I'm hesitant to
> change without knowing the rationale of the original author. If you are
> confident that the this can be dropped given that there is already a
> memory clobber, I could drop it. Otherwise I'm inclined to leave it in a
> state that matches Linux.

This being an arch-independent property, I am confident. Yet still you're
the maintainer, so if you want to keep it like this initially, that'll be
okay. If I feel bothered enough, I could always send a patch afterwards.

Jan
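For readers following the constraint discussion: the lwarx/stwcx. retry loop is an atomic read-modify-write, and its effect can be sketched portably with a GCC/Clang builtin. This is illustrative only, not the Xen implementation, and memory-ordering details differ (the asm version relies on its "memory" clobber as a compiler barrier, which is why the "+m" operand is redundant with it).

```c
/* Atomic "*p |= mask" -- the operation DEFINE_BITOP(set_bits, or, ...)
 * generates via a lwarx/stwcx. retry loop on PPC.  __atomic_fetch_or
 * accepts a volatile-qualified pointer directly, so no cast is needed. */
static inline void set_bits(unsigned int mask, volatile unsigned int *p)
{
    __atomic_fetch_or(p, mask, __ATOMIC_RELAXED);
}
```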



Re: [PATCH v1 16/29] xen/asm-generic: introduce stub header flushtlb.h

2023-09-15 Thread Jiamei Xie

Hi Oleksii

On 2023/9/14 22:56, Oleksii Kurochko wrote:

The patch introduces a header stub necessary for a full Xen build.

Signed-off-by: Oleksii Kurochko 
---
  xen/include/asm-generic/flushtlb.h | 42 ++
  1 file changed, 42 insertions(+)
  create mode 100644 xen/include/asm-generic/flushtlb.h

diff --git a/xen/include/asm-generic/flushtlb.h 
b/xen/include/asm-generic/flushtlb.h
new file mode 100644
index 00..79e4773179
--- /dev/null
+++ b/xen/include/asm-generic/flushtlb.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_FLUSHTLB_H__
+#define __ASM_GENERIC_FLUSHTLB_H__
+
+#include 
+
+/*
+ * Filter the given set of CPUs, removing those that definitely flushed their
+ * TLB since @page_timestamp.
+ */
+/* XXX lazy implementation just doesn't clear anything */
+static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp) {}
+
+#define tlbflush_current_time() (0)
+
+static inline void page_set_tlbflush_timestamp(struct page_info *page)
+{
+BUG();
+}
+
+/* Flush specified CPUs' TLBs */
+void arch_flush_tlb_mask(const cpumask_t *mask);
+
+#endif /* __ASM_GENERIC_FLUSHTLB_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
+

It's duplicated.

+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */




[linux-linus test] 183004: regressions - FAIL

2023-09-15 Thread osstest service owner
flight 183004 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183004/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-coresched-amd64-xl  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 182531
 test-amd64-amd64-xl-xsm   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 182531
 test-amd64-amd64-xl-vhd   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 182531
 test-amd64-amd64-xl-shadow8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-credit1   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-pair12 xen-boot/src_hostfail REGR. vs. 182531
 test-amd64-amd64-pair13 xen-boot/dst_hostfail REGR. vs. 182531
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 182531
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd11-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-pygrub   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 
182531
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
182531

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  8 xen-boot fail REGR. vs. 182531

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182531
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182531
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182531
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass