[xen-unstable-smoke test] 181396: trouble: blocked/broken/pass

2023-06-12 Thread osstest service owner
flight 181396 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181396/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf  broken
 build-armhf                   4 host-install(4)         broken REGR. vs. 181349

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  128557e3a44d79f0c9360dc88e42c3d0ef728edf
baseline version:
 xen  b4642c32c4d079916d5607ddda0232aae5e1690e

Last test of basis   181349  2023-06-09 20:00:24 Z    3 days
Testing same since   181396  2023-06-12 22:00:25 Z    0 days    1 attempts


People who touched revisions under test:
  Bertrand Marquis 
  Jason Andryuk 
  Juergen Gross 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  broken  
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job build-armhf broken
broken-step build-armhf host-install(4)

Not pushing.


commit 128557e3a44d79f0c9360dc88e42c3d0ef728edf
Author: Julien Grall 
Date:   Mon Jun 12 11:13:19 2023 +0100

tools/xenstored: Correct the prototype of domain_max_chk()

Some version of GCC will complain because the prototype and the
declaration of domain_max_chk() don't match:

xenstored_domain.c:1503:6: error: conflicting types for 'domain_max_chk' due to
enum/integer mismatch; have '_Bool(const struct connection *, enum accitem,
unsigned int)' [-Werror=enum-int-mismatch]
 1503 | bool domain_max_chk(const struct connection *conn, enum accitem what,
      |      ^~~~~~~~~~~~~~
In file included from xenstored_domain.c:31:
xenstored_domain.h:146:6: note: previous declaration of 'domain_max_chk'
with type '_Bool(const struct connection *, unsigned int, unsigned int)'
  146 | bool domain_max_chk(const struct connection *conn, unsigned int what,
      |      ^~~~~~~~~~~~~~

Update the prototype to match the declaration.

This was spotted by Gitlab CI with the job opensuse-tumbleweed-gcc.

Fixes: 685048441e1c ("tools/xenstore: switch quota management to be table based")
Signed-off-by: Julien Grall 
Reviewed-by: Jason Andryuk 
Tested-by: Jason Andryuk 
Reviewed-by: Juergen Gross 

commit 1a0342507cb4011607673efec13a8f3238ac6aa8
Author: Juergen Gross 
Date:   Tue May 30 10:54:13 2023 +0200

tools/libs/store: make libxenstore independent of utils.h

There is no real need for including tools/xenstore/utils.h from
libxenstore, as only streq() and ARRAY_SIZE() are obtained via that
header.

streq() is just !strcmp(), and ARRAY_SIZE() is brought in via
xen-tools/common-macros.h.

Signed-off-by: Juergen Gross 
Acked-by: Julien Grall 

commit 0d5dfd2ed60addc1361ae82cbb52378abc912ede
Author: Juergen Gross 
Date:   Tue May 30 10:54:12 2023 +0200

tools/libs/store: use xen_list.h instead of xenstore/list.h

Replace the usage of the xenstore private list.h header with the
common xen_list.h one.

Use the XEN_TAILQ type list, as it allows the related macros/functions
to be swapped directly without having to change the logic.

Signed-off-by: Juergen Gross 
Acked-by: Julien Grall 

commit 84ac67cd1e3df780c413cd7093aa3ad8d508b79a
Author: Bertrand Marquis 
Date:   Mon Jun 12 15:00:46 2023 +0200

xen/arm: rename guest_cpuinfo in domain_cpuinfo

Rename the guest_cpuinfo structure to domain_cpuinfo as it is not only
used for guests but also for dom0 so domain is a more suitable name.

[PATCH v3] docs/misra: new rules addition

2023-06-12 Thread Stefano Stabellini
From: Stefano Stabellini 

For Dir 1.1, a document describing all implementation-defined behaviour
(i.e. gcc-specific behavior) will be added to docs/misra, also including
implementation-specific (gcc-specific) appropriate types for bit-fields
relevant to Rule 6.1.

Rule 21.21 is lacking an example on gitlab but the rule is
straightforward: we don't use stdlib at all in Xen.

Signed-off-by: Stefano Stabellini 
---
Changes in v3:
- add all signed integer types to the Notes of 6.1
- clarify 7.2 in the Notes
- not added: marking "inapplicable" rules, to be a separate patch

Changes in v2:
- drop 5.6
- specify additional appropriate types for 6.1
---
 docs/misra/rules.rst | 51 
 1 file changed, 51 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index d5a6ee8cb6..f72a49c9c4 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -40,6 +40,12 @@ existing codebase are work-in-progress.
  - Summary
  - Notes
 
+   * - `Dir 1.1 
`_
+ - Required
+ - Any implementation-defined behaviour on which the output of the
+   program depends shall be documented and understood
+ -
+
* - `Dir 2.1 
`_
  - Required
  - All source files shall compile without any compilation errors
@@ -57,6 +63,13 @@ existing codebase are work-in-progress.
header file being included more than once
  -
 
+   * - `Dir 4.11 
`_
+ - Required
+ - The validity of values passed to library functions shall be checked
+ - We do not have libraries in Xen (libfdt and others are not
+   considered libraries from MISRA C point of view as they are
+   imported in source form)
+
* - `Dir 4.14 
`_
  - Required
  - The validity of values received from external sources shall be
@@ -133,6 +146,13 @@ existing codebase are work-in-progress.
headers (xen/include/public/) are allowed to retain longer
identifiers for backward compatibility.
 
+   * - `Rule 6.1 
`_
+ - Required
+ - Bit-fields shall only be declared with an appropriate type
+ - In addition to the C99 types, we also consider appropriate types:
+   unsigned char, unsigned short, unsigned long, unsigned long long,
+   enum, and all explicitly signed integer types.
+
* - `Rule 6.2 
`_
  - Required
  - Single-bit named bit fields shall not be of a signed type
@@ -143,6 +163,32 @@ existing codebase are work-in-progress.
  - Octal constants shall not be used
  -
 
+   * - `Rule 7.2 
`_
+ - Required
+ - A "u" or "U" suffix shall be applied to all integer constants
+   that are represented in an unsigned type
+ - The rule asks that any integer literal that is implicitly
+   unsigned is made explicitly unsigned by using one of the
+   indicated suffixes.  As an example, on a machine where the int
+   type is 32-bit wide, 0x7fffffff is signed whereas 0x80000000 is
+   (implicitly) unsigned. In order to comply with the rule, the
+   latter should be rewritten as either 0x80000000u or 0x80000000U.
+   Consistency considerations may suggest using the same suffix even
+   when not required by the rule. For instance, if one has:
+
+   Original: f(0x7fffffff); f(0x80000000);
+
+   one might prefer
+
+   Solution 1: f(0x7fffffffU); f(0x80000000U);
+
+   over
+
+   Solution 2: f(0x7fffffff); f(0x80000000U);
+
+   after having ascertained that "Solution 1" is compatible with the
+   intended semantics.
+
* - `Rule 7.3 
`_
  - Required
  - The lowercase character l shall not be used in a literal suffix
@@ -314,6 +360,11 @@ existing codebase are work-in-progress.
used following a subsequent call to the same function
  -
 
+   * - Rule 21.21
+ - Required
+ - The Standard Library function system of <stdlib.h> shall not be used
+ -
+
* - `Rule 22.2 
`_
  - Mandatory
  - A block of memory shall only be freed if it was allocated by means of a
-- 
2.25.1




RE: [PATCH 4/4] xen/arm: pl011: Add SBSA UART device-tree support

2023-06-12 Thread Henry Wang
Hi Michal,

> -Original Message-
> Subject: [PATCH 4/4] xen/arm: pl011: Add SBSA UART device-tree support
> 
> We already have all the bits necessary in PL011 driver to support SBSA
> UART thanks to commit 032ea8c736d10f02672863c6e369338f948f7ed8 that
> enabled it for ACPI. Plumb in the remaining part for device-tree boot:
>  - add arm,sbsa-uart compatible to pl011_dt_match (no need for a separate
>struct and DT_DEVICE_START as SBSA is a subset of PL011),
>  - from pl011_dt_uart_init(), check for SBSA UART compatible to determine
>the UART type in use.
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Henry Wang 

I've also tested this patch on top of today's staging on FVP arm32 and arm64
and confirm this patch will not break existing functionality. So:

Tested-by: Henry Wang 

Kind regards,
Henry



RE: [PATCH 3/4] xen/arm: pl011: Use correct accessors

2023-06-12 Thread Henry Wang
Hi Michal,

> -Original Message-
> Subject: [PATCH 3/4] xen/arm: pl011: Use correct accessors
> 
> At the moment, we use 32-bit only accessors (i.e. readl/writel) to match
> the SBSA v2.x requirement. This should not be the default case for normal
> PL011 where accesses shall be 8/16-bit (max register size is 16-bit).
> There are however implementations of this UART that can only handle 32-bit
> MMIO. This is advertised by dt property "reg-io-width" set to 4.
> 
> Introduce new struct pl011 member mmio32 and replace pl011_{read/write}
> macros with static inline helpers that use 32-bit or 16-bit accessors
> (largest-common not to end up using different ones depending on the actual
> register size) according to mmio32 value. By default this property is set
> to false, unless:
>  - reg-io-width is specified with value 4,
>  - SBSA UART is in use.
> 
> For now, no changes done for ACPI due to lack of testing possibilities
> (i.e. current behavior maintained resulting in 32-bit accesses).
> 
> Signed-off-by: Michal Orzel 

I've tested this patch on top of today's staging on FVP arm32 and arm64 and
confirm this patch will not break existing functionality. So:

Tested-by: Henry Wang 

Kind regards,
Henry



RE: [PATCH 2/4] xen/arm: debug-pl011: Add support for 32-bit only MMIO

2023-06-12 Thread Henry Wang
Hi Michal,

> -Original Message-
> Subject: [PATCH 2/4] xen/arm: debug-pl011: Add support for 32-bit only
> MMIO
> 
> There are implementations of PL011 that can only handle 32-bit accesses
> as opposed to the normal behavior where accesses are 8/16-bit wide. This
> is usually advertised by setting a dt property 'reg-io-width' to 4.
> 
> Introduce CONFIG_EARLY_UART_PL011_MMIO32 Kconfig option to be able to
> enable the use of 32-bit only accessors in PL011 early printk code.
> Define macros PL011_{STRH,STRB,LDRH} to distinguish accessors for normal
> case from 32-bit MMIO one and use them in arm32/arm64 pl011 early printk
> code.
> 
> Update documentation accordingly.
> 
> Signed-off-by: Michal Orzel 

I've tested this patch on top of today's staging on FVP arm32 and arm64 and
confirm this patch will not break existing functionality. So:

Tested-by: Henry Wang 

Kind regards,
Henry



RE: [PATCH 1/4] xen/arm: debug-pl011: Use correct accessors

2023-06-12 Thread Henry Wang
Hi Michal,

> -Original Message-
> Subject: [PATCH 1/4] xen/arm: debug-pl011: Use correct accessors
> 
> Although most PL011 UARTs can cope with 32-bit accesses, some of the old
> legacy ones might not. PL011 registers are 8/16-bit wide and this shall
> be perceived as the normal behavior.
> 
> Modify early printk pl011 code for arm32/arm64 to use the correct
> accessors depending on the register size (refer ARM DDI 0183G, Table 3.1).
> 
> Signed-off-by: Michal Orzel 

I've tested this patch on top of today's staging on FVP arm32 and arm64 and
confirm this patch will not break existing functionality. So:

Tested-by: Henry Wang 

Kind regards,
Henry



[qemu-mainline test] 181395: regressions - trouble: blocked/broken/fail/pass

2023-06-12 Thread osstest service owner
flight 181395 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181395/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf  broken
 build-armhf                   4 host-install(4)         broken REGR. vs. 180691
 build-arm64-xsm               6 xen-build               fail   REGR. vs. 180691
 build-arm64                   6 xen-build               fail   REGR. vs. 180691
 build-amd64                   6 xen-build               fail   REGR. vs. 180691
 build-i386                    6 xen-build               fail   REGR. vs. 180691
 build-amd64-xsm               6 xen-build               fail   REGR. vs. 180691
 build-i386-xsm                6 xen-build               fail   REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1) 

Re: Asking for help to debug xen efi on Kunpeng machine

2023-06-12 Thread Jiatong Shen
Hello Stefano and Julien,

   Could you provide more insight for debugging? I tried to connect to the
serial console through ipmitool sol activate and enabled ACPI; I do see
some logs, but the machine is still stuck. The BMC video screen is still
unresponsive and blacked out.

Thank you  very much for the help!

Best Regards,
Jiatong Shen

On Sun, Jun 11, 2023 at 5:17 PM Jiatong Shen  wrote:

> Hello Stefano and Julien,
>
>I tried to do some debugging by adding a printk inside the function
> idle_loop in file arm/domain.c. Looks like the idle function is running
> normally because the printk function is getting called without stalling.
> But the vga screen is still blacked out and the serial terminal does not
> display any login message.
>
> the grub config for xen 4.17 is
>
>  submenu 'Xen hypervisor, version 4.17' $menuentry_id_option
> 'xen-hypervisor-4.17-5ebc23af-c2e2-4ac3-b308-3e82ec786c04' {
> menuentry 'Debian GNU/Linux, with Xen 4.17 and Linux 5.10.0-23-arm64'
> --class debian --class gnu-linux --class gnu --class os --class xen
> $menuentry_id_option
> 'xen-gnulinux-5.10.0-23-arm64-advanced-5ebc23af-c2e2-4ac3-b308-3e82ec786c04'
> {
> insmod part_gpt
> insmod ext2
> set root='hd0,gpt2'
> if [ x$feature_platform_search_hint = xy ]; then
>  search --no-floppy --fs-uuid --set=root
> --hint-ieee1275='ieee1275//sas/disk@2,gpt2' --hint-bios=hd0,gpt2
> --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2
>  5ebc23af-c2e2-4ac3-b308-3e82ec786c04
> else
>  search --no-floppy --fs-uuid --set=root
> 5ebc23af-c2e2-4ac3-b308-3e82ec786c04
> fi
> echo 'Loading Xen 4.17 ...'
>if [ "$grub_platform" = "pc" -o "$grub_platform" = "" ]; then
>xen_rm_opts=
>else
>xen_rm_opts="no-real-mode edd=off"
>fi
> xen_hypervisor /boot/xen-4.17 placeholder   ${xen_rm_opts}
> echo 'Loading Linux 5.10.0-23-arm64 ...'
> xen_module /boot/vmlinuz-5.10.0-23-arm64 placeholder
> root=UUID=5ebc23af-c2e2-4ac3-b308-3e82ec786c04 ro  quiet
> echo 'Loading initial ramdisk ...'
> xen_module --nounzip   /boot/initrd.img-5.10.0-23-arm64
> }
>
> The code I am modifying is
>
> static void noreturn idle_loop(void)
> {
> unsigned int cpu = smp_processor_id();
>
> for ( ; ; )
> {
> dprintk(XENLOG_INFO, "running idle loop \n");
> if ( cpu_is_offline(cpu) )
> stop_cpu();
> }
> }
>
> Hopes this debugging makes some sense.
>
> Best Regards,
> Jiatong Shen
>
> On Sun, Jun 11, 2023 at 12:00 PM Jiatong Shen 
> wrote:
>
>> Hello Stefano,
>>
>> I am able to obtain some serial logging (by enabling debugging and
>> verbose debugging messages, hopefully select the right option). The message
>> looks like
>>
>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>> (XEN) Freed 372kB init memory.
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER4
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER8
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER12
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER16
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER20
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER24
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER28
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER32
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER36
>> (XEN) d0v0: vGICD: unhandled word write 0x00 to ICACTIVER40
>>
>> Could you help find out where it is wrong ? Thank you very much for the
>> help!
>>
>> Best Regards,
>> Jiatong Shen
>>
>>
>> On Sat, Jun 10, 2023 at 7:15 AM Jiatong Shen 
>> wrote:
>>
>>> Hello Julien,
>>>
>>> Thank you very much for your help!
>>>
>>> Best,
>>>
>>> Jiatong Shen
>>>
>>> On Fri, Jun 9, 2023 at 4:48 PM Julien Grall  wrote:
>>>
 Hello,

 On 09/06/2023 03:32, Jiatong Shen wrote:
 > Thank you for your answer. Can you teach me how to verify if acpi is
 > enabled?

 You usually look at the .config. But I am not sure if this is provided
 by the Debian package. If not, then your best option would be to build
 your own Xen. To select ACPI, you want to use the menuconfig and select
 UNSUPPORTED and ACPI.

 Cheers,

 --
 Julien Grall

>>>
>>>
>>> --
>>>
>>> Best Regards,
>>>
>>> Jiatong Shen
>>>
>>
>>
>> --
>>
>> Best Regards,
>>
>> Jiatong Shen
>>
>
>
> --
>
> Best Regards,
>
> Jiatong Shen
>


-- 

Best Regards,

Jiatong Shen


[linux-linus test] 181392: regressions - trouble: broken/fail/pass

2023-06-12 Thread osstest service owner
flight 181392 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181392/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt broken
 test-armhf-armhf-libvirt-qcow2 broken
 test-armhf-armhf-libvirt-raw broken
 test-armhf-armhf-xl  broken
 test-armhf-armhf-xl-arndale  broken
 test-armhf-armhf-xl-credit1  broken
 test-armhf-armhf-xl-credit2  broken
 test-armhf-armhf-xl-multivcpu broken
 test-armhf-armhf-xl-rtds broken
 test-armhf-armhf-xl-vhd  broken
 test-armhf-armhf-xl-credit1   8 xen-boot   fail in 181383 REGR. vs. 180278
 build-arm64-pvops 6 kernel-build   fail in 181383 REGR. vs. 180278

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw  5 host-install(5)  broken pass in 181383
 test-armhf-armhf-xl-multivcpu  5 host-install(5) broken pass in 181383
 test-armhf-armhf-xl-vhd   5 host-install(5)  broken pass in 181383
 test-armhf-armhf-xl-credit1   5 host-install(5)  broken pass in 181383
 test-armhf-armhf-examine  5 host-install broken pass in 181387
 test-armhf-armhf-xl   5 host-install(5)  broken pass in 181387
 test-armhf-armhf-xl-rtds  5 host-install(5)  broken pass in 181387
 test-armhf-armhf-xl-arndale   5 host-install(5)  broken pass in 181387
 test-armhf-armhf-xl-credit2   5 host-install(5)  broken pass in 181387
 test-armhf-armhf-libvirt  5 host-install(5)  broken pass in 181387
 test-armhf-armhf-libvirt-qcow2  5 host-install(5)broken pass in 181387
 test-amd64-amd64-xl-vhd 21 guest-start/debian.repeat fail in 181383 pass in 181392

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 181383 n/a
 test-armhf-armhf-xl-multivcpu  8 xen-boot   fail in 181383 like 180278
 test-armhf-armhf-xl-vhd        8 xen-boot          fail in 181383 like 180278
 test-armhf-armhf-libvirt-raw   8 xen-boot          fail in 181383 like 180278
 test-armhf-armhf-xl-credit2    8 xen-boot          fail in 181387 like 180278
 test-armhf-armhf-examine       8 reboot            fail in 181387 like 180278
 test-armhf-armhf-libvirt       8 xen-boot          fail in 181387 like 180278
 test-armhf-armhf-xl-arndale    8 xen-boot          fail in 181387 like 180278
 test-armhf-armhf-libvirt-qcow2 8 xen-boot          fail in 181387 like 180278
 test-armhf-armhf-xl            8 xen-boot          fail in 181387 like 180278
 test-armhf-armhf-xl-rtds       8 xen-boot          fail in 181387 like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 

[PATCH v9 02/42] mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()

2023-06-12 Thread Rick Edgecombe
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.

One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable memory or shadow stack memory.

But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.

Previous patches have done the first step, so next move the callers that
don't have a VMA to pte_mkwrite_novma(). Also do the same for
pmd_mkwrite(). This will be ok for the shadow stack feature, as these
callers are on kernel memory which will not need to be made shadow stack,
and the other architectures currently support only one type of memory
in pte_mkwrite().

Cc: linux-...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Rick Edgecombe 
---
Hi Non-x86 Arch’s,

x86 has a feature that allows for the creation of a special type of
writable memory (shadow stack) that is only writable in limited specific
ways. Previously, changes were proposed to core MM code to teach it to
decide when to create normally writable memory or the special shadow stack
writable memory, but David Hildenbrand suggested[0] to change
pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be
moved into x86 code. Later Linus suggested a less error-prone way[1] to go
about this after the first attempt had a bug.

Since pXX_mkwrite() is defined in every arch, it requires some tree-wide
changes. So that is why you are seeing some patches out of a big x86
series pop up in your arch mailing list. There is no functional change.
After this refactor, the shadow stack series goes on to use the arch
helpers to push arch memory details inside arch/x86 and other arch's
with upcoming shadow stack features.

Testing was just 0-day build testing.

Hopefully that is enough context. Thanks!

[0] 
https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e403...@redhat.com/
[1] 
https://lore.kernel.org/lkml/CAHk-=wizjsu7c9sfyzb3q04108stghff2wfbokgccgw7riz...@mail.gmail.com/
---
 arch/arm64/mm/trans_pgd.c | 4 ++--
 arch/s390/mm/pageattr.c   | 4 ++--
 arch/x86/xen/mmu_pv.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 4ea2eefbc053..a01493f3a06f 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, 
unsigned long addr)
 * read only (code, rodata). Clear the RDONLY bit from
 * the temporary mappings we use during restore.
 */
-   set_pte(dst_ptep, pte_mkwrite(pte));
+   set_pte(dst_ptep, pte_mkwrite_novma(pte));
} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
/*
 * debug_pagealloc will removed the PTE_VALID bit if
@@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, 
unsigned long addr)
 */
BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-   set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
+   set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
}
 }
 
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 5ba3bd8a7b12..6931d484d8a7 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -97,7 +97,7 @@ static int walk_pte_level(pmd_t *pmdp, unsigned long addr, 
unsigned long end,
if (flags & SET_MEMORY_RO)
new = pte_wrprotect(new);
else if (flags & SET_MEMORY_RW)
-   new = pte_mkwrite(pte_mkdirty(new));
+   new = pte_mkwrite_novma(pte_mkdirty(new));
if (flags & SET_MEMORY_NX)
new = set_pte_bit(new, __pgprot(_PAGE_NOEXEC));
else if (flags & SET_MEMORY_X)
@@ -155,7 +155,7 @@ static void 

Re: [patch V4 10/37] x86/smpboot: Get rid of cpu_init_secondary()

2023-06-12 Thread Philippe Mathieu-Daudé

On 12/5/23 23:07, Thomas Gleixner wrote:

From: Thomas Gleixner 

The synchronization of the AP with the control CPU is a SMP boot problem
and has nothing to do with cpu_init().

Open code cpu_init_secondary() in start_secondary() and move
wait_for_master_cpu() into the SMP boot code.

No functional change.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
  arch/x86/include/asm/processor.h |1 -
  arch/x86/kernel/cpu/common.c |   27 ---
  arch/x86/kernel/smpboot.c|   24 +++-
  3 files changed, 19 insertions(+), 33 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [patch V4 07/37] x86/smpboot: Restrict soft_restart_cpu() to SEV

2023-06-12 Thread Philippe Mathieu-Daudé

On 12/5/23 23:07, Thomas Gleixner wrote:

From: Thomas Gleixner 

Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
  arch/x86/kernel/callthunks.c |2 +-
  arch/x86/kernel/head_32.S|   14 --
  arch/x86/kernel/head_64.S|2 +-
  3 files changed, 2 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [patch V4 04/37] x86/smpboot: Rename start_cpu0() to soft_restart_cpu()

2023-06-12 Thread Philippe Mathieu-Daudé

On 12/5/23 23:07, Thomas Gleixner wrote:

From: Thomas Gleixner 

This is used in the SEV play_dead() implementation to re-online CPUs. But
that has nothing to do with CPU0.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
  arch/x86/include/asm/cpu.h   |2 +-
  arch/x86/kernel/callthunks.c |2 +-
  arch/x86/kernel/head_32.S|   10 +-
  arch/x86/kernel/head_64.S|   10 +-
  arch/x86/kernel/sev.c|2 +-
  5 files changed, 13 insertions(+), 13 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2] xen/arm: rename guest_cpuinfo in domain_cpuinfo

2023-06-12 Thread Julien Grall

Hi Bertrand,

On 12/06/2023 14:00, Bertrand Marquis wrote:

Rename the guest_cpuinfo structure to domain_cpuinfo as it is not only
used for guests but also for dom0 so domain is a more suitable name.

While there also rename the create_guest_cpuinfo function to
create_domain_cpuinfo to be coherent and fix comments accordingly.

Signed-off-by: Bertrand Marquis 


Acked-by: Julien Grall 

And committed.

Cheers,

--
Julien Grall



[PATCH v4 22/34] csky: Convert __pte_free_tlb() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Guo Ren 
---
 arch/csky/include/asm/pgalloc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5da0914..9c84c9012e53 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -63,8 +63,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #define __pte_free_tlb(tlb, pte, address)  \
 do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page(tlb, pte);  \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc(tlb, page_ptdesc(pte));  \
 } while (0)
 
 extern void pagetable_init(void);
-- 
2.40.1
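For readers following the series, the page-to-ptdesc conversion in `__pte_free_tlb()` above is essentially a type-level change: `page_ptdesc()` and `ptdesc_page()` are casts between two views of the same memory. A minimal userspace sketch of that relationship (toy struct definitions, not the kernel's real layouts in `include/linux/mm_types.h`):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: in the kernel, struct ptdesc overlays struct page,
 * so the conversion helpers below are just casts. These definitions
 * are illustrative only, not the real field layouts. */
struct page { unsigned long flags; };
struct ptdesc { unsigned long pt_flags; };

/* View a page as a page-table descriptor. */
static struct ptdesc *page_ptdesc(struct page *page)
{
	return (struct ptdesc *)page;
}

/* And back: view a ptdesc as its backing page. */
static struct page *ptdesc_page(struct ptdesc *ptdesc)
{
	return (struct page *)ptdesc;
}
```

So `pagetable_pte_dtor(page_ptdesc(pte))` operates on the same memory that `pgtable_pte_page_dtor(pte)` did; the new type mainly conveys intent.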




[PATCH v4 23/34] hexagon: Convert __pte_free_tlb() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/hexagon/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/hexagon/include/asm/pgalloc.h b/arch/hexagon/include/asm/pgalloc.h
index f0c47e6a7427..55988625e6fb 100644
--- a/arch/hexagon/include/asm/pgalloc.h
+++ b/arch/hexagon/include/asm/pgalloc.h
@@ -87,10 +87,10 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
max_kernel_seg = pmdindex;
 }
 
-#define __pte_free_tlb(tlb, pte, addr) \
-do {   \
-   pgtable_pte_page_dtor((pte));   \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor((page_ptdesc(pte))); \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif
-- 
2.40.1




[PATCH v4 30/34] sh: Convert pte_free_tlb() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents. Also cleans up some spacing issues.

Signed-off-by: Vishal Moola (Oracle) 
Reviewed-by: Geert Uytterhoeven 
Acked-by: John Paul Adrian Glaubitz 
---
 arch/sh/include/asm/pgalloc.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
index a9e98233c4d4..5d8577ab1591 100644
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -2,6 +2,7 @@
 #ifndef __ASM_SH_PGALLOC_H
 #define __ASM_SH_PGALLOC_H
 
+#include 
 #include 
 
 #define __HAVE_ARCH_PMD_ALLOC_ONE
@@ -31,10 +32,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
 }
 
-#define __pte_free_tlb(tlb,pte,addr)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif /* __ASM_SH_PGALLOC_H */
-- 
2.40.1




[PATCH v4 26/34] mips: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/mips/include/asm/pgalloc.h | 31 +--
 arch/mips/mm/pgtable.c  |  7 ---
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index f72e737dda21..6940e5536664 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -51,13 +51,13 @@ extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   free_pages((unsigned long)pgd, PGD_TABLE_ORDER);
+   pagetable_free(virt_to_ptdesc(pgd));
 }
 
-#define __pte_free_tlb(tlb,pte,address)\
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, address)  \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -65,18 +65,18 @@ do {   \
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pmd_t *pmd;
-   struct page *pg;
+   struct ptdesc *ptdesc;
 
-   pg = alloc_pages(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
-   if (!pg)
+   ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
+   if (!ptdesc)
return NULL;
 
-   if (!pgtable_pmd_page_ctor(pg)) {
-   __free_pages(pg, PMD_TABLE_ORDER);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   pmd = (pmd_t *)page_address(pg);
+   pmd = ptdesc_address(ptdesc);
pmd_init(pmd);
return pmd;
 }
@@ -90,10 +90,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pud_t *pud;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, PUD_TABLE_ORDER);
 
-   pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
-   if (pud)
-   pud_init(pud);
+   if (!ptdesc)
+   return NULL;
+   pud = ptdesc_address(ptdesc);
+
+   pud_init(pud);
return pud;
 }
 
diff --git a/arch/mips/mm/pgtable.c b/arch/mips/mm/pgtable.c
index b13314be5d0e..729258ff4e3b 100644
--- a/arch/mips/mm/pgtable.c
+++ b/arch/mips/mm/pgtable.c
@@ -10,10 +10,11 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   pgd_t *ret, *init;
+   pgd_t *init, *ret = NULL;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, PGD_TABLE_ORDER);
 
-   ret = (pgd_t *) __get_free_pages(GFP_KERNEL, PGD_TABLE_ORDER);
-   if (ret) {
+   if (ptdesc) {
+   ret = ptdesc_address(ptdesc);
		init = pgd_offset(&init_mm, 0UL);
pgd_init(ret);
memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
-- 
2.40.1
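The mips `pmd_alloc_one()`/`pud_alloc_one()` hunks above all follow one shape: allocate a ptdesc, run the constructor, and free the descriptor again if the constructor fails. A userspace sketch of that shape, with malloc-backed stand-ins for `pagetable_alloc()`/`pagetable_free()` (the names and sizes are hypothetical, only the control flow mirrors the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ptdesc { void *table; };

/* Stand-in for pagetable_alloc(): allocate a descriptor plus backing table. */
static struct ptdesc *pagetable_alloc(size_t size)
{
	struct ptdesc *ptdesc = malloc(sizeof(*ptdesc));

	if (ptdesc)
		ptdesc->table = calloc(1, size);
	return ptdesc;
}

/* Stand-in for pagetable_free(). */
static void pagetable_free(struct ptdesc *ptdesc)
{
	free(ptdesc->table);
	free(ptdesc);
}

/* Stand-in for pagetable_pmd_ctor(); in the kernel this can fail
 * (e.g. split-ptlock allocation), hence the rollback path below. */
static bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
{
	return ptdesc->table != NULL;
}

/* Mirrors the converted pmd_alloc_one(): alloc, ctor, rollback on failure. */
static void *pmd_alloc_one(size_t size)
{
	struct ptdesc *ptdesc = pagetable_alloc(size);

	if (!ptdesc)
		return NULL;
	if (!pagetable_pmd_ctor(ptdesc)) {
		pagetable_free(ptdesc);
		return NULL;
	}
	return ptdesc->table;	/* analogue of ptdesc_address() */
}
```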




[PATCH v4 27/34] nios2: Convert __pte_free_tlb() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/nios2/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/nios2/include/asm/pgalloc.h b/arch/nios2/include/asm/pgalloc.h
index ecd1657bb2ce..ce6bb8e74271 100644
--- a/arch/nios2/include/asm/pgalloc.h
+++ b/arch/nios2/include/asm/pgalloc.h
@@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, addr) \
-   do {\
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+   do {\
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
} while (0)
 
 #endif /* _ASM_NIOS2_PGALLOC_H */
-- 
2.40.1




[PATCH v4 17/34] s390: Convert various pgalloc functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/include/asm/pgalloc.h |   4 +-
 arch/s390/include/asm/tlb.h |   4 +-
 arch/s390/mm/pgalloc.c  | 108 
 3 files changed, 59 insertions(+), 57 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..00ad9b88fda9 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
if (!table)
return NULL;
crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
-   if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
+   if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
crst_table_free(mm, table);
return NULL;
}
@@ -97,7 +97,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
if (mm_pmd_folded(mm))
return;
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));
crst_table_free(mm, (unsigned long *) pmd);
 }
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index b91f4a9b044c..383b1f91442c 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -89,12 +89,12 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 {
if (mm_pmd_folded(tlb->mm))
return;
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));
__tlb_adjust_range(tlb, address, PAGE_SIZE);
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_puds = 1;
-   tlb_remove_table(tlb, pmd);
+   tlb_remove_ptdesc(tlb, pmd);
 }
 
 /*
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 6b99932abc66..eeb7c95b98cf 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -43,17 +43,17 @@ __initcall(page_table_register_sysctl);
 
 unsigned long *crst_table_alloc(struct mm_struct *mm)
 {
-   struct page *page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
 
-   if (!page)
+   if (!ptdesc)
return NULL;
-   arch_set_page_dat(page, CRST_ALLOC_ORDER);
-   return (unsigned long *) page_to_virt(page);
+   arch_set_page_dat(ptdesc_page(ptdesc), CRST_ALLOC_ORDER);
+   return (unsigned long *) ptdesc_to_virt(ptdesc);
 }
 
 void crst_table_free(struct mm_struct *mm, unsigned long *table)
 {
-   free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+   pagetable_free(virt_to_ptdesc(table));
 }
 
 static void __crst_table_upgrade(void *arg)
@@ -140,21 +140,21 @@ static inline unsigned int atomic_xor_bits(atomic_t *v, unsigned int bits)
 
 struct page *page_table_alloc_pgste(struct mm_struct *mm)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
u64 *table;
 
-   page = alloc_page(GFP_KERNEL);
-   if (page) {
-   table = (u64 *)page_to_virt(page);
+   ptdesc = pagetable_alloc(GFP_KERNEL, 0);
+   if (ptdesc) {
+   table = (u64 *)ptdesc_to_virt(ptdesc);
memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
}
-   return page;
+   return ptdesc_page(ptdesc);
 }
 
 void page_table_free_pgste(struct page *page)
 {
-   __free_page(page);
+   pagetable_free(page_ptdesc(page));
 }
 
 #endif /* CONFIG_PGSTE */
@@ -230,7 +230,7 @@ void page_table_free_pgste(struct page *page)
 unsigned long *page_table_alloc(struct mm_struct *mm)
 {
unsigned long *table;
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned int mask, bit;
 
/* Try to get a fragment of a 4K page as a 2K page table */
@@ -238,9 +238,9 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
table = NULL;
	spin_lock_bh(&mm->context.lock);
	if (!list_empty(&mm->context.pgtable_list)) {
-		page = list_first_entry(&mm->context.pgtable_list,
-					struct page, lru);
-		mask = atomic_read(&page->pt_frag_refcount);
+		ptdesc = list_first_entry(&mm->context.pgtable_list,
+					struct ptdesc, pt_list);
+		mask = atomic_read(&ptdesc->pt_frag_refcount);
/*
 * The pending removal bits must also be checked.
 * Failure to do so might 

[PATCH v4 32/34] sparc: Convert pgtable_pte_page_{ctor,dtor}() to ptdesc equivalents

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable pte constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/sparc/mm/srmmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
index 13f027afc875..8393faa3e596 100644
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -355,7 +355,8 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
return NULL;
page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
	spin_lock(&mm->page_table_lock);
-   if (page_ref_inc_return(page) == 2 && !pgtable_pte_page_ctor(page)) {
+   if (page_ref_inc_return(page) == 2 &&
+   !pagetable_pte_ctor(page_ptdesc(page))) {
page_ref_dec(page);
ptep = NULL;
}
@@ -371,7 +372,7 @@ void pte_free(struct mm_struct *mm, pgtable_t ptep)
page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
	spin_lock(&mm->page_table_lock);
	if (page_ref_dec_return(page) == 1)
-		pgtable_pte_page_dtor(page);
+		pagetable_pte_dtor(page_ptdesc(page));
	spin_unlock(&mm->page_table_lock);
 
srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
-- 
2.40.1
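The sparc change above preserves srmmu's unusual sharing scheme: one nocache page backs several PTE tables, so the constructor only runs on the 1-to-2 refcount transition (first table user) and the destructor when the count drops back to 1. A toy model of that bookkeeping (all names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: a refcount plus a flag recording whether the table
 * ctor has run, standing in for the real ctor/dtor side effects. */
struct toy_page { int refcount; bool ctor_ran; };

static int page_ref_inc_return(struct toy_page *p) { return ++p->refcount; }
static int page_ref_dec_return(struct toy_page *p) { return --p->refcount; }

/* Take a table reference on the shared page; the ctor point fires
 * only when this is the first table user (refcount 1 -> 2). */
static void table_get(struct toy_page *p)
{
	if (page_ref_inc_return(p) == 2)
		p->ctor_ran = true;	/* pagetable_pte_ctor() point */
}

/* Drop a table reference; the dtor point fires when the last
 * table user goes away (refcount 2 -> 1). */
static void table_put(struct toy_page *p)
{
	if (page_ref_dec_return(p) == 1)
		p->ctor_ran = false;	/* pagetable_pte_dtor() point */
}
```

This is why the sparc hunks key the ctor/dtor calls to `page_ref_inc_return(page) == 2` and `page_ref_dec_return(page) == 1` rather than to allocation and free.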




[PATCH v4 34/34] mm: Remove pgtable_{pmd,pte}_page_{ctor,dtor}() wrappers

2023-06-12 Thread Vishal Moola (Oracle)
These functions are no longer necessary. Remove them and clean up the
Documentation referencing them.

Signed-off-by: Vishal Moola (Oracle) 
---
 Documentation/mm/split_page_table_lock.rst| 12 +--
 .../zh_CN/mm/split_page_table_lock.rst| 14 ++---
 include/linux/mm.h| 20 ---
 3 files changed, 13 insertions(+), 33 deletions(-)

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..4bffec728340 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -53,7 +53,7 @@ Support of split page table lock by an architecture
 ===
 
 There's no need in special enabling of PTE split page table lock: everything
-required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
+required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
 must be called on PTE table allocation / freeing.
 
 Make sure the architecture doesn't use slab allocator for page table
@@ -63,8 +63,8 @@ This field shares storage with page->ptl.
 PMD split lock only makes sense if you have more than two page table
 levels.
 
-PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
-allocation and pgtable_pmd_page_dtor() on freeing.
+PMD split lock enabling requires pagetable_pmd_ctor() call on PMD table
+allocation and pagetable_pmd_dtor() on freeing.
 
 Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
 pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
@@ -72,7 +72,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc().
 
 With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
 
-NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
+NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- it must
 be handled properly.
 
 page->ptl
@@ -92,7 +92,7 @@ trick:
split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
one more cache line for indirect access;
 
-The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in
-pgtable_pmd_page_ctor() for PMD table.
+The spinlock_t allocated in pagetable_pte_ctor() for PTE table and in
+pagetable_pmd_ctor() for PMD table.
 
 Please, never access page->ptl directly -- use appropriate helper.
diff --git a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
index 4fb7aa666037..a2c288670a24 100644
--- a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
+++ b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
@@ -56,16 +56,16 @@ Hugetlb特定的辅助函数:
 架构对分页表锁的支持
 
 
-没有必要特别启用PTE分页表锁:所有需要的东西都由pgtable_pte_page_ctor()
-和pgtable_pte_page_dtor()完成,它们必须在PTE表分配/释放时被调用。
+没有必要特别启用PTE分页表锁:所有需要的东西都由pagetable_pte_ctor()
+和pagetable_pte_dtor()完成,它们必须在PTE表分配/释放时被调用。
 
 确保架构不使用slab分配器来分配页表:slab使用page->slab_cache来分配其页
 面。这个区域与page->ptl共享存储。
 
 PMD分页锁只有在你有两个以上的页表级别时才有意义。
 
-启用PMD分页锁需要在PMD表分配时调用pgtable_pmd_page_ctor(),在释放时调
-用pgtable_pmd_page_dtor()。
+启用PMD分页锁需要在PMD表分配时调用pagetable_pmd_ctor(),在释放时调
+用pagetable_pmd_dtor()。
 
 分配通常发生在pmd_alloc_one()中,释放发生在pmd_free()和pmd_free_tlb()
 中,但要确保覆盖所有的PMD表分配/释放路径:即X86_PAE在pgd_alloc()中预先
@@ -73,7 +73,7 @@ PMD分页锁只有在你有两个以上的页表级别时才有意义。
 
 一切就绪后,你可以设置CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK。
 
-注意:pgtable_pte_page_ctor()和pgtable_pmd_page_ctor()可能失败--必
+注意:pagetable_pte_ctor()和pagetable_pmd_ctor()可能失败--必
 须正确处理。
 
 page->ptl
@@ -90,7 +90,7 @@ page->ptl用于访问分割页表锁,其中'page'是包含该表的页面struc
的指针并动态分配它。这允许在启用DEBUG_SPINLOCK或DEBUG_LOCK_ALLOC的
情况下使用分页锁,但由于间接访问而多花了一个缓存行。
 
-PTE表的spinlock_t分配在pgtable_pte_page_ctor()中,PMD表的spinlock_t
-分配在pgtable_pmd_page_ctor()中。
+PTE表的spinlock_t分配在pagetable_pte_ctor()中,PMD表的spinlock_t
+分配在pagetable_pmd_ctor()中。
 
 请不要直接访问page->ptl - -使用适当的辅助函数。
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc211c43610b..6d83483cf186 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2897,11 +2897,6 @@ static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
return true;
 }
 
-static inline bool pgtable_pte_page_ctor(struct page *page)
-{
-   return pagetable_pte_ctor(page_ptdesc(page));
-}
-
 static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
 {
struct folio *folio = ptdesc_folio(ptdesc);
@@ -2911,11 +2906,6 @@ static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
lruvec_stat_sub_folio(folio, NR_PAGETABLE);
 }
 
-static inline void pgtable_pte_page_dtor(struct page *page)
-{
-   pagetable_pte_dtor(page_ptdesc(page));
-}
-
 #define pte_offset_map_lock(mm, pmd, address, ptlp)\
 ({ \
spinlock_t *__ptl = pte_lockptr(mm, pmd);   \
@@ -3006,11 +2996,6 @@ static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
return true;
 }
 

[PATCH v4 33/34] um: Convert {pmd,pte}_free_tlb() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents. Also cleans up some spacing issues.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/um/include/asm/pgalloc.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
index 8ec7cd46dd96..de5e31c64793 100644
--- a/arch/um/include/asm/pgalloc.h
+++ b/arch/um/include/asm/pgalloc.h
@@ -25,19 +25,19 @@
  */
 extern pgd_t *pgd_alloc(struct mm_struct *);
 
-#define __pte_free_tlb(tlb,pte, address)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb),(pte));   \
+#define __pte_free_tlb(tlb, pte, address)  \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #ifdef CONFIG_3_LEVEL_PGTABLES
 
-#define __pmd_free_tlb(tlb, pmd, address)  \
-do {   \
-   pgtable_pmd_page_dtor(virt_to_page(pmd));   \
-   tlb_remove_page((tlb),virt_to_page(pmd));   \
-} while (0)\
+#define __pmd_free_tlb(tlb, pmd, address)  \
+do {   \
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));\
+   tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
+} while (0)
 
 #endif
 
-- 
2.40.1
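Besides the ptdesc conversion, the um hunk quietly fixes a bug: the old `__pmd_free_tlb` ended in `} while (0)\`, and that stray continuation glues the macro's closing line to whatever line follows it in the header. The `do { ... } while (0)` idiom itself is what lets a multi-statement macro expand safely as a single statement, e.g. as the body of a braceless `if`. An illustrative macro (not from the kernel) showing the shape:

```c
#include <assert.h>

/* Without the do/while(0) wrapper, a braceless "if" would only guard
 * the first statement of the expansion; with it, the whole expansion
 * is one statement. Note there is deliberately no backslash after
 * "} while (0)" -- that is the bug the um patch removes. */
#define zero_both(a, b)		\
do {				\
	*(a) = 0;		\
	*(b) = 0;		\
} while (0)
```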




[PATCH v4 14/34] powerpc: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/powerpc/mm/book3s64/mmu_context.c | 10 +++---
 arch/powerpc/mm/book3s64/pgtable.c | 32 +-
 arch/powerpc/mm/pgtable-frag.c | 46 +-
 3 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index c766e4c26e42..1715b07c630c 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -246,15 +246,15 @@ static void destroy_contexts(mm_context_t *ctx)
 static void pmd_frag_destroy(void *pmd_frag)
 {
int count;
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = virt_to_page(pmd_frag);
+   ptdesc = virt_to_ptdesc(pmd_frag);
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
-		pgtable_pmd_page_dtor(page);
-		__free_page(page);
+	if (atomic_sub_and_test(PMD_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
}
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 85c84e89e3ea..1212deeabe15 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -306,22 +306,22 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
 static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 {
void *ret = NULL;
-   struct page *page;
+   struct ptdesc *ptdesc;
gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
 
	if (mm == &init_mm)
gfp &= ~__GFP_ACCOUNT;
-   page = alloc_page(gfp);
-   if (!page)
+   ptdesc = pagetable_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pmd_page_ctor(page)) {
-   __free_pages(page, 0);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-	atomic_set(&page->pt_frag_refcount, 1);
+	atomic_set(&ptdesc->pt_frag_refcount, 1);
 
-   ret = page_address(page);
+   ret = ptdesc_address(ptdesc);
/*
 * if we support only one fragment just return the
 * allocated page.
@@ -331,12 +331,12 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 
	spin_lock(&mm->page_table_lock);
/*
-* If we find pgtable_page set, we return
+* If we find ptdesc_page set, we return
 * the allocated page with single fragment
 * count.
 */
if (likely(!mm->context.pmd_frag)) {
-		atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
+		atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
	spin_unlock(&mm->page_table_lock);
@@ -357,15 +357,15 @@ pmd_t *pmd_fragment_alloc(struct mm_struct *mm, unsigned 
long vmaddr)
 
 void pmd_fragment_free(unsigned long *pmd)
 {
-   struct page *page = virt_to_page(pmd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
 
-   if (PageReserved(page))
-   return free_reserved_page(page);
+   if (pagetable_is_reserved(ptdesc))
+   return free_reserved_ptdesc(ptdesc);
 
-	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
-	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
-		pgtable_pmd_page_dtor(page);
-		__free_page(page);
+	BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
+	if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
}
 }
 
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e..8961f1540209 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -18,15 +18,15 @@
 void pte_frag_destroy(void *pte_frag)
 {
int count;
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = virt_to_page(pte_frag);
+   ptdesc = virt_to_ptdesc(pte_frag);
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
-		pgtable_pte_page_dtor(page);
-		__free_page(page);
+	if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
}
 }
 
@@ -55,25 +55,25 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 static pte_t *__alloc_for_ptecache(struct 

[PATCH v4 28/34] openrisc: Convert __pte_free_tlb() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/openrisc/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/openrisc/include/asm/pgalloc.h b/arch/openrisc/include/asm/pgalloc.h
index b7b2b8d16fad..c6a73772a546 100644
--- a/arch/openrisc/include/asm/pgalloc.h
+++ b/arch/openrisc/include/asm/pgalloc.h
@@ -66,10 +66,10 @@ extern inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, addr) \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif
-- 
2.40.1




[PATCH v4 29/34] riscv: Convert alloc_{pmd,pte}_late() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Palmer Dabbelt 
---
 arch/riscv/include/asm/pgalloc.h |  8 
 arch/riscv/mm/init.c | 16 ++--
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 59dc12b5b7e8..d169a4f41a2e 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -153,10 +153,10 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-#define __pte_free_tlb(tlb, pte, buf)   \
-do {\
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, buf)  \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 #endif /* CONFIG_MMU */
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 3d689ffb2072..6bfeec80bf4e 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -354,12 +354,10 @@ static inline phys_addr_t __init alloc_pte_fixmap(uintptr_t va)
 
 static phys_addr_t __init alloc_pte_late(uintptr_t va)
 {
-   unsigned long vaddr;
-
-   vaddr = __get_free_page(GFP_KERNEL);
-   BUG_ON(!vaddr || !pgtable_pte_page_ctor(virt_to_page((void *)vaddr)));
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 
-   return __pa(vaddr);
+   BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
+   return __pa((pte_t *)ptdesc_address(ptdesc));
 }
 
 static void __init create_pte_mapping(pte_t *ptep,
@@ -437,12 +435,10 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
 
 static phys_addr_t __init alloc_pmd_late(uintptr_t va)
 {
-   unsigned long vaddr;
-
-   vaddr = __get_free_page(GFP_KERNEL);
-   BUG_ON(!vaddr || !pgtable_pmd_page_ctor(virt_to_page((void *)vaddr)));
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 
-   return __pa(vaddr);
+   BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
+   return __pa((pmd_t *)ptdesc_address(ptdesc));
 }
 
 static void __init create_pmd_mapping(pmd_t *pmdp,
-- 
2.40.1
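The riscv `alloc_pte_late()`/`alloc_pmd_late()` helpers above run early enough that allocation failure is unrecoverable, so the conversion folds the constructor check into the `BUG_ON()` rather than adding a rollback path. A userspace sketch of that shape, with `assert()` standing in for `BUG_ON()` and malloc-backed stand-ins for the pagetable helpers (names hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

struct ptdesc { void *addr; };

/* Stand-in for pagetable_alloc(). */
static struct ptdesc *pagetable_alloc(size_t size)
{
	struct ptdesc *p = malloc(sizeof(*p));

	if (p)
		p->addr = calloc(1, size);
	return p;
}

/* Stand-in for pagetable_pte_ctor(); can fail in the kernel. */
static int pagetable_pte_ctor(struct ptdesc *p)
{
	return p->addr != NULL;
}

/* Stand-in for ptdesc_address(). */
static void *ptdesc_address(struct ptdesc *p)
{
	return p->addr;
}

/* Boot-time allocator: any failure is fatal, mirroring
 * BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc)) in the patch. */
static void *alloc_pte_late(size_t size)
{
	struct ptdesc *ptdesc = pagetable_alloc(size);

	assert(ptdesc && pagetable_pte_ctor(ptdesc));
	return ptdesc_address(ptdesc);
}
```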




[PATCH v4 19/34] pgalloc: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/asm-generic/pgalloc.h | 62 +--
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index a7cf825befae..3fd6ce79e654 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,11 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, 0);
+
+   if (!ptdesc)
+   return NULL;
+   return ptdesc_address(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +45,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   pagetable_free(virt_to_ptdesc(pte));
 }
 
 /**
@@ -49,7 +53,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
  * @mm: the mm_struct of the current context
  * @gfp: GFP flags to use for the allocation
  *
- * Allocates a page and runs the pgtable_pte_page_ctor().
+ * Allocates a ptdesc and runs the pagetable_pte_ctor().
  *
  * This function is intended for architectures that need
  * anything beyond simple page allocation or must have custom GFP flags.
@@ -58,17 +62,17 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
  */
 static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
 {
-   struct page *pte;
+   struct ptdesc *ptdesc;
 
-   pte = alloc_page(gfp);
-   if (!pte)
+   ptdesc = pagetable_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(pte)) {
-   __free_page(pte);
+   if (!pagetable_pte_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   return pte;
+   return ptdesc_page(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE
@@ -76,7 +80,7 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
  * pte_alloc_one - allocate a page for PTE-level user page table
  * @mm: the mm_struct of the current context
  *
- * Allocates a page and runs the pgtable_pte_page_ctor().
+ * Allocates a ptdesc and runs the pagetable_pte_ctor().
  *
  * Return: `struct page` initialized as page table or %NULL on error
  */
@@ -98,8 +102,10 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
  */
 static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
 {
-   pgtable_pte_page_dtor(pte_page);
-   __free_page(pte_page);
+   struct ptdesc *ptdesc = page_ptdesc(pte_page);
+
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 
@@ -110,7 +116,7 @@ static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
  * pmd_alloc_one - allocate a page for PMD-level page table
  * @mm: the mm_struct of the current context
  *
- * Allocates a page and runs the pgtable_pmd_page_ctor().
+ * Allocates a ptdesc and runs the pagetable_pmd_ctor().
  * Allocations use %GFP_PGTABLE_USER in user context and
  * %GFP_PGTABLE_KERNEL in kernel context.
  *
@@ -118,28 +124,30 @@ static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
  */
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
gfp_t gfp = GFP_PGTABLE_USER;
 
	if (mm == &init_mm)
gfp = GFP_PGTABLE_KERNEL;
-   page = alloc_page(gfp);
-   if (!page)
+   ptdesc = pagetable_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pmd_page_ctor(page)) {
-   __free_page(page);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
-   return (pmd_t *)page_address(page);
+   return ptdesc_address(ptdesc);
 }
 #endif
 
 #ifndef __HAVE_ARCH_PMD_FREE
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
+
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
-   free_page((unsigned long)pmd);
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 #endif
 
@@ -149,11 +157,15 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 
 static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   gfp_t gfp = GFP_PGTABLE_USER;
+   gfp_t gfp = GFP_PGTABLE_USER 

[PATCH v4 25/34] m68k: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/m68k/include/asm/mcf_pgalloc.h  | 41 ++--
 arch/m68k/include/asm/sun3_pgalloc.h |  8 +++---
 arch/m68k/mm/motorola.c  |  4 +--
 3 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
index 5c2c0a864524..857949ac9431 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -7,20 +7,19 @@
 
 extern inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long) pte);
+   pagetable_free(virt_to_ptdesc(pte));
 }
 
 extern const char bad_pmd_string[];
 
 extern inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   unsigned long page = __get_free_page(GFP_DMA);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);
 
-   if (!page)
+   if (!ptdesc)
return NULL;
 
-   memset((void *)page, 0, PAGE_SIZE);
-   return (pte_t *) (page);
+   return ptdesc_address(ptdesc);
 }
 
 extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address)
@@ -35,36 +34,36 @@ extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
  unsigned long address)
 {
-   struct page *page = virt_to_page(pgtable);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page = alloc_pages(GFP_DMA, 0);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA, 0);
pte_t *pte;
 
-   if (!page)
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(page)) {
-   __free_page(page);
+   if (!pagetable_pte_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   pte = page_address(page);
-   clear_page(pte);
+   pte = ptdesc_address(ptdesc);
+   pagetable_clear(pte);
 
return pte;
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
 {
-   struct page *page = virt_to_page(pgtable);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 /*
@@ -75,16 +74,18 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   free_page((unsigned long) pgd);
+   pagetable_free(virt_to_ptdesc(pgd));
 }
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
pgd_t *new_pgd;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_NOWARN, 0);
 
-   new_pgd = (pgd_t *)__get_free_page(GFP_DMA | __GFP_NOWARN);
-   if (!new_pgd)
+   if (!ptdesc)
return NULL;
+   new_pgd = ptdesc_address(ptdesc);
+
memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t));
memset(new_pgd, 0, PAGE_OFFSET >> PGDIR_SHIFT);
return new_pgd;
diff --git a/arch/m68k/include/asm/sun3_pgalloc.h b/arch/m68k/include/asm/sun3_pgalloc.h
index 198036aff519..ff48573db2c0 100644
--- a/arch/m68k/include/asm/sun3_pgalloc.h
+++ b/arch/m68k/include/asm/sun3_pgalloc.h
@@ -17,10 +17,10 @@
 
 extern const char bad_pmd_string[];
 
-#define __pte_free_tlb(tlb,pte,addr)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index c75984e2d86b..594575a0780c 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -161,7 +161,7 @@ void *get_pointer_table(int type)
 * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
 * SMP.
 */
-   pgtable_pte_page_ctor(virt_to_page(page));
+   

[PATCH v4 16/34] s390: Convert various gmap functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/mm/gmap.c | 230 
 1 file changed, 128 insertions(+), 102 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 81c683426b49..010e87df7299 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -34,7 +34,7 @@
 static struct gmap *gmap_alloc(unsigned long limit)
 {
struct gmap *gmap;
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned long *table;
unsigned long etype, atype;
 
@@ -67,12 +67,12 @@ static struct gmap *gmap_alloc(unsigned long limit)
	spin_lock_init(&gmap->guest_table_lock);
	spin_lock_init(&gmap->shadow_lock);
	refcount_set(&gmap->ref_count, 1);
-   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
-   if (!page)
+   ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+   if (!ptdesc)
goto out_free;
-   page->_pt_s390_gaddr = 0;
-   list_add(&page->lru, &gmap->crst_list);
-   table = page_to_virt(page);
+   ptdesc->_pt_s390_gaddr = 0;
+   list_add(&ptdesc->pt_list, &gmap->crst_list);
+   table = ptdesc_to_virt(ptdesc);
crst_table_init(table, etype);
gmap->table = table;
gmap->asce = atype | _ASCE_TABLE_LENGTH |
@@ -181,25 +181,25 @@ static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
  */
 static void gmap_free(struct gmap *gmap)
 {
-   struct page *page, *next;
+   struct ptdesc *ptdesc, *next;
 
/* Flush tlb of all gmaps (if not already done for shadows) */
if (!(gmap_is_shadow(gmap) && gmap->removed))
gmap_flush_tlb(gmap);
/* Free all segment & region tables. */
-   list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
-   page->_pt_s390_gaddr = 0;
-   __free_pages(page, CRST_ALLOC_ORDER);
+   list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) {
+   ptdesc->_pt_s390_gaddr = 0;
+   pagetable_free(ptdesc);
}
	gmap_radix_tree_free(&gmap->guest_to_host);
	gmap_radix_tree_free(&gmap->host_to_guest);
 
/* Free additional data for a shadow gmap */
if (gmap_is_shadow(gmap)) {
-   /* Free all page tables. */
-   list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
-   page->_pt_s390_gaddr = 0;
-   page_table_free_pgste(page);
+   /* Free all ptdesc tables. */
+   list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list) {
+   ptdesc->_pt_s390_gaddr = 0;
+   page_table_free_pgste(ptdesc_page(ptdesc));
}
	gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
/* Release reference to the parent */
@@ -308,27 +308,27 @@ EXPORT_SYMBOL_GPL(gmap_get_enabled);
 static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
unsigned long init, unsigned long gaddr)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned long *new;
 
/* since we dont free the gmap table until gmap_free we can unlock */
-   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
-   if (!page)
+   ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+   if (!ptdesc)
return -ENOMEM;
-   new = page_to_virt(page);
+   new = ptdesc_to_virt(ptdesc);
crst_table_init(new, init);
	spin_lock(&gmap->guest_table_lock);
if (*table & _REGION_ENTRY_INVALID) {
-   list_add(&page->lru, &gmap->crst_list);
+   list_add(&ptdesc->pt_list, &gmap->crst_list);
*table = __pa(new) | _REGION_ENTRY_LENGTH |
(*table & _REGION_ENTRY_TYPE_MASK);
-   page->_pt_s390_gaddr = gaddr;
-   page = NULL;
+   ptdesc->_pt_s390_gaddr = gaddr;
+   ptdesc = NULL;
}
	spin_unlock(&gmap->guest_table_lock);
-   if (page) {
-   page->_pt_s390_gaddr = 0;
-   __free_pages(page, CRST_ALLOC_ORDER);
+   if (ptdesc) {
+   ptdesc->_pt_s390_gaddr = 0;
+   pagetable_free(ptdesc);
}
return 0;
 }
@@ -341,15 +341,15 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
  */
 static unsigned long __gmap_segment_gaddr(unsigned long *entry)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned long offset, mask;
 
offset = (unsigned long) entry / sizeof(unsigned long);
offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
-   page = virt_to_page((void *)((unsigned long) 

[PATCH v4 31/34] sparc64: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/sparc/mm/init_64.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..105915cd2eee 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2893,14 +2893,15 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 
 pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!page)
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(page)) {
-   __free_page(page);
+   if (!pagetable_pte_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
-   return (pte_t *) page_address(page);
+   return ptdesc_address(ptdesc);
 }
 
 void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@@ -2910,10 +2911,10 @@ void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static void __pte_free(pgtable_t pte)
 {
-   struct page *page = virt_to_page(pte);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pte);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 void pte_free(struct mm_struct *mm, pgtable_t pte)
-- 
2.40.1




[PATCH v4 20/34] arm: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

late_alloc() also uses the __get_free_pages() helper function. Convert
this to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/arm/include/asm/tlb.h | 12 +++-
 arch/arm/mm/mmu.c  |  6 +++---
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index b8cbe03ad260..f40d06ad5d2a 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -39,7 +39,9 @@ static inline void __tlb_remove_table(void *_table)
 static inline void
 __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
-   pgtable_pte_page_dtor(pte);
+   struct ptdesc *ptdesc = page_ptdesc(pte);
+
+   pagetable_pte_dtor(ptdesc);
 
 #ifndef CONFIG_ARM_LPAE
/*
@@ -50,17 +52,17 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
 #endif
 
-   tlb_remove_table(tlb, pte);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 static inline void
 __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
 {
 #ifdef CONFIG_ARM_LPAE
-   struct page *page = virt_to_page(pmdp);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pgtable_pmd_page_dtor(page);
-   tlb_remove_table(tlb, page);
+   pagetable_pmd_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 #endif
 }
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 22292cf3381c..294518fd0240 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -737,11 +737,11 @@ static void __init *early_alloc(unsigned long sz)
 
 static void *__init late_alloc(unsigned long sz)
 {
-   void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));
+   void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, get_order(sz));
 
-   if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
+   if (!ptdesc || !pagetable_pte_ctor(ptdesc))
BUG();
-   return ptr;
+   return ptdesc;
 }
 
 static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
-- 
2.40.1




[PATCH v4 15/34] x86: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/x86/mm/pgtable.c | 46 +--
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 15a8009a4480..6da7fd5d4782 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -52,7 +52,7 @@ early_param("userpte", setup_userpte);
 
 void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
-   pgtable_pte_page_dtor(pte);
+   pagetable_pte_dtor(page_ptdesc(pte));
paravirt_release_pte(page_to_pfn(pte));
paravirt_tlb_remove_table(tlb, pte);
 }
@@ -60,7 +60,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 #if CONFIG_PGTABLE_LEVELS > 2
 void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-   struct page *page = virt_to_page(pmd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
/*
 * NOTE! For PAE, any changes to the top page-directory-pointer-table
@@ -69,8 +69,8 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #ifdef CONFIG_X86_PAE
tlb->need_flush_all = 1;
 #endif
-   pgtable_pmd_page_dtor(page);
-   paravirt_tlb_remove_table(tlb, page);
+   pagetable_pmd_dtor(ptdesc);
+   paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -92,16 +92,16 @@ void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 
 static inline void pgd_list_add(pgd_t *pgd)
 {
-   struct page *page = virt_to_page(pgd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
-   list_add(&page->lru, &pgd_list);
+   list_add(&ptdesc->pt_list, &pgd_list);
 }
 
 static inline void pgd_list_del(pgd_t *pgd)
 {
-   struct page *page = virt_to_page(pgd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
-   list_del(&page->lru);
+   list_del(&ptdesc->pt_list);
 }
 
 #define UNSHARED_PTRS_PER_PGD  \
@@ -112,12 +112,12 @@ static inline void pgd_list_del(pgd_t *pgd)
 
 static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
 {
-   virt_to_page(pgd)->pt_mm = mm;
+   virt_to_ptdesc(pgd)->pt_mm = mm;
 }
 
 struct mm_struct *pgd_page_get_mm(struct page *page)
 {
-   return page->pt_mm;
+   return page_ptdesc(page)->pt_mm;
 }
 
 static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
@@ -213,11 +213,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
int i;
+   struct ptdesc *ptdesc;
 
for (i = 0; i < count; i++)
if (pmds[i]) {
-   pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
-   free_page((unsigned long)pmds[i]);
+   ptdesc = virt_to_ptdesc(pmds[i]);
+
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
mm_dec_nr_pmds(mm);
}
 }
@@ -232,16 +235,21 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
gfp &= ~__GFP_ACCOUNT;
 
for (i = 0; i < count; i++) {
-   pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
-   if (!pmd)
+   pmd_t *pmd = NULL;
+   struct ptdesc *ptdesc = pagetable_alloc(gfp, 0);
+
+   if (!ptdesc)
failed = true;
-   if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
-   free_page((unsigned long)pmd);
-   pmd = NULL;
+   if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
+   ptdesc = NULL;
failed = true;
}
-   if (pmd)
+   if (ptdesc) {
mm_inc_nr_pmds(mm);
+   pmd = ptdesc_address(ptdesc);
+   }
+
pmds[i] = pmd;
}
 
@@ -830,7 +838,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 
free_page((unsigned long)pmd_sv);
 
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));
free_page((unsigned long)pmd);
 
return 1;
-- 
2.40.1




[PATCH v4 21/34] arm64: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/arm64/include/asm/tlb.h | 14 --
 arch/arm64/mm/mmu.c  |  7 ---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..2c29239d05c3 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -75,18 +75,20 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
  unsigned long addr)
 {
-   pgtable_pte_page_dtor(pte);
-   tlb_remove_table(tlb, pte);
+   struct ptdesc *ptdesc = page_ptdesc(pte);
+
+   pagetable_pte_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
  unsigned long addr)
 {
-   struct page *page = virt_to_page(pmdp);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pgtable_pmd_page_dtor(page);
-   tlb_remove_table(tlb, page);
+   pagetable_pmd_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
 
@@ -94,7 +96,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
  unsigned long addr)
 {
-   tlb_remove_table(tlb, virt_to_page(pudp));
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
 }
 #endif
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index af6bc8403ee4..5867a0e917b9 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -426,6 +426,7 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
 static phys_addr_t pgd_pgtable_alloc(int shift)
 {
phys_addr_t pa = __pgd_pgtable_alloc(shift);
+   struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
 
/*
 * Call proper page table ctor in case later we need to
@@ -433,12 +434,12 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 * this pre-allocated page table.
 *
 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
-* folded, and if so pgtable_pmd_page_ctor() becomes nop.
+* folded, and if so pagetable_pte_ctor() becomes nop.
 */
if (shift == PAGE_SHIFT)
-   BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
+   BUG_ON(!pagetable_pte_ctor(ptdesc));
else if (shift == PMD_SHIFT)
-   BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
+   BUG_ON(!pagetable_pmd_ctor(ptdesc));
 
return pa;
 }
-- 
2.40.1




[PATCH v4 24/34] loongarch: Convert various functions to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/loongarch/include/asm/pgalloc.h | 27 +++
 arch/loongarch/mm/pgtable.c  |  7 ---
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index af1d1e4a6965..70bb3bdd201e 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -45,9 +45,9 @@ extern void pagetable_init(void);
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
 #define __pte_free_tlb(tlb, pte, address)  \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -55,18 +55,18 @@ do {  \
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pmd_t *pmd;
-   struct page *pg;
+   struct ptdesc *ptdesc;
 
-   pg = alloc_page(GFP_KERNEL_ACCOUNT);
-   if (!pg)
+   ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, 0);
+   if (!ptdesc)
return NULL;
 
-   if (!pgtable_pmd_page_ctor(pg)) {
-   __free_page(pg);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   pmd = (pmd_t *)page_address(pg);
+   pmd = ptdesc_address(ptdesc);
pmd_init(pmd);
return pmd;
 }
@@ -80,10 +80,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pud_t *pud;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 
-   pud = (pud_t *) __get_free_page(GFP_KERNEL);
-   if (pud)
-   pud_init(pud);
+   if (!ptdesc)
+   return NULL;
+   pud = ptdesc_address(ptdesc);
+
+   pud_init(pud);
return pud;
 }
 
diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
index 36a6dc0148ae..cdba10ffc0df 100644
--- a/arch/loongarch/mm/pgtable.c
+++ b/arch/loongarch/mm/pgtable.c
@@ -11,10 +11,11 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   pgd_t *ret, *init;
+   pgd_t *init, *ret = NULL;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 
-   ret = (pgd_t *) __get_free_page(GFP_KERNEL);
-   if (ret) {
+   if (ptdesc) {
+   ret = (pgd_t *)ptdesc_address(ptdesc);
		init = pgd_offset(&init_mm, 0UL);
pgd_init(ret);
memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
-- 
2.40.1




[PATCH v4 18/34] mm: Remove page table members from struct page

2023-06-12 Thread Vishal Moola (Oracle)
The page table members are now split out into their own ptdesc struct.
Remove them from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm_types.h | 14 --
 include/linux/pgtable.h  |  3 ---
 2 files changed, 17 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6161fe1ae5b8..31ffa1be21d0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -141,20 +141,6 @@ struct page {
struct {/* Tail pages of compound page */
unsigned long compound_head;/* Bit zero is set */
};
-   struct {/* Page table pages */
-   unsigned long _pt_pad_1;/* compound_head */
-   pgtable_t pmd_huge_pte; /* protected by page->ptl */
-   unsigned long _pt_s390_gaddr;   /* mapping */
-   union {
-   struct mm_struct *pt_mm; /* x86 pgds only */
-   atomic_t pt_frag_refcount; /* powerpc */
-   };
-#if ALLOC_SPLIT_PTLOCKS
-   spinlock_t *ptl;
-#else
-   spinlock_t ptl;
-#endif
-   };
struct {/* ZONE_DEVICE pages */
/** @pgmap: Points to the hosting device page map. */
struct dev_pagemap *pgmap;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c405f74d3875..33cc19d752b3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1019,10 +1019,7 @@ struct ptdesc {
 TABLE_MATCH(flags, __page_flags);
 TABLE_MATCH(compound_head, pt_list);
 TABLE_MATCH(compound_head, _pt_pad_1);
-TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
 TABLE_MATCH(mapping, _pt_s390_gaddr);
-TABLE_MATCH(pt_mm, pt_mm);
-TABLE_MATCH(ptl, ptl);
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
 
-- 
2.40.1




[PATCH v4 13/34] mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}

2023-06-12 Thread Vishal Moola (Oracle)
Create pagetable_pte_ctor(), pagetable_pmd_ctor(), pagetable_pte_dtor(),
and pagetable_pmd_dtor(), and make the original pgtable
constructor/destructors wrappers around them.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 56 ++
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a1af7983e1bd..dc211c43610b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2886,20 +2886,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
-static inline bool pgtable_pte_page_ctor(struct page *page)
+static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
 {
-   if (!ptlock_init(page_ptdesc(page)))
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   if (!ptlock_init(ptdesc))
return false;
-   __SetPageTable(page);
-   inc_lruvec_page_state(page, NR_PAGETABLE);
+   __folio_set_table(folio);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
return true;
 }
 
+static inline bool pgtable_pte_page_ctor(struct page *page)
+{
+   return pagetable_pte_ctor(page_ptdesc(page));
+}
+
+static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   ptlock_free(ptdesc);
+   __folio_clear_table(folio);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
-   ptlock_free(page_ptdesc(page));
-   __ClearPageTable(page);
-   dec_lruvec_page_state(page, NR_PAGETABLE);
+   pagetable_pte_dtor(page_ptdesc(page));
 }
 
 #define pte_offset_map_lock(mm, pmd, address, ptlp)\
@@ -2981,20 +2995,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
return ptl;
 }
 
-static inline bool pgtable_pmd_page_ctor(struct page *page)
+static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
 {
-   if (!pmd_ptlock_init(page_ptdesc(page)))
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   if (!pmd_ptlock_init(ptdesc))
return false;
-   __SetPageTable(page);
-   inc_lruvec_page_state(page, NR_PAGETABLE);
+   __folio_set_table(folio);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
return true;
 }
 
+static inline bool pgtable_pmd_page_ctor(struct page *page)
+{
+   return pagetable_pmd_ctor(page_ptdesc(page));
+}
+
+static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   pmd_ptlock_free(ptdesc);
+   __folio_clear_table(folio);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
-   pmd_ptlock_free(page_ptdesc(page));
-   __ClearPageTable(page);
-   dec_lruvec_page_state(page, NR_PAGETABLE);
+   pagetable_pmd_dtor(page_ptdesc(page));
 }
 
 /*
-- 
2.40.1




[PATCH v4 10/34] mm: Convert ptlock_init() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index daecf1db6cf1..f48e626d9c98 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2857,7 +2857,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
 }
 
-static inline bool ptlock_init(struct page *page)
+static inline bool ptlock_init(struct ptdesc *ptdesc)
 {
/*
 * prep_new_page() initialize page->private (and therefore page->ptl)
@@ -2866,10 +2866,10 @@ static inline bool ptlock_init(struct page *page)
 * It can happen if arch try to use slab for page table allocation:
 * slab code uses page->slab_cache, which share storage with page->ptl.
 */
-   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-   if (!ptlock_alloc(page_ptdesc(page)))
+   VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc));
+   if (!ptlock_alloc(ptdesc))
return false;
-   spin_lock_init(ptlock_ptr(page_ptdesc(page)));
+   spin_lock_init(ptlock_ptr(ptdesc));
return true;
 }
 
@@ -2882,13 +2882,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
	return &mm->page_table_lock;
 }
 static inline void ptlock_cache_init(void) {}
-static inline bool ptlock_init(struct page *page) { return true; }
+static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
-   if (!ptlock_init(page))
+   if (!ptlock_init(page_ptdesc(page)))
return false;
__SetPageTable(page);
inc_lruvec_page_state(page, NR_PAGETABLE);
@@ -2947,7 +2947,7 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
ptdesc->pmd_huge_pte = NULL;
 #endif
-   return ptlock_init(ptdesc_page(ptdesc));
+   return ptlock_init(ptdesc);
 }
 
 static inline void pmd_ptlock_free(struct page *page)
-- 
2.40.1




[PATCH v4 12/34] mm: Convert ptlock_free() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 10 +-
 mm/memory.c|  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3b54bb4c9753..a1af7983e1bd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2826,7 +2826,7 @@ static inline void pagetable_clear(void *x)
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
 bool ptlock_alloc(struct ptdesc *ptdesc);
-extern void ptlock_free(struct page *page);
+void ptlock_free(struct ptdesc *ptdesc);
 
 static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
@@ -2842,7 +2842,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc)
return true;
 }
 
-static inline void ptlock_free(struct page *page)
+static inline void ptlock_free(struct ptdesc *ptdesc)
 {
 }
 
@@ -2883,7 +2883,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 }
 static inline void ptlock_cache_init(void) {}
 static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
-static inline void ptlock_free(struct page *page) {}
+static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
 static inline bool pgtable_pte_page_ctor(struct page *page)
@@ -2897,7 +2897,7 @@ static inline bool pgtable_pte_page_ctor(struct page *page)
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
-   ptlock_free(page);
+   ptlock_free(page_ptdesc(page));
__ClearPageTable(page);
dec_lruvec_page_state(page, NR_PAGETABLE);
 }
@@ -2955,7 +2955,7 @@ static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
 #endif
-   ptlock_free(ptdesc_page(ptdesc));
+   ptlock_free(ptdesc);
 }
 
 #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
diff --git a/mm/memory.c b/mm/memory.c
index ba9579117686..d4d2ea5cf0fd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5945,8 +5945,8 @@ bool ptlock_alloc(struct ptdesc *ptdesc)
return true;
 }
 
-void ptlock_free(struct page *page)
+void ptlock_free(struct ptdesc *ptdesc)
 {
-   kmem_cache_free(page_ptl_cachep, page->ptl);
+   kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
 }
 #endif
-- 
2.40.1




[PATCH v4 09/34] mm: Convert pmd_ptlock_init() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bb934d51390f..daecf1db6cf1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2942,12 +2942,12 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(pmd_ptdesc(pmd));
 }
 
-static inline bool pmd_ptlock_init(struct page *page)
+static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   page->pmd_huge_pte = NULL;
+   ptdesc->pmd_huge_pte = NULL;
 #endif
-   return ptlock_init(page);
+   return ptlock_init(ptdesc_page(ptdesc));
 }
 
 static inline void pmd_ptlock_free(struct page *page)
@@ -2967,7 +2967,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
	return &mm->page_table_lock;
 }
 
-static inline bool pmd_ptlock_init(struct page *page) { return true; }
+static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void pmd_ptlock_free(struct page *page) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
@@ -2983,7 +2983,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 
 static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
-   if (!pmd_ptlock_init(page))
+   if (!pmd_ptlock_init(page_ptdesc(page)))
return false;
__SetPageTable(page);
inc_lruvec_page_state(page, NR_PAGETABLE);
-- 
2.40.1




[PATCH v4 11/34] mm: Convert pmd_ptlock_free() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f48e626d9c98..3b54bb4c9753 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2950,12 +2950,12 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
return ptlock_init(ptdesc);
 }
 
-static inline void pmd_ptlock_free(struct page *page)
+static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+   VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
 #endif
-   ptlock_free(page);
+   ptlock_free(ptdesc_page(ptdesc));
 }
 
 #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
@@ -2968,7 +2968,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 }
 
 static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
-static inline void pmd_ptlock_free(struct page *page) {}
+static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
 
@@ -2992,7 +2992,7 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
-   pmd_ptlock_free(page);
+   pmd_ptlock_free(page_ptdesc(page));
__ClearPageTable(page);
dec_lruvec_page_state(page, NR_PAGETABLE);
 }
-- 
2.40.1




[PATCH v4 05/34] mm: add utility functions for ptdesc

2023-06-12 Thread Vishal Moola (Oracle)
Introduce utility functions setting the foundation for ptdescs. These
will also assist in the splitting out of ptdesc from struct page.

Functions that focus on the descriptor are prefixed with ptdesc_* while
functions that focus on the pagetable are prefixed with pagetable_*.

pagetable_alloc() is defined to allocate new ptdesc pages as compound
pages. This is to standardize ptdescs by allowing for one allocation
and one free function, in contrast to 2 allocation and 2 free functions.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/asm-generic/tlb.h | 11 +++
 include/linux/mm.h| 61 +++
 include/linux/pgtable.h   | 12 
 3 files changed, 84 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b46617207c93..6bade9e0e799 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -481,6 +481,17 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
return tlb_remove_page_size(tlb, page, PAGE_SIZE);
 }
 
+static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
+{
+   tlb_remove_table(tlb, pt);
+}
+
+/* Like tlb_remove_ptdesc, but for page-like page directories. */
+static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt)
+{
+   tlb_remove_page(tlb, ptdesc_page(pt));
+}
+
 static inline void tlb_change_page_size(struct mmu_gather *tlb,
 unsigned int page_size)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0db09639dd2d..f184f1eba85d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2766,6 +2766,62 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 }
 #endif /* CONFIG_MMU */
 
+static inline struct ptdesc *virt_to_ptdesc(const void *x)
+{
+   return page_ptdesc(virt_to_page(x));
+}
+
+static inline void *ptdesc_to_virt(const struct ptdesc *pt)
+{
+   return page_to_virt(ptdesc_page(pt));
+}
+
+static inline void *ptdesc_address(const struct ptdesc *pt)
+{
+   return folio_address(ptdesc_folio(pt));
+}
+
+static inline bool pagetable_is_reserved(struct ptdesc *pt)
+{
+   return folio_test_reserved(ptdesc_folio(pt));
+}
+
+/**
+ * pagetable_alloc - Allocate pagetables
+ * @gfp:	GFP flags
+ * @order:	desired pagetable order
+ *
+ * pagetable_alloc allocates a page table descriptor as well as all pages
+ * described by it.
+ *
+ * Return: The ptdesc describing the allocated page tables.
+ */
+static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
+{
+   struct page *page = alloc_pages(gfp | __GFP_COMP, order);
+
+   return page_ptdesc(page);
+}
+
+/**
+ * pagetable_free - Free pagetables
+ * @pt:	The page table descriptor
+ *
+ * pagetable_free frees a page table descriptor as well as all page
+ * tables described by said ptdesc.
+ */
+static inline void pagetable_free(struct ptdesc *pt)
+{
+   struct page *page = ptdesc_page(pt);
+
+   __free_pages(page, compound_order(page));
+}
+
+static inline void pagetable_clear(void *x)
+{
+   clear_page(x);
+}
+
 #if USE_SPLIT_PTE_PTLOCKS
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
@@ -2992,6 +3048,11 @@ static inline void mark_page_reserved(struct page *page)
adjust_managed_page_count(page, -1);
 }
 
+static inline void free_reserved_ptdesc(struct ptdesc *pt)
+{
+   free_reserved_page(ptdesc_page(pt));
+}
+
 /*
  * Default method to free all the __init memory into the buddy system.
  * The freed pages will be poisoned with pattern "poison" if it's within
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 330de96ebfd6..c405f74d3875 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1026,6 +1026,18 @@ TABLE_MATCH(ptl, ptl);
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
 
+#define ptdesc_page(pt)	(_Generic((pt),				\
+   const struct ptdesc *:  (const struct page *)(pt),  \
+   struct ptdesc *:(struct page *)(pt)))
+
+#define ptdesc_folio(pt)   (_Generic((pt), \
+   const struct ptdesc *:  (const struct folio *)(pt), \
+   struct ptdesc *:(struct folio *)(pt)))
+
+#define page_ptdesc(p) (_Generic((p),  \
+   const struct page *:(const struct ptdesc *)(p), \
+   struct page *:  (struct ptdesc *)(p)))
+
 /*
  * No-op macros that just return the current protection value. Defined here
  * because these macros can be used even if CONFIG_MMU is not defined.
-- 
2.40.1




[PATCH v4 08/34] mm: Convert ptlock_ptr() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/x86/xen/mmu_pv.c |  2 +-
 include/linux/mm.h| 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index b3b8d289b9ab..f469862e3ef4 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -651,7 +651,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct mm_struct *mm)
spinlock_t *ptl = NULL;
 
 #if USE_SPLIT_PTE_PTLOCKS
-   ptl = ptlock_ptr(page);
+   ptl = ptlock_ptr(page_ptdesc(page));
spin_lock_nest_lock(ptl, &mm->page_table_lock);
 #endif
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e6f1be2a405e..bb934d51390f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2828,9 +2828,9 @@ void __init ptlock_cache_init(void);
 bool ptlock_alloc(struct ptdesc *ptdesc);
 extern void ptlock_free(struct page *page);
 
-static inline spinlock_t *ptlock_ptr(struct page *page)
+static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return page->ptl;
+   return ptdesc->ptl;
 }
 #else /* ALLOC_SPLIT_PTLOCKS */
 static inline void ptlock_cache_init(void)
@@ -2846,15 +2846,15 @@ static inline void ptlock_free(struct page *page)
 {
 }
 
-static inline spinlock_t *ptlock_ptr(struct page *page)
+static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return &page->ptl;
+   return &ptdesc->ptl;
 }
 #endif /* ALLOC_SPLIT_PTLOCKS */
 
 static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(pmd_page(*pmd));
+   return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
 }
 
 static inline bool ptlock_init(struct page *page)
@@ -2869,7 +2869,7 @@ static inline bool ptlock_init(struct page *page)
VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
if (!ptlock_alloc(page_ptdesc(page)))
return false;
-   spin_lock_init(ptlock_ptr(page));
+   spin_lock_init(ptlock_ptr(page_ptdesc(page)));
return true;
 }
 
@@ -2939,7 +2939,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
 
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
+   return ptlock_ptr(pmd_ptdesc(pmd));
 }
 
 static inline bool pmd_ptlock_init(struct page *page)
-- 
2.40.1




[PATCH v4 07/34] mm: Convert ptlock_alloc() to use ptdescs

2023-06-12 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 6 +++---
 mm/memory.c| 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 088b7664f897..e6f1be2a405e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2825,7 +2825,7 @@ static inline void pagetable_clear(void *x)
 #if USE_SPLIT_PTE_PTLOCKS
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
-extern bool ptlock_alloc(struct page *page);
+bool ptlock_alloc(struct ptdesc *ptdesc);
 extern void ptlock_free(struct page *page);
 
 static inline spinlock_t *ptlock_ptr(struct page *page)
@@ -2837,7 +2837,7 @@ static inline void ptlock_cache_init(void)
 {
 }
 
-static inline bool ptlock_alloc(struct page *page)
+static inline bool ptlock_alloc(struct ptdesc *ptdesc)
 {
return true;
 }
@@ -2867,7 +2867,7 @@ static inline bool ptlock_init(struct page *page)
 * slab code uses page->slab_cache, which share storage with page->ptl.
 */
VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-   if (!ptlock_alloc(page))
+   if (!ptlock_alloc(page_ptdesc(page)))
return false;
spin_lock_init(ptlock_ptr(page));
return true;
diff --git a/mm/memory.c b/mm/memory.c
index 80ce9dda2779..ba9579117686 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5934,14 +5934,14 @@ void __init ptlock_cache_init(void)
SLAB_PANIC, NULL);
 }
 
-bool ptlock_alloc(struct page *page)
+bool ptlock_alloc(struct ptdesc *ptdesc)
 {
spinlock_t *ptl;
 
ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
if (!ptl)
return false;
-   page->ptl = ptl;
+   ptdesc->ptl = ptl;
return true;
 }
 
-- 
2.40.1




[PATCH v4 06/34] mm: Convert pmd_pgtable_page() to pmd_ptdesc()

2023-06-12 Thread Vishal Moola (Oracle)
Converts pmd_pgtable_page() to pmd_ptdesc() and all its callers. This
removes some direct accesses to struct page, working towards splitting
out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f184f1eba85d..088b7664f897 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2931,15 +2931,15 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 
 #if USE_SPLIT_PMD_PTLOCKS
 
-static inline struct page *pmd_pgtable_page(pmd_t *pmd)
+static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
 {
unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
-   return virt_to_page((void *)((unsigned long) pmd & mask));
+   return virt_to_ptdesc((void *)((unsigned long) pmd & mask));
 }
 
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(pmd_pgtable_page(pmd));
+   return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
 }
 
 static inline bool pmd_ptlock_init(struct page *page)
@@ -2958,7 +2958,7 @@ static inline void pmd_ptlock_free(struct page *page)
ptlock_free(page);
 }
 
-#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte)
+#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
 
 #else
 
-- 
2.40.1




[PATCH v4 04/34] pgtable: Create struct ptdesc

2023-06-12 Thread Vishal Moola (Oracle)
Currently, page table information is stored within struct page. As part
of simplifying struct page, create struct ptdesc for page table
information.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/pgtable.h | 51 +
 1 file changed, 51 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c5a51481bbb9..330de96ebfd6 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -975,6 +975,57 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
 #endif /* CONFIG_MMU */
 
+
+/**
+ * struct ptdesc - Memory descriptor for page tables.
+ * @__page_flags: Same as page flags. Unused for page tables.
+ * @pt_list: List of used page tables. Used for s390 and x86.
+ * @_pt_pad_1: Padding that aliases with page's compound head.
+ * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
+ * @_pt_s390_gaddr: Aliases with page's mapping. Used for s390 gmap only.
+ * @pt_mm: Used for x86 pgds.
+ * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
+ * @ptl: Lock for the page table.
+ *
+ * This struct overlays struct page for now. Do not modify without a good
+ * understanding of the issues.
+ */
+struct ptdesc {
+   unsigned long __page_flags;
+
+   union {
+   struct list_head pt_list;
+   struct {
+   unsigned long _pt_pad_1;
+   pgtable_t pmd_huge_pte;
+   };
+   };
+   unsigned long _pt_s390_gaddr;
+
+   union {
+   struct mm_struct *pt_mm;
+   atomic_t pt_frag_refcount;
+   };
+
+#if ALLOC_SPLIT_PTLOCKS
+   spinlock_t *ptl;
+#else
+   spinlock_t ptl;
+#endif
+};
+
+#define TABLE_MATCH(pg, pt)\
+   static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
+TABLE_MATCH(flags, __page_flags);
+TABLE_MATCH(compound_head, pt_list);
+TABLE_MATCH(compound_head, _pt_pad_1);
+TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
+TABLE_MATCH(mapping, _pt_s390_gaddr);
+TABLE_MATCH(pt_mm, pt_mm);
+TABLE_MATCH(ptl, ptl);
+#undef TABLE_MATCH
+static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
+
 /*
  * No-op macros that just return the current protection value. Defined here
  * because these macros can be used even if CONFIG_MMU is not defined.
-- 
2.40.1




[PATCH v4 03/34] s390: Use pt_frag_refcount for pagetables

2023-06-12 Thread Vishal Moola (Oracle)
s390 currently uses _refcount to identify fragmented page tables.
The page table struct already has a member pt_frag_refcount used by
powerpc, so have s390 use that instead of the _refcount field as well.
This improves the safety for _refcount and the page table tracking.

This also allows us to simplify the tracking since we can once again use
the lower byte of pt_frag_refcount instead of the upper byte of _refcount.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/mm/pgalloc.c | 38 +++---
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..6b99932abc66 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -182,20 +182,17 @@ void page_table_free_pgste(struct page *page)
  * As follows from the above, no unallocated or fully allocated parent
  * pages are contained in mm_context_t::pgtable_list.
  *
- * The upper byte (bits 24-31) of the parent page _refcount is used
+ * The lower byte (bits 0-7) of the parent page pt_frag_refcount is used
  * for tracking contained 2KB-pgtables and has the following format:
  *
  *   PP  AA
- * 01234567	upper byte (bits 24-31) of struct page::_refcount
+ * 01234567	lower byte (bits 0-7) of struct page::pt_frag_refcount
  *   ||  ||
  *   ||  |+--- upper 2KB-pgtable is allocated
  *   ||  + lower 2KB-pgtable is allocated
  *   |+--- upper 2KB-pgtable is pending for removal
  *   + lower 2KB-pgtable is pending for removal
  *
- * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
- * using _refcount is possible).
- *
  * When 2KB-pgtable is allocated the corresponding AA bit is set to 1.
  * The parent page is either:
  *   - added to mm_context_t::pgtable_list in case the second half of the
@@ -243,11 +240,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
if (!list_empty(&mm->context.pgtable_list)) {
page = list_first_entry(&mm->context.pgtable_list,
struct page, lru);
-   mask = atomic_read(&page->_refcount) >> 24;
+   mask = atomic_read(&page->pt_frag_refcount);
/*
 * The pending removal bits must also be checked.
 * Failure to do so might lead to an impossible
-* value of (i.e 0x13 or 0x23) written to _refcount.
+* value of (i.e 0x13 or 0x23) written to
+* pt_frag_refcount.
 * Such values violate the assumption that pending and
 * allocation bits are mutually exclusive, and the rest
 * of the code unrails as result. That could lead to
@@ -259,8 +257,8 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
bit = mask & 1; /* =1 -> second 2K */
if (bit)
table += PTRS_PER_PTE;
-   atomic_xor_bits(&page->_refcount,
-   0x01U << (bit + 24));
+   atomic_xor_bits(&page->pt_frag_refcount,
+   0x01U << bit);
list_del(&page->lru);
}
}
@@ -281,12 +279,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
table = (unsigned long *) page_to_virt(page);
if (mm_alloc_pgste(mm)) {
/* Return 4K page table with PGSTEs */
-   atomic_xor_bits(&page->_refcount, 0x03U << 24);
+   atomic_xor_bits(&page->pt_frag_refcount, 0x03U);
memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
} else {
/* Return the first 2K fragment of the page */
-   atomic_xor_bits(&page->_refcount, 0x01U << 24);
+   atomic_xor_bits(&page->pt_frag_refcount, 0x01U);
memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
spin_lock_bh(&mm->context.lock);
list_add(&page->lru, &mm->context.pgtable_list);
@@ -323,22 +321,19 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 * will happen outside of the critical section from this
 * function or from __tlb_remove_table()
 */
-   mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
-   mask >>= 24;
+   mask = atomic_xor_bits(&page->pt_frag_refcount, 0x11U << bit);
if (mask & 0x03U)
list_add(&page->lru, &mm->context.pgtable_list);
else
list_del(&page->lru);
spin_unlock_bh(&mm->context.lock);
-   mask = atomic_xor_bits(&page->_refcount, 0x10U << (bit + 24));
-   

[PATCH v4 02/34] s390: Use _pt_s390_gaddr for gmap address tracking

2023-06-12 Thread Vishal Moola (Oracle)
s390 uses page->index to keep track of page tables for the guest address
space. In an attempt to consolidate the usage of page fields in s390,
replace _pt_pad_2 with _pt_s390_gaddr to replace page->index in gmap.

This will help with the splitting of struct ptdesc from struct page, as
well as allow s390 to use _pt_frag_refcount for fragmented page table
tracking.

Since page->_pt_s390_gaddr aliases with mapping, ensure it's set to NULL
before freeing the pages as well.

This also reverts commit 7e25de77bc5ea ("s390/mm: use pmd_pgtable_page()
helper in __gmap_segment_gaddr()") which had s390 use
pmd_pgtable_page() to get a gmap page table, as pmd_pgtable_page()
should be used for more generic process page tables.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/mm/gmap.c  | 56 +++-
 include/linux/mm_types.h |  2 +-
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index dc90d1eb0d55..81c683426b49 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -70,7 +70,7 @@ static struct gmap *gmap_alloc(unsigned long limit)
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
goto out_free;
-   page->index = 0;
+   page->_pt_s390_gaddr = 0;
list_add(&page->lru, &gmap->crst_list);
table = page_to_virt(page);
crst_table_init(table, etype);
@@ -187,16 +187,20 @@ static void gmap_free(struct gmap *gmap)
if (!(gmap_is_shadow(gmap) && gmap->removed))
gmap_flush_tlb(gmap);
/* Free all segment & region tables. */
-   list_for_each_entry_safe(page, next, &gmap->crst_list, lru)
+   list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
+   }
gmap_radix_tree_free(&gmap->guest_to_host);
gmap_radix_tree_free(&gmap->host_to_guest);
 
/* Free additional data for a shadow gmap */
if (gmap_is_shadow(gmap)) {
/* Free all page tables. */
-   list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
+   list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
+   page->_pt_s390_gaddr = 0;
page_table_free_pgste(page);
+   }
gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
/* Release reference to the parent */
gmap_put(gmap->parent);
@@ -318,12 +322,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
list_add(&page->lru, &gmap->crst_list);
*table = __pa(new) | _REGION_ENTRY_LENGTH |
(*table & _REGION_ENTRY_TYPE_MASK);
-   page->index = gaddr;
+   page->_pt_s390_gaddr = gaddr;
page = NULL;
}
spin_unlock(&gmap->guest_table_lock);
-   if (page)
+   if (page) {
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
+   }
return 0;
 }
 
@@ -336,12 +342,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
 static unsigned long __gmap_segment_gaddr(unsigned long *entry)
 {
struct page *page;
-   unsigned long offset;
+   unsigned long offset, mask;
 
offset = (unsigned long) entry / sizeof(unsigned long);
offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
-   page = pmd_pgtable_page((pmd_t *) entry);
-   return page->index + offset;
+   mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
+   page = virt_to_page((void *)((unsigned long) entry & mask));
+
+   return page->_pt_s390_gaddr + offset;
 }
 
 /**
@@ -1351,6 +1359,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
/* Free page table */
page = phys_to_page(pgt);
list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
page_table_free_pgste(page);
 }
 
@@ -1379,6 +1388,7 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
/* Free page table */
page = phys_to_page(pgt);
list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
page_table_free_pgste(page);
}
 }
@@ -1409,6 +1419,7 @@ static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr)
/* Free segment table */
page = phys_to_page(sgt);
list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
 }
 
@@ -1437,6 +1448,7 @@ static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr,
/* Free segment table */
page = phys_to_page(sgt);
list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
}
 }
@@ -1467,6 +1479,7 @@ static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr)
/* Free region 3 table */
  

[PATCH v4 00/34] Split ptdesc from struct page

2023-06-12 Thread Vishal Moola (Oracle)
The MM subsystem is trying to shrink struct page. This patchset
introduces a memory descriptor for page table tracking - struct ptdesc.

This patchset introduces ptdesc, splits ptdesc from struct page, and
converts many callers of page table constructor/destructors to use ptdescs.

Ptdesc is a foundation to further standardize page tables, and eventually
allow for dynamic allocation of page tables independent of struct page.
However, the use of pages for page table tracking is quite deeply
ingrained and varied across architectures, so there is still a lot of
work to be done before that can happen.

This is rebased on next-20230609.

v4:
  Got more Acked-bys
  Fixed m68k compilation issue
  Dropped unnecessary casts
  Cleanup some fields in struct ptdesc

Vishal Moola (Oracle) (34):
  mm: Add PAGE_TYPE_OP folio functions
  s390: Use _pt_s390_gaddr for gmap address tracking
  s390: Use pt_frag_refcount for pagetables
  pgtable: Create struct ptdesc
  mm: add utility functions for ptdesc
  mm: Convert pmd_pgtable_page() to pmd_ptdesc()
  mm: Convert ptlock_alloc() to use ptdescs
  mm: Convert ptlock_ptr() to use ptdescs
  mm: Convert pmd_ptlock_init() to use ptdescs
  mm: Convert ptlock_init() to use ptdescs
  mm: Convert pmd_ptlock_free() to use ptdescs
  mm: Convert ptlock_free() to use ptdescs
  mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}
  powerpc: Convert various functions to use ptdescs
  x86: Convert various functions to use ptdescs
  s390: Convert various gmap functions to use ptdescs
  s390: Convert various pgalloc functions to use ptdescs
  mm: Remove page table members from struct page
  pgalloc: Convert various functions to use ptdescs
  arm: Convert various functions to use ptdescs
  arm64: Convert various functions to use ptdescs
  csky: Convert __pte_free_tlb() to use ptdescs
  hexagon: Convert __pte_free_tlb() to use ptdescs
  loongarch: Convert various functions to use ptdescs
  m68k: Convert various functions to use ptdescs
  mips: Convert various functions to use ptdescs
  nios2: Convert __pte_free_tlb() to use ptdescs
  openrisc: Convert __pte_free_tlb() to use ptdescs
  riscv: Convert alloc_{pmd, pte}_late() to use ptdescs
  sh: Convert pte_free_tlb() to use ptdescs
  sparc64: Convert various functions to use ptdescs
  sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents
  um: Convert {pmd, pte}_free_tlb() to use ptdescs
  mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

 Documentation/mm/split_page_table_lock.rst|  12 +-
 .../zh_CN/mm/split_page_table_lock.rst|  14 +-
 arch/arm/include/asm/tlb.h|  12 +-
 arch/arm/mm/mmu.c |   6 +-
 arch/arm64/include/asm/tlb.h  |  14 +-
 arch/arm64/mm/mmu.c   |   7 +-
 arch/csky/include/asm/pgalloc.h   |   4 +-
 arch/hexagon/include/asm/pgalloc.h|   8 +-
 arch/loongarch/include/asm/pgalloc.h  |  27 ++-
 arch/loongarch/mm/pgtable.c   |   7 +-
 arch/m68k/include/asm/mcf_pgalloc.h   |  41 ++--
 arch/m68k/include/asm/sun3_pgalloc.h  |   8 +-
 arch/m68k/mm/motorola.c   |   4 +-
 arch/mips/include/asm/pgalloc.h   |  31 +--
 arch/mips/mm/pgtable.c|   7 +-
 arch/nios2/include/asm/pgalloc.h  |   8 +-
 arch/openrisc/include/asm/pgalloc.h   |   8 +-
 arch/powerpc/mm/book3s64/mmu_context.c|  10 +-
 arch/powerpc/mm/book3s64/pgtable.c|  32 +--
 arch/powerpc/mm/pgtable-frag.c|  46 ++--
 arch/riscv/include/asm/pgalloc.h  |   8 +-
 arch/riscv/mm/init.c  |  16 +-
 arch/s390/include/asm/pgalloc.h   |   4 +-
 arch/s390/include/asm/tlb.h   |   4 +-
 arch/s390/mm/gmap.c   | 222 +++---
 arch/s390/mm/pgalloc.c| 126 +-
 arch/sh/include/asm/pgalloc.h |   9 +-
 arch/sparc/mm/init_64.c   |  17 +-
 arch/sparc/mm/srmmu.c |   5 +-
 arch/um/include/asm/pgalloc.h |  18 +-
 arch/x86/mm/pgtable.c |  46 ++--
 arch/x86/xen/mmu_pv.c |   2 +-
 include/asm-generic/pgalloc.h |  62 +++--
 include/asm-generic/tlb.h |  11 +
 include/linux/mm.h| 155 
 include/linux/mm_types.h  |  14 --
 include/linux/page-flags.h|  20 +-
 include/linux/pgtable.h   |  60 +
 mm/memory.c   |   8 +-
 39 files changed, 664 insertions(+), 449 deletions(-)

-- 
2.40.1




[PATCH v4 01/34] mm: Add PAGE_TYPE_OP folio functions

2023-06-12 Thread Vishal Moola (Oracle)
No folio equivalents for page type operations have been defined, so
define them for later folio conversions.

Also changes the Page##uname macros to take in const struct page* since
we only read the memory here.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/page-flags.h | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 92a2063a0a23..e99a616b9bcd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -908,6 +908,8 @@ static inline bool is_page_hwpoison(struct page *page)
 
 #define PageType(page, flag)   \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+#define folio_test_type(folio, flag)   \
+   ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
 
 static inline int page_type_has_type(unsigned int page_type)
 {
@@ -920,20 +922,34 @@ static inline int page_has_type(struct page *page)
 }
 
 #define PAGE_TYPE_OPS(uname, lname)\
-static __always_inline int Page##uname(struct page *page)  \
+static __always_inline int Page##uname(const struct page *page)	\
 {  \
return PageType(page, PG_##lname);  \
 }  \
+static __always_inline int folio_test_##lname(const struct folio *folio)\
+{  \
+   return folio_test_type(folio, PG_##lname);  \
+}  \
static __always_inline void __SetPage##uname(struct page *page)	\
 {  \
VM_BUG_ON_PAGE(!PageType(page, 0), page);   \
page->page_type &= ~PG_##lname; \
 }  \
+static __always_inline void __folio_set_##lname(struct folio *folio)   \
+{  \
+   VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
+   folio->page.page_type &= ~PG_##lname;   \
+}  \
 static __always_inline void __ClearPage##uname(struct page *page)  \
 {  \
VM_BUG_ON_PAGE(!Page##uname(page), page);   \
page->page_type |= PG_##lname;  \
-}
+}  \
+static __always_inline void __folio_clear_##lname(struct folio *folio) \
+{  \
+   VM_BUG_ON_FOLIO(!folio_test_##lname(folio), folio); \
+   folio->page.page_type |= PG_##lname;\
+}  \
 
 /*
  * PageBuddy() indicates that the page is free and in the buddy system
-- 
2.40.1




Re: [PATCH 2/5] libxl: drop dead assignments to "ret" from libxl__domain_config_setdefault()

2023-06-12 Thread Daniel P. Smith




On 6/12/23 15:44, Daniel P. Smith wrote:

On 6/12/23 07:46, Jan Beulich wrote:

The variable needs to be properly set only on the error paths.

Coverity ID: 1532311
Fixes: ab4440112bec ("xl / libxl: push parsing of SSID and CPU pool ID down to libxl")

Signed-off-by: Jan Beulich 


Reviewed-by: Daniel P. Smith 


---
If XSM is disabled, is it really useful to issue the 2nd and 3rd calls
if the 1st yielded ENOSYS?


Would you be okay with the calls staying if instead on the first 
invocation of any libxl_flask_* method, flask status was checked and 
stored in a variable that would then be checked by any subsequent calls 
and immediately returned if flask was not enabled?


v/r,
dps


Looking closer I realized there is a slight flaw in the logic here. The 
first call is accomplished via an xsm_op call and then assumes that 
FLASK is the only XSM that has implemented the xsm hook, xsm_op, and 
that the result will be an ENOSYS. If someone decides to implement an 
xsm_op hook for any of the existing XSM modules or introduces a new XSM 
module that has an xsm_op hook, the return likely would not be ENOSYS. I 
have often debated if there should be a way to query which XSM module 
was loaded for instances just like this. The question is what mechanism 
would be best to do so.


v/r,
dps



Re: [PATCH v3 1/4] limits.h: add UCHAR_MAX, SCHAR_MAX, and SCHAR_MIN

2023-06-12 Thread Demi Marie Obenour
On Mon, Jun 12, 2023 at 05:31:51PM +0100, Vincenzo Frascino wrote:
> Hi Demi,
> 
> On 6/10/23 21:40, Demi Marie Obenour wrote:
> > Some drivers already defined these, and they will be used by sscanf()
> > for overflow checks later.  Also add SSIZE_MIN to limits.h, which will
> > also be needed later.
> > 
> > Signed-off-by: Demi Marie Obenour 
> > ---
> >  .../media/atomisp/pci/hive_isp_css_include/platform_support.h  | 1 -
> >  include/linux/limits.h | 1 +
> >  include/linux/mfd/wl1273-core.h| 3 ---
> >  include/vdso/limits.h  | 3 +++
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> > 
> ...
> 
> > diff --git a/include/vdso/limits.h b/include/vdso/limits.h
> > index 
> > 0197888ad0e00b2f853d3f25ffa764f61cca7385..0cad0a2490e5efc194d874025eb3e3b846a5c7b4
> >  100644
> > --- a/include/vdso/limits.h
> > +++ b/include/vdso/limits.h
> > @@ -2,6 +2,9 @@
> >  #ifndef __VDSO_LIMITS_H
> >  #define __VDSO_LIMITS_H
> >  
> > +#define UCHAR_MAX  ((unsigned char)~0U)
> > +#define SCHAR_MAX  ((signed char)(UCHAR_MAX >> 1))
> > +#define SCHAR_MIN  ((signed char)(-SCHAR_MAX - 1))
> 
> Are you planning to use those definitions in the vDSO library?

Nope.  They were added here for consistency with the other *_{MIN,MAX}
defines.

> If not can you please define them in linux/limits.h, the vdso headers contain
> only what is necessary for the vDSO library.

Will fix in the next version.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature


[qemu-mainline test] 181394: regressions - FAIL

2023-06-12 Thread osstest service owner
flight 181394 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181394/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH] xen: speed up grant-table reclaim

2023-06-12 Thread Demi Marie Obenour
On Mon, Jun 12, 2023 at 08:27:59AM +0200, Juergen Gross wrote:
> On 10.06.23 17:32, Demi Marie Obenour wrote:
> > When a grant entry is still in use by the remote domain, Linux must put
> > it on a deferred list.
> 
> This lacks quite some context.
> 
> The main problem is related to the grant not having been unmapped after
> the end of a request, but the side granting the access is assuming this
> should be the case.

The GUI agent has relied on deferred grant reclaim for as long as it has
existed.  One could argue that doing so means that the agent is misusing
gntalloc, but this is not documented anywhere.  A better fix would be to
use IOCTL_GNTDEV_SET_UNMAP_NOTIFY in the GUI daemon.

> In general this means that the two sides implementing the protocol don't
> agree how it should work, or that the protocol itself has a flaw.

What would a better solution be?  This is going to be particularly
tricky with Wayland, as the wl_shm protocol makes absolutely no
guarantees that compositors will promptly release the mapping and
provides no way whatsoever for Wayland clients to know when this has
happened.  Relying on an LD_PRELOAD hack is not sustainable.

> > Normally, this list is very short, because
> > the PV network and block protocols expect the backend to unmap the grant
> > first.
> 
> Normally the list is just empty. Only in very rare cases like premature
> PV frontend module unloading it is expected to see cases of deferred
> grant reclaims.

In the case of a system using only properly-designed PV protocols
implemented in kernel mode I agree.  However, both libxenvchan and the
Qubes GUI protocol are implemented in user mode and this means that if
the frontend process (the one that uses gntalloc) crashes, deferred
grant reclaims will occur.  Worse, it is possible for the domain to use
the grant in a PV protocol.  If the PV backend driver maps and
unmaps the grant and then tells the frontend driver to reclaim it, but
the backend userspace process (the one using gntdev) maps it before the
frontend actually reclaims it, the frontend will think the backend is
trying to exploit XSA-396 and will freeze the connection.

> > However, Qubes OS's GUI protocol is subject to the constraints
> > of the X Window System, and as such winds up with the frontend unmapping
> > the window first.  As a result, the list can grow very large, resulting
> > in a massive memory leak and eventual VM freeze.
> 
> I do understand that it is difficult to change the protocol and/or
> behavior after the fact, or that performance considerations are in the
> way of doing so.

Would the correct fix be to use IOCTL_GNTDEV_SET_UNMAP_NOTIFY?  That
would require that the agent either create a new event channel for each
window or maintain a pool of event channels, but that should be doable.
This still does not solve the problem of the frontend exiting
unexpectedly, though.

> > To partially solve this problem, make the number of entries that the VM
> > will attempt to free at each iteration tunable.  The default is still
> > 10, but it can be overridden at compile-time (via Kconfig), boot-time
> > (via a kernel command-line option), or runtime (via sysfs).
> 
> Is there really a need to have another Kconfig option for this? AFAICS
> only QubesOS is affected by the problem you are trying to solve. I don't
> see why you can't use the command-line option or sysfs node to set the
> higher reclaim batch size.

Fair.  In practice, Qubes OS will need to use the sysfs node, since
the other two do not work with in-VM kernels.

> > Fixes: 569ca5b3f94c ("xen/gnttab: add deferred freeing logic")
> 
> I don't think this "Fixes:" tag is appropriate. The mentioned commit didn't
> have a bug. You are adding new functionality on top of it.

I’ll drop the "Fixes:" tag, but I will keep the "Cc: sta...@vger.kernel.org"
as I believe this patch meets the following criterion for stable
backport (from Documentation/process/stable-kernel-rules.rst):

Serious issues as reported by a user of a distribution kernel may also
be considered if they fix a notable performance or interactivity issue.

> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Demi Marie Obenour 
> > ---
> >   drivers/xen/Kconfig   | 12 
> >   drivers/xen/grant-table.c | 40 ---
> >   2 files changed, 41 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 
> > d5d7c402b65112b8592ba10bd3fd1732c26b771e..8f96e1359eb102d6420775b66e7805004a4ce9fe
> >  100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -65,6 +65,18 @@ config XEN_MEMORY_HOTPLUG_LIMIT
> >   This value is used to allocate enough space in internal
> >   tables needed for physical memory administration.
> > +config XEN_GRANTS_RECLAIM_PER_ITERATION
> > +   int "Default number of grant entries to reclaim per iteration"
> > +   default 10
> > +   range 10 4294967295
> > +   help
> > + This sets the default 

Re: [PATCH 2/5] libxl: drop dead assignments to "ret" from libxl__domain_config_setdefault()

2023-06-12 Thread Daniel P. Smith

On 6/12/23 07:46, Jan Beulich wrote:

The variable needs to be properly set only on the error paths.

Coverity ID: 1532311
Fixes: ab4440112bec ("xl / libxl: push parsing of SSID and CPU pool ID down to libxl")
Signed-off-by: Jan Beulich 


Reviewed-by: Daniel P. Smith 


---
If XSM is disabled, is it really useful to issue the 2nd and 3rd calls
if the 1st yielded ENOSYS?


Would you be okay with the calls staying if instead on the first 
invocation of any libxl_flask_* method, flask status was checked and 
stored in a variable that would then be checked by any subsequent calls 
and immediately returned if flask was not enabled?


v/r,
dps



[PATCH] x86/spec-ctrl: Fix the rendering of FB_CLEAR

2023-06-12 Thread Andrew Cooper
FB_CLEAR is a read-only status bit, not a read-write control.  Move it from
"Hardware features" into "Hardware hints".

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
---
 xen/arch/x86/spec_ctrl.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index cd5ea6aa52d9..ec4bcdd97e04 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -409,7 +409,7 @@ static void __init print_details(enum ind_thunk thunk)
  * Hardware read-only information, stating immunity to certain issues, or
  * suggestions of which mitigation to use.
  */
-printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
(caps & ARCH_CAPS_RDCL_NO)? " RDCL_NO"  
  : "",
(caps & ARCH_CAPS_EIBRS)  ? " EIBRS"
  : "",
(caps & ARCH_CAPS_RSBA)   ? " RSBA" 
  : "",
@@ -422,6 +422,7 @@ static void __init print_details(enum ind_thunk thunk)
(caps & ARCH_CAPS_SBDR_SSDP_NO)   ? " SBDR_SSDP_NO" 
  : "",
(caps & ARCH_CAPS_FBSDP_NO)   ? " FBSDP_NO" 
  : "",
(caps & ARCH_CAPS_PSDP_NO)? " PSDP_NO"  
  : "",
+   (caps & ARCH_CAPS_FB_CLEAR)   ? " FB_CLEAR" 
  : "",
(caps & ARCH_CAPS_PBRSB_NO)   ? " PBRSB_NO" 
  : "",
(e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))? " IBRS_ALWAYS"  
  : "",
(e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS" 
  : "",
@@ -431,7 +432,7 @@ static void __init print_details(enum ind_thunk thunk)
(e8b  & cpufeat_mask(X86_FEATURE_IBPB_RET))   ? " IBPB_RET" 
  : "");
 
 /* Hardware features which need driving to mitigate issues. */
-printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
+printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s\n",
(e8b  & cpufeat_mask(X86_FEATURE_IBPB)) ||
(_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))  ? " IBPB" 
  : "",
(e8b  & cpufeat_mask(X86_FEATURE_IBRS)) ||
@@ -447,7 +448,6 @@ static void __init print_details(enum ind_thunk thunk)
(_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL)) ? " SRBDS_CTRL"   
  : "",
(e8b  & cpufeat_mask(X86_FEATURE_VIRT_SSBD))  ? " VIRT_SSBD"
  : "",
(caps & ARCH_CAPS_TSX_CTRL)   ? " TSX_CTRL" 
  : "",
-   (caps & ARCH_CAPS_FB_CLEAR)   ? " FB_CLEAR" 
  : "",
(caps & ARCH_CAPS_FB_CLEAR_CTRL)  ? " 
FB_CLEAR_CTRL"  : "");
 
 /* Compiled-in support which pertains to mitigations. */
-- 
2.30.2




[qemu-mainline test] 181393: regressions - FAIL

2023-06-12 Thread osstest service owner
flight 181393 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181393/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH v2 4/4] x86/microcode: Prevent attempting updates if DIS_MCU_LOAD is set

2023-06-12 Thread Andrew Cooper
On 05/06/2023 6:08 pm, Alejandro Vallejo wrote:
> diff --git a/xen/arch/x86/cpu/microcode/core.c 
> b/xen/arch/x86/cpu/microcode/core.c
> index 4f60d96d98..a4c123118b 100644
> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -871,6 +885,15 @@ int __init early_microcode_init(unsigned long 
> *module_map,
>   * present.
>   */
>  ucode_ops = intel_ucode_ops;
> +
> +/*
> + * In the case where microcode updates are blocked by the
> + * DIS_MCU_LOAD bit we can still read the microcode version even if
> + * we can't change it.
> + */
> +if ( !this_cpu_can_install_update() )
> +ucode_ops = (struct microcode_ops){ .collect_cpu_info =
> +intel_ucode_ops.collect_cpu_info };

I don't see how this (the logic in this_cpu_can_install_update()) can
work, as ...

>  break;
>  }
>  
> @@ -900,6 +923,10 @@ int __init early_microcode_init(unsigned long 
> *module_map,
>  if ( ucode_mod.mod_end || ucode_blob.size )
>  rc = early_microcode_update_cpu();
>  
> +/*
> + * We just updated microcode so we must reload the boot_cpu_data bits
> + * we read before because they might be stale after the update.
> + */
>  early_read_cpuid_7d0();
>  
>  /*

... MSR_ARCH_CAPS is read out-of-context down here.

In hindsight, I think swapping patches 2 and 3 might be wise.  The rev ==
~0 case doesn't need any of the cpu_has_* shuffling, and it already
starts to build up the if/else chain of cases where we decide to clobber
the apply_microcode() hook.

The call to this_cpu_can_install_update() should be lower down.  In
principle it's not Intel-specific.

~Andrew



Re: [PATCH v2 3/4] x86/microcode: Ignore microcode loading interface for revision = -1

2023-06-12 Thread Andrew Cooper
On 05/06/2023 6:08 pm, Alejandro Vallejo wrote:
> diff --git a/xen/arch/x86/cpu/microcode/core.c 
> b/xen/arch/x86/cpu/microcode/core.c
> index 892bcec901..4f60d96d98 100644
> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -874,6 +874,21 @@ int __init early_microcode_init(unsigned long 
> *module_map,
>  break;
>  }
>  
> +if ( ucode_ops.collect_cpu_info )
> +ucode_ops.collect_cpu_info();
> +
> +/*
> + * This is a special case for virtualized Xen.

I'm not sure this first sentence is useful.  I'd just start with "Some
hypervisors ..."

>  Some hypervisors
> + * deliberately report a microcode revision of -1 to mean that they
> + * will not accept microcode updates. We take the hint and ignore the
> + * microcode interface in that case.
> + */
> +if ( this_cpu(cpu_sig).rev == ~0 )
> +{
> +this_cpu(cpu_sig) = (struct cpu_signature){ 0 };
> +ucode_ops = (struct microcode_ops){ 0 };

I think we want to retain XENPF_get_ucode_revision's ability to see this ~0.

As with the following patch, we want to retain the ability to query, so
leave cpu_sig alone and only remove the apply_microcode hook.  In turn,
that probably means this wants to be an else if in the next clause down.

Moving it down also means you can drop the check for collect_cpu_info,
because it's a mandatory hook if ucode_ops was filled in.

~Andrew

> +}
> +
>  if ( !ucode_ops.apply_microcode )
>  {
>  printk(XENLOG_WARNING "Microcode loading not available\n");




Re: [PATCH v2 2/4] x86: Read MSR_ARCH_CAPS after early_microcode_init()

2023-06-12 Thread Andrew Cooper
On 12/06/2023 4:46 pm, Jan Beulich wrote:
> On 05.06.2023 19:08, Alejandro Vallejo wrote:
>> --- a/xen/arch/x86/cpu/microcode/core.c
>> +++ b/xen/arch/x86/cpu/microcode/core.c
>> @@ -840,6 +840,15 @@ static int __init early_microcode_update_cpu(void)
>>  return microcode_update_cpu(patch);
>>  }
>>  
>> +static void __init early_read_cpuid_7d0(void)
>> +{
>> +boot_cpu_data.cpuid_level = cpuid_eax(0);
> As per above I don't think this is needed.
>
>> +if ( boot_cpu_data.cpuid_level >= 7 )
>> +boot_cpu_data.x86_capability[FEATURESET_7d0]
>> += cpuid_count_edx(7, 0);
> This is actually filled in early_cpu_init() as well, so doesn't need
> re-doing here unless because of a suspected change to the value (but
> then other CPUID output may have changed, too).

Hmm, yes.  I suspect that is due to the CET series (which needed to know
7d0 much earlier than previously), and me forgetting to clean up tsx_init().

>  At which point ...
>
>> @@ -878,5 +887,17 @@ int __init early_microcode_init(unsigned long 
>> *module_map,
>>  if ( ucode_mod.mod_end || ucode_blob.size )
>>  rc = early_microcode_update_cpu();
>>  
>> +early_read_cpuid_7d0();
>> +
>> +/*
>> + * tsx_init() needs MSR_ARCH_CAPS, but it runs before identify_cpu()
>> + * populates boot_cpu_data, so we read it here to centralize early
>> + * CPUID/MSR reads in the same place.
>> + */
>> +if ( cpu_has_arch_caps )
>> +rdmsr(MSR_ARCH_CAPABILITIES,
>> +  boot_cpu_data.x86_capability[FEATURESET_m10Al],
>> +  boot_cpu_data.x86_capability[FEATURESET_m10Ah]);
> ... "centralize" aspect goes away, and hence the comment needs adjusting.

I find it weird splitting apart the various reads into x86_capability[],
but in light of the feedback, only the rdmsr() needs to stay.

>
>> --- a/xen/arch/x86/tsx.c
>> +++ b/xen/arch/x86/tsx.c
>> @@ -39,9 +39,9 @@ void tsx_init(void)
>>  static bool __read_mostly once;
>>  
>>  /*
>> - * This function is first called between microcode being loaded, and 
>> CPUID
>> - * being scanned generally.  Read into boot_cpu_data.x86_capability[] 
>> for
>> - * the cpu_has_* bits we care about using here.
>> + * While MSRs/CPUID haven't yet been scanned, MSR_ARCH_CAPABILITIES
>> + * and leaf 7d0 have already been read if present after early microcode
>> + * loading time. So we can assume _those_ are present.
>>   */
>>  if ( unlikely(!once) )
>>  {
> I think I'd like to see at least the initial part of the original comment
> retained here.

The first sentence needs to stay as-is.  That's still relevant even with
the feature handling moved out.

The second sentence wants to say something like "However,
microcode_init() has already prepared the feature bits we need." because
it's the justification of why we don't do it here.

~Andrew



Re: [PATCH v2 1/4] x86/microcode: Remove Intel's family check on early_microcode_init()

2023-06-12 Thread Andrew Cooper
On 12/06/2023 4:16 pm, Jan Beulich wrote:
> On 05.06.2023 19:08, Alejandro Vallejo wrote:
>
>> --- a/xen/arch/x86/cpu/microcode/core.c
>> +++ b/xen/arch/x86/cpu/microcode/core.c
>> @@ -854,8 +854,14 @@ int __init early_microcode_init(unsigned long 
>> *module_map,
>>  break;
>>  
>>  case X86_VENDOR_INTEL:
>> -if ( c->x86 >= 6 )
>> -ucode_ops = intel_ucode_ops;
>> +/*
>> + * Intel introduced microcode loading with family 6. Because we
>> + * don't support compiling Xen for 32bit machines we're guaranteed
>> + * that at this point we're either in family 15 (Pentium 4) or 6
>> + * (everything since then), so microcode facilities are always
>> + * present.
>> + */
>> +ucode_ops = intel_ucode_ops;
>>  break;
>>  }
> There are many places where we make such connections / assumptions without
> long comments. I'd be okay with a brief one, but I'm not convinced we need
> one at all.

I agree.  I don't think we need a comment here.

I'd also tweak the commit message to say "All 64bit-capable Intel CPUs
are supported as far as microcode loading goes" or similar.  It's subtly
different IMO.

The Intel microcode driver already relies on 64bit-ness to exclude an
early case (on 32bit CPUs only) which lacks Platform Flags.

I'm happy to fix both of these up on commit.

~Andrew



[qemu-mainline test] 181391: regressions - FAIL

2023-06-12 Thread osstest service owner
flight 181391 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181391/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH v3 1/4] limits.h: add UCHAR_MAX, SCHAR_MAX, and SCHAR_MIN

2023-06-12 Thread Vincenzo Frascino
Hi Demi,

On 6/10/23 21:40, Demi Marie Obenour wrote:
> Some drivers already defined these, and they will be used by sscanf()
> for overflow checks later.  Also add SSIZE_MIN to limits.h, which will
> also be needed later.
> 
> Signed-off-by: Demi Marie Obenour 
> ---
>  .../media/atomisp/pci/hive_isp_css_include/platform_support.h  | 1 -
>  include/linux/limits.h | 1 +
>  include/linux/mfd/wl1273-core.h| 3 ---
>  include/vdso/limits.h  | 3 +++
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
...

> diff --git a/include/vdso/limits.h b/include/vdso/limits.h
> index 
> 0197888ad0e00b2f853d3f25ffa764f61cca7385..0cad0a2490e5efc194d874025eb3e3b846a5c7b4
>  100644
> --- a/include/vdso/limits.h
> +++ b/include/vdso/limits.h
> @@ -2,6 +2,9 @@
>  #ifndef __VDSO_LIMITS_H
>  #define __VDSO_LIMITS_H
>  
> +#define UCHAR_MAX((unsigned char)~0U)
> +#define SCHAR_MAX((signed char)(UCHAR_MAX >> 1))
> +#define SCHAR_MIN((signed char)(-SCHAR_MAX - 1))

Are you planning to use those definitions in the vDSO library?

If not can you please define them in linux/limits.h, the vdso headers contain
only what is necessary for the vDSO library.

Thanks!

>  #define USHRT_MAX((unsigned short)~0U)
>  #define SHRT_MAX ((short)(USHRT_MAX >> 1))
>  #define SHRT_MIN ((short)(-SHRT_MAX - 1))

-- 
Regards,
Vincenzo



[PATCH v3 0/4] x86: RSBA and RRSBA handling

2023-06-12 Thread Andrew Cooper
This series deals with the handling of the RSBA and RRSBA bits across all
parts and all mistakes encountered in various microcode versions.

There are only minor changes from v2.  See patches for details.

Andrew Cooper (4):
  x86/spec-ctrl: Use a taint for CET without MSR_SPEC_CTRL
  x86/spec-ctrl: Rename retpoline_safe() to retpoline_calculations()
  x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
  x86/cpu-policy: Derive RSBA/RRSBA for guest policies

 xen/arch/x86/cpu-policy.c   |  39 ++
 xen/arch/x86/include/asm/cpufeature.h   |   1 +
 xen/arch/x86/spec_ctrl.c| 142 +---
 xen/common/kernel.c |   2 +-
 xen/include/public/arch-x86/cpufeatureset.h |   4 +-
 xen/tools/gen-cpuid.py  |   5 +-
 6 files changed, 170 insertions(+), 23 deletions(-)

-- 
2.30.2




[PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-12 Thread Andrew Cooper
In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

See the code comment for details.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v3:
 * Add a taint for bad EIBRS vs RSBA/RRSBA.
 * Minor comment improvements.

v2:
 * Rewrite almost from scratch.
---
 xen/arch/x86/include/asm/cpufeature.h |   1 +
 xen/arch/x86/spec_ctrl.c  | 100 --
 2 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/cpufeature.h 
b/xen/arch/x86/include/asm/cpufeature.h
index ace31e3b1f1a..e2cb8f3cc728 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -193,6 +193,7 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_tsx_ctrlboot_cpu_has(X86_FEATURE_TSX_CTRL)
 #define cpu_has_taa_no  boot_cpu_has(X86_FEATURE_TAA_NO)
 #define cpu_has_fb_clearboot_cpu_has(X86_FEATURE_FB_CLEAR)
+#define cpu_has_rrsba   boot_cpu_has(X86_FEATURE_RRSBA)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmonboot_cpu_has(X86_FEATURE_ARCH_PERFMON)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 3892ce4d20ba..fb1b59b4d7e3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -579,7 +579,10 @@ static bool __init check_smt_enabled(void)
 return false;
 }
 
-/* Calculate whether Retpoline is known-safe on this CPU. */
+/*
+ * Calculate whether Retpoline is known-safe on this CPU.  Fix up the
+ * RSBA/RRSBA bits as necessary.
+ */
 static bool __init retpoline_calculations(void)
 {
 unsigned int ucode_rev = this_cpu(cpu_sig).rev;
@@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
 return false;
 
 /*
- * RSBA may be set by a hypervisor to indicate that we may move to a
- * processor which isn't retpoline-safe.
+ * The meaning of the RSBA and RRSBA bits has evolved over time.  The
+ * agreed upon meaning at the time of writing (May 2023) is thus:
+ *
+ * - RSBA (RSB Alternative) means that an RSB may fall back to an
+ *   alternative predictor on underflow.  Skylake uarch and later all have
+ *   this property.  Broadwell too, when running microcode versions prior
+ *   to Jan 2018.
+ *
+ * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
+ *   tagging of predictions with the mode in which they were learned.  So
+ *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
+ *
+ * - CPUs are not expected to enumerate both RSBA and RRSBA.
+ *
+ * Some parts (Broadwell) are not expected to ever enumerate this
+ * behaviour directly.  Other parts have differing enumeration with
+ * microcode version.  Fix up Xen's idea, so we can advertise them safely
+ * to guests, and so toolstacks can level a VM safely for migration.
+ *
+ * The following states exist:
+ *
+ * |   | RSBA | EIBRS | RRSBA | Notes  | Action|
+ * |---+--+---+---++---|
+ * | 1 |0 | 0 | 0 | OK (older parts)   | Maybe +RSBA   |
+ * | 2 |0 | 0 | 1 | Broken | +RSBA, -RRSBA |
+ * | 3 |0 | 1 | 0 | OK (pre-Aug ucode) | +RRSBA|
+ * | 4 |0 | 1 | 1 | OK |   |
+ * | 5 |1 | 0 | 0 | OK |   |
+ * | 6 |1 | 0 | 1 | Broken | -RRSBA|
+ * | 7 |1 | 1 | 0 | Broken | -RSBA, +RRSBA |
+ * | 8 |1 | 1 | 1 | Broken | -RSBA |
+ *
+ * However, we don't need perfect adherence to the spec.  We only need
+ * RSBA || RRSBA to indicate "alternative predictors potentially in use".
+ * Rows 1 & 3 are fixed up by later logic, as they're known configurations
+ * which exist in the world.
  *
+ * Complain loudly at the broken cases. They're safe for Xen to use (so we
+ * don't attempt to correct), and may or may not exist in reality, but if
+ * we ever encounter them in practice, something is wrong and needs
+ * further investigation.
+ */
+if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
+   : cpu_has_rrsba /* Rows 2, 6 */ )
+{
+printk(XENLOG_ERR
+   "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
+   boot_cpu_data.x86, boot_cpu_data.x86_model,
+   boot_cpu_data.x86_mask, ucode_rev,
+   cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
+add_taint(TAINT_CPU_OUT_OF_SPEC);
+}
+
+/*
  * Processors offering Enhanced IBRS are not guaranteed to be
  * retpoline-safe.
  */
-if ( cpu_has_rsba || cpu_has_eibrs )
+if ( cpu_has_eibrs )
+  
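
For illustration, the eight rows in the table above reduce to a single
predicate: an enumeration is one of the "Broken" rows exactly when an
eIBRS-capable part advertises RSBA, or a non-eIBRS part advertises RRSBA.
A standalone model of that check (hypothetical name; the real logic lives
in retpoline_calculations()):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the sanity check in the table above: rows 2, 6, 7 and 8 are
 * the "Broken" combinations that Xen complains about and taints on.
 */
bool rsba_enumeration_broken(bool rsba, bool eibrs, bool rrsba)
{
    if ( eibrs )
        return rsba;   /* Rows 7, 8: eIBRS parts must not set RSBA. */
    return rrsba;      /* Rows 2, 6: non-eIBRS parts must not set RRSBA. */
}
```

This mirrors the `cpu_has_eibrs ? cpu_has_rsba : cpu_has_rrsba` condition
in the hunk above.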

[PATCH v3 2/4] x86/spec-ctrl: Rename retpoline_safe() to retpoline_calculations()

2023-06-12 Thread Andrew Cooper
This is prep work, split out to simplify the diff on the following change.

 * Rename to retpoline_calculations(), and call unconditionally.  It is
   shortly going to synthesise missing enumerations required for guest safety.
 * For the model check switch statement, store the result in a variable and
   break rather than returning directly.

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Extend the 'safe' variable to the entire switch statement.
---
 xen/arch/x86/spec_ctrl.c | 41 +---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 05b86edf73d3..3892ce4d20ba 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -580,9 +580,10 @@ static bool __init check_smt_enabled(void)
 }
 
 /* Calculate whether Retpoline is known-safe on this CPU. */
-static bool __init retpoline_safe(void)
+static bool __init retpoline_calculations(void)
 {
 unsigned int ucode_rev = this_cpu(cpu_sig).rev;
+bool safe = false;
 
 if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
 return true;
@@ -620,29 +621,31 @@ static bool __init retpoline_safe(void)
 case 0x3f: /* Haswell EX/EP */
 case 0x45: /* Haswell D */
 case 0x46: /* Haswell H */
-return true;
+safe = true;
+break;
 
 /*
  * Broadwell processors are retpoline-safe after specific microcode
  * versions.
  */
 case 0x3d: /* Broadwell */
-return ucode_rev >= 0x2a;
+safe = ucode_rev >= 0x2a;  break;
 case 0x47: /* Broadwell H */
-return ucode_rev >= 0x1d;
+safe = ucode_rev >= 0x1d;  break;
 case 0x4f: /* Broadwell EP/EX */
-return ucode_rev >= 0xb21;
+safe = ucode_rev >= 0xb21; break;
 case 0x56: /* Broadwell D */
 switch ( boot_cpu_data.x86_mask )
 {
-case 2:  return ucode_rev >= 0x15;
-case 3:  return ucode_rev >= 0x712;
-case 4:  return ucode_rev >= 0xf11;
-case 5:  return ucode_rev >= 0xe09;
+case 2:  safe = ucode_rev >= 0x15;  break;
+case 3:  safe = ucode_rev >= 0x712; break;
+case 4:  safe = ucode_rev >= 0xf11; break;
+case 5:  safe = ucode_rev >= 0xe09; break;
 default:
printk("Unrecognised CPU stepping %#x - assuming not retpoline safe\n",
boot_cpu_data.x86_mask);
-return false;
+safe = false;
+break;
 }
 break;
 
@@ -656,7 +659,8 @@ static bool __init retpoline_safe(void)
 case 0x67: /* Cannonlake? */
 case 0x8e: /* Kabylake M */
 case 0x9e: /* Kabylake D */
-return false;
+safe = false;
+break;
 
 /*
  * Atom processors before Goldmont Plus/Gemini Lake are retpoline-safe.
@@ -675,13 +679,17 @@ static bool __init retpoline_safe(void)
 case 0x5c: /* Goldmont */
 case 0x5f: /* Denverton */
 case 0x85: /* Knights Mill */
-return true;
+safe = true;
+break;
 
 default:
printk("Unrecognised CPU model %#x - assuming not retpoline safe\n",
boot_cpu_data.x86_model);
-return false;
+safe = false;
+break;
 }
+
+return safe;
 }
 
 /*
@@ -1114,7 +1122,7 @@ void __init init_speculation_mitigations(void)
 {
 enum ind_thunk thunk = THUNK_DEFAULT;
 bool has_spec_ctrl, ibrs = false, hw_smt_enabled;
-bool cpu_has_bug_taa;
+bool cpu_has_bug_taa, retpoline_safe;
 
 hw_smt_enabled = check_smt_enabled();
 
@@ -1143,6 +1151,9 @@ void __init init_speculation_mitigations(void)
 thunk = THUNK_JMP;
 }
 
+/* Determine if retpoline is safe on this CPU. */
+retpoline_safe = retpoline_calculations();
+
 /*
  * Has the user specified any custom BTI mitigations?  If so, follow their
  * instructions exactly and disable all heuristics.
@@ -1164,7 +1175,7 @@ void __init init_speculation_mitigations(void)
  * On all hardware, we'd like to use retpoline in preference to
  * IBRS, but only if it is safe on this hardware.
  */
-if ( retpoline_safe() )
+if ( retpoline_safe )
 thunk = THUNK_RETPOLINE;
 else if ( has_spec_ctrl )
 ibrs = true;
-- 
2.30.2




[PATCH v3 1/4] x86/spec-ctrl: Use a taint for CET without MSR_SPEC_CTRL

2023-06-12 Thread Andrew Cooper
Reword the comment for 'S' to include an incompatible set of features on the
same core.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v3:
 * New
---
 xen/arch/x86/spec_ctrl.c | 3 +++
 xen/common/kernel.c  | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index cd5ea6aa52d9..05b86edf73d3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1132,7 +1132,10 @@ void __init init_speculation_mitigations(void)
 if ( read_cr4() & X86_CR4_CET )
 {
 if ( !has_spec_ctrl )
+{
 printk(XENLOG_WARNING "?!? CET active, but no MSR_SPEC_CTRL?\n");
+add_taint(TAINT_CPU_OUT_OF_SPEC);
+}
 else if ( opt_ibrs == -1 )
 opt_ibrs = ibrs = true;
 
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index fd975ae21ebc..719b08d6c76a 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -373,7 +373,7 @@ unsigned int tainted;
  *  'H' - HVM forced emulation prefix is permitted.
  *  'I' - Platform is insecure (usually due to an errata on the platform).
  *  'M' - Machine had a machine check experience.
- *  'S' - Out of spec CPU (One core has a feature incompatible with others).
+ *  'S' - Out of spec CPU (Incompatible features on one or more cores).
  *
  *  The string is overwritten by the next call to print_taint().
  */
-- 
2.30.2




[PATCH v3 4/4] x86/cpu-policy: Derive RSBA/RRSBA for guest policies

2023-06-12 Thread Andrew Cooper
The RSBA bit, "RSB Alternative", means that the RSB may use alternative
predictors when empty.  From a practical point of view, this means "Retpoline
not safe".

Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
statement that IBRS is implemented in hardware (as opposed to the form
retrofitted to existing CPUs in microcode).

The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
property that predictions are tagged with the mode in which they were learnt.
Therefore, it means "when eIBRS is active, the RSB may fall back to
alternative predictors but restricted to the current prediction mode".  As
such, it's a stronger statement than RSBA, but still means "Retpoline not safe".

CPUs are not expected to enumerate both RSBA and RRSBA.

Add feature dependencies for EIBRS and RRSBA.  While technically they're not
linked, absolutely nothing good can come of letting the guest see RRSBA
without EIBRS.  Nor a guest seeing EIBRS without IBRSB.  Furthermore, we use
this dependency to simplify the max derivation logic.

The max policies get RSBA and RRSBA unconditionally set (with the EIBRS
dependency maybe hiding RRSBA).  We can run any VM, even if it has been told
"somewhere you might run, Retpoline isn't safe".

The default policies are more complicated.  A guest shouldn't see both bits,
but it needs to see one if the current host suffers from any form of RSBA, and
which bit it needs to see depends on whether eIBRS is visible or not.
Therefore, the calculation must be performed after sanitise_featureset().
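
The default derivation just described can be sketched as a pure function
(hypothetical helper names, not the actual cpu-policy code): reflect one of
the two bits only when the guest can see MSR_ARCH_CAPS and the host suffers
some form of RSBA, picking which bit by eIBRS visibility.

```c
#include <assert.h>
#include <stdbool.h>

/* Which bit, if any, a default guest policy should see. */
enum rsba_bit { RSBA_NONE, RSBA_RSBA, RSBA_RRSBA };

/*
 * Sketch of the derivation: a host suffering any form of RSBA is
 * reflected to the guest as RSBA or RRSBA depending on whether eIBRS is
 * visible, and only when MSR_ARCH_CAPS is visible at all.
 */
enum rsba_bit derive_default_rsba(bool guest_arch_caps,
                                  bool host_rsba, bool host_rrsba,
                                  bool guest_eibrs)
{
    if ( !guest_arch_caps || !(host_rsba || host_rrsba) )
        return RSBA_NONE;

    return guest_eibrs ? RSBA_RRSBA : RSBA_RSBA;
}
```

Note the guest never sees both bits at once, matching the "CPUs are not
expected to enumerate both" expectation.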

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v3:
 * Minor commit message adjustment.
 * Drop changes to recalculate_cpuid_policy().  Deferred to a later series.

v2:
 * Expand/adjust the comment for the max features.
 * Rewrite the default feature derivation in light of new information.
 * Fix up in recalculate_cpuid_policy() too.
---
 xen/arch/x86/cpu-policy.c   | 39 +
 xen/include/public/arch-x86/cpufeatureset.h |  4 +--
 xen/tools/gen-cpuid.py  |  5 ++-
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index ee256ff5a137..cde7f7605c28 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -423,8 +423,17 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
  * Retpoline not safe)", so these need to be visible to a guest in all
  * cases, even when it's only some other server in the pool which
  * suffers the identified behaviour.
+ *
+ * We can always run any VM which has previously (or will
+ * subsequently) run on hardware where Retpoline is not safe.
+ * Note:
+ *  - The dependency logic may hide RRSBA for other reasons.
+ *  - The max policy does not constitute a sensible configuration to
+ *run a guest in.
  */
 __set_bit(X86_FEATURE_ARCH_CAPS, fs);
+__set_bit(X86_FEATURE_RSBA, fs);
+__set_bit(X86_FEATURE_RRSBA, fs);
 }
 }
 
@@ -532,6 +541,21 @@ static void __init calculate_pv_def_policy(void)
 guest_common_default_feature_adjustments(fs);
 
 sanitise_featureset(fs);
+
+/*
+ * If the host suffers from RSBA of any form, and the guest can see
+ * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest
+ * depending on the visibility of eIBRS.
+ */
+if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) &&
+ (cpu_has_rsba || cpu_has_rrsba) )
+{
+bool eibrs = test_bit(X86_FEATURE_EIBRS, fs);
+
+__set_bit(eibrs ? X86_FEATURE_RRSBA
+: X86_FEATURE_RSBA, fs);
+}
+
 x86_cpu_featureset_to_policy(fs, p);
 recalculate_xstate(p);
 }
@@ -664,6 +688,21 @@ static void __init calculate_hvm_def_policy(void)
 __set_bit(X86_FEATURE_VIRT_SSBD, fs);
 
 sanitise_featureset(fs);
+
+/*
+ * If the host suffers from RSBA of any form, and the guest can see
+ * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest
+ * depending on the visibility of eIBRS.
+ */
+if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) &&
+ (cpu_has_rsba || cpu_has_rrsba) )
+{
+bool eibrs = test_bit(X86_FEATURE_EIBRS, fs);
+
+__set_bit(eibrs ? X86_FEATURE_RRSBA
+: X86_FEATURE_RSBA, fs);
+}
+
 x86_cpu_featureset_to_policy(fs, p);
 recalculate_xstate(p);
 }
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index ea779c29879e..ce7407d6a10c 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -311,7 +311,7 @@ XEN_CPUFEATURE(CET_SSS,15*32+18) /*   CET 
Supervisor Shadow Stacks s
 /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.eax, word 16 */
 XEN_CPUFEATURE(RDCL_NO,

[XEN PATCH] xen: fixed violations of MISRA C:2012 Rule 3.1

2023-06-12 Thread nicola . vetrini
From: Nicola Vetrini 

The xen sources contain several violations of Rule 3.1 from MISRA C:2012,
whose headline states:
"The character sequences '/*' and '//' shall not be used within a comment".

Most of the violations are due to the presence of links to webpages within
C-style comment blocks, such as:

xen/arch/arm/include/asm/smccc.h:37.1-41.3
/*
 * This file provides common defines for ARM SMC Calling Convention as
 * specified in
 * http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html
 */

In this case, we propose to deviate all of these occurrences with a
project deviation to be captured by a tool configuration.

There are, however, a few other violations that do not fall under this
category, namely:

1. in file "xen/arch/arm/include/asm/arm64/flushtlb.h" we propose to
avoid the usage of a nested comment;
2. in file "xen/common/xmalloc_tlsf.c" we propose to substitute the
commented-out if statement with a "#if 0 .. #endif";
3. in file "xen/include/xen/atomic.h" and
"xen/drivers/passthrough/arm/smmu-v3.c" we propose to split the C-style
comment containing the nested comment into two doxygen comments, clearly
identifying the second as a code sample. This can then be captured with a
project deviation by a tool configuration.
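
As an illustration of item 2, the transformation keeps the dead code but
moves it out of a comment, so the "//" sequence no longer sits inside a
"/* ... */" block (simplified example, not the actual xmalloc_tlsf.c code):

```c
/* Simplified example of the Rule 3.1 fix: the previously commented-out
 * statement survives as preprocessor-guarded dead code instead of as a
 * block comment containing "//". */
int clamp_example(int x)
{
#if 0
    if (x < 0) // x is always >= 0 here, so this check is dead
        x = 0;
#endif
    return x;
}
```

The behaviour is unchanged: the guarded statement is never compiled, just
as commented-out code is never compiled.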

Signed-off-by: Nicola Vetrini 
---
 xen/arch/arm/include/asm/arm64/flushtlb.h | 8 
 xen/common/xmalloc_tlsf.c | 7 ---
 xen/drivers/passthrough/arm/smmu-v3.c | 9 ++---
 xen/include/xen/atomic.h  | 5 -
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/flushtlb.h b/xen/arch/arm/include/asm/arm64/flushtlb.h
index 3a9092b814..90ac3f9809 100644
--- a/xen/arch/arm/include/asm/arm64/flushtlb.h
+++ b/xen/arch/arm/include/asm/arm64/flushtlb.h
@@ -4,10 +4,10 @@
 /*
  * Every invalidation operation use the following patterns:
  *
- * DSB ISHST// Ensure prior page-tables updates have completed
- * TLBI...  // Invalidate the TLB
- * DSB ISH  // Ensure the TLB invalidation has completed
- * ISB  // See explanation below
+ * DSB ISHSTEnsure prior page-tables updates have completed
+ * TLBI...  Invalidate the TLB
+ * DSB ISH  Ensure the TLB invalidation has completed
+ * ISB  See explanation below
  *
  * ARM64_WORKAROUND_REPEAT_TLBI:
  * Modification of the translation table for a virtual address might lead to
diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
index 75bdf18c4e..ea6ec47a59 100644
--- a/xen/common/xmalloc_tlsf.c
+++ b/xen/common/xmalloc_tlsf.c
@@ -140,9 +140,10 @@ static inline void MAPPING_SEARCH(unsigned long *r, int *fl, int *sl)
 *fl = flsl(*r) - 1;
 *sl = (*r >> (*fl - MAX_LOG2_SLI)) - MAX_SLI;
 *fl -= FLI_OFFSET;
-/*if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
- *fl = *sl = 0;
- */
+#if 0
+if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
+*fl = *sl = 0;
+#endif
 *r &= ~t;
 }
 }
diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index 720aa69ff2..b1c536e7d9 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -1045,15 +1045,18 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
/*
 * Ensure that we've completed prior invalidation of the main TLBs
 * before we read 'nr_ats_masters' in case of a concurrent call to
-* arm_smmu_enable_ats():
+* arm_smmu_enable_ats().
+*/
+   /**
+* Code sample: Ensures that we always see the incremented
+* 'nr_ats_masters' count if ATS was enabled at the PCI device before
+* completion of the TLBI.
 *
 *  // unmap()  // arm_smmu_enable_ats()
 *  TLBI+SYNC   atomic_inc(_ats_masters);
 *  smp_mb();   [...]
 *  atomic_read(_ats_masters);   pci_enable_ats() // writel()
 *
-* Ensures that we always see the incremented 'nr_ats_masters' count if
-* ATS was enabled at the PCI device before completion of the TLBI.
 */
smp_mb();
if (!atomic_read(_domain->nr_ats_masters))
diff --git a/xen/include/xen/atomic.h b/xen/include/xen/atomic.h
index 529213ebbb..829646dda0 100644
--- a/xen/include/xen/atomic.h
+++ b/xen/include/xen/atomic.h
@@ -71,7 +71,10 @@ static inline void _atomic_set(atomic_t *v, int i);
  * Returns the initial value in @v, hence succeeds when the return value
  * matches that of @old.
  *
- * Sample (tries atomic increment of v until the operation succeeds):
+ */
+/**
+ *
+ * Code sample: Tries atomic increment of v until the operation succeeds.
  *
  *  while(1)
  *  {
-- 
2.34.1




Re: [PATCH v2 2/4] x86: Read MSR_ARCH_CAPS after early_microcode_init()

2023-06-12 Thread Jan Beulich
On 05.06.2023 19:08, Alejandro Vallejo wrote:
> tsx_init() has some ad-hoc code to read MSR_ARCH_CAPS if present. In order
> to support DIS_MCU_UPDATE we need access to it earlier, so this patch moves
> the early read to the tail of early_microcode_init(), after the early microcode
> update.
> 
> The read of the 7d0 CPUID leaf is left in a helper because it's reused in a
> later patch.
> 
> No functional change.
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> I suspect there was an oversight in tsx_init() by which
> boot_cpu_data.cpuid_level was never read? The first read I can
> see is in identify_cpu(), which happens after tsx_init().

See early_cpu_init(). (I have to admit that I was also struggling with
your use of "read": Aiui you mean the field was never written / set,
and "read" really refers to the reading of the corresponding CPUID
leaf.)

> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -840,6 +840,15 @@ static int __init early_microcode_update_cpu(void)
>  return microcode_update_cpu(patch);
>  }
>  
> +static void __init early_read_cpuid_7d0(void)
> +{
> +boot_cpu_data.cpuid_level = cpuid_eax(0);

As per above I don't think this is needed.

> +if ( boot_cpu_data.cpuid_level >= 7 )
> +boot_cpu_data.x86_capability[FEATURESET_7d0]
> += cpuid_count_edx(7, 0);

This is actually filled in early_cpu_init() as well, so doesn't need
re-doing here unless because of a suspected change to the value (but
then other CPUID output may have changed, too). At which point ...

> @@ -878,5 +887,17 @@ int __init early_microcode_init(unsigned long *module_map,
>  if ( ucode_mod.mod_end || ucode_blob.size )
>  rc = early_microcode_update_cpu();
>  
> +early_read_cpuid_7d0();
> +
> +/*
> + * tsx_init() needs MSR_ARCH_CAPS, but it runs before identify_cpu()
> + * populates boot_cpu_data, so we read it here to centralize early
> + * CPUID/MSR reads in the same place.
> + */
> +if ( cpu_has_arch_caps )
> +rdmsr(MSR_ARCH_CAPABILITIES,
> +  boot_cpu_data.x86_capability[FEATURESET_m10Al],
> +  boot_cpu_data.x86_capability[FEATURESET_m10Ah]);

... "centralize" aspect goes away, and hence the comment needs adjusting.

> --- a/xen/arch/x86/tsx.c
> +++ b/xen/arch/x86/tsx.c
> @@ -39,9 +39,9 @@ void tsx_init(void)
>  static bool __read_mostly once;
>  
>  /*
> - * This function is first called between microcode being loaded, and 
> CPUID
> - * being scanned generally.  Read into boot_cpu_data.x86_capability[] for
> - * the cpu_has_* bits we care about using here.
> + * While MSRs/CPUID haven't yet been scanned, MSR_ARCH_CAPABILITIES
> + * and leaf 7d0 have already been read if present after early microcode
> + * loading time. So we can assume _those_ are present.
>   */
>  if ( unlikely(!once) )
>  {

I think I'd like to see at least the initial part of the original comment
retained here.

Jan



Re: [PATCH v3 0/4] Make sscanf() stricter

2023-06-12 Thread Andy Shevchenko
On Sat, Jun 10, 2023 at 04:40:40PM -0400, Demi Marie Obenour wrote:
> Roger Pau Monné suggested making xenbus_scanf() stricter instead of
> using a custom parser.  Christoph Hellwig asked why the normal vsscanf()
> cannot be made stricter.  Richard Weinberger mentioned Linus Torvalds’s
> suggestion of using ! to allow overflow.

As Rasmus articulated, NAK w/o test cases being added to all parts that your
changes touch.

-- 
With Best Regards,
Andy Shevchenko





Re: [PATCH 2/3] xen/ppc: Implement early serial printk on PaPR/pseries

2023-06-12 Thread Julien Grall

Hi George,

Thanks for the summary! A couple of comments below.

On 12/06/2023 16:19, George Dunlap wrote:

On Fri, Jun 9, 2023 at 5:07 PM Julien Grall  wrote:


Hi Shawn,

On 09/06/2023 16:01, Shawn Anastasio wrote:

On Fri Jun 9, 2023 at 5:12 AM CDT, Julien Grall wrote:

Strictly speaking we can refuse any code. That counts for the license as
well. Anyway, I didn't request a change here. I merely pointed out that
any use of GPLv2+ should be justified, because on Arm most people
don't pay attention to the license and pick the one from an existing
file.


Hi Julien,

The choice of GPLv2+ for many of the files in this patchset was indeed
inherited from old IBM-written Xen code that the files in question were
derived from. I did not realize it was permissible or even desirable to
relicense those to GPLv2-only.

As for the new files, GPLv2+ was chosen to remain consistent and to open
the door for future derivations from GPLv2+ licensed code, either from
the older Xen tree or from the Linux ppc tree, much of which is also
licensed as GPLv2+. If it would reduce friction, these files could be
relicensed to GPLv2-only.


(Before someone points out, I know this is already a problem on other
part of Xen. But it would be ideal if we avoid spreading this mess on
new architectures :).

Thanks for the explanations. To clarify, are you saying that all the
files will be GPLv2+ or just some?

If the latter, then my concern would be that if you need to import
GPLv2-only code, then you may need to write your code in a different
file. This may become messy to handle and some developer may end up to
be confused.

I am not a lawyer though, so you may want to check the implications here.



Shawn,

Again sorry that you've sort of bumped a hornet's nest here.

Just to clarify, the situation as I understand it is:

1. Large parts of Xen, being inherited from the Linux Kernel, are
GPLv2-only; and the documentation clearly states that code is GPLv2-only
unless explicitly stated otherwise.

2. Some individual files in Xen are labelled as GPLv2-or-later; but as they
rely on the "only" files, Xen as a whole can only be compiled under a GPLv2
license.

3. New contributions to a file are assumed to have the same license as the
header of the file; i.e., the code contained in patches to GPLv2-or-later
files is assumed to be granted according to a GPLv2-or-later license.


The new contribution here could be code imported from Linux that would 
be GPLv2-only in a GPLv2-or-later file. It is not clear to me what the 
legal implications would be.




4. In the past, the legal teams of some contributors -- namely ARM -- were
wary of the GPLv3; specifically the patent grant.  Since ARM doesn't make
anything themselves, their patents are literally their product; they need
to be very careful of not accidentally granting them to the world.  I think
one thing ARM may have been afraid of at some point is one of their
engineers accidentally submitting a patch to a GPLv2-or-later file which
would, when taken with a GPLv3 (or GPLv4 license, once it comes out) cause
them to lose too much control over their IP.

My understanding is that Julien is afraid that if the "GPLv2-or-later"
files start to proliferate, that companies like ARM will start to become
more wary of contributing; and so has been generally trying to encourage
new files to be labelled "GPLv2-only" unless there's a good reason to do
otherwise.  (Other issues like copying code from GPLv2-only are potential
pitfalls as well, but probably less important.)
There is that and also the fact that we now need to be more careful when 
importing code from Linux. In Shawn's case this is mitigated by the fact 
that the license in Xen files should match the one in Linux.



Additionally, I think it would be good if the community *did* have a
discussion about whether we want an official policy; so that either we can
point people to the relevant doc (with explanation), or stop bothering
about it. :-)


+1. Do you think that would be a good topic for Xen Summit?

Cheers,

--
Julien Grall



[linux-linus test] 181387: regressions - trouble: broken/fail/pass

2023-06-12 Thread osstest service owner
flight 181387 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181387/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt-raw broken
 test-armhf-armhf-xl-credit1  broken
 test-armhf-armhf-xl-multivcpu broken
 test-armhf-armhf-xl-vhd  broken
 test-armhf-armhf-xl-credit1   8 xen-boot   fail in 181383 REGR. vs. 180278
 build-arm64-pvops 6 kernel-build   fail in 181383 REGR. vs. 180278

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-vhd   5 host-install(5)  broken pass in 181383
 test-armhf-armhf-libvirt-raw  5 host-install(5)  broken pass in 181383
 test-armhf-armhf-xl-multivcpu  5 host-install(5) broken pass in 181383
 test-armhf-armhf-xl-credit1   5 host-install(5)  broken pass in 181383
 test-amd64-amd64-xl-vhd 21 guest-start/debian.repeat fail in 181383 pass in 181387

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 181383 n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 181383 n/a
 test-armhf-armhf-xl-multivcpu  8 xen-boot   fail in 181383 like 180278
 test-armhf-armhf-libvirt-raw  8 xen-bootfail in 181383 like 180278
 test-armhf-armhf-xl-vhd   8 xen-bootfail in 181383 like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180278
 test-armhf-armhf-xl-credit2   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-armhf-armhf-examine  8 reboot   fail  like 180278
 test-armhf-armhf-libvirt  8 xen-boot fail  like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-bootfail like 180278
 test-armhf-armhf-xl   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 linux858fd168a95c5b9669aac8db6c14a9aeab446375
baseline version:
 linux6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   56 days
Failing since180281  2023-04-17 

[qemu-mainline test] 181390: regressions - FAIL

2023-06-12 Thread osstest service owner
flight 181390 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181390/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH 2/3] xen/ppc: Implement early serial printk on PaPR/pseries

2023-06-12 Thread George Dunlap
On Fri, Jun 9, 2023 at 5:07 PM Julien Grall  wrote:

> Hi Shawn,
>
> On 09/06/2023 16:01, Shawn Anastasio wrote:
> > On Fri Jun 9, 2023 at 5:12 AM CDT, Julien Grall wrote:
> >> Strictly speaking we can refuse any code. That count for license as
> >> well. Anyway, I didn't request a change here. I merely pointed out that
> >> any use of GPLv2+ should be justified because on Arm most of the people
> >> don't pay attention on the license and pick the one from an existing
> file.
> >
> > Hi Julien,
> >
> > The choice of GPLv2+ for many of the files in this patchset was indeed
> > inherited from old IBM-written Xen code that the files in question were
> > derived from. I did not realize it was permissible or even desirable to
> > relicense those to GPLv2-only.
> >
> > As for the new files, GPLv2+ was chosen to remain consistent and to open
> > the door for future derivations from GPLv2+ licensed code, either from
> > the older Xen tree or from the Linux ppc tree, much of which is also
> > licensed as GPLv2+. If it would reduce friction, these files could be
> > relicensed to GPLv2-only.
>
> (Before someone points out, I know this is already a problem on other
> part of Xen. But it would be ideal if we avoid spreading this mess on
> new architectures :).
>
> Thanks for the explanations. To clarify, are you saying that all the
> files will be GPLv2+ or just some?
>
> If the latter, then my concern would be that if you need to import
> GPLv2-only code, then you may need to write your code in a different
> file. This may become messy to handle and some developer may end up to
> be confused.
>
> I am not a lawyer though, so you may want to check the implications here.
>

Shawn,

Again sorry that you've sort of bumped a hornet's nest here.

Just to clarify, the situation as I understand it is:

1. Large parts of Xen, being inherited from the Linux Kernel, are
GPLv2-only; and the documentation clearly states that code is GPLv2-only
unless explicitly stated otherwise.

2. Some individual files in Xen are labelled as GPLv2-or-later; but as they
rely on the "only" files, Xen as a whole can only be compiled under a GPLv2
license.

3. New contributions to a file are assumed to have the same license as the
header of the file; i.e., the code contained in patches to GPLv2-or-later
files is assumed to be granted according to a GPLv2-or-later license.

4. In the past, the legal teams of some contributors -- namely ARM -- were
wary of the GPLv3; specifically the patent grant.  Since ARM doesn't make
anything themselves, their patents are literally their product; they need
to be very careful of not accidentally granting them to the world.  I think
one thing ARM may have been afraid of at some point is one of their
engineers accidentally submitting a patch to a GPLv2-or-later file which
would, when taken with a GPLv3 (or GPLv4, once it comes out) license,
cause them to lose too much control over their IP.

My understanding is that Julien is afraid that if the "GPLv2-or-later"
files start to proliferate, companies like ARM will become more wary of
contributing; and so he has been generally trying to encourage new files
to be labelled "GPLv2-only" unless there's a good reason to do otherwise.
(Other issues, like copying code from GPLv2-only files, are potential
pitfalls as well, but probably less important.)

HOWEVER, as Andrew says, there is no official policy at this point; all the
documents say is that GPLv2-only is the default unless explicitly stated
otherwise.

Furthermore, the concerns raised by ARM's legal team were nearly a decade
ago; it's not clear to me whether they still care that much.

All that to say: If you don't mind and feel that you can do so legally,
then consider switching to GPLv2-only; but if you don't want to and/or feel
that you can't do so legally, feel free to leave it as-is.

Additionally, I think it would be good if the community *did* have a
discussion about whether we want an official policy; so that either we can
point people to the relevant doc (with explanation), or stop bothering
about it. :-)

 -George


Re: [PATCH v2 1/4] x86/microcode: Remove Intel's family check on early_microcode_init()

2023-06-12 Thread Jan Beulich
On 05.06.2023 19:08, Alejandro Vallejo wrote:

> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -854,8 +854,14 @@ int __init early_microcode_init(unsigned long 
> *module_map,
>  break;
>  
>  case X86_VENDOR_INTEL:
> -if ( c->x86 >= 6 )
> -ucode_ops = intel_ucode_ops;
> +/*
> + * Intel introduced microcode loading with family 6. Because we
> + * don't support compiling Xen for 32bit machines we're guaranteed
> + * that at this point we're either in family 15 (Pentium 4) or 6
> + * (everything since then), so microcode facilities are always
> + * present.
> + */
> +ucode_ops = intel_ucode_ops;
>  break;
>  }

There are many places where we make such connections / assumptions without
long comments. I'd be okay with a brief one, but I'm not convinced we need
one at all.

Jan



Re: [PATCH v2] docs/misra: new rules addition

2023-06-12 Thread Roberto Bagnara

On 12/06/23 11:50, Jan Beulich wrote:

On 12.06.2023 11:34, Roberto Bagnara wrote:

On 12/06/23 09:33, Jan Beulich wrote:

On 09.06.2023 19:45, Stefano Stabellini wrote:

@@ -143,6 +163,12 @@ existing codebase are work-in-progress.
- Octal constants shall not be used
-
   
+   * - `Rule 7.2 `_

+ - Required
+ - A "u" or "U" suffix shall be applied to all integer constants
+   that are represented in an unsigned type
+ -


I continue to consider "represented in" problematic here without
further qualification.


We should distinguish two things here.  The headline of Rule 7.2
is non-negotiable: it is simply as it is.

I understand this, and ...


  Like all headlines,
it is a compromise between conciseness and mnemonic value.
If what is wanted there is not the headline, then you can add
"implicitly" before "represented".  Or you may leave the headline
and add an explanatory note afterwards.


... such a note is what my comment was heading towards.


Here is an attempt.  "The rule asks that any integer literal
that is implicitly unsigned is made explicitly unsigned by
using one of the indicated suffixes.  As an example, on
a machine where the int type is 32-bit wide, 0x7FFFFFFF
is signed whereas 0x80000000 is (implicitly) unsigned.
In order to comply with the rule, the latter should be
rewritten as either 0x80000000u or 0x80000000U.  Consistency
considerations may suggest using the same suffix even
when not required by the rule.  For instance, if one has

   f(0x7FFFFFFF);  // Original
   f(0x80000000);

one might prefer

   f(0x7FFFFFFFU); // Solution 1
   f(0x80000000U);

over

   f(0x7FFFFFFF);  // Solution 2
   f(0x80000000U);

after having ascertained that "Solution 1" is compatible
with the intended semantics."



@@ -314,6 +340,11 @@ existing codebase are work-in-progress.
  used following a subsequent call to the same function
-
   
+   * - Rule 21.21

+ - Required
+ - The Standard Library function system of  shall not be used
+ -


Still no "inapplicable" note (whichever way it would be worded to also
please Roberto)?


I am not the one to be pleased ;-)

But really, I don't follow: when you say the rule is inapplicable,
your reasoning is, IIUC, "nobody would even dream of using system() in Xen".
Which is exactly what the rule is asking.  If Xen adopts the rule,
tooling will make sure system() is not used, and seeing that the rule
is applied, assessors will be pleased.


My point is that "not using functions of stdlib.h" is ambiguous: It may
mean functions implemented in an external library (which the hypervisor
doesn't use), or it may mean functions of identical name (and purpose).
The full text goes even further and forbids the use of these
identifiers (plural; see next paragraph), so it's clearly not only
about an external library, and we also can't put it off as inapplicable.
(I wouldn't be surprised if we had a local variable or label named
"exit" or "abort".)

Btw - I can't find a rule 21.21 in my two (slightly different) copies
of the doc, nor one with this headline and a different number. What I
have is "21.8 The Standard Library functions abort, exit and system of
 shall not be used". (I further wonder why neither of the two
docs allows me to copy-and-paste a line out of it.)


Rule 21.21 was added in MISRA C:2012 Amendment 2, which you can download
(free of charge) from 
https://www.misra.org.uk/app/uploads/2021/06/MISRA-C-2012-AMD2.pdf



Re: [PATCH] tools/xenstored: Correct the prototype of domain_max_chk()

2023-06-12 Thread Jason Andryuk
On Mon, Jun 12, 2023 at 6:13 AM Julien Grall  wrote:
>
> From: Julien Grall 
>
> Some version of GCC will complain because the prototype and the
> declaration of domain_max_chk() don't match:
>
> xenstored_domain.c:1503:6: error: conflicting types for 'domain_max_chk' due 
> to enum/integer mismatch; have '_Bool(const struct connection *, enum 
> accitem,  unsigned int)' [-Werror=enum-int-mismatch]
>  1503 | bool domain_max_chk(const struct connection *conn, enum accitem what,
>   |  ^~
> In file included from xenstored_domain.c:31:
> xenstored_domain.h:146:6: note: previous declaration of 'domain_max_chk' with 
> type '_Bool(const struct connection *, unsigned int,  unsigned int)'
>   146 | bool domain_max_chk(const struct connection *conn, unsigned int what,
>   |  ^~
>
> Update the prototype to match the declaration.
>
> This was spotted by Gitlab CI with the job opensuse-tumbleweed-gcc.
>
> Fixes: 685048441e1c ("tools/xenstore: switch quota management to be table 
> based")
> Signed-off-by: Julien Grall 

Reviewed-by: Jason Andryuk 
Tested-by: Jason Andryuk 

This fixes the issue on Fedora 38, too.

Thanks,
Jason



Re: [PATCH v2 4/4] maintainers: Add ppc64 maintainer

2023-06-12 Thread Jan Beulich
On 12.06.2023 16:51, Shawn Anastasio wrote:
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -460,6 +460,10 @@ X:   xen/arch/x86/acpi/lib.c
>  F:   xen/drivers/cpufreq/
>  F:   xen/include/acpi/cpufreq/
>  
> +PPC64
> +M:   Shawn Anastasio 
> +F:  xen/arch/ppc

I'm sorry, but two nits again: this lacks a trailing slash, and
padding is done using spaces on the 2nd line instead of a tab.

Jan




Re: [PATCH v2 0/4] Initial support for Power

2023-06-12 Thread Shawn Anastasio
On Mon Jun 12, 2023 at 9:51 AM CDT, Shawn Anastasio wrote:
> With an appropriate powerpc64le-linux-gnu cross-toolchain, the minimal
> image can be built with:
>
> $ make XEN_TARGET_ARCH=ppc64 -C xen openpower_defconfig
> $ make XEN_TARGET_ARCH=ppc64 SUBSYSTEMS=xen -C xen TARGET=ppc64/head.o

Minor clarification to this cover letter: the manual TARGET= override
is not necessary. All that is needed is:

$ make XEN_TARGET_ARCH=ppc64 SUBSYSTEMS=xen -C xen build

Thanks,
Shawn




[PATCH v2 4/4] maintainers: Add ppc64 maintainer

2023-06-12 Thread Shawn Anastasio
Signed-off-by: Shawn Anastasio 
---
 MAINTAINERS | 4 
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1bb7a6a839..8966175400 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -460,6 +460,10 @@ X: xen/arch/x86/acpi/lib.c
 F: xen/drivers/cpufreq/
 F: xen/include/acpi/cpufreq/
 
+PPC64
+M: Shawn Anastasio 
+F:  xen/arch/ppc
+
 PUBLIC I/O INTERFACES AND PV DRIVERS DESIGNS
 M: Juergen Gross 
 S: Supported
-- 
2.30.2




[PATCH v2 1/4] automation: Add container for ppc64le builds

2023-06-12 Thread Shawn Anastasio
Add a container for cross-compiling xen for ppc64le.

Signed-off-by: Shawn Anastasio 
---
 .../build/debian/bullseye-ppc64le.dockerfile  | 28 +++
 automation/scripts/containerize   |  1 +
 2 files changed, 29 insertions(+)
 create mode 100644 automation/build/debian/bullseye-ppc64le.dockerfile

diff --git a/automation/build/debian/bullseye-ppc64le.dockerfile 
b/automation/build/debian/bullseye-ppc64le.dockerfile
new file mode 100644
index 00..8a87631b52
--- /dev/null
+++ b/automation/build/debian/bullseye-ppc64le.dockerfile
@@ -0,0 +1,28 @@
+FROM debian:bullseye-slim
+LABEL maintainer.name="The Xen Project" \
+  maintainer.email="xen-devel@lists.xenproject.org"
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV USER root
+
+# Add compiler path
+ENV CROSS_COMPILE powerpc64le-linux-gnu-
+
+RUN mkdir /build
+WORKDIR /build
+
+# build depends
+RUN apt-get update && \
+apt-get --quiet --yes --no-install-recommends install \
+bison \
+build-essential \
+checkpolicy \
+flex \
+gawk \
+gcc-powerpc64le-linux-gnu \
+make \
+python3-minimal \
+&& \
+apt-get autoremove -y && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
diff --git a/automation/scripts/containerize b/automation/scripts/containerize
index 5476ff0ea1..6d46f63665 100755
--- a/automation/scripts/containerize
+++ b/automation/scripts/containerize
@@ -33,6 +33,7 @@ case "_${CONTAINER}" in
 _focal) CONTAINER="${BASE}/ubuntu:focal" ;;
 _jessie) CONTAINER="${BASE}/debian:jessie" ;;
 _jessie-i386) CONTAINER="${BASE}/debian:jessie-i386" ;;
+_bullseye-ppc64le) CONTAINER="${BASE}/debian:bullseye-ppc64le" ;;
 _stretch|_) CONTAINER="${BASE}/debian:stretch" ;;
 _stretch-i386) CONTAINER="${BASE}/debian:stretch-i386" ;;
 _buster-gcc-ibt) CONTAINER="${BASE}/debian:buster-gcc-ibt" ;;
-- 
2.30.2




[PATCH v2 3/4] automation: Add ppc64le cross-build jobs

2023-06-12 Thread Shawn Anastasio
Add build jobs to cross-compile Xen for ppc64le.

Signed-off-by: Shawn Anastasio 
---
 automation/gitlab-ci/build.yaml | 60 +
 1 file changed, 60 insertions(+)

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 420ffa5acb..bd8c7332db 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -183,6 +183,33 @@
   variables:
 <<: *gcc
 
+.ppc64le-cross-build-tmpl:
+  <<: *build
+  variables:
+XEN_TARGET_ARCH: ppc64
+  tags:
+- x86_64
+
+.ppc64le-cross-build:
+  extends: .ppc64le-cross-build-tmpl
+  variables:
+debug: n
+
+.ppc64le-cross-build-debug:
+  extends: .ppc64le-cross-build-tmpl
+  variables:
+debug: y
+
+.gcc-ppc64le-cross-build:
+  extends: .ppc64le-cross-build
+  variables:
+<<: *gcc
+
+.gcc-ppc64le-cross-build-debug:
+  extends: .ppc64le-cross-build-debug
+  variables:
+<<: *gcc
+
 .yocto-test:
   stage: build
   image: registry.gitlab.com/xen-project/xen/${CONTAINER}
@@ -516,6 +543,39 @@ archlinux-current-gcc-riscv64-debug-randconfig:
 EXTRA_FIXED_RANDCONFIG:
   CONFIG_COVERAGE=n
 
+# Power cross-build
+debian-bullseye-gcc-ppc64le:
+  extends: .gcc-ppc64le-cross-build
+  variables:
+CONTAINER: debian:bullseye-ppc64le
+KBUILD_DEFCONFIG: openpower_defconfig
+HYPERVISOR_ONLY: y
+
+debian-bullseye-gcc-ppc64le-debug:
+  extends: .gcc-ppc64le-cross-build-debug
+  variables:
+CONTAINER: debian:bullseye-ppc64le
+KBUILD_DEFCONFIG: openpower_defconfig
+HYPERVISOR_ONLY: y
+
+debian-bullseye-gcc-ppc64le-randconfig:
+  extends: .gcc-ppc64le-cross-build
+  variables:
+CONTAINER: debian:bullseye-ppc64le
+KBUILD_DEFCONFIG: openpower_defconfig
+RANDCONFIG: y
+EXTRA_FIXED_RANDCONFIG:
+  CONFIG_COVERAGE=n
+
+debian-bullseye-gcc-ppc64le-debug-randconfig:
+  extends: .gcc-ppc64le-cross-build-debug
+  variables:
+CONTAINER: debian:bullseye-ppc64le
+KBUILD_DEFCONFIG: openpower_defconfig
+RANDCONFIG: y
+EXTRA_FIXED_RANDCONFIG:
+  CONFIG_COVERAGE=n
+
 # Yocto test jobs
 yocto-qemuarm64:
   extends: .yocto-test-arm64
-- 
2.30.2




[PATCH v2 2/4] xen: Add files needed for minimal ppc64le build

2023-06-12 Thread Shawn Anastasio
Add the build system changes required to build for ppc64le (POWER8+).
As of now the resulting image simply boots to an infinite loop.

$ make XEN_TARGET_ARCH=ppc64 -C xen openpower_defconfig
$ make XEN_TARGET_ARCH=ppc64 SUBSYSTEMS=xen -C xen build

This port targets POWER8+ CPUs running in Little Endian mode specifically,
and does not boot on older machines. Additionally, this initial skeleton
only implements the PaPR/pseries boot protocol which allows it to be
booted in a standard QEMU virtual machine:

$ qemu-system-ppc64 -M pseries-5.2 -m 256M -kernel xen/xen

Signed-off-by: Shawn Anastasio 
---
 config/ppc64.mk  |   5 +
 xen/Makefile |   5 +-
 xen/arch/ppc/Kconfig |  42 ++
 xen/arch/ppc/Kconfig.debug   |   0
 xen/arch/ppc/Makefile|  16 +++
 xen/arch/ppc/Rules.mk|   0
 xen/arch/ppc/arch.mk |  11 ++
 xen/arch/ppc/configs/openpower_defconfig |  13 ++
 xen/arch/ppc/include/asm/config.h|  63 +
 xen/arch/ppc/include/asm/page-bits.h |   7 +
 xen/arch/ppc/ppc64/Makefile  |   1 +
 xen/arch/ppc/ppc64/asm-offsets.c |   0
 xen/arch/ppc/ppc64/head.S|  27 
 xen/arch/ppc/xen.lds.S   | 173 +++
 14 files changed, 361 insertions(+), 2 deletions(-)
 create mode 100644 config/ppc64.mk
 create mode 100644 xen/arch/ppc/Kconfig
 create mode 100644 xen/arch/ppc/Kconfig.debug
 create mode 100644 xen/arch/ppc/Makefile
 create mode 100644 xen/arch/ppc/Rules.mk
 create mode 100644 xen/arch/ppc/arch.mk
 create mode 100644 xen/arch/ppc/configs/openpower_defconfig
 create mode 100644 xen/arch/ppc/include/asm/config.h
 create mode 100644 xen/arch/ppc/include/asm/page-bits.h
 create mode 100644 xen/arch/ppc/ppc64/Makefile
 create mode 100644 xen/arch/ppc/ppc64/asm-offsets.c
 create mode 100644 xen/arch/ppc/ppc64/head.S
 create mode 100644 xen/arch/ppc/xen.lds.S

diff --git a/config/ppc64.mk b/config/ppc64.mk
new file mode 100644
index 00..597f0668c3
--- /dev/null
+++ b/config/ppc64.mk
@@ -0,0 +1,5 @@
+CONFIG_PPC := y
+CONFIG_PPC64 := y
+CONFIG_PPC_$(XEN_OS) := y
+
+CONFIG_XEN_INSTALL_SUFFIX :=
diff --git a/xen/Makefile b/xen/Makefile
index e89fc461fc..db5454fb58 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -38,7 +38,7 @@ EFI_MOUNTPOINT ?= $(BOOT_DIR)/efi
 ARCH=$(XEN_TARGET_ARCH)
 SRCARCH=$(shell echo $(ARCH) | \
   sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
-  -e s'/riscv.*/riscv/g')
+  -e s'/riscv.*/riscv/g' -e s'/ppc.*/ppc/g')
 export ARCH SRCARCH
 
 # Allow someone to change their config file
@@ -244,7 +244,7 @@ include $(XEN_ROOT)/Config.mk
 export TARGET_SUBARCH  := $(XEN_TARGET_ARCH)
 export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
 sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
--e s'/riscv.*/riscv/g')
+-e s'/riscv.*/riscv/g' -e s'/ppc.*/ppc/g')
 
 export CONFIG_SHELL := $(SHELL)
 export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
@@ -563,6 +563,7 @@ _clean:
$(Q)$(MAKE) $(clean)=xsm
$(Q)$(MAKE) $(clean)=crypto
$(Q)$(MAKE) $(clean)=arch/arm
+   $(Q)$(MAKE) $(clean)=arch/ppc
$(Q)$(MAKE) $(clean)=arch/riscv
$(Q)$(MAKE) $(clean)=arch/x86
$(Q)$(MAKE) $(clean)=test
diff --git a/xen/arch/ppc/Kconfig b/xen/arch/ppc/Kconfig
new file mode 100644
index 00..a0a70adef4
--- /dev/null
+++ b/xen/arch/ppc/Kconfig
@@ -0,0 +1,42 @@
+config PPC
+   def_bool y
+
+config PPC64
+   def_bool y
+   select 64BIT
+
+config ARCH_DEFCONFIG
+   string
+   default "arch/ppc/configs/openpower_defconfig"
+
+menu "Architecture Features"
+
+source "arch/Kconfig"
+
+endmenu
+
+menu "ISA Selection"
+
+choice
+   prompt "Base ISA"
+   default POWER_ISA_2_07B if PPC64
+   help
+ This selects the base ISA version that Xen will target.
+
+config POWER_ISA_2_07B
+   bool "Power ISA 2.07B"
+   help
+ Target version 2.07B of the Power ISA (POWER8)
+
+config POWER_ISA_3_00
+   bool "Power ISA 3.00"
+   help
+ Target version 3.00 of the Power ISA (POWER9)
+
+endchoice
+
+endmenu
+
+source "common/Kconfig"
+
+source "drivers/Kconfig"
diff --git a/xen/arch/ppc/Kconfig.debug b/xen/arch/ppc/Kconfig.debug
new file mode 100644
index 00..e69de29bb2
diff --git a/xen/arch/ppc/Makefile b/xen/arch/ppc/Makefile
new file mode 100644
index 00..10b101cf9c
--- /dev/null
+++ b/xen/arch/ppc/Makefile
@@ -0,0 +1,16 @@
+obj-$(CONFIG_PPC64) += ppc64/
+
+$(TARGET): $(TARGET)-syms
+   cp -f $< $@
+
+$(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@
+   $(NM) -pa --format=sysv $(@D)/$(@F) \
+   | $(objtree)/tools/symbols --all-symbols --xensyms --sysv 
--sort \

[PATCH v2 0/4] Initial support for Power

2023-06-12 Thread Shawn Anastasio
Hello all,

This patch series adds support for building a minimal image
(head.o-only) for Power ISA 2.07B+ (POWER8+) systems. The first patch
boots to an infinite loop and the second adds early serial console
support on pseries VMs, with bare metal support planned next.

Since Xen previously had support for a much older version of the ISA in
version 3.2.3, we were able to carry over some headers and support
routines from that version. Unlike that initial port though, this effort
focuses solely on POWER8+ CPUs that are capable of running in Little
Endian mode.

With an appropriate powerpc64le-linux-gnu cross-toolchain, the minimal
image can be built with:

$ make XEN_TARGET_ARCH=ppc64 -C xen openpower_defconfig
$ make XEN_TARGET_ARCH=ppc64 SUBSYSTEMS=xen -C xen TARGET=ppc64/head.o

The resulting head.o can then be booted in a standard QEMU/pseries VM:

$ qemu-system-ppc64 -M pseries-5.2 -m 256M -kernel xen/ppc64/head.o \
-vga none -serial mon:stdio -nographic

Thanks,
Shawn

--
Changes from v2:
  - Add ppc64le cross-build container patch
  - Add ppc64le cross build CI job patch
  - Drop serial output patch (will be in future patch series)
  - Drop setup.c and unneeded headers from minimal build patch
  - Fixed ordering of MAINTAINERS patch + add F: line
  - Fix config/ppc64.mk option names
  - Clarify Kconfig Baseline ISA option help strings

Shawn Anastasio (4):
  automation: Add container for ppc64le builds
  xen: Add files needed for minimal ppc64le build
  automation: Add ppc64le cross-build jobs
  maintainers: Add ppc64 maintainer

 MAINTAINERS   |   4 +
 .../build/debian/bullseye-ppc64le.dockerfile  |  28 +++
 automation/gitlab-ci/build.yaml   |  60 ++
 automation/scripts/containerize   |   1 +
 config/ppc64.mk   |   5 +
 xen/Makefile  |   5 +-
 xen/arch/ppc/Kconfig  |  42 +
 xen/arch/ppc/Kconfig.debug|   0
 xen/arch/ppc/Makefile |  16 ++
 xen/arch/ppc/Rules.mk |   0
 xen/arch/ppc/arch.mk  |  11 ++
 xen/arch/ppc/configs/openpower_defconfig  |  13 ++
 xen/arch/ppc/include/asm/config.h |  63 +++
 xen/arch/ppc/include/asm/page-bits.h  |   7 +
 xen/arch/ppc/ppc64/Makefile   |   1 +
 xen/arch/ppc/ppc64/asm-offsets.c  |   0
 xen/arch/ppc/ppc64/head.S |  27 +++
 xen/arch/ppc/xen.lds.S| 173 ++
 18 files changed, 454 insertions(+), 2 deletions(-)
 create mode 100644 automation/build/debian/bullseye-ppc64le.dockerfile
 create mode 100644 config/ppc64.mk
 create mode 100644 xen/arch/ppc/Kconfig
 create mode 100644 xen/arch/ppc/Kconfig.debug
 create mode 100644 xen/arch/ppc/Makefile
 create mode 100644 xen/arch/ppc/Rules.mk
 create mode 100644 xen/arch/ppc/arch.mk
 create mode 100644 xen/arch/ppc/configs/openpower_defconfig
 create mode 100644 xen/arch/ppc/include/asm/config.h
 create mode 100644 xen/arch/ppc/include/asm/page-bits.h
 create mode 100644 xen/arch/ppc/ppc64/Makefile
 create mode 100644 xen/arch/ppc/ppc64/asm-offsets.c
 create mode 100644 xen/arch/ppc/ppc64/head.S
 create mode 100644 xen/arch/ppc/xen.lds.S

-- 
2.30.2




Re: [PATCH V3 3/3] libxl: arm: Add grant_usage parameter for virtio devices

2023-06-12 Thread Anthony PERARD
On Fri, Jun 02, 2023 at 11:19:09AM +0530, Viresh Kumar wrote:
> diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
> index f8a78e22d156..19d834984777 100644
> --- a/tools/libs/light/libxl_virtio.c
> +++ b/tools/libs/light/libxl_virtio.c
> @@ -48,11 +56,13 @@ static int libxl__set_xenstore_virtio(libxl__gc *gc, 
> uint32_t domid,
>flexarray_t *ro_front)
>  {
>  const char *transport = 
> libxl_virtio_transport_to_string(virtio->transport);
> +const char *grant_usage = libxl_defbool_to_string(virtio->grant_usage);
>  
>  flexarray_append_pair(back, "irq", GCSPRINTF("%u", virtio->irq));
>  flexarray_append_pair(back, "base", GCSPRINTF("%#"PRIx64, virtio->base));
>  flexarray_append_pair(back, "type", GCSPRINTF("%s", virtio->type));
>  flexarray_append_pair(back, "transport", GCSPRINTF("%s", transport));
> +flexarray_append_pair(back, "grant_usage", GCSPRINTF("%s", grant_usage));

It doesn't seem like a good idea to write a string like "True" or
"False" in xenstore when a simple integer would work. Also, I'm pretty
sure all other bools are written as "0" or "1" for false and true.
Could you change this to write "0" or "1" instead of using
libxl_defbool_to_string()?


Beside this, patch looks good to me.

Cheers,

-- 
Anthony PERARD



[PATCH 3/3] swiotlb: unexport is_swiotlb_active

2023-06-12 Thread Christoph Hellwig
Drivers have no business looking at dma-mapping or swiotlb internals.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 775f7bb10ab184..1891faa3a6952e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -932,7 +932,6 @@ bool is_swiotlb_active(struct device *dev)
 
return mem && mem->nslabs;
 }
-EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-- 
2.39.2




[PATCH 1/3] xen/pci: add flag for PCI passthrough being possible

2023-06-12 Thread Christoph Hellwig
From: Juergen Gross 

When running as a Xen PV guest, passed-through PCI devices only have a
chance to work if the Xen-supplied memory map has some PCI space
reserved.

Add a flag xen_pv_pci_possible which will be set in early boot in case
the memory map has at least one area with the type E820_TYPE_RESERVED.

Signed-off-by: Juergen Gross 
Signed-off-by: Christoph Hellwig 
---
 arch/x86/xen/setup.c | 6 ++
 include/xen/xen.h| 6 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index c2be3efb2ba0fa..716f76c4141651 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -43,6 +43,9 @@ struct xen_memory_region 
xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
 /* Number of pages released from the initial allocation. */
 unsigned long xen_released_pages;
 
+/* Memory map would allow PCI passthrough. */
+bool xen_pv_pci_possible;
+
 /* E820 map used during setting up memory. */
 static struct e820_table xen_e820_table __initdata;
 
@@ -804,6 +807,9 @@ char * __init xen_memory_setup(void)
chunk_size = size;
type = xen_e820_table.entries[i].type;
 
+   if (type == E820_TYPE_RESERVED)
+   xen_pv_pci_possible = true;
+
if (type == E820_TYPE_RAM) {
if (addr < mem_end) {
chunk_size = min(size, mem_end - addr);
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 0efeb652f9b8fb..5eb0a974a11e7e 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,6 +29,12 @@ extern bool xen_pvh;
 
 extern uint32_t xen_start_flags;
 
+#ifdef CONFIG_XEN_PV
+extern bool xen_pv_pci_possible;
+#else
+#define xen_pv_pci_possible 0
+#endif
+
 #include 
 extern struct hvm_start_info pvh_start_info;
 
-- 
2.39.2




unexport swiotlb_active v2

2023-06-12 Thread Christoph Hellwig
Hi all,

this little series removes the last swiotlb API exposed to modules.

Changes since v1:
 - add a patch from Juergen to export if the e820 table indicates Xen PV
   PCI is enabled
 - slightly reorganize the logic to check if swiotlb is needed for
   Xen/x86
 - drop the already merged nouveau patch

Diffstat:
 arch/x86/include/asm/xen/swiotlb-xen.h |6 --
 arch/x86/kernel/pci-dma.c  |   29 +++--
 arch/x86/xen/setup.c   |6 ++
 drivers/pci/xen-pcifront.c |6 --
 include/xen/xen.h  |6 ++
 kernel/dma/swiotlb.c   |1 -
 6 files changed, 19 insertions(+), 35 deletions(-)



[PATCH 2/3] x86: always initialize xen-swiotlb when xen-pcifront is enabling

2023-06-12 Thread Christoph Hellwig
Remove the dangerous late initialization of xen-swiotlb in
pci_xen_swiotlb_init_late and instead just always initialize
xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is
enabled and Xen PV PCI is possible.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/xen/swiotlb-xen.h |  6 --
 arch/x86/kernel/pci-dma.c  | 29 +++---
 drivers/pci/xen-pcifront.c |  6 --
 3 files changed, 7 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h 
b/arch/x86/include/asm/xen/swiotlb-xen.h
index 77a2d19cc9909e..abde0f44df57dc 100644
--- a/arch/x86/include/asm/xen/swiotlb-xen.h
+++ b/arch/x86/include/asm/xen/swiotlb-xen.h
@@ -2,12 +2,6 @@
 #ifndef _ASM_X86_SWIOTLB_XEN_H
 #define _ASM_X86_SWIOTLB_XEN_H
 
-#ifdef CONFIG_SWIOTLB_XEN
-extern int pci_xen_swiotlb_init_late(void);
-#else
-static inline int pci_xen_swiotlb_init_late(void) { return -ENXIO; }
-#endif
-
 int xen_swiotlb_fixup(void *buf, unsigned long nslabs);
 int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de6be0a3965ee4..f323d83e40a70b 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -72,9 +72,15 @@ static inline void __init pci_swiotlb_detect(void)
 #endif /* CONFIG_SWIOTLB */
 
 #ifdef CONFIG_SWIOTLB_XEN
+static bool xen_swiotlb_enabled(void)
+{
+   return xen_initial_domain() || x86_swiotlb_enable ||
+   (IS_ENABLED(CONFIG_XEN_PCIDEV_FRONTEND) && xen_pv_pci_possible);
+}
+
 static void __init pci_xen_swiotlb_init(void)
 {
-   if (!xen_initial_domain() && !x86_swiotlb_enable)
+   if (!xen_swiotlb_enabled())
return;
x86_swiotlb_enable = true;
x86_swiotlb_flags |= SWIOTLB_ANY;
@@ -83,27 +89,6 @@ static void __init pci_xen_swiotlb_init(void)
if (IS_ENABLED(CONFIG_PCI))
pci_request_acs();
 }
-
-int pci_xen_swiotlb_init_late(void)
-{
-   if (dma_ops == &xen_swiotlb_dma_ops)
-   return 0;
-
-   /* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   int rc = swiotlb_init_late(swiotlb_size_or_default(),
-  GFP_KERNEL, xen_swiotlb_fixup);
-   if (rc < 0)
-   return rc;
-   }
-
-   /* XXX: this switches the dma ops under live devices! */
-   dma_ops = &xen_swiotlb_dma_ops;
-   if (IS_ENABLED(CONFIG_PCI))
-   pci_request_acs();
-   return 0;
-}
-EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 #else
 static inline void __init pci_xen_swiotlb_init(void)
 {
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 83c0ab50676dff..11636634ae512f 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include <asm/xen/swiotlb-xen.h>
 #include 
 
 #include 
@@ -669,11 +668,6 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
 spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
-   err = pci_xen_swiotlb_init_late();
-   if (err)
-   dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
-   }
return err;
 }
 
-- 
2.39.2




Re: [PATCH v1 5/8] xen/riscv: introduce identity mapping

2023-06-12 Thread Jan Beulich
On 12.06.2023 15:48, Jan Beulich wrote:
> On 06.06.2023 21:55, Oleksii Kurochko wrote:
>> -void __init noreturn noinline enable_mmu()
>> +/*
>> + * enable_mmu() can't be __init because __init section isn't part of 
>> identity
>> + * mapping so it will cause an issue after MMU will be enabled.
>> + */
> 
> As hinted at above already - perhaps the identity mapping wants to be
> larger, up to covering the entire Xen image? Since it's temporary
> only anyway, you could even consider using a large page (and RWX
> permission). You already require no overlap of link and load addresses,
> so at least small page mappings ought to be possible for the entire
> image.

To expand on that: Assume a future change on this path results in a call
to memcpy() or memset() being introduced by the compiler (and then let's
further assume this only occurs for a specific compiler version). Right
now such a case would be noticed simply because we don't build those
library functions yet. But it'll likely be a perplexing crash once a full
hypervisor can be built, the more that exception handlers also aren't
mapped.

>> - mmu_is_enabled:
>>  /*
>> - * Stack should be re-inited as:
>> - * 1. Right now an address of the stack is relative to load time
>> - *addresses what will cause an issue in case of load start address
>> - *isn't equal to linker start address.
>> - * 2. Addresses in stack are all load time relative which can be an
>> - *issue in case when load start address isn't equal to linker
>> - *start address.
>> - *
>> - * We can't return to the caller because the stack was reseted
>> - * and it may have stash some variable on the stack.
>> - * Jump to a brand new function as the stack was reseted
>> + * id_addrs should be in sync with id mapping in
>> + * setup_initial_pagetables()
> 
> What is "id" meant to stand for here? Also if things need keeping in
> sync, then a similar comment should exist on the other side.

I guess it's meant to stand for "identity mapping", but the common use
of "id" makes me wonder if the variable wouldn't better be ident_addrs[].

Jan



Re: [PATCH V3 2/3] libxl: Call libxl__virtio_devtype.set_default() early enough

2023-06-12 Thread Anthony PERARD
On Fri, Jun 02, 2023 at 11:19:08AM +0530, Viresh Kumar wrote:
> The _setdefault() function for virtio devices is getting called after
> libxl__prepare_dtb(), which is late as libxl__prepare_dtb() expects the
> defaults to be already set by this time.
> 
> Call libxl__virtio_devtype.set_default() from
> libxl__domain_config_setdefault(), in a similar way as other devices
> like disk, etc.
> 
> Suggested-by: Anthony PERARD 
> Signed-off-by: Viresh Kumar 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH V3 1/3] libxl: virtio: Remove unused frontend nodes

2023-06-12 Thread Anthony PERARD
On Fri, Jun 02, 2023 at 11:19:07AM +0530, Viresh Kumar wrote:
> Only the VirtIO backend will watch xenstore to find out when a new
> instance needs to be created for a guest, and read the parameters from
> there. VirtIO frontend are only virtio, so they will not do anything
> with the xenstore nodes. They can be removed.
> 
> While at it, also add a comment to the libxl_virtio.c file.
> 
> Signed-off-by: Viresh Kumar 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



[qemu-mainline test] 181389: regressions - FAIL

2023-06-12 Thread osstest service owner
flight 181389 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181389/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH v1 5/8] xen/riscv: introduce identity mapping

2023-06-12 Thread Jan Beulich
On 06.06.2023 21:55, Oleksii Kurochko wrote:
> The way how switch to virtual address was implemented in the
> commit e66003e7be ("xen/riscv: introduce setup_initial_pages")
> wasn't safe enough so identity mapping was introduced and
> used.

I don't think this is sufficient as a description. You want to make
clear what the "not safe enough" is, and you also want to go into
more detail as to the solution chosen. I'm particularly puzzled that
you map just two singular pages ...

> @@ -35,8 +40,10 @@ static unsigned long phys_offset;
>   *
>   * It might be needed one more page table in case when Xen load address
>   * isn't 2 MB aligned.
> + *
> + * 3 additional page tables are needed for identity mapping.
>   */
> -#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1)
> +#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1 + 3)

What is this 3 coming from? It feels like the value should (again)
somehow depend on CONFIG_PAGING_LEVELS.

> @@ -108,16 +116,18 @@ static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
>  {
>  unsigned long paddr = (page_addr - map_start) + pa_start;
>  unsigned int permissions = PTE_LEAF_DEFAULT;
> +unsigned long addr = (is_identity_mapping) ?

Nit: No need for parentheses here.

> + page_addr : LINK_TO_LOAD(page_addr);

As a remark, while we want binary operators at the end of lines when
wrapping, we usually do things differently for the ternary operator:
Either

unsigned long addr = is_identity_mapping
 ? page_addr : LINK_TO_LOAD(page_addr);

or

unsigned long addr = is_identity_mapping
 ? page_addr
 : LINK_TO_LOAD(page_addr);

.

> @@ -232,22 +242,27 @@ void __init setup_initial_pagetables(void)
>linker_start,
>linker_end,
>load_start);
> +
> +if ( linker_start == load_start )
> +return;
> +
> +setup_initial_mapping(&mmu_desc,
> +  load_start,
> +  load_start + PAGE_SIZE,
> +  load_start);
> +
> +setup_initial_mapping(&mmu_desc,
> +  (unsigned long)cpu0_boot_stack,
> +  (unsigned long)cpu0_boot_stack + PAGE_SIZE,

Shouldn't this be STACK_SIZE (and then also be prepared for
STACK_SIZE > PAGE_SIZE)?

> +  (unsigned long)cpu0_boot_stack);
>  }
>  
> -void __init noreturn noinline enable_mmu()
> +/*
> + * enable_mmu() can't be __init because __init section isn't part of identity
> + * mapping so it will cause an issue after MMU will be enabled.
> + */

As hinted at above already - perhaps the identity mapping wants to be
larger, up to covering the entire Xen image? Since it's temporary
only anyway, you could even consider using a large page (and RWX
permission). You already require no overlap of link and load addresses,
so at least small page mappings ought to be possible for the entire
image.

> @@ -255,25 +270,41 @@ void __init noreturn noinline enable_mmu()
>  csr_write(CSR_SATP,
>PFN_DOWN((unsigned long)stage1_pgtbl_root) |
>RV_STAGE1_MODE << SATP_MODE_SHIFT);
> +}
> +
> +void __init remove_identity_mapping(void)
> +{
> +int i, j;

Nit: unsigned int please.

> +pte_t *pgtbl;
> +unsigned int index, xen_index;

These would all probably better be declared in the narrowest possible
scope.

> -asm volatile ( ".p2align 2" );
> - mmu_is_enabled:
>  /*
> - * Stack should be re-inited as:
> - * 1. Right now an address of the stack is relative to load time
> - *addresses what will cause an issue in case of load start address
> - *isn't equal to linker start address.
> - * 2. Addresses in stack are all load time relative which can be an
> - *issue in case when load start address isn't equal to linker
> - *start address.
> - *
> - * We can't return to the caller because the stack was reseted
> - * and it may have stash some variable on the stack.
> - * Jump to a brand new function as the stack was reseted
> + * id_addrs should be in sync with id mapping in
> + * setup_initial_pagetables()

What is "id" meant to stand for here? Also if things need keeping in
sync, then a similar comment should exist on the other side.

>   */
> +unsigned long id_addrs[] =  {
> + LINK_TO_LOAD(_start),
> + LINK_TO_LOAD(cpu0_boot_stack),
> +};
>  
> -switch_stack_and_jump((unsigned long)cpu0_boot_stack + STACK_SIZE,
> -  cont_after_mmu_is_enabled);
> +pgtbl = stage1_pgtbl_root;
> +
> +for ( j = 0; j < ARRAY_SIZE(id_addrs); j++ )
> +{
> +for 

[PATCH v2] xen/arm: rename guest_cpuinfo in domain_cpuinfo

2023-06-12 Thread Bertrand Marquis
Rename the guest_cpuinfo structure to domain_cpuinfo, as it is not only
used for guests but also for dom0, so "domain" is a more suitable name.

While there also rename the create_guest_cpuinfo function to
create_domain_cpuinfo to be coherent and fix comments accordingly.

Signed-off-by: Bertrand Marquis 

---
Changes in v2:
- fix 2 more comments to domain instead of guest (Julien)
---
 xen/arch/arm/arm64/vsysreg.c  |  6 ++--
 xen/arch/arm/cpufeature.c | 44 +--
 xen/arch/arm/include/asm/cpufeature.h |  2 +-
 xen/arch/arm/vcpreg.c |  2 +-
 4 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index fe31f7b3827f..b5d54c569b33 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -76,7 +76,7 @@ TVM_REG(CONTEXTIDR_EL1)
 case HSR_SYSREG_##reg:  \
 {   \
 return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr,   \
-  1, guest_cpuinfo.field.bits[offset]); \
+  1, domain_cpuinfo.field.bits[offset]); \
 }
 
 void do_sysreg(struct cpu_user_regs *regs,
@@ -300,7 +300,7 @@ void do_sysreg(struct cpu_user_regs *regs,
 
 case HSR_SYSREG_ID_AA64PFR0_EL1:
 {
-register_t guest_reg_value = guest_cpuinfo.pfr64.bits[0];
+register_t guest_reg_value = domain_cpuinfo.pfr64.bits[0];
 
 if ( is_sve_domain(v->domain) )
 {
@@ -336,7 +336,7 @@ void do_sysreg(struct cpu_user_regs *regs,
   * When the guest has the SVE feature enabled, the whole id_aa64zfr0_el1
  * needs to be exposed.
  */
-register_t guest_reg_value = guest_cpuinfo.zfr64.bits[0];
+register_t guest_reg_value = domain_cpuinfo.zfr64.bits[0];
 
 if ( is_sve_domain(v->domain) )
 guest_reg_value = system_cpuinfo.zfr64.bits[0];
diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index b53e1a977601..f43d5cb338d0 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -14,7 +14,7 @@
 
 DECLARE_BITMAP(cpu_hwcaps, ARM_NCAPS);
 
-struct cpuinfo_arm __read_mostly guest_cpuinfo;
+struct cpuinfo_arm __read_mostly domain_cpuinfo;
 
 #ifdef CONFIG_ARM_64
 static bool has_sb_instruction(const struct arm_cpu_capabilities *entry)
@@ -190,46 +190,46 @@ void identify_cpu(struct cpuinfo_arm *c)
 
 /*
  * This function is creating a cpuinfo structure with values modified to mask
- * all cpu features that should not be published to guest.
- * The created structure is then used to provide ID registers values to guests.
+ * all cpu features that should not be published to domains.
+ * The created structure is then used to provide ID registers values to domains.
  */
-static int __init create_guest_cpuinfo(void)
+static int __init create_domain_cpuinfo(void)
 {
-/* Use the sanitized cpuinfo as initial guest cpuinfo */
-guest_cpuinfo = system_cpuinfo;
+/* Use the sanitized cpuinfo as initial domain cpuinfo */
+domain_cpuinfo = system_cpuinfo;
 
 #ifdef CONFIG_ARM_64
 /* Hide MPAM support as xen does not support it */
-guest_cpuinfo.pfr64.mpam = 0;
-guest_cpuinfo.pfr64.mpam_frac = 0;
+domain_cpuinfo.pfr64.mpam = 0;
+domain_cpuinfo.pfr64.mpam_frac = 0;
 
 /* Hide SVE by default */
-guest_cpuinfo.pfr64.sve = 0;
-guest_cpuinfo.zfr64.bits[0] = 0;
+domain_cpuinfo.pfr64.sve = 0;
+domain_cpuinfo.zfr64.bits[0] = 0;
 
 /* Hide MTE support as Xen does not support it */
-guest_cpuinfo.pfr64.mte = 0;
+domain_cpuinfo.pfr64.mte = 0;
 
 /* Hide PAC support as Xen does not support it */
-guest_cpuinfo.isa64.apa = 0;
-guest_cpuinfo.isa64.api = 0;
-guest_cpuinfo.isa64.gpa = 0;
-guest_cpuinfo.isa64.gpi = 0;
+domain_cpuinfo.isa64.apa = 0;
+domain_cpuinfo.isa64.api = 0;
+domain_cpuinfo.isa64.gpa = 0;
+domain_cpuinfo.isa64.gpi = 0;
 #endif
 
 /* Hide AMU support */
 #ifdef CONFIG_ARM_64
-guest_cpuinfo.pfr64.amu = 0;
+domain_cpuinfo.pfr64.amu = 0;
 #endif
-guest_cpuinfo.pfr32.amu = 0;
+domain_cpuinfo.pfr32.amu = 0;
 
 /* Hide RAS support as Xen does not support it */
 #ifdef CONFIG_ARM_64
-guest_cpuinfo.pfr64.ras = 0;
-guest_cpuinfo.pfr64.ras_frac = 0;
+domain_cpuinfo.pfr64.ras = 0;
+domain_cpuinfo.pfr64.ras_frac = 0;
 #endif
-guest_cpuinfo.pfr32.ras = 0;
-guest_cpuinfo.pfr32.ras_frac = 0;
+domain_cpuinfo.pfr32.ras = 0;
+domain_cpuinfo.pfr32.ras_frac = 0;
 
 return 0;
 }
@@ -237,7 +237,7 @@ static int __init create_guest_cpuinfo(void)
  * This function needs to be run after all smp are started to have
  * cpuinfo structures for all cores.
  */
-__initcall(create_guest_cpuinfo);
+__initcall(create_domain_cpuinfo);
 
 /*
  * Local variables:
diff --git 

Re: [PATCH 1/5] xen-mfndump: drop dead assignment to "page" from lookup_pte_func()

2023-06-12 Thread Jason Andryuk
On Mon, Jun 12, 2023 at 7:45 AM Jan Beulich  wrote:
>
> The variable isn't used past the loop, and its value also isn't
> meaningful across iterations. Reduce its scope to make this more
> obvious.
>
> Coverity ID: 1532310
> Fixes: ae763e422430 ("tools/misc: introduce xen-mfndump")
> Signed-off-by: Jan Beulich 

Reviewed-by: Jason Andryuk 

Thanks,
Jason



Re: [QEMU PATCH 1/1] virtgpu: do not destroy resources when guest suspend

2023-06-12 Thread Marc-André Lureau
Hi

On Thu, Jun 8, 2023 at 6:26 AM Jiqian Chen  wrote:

> After suspending and resuming a guest VM, you will get
> a black screen, and the display can't come back.
>
> This is because, when the guest suspended, it called
> into qemu, which ran virtio_gpu_gl_reset. That function
> destroyed the resources and reset the renderer that were
> used for the display. As a result, the guest's screen
> can't come back to its pre-suspend state and only shows
> black.
>
> So, this patch adds a new ctrl message,
> VIRTIO_GPU_CMD_STATUS_FREEZING, to get a notification from
> the guest. While the guest is suspending, it sets the
> freezing status of virtgpu to true, which prevents
> destroying resources and resetting the renderer when the
> guest calls into virtio_gpu_gl_reset. When the guest is
> resuming, it sets freezing to false, and virtio_gpu_gl_reset
> then keeps its original behaviour and has no other impact.
>
> Signed-off-by: Jiqian Chen 
> ---
>  hw/display/virtio-gpu-gl.c  |  9 ++-
>  hw/display/virtio-gpu-virgl.c   |  3 +++
>  hw/display/virtio-gpu.c | 26 +++--
>  include/hw/virtio/virtio-gpu.h  |  3 +++
>  include/standard-headers/linux/virtio_gpu.h |  9 +++
>  5 files changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
> index e06be60dfb..e11ad233eb 100644
> --- a/hw/display/virtio-gpu-gl.c
> +++ b/hw/display/virtio-gpu-gl.c
> @@ -100,7 +100,14 @@ static void virtio_gpu_gl_reset(VirtIODevice *vdev)
>   */
>  if (gl->renderer_inited && !gl->renderer_reset) {
>  virtio_gpu_virgl_reset_scanout(g);
> -gl->renderer_reset = true;
> +/*
> + * If guest is suspending, we shouldn't reset renderer,
> + * otherwise, the display can't come back to the time when
> + * it was suspended after guest resumed.
> + */
> +if (!g->freezing) {
> +gl->renderer_reset = true;
> +}
>  }
>  }
>
> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> index 73cb92c8d5..183ec92d53 100644
> --- a/hw/display/virtio-gpu-virgl.c
> +++ b/hw/display/virtio-gpu-virgl.c
> @@ -464,6 +464,9 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
>  case VIRTIO_GPU_CMD_GET_EDID:
>  virtio_gpu_get_edid(g, cmd);
>  break;
> +case VIRTIO_GPU_CMD_STATUS_FREEZING:
> +virtio_gpu_cmd_status_freezing(g, cmd);
> +break;
>  default:
>  cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
>  break;
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index 5e15c79b94..8f235d7848 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -373,6 +373,16 @@ static void virtio_gpu_resource_create_blob(VirtIOGPU
> *g,
> +QTAILQ_INSERT_HEAD(&g->reslist, res, next);
>  }
>
> +void virtio_gpu_cmd_status_freezing(VirtIOGPU *g,
> + struct virtio_gpu_ctrl_command *cmd)
> +{
> +struct virtio_gpu_status_freezing sf;
> +
> +VIRTIO_GPU_FILL_CMD(sf);
> +virtio_gpu_bswap_32(&sf, sizeof(sf));
> +g->freezing = sf.freezing;
> +}
> +
>  static void virtio_gpu_disable_scanout(VirtIOGPU *g, int scanout_id)
>  {
>  struct virtio_gpu_scanout *scanout =
> &g->parent_obj.scanout[scanout_id];
> @@ -986,6 +996,9 @@ void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
>  case VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING:
>  virtio_gpu_resource_detach_backing(g, cmd);
>  break;
> +case VIRTIO_GPU_CMD_STATUS_FREEZING:
> +virtio_gpu_cmd_status_freezing(g, cmd);
> +break;
>  default:
>  cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
>  break;
> @@ -1344,6 +1357,8 @@ void virtio_gpu_device_realize(DeviceState *qdev,
> Error **errp)
>  QTAILQ_INIT(&g->reslist);
>  QTAILQ_INIT(&g->cmdq);
>  QTAILQ_INIT(&g->fenceq);
> +
> +g->freezing = false;
>  }
>
>  void virtio_gpu_reset(VirtIODevice *vdev)
> @@ -1352,8 +1367,15 @@ void virtio_gpu_reset(VirtIODevice *vdev)
>  struct virtio_gpu_simple_resource *res, *tmp;
>  struct virtio_gpu_ctrl_command *cmd;
>
> -QTAILQ_FOREACH_SAFE(res, &g->reslist, next, tmp) {
> -virtio_gpu_resource_destroy(g, res);
> +/*
> + * If guest is suspending, we shouldn't destroy resources,
> + * otherwise, the display can't come back to the time when
> + * it was suspended after guest resumed.
> + */
> +if (!g->freezing) {
> +QTAILQ_FOREACH_SAFE(res, &g->reslist, next, tmp) {
> +virtio_gpu_resource_destroy(g, res);
> +}
>  }
>
> while (!QTAILQ_EMPTY(&g->cmdq)) {
> diff --git a/include/hw/virtio/virtio-gpu.h
> b/include/hw/virtio/virtio-gpu.h
> index 2e28507efe..c21c2990fb 100644
> --- a/include/hw/virtio/virtio-gpu.h
> +++ b/include/hw/virtio/virtio-gpu.h
> @@ -173,6 +173,7 @@ struct VirtIOGPU {
>
>  uint64_t hostmem;
>
> +bool freezing;
>  bool processing_cmdq;
>  

Re: [PATCH 5/5] libxl: drop dead assignment to transaction variable from libxl__domain_make()

2023-06-12 Thread Juergen Gross

On 12.06.23 13:47, Jan Beulich wrote:

"t" is written first thing at the "retry_transaction" label.

Coverity ID: 1532321
Fixes: 1057300109ea ("libxl: fix error handling (xenstore transaction leak) in libxl__domain_make")
Signed-off-by: Jan Beulich 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 4/5] libxg: drop dead assignment to "rc" from xc_cpuid_apply_policy()

2023-06-12 Thread Juergen Gross

On 12.06.23 13:47, Jan Beulich wrote:

"rc" is written immediately below the outer if(). Fold the remaining two
if()s.

Coverity ID: 1532320
Fixes: 685e922d6f30 ("tools/libxc: Rework xc_cpuid_apply_policy() to use {get,set}_cpu_policy()")
Signed-off-by: Jan Beulich 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 3/5] libxg: drop dead assignment to "ptes" from xc_core_arch_map_p2m_list_rw()

2023-06-12 Thread Juergen Gross

On 12.06.23 13:46, Jan Beulich wrote:

The function returns immediately after the enclosing if().

Coverity ID: 1532314
Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
Signed-off-by: Jan Beulich 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

