[ovmf test] 181429: all pass - PUSHED

2023-06-14 Thread osstest service owner
flight 181429 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181429/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf aad98d915abe5ba092e318913028ed47937a9447
baseline version:
 ovmf 51bb8eb76c4e8c57d5484c647ecf0b5c5fa8fa94

Last test of basis   181404  2023-06-13 08:11:19 Z    1 days
Testing same since   181429  2023-06-14 15:14:02 Z    0 days    1 attempts


People who touched revisions under test:
  BruceX Wang 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   51bb8eb76c..aad98d915a  aad98d915abe5ba092e318913028ed47937a9447 -> xen-tested-master



Re: [PATCH v7 00/12] PCI devices passthrough on Arm, part 3

2023-06-14 Thread Stewart Hildebrand
On 6/13/23 06:32, Volodymyr Babchuk wrote:
> Hello,
> 
> This is another version of vPCI rework (previous one can be
> found at [1]). The biggest change is how vPCI locking is done. This
> series uses per-domain vPCI rwlock.
> 
> Note that this series does not include my work on reference counting
> for PCI devices because this counting does not resolve issues we are
> having for vPCI. While it is (maybe) nice to have PCI refcounting, it
> does not move us towards PCI on ARM.
> 
> 
> [1] https://lore.kernel.org/all/20220204063459.680961-1-andr2...@gmail.com/
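
As an illustration of the per-domain vPCI rwlock mentioned above, a minimal
sketch of how such a lock could be used; the struct, field and function names
here are assumptions for illustration and are not taken from the series:

    #include <xen/rwlock.h>

    struct example_domain {
        rwlock_t vpci_rwlock;   /* assumed per-domain vPCI lock, rwlock_init()'d at domain creation */
    };

    /* MMIO trap handlers only read the domain's vPCI state. */
    static void example_vpci_mmio_access(struct example_domain *d)
    {
        read_lock(&d->vpci_rwlock);
        /* ... look up and use the per-device struct vpci ... */
        read_unlock(&d->vpci_rwlock);
    }

    /* Assigning or removing a device modifies the state, so take the write lock. */
    static void example_vpci_assign_device(struct example_domain *d)
    {
        write_lock(&d->vpci_rwlock);
        /* ... allocate/free struct vpci and (de)register handlers ... */
        write_unlock(&d->vpci_rwlock);
    }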

Thanks for sending this!

Should this be v8? I see v7 at [2].

I had to rewind my xen.git back to 67c28bfc5245 for this series to apply 
cleanly (just before ee045f3a4a6d "vpci/header: cope with devices not having 
vpci allocated").

[2] https://lists.xenproject.org/archives/html/xen-devel/2022-07/msg01127.html



Re: [PATCH 4/4] xen/arm: pl011: Add SBSA UART device-tree support

2023-06-14 Thread Stefano Stabellini
On Wed, 7 Jun 2023, Michal Orzel wrote:
> We already have all the bits necessary in PL011 driver to support SBSA
> UART thanks to commit 032ea8c736d10f02672863c6e369338f948f7ed8 that
> enabled it for ACPI. Plumb in the remaining part for device-tree boot:
>  - add arm,sbsa-uart compatible to pl011_dt_match (no need for a separate
>struct and DT_DEVICE_START as SBSA is a subset of PL011),
>  - from pl011_dt_uart_init(), check for SBSA UART compatible to determine
>the UART type in use.
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Stefano Stabellini 


> ---
> After this series the last thing not to be in spec for newer UARTs (well,
> for rev1.5 introduced in 2007 I believe) is incorrect FIFO size. We hardcode 
> it
> to 16 but in r1.5 it is 32. This requires checking the peripheral ID register
> or using arm,primecell-periphid dt property for overriding HW. Something to
> be done in the future (at least 16 is not harmful).
> ---
>  xen/drivers/char/pl011.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/char/pl011.c b/xen/drivers/char/pl011.c
> index 403b1ac06551..f7bf3ad117af 100644
> --- a/xen/drivers/char/pl011.c
> +++ b/xen/drivers/char/pl011.c
> @@ -286,7 +286,7 @@ static int __init pl011_dt_uart_init(struct 
> dt_device_node *dev,
>  int res;
>  paddr_t addr, size;
>  uint32_t io_width;
> -bool mmio32 = false;
> +bool mmio32 = false, sbsa;
>  
>  if ( strcmp(config, "") )
>  {
> @@ -320,7 +320,9 @@ static int __init pl011_dt_uart_init(struct 
> dt_device_node *dev,
>  }
>  }
>  
> -res = pl011_uart_init(res, addr, size, false, mmio32);
> +sbsa = dt_device_is_compatible(dev, "arm,sbsa-uart");
> +
> +res = pl011_uart_init(res, addr, size, sbsa, mmio32);
>  if ( res < 0 )
>  {
>  printk("pl011: Unable to initialize\n");
> @@ -335,6 +337,8 @@ static int __init pl011_dt_uart_init(struct 
> dt_device_node *dev,
>  static const struct dt_device_match pl011_dt_match[] __initconst =
>  {
>  DT_MATCH_COMPATIBLE("arm,pl011"),
> +/* No need for a separate struct as SBSA UART is a subset of PL011 */
> +DT_MATCH_COMPATIBLE("arm,sbsa-uart"),
>  { /* sentinel */ },
>  };
>  
> -- 
> 2.25.1
> 



Re: [PATCH 3/4] xen/arm: pl011: Use correct accessors

2023-06-14 Thread Stefano Stabellini
On Wed, 7 Jun 2023, Michal Orzel wrote:
> At the moment, we use 32-bit only accessors (i.e. readl/writel) to match
> the SBSA v2.x requirement. This should not be the default case for normal
> PL011 where accesses shall be 8/16-bit (max register size is 16-bit).
> There are however implementations of this UART that can only handle 32-bit
> MMIO. This is advertised by dt property "reg-io-width" set to 4.
> 
> Introduce new struct pl011 member mmio32 and replace pl011_{read/write}
> macros with static inline helpers that use 32-bit or 16-bit accessors
> (largest-common not to end up using different ones depending on the actual
> register size) according to mmio32 value. By default this property is set
> to false, unless:
>  - reg-io-width is specified with value 4,
>  - SBSA UART is in use.
> 
> For now, no changes done for ACPI due to lack of testing possibilities
> (i.e. current behavior maintained resulting in 32-bit accesses).
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Stefano Stabellini 


> ---
>  xen/drivers/char/pl011.c | 53 +++-
>  1 file changed, 47 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/drivers/char/pl011.c b/xen/drivers/char/pl011.c
> index 052a6512515c..403b1ac06551 100644
> --- a/xen/drivers/char/pl011.c
> +++ b/xen/drivers/char/pl011.c
> @@ -41,6 +41,7 @@ static struct pl011 {
>  /* unsigned int timeout_ms; */
>  /* bool_t probing, intr_works; */
>  bool sbsa;  /* ARM SBSA generic interface */
> +bool mmio32; /* 32-bit only MMIO */
>  } pl011_com = {0};
>  
>  /* These parity settings can be ORed directly into the LCR. */
> @@ -50,9 +51,30 @@ static struct pl011 {
>  #define PARITY_MARK  (PEN|SPS)
>  #define PARITY_SPACE (PEN|EPS|SPS)
>  
> -/* SBSA v2.x document requires, all reads/writes must be 32-bit accesses */
> -#define pl011_read(uart, off)   readl((uart)->regs + (off))
> -#define pl011_write(uart, off,val)  writel((val), (uart)->regs + (off))
> +/*
> + * By default, PL011 accesses shall be done using 8/16-bit accessors to
> + * support legacy devices that cannot cope with 32-bit. On the other hand,
> + * there are implementations of PL011 that can only handle 32-bit MMIO. Also,
> + * SBSA v2.x requires 32-bit accesses. Note that for default case, we use
> + * largest-common accessors (i.e. 16-bit) not to end up using different ones
> + * depending on the actual register size.
> + */
> +static inline void
> +pl011_write(struct pl011 *uart, unsigned int offset, unsigned int val)
> +{
> +if ( uart->mmio32 )
> +writel(val, uart->regs + offset);
> +else
> +writew(val, uart->regs + offset);
> +}
> +
> +static inline unsigned int pl011_read(struct pl011 *uart, unsigned int 
> offset)
> +{
> +if ( uart->mmio32 )
> +return readl(uart->regs + offset);
> +
> +return readw(uart->regs + offset);
> +}
>  
>  static unsigned int pl011_intr_status(struct pl011 *uart)
>  {
> @@ -222,7 +244,8 @@ static struct uart_driver __read_mostly pl011_driver = {
>  .vuart_info   = pl011_vuart,
>  };
>  
> -static int __init pl011_uart_init(int irq, paddr_t addr, paddr_t size, bool 
> sbsa)
> +static int __init
> +pl011_uart_init(int irq, paddr_t addr, paddr_t size, bool sbsa, bool mmio32)
>  {
>  struct pl011 *uart;
>  
> @@ -233,6 +256,9 @@ static int __init pl011_uart_init(int irq, paddr_t addr, 
> paddr_t size, bool sbsa
>  uart->stop_bits = 1;
>  uart->sbsa  = sbsa;
>  
> +/* Set 32-bit MMIO also for SBSA since v2.x requires it */
> +uart->mmio32 = (mmio32 || sbsa);
> +
>  uart->regs = ioremap_nocache(addr, size);
>  if ( !uart->regs )
>  {
> @@ -259,6 +285,8 @@ static int __init pl011_dt_uart_init(struct 
> dt_device_node *dev,
>  const char *config = data;
>  int res;
>  paddr_t addr, size;
> +uint32_t io_width;
> +bool mmio32 = false;
>  
>  if ( strcmp(config, "") )
>  {
> @@ -280,7 +308,19 @@ static int __init pl011_dt_uart_init(struct 
> dt_device_node *dev,
>  return -EINVAL;
>  }
>  
> -res = pl011_uart_init(res, addr, size, false);
> +/* See linux Documentation/devicetree/bindings/serial/pl011.yaml */
> +if ( dt_property_read_u32(dev, "reg-io-width", &io_width) )
> +{
> +if ( io_width == 4 )
> +mmio32 = true;
> +else if ( io_width != 1 )
> +{
> +printk("pl011: Unsupported reg-io-width (%"PRIu32")\n", 
> io_width);
> +return -EINVAL;
> +}
> +}
> +
> +res = pl011_uart_init(res, addr, size, false, mmio32);
>  if ( res < 0 )
>  {
>  printk("pl011: Unable to initialize\n");
> @@ -328,8 +368,9 @@ static int __init pl011_acpi_uart_init(const void *data)
>  /* trigger/polarity information is not available in spcr */
>  irq_set_type(spcr->interrupt, IRQ_TYPE_LEVEL_HIGH);
>  
> +/* TODO - mmio32 proper handling (for now set to true) */
>  res = pl011_uart_init(spcr->interrupt, 

Re: [PATCH 2/4] xen/arm: debug-pl011: Add support for 32-bit only MMIO

2023-06-14 Thread Stefano Stabellini
On Wed, 7 Jun 2023, Michal Orzel wrote:
> There are implementations of PL011 that can only handle 32-bit accesses
> as opposed to the normal behavior where accesses are 8/16-bit wide. This
> is usually advertised by setting a dt property 'reg-io-width' to 4.
> 
> Introduce CONFIG_EARLY_UART_PL011_MMIO32 Kconfig option to be able to
> enable the use of 32-bit only accessors in PL011 early printk code.
> Define macros PL011_{STRH,STRB,LDRH} to distinguish accessors for normal
> case from 32-bit MMIO one and use them in arm32/arm64 pl011 early printk
> code.
> 
> Update documentation accordingly.
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Stefano Stabellini 

With the caveat of the potential change to patch #1 that would affect
this patch too
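
The hunk adding the PL011_{STRB,STRH,LDRH} definitions to
xen/arch/arm/include/asm/pl011-uart.h is truncated from the quoted patch
below; one plausible shape for such helpers (illustrative only, the actual
patch may differ) is:

    /* Illustrative only -- not the actual (truncated) pl011-uart.h hunk. */
    #ifdef CONFIG_EARLY_UART_PL011_MMIO32
    /* 32-bit only MMIO: promote every early-printk access to a word access. */
    # define PL011_STRB str
    # define PL011_STRH str
    # define PL011_LDRH ldr
    #else
    /* Normal PL011: use accessors matching the 8/16-bit register sizes. */
    # define PL011_STRB strb
    # define PL011_STRH strh
    # define PL011_LDRH ldrh
    #endif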


> ---
> I might want to align the indentation of operands but doing so in this patch
> is rather a no-go as it would limit the visibility of the scope of this patch.
> Something to do in the future.
> ---
>  docs/misc/arm/early-printk.txt|  3 +++
>  xen/arch/arm/Kconfig.debug|  7 +++
>  xen/arch/arm/arm32/debug-pl011.inc| 12 ++--
>  xen/arch/arm/arm64/debug-pl011.inc| 12 ++--
>  xen/arch/arm/include/asm/pl011-uart.h | 19 +++
>  5 files changed, 41 insertions(+), 12 deletions(-)
> 
> diff --git a/docs/misc/arm/early-printk.txt b/docs/misc/arm/early-printk.txt
> index aa22826075a4..bc2d65aa2ea3 100644
> --- a/docs/misc/arm/early-printk.txt
> +++ b/docs/misc/arm/early-printk.txt
> @@ -26,6 +26,9 @@ Other options depends on the driver selected:
>If CONFIG_EARLY_UART_PL011_BAUD_RATE  is set to 0 then the code will
>not try to initialize the UART, so that bootloader or firmware
>settings can be used for maximum compatibility.
> +
> +- CONFIG_EARLY_UART_PL011_MMIO32 is, optionally, used to enable 32-bit
> +  only accesses to registers.
>- scif
>  - CONFIG_EARLY_UART_SCIF_VERSION_* is, optionally, the interface version
>of the UART. Default to version NONE.
> diff --git a/xen/arch/arm/Kconfig.debug b/xen/arch/arm/Kconfig.debug
> index 842d768280c4..eec860e88e0b 100644
> --- a/xen/arch/arm/Kconfig.debug
> +++ b/xen/arch/arm/Kconfig.debug
> @@ -253,6 +253,13 @@ config EARLY_UART_PL011_BAUD_RATE
>   default 115200 if EARLY_PRINTK_FASTMODEL
>   default 0
>  
> +config EARLY_UART_PL011_MMIO32
> + bool "32-bit only MMIO for PL011 early printk"
> + depends on EARLY_UART_PL011
> + help
> +   If specified, all accesses to PL011 registers made from early printk 
> code
> +   will be done using 32-bit only accessors.
> +
>  config EARLY_UART_INIT
>   depends on EARLY_UART_PL011 && EARLY_UART_PL011_BAUD_RATE != 0
>   def_bool y
> diff --git a/xen/arch/arm/arm32/debug-pl011.inc 
> b/xen/arch/arm/arm32/debug-pl011.inc
> index 9fe0c2503831..5833da2a235c 100644
> --- a/xen/arch/arm/arm32/debug-pl011.inc
> +++ b/xen/arch/arm/arm32/debug-pl011.inc
> @@ -26,13 +26,13 @@
>   */
>  .macro early_uart_init rb, rc, rd
>  mov   \rc, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE % 16)
> -strb  \rc, [\rb, #FBRD] /* -> UARTFBRD (Baud divisor fraction) */
> +PL011_STRB  \rc, [\rb, #FBRD]  /* -> UARTFBRD (Baud divisor 
> fraction) */
>  mov   \rc, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE / 16)
> -strh  \rc, [\rb, #IBRD] /* -> UARTIBRD (Baud divisor integer) */
> +PL011_STRH  \rc, [\rb, #IBRD]  /* -> UARTIBRD (Baud divisor integer) 
> */
>  mov   \rc, #WLEN_8  /* 8n1 */
> -strb  \rc, [\rb, #LCR_H] /* -> UARTLCR_H (Line control) */
> +PL011_STRB  \rc, [\rb, #LCR_H] /* -> UARTLCR_H (Line control) */
>  ldr   \rc, =(RXE | TXE | UARTEN)  /* RXE | TXE | UARTEN */
> -strh  \rc, [\rb, #CR] /* -> UARTCR (Control Register) */
> +PL011_STRH  \rc, [\rb, #CR]/* -> UARTCR (Control Register) */
>  .endm
>  
>  /*
> @@ -42,7 +42,7 @@
>   */
>  .macro early_uart_ready rb, rc
>  1:
> -ldrh  \rc, [\rb, #FR]   /* <- UARTFR (Flag register) */
> +PL011_LDRH  \rc, [\rb, #FR] /* <- UARTFR (Flag register) */
>  tst   \rc, #BUSY /* Check BUSY bit */
>  bne   1b/* Wait for the UART to be ready */
>  .endm
> @@ -53,7 +53,7 @@
>   * rt: register which contains the character to transmit
>   */
>  .macro early_uart_transmit rb, rt
> -strb  \rt, [\rb, #DR]/* -> UARTDR (Data Register) */
> +PL011_STRB  \rt, [\rb, #DR]  /* -> UARTDR (Data Register) */
>  .endm
>  
>  /*
> diff --git a/xen/arch/arm/arm64/debug-pl011.inc 
> b/xen/arch/arm/arm64/debug-pl011.inc
> index df713eff4922..430594610b2c 100644
> --- a/xen/arch/arm/arm64/debug-pl011.inc
> +++ b/xen/arch/arm/arm64/debug-pl011.inc
> @@ -25,13 +25,13 @@
>   */
>  .macro early_uart_init xb, c
>  mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE % 16)
> -strb  w\c, [\xb, #FBRD]  /* -> 

Re: [PATCH 1/4] xen/arm: debug-pl011: Use correct accessors

2023-06-14 Thread Stefano Stabellini
On Wed, 7 Jun 2023, Michal Orzel wrote:
> Although most PL011 UARTs can cope with 32-bit accesses, some of the old
> legacy ones might not. PL011 registers are 8/16-bit wide and this shall
> be perceived as the normal behavior.
> 
> Modify early printk pl011 code for arm32/arm64 to use the correct
> accessors depending on the register size (refer ARM DDI 0183G, Table 3.1).
> 
> Signed-off-by: Michal Orzel 
> ---
> Next patch will override strX,ldrX with macros but I prefer to keep the
> history clean (+ possibility for a backport if needed).
> ---
>  xen/arch/arm/arm32/debug-pl011.inc | 12 ++--
>  xen/arch/arm/arm64/debug-pl011.inc |  6 +++---
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/debug-pl011.inc 
> b/xen/arch/arm/arm32/debug-pl011.inc
> index c527f1d4424d..9fe0c2503831 100644
> --- a/xen/arch/arm/arm32/debug-pl011.inc
> +++ b/xen/arch/arm/arm32/debug-pl011.inc
> @@ -26,13 +26,13 @@
>   */
>  .macro early_uart_init rb, rc, rd
>  mov   \rc, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE % 16)
> -str   \rc, [\rb, #FBRD] /* -> UARTFBRD (Baud divisor fraction) */
> +strb  \rc, [\rb, #FBRD] /* -> UARTFBRD (Baud divisor fraction) */
>  mov   \rc, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE / 16)
> -str   \rc, [\rb, #IBRD] /* -> UARTIBRD (Baud divisor integer) */
> +strh  \rc, [\rb, #IBRD] /* -> UARTIBRD (Baud divisor integer) */
>  mov   \rc, #WLEN_8  /* 8n1 */
> -str   \rc, [\rb, #LCR_H] /* -> UARTLCR_H (Line control) */
> +strb  \rc, [\rb, #LCR_H] /* -> UARTLCR_H (Line control) */
>  ldr   \rc, =(RXE | TXE | UARTEN)  /* RXE | TXE | UARTEN */
> -str   \rc, [\rb, #CR] /* -> UARTCR (Control Register) */
> +strh  \rc, [\rb, #CR] /* -> UARTCR (Control Register) */
>  .endm
>  
>  /*
> @@ -42,7 +42,7 @@
>   */
>  .macro early_uart_ready rb, rc
>  1:
> -ldr   \rc, [\rb, #FR]   /* <- UARTFR (Flag register) */
> +ldrh  \rc, [\rb, #FR]   /* <- UARTFR (Flag register) */
>  tst   \rc, #BUSY /* Check BUSY bit */
>  bne   1b/* Wait for the UART to be ready */
>  .endm
> @@ -53,7 +53,7 @@
>   * rt: register which contains the character to transmit
>   */
>  .macro early_uart_transmit rb, rt
> -str   \rt, [\rb, #DR]/* -> UARTDR (Data Register) */
> +strb  \rt, [\rb, #DR]/* -> UARTDR (Data Register) */

Isn't UARTDR potentially 12-bit? I am not sure if we should use strb or
strh here...

Everything else checks out.


>  .endm
>  
>  /*
> diff --git a/xen/arch/arm/arm64/debug-pl011.inc 
> b/xen/arch/arm/arm64/debug-pl011.inc
> index 6d60e78c8ba3..df713eff4922 100644
> --- a/xen/arch/arm/arm64/debug-pl011.inc
> +++ b/xen/arch/arm/arm64/debug-pl011.inc
> @@ -25,13 +25,13 @@
>   */
>  .macro early_uart_init xb, c
>  mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE % 16)
> -strh  w\c, [\xb, #FBRD]  /* -> UARTFBRD (Baud divisor fraction) 
> */
> +strb  w\c, [\xb, #FBRD]  /* -> UARTFBRD (Baud divisor fraction) 
> */
>  mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE / 16)
>  strh  w\c, [\xb, #IBRD]  /* -> UARTIBRD (Baud divisor integer) */
>  mov   x\c, #WLEN_8   /* 8n1 */
> -str   w\c, [\xb, #LCR_H] /* -> UARTLCR_H (Line control) */
> +strb  w\c, [\xb, #LCR_H] /* -> UARTLCR_H (Line control) */
>  ldr   x\c, =(RXE | TXE | UARTEN)
> -str   w\c, [\xb, #CR]/* -> UARTCR (Control Register) */
> +strh  w\c, [\xb, #CR]/* -> UARTCR (Control Register) */
>  .endm
>  
>  /*
> -- 
> 2.25.1
> 



[qemu-mainline test] 181430: regressions - FAIL

2023-06-14 Thread osstest service owner
flight 181430 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181430/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

[QEMU][PATCH v8 04/11] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-06-14 Thread Vikram Garhwal
From: Stefano Stabellini 

This patch does the following:
1. Creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
preparation for moving most of the xen-hvm code to an arch-neutral location:
the x86-specific portion of xen_set_memory moves to arch_xen_set_memory, and
handle_vmport_ioreq moves to arch_handle_ioreq (see the sketch after this list).

2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c.
Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
hw/xen/xen-hvm-common.c. These common functions are useful for creating
an IOREQ server.

xen_hvm_init_pc() contains the architecture-independent code for creating
and mapping an IOREQ server, connecting memory and IO listeners, initializing
a Xen bus and registering backends. Move this common Xen code to a new
function, xen_register_ioreq(), which can be used by both x86 and ARM
machines.

The following functions are moved to hw/xen/xen-hvm-common.c:
xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
xen_device_realize(), xen_device_unrealize(),
cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
do_outp(), rw_phys_req_item(), read_phys_req_item(),
write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
xen_hvm_change_state_handler(), xen_exit_notifier(),
xen_map_ioreq_server(), destroy_hvm_domain() and
xen_shutdown_fatal_error()

3. Removed the static qualifier from the functions below:
1. xen_region_add()
2. xen_region_del()
3. xen_io_add()
4. xen_io_del()
5. xen_device_realize()
6. xen_device_unrealize()
7. xen_hvm_change_state_handler()
8. cpu_ioreq_pio()
9. xen_exit_notifier()

4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size used by Xen.
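
A hedged sketch of the arch_handle_ioreq() split described in point 1 above;
the prototype, the body and the hw_error() fallback are assumptions based on
this description rather than the patch itself, handle_vmport_ioreq() is the
existing x86-only helper named above, and the types come from the headers
already included by xen-hvm.c:

    /* x86 implementation, kept in hw/i386/xen/xen-hvm.c (illustrative). */
    void arch_handle_ioreq(XenIOState *state, ioreq_t *req)
    {
        switch (req->type) {
        case IOREQ_TYPE_VMWARE_PORT:
            /* VMware port emulation only exists on x86. */
            handle_vmport_ioreq(state, req);
            break;
        default:
            hw_error("Invalid ioreq type 0x%x\n", req->type);
        }
    }

    /* A non-x86 port supplies its own hook, typically without the VMware case. */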

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Acked-by: Stefano Stabellini 
---
 hw/i386/xen/trace-events|   14 -
 hw/i386/xen/xen-hvm.c   | 1016 ++-
 hw/xen/meson.build  |5 +-
 hw/xen/trace-events |   14 +
 hw/xen/xen-hvm-common.c |  860 ++
 include/hw/i386/xen_arch_hvm.h  |   11 +
 include/hw/xen/arch_hvm.h   |3 +
 include/hw/xen/xen-hvm-common.h |   99 +++
 8 files changed, 1054 insertions(+), 968 deletions(-)
 create mode 100644 hw/xen/xen-hvm-common.c
 create mode 100644 include/hw/i386/xen_arch_hvm.h
 create mode 100644 include/hw/xen/arch_hvm.h
 create mode 100644 include/hw/xen/xen-hvm-common.h

diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index a0c89d91c4..5d0a8d6dcf 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -7,17 +7,3 @@ xen_platform_log(char *s) "xen platform: %s"
 xen_pv_mmio_read(uint64_t addr) "WARNING: read from Xen PV Device MMIO space 
(address 0x%"PRIx64")"
 xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space 
(address 0x%"PRIx64")"
 
-# xen-hvm.c
-xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: 0x%lx, 
size 0x%lx"
-xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) 
"0x%"PRIx64" size 0x%lx, log_dirty %i"
-handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p type=%d dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d 
size=%d"
-handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read 
type=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-handle_ioreq_write(void *req, uint32_t type, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p write type=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d 
size=%d"
-cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p pio dir=%d 
df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) 
"I/O=%p pio read reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
-cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
size) "I/O=%p pio write reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
-cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy 
dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: %p"
-cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, 

[QEMU][PATCH v8 06/11] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2023-06-14 Thread Vikram Garhwal
From: Stefano Stabellini 

On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails, continue
to the PV backends initialization.

Also, move the IOREQ registration and mapping subroutine to a new function,
xen_do_ioreq_register().

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/xen/xen-hvm-common.c | 57 +++--
 1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index a31b067404..cb82f4b83d 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -764,27 +764,12 @@ void xen_shutdown_fatal_error(const char *fmt, ...)
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_ERROR);
 }
 
-void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
-MemoryListener xen_memory_listener)
+static void xen_do_ioreq_register(XenIOState *state,
+   unsigned int max_cpus,
+   MemoryListener xen_memory_listener)
 {
 int i, rc;
 
-setup_xen_backend_ops();
-
-state->xce_handle = qemu_xen_evtchn_open();
-if (state->xce_handle == NULL) {
-perror("xen: event channel open");
-goto err;
-}
-
-state->xenstore = xs_daemon_open();
-if (state->xenstore == NULL) {
-perror("xen: xenstore open");
-goto err;
-}
-
-xen_create_ioreq_server(xen_domid, &state->ioservid);
-
 state->exit.notify = xen_exit_notifier;
 qemu_add_exit_notifier(&state->exit);
 
@@ -849,12 +834,46 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 QLIST_INIT(&state->dev_list);
 device_listener_register(&state->device_listener);
 
+return;
+
+err:
+error_report("xen hardware virtual machine initialisation failed");
+exit(1);
+}
+
+void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
+MemoryListener xen_memory_listener)
+{
+int rc;
+
+setup_xen_backend_ops();
+
+state->xce_handle = qemu_xen_evtchn_open();
+if (state->xce_handle == NULL) {
+perror("xen: event channel open");
+goto err;
+}
+
+state->xenstore = xs_daemon_open();
+if (state->xenstore == NULL) {
+perror("xen: xenstore open");
+goto err;
+}
+
+rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
+if (!rc) {
+xen_do_ioreq_register(state, max_cpus, xen_memory_listener);
+} else {
+warn_report("xen: failed to create ioreq server");
+}
+
 xen_bus_init();
 
 xen_be_init();
 
 return;
+
 err:
-error_report("xen hardware virtual machine initialisation failed");
+error_report("xen hardware virtual machine backend registration failed");
 exit(1);
 }
-- 
2.17.1




[QEMU][PATCH v8 09/11] hw/arm: introduce xenpvh machine

2023-06-14 Thread Vikram Garhwal
Add a new machine xenpvh which creates an IOREQ server to register/connect
with the Xen hypervisor.

Optional: when CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator and connects to swtpm running on the host machine via a chardev
socket, supporting TPM functionality for a guest domain.

Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
-chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-machine tpm-base-addr=0x0c00 \

swtpm implements a TPM software emulator (TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over a socket, chardev or CUSE interface.
GitHub repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
mkdir /tmp/vtpm2
swtpm socket --tpmstate dir=/tmp/vtpm2 \
--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Stefano Stabellini 
---
 docs/system/arm/xenpvh.rst|  34 +++
 docs/system/target-arm.rst|   1 +
 hw/arm/meson.build|   2 +
 hw/arm/xen_arm.c  | 181 ++
 include/hw/arm/xen_arch_hvm.h |   9 ++
 include/hw/xen/arch_hvm.h |   2 +
 6 files changed, 229 insertions(+)
 create mode 100644 docs/system/arm/xenpvh.rst
 create mode 100644 hw/arm/xen_arm.c
 create mode 100644 include/hw/arm/xen_arch_hvm.h

diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
new file mode 100644
index 00..e1655c7ab8
--- /dev/null
+++ b/docs/system/arm/xenpvh.rst
@@ -0,0 +1,34 @@
+XENPVH (``xenpvh``)
+=
+This machine creates a IOREQ server to register/connect with Xen Hypervisor.
+
+When TPM is enabled, this machine also creates a tpm-tis-device at a user input
+tpm base address, adds a TPM emulator and connects to a swtpm application
+running on host machine via chardev socket. This enables xenpvh to support TPM
+functionalities for a guest domain.
+
+More information about TPM use and installing swtpm linux application can be
+found at: docs/specs/tpm.rst.
+
+Example for starting swtpm on host machine:
+.. code-block:: console
+
+mkdir /tmp/vtpm2
+swtpm socket --tpmstate dir=/tmp/vtpm2 \
+--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
+
+Sample QEMU xenpvh commands for running and connecting with Xen:
+.. code-block:: console
+
+qemu-system-aarch64 -xen-domid 1 \
+-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
+-mon chardev=libxl-cmd,mode=control \
+-chardev socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off 
\
+-mon chardev=libxenstat-cmd,mode=control \
+-xen-attach -name guest0 -vnc none -display none -nographic \
+-machine xenpvh -m 1301 \
+-chardev socket,id=chrtpm,path=tmp/vtpm2/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
+
+In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
+via chardev socket.
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index a12b6bca05..790ac1b8a2 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -107,6 +107,7 @@ undocumented; you can get a complete list by running
arm/stm32
arm/virt
arm/xlnx-versal-virt
+   arm/xenpvh
 
 Emulated CPU architecture support
 =
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 870ec67376..4f94f821b0 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -63,6 +63,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: 
files('fsl-imx7.c', 'mcimx7d-sabre.
 arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 
'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
+arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add_all(xen_ss)
 
 softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
 softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
new file mode 100644
index 00..19b1cb81ad
--- /dev/null
+++ b/hw/arm/xen_arm.c
@@ -0,0 +1,181 @@
+/*
+ * QEMU ARM Xen PVH Machine
+ *
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT 

[QEMU][PATCH v8 03/11] hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState

2023-06-14 Thread Vikram Garhwal
From: Stefano Stabellini 

In preparation for moving most of the xen-hvm code to an arch-neutral location, move:
- shared_vmport_page
- log_for_dirtybit
- dirty_bitmap
- suspend
- wakeup

out of the XenIOState struct, as these are only used on x86 (especially the
ones related to dirty logging).
The updated XenIOState can be used for both aarch64 and x86.

Also, remove free_phys_offset as it was unused.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Paul Durrant 
Reviewed-by: Alex Bennée 
---
 hw/i386/xen/xen-hvm.c | 58 ---
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 7a7764240e..01bf947f1c 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -74,6 +74,7 @@ struct shared_vmport_iopage {
 };
 typedef struct shared_vmport_iopage shared_vmport_iopage_t;
 #endif
+static shared_vmport_iopage_t *shared_vmport_page;
 
 static inline uint32_t xen_vcpu_eport(shared_iopage_t *shared_page, int i)
 {
@@ -96,6 +97,11 @@ typedef struct XenPhysmap {
 } XenPhysmap;
 
 static QLIST_HEAD(, XenPhysmap) xen_physmap;
+static const XenPhysmap *log_for_dirtybit;
+/* Buffer used by xen_sync_dirty_bitmap */
+static unsigned long *dirty_bitmap;
+static Notifier suspend;
+static Notifier wakeup;
 
 typedef struct XenPciDevice {
 PCIDevice *pci_dev;
@@ -106,7 +112,6 @@ typedef struct XenPciDevice {
 typedef struct XenIOState {
 ioservid_t ioservid;
 shared_iopage_t *shared_page;
-shared_vmport_iopage_t *shared_vmport_page;
 buffered_iopage_t *buffered_io_page;
 xenforeignmemory_resource_handle *fres;
 QEMUTimer *buffered_io_timer;
@@ -126,14 +131,8 @@ typedef struct XenIOState {
 MemoryListener io_listener;
 QLIST_HEAD(, XenPciDevice) dev_list;
 DeviceListener device_listener;
-hwaddr free_phys_offset;
-const XenPhysmap *log_for_dirtybit;
-/* Buffer used by xen_sync_dirty_bitmap */
-unsigned long *dirty_bitmap;
 
 Notifier exit;
-Notifier suspend;
-Notifier wakeup;
 } XenIOState;
 
 /* Xen specific function for piix pci */
@@ -463,10 +462,10 @@ static int xen_remove_from_physmap(XenIOState *state,
 }
 
 QLIST_REMOVE(physmap, list);
-if (state->log_for_dirtybit == physmap) {
-state->log_for_dirtybit = NULL;
-g_free(state->dirty_bitmap);
-state->dirty_bitmap = NULL;
+if (log_for_dirtybit == physmap) {
+log_for_dirtybit = NULL;
+g_free(dirty_bitmap);
+dirty_bitmap = NULL;
 }
 g_free(physmap);
 
@@ -627,16 +626,16 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
 return;
 }
 
-if (state->log_for_dirtybit == NULL) {
-state->log_for_dirtybit = physmap;
-state->dirty_bitmap = g_new(unsigned long, bitmap_size);
-} else if (state->log_for_dirtybit != physmap) {
+if (log_for_dirtybit == NULL) {
+log_for_dirtybit = physmap;
+dirty_bitmap = g_new(unsigned long, bitmap_size);
+} else if (log_for_dirtybit != physmap) {
 /* Only one range for dirty bitmap can be tracked. */
 return;
 }
 
 rc = xen_track_dirty_vram(xen_domid, start_addr >> TARGET_PAGE_BITS,
-  npages, state->dirty_bitmap);
+  npages, dirty_bitmap);
 if (rc < 0) {
 #ifndef ENODATA
 #define ENODATA  ENOENT
@@ -651,7 +650,7 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
 }
 
 for (i = 0; i < bitmap_size; i++) {
-unsigned long map = state->dirty_bitmap[i];
+unsigned long map = dirty_bitmap[i];
 while (map != 0) {
 j = ctzl(map);
 map &= ~(1ul << j);
@@ -677,12 +676,10 @@ static void xen_log_start(MemoryListener *listener,
 static void xen_log_stop(MemoryListener *listener, MemoryRegionSection 
*section,
  int old, int new)
 {
-XenIOState *state = container_of(listener, XenIOState, memory_listener);
-
 if (old & ~new & (1 << DIRTY_MEMORY_VGA)) {
-state->log_for_dirtybit = NULL;
-g_free(state->dirty_bitmap);
-state->dirty_bitmap = NULL;
+log_for_dirtybit = NULL;
+g_free(dirty_bitmap);
+dirty_bitmap = NULL;
 /* Disable dirty bit tracking */
 xen_track_dirty_vram(xen_domid, 0, 0, NULL);
 }
@@ -1022,9 +1019,9 @@ static void handle_vmport_ioreq(XenIOState *state, 
ioreq_t *req)
 {
 vmware_regs_t *vmport_regs;
 
-assert(state->shared_vmport_page);
+assert(shared_vmport_page);
 vmport_regs =
-&state->shared_vmport_page->vcpu_vmport_regs[state->send_vcpu];
+&shared_vmport_page->vcpu_vmport_regs[state->send_vcpu];
 QEMU_BUILD_BUG_ON(sizeof(*req) < sizeof(*vmport_regs));
 
 current_cpu = state->cpu_by_vcpu_id[state->send_vcpu];
@@ -1472,7 +1469,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 
 state->memory_listener = xen_memory_listener;
 

[QEMU][PATCH v8 07/11] hw/xen/xen-hvm-common: Use g_new and error_report

2023-06-14 Thread Vikram Garhwal
Replace g_malloc with g_new and perror with error_report.
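
The practical difference, shown with the pfn_list allocation from the first
hunk below: g_new() derives the allocation size from the element type, aborts
on multiplication overflow, and returns a correctly typed pointer, none of
which the open-coded g_malloc() form provides.

    /* Before: manual size computation, no overflow check, returns void *. */
    pfn_list = g_malloc(sizeof(*pfn_list) * nr_pfn);

    /* After: size derived from xen_pfn_t, overflow aborts, returns xen_pfn_t *. */
    pfn_list = g_new(xen_pfn_t, nr_pfn);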

Signed-off-by: Vikram Garhwal 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/xen/xen-hvm-common.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index cb82f4b83d..42339c96bd 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -33,7 +33,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr,
 trace_xen_ram_alloc(ram_addr, size);
 
 nr_pfn = size >> TARGET_PAGE_BITS;
-pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
+pfn_list = g_new(xen_pfn_t, nr_pfn);
 
 for (i = 0; i < nr_pfn; i++) {
 pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
@@ -730,7 +730,7 @@ void destroy_hvm_domain(bool reboot)
 return;
 }
 if (errno != ENOTTY /* old Xen */) {
-perror("xendevicemodel_shutdown failed");
+error_report("xendevicemodel_shutdown failed with error %d", 
errno);
 }
 /* well, try the old thing then */
 }
@@ -784,7 +784,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 }
 
 /* Note: cpus is empty at this point in init */
-state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
+state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
 
 rc = xen_set_ioreq_server_state(xen_domid, state->ioservid, true);
 if (rc < 0) {
@@ -793,7 +793,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 goto err;
 }
 
-state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
+state->ioreq_local_port = g_new0(evtchn_port_t, max_cpus);
 
 /* FIXME: how about if we overflow the page here? */
 for (i = 0; i < max_cpus; i++) {
@@ -850,13 +850,13 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 
 state->xce_handle = qemu_xen_evtchn_open();
 if (state->xce_handle == NULL) {
-perror("xen: event channel open");
+error_report("xen: event channel open failed with error %d", errno);
 goto err;
 }
 
 state->xenstore = xs_daemon_open();
 if (state->xenstore == NULL) {
-perror("xen: xenstore open");
+error_report("xen: xenstore open failed with error %d", errno);
 goto err;
 }
 
-- 
2.17.1




[QEMU][PATCH v8 05/11] include/hw/xen/xen_common: return error from xen_create_ioreq_server

2023-06-14 Thread Vikram Garhwal
From: Stefano Stabellini 

This is done to prepare for enabling xenpv support for the ARM architecture.
On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails,
continue to the PV backends initialization.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 include/hw/xen/xen_native.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/hw/xen/xen_native.h b/include/hw/xen/xen_native.h
index f11eb423e3..4dce905fde 100644
--- a/include/hw/xen/xen_native.h
+++ b/include/hw/xen/xen_native.h
@@ -463,8 +463,8 @@ static inline void xen_unmap_pcidev(domid_t dom,
   PCI_FUNC(pci_dev->devfn));
 }
 
-static inline void xen_create_ioreq_server(domid_t dom,
-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
 {
 int rc = xendevicemodel_create_ioreq_server(xen_dmod, dom,
 HVM_IOREQSRV_BUFIOREQ_ATOMIC,
@@ -472,12 +472,14 @@ static inline void xen_create_ioreq_server(domid_t dom,
 
 if (rc == 0) {
 trace_xen_ioreq_server_create(*ioservid);
-return;
+return rc;
 }
 
 *ioservid = 0;
 use_default_ioreq_server = true;
 trace_xen_default_ioreq_server();
+
+return rc;
 }
 
 static inline void xen_destroy_ioreq_server(domid_t dom,
-- 
2.17.1




[QEMU][PATCH v8 01/11] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-06-14 Thread Vikram Garhwal
xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Move it out of hw/i386/xen to hw/xen to make it
accessible to both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/i386/meson.build  | 1 +
 hw/i386/xen/meson.build  | 1 -
 hw/i386/xen/trace-events | 5 -
 hw/xen/meson.build   | 4 
 hw/xen/trace-events  | 5 +
 hw/{i386 => }/xen/xen-mapcache.c | 0
 6 files changed, 10 insertions(+), 6 deletions(-)
 rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
 subdir('xen')
 
 i386_ss.add_all(xenpv_ss)
+i386_ss.add_all(xen_ss)
 
 hw_arch += {'i386': i386_ss}
diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index 2e64a34e16..3dc4c4f106 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
 i386_ss.add(when: 'CONFIG_XEN', if_true: files(
   'xen-hvm.c',
-  'xen-mapcache.c',
   'xen_apic.c',
   'xen_pvdevice.c',
 ))
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: 
%p"
 cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 
-# xen-mapcache.c
-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index 19c6aabc7c..202752e557 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -26,3 +26,7 @@ else
 endif
 
 specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss)
+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))
diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 55c9e1df68..f977c7c8c6 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
 xs_node_vscanf(char *path, char *value) "%s %s"
 xs_node_watch(char *path) "%s"
 xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c
-- 
2.17.1




[QEMU][PATCH v8 02/11] hw/i386/xen: rearrange xen_hvm_init_pc

2023-06-14 Thread Vikram Garhwal
In preparation for moving most of the xen-hvm code to an arch-neutral location,
move non IOREQ references to:
- xen_get_vmport_regs_pfn
- xen_suspend_notifier
- xen_wakeup_notifier
- xen_ram_init

towards the end of the xen_hvm_init_pc() function.

This is done to keep the common IOREQ functions in one place; they will be
moved to a new function in the next patch in order to make them common to both
x86 and aarch64 machines.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/i386/xen/xen-hvm.c | 49 ++-
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index ab8f1b61ee..7a7764240e 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1419,12 +1419,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 state->exit.notify = xen_exit_notifier;
 qemu_add_exit_notifier(&state->exit);
 
-state->suspend.notify = xen_suspend_notifier;
-qemu_register_suspend_notifier(&state->suspend);
-
-state->wakeup.notify = xen_wakeup_notifier;
-qemu_register_wakeup_notifier(&state->wakeup);
-
 /*
  * Register wake-up support in QMP query-current-machine API
  */
@@ -1435,23 +1429,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 goto err;
 }
 
-rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
-if (!rc) {
-DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
-state->shared_vmport_page =
-xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
- 1, &ioreq_pfn, NULL);
-if (state->shared_vmport_page == NULL) {
-error_report("map shared vmport IO page returned error %d 
handle=%p",
- errno, xen_xc);
-goto err;
-}
-} else if (rc != -ENOSYS) {
-error_report("get vmport regs pfn returned error %d, rc=%d",
- errno, rc);
-goto err;
-}
-
 /* Note: cpus is empty at this point in init */
 state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
 
@@ -1490,7 +1467,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 #else
 xen_map_cache_init(NULL, state);
 #endif
-xen_ram_init(pcms, ms->ram_size, ram_memory);
 
 qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
 
@@ -1511,6 +1487,31 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 QLIST_INIT(&xen_physmap);
 xen_read_physmap(state);
 
+state->suspend.notify = xen_suspend_notifier;
+qemu_register_suspend_notifier(&state->suspend);
+
+state->wakeup.notify = xen_wakeup_notifier;
+qemu_register_wakeup_notifier(&state->wakeup);
+
+rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
+if (!rc) {
+DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
+state->shared_vmport_page =
+xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
+ 1, &ioreq_pfn, NULL);
+if (state->shared_vmport_page == NULL) {
+error_report("map shared vmport IO page returned error %d 
handle=%p",
+ errno, xen_xc);
+goto err;
+}
+} else if (rc != -ENOSYS) {
+error_report("get vmport regs pfn returned error %d, rc=%d",
+ errno, rc);
+goto err;
+}
+
+xen_ram_init(pcms, ms->ram_size, ram_memory);
+
 /* Disable ACPI build because Xen handles it */
 pcms->acpi_build_enabled = false;
 
-- 
2.17.1




Re: [PATCH] xen/arm: Remove stray semicolon at VREG_REG_HELPERS/TLB_HELPER* callers

2023-06-14 Thread Stefano Stabellini
On Wed, 14 Jun 2023, Michal Orzel wrote:
> This is inconsistent with the rest of the code where macros are used
> to define functions, as it results in an empty declaration (i.e.
> semicolon with nothing before it) after function definition. This is also
> not allowed by C99.
> 
> Take the opportunity to undefine TLB_HELPER* macros after last use.
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Stefano Stabellini 
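
A minimal illustration of the construct being cleaned up, simplified from the
TLB_HELPER definitions in the hunks below (one parameter instead of three):

    #define TLB_HELPER(name)           \
    static inline void name(void)      \
    {                                  \
        /* ... issue the TLB op ... */ \
    }

    TLB_HELPER(flush_guest_tlb_local);  /* before: trailing ';' leaves an empty declaration */
    TLB_HELPER(flush_guest_tlb)         /* after the patch: no stray semicolon */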


> ---
> Discussion:
> https://lore.kernel.org/xen-devel/17c59d5c-795e-4591-a7c9-a4c5179bf...@arm.com/
> 
> Other empty declarations appear at callers of TYPE_SAFE and Linux module
> macros like EXPORT_SYMBOL for which we need some sort of agreement.
> ---
>  xen/arch/arm/include/asm/arm32/flushtlb.h | 12 +++-
>  xen/arch/arm/include/asm/arm64/flushtlb.h | 17 ++---
>  xen/arch/arm/include/asm/vreg.h   |  4 ++--
>  3 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/arm32/flushtlb.h 
> b/xen/arch/arm/include/asm/arm32/flushtlb.h
> index 7ae6a12f8155..22ee3b317b4d 100644
> --- a/xen/arch/arm/include/asm/arm32/flushtlb.h
> +++ b/xen/arch/arm/include/asm/arm32/flushtlb.h
> @@ -29,19 +29,21 @@ static inline void name(void)   \
>  }
>  
>  /* Flush local TLBs, current VMID only */
> -TLB_HELPER(flush_guest_tlb_local, TLBIALL, nsh);
> +TLB_HELPER(flush_guest_tlb_local, TLBIALL, nsh)
>  
>  /* Flush inner shareable TLBs, current VMID only */
> -TLB_HELPER(flush_guest_tlb, TLBIALLIS, ish);
> +TLB_HELPER(flush_guest_tlb, TLBIALLIS, ish)
>  
>  /* Flush local TLBs, all VMIDs, non-hypervisor mode */
> -TLB_HELPER(flush_all_guests_tlb_local, TLBIALLNSNH, nsh);
> +TLB_HELPER(flush_all_guests_tlb_local, TLBIALLNSNH, nsh)
>  
>  /* Flush innershareable TLBs, all VMIDs, non-hypervisor mode */
> -TLB_HELPER(flush_all_guests_tlb, TLBIALLNSNHIS, ish);
> +TLB_HELPER(flush_all_guests_tlb, TLBIALLNSNHIS, ish)
>  
>  /* Flush all hypervisor mappings from the TLB of the local processor. */
> -TLB_HELPER(flush_xen_tlb_local, TLBIALLH, nsh);
> +TLB_HELPER(flush_xen_tlb_local, TLBIALLH, nsh)
> +
> +#undef TLB_HELPER
>  
>  /* Flush TLB of local processor for address va. */
>  static inline void __flush_xen_tlb_one_local(vaddr_t va)
> diff --git a/xen/arch/arm/include/asm/arm64/flushtlb.h 
> b/xen/arch/arm/include/asm/arm64/flushtlb.h
> index 3a9092b814a9..56c6fc763b56 100644
> --- a/xen/arch/arm/include/asm/arm64/flushtlb.h
> +++ b/xen/arch/arm/include/asm/arm64/flushtlb.h
> @@ -67,25 +67,28 @@ static inline void name(vaddr_t va)  \
>  }
>  
>  /* Flush local TLBs, current VMID only. */
> -TLB_HELPER(flush_guest_tlb_local, vmalls12e1, nsh);
> +TLB_HELPER(flush_guest_tlb_local, vmalls12e1, nsh)
>  
>  /* Flush innershareable TLBs, current VMID only */
> -TLB_HELPER(flush_guest_tlb, vmalls12e1is, ish);
> +TLB_HELPER(flush_guest_tlb, vmalls12e1is, ish)
>  
>  /* Flush local TLBs, all VMIDs, non-hypervisor mode */
> -TLB_HELPER(flush_all_guests_tlb_local, alle1, nsh);
> +TLB_HELPER(flush_all_guests_tlb_local, alle1, nsh)
>  
>  /* Flush innershareable TLBs, all VMIDs, non-hypervisor mode */
> -TLB_HELPER(flush_all_guests_tlb, alle1is, ish);
> +TLB_HELPER(flush_all_guests_tlb, alle1is, ish)
>  
>  /* Flush all hypervisor mappings from the TLB of the local processor. */
> -TLB_HELPER(flush_xen_tlb_local, alle2, nsh);
> +TLB_HELPER(flush_xen_tlb_local, alle2, nsh)
>  
>  /* Flush TLB of local processor for address va. */
> -TLB_HELPER_VA(__flush_xen_tlb_one_local, vae2);
> +TLB_HELPER_VA(__flush_xen_tlb_one_local, vae2)
>  
>  /* Flush TLB of all processors in the inner-shareable domain for address va. 
> */
> -TLB_HELPER_VA(__flush_xen_tlb_one, vae2is);
> +TLB_HELPER_VA(__flush_xen_tlb_one, vae2is)
> +
> +#undef TLB_HELPER
> +#undef TLB_HELPER_VA
>  
>  #endif /* __ASM_ARM_ARM64_FLUSHTLB_H__ */
>  /*
> diff --git a/xen/arch/arm/include/asm/vreg.h b/xen/arch/arm/include/asm/vreg.h
> index d92450017bc4..bf945eebbde4 100644
> --- a/xen/arch/arm/include/asm/vreg.h
> +++ b/xen/arch/arm/include/asm/vreg.h
> @@ -140,8 +140,8 @@ static inline void vreg_reg##sz##_clearbits(uint##sz##_t 
> *reg,  \
>  *reg &= ~(((uint##sz##_t)bits & mask) << shift);\
>  }
>  
> -VREG_REG_HELPERS(64, 0x7);
> -VREG_REG_HELPERS(32, 0x3);
> +VREG_REG_HELPERS(64, 0x7)
> +VREG_REG_HELPERS(32, 0x3)
>  
>  #undef VREG_REG_HELPERS
>  
> 
> base-commit: 2f69ef96801f0d2b9646abf6396e60f99c56e3a0
> -- 
> 2.25.1
> 



Re: [PATCH] Arm: drop bogus ALIGN() from linker script

2023-06-14 Thread Stefano Stabellini
On Wed, 14 Jun 2023, Jan Beulich wrote:
> Having ALIGN() inside a section definition usually makes sense only with
> a label definition following (an exception case is a few lines out of
> context, where cache line sharing is intended to be avoided).
> Constituents of .bss.page_aligned need to specify their own alignment
> correctly anyway, or else they're susceptible to link order changing.
> This requirement is already met: Arm-specific code has no such object,
> while common (EFI) code has another one. That one has suitable alignment
> specified.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Stefano Stabellini 
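
As an illustration of the requirement described above, a hypothetical
constituent of .bss.page_aligned requests its own alignment at its definition
rather than relying on ALIGN() in the linker script (Xen's actual objects use
its __section()/__aligned() wrappers):

    /* Hypothetical example: the object itself, not the linker script, carries the alignment. */
    static unsigned char example_page[4096]
        __attribute__((__section__(".bss.page_aligned"), __aligned__(4096)));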


> ---
> Note how RISC-V had this dropped pretty recently.
> 
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -199,7 +199,6 @@ SECTIONS
>.bss : { /* BSS */
> __bss_start = .;
> *(.bss.stack_aligned)
> -   . = ALIGN(PAGE_SIZE);
> *(.bss.page_aligned)
> . = ALIGN(PAGE_SIZE);
> __per_cpu_start = .;
> 



Re: [PATCH] spinlock: alter inlining of _spin_lock_cb()

2023-06-14 Thread Stefano Stabellini
On Wed, 14 Jun 2023, Jan Beulich wrote:
> To comply with Misra rule 8.10 ("An inline function shall be declared
> with the static storage class"), convert what is presently
> _spin_lock_cb() to an always-inline (and static) helper, while making
> the function itself a thin wrapper, just like _spin_lock() is.
> 
> While there drop the unlikely() from the callback check, and correct
> indentation in _spin_lock().
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Stefano Stabellini 


> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -304,7 +304,8 @@ static always_inline u16 observe_head(sp
>  return read_atomic(&t->head);
>  }
>  
> -void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
> +static void always_inline spin_lock_common(spinlock_t *lock,
> +   void (*cb)(void *), void *data)
>  {
>  spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
>  LOCK_PROFILE_VAR;
> @@ -316,7 +317,7 @@ void inline _spin_lock_cb(spinlock_t *lo
>  while ( tickets.tail != observe_head(&lock->tickets) )
>  {
>  LOCK_PROFILE_BLOCK;
> -if ( unlikely(cb) )
> +if ( cb )
>  cb(data);
>  arch_lock_relax();
>  }
> @@ -327,7 +328,12 @@ void inline _spin_lock_cb(spinlock_t *lo
>  
>  void _spin_lock(spinlock_t *lock)
>  {
> - _spin_lock_cb(lock, NULL, NULL);
> +spin_lock_common(lock, NULL, NULL);
> +}
> +
> +void _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
> +{
> +spin_lock_common(lock, cb, data);
>  }
>  
>  void _spin_lock_irq(spinlock_t *lock)
> 



[linux-linus test] 181427: regressions - FAIL

2023-06-14 Thread osstest service owner
flight 181427 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181427/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-pvops 6 kernel-build fail REGR. vs. 180278
 test-armhf-armhf-xl-credit1   8 xen-boot   fail in 181417 REGR. vs. 180278
 build-arm64-pvops 6 kernel-build   fail in 181417 REGR. vs. 180278

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-dom0pvh-xl-amd 22 guest-start/debian.repeat fail pass in 
181417

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 181417 n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked in 181417 n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  8 xen-boot   fail in 181417 like 180278
 test-armhf-armhf-xl   8 xen-bootfail in 181417 like 180278
 test-armhf-armhf-xl-credit2   8 xen-bootfail in 181417 like 180278
 test-armhf-armhf-libvirt  8 xen-bootfail in 181417 like 180278
 test-armhf-armhf-xl-arndale   8 xen-bootfail in 181417 like 180278
 test-armhf-armhf-examine  8 reboot  fail in 181417 like 180278
 test-armhf-armhf-libvirt-raw  8 xen-bootfail in 181417 like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-boot  fail in 181417 like 180278
 test-armhf-armhf-xl-rtds  8 xen-bootfail in 181417 like 180278
 test-armhf-armhf-xl-vhd   8 xen-bootfail in 181417 like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 

[PATCH] xen/misra: add rules 1.4 and 2.1

2023-06-14 Thread Stefano Stabellini
From: Stefano Stabellini 

Signed-off-by: Stefano Stabellini 
---
 docs/misra/rules.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 41a727ca98..4179e49ac2 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -90,6 +90,17 @@ existing codebase are work-in-progress.
behaviour
  -
 
+   * - Rule 1.4
+ - Required
+ - Emergent language features shall not be used
+ - Emergent language features, such as C11 features, should not be
+   confused with similar compiler extensions, which we use.
+
+   * - `Rule 2.1 
`_
+ - Required
+ - A project shall not contain unreachable code
+ -
+
* - `Rule 2.6 
`_
  - Advisory
  - A function should not contain unused label declarations
-- 
2.25.1




[PATCH v4] docs/misra: new rules addition

2023-06-14 Thread Stefano Stabellini
From: Stefano Stabellini 

For Dir 1.1, a document describing all implementation-defined behaviour
(i.e. gcc-specific behavior) will be added to docs/misra, also including
implementation-specific (gcc-specific) appropriate types for bit-fields
relevant to Rule 6.1.

Rule 21.21 is lacking an example on gitlab but the rule is
straightforward: we don't use stdlib at all in Xen.

Signed-off-by: Stefano Stabellini 
---
Changes in v4:
- improve wording of the note in 6.1

Changes in v3:
- add all signed integer types to the Notes of 6.1
- clarify 7.2 in the Notes
- not added: marking "inapplicable" rules, to be a separate patch

Changes in v2:
- drop 5.6
- specify additional appropriate types for 6.1
---
 docs/misra/rules.rst | 50 
 1 file changed, 50 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index d5a6ee8cb6..41a727ca98 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -40,6 +40,12 @@ existing codebase are work-in-progress.
  - Summary
  - Notes
 
+   * - `Dir 1.1 
`_
+ - Required
+ - Any implementation-defined behaviour on which the output of the
+   program depends shall be documented and understood
+ -
+
* - `Dir 2.1 
`_
  - Required
  - All source files shall compile without any compilation errors
@@ -57,6 +63,13 @@ existing codebase are work-in-progress.
header file being included more than once
  -
 
+   * - `Dir 4.11 
`_
+ - Required
+ - The validity of values passed to library functions shall be checked
+ - We do not have libraries in Xen (libfdt and others are not
+   considered libraries from MISRA C point of view as they are
+   imported in source form)
+
* - `Dir 4.14 
`_
  - Required
  - The validity of values received from external sources shall be
@@ -133,6 +146,12 @@ existing codebase are work-in-progress.
headers (xen/include/public/) are allowed to retain longer
identifiers for backward compatibility.
 
+   * - `Rule 6.1 
`_
+ - Required
+ - Bit-fields shall only be declared with an appropriate type
+ - In addition to the C99 types, we also consider appropriate types
+   enum and all explicitly signed / unsigned integer types.
+
* - `Rule 6.2 
`_
  - Required
  - Single-bit named bit fields shall not be of a signed type
@@ -143,6 +162,32 @@ existing codebase are work-in-progress.
  - Octal constants shall not be used
  -
 
+   * - `Rule 7.2 
`_
+ - Required
+ - A "u" or "U" suffix shall be applied to all integer constants
+   that are represented in an unsigned type
+ - The rule asks that any integer literal that is implicitly
+   unsigned is made explicitly unsigned by using one of the
+   indicated suffixes.  As an example, on a machine where the int
+   type is 32-bit wide, 0x7fffffff is signed whereas 0x80000000 is
+   (implicitly) unsigned. In order to comply with the rule, the
+   latter should be rewritten as either 0x80000000u or 0x80000000U.
+   Consistency considerations may suggest using the same suffix even
+   when not required by the rule. For instance, if one has:
+
+   Original: f(0x7fffffff); f(0x80000000);
+
+   one might prefer
+
+   Solution 1: f(0x7fffffffU); f(0x80000000U);
+
+   over
+
+   Solution 2: f(0x7fffffff); f(0x80000000U);
+
+   after having ascertained that "Solution 1" is compatible with the
+   intended semantics.
+
* - `Rule 7.3 
`_
  - Required
  - The lowercase character l shall not be used in a literal suffix
@@ -314,6 +359,11 @@ existing codebase are work-in-progress.
used following a subsequent call to the same function
  -
 
+   * - Rule 21.21
+ - Required
+ - The Standard Library function system of <stdlib.h> shall not be used
+ -
+
* - `Rule 22.2 
`_
  - Mandatory
  - A block of memory shall only be freed if it was allocated by means of a
-- 
2.25.1
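
To make the Rule 6.1 note concrete, these are examples of bit-field
declarations that the wording above would consider appropriate (illustrative
only, not part of the patch):

#include <stdint.h>

enum colour { RED, GREEN, BLUE };

struct example {
    unsigned int flag:1;   /* explicitly unsigned integer type */
    signed int   level:3;  /* explicitly signed integer type */
    enum colour  c:2;      /* enum */
    uint32_t     mask:8;   /* C99 fixed-width type */
};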




Re: [PATCH v3] docs/misra: new rules addition

2023-06-14 Thread Stefano Stabellini
On Tue, 13 Jun 2023, Jan Beulich wrote:
> On 13.06.2023 05:44, Stefano Stabellini wrote:
> > @@ -133,6 +146,13 @@ existing codebase are work-in-progress.
> > headers (xen/include/public/) are allowed to retain longer
> > identifiers for backward compatibility.
> >  
> > +   * - `Rule 6.1 
> > `_
> > + - Required
> > + - Bit-fields shall only be declared with an appropriate type
> > + - In addition to the C99 types, we also consider appropriate types:
> > +   unsigned char, unsigned short, unsigned long, unsigned long long,
> > +   enum, and all explicitly signed integer types.
> 
> If I was to read this without the earlier discussion in mind, I would wonder
> why the unsigned types are explicitly enumerated, but the signed ones are
> described in more general terms. Can't it simply be "all explicitly unsigned
> / signed integer types", which then also covers e.g. uint32_t?

I'll change it to that effect


> > @@ -143,6 +163,32 @@ existing codebase are work-in-progress.
> >   - Octal constants shall not be used
> >   -
> >  
> > +   * - `Rule 7.2 
> > `_
> > + - Required
> > + - A "u" or "U" suffix shall be applied to all integer constants
> > +   that are represented in an unsigned type
> > + - The rule asks that any integer literal that is implicitly
> > +   unsigned is made explicitly unsigned by using one of the
> > +   indicated suffixes.  As an example, on a machine where the int
> > +   type is 32-bit wide, 0x7fffffff is signed whereas 0x80000000 is
> > +   (implicitly) unsigned. In order to comply with the rule, the
> > +   latter should be rewritten as either 0x80000000u or 0x80000000U.
> > +   Consistency considerations may suggest using the same suffix even
> > +   when not required by the rule. For instance, if one has:
> > +
> > +   Original: f(0x7fffffff); f(0x80000000);
> > +
> > +   one might prefer
> > +
> > +   Solution 1: f(0x7fffffffU); f(0x80000000U);
> > +
> > +   over
> > +
> > +   Solution 2: f(0x7fffffff); f(0x80000000U);
> > +
> > +   after having ascertained that "Solution 1" is compatible with the
> > +   intended semantics.
> 
> I think we should state here what we want people to do, not what "one
> might prefer". That aspect aside, I'm not convinced the added text
> (matching what Roberto did suggest) really addresses my concerns. Yet
> I'm not going to pursue this any further - we'll see how this ends up
> working in practice.

OK. I'll keep it as is.



Re: [PATCH v4 27/34] nios2: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Dinh Nguyen




On 6/14/23 04:30, Geert Uytterhoeven wrote:

Hi Dinh,

On Wed, Jun 14, 2023 at 12:17 AM Dinh Nguyen  wrote:

On 6/12/23 16:04, Vishal Moola (Oracle) wrote:

Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
   arch/nios2/include/asm/pgalloc.h | 8 
   1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/nios2/include/asm/pgalloc.h b/arch/nios2/include/asm/pgalloc.h
index ecd1657bb2ce..ce6bb8e74271 100644
--- a/arch/nios2/include/asm/pgalloc.h
+++ b/arch/nios2/include/asm/pgalloc.h
@@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,

   extern pgd_t *pgd_alloc(struct mm_struct *mm);

-#define __pte_free_tlb(tlb, pte, addr)   \
- do {\
- pgtable_pte_page_dtor(pte); \
- tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr)   \
+ do {\
+ pagetable_pte_dtor(page_ptdesc(pte));   \
+ tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
   } while (0)

   #endif /* _ASM_NIOS2_PGALLOC_H */


Applied!


I don't think you can just apply this patch, as the new functions
were only introduced in [PATCH v4 05/34] of this series.



Ah, thanks for the pointer!

Dinh



Re: [PATCH v4 0/2] x86: xen: add missing prototypes

2023-06-14 Thread Boris Ostrovsky




On 6/14/23 3:34 AM, Juergen Gross wrote:

Avoid missing prototype warnings.

Arnd Bergmann (1):
   x86: xen: add missing prototypes

Juergen Gross (1):
   x86/xen: add prototypes for paravirt mmu functions

  arch/x86/xen/efi.c |  2 ++
  arch/x86/xen/mmu_pv.c  | 16 
  arch/x86/xen/smp.h |  4 
  arch/x86/xen/smp_pv.c  |  1 -
  arch/x86/xen/xen-ops.h |  3 +++
  include/xen/xen.h  |  3 +++
  6 files changed, 28 insertions(+), 1 deletion(-)




Reviewed-by: Boris Ostrovsky 



Re: [PATCH 2/3] acpi/processor: sanitize _PDC buffer bits when running as Xen dom0

2023-06-14 Thread Jason Andryuk
Hi, Roger,

On Mon, Nov 21, 2022 at 10:04 AM Roger Pau Monné  wrote:
>
> On Mon, Nov 21, 2022 at 03:10:36PM +0100, Jan Beulich wrote:
> > On 21.11.2022 11:21, Roger Pau Monne wrote:
> > > --- a/drivers/acpi/processor_pdc.c
> > > +++ b/drivers/acpi/processor_pdc.c
> > > @@ -137,6 +137,14 @@ acpi_processor_eval_pdc(acpi_handle handle, struct 
> > > acpi_object_list *pdc_in)
> > > buffer[2] &= ~(ACPI_PDC_C_C2C3_FFH | ACPI_PDC_C_C1_FFH);
> > >
> > > }
> > > +   if (xen_initial_domain())
> > > +   /*
> > > +* When Linux is running as Xen dom0 it's the hypervisor the
> > > +* entity in charge of the processor power management, and so
> > > +* Xen needs to check the OS capabilities reported in the _PDC
> > > +* buffer matches what the hypervisor driver supports.
> > > +*/
> > > +   xen_sanitize_pdc((uint32_t *)pdc_in->pointer->buffer.pointer);
> > > status = acpi_evaluate_object(handle, "_PDC", pdc_in, NULL);
> >
> > Again looking at our old XenoLinux forward port we had this inside the
> > earlier if(), as an _alternative_ to the &= (I don't think it's valid
> > to apply both the kernel's and Xen's adjustments). That would also let
> > you use "buffer" rather than re-calculating it via yet another (risky
> > from an abstract pov) cast.
>
> Hm, I've wondered this and decided it wasn't worth short-circuiting
> the boot_option_idle_override conditional because ACPI_PDC_C_C2C3_FFH
> and ACPI_PDC_C_C1_FFH will be set anyway by Xen in
> arch_acpi_set_pdc_bits() as part of ACPI_PDC_C_CAPABILITY_SMP.
>
> I could re-use some of the code in there, but didn't want to make it
> more difficult to read just for the benefit of reusing buffer.
>
> > It was the very nature of requiring Xen-specific conditionals which I
> > understand was the reason why so far no attempt was made to get this
> > (incl the corresponding logic for patch 1) into any upstream kernel.
>
> Yes, well, it's all kind of ugly.  Hence my suggestion to simply avoid
> doing any ACPI Processor object handling in Linux with the native code
> and handle it all in a Xen specific driver.  That requires the Xen
> driver being able to fetch more data itself from the ACPI Processor
> methods, but also unties it from the dependency on the data being
> filled by the generic code, and the 'tricks' it plays to fool
> generic code into thinking certain processors are online.

Are you working on this patch anymore?  My Xen HWP patches need a
Linux patch like this one to set bit 12 in the PDC.  I had an affected
user test with this patch and it worked, serving as an equivalent of
Linux commit a21211672c9a ("ACPI / processor: Request native thermal
interrupt handling via _OSC").

Another idea is to use Linux's arch_acpi_set_pdc_bits() to make the
hypercall to Xen.  It occurs earlier:
acpi_processor_set_pdc()
  acpi_processor_alloc_pdc()
    acpi_set_pdc_bits()
      arch_acpi_set_pdc_bits()
  acpi_processor_eval_pdc()

So the IDLE_NOMWAIT masking in acpi_processor_eval_pdc() would still
apply.  arch_acpi_set_pdc_bits() is provided the buffer, so it's a
little cleaner in that respect.
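
Roughly, an untested sketch of that (reusing xen_sanitize_pdc() from your
patch, and assuming the current arch_acpi_set_pdc_bits() definition in
arch/x86/include/asm/acpi.h):

static inline void arch_acpi_set_pdc_bits(u32 *buf)
{
	buf[2] |= ACPI_PDC_C_CAPABILITY_SMP;

	/* ... existing feature-dependent adjustments elided ... */

	if (xen_initial_domain())
		/* Let Xen trim the capabilities to what its drivers support. */
		xen_sanitize_pdc(buf);
}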

Thanks,
Jason



[linux-5.4 test] 181425: regressions - FAIL

2023-06-14 Thread osstest service owner
flight 181425 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181425/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-pvops 6 kernel-build fail REGR. vs. 181363

Tests which did not succeed, but are not blocking:
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-bios  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-uefi  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail  like 181354
 test-amd64-i386-libvirt-raw   7 xen-install  fail  like 181363
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 181363
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 181363
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 181363
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 181363
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeatfail like 181363
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 181363
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 181363
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 181363
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 

Re: [PATCH v3 4/4] x86/cpu-policy: Derive RSBA/RRSBA for guest policies

2023-06-14 Thread Andrew Cooper
On 13/06/2023 10:59 am, Jan Beulich wrote:
> On 12.06.2023 18:13, Andrew Cooper wrote:
>> The RSBA bit, "RSB Alternative", means that the RSB may use alternative
>> predictors when empty.  From a practical point of view, this means "Retpoline
>> not safe".
>>
>> Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
>> statement that IBRS is implemented in hardware (as opposed to the form
>> retrofitted to existing CPUs in microcode).
>>
>> The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
>> property that predictions are tagged with the mode in which they were learnt.
>> Therefore, it means "when eIBRS is active, the RSB may fall back to
>> alternative predictors but restricted to the current prediction mode".  As
>> such, it's a stronger statement than RSBA, but still means "Retpoline not 
>> safe".
>>
>> CPUs are not expected to enumerate both RSBA and RRSBA.
>>
>> Add feature dependencies for EIBRS and RRSBA.  While technically they're not
>> linked, absolutely nothing good can come of letting the guest see RRSBA
>> without EIBRS.  Nor a guest seeing EIBRS without IBRSB.  Furthermore, we use
>> this dependency to simplify the max derivation logic.
>>
>> The max policies get RSBA and RRSBA unconditionally set (with the EIBRS
>> dependency maybe hiding RRSBA).  We can run any VM, even if it has been told
>> "somewhere you might run, Retpoline isn't safe".
>>
>> The default policies are more complicated.  A guest shouldn't see both bits,
>> but it needs to see one if the current host suffers from any form of RSBA, 
>> and
>> which bit it needs to see depends on whether eIBRS is visible or not.
>> Therefore, the calculation must be performed after sanitise_featureset().
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Roger Pau Monné 
>> CC: Wei Liu 
>>
>> v3:
>>  * Minor commit message adjustment.
>>  * Drop changes to recalculate_cpuid_policy().  Deferred to a later series.
> With this dropped, with the title not saying "max/default", and with
> the description also not mentioning "live" policies at all, I don't
> think this patch is self-consistent (meaning in particular: leaving
> aside the fact that there's no way right now to requests e.g. both
> RSBA and RRSBA for a guest; aiui it is possible for Dom0).
>
> As you may imagine I'm also curious why you decided to drop this.

Because when I tried doing levelling in Xapi, I remembered why I did it
the way I did in v1, and why the v2 way was wrong.

Xen cannot safely edit what the toolstack provides, so must not. 
Instead, failing the set_policy() call is an option, and is what we want
to do longterm, but it also happens to be wrong in this case. An admin
may know that a VM isn't using retpoline, and may need to migrate it
anyway for a number of reasons, so any safety checks need to be in the
toolstack, and need to be overrideable with something like --force.


I don't really associate "derive policies" with anything other than the
system policies.  Domain construction isn't any kind of derivation -
it's simply doing what the toolstack asks.

~Andrew



[PATCH v4 15/15] CHANGELOG: Add Intel HWP entry

2023-06-14 Thread Jason Andryuk
Signed-off-by: Jason Andryuk 
Acked-by: Henry Wang 
---
v3:
Position under existing Added section
Add Henry's Ack

v2:
Add blank line
---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7d7e0590f8..8d6e6c3088 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,7 +24,7 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
  - xl/libxl can customize SMBIOS strings for HVM guests.
  - Add support for AVX512-FP16 on x86.
  - On Arm, Xen supports guests running SVE/SVE2 instructions. (Tech Preview)
-
+ - Add Intel Hardware P-States (HWP) cpufreq driver.
 
 ## 
[4.17.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.0) 
- 2022-12-12
 
-- 
2.40.1




[PATCH v4 10/15] libxc: Include cppc_para in definitions

2023-06-14 Thread Jason Andryuk
Expose the cppc_para fields through libxc.

Signed-off-by: Jason Andryuk 
Acked-by: Anthony PERARD 
---
v4:
Rename hwp to cppc
Add Anthony's Ack
---
 tools/include/xenctrl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 8aedb952a0..2092632296 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1892,6 +1892,7 @@ int xc_smt_disable(xc_interface *xch);
  */
 typedef struct xen_userspace xc_userspace_t;
 typedef struct xen_ondemand xc_ondemand_t;
+typedef struct xen_cppc_para xc_cppc_para_t;
 
 struct xc_get_cpufreq_para {
 /* IN/OUT variable */
@@ -1923,6 +1924,7 @@ struct xc_get_cpufreq_para {
 xc_ondemand_t ondemand;
 } u;
 } s;
+xc_cppc_para_t cppc_para;
 } u;
 
 int32_t turbo_enabled;
-- 
2.40.1




[PATCH v4 13/15] libxc: Add xc_set_cpufreq_cppc

2023-06-14 Thread Jason Andryuk
Add xc_set_cpufreq_cppc to allow calling xen_systctl_pm_op
SET_CPUFREQ_CPPC.

Signed-off-by: Jason Andryuk 
Acked-by: Anthony PERARD 
---
v2:
Mark xc_set_hwp_para_t const

v4:
s/hwp/cppc/
Add Anthony's Ack
---
 tools/include/xenctrl.h |  4 
 tools/libs/ctrl/xc_pm.c | 18 ++
 2 files changed, 22 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2092632296..c7eb97959a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1930,11 +1930,15 @@ struct xc_get_cpufreq_para {
 int32_t turbo_enabled;
 };
 
+typedef struct xen_set_cppc_para xc_set_cppc_para_t;
+
 int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 struct xc_get_cpufreq_para *user_para);
 int xc_set_cpufreq_gov(xc_interface *xch, int cpuid, char *govname);
 int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
 int ctrl_type, int ctrl_value);
+int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
+const xc_set_cppc_para_t *set_cppc);
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
 
 int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 19fe1a79dd..e86045697d 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -329,6 +329,24 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
 return xc_sysctl(xch, &sysctl);
 }
 
+int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
+const xc_set_cppc_para_t *set_cppc)
+{
+DECLARE_SYSCTL;
+
+if ( !xch )
+{
+errno = EINVAL;
+return -1;
+}
+sysctl.cmd = XEN_SYSCTL_pm_op;
+sysctl.u.pm_op.cmd = SET_CPUFREQ_CPPC;
+sysctl.u.pm_op.cpuid = cpuid;
+sysctl.u.pm_op.u.set_cppc = *set_cppc;
+
+return xc_sysctl(xch, &sysctl);
+}
+
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
 {
 int ret = 0;
-- 
2.40.1
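
For what it's worth, a caller would then look roughly like this (illustrative
sketch, not part of the patch; the XEN_SYSCTL_CPPC_SET_* names come from the
sysctl patch in this series, and xch/cpuid are assumed to be set up already):

xc_set_cppc_para_t set_cppc = {
    /* apply the "balance" preset, then override the minimum */
    .set_params = XEN_SYSCTL_CPPC_SET_PRESET_BALANCE |
                  XEN_SYSCTL_CPPC_SET_MINIMUM,
    .minimum    = 10,
};

if ( xc_set_cpufreq_cppc(xch, cpuid, &set_cppc) )
    fprintf(stderr, "xc_set_cpufreq_cppc failed: %d (%s)\n",
            errno, strerror(errno));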




[PATCH v4 12/15] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op

2023-06-14 Thread Jason Andryuk
Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters.  The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits.  This allows atomically applying new HWP configuration
via a single wrmsr.

XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets.  Setting them depends on hardware limits which the
hypervisor is already caching.  So using them allows skipping a
hypercall to query the limits (lowest/highest) to then set those same
values.  The code is organized to allow a preset to be refined with
additional stuff if desired.

"most_efficient" and "guaranteed" could be additional presets in the
future, but they are not added now.  Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.

Signed-off-by: Jason Andryuk 

---
v4:
Remove IA32_ENERGY_BIAS support
Validate parameters don't exceed 255
Use CPPC/cppc name
set_cppc_para() add const
set_cppc_para() return hwp_cpufreq_target()
Expand sysctl comments

v3:
Remove cpufreq_governor_internal from set_cpufreq_hwp

v2:
Update for naming anonymous union
Drop hwp_err for invalid input in set_hwp_para()
Drop uint16_t cast in XEN_SYSCTL_HWP_SET_PARAM_MASK
Drop parens for HWP_SET_PRESET defines
Reference activity_window format comment
Place SET_CPUFREQ_HWP after SET_CPUFREQ_PARA
Add {HWP,IA32}_ENERGY_PERF_MAX_{PERFORMANCE,POWERSAVE} defines
Order defines before fields in sysctl.h
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c| 98 ++
 xen/drivers/acpi/pmstat.c  | 17 ++
 xen/include/acpi/cpufreq/cpufreq.h |  2 +
 xen/include/public/sysctl.h| 58 ++
 4 files changed, 175 insertions(+)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 86c5793266..3ee046940c 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -23,6 +23,10 @@ static bool __ro_after_init feature_hdc;
 bool __initdata opt_cpufreq_hwp;
 static bool __ro_after_init opt_cpufreq_hdc = true;
 
+#define HWP_ENERGY_PERF_MAX_PERFORMANCE 0
+#define HWP_ENERGY_PERF_BALANCE 0x80
+#define HWP_ENERGY_PERF_MAX_POWERSAVE   0xff
+
 union hwp_request
 {
 struct
@@ -560,6 +564,100 @@ int get_hwp_para(const unsigned int cpu,
 return 0;
 }
 
+int set_hwp_para(struct cpufreq_policy *policy,
+ const struct xen_set_cppc_para *set_cppc)
+{
+unsigned int cpu = policy->cpu;
+struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+if ( data == NULL )
+return -EINVAL;
+
+/* Validate all parameters first */
+if ( set_cppc->set_params & ~XEN_SYSCTL_CPPC_SET_PARAM_MASK )
+return -EINVAL;
+
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW &&
+ !feature_hwp_activity_window )
+return -EINVAL;
+
+if ( !(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW) &&
+ set_cppc->activity_window )
+return -EINVAL;
+
+if ( set_cppc->activity_window & ~XEN_SYSCTL_CPPC_ACT_WINDOW_MASK )
+return -EINVAL;
+
+if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
+ set_cppc->desired != 0 &&
+ (set_cppc->desired < data->hw.lowest ||
+  set_cppc->desired > data->hw.highest) )
+return -EINVAL;
+
+/*
+ * minimum & maximum are not validated against lowest or highest as
+ * hardware doesn't seem to care and the SDM says CPUs will clip
+ * internally.
+ */
+if ( set_cppc->minimum > 255 ||
+ set_cppc->maximum > 255 ||
+ set_cppc->energy_perf > 255 )
+return -EINVAL;
+
+/* Apply presets */
+switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
+{
+case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
+data->minimum = data->hw.lowest;
+data->maximum = data->hw.lowest;
+data->activity_window = 0;
+data->energy_perf = HWP_ENERGY_PERF_MAX_POWERSAVE;
+data->desired = 0;
+break;
+
+case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
+data->minimum = data->hw.highest;
+data->maximum = data->hw.highest;
+data->activity_window = 0;
+data->energy_perf = HWP_ENERGY_PERF_MAX_PERFORMANCE;
+data->desired = 0;
+break;
+
+case XEN_SYSCTL_CPPC_SET_PRESET_BALANCE:
+data->minimum = data->hw.lowest;
+data->maximum = data->hw.highest;
+data->activity_window = 0;
+data->energy_perf = HWP_ENERGY_PERF_BALANCE;
+data->desired = 0;
+break;
+
+case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
+break;
+
+default:
+return -EINVAL;
+}
+
+/* Further customize presets if needed */
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM )
+data->minimum = set_cppc->minimum;
+
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM )
+data->maximum = set_cppc->maximum;
+
+if ( 

[PATCH v4 14/15] xenpm: Add set-cpufreq-cppc subcommand

2023-06-14 Thread Jason Andryuk
set-cpufreq-cppc allows setting the Hardware P-State (HWP) parameters.

It can be run on all or just a single cpu.  There are presets of
balance, powersave & performance.  Those can be further tweaked by
param:val arguments as explained in the usage description.

Parameter names are just checked to the first 3 characters to shorten
typing.

Some options are hardware dependent, and ranges can be found in
get-cpufreq-para.

Signed-off-by: Jason Andryuk 
---
v4:
Remove energy bias 0-15 & 7 references
Use MASK_INSR
Fixup { placement
Drop extra case in parse_activity_window
strcmp suffix
Expand help text
s/hwp/cppc/
Use isdigit() to check cpuid - otherwise run on all CPUs.

v2:
Compare provided parameter name and not just 3 characters.
Use "-" in parameter names
Remove hw_
Replace sscanf with strchr & strtoul.
Remove toplevel error message with lower level ones.
Help text s/127/128/
Help text mention truncation.
Avoid some truncation rounding down by adding 5 before division.
Help test mention default microseconds
Also comment the limit check written to avoid overflow.
---
 tools/misc/xenpm.c | 237 +
 1 file changed, 237 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 488797fd20..2f2b699794 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -16,6 +16,8 @@
  */
 #define MAX_NR_CPU 512
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -67,6 +69,30 @@ void show_help(void)
 " set-max-cstate|'unlimited' [|'unlimited']\n"
 " set the C-State limitation 
( >= 0) and\n"
 " optionally the C-sub-state 
limitation ( >= 0)\n"
+" set-cpufreq-cppc  [cpuid] [balance|performance|powersave] 
*\n"
+" set Hardware P-State (HWP) 
parameters\n"
+" on CPU  or all if 
omitted.\n"
+" optionally a preset of one 
of:\n"
+"   
balance|performance|powersave\n"
+" an optional list of 
param:val arguments\n"
+"   minimum:N (0-255)\n"
+"   maximum:N (0-255)\n"
+"   get-cpufreq-para 
lowest/highest\n"
+"   values are limits 
for\n"
+"   minumum/maximum.\n"
+"   desired:N (0-255)\n"
+"   set explicit 
performance target.\n"
+"   non-zero disables 
auto-HWP mode.\n"
+"   energy-perf:N (0-255)\n"
+"   
energy/performance hint\n"
+"   lower - favor 
performance\n"
+"   higher - favor 
powersave\n"
+"   128 - 
balance\n"
+"   act-window:N{,m,u}s range 
1us-1270s\n"
+"   window for internal 
calculations.\n"
+"   units default to 
\"us\" if unspecified.\n"
+"   truncates 
un-representable values.\n"
+"   0 lets the hardware 
decide.\n"
 " start [seconds] start collect Cx/Px 
statistics,\n"
 " output after CTRL-C or 
SIGINT or several seconds.\n"
 " enable-turbo-mode [cpuid]   enable Turbo Mode for 
processors that support it.\n"
@@ -1292,6 +1318,216 @@ void disable_turbo_mode(int argc, char *argv[])
 errno, strerror(errno));
 }
 
+/*
+ * Parse activity_window:NNN{us,ms,s} and validate range.
+ *
+ * Activity window is a 7bit mantissa (0-127) with a 3bit exponent (0-7) base
+ * 10 in microseconds.  So the range is 1 microsecond to 1270 seconds.  A value
+ * of 0 lets the hardware autonomously select the window.
+ *
+ * Return 0 on success
+ *   -1 on error
+ */
+static int parse_activity_window(xc_set_cppc_para_t *set_cppc, unsigned long u,
+ const char *suffix)
+{
+unsigned int exponent = 0;
+unsigned int multiplier = 1;
+
+if ( suffix && suffix[0] )
+{
+if ( strcmp(suffix, "s") == 0 )
+{
+multiplier = 1000 * 1000;
+exponent = 6;
+}
+else if ( 

[PATCH v4 09/15] cpufreq: Export HWP parameters to userspace as CPPC

2023-06-14 Thread Jason Andryuk
Extend xen_get_cpufreq_para to return hwp parameters.  HWP is an
implementation of ACPI CPPC (Collaborative Processor Performance
Control).  Use the CPPC name since that might be useful in the future
for AMD P-state.

We need the features bitmask to indicate fields supported by the actual
hardware - this only applies to activity window for the time being.

The HWP most_efficient is mapped to CPPC lowest_nonlinear, and guaranteed is
mapped to nominal.  CPPC has a guaranteed register that is optional while nominal
is required.  ACPI spec says "If this register is not implemented, OSPM
assumes guaranteed performance is always equal to nominal performance."

The use of uint8_t parameters matches the hardware size.  uint32_t
entries grow the sysctl_t past the build assertion in setup.c.  The
uint8_t ranges are supported across multiple generations, so hopefully
they won't change.

Signed-off-by: Jason Andryuk 
---
v2:
Style fixes
Don't bump XEN_SYSCTL_INTERFACE_VERSION
Drop cpufreq.h comment divider
Expand xen_hwp_para comment
Add HWP activity window mantissa/exponent defines
Handle union rename
Add const to get_hwp_para
Remove hw_ prefix from xen_hwp_para members
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data

v4:
Fixup for opt_cpufreq_hwp/hdc removal
get_hwp_para() takes cpu as arg
XEN_ prefix HWP_ACT_WINDOW_*
Drop HWP_ACT_WINDOW_EXPONENT_SHIFT - shift MASK
Remove Energy Bias (0-15) EPP fallback
Rename xen_hwp_para to xen_cppc_para
s/hwp/cppc/
Use scaling driver to switch output
---
 xen/arch/x86/acpi/cpufreq/hwp.c| 23 +
 xen/drivers/acpi/pmstat.c  | 78 --
 xen/include/acpi/cpufreq/cpufreq.h |  2 +
 xen/include/public/sysctl.h| 56 +
 4 files changed, 123 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 5f210b54ff..86c5793266 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -537,6 +537,29 @@ static const struct cpufreq_driver __initconstrel 
hwp_cpufreq_driver =
 .update = hwp_cpufreq_update,
 };
 
+int get_hwp_para(const unsigned int cpu,
+ struct xen_cppc_para *cppc_para)
+{
+const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+if ( data == NULL )
+return -EINVAL;
+
+cppc_para->features =
+(feature_hwp_activity_window ? XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW : 0);
+cppc_para->lowest   = data->hw.lowest;
+cppc_para->lowest_nonlinear = data->hw.most_efficient;
+cppc_para->nominal  = data->hw.guaranteed;
+cppc_para->highest  = data->hw.highest;
+cppc_para->minimum  = data->minimum;
+cppc_para->maximum  = data->maximum;
+cppc_para->desired  = data->desired;
+cppc_para->energy_perf  = data->energy_perf;
+cppc_para->activity_window  = data->activity_window;
+
+return 0;
+}
+
 int __init hwp_register_driver(void)
 {
 return cpufreq_register_driver(_cpufreq_driver);
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 57359c21d8..10143c084c 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -251,48 +251,54 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 else
 strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
 
-if ( !(scaling_available_governors =
-   xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
-return -ENOMEM;
-if ( (ret = read_scaling_available_governors(
-scaling_available_governors,
-gov_num * CPUFREQ_NAME_LEN * sizeof(char))) )
+if ( !strncasecmp(op->u.get_para.scaling_driver, XEN_HWP_DRIVER,
+  CPUFREQ_NAME_LEN) )
+ret = get_hwp_para(policy->cpu, &op->u.get_para.u.cppc_para);
+else
 {
+if ( !(scaling_available_governors =
+   xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
+return -ENOMEM;
+if ( (ret = read_scaling_available_governors(
+scaling_available_governors,
+gov_num * CPUFREQ_NAME_LEN * sizeof(char))) )
+{
+xfree(scaling_available_governors);
+return ret;
+}
+ret = copy_to_guest(op->u.get_para.scaling_available_governors,
+scaling_available_governors, gov_num * CPUFREQ_NAME_LEN);
 xfree(scaling_available_governors);
-return ret;
-}
-ret = copy_to_guest(op->u.get_para.scaling_available_governors,
-scaling_available_governors, gov_num * CPUFREQ_NAME_LEN);
-xfree(scaling_available_governors);
-if ( ret )
-return ret;
+if ( ret )
+return ret;
 
-op->u.get_para.u.s.scaling_cur_freq = policy->cur;
-op->u.get_para.u.s.scaling_max_freq = policy->max;
-op->u.get_para.u.s.scaling_min_freq = policy->min;
+op->u.get_para.u.s.scaling_cur_freq = 

[PATCH v4 11/15] xenpm: Print HWP/CPPC parameters

2023-06-14 Thread Jason Andryuk
Print HWP-specific parameters.  Some are always present, but others
depend on hardware support.

Signed-off-by: Jason Andryuk 
---
v2:
Style fixes
Declare i outside loop
Replace repeated hardware/configured limits with spaces
Fixup for hw_ removal
Use XEN_HWP_GOVERNOR
Use HWP_ACT_WINDOW_EXPONENT_*
Remove energy_perf hw autonomous - 0 doesn't mean autonomous

v4:
Return activity_window from calculate_hwp_activity_window
Use blanks instead of _ in output
Use MASK_EXTR
Check XEN_HWP_DRIVER name since governor is no longer returned
s/hwp/cppc
---
 tools/misc/xenpm.c | 66 ++
 1 file changed, 66 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 4e53e68dc5..488797fd20 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -708,6 +708,46 @@ void start_gather_func(int argc, char *argv[])
 pause();
 }
 
+static unsigned int calculate_activity_window(const xc_cppc_para_t *cppc,
+  const char **units)
+{
+unsigned int mantissa = MASK_EXTR(cppc->activity_window,
+  XEN_CPPC_ACT_WINDOW_MANTISSA_MASK);
+unsigned int exponent = MASK_EXTR(cppc->activity_window,
+  XEN_CPPC_ACT_WINDOW_EXPONENT_MASK);
+unsigned int multiplier = 1;
+unsigned int i;
+
+/*
+ * SDM only states a 0 register is hardware selected, and doesn't mention
+ * a 0 mantissa with a non-0 exponent.  Only special case a 0 register.
+ */
+if ( cppc->activity_window == 0 )
+{
+*units = "hardware selected";
+
+return 0;
+}
+
+if ( exponent >= 6 )
+{
+*units = "s";
+exponent -= 6;
+}
+else if ( exponent >= 3 )
+{
+*units = "ms";
+exponent -= 3;
+}
+else
+*units = "us";
+
+for ( i = 0; i < exponent; i++ )
+multiplier *= 10;
+
+return mantissa * multiplier;
+}
+
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para 
*p_cpufreq)
 {
@@ -772,6 +812,32 @@ static void print_cpufreq_para(int cpuid, struct 
xc_get_cpufreq_para *p_cpufreq)
p_cpufreq->u.s.scaling_min_freq,
p_cpufreq->u.s.scaling_cur_freq);
 }
+else
+{
+const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
+
+printf("cppc variables   :\n");
+printf("  hardware limits: lowest [%u] lowest nonlinear [%u]\n",
+   cppc->lowest, cppc->lowest_nonlinear);
+printf(" : nominal [%u] highest [%u]\n",
+   cppc->nominal, cppc->highest);
+printf("  configured limits  : min [%u] max [%u] energy perf [%u]\n",
+   cppc->minimum, cppc->maximum, cppc->energy_perf);
+
+if ( cppc->features & XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW )
+{
+unsigned int activity_window;
+const char *units;
+
+activity_window = calculate_activity_window(cppc, &units);
+printf(" : activity_window [%u %s]\n",
+   activity_window, units);
+}
+
+printf(" : desired [%u%s]\n",
+   cppc->desired,
+   cppc->desired ? "" : " hw autonomous");
+}
 
 printf("turbo mode   : %s\n",
p_cpufreq->turbo_enabled ? "enabled" : "disabled or n/a");
-- 
2.40.1
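
For reference, the decoding above boils down to (worked example, not part of
the patch):

    value = mantissa * 10^exponent microseconds

so mantissa 127 with exponent 3 is 127 * 10^3 us = 127 ms, and the largest
encodable value, mantissa 127 with exponent 7, is 127 * 10^7 us = 1270 s,
matching the 1us-1270s range described for parse_activity_window() in the
xenpm set-cpufreq-cppc patch.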




[PATCH v4 06/15] cpufreq: Add Hardware P-State (HWP) driver

2023-06-14 Thread Jason Andryuk
From the Intel SDM: "Hardware-Controlled Performance States (HWP), which
autonomously selects performance states while utilizing OS supplied
performance guidance hints."

Enable HWP to run in autonomous mode by poking the correct MSRs.
cpufreq=hwp enables it and specifying cpufreq=xen would disable it.  hdc is
a sub-option under hwp (i.e.  cpufreq=xen:hwp,hdc=0) as is verbose.  If
cpufreq=hwp is specified, but hardware support is unavailable, Xen
falls back to cpufreq=xen.

There is no interface to configure - xen_sysctl_pm_op/xenpm will
be extended to configure in subsequent patches.  It will run with the
default values, which should be the default 0x80 (out of 0x0-0xff)
energy/performance preference.

Unscientific powertop measurement of a mostly idle, customized OpenXT
install:
A 10th gen 6-core laptop showed battery discharge drop from ~9.x to
~7.x watts.
An 8th gen 4-core laptop dropped from ~10 to ~9

Power usage depends on many factors, especially display brightness, but
this does show a power saving in balanced mode when CPU utilization is
low.

HWP isn't compatible with an external governor - it doesn't take
explicit frequency requests.  Therefore a minimal internal governor,
hwp, is also added as a placeholder.

While adding to the xen-command-line.pandoc entry, un-nest verbose from
minfreq.  They are independent.

With cpufreq=hwp,verbose, HWP prints processor capabilities that are not
used by the code, like HW_FEEDBACK.  This is done because otherwise
there isn't a convenient way to query the information.

Signed-off-by: Jason Andryuk 

---

We disable on cpuid_level < 0x16.  cpuid(0x16) is used to get the cpu
frequencies for calculating the APERF/MPERF.  Without it, things would
still work, but the average cpu frequency output would be wrong.

My 8th & 10th gen test systems both report:
(XEN) HWP: 1 notify: 1 act_window: 1 energy_perf: 1 pkg_level: 0 peci: 0
(XEN) HWP: Hardware Duty Cycling (HDC) supported
(XEN) HWP: HW_FEEDBACK not supported

Specifying HWP as a fake governor - cpufreq=xen:hwp - would require
resetting the governor if HWP hardware wasn't available.  Making
cpufreq=hwp a top level option avoids that issue.

Falling back from cpufreq=hwp to cpufreq=xen is a more user-friendly
choice than disabling cpufreq when HWP is not available.  Specifying
cpufreq=hwp indicates the user wants cpufreq, so, if HWP isn't
available, it makes sense to give them the cpufreq that can be
supported.  i.e. I can't see a user only wanting cpufreq=hwp or
cpufreq=none, but not cpufreq=xen.

We can't use parse_boolean() since it requires a single name=val string
and cpufreq_handle_common_option is provided two strings.  Use
parse_bool() and manually handle no-hwp.

Write to disable the interrupt - the linux pstate driver does this.  We
don't use the interrupts, so we can just turn them off.  We aren't ready
to handle them, so we don't want any.  Unclear if this is necessary.
SDM says it's default disabled.

FAST_IA32_HWP_REQUEST was removed in v2.  The check in v1 was wrong,
it's a model specific feature and the CPUID bit is only available
after enabling via the MSR.  Support was untested since I don't have
hardware with the feature.  Writes are expected to be infrequent, so
just leave it out.

---
v2:
Alphabetize headers
Re-work driver registration
name hwp_drv_data anonymous union "hw"
Drop hwp_verbose_cont
style cleanups
Condense hwp_governor switch
hwp_cpufreq_target remove .raw from hwp_req assignment
Use typed-pointer in a few functions
Pass type to xzalloc
Add HWP_ENERGY_PERF_BALANCE/IA32_ENERGY_BIAS_BALANCE defines
Add XEN_HWP_GOVERNOR define for "hwp-internal"
Capitalize CPUID and MSR defines
Change '_' to '-' for energy-perf & act-window
Read-modify-write MSRs updates
Use FAST_IA32_HWP_REQUEST_MSR_ENABLE define
constify pointer in hwp_set_misc_turbo
Add space after non-fallthrough break in governor switch
Add IA32_ENERGY_BIAS_MASK define
Check CPUID_PM_LEAK for energy bias when needed
Fail initialization with curr_req = -1
Fold hwp_read_capabilities into hwp_init_msrs
Add command line cpufreq=xen:hwp
Add command line cpufreq=xen:hdc
Use per_cpu for hwp_drv_data pointers
Move hwp_energy_perf_bias call into hwp_write_request
energy_perf 0 is valid, so hwp_energy_perf_bias cannot be skipped
Ensure we don't generate interrupts
Remove Fast Write of Uncore MSR
Initialize hwp_drv_data from curr_req
Use SPDX line instead of license text in hwp.c

v3:
Add cf_check to cpufreq_gov_hwp_init() - Marek
Print cpuid_level with %#x - Marek

v4:
Use BIT() for CPUID and MSR bits
Move __initdata after type
Add __ro_after_init to feature_*
Remove aperf/mperf comment
Move feature_hwp_energy_perf { to newline
Remove _IA32_ infix
Use unsigned int & bool for bitfields
Require energy perf pref (Remove ENERGY_PERF_BIAS support)
Initialize activity_window
Return errors on wrmsr failure
Change command line to: cpufreq=xen:hwp
Move hdc into the hwp-specific handle_options
Drop feature_hwp_energy_perf, 

[PATCH v4 08/15] xenpm: Change get-cpufreq-para output for hwp

2023-06-14 Thread Jason Andryuk
When using HWP, some of the returned data is not applicable.  In that
case, we should just omit it to avoid confusing the user.  So switch to
printing the base and max frequencies since those are relevant to HWP.
Similarly, stop printing the CPU frequencies since those do not apply.
The scaling fields are also no longer printed.

Signed-off-by: Jason Andryuk 
---
v2:
Use full governor name XEN_HWP_GOVERNOR to change output
Style fixes

v4:
s/turbo/max/
Check for XEN_HWP_DRIVER driver instead of "-internal"
---
 tools/misc/xenpm.c | 83 +-
 1 file changed, 46 insertions(+), 37 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 1c474c3b59..4e53e68dc5 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -711,6 +711,7 @@ void start_gather_func(int argc, char *argv[])
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para 
*p_cpufreq)
 {
+bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER) == 0;
 int i;
 
 printf("cpu id   : %d\n", cpuid);
@@ -720,49 +721,57 @@ static void print_cpufreq_para(int cpuid, struct 
xc_get_cpufreq_para *p_cpufreq)
 printf(" %d", p_cpufreq->affected_cpus[i]);
 printf("\n");
 
-printf("cpuinfo frequency: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->cpuinfo_max_freq,
-   p_cpufreq->cpuinfo_min_freq,
-   p_cpufreq->cpuinfo_cur_freq);
+if ( hwp )
+printf("cpuinfo frequency: base [%u] max [%u]\n",
+   p_cpufreq->cpuinfo_min_freq,
+   p_cpufreq->cpuinfo_max_freq);
+else
+printf("cpuinfo frequency: max [%u] min [%u] cur [%u]\n",
+   p_cpufreq->cpuinfo_max_freq,
+   p_cpufreq->cpuinfo_min_freq,
+   p_cpufreq->cpuinfo_cur_freq);
 
 printf("scaling_driver   : %s\n", p_cpufreq->scaling_driver);
 
-printf("scaling_avail_gov: %s\n",
-   p_cpufreq->scaling_available_governors);
-
-printf("current_governor : %s\n", p_cpufreq->u.s.scaling_governor);
-if ( !strncmp(p_cpufreq->u.s.scaling_governor,
-  "userspace", CPUFREQ_NAME_LEN) )
-{
-printf("  userspace specific :\n");
-printf("scaling_setspeed : %u\n",
-   p_cpufreq->u.s.u.userspace.scaling_setspeed);
-}
-else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
-   "ondemand", CPUFREQ_NAME_LEN) )
+if ( !hwp )
 {
-printf("  ondemand specific  :\n");
-printf("sampling_rate: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->u.s.u.ondemand.sampling_rate_max,
-   p_cpufreq->u.s.u.ondemand.sampling_rate_min,
-   p_cpufreq->u.s.u.ondemand.sampling_rate);
-printf("up_threshold : %u\n",
-   p_cpufreq->u.s.u.ondemand.up_threshold);
-}
+printf("scaling_avail_gov: %s\n",
+   p_cpufreq->scaling_available_governors);
 
-printf("scaling_avail_freq   :");
-for ( i = 0; i < p_cpufreq->freq_num; i++ )
-if ( p_cpufreq->scaling_available_frequencies[i] ==
- p_cpufreq->u.s.scaling_cur_freq )
-printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
-else
-printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
-printf("\n");
+printf("current_governor : %s\n", p_cpufreq->u.s.scaling_governor);
+if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+  "userspace", CPUFREQ_NAME_LEN) )
+{
+printf("  userspace specific :\n");
+printf("scaling_setspeed : %u\n",
+   p_cpufreq->u.s.u.userspace.scaling_setspeed);
+}
+else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+   "ondemand", CPUFREQ_NAME_LEN) )
+{
+printf("  ondemand specific  :\n");
+printf("sampling_rate: max [%u] min [%u] cur [%u]\n",
+   p_cpufreq->u.s.u.ondemand.sampling_rate_max,
+   p_cpufreq->u.s.u.ondemand.sampling_rate_min,
+   p_cpufreq->u.s.u.ondemand.sampling_rate);
+printf("up_threshold : %u\n",
+   p_cpufreq->u.s.u.ondemand.up_threshold);
+}
+
+printf("scaling_avail_freq   :");
+for ( i = 0; i < p_cpufreq->freq_num; i++ )
+if ( p_cpufreq->scaling_available_frequencies[i] ==
+ p_cpufreq->u.s.scaling_cur_freq )
+printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
+else
+printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
+printf("\n");
 
-printf("scaling frequency: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->u.s.scaling_max_freq,
-   p_cpufreq->u.s.scaling_min_freq,
-   p_cpufreq->u.s.scaling_cur_freq);
+

[PATCH v4 07/15] xen/x86: Tweak PDC bits when using HWP

2023-06-14 Thread Jason Andryuk
Qubes testing of HWP support had a report of a laptop, Thinkpad X1
Carbon Gen 4 with a Skylake processor, locking up during boot when HWP
is enabled.  A user found a kernel bug that seems to be the same issue:
https://bugzilla.kernel.org/show_bug.cgi?id=110941.

That bug was fixed by Linux commit a21211672c9a ("ACPI / processor:
Request native thermal interrupt handling via _OSC").  The tl;dr is SMM
crashes when it receives thermal interrupts, so Linux calls the ACPI
_OSC method to take over interrupt handling.

The Linux fix looks at the CPU features to decide whether or not to call
_OSC with bit 12 set to take over native interrupt handling.  Xen needs
some way to communicate HWP to Dom0 for making an equivalent call.

Xen exposes modified PDC bits via the platform_op set_pminfo hypercall.
Expand that to set bit 12 when HWP is present and in use.

Any generated interrupt would be handled by Xen's thermal driver, which
clears the status.

Bit 12 isn't named in the linux header and is open coded in Linux's
usage.

This will need a corresponding linux patch to pick up and apply the PDC
bits.

Signed-off-by: Jason Andryuk 
Reviewed-by: Jan Beulich 
---
v4:
Added __ro_after_init
s/ACPI_PDC_CPPC_NTV_INT/ACPI_PDC_CPPC_NATIVE_INTR/
Remove _IA32_
Fixup for opt_cpufreq_hwp removal
Add Jan Reviewed-by

v3:
New
---
 xen/arch/x86/acpi/cpufreq/hwp.c   | 16 +++-
 xen/arch/x86/acpi/lib.c   |  5 +
 xen/arch/x86/cpu/mcheck/mce_intel.c   |  6 ++
 xen/arch/x86/include/asm/msr-index.h  |  1 +
 xen/include/acpi/cpufreq/processor_perf.h |  1 +
 xen/include/acpi/pdc_intel.h  |  1 +
 6 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index c62345dde7..5f210b54ff 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -13,7 +13,8 @@
 #include 
 #include 
 
-static bool __ro_after_init feature_hwp;
+static bool __ro_after_init hwp_in_use;
+
 static bool __ro_after_init feature_hwp_notification;
 static bool __ro_after_init feature_hwp_activity_window;
 
@@ -168,6 +169,11 @@ static int __init cf_check cpufreq_gov_hwp_init(void)
 }
 __initcall(cpufreq_gov_hwp_init);
 
+bool hwp_active(void)
+{
+return hwp_in_use;
+}
+
 bool __init hwp_available(void)
 {
 unsigned int eax;
@@ -211,7 +217,6 @@ bool __init hwp_available(void)
 return false;
 }
 
-feature_hwp = eax & CPUID6_EAX_HWP;
 feature_hwp_notification= eax & CPUID6_EAX_HWP_NOTIFICATION;
 feature_hwp_activity_window = eax & CPUID6_EAX_HWP_ACTIVITY_WINDOW;
 feature_hdc = eax & CPUID6_EAX_HDC;
@@ -224,12 +229,13 @@ bool __init hwp_available(void)
 hwp_verbose("HW_FEEDBACK %ssupported\n",
 (eax & CPUID6_EAX_HW_FEEDBACK) ? "" : "not ");
 
-cpufreq_governor_internal = feature_hwp;
+hwp_in_use = eax & CPUID6_EAX_HWP;
+cpufreq_governor_internal = hwp_in_use;
 
-if ( feature_hwp )
+if ( hwp_in_use )
 hwp_info("Using HWP for cpufreq\n");
 
-return feature_hwp;
+return hwp_in_use;
 }
 
 static int hdc_set_pkg_hdc_ctl(unsigned int cpu, bool val)
diff --git a/xen/arch/x86/acpi/lib.c b/xen/arch/x86/acpi/lib.c
index 43831b92d1..1b4710a790 100644
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+#include 
+
 u32 __read_mostly acpi_smi_cmd;
 u8 __read_mostly acpi_enable_value;
 u8 __read_mostly acpi_disable_value;
@@ -140,5 +142,8 @@ int arch_acpi_set_pdc_bits(u32 acpi_id, u32 *pdc, u32 mask)
!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
pdc[2] &= ~(ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH);
 
+   if (hwp_active())
+   pdc[2] |= ACPI_PDC_CPPC_NATIVE_INTR;
+
return 0;
 }
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c 
b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 2f23f02923..c95152ad85 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 #include 
+
+#include 
+
 #include "mce.h"
 #include "x86_mca.h"
 #include "barrier.h"
@@ -64,6 +67,9 @@ static void cf_check intel_thermal_interrupt(struct 
cpu_user_regs *regs)
 
 ack_APIC_irq();
 
+if ( hwp_active() )
+wrmsr_safe(MSR_HWP_STATUS, 0);
+
 if ( NOW() < per_cpu(next, cpu) )
 return;
 
diff --git a/xen/arch/x86/include/asm/msr-index.h 
b/xen/arch/x86/include/asm/msr-index.h
index 47b09a24b5..351745f6bc 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,7 @@
 #define MSR_HWP_CAPABILITIES0x0771
 #define MSR_HWP_INTERRUPT   0x0773
 #define MSR_HWP_REQUEST 0x0774
+#define MSR_HWP_STATUS  0x0777
 
 #define MSR_X2APIC_FIRST0x0800
 #define MSR_X2APIC_LAST 0x08ff
diff --git 

[PATCH v4 02/15] cpufreq: Add perf_freq to cpuinfo

2023-06-14 Thread Jason Andryuk
acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
needs to scale by base frequency.  Setting max_freq to base_freq
"works" but the code is not obvious, and returning values to userspace
is tricky.  Add an additional perf_freq member which is used for scaling
aperf/mperf measurements.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v3:
Add Jan's Ack

I don't like this, but it seems the best way to re-use the common
aperf/mperf code.  The other option would be to add wrappers that then
do the acpi vs. hwp scaling.
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 2 +-
 xen/drivers/cpufreq/utility.c   | 1 +
 xen/include/acpi/cpufreq/cpufreq.h  | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c 
b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 2e0067fbe5..6c70d04395 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -316,7 +316,7 @@ unsigned int get_measured_perf(unsigned int cpu, unsigned 
int flag)
 else
 perf_percent = 0;
 
-return policy->cpuinfo.max_freq * perf_percent / 100;
+return policy->cpuinfo.perf_freq * perf_percent / 100;
 }
 
 static unsigned int cf_check get_cur_freq_on_cpu(unsigned int cpu)
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 9eb7ecedcd..6831f62851 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -236,6 +236,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy 
*policy,
 
 policy->min = policy->cpuinfo.min_freq = min_freq;
 policy->max = policy->cpuinfo.max_freq = max_freq;
+policy->cpuinfo.perf_freq = max_freq;
 policy->cpuinfo.second_max_freq = second_max_freq;
 
 if (policy->min == ~0)
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index 1c0872506a..e2e03b8bd7 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -37,6 +37,9 @@ extern struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS];
 struct cpufreq_cpuinfo {
 unsigned intmax_freq;
 unsigned intsecond_max_freq;/* P1 if Turbo Mode is on */
+unsigned intperf_freq; /* Scaling freq for aperf/mpref.
+  acpi-cpufreq uses max_freq, but HWP uses
+  base_freq.*/
 unsigned intmin_freq;
 unsigned inttransition_latency; /* in 10^(-9) s = nanoseconds */
 };
-- 
2.40.1




[PATCH v4 04/15] xen/sysctl: Nest cpufreq scaling options

2023-06-14 Thread Jason Andryuk
Add a union and struct so that most of the scaling variables of struct
xen_get_cpufreq_para are within a binary-compatible layout.  This
allows cppc_para to live in the larger union and use uint32_ts - struct
xen_cppc_para will be 10 uint32_t's.

The new scaling struct is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 *
uint32_t for xen_ondemand = 11 uint32_t.  That means the old size is
retained, int32_t turbo_enabled doesn't move and it's binary compatible.
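
(Worked out, as a sanity check rather than part of the change:
scaling_cur_freq, scaling_max_freq and scaling_min_freq are 3 uint32_t,
scaling_governor is CPUFREQ_NAME_LEN = 16 bytes = 4 uint32_t, and the
governor union is sized by its ondemand member at 4 uint32_t, so
3 + 4 + 4 = 11 uint32_t either way.)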

Signed-off-by: Jason Andryuk 
---
 tools/include/xenctrl.h | 22 +-
 tools/libs/ctrl/xc_pm.c |  5 -
 tools/misc/xenpm.c  | 24 
 xen/drivers/acpi/pmstat.c   | 27 ++-
 xen/include/public/sysctl.h | 22 +-
 5 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index dba33d5d0f..8aedb952a0 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1909,16 +1909,20 @@ struct xc_get_cpufreq_para {
 uint32_t cpuinfo_cur_freq;
 uint32_t cpuinfo_max_freq;
 uint32_t cpuinfo_min_freq;
-uint32_t scaling_cur_freq;
-
-char scaling_governor[CPUFREQ_NAME_LEN];
-uint32_t scaling_max_freq;
-uint32_t scaling_min_freq;
-
-/* for specific governor */
 union {
-xc_userspace_t userspace;
-xc_ondemand_t ondemand;
+struct {
+uint32_t scaling_cur_freq;
+
+char scaling_governor[CPUFREQ_NAME_LEN];
+uint32_t scaling_max_freq;
+uint32_t scaling_min_freq;
+
+/* for specific governor */
+union {
+xc_userspace_t userspace;
+xc_ondemand_t ondemand;
+} u;
+} s;
 } u;
 
 int32_t turbo_enabled;
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index c3a9864bf7..f92542eaf7 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -265,15 +265,10 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 user_para->cpuinfo_cur_freq = sys_para->cpuinfo_cur_freq;
 user_para->cpuinfo_max_freq = sys_para->cpuinfo_max_freq;
 user_para->cpuinfo_min_freq = sys_para->cpuinfo_min_freq;
-user_para->scaling_cur_freq = sys_para->scaling_cur_freq;
-user_para->scaling_max_freq = sys_para->scaling_max_freq;
-user_para->scaling_min_freq = sys_para->scaling_min_freq;
 user_para->turbo_enabled= sys_para->turbo_enabled;
 
 memcpy(user_para->scaling_driver,
 sys_para->scaling_driver, CPUFREQ_NAME_LEN);
-memcpy(user_para->scaling_governor,
-sys_para->scaling_governor, CPUFREQ_NAME_LEN);
 
 /* copy to user_para no matter what cpufreq governor */
 BUILD_BUG_ON(sizeof(((struct xc_get_cpufreq_para *)0)->u) !=
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 1bb6187e56..ee8ce5d5f2 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -730,39 +730,39 @@ static void print_cpufreq_para(int cpuid, struct 
xc_get_cpufreq_para *p_cpufreq)
 printf("scaling_avail_gov: %s\n",
p_cpufreq->scaling_available_governors);
 
-printf("current_governor : %s\n", p_cpufreq->scaling_governor);
-if ( !strncmp(p_cpufreq->scaling_governor,
+printf("current_governor : %s\n", p_cpufreq->u.s.scaling_governor);
+if ( !strncmp(p_cpufreq->u.s.scaling_governor,
   "userspace", CPUFREQ_NAME_LEN) )
 {
 printf("  userspace specific :\n");
 printf("scaling_setspeed : %u\n",
-   p_cpufreq->u.userspace.scaling_setspeed);
+   p_cpufreq->u.s.u.userspace.scaling_setspeed);
 }
-else if ( !strncmp(p_cpufreq->scaling_governor,
+else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
"ondemand", CPUFREQ_NAME_LEN) )
 {
 printf("  ondemand specific  :\n");
 printf("sampling_rate: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->u.ondemand.sampling_rate_max,
-   p_cpufreq->u.ondemand.sampling_rate_min,
-   p_cpufreq->u.ondemand.sampling_rate);
+   p_cpufreq->u.s.u.ondemand.sampling_rate_max,
+   p_cpufreq->u.s.u.ondemand.sampling_rate_min,
+   p_cpufreq->u.s.u.ondemand.sampling_rate);
 printf("up_threshold : %u\n",
-   p_cpufreq->u.ondemand.up_threshold);
+   p_cpufreq->u.s.u.ondemand.up_threshold);
 }
 
 printf("scaling_avail_freq   :");
 for ( i = 0; i < p_cpufreq->freq_num; i++ )
 if ( p_cpufreq->scaling_available_frequencies[i] ==
- p_cpufreq->scaling_cur_freq )
+ p_cpufreq->u.s.scaling_cur_freq )
 printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
 else
 printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
 printf("\n");
 
 printf("scaling frequency: max 

[PATCH v4 05/15] pmstat: Re-arrange for cpufreq union

2023-06-14 Thread Jason Andryuk
Move some code around now that common xen_sysctl_pm_op get_para fields
are together.  In particular, the scaling governor information like
scaling_available_governors is inside the union, so it is not always
available.

With that, gov_num may be 0, so bounce buffer handling needs
to be modified.

scaling_governor won't be filled for hwp, so this will simplify the
change when it is introduced.

Signed-off-by: Jason Andryuk 
---
 tools/libs/ctrl/xc_pm.c   | 12 
 tools/misc/xenpm.c|  3 ++-
 xen/drivers/acpi/pmstat.c | 32 +---
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index f92542eaf7..19fe1a79dd 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -221,7 +221,7 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 {
 if ( (!user_para->affected_cpus)||
  (!user_para->scaling_available_frequencies)||
- (!user_para->scaling_available_governors) )
+ (user_para->gov_num && !user_para->scaling_available_governors) )
 {
 errno = EINVAL;
 return -1;
@@ -230,12 +230,15 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 goto unlock_1;
 if ( xc_hypercall_bounce_pre(xch, scaling_available_frequencies) )
 goto unlock_2;
-if ( xc_hypercall_bounce_pre(xch, scaling_available_governors) )
+if ( user_para->gov_num &&
+ xc_hypercall_bounce_pre(xch, scaling_available_governors) )
 goto unlock_3;
 
 set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
 set_xen_guest_handle(sys_para->scaling_available_frequencies, 
scaling_available_frequencies);
-set_xen_guest_handle(sys_para->scaling_available_governors, 
scaling_available_governors);
+if ( user_para->gov_num )
+set_xen_guest_handle(sys_para->scaling_available_governors,
+ scaling_available_governors);
 }
 
 sysctl.cmd = XEN_SYSCTL_pm_op;
@@ -278,7 +281,8 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 }
 
 unlock_4:
-xc_hypercall_bounce_post(xch, scaling_available_governors);
+if ( user_para->gov_num )
+xc_hypercall_bounce_post(xch, scaling_available_governors);
 unlock_3:
 xc_hypercall_bounce_post(xch, scaling_available_frequencies);
 unlock_2:
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index ee8ce5d5f2..1c474c3b59 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -811,7 +811,8 @@ static int show_cpufreq_para_by_cpuid(xc_interface 
*xc_handle, int cpuid)
 ret = -ENOMEM;
 goto out;
 }
-if (!(p_cpufreq->scaling_available_governors =
+if (p_cpufreq->gov_num &&
+!(p_cpufreq->scaling_available_governors =
   malloc(p_cpufreq->gov_num * CPUFREQ_NAME_LEN * sizeof(char
 {
 fprintf(stderr,
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index f5a9ac3f1a..57359c21d8 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -239,11 +239,24 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 if ( ret )
 return ret;
 
+op->u.get_para.cpuinfo_cur_freq =
+cpufreq_driver.get ? cpufreq_driver.get(op->cpuid) : policy->cur;
+op->u.get_para.cpuinfo_max_freq = policy->cpuinfo.max_freq;
+op->u.get_para.cpuinfo_min_freq = policy->cpuinfo.min_freq;
+op->u.get_para.turbo_enabled = cpufreq_get_turbo_status(op->cpuid);
+
+if ( cpufreq_driver.name[0] )
+strlcpy(op->u.get_para.scaling_driver,
+cpufreq_driver.name, CPUFREQ_NAME_LEN);
+else
+strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
+
 if ( !(scaling_available_governors =
xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
 return -ENOMEM;
-if ( (ret = read_scaling_available_governors(scaling_available_governors,
-gov_num * CPUFREQ_NAME_LEN * sizeof(char))) )
+if ( (ret = read_scaling_available_governors(
+scaling_available_governors,
+gov_num * CPUFREQ_NAME_LEN * sizeof(char))) )
 {
 xfree(scaling_available_governors);
 return ret;
@@ -254,26 +267,16 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 if ( ret )
 return ret;
 
-op->u.get_para.cpuinfo_cur_freq =
-cpufreq_driver.get ? cpufreq_driver.get(op->cpuid) : policy->cur;
-op->u.get_para.cpuinfo_max_freq = policy->cpuinfo.max_freq;
-op->u.get_para.cpuinfo_min_freq = policy->cpuinfo.min_freq;
-
 op->u.get_para.u.s.scaling_cur_freq = policy->cur;
 op->u.get_para.u.s.scaling_max_freq = policy->max;
 op->u.get_para.u.s.scaling_min_freq = policy->min;
 
-if ( cpufreq_driver.name[0] )
-strlcpy(op->u.get_para.scaling_driver,
-

[PATCH v4 00/15] Intel Hardware P-States (HWP) support

2023-06-14 Thread Jason Andryuk
Hi,

This patch series adds Hardware-Controlled Performance States (HWP) for
Intel processors to Xen.

v2 was only partially reviewed, so v3 is mostly a reposting of v2.  In v2 &
v3, I think I addressed all comments for v1.  I kept patch 11 "xenpm:
Factor out a non-fatal cpuid_parse variant", with a v2 comment
explaining why I keep it.

v3 adds "xen/x86: Tweak PDC bits when using HWP".  Qubes testing revealed
an issue where enabling HWP can crash firwmare code (maybe SMM).  This
requires a Linux change to get the PDC bits from Xen and pass them to
ACPI.  Roger has a patch [0] to set the PDC bits.  Roger's 3 patch
series was tested with "xen/x86: Tweak PDC bits when using HWP" on
affected hardware and allowed proper operation.

v4:
There is a large amount of renaming from HWP/hwp to CPPC/cppc in the series.
The driver remains hwp_ prefixed since it is dealing with the hardware
interface.  The sysctl, xc and xenpm interfaces were renamed to cppc to
be the generic ACPI CPPC (Collaborative Processor Performance Control)
interface.

struct xen_get_cpufreq_para was re-organized in a binary compatible
fashion to nest scaling governor options.  This allows the cppc support
to use uint32_t's for its parameters.

HWP is now enabled with a top-level cpufreq=hwp option.  It will
fall back to cpufreq=xen if hwp is unavailable.  This seems like the most
user-friendly option.  Since the user was trying to specify *some*
cpufreq, we should give them the best that we can instead of disabling
the functionality.

"xenpm: Factor out a non-fatal cpuid_parse variant" was dropped.
set-cpufreq-cppc expects either a cpu number or none specified, which
implies all.

Some patches were re-arranged - "xen/x86: Tweak PDC bits when using HWP"
now comes immediately after "cpufreq: Add Hardware P-State (HWP) driver"

The implementation of "cpufreq: Allow restricting to internal governors
only " changed, so I removed Jan's Ack.

Previous cover letter:

With HWP, the processor makes its own determinations for frequency
selection, though users can set some parameters and preferences.  There
is also Turbo Boost which dynamically pushes the max frequency if
possible.

The existing governors don't work with HWP since they select frequencies
and HWP doesn't expose those.  Therefore a dummy hwp-internal governor is
used that doesn't do anything.

xenpm get-cpufreq-para is extended to show HWP parameters, and
set-cpufreq-cppc is added to set them.
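
For instance (illustrative only - the exact argument syntax is defined by
the xenpm patches later in this series, and the CPU number / preset below
are just placeholders):

  # xenpm get-cpufreq-para 0          # also reports the HWP/CPPC parameters
  # xenpm set-cpufreq-cppc 0 balance  # request the 'balance' energy/perf preset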

A lightly loaded OpenXT laptop showed ~1W power savings according to
powertop.  A mostly idle Fedora system (dom0 only) showed a more modest
power savings.

This is for a 10th gen 6-core 1600 MHz base 4900 MHz max cpu.  In the
default balance mode, Turbo Boost doesn't exceed 4GHz.  Tweaking the
energy_perf preference with `xenpm set-cpufreq-para balance ene:64`,
I've seen the CPU hit 4.7GHz before throttling down and bouncing around
between 4.3 and 4.5 GHz.  Curiously the other cores read ~4GHz when
turbo boost takes effect.  This was done after pinning all dom0 cores,
and using taskset to pin to vCPU/pCPU 11 and running a bash tightloop.

HWP defaults to disabled and running with the existing HWP configuration
- it doesn't reconfigure by default.  It can be enabled with
cpufreq=hwp.

Hardware Duty Cycling (HDC) is another feature to autonomously power down
things.  It defaults to enabled when HWP is enabled, but HDC can be
disabled on the command line.  cpufreq=xen:hwp,no-hdc
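
As an illustration of how those options end up on the Xen command line
(the GRUB variable name below is just one common arrangement and varies
by distro/bootloader):

  # /etc/default/grub
  GRUB_CMDLINE_XEN_DEFAULT="cpufreq=hwp"
  # or, with HWP enabled but Hardware Duty Cycling turned off:
  # GRUB_CMDLINE_XEN_DEFAULT="cpufreq=xen:hwp,no-hdc"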

I've only tested on 8th gen and 10th gen systems with activity window
and energy_perf support.  So the paths for CPUs lacking those features
are untested.

Fast MSR support was removed in v2.  The model specific checking was not
done properly, and I don't have hardware to test with.  Since writes are
expected to be infrequent, I just removed the code.

This changes the sysctl_pm_op hypercall, so that wants review.

Regards,
Jason

[0] 
https://lore.kernel.org/xen-devel/20221121102113.41893-3-roger@citrix.com/

Jason Andryuk (15):
  cpufreq: Allow restricting to internal governors only
  cpufreq: Add perf_freq to cpuinfo
  cpufreq: Export intel_feature_detect
  xen/sysctl: Nest cpufreq scaling options
  pmstat: Re-arrange for cpufreq union
  cpufreq: Add Hardware P-State (HWP) driver
  xen/x86: Tweak PDC bits when using HWP
  xenpm: Change get-cpufreq-para output for hwp
  cpufreq: Export HWP parameters to userspace as CPPC
  libxc: Include cppc_para in definitions
  xenpm: Print HWP/CPPC parameters
  xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
  libxc: Add xc_set_cpufreq_cppc
  xenpm: Add set-cpufreq-cppc subcommand
  CHANGELOG: Add Intel HWP entry

 CHANGELOG.md  |   2 +-
 docs/misc/xen-command-line.pandoc |   9 +-
 tools/include/xenctrl.h   |  28 +-
 tools/libs/ctrl/xc_pm.c   |  35 +-
 tools/misc/xenpm.c| 385 +++--
 xen/arch/x86/acpi/cpufreq/Makefile|   1 +
 xen/arch/x86/acpi/cpufreq/cpufreq.c   |  15 +-
 

[PATCH v4 03/15] cpufreq: Export intel_feature_detect

2023-06-14 Thread Jason Andryuk
Export feature_detect as intel_feature_detect so it can be re-used by
HWP.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v4:
Add Jan's Ack

v3:
Remove void * cast when calling intel_feature_detect

v2:
export intel_feature_detect with typed pointer
Move intel_feature_detect to acpi/cpufreq/cpufreq.h since the
declaration now contains struct cpufreq_policy *.
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 8 ++--
 xen/include/acpi/cpufreq/cpufreq.h  | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c 
b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 6c70d04395..f1cc473b4f 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -339,9 +339,8 @@ static unsigned int cf_check get_cur_freq_on_cpu(unsigned 
int cpu)
 return extract_freq(get_cur_val(cpumask_of(cpu)), data);
 }
 
-static void cf_check feature_detect(void *info)
+void intel_feature_detect(struct cpufreq_policy *policy)
 {
-struct cpufreq_policy *policy = info;
 unsigned int eax;
 
 eax = cpuid_eax(6);
@@ -353,6 +352,11 @@ static void cf_check feature_detect(void *info)
 }
 }
 
+static void cf_check feature_detect(void *info)
+{
+intel_feature_detect(info);
+}
+
 static unsigned int check_freqs(const cpumask_t *mask, unsigned int freq,
 struct acpi_cpufreq_data *data)
 {
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index e2e03b8bd7..a49efd1cb2 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -244,4 +244,6 @@ int write_userspace_scaling_setspeed(unsigned int cpu, 
unsigned int freq);
 void cpufreq_dbs_timer_suspend(void);
 void cpufreq_dbs_timer_resume(void);
 
+void intel_feature_detect(struct cpufreq_policy *policy);
+
 #endif /* __XEN_CPUFREQ_PM_H__ */
-- 
2.40.1




[PATCH v4 01/15] cpufreq: Allow restricting to internal governors only

2023-06-14 Thread Jason Andryuk
For hwp, the standard governors are not usable, and only the internal
one is applicable.  Add the cpufreq_governor_internal boolean to
indicate when an internal governor, like hwp, will be used.
This is set during presmp_initcall, so that it can suppress governor
registration during initcall.  Add an internal flag to struct
cpufreq_governor to indicate such governors.

This way, the unusable governors are not registered, so the internal
one is the only one returned to userspace.  This means incompatible
governors won't be advertised to userspace.

Signed-off-by: Jason Andryuk 
---
v4:
Rework to use an internal flag
Removed Jan's Ack since the approach is different.

v3:
Switch to initdata
Add Jan Acked-by
Commit message s/they/the/ typo
Don't register hwp-internal when running non-hwp - Marek

v2:
Switch to "-internal"
Add blank line in header
---
 xen/drivers/cpufreq/cpufreq.c  | 7 +++
 xen/include/acpi/cpufreq/cpufreq.h | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 2321c7dd07..cccf9a64c8 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -56,6 +56,7 @@ struct cpufreq_dom {
 };
 static LIST_HEAD_READ_MOSTLY(cpufreq_dom_list_head);
 
+bool __initdata cpufreq_governor_internal;
 struct cpufreq_governor *__read_mostly cpufreq_opt_governor;
 LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
 
@@ -121,6 +122,12 @@ int __init cpufreq_register_governor(struct 
cpufreq_governor *governor)
 if (!governor)
 return -EINVAL;
 
+if (cpufreq_governor_internal && !governor->internal)
+return -EINVAL;
+
+if (!cpufreq_governor_internal && governor->internal)
+return -EINVAL;
+
 if (__find_governor(governor->name) != NULL)
 return -EEXIST;
 
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index 35dcf21e8f..1c0872506a 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -106,6 +106,7 @@ struct cpufreq_governor {
 unsigned int event);
 bool_t  (*handle_option)(const char *name, const char *value);
 struct list_head governor_list;
+boolinternal;
 };
 
 extern struct cpufreq_governor *cpufreq_opt_governor;
@@ -114,6 +115,8 @@ extern struct cpufreq_governor cpufreq_gov_userspace;
 extern struct cpufreq_governor cpufreq_gov_performance;
 extern struct cpufreq_governor cpufreq_gov_powersave;
 
+extern bool cpufreq_governor_internal;
+
 extern struct list_head cpufreq_governor_list;
 
 extern int cpufreq_register_governor(struct cpufreq_governor *governor);
-- 
2.40.1




Re: Functions _spin_lock_cb() and handle_ro_raz()

2023-06-14 Thread Julien Grall

(+ Bertrand and Stefano)

On 14/06/2023 14:08, Federico Serafini wrote:

Hello everyone,


Hi Federico,

Let me start with a tip to help reach the maintainers and get a
more timely answer. Xen-devel has a large volume of e-mails (still less
than Linux :)). So some of us will have filters to try to classify the
e-mails received.


Commonly, all the e-mails where the person is in the CC/To list will go 
to the inbox. All the others will go to a separate directory that may or
may not be watched. Personally, I tend to glance in that directory, but 
I would not read all of them.


So I would highly recommend CCing the maintainers/reviewers of the
specific component. You can find them in MAINTAINERS at the root of the
Xen repository. We also have scripts like scripts/get_maintainers.pl that
can help you find who to CC.


If you pass '-f <file>', it will output the maintainers of that file.
You can also use the script with a patch to find all the maintainers to CC.
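
For example (the file and patch names below are just placeholders, and the
exact invocation may vary slightly):

  $ ./scripts/get_maintainers.pl -f xen/common/spinlock.c
  $ ./scripts/get_maintainers.pl 0001-my-change.patch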


Now back to the subject of the e-mail.



I am working on the violations of MISRA C:2012 Rule 8.10,
whose headline says:
"An inline function shall be declared with the static storage class".

For both ARM64 and X86_64 builds,
function _spin_lock_cb() defined in spinlock.c violates the rule.
Such function is declared in spinlock.h without
the inline function specifier: are there any reasons to do this?
What about solving the violation by moving the function definition in
spinlock.h and declaring it as static inline?


Jan answered it and sent a patch. So I will skip the reply for this one.



The same happens also for the function handle_ro_raz() in the ARM64
build, declared in traps.h and defined in traps.c.


I looked at the history and it is not clear to me why the 'inline' was 
added at first. That said, I don't see any value in asking the compiler to
inline (which it would be free to ignore) the function.


So I would suggest to send a patch to remove the 'inline'.

Best regards,

--
Julien Grall



[xen-unstable test] 181423: regressions - FAIL

2023-06-14 Thread osstest service owner
flight 181423 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181423/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 6 kernel-build fail REGR. vs. 181415

Tests which are failing intermittently (not blocking):
 test-amd64-i386-livepatch 7 xen-install  fail in 181415 pass in 181423
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail in 181415 pass in 
181423
 test-amd64-amd64-xl-qcow221 guest-start/debian.repeat  fail pass in 181415

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail blocked in 181415
 test-amd64-amd64-xl-qcow222 guest-start.2 fail in 181415 blocked in 181423
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail in 181415 never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail in 181415 never 
pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail in 181415 never 
pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail in 181415 never 
pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail in 181415 never 
pass
 test-arm64-arm64-xl 15 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail in 181415 never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail in 181415 never 
pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail in 181415 never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail in 181415 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 181415
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 181415
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 181415
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 181415
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 181415
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 181415
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 
181415
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 181415
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 181415
 test-amd64-i386-xl-vhd   21 guest-start/debian.repeatfail  like 181415
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 181415
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 181415
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 181415
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 

Re: [PATCH v3 3/4] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-14 Thread Andrew Cooper
On 13/06/2023 10:30 am, Jan Beulich wrote:
> On 12.06.2023 18:13, Andrew Cooper wrote:
>> @@ -593,15 +596,93 @@ static bool __init retpoline_calculations(void)
>>  return false;
>>  
>>  /*
>> - * RSBA may be set by a hypervisor to indicate that we may move to a
>> - * processor which isn't retpoline-safe.
>> + * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>> + * agreed upon meaning at the time of writing (May 2023) is thus:
>> + *
>> + * - RSBA (RSB Alternative) means that an RSB may fall back to an
>> + *   alternative predictor on underflow.  Skylake uarch and later all 
>> have
>> + *   this property.  Broadwell too, when running microcode versions 
>> prior
>> + *   to Jan 2018.
>> + *
>> + * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>> + *   tagging of predictions with the mode in which they were learned.  
>> So
>> + *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>> + *
>> + * - CPUs are not expected to enumerate both RSBA and RRSBA.
>> + *
>> + * Some parts (Broadwell) are not expected to ever enumerate this
>> + * behaviour directly.  Other parts have differing enumeration with
>> + * microcode version.  Fix up Xen's idea, so we can advertise them 
>> safely
>> + * to guests, and so toolstacks can level a VM safety for migration.
>> + *
>> + * The following states exist:
>> + *
>> + * |   | RSBA | EIBRS | RRSBA | Notes  | Action|
>> + * |---+--+---+---++---|
>> + * | 1 |0 | 0 | 0 | OK (older parts)   | Maybe +RSBA   |
>> + * | 2 |0 | 0 | 1 | Broken | +RSBA, -RRSBA |
>> + * | 3 |0 | 1 | 0 | OK (pre-Aug ucode) | +RRSBA|
>> + * | 4 |0 | 1 | 1 | OK |   |
>> + * | 5 |1 | 0 | 0 | OK |   |
>> + * | 6 |1 | 0 | 1 | Broken | -RRSBA|
>> + * | 7 |1 | 1 | 0 | Broken | -RSBA, +RRSBA |
>> + * | 8 |1 | 1 | 1 | Broken | -RSBA |
> You've kept the Action column as you had it originally, despite no longer
> applying all the fixups. Wouldn't it make sense to mark those we don't do,
> e.g. by enclosing in parentheses?

Hmm, yes.  How does this look?

|   | RSBA | EIBRS | RRSBA | Notes              | Action (in principle) |
|---+------+-------+-------+--------------------+-----------------------|
| 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA           |
| 2 |    0 |     0 |     1 | Broken             | (+RSBA, -RRSBA)       |
| 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA                |
| 4 |    0 |     1 |     1 | OK                 |                       |
| 5 |    1 |     0 |     0 | OK                 |                       |
| 6 |    1 |     0 |     1 | Broken             | (-RRSBA)              |
| 7 |    1 |     1 |     0 | Broken             | (-RSBA, +RRSBA)       |
| 8 |    1 |     1 |     1 | Broken             | (-RSBA)               |


>> + * further investigation.
>> + */
>> +if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
>> +   : cpu_has_rrsba /* Rows 2, 6 */ )
>> +{
>> +printk(XENLOG_ERR
>> +   "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, 
>> EIBRS %u, RRSBA %u\n",
>> +   boot_cpu_data.x86, boot_cpu_data.x86_model,
>> +   boot_cpu_data.x86_mask, ucode_rev,
>> +   cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
> Perhaps with adjustments (as you deem them sensible)
> Reviewed-by: Jan Beulich 

Thanks.

~Andrew



Re: [PATCH v9 02/42] mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()

2023-06-14 Thread Edgecombe, Rick P
On Tue, 2023-06-13 at 19:00 +0200, David Hildenbrand wrote:
> On 13.06.23 18:19, Edgecombe, Rick P wrote:
> > On Tue, 2023-06-13 at 10:44 +0300, Mike Rapoport wrote:
> > > > Previous patches have done the first step, so next move the
> > > > callers
> > > > that
> > > > don't have a VMA to pte_mkwrite_novma(). Also do the same for
> > > 
> > > I hear x86 maintainers asking to drop "previous patches" ;-)
> > > 
> > > Maybe
> > > This is the second step of the conversion that moves the callers
> > > ...
> > 
> > Really? I've not heard that. Just a strong aversion to "this
> > patch".
> > I've got feedback to say "previous patches" and not "the last
> > patch" so
> > it doesn't get stale. I guess it could be "previous changes".
> 
> Talking about patches make sense when discussing literal patches sent
> to 
> the mailing list. In the git log, it's commit, and "future commits"
> or 
> "follow-up work".
> 
> Yes, we use "patches" all of the time in commit logs, especially when
> we 
>   include the cover letter in the commit message (as done frequently
> in 
> the -mm tree).

I think I'll switch over to talking about "changes". If you talk about
commits it doesn't make as much sense when they are still just patches.
Thanks.


Re: [PATCH v4 21/34] arm64: Convert various functions to use ptdescs

2023-06-14 Thread Catalin Marinas
On Mon, Jun 12, 2023 at 02:04:10PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Catalin Marinas 



[qemu-mainline test] 181428: regressions - FAIL

2023-06-14 Thread osstest service owner
flight 181428 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181428/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH v3 2/4] xen: Add files needed for minimal ppc64le build

2023-06-14 Thread Shawn Anastasio
On Wed Jun 14, 2023 at 10:51 AM CDT, Jan Beulich wrote:
> On 13.06.2023 16:50, Shawn Anastasio wrote:
> > --- /dev/null
> > +++ b/xen/arch/ppc/Makefile
> > @@ -0,0 +1,16 @@
> > +obj-$(CONFIG_PPC64) += ppc64/
> > +
> > +$(TARGET): $(TARGET)-syms
> > +   cp -f $< $@
> > +
> > +$(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds
> > +   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@
> > +   $(NM) -pa --format=sysv $(@D)/$(@F) \
> > +   | $(objtree)/tools/symbols --all-symbols --xensyms --sysv 
> > --sort \
> > +   >$(@D)/$(@F).map
>
> Elsewhere we recently switched these uses of $(@D)/$(@F) to just $@.
> Please can you do so here as well?

Sure, will fix in v4.

> > --- /dev/null
> > +++ b/xen/arch/ppc/arch.mk
> > @@ -0,0 +1,11 @@
> > +
> > +# Power-specific definitions
> > +
> > +ppc-march-$(CONFIG_POWER_ISA_2_07B) := power8
> > +ppc-march-$(CONFIG_POWER_ISA_3_00) := power9
> > +
> > +CFLAGS += -mcpu=$(ppc-march-y) -mstrict-align -mcmodel=large -mabi=elfv2 
> > -mno-altivec -mno-vsx
>
> Wouldn't it make sense to also pass -mlittle here, such that a tool
> chain defaulting to big-endian can still be used?

Good call. On this topic, I suppose I'll also add -m64 to allow 32-bit
toolchains to be used as well.

> > --- /dev/null
> > +++ b/xen/arch/ppc/ppc64/head.S
> > @@ -0,0 +1,27 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +
> > +.section .text.header, "ax", %progbits
> > +
> > +ENTRY(start)
> > +/*
> > + * Depending on how we were booted, the CPU could be running in either
> > + * Little Endian or Big Endian mode. The following trampoline from 
> > Linux
> > + * cleverly uses an instruction that encodes to a NOP if the CPU's
> > + * endianness matches the assumption of the assembler (LE, in our case)
> > + * or a branch to code that performs the endian switch in the other 
> > case.
> > + */
> > +tdi 0, 0, 0x48/* Reverse endian of b . + 8  */
> > +b $ + 44  /* Skip trampoline if endian is good  */
>
> If I get this right, $ and . are interchangable on Power? If not,
> then all is fine and there likely is a reason to use . in the
> comment but $ in the code. But if so, it would be nice if both
> could match, and I guess with other architectures in mind . would
> be preferable.

As hinted by the comment, this code was directly inherited from Linux
and I'm not sure why the original author chose '$' instead of '.'. That
said, as far as I can tell you are correct about the two being
interchangeable, and changing the $ to . results in the exact same
machine code.

I can go ahead and make the change for consistency in v4.

> > +DECL_SECTION(.bss) { /* BSS */
> > +__bss_start = .;
> > +*(.bss.stack_aligned)
> > +. = ALIGN(PAGE_SIZE);
> > +*(.bss.page_aligned)
>
> ... the one between the two .bss parts looks unmotivated. Within
> a section definition ALIGN() typically only makes sense when followed
> by a label definition, like ...

Correct me if I'm wrong, but wouldn't the ALIGN here serve to ensure
that the subsequent '.bss.page_aligned' section has the correct alignment
that its name implies?

> Jan

Thanks,
Shawn



[PATCH v2] xen/grant: Purge PIN_FAIL()

2023-06-14 Thread Andrew Cooper
The name PIN_FAIL() is poor; it's not used only for pinning failures.  More
importantly, it interferes with code legibility by hiding control flow.
Expand and drop it.

 * Drop redundant "rc = rc" assignment
 * Rework gnttab_copy_buf() to be simpler by dropping the rc variable

As a side effect, this fixes several violations of MISRA rule 2.1 (dead code -
the while() following a goto).

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Julien Grall 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 

v2:
 * Fix indentation.
 * Reword the commit message a little.
---
 xen/common/grant_table.c | 154 ---
 1 file changed, 111 insertions(+), 43 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index d87e58a53d86..89b7811c51c3 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -270,13 +270,6 @@ struct gnttab_unmap_common {
 #define GNTTAB_UNMAP_BATCH_SIZE 32
 
 
-#define PIN_FAIL(_lbl, _rc, _f, _a...)  \
-do {\
-gdprintk(XENLOG_WARNING, _f, ## _a );   \
-rc = (_rc); \
-goto _lbl;  \
-} while ( 0 )
-
 /*
  * Tracks a mapping of another domain's grant reference. Each domain has a
  * table of these, indexes into which are returned as a 'mapping handle'.
@@ -785,9 +778,13 @@ static int _set_status_v1(const grant_entry_header_t *shah,
 /* If not already pinned, check the grant domid and type. */
 if ( !act->pin && (((scombo.flags & mask) != GTF_permit_access) ||
(scombo.domid != ldomid)) )
-PIN_FAIL(done, GNTST_general_error,
+{
+gdprintk(XENLOG_WARNING,
  "Bad flags (%x) or dom (%d); expected d%d\n",
  scombo.flags, scombo.domid, ldomid);
+rc = GNTST_general_error;
+goto done;
+}
 
 new = scombo;
 new.flags |= GTF_reading;
@@ -796,8 +793,12 @@ static int _set_status_v1(const grant_entry_header_t *shah,
 {
 new.flags |= GTF_writing;
 if ( unlikely(scombo.flags & GTF_readonly) )
-PIN_FAIL(done, GNTST_general_error,
+{
+gdprintk(XENLOG_WARNING,
  "Attempt to write-pin a r/o grant entry\n");
+rc = GNTST_general_error;
+goto done;
+}
 }
 
 prev.raw = guest_cmpxchg(rd, raw_shah, scombo.raw, new.raw);
@@ -805,8 +806,11 @@ static int _set_status_v1(const grant_entry_header_t *shah,
 break;
 
 if ( retries++ == 4 )
-PIN_FAIL(done, GNTST_general_error,
- "Shared grant entry is unstable\n");
+{
+gdprintk(XENLOG_WARNING, "Shared grant entry is unstable\n");
+rc = GNTST_general_error;
+goto done;
+}
 
 scombo = prev;
 }
@@ -840,9 +844,13 @@ static int _set_status_v2(const grant_entry_header_t *shah,
  scombo.flags & mask) != GTF_permit_access) &&
(mapflag || ((scombo.flags & mask) != GTF_transitive))) ||
   (scombo.domid != ldomid)) )
-PIN_FAIL(done, GNTST_general_error,
+{
+gdprintk(XENLOG_WARNING,
  "Bad flags (%x) or dom (%d); expected d%d, flags %x\n",
  scombo.flags, scombo.domid, ldomid, mask);
+rc = GNTST_general_error;
+goto done;
+}
 
 if ( readonly )
 {
@@ -851,8 +859,12 @@ static int _set_status_v2(const grant_entry_header_t *shah,
 else
 {
 if ( unlikely(scombo.flags & GTF_readonly) )
-PIN_FAIL(done, GNTST_general_error,
+{
+gdprintk(XENLOG_WARNING,
  "Attempt to write-pin a r/o grant entry\n");
+rc = GNTST_general_error;
+goto done;
+}
 *status |= GTF_reading | GTF_writing;
 }
 
@@ -870,9 +882,11 @@ static int _set_status_v2(const grant_entry_header_t *shah,
  (!readonly && (scombo.flags & GTF_readonly)) )
 {
 gnttab_clear_flags(rd, GTF_writing | GTF_reading, status);
-PIN_FAIL(done, GNTST_general_error,
+gdprintk(XENLOG_WARNING,
  "Unstable flags (%x) or dom (%d); expected d%d (r/w: 
%d)\n",
  scombo.flags, scombo.domid, ldomid, !readonly);
+rc = GNTST_general_error;
+goto done;
 }
 }
 else
@@ -880,8 +894,9 @@ static int _set_status_v2(const grant_entry_header_t *shah,
 if ( unlikely(scombo.flags & GTF_readonly) )
 {
 gnttab_clear_flags(rd, GTF_writing, status);
-PIN_FAIL(done, GNTST_general_error,
- "Unstable grant readonly 

Re: Functions _spin_lock_cb() and handle_ro_raz()

2023-06-14 Thread Federico Serafini



On 14/06/23 16:03, Jan Beulich wrote:

On 14.06.2023 15:08, Federico Serafini wrote:

Hello everyone,

I am working on the violations of MISRA C:2012 Rule 8.10,
whose headline says:
"An inline function shall be declared with the static storage class".

For both ARM64 and X86_64 builds,
function _spin_lock_cb() defined in spinlock.c violates the rule.
Such function is declared in spinlock.h without
the inline function specifier: are there any reasons to do this?

Since this function was mentioned elsewhere already, I'm afraid I
have to be a little blunt and ask back: Did you check the history
of the function? Yes, it is intentional to be that way, for the
function to be inlined into _spin_lock(), and for it to also be
available for external callers (we have just one right now, but
that could change).


What about solving the violation by moving the function definition in
spinlock.h and declaring it as static inline?

Did you try whether that would work at least purely mechanically?
I'm afraid you'll find that it doesn't, because of LOCK_PROFILE_*
being unavailable then. Yet we also don't want to expose all that
in the header.

In the earlier context I did suggest already to make the function
an always-inline one in spinlock.c, under a slightly altered name,
and then have _spin_lock_cb() be a trivial wrapper just like
_spin_lock() is. I guess best is going to be if I make and post a
patch ...
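
Roughly, just to sketch the shape (names here are illustrative and the
actual patch may well differ in detail):

    /* spinlock.c */
    static always_inline void spin_lock_common(spinlock_t *lock,
                                               void (*cb)(void *), void *data)
    {
        /* current _spin_lock_cb() body, including the LOCK_PROFILE_* bits */
    }

    void _spin_lock(spinlock_t *lock)
    {
        spin_lock_common(lock, NULL, NULL);
    }

    void _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
    {
        spin_lock_common(lock, cb, data);
    }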

Jan

Thank you for the information.

Federico



[PATCH] Arm: drop bogus ALIGN() from linker script

2023-06-14 Thread Jan Beulich
Having ALIGN() inside a section definition usually makes sense only with
a label definition following (an exception case is a few lines out of
context, where cache line sharing is intended to be avoided).
Constituents of .bss.page_aligned need to specify their own alignment
correctly anyway, or else they're susceptible to link order changing.
This requirement is already met: Arm-specific code has no such object,
while common (EFI) code has another one. That one has suitable alignment
specified.

Signed-off-by: Jan Beulich 
---
Note how RISC-V had this dropped pretty recently.

--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -199,7 +199,6 @@ SECTIONS
   .bss : { /* BSS */
__bss_start = .;
*(.bss.stack_aligned)
-   . = ALIGN(PAGE_SIZE);
*(.bss.page_aligned)
. = ALIGN(PAGE_SIZE);
__per_cpu_start = .;



Re: [PATCH v3 2/4] xen: Add files needed for minimal ppc64le build

2023-06-14 Thread Jan Beulich
On 13.06.2023 16:50, Shawn Anastasio wrote:
> --- /dev/null
> +++ b/xen/arch/ppc/Makefile
> @@ -0,0 +1,16 @@
> +obj-$(CONFIG_PPC64) += ppc64/
> +
> +$(TARGET): $(TARGET)-syms
> + cp -f $< $@
> +
> +$(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds
> + $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@
> + $(NM) -pa --format=sysv $(@D)/$(@F) \
> + | $(objtree)/tools/symbols --all-symbols --xensyms --sysv 
> --sort \
> + >$(@D)/$(@F).map

Elsewhere we recently switched these uses of $(@D)/$(@F) to just $@.
Please can you do so here as well?

> --- /dev/null
> +++ b/xen/arch/ppc/arch.mk
> @@ -0,0 +1,11 @@
> +
> +# Power-specific definitions
> +
> +ppc-march-$(CONFIG_POWER_ISA_2_07B) := power8
> +ppc-march-$(CONFIG_POWER_ISA_3_00) := power9
> +
> +CFLAGS += -mcpu=$(ppc-march-y) -mstrict-align -mcmodel=large -mabi=elfv2 
> -mno-altivec -mno-vsx

Wouldn't it make sense to also pass -mlittle here, such that a tool
chain defaulting to big-endian can still be used?

> --- /dev/null
> +++ b/xen/arch/ppc/ppc64/head.S
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +.section .text.header, "ax", %progbits
> +
> +ENTRY(start)
> +/*
> + * Depending on how we were booted, the CPU could be running in either
> + * Little Endian or Big Endian mode. The following trampoline from Linux
> + * cleverly uses an instruction that encodes to a NOP if the CPU's
> + * endianness matches the assumption of the assembler (LE, in our case)
> + * or a branch to code that performs the endian switch in the other case.
> + */
> +tdi 0, 0, 0x48/* Reverse endian of b . + 8  */
> +b $ + 44  /* Skip trampoline if endian is good  */

If I get this right, $ and . are interchangable on Power? If not,
then all is fine and there likely is a reason to use . in the
comment but $ in the code. But if so, it would be nice if both
could match, and I guess with other architectures in mind . would
be preferable.

> --- /dev/null
> +++ b/xen/arch/ppc/xen.lds.S
> @@ -0,0 +1,173 @@
> +#include 
> +
> +#undef ENTRY
> +#undef ALIGN
> +
> +OUTPUT_ARCH(powerpc:common64)
> +ENTRY(start)
> +
> +PHDRS
> +{
> +text PT_LOAD ;
> +#if defined(BUILD_ID)
> +note PT_NOTE ;
> +#endif
> +}
> +
> +/**
> + * OF's base load address is 0x40 (XEN_VIRT_START).
> + * By defining sections this way, we can keep our virtual address base at 
> 0x40
> + * while keeping the physical base at 0x0.
> + *
> + * Without this, OF incorrectly loads .text at 0x40 + 0x40 = 
> 0x80.
> + * Taken from x86/xen.lds.S
> + */
> +#ifdef CONFIG_LD_IS_GNU
> +# define DECL_SECTION(x) x : AT(ADDR(#x) - XEN_VIRT_START)
> +#else
> +# define DECL_SECTION(x) x : AT(ADDR(x) - XEN_VIRT_START)
> +#endif
> +
> +SECTIONS
> +{
> +. = XEN_VIRT_START;
> +
> +DECL_SECTION(.text) {
> +_stext = .;/* Text section */
> +*(.text.header)
> +
> +*(.text.cold)
> +*(.text.unlikely .text.*_unlikely .text.unlikely.*)
> +
> +*(.text)
> +#ifdef CONFIG_CC_SPLIT_SECTIONS
> +*(.text.*)
> +#endif
> +
> +*(.fixup)
> +*(.gnu.warning)
> +. = ALIGN(POINTER_ALIGN);
> +_etext = .; /* End of text section */
> +} :text
> +
> +. = ALIGN(PAGE_SIZE);
> +DECL_SECTION(.rodata) {
> +_srodata = .;  /* Read-only data */
> +*(.rodata)
> +*(.rodata.*)
> +*(.data.rel.ro)
> +*(.data.rel.ro.*)
> +
> +VPCI_ARRAY
> +
> +. = ALIGN(POINTER_ALIGN);
> +_erodata = .;/* End of read-only data */
> +} :text
> +
> +#if defined(BUILD_ID)
> +. = ALIGN(4);
> +DECL_SECTION(.note.gnu.build-id) {
> +__note_gnu_build_id_start = .;
> +*(.note.gnu.build-id)
> +__note_gnu_build_id_end = .;
> +} :note :text
> +#endif
> +_erodata = .;/* End of read-only data */
> +
> +. = ALIGN(PAGE_SIZE);
> +DECL_SECTION(.data.ro_after_init) {
> +__ro_after_init_start = .;
> +*(.data.ro_after_init)
> +. = ALIGN(PAGE_SIZE);
> +__ro_after_init_end = .;
> +} : text
> +
> +DECL_SECTION(.data.read_mostly) {
> +*(.data.read_mostly)
> +} :text
> +
> +. = ALIGN(PAGE_SIZE);
> +DECL_SECTION(.data) {/* Data */
> +*(.data.page_aligned)
> +. = ALIGN(8);
> +__start_schedulers_array = .;
> +*(.data.schedulers)
> +__end_schedulers_array = .;
> +
> +HYPFS_PARAM
> +
> +*(.data .data.*)
> +CONSTRUCTORS
> +} :text
> +
> +. = ALIGN(PAGE_SIZE); /* Init code and data */
> +__init_begin = .;
> +DECL_SECTION(.init.text) {
> +_sinittext = .;
> +*(.init.text)
> +_einittext = .;
> +. = ALIGN(PAGE_SIZE);/* Avoid mapping alt insns 

Re: [PATCH v4 34/34] mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:23PM -0700, Vishal Moola (Oracle) wrote:
> These functions are no longer necessary. Remove them and cleanup
> Documentation referencing them.
> 
> Signed-off-by: Vishal Moola (Oracle) 

I've found one stale reference in riscv:

$ git grep -n pgtable_pmd_page_ctor
arch/riscv/mm/init.c:440:   BUG_ON(!vaddr || 
!pgtable_pmd_page_ctor(virt_to_page(vaddr)));

Otherwise

Acked-by: Mike Rapoport (IBM) 


> ---
>  Documentation/mm/split_page_table_lock.rst| 12 +--
>  .../zh_CN/mm/split_page_table_lock.rst| 14 ++---
>  include/linux/mm.h| 20 ---
>  3 files changed, 13 insertions(+), 33 deletions(-)
> 
> diff --git a/Documentation/mm/split_page_table_lock.rst 
> b/Documentation/mm/split_page_table_lock.rst
> index 50ee0dfc95be..4bffec728340 100644
> --- a/Documentation/mm/split_page_table_lock.rst
> +++ b/Documentation/mm/split_page_table_lock.rst
> @@ -53,7 +53,7 @@ Support of split page table lock by an architecture
>  ===
>  
>  There's no need in special enabling of PTE split page table lock: everything
> -required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), 
> which
> +required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
>  must be called on PTE table allocation / freeing.
>  
>  Make sure the architecture doesn't use slab allocator for page table
> @@ -63,8 +63,8 @@ This field shares storage with page->ptl.
>  PMD split lock only makes sense if you have more than two page table
>  levels.
>  
> -PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
> -allocation and pgtable_pmd_page_dtor() on freeing.
> +PMD split lock enabling requires pagetable_pmd_ctor() call on PMD table
> +allocation and pagetable_pmd_dtor() on freeing.
>  
>  Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
>  pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
> @@ -72,7 +72,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc().
>  
>  With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
>  
> -NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
> +NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- it must
>  be handled properly.
>  
>  page->ptl
> @@ -92,7 +92,7 @@ trick:
> split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
> one more cache line for indirect access;
>  
> -The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in
> -pgtable_pmd_page_ctor() for PMD table.
> +The spinlock_t allocated in pagetable_pte_ctor() for PTE table and in
> +pagetable_pmd_ctor() for PMD table.
>  
>  Please, never access page->ptl directly -- use appropriate helper.
> diff --git a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst 
> b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
> index 4fb7aa666037..a2c288670a24 100644
> --- a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
> +++ b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
> @@ -56,16 +56,16 @@ Hugetlb特定的辅助函数:
>  架构对分页表锁的支持
>  
>  
> -没有必要特别启用PTE分页表锁:所有需要的东西都由pgtable_pte_page_ctor()
> -和pgtable_pte_page_dtor()完成,它们必须在PTE表分配/释放时被调用。
> +没有必要特别启用PTE分页表锁:所有需要的东西都由pagetable_pte_ctor()
> +和pagetable_pte_dtor()完成,它们必须在PTE表分配/释放时被调用。
>  
>  确保架构不使用slab分配器来分配页表:slab使用page->slab_cache来分配其页
>  面。这个区域与page->ptl共享存储。
>  
>  PMD分页锁只有在你有两个以上的页表级别时才有意义。
>  
> -启用PMD分页锁需要在PMD表分配时调用pgtable_pmd_page_ctor(),在释放时调
> -用pgtable_pmd_page_dtor()。
> +启用PMD分页锁需要在PMD表分配时调用pagetable_pmd_ctor(),在释放时调
> +用pagetable_pmd_dtor()。
>  
>  分配通常发生在pmd_alloc_one()中,释放发生在pmd_free()和pmd_free_tlb()
>  中,但要确保覆盖所有的PMD表分配/释放路径:即X86_PAE在pgd_alloc()中预先
> @@ -73,7 +73,7 @@ PMD分页锁只有在你有两个以上的页表级别时才有意义。
>  
>  一切就绪后,你可以设置CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK。
>  
> -注意:pgtable_pte_page_ctor()和pgtable_pmd_page_ctor()可能失败--必
> +注意:pagetable_pte_ctor()和pagetable_pmd_ctor()可能失败--必
>  须正确处理。
>  
>  page->ptl
> @@ -90,7 +90,7 @@ page->ptl用于访问分割页表锁,其中'page'是包含该表的页面struc
> 的指针并动态分配它。这允许在启用DEBUG_SPINLOCK或DEBUG_LOCK_ALLOC的
> 情况下使用分页锁,但由于间接访问而多花了一个缓存行。
>  
> -PTE表的spinlock_t分配在pgtable_pte_page_ctor()中,PMD表的spinlock_t
> -分配在pgtable_pmd_page_ctor()中。
> +PTE表的spinlock_t分配在pagetable_pte_ctor()中,PMD表的spinlock_t
> +分配在pagetable_pmd_ctor()中。
>  
>  请不要直接访问page->ptl - -使用适当的辅助函数。
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index dc211c43610b..6d83483cf186 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2897,11 +2897,6 @@ static inline bool pagetable_pte_ctor(struct ptdesc 
> *ptdesc)
>   return true;
>  }
>  
> -static inline bool pgtable_pte_page_ctor(struct page *page)
> -{
> - return pagetable_pte_ctor(page_ptdesc(page));
> -}
> -
>  static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
>  {
>   struct folio *folio = ptdesc_folio(ptdesc);
> @@ -2911,11 +2906,6 @@ 

Re: [PATCH v4 33/34] um: Convert {pmd, pte}_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:22PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents. Also cleans up some spacing issues.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/um/include/asm/pgalloc.h | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
> index 8ec7cd46dd96..de5e31c64793 100644
> --- a/arch/um/include/asm/pgalloc.h
> +++ b/arch/um/include/asm/pgalloc.h
> @@ -25,19 +25,19 @@
>   */
>  extern pgd_t *pgd_alloc(struct mm_struct *);
>  
> -#define __pte_free_tlb(tlb,pte, address) \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb),(pte));   \
> +#define __pte_free_tlb(tlb, pte, address)\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #ifdef CONFIG_3_LEVEL_PGTABLES
>  
> -#define __pmd_free_tlb(tlb, pmd, address)\
> -do { \
> - pgtable_pmd_page_dtor(virt_to_page(pmd));   \
> - tlb_remove_page((tlb),virt_to_page(pmd));   \
> -} while (0)  \
> +#define __pmd_free_tlb(tlb, pmd, address)\
> +do { \
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));\
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> +} while (0)
>  
>  #endif
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 32/34] sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:21PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable pte constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/sparc/mm/srmmu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
> index 13f027afc875..8393faa3e596 100644
> --- a/arch/sparc/mm/srmmu.c
> +++ b/arch/sparc/mm/srmmu.c
> @@ -355,7 +355,8 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
>   return NULL;
>   page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>   spin_lock(&mm->page_table_lock);
> - if (page_ref_inc_return(page) == 2 && !pgtable_pte_page_ctor(page)) {
> + if (page_ref_inc_return(page) == 2 &&
> + !pagetable_pte_ctor(page_ptdesc(page))) {
>   page_ref_dec(page);
>   ptep = NULL;
>   }
> @@ -371,7 +372,7 @@ void pte_free(struct mm_struct *mm, pgtable_t ptep)
>   page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>   spin_lock(&mm->page_table_lock);
>   if (page_ref_dec_return(page) == 1)
> - pgtable_pte_page_dtor(page);
> + pagetable_pte_dtor(page_ptdesc(page));
>   spin_unlock(&mm->page_table_lock);
>  
>   srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 31/34] sparc64: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:20PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/sparc/mm/init_64.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 04f9db0c3111..105915cd2eee 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2893,14 +2893,15 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  
>  pgtable_t pte_alloc_one(struct mm_struct *mm)
>  {
> - struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> - if (!page)
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL | __GFP_ZERO, 0);
> +
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
> - return (pte_t *) page_address(page);
> + return ptdesc_address(ptdesc);
>  }
>  
>  void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
> @@ -2910,10 +2911,10 @@ void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  
>  static void __pte_free(pgtable_t pte)
>  {
> - struct page *page = virt_to_page(pte);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pte);
>  
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  void pte_free(struct mm_struct *mm, pgtable_t pte)
> -- 
> 2.40.1
> 
> 
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 30/34] sh: Convert pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:19PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents. Also cleans up some spacing issues.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Reviewed-by: Geert Uytterhoeven 
> Acked-by: John Paul Adrian Glaubitz 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/sh/include/asm/pgalloc.h | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
> index a9e98233c4d4..5d8577ab1591 100644
> --- a/arch/sh/include/asm/pgalloc.h
> +++ b/arch/sh/include/asm/pgalloc.h
> @@ -2,6 +2,7 @@
>  #ifndef __ASM_SH_PGALLOC_H
>  #define __ASM_SH_PGALLOC_H
>  
> +#include 
>  #include 
>  
>  #define __HAVE_ARCH_PMD_ALLOC_ONE
> @@ -31,10 +32,10 @@ static inline void pmd_populate(struct mm_struct *mm, 
> pmd_t *pmd,
>   set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
>  }
>  
> -#define __pte_free_tlb(tlb,pte,addr) \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #endif /* __ASM_SH_PGALLOC_H */
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 29/34] riscv: Convert alloc_{pmd, pte}_late() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:18PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Acked-by: Palmer Dabbelt 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/riscv/include/asm/pgalloc.h |  8 
>  arch/riscv/mm/init.c | 16 ++--
>  2 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/pgalloc.h 
> b/arch/riscv/include/asm/pgalloc.h
> index 59dc12b5b7e8..d169a4f41a2e 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -153,10 +153,10 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  #endif /* __PAGETABLE_PMD_FOLDED */
>  
> -#define __pte_free_tlb(tlb, pte, buf)   \
> -do {\
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +#define __pte_free_tlb(tlb, pte, buf)\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  #endif /* CONFIG_MMU */
>  
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 3d689ffb2072..6bfeec80bf4e 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -354,12 +354,10 @@ static inline phys_addr_t __init 
> alloc_pte_fixmap(uintptr_t va)
>  
>  static phys_addr_t __init alloc_pte_late(uintptr_t va)
>  {
> - unsigned long vaddr;
> -
> - vaddr = __get_free_page(GFP_KERNEL);
> - BUG_ON(!vaddr || !pgtable_pte_page_ctor(virt_to_page((void *)vaddr)));
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - return __pa(vaddr);
> + BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
> + return __pa((pte_t *)ptdesc_address(ptdesc));
>  }
>  
>  static void __init create_pte_mapping(pte_t *ptep,
> @@ -437,12 +435,10 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
>  
>  static phys_addr_t __init alloc_pmd_late(uintptr_t va)
>  {
> - unsigned long vaddr;
> -
> - vaddr = __get_free_page(GFP_KERNEL);
> - BUG_ON(!vaddr || !pgtable_pmd_page_ctor(virt_to_page((void *)vaddr)));
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - return __pa(vaddr);
> + BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
> + return __pa((pmd_t *)ptdesc_address(ptdesc));
>  }
>  
>  static void __init create_pmd_mapping(pmd_t *pmdp,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 28/34] openrisc: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:17PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/openrisc/include/asm/pgalloc.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/openrisc/include/asm/pgalloc.h 
> b/arch/openrisc/include/asm/pgalloc.h
> index b7b2b8d16fad..c6a73772a546 100644
> --- a/arch/openrisc/include/asm/pgalloc.h
> +++ b/arch/openrisc/include/asm/pgalloc.h
> @@ -66,10 +66,10 @@ extern inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
>  
> -#define __pte_free_tlb(tlb, pte, addr)   \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 27/34] nios2: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:16PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/nios2/include/asm/pgalloc.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/nios2/include/asm/pgalloc.h 
> b/arch/nios2/include/asm/pgalloc.h
> index ecd1657bb2ce..ce6bb8e74271 100644
> --- a/arch/nios2/include/asm/pgalloc.h
> +++ b/arch/nios2/include/asm/pgalloc.h
> @@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, 
> pmd_t *pmd,
>  
>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  
> -#define __pte_free_tlb(tlb, pte, addr)   \
> - do {\
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   
> \
> + do {\
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>   } while (0)
>  
>  #endif /* _ASM_NIOS2_PGALLOC_H */
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 26/34] mips: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:15PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/mips/include/asm/pgalloc.h | 31 +--
>  arch/mips/mm/pgtable.c  |  7 ---
>  2 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
> index f72e737dda21..6940e5536664 100644
> --- a/arch/mips/include/asm/pgalloc.h
> +++ b/arch/mips/include/asm/pgalloc.h
> @@ -51,13 +51,13 @@ extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  
>  static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  {
> - free_pages((unsigned long)pgd, PGD_TABLE_ORDER);
> + pagetable_free(virt_to_ptdesc(pgd));
>  }
>  
> -#define __pte_free_tlb(tlb,pte,address)  \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +#define __pte_free_tlb(tlb, pte, address)\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -65,18 +65,18 @@ do {  
> \
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pmd_t *pmd;
> - struct page *pg;
> + struct ptdesc *ptdesc;
>  
> - pg = alloc_pages(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
> - if (!pg)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
> + if (!ptdesc)
>   return NULL;
>  
> - if (!pgtable_pmd_page_ctor(pg)) {
> - __free_pages(pg, PMD_TABLE_ORDER);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - pmd = (pmd_t *)page_address(pg);
> + pmd = ptdesc_address(ptdesc);
>   pmd_init(pmd);
>   return pmd;
>  }
> @@ -90,10 +90,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long address)
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pud_t *pud;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, PUD_TABLE_ORDER);
>  
> - pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
> - if (pud)
> - pud_init(pud);
> + if (!ptdesc)
> + return NULL;
> + pud = ptdesc_address(ptdesc);
> +
> + pud_init(pud);
>   return pud;
>  }
>  
> diff --git a/arch/mips/mm/pgtable.c b/arch/mips/mm/pgtable.c
> index b13314be5d0e..729258ff4e3b 100644
> --- a/arch/mips/mm/pgtable.c
> +++ b/arch/mips/mm/pgtable.c
> @@ -10,10 +10,11 @@
>  
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> - pgd_t *ret, *init;
> + pgd_t *init, *ret = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, PGD_TABLE_ORDER);
>  
> - ret = (pgd_t *) __get_free_pages(GFP_KERNEL, PGD_TABLE_ORDER);
> - if (ret) {
> + if (ptdesc) {
> + ret = ptdesc_address(ptdesc);
>   init = pgd_offset(&init_mm, 0UL);
>   pgd_init(ret);
>   memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 25/34] m68k: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:14PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

One comment below
> ---
>  arch/m68k/include/asm/mcf_pgalloc.h  | 41 ++--
>  arch/m68k/include/asm/sun3_pgalloc.h |  8 +++---
>  arch/m68k/mm/motorola.c  |  4 +--
>  3 files changed, 27 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/m68k/include/asm/mcf_pgalloc.h 
> b/arch/m68k/include/asm/mcf_pgalloc.h
> index 5c2c0a864524..857949ac9431 100644
> --- a/arch/m68k/include/asm/mcf_pgalloc.h
> +++ b/arch/m68k/include/asm/mcf_pgalloc.h
> @@ -7,20 +7,19 @@
>  
>  extern inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
> - free_page((unsigned long) pte);
> + pagetable_free(virt_to_ptdesc(pte));
>  }
>  
>  extern const char bad_pmd_string[];
>  
>  extern inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> - unsigned long page = __get_free_page(GFP_DMA);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
>  
> - memset((void *)page, 0, PAGE_SIZE);
> - return (pte_t *) (page);
> + return ptdesc_address(ptdesc);
>  }
>  
>  extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address)
> @@ -35,36 +34,36 @@ extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, 
> unsigned long address)
>  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
> unsigned long address)
>  {
> - struct page *page = virt_to_page(pgtable);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
>  
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
>  {
> - struct page *page = alloc_pages(GFP_DMA, 0);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA, 0);

You can add __GFP_ZERO here and drop pagetable_clear() below (see the sketch
after this message).

>   pte_t *pte;
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - pte = page_address(page);
> - clear_page(pte);
> + pte = ptdesc_address(ptdesc);
> + pagetable_clear(pte);
>  
>   return pte;
>  }
>  
>  static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
>  {
> - struct page *page = virt_to_page(pgtable);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
>  
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  /*
> @@ -75,16 +74,18 @@ static inline void pte_free(struct mm_struct *mm, 
> pgtable_t pgtable)
>  
>  static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  {
> - free_page((unsigned long) pgd);
> + pagetable_free(virt_to_ptdesc(pgd));
>  }
>  
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
>   pgd_t *new_pgd;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | GFP_NOWARN, 0);
>  
> - new_pgd = (pgd_t *)__get_free_page(GFP_DMA | __GFP_NOWARN);
> - if (!new_pgd)
> + if (!ptdesc)
>   return NULL;
> + new_pgd = ptdesc_address(ptdesc);
> +
>   memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t));
>   memset(new_pgd, 0, PAGE_OFFSET >> PGDIR_SHIFT);
>   return new_pgd;
> diff --git a/arch/m68k/include/asm/sun3_pgalloc.h 
> b/arch/m68k/include/asm/sun3_pgalloc.h
> index 198036aff519..ff48573db2c0 100644
> --- a/arch/m68k/include/asm/sun3_pgalloc.h
> +++ b/arch/m68k/include/asm/sun3_pgalloc.h
> @@ -17,10 +17,10 @@
>  
>  extern const char bad_pmd_string[];
>  
> -#define __pte_free_tlb(tlb,pte,addr) \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  
>  static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, 
> pte_t *pte)
> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index c75984e2d86b..594575a0780c 100644
> --- 
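
Concretely, the __GFP_ZERO suggestion above would turn pte_alloc_one() into
something like the following (a sketch only, reusing the names from the quoted
patch; the actual respin may of course look different):

static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
{
	struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);

	if (!ptdesc)
		return NULL;
	if (!pagetable_pte_ctor(ptdesc)) {
		pagetable_free(ptdesc);
		return NULL;
	}

	/* Already zeroed by __GFP_ZERO, so no explicit pagetable_clear(). */
	return ptdesc_address(ptdesc);
}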

Re: [PATCH v4 24/34] loongarch: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:13PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/loongarch/include/asm/pgalloc.h | 27 +++
>  arch/loongarch/mm/pgtable.c  |  7 ---
>  2 files changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/loongarch/include/asm/pgalloc.h 
> b/arch/loongarch/include/asm/pgalloc.h
> index af1d1e4a6965..70bb3bdd201e 100644
> --- a/arch/loongarch/include/asm/pgalloc.h
> +++ b/arch/loongarch/include/asm/pgalloc.h
> @@ -45,9 +45,9 @@ extern void pagetable_init(void);
>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  
>  #define __pte_free_tlb(tlb, pte, address)\
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -55,18 +55,18 @@ do {  
> \
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pmd_t *pmd;
> - struct page *pg;
> + struct ptdesc *ptdesc;
>  
> - pg = alloc_page(GFP_KERNEL_ACCOUNT);
> - if (!pg)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, 0);
> + if (!ptdesc)
>   return NULL;
>  
> - if (!pgtable_pmd_page_ctor(pg)) {
> - __free_page(pg);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - pmd = (pmd_t *)page_address(pg);
> + pmd = ptdesc_address(ptdesc);
>   pmd_init(pmd);
>   return pmd;
>  }
> @@ -80,10 +80,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long address)
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pud_t *pud;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - pud = (pud_t *) __get_free_page(GFP_KERNEL);
> - if (pud)
> - pud_init(pud);
> + if (!ptdesc)
> + return NULL;
> + pud = ptdesc_address(ptdesc);
> +
> + pud_init(pud);
>   return pud;
>  }
>  
> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> index 36a6dc0148ae..cdba10ffc0df 100644
> --- a/arch/loongarch/mm/pgtable.c
> +++ b/arch/loongarch/mm/pgtable.c
> @@ -11,10 +11,11 @@
>  
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> - pgd_t *ret, *init;
> + pgd_t *init, *ret = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - ret = (pgd_t *) __get_free_page(GFP_KERNEL);
> - if (ret) {
> + if (ptdesc) {
> + ret = (pgd_t *)ptdesc_address(ptdesc);
>   init = pgd_offset(&init_mm, 0UL);
>   pgd_init(ret);
>   memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH] libxg: shrink variable scope in xc_core_arch_map_p2m_list_rw()

2023-06-14 Thread Anthony PERARD
On Wed, Jun 14, 2023 at 09:02:56AM +0200, Jan Beulich wrote:
> This in particular allows to drop a dead assignment to "ptes" from near
> the end of the function.
> 
> Coverity ID: 1532314
> Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support 
> linear p2m table")
> Signed-off-by: Jan Beulich 
> ---
> v2: Much bigger change to limit the scope of "ptes" and other variables.

The change of scope of all variables isn't too hard to review with the
--word-diff option, and they all look fine.

> --- a/tools/libs/guest/xg_core_x86.c
> +++ b/tools/libs/guest/xg_core_x86.c
> @@ -169,18 +169,21 @@ xc_core_arch_map_p2m_list_rw(xc_interfac
>  if ( !mfns )
>  {
>  ERROR("Cannot allocate memory for array of %u mfns", idx);
> +out_unmap:
> +munmap(ptes, n_pages * PAGE_SIZE);
>  goto out;
>  }
>  

I guess it's not that great to have the label out_unmap in the middle of
the for loop (at least it's near the beginning), but at least that means
the mapping needs to be gone once out of the loop. So if someone edits
the for loop and introduces a `goto out` instead of `goto out_unmap`,
there's just a potential leak rather than a potential use-after-free or
double-free, so I guess that's better.

So:
Acked-by: Anthony PERARD 

Cheers,

-- 
Anthony PERARD
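
For readers following along, here is a self-contained sketch of the unwind
pattern being discussed. It is generic code, not the actual xg_core_x86.c;
map_and_process() and use_mapping() are made-up stand-ins:

#include <stdlib.h>
#include <sys/mman.h>

/* Made-up stand-in for the real work done while the mapping is live. */
static int use_mapping(void *ptes)
{
    return ptes ? 0 : -1;
}

static int map_and_process(size_t n_pages, size_t page_size)
{
    int rc = -1;
    void *ptes = mmap(NULL, n_pages * page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (ptes == MAP_FAILED)
        return rc;

    for (size_t i = 0; i < n_pages; i++) {
        void *mfns = calloc(n_pages, sizeof(unsigned long));

        if (!mfns) {
            /* The unwind label sits next to the first failure needing it. */
        out_unmap:
            munmap(ptes, n_pages * page_size);
            goto out;
        }
        if (use_mapping(ptes) < 0) {
            free(mfns);
            goto out_unmap;   /* a plain "goto out" here would only leak */
        }
        free(mfns);
    }
    munmap(ptes, n_pages * page_size);
    rc = 0;
 out:
    return rc;
}

With the label placed like this, nothing after the loop can legitimately
assume the mapping is still live, which is the property described above.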



Re: [PATCH v4 23/34] hexagon: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:12PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/hexagon/include/asm/pgalloc.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/hexagon/include/asm/pgalloc.h 
> b/arch/hexagon/include/asm/pgalloc.h
> index f0c47e6a7427..55988625e6fb 100644
> --- a/arch/hexagon/include/asm/pgalloc.h
> +++ b/arch/hexagon/include/asm/pgalloc.h
> @@ -87,10 +87,10 @@ static inline void pmd_populate_kernel(struct mm_struct 
> *mm, pmd_t *pmd,
>   max_kernel_seg = pmdindex;
>  }
>  
> -#define __pte_free_tlb(tlb, pte, addr)   \
> -do { \
> - pgtable_pte_page_dtor((pte));   \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor((page_ptdesc(pte))); \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 22/34] csky: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:11PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Acked-by: Guo Ren 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/csky/include/asm/pgalloc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
> index 7d57e5da0914..9c84c9012e53 100644
> --- a/arch/csky/include/asm/pgalloc.h
> +++ b/arch/csky/include/asm/pgalloc.h
> @@ -63,8 +63,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  #define __pte_free_tlb(tlb, pte, address)\
>  do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page(tlb, pte);  \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc(tlb, page_ptdesc(pte));  \
>  } while (0)
>  
>  extern void pagetable_init(void);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 21/34] arm64: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:10PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/arm64/include/asm/tlb.h | 14 --
>  arch/arm64/mm/mmu.c  |  7 ---
>  2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index c995d1f4594f..2c29239d05c3 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -75,18 +75,20 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> unsigned long addr)
>  {
> - pgtable_pte_page_dtor(pte);
> - tlb_remove_table(tlb, pte);
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 2
>  static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> unsigned long addr)
>  {
> - struct page *page = virt_to_page(pmdp);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
>  
> - pgtable_pmd_page_dtor(page);
> - tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  #endif
>  
> @@ -94,7 +96,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
> pmd_t *pmdp,
>  static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
> unsigned long addr)
>  {
> - tlb_remove_table(tlb, virt_to_page(pudp));
> + tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
>  }
>  #endif
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index af6bc8403ee4..5867a0e917b9 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -426,6 +426,7 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
>  static phys_addr_t pgd_pgtable_alloc(int shift)
>  {
>   phys_addr_t pa = __pgd_pgtable_alloc(shift);
> + struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
>  
>   /*
>* Call proper page table ctor in case later we need to
> @@ -433,12 +434,12 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
>* this pre-allocated page table.
>*
>* We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
> -  * folded, and if so pgtable_pmd_page_ctor() becomes nop.
> +  * folded, and if so pagetable_pte_ctor() becomes nop.
>*/
>   if (shift == PAGE_SHIFT)
> - BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
> + BUG_ON(!pagetable_pte_ctor(ptdesc));
>   else if (shift == PMD_SHIFT)
> - BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
> + BUG_ON(!pagetable_pmd_ctor(ptdesc));
>  
>   return pa;
>  }
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [XEN PATCH] xen: fixed violations of MISRA C:2012 Rule 3.1

2023-06-14 Thread Jan Beulich
On 14.06.2023 16:28, Andrew Cooper wrote:
> On 13/06/2023 8:42 am, Nicola Vetrini wrote:
>> diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
>> index 75bdf18c4e..ea6ec47a59 100644
>> --- a/xen/common/xmalloc_tlsf.c
>> +++ b/xen/common/xmalloc_tlsf.c
>> @@ -140,9 +140,10 @@ static inline void MAPPING_SEARCH(unsigned long *r, int 
>> *fl, int *sl)
>>  *fl = flsl(*r) - 1;
>>  *sl = (*r >> (*fl - MAX_LOG2_SLI)) - MAX_SLI;
>>  *fl -= FLI_OFFSET;
>> -/*if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
>> - *fl = *sl = 0;
>> - */
>> +#if 0
>> +if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
>> +fl = *sl = 0;
>> +#endif
>>  *r &= ~t;
>>  }
>>  }
> 
This logic has been commented out right from its introduction in c/s
> 9736b76d829b2d in 2008, and never touched since.
> 
> I think it can safely be deleted, and not placed inside an #if 0.

I have to admit that I wouldn't be happy with deleting without any
replacement. Instead of the commented-out code, how about having
ASSERT(*fl >= 0)? (What isn't clear to me is whether the commented-out
code is actually meant to replace the earlier line, rather than
(optionally) be there in addition - at least it very much looks that
way. With such uncertainty I'd be further inclined to not remove
what's there.)

Jan
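
Concretely, the suggestion would amount to something like this in
MAPPING_SEARCH() - an illustrative sketch only, not a submitted patch, and it
deliberately keeps the existing *fl adjustment rather than guessing at the
original intent:

        *fl = flsl(*r) - 1;
        *sl = (*r >> (*fl - MAX_LOG2_SLI)) - MAX_SLI;
        *fl -= FLI_OFFSET;
        ASSERT(*fl >= 0);   /* FL is expected to always end up >= 0 here */
        *r &= ~t;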



Re: [PATCH v4 20/34] arm: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:09PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> late_alloc() also uses the __get_free_pages() helper function. Convert
> this to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

One comment below.

> ---
>  arch/arm/include/asm/tlb.h | 12 +++-
>  arch/arm/mm/mmu.c  |  6 +++---
>  2 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> index b8cbe03ad260..f40d06ad5d2a 100644
> --- a/arch/arm/include/asm/tlb.h
> +++ b/arch/arm/include/asm/tlb.h
> @@ -39,7 +39,9 @@ static inline void __tlb_remove_table(void *_table)
>  static inline void
>  __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
>  {
> - pgtable_pte_page_dtor(pte);
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
>  
>  #ifndef CONFIG_ARM_LPAE
>   /*
> @@ -50,17 +52,17 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
> unsigned long addr)
>   __tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
>  #endif
>  
> - tlb_remove_table(tlb, pte);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  
>  static inline void
>  __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
>  {
>  #ifdef CONFIG_ARM_LPAE
> - struct page *page = virt_to_page(pmdp);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
>  
> - pgtable_pmd_page_dtor(page);
> - tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  #endif
>  }
>  
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 22292cf3381c..294518fd0240 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -737,11 +737,11 @@ static void __init *early_alloc(unsigned long sz)
>  
>  static void *__init late_alloc(unsigned long sz)
>  {
> - void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));
> + void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, get_order(sz));
>  
> - if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
> + if (!ptdesc || !pagetable_pte_ctor(ptdesc))
>   BUG();
> - return ptr;
> + return ptdesc;

should be

return  ptdesc_to_virt(ptdesc);

>  }
>  
>  static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
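
Folding the fix mentioned above into the hunk, late_alloc() would read roughly
as follows (sketch only; the final respin may differ):

static void *__init late_alloc(unsigned long sz)
{
	struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL,
						get_order(sz));

	if (!ptdesc || !pagetable_pte_ctor(ptdesc))
		BUG();

	return ptdesc_to_virt(ptdesc);
}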



Re: [PATCH v4 19/34] pgalloc: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:08PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  include/asm-generic/pgalloc.h | 62 +--
>  1 file changed, 37 insertions(+), 25 deletions(-)
> 
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index a7cf825befae..3fd6ce79e654 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -18,7 +18,11 @@
>   */
>  static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> - return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, 0);
> +
> + if (!ptdesc)
> + return NULL;
> + return ptdesc_address(ptdesc);
>  }
>  
>  #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> @@ -41,7 +45,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct 
> *mm)
>   */
>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
> - free_page((unsigned long)pte);
> + pagetable_free(virt_to_ptdesc(pte));
>  }
>  
>  /**
> @@ -49,7 +53,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
> pte_t *pte)
>   * @mm: the mm_struct of the current context
>   * @gfp: GFP flags to use for the allocation
>   *
> - * Allocates a page and runs the pgtable_pte_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pte_ctor().

Allocates memory for page table and ptdesc

>   *
>   * This function is intended for architectures that need
>   * anything beyond simple page allocation or must have custom GFP flags.

The Return: description here should be fixed up

> @@ -58,17 +62,17 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
> pte_t *pte)
>   */
>  static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
>  {
> - struct page *pte;
> + struct ptdesc *ptdesc;
>  
> - pte = alloc_page(gfp);
> - if (!pte)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(pte)) {
> - __free_page(pte);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - return pte;
> + return ptdesc_page(ptdesc);
>  }
>  
>  #ifndef __HAVE_ARCH_PTE_ALLOC_ONE
> @@ -76,7 +80,7 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct 
> *mm, gfp_t gfp)
>   * pte_alloc_one - allocate a page for PTE-level user page table
>   * @mm: the mm_struct of the current context
>   *
> - * Allocates a page and runs the pgtable_pte_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pte_ctor().

Allocates memory for page table and ptdesc

>   *
>   * Return: `struct page` initialized as page table or %NULL on error

Return: ptdesc ...

>   */
> @@ -98,8 +102,10 @@ static inline pgtable_t pte_alloc_one(struct mm_struct 
> *mm)
>   */
>  static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
>  {
> - pgtable_pte_page_dtor(pte_page);
> - __free_page(pte_page);
> + struct ptdesc *ptdesc = page_ptdesc(pte_page);
> +
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  
> @@ -110,7 +116,7 @@ static inline void pte_free(struct mm_struct *mm, struct 
> page *pte_page)
>   * pmd_alloc_one - allocate a page for PMD-level page table
>   * @mm: the mm_struct of the current context
>   *
> - * Allocates a page and runs the pgtable_pmd_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pmd_ctor().

Allocate memory for page table and ptdesc

>   * Allocations use %GFP_PGTABLE_USER in user context and
>   * %GFP_PGTABLE_KERNEL in kernel context.
>   *
> @@ -118,28 +124,30 @@ static inline void pte_free(struct mm_struct *mm, 
> struct page *pte_page)
>   */
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   gfp_t gfp = GFP_PGTABLE_USER;
>  
>   if (mm == &init_mm)
>   gfp = GFP_PGTABLE_KERNEL;
> - page = alloc_page(gfp);
> - if (!page)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pmd_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
> - return (pmd_t *)page_address(page);
> + return ptdesc_address(ptdesc);
>  }
>  #endif
>  
>  #ifndef __HAVE_ARCH_PMD_FREE
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
>  {
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> +
> 

Re: [PATCH v4 18/34] mm: Remove page table members from struct page

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:07PM -0700, Vishal Moola (Oracle) wrote:
> The page table members are now split out into their own ptdesc struct.
> Remove them from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm_types.h | 14 --
>  include/linux/pgtable.h  |  3 ---
>  2 files changed, 17 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6161fe1ae5b8..31ffa1be21d0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -141,20 +141,6 @@ struct page {
>   struct {/* Tail pages of compound page */
>   unsigned long compound_head;/* Bit zero is set */
>   };
> - struct {/* Page table pages */
> - unsigned long _pt_pad_1;/* compound_head */
> - pgtable_t pmd_huge_pte; /* protected by page->ptl */
> - unsigned long _pt_s390_gaddr;   /* mapping */
> - union {
> - struct mm_struct *pt_mm; /* x86 pgds only */
> - atomic_t pt_frag_refcount; /* powerpc */
> - };
> -#if ALLOC_SPLIT_PTLOCKS
> - spinlock_t *ptl;
> -#else
> - spinlock_t ptl;
> -#endif
> - };
>   struct {/* ZONE_DEVICE pages */
>   /** @pgmap: Points to the hosting device page map. */
>   struct dev_pagemap *pgmap;
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c405f74d3875..33cc19d752b3 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1019,10 +1019,7 @@ struct ptdesc {
>  TABLE_MATCH(flags, __page_flags);
>  TABLE_MATCH(compound_head, pt_list);
>  TABLE_MATCH(compound_head, _pt_pad_1);
> -TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
>  TABLE_MATCH(mapping, _pt_s390_gaddr);
> -TABLE_MATCH(pt_mm, pt_mm);
> -TABLE_MATCH(ptl, ptl);
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
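
For anyone who has not seen the idiom before: TABLE_MATCH() (defined elsewhere
in pgtable.h, so not visible in the quote) is an offsetof()-based static
assertion that keeps struct ptdesc overlaying struct page. A stand-alone
miniature of the same idea, with made-up demo_* names:

#include <stddef.h>

struct demo_page {
	unsigned long flags;
	unsigned long compound_head;
};

struct demo_ptdesc {
	unsigned long __page_flags;
	unsigned long _pt_pad_1;
};

/* Same shape as TABLE_MATCH(): assert that two members share an offset. */
#define DEMO_MATCH(pg, pt)						\
	_Static_assert(offsetof(struct demo_page, pg) ==		\
		       offsetof(struct demo_ptdesc, pt),		\
		       "demo_page." #pg " and demo_ptdesc." #pt " must overlay")

DEMO_MATCH(flags, __page_flags);
DEMO_MATCH(compound_head, _pt_pad_1);
_Static_assert(sizeof(struct demo_ptdesc) <= sizeof(struct demo_page),
	       "the overlay must not outgrow struct demo_page");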



Re: [PATCH v4 17/34] s390: Convert various pgalloc functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:06PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/s390/include/asm/pgalloc.h |   4 +-
>  arch/s390/include/asm/tlb.h |   4 +-
>  arch/s390/mm/pgalloc.c  | 108 
>  3 files changed, 59 insertions(+), 57 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
> index 17eb618f1348..00ad9b88fda9 100644
> --- a/arch/s390/include/asm/pgalloc.h
> +++ b/arch/s390/include/asm/pgalloc.h
> @@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long vmaddr)
>   if (!table)
>   return NULL;
>   crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
> - if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
> + if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
>   crst_table_free(mm, table);
>   return NULL;
>   }
> @@ -97,7 +97,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
> *pmd)
>  {
>   if (mm_pmd_folded(mm))
>   return;
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   crst_table_free(mm, (unsigned long *) pmd);
>  }
>  
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index b91f4a9b044c..383b1f91442c 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -89,12 +89,12 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
> pmd_t *pmd,
>  {
>   if (mm_pmd_folded(tlb->mm))
>   return;
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   __tlb_adjust_range(tlb, address, PAGE_SIZE);
>   tlb->mm->context.flush_mm = 1;
>   tlb->freed_tables = 1;
>   tlb->cleared_puds = 1;
> - tlb_remove_table(tlb, pmd);
> + tlb_remove_ptdesc(tlb, pmd);
>  }
>  
>  /*
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index 6b99932abc66..eeb7c95b98cf 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -43,17 +43,17 @@ __initcall(page_table_register_sysctl);
>  
>  unsigned long *crst_table_alloc(struct mm_struct *mm)
>  {
> - struct page *page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
> - arch_set_page_dat(page, CRST_ALLOC_ORDER);
> - return (unsigned long *) page_to_virt(page);
> + arch_set_page_dat(ptdesc_page(ptdesc), CRST_ALLOC_ORDER);
> + return (unsigned long *) ptdesc_to_virt(ptdesc);
>  }
>  
>  void crst_table_free(struct mm_struct *mm, unsigned long *table)
>  {
> - free_pages((unsigned long)table, CRST_ALLOC_ORDER);
> + pagetable_free(virt_to_ptdesc(table));
>  }
>  
>  static void __crst_table_upgrade(void *arg)
> @@ -140,21 +140,21 @@ static inline unsigned int atomic_xor_bits(atomic_t *v, 
> unsigned int bits)
>  
>  struct page *page_table_alloc_pgste(struct mm_struct *mm)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   u64 *table;
>  
> - page = alloc_page(GFP_KERNEL);
> - if (page) {
> - table = (u64 *)page_to_virt(page);
> + ptdesc = pagetable_alloc(GFP_KERNEL, 0);
> + if (ptdesc) {
> + table = (u64 *)ptdesc_to_virt(ptdesc);
>   memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
>   memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
>   }
> - return page;
> + return ptdesc_page(ptdesc);
>  }
>  
>  void page_table_free_pgste(struct page *page)
>  {
> - __free_page(page);
> + pagetable_free(page_ptdesc(page));
>  }
>  
>  #endif /* CONFIG_PGSTE */
> @@ -230,7 +230,7 @@ void page_table_free_pgste(struct page *page)
>  unsigned long *page_table_alloc(struct mm_struct *mm)
>  {
>   unsigned long *table;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned int mask, bit;
>  
>   /* Try to get a fragment of a 4K page as a 2K page table */
> @@ -238,9 +238,9 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   table = NULL;
>   spin_lock_bh(&mm->context.lock);
>   if (!list_empty(&mm->context.pgtable_list)) {
> - page = list_first_entry(&mm->context.pgtable_list,
> - struct page, lru);
> - mask = atomic_read(&page->pt_frag_refcount);
> + ptdesc = list_first_entry(&mm->context.pgtable_list,
> +

Re: [PATCH v4 16/34] s390: Convert various gmap functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:05PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

With folding

ptdesc->_pt_s390_gaddr = 0;

into pagetable_free()

Acked-by: Mike Rapoport (IBM) 


> ---
>  arch/s390/mm/gmap.c | 230 
>  1 file changed, 128 insertions(+), 102 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 81c683426b49..010e87df7299 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -34,7 +34,7 @@
>  static struct gmap *gmap_alloc(unsigned long limit)
>  {
>   struct gmap *gmap;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned long *table;
>   unsigned long etype, atype;
>  
> @@ -67,12 +67,12 @@ static struct gmap *gmap_alloc(unsigned long limit)
>   spin_lock_init(&gmap->guest_table_lock);
>   spin_lock_init(&gmap->shadow_lock);
>   refcount_set(&gmap->ref_count, 1);
> - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> - if (!page)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> + if (!ptdesc)
>   goto out_free;
> - page->_pt_s390_gaddr = 0;
> - list_add(&page->lru, &gmap->crst_list);
> - table = page_to_virt(page);
> + ptdesc->_pt_s390_gaddr = 0;
> + list_add(&ptdesc->pt_list, &gmap->crst_list);
> + table = ptdesc_to_virt(ptdesc);
>   crst_table_init(table, etype);
>   gmap->table = table;
>   gmap->asce = atype | _ASCE_TABLE_LENGTH |
> @@ -181,25 +181,25 @@ static void gmap_rmap_radix_tree_free(struct 
> radix_tree_root *root)
>   */
>  static void gmap_free(struct gmap *gmap)
>  {
> - struct page *page, *next;
> + struct ptdesc *ptdesc, *next;
>  
>   /* Flush tlb of all gmaps (if not already done for shadows) */
>   if (!(gmap_is_shadow(gmap) && gmap->removed))
>   gmap_flush_tlb(gmap);
>   /* Free all segment & region tables. */
> - list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
> - page->_pt_s390_gaddr = 0;
> - __free_pages(page, CRST_ALLOC_ORDER);
> + list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) {
> + ptdesc->_pt_s390_gaddr = 0;
> + pagetable_free(ptdesc);
>   }
>   gmap_radix_tree_free(&gmap->guest_to_host);
>   gmap_radix_tree_free(&gmap->host_to_guest);
>  
>   /* Free additional data for a shadow gmap */
>   if (gmap_is_shadow(gmap)) {
> - /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
> - page->_pt_s390_gaddr = 0;
> - page_table_free_pgste(page);
> + /* Free all ptdesc tables. */
> + list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list) 
> {
> + ptdesc->_pt_s390_gaddr = 0;
> + page_table_free_pgste(ptdesc_page(ptdesc));
>   }
>   gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
>   /* Release reference to the parent */
> @@ -308,27 +308,27 @@ EXPORT_SYMBOL_GPL(gmap_get_enabled);
>  static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
>   unsigned long init, unsigned long gaddr)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned long *new;
>  
>   /* since we dont free the gmap table until gmap_free we can unlock */
> - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> - if (!page)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> + if (!ptdesc)
>   return -ENOMEM;
> - new = page_to_virt(page);
> + new = ptdesc_to_virt(ptdesc);
>   crst_table_init(new, init);
>   spin_lock(&gmap->guest_table_lock);
>   if (*table & _REGION_ENTRY_INVALID) {
> - list_add(&page->lru, &gmap->crst_list);
> + list_add(&ptdesc->pt_list, &gmap->crst_list);
>   *table = __pa(new) | _REGION_ENTRY_LENGTH |
>   (*table & _REGION_ENTRY_TYPE_MASK);
> - page->_pt_s390_gaddr = gaddr;
> - page = NULL;
> + ptdesc->_pt_s390_gaddr = gaddr;
> + ptdesc = NULL;
>   }
>   spin_unlock(&gmap->guest_table_lock);
> - if (page) {
> - page->_pt_s390_gaddr = 0;
> - __free_pages(page, CRST_ALLOC_ORDER);
> + if (ptdesc) {
> + ptdesc->_pt_s390_gaddr = 0;
> + pagetable_free(ptdesc);
>   }
>   return 0;
>  }
> @@ -341,15 +341,15 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>   */
>  static unsigned long __gmap_segment_gaddr(unsigned long *entry)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   

[libvirt test] 181418: tolerable all pass - PUSHED

2023-06-14 Thread osstest service owner
flight 181418 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181418/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 181374
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 181374
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 181374
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  a7ee9eac835324854483a231d7931b9329f259bc
baseline version:
 libvirt  97f0bd00b4d055f2329392d2f8b7fe566fc65901

Last test of basis   181374  2023-06-11 04:18:49 Z3 days
Testing same since   181401  2023-06-13 04:20:27 Z1 days2 attempts


People who touched revisions under test:
  Ján Tomko 
  Michal Privoznik 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in 

Re: [XEN PATCH] xen: fixed violations of MISRA C:2012 Rule 3.1

2023-06-14 Thread Andrew Cooper
On 13/06/2023 8:42 am, Nicola Vetrini wrote:
> diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
> index 75bdf18c4e..ea6ec47a59 100644
> --- a/xen/common/xmalloc_tlsf.c
> +++ b/xen/common/xmalloc_tlsf.c
> @@ -140,9 +140,10 @@ static inline void MAPPING_SEARCH(unsigned long *r, int 
> *fl, int *sl)
>  *fl = flsl(*r) - 1;
>  *sl = (*r >> (*fl - MAX_LOG2_SLI)) - MAX_SLI;
>  *fl -= FLI_OFFSET;
> -/*if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
> - *fl = *sl = 0;
> - */
> +#if 0
> +if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
> +fl = *sl = 0;
> +#endif
>  *r &= ~t;
>  }
>  }

This logic has been commented out right from its introduction in c/s
9736b76d829b2d in 2008, and never touched since.

I think it can safely be deleted, and not placed inside an #if 0.

~Andrew



Re: [PATCH v4 15/34] x86: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:04PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert

Nit:   *get_free_page*()

> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.

More importantly, get_free_pages() ensures a page won't be allocated from
HIGHMEM, and on 32-bit this is a must.
 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/x86/mm/pgtable.c | 46 +--
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 15a8009a4480..6da7fd5d4782 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -52,7 +52,7 @@ early_param("userpte", setup_userpte);
>  
>  void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
>  {
> - pgtable_pte_page_dtor(pte);
> + pagetable_pte_dtor(page_ptdesc(pte));
>   paravirt_release_pte(page_to_pfn(pte));
>   paravirt_tlb_remove_table(tlb, pte);
>  }
> @@ -60,7 +60,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page 
> *pte)
>  #if CONFIG_PGTABLE_LEVELS > 2
>  void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  {
> - struct page *page = virt_to_page(pmd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>   paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
>   /*
>* NOTE! For PAE, any changes to the top page-directory-pointer-table
> @@ -69,8 +69,8 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  #ifdef CONFIG_X86_PAE
>   tlb->need_flush_all = 1;
>  #endif
> - pgtable_pmd_page_dtor(page);
> - paravirt_tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
> @@ -92,16 +92,16 @@ void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
>  
>  static inline void pgd_list_add(pgd_t *pgd)
>  {
> - struct page *page = virt_to_page(pgd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
>  
> - list_add(>lru, _list);
> + list_add(>pt_list, _list);
>  }
>  
>  static inline void pgd_list_del(pgd_t *pgd)
>  {
> - struct page *page = virt_to_page(pgd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
>  
> - list_del(&page->lru);
> + list_del(&ptdesc->pt_list);
>  }
>  
>  #define UNSHARED_PTRS_PER_PGD\
> @@ -112,12 +112,12 @@ static inline void pgd_list_del(pgd_t *pgd)
>  
>  static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
>  {
> - virt_to_page(pgd)->pt_mm = mm;
> + virt_to_ptdesc(pgd)->pt_mm = mm;
>  }
>  
>  struct mm_struct *pgd_page_get_mm(struct page *page)
>  {
> - return page->pt_mm;
> + return page_ptdesc(page)->pt_mm;
>  }
>  
>  static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
> @@ -213,11 +213,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, 
> pmd_t *pmd)
>  static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
>  {
>   int i;
> + struct ptdesc *ptdesc;
>  
>   for (i = 0; i < count; i++)
>   if (pmds[i]) {
> - pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
> - free_page((unsigned long)pmds[i]);
> + ptdesc = virt_to_ptdesc(pmds[i]);
> +
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   mm_dec_nr_pmds(mm);
>   }
>  }
> @@ -232,16 +235,21 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t 
> *pmds[], int count)
>   gfp &= ~__GFP_ACCOUNT;
>  
>   for (i = 0; i < count; i++) {
> - pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
> - if (!pmd)
> + pmd_t *pmd = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(gfp, 0);
> +
> + if (!ptdesc)
>   failed = true;
> - if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
> - free_page((unsigned long)pmd);
> - pmd = NULL;
> + if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
> + ptdesc = NULL;
>   failed = true;
>   }
> - if (pmd)
> + if (ptdesc) {
>   mm_inc_nr_pmds(mm);
> + pmd = ptdesc_address(ptdesc);
> + }
> +
>   pmds[i] = pmd;
>   }
>  
> @@ -830,7 +838,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>  
>   free_page((unsigned long)pmd_sv);
>  
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   free_page((unsigned long)pmd);
>  
>   return 1;
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
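
To make the HIGHMEM remark above concrete, here is a sketch (the example_*
name is made up, not part of the series) of why a conversion like this stays
safe only while the gfp mask is highmem-free:

static pmd_t *example_alloc_pmd_table(struct mm_struct *mm)
{
	/* GFP_PGTABLE_USER/KERNEL are GFP_KERNEL based: no __GFP_HIGHMEM. */
	gfp_t gfp = GFP_PGTABLE_USER;
	struct ptdesc *ptdesc;

	if (mm == &init_mm)
		gfp &= ~__GFP_ACCOUNT;

	ptdesc = pagetable_alloc(gfp, 0);
	if (!ptdesc)
		return NULL;
	if (!pagetable_pmd_ctor(ptdesc)) {
		pagetable_free(ptdesc);
		return NULL;
	}

	/*
	 * Taking the kernel virtual address here is only valid because the
	 * allocation above can never come from highmem, which is what
	 * __get_free_page() used to guarantee implicitly.
	 */
	return ptdesc_address(ptdesc);
}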



[xen-unstable-smoke test] 181426: tolerable all pass - PUSHED

2023-06-14 Thread osstest service owner
flight 181426 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181426/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  87c621d0ef75e5f95987d66811ed1fd7129208d1
baseline version:
 xen  2f69ef96801f0d2b9646abf6396e60f99c56e3a0

Last test of basis   181407  2023-06-13 14:00:28 Z1 days
Testing same since   181426  2023-06-14 11:02:37 Z0 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   2f69ef9680..87c621d0ef  87c621d0ef75e5f95987d66811ed1fd7129208d1 -> smoke



Re: [PATCH v4 14/34] powerpc: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:03PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/powerpc/mm/book3s64/mmu_context.c | 10 +++---
>  arch/powerpc/mm/book3s64/pgtable.c | 32 +-
>  arch/powerpc/mm/pgtable-frag.c | 46 +-
>  3 files changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
> b/arch/powerpc/mm/book3s64/mmu_context.c
> index c766e4c26e42..1715b07c630c 100644
> --- a/arch/powerpc/mm/book3s64/mmu_context.c
> +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> @@ -246,15 +246,15 @@ static void destroy_contexts(mm_context_t *ctx)
>  static void pmd_frag_destroy(void *pmd_frag)
>  {
>   int count;
> - struct page *page;
> + struct ptdesc *ptdesc;
>  
> - page = virt_to_page(pmd_frag);
> + ptdesc = virt_to_ptdesc(pmd_frag);
>   /* drop all the pending references */
>   count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
>   /* We allow PTE_FRAG_NR fragments from a PTE page */
> - if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + if (atomic_sub_and_test(PMD_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   }
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index 85c84e89e3ea..1212deeabe15 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -306,22 +306,22 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
>  static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  {
>   void *ret = NULL;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
>  
>   if (mm == &init_mm)
>   gfp &= ~__GFP_ACCOUNT;
> - page = alloc_page(gfp);
> - if (!page)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pmd_page_ctor(page)) {
> - __free_pages(page, 0);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - atomic_set(&page->pt_frag_refcount, 1);
> + atomic_set(&ptdesc->pt_frag_refcount, 1);
>  
> - ret = page_address(page);
> + ret = ptdesc_address(ptdesc);
>   /*
>* if we support only one fragment just return the
>* allocated page.
> @@ -331,12 +331,12 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  
>   spin_lock(&mm->page_table_lock);
>   /*
> -  * If we find pgtable_page set, we return
> +  * If we find ptdesc_page set, we return
>* the allocated page with single fragment
>* count.
>*/
>   if (likely(!mm->context.pmd_frag)) {
> - atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
> + atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
>   mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
>   }
>   spin_unlock(&mm->page_table_lock);
> @@ -357,15 +357,15 @@ pmd_t *pmd_fragment_alloc(struct mm_struct *mm, 
> unsigned long vmaddr)
>  
>  void pmd_fragment_free(unsigned long *pmd)
>  {
> - struct page *page = virt_to_page(pmd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>  
> - if (PageReserved(page))
> - return free_reserved_page(page);
> + if (pagetable_is_reserved(ptdesc))
> + return free_reserved_ptdesc(ptdesc);
>  
> - BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> - if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
> + if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   }
>  }
>  
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index 20652daa1d7e..8961f1540209 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -18,15 +18,15 @@
>  void pte_frag_destroy(void *pte_frag)
>  {
>   int count;
> - struct page *page;
> + struct ptdesc *ptdesc;
>  
> - page = virt_to_page(pte_frag);
> + ptdesc = virt_to_ptdesc(pte_frag);
>   /* drop all the pending references */
>   count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
>   /* We allow PTE_FRAG_NR fragments from a PTE page */
> - if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
> +

[PATCH] spinlock: alter inlining of _spin_lock_cb()

2023-06-14 Thread Jan Beulich
To comply with Misra rule 8.10 ("An inline function shall be declared
with the static storage class"), convert what is presently
_spin_lock_cb() to an always-inline (and static) helper, while making
the function itself a thin wrapper, just like _spin_lock() is.

While there drop the unlikely() from the callback check, and correct
indentation in _spin_lock().

Signed-off-by: Jan Beulich 

--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -304,7 +304,8 @@ static always_inline u16 observe_head(sp
 return read_atomic(&t->head);
 }
 
-void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
+static void always_inline spin_lock_common(spinlock_t *lock,
+   void (*cb)(void *), void *data)
 {
 spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
 LOCK_PROFILE_VAR;
@@ -316,7 +317,7 @@ void inline _spin_lock_cb(spinlock_t *lo
 while ( tickets.tail != observe_head(&lock->tickets) )
 {
 LOCK_PROFILE_BLOCK;
-if ( unlikely(cb) )
+if ( cb )
 cb(data);
 arch_lock_relax();
 }
@@ -327,7 +328,12 @@ void inline _spin_lock_cb(spinlock_t *lo
 
 void _spin_lock(spinlock_t *lock)
 {
- _spin_lock_cb(lock, NULL, NULL);
+spin_lock_common(lock, NULL, NULL);
+}
+
+void _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
+{
+spin_lock_common(lock, cb, data);
 }
 
 void _spin_lock_irq(spinlock_t *lock)



Re: [PATCH v4 13/34] mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:02PM -0700, Vishal Moola (Oracle) wrote:
> Creates pagetable_pte_ctor(), pagetable_pmd_ctor(), pagetable_pte_dtor(),
> and pagetable_pmd_dtor() and make the original pgtable
> constructor/destructors wrappers.

Nit: either "creates ... makes" or "create ... make"
I like the second form more.
 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 56 ++
>  1 file changed, 42 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a1af7983e1bd..dc211c43610b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2886,20 +2886,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) 
> { return true; }
>  static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
> -static inline bool pgtable_pte_page_ctor(struct page *page)
> +static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
>  {
> - if (!ptlock_init(page_ptdesc(page)))
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + if (!ptlock_init(ptdesc))
>   return false;
> - __SetPageTable(page);
> - inc_lruvec_page_state(page, NR_PAGETABLE);
> + __folio_set_table(folio);

This comment is more to patch 1 ("mm: Add PAGE_TYPE_OP folio functions")

It would be better to have _pgtable here, as "table" does not necessarily
mean page table.
With PageType SetPageTable was fine, but with folio I think it should be
more explicit.

I'd add a third parameter to PAGE_TYPE_OPS for that.
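
Roughly, and purely as an illustration (untested; the extra "fname"
argument and the "pgtable" spelling are only suggestions):

#define PAGE_TYPE_OPS(uname, lname, fname)				\
static __always_inline int Page##uname(const struct page *page)	\
{									\
	return PageType(page, PG_##lname);				\
}									\
static __always_inline int folio_test_##fname(const struct folio *folio)\
{									\
	return folio_test_type(folio, PG_##lname);			\
}
/* ... plus the __folio_set_##fname/__folio_clear_##fname variants,
 * following the same pattern as in this patch ... */

PAGE_TYPE_OPS(Table, table, pgtable)

so that this constructor could then use __folio_set_pgtable(folio).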

> + lruvec_stat_add_folio(folio, NR_PAGETABLE);
>   return true;
>  }
>  
> +static inline bool pgtable_pte_page_ctor(struct page *page)
> +{
> + return pagetable_pte_ctor(page_ptdesc(page));
> +}
> +
> +static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + ptlock_free(ptdesc);
> + __folio_clear_table(folio);
> + lruvec_stat_sub_folio(folio, NR_PAGETABLE);
> +}
> +
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
> - ptlock_free(page_ptdesc(page));
> - __ClearPageTable(page);
> - dec_lruvec_page_state(page, NR_PAGETABLE);
> + pagetable_pte_dtor(page_ptdesc(page));
>  }
>  
>  #define pte_offset_map_lock(mm, pmd, address, ptlp)  \
> @@ -2981,20 +2995,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptl;
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page)
> +static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
>  {
> - if (!pmd_ptlock_init(page_ptdesc(page)))
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + if (!pmd_ptlock_init(ptdesc))
>   return false;
> - __SetPageTable(page);
> - inc_lruvec_page_state(page, NR_PAGETABLE);
> + __folio_set_table(folio);
> + lruvec_stat_add_folio(folio, NR_PAGETABLE);
>   return true;
>  }
>  
> +static inline bool pgtable_pmd_page_ctor(struct page *page)
> +{
> + return pagetable_pmd_ctor(page_ptdesc(page));
> +}
> +
> +static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + pmd_ptlock_free(ptdesc);
> + __folio_clear_table(folio);
> + lruvec_stat_sub_folio(folio, NR_PAGETABLE);
> +}
> +
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
> - pmd_ptlock_free(page_ptdesc(page));
> - __ClearPageTable(page);
> - dec_lruvec_page_state(page, NR_PAGETABLE);
> + pagetable_pmd_dtor(page_ptdesc(page));
>  }
>  
>  /*
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: Functions _spin_lock_cb() and handle_ro_raz()

2023-06-14 Thread Jan Beulich
On 14.06.2023 15:08, Federico Serafini wrote:
> Hello everyone,
> 
> I am working on the violations of MISRA C:2012 Rule 8.10,
> whose headline says:
> "An inline function shall be declared with the static storage class".
> 
> For both ARM64 and X86_64 builds,
> function _spin_lock_cb() defined in spinlock.c violates the rule.
> Such function is declared in spinlock.h without
> the inline function specifier: are there any reasons to do this?

Since this function was mentioned elsewhere already, I'm afraid I
have to be a little blunt and ask back: Did you check the history
of the function? Yes, it is intentionally that way, for the
function to be inlined into _spin_lock(), and for it to also be
available for external callers (we have just one right now, but
that could change).

> What about solving the violation by moving the function definition in
> spinlock.h and declaring it as static inline?

Did you try whether that would work at least purely mechanically?
I'm afraid you'll find that it doesn't, because of LOCK_PROFILE_*
being unavailable then. Yet we also don't want to expose all that
in the header.

In the earlier context I did suggest already to make the function
an always-inline one in spinlock.c, under a slightly altered name,
and then have _spin_lock_cb() be a trivial wrapper just like
_spin_lock() is. I guess best is going to be if I make and post a
patch ...

Jan



Re: [PATCH v4 12/34] mm: Convert ptlock_free() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:01PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  mm/memory.c|  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3b54bb4c9753..a1af7983e1bd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2826,7 +2826,7 @@ static inline void pagetable_clear(void *x)
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
>  bool ptlock_alloc(struct ptdesc *ptdesc);
> -extern void ptlock_free(struct page *page);
> +void ptlock_free(struct ptdesc *ptdesc);
>  
>  static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> @@ -2842,7 +2842,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc)
>   return true;
>  }
>  
> -static inline void ptlock_free(struct page *page)
> +static inline void ptlock_free(struct ptdesc *ptdesc)
>  {
>  }
>  
> @@ -2883,7 +2883,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>  }
>  static inline void ptlock_cache_init(void) {}
>  static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
> -static inline void ptlock_free(struct page *page) {}
> +static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
>  static inline bool pgtable_pte_page_ctor(struct page *page)
> @@ -2897,7 +2897,7 @@ static inline bool pgtable_pte_page_ctor(struct page 
> *page)
>  
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
> - ptlock_free(page);
> + ptlock_free(page_ptdesc(page));
>   __ClearPageTable(page);
>   dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
> @@ -2955,7 +2955,7 @@ static inline void pmd_ptlock_free(struct ptdesc 
> *ptdesc)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
>  #endif
> - ptlock_free(ptdesc_page(ptdesc));
> + ptlock_free(ptdesc);
>  }
>  
>  #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
> diff --git a/mm/memory.c b/mm/memory.c
> index ba9579117686..d4d2ea5cf0fd 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5945,8 +5945,8 @@ bool ptlock_alloc(struct ptdesc *ptdesc)
>   return true;
>  }
>  
> -void ptlock_free(struct page *page)
> +void ptlock_free(struct ptdesc *ptdesc)
>  {
> - kmem_cache_free(page_ptl_cachep, page->ptl);
> + kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
>  }
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 11/34] mm: Convert pmd_ptlock_free() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:00PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f48e626d9c98..3b54bb4c9753 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2950,12 +2950,12 @@ static inline bool pmd_ptlock_init(struct ptdesc 
> *ptdesc)
>   return ptlock_init(ptdesc);
>  }
>  
> -static inline void pmd_ptlock_free(struct page *page)
> +static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> + VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
>  #endif
> - ptlock_free(page);
> + ptlock_free(ptdesc_page(ptdesc));
>  }
>  
>  #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
> @@ -2968,7 +2968,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>  }
>  
>  static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
> -static inline void pmd_ptlock_free(struct page *page) {}
> +static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>  
> @@ -2992,7 +2992,7 @@ static inline bool pgtable_pmd_page_ctor(struct page 
> *page)
>  
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
> - pmd_ptlock_free(page);
> + pmd_ptlock_free(page_ptdesc(page));
>   __ClearPageTable(page);
>   dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 10/34] mm: Convert ptlock_init() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:59PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index daecf1db6cf1..f48e626d9c98 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2857,7 +2857,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
> -static inline bool ptlock_init(struct page *page)
> +static inline bool ptlock_init(struct ptdesc *ptdesc)
>  {
>   /*
>* prep_new_page() initialize page->private (and therefore page->ptl)
> @@ -2866,10 +2866,10 @@ static inline bool ptlock_init(struct page *page)
>* It can happen if arch try to use slab for page table allocation:
>* slab code uses page->slab_cache, which share storage with page->ptl.
>*/
> - VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> - if (!ptlock_alloc(page_ptdesc(page)))
> + VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc));
> + if (!ptlock_alloc(ptdesc))
>   return false;
> - spin_lock_init(ptlock_ptr(page_ptdesc(page)));
> + spin_lock_init(ptlock_ptr(ptdesc));
>   return true;
>  }
>  
> @@ -2882,13 +2882,13 @@ static inline spinlock_t *pte_lockptr(struct 
> mm_struct *mm, pmd_t *pmd)
>   return &mm->page_table_lock;
>  }
>  static inline void ptlock_cache_init(void) {}
> -static inline bool ptlock_init(struct page *page) { return true; }
> +static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void ptlock_free(struct page *page) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
>  static inline bool pgtable_pte_page_ctor(struct page *page)
>  {
> - if (!ptlock_init(page))
> + if (!ptlock_init(page_ptdesc(page)))
>   return false;
>   __SetPageTable(page);
>   inc_lruvec_page_state(page, NR_PAGETABLE);
> @@ -2947,7 +2947,7 @@ static inline bool pmd_ptlock_init(struct ptdesc 
> *ptdesc)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   ptdesc->pmd_huge_pte = NULL;
>  #endif
> - return ptlock_init(ptdesc_page(ptdesc));
> + return ptlock_init(ptdesc);
>  }
>  
>  static inline void pmd_ptlock_free(struct page *page)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 09/34] mm: Convert pmd_ptlock_init() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:58PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bb934d51390f..daecf1db6cf1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2942,12 +2942,12 @@ static inline spinlock_t *pmd_lockptr(struct 
> mm_struct *mm, pmd_t *pmd)
>   return ptlock_ptr(pmd_ptdesc(pmd));
>  }
>  
> -static inline bool pmd_ptlock_init(struct page *page)
> +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - page->pmd_huge_pte = NULL;
> + ptdesc->pmd_huge_pte = NULL;
>  #endif
> - return ptlock_init(page);
> + return ptlock_init(ptdesc_page(ptdesc));
>  }
>  
>  static inline void pmd_ptlock_free(struct page *page)
> @@ -2967,7 +2967,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return &mm->page_table_lock;
>  }
>  
> -static inline bool pmd_ptlock_init(struct page *page) { return true; }
> +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void pmd_ptlock_free(struct page *page) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
> @@ -2983,7 +2983,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>  
>  static inline bool pgtable_pmd_page_ctor(struct page *page)
>  {
> - if (!pmd_ptlock_init(page))
> + if (!pmd_ptlock_init(page_ptdesc(page)))
>   return false;
>   __SetPageTable(page);
>   inc_lruvec_page_state(page, NR_PAGETABLE);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 08/34] mm: Convert ptlock_ptr() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:57PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/x86/xen/mmu_pv.c |  2 +-
>  include/linux/mm.h| 14 +++---
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index b3b8d289b9ab..f469862e3ef4 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -651,7 +651,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct 
> mm_struct *mm)
>   spinlock_t *ptl = NULL;
>  
>  #if USE_SPLIT_PTE_PTLOCKS
> - ptl = ptlock_ptr(page);
> + ptl = ptlock_ptr(page_ptdesc(page));
>   spin_lock_nest_lock(ptl, &mm->page_table_lock);
>  #endif
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e6f1be2a405e..bb934d51390f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2828,9 +2828,9 @@ void __init ptlock_cache_init(void);
>  bool ptlock_alloc(struct ptdesc *ptdesc);
>  extern void ptlock_free(struct page *page);
>  
> -static inline spinlock_t *ptlock_ptr(struct page *page)
> +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> - return page->ptl;
> + return ptdesc->ptl;
>  }
>  #else /* ALLOC_SPLIT_PTLOCKS */
>  static inline void ptlock_cache_init(void)
> @@ -2846,15 +2846,15 @@ static inline void ptlock_free(struct page *page)
>  {
>  }
>  
> -static inline spinlock_t *ptlock_ptr(struct page *page)
> +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> - return &page->ptl;
> + return &ptdesc->ptl;
>  }
>  #endif /* ALLOC_SPLIT_PTLOCKS */
>  
>  static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(pmd_page(*pmd));
> + return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
>  static inline bool ptlock_init(struct page *page)
> @@ -2869,7 +2869,7 @@ static inline bool ptlock_init(struct page *page)
>   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
>   if (!ptlock_alloc(page_ptdesc(page)))
>   return false;
> - spin_lock_init(ptlock_ptr(page));
> + spin_lock_init(ptlock_ptr(page_ptdesc(page)));
>   return true;
>  }
>  
> @@ -2939,7 +2939,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
>  
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
> + return ptlock_ptr(pmd_ptdesc(pmd));
>  }
>  
>  static inline bool pmd_ptlock_init(struct page *page)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 07/34] mm: Convert ptlock_alloc() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:56PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 6 +++---
>  mm/memory.c| 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 088b7664f897..e6f1be2a405e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2825,7 +2825,7 @@ static inline void pagetable_clear(void *x)
>  #if USE_SPLIT_PTE_PTLOCKS
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
> -extern bool ptlock_alloc(struct page *page);
> +bool ptlock_alloc(struct ptdesc *ptdesc);
>  extern void ptlock_free(struct page *page);
>  
>  static inline spinlock_t *ptlock_ptr(struct page *page)
> @@ -2837,7 +2837,7 @@ static inline void ptlock_cache_init(void)
>  {
>  }
>  
> -static inline bool ptlock_alloc(struct page *page)
> +static inline bool ptlock_alloc(struct ptdesc *ptdesc)
>  {
>   return true;
>  }
> @@ -2867,7 +2867,7 @@ static inline bool ptlock_init(struct page *page)
>* slab code uses page->slab_cache, which share storage with page->ptl.
>*/
>   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> - if (!ptlock_alloc(page))
> + if (!ptlock_alloc(page_ptdesc(page)))
>   return false;
>   spin_lock_init(ptlock_ptr(page));
>   return true;
> diff --git a/mm/memory.c b/mm/memory.c
> index 80ce9dda2779..ba9579117686 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5934,14 +5934,14 @@ void __init ptlock_cache_init(void)
>   SLAB_PANIC, NULL);
>  }
>  
> -bool ptlock_alloc(struct page *page)
> +bool ptlock_alloc(struct ptdesc *ptdesc)
>  {
>   spinlock_t *ptl;
>  
>   ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
>   if (!ptl)
>   return false;
> - page->ptl = ptl;
> + ptdesc->ptl = ptl;
>   return true;
>  }
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH] iommu/amd-vi: adjust _amd_iommu_flush_pages() to handle pseudo-domids

2023-06-14 Thread Jan Beulich
On 14.06.2023 15:23, Roger Pau Monné wrote:
> On Wed, Jun 14, 2023 at 02:58:14PM +0200, Jan Beulich wrote:
>> On 14.06.2023 10:32, Roger Pau Monne wrote:
>>> When the passed domain is DomIO iterate over the list of DomIO
>>> assigned devices and flush each pseudo-domid found.
>>>
>>> invalidate_all_domain_pages() does call amd_iommu_flush_all_pages()
>>> with DomIO as a parameter,
>>
>> Does it? Since the full function is visible in the patch (because of
>> the "While there ..." change), it seems pretty clear that it doesn't:
>> for_each_domain() iterates over ordinary domains only.
> 
> Oh, I got confused by domain_create() returning early for system
> domains.
> 
>>> and hence the underlying
>>> _amd_iommu_flush_pages() implementation must be capable of flushing
>>> all pseudo-domids used by the quarantine domain logic.
>>
>> While it didn't occur to me right away when we discussed this, it
>> may well be that I left alone all flushing when introducing the pseudo
>> domain IDs simply because no flushing would ever happen for the
>> quarantine domain.
> 
> But the purpose of the calls to invalidate_all_devices() and
> invalidate_all_domain_pages() in amd_iommu_resume() is to cover up for
> the lack of Invalidate All support in the IOMMU, so flushing
> pseudo-domids is also required in order to flush all possible IOMMU
> state.
> 
> Note that as part of invalidate_all_devices() we do invalidate DTEs
> for devices assigned to pseudo-domids, hence it seems natural that we
> also flush such pseudo-domids.
> 
>>> While there fix invalidate_all_domain_pages() to only attempt to flush
>>> the domains that have IOMMU enabled, otherwise the flush is pointless.
>>
>> For the moment at least it looks to me as if this change alone wants
>> to go in.
> 
> I would rather get the current patch with an added call to flush
> dom_io in invalidate_all_domain_pages().

The question is: Is there anything that needs flushing for the
quarantine domain? Right now I'm thinking that there isn't.

Jan



Re: [PATCH v4 06/34] mm: Convert pmd_pgtable_page() to pmd_ptdesc()

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:55PM -0700, Vishal Moola (Oracle) wrote:
> Converts pmd_pgtable_page() to pmd_ptdesc() and all its callers. This
> removes some direct accesses to struct page, working towards splitting
> out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f184f1eba85d..088b7664f897 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2931,15 +2931,15 @@ static inline void pgtable_pte_page_dtor(struct page 
> *page)
>  
>  #if USE_SPLIT_PMD_PTLOCKS
>  
> -static inline struct page *pmd_pgtable_page(pmd_t *pmd)
> +static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
>  {
>   unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
> - return virt_to_page((void *)((unsigned long) pmd & mask));
> + return virt_to_ptdesc((void *)((unsigned long) pmd & mask));
>  }
>  
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(pmd_pgtable_page(pmd));
> + return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
>  }
>  
>  static inline bool pmd_ptlock_init(struct page *page)
> @@ -2958,7 +2958,7 @@ static inline void pmd_ptlock_free(struct page *page)
>   ptlock_free(page);
>  }
>  
> -#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte)
> +#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
>  
>  #else
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 05/34] mm: add utility functions for ptdesc

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:54PM -0700, Vishal Moola (Oracle) wrote:
> Introduce utility functions setting the foundation for ptdescs. These
> will also assist in the splitting out of ptdesc from struct page.
> 
> Functions that focus on the descriptor are prefixed with ptdesc_* while
> functions that focus on the pagetable are prefixed with pagetable_*.
> 
> pagetable_alloc() is defined to allocate new ptdesc pages as compound
> pages. This is to standardize ptdescs by allowing for one allocation
> and one free function, in contrast to 2 allocation and 2 free functions.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  include/asm-generic/tlb.h | 11 +++
>  include/linux/mm.h| 61 +++
>  include/linux/pgtable.h   | 12 
>  3 files changed, 84 insertions(+)
> 
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index b46617207c93..6bade9e0e799 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -481,6 +481,17 @@ static inline void tlb_remove_page(struct mmu_gather 
> *tlb, struct page *page)
>   return tlb_remove_page_size(tlb, page, PAGE_SIZE);
>  }
>  
> +static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
> +{
> + tlb_remove_table(tlb, pt);
> +}
> +
> +/* Like tlb_remove_ptdesc, but for page-like page directories. */
> +static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct 
> ptdesc *pt)
> +{
> + tlb_remove_page(tlb, ptdesc_page(pt));
> +}
> +
>  static inline void tlb_change_page_size(struct mmu_gather *tlb,
>unsigned int page_size)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0db09639dd2d..f184f1eba85d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2766,6 +2766,62 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, 
> pud_t *pud, unsigned long a
>  }
>  #endif /* CONFIG_MMU */
>  
> +static inline struct ptdesc *virt_to_ptdesc(const void *x)
> +{
> + return page_ptdesc(virt_to_page(x));
> +}
> +
> +static inline void *ptdesc_to_virt(const struct ptdesc *pt)
> +{
> + return page_to_virt(ptdesc_page(pt));
> +}
> +
> +static inline void *ptdesc_address(const struct ptdesc *pt)
> +{
> + return folio_address(ptdesc_folio(pt));
> +}
> +
> +static inline bool pagetable_is_reserved(struct ptdesc *pt)
> +{
> + return folio_test_reserved(ptdesc_folio(pt));
> +}
> +
> +/**
> + * pagetable_alloc - Allocate pagetables
> + * @gfp:GFP flags
> + * @order:  desired pagetable order
> + *
> + * pagetable_alloc allocates a page table descriptor as well as all pages
> + * described by it.

I think the order should be switched here to emphasize that primarily this
method allocates memory for page tables. How about

 pagetable_alloc allocates memory for the page tables as well as a page
 table descriptor that describes the allocated memory
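
i.e. in kernel-doc form:

/**
 * pagetable_alloc - Allocate pagetables
 * @gfp:	GFP flags
 * @order:	desired pagetable order
 *
 * pagetable_alloc allocates memory for the page tables as well as a page
 * table descriptor that describes the allocated memory.
 *
 * Return: The ptdesc describing the allocated page tables.
 */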

> + *
> + * Return: The ptdesc describing the allocated page tables.
> + */
> +static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
> +{
> + struct page *page = alloc_pages(gfp | __GFP_COMP, order);
> +
> + return page_ptdesc(page);
> +}
> +
> +/**
> + * pagetable_free - Free pagetables
> + * @pt:  The page table descriptor
> + *
> + * pagetable_free frees a page table descriptor as well as all page
> + * tables described by said ptdesc.

Similarly here.

> + */
> +static inline void pagetable_free(struct ptdesc *pt)
> +{
> + struct page *page = ptdesc_page(pt);
> +
> + __free_pages(page, compound_order(page));
> +}
> +
> +static inline void pagetable_clear(void *x)
> +{
> + clear_page(x);
> +}
> +
>  #if USE_SPLIT_PTE_PTLOCKS
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
> @@ -2992,6 +3048,11 @@ static inline void mark_page_reserved(struct page 
> *page)
>   adjust_managed_page_count(page, -1);
>  }
>  
> +static inline void free_reserved_ptdesc(struct ptdesc *pt)
> +{
> + free_reserved_page(ptdesc_page(pt));
> +}
> +
>  /*
>   * Default method to free all the __init memory into the buddy system.
>   * The freed pages will be poisoned with pattern "poison" if it's within
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 330de96ebfd6..c405f74d3875 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1026,6 +1026,18 @@ TABLE_MATCH(ptl, ptl);
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
>  
> +#define ptdesc_page(pt)  (_Generic((pt), \
> + const struct ptdesc *:  (const struct page *)(pt),  \
> + struct ptdesc *:(struct page *)(pt)))
> +
> +#define ptdesc_folio(pt) (_Generic((pt), \
> + const struct ptdesc *:  (const struct folio *)(pt), \
> + struct ptdesc *:(struct folio *)(pt)))
> +
> +#define page_ptdesc(p)  

[qemu-mainline test] 181424: regressions - FAIL

2023-06-14 Thread osstest service owner
flight 181424 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181424/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 180691
 build-arm64   6 xen-buildfail REGR. vs. 180691
 build-arm64-xsm   6 xen-buildfail REGR. vs. 180691
 build-amd64   6 xen-buildfail REGR. vs. 180691
 build-i3866 xen-buildfail REGR. vs. 180691
 build-i386-xsm6 xen-buildfail REGR. vs. 180691
 build-armhf   6 xen-buildfail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 

Re: [PATCH v4 04/34] pgtable: Create struct ptdesc

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:53PM -0700, Vishal Moola (Oracle) wrote:
> Currently, page table information is stored within struct page. As part
> of simplifying struct page, create struct ptdesc for page table
> information.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/pgtable.h | 51 +
>  1 file changed, 51 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..330de96ebfd6 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -975,6 +975,57 @@ static inline void ptep_modify_prot_commit(struct 
> vm_area_struct *vma,
>  #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
>  #endif /* CONFIG_MMU */
>  
> +
> +/**
> + * struct ptdesc - Memory descriptor for page tables.
> + * @__page_flags: Same as page flags. Unused for page tables.
> + * @pt_list: List of used page tables. Used for s390 and x86.
> + * @_pt_pad_1: Padding that aliases with page's compound head.
> + * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
> + * @_pt_s390_gaddr: Aliases with page's mapping. Used for s390 gmap only.
> + * @pt_mm: Used for x86 pgds.
> + * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 
> only.
> + * @ptl: Lock for the page table.

Do you mind aligning the descriptions by @pt_frag_refcount? I think it'll
be more readable.
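
I.e. something like:

 * @__page_flags:     Same as page flags. Unused for page tables.
 * @pt_list:          List of used page tables. Used for s390 and x86.
 * @_pt_pad_1:        Padding that aliases with page's compound head.
 * @pmd_huge_pte:     Protected by ptdesc->ptl, used for THPs.
 * @_pt_s390_gaddr:   Aliases with page's mapping. Used for s390 gmap only.
 * @pt_mm:            Used for x86 pgds.
 * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
 * @ptl:              Lock for the page table.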

> + *
> + * This struct overlays struct page for now. Do not modify without a good
> + * understanding of the issues.
> + */
> +struct ptdesc {
> + unsigned long __page_flags;
> +
> + union {
> + struct list_head pt_list;
> + struct {
> + unsigned long _pt_pad_1;
> + pgtable_t pmd_huge_pte;
> + };
> + };
> + unsigned long _pt_s390_gaddr;
> +
> + union {
> + struct mm_struct *pt_mm;
> + atomic_t pt_frag_refcount;
> + };
> +
> +#if ALLOC_SPLIT_PTLOCKS
> + spinlock_t *ptl;
> +#else
> + spinlock_t ptl;
> +#endif
> +};
> +
> +#define TABLE_MATCH(pg, pt)  \
> + static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
> +TABLE_MATCH(flags, __page_flags);
> +TABLE_MATCH(compound_head, pt_list);
> +TABLE_MATCH(compound_head, _pt_pad_1);
> +TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
> +TABLE_MATCH(mapping, _pt_s390_gaddr);
> +TABLE_MATCH(pt_mm, pt_mm);
> +TABLE_MATCH(ptl, ptl);
> +#undef TABLE_MATCH
> +static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> +
>  /*
>   * No-op macros that just return the current protection value. Defined here
>   * because these macros can be used even if CONFIG_MMU is not defined.
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH] iommu/amd-vi: adjust _amd_iommu_flush_pages() to handle pseudo-domids

2023-06-14 Thread Roger Pau Monné
On Wed, Jun 14, 2023 at 02:58:14PM +0200, Jan Beulich wrote:
> On 14.06.2023 10:32, Roger Pau Monne wrote:
> > When the passed domain is DomIO iterate over the list of DomIO
> > assigned devices and flush each pseudo-domid found.
> > 
> > invalidate_all_domain_pages() does call amd_iommu_flush_all_pages()
> > with DomIO as a parameter,
> 
> Does it? Since the full function is visible in the patch (because of
> the "While there ..." change), it seems pretty clear that it doesn't:
> for_each_domain() iterates over ordinary domains only.

Oh, I got confused by domain_create() returning early for system
domains.

> > and hence the underlying
> > _amd_iommu_flush_pages() implementation must be capable of flushing
> > all pseudo-domids used by the quarantine domain logic.
> 
> While it didn't occur to me right away when we discussed this, it
> may well be that I left alone all flushing when introducing the pseudo
> domain IDs simply because no flushing would ever happen for the
> quarantine domain.

But the purpose of the calls to invalidate_all_devices() and
invalidate_all_domain_pages() in amd_iommu_resume() is to cover up for
the lack of Invalidate All support in the IOMMU, so flushing
pseudo-domids is also required in order to flush all possible IOMMU
state.

Note that as part of invalidate_all_devices() we do invalidate DTEs
for devices assigned to pseudo-domids, hence it seems natural that we
also flush such pseudo-domids.

> > While there fix invalidate_all_domain_pages() to only attempt to flush
> > the domains that have IOMMU enabled, otherwise the flush is pointless.
> 
> For the moment at least it looks to me as if this change alone wants
> to go in.

I would rather get the current patch with an added call to flush
dom_io in invalidate_all_domain_pages().
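
Something along these lines is what I have in mind (untested sketch on
top of this patch; it assumes dom_io and is_iommu_enabled() are usable
in this context):

static void invalidate_all_domain_pages(void)
{
    struct domain *d;

    for_each_domain ( d )
        if ( is_iommu_enabled(d) )
            amd_iommu_flush_all_pages(d);

    /* Also cover the pseudo-domids used by the quarantine logic. */
    amd_iommu_flush_all_pages(dom_io);
}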

Thanks, Roger.



Re: [PATCH v4 03/34] s390: Use pt_frag_refcount for pagetables

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:52PM -0700, Vishal Moola (Oracle) wrote:
> s390 currently uses _refcount to identify fragmented page tables.
> The page table struct already has a member pt_frag_refcount used by
> powerpc, so have s390 use that instead of the _refcount field as well.
> This improves the safety for _refcount and the page table tracking.
> 
> This also allows us to simplify the tracking since we can once again use
> the lower byte of pt_frag_refcount instead of the upper byte of _refcount.
> 
> Signed-off-by: Vishal Moola (Oracle) 

One nit below, otherwise

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/s390/mm/pgalloc.c | 38 +++---
>  1 file changed, 15 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index 66ab68db9842..6b99932abc66 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -182,20 +182,17 @@ void page_table_free_pgste(struct page *page)
>   * As follows from the above, no unallocated or fully allocated parent
>   * pages are contained in mm_context_t::pgtable_list.
>   *
> - * The upper byte (bits 24-31) of the parent page _refcount is used
> + * The lower byte (bits 0-7) of the parent page pt_frag_refcount is used
>   * for tracking contained 2KB-pgtables and has the following format:
>   *
>   *   PP  AA
> - * 01234567upper byte (bits 24-31) of struct page::_refcount
> + * 01234567upper byte (bits 0-7) of struct page::pt_frag_refcount

Nit:  lower

>   *   ||  ||
>   *   ||  |+--- upper 2KB-pgtable is allocated
>   *   ||  + lower 2KB-pgtable is allocated
>   *   |+--- upper 2KB-pgtable is pending for removal
>   *   + lower 2KB-pgtable is pending for removal
>   *
> - * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
> - * using _refcount is possible).
> - *
>   * When 2KB-pgtable is allocated the corresponding AA bit is set to 1.
>   * The parent page is either:
>   *   - added to mm_context_t::pgtable_list in case the second half of the
> @@ -243,11 +240,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   if (!list_empty(&mm->context.pgtable_list)) {
>   page = list_first_entry(&mm->context.pgtable_list,
>   struct page, lru);
> - mask = atomic_read(&page->_refcount) >> 24;
> + mask = atomic_read(&page->pt_frag_refcount);
>   /*
>* The pending removal bits must also be checked.
>* Failure to do so might lead to an impossible
> -  * value of (i.e 0x13 or 0x23) written to _refcount.
> +  * value of (i.e 0x13 or 0x23) written to
> +  * pt_frag_refcount.
>* Such values violate the assumption that pending and
>* allocation bits are mutually exclusive, and the rest
>* of the code unrails as result. That could lead to
> @@ -259,8 +257,8 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   bit = mask & 1; /* =1 -> second 2K */
>   if (bit)
>   table += PTRS_PER_PTE;
> - atomic_xor_bits(&page->_refcount,
> - 0x01U << (bit + 24));
> + atomic_xor_bits(&page->pt_frag_refcount,
> + 0x01U << bit);
>   list_del(&page->lru);
>   }
>   }
> @@ -281,12 +279,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   table = (unsigned long *) page_to_virt(page);
>   if (mm_alloc_pgste(mm)) {
>   /* Return 4K page table with PGSTEs */
> - atomic_xor_bits(&page->_refcount, 0x03U << 24);
> + atomic_xor_bits(&page->pt_frag_refcount, 0x03U);
>   memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
>   memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
>   } else {
>   /* Return the first 2K fragment of the page */
> - atomic_xor_bits(&page->_refcount, 0x01U << 24);
> + atomic_xor_bits(&page->pt_frag_refcount, 0x01U);
>   memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
>   spin_lock_bh(&mm->context.lock);
>   list_add(&page->lru, &mm->context.pgtable_list);
> @@ -323,22 +321,19 @@ void page_table_free(struct mm_struct *mm, unsigned 
> long *table)
>* will happen outside of the critical section from this
>* function or from __tlb_remove_table()
>*/
> - mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
> - mask >>= 24;
> + mask = atomic_xor_bits(&page->pt_frag_refcount, 0x11U << bit);
>   if (mask & 0x03U)
>

Re: [XEN PATCH] xen: fixed violations of MISRA C:2012 Rule 3.1

2023-06-14 Thread nicola



On 13/06/23 11:44, Julien Grall wrote:
> Hi,
> 
> On 13/06/2023 09:27, Jan Beulich wrote:
>> On 13.06.2023 09:42, Nicola Vetrini wrote:
>>> The xen sources contain several violations of Rule 3.1 from MISRA C:2012,
>>> whose headline states:
>>> "The character sequences '/*' and '//' shall not be used within a comment".
>>>
>>> Most of the violations are due to the presence of links to webpages within
>>> C-style comment blocks, such as:
>>>
>>> xen/arch/arm/include/asm/smccc.h:37.1-41.3
>>> /*
>>>   * This file provides common defines for ARM SMC Calling Convention as
>>>   * specified in
>>>   * http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html
>>>   */
>>>
>>> In this case, we propose to deviate all of these occurrences with a
>>> project deviation to be captured by a tool configuration.
>>>
>>> There are, however, a few other violations that do not fall under this
>>> category, namely:
>>>
>>> 1. in file "xen/arch/arm/include/asm/arm64/flushtlb.h" we propose to
>>> avoid the usage of a nested comment;
>>> 2. in file "xen/common/xmalloc_tlsf.c" we propose to substitute the
>>> commented-out if statement with a "#if 0 .. #endif";
>>> 3. in file "xen/include/xen/atomic.h" and
>>> "xen/drivers/passthrough/arm/smmu-v3.c" we propose to split the C-style
>>> comment containing the nested comment into two doxygen comments, clearly
>>> identifying the second as a code sample. This can then be captured with a
>>> project deviation by a tool configuration.
>>>
>>> Signed-off-by: Nicola Vetrini 
>>> ---
>>> Changes:
>>> - Resending the patch with the right maintainers in CC.
>>
>> But without otherwise addressing comments already given, afaics. One more
>> remark:
>>
>>> --- a/xen/common/xmalloc_tlsf.c
>>> +++ b/xen/common/xmalloc_tlsf.c
>>> @@ -140,9 +140,10 @@ static inline void MAPPING_SEARCH(unsigned long *r, int *fl, int *sl)
>>>
>>>   *fl = flsl(*r) - 1;
>>>   *sl = (*r >> (*fl - MAX_LOG2_SLI)) - MAX_SLI;
>>>   *fl -= FLI_OFFSET;
>>> -    /*if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
>>> - *fl = *sl = 0;
>>> - */
>>> +#if 0
>>> +    if ((*fl -= FLI_OFFSET) < 0) // FL will be always >0!
>>> +    fl = *sl = 0;
>>
>> You want to get indentation right here, and you don't want to lose
>> the indirection on fl.
>>
>>> +#endif
>>>   *r &= ~t;
>>>   }
>>>   }
>>
>> If you split this to 4 patches, leaving the URL proposal in just
>> the cover letter, then I think this one change (with the adjustments)
>> could go in right away. Similarly I expect the arm64/flushtlb.h
>> change could be ack-ed right away by an Arm maintainer.
> 
> I actually dislike the proposal. In this case, the code is meant to
> look like assembly code. I would replace the // with ;. Also, I would
> like to keep the comment style in sync in arm32/flushtlb.h. So can
> this be updated as well?
> 
> Cheers,

Hi, Julien.

I'm not authorized to send patches for files in the arm32 tree, but
surely the change can easily be replicated in any place where it makes
sense for consistency.


Regards,

  Nicola




Re: [PATCH v4 02/34] s390: Use _pt_s390_gaddr for gmap address tracking

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:51PM -0700, Vishal Moola (Oracle) wrote:
> s390 uses page->index to keep track of page tables for the guest address
> space. In an attempt to consolidate the usage of page fields in s390,
> replace _pt_pad_2 with _pt_s390_gaddr to replace page->index in gmap.
> 
> This will help with the splitting of struct ptdesc from struct page, as
> well as allow s390 to use _pt_frag_refcount for fragmented page table
> tracking.
> 
> Since page->_pt_s390_gaddr aliases with mapping, ensure its set to NULL
> before freeing the pages as well.

I'm looking at the final result and unless I've missed something, setting
of _pt_s390_gaddr to 0 is always followed by pagetable_free().
Can't we have pagetable_free() take care of zeroing _pt_s390_gaddr?
I think patch 16 ("s390: Convert various gmap functions to use ptdescs")
would be the right place for that.
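
Taken literally that would mean something like the below (illustrative
only, not tested; whether an s390-only field should be cleared in the
generic helper is of course debatable):

static inline void pagetable_free(struct ptdesc *pt)
{
	struct page *page = ptdesc_page(pt);

	page->_pt_s390_gaddr = 0;	/* aliases page->mapping */
	__free_pages(page, compound_order(page));
}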

Otherwise:

Acked-by: Mike Rapoport (IBM) 
 
> This also reverts commit 7e25de77bc5ea ("s390/mm: use pmd_pgtable_page()
> helper in __gmap_segment_gaddr()") which had s390 use
> pmd_pgtable_page() to get a gmap page table, as pmd_pgtable_page()
> should be used for more generic process page tables.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/s390/mm/gmap.c  | 56 +++-
>  include/linux/mm_types.h |  2 +-
>  2 files changed, 39 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index dc90d1eb0d55..81c683426b49 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -70,7 +70,7 @@ static struct gmap *gmap_alloc(unsigned long limit)
>   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
>   if (!page)
>   goto out_free;
> - page->index = 0;
> + page->_pt_s390_gaddr = 0;
>   list_add(&page->lru, &gmap->crst_list);
>   table = page_to_virt(page);
>   crst_table_init(table, etype);
> @@ -187,16 +187,20 @@ static void gmap_free(struct gmap *gmap)
>   if (!(gmap_is_shadow(gmap) && gmap->removed))
>   gmap_flush_tlb(gmap);
>   /* Free all segment & region tables. */
> - list_for_each_entry_safe(page, next, &gmap->crst_list, lru)
> + list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
> + page->_pt_s390_gaddr = 0;
>   __free_pages(page, CRST_ALLOC_ORDER);
> + }
>   gmap_radix_tree_free(&gmap->guest_to_host);
>   gmap_radix_tree_free(&gmap->host_to_guest);
>  
>   /* Free additional data for a shadow gmap */
>   if (gmap_is_shadow(gmap)) {
>   /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
> + list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
> + }
>   gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
>   /* Release reference to the parent */
>   gmap_put(gmap->parent);
> @@ -318,12 +322,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>   list_add(&page->lru, &gmap->crst_list);
>   *table = __pa(new) | _REGION_ENTRY_LENGTH |
>   (*table & _REGION_ENTRY_TYPE_MASK);
> - page->index = gaddr;
> + page->_pt_s390_gaddr = gaddr;
>   page = NULL;
>   }
>   spin_unlock(&gmap->guest_table_lock);
> - if (page)
> + if (page) {
> + page->_pt_s390_gaddr = 0;
>   __free_pages(page, CRST_ALLOC_ORDER);
> + }
>   return 0;
>  }
>  
> @@ -336,12 +342,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>  static unsigned long __gmap_segment_gaddr(unsigned long *entry)
>  {
>   struct page *page;
> - unsigned long offset;
> + unsigned long offset, mask;
>  
>   offset = (unsigned long) entry / sizeof(unsigned long);
>   offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
> - page = pmd_pgtable_page((pmd_t *) entry);
> - return page->index + offset;
> + mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
> + page = virt_to_page((void *)((unsigned long) entry & mask));
> +
> + return page->_pt_s390_gaddr + offset;
>  }
>  
>  /**
> @@ -1351,6 +1359,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned 
> long raddr)
>   /* Free page table */
>   page = phys_to_page(pgt);
>   list_del(&page->lru);
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
>  }
>  
> @@ -1379,6 +1388,7 @@ static void __gmap_unshadow_sgt(struct gmap *sg, 
> unsigned long raddr,
>   /* Free page table */
>   page = phys_to_page(pgt);
>   list_del(&page->lru);
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
>   }
>  }
> @@ -1409,6 +1419,7 @@ static void gmap_unshadow_sgt(struct gmap *sg, unsigned 
> long raddr)
>   /* Free segment table */
>   page = phys_to_page(sgt);
>   

Functions _spin_lock_cb() and handle_ro_raz()

2023-06-14 Thread Federico Serafini

Hello everyone,

I am working on the violations of MISRA C:2012 Rule 8.10,
whose headline says:
"An inline function shall be declared with the static storage class".

For both ARM64 and X86_64 builds,
function _spin_lock_cb() defined in spinlock.c violates the rule.
Such function is declared in spinlock.h without
the inline function specifier: are there any reasons to do this?
What about solving the violation by moving the function definition in
spinlock.h and declaring it as static inline?

The same happens also for the function handle_ro_raz() in the ARM64
build, declared in traps.h and defined in traps.c.

Regards,
Federico Serafini



Re: [PATCH v4 01/34] mm: Add PAGE_TYPE_OP folio functions

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:50PM -0700, Vishal Moola (Oracle) wrote:
> No folio equivalents for page type operations have been defined, so
> define them for later folio conversions.
> 
> Also changes the Page##uname macros to take in const struct page* since
> we only read the memory here.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/page-flags.h | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 92a2063a0a23..e99a616b9bcd 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -908,6 +908,8 @@ static inline bool is_page_hwpoison(struct page *page)
>  
>  #define PageType(page, flag) \
>   ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +#define folio_test_type(folio, flag) \
> + ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>  
>  static inline int page_type_has_type(unsigned int page_type)
>  {
> @@ -920,20 +922,34 @@ static inline int page_has_type(struct page *page)
>  }
>  
>  #define PAGE_TYPE_OPS(uname, lname)  \
> -static __always_inline int Page##uname(struct page *page)\
> +static __always_inline int Page##uname(const struct page *page)  
> \
>  {\
>   return PageType(page, PG_##lname);  \
>  }\
> +static __always_inline int folio_test_##lname(const struct folio *folio)\
> +{\
> + return folio_test_type(folio, PG_##lname);  \
> +}\
>  static __always_inline void __SetPage##uname(struct page *page)  
> \
>  {\
>   VM_BUG_ON_PAGE(!PageType(page, 0), page);   \
>   page->page_type &= ~PG_##lname; \
>  }\
> +static __always_inline void __folio_set_##lname(struct folio *folio) \
> +{\
> + VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
> + folio->page.page_type &= ~PG_##lname;   \
> +}\
>  static __always_inline void __ClearPage##uname(struct page *page)\
>  {\
>   VM_BUG_ON_PAGE(!Page##uname(page), page);   \
>   page->page_type |= PG_##lname;  \
> -}
> +}\
> +static __always_inline void __folio_clear_##lname(struct folio *folio)   
> \
> +{\
> + VM_BUG_ON_FOLIO(!folio_test_##lname(folio), folio); \
> + folio->page.page_type |= PG_##lname;\
> +}\
>  
>  /*
>   * PageBuddy() indicates that the page is free and in the buddy system
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



  1   2   >