[PATCH v2 14/14] xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Max Filippov 
Acked-by: Kirill A. Shutemov 
---
 arch/xtensa/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 3eee334ba873..3c6e5471f025 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -772,15 +772,17 @@ config HIGHMEM
  If unsure, say Y.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 endmenu
 
-- 
2.35.1
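
The reworded help text defines MAX_ORDER as the largest power of two of pages that can be allocated as one contiguous block. As a rough sketch, assuming MAX_ORDER is inclusive (an order-MAX_ORDER block is the largest one) and that page_shift is log2 of the page size, the largest block in bytes is 2^(PAGE_SHIFT + MAX_ORDER). The helper name below is hypothetical and is not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes in the largest physically contiguous block,
 * assuming an inclusive MAX_ORDER, i.e. 2^max_order pages of
 * 2^page_shift bytes each. Returns 0 if the size does not fit in 64 bits.
 */
uint64_t max_contig_alloc_bytes(unsigned int page_shift, unsigned int max_order)
{
	unsigned int bits = page_shift + max_order;

	if (bits >= 64)
		return 0;	/* not representable as a 64-bit byte count */
	return UINT64_C(1) << bits;
}
```

With 4 KiB pages (page_shift 12) and a MAX_ORDER of 10, the xtensa default, this gives 4 MiB; assuming sparc64's 8 KiB pages (page_shift 13), the sparc default of 12 gives 32 MiB.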



[PATCH v2 13/14] sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/sparc/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e3242bf5a8df..959e43a1aaca 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -270,15 +270,17 @@ config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
default "12"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 if SPARC64 || COMPILE_TEST
 source "kernel/power/Kconfig"
-- 
2.35.1



[PATCH v2 12/14] sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

sh defines insane ranges for ARCH_FORCE_MAX_ORDER, allowing MAX_ORDER
up to 63, which implies a maximal contiguous allocation size of 2^63
pages.

Drop bogus definitions of ranges for ARCH_FORCE_MAX_ORDER and leave it a
simple integer with sensible defaults.

Users who *really* need to change the value of ARCH_FORCE_MAX_ORDER
will still be able to do so, but they won't be misled by the bogus ranges.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/sh/mm/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index fb15ba1052ba..511c17aede4a 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -21,9 +21,7 @@ config PAGE_OFFSET
 config ARCH_FORCE_MAX_ORDER
int "Order of maximal physically contiguous allocations"
default "8" if PAGE_SIZE_16KB
-   range 6 63 if PAGE_SIZE_64KB
default "6" if PAGE_SIZE_64KB
-   range 10 63
default "13" if !MMU
default "10"
help
-- 
2.35.1
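
To see why the old sh upper bound of 63 was bogus, it is enough to compare the size of one maximal block with the size of the address space. The sketch below only checks the arithmetic 2^(PAGE_SHIFT + order) <= 2^addr_bits; the helper name is hypothetical, and treating sh as having a 32-bit address space is an assumption made for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: can one block of 2^order pages, each
 * 2^page_shift bytes, fit in an address space of addr_bits bits?
 */
bool order_fits_address_space(unsigned int page_shift, unsigned int order,
			      unsigned int addr_bits)
{
	/* The block spans 2^(page_shift + order) bytes. */
	return page_shift + order <= addr_bits;
}
```

With 4 KiB pages, order 63 would need a 75-bit address space, while the default of 10 needs only 22 bits.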



[PATCH v2 11/14] sh: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/sh/mm/Kconfig | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 40271090bd7d..fb15ba1052ba 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -19,8 +19,7 @@ config PAGE_OFFSET
default "0x"
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
-   range 8 63 if PAGE_SIZE_16KB
+   int "Order of maximal physically contiguous allocations"
default "8" if PAGE_SIZE_16KB
range 6 63 if PAGE_SIZE_64KB
default "6" if PAGE_SIZE_64KB
@@ -28,16 +27,18 @@ config ARCH_FORCE_MAX_ORDER
default "13" if !MMU
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
  The page size is not necessarily 4KB. Keep this in mind when
  choosing a value for this option.
 
+ Don't change if unsure.
+
 config MEMORY_START
hex "Physical memory start address"
default "0x0800"
-- 
2.35.1
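
The sh entry lists several conditional defaults. Kconfig uses the first "default" line whose condition is true, so their order matters. A minimal C model of that selection for the sh entry follows; the function name and its page_size_kb and mmu parameters are hypothetical stand-ins for the PAGE_SIZE_* and MMU symbols, and the model does not check which combinations Kconfig actually permits.

```c
/*
 * Hypothetical model of Kconfig's first-match default selection for
 * sh's ARCH_FORCE_MAX_ORDER. Each test mirrors one "default" line,
 * in the same order as in arch/sh/mm/Kconfig.
 */
int sh_default_max_order(unsigned int page_size_kb, int mmu)
{
	if (page_size_kb == 16)	/* default "8" if PAGE_SIZE_16KB */
		return 8;
	if (page_size_kb == 64)	/* default "6" if PAGE_SIZE_64KB */
		return 6;
	if (!mmu)		/* default "13" if !MMU */
		return 13;
	return 10;		/* default "10" */
}
```

For example, a no-MMU build with 16 KiB pages gets 8 rather than 13, because the PAGE_SIZE_16KB default comes first. The 16 KiB and 64 KiB defaults both keep the maximal block at 4 MiB (16 KiB x 2^8 and 64 KiB x 2^6), the same as 4 KiB pages with order 10.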



[PATCH v2 10/14] powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

PowerPC defines ranges for ARCH_FORCE_MAX_ORDER, some of which insanely
allow MAX_ORDER up to 63, which implies a maximal contiguous allocation
size of 2^63 pages.

Drop bogus definitions of ranges for ARCH_FORCE_MAX_ORDER and leave it a
simple integer with sensible defaults.

Users who *really* need to change the value of ARCH_FORCE_MAX_ORDER
will still be able to do so, but they won't be misled by the bogus ranges.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/powerpc/Kconfig | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c0095bf795ca..419be4a71004 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -897,17 +897,11 @@ config DATA_SHIFT
 
 config ARCH_FORCE_MAX_ORDER
int "Order of maximal physically contiguous allocations"
-   range 7 8 if PPC64 && PPC_64K_PAGES
default "8" if PPC64 && PPC_64K_PAGES
-   range 12 12 if PPC64 && !PPC_64K_PAGES
default "12" if PPC64 && !PPC_64K_PAGES
-   range 8 63 if PPC32 && PPC_16K_PAGES
default "8" if PPC32 && PPC_16K_PAGES
-   range 6 63 if PPC32 && PPC_64K_PAGES
default "6" if PPC32 && PPC_64K_PAGES
-   range 4 63 if PPC32 && PPC_256K_PAGES
default "4" if PPC32 && PPC_256K_PAGES
-   range 10 63
default "10"
help
  The kernel page allocator limits the size of maximal physically
-- 
2.35.1
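
Once the ranges are gone, only the powerpc defaults remain. Pairing each default with the page size its condition selects shows what they encode: 16 MiB maximal blocks on PPC64 and 4 MiB on PPC32. This is a sketch with hypothetical names; the page shifts are log2 of the sizes in the symbol names, and the catch-all row is assumed to be PPC32 with 4 KiB pages, the only combination the conditional defaults do not cover.

```c
#include <stdint.h>

/* Hypothetical table: one row per "default" line in the powerpc hunk. */
struct ppc_max_order_default {
	const char *condition;		/* condition on the "default" line */
	unsigned int page_shift;	/* log2(PAGE_SIZE) under that condition */
	unsigned int max_order;		/* default ARCH_FORCE_MAX_ORDER */
};

const struct ppc_max_order_default ppc_defaults[] = {
	{ "PPC64 && PPC_64K_PAGES",     16,  8 },
	{ "PPC64 && !PPC_64K_PAGES",    12, 12 },
	{ "PPC32 && PPC_16K_PAGES",     14,  8 },
	{ "PPC32 && PPC_64K_PAGES",     16,  6 },
	{ "PPC32 && PPC_256K_PAGES",    18,  4 },
	{ "fallback (assumed PPC32 4K)", 12, 10 },
};

/* Bytes in the largest block for one row: 2^(page_shift + max_order). */
uint64_t ppc_max_alloc_bytes(const struct ppc_max_order_default *d)
{
	return UINT64_C(1) << (d->page_shift + d->max_order);
}
```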



[PATCH v2 09/14] powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/powerpc/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 24d56536b269..c0095bf795ca 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -896,7 +896,7 @@ config DATA_SHIFT
  8M pages will be pinned.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
range 7 8 if PPC64 && PPC_64K_PAGES
default "8" if PPC64 && PPC_64K_PAGES
range 12 12 if PPC64 && !PPC_64K_PAGES
@@ -910,17 +910,19 @@ config ARCH_FORCE_MAX_ORDER
range 10 63
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
  The page size is not necessarily 4KB.  For example, on 64-bit
  systems, 64KB pages can be enabled via CONFIG_PPC_64K_PAGES.  Keep
  this in mind when choosing a value for this option.
 
+ Don't change if unsure.
+
 config PPC_SUBPAGE_PROT
bool "Support setting protections for 4k subpages (subpage_prot syscall)"
default n
-- 
2.35.1



[PATCH v2 08/14] nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 defines a range for ARCH_FORCE_MAX_ORDER that allows MAX_ORDER
up to 19, which implies a maximal contiguous allocation size of 2^19
pages, or 2GiB.

Drop the bogus range definition for ARCH_FORCE_MAX_ORDER and leave it a
simple integer with a sensible default.

Users who *really* need to change the value of ARCH_FORCE_MAX_ORDER
will still be able to do so, but they won't be misled by the bogus range.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/nios2/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index fcaa6bbda3fc..e5936417d3cd 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -46,7 +46,6 @@ source "kernel/Kconfig.hz"
 
 config ARCH_FORCE_MAX_ORDER
int "Order of maximal physically contiguous allocations"
-   range 8 19
default "10"
help
  The kernel page allocator limits the size of maximal physically
-- 
2.35.1



[PATCH v2 07/14] nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/nios2/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 89708b95978c..fcaa6bbda3fc 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -45,16 +45,18 @@ menu "Kernel features"
 source "kernel/Kconfig.hz"
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
range 8 19
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 endmenu
 
-- 
2.35.1



[PATCH v2 06/14] m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
Acked-by: Geert Uytterhoeven 
---
 arch/m68k/Kconfig.cpu | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index c9df6572133f..e530bc8f240f 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -398,21 +398,23 @@ config SINGLE_MEMORY_CHUNK
  Say N if not sure.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if ADVANCED
+   int "Order of maximal physically contiguous allocations" if ADVANCED
depends on !SINGLE_MEMORY_CHUNK
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
  For systems that have holes in their physical address space this
  value also defines the minimal size of the hole that allows
  freeing unused memory map.
 
+ Don't change if unsure.
+
 config 060_WRITETHROUGH
bool "Use write-through caching for 68060 supervisor accesses"
depends on ADVANCED && M68060
-- 
2.35.1



[PATCH v2 05/14] ia64: don't allow users to override ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

It is enough to keep the default values for base and huge pages without
letting users override ARCH_FORCE_MAX_ORDER.

Drop the prompt to make the option invisible in *config.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/ia64/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 0d2f41fa56ee..b61437cae162 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -202,8 +202,7 @@ config IA64_CYCLONE
  If you're unsure, answer N.
 
 config ARCH_FORCE_MAX_ORDER
-   int "MAX_ORDER (10 - 16)"  if !HUGETLB_PAGE
-   range 10 16  if !HUGETLB_PAGE
+   int
default "16" if HUGETLB_PAGE
default "10"
 
-- 
2.35.1



[PATCH v2 04/14] csky: drop ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The default value of ARCH_FORCE_MAX_ORDER matches the generic default
defined in the MM code and the architecture does not support huge pages,
so there is no need to keep the ARCH_FORCE_MAX_ORDER option available.

Drop it.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/csky/Kconfig | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index c694fac43bed..00379a843c37 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -332,10 +332,6 @@ config HIGHMEM
select KMAP_LOCAL
default y
 
-config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
-   default "10"
-
 config DRAM_BASE
hex "DRAM start addr (the same with memory-section in dts)"
default 0x0
-- 
2.35.1



[PATCH v2 03/14] arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/arm64/Kconfig | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7324032af859..cdaf52ac4018 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1487,24 +1487,24 @@ config XEN
 # 16K |   27  |  14  |   13| 11 |
 # 64K |   29  |  16  |   13| 13 |
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || ARM64_16K_PAGES)
+   int "Order of maximal physically contiguous allocations" if ARM64_4K_PAGES || ARM64_16K_PAGES
default "13" if ARM64_64K_PAGES
default "11" if ARM64_16K_PAGES
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
- We make sure that we can allocate up to a HugePage size for each configuration.
- Hence we have :
-   MAX_ORDER = PMD_SHIFT - PAGE_SHIFT  => PAGE_SHIFT - 3
+ The maximal size of allocation cannot exceed the size of the
+ section, so the value of MAX_ORDER should satisfy
 
- However for 4K, we choose a higher default value, 10 as opposed to 9, giving us
- 4M allocations matching the default size used by generic code.
+   MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
+
+ Don't change if unsure.
 
 config UNMAP_KERNEL_AT_EL0
bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
-- 
2.35.1
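
The new arm64 help text states the constraint MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS. The sketch below checks it against the comment table in that Kconfig file. The 16K and 64K rows come from the table lines shown in the hunk; the 4K row (SECTION_SIZE_BITS 27, PAGE_SHIFT 12) is an assumption, consistent with the removed "range 10 15 if ARM64_4K_PAGES". All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical row of the arm64 comment table. */
struct arm64_page_cfg {
	unsigned int page_kb;
	unsigned int section_size_bits;
	unsigned int page_shift;
	unsigned int default_max_order;
};

const struct arm64_page_cfg arm64_cfgs[] = {
	{  4, 27, 12, 10 },	/* assumed, see lead-in */
	{ 16, 27, 14, 11 },
	{ 64, 29, 16, 13 },
};

/* Help-text rule: MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS. */
bool max_order_ok(const struct arm64_page_cfg *c, unsigned int max_order)
{
	return max_order + c->page_shift <= c->section_size_bits;
}

/* Largest MAX_ORDER the rule allows for this page size. */
unsigned int max_order_limit(const struct arm64_page_cfg *c)
{
	return c->section_size_bits - c->page_shift;
}
```

The computed limits (15, 13, 13) match the upper bounds of the ranges removed in the previous patch, and every default satisfies the rule.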



[PATCH v2 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

It is not a good idea to change fundamental parameters of core memory
management. Having predefined ranges suggests that the values within
those ranges are sensible, but one has to *really* understand the
implications of changing MAX_ORDER before actually amending it, and
ranges don't help here.

Drop the ranges in the definition of ARCH_FORCE_MAX_ORDER and make its
prompt visible only if EXPERT=y.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/Kconfig | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e60baf7859d1..7324032af859 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1487,11 +1487,9 @@ config XEN
 # 16K |   27  |  14  |   13| 11 |
 # 64K |   29  |  16  |   13| 13 |
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
+   int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || ARM64_16K_PAGES)
default "13" if ARM64_64K_PAGES
-   range 11 13 if ARM64_16K_PAGES
default "11" if ARM64_16K_PAGES
-   range 10 15 if ARM64_4K_PAGES
default "10"
help
  The kernel memory allocator divides physically contiguous memory
-- 
2.35.1



[PATCH v2 01/14] arm: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close to
describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Kirill A. Shutemov 
---
 arch/arm/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 929e646e84b9..0b15384c62e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1354,17 +1354,19 @@ config ARM_MODULE_PLTS
  configurations. If unsure, say y.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
default "11" if SOC_AM33XX
default "8" if SA
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 config ALIGNMENT_TRAP
def_bool CPU_CP15_MMU
-- 
2.35.1



[PATCH v2 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER

2023-03-23 Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and
they all have wrong and misleading prompt and help text for this option.

Besides, some define insane limits for possible values of
ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of
possible configurations, and some make this option configurable by users
for no good reason.

This set updates the prompt and help text everywhere and does its best to
update the actual range definitions where applicable.

kbuild generated a bunch of false positives because it assigns -1 to
ARCH_FORCE_MAX_ORDER; hopefully this will be fixed soon.

v2:
* arm64: show prompt for ARCH_FORCE_MAX_ORDER only if EXPERT (Catalin)
* Add Acked- and Reviewed-by tags (thanks Geert, Kirill and Max)

v1: https://lore.kernel.org/all/20230323092156.2545741-1-r...@kernel.org

Mike Rapoport (IBM) (14):
  arm: reword ARCH_FORCE_MAX_ORDER prompt and help text
  arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER
  arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text
  csky: drop ARCH_FORCE_MAX_ORDER
  ia64: don't allow users to override ARCH_FORCE_MAX_ORDER
  m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text
  nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text
  nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER
  powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text
  powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER
  sh: reword ARCH_FORCE_MAX_ORDER prompt and help text
  sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER
  sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text
  xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text

 arch/arm/Kconfig  | 16 +---
 arch/arm64/Kconfig| 27 ---
 arch/csky/Kconfig |  4 
 arch/ia64/Kconfig |  3 +--
 arch/m68k/Kconfig.cpu | 16 +---
 arch/nios2/Kconfig| 17 +
 arch/powerpc/Kconfig  | 22 +-
 arch/sh/mm/Kconfig| 19 +--
 arch/sparc/Kconfig| 16 +---
 arch/xtensa/Kconfig   | 16 +---
 10 files changed, 76 insertions(+), 80 deletions(-)


base-commit: 51551d71edbc998fd8c8afa7312db3d270f5998e
-- 
2.35.1




[PATCH v2 2/2] KVM: PPC: selftests: basic sanity tests

2023-03-23 Nicholas Piggin
Add tests that exercise basic functions of the KVM selftests framework:
guest creation, ucalls, hcalls, copying data between guest and host,
interrupts, and page faults.

These don't stress KVM so much as serve as a useful aid when developing
powerpc support.

Signed-off-by: Nicholas Piggin 
---
 tools/testing/selftests/kvm/Makefile  |   2 +
 .../selftests/kvm/include/powerpc/hcall.h |   2 +
 .../testing/selftests/kvm/powerpc/null_test.c | 166 ++
 .../selftests/kvm/powerpc/rtas_hcall.c| 146 +++
 4 files changed, 316 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/powerpc/null_test.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/rtas_hcall.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 908602a9f513..d4052eccaaee 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -181,6 +181,8 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += kvm_binary_stats_test
 
+TEST_GEN_PROGS_powerpc += powerpc/null_test
+TEST_GEN_PROGS_powerpc += powerpc/rtas_hcall
 TEST_GEN_PROGS_powerpc += demand_paging_test
 TEST_GEN_PROGS_powerpc += dirty_log_test
 TEST_GEN_PROGS_powerpc += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/include/powerpc/hcall.h b/tools/testing/selftests/kvm/include/powerpc/hcall.h
index 4eb9ad635402..cbcaf180c427 100644
--- a/tools/testing/selftests/kvm/include/powerpc/hcall.h
+++ b/tools/testing/selftests/kvm/include/powerpc/hcall.h
@@ -13,6 +13,8 @@
 #define UCALL_R4_EXCPT 0x1b0f // other exception, r5 contains vector, r6,7 SRRs
 #define UCALL_R4_SIMPLE 0x // simple exit usable by asm with no ucall data
 
+#define H_RTAS 0xf000
+
 int64_t hcall0(uint64_t token);
 int64_t hcall1(uint64_t token, uint64_t arg1);
 int64_t hcall2(uint64_t token, uint64_t arg1, uint64_t arg2);
diff --git a/tools/testing/selftests/kvm/powerpc/null_test.c b/tools/testing/selftests/kvm/powerpc/null_test.c
new file mode 100644
index ..31db0b6becd6
--- /dev/null
+++ b/tools/testing/selftests/kvm/powerpc/null_test.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Tests for guest creation, run, ucall, interrupt, and vm dumping.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "kselftest.h"
+#include "processor.h"
+#include "helpers.h"
+
+extern void guest_code_asm(void);
+asm(".global guest_code_asm");
+asm(".balign 4");
+asm("guest_code_asm:");
+asm("li 3,0"); // H_UCALL
+asm("li 4,0"); // UCALL_R4_SIMPLE
+asm("sc 1");
+
+static void test_asm(void)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_vm *vm;
+
+   vm = vm_create_with_one_vcpu(&vcpu, guest_code_asm);
+
+   vcpu_run(vcpu);
+   handle_ucall(vcpu, UCALL_NONE);
+
+   kvm_vm_free(vm);
+}
+
+static void guest_code_ucall(void)
+{
+   GUEST_DONE();
+}
+
+static void test_ucall(void)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_vm *vm;
+
+   vm = vm_create_with_one_vcpu(&vcpu, guest_code_ucall);
+
+   vcpu_run(vcpu);
+   handle_ucall(vcpu, UCALL_DONE);
+
+   kvm_vm_free(vm);
+}
+
+static void trap_handler(struct ex_regs *regs)
+{
+   GUEST_SYNC(1);
+   regs->nia += 4;
+}
+
+static void guest_code_trap(void)
+{
+   GUEST_SYNC(0);
+   asm volatile("trap");
+   GUEST_DONE();
+}
+
+static void test_trap(void)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_vm *vm;
+
+   vm = vm_create_with_one_vcpu(&vcpu, guest_code_trap);
+   vm_install_exception_handler(vm, 0x700, trap_handler);
+
+   vcpu_run(vcpu);
+   host_sync(vcpu, 0);
+   vcpu_run(vcpu);
+   host_sync(vcpu, 1);
+   vcpu_run(vcpu);
+   handle_ucall(vcpu, UCALL_DONE);
+
+   vm_install_exception_handler(vm, 0x700, NULL);
+
+   kvm_vm_free(vm);
+}
+
+static void dsi_handler(struct ex_regs *regs)
+{
+   GUEST_SYNC(1);
+   regs->nia += 4;
+}
+
+static void guest_code_dsi(void)
+{
+   GUEST_SYNC(0);
+   asm volatile("stb %r0,0(0)");
+   GUEST_DONE();
+}
+
+static void test_dsi(void)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_vm *vm;
+
+   vm = vm_create_with_one_vcpu(&vcpu, guest_code_dsi);
+   vm_install_exception_handler(vm, 0x300, dsi_handler);
+
+   vcpu_run(vcpu);
+   host_sync(vcpu, 0);
+   vcpu_run(vcpu);
+   host_sync(vcpu, 1);
+   vcpu_run(vcpu);
+   handle_ucall(vcpu, UCALL_DONE);
+
+   vm_install_exception_handler(vm, 0x300, NULL);
+
+   kvm_vm_free(vm);
+}
+
+static void test_dump(void)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_vm *vm;
+
+   vm = vm_create_with_one_vcpu(&vcpu, guest_code_ucall);
+
+   vcpu_run(vcpu);
+   handle_ucall(vcpu, UCALL_DONE);
+
+   printf("Testing vm_dump...\n");
+   vm_dump(stderr, vm, 2);

[PATCH v2 1/2] KVM: PPC: selftests: implement support for powerpc

2023-03-23 Nicholas Piggin
Implement KVM selftests support for powerpc (Book3S-64).

ucalls are implemented with an unsupported PAPR hcall number, which causes
KVM to exit to userspace.

Virtual memory is only implemented for 64K page size and the radix MMU,
and only the base page size is supported for now.
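
The new VM_MODE_P52V52_64K mode pairs 52-bit guest addresses with 64K base pages, and the powerpc branch of kvm_util_base.h computes ptes_per_page(page_size) as page_size / 8. The arithmetic behind those numbers is sketched below with hypothetical macro and function names that are not part of the selftests library.

```c
#include <stdint.h>

/* Hypothetical constants mirroring VM_MODE_P52V52_64K. */
#define P52V52_VA_BITS		52	/* radix guest EA/RA width */
#define P52V52_PAGE_SHIFT	16	/* 64K base pages */
#define PTE_BYTES		8	/* ptes_per_page() divides by 8 */

/* Base pages needed to span the full 52-bit address space. */
uint64_t p52v52_pages_in_address_space(void)
{
	return UINT64_C(1) << (P52V52_VA_BITS - P52V52_PAGE_SHIFT);
}

/* What ptes_per_page(page_size) evaluates to for one 64K page. */
uint64_t p52v52_ptes_per_page(void)
{
	return (UINT64_C(1) << P52V52_PAGE_SHIFT) / PTE_BYTES;
}
```

That is 2^36 base pages and 8192 eight-byte entries per 64K page; how the radix page-table levels are actually sized in lib/powerpc/processor.c is not visible here.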

Guest interrupt handlers run in real mode, so they require a real-mode
page at gRA 0. They also need a bit of juggling because gVA:gRA is not
mapped 1:1 the way the kernel maps it, so we can't just switch the MMU
on and off.

Signed-off-by: Nicholas Piggin 
---
 tools/testing/selftests/kvm/Makefile  |  13 +
 .../selftests/kvm/include/kvm_util_base.h |  13 +
 .../selftests/kvm/include/powerpc/hcall.h |  20 +
 .../selftests/kvm/include/powerpc/ppc_asm.h   |  17 +
 .../selftests/kvm/include/powerpc/processor.h |  32 ++
 tools/testing/selftests/kvm/lib/kvm_util.c|  10 +
 .../selftests/kvm/lib/powerpc/handlers.S  |  96 
 .../testing/selftests/kvm/lib/powerpc/hcall.c |  45 ++
 .../selftests/kvm/lib/powerpc/processor.c | 411 ++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  30 ++
 tools/testing/selftests/kvm/powerpc/helpers.h |  46 ++
 11 files changed, 733 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/handlers.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/helpers.h

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 84a627c43795..908602a9f513 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -55,6 +55,11 @@ LIBKVM_s390x += lib/s390x/ucall.c
 LIBKVM_riscv += lib/riscv/processor.c
 LIBKVM_riscv += lib/riscv/ucall.c
 
+LIBKVM_powerpc += lib/powerpc/handlers.S
+LIBKVM_powerpc += lib/powerpc/processor.c
+LIBKVM_powerpc += lib/powerpc/ucall.c
+LIBKVM_powerpc += lib/powerpc/hcall.c
+
 # Non-compiled test targets
 TEST_PROGS_x86_64 += x86_64/nx_huge_pages_test.sh
 
@@ -176,6 +181,14 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += kvm_binary_stats_test
 
+TEST_GEN_PROGS_powerpc += demand_paging_test
+TEST_GEN_PROGS_powerpc += dirty_log_test
+TEST_GEN_PROGS_powerpc += kvm_create_max_vcpus
+TEST_GEN_PROGS_powerpc += kvm_page_table_test
+TEST_GEN_PROGS_powerpc += rseq_test
+TEST_GEN_PROGS_powerpc += set_memory_region_test
+TEST_GEN_PROGS_powerpc += kvm_binary_stats_test
+
 TEST_PROGS += $(TEST_PROGS_$(ARCH_DIR))
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(ARCH_DIR))
 TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(ARCH_DIR))
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index fbc2a79369b8..f6807aea634f 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -105,6 +105,7 @@ struct kvm_vm {
bool pgd_created;
vm_paddr_t ucall_mmio_addr;
vm_paddr_t pgd;
+   vm_paddr_t prtb; // powerpc process table
vm_vaddr_t gdt;
vm_vaddr_t tss;
vm_vaddr_t idt;
@@ -160,6 +161,7 @@ enum vm_guest_mode {
VM_MODE_PXXV48_4K,  /* For 48bits VA but ANY bits PA */
VM_MODE_P47V64_4K,
VM_MODE_P44V64_4K,
+   VM_MODE_P52V52_64K,
VM_MODE_P36V48_4K,
VM_MODE_P36V48_16K,
VM_MODE_P36V48_64K,
@@ -197,6 +199,17 @@ extern enum vm_guest_mode vm_mode_default;
 #define MIN_PAGE_SHIFT 12U
 #define ptes_per_page(page_size)   ((page_size) / 8)
 
+#elif defined(__powerpc64__)
+
+/* Radix guest EA and RA are 52-bit on POWER9 and POWER10 */
+#define VM_MODE_DEFAULT   VM_MODE_P52V52_64K
+#define MIN_PAGE_SHIFT 12U /// XXX: hack to allocate more page table memory because we aren't allocating page tables well on 64K base page size
+#define ptes_per_page(page_size)   ((page_size) / 8)
+
+#else
+
+#error "KVM selftests not implemented for architecture"
+
 #endif
 
 #define MIN_PAGE_SIZE  (1U << MIN_PAGE_SHIFT)
diff --git a/tools/testing/selftests/kvm/include/powerpc/hcall.h b/tools/testing/selftests/kvm/include/powerpc/hcall.h
new file mode 100644
index ..4eb9ad635402
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/powerpc/hcall.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * powerpc hcall defines
+ */
+#ifndef SELFTEST_KVM_HCALL_H
+#define SELFTEST_KVM_HCALL_H
+
+#include 
+
+/* Ucalls use unimplemented PAPR hcall 0 which exits KVM */
+#define H_UCALL 0
+#define 

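The XXX comment on MIN_PAGE_SHIFT in the patch above is easier to follow with the arithmetic spelled out. The sketch below assumes, for illustration only, that the selftest library sizes page-table memory as roughly one table page per ptes_per_page(MIN_PAGE_SIZE) guest pages; pt_pages_estimate() is a hypothetical helper, not part of kvm_util, and the real estimating code is not shown in this thread.

```c
/* Same definition as in the patch: one 8-byte entry per PTE. */
#define ptes_per_page(page_size) ((page_size) / 8)

/*
 * Hypothetical helper (assumption, not kvm_util code): number of
 * page-table pages needed to map nr_pages guest pages when one table
 * page holds ptes_per_page(1 << page_shift) entries.
 */
static unsigned long pt_pages_estimate(unsigned long nr_pages,
                                       unsigned int page_shift)
{
    return nr_pages / ptes_per_page(1UL << page_shift);
}
```

Under that assumption, pinning MIN_PAGE_SHIFT at 12 instead of 16 gives 512 instead of 8192 entries per table page, so the estimate comes out 16 times larger, which over-provisions page-table memory on a 64K base page size.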
[PATCH v2 0/2] KVM: PPC: selftests: powerpc support

2023-03-23 Thread Nicholas Piggin
Hi,

This series adds initial KVM selftests support for powerpc
(64-bit, BookS). It spans three maintainers, but it does not really
affect arch/powerpc: it is well contained in the selftests code and
only touches some makefiles and a few headers, so conflicts should
be unlikely and trivial.

Hey Paolo and KVM group, if you haven't taken the v1 series yet, could
you please take this one instead? Otherwise I can send an incremental
fixup.

Since v1:
- r2 (TOC) was not being set for guest code
- MSR[VSX] was not being set for guest code
- Proper guest interrupt handling instead of quick hack that
  just made a ucall out to host.
- Adjust subject to better match kvm selftests convention.

Thanks,
Nick


Nicholas Piggin (2):
  KVM: PPC: selftests: implement support for powerpc
  KVM: PPC: selftests: basic sanity tests

 tools/testing/selftests/kvm/Makefile  |  15 +
 .../selftests/kvm/include/kvm_util_base.h |  13 +
 .../selftests/kvm/include/powerpc/hcall.h |  22 +
 .../selftests/kvm/include/powerpc/ppc_asm.h   |  17 +
 .../selftests/kvm/include/powerpc/processor.h |  32 ++
 tools/testing/selftests/kvm/lib/kvm_util.c|  10 +
 .../selftests/kvm/lib/powerpc/handlers.S  |  96 
 .../testing/selftests/kvm/lib/powerpc/hcall.c |  45 ++
 .../selftests/kvm/lib/powerpc/processor.c | 411 ++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  30 ++
 tools/testing/selftests/kvm/powerpc/helpers.h |  46 ++
 .../testing/selftests/kvm/powerpc/null_test.c | 166 +++
 .../selftests/kvm/powerpc/rtas_hcall.c| 146 +++
 13 files changed, 1049 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/handlers.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/helpers.h
 create mode 100644 tools/testing/selftests/kvm/powerpc/null_test.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/rtas_hcall.c

-- 
2.37.2



Re: [next-20230322] Kernel WARN at kernel/workqueue.c:3182 (rcutorture)

2023-03-23 Thread Paul E. McKenney
On Fri, Mar 24, 2023 at 08:47:38AM +0530, Sachin Sant wrote:
> 
> >>> Hello, Sachin, and it looks like you hit something that Zqiang and I
> >>> have been tracking down.  I am guessing that you were using modprobe
> >>> and rmmod to make this happen, and that this happened at rmmod time.
> >>> 
> >> Yes, the LTP test script rcu_torture.sh relies on modprobe to load/unload
> >> the rcutorture module.
> >> 
> >>> Whatever the reproducer, does the following patch help?
> >> 
> >> Thank you for the patch. Yes, with this patch applied, the test completes
> >> successfully without the reported warning.
> > 
> > Very good, thank you!  May we have your Tested-by?
> 
> Tested-by: Sachin Sant 


Thank you, and I will apply on the next rebase.

Thanx, Paul


Re: [next-20230322] Kernel WARN at kernel/workqueue.c:3182 (rcutorture)

2023-03-23 Thread Sachin Sant


>>> Hello, Sachin, and it looks like you hit something that Zqiang and I
>>> have been tracking down.  I am guessing that you were using modprobe
>>> and rmmod to make this happen, and that this happened at rmmod time.
>>> 
>> Yes, the LTP test script rcu_torture.sh relies on modprobe to load/unload
>> the rcutorture module.
>> 
>>> Whatever the reproducer, does the following patch help?
>> 
>> Thank you for the patch. Yes, with this patch applied, the test completes
>> successfully without the reported warning.
> 
> Very good, thank you!  May we have your Tested-by?

Tested-by: Sachin Sant 

- Sachin



Re: [PATCH] powerpc/atomics: Remove unused function

2023-03-23 Thread Nysal Jan K.A.
Michael,
Any comments on this one?

On Fri, Feb 24, 2023 at 11:02:31AM +, Christophe Leroy wrote:
> 
> 
> On 24/02/2023 at 11:39, Nysal Jan K.A wrote:
> > 
> > Remove arch_atomic_try_cmpxchg_lock function as it is no longer used
> > since commit 9f61521c7a28 ("powerpc/qspinlock: powerpc qspinlock
> > implementation")
> > 
> > Signed-off-by: Nysal Jan K.A 
> 
> Reviewed-by: Christophe Leroy 
> 
> > ---
> >   arch/powerpc/include/asm/atomic.h | 29 -
> >   1 file changed, 29 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
> > index 486ab7889121..b3a53830446b 100644
> > --- a/arch/powerpc/include/asm/atomic.h
> > +++ b/arch/powerpc/include/asm/atomic.h
> > @@ -130,35 +130,6 @@ ATOMIC_OPS(xor, xor, "", K)
> >   #define arch_atomic_xchg_relaxed(v, new) \
> >  arch_xchg_relaxed(&((v)->counter), (new))
> > 
> > -/*
> > - * Don't want to override the generic atomic_try_cmpxchg_acquire, because
> > - * we add a lock hint to the lwarx, which may not be wanted for the
> > - * _acquire case (and is not used by the other _acquire variants so it
> > - * would be a surprise).
> > - */
> > -static __always_inline bool
> > -arch_atomic_try_cmpxchg_lock(atomic_t *v, int *old, int new)
> > -{
> > -   int r, o = *old;
> > -   unsigned int eh = IS_ENABLED(CONFIG_PPC64);
> > -
> > -   __asm__ __volatile__ (
> > -"1:  lwarx   %0,0,%2,%[eh]   # atomic_try_cmpxchg_acquire\n"
> > -"  cmpw    0,%0,%3 \n"
> > -"  bne-2f  \n"
> > -"  stwcx.  %4,0,%2 \n"
> > -"  bne-1b  \n"
> > -"\t"   PPC_ACQUIRE_BARRIER "   \n"
> > -"2:\n"
> > -   : "=&r" (r), "+m" (v->counter)
> > -   : "r" (&v->counter), "r" (o), "r" (new), [eh] "n" (eh)
> > -   : "cr0", "memory");
> > -
> > -   if (unlikely(r != o))
> > -   *old = r;
> > -   return likely(r == o);
> > -}
> > -
> >   /**
> >* atomic_fetch_add_unless - add unless the number is a given value
> >* @v: pointer of type atomic_t
> > --
> > 2.39.2
> > 

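The removed helper follows the usual try_cmpxchg contract: if the value equals *old, store new and return true; otherwise write the observed value back into *old and return false. As a portable illustration only, the same contract can be sketched with C11 <stdatomic.h>. This is not the kernel's lwarx/stwcx. sequence, it has no equivalent of the EH lock hint, and try_cmpxchg_acquire() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Sketch of the try_cmpxchg contract (hypothetical name):
 *  - if *v == *old, store new into *v and return true;
 *  - otherwise copy the value observed in *v into *old, return false.
 * Acquire ordering is requested only on success, mirroring where the
 * removed asm placed PPC_ACQUIRE_BARRIER; the failure path is relaxed.
 */
static bool try_cmpxchg_acquire(atomic_int *v, int *old, int new)
{
    return atomic_compare_exchange_strong_explicit(
        v, old, new, memory_order_acquire, memory_order_relaxed);
}
```

In the removed asm, a failed compare branches to label 2 and skips the barrier, which is why the sketch uses relaxed ordering for the failure case.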

Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()

2023-03-23 Thread Nathan Lynch
Nathan Lynch  writes:
>
> aside: does anyone know if the display_status() code is worth keeping?
> It looks like it is used to drive the 16-character wide physical LCD I
> remember seeing on P4-era and older machines. Is it a vestige of
> non-LPAR pseries that should be dropped, or is it perhaps useful for
> chrp or cell?

Never mind, I see the display-character token and associated properties
on a P8 LPAR and in a current PAPR.


Re: [PATCH v2 0/4] Reenable VFIO support on POWER systems

2023-03-23 Thread Narayana Murty
On Mon, Mar 06, 2023 at 11:29:53AM -0600, Timothy Pearson wrote:
> This patch series reenables VFIO support on POWER systems.  It
> is based on Alexey Kardashevskiy's patch series, rebased and
> successfully tested under QEMU with a Marvell PCIe SATA controller
> on a POWER9 Blackbird host.
> 
> Alexey Kardashevskiy (3):
>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>   powerpc/pci_64: Init pcibios subsys a bit later
>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
> domains
> 
> Timothy Pearson (1):
>   Add myself to MAINTAINERS for Power VFIO support
> 
>  MAINTAINERS   |   5 +
>  arch/powerpc/include/asm/iommu.h  |   6 +-
>  arch/powerpc/include/asm/pci-bridge.h |   7 +
>  arch/powerpc/kernel/iommu.c   | 246 +-
>  arch/powerpc/kernel/pci_64.c  |   2 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  36 +++-
>  arch/powerpc/platforms/pseries/iommu.c|  27 +++
>  arch/powerpc/platforms/pseries/pseries.h  |   4 +
>  arch/powerpc/platforms/pseries/setup.c|   3 +
>  drivers/vfio/vfio_iommu_spapr_tce.c   |  96 ++---
>  10 files changed, 338 insertions(+), 94 deletions(-)
> 
> -- 
> 2.30.2
Tested the three patches from Alexey Kardashevskiy in this series on a
PowerNV Denali POWER9 system with a NetXtreme BCM5719 Gigabit Ethernet
PCIe card. VFIO passthrough is working fine.

Tested-by: Narayana Murty


Re: perf tools power9 JSON files build breakage on ubuntu 18.04 cross build

2023-03-23 Thread Benjamin Gray
On Thu, 2023-03-23 at 08:50 -0700, Ian Rogers wrote:
> On Thu, Mar 23, 2023 at 6:11 AM Arnaldo Carvalho de Melo wrote:
> > 
> > Exception processing pmu-events/arch/powerpc/power9/other.json
> > Traceback (most recent call last):
> >   File "pmu-events/jevents.py", line 997, in 
> >     main()
> >   File "pmu-events/jevents.py", line 979, in main
> >     ftw(arch_path, [], preprocess_one_file)
> >   File "pmu-events/jevents.py", line 935, in ftw
> >     ftw(item.path, parents + [item.name], action)
> >   File "pmu-events/jevents.py", line 933, in ftw
> >     action(parents, item)
> >   File "pmu-events/jevents.py", line 514, in preprocess_one_file
> >     for event in read_json_events(item.path, topic):
> >   File "pmu-events/jevents.py", line 388, in read_json_events
> >     events = json.load(open(path), object_hook=JsonEvent)
> >   File "/usr/lib/python3.6/json/__init__.py", line 296, in load
> >     return loads(fp.read(),
> >   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >     return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 55090: ordinal not in range(128)
> >   CC  /tmp/build/perf/tests/expr.o
> > pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
> > make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
> > make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
> > Makefile.perf:679: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
> > make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
> > make[2]: *** Waiting for unfinished jobs
> > 
> > 
> > Now jevents is an opt-out feature so I'm noticing these problems.
> > 
> > A similar fix for s390 was accepted today:
> 
> The JEVENTS_ARCH=all make option builds the s390 files even on x86.
> I'm confused as to why that's been working before these fixes.

This is the non-breaking space in the file (UTF-8 C2 A0). Telling Python
to decode with UTF-8 would work (note that it's failing with the 'ascii'
codec). Setting the environment variable LC_CTYPE="C.UTF-8" sets the
default, or the script can specify the encoding explicitly.

But I also doubt the non-breaking space was intentional in the first place.
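An ASCII-only decoder fails at the first byte of 0x80 or above, which is how the traceback above lands on byte 0xc2. A minimal sketch for locating such bytes in a JSON file read as raw bytes follows; first_non_ascii() and is_utf8_nbsp() are hypothetical helpers, not part of jevents.py or perf. On the Python side, the explicit fix is passing encoding="utf-8" to open().

```c
#include <stddef.h>

/*
 * Hypothetical helper: offset of the first byte >= 0x80 in buf, or -1
 * if the buffer is pure ASCII. An ASCII decoder fails at this offset.
 */
static long first_non_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return (long)i;
    return -1;
}

/* Non-zero if buf[off] starts the UTF-8 encoding of U+00A0 (C2 A0). */
static int is_utf8_nbsp(const unsigned char *buf, size_t len, size_t off)
{
    return off + 1 < len && buf[off] == 0xC2 && buf[off + 1] == 0xA0;
}
```

For the file in question, one would expect first_non_ascii() to report offset 55090 and is_utf8_nbsp() to confirm a non-breaking space there.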


Re: [linux-next:master] BUILD REGRESSION 7c4a254d78f89546d0e74a40617ef24c6151c8d1

2023-03-23 Thread Lorenzo Stoakes
On Fri, Mar 24, 2023 at 05:34:18AM +0800, kernel test robot wrote:
> tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> branch HEAD: 7c4a254d78f89546d0e74a40617ef24c6151c8d1  Add linux-next specific files for 20230323
>
> Error/Warning reports:
>
> https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com
> https://lore.kernel.org/oe-kbuild-all/202303231302.iy6qifxa-...@intel.com
> https://lore.kernel.org/oe-kbuild-all/202303232154.axoxawhg-...@intel.com
>
> Error/Warning: (recently discovered and may have been fixed)
>
> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable]
> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: warning: variable 'link' set but not used [-Wunused-but-set-variable]
> drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
> gpio.c:(.init.text+0xec): undefined reference to `of_mm_gpiochip_add_data'
> include/linux/mmzone.h:1749:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE
>
> Unverified Error/Warning (likely false positive, please contact us if interested):
>
> drivers/soc/fsl/qe/tsa.c:140:26: sparse: sparse: incorrect type in argument 2 (different address spaces)
> drivers/soc/fsl/qe/tsa.c:150:27: sparse: sparse: incorrect type in argument 1 (different address spaces)
> mm/mmap.c:962 vma_merge() error: uninitialized symbol 'next'.

I hate to add noise, but for completeness and clarity: this issue has
already been resolved in the patchset, and the fixes will land in -next
on the next go-around.

https://lore.kernel.org/all/cover.1679516210.git.lstoa...@gmail.com has the
latest version (v5), fix came in v3.

> sound/soc/sof/ipc4-pcm.c:391 sof_ipc4_pcm_dai_link_fixup_rate() error: uninitialized symbol 'be_rate'.
> sound/soc/sof/ipc4-topology.c:1132 ipc4_copier_set_capture_fmt() error: uninitialized symbol 'sample_valid_bits'.
>
> Error/Warning ids grouped by kconfigs:
>
> gcc_recent_errors
> |-- alpha-allyesconfig
> |   `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
> |-- arm-allmodconfig
> |   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
> |   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
> |-- i386-randconfig-m021
> |   `-- mm-mmap.c-vma_merge()-error:uninitialized-symbol-next-.
> |-- ia64-allmodconfig
> |   `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
> |-- loongarch-randconfig-r006-20230322
> |   `-- include-linux-mmzone.h:error:error-Allocator-MAX_ORDER-exceeds-SECTION_SIZE
> |-- m68k-randconfig-s041-20230323
> |   |-- drivers-soc-fsl-qe-tsa.c:sparse:sparse:incorrect-type-in-argument-(different-address-spaces)-expected-void-const-noderef-__iomem-got-void-noderef-__iomem-addr
> |   `-- drivers-soc-fsl-qe-tsa.c:sparse:sparse:incorrect-type-in-argument-(different-address-spaces)-expected-void-noderef-__iomem-got-void-noderef-__iomem-addr
> |-- microblaze-buildonly-randconfig-r003-20230323
> |   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
> |   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
> |-- powerpc-allmodconfig
> |   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
> |   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
> |-- powerpc-randconfig-s041-20230322
> |   `-- gpio.c:(.init.text):undefined-reference-to-of_mm_gpiochip_add_data
> |-- s390-randconfig-m031-20230321
> |   |-- sound-soc-sof-ipc4-pcm.c-sof_ipc4_pcm_dai_link_fixup_rate()-error:uninitialized-symbol-be_rate-.
> |   `-- sound-soc-sof-ipc4-topology.c-ipc4_copier_set_capture_fmt()-error:uninitialized-symbol-sample_valid_bits-.
> `-- sparc-allyesconfig
>     `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
>
> elapsed time: 1022m
>
> configs tested: 95
> configs skipped: 4
>
> tested configs:
> alpha     allyesconfig                  gcc
> alpha     defconfig                     gcc
> alpha     randconfig-r026-20230322      gcc
> arc       allyesconfig                  gcc
> arc       defconfig                     gcc
> arc  r

Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-23 Thread Christoph Hellwig
On Thu, Mar 23, 2023 at 09:07:31PM +, Jiaxun Yang wrote:
> 
> 
> > On 23 March 2023 at 07:29, Christoph Hellwig wrote:
> > 
> > The series looks fine to me.  How should we merge it?
> 
> Perhaps go through dma-mapping tree?

Is patch a 6.3 candidate or should all of it go into 6.4?


[linux-next:master] BUILD REGRESSION 7c4a254d78f89546d0e74a40617ef24c6151c8d1

2023-03-23 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 7c4a254d78f89546d0e74a40617ef24c6151c8d1  Add linux-next specific files for 20230323

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202303231302.iy6qifxa-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202303232154.axoxawhg-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: warning: variable 'link' set but not used [-Wunused-but-set-variable]
drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
gpio.c:(.init.text+0xec): undefined reference to `of_mm_gpiochip_add_data'
include/linux/mmzone.h:1749:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE

Unverified Error/Warning (likely false positive, please contact us if interested):

drivers/soc/fsl/qe/tsa.c:140:26: sparse: sparse: incorrect type in argument 2 (different address spaces)
drivers/soc/fsl/qe/tsa.c:150:27: sparse: sparse: incorrect type in argument 1 (different address spaces)
mm/mmap.c:962 vma_merge() error: uninitialized symbol 'next'.
sound/soc/sof/ipc4-pcm.c:391 sof_ipc4_pcm_dai_link_fixup_rate() error: uninitialized symbol 'be_rate'.
sound/soc/sof/ipc4-topology.c:1132 ipc4_copier_set_capture_fmt() error: uninitialized symbol 'sample_valid_bits'.

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
|-- arm-allmodconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- i386-randconfig-m021
|   `-- mm-mmap.c-vma_merge()-error:uninitialized-symbol-next-.
|-- ia64-allmodconfig
|   `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
|-- loongarch-randconfig-r006-20230322
|   `-- include-linux-mmzone.h:error:error-Allocator-MAX_ORDER-exceeds-SECTION_SIZE
|-- m68k-randconfig-s041-20230323
|   |-- drivers-soc-fsl-qe-tsa.c:sparse:sparse:incorrect-type-in-argument-(different-address-spaces)-expected-void-const-noderef-__iomem-got-void-noderef-__iomem-addr
|   `-- drivers-soc-fsl-qe-tsa.c:sparse:sparse:incorrect-type-in-argument-(different-address-spaces)-expected-void-noderef-__iomem-got-void-noderef-__iomem-addr
|-- microblaze-buildonly-randconfig-r003-20230323
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- powerpc-allmodconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- powerpc-randconfig-s041-20230322
|   `-- gpio.c:(.init.text):undefined-reference-to-of_mm_gpiochip_add_data
|-- s390-randconfig-m031-20230321
|   |-- sound-soc-sof-ipc4-pcm.c-sof_ipc4_pcm_dai_link_fixup_rate()-error:uninitialized-symbol-be_rate-.
|   `-- sound-soc-sof-ipc4-topology.c-ipc4_copier_set_capture_fmt()-error:uninitialized-symbol-sample_valid_bits-.
`-- sparc-allyesconfig
    `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size

elapsed time: 1022m

configs tested: 95
configs skipped: 4

tested configs:
alpha     allyesconfig                  gcc
alpha     defconfig                     gcc
alpha     randconfig-r026-20230322      gcc
arc       allyesconfig                  gcc
arc       defconfig                     gcc
arc       randconfig-r033-20230322      gcc
arc       randconfig-r043-20230322      gcc
arm       allmodconfig                  gcc
arm       allyesconfig                  gcc
arm       defconfig                     gcc
arm       randconfig-r023-20230322      clang
arm       randconfig-r046-20230322      clang
arm       vf610m4_defconfig             gcc
arm64     allyesconfig                  gcc
arm64     defconfig                     gcc
csky      defconfig                     gcc
hexagon   randconfig-r041-20230322      clang
hexagon   randconfig-r045-20230322      clang
i386      allyesconfig                  gcc
i386      debian-10.3                   gcc
i386

Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-23 Thread Jiaxun Yang



> On 23 March 2023 at 07:29, Christoph Hellwig wrote:
> 
> The series looks fine to me.  How should we merge it?

Perhaps go through dma-mapping tree?

Thanks
- Jiaxun




Re: [PATCH v8 1/3] riscv: Introduce CONFIG_RELOCATABLE

2023-03-23 Thread Fangrui Song
On Wed, Mar 22, 2023 at 11:26 AM Nick Desaulniers wrote:
>
> On Fri, Feb 24, 2023 at 7:58 AM Björn Töpel  wrote:
> >
> > Alexandre Ghiti  writes:
> >
> > > +cc linux-kbuild, llvm, Nathan, Nick
> > >
> > > On 2/15/23 15:36, Alexandre Ghiti wrote:
> > >> From: Alexandre Ghiti 
> > >>
> > > I tried a lot of things, but I struggle to understand, does anyone have
> > > any idea? FYI, the same problem happens with LLVM.
>
> Off the top of my head, no idea.
>
> (Maybe as a follow up to this series, I wonder if pursuing
> ARCH_HAS_RELR for ARCH=riscv is worthwhile?)

(I had thought about this for my own fun, but the only current
implementation, in arch/arm64/kernel/head.S, uses assembly.
Every port needs to write some assembly for the same task, which is a pity.
In FreeBSD rtld, glibc, and musl, the DT_RELR code is target-independent.)
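The target-independent DT_RELR decoding mentioned above fits in a few lines of C. A minimal sketch, assuming a 64-bit target and a link address of 0 (so RELR address entries are byte offsets into the loaded image); apply_relr() is a hypothetical name and this is not the code from FreeBSD rtld, glibc, or musl, though it follows the same encoding. An even entry is an address to relocate, after which the cursor moves one word past it; an odd entry is a bitmap whose bit k (k >= 1) marks the word k - 1 slots after the cursor, which then advances by 63 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a DT_RELR table to an in-memory image (sketch, 64-bit only).
 * image: loaded image viewed as 64-bit words
 * relr:  RELR entries; even entries are byte offsets into the image,
 *        odd entries are bitmaps covering the next 63 words
 * delta: value added to every relocated word (the load offset)
 */
static void apply_relr(uint64_t *image, const uint64_t *relr, size_t count,
                       uint64_t delta)
{
    uint64_t *where = NULL;

    for (size_t i = 0; i < count; i++) {
        uint64_t entry = relr[i];

        if ((entry & 1) == 0) {
            /* Address entry: relocate one word, continue after it. */
            where = image + entry / sizeof(uint64_t);
            *where++ += delta;
        } else {
            /* A bitmap must follow an address entry. */
            assert(where != NULL);
            for (unsigned int k = 1; k < 64; k++)
                if ((entry >> k) & 1)
                    where[k - 1] += delta;
            where += 63;
        }
    }
}
```

Nothing in the loop depends on the target architecture beyond the word size, which is the point being made about rtld, glibc, and musl.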


> >
> > Don't ask me *why*, but adding --emit-relocs to your linker flags solves
> > "the NULL .rela.dyn" both for GCC and LLVM.
> >
> > The downside is that you end up with a bunch of .rela cruft in your
> > vmlinux.
>
> There was a patch just this week to use $(OBJCOPY) to strip these from
> vmlinux (for x86). Looks like x86 uses --emit-relocs for KASLR:
> https://lore.kernel.org/lkml/20230320121006.4863-1-petr.pa...@suse.com/
> --
> Thanks,
> ~Nick Desaulniers
>


-- 
宋方睿


Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()

2023-03-23 Thread Nathan Lynch
Andrew Donnellan  writes:

> On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
>> * We can expect performance improvements for existing sys_rtas()
>>   users, not only because of overall reduction in the number of system
>>   calls issued, but also due to the better handling of -2/990x in the
>>   kernel. For example, librtas still sleeps for 1ms on -2, which is
>>   completely unnecessary.
>
> Would be good to see this fixed on the librtas side.

Filed an issue: https://github.com/ibm-power-utilities/librtas/issues/30
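For context on "-2/990x": in RTAS, -2 means busy (retry the call), and 9900 through 9905 are extended-delay statuses hinting that the caller wait 10^(status - 9900) milliseconds before retrying. A minimal sketch of that mapping follows; the macro and function names are hypothetical, and treating -2 as "retry without sleeping" follows the quoted claim rather than the 1ms sleep librtas currently uses.

```c
/* Hypothetical names for the status values discussed in the thread. */
#define RTAS_STATUS_BUSY    (-2)
#define RTAS_EXT_DELAY_MIN  9900
#define RTAS_EXT_DELAY_MAX  9905

/* Non-zero if the RTAS call should be reissued. */
static int rtas_should_retry(int status)
{
    return status == RTAS_STATUS_BUSY ||
           (status >= RTAS_EXT_DELAY_MIN && status <= RTAS_EXT_DELAY_MAX);
}

/*
 * Suggested sleep in milliseconds before retrying: 990x maps to 10^x ms;
 * -2 maps to 0 (retry immediately), as does any non-retry status.
 */
static unsigned int rtas_retry_delay_ms(int status)
{
    unsigned int ms = 0;

    if (status >= RTAS_EXT_DELAY_MIN && status <= RTAS_EXT_DELAY_MAX) {
        ms = 1;
        for (int order = status - RTAS_EXT_DELAY_MIN; order > 0; order--)
            ms *= 10;
    }
    return ms;
}
```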


Re: [PATCH v2] powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled

2023-03-23 Thread Nathan Lynch
Haren Myneni  writes:
> The hypervisor supports user-mode NX from Power10. pseries_vas_dlpar_cpu()
> is called from lparcfg_write() to update VAS windows for DLPAR event in
> shared processor mode and the kernel gets -ENOTSUPP for HCALLs if the
> user-mode NX is not supported. The current VAS implementation also
> supports only Radix page tables. In dedicated processor mode, by
> contrast, pseries_vas_notifier() is registered only if the copy/paste
> feature is enabled. So instead of displaying HCALL error messages,
> update VAS capabilities only if the copy/paste feature is available.

OK, so this prevents the code from making inappropriate hcalls and
logging useless errors, looks good.

>
> This patch ignores updating VAS capabilities in pseries_vas_dlpar_cpu()
> and returns success if the copy/paste feature is not enabled.
> Then lparcfg_write() completes the processor DLPAR operations
> without any failures.
>
> Fixes: 2147783d6bf0 ("powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU")
> Signed-off-by: Haren Myneni 

Reviewed-by: Nathan Lynch 


> ---
> v2: Use bool for copypaste_feat and expand commit message on handling failures
>
>  arch/powerpc/platforms/pseries/vas.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 559112312810..513180467562 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -856,6 +856,13 @@ int pseries_vas_dlpar_cpu(void)
>  {
>   int new_nr_creds, rc;
>  
> + /*
> +  * NX-GZIP is not enabled. Nothing to do for DLPAR event
> +  */
> + if (!copypaste_feat)
> + return 0;
> +
> +
>   rc = h_query_vas_capabilities(H_QUERY_VAS_CAPABILITIES,
> vascaps[VAS_GZIP_DEF_FEAT_TYPE].feat,
> (u64)virt_to_phys(_cop_caps));
> @@ -1012,6 +1019,7 @@ static int __init pseries_vas_init(void)
>* Linux supports user space COPY/PASTE only with Radix
>*/
>   if (!radix_enabled()) {
> + copypaste_feat = false;
>   pr_err("API is supported only with radix page tables\n");
>   return -ENOTSUPP;
>   }
> -- 
> 2.26.3


Memory coherency issue with IO thread offloading?

2023-03-23 Thread Jens Axboe
Hi,

I received a report from MariaDB in which 5.10.158 works fine and
5.10.162 is broken. In fact, current 6.3-rc also fails the test
case. Beware that this email is long, as I'm trying to include
everything that may be relevant...

The test case in question is pretty simple. On debian testing, do:

$ sudo apt-get install mariadb-test
$ cd /usr/share/mysql/mysql-test
$ ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 \
    --vardir=/dev/shm/mysql --force encryption.innodb_encryption,innodb,undo0 \
    --repeat=200

and if it fails, you'll see something like:

encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
Test ended at 2023-03-23 16:55:17

CURRENT_TEST: encryption.innodb_encryption
mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'

The result from queries just before the failure was:
SET @start_global_value = @@global.innodb_encryption_threads;

 - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption

2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is stable
2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=221]. You may have to recover from a backup.

where data read was not as expected.

Now, there are a number of io_uring changes between .158 and .162, as it
includes the backport that brought 5.10-stable into line with what
5.15-stable includes. I'll spare you all the digging I did to vet those
changes, but the key thing is that it STILL happens on 6.3-git on
powerpc.

After ruling out many things, one key difference between 158 and 162 is
that the former offloaded requests that could not be done nonblocking to
a kthread, while 162 and newer offload to an IO thread. An IO thread is
just a normal thread created from the application submitting IO; the
only difference is that it never exits to userspace. An IO thread has
the same mm/files/you-name-it as the original task. It really is the
same as a userspace thread created by the application. The switch to IO
threads was done exactly because of that, rather than rely on a fragile
scheme of having the kthread worker assume all sorts of identity from
the original task, with surprises if things were missed. This is what
caused most of the io_uring security issues in the past.

The IO that mariadb does in this test is pretty simple - a bunch of
largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
buffered reads with IORING_OP_READV.

Today I finally gave up and ran a basic experiment, which simply
offloads the writes to a kthread. Since powerpc has an interesting
memory coherency model, my suspicion was that the work involved with
switching MMs for the kthread could just be the main difference here.
The patch is really dumb and simple: rather than queue the write to an
IO thread, it just offloads it to a kthread that calls kthread_use_mm(),
performs the write with the same write handler, and then calls
kthread_unuse_mm(). AND THIS WORKS! The above mtr test would usually
fail within 2 to 20 loops; I've now done 200 and 500 loops and it's fine.

Which then leads me to the question, what about the IO thread offload
makes this fail on powerpc (and no other arch I've tested on, including
x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
thread in userspace in the application, and having that thread just
perform the writes. Is there some magic involved with the kthread mm
use/unuse that makes this sufficiently consistent on powerpc? I've tried
every combination of isync()/mb() and making flush_dcache_page()
unconditional in the filemap read/write helpers, and it still falls flat
on its face with the offload to an IO thread.

I must clearly be missing something here, which is why I'm emailing the
powerpc Gods for help :-)

-- 
Jens Axboe



Re: [next-20230322] Kernel WARN at kernel/workqueue.c:3182 (rcutorture)

2023-03-23 Thread Paul E. McKenney
On Thu, Mar 23, 2023 at 11:00:59PM +0530, Sachin Sant wrote:
> 
> >> [ 3629.243407] NIP [7fff8cd39558] 0x7fff8cd39558
> >> [ 3629.243410] LR [00010d800398] 0x10d800398
> >> [ 3629.243413] --- interrupt: c00
> >> [ 3629.243415] Code: 419dffa4 e93a0078 3941 552907be 2f89 7d20579e 
> >> 0b09 e95a0078 e91a0080 3921 7fa85000 7d204f9e <0b09> 7f23cb78 
> >> 4bfffd65 0b03 
> >> [ 3629.243430] ---[ end trace  ]—
> >> 
> >> These warnings are repeated few times. The LTP test is marked as PASS.
> >> 
> >> Git bisect point to the following patch
> >> commit f46a5170e6e7d5f836f2199fe82cdb0b4363427f
> >>srcu: Use static init for statically allocated in-module srcu_struct
> > 
> > Hello, Sachin, and it looks like you hit something that Zqiang and I
> > have been tracking down.  I am guessing that you were using modprobe
> > and rmmod to make this happen, and that this happened at rmmod time.
> > 
> Yes, the LTP test script rcu_torture.sh relies on modprobe to load/unload
> the rcutorture module.
> 
> > Whatever the reproducer, does the following patch help?
> 
> Thank you for the patch. Yes, with this patch applied, the test completes
> successfully without the reported warning.

Very good, thank you!  May we have your Tested-by?

Thanx, Paul


Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-23 Thread Valentin Schneider
On 23/03/23 18:41, Peter Zijlstra wrote:
> On Thu, Mar 23, 2023 at 04:25:25PM +, Valentin Schneider wrote:
>> On 22/03/23 15:04, Peter Zijlstra wrote:
>> > @@ -798,14 +794,20 @@ static void smp_call_function_many_cond(
>> >}
>> >  
>> >/*
>> > +   * Trace each smp_function_call_*() as an IPI, actual IPIs
>> > +   * will be traced with 
>> > func==generic_smp_call_function_single_ipi().
>> > +   */
>> > +  trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);
>> 
>> I just got a trace pointing out this can emit an event even though no IPI
>> is sent if e.g. the cond_func predicate filters all CPUs in the argument
>> mask:
>> 
>>   ipi_send_cpumask: cpumask= callsite=on_each_cpu_cond_mask+0x3c 
>> callback=flush_tlb_func+0x0
>> 
>> Maybe something like so on top?
>> 
>> ---
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index ba5478814e677..1dc452017d000 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -791,6 +791,8 @@ static void smp_call_function_many_cond(const struct 
>> cpumask *mask,
>>  }
>>  }
>>  
>> +if (!nr_cpus)
>> +goto local;
>
> Hmm, this isn't right. You can get nr_cpus==0 even though it did add
> some to various lists but never was first.
>

Duh, glanced over that.

> But urgh, even if we were to say count nr_queued we'd never get the mask
> right, because we don't track which CPUs have the predicate matched,
> only those we need to actually send an IPI to :/
>
> Ooh, I think we can clear those bits from cfd->cpumask, arguably that's
> a correctness fix too, because the 'run_remote && wait' case shouldn't
> wait on things we didn't queue.
>

Yeah, that makes sense to me. Just one tiny suggestion below.

> Hmm?
>
>
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -728,9 +728,9 @@ static void smp_call_function_many_cond(
>   int cpu, last_cpu, this_cpu = smp_processor_id();
>   struct call_function_data *cfd;
>   bool wait = scf_flags & SCF_WAIT;
> + int nr_cpus = 0, nr_queued = 0;
>   bool run_remote = false;
>   bool run_local = false;
> - int nr_cpus = 0;
>  
>   lockdep_assert_preemption_disabled();
>  
> @@ -772,8 +772,10 @@ static void smp_call_function_many_cond(
>   for_each_cpu(cpu, cfd->cpumask) {
>   call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
>  
> - if (cond_func && !cond_func(cpu, info))
> + if (cond_func && !cond_func(cpu, info)) {
> + __cpumask_clear_cpu(cpu, cfd->cpumask);
>   continue;
> + }
>  
>   csd_lock(csd);
>   if (wait)
> @@ -789,13 +791,15 @@ static void smp_call_function_many_cond(
>   nr_cpus++;
>   last_cpu = cpu;
>   }
> + nr_queued++;
>   }
>  
>   /*
>* Trace each smp_function_call_*() as an IPI, actual IPIs
>* will be traced with 
> func==generic_smp_call_function_single_ipi().
>*/
> - trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);
> + if (nr_queued)

With your change to cfd->cpumask, we could ditch nr_queued and make this

if (!cpumask_empty(cfd->cpumask))

since cfd->cpumask now only contains CPUs that have had a CSD queued.

> + trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
>  
>   /*
>* Choose the most efficient way to send an IPI. Note that the
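Valentin's suggestion can be checked with a small userspace model. The toy_* names, the 64-bit mask (one bit per CPU, at most 64 CPUs) and the even-CPU predicate are assumptions for illustration, not kernel API. Once CPUs rejected by cond_func are cleared, the mask ends up empty exactly when nothing was queued, so testing the mask carries the same information as a separate nr_queued counter. The model leaves out the local CPU, which the real function handles separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit n of the mask stands for CPU n (at most 64 CPUs). */
typedef uint64_t toy_cpumask_t;
typedef bool (*toy_cond_func_t)(int cpu, void *info);

/*
 * Mirror the patched loop: CPUs rejected by cond_func are cleared from the
 * mask, and every remaining CPU gets a CSD queued (counted in nr_queued).
 */
static int toy_filter_and_queue(toy_cpumask_t *mask, toy_cond_func_t cond_func,
				void *info)
{
	int nr_queued = 0;

	assert(mask != NULL);
	for (int cpu = 0; cpu < 64; cpu++) {
		toy_cpumask_t bit = UINT64_C(1) << cpu;

		if (!(*mask & bit))
			continue;
		if (cond_func && !cond_func(cpu, info)) {
			*mask &= ~bit;	/* like __cpumask_clear_cpu(cpu, cfd->cpumask) */
			continue;
		}
		nr_queued++;		/* a CSD would be queued for this CPU */
	}
	return nr_queued;
}

/* Example predicate: only even-numbered CPUs have work to do. */
static bool toy_even_cpu(int cpu, void *info)
{
	(void)info;
	return cpu % 2 == 0;
}
```

With a mask of 0x0A (CPUs 1 and 3) and toy_even_cpu, both CPUs are filtered out, the mask becomes 0 and nr_queued is 0, which is the case that previously produced an empty `ipi_send_cpumask` event.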



Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-23 Thread Peter Zijlstra
On Thu, Mar 23, 2023 at 04:25:25PM +, Valentin Schneider wrote:
> On 22/03/23 15:04, Peter Zijlstra wrote:
> > @@ -798,14 +794,20 @@ static void smp_call_function_many_cond(
> > }
> >  
> > /*
> > +* Trace each smp_function_call_*() as an IPI, actual IPIs
> > +* will be traced with 
> > func==generic_smp_call_function_single_ipi().
> > +*/
> > +   trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);
> 
> I just got a trace pointing out this can emit an event even though no IPI
> is sent if e.g. the cond_func predicate filters all CPUs in the argument
> mask:
> 
>   ipi_send_cpumask: cpumask= callsite=on_each_cpu_cond_mask+0x3c 
> callback=flush_tlb_func+0x0
> 
> Maybe something like so on top?
> 
> ---
> diff --git a/kernel/smp.c b/kernel/smp.c
> index ba5478814e677..1dc452017d000 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -791,6 +791,8 @@ static void smp_call_function_many_cond(const struct 
> cpumask *mask,
>   }
>   }
>  
> + if (!nr_cpus)
> + goto local;

Hmm, this isn't right. You can get nr_cpus==0 even though it did add
some to various lists but never was first.

But urgh, even if we were to say count nr_queued we'd never get the mask
right, because we don't track which CPUs have the predicate matched,
only those we need to actually send an IPI to :/

Ooh, I think we can clear those bits from cfd->cpumask, arguably that's
a correctness fix too, because the 'run_remote && wait' case shouldn't
wait on things we didn't queue.

Hmm?


--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -728,9 +728,9 @@ static void smp_call_function_many_cond(
int cpu, last_cpu, this_cpu = smp_processor_id();
struct call_function_data *cfd;
bool wait = scf_flags & SCF_WAIT;
+   int nr_cpus = 0, nr_queued = 0;
bool run_remote = false;
bool run_local = false;
-   int nr_cpus = 0;
 
lockdep_assert_preemption_disabled();
 
@@ -772,8 +772,10 @@ static void smp_call_function_many_cond(
for_each_cpu(cpu, cfd->cpumask) {
call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
 
-   if (cond_func && !cond_func(cpu, info))
+   if (cond_func && !cond_func(cpu, info)) {
+   __cpumask_clear_cpu(cpu, cfd->cpumask);
continue;
+   }
 
csd_lock(csd);
if (wait)
@@ -789,13 +791,15 @@ static void smp_call_function_many_cond(
nr_cpus++;
last_cpu = cpu;
}
+   nr_queued++;
}
 
/*
 * Trace each smp_function_call_*() as an IPI, actual IPIs
 * will be traced with 
func==generic_smp_call_function_single_ipi().
 */
-   trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);
+   if (nr_queued)
+   trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
 
/*
 * Choose the most efficient way to send an IPI. Note that the
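Peter's objection to `if (!nr_cpus) goto local;` can be illustrated with a minimal userspace model. The toy_* names and the fixed 8-CPU array are hypothetical. As we read smp_call_function_many_cond(), nr_cpus is only bumped when llist_add() reports that the target CPU's queue was empty before the add (that CPU needs a fresh IPI), so nr_cpus can be zero even though callbacks were queued behind already-pending work.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8

/*
 * Hypothetical model: queue_len[cpu] is the number of callbacks already
 * pending on that CPU's call_single_queue.
 */
struct toy_state {
	int queue_len[TOY_NR_CPUS];
};

/*
 * Queue one callback on every CPU in targets[]. A CPU only needs a new IPI
 * when its queue was empty before the add (nr_cpus); every add is counted
 * as queued (nr_queued).
 */
static void toy_queue_many(struct toy_state *s, const bool targets[TOY_NR_CPUS],
			   int *nr_cpus, int *nr_queued)
{
	assert(s && targets && nr_cpus && nr_queued);
	*nr_cpus = 0;
	*nr_queued = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!targets[cpu])
			continue;

		bool was_empty = (s->queue_len[cpu] == 0);

		s->queue_len[cpu]++;
		if (was_empty)
			(*nr_cpus)++;	/* this CPU needs an IPI */
		(*nr_queued)++;		/* a callback was queued either way */
	}
}
```

If both target CPUs already have pending callbacks, nr_cpus stays 0 while nr_queued is 2, so a `!nr_cpus` check alone would treat a call that did queue work as a no-op.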



[PATCH v7 5/6] pcmcia: Convert to use less arguments in pci_bus_for_each_resource()

2023-03-23 Thread Andy Shevchenko
pci_bus_for_each_resource() can hide the loop iterator, since it is not
used otherwise. With this, we can drop the iterator variable
definition.

Signed-off-by: Andy Shevchenko 
Reviewed-by: Krzysztof Wilczyński 
Acked-by: Dominik Brodowski 
---
 drivers/pcmcia/rsrc_nonstatic.c | 9 +++--
 drivers/pcmcia/yenta_socket.c   | 3 +--
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/pcmcia/rsrc_nonstatic.c b/drivers/pcmcia/rsrc_nonstatic.c
index ad1141fddb4c..96264ebee46a 100644
--- a/drivers/pcmcia/rsrc_nonstatic.c
+++ b/drivers/pcmcia/rsrc_nonstatic.c
@@ -934,7 +934,7 @@ static int adjust_io(struct pcmcia_socket *s, unsigned int 
action, unsigned long
 static int nonstatic_autoadd_resources(struct pcmcia_socket *s)
 {
struct resource *res;
-   int i, done = 0;
+   int done = 0;
 
if (!s->cb_dev || !s->cb_dev->bus)
return -ENODEV;
@@ -960,12 +960,9 @@ static int nonstatic_autoadd_resources(struct 
pcmcia_socket *s)
 */
if (s->cb_dev->bus->number == 0)
return -EINVAL;
-
-   for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
-   res = s->cb_dev->bus->resource[i];
-#else
-   pci_bus_for_each_resource(s->cb_dev->bus, res, i) {
 #endif
+
+   pci_bus_for_each_resource(s->cb_dev->bus, res) {
if (!res)
continue;
 
diff --git a/drivers/pcmcia/yenta_socket.c b/drivers/pcmcia/yenta_socket.c
index 1365eaa20ff4..fd18ab571ce8 100644
--- a/drivers/pcmcia/yenta_socket.c
+++ b/drivers/pcmcia/yenta_socket.c
@@ -673,9 +673,8 @@ static int yenta_search_res(struct yenta_socket *socket, 
struct resource *res,
u32 min)
 {
struct resource *root;
-   int i;
 
-   pci_bus_for_each_resource(socket->dev->bus, root, i) {
+   pci_bus_for_each_resource(socket->dev->bus, root) {
if (!root)
continue;
 
-- 
2.40.0.1.gaa8946217a0b



[PATCH v7 6/6] PCI: Make use of pci_resource_n()

2023-03-23 Thread Andy Shevchenko
Replace open-coded implementations of pci_resource_n() in pci.h.

Signed-off-by: Andy Shevchenko 
---
 include/linux/pci.h | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 70a4684d5f26..9539cf63fe5e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2006,14 +2006,12 @@ int pci_iobar_pfn(struct pci_dev *pdev, int bar, struct 
vm_area_struct *vma);
  * for accessing popular PCI BAR info
  */
 #define pci_resource_n(dev, bar)   (&(dev)->resource[(bar)])
-#define pci_resource_start(dev, bar)   ((dev)->resource[(bar)].start)
-#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)
-#define pci_resource_flags(dev, bar)   ((dev)->resource[(bar)].flags)
-#define pci_resource_len(dev,bar) \
-   ((pci_resource_end((dev), (bar)) == 0) ? 0 :\
-   \
-(pci_resource_end((dev), (bar)) -  \
- pci_resource_start((dev), (bar)) + 1))
+#define pci_resource_start(dev, bar)   (pci_resource_n(dev, bar)->start)
+#define pci_resource_end(dev, bar) (pci_resource_n(dev, bar)->end)
+#define pci_resource_flags(dev, bar)   (pci_resource_n(dev, bar)->flags)
+#define pci_resource_len(dev,bar)  \
+   (pci_resource_end((dev), (bar)) ?   \
+resource_size(pci_resource_n((dev), (bar))) : 0)
 
 #define __pci_dev_for_each_resource_0(dev, res, ...)   \
for (unsigned int __b = 0;  \
-- 
2.40.0.1.gaa8946217a0b
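The conversion is meant to be behavior-preserving: old and new pci_resource_len() should agree, including the unset-BAR case where end == 0. A userspace check, using trimmed stand-ins for struct resource and struct pci_dev (assumed, not the kernel's full definitions) and resource_size() written as end - start + 1:

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed stand-ins for the kernel types (hypothetical, userspace only). */
typedef uint64_t resource_size_t;
struct resource { resource_size_t start, end; unsigned long flags; };
struct pci_dev { struct resource resource[6]; };

static inline resource_size_t resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}

#define pci_resource_n(dev, bar)	(&(dev)->resource[(bar)])
#define pci_resource_start(dev, bar)	(pci_resource_n(dev, bar)->start)
#define pci_resource_end(dev, bar)	(pci_resource_n(dev, bar)->end)

/* Old form: open-coded end - start + 1, and 0 for an unset BAR (end == 0). */
#define old_pci_resource_len(dev, bar)					\
	((pci_resource_end((dev), (bar)) == 0) ? 0 :			\
	 (pci_resource_end((dev), (bar)) -				\
	  pci_resource_start((dev), (bar)) + 1))

/* New form from the patch: the same result via resource_size(). */
#define pci_resource_len(dev, bar)					\
	(pci_resource_end((dev), (bar)) ?				\
	 resource_size(pci_resource_n((dev), (bar))) : 0)
```

A 1 MiB BAR at 0xfe000000..0xfe0fffff yields 0x100000 from both forms, and an all-zero BAR yields 0 from both.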



[PATCH v7 3/6] PCI: Allow pci_bus_for_each_resource() to take less arguments

2023-03-23 Thread Andy Shevchenko
Refactor pci_bus_for_each_resource() in the same way as was done for
pci_dev_for_each_resource(). This allows hiding the iterator inside
the loop where it is not otherwise used.

No functional changes intended.

Signed-off-by: Andy Shevchenko 
Reviewed-by: Krzysztof Wilczyński 
---
 drivers/pci/bus.c  |  7 +++
 drivers/pci/hotplug/shpchp_sysfs.c |  8 
 drivers/pci/pci.c  |  3 +--
 drivers/pci/probe.c|  2 +-
 drivers/pci/setup-bus.c| 10 --
 include/linux/pci.h| 17 +
 6 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 549c4bd5caec..5bc81cc0a2de 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -182,13 +182,13 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, 
struct resource *res,
void *alignf_data,
struct pci_bus_region *region)
 {
-   int i, ret;
struct resource *r, avail;
resource_size_t max;
+   int ret;
 
type_mask |= IORESOURCE_TYPE_BITS;
 
-   pci_bus_for_each_resource(bus, r, i) {
+   pci_bus_for_each_resource(bus, r) {
resource_size_t min_used = min;
 
if (!r)
@@ -289,9 +289,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
struct resource *res = &dev->resource[idx];
struct resource orig_res = *res;
struct resource *r;
-   int i;
 
-   pci_bus_for_each_resource(bus, r, i) {
+   pci_bus_for_each_resource(bus, r) {
resource_size_t start, end;
 
if (!r)
diff --git a/drivers/pci/hotplug/shpchp_sysfs.c 
b/drivers/pci/hotplug/shpchp_sysfs.c
index 64beed7a26be..01d47a42da04 100644
--- a/drivers/pci/hotplug/shpchp_sysfs.c
+++ b/drivers/pci/hotplug/shpchp_sysfs.c
@@ -24,16 +24,16 @@
 static ssize_t show_ctrl(struct device *dev, struct device_attribute *attr, 
char *buf)
 {
struct pci_dev *pdev;
-   int index, busnr;
struct resource *res;
struct pci_bus *bus;
size_t len = 0;
+   int busnr;
 
pdev = to_pci_dev(dev);
bus = pdev->subordinate;
 
len += sysfs_emit_at(buf, len, "Free resources: memory\n");
-   pci_bus_for_each_resource(bus, res, index) {
+   pci_bus_for_each_resource(bus, res) {
if (res && (res->flags & IORESOURCE_MEM) &&
!(res->flags & IORESOURCE_PREFETCH)) {
len += sysfs_emit_at(buf, len,
@@ -43,7 +43,7 @@ static ssize_t show_ctrl(struct device *dev, struct 
device_attribute *attr, char
}
}
len += sysfs_emit_at(buf, len, "Free resources: prefetchable memory\n");
-   pci_bus_for_each_resource(bus, res, index) {
+   pci_bus_for_each_resource(bus, res) {
if (res && (res->flags & IORESOURCE_MEM) &&
   (res->flags & IORESOURCE_PREFETCH)) {
len += sysfs_emit_at(buf, len,
@@ -53,7 +53,7 @@ static ssize_t show_ctrl(struct device *dev, struct 
device_attribute *attr, char
}
}
len += sysfs_emit_at(buf, len, "Free resources: IO\n");
-   pci_bus_for_each_resource(bus, res, index) {
+   pci_bus_for_each_resource(bus, res) {
if (res && (res->flags & IORESOURCE_IO)) {
len += sysfs_emit_at(buf, len,
 "start = %8.8llx, length = 
%8.8llx\n",
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7a67611dc5f4..99299f1299c4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -779,9 +779,8 @@ struct resource *pci_find_parent_resource(const struct 
pci_dev *dev,
 {
const struct pci_bus *bus = dev->bus;
struct resource *r;
-   int i;
 
-   pci_bus_for_each_resource(bus, r, i) {
+   pci_bus_for_each_resource(bus, r) {
if (!r)
continue;
if (resource_contains(r, res)) {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a3f68b6ba6ac..f8191750f6b7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -533,7 +533,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
pci_read_bridge_mmio_pref(child);
 
if (dev->transparent) {
-   pci_bus_for_each_resource(child->parent, res, i) {
+   pci_bus_for_each_resource(child->parent, res) {
if (res && res->flags) {
pci_bus_add_resource(child, res,
 PCI_SUBTRACTIVE_DECODE);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 027b985dd1ee..fdeb121e9175 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -770,9 +770,8 @@ static struct resource *find_bus_resource_of_type(struct 
pci_bus *bus,
  unsigned long type)
 {

[PATCH v7 0/6] Add pci_dev_for_each_resource() helper and update users

2023-03-23 Thread Andy Shevchenko
Provide two new helper macros to iterate over PCI device resources and
convert users.

While at it, refactor the existing pci_bus_for_each_resource() in the
same way and convert its users accordingly.

Changelog v7:
- made both macros share the same name (Bjorn)
- split out the pci_resource_n() conversion (Bjorn)

Changelog v6:
- dropped unused variable in PPC code (LKP)

Changelog v5:
- renamed loop variable to minimize the clash (Keith)
- addressed smatch warning (Dan)
- addressed 0-day bot findings (LKP)

Changelog v4:
- rebased on top of v6.3-rc1
- added tag (Krzysztof)

Changelog v3:
- rebased on top of v2 by Mika, see above
- added tag to pcmcia patch (Dominik)

Changelog v2:
- refactor to have two macros
- refactor existing pci_bus_for_each_resource() in the same way and
  convert users

Andy Shevchenko (5):
  kernel.h: Split out COUNT_ARGS() and CONCATENATE()
  PCI: Allow pci_bus_for_each_resource() to take less arguments
  EISA: Convert to use less arguments in pci_bus_for_each_resource()
  pcmcia: Convert to use less arguments in pci_bus_for_each_resource()
  PCI: Make use of pci_resource_n()

Mika Westerberg (1):
  PCI: Introduce pci_dev_for_each_resource()

 .clang-format |  1 +
 arch/alpha/kernel/pci.c   |  5 +--
 arch/arm/kernel/bios32.c  | 16 
 arch/arm/mach-dove/pcie.c | 10 ++---
 arch/arm/mach-mv78xx0/pcie.c  | 10 ++---
 arch/arm/mach-orion5x/pci.c   | 10 ++---
 arch/mips/pci/ops-bcm63xx.c   |  8 ++--
 arch/mips/pci/pci-legacy.c|  3 +-
 arch/powerpc/kernel/pci-common.c  | 21 +-
 arch/powerpc/platforms/4xx/pci.c  |  8 ++--
 arch/powerpc/platforms/52xx/mpc52xx_pci.c |  5 +--
 arch/powerpc/platforms/pseries/pci.c  | 16 
 arch/sh/drivers/pci/pcie-sh7786.c | 10 ++---
 arch/sparc/kernel/leon_pci.c  |  5 +--
 arch/sparc/kernel/pci.c   | 10 ++---
 arch/sparc/kernel/pcic.c  |  5 +--
 drivers/eisa/pci_eisa.c   |  4 +-
 drivers/pci/bus.c |  7 ++--
 drivers/pci/hotplug/shpchp_sysfs.c|  8 ++--
 drivers/pci/pci.c |  3 +-
 drivers/pci/probe.c   |  2 +-
 drivers/pci/remove.c  |  5 +--
 drivers/pci/setup-bus.c   | 37 +++---
 drivers/pci/setup-res.c   |  4 +-
 drivers/pci/vgaarb.c  | 17 +++-
 drivers/pci/xen-pcifront.c|  4 +-
 drivers/pcmcia/rsrc_nonstatic.c   |  9 ++---
 drivers/pcmcia/yenta_socket.c |  3 +-
 drivers/pnp/quirks.c  | 29 +-
 include/linux/args.h  | 13 +++
 include/linux/kernel.h|  8 +---
 include/linux/pci.h   | 47 +--
 32 files changed, 165 insertions(+), 178 deletions(-)
 create mode 100644 include/linux/args.h

-- 
2.40.0.1.gaa8946217a0b



[PATCH v7 2/6] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Andy Shevchenko
From: Mika Westerberg 

Instead of open-coding it everywhere, introduce a tiny helper that can be
used to iterate over each resource of a PCI device, and convert the most
obvious users to it.

While at it, drop the doubled empty line before pdev_sort_resources().

No functional changes intended.

Suggested-by: Andy Shevchenko 
Signed-off-by: Mika Westerberg 
Signed-off-by: Andy Shevchenko 
Reviewed-by: Krzysztof Wilczyński 
---
 .clang-format |  1 +
 arch/alpha/kernel/pci.c   |  5 ++--
 arch/arm/kernel/bios32.c  | 16 ++---
 arch/arm/mach-dove/pcie.c | 10 
 arch/arm/mach-mv78xx0/pcie.c  | 10 
 arch/arm/mach-orion5x/pci.c   | 10 
 arch/mips/pci/ops-bcm63xx.c   |  8 +++
 arch/mips/pci/pci-legacy.c|  3 +--
 arch/powerpc/kernel/pci-common.c  | 21 
 arch/powerpc/platforms/4xx/pci.c  |  8 +++
 arch/powerpc/platforms/52xx/mpc52xx_pci.c |  5 ++--
 arch/powerpc/platforms/pseries/pci.c  | 16 ++---
 arch/sh/drivers/pci/pcie-sh7786.c | 10 
 arch/sparc/kernel/leon_pci.c  |  5 ++--
 arch/sparc/kernel/pci.c   | 10 
 arch/sparc/kernel/pcic.c  |  5 ++--
 drivers/pci/remove.c  |  5 ++--
 drivers/pci/setup-bus.c   | 27 -
 drivers/pci/setup-res.c   |  4 +---
 drivers/pci/vgaarb.c  | 17 -
 drivers/pci/xen-pcifront.c|  4 +---
 drivers/pnp/quirks.c  | 29 ---
 include/linux/pci.h   | 16 +
 23 files changed, 113 insertions(+), 132 deletions(-)

diff --git a/.clang-format b/.clang-format
index d988e9fa9b26..2048b0296d76 100644
--- a/.clang-format
+++ b/.clang-format
@@ -520,6 +520,7 @@ ForEachMacros:
   - 'of_property_for_each_string'
   - 'of_property_for_each_u32'
   - 'pci_bus_for_each_resource'
+  - 'pci_dev_for_each_resource'
   - 'pci_doe_for_each_off'
   - 'pcl_for_each_chunk'
   - 'pcl_for_each_segment'
diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c
index 64fbfb0763b2..4458eb7f44f0 100644
--- a/arch/alpha/kernel/pci.c
+++ b/arch/alpha/kernel/pci.c
@@ -288,11 +288,10 @@ pcibios_claim_one_bus(struct pci_bus *b)
struct pci_bus *child_bus;
 
list_for_each_entry(dev, &b->devices, bus_list) {
+   struct resource *r;
int i;
 
-   for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-   struct resource *r = &dev->resource[i];
-
+   pci_dev_for_each_resource(dev, r, i) {
if (r->parent || !r->start || !r->flags)
continue;
if (pci_has_flag(PCI_PROBE_ONLY) ||
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index e7ef2b5bea9c..d334c7fb672b 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -142,15 +142,15 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND2, 
PCI_DEVICE_ID_WINBOND2_89C940F,
  */
 static void pci_fixup_dec21285(struct pci_dev *dev)
 {
-   int i;
-
if (dev->devfn == 0) {
+   struct resource *r;
+
dev->class &= 0xff;
dev->class |= PCI_CLASS_BRIDGE_HOST << 8;
-   for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-   dev->resource[i].start = 0;
-   dev->resource[i].end   = 0;
-   dev->resource[i].flags = 0;
+   pci_dev_for_each_resource(dev, r) {
+   r->start = 0;
+   r->end = 0;
+   r->flags = 0;
}
}
 }
@@ -162,13 +162,11 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_DEC, 
PCI_DEVICE_ID_DEC_21285, pci_fixup_d
 static void pci_fixup_ide_bases(struct pci_dev *dev)
 {
struct resource *r;
-   int i;
 
if ((dev->class >> 8) != PCI_CLASS_STORAGE_IDE)
return;
 
-   for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-   r = dev->resource + i;
+   pci_dev_for_each_resource(dev, r) {
if ((r->start & ~0x80) == 0x374) {
r->start |= 2;
r->end = r->start;
diff --git a/arch/arm/mach-dove/pcie.c b/arch/arm/mach-dove/pcie.c
index 754ca381f600..3044b7e03890 100644
--- a/arch/arm/mach-dove/pcie.c
+++ b/arch/arm/mach-dove/pcie.c
@@ -142,14 +142,14 @@ static struct pci_ops pcie_ops = {
 static void rc_pci_fixup(struct pci_dev *dev)
 {
if (dev->bus->parent == NULL && dev->devfn == 0) {
-   int i;
+   struct resource *r;
 
dev->class &= 0xff;
dev->class |= PCI_CLASS_BRIDGE_HOST << 8;
-   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
-   dev->resource[i].start = 0;
-   

[PATCH v7 4/6] EISA: Convert to use less arguments in pci_bus_for_each_resource()

2023-03-23 Thread Andy Shevchenko
pci_bus_for_each_resource() can hide the loop iterator, since it is not
used otherwise. With this, we can drop the iterator variable
definition.

Signed-off-by: Andy Shevchenko 
Reviewed-by: Krzysztof Wilczyński 
---
 drivers/eisa/pci_eisa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/eisa/pci_eisa.c b/drivers/eisa/pci_eisa.c
index 930c2332c3c4..8173e60bb808 100644
--- a/drivers/eisa/pci_eisa.c
+++ b/drivers/eisa/pci_eisa.c
@@ -20,8 +20,8 @@ static struct eisa_root_device pci_eisa_root;
 
 static int __init pci_eisa_init(struct pci_dev *pdev)
 {
-   int rc, i;
struct resource *res, *bus_res = NULL;
+   int rc;
 
if ((rc = pci_enable_device (pdev))) {
dev_err(&pdev->dev, "Could not enable device\n");
@@ -38,7 +38,7 @@ static int __init pci_eisa_init(struct pci_dev *pdev)
 * eisa_root_register() can only deal with a single io port resource,
*  so we use the first valid io port resource.
 */
-   pci_bus_for_each_resource(pdev->bus, res, i)
+   pci_bus_for_each_resource(pdev->bus, res)
if (res && (res->flags & IORESOURCE_IO)) {
bus_res = res;
break;
-- 
2.40.0.1.gaa8946217a0b



[PATCH v7 1/6] kernel.h: Split out COUNT_ARGS() and CONCATENATE()

2023-03-23 Thread Andy Shevchenko
kernel.h has been used as a dump for all kinds of stuff for a long time.
The COUNT_ARGS() and CONCATENATE() macros may be used in some places
without pulling in the full kernel.h dependency chain.

Here is an attempt to clean it up by splitting out these macros.

Signed-off-by: Andy Shevchenko 
---
 include/linux/args.h   | 13 +
 include/linux/kernel.h |  8 +---
 2 files changed, 14 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/args.h

diff --git a/include/linux/args.h b/include/linux/args.h
new file mode 100644
index ..16ef6fad8add
--- /dev/null
+++ b/include/linux/args.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_ARGS_H
+#define _LINUX_ARGS_H
+
+/* This counts to 12. Any more, it will return 13th argument. */
+#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, 
_n, X...) _n
+#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 
2, 1, 0)
+
+#define __CONCAT(a, b) a ## b
+#define CONCATENATE(a, b) __CONCAT(a, b)
+
+#endif /* _LINUX_ARGS_H */
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 40bce7495af8..c049d3394425 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include <linux/args.h>
 #include 
 #include 
 #include 
@@ -484,13 +485,6 @@ ftrace_vprintk(const char *fmt, va_list ap)
 static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #endif /* CONFIG_TRACING */
 
-/* This counts to 12. Any more, it will return 13th argument. */
-#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, 
_n, X...) _n
-#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 
2, 1, 0)
-
-#define __CONCAT(a, b) a ## b
-#define CONCATENATE(a, b) __CONCAT(a, b)
-
 /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
 # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
-- 
2.40.0.1.gaa8946217a0b



Re: [next-20230322] Kernel WARN at kernel/workqueue.c:3182 (rcutorture)

2023-03-23 Thread Sachin Sant


>> [ 3629.243407] NIP [7fff8cd39558] 0x7fff8cd39558
>> [ 3629.243410] LR [00010d800398] 0x10d800398
>> [ 3629.243413] --- interrupt: c00
>> [ 3629.243415] Code: 419dffa4 e93a0078 3941 552907be 2f89 7d20579e 
>> 0b09 e95a0078 e91a0080 3921 7fa85000 7d204f9e <0b09> 7f23cb78 
>> 4bfffd65 0b03 
>> [ 3629.243430] ---[ end trace  ]—
>> 
>> These warnings are repeated few times. The LTP test is marked as PASS.
>> 
>> Git bisect point to the following patch
>> commit f46a5170e6e7d5f836f2199fe82cdb0b4363427f
>>srcu: Use static init for statically allocated in-module srcu_struct
> 
> Hello, Sachin, and it looks like you hit something that Zqiang and I
> have been tracking down.  I am guessing that you were using modprobe
> and rmmod to make this happen, and that this happened at rmmod time.
> 
Yes, the LTP test script rcu_torture.sh relies on modprobe to load/unload
the rcutorture module.

> Whatever the reproducer, does the following patch help?
> 
> Thanx, Paul
> 

Thank you for the patch. Yes, with this patch applied, the test completes
successfully without the reported warning.

- Sachin



Re: [PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread kernel test robot
Hi Mike,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on 51551d71edbc998fd8c8afa7312db3d270f5998e]

url:
https://github.com/intel-lab-lkp/linux/commits/Mike-Rapoport/arm-reword-ARCH_FORCE_MAX_ORDER-prompt-and-help-text/20230323-172512
base:   51551d71edbc998fd8c8afa7312db3d270f5998e
patch link:
https://lore.kernel.org/r/20230323092156.2545741-3-rppt%40kernel.org
patch subject: [PATCH 02/14] arm64: drop ranges in definition of 
ARCH_FORCE_MAX_ORDER
config: arm64-randconfig-r031-20230322 
(https://download.01.org/0day-ci/archive/20230324/202303240155.01y6t6fj-...@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 
67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# 
https://github.com/intel-lab-lkp/linux/commit/0522f943c071abf1610651ea40405b7489c50987
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Mike-Rapoport/arm-reword-ARCH_FORCE_MAX_ORDER-prompt-and-help-text/20230323-172512
git checkout 0522f943c071abf1610651ea40405b7489c50987
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=arm64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/iommu/ kernel/dma/ mm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Link: 
https://lore.kernel.org/oe-kbuild-all/202303240155.01y6t6fj-...@intel.com/

All error/warnings (new ones prefixed by >>):

>> mm/memory.c:5791:37: warning: shift count is negative 
>> [-Wshift-count-negative]
   if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
  ^~
   include/linux/mmzone.h:33:31: note: expanded from macro 'MAX_ORDER_NR_PAGES'
   #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 ^  ~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
 ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
   __r = __builtin_expect(!!(x), expect);  \
 ^
>> mm/memory.c:5791:37: warning: shift count is negative 
>> [-Wshift-count-negative]
   if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
  ^~
   include/linux/mmzone.h:33:31: note: expanded from macro 'MAX_ORDER_NR_PAGES'
   #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 ^  ~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
expect, is_constant);  \
^~~
   mm/memory.c:5843:37: warning: shift count is negative 
[-Wshift-count-negative]
   if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
  ^~
   include/linux/mmzone.h:33:31: note: expanded from macro 'MAX_ORDER_NR_PAGES'
   #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 ^  ~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
 ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
   __r = __builtin_expect(!!(x), expect);  \
 ^
   mm/memory.c:5843:37: warning: shift count is negative 
[-Wshift-count-negative]
   if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
  ^~
   include/linux/mmzone.h:33:31: note: expanded from macro 'MAX_ORDER_NR_PAGES'
   #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 ^  ~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'

[PATCH v4] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-23 Thread Niklas Schnelle
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
port access. In a future patch, HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which cannot meaningfully support legacy I/O spaces, such as s390.

The following architectures do not select HAS_IOPORT:

* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa

All other architectures select HAS_IOPORT at least conditionally.

The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.

Co-developed-by: Arnd Bergmann 
Signed-off-by: Arnd Bergmann 
Acked-by: Johannes Berg  # for ARCH=um
Acked-by: Geert Uytterhoeven 
Signed-off-by: Niklas Schnelle 
---
Note: This patch is the initial patch of a larger series[0]. This patch
introduces the HAS_IOPORT config option while the rest of the series adds
driver dependencies and the final patch removes inb() / outb() and friends on
platforms that don't support them. 

Thus each of the per-subsystem patches is independent of the others but
depends on this patch, while the final patch depends on the whole series.
Splitting this initial patch off therefore allows the per-subsystem
HAS_IOPORT dependency additions to be merged separately via different
trees without breaking the build.

[0] https://lore.kernel.org/lkml/20230314121216.413434-1-schne...@linux.ibm.com/

Changes since v3:
- List archs without HAS_IOPORT in commit message (Arnd)
- Select HAS_IOPORT for LoongArch (Arnd)
- Use "select HAS_IOPORT if (E)ISA || .." instead of a "depends on" for (E)ISA
  for m68k and parisc
- Select HAS_IOPORT with config GSC on parisc (Arnd)
- Drop "depends on HAS_IOPORT" for um's config ISA (Johannes)
- Drop "depends on HAS_IOPORT" for config ISA on x86 and parisc where it is
  always selected (Arnd)

 arch/alpha/Kconfig      | 1 +
 arch/arm/Kconfig        | 1 +
 arch/arm64/Kconfig      | 1 +
 arch/ia64/Kconfig       | 1 +
 arch/loongarch/Kconfig  | 1 +
 arch/m68k/Kconfig       | 1 +
 arch/microblaze/Kconfig | 1 +
 arch/mips/Kconfig       | 1 +
 arch/parisc/Kconfig     | 1 +
 arch/powerpc/Kconfig    | 1 +
 arch/riscv/Kconfig      | 1 +
 arch/sh/Kconfig         | 1 +
 arch/sparc/Kconfig      | 1 +
 arch/x86/Kconfig        | 1 +
 drivers/bus/Kconfig     | 2 +-
 drivers/parisc/Kconfig  | 1 +
 lib/Kconfig             | 4
 17 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 780d4673c3ca..a5c2b1aa46b0 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -27,6 +27,7 @@ config ALPHA
select AUDIT_ARCH
select GENERIC_CPU_VULNERABILITIES
select GENERIC_SMP_IDLE_THREAD
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..4acb5bc4b52a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -70,6 +70,7 @@ config ARM
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..b740019c4aee 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -145,6 +145,7 @@ config ARM64
select GENERIC_GETTIMEOFDAY
select GENERIC_VDSO_TIME_NS
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index d7e4a24e8644..2e13ec8263b9 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select PCI_DOMAINS if PCI
select PCI_MSI
select PCI_SYSCALL if PCI
+   select HAS_IOPORT
select HAVE_ASM_MODVERSIONS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_EXIT_THREAD
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 7fd51257e0ed..e1615dfb5437 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -80,6 +80,7 @@ config LOONGARCH
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select GPIOLIB
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 82154952e574..40198a1ebe27 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -18,6 +18,7 @@ config M68K
select GENERIC_CPU_DEVICES
select GENERIC_IOMAP
select GENERIC_IRQ_SHOW
+   select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA
select 

Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Geert Uytterhoeven
Hi Andy,

On Thu, Mar 23, 2023 at 4:15 PM Andy Shevchenko
 wrote:
> On Thu, Mar 23, 2023 at 10:02:38AM -0500, Bjorn Helgaas wrote:
> > I poked around looking for similar patterns elsewhere with:
> >   git grep "#define.*for_each_.*_p("
> >   git grep "#define.*for_each_.*_idx("
> >
> > I didn't find any other "_p" iterators and just a few "_idx" ones, so
> > my hope is to follow what little precedent there is, as well as
> > converge on the basic "*_for_each_resource()" iterators and remove the
> > "_idx()" versions over time by doing things like the
> > pci_claim_resource() change.
>
> The p is heavily used in the byte order conversion helpers.

I can't seem to find them. Example?

Or do you mean cpu_to_be32p()? There "p" means pointer,
which is something completely different.

> > What do you think?  If it seems like excessive churn, we can do it
> > as-is and still try to reduce the use of the index variable over time.
>
> I think _p has a precedent as well. But I can think about it a bit, maybe
> we can come up with something smarter.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 06/14] m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Geert Uytterhoeven
On Thu, Mar 23, 2023 at 10:23 AM Mike Rapoport  wrote:
> From: "Mike Rapoport (IBM)" 
>
> The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to
> describe this configuration option.
>
> Update both to actually describe what this option does.
>
> Signed-off-by: Mike Rapoport (IBM) 

Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert



Re: [PATCH v5 7/7] sched, smp: Trace smp callback causing an IPI

2023-03-23 Thread Valentin Schneider
On 22/03/23 15:04, Peter Zijlstra wrote:
> @@ -798,14 +794,20 @@ static void smp_call_function_many_cond(
>   }
>  
>   /*
> +  * Trace each smp_function_call_*() as an IPI, actual IPIs
> +  * will be traced with func==generic_smp_call_function_single_ipi().
> +  */
> + trace_ipi_send_cpumask(cfd->cpumask_ipi, _RET_IP_, func);

I just got a trace showing that this can emit an event even though no IPI
is sent, e.g. if the cond_func predicate filters out all CPUs in the
argument mask:

  ipi_send_cpumask: cpumask= callsite=on_each_cpu_cond_mask+0x3c callback=flush_tlb_func+0x0
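To make the failure mode concrete, here is a simplified user-space model (an assumption, not kernel code): bit n of a 64-bit mask stands for CPU n, a counter stands in for trace_ipi_send_cpumask(), and the names send_ipis_cond(), never() and even_only() are hypothetical. The point is that the event is only counted after filtering has left at least one CPU, which is what the "if (!nr_cpus) goto local;" change aims for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*cond_fn)(int cpu, void *info);

static unsigned int trace_events;	/* stands in for the tracepoint */

/*
 * Filter @mask through @cond (NULL means "all CPUs") and only emit the
 * trace event when at least one CPU remains. Returns the CPUs to IPI.
 */
static uint64_t send_ipis_cond(uint64_t mask, cond_fn cond, void *info)
{
	uint64_t ipi_mask = 0;
	unsigned int nr_cpus = 0;

	for (int cpu = 0; cpu < 64; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		if (cond && !cond(cpu, info))
			continue;
		ipi_mask |= UINT64_C(1) << cpu;
		nr_cpus++;
	}

	if (!nr_cpus)
		return 0;	/* nothing to send, so nothing to trace */

	trace_events++;
	return ipi_mask;
}

static bool never(int cpu, void *info) { (void)cpu; (void)info; return false; }
static bool even_only(int cpu, void *info) { (void)info; return cpu % 2 == 0; }
```

With never() as the predicate no event is recorded; with even_only() on mask 0xf the event is recorded once for CPUs 0 and 2.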

Maybe something like so on top?

---
diff --git a/kernel/smp.c b/kernel/smp.c
index ba5478814e677..1dc452017d000 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -791,6 +791,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
}
}
 
+   if (!nr_cpus)
+   goto local;
/*
 * Trace each smp_function_call_*() as an IPI, actual IPIs
 * will be traced with func==generic_smp_call_function_single_ipi().
@@ -804,10 +806,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 */
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
-   else if (likely(nr_cpus > 1))
+   else
send_call_function_ipi_mask(cfd->cpumask_ipi);
}
-
+local:
if (run_local && (!cond_func || cond_func(this_cpu, info))) {
unsigned long flags;
 



Re: [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call()

2023-03-23 Thread Nathan Lynch
Andrew Donnellan  writes:

> On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
>> From: Nathan Lynch 
>> 
>> The function name va_rtas_call_unlocked() is confusing: it may be
>> called with or without rtas_lock held. Rename it to va_rtas_call().
>> 
>> Signed-off-by: Nathan Lynch 
>
> Not a huge fan of the name, the va_ suggests that the only difference
> between this function and rtas_call() is the varargs handling. Perhaps
> something like __rtas_call()?

I would be more inclined to agree if va_rtas_call() were a public API,
like rtas_call().

But it's not, so the convention you're appealing to shouldn't inform the
expectations of external users of the rtas_* APIs, at least.

__rtas_call() conveys strictly less information than va_rtas_call()
IMO. Most functions in the kernel that take a va_list have a "v" worked
into their name somehow.
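For illustration, the convention looks roughly like this: the variadic entry point collects its arguments and forwards a va_list to a "v"-named worker, as printk()/vprintk() and snprintf()/vsnprintf() do. The pair sum_call()/va_sum_call() is purely hypothetical and only mirrors the rtas_call()/va_rtas_call() split being discussed.

```c
#include <stdarg.h>

/* Worker: takes an already-started va_list, like va_rtas_call() would. */
static int va_sum_call(int nargs, va_list ap)
{
	int sum = 0;

	for (int i = 0; i < nargs; i++)
		sum += va_arg(ap, int);
	return sum;
}

/* Public variadic wrapper: owns va_start()/va_end(). */
static int sum_call(int nargs, ...)
{
	va_list ap;
	int ret;

	va_start(ap, nargs);
	ret = va_sum_call(nargs, ap);
	va_end(ap);
	return ret;
}
```

sum_call(3, 1, 2, 3) returns 6; the "va_"/"v" prefix tells the reader which of the two takes the va_list.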


Re: perf tools power9 JSON files build breakage on ubuntu 18.04 cross build

2023-03-23 Thread Ian Rogers
On Thu, Mar 23, 2023 at 6:11 AM Arnaldo Carvalho de Melo
 wrote:
>
> Exception processing pmu-events/arch/powerpc/power9/other.json
> Traceback (most recent call last):
>   File "pmu-events/jevents.py", line 997, in <module>
> main()
>   File "pmu-events/jevents.py", line 979, in main
> ftw(arch_path, [], preprocess_one_file)
>   File "pmu-events/jevents.py", line 935, in ftw
> ftw(item.path, parents + [item.name], action)
>   File "pmu-events/jevents.py", line 933, in ftw
> action(parents, item)
>   File "pmu-events/jevents.py", line 514, in preprocess_one_file
> for event in read_json_events(item.path, topic):
>   File "pmu-events/jevents.py", line 388, in read_json_events
> events = json.load(open(path), object_hook=JsonEvent)
>   File "/usr/lib/python3.6/json/__init__.py", line 296, in load
> return loads(fp.read(),
>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 55090: ordinal not in range(128)
>   CC  /tmp/build/perf/tests/expr.o
> pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
> make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
> make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
> Makefile.perf:679: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
> make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs
>
>
> Now jevents is an opt-out feature so I'm noticing these problems.
>
> A similar fix for s390 was accepted today:

The JEVENTS_ARCH=all make option builds the s390 files even on x86.
I'm confused as to why that's been working before these fixes.

Thanks,
Ian

> https://lore.kernel.org/r/20230323122532.2305847-1-tmri...@linux.ibm.com
> https://lore.kernel.org/r/ZBwkl77/I31AQk12@osiris
> --
>
> - Arnaldo
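The traceback shows json.load() decoding other.json with the ascii codec. That is likely because Python 3.6's open() without an explicit encoding uses the locale's preferred encoding, which is ASCII under a C/POSIX locale, so a single UTF-8 byte such as 0xc2 is enough to fail. A small, hypothetical C helper to locate the first non-ASCII byte in a buffer (for example to see what sits at offset 55090 of the file) could look like this; reading the file into the buffer is left out.

```c
#include <stddef.h>

/*
 * Return the offset of the first byte outside 7-bit ASCII, or -1 if the
 * buffer is pure ASCII. 0xc2 is a UTF-8 lead byte (e.g. the first byte of
 * U+00A0 NO-BREAK SPACE), which Python's ascii codec rejects.
 */
static long first_non_ascii(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] > 0x7f)
			return (long)i;
	return -1;
}
```

Whether the right fix is to drop the character from the JSON or to open the file with an explicit UTF-8 encoding in jevents.py is not settled by this sketch.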


Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Andy Shevchenko
On Thu, Mar 23, 2023 at 10:02:38AM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 23, 2023 at 04:30:01PM +0200, Andy Shevchenko wrote:

...

> I poked around looking for similar patterns elsewhere with:
> 
>   git grep "#define.*for_each_.*_p("
>   git grep "#define.*for_each_.*_idx("
> 
> I didn't find any other "_p" iterators and just a few "_idx" ones, so
> my hope is to follow what little precedent there is, as well as
> converge on the basic "*_for_each_resource()" iterators and remove the
> "_idx()" versions over time by doing things like the
> pci_claim_resource() change.

The p is heavily used in the byte order conversion helpers.

> What do you think?  If it seems like excessive churn, we can do it
> as-is and still try to reduce the use of the index variable over time.

I think _p has a precedent as well. But I can think about it a bit, maybe
we can come up with something smarter.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Bjorn Helgaas
On Thu, Mar 23, 2023 at 04:30:01PM +0200, Andy Shevchenko wrote:
> On Wed, Mar 22, 2023 at 02:28:04PM -0500, Bjorn Helgaas wrote:
> > On Mon, Mar 20, 2023 at 03:16:30PM +0200, Andy Shevchenko wrote:
> ...
> 
> > > + pci_dev_for_each_resource_p(dev, r) {
> > >   /* zap the 2nd function of the winbond chip */
> > > - if (dev->resource[i].flags & IORESOURCE_IO
> > > - && dev->bus->number == 0 && dev->devfn == 0x81)
> > > - dev->resource[i].flags &= ~IORESOURCE_IO;
> > > - if (dev->resource[i].start == 0 && dev->resource[i].end) {
> > > - dev->resource[i].flags = 0;
> > > - dev->resource[i].end = 0;
> > > + if (dev->bus->number == 0 && dev->devfn == 0x81 &&
> > > + r->flags & IORESOURCE_IO)
> > 
> > This is a nice literal conversion, but it's kind of lame to test
> > bus->number and devfn *inside* the loop here, since they can't change
> > inside the loop.
> 
> Hmm... why are you asking me, even if I may agree on that? It's
> in the original code and out of scope of this series.

Yeah, I don't think it would be *unreasonable* to clean this up in the
same series so the maintainers can review both changes together (this
is arch/powerpc/platforms/pseries/pci.c, so Michael, et al), but there's
certainly no need for you to do anything.  I can post a follow-up patch.
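A rough sketch of what hoisting the loop-invariant bus->number/devfn test out of the loop might look like. The structures, the IORESOURCE_IO value and PCI_NUM_RESOURCES are simplified stand-ins (assumptions), not the kernel definitions, fixup_resources() is a hypothetical name, and the actual follow-up patch may differ.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel definitions (assumptions). */
#define IORESOURCE_IO		0x00000100UL
#define PCI_NUM_RESOURCES	4

struct resource { unsigned long start, end, flags; };
struct pci_bus  { unsigned char number; };
struct pci_dev  {
	struct pci_bus *bus;
	unsigned int devfn;
	struct resource resource[PCI_NUM_RESOURCES];
};

static void fixup_resources(struct pci_dev *dev)
{
	/* Evaluated once: is this the 2nd function of the winbond chip? */
	bool winbond_fn2 = dev->bus->number == 0 && dev->devfn == 0x81;
	struct resource *r;

	for (r = dev->resource; r < dev->resource + PCI_NUM_RESOURCES; r++) {
		if (winbond_fn2 && (r->flags & IORESOURCE_IO))
			r->flags &= ~IORESOURCE_IO;
		if (r->start == 0 && r->end) {
			r->flags = 0;
			r->end = 0;
		}
	}
}
```

The behaviour is the same as the converted loop; only the invariant check moves.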

> > but
> > since we're converging on the "(dev, res)" style, I think we should
> > reverse the names so we have something like:
> > 
> >   pci_dev_for_each_resource(dev, res)
> >   pci_dev_for_each_resource_idx(dev, res, i)
> 
> Wouldn't it be more churn, including pci_bus_for_each_resource() correction?

Yes, it definitely is a little more churn because we already have
pci_bus_for_each_resource() that would have to be changed.

I poked around looking for similar patterns elsewhere with:

  git grep "#define.*for_each_.*_p("
  git grep "#define.*for_each_.*_idx("

I didn't find any other "_p" iterators and just a few "_idx" ones, so
my hope is to follow what little precedent there is, as well as
converge on the basic "*_for_each_resource()" iterators and remove the
"_idx()" versions over time by doing things like the
pci_claim_resource() change.

What do you think?  If it seems like excessive churn, we can do it
as-is and still try to reduce the use of the index variable over time.

Bjorn


Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Andy Shevchenko
On Wed, Mar 22, 2023 at 02:28:04PM -0500, Bjorn Helgaas wrote:
> On Mon, Mar 20, 2023 at 03:16:30PM +0200, Andy Shevchenko wrote:

...

> > +   pci_dev_for_each_resource_p(dev, r) {
> > /* zap the 2nd function of the winbond chip */
> > -   if (dev->resource[i].flags & IORESOURCE_IO
> > -   && dev->bus->number == 0 && dev->devfn == 0x81)
> > -   dev->resource[i].flags &= ~IORESOURCE_IO;
> > -   if (dev->resource[i].start == 0 && dev->resource[i].end) {
> > -   dev->resource[i].flags = 0;
> > -   dev->resource[i].end = 0;
> > +   if (dev->bus->number == 0 && dev->devfn == 0x81 &&
> > +   r->flags & IORESOURCE_IO)
> 
> This is a nice literal conversion, but it's kind of lame to test
> bus->number and devfn *inside* the loop here, since they can't change
> inside the loop.

Hmm... why are you asking me, even if I may agree on that? It's
in the original code and out of scope of this series.

> > +   r->flags &= ~IORESOURCE_IO;
> > +   if (r->start == 0 && r->end) {
> > +   r->flags = 0;
> > +   r->end = 0;
> > }
> > }

...

> >  #define pci_resource_len(dev,bar) \
> > ((pci_resource_end((dev), (bar)) == 0) ? 0 :\
> > \
> > -(pci_resource_end((dev), (bar)) -  \
> > - pci_resource_start((dev), (bar)) + 1))
> > +resource_size(pci_resource_n((dev), (bar
> 
> I like this change, but it's unrelated to pci_dev_for_each_resource()
> and unmentioned in the commit log.

And, as you rightly noticed, this one is unrelated too. I can split it into a separate patch.

...

> > +#define __pci_dev_for_each_resource(dev, res, __i, vartype)	\
> > +   for (vartype __i = 0;   \
> > +res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES;   \
> > +__i++)
> > +
> > +#define pci_dev_for_each_resource(dev, res, i)	\
> > +   __pci_dev_for_each_resource(dev, res, i, )
> > +
> > +#define pci_dev_for_each_resource_p(dev, res)	\
> > +   __pci_dev_for_each_resource(dev, res, __i, unsigned int)
> 
> This series converts many cases to drop the iterator variable ("i"),
> which is fantastic.
> 
> Several of the remaining places need the iterator variable only to
> call pci_claim_resource(), which could be converted to take a "struct
> resource *" directly without much trouble.
> 
> We don't have to do that pci_claim_resource() conversion now,

Exactly, it should definitely be a separate change.

> but
> since we're converging on the "(dev, res)" style, I think we should
> reverse the names so we have something like:
> 
>   pci_dev_for_each_resource(dev, res)
>   pci_dev_for_each_resource_idx(dev, res, i)

Wouldn't it be more churn, including pci_bus_for_each_resource() correction?

...

> Not sure __pci_dev_for_each_resource() is worthwhile since it only
> avoids repeating that single "for" statement, and passing in "vartype"
> (sometimes empty to implicitly avoid the declaration) is a little
> complicated to read.  I think it'd be easier to read like this:

No objections here.

>   #define pci_dev_for_each_resource(dev, res)  \
> for (unsigned int __i = 0; \
>  res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES;  \
>  __i++)
> 
>   #define pci_dev_for_each_resource_idx(dev, res, idx) \
> for (idx = 0;  \
>  res = pci_resource_n(dev, idx), idx < PCI_NUM_RESOURCES;  \
>  idx++)
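A self-contained sketch of the two proposed iterator forms in use. struct pci_dev, struct resource, pci_resource_n() and PCI_NUM_RESOURCES are simplified stand-ins (assumptions), and total_len()/last_used_bar() are hypothetical callers. The comma expression assigns res before the bound check, so on the final check res points one past the array; that address is computed but never dereferenced, because the body only runs while the index is in range.

```c
/* Simplified stand-ins for the kernel definitions (assumptions). */
#define PCI_NUM_RESOURCES 4
struct resource { unsigned long start, end; };
struct pci_dev { struct resource resource[PCI_NUM_RESOURCES]; };
#define pci_resource_n(dev, bar) (&(dev)->resource[(bar)])

/* Pointer-only form: the index is a hidden, loop-scoped variable. */
#define pci_dev_for_each_resource(dev, res)				\
	for (unsigned int __i = 0;					\
	     res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES;	\
	     __i++)

/* Indexed form: the caller declares and may use the index. */
#define pci_dev_for_each_resource_idx(dev, res, idx)			\
	for (idx = 0;							\
	     res = pci_resource_n(dev, idx), idx < PCI_NUM_RESOURCES;	\
	     idx++)

static unsigned long total_len(struct pci_dev *dev)
{
	struct resource *r;
	unsigned long sum = 0;

	pci_dev_for_each_resource(dev, r)
		if (r->end)
			sum += r->end - r->start + 1;
	return sum;
}

static int last_used_bar(struct pci_dev *dev)
{
	struct resource *r;
	unsigned int i;
	int last = -1;

	pci_dev_for_each_resource_idx(dev, r, i)
		if (r->end)
			last = (int)i;
	return last;
}
```

Callers that only need the resource pointer use the short form; the _idx form remains for the few that still need the BAR number.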

-- 
With Best Regards,
Andy Shevchenko




Re: [kvm-unit-tests v2 10/10] powerpc/sprs: Test hypervisor registers on powernv machine

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

This enables HV privilege registers to be tested with the powernv
machine.

Signed-off-by: Nicholas Piggin 
---
  powerpc/sprs.c | 31 ---
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/powerpc/sprs.c b/powerpc/sprs.c
index dd83dac..a7878ff 100644
--- a/powerpc/sprs.c
+++ b/powerpc/sprs.c
@@ -199,16 +199,16 @@ static const struct spr sprs_power_common[1024] = {
  [190] = {"HFSCR",   64, HV_RW, },
  [256] = {"VRSAVE",  32, RW, },
  [259] = {"SPRG3",   64, RO, },
-[284] = {"TBL",  32, HV_WO, },
-[285] = {"TBU",  32, HV_WO, },
-[286] = {"TBU40",64, HV_WO, },
+[284] = {"TBL",  32, HV_WO, }, /* Things can go a bit wonky with */
+[285] = {"TBU",  32, HV_WO, }, /* Timebase changing. Should save */
+[286] = {"TBU40",64, HV_WO, }, /* and restore it. */
  [304] = {"HSPRG0",  64, HV_RW, },
  [305] = {"HSPRG1",  64, HV_RW, },
  [306] = {"HDSISR",  32, HV_RW,  SPR_INT, },
  [307] = {"HDAR",64, HV_RW,  SPR_INT, },
  [308] = {"SPURR",   64, HV_RW | OS_RO,  SPR_ASYNC, },
  [309] = {"PURR",64, HV_RW | OS_RO,  SPR_ASYNC, },
-[313] = {"HRMOR",64, HV_RW, },
+[313] = {"HRMOR",64, HV_RW,  SPR_HARNESS, }, /* Harness can't cope with HRMOR changing */
  [314] = {"HSRR0",   64, HV_RW,  SPR_INT, },
  [315] = {"HSRR1",   64, HV_RW,  SPR_INT, },
  [318] = {"LPCR",64, HV_RW, },
@@ -350,6 +350,22 @@ static const struct spr sprs_power10_pmu[1024] = {
  
  static struct spr sprs[1024];
  
+static bool spr_read_perms(int spr)
+{
+   if (machine_is_powernv())
+   return !!(sprs[spr].access & SPR_HV_READ);
+   else
+   return !!(sprs[spr].access & SPR_OS_READ);
+}
+
+static bool spr_write_perms(int spr)
+{
+   if (machine_is_powernv())
+   return !!(sprs[spr].access & SPR_HV_WRITE);
+   else
+   return !!(sprs[spr].access & SPR_OS_WRITE);
+}
+
  static void setup_sprs(void)
  {
uint32_t pvr = mfspr(287);  /* Processor Version Register */
@@ -462,7 +478,7 @@ static void get_sprs(uint64_t *v)
int i;
  
  	for (i = 0; i < 1024; i++) {
-   if (!(sprs[i].access & SPR_OS_READ))
+   if (!spr_read_perms(i))
continue;
v[i] = mfspr(i);
}
@@ -473,8 +489,9 @@ static void set_sprs(uint64_t val)
int i;
  
  	for (i = 0; i < 1024; i++) {
-   if (!(sprs[i].access & SPR_OS_WRITE))
+   if (!spr_write_perms(i))
continue;
+
if (sprs[i].type & SPR_HARNESS)
continue;
if (!strcmp(sprs[i].name, "MMCR0")) {
@@ -546,7 +563,7 @@ int main(int argc, char **argv)
for (i = 0; i < 1024; i++) {
bool pass = true;
  
-		if (!(sprs[i].access & SPR_OS_READ))
+   if (!spr_read_perms(i))
continue;
  
  		if (sprs[i].width == 32) {


Acked-by: Thomas Huth 
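For reference, the two helpers in the patch boil down to choosing an HV or OS permission bit depending on the privilege level. The sketch below passes hv_mode explicitly instead of calling machine_is_powernv() so it runs stand-alone; spr_allowed() is a hypothetical name and the flag values are assumptions, not the ones in powerpc/sprs.c. Since the return type is bool, the conversion already yields 0 or 1, so the !! in the patch is harmless but not required.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real ones live in powerpc/sprs.c. */
#define SPR_OS_READ	0x1
#define SPR_OS_WRITE	0x2
#define SPR_HV_READ	0x4
#define SPR_HV_WRITE	0x8

/* Hypervisor permissions on powernv, guest (OS) permissions otherwise. */
static bool spr_allowed(unsigned int access, bool hv_mode, bool write)
{
	unsigned int mask;

	if (write)
		mask = hv_mode ? SPR_HV_WRITE : SPR_OS_WRITE;
	else
		mask = hv_mode ? SPR_HV_READ : SPR_OS_READ;
	return access & mask;	/* conversion to bool gives 0 or 1 */
}
```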



Re: [kvm-unit-tests v2 09/10] powerpc: Support powernv machine with QEMU TCG

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

This is a basic first pass at powernv support using OPAL (skiboot)
firmware.

The ACCEL is a bit clunky, now defaulting to tcg for powernv machine.
It also does not yet run in the run_tests.sh batch process, more work
is needed to exclude certain tests (e.g., rtas) and adjust parameters
(e.g., increase memory size) to allow powernv to work. For now it
can run single test cases.

Signed-off-by: Nicholas Piggin 
---
  lib/powerpc/asm/ppc_asm.h   |  5 +++
  lib/powerpc/asm/processor.h | 14 
  lib/powerpc/hcall.c |  4 +--
  lib/powerpc/io.c| 33 --
  lib/powerpc/io.h|  6 
  lib/powerpc/processor.c | 10 ++
  lib/powerpc/setup.c | 10 --
  lib/ppc64/asm/opal.h| 11 ++
  lib/ppc64/opal-calls.S  | 46 +
  lib/ppc64/opal.c| 67 +
  powerpc/Makefile.ppc64  |  2 ++
  powerpc/cstart64.S  |  7 
  powerpc/run | 30 ++---
  13 files changed, 234 insertions(+), 11 deletions(-)
  create mode 100644 lib/ppc64/asm/opal.h
  create mode 100644 lib/ppc64/opal-calls.S
  create mode 100644 lib/ppc64/opal.c

diff --git a/lib/powerpc/asm/ppc_asm.h b/lib/powerpc/asm/ppc_asm.h
index 6299ff5..5eec9d3 100644
--- a/lib/powerpc/asm/ppc_asm.h
+++ b/lib/powerpc/asm/ppc_asm.h
@@ -36,7 +36,12 @@
  #endif /* __BYTE_ORDER__ */
  
  /* Machine State Register definitions: */
+#define MSR_LE_BIT 0
  #define MSR_EE_BIT15  /* External Interrupts Enable */
+#define MSR_HV_BIT 60  /* Hypervisor mode */
  #define MSR_SF_BIT63  /* 64-bit mode */
  
+#define SPR_HSRR0	0x13A
+#define SPR_HSRR1  0x13B
+
  #endif /* _ASMPOWERPC_PPC_ASM_H */
diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index ebfeff2..8084787 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -3,12 +3,26 @@
  
  #include 
  #include 
+#include 
  
  #ifndef __ASSEMBLY__
  void handle_exception(int trap, void (*func)(struct pt_regs *, void *), void *);
  void do_handle_exception(struct pt_regs *regs);
  #endif /* __ASSEMBLY__ */
  
+/*
+ * If this returns true on PowerNV / OPAL machines which run in hypervisor
+ * mode. False on pseries / PAPR machines that run in guest mode.


s/If this/This/

 Thomas
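As background for the "hypervisor mode" wording: the bit numbers the patch adds count from the least significant bit of the 64-bit MSR, so MSR[HV] is (msr >> 60) & 1. A minimal sketch follows; msr_bit() is a hypothetical helper, and whether machine_is_powernv() in this series actually tests MSR[HV] is an assumption, not something the quoted patch shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as in the patch (bit 0 = least significant bit). */
#define MSR_LE_BIT	0
#define MSR_EE_BIT	15
#define MSR_HV_BIT	60
#define MSR_SF_BIT	63

static bool msr_bit(uint64_t msr, unsigned int bit)
{
	return (msr >> bit) & 1;
}
```

For example, an MSR with SF, HV and LE set is 0x9000000000000001, for which msr_bit(msr, MSR_HV_BIT) is true and msr_bit(msr, MSR_EE_BIT) is false.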



Re: [kvm-unit-tests v2 07/10] powerpc/spapr_vpa: Add basic VPA tests

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

The VPA is a(n optional) memory structure shared between the hypervisor
and operating system, defined by PAPR. This test defines the structure
and adds registration, deregistration, and a few simple sanity tests.

Signed-off-by: Nicholas Piggin 
---
  lib/linux/compiler.h|  2 +
  lib/powerpc/asm/hcall.h |  1 +
  lib/ppc64/asm/vpa.h | 62 
  powerpc/Makefile.ppc64  |  2 +-
  powerpc/spapr_vpa.c | 90 +


Please add the new test to powerpc/unittests.cfg, otherwise it won't get 
picked up by the run_tests.sh script.



diff --git a/lib/linux/compiler.h b/lib/linux/compiler.h
index 6f565e4..c9d205e 100644
--- a/lib/linux/compiler.h
+++ b/lib/linux/compiler.h
@@ -45,7 +45,9 @@
  
  #define barrier()	asm volatile("" : : : "memory")
  
+#ifndef __always_inline
  #define __always_inline   inline __attribute__((always_inline))
+#endif


What's this change good for? ... it doesn't seem to be related to this patch?


diff --git a/lib/ppc64/asm/vpa.h b/lib/ppc64/asm/vpa.h
new file mode 100644
index 000..11dde01
--- /dev/null
+++ b/lib/ppc64/asm/vpa.h
@@ -0,0 +1,62 @@
+#ifndef _ASMPOWERPC_VPA_H_
+#define _ASMPOWERPC_VPA_H_
+/*
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#ifndef __ASSEMBLY__
+
+struct vpa {
+   uint32_tdescriptor;
+   uint16_tsize;
+   uint8_t reserved1[3];
+   uint8_t status;


Where does this status field come from? ... My LoPAPR only says that there 
are 18 "reserved" bytes in total here.



+   uint8_t reserved2[14];
+   uint32_tfru_node_id;
+   uint32_tfru_proc_id;
+   uint8_t reserved3[56];
+   uint8_t vhpn_change_counters[8];
+   uint8_t reserved4[80];
+   uint8_t cede_latency;
+   uint8_t maintain_ebb;
+   uint8_t reserved5[6];
+   uint8_t dtl_enable_mask;
+   uint8_t dedicated_cpu_donate;
+   uint8_t maintain_fpr;
+   uint8_t maintain_pmc;
+   uint8_t reserved6[28];
+   uint64_tidle_estimate_purr;
+   uint8_t reserved7[28];
+   uint16_tmaintain_nr_slb;
+   uint8_t idle;
+   uint8_t maintain_vmx;
+   uint32_tvp_dispatch_count;
+   uint32_tvp_dispatch_dispersion;
+   uint64_tvp_fault_count;
+   uint64_tvp_fault_tb;
+   uint64_tpurr_exprop_idle;
+   uint64_tspurr_exprop_idle;
+   uint64_tpurr_exprop_busy;
+   uint64_tspurr_exprop_busy;
+   uint64_tpurr_donate_idle;
+   uint64_tspurr_donate_idle;
+   uint64_tpurr_donate_busy;
+   uint64_tspurr_donate_busy;
+   uint64_tvp_wait3_tb;
+   uint64_tvp_wait2_tb;
+   uint64_tvp_wait1_tb;
+   uint64_tpurr_exprop_adjunct_busy;
+   uint64_tspurr_exprop_adjunct_busy;


The above two fields are also marked as "reserved" in my LoPAPR ... which 
version did you use?



+   uint32_tsupervisor_pagein_count;
+   uint8_t reserved8[4];
+   uint64_tpurr_exprop_adjunct_idle;
+   uint64_tspurr_exprop_adjunct_idle;
+   uint64_tadjunct_insns_executed;


Ditto for the above three lines... I guess my LoPAPR is too old...


+   uint8_t reserved9[120];
+   uint64_tdtl_index;
+   uint8_t reserved10[96];
+};
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASMPOWERPC_VPA_H_ */
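One way to settle the layout questions is to pin the byte offsets with compile-time checks and compare them against the LoPAPR table by hand. The offsets below were computed from the struct exactly as posted, not taken from LoPAPR, so they show where the posted fields land rather than prove they are right. The struct is copied so the check builds stand-alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy of the struct as posted in lib/ppc64/asm/vpa.h. */
struct vpa {
	uint32_t descriptor;
	uint16_t size;
	uint8_t  reserved1[3];
	uint8_t  status;
	uint8_t  reserved2[14];
	uint32_t fru_node_id;
	uint32_t fru_proc_id;
	uint8_t  reserved3[56];
	uint8_t  vhpn_change_counters[8];
	uint8_t  reserved4[80];
	uint8_t  cede_latency;
	uint8_t  maintain_ebb;
	uint8_t  reserved5[6];
	uint8_t  dtl_enable_mask;
	uint8_t  dedicated_cpu_donate;
	uint8_t  maintain_fpr;
	uint8_t  maintain_pmc;
	uint8_t  reserved6[28];
	uint64_t idle_estimate_purr;
	uint8_t  reserved7[28];
	uint16_t maintain_nr_slb;
	uint8_t  idle;
	uint8_t  maintain_vmx;
	uint32_t vp_dispatch_count;
	uint32_t vp_dispatch_dispersion;
	uint64_t vp_fault_count;
	uint64_t vp_fault_tb;
	uint64_t purr_exprop_idle;
	uint64_t spurr_exprop_idle;
	uint64_t purr_exprop_busy;
	uint64_t spurr_exprop_busy;
	uint64_t purr_donate_idle;
	uint64_t spurr_donate_idle;
	uint64_t purr_donate_busy;
	uint64_t spurr_donate_busy;
	uint64_t vp_wait3_tb;
	uint64_t vp_wait2_tb;
	uint64_t vp_wait1_tb;
	uint64_t purr_exprop_adjunct_busy;
	uint64_t spurr_exprop_adjunct_busy;
	uint32_t supervisor_pagein_count;
	uint8_t  reserved8[4];
	uint64_t purr_exprop_adjunct_idle;
	uint64_t spurr_exprop_adjunct_idle;
	uint64_t adjunct_insns_executed;
	uint8_t  reserved9[120];
	uint64_t dtl_index;
	uint8_t  reserved10[96];
};

/* Offsets derived from the posted struct (no padding is inserted). */
_Static_assert(offsetof(struct vpa, status) == 0x9, "status");
_Static_assert(offsetof(struct vpa, fru_node_id) == 0x18, "fru_node_id");
_Static_assert(offsetof(struct vpa, cede_latency) == 0xb0, "cede_latency");
_Static_assert(offsetof(struct vpa, idle_estimate_purr) == 0xd8, "idle_estimate_purr");
_Static_assert(offsetof(struct vpa, vp_dispatch_count) == 0x100, "vp_dispatch_count");
_Static_assert(offsetof(struct vpa, purr_exprop_adjunct_busy) == 0x170, "adjunct_busy");
_Static_assert(offsetof(struct vpa, purr_exprop_adjunct_idle) == 0x188, "adjunct_idle");
_Static_assert(offsetof(struct vpa, dtl_index) == 0x218, "dtl_index");
_Static_assert(sizeof(struct vpa) == 0x280, "total size");
```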
diff --git a/powerpc/Makefile.ppc64 b/powerpc/Makefile.ppc64
index ea68447..b0ed2b1 100644
--- a/powerpc/Makefile.ppc64
+++ b/powerpc/Makefile.ppc64
@@ -19,7 +19,7 @@ reloc.o  = $(TEST_DIR)/reloc64.o
  OBJDIRS += lib/ppc64
  
  # ppc64 specific tests
-tests =
+tests = $(TEST_DIR)/spapr_vpa.elf
  
  include $(SRCDIR)/$(TEST_DIR)/Makefile.common
  
diff --git a/powerpc/spapr_vpa.c b/powerpc/spapr_vpa.c
new file mode 100644
index 000..45688fe
--- /dev/null
+++ b/powerpc/spapr_vpa.c
@@ -0,0 +1,90 @@
+/*
+ * Test sPAPR hypervisor calls (aka. h-calls)


Adjust to "Test sPAPR H_REGISTER_VPA hypervisor call" ?


+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include  /* for endian accessors */
+
+static void print_vpa(struct vpa *vpa)
+{
+   printf("VPA\n");
+   printf("descriptor:0x%08x\n", be32_to_cpu(vpa->descriptor));
+   printf("size:  0x%04x\n", be16_to_cpu(vpa->size));
+   printf("status:  0x%02x\n", vpa->status);
+   printf("fru_node_id:   0x%08x\n", be32_to_cpu(vpa->fru_node_id));
+   printf("fru_proc_id:   

Re: [next-20230322] Kernel WARN at kernel/workqueue.c:3182 (rcutorture)

2023-03-23 Thread Paul E. McKenney
On Thu, Mar 23, 2023 at 04:55:54PM +0530, Sachin Sant wrote:
> While running rcutorture tests from LTP on an IBM Power10 server booted with
> 6.3.0-rc3-next-20230322, the following warning is observed:
> 
> [ 3629.242831] [ cut here ]
> [ 3629.242835] WARNING: CPU: 8 PID: 614614 at kernel/workqueue.c:3182 
> __flush_work.isra.44+0x44/0x370
> [ 3629.242845] Modules linked in: rcutorture(-) torture vmac poly1305_generic 
> chacha_generic chacha20poly1305 n_gsm pps_ldisc ppp_synctty ppp_async 
> ppp_generic serport slcan can_dev slip slhc snd_hrtimer snd_seq 
> snd_seq_device snd_timer snd soundcore pcrypt crypto_user n_hdlc dummy veth 
> tun nfsv3 nfs_acl nfs lockd grace fscache netfs brd overlay exfat vfat fat 
> btrfs blake2b_generic xor raid6_pq zstd_compress xfs loop sctp ip6_udp_tunnel 
> udp_tunnel libcrc32c dm_mod bonding rfkill tls sunrpc kmem device_dax nd_pmem 
> nd_btt dax_pmem papr_scm pseries_rng libnvdimm vmx_crypto ext4 mbcache jbd2 
> sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth 
> fuse [last unloaded: ltp_uaccess(O)]
> [ 3629.242911] CPU: 8 PID: 614614 Comm: modprobe Tainted: G O 
> 6.3.0-rc3-next-20230322 #1
> [ 3629.242917] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
> of:IBM,FW1030.00 (NH1030_026) hv:phyp pSeries
> [ 3629.242923] NIP: c018c204 LR: c022306c CTR: 
> c02233c0
> [ 3629.242927] REGS: c005c14e3880 TRAP: 0700 Tainted: G O 
> (6.3.0-rc3-next-20230322)
> [ 3629.242932] MSR: 8282b033  CR: 
> 4800 XER: 000a
> [ 3629.242943] CFAR: c018c5b0 IRQMASK: 0 
> [ 3629.242943] GPR00: c022306c c005c14e3b20 c1401200 
> c0080c4419e8 
> [ 3629.242943] GPR04: 0001 0001 0011 
> fffe 
> [ 3629.242943] GPR08: c00efe9a8300 0001  
> c0080c42afe0 
> [ 3629.242943] GPR12: c02233c0 c00e6700  
>  
> [ 3629.242943] GPR16:    
>  
> [ 3629.242943] GPR20:    
>  
> [ 3629.242943] GPR24: c0080c443400 c0080c440f60 c0080c4418c8 
> c2abb368 
> [ 3629.242943] GPR28: c0080c440f58  c0080c4419e8 
> c0080c443400 
> [ 3629.242987] NIP [c018c204] __flush_work.isra.44+0x44/0x370
> [ 3629.242993] LR [c022306c] cleanup_srcu_struct+0x6c/0x1e0
> [ 3629.242998] Call Trace:
> [ 3629.243000] [c005c14e3b20] [c0080c440f58] 
> srcu9+0x0/0xfffef0a8 [rcutorture] (unreliable)
> [ 3629.243009] [c005c14e3bb0] [c022306c] 
> cleanup_srcu_struct+0x6c/0x1e0
> [ 3629.243015] [c005c14e3c50] [c0223428] 
> srcu_module_notify+0x68/0x180
> [ 3629.243021] [c005c14e3c90] [c019a1e0] 
> notifier_call_chain+0xc0/0x1b0
> [ 3629.243027] [c005c14e3cf0] [c019ad24] 
> blocking_notifier_call_chain+0x64/0xa0
> [ 3629.243033] [c005c14e3d30] [c024a4c8] 
> sys_delete_module+0x1f8/0x3c0
> [ 3629.243039] [c005c14e3e10] [c0037480] 
> system_call_exception+0x140/0x350
> [ 3629.243044] [c005c14e3e50] [c000d6a0] 
> system_call_common+0x160/0x2e4
> [ 3629.243050] --- interrupt: c00 at 0x7fff8cd39558
> [ 3629.243054] NIP: 7fff8cd39558 LR: 00010d800398 CTR: 
> 
> [ 3629.243057] REGS: c005c14e3e80 TRAP: 0c00 Tainted: G O 
> (6.3.0-rc3-next-20230322)
> [ 3629.243062] MSR: 8280f033  CR: 
> 28008282 XER: 
> [ 3629.243072] IRQMASK: 0 
> [ 3629.243072] GPR00: 0081 7fffe99fd9c0 7fff8ce07300 
> 00013df30ec8 
> [ 3629.243072] GPR04: 0800 000a 1999 
>  
> [ 3629.243072] GPR08: 7fff8cd98160   
>  
> [ 3629.243072] GPR12:  7fff8d5fcb50 00010d80a650 
> 00010d80a648 
> [ 3629.243072] GPR16:  0001  
> 00010d80a428 
> [ 3629.243072] GPR20: 00010d830068  7fffe99ff2f8 
> 00013df304f0 
> [ 3629.243072] GPR24:  7fffe99ff2f8 00013df30ec8 
>  
> [ 3629.243072] GPR28:  00013df30e60 00013df30ec8 
> 00013df30e60 
> [ 3629.243115] NIP [7fff8cd39558] 0x7fff8cd39558
> [ 3629.243118] LR [00010d800398] 0x10d800398
> [ 3629.243121] --- interrupt: c00
> [ 3629.243123] Code: 89292e39 f821ff71 e94d0c78 f9410068 3940 69290001 
> 0b09 fbc10080 7c7e1b78 e9230018 7d290074 7929d182 <0b09> 7c0802a6 
> fb810070 f80100a0 
> [ 3629.243138] ---[ end trace  ]---
> 
> Followed by the following traces:
> 
> [ 3629.243149] [ cut here ]
> [ 3629.243152] WARNING: CPU: 8 PID: 614614 at kernel/rcu/srcutree.c:663 
> cleanup_srcu_struct+0x11c/0x1e0
> [ 3629.243159] Modules 

Re: [PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread kernel test robot
Hi Mike,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on 51551d71edbc998fd8c8afa7312db3d270f5998e]

url:    https://github.com/intel-lab-lkp/linux/commits/Mike-Rapoport/arm-reword-ARCH_FORCE_MAX_ORDER-prompt-and-help-text/20230323-172512
base:   51551d71edbc998fd8c8afa7312db3d270f5998e
patch link:    https://lore.kernel.org/r/20230323092156.2545741-3-rppt%40kernel.org
patch subject: [PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER
config: arm64-randconfig-r022-20230322 (https://download.01.org/0day-ci/archive/20230323/202303232149.chh6khii-...@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/0522f943c071abf1610651ea40405b7489c50987
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Mike-Rapoport/arm-reword-ARCH_FORCE_MAX_ORDER-prompt-and-help-text/20230323-172512
git checkout 0522f943c071abf1610651ea40405b7489c50987
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/base/regmap/ drivers/iommu/ fs/proc/ mm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Link: https://lore.kernel.org/oe-kbuild-all/202303232149.chh6khii-...@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:25,
from mm/mm_init.c:9:
   mm/mm_init.c: In function 'find_zone_movable_pfns_for_nodes':
>> include/linux/mmzone.h:33:31: warning: left shift count is negative 
>> [-Wshift-count-negative]
  33 | #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 |   ^~
   include/linux/math.h:61:25: note: in definition of macro 'roundup'
  61 | typeof(y) __y = y;  \
 | ^
   mm/mm_init.c:429:55: note: in expansion of macro 'MAX_ORDER_NR_PAGES'
 429 | roundup(required_movablecore, 
MAX_ORDER_NR_PAGES);
 |   
^~
>> include/linux/mmzone.h:33:31: warning: left shift count is negative 
>> [-Wshift-count-negative]
  33 | #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 |   ^~
   include/linux/math.h:61:25: note: in definition of macro 'roundup'
  61 | typeof(y) __y = y;  \
 | ^
   mm/mm_init.c:540:56: note: in expansion of macro 'MAX_ORDER_NR_PAGES'
 540 | roundup(zone_movable_pfn[nid], 
MAX_ORDER_NR_PAGES);
 |
^~
--
   In file included from include/linux/gfp.h:7,
from include/linux/mm.h:7,
from mm/debug.c:10:
   mm/debug.c: In function '__dump_page':
>> include/linux/mmzone.h:33:31: warning: left shift count is negative [-Wshift-count-negative]
  33 | #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 |   ^~
   mm/debug.c:70:44: note: in expansion of macro 'MAX_ORDER_NR_PAGES'
  70 | if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
 |^~
--
   In file included from include/linux/build_bug.h:5,
from include/linux/container_of.h:5,
from include/linux/list.h:5,
from include/linux/smp.h:12,
from include/linux/kernel_stat.h:5,
from mm/memory.c:42:
   mm/memory.c: In function 'clear_huge_page':
>> include/linux/mmzone.h:33:31: warning: left shift count is negative [-Wshift-count-negative]
  33 | #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 |   ^~
   include/linux/compiler.h:78:45: note: in definition of macro 'unlikely'
  78 | # define unlikely(x)__builtin_expect(!!(x), 0)
 | ^
   mm/memory.c:5791:44: note: in expansion of macro 'MAX_ORDER_NR_PAGES'
5791 | if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
 |^~
   mm/memory.c: In function 'copy_user_huge_page':
>> include/linux/mmzone.h:33:31: warning: left shift

Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()

2023-03-23 Thread Nathan Lynch
Michael Ellerman  writes:
> Nathan Lynch via B4 Relay  writes:
>> From: Nathan Lynch 
>>
>> The kernel can handle retrying RTAS function calls in response to
>> -2/990x in the sys_rtas() handler instead of relaying the intermediate
>> status to user space.
>
> This looks good in general.
>
> One query ...
>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 47a2aa43d7d4..c330a22ccc70 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>>  /* We assume to be passed big endian arguments */
>>  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>  {
>> -struct pin_cookie cookie;
>>  struct rtas_args args;
>>  unsigned long flags;
>>  char *buff_copy, *errbuf = NULL;
>> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>  
>>  buff_copy = get_errorlog_buffer();
>>  
>> -raw_spin_lock_irqsave(&rtas_lock, flags);
>> -cookie = lockdep_pin_lock(&rtas_lock);
>> +do {
>> +struct pin_cookie cookie;
>>  
>> -rtas_args = args;
>> -do_enter_rtas(&rtas_args);
>> -args = rtas_args;
>> +raw_spin_lock_irqsave(&rtas_lock, flags);
>> +cookie = lockdep_pin_lock(&rtas_lock);
>>  
>> -/* A -1 return code indicates that the last command couldn't
>> -   be completed due to a hardware error. */
>> -if (be32_to_cpu(args.rets[0]) == -1)
>> -errbuf = __fetch_rtas_last_error(buff_copy);
>> +rtas_args = args;
>> +do_enter_rtas(&rtas_args);
>> +args = rtas_args;
>>  
>> -lockdep_unpin_lock(&rtas_lock, cookie);
>> -raw_spin_unlock_irqrestore(&rtas_lock, flags);
>> +/*
>> + * Handle error record retrieval before releasing the lock.
>> + */
>> +if (be32_to_cpu(args.rets[0]) == -1)
>> +errbuf = __fetch_rtas_last_error(buff_copy);
>> +
>> +lockdep_unpin_lock(&rtas_lock, cookie);
>> +raw_spin_unlock_irqrestore(&rtas_lock, flags);
>> +} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
>
> rtas_busy_delay_early() has the successive_ext_delays case that will
> break out eventually. But if we keep getting plain RTAS_BUSY back from
> RTAS I *think* this loop will never terminate?

Yes, but if this happens, then there is a serious bug in Linux or
RTAS. The only time I've seen something like that on PowerVM is when
Linux corrupted internal RTAS state by not serializing calls correctly.

rtas_busy_delay_early() has a bail-out heuristic, not for RTAS_BUSY, but
for extended delay statuses (990x), which I suspect happen rarely (if
ever) that early. That's there in order to allow boot to proceed and
hopefully get useful messages out in a truly unexpected circumstance.

That said...

> To avoid that, and just as good manners, I think we should have a
> fatal_signal_pending() check, and if that returns true we bail out of
> the syscall with -EINTR ?

That probably makes sense. In its current state, I could see
this patch preventing or delaying OS shutdown in situations where it
wouldn't have occurred before.

I think I would want the bailout condition in this case to be
(fatal_signal_pending() && retries > some_threshold), to reduce the
likelihood of non-"stuck" operations being left unfinished. And it
should dump a stack trace.
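A minimal user-space model of the bail-out condition proposed above, for illustration only. `fake_rtas_call()`, `struct sim_state` and `RETRY_BAILOUT_THRESHOLD` are hypothetical stand-ins for the firmware call, `fatal_signal_pending()` and "some_threshold"; only the plain -2 busy status is modelled (not the 990x extended delays that `rtas_busy_delay()` also handles), and no delay or stack dump is performed.

```c
#include <stdbool.h>
#include <errno.h>

#define RTAS_BUSY (-2)
#define RETRY_BAILOUT_THRESHOLD 100	/* hypothetical "some_threshold" */

/* Hypothetical stand-ins for firmware and signal state. */
struct sim_state {
	int busy_returns_left;	/* how many more times "firmware" reports busy */
	bool fatal_signal;	/* models fatal_signal_pending(current) */
	int calls;		/* how many times "firmware" was entered */
};

static int fake_rtas_call(struct sim_state *s)
{
	s->calls++;
	if (s->busy_returns_left > 0) {
		s->busy_returns_left--;
		return RTAS_BUSY;
	}
	return 0;
}

/*
 * Retry while the status is busy. Give up with -EINTR only when a fatal
 * signal is pending AND the retry count exceeds the threshold, so that
 * calls which are merely slow are not abandoned early.
 */
static int sys_rtas_retry_sketch(struct sim_state *s)
{
	unsigned int retries = 0;
	int status;

	for (;;) {
		status = fake_rtas_call(s);
		if (status != RTAS_BUSY)
			return status;
		retries++;
		if (s->fatal_signal && retries > RETRY_BAILOUT_THRESHOLD)
			return -EINTR;	/* real code would also dump a stack trace */
	}
}
```

The `&&` keeps a pending fatal signal from cutting short an operation that completes after a handful of busy returns; only a call that is still busy past the threshold is abandoned.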


Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-23 Thread Niklas Schnelle
On Tue, 2023-03-14 at 13:37 +0100, Johannes Berg wrote:
> On Tue, 2023-03-14 at 13:11 +0100, Niklas Schnelle wrote:
> > --- a/arch/um/Kconfig
> > +++ b/arch/um/Kconfig
> > @@ -56,6 +56,7 @@ config NO_IOPORT_MAP
> >  
> >  config ISA
> > bool
> > +   depends on HAS_IOPORT
> > 
> 
> config ISA here is already unselectable, and nothing ever does "select
> ISA" (only in some other architectures), so is there much point in this?
> 
> I'm not even sure why this exists at all.

You're right, there's not much point, and I dropped this for v4. I agree
that probably the whole "config ISA" could be removed if it's always
false anyway but that seems out of scope for this patch.

> 
> But anyway, adding a dependency to an always-false symbol doesn't make it
> less always-false :-)
> 
> Acked-by: Johannes Berg  # for ARCH=um

Thanks

> 
> 
> Certainly will be nice to get rid of this cruft for architectures that
> don't have it.
> 
> johannes

Yes, also, for s390 the broken NULL + port number access in the generic
inb()/outb() currently causes the only remaining clang warning on
defconfig builds.


perf tools power9 JSON files build breakage on ubuntu 18.04 cross build

2023-03-23 Thread Arnaldo Carvalho de Melo
Exception processing pmu-events/arch/powerpc/power9/other.json
Traceback (most recent call last):
  File "pmu-events/jevents.py", line 997, in 
main()
  File "pmu-events/jevents.py", line 979, in main
ftw(arch_path, [], preprocess_one_file)
  File "pmu-events/jevents.py", line 935, in ftw
ftw(item.path, parents + [item.name], action)
  File "pmu-events/jevents.py", line 933, in ftw
action(parents, item)
  File "pmu-events/jevents.py", line 514, in preprocess_one_file
for event in read_json_events(item.path, topic):
  File "pmu-events/jevents.py", line 388, in read_json_events
events = json.load(open(path), object_hook=JsonEvent)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
return loads(fp.read(),
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 55090: ordinal not in range(128)
  CC  /tmp/build/perf/tests/expr.o
pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
Makefile.perf:679: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
make[2]: *** Waiting for unfinished jobs
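The traceback shows power9/other.json being decoded with the ASCII codec (the `encodings/ascii.py` frame), which rejects any byte at or above 0x80. Byte 0xc2 is a common UTF-8 lead byte; for example, U+00A0 NO-BREAK SPACE encodes as 0xc2 0xa0. As a sketch for locating such bytes, the following assumes the file contents are already in memory and does not identify which character is actually at offset 55090:

```c
#include <stddef.h>

/*
 * Return the offset of the first byte >= 0x80 in buf, or -1 if the
 * buffer is pure 7-bit ASCII (what an ASCII decoder accepts).
 */
static long first_non_ascii(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] >= 0x80)
			return (long)i;
	}
	return -1;
}
```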


Now jevents is an opt-out feature so I'm noticing these problems.

A similar fix for s390 was accepted today:


https://lore.kernel.org/r/20230323122532.2305847-1-tmri...@linux.ibm.com
https://lore.kernel.org/r/ZBwkl77/I31AQk12@osiris
-- 

- Arnaldo


Re: [kvm-unit-tests v2 06/10] powerpc/sprs: Specify SPRs with data rather than code

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

A significant rework that builds an array of 'struct spr', where each
element describes an SPR. This makes various metadata about the SPR
like name and access type easier to carry and use.

Hypervisor privileged registers are described despite not being used
at the moment for completeness, but also the code might one day be
reused for a hypervisor-privileged test.

Signed-off-by: Nicholas Piggin 

This ended up a little over-engineered perhaps, but there are lots of
SPRs, lots of access types, lots of changes between processor and ISA
versions, and lots of places they are implemented and used, so lots of
room for mistakes. There is not a good system in place to easily
see that userspace, supervisor, etc., switches perform all the right
SPR context switching so this is a nice test case to have. The sprs test
quickly caught a few QEMU TCG SPR bugs which really motivated me to
improve the SPR coverage.
---
  powerpc/sprs.c | 589 +
  1 file changed, 394 insertions(+), 195 deletions(-)

diff --git a/powerpc/sprs.c b/powerpc/sprs.c
index db341a9..dd83dac 100644
--- a/powerpc/sprs.c
+++ b/powerpc/sprs.c
@@ -82,231 +82,407 @@ static void mtspr(unsigned spr, uint64_t val)
: "lr", "ctr", "xer");
  }
  
-uint64_t before[1024], after[1024];

+static uint64_t before[1024], after[1024];
  
-/* Common SPRs for all PowerPC CPUs */

-static void set_sprs_common(uint64_t val)
-{
-   // mtspr(9, val);   /* CTR */ /* Used by mfspr/mtspr */
-   // mtspr(273, val); /* SPRG1 */  /* Used by our exception handler */
-   mtspr(274, val);/* SPRG2 */
-   mtspr(275, val);/* SPRG3 */
-}
+#define SPR_PR_READ    0x0001
+#define SPR_PR_WRITE   0x0002
+#define SPR_OS_READ    0x0010
+#define SPR_OS_WRITE   0x0020
+#define SPR_HV_READ    0x0100
+#define SPR_HV_WRITE   0x0200
+
+#define RW     0x333
+#define RO     0x111
+#define WO     0x222
+#define OS_RW  0x330
+#define OS_RO  0x110
+#define OS_WO  0x220
+#define HV_RW  0x300
+#define HV_RO  0x100
+#define HV_WO  0x200
+
+#define SPR_ASYNC      0x1000  /* May be updated asynchronously */
+#define SPR_INT        0x2000  /* May be updated by synchronous interrupt */
+#define SPR_HARNESS    0x4000  /* Test harness uses the register */
+
+struct spr {
+   const char  *name;
+   uint8_t     width;
+   uint16_t    access;
+   uint16_t    type;
+};
+
+/* SPRs common denominator back to PowerPC Operating Environment Architecture */
+static const struct spr sprs_common[1024] = {
+  [1] = {"XER",    64, RW,    SPR_HARNESS, }, /* Compiler */
+  [8] = {"LR",     64, RW,    SPR_HARNESS, }, /* Compiler, mfspr/mtspr */
+  [9] = {"CTR",    64, RW,    SPR_HARNESS, }, /* Compiler, mfspr/mtspr */
+ [18] = {"DSISR",  32, OS_RW, SPR_INT, },
+ [19] = {"DAR",    64, OS_RW, SPR_INT, },
+ [26] = {"SRR0",   64, OS_RW, SPR_INT, },
+ [27] = {"SRR1",   64, OS_RW, SPR_INT, },
+[268] = {"TB",     64, RO,    SPR_ASYNC, },
+[269] = {"TBU",    32, RO,    SPR_ASYNC, },
+[272] = {"SPRG0",  64, OS_RW, SPR_HARNESS, }, /* Int stack */
+[273] = {"SPRG1",  64, OS_RW, SPR_HARNESS, }, /* Scratch */
+[274] = {"SPRG2",  64, OS_RW, },
+[275] = {"SPRG3",  64, OS_RW, },
+[287] = {"PVR",    32, OS_RO, },
+};


Using a size of 1024 for each of these arrays looks weird. Why don't you add 
a "nr" field to struct spr and specify the register number via that field 
instead of using the index into the array as register number?
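A minimal sketch of that suggestion, reusing the access macros and the `sprs_common` entries from the quoted patch and adding a hypothetical `nr` field. The table becomes dense, and a register is found by number. `find_spr()` is an illustrative helper that is not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SPR_PR_READ	0x0001
#define SPR_PR_WRITE	0x0002
#define SPR_OS_READ	0x0010
#define SPR_OS_WRITE	0x0020
#define SPR_HV_READ	0x0100
#define SPR_HV_WRITE	0x0200

#define RW	0x333	/* read/write at problem, OS and HV privilege */
#define RO	0x111
#define OS_RW	0x330	/* OS and HV only */
#define OS_RO	0x110

#define SPR_ASYNC	0x1000
#define SPR_INT		0x2000
#define SPR_HARNESS	0x4000

struct spr {
	uint16_t	nr;	/* hypothetical: SPR number carried explicitly */
	const char	*name;
	uint8_t		width;
	uint16_t	access;
	uint16_t	type;
};

/* Dense table: only SPRs that exist take up space. */
static const struct spr sprs_common[] = {
	{   1, "XER",   64, RW,    SPR_HARNESS },
	{   8, "LR",    64, RW,    SPR_HARNESS },
	{   9, "CTR",   64, RW,    SPR_HARNESS },
	{  18, "DSISR", 32, OS_RW, SPR_INT },
	{  19, "DAR",   64, OS_RW, SPR_INT },
	{  26, "SRR0",  64, OS_RW, SPR_INT },
	{  27, "SRR1",  64, OS_RW, SPR_INT },
	{ 268, "TB",    64, RO,    SPR_ASYNC },
	{ 269, "TBU",   32, RO,    SPR_ASYNC },
	{ 272, "SPRG0", 64, OS_RW, SPR_HARNESS },
	{ 273, "SPRG1", 64, OS_RW, SPR_HARNESS },
	{ 274, "SPRG2", 64, OS_RW, 0 },
	{ 275, "SPRG3", 64, OS_RW, 0 },
	{ 287, "PVR",   32, OS_RO, 0 },
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* A linear search is adequate for a few dozen entries. */
static const struct spr *find_spr(const struct spr *tbl, size_t n, uint16_t nr)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].nr == nr)
			return &tbl[i];
	}
	return NULL;
}
```

There is a trade-off. The sparse 1024-entry array allows direct indexing by SPR number and lines up with the `before[1024]`/`after[1024]` arrays. A dense table with `nr` is smaller and easier to iterate, but lookups by number need a search.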


 Thomas



Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()

2023-03-23 Thread Nathan Lynch
Andrew Donnellan  writes:

> On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
>> From: Nathan Lynch 
>> 
>> Any caller of rtas_call_unlocked() must provide an rtas_args parameter
>> block distinct from the core rtas_args buffer used by the rtas_call()
>> path. It's an unlikely error to make, but the potential consequences
>> are grim, and it's trivial to check.
>> 
>> Signed-off-by: Nathan Lynch 
>
> call_rtas_display_status() seems to do exactly this, or am I missing
> something?

No, you're right, the warning would be spurious in that case. May need to
drop this one, or refactor rtas_call():

  4456f4524604be2558e5f6a8e0f7cc9ed17c783e
  Author: Michael Ellerman 
  AuthorDate: Tue Nov 24 22:26:11 2015 +1100

  powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()

  Although call_rtas_display_status() does actually want to use the
  regular RTAS locking, it doesn't want the extra logic that is in
  rtas_call(), so currently it open codes the logic.

  Instead we can use rtas_call_unlocked(), after taking the RTAS lock.
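At its core, the check in this patch is a pointer comparison. The minimal sketch below uses hypothetical names (`core_rtas_args`, `check_unlocked_args()`) and is not the kernel's code. It shows why such a check would fire for a `call_rtas_display_status()`-style caller, which holds the RTAS lock and deliberately passes the core buffer.

```c
#include <stdbool.h>

/* Hypothetical, cut-down parameter block. */
struct rtas_args_sketch {
	int token;
	int nargs;
	int nret;
};

/* Hypothetical stand-in for the shared buffer used by the rtas_call() path. */
static struct rtas_args_sketch core_rtas_args;

/*
 * Returns true ("would warn") when a caller hands rtas_call_unlocked()
 * the same buffer that rtas_call() uses internally.
 */
static bool check_unlocked_args(const struct rtas_args_sketch *args)
{
	return args == &core_rtas_args;
}
```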

aside: does anyone know if the display_status() code is worth keeping?
It looks like it is used to drive the 16-character wide physical LCD I
remember seeing on P4-era and older machines. Is it a vestige of
non-LPAR pseries that should be dropped, or is it perhaps useful for
chrp or cell?


Re: [kvm-unit-tests v2 03/10] powerpc: abstract H_CEDE calls into a sleep functions

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

This consolidates several implementations, and it no longer leaves
MSR[EE] enabled after the decrementer interrupt is handled, but
rather disables it on return.

The handler no longer allows a continuous ticking, but rather dec
has to be re-armed and EE re-enabled (e.g., via H_CEDE hcall) each
time.

Signed-off-by: Nicholas Piggin 
---
  lib/powerpc/asm/handlers.h  |  2 +-
  lib/powerpc/asm/ppc_asm.h   |  1 +
  lib/powerpc/asm/processor.h |  7 +++
  lib/powerpc/handlers.c  | 10 -
  lib/powerpc/processor.c | 42 +
  powerpc/sprs.c  |  6 +-
  powerpc/tm.c| 20 +-
  7 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/lib/powerpc/asm/handlers.h b/lib/powerpc/asm/handlers.h
index 64ba727..e4a0cd4 100644
--- a/lib/powerpc/asm/handlers.h
+++ b/lib/powerpc/asm/handlers.h
@@ -3,6 +3,6 @@
  
  #include 
  
-void dec_except_handler(struct pt_regs *regs, void *data);

+void dec_handler_oneshot(struct pt_regs *regs, void *data);
  
  #endif /* _ASMPOWERPC_HANDLERS_H_ */

diff --git a/lib/powerpc/asm/ppc_asm.h b/lib/powerpc/asm/ppc_asm.h
index 1b85f6b..6299ff5 100644
--- a/lib/powerpc/asm/ppc_asm.h
+++ b/lib/powerpc/asm/ppc_asm.h
@@ -36,6 +36,7 @@
  #endif /* __BYTE_ORDER__ */
  
  /* Machine State Register definitions: */

+#define MSR_EE_BIT 15  /* External Interrupts Enable */
  #define MSR_SF_BIT63  /* 64-bit mode */
  
  #endif /* _ASMPOWERPC_PPC_ASM_H */

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index ac001e1..ebfeff2 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -20,6 +20,8 @@ static inline uint64_t get_tb(void)
  
  extern void delay(uint64_t cycles);

  extern void udelay(uint64_t us);
+extern void sleep_tb(uint64_t cycles);
+extern void usleep(uint64_t us);
  
  static inline void mdelay(uint64_t ms)

  {
@@ -27,4 +29,9 @@ static inline void mdelay(uint64_t ms)
udelay(1000);
  }
  
+static inline void msleep(uint64_t ms)

+{
+   usleep(ms * 1000);
+}
+
  #endif /* _ASMPOWERPC_PROCESSOR_H_ */
diff --git a/lib/powerpc/handlers.c b/lib/powerpc/handlers.c
index c8721e0..296f14f 100644
--- a/lib/powerpc/handlers.c
+++ b/lib/powerpc/handlers.c
@@ -9,15 +9,13 @@
  #include 
  #include 
  #include 
+#include 
  
  /*

   * Generic handler for decrementer exceptions (0x900)
- * Just reset the decrementer back to the value specified when registering the
- * handler
+ * Return with MSR[EE] disabled.
   */
-void dec_except_handler(struct pt_regs *regs __unused, void *data)
+void dec_handler_oneshot(struct pt_regs *regs, void *data)
  {
-   uint64_t dec = *((uint64_t *) data);
-
-   asm volatile ("mtdec %0" : : "r" (dec));
+   regs->msr &= ~(1UL << MSR_EE_BIT);
  }
diff --git a/lib/powerpc/processor.c b/lib/powerpc/processor.c
index ec85b9d..e77a240 100644
--- a/lib/powerpc/processor.c
+++ b/lib/powerpc/processor.c
@@ -10,6 +10,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  
  static struct {

void (*func)(struct pt_regs *, void *data);
@@ -54,3 +56,43 @@ void udelay(uint64_t us)
  {
delay((us * tb_hz) / 100);
  }
+
+void sleep_tb(uint64_t cycles)
+{
+   uint64_t start, end, now;
+
+   start = now = get_tb();
+   end = start + cycles;
+
+   while (end > now) {
+   uint64_t left = end - now;
+
+   /* Could support large decrementer */
+   if (left > 0x7fff)
+   left = 0x7fff;
+
+   asm volatile ("mtdec %0" : : "r" (left));
+   handle_exception(0x900, &dec_handler_oneshot, NULL);


Wouldn't it be better to first call handle_exception() before moving 
something into the decrementer?



+   /*
+* H_CEDE is called with MSR[EE] clear and enables it as part
+* of the hcall, returning with EE enabled. The dec interrupt
+* is then taken immediately and the handler disables EE.
+*
+* If H_CEDE returned for any other interrupt than dec
+* expiring, that is considered an unhandled interrupt and
+* the test case would be stopped.
+*/
+   if (hcall(H_CEDE) != H_SUCCESS) {
+   printf("H_CEDE failed\n");
+   abort();
+   }
+   handle_exception(0x900, NULL, NULL);
+
+   now = get_tb();
+   }
+}
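To illustrate the ordering question, here is a host-side model of the loop. It is a sketch only. `register_dec_handler()`, `set_dec()` and `cede()` are hypothetical stand-ins for `handle_exception()`, `mtdec` and H_CEDE, and the clamp is copied from the quoted patch. The model installs the handler before arming the decrementer, so there is never a window in which the decrementer is armed without a handler. Per the patch comment, MSR[EE] is clear at that point, so this is about robustness and readability rather than an observed bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of sleep_tb(); not kvm-unit-tests code. */
enum event { EV_REGISTER, EV_SET_DEC, EV_CEDE, EV_UNREGISTER };

struct model {
	uint64_t tb;			/* simulated timebase */
	uint64_t dec;			/* last value "written" to the decrementer */
	int handler_installed;
	enum event log[64];
	size_t nlog;
	int armed_without_handler;	/* counts the unsafe window */
};

#define DEC_MAX 0x7fffULL	/* clamp value as shown in the quoted patch */

static void record(struct model *m, enum event e)
{
	if (m->nlog < 64)
		m->log[m->nlog++] = e;
}

static void register_dec_handler(struct model *m, int on)
{
	m->handler_installed = on;
	record(m, on ? EV_REGISTER : EV_UNREGISTER);
}

static void set_dec(struct model *m, uint64_t val)
{
	m->dec = val;
	if (!m->handler_installed)
		m->armed_without_handler++;
	record(m, EV_SET_DEC);
}

/* "H_CEDE": sleep until the decrementer fires, i.e. advance the timebase. */
static void cede(struct model *m)
{
	m->tb += m->dec;
	record(m, EV_CEDE);
}

static void sleep_tb_model(struct model *m, uint64_t cycles)
{
	uint64_t end = m->tb + cycles;

	while (end > m->tb) {
		uint64_t left = end - m->tb;

		if (left > DEC_MAX)
			left = DEC_MAX;

		register_dec_handler(m, 1);	/* install the handler first ... */
		set_dec(m, left);		/* ... then arm the decrementer */
		cede(m);
		register_dec_handler(m, 0);	/* like handle_exception(0x900, NULL, NULL) */
	}
}
```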


 Thomas



Re: [kvm-unit-tests v2 04/10] powerpc: Add ISA v3.1 (POWER10) support to SPR test

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

This is a very basic detection that does not include all new SPRs.

Signed-off-by: Nicholas Piggin 
---
  powerpc/sprs.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/powerpc/sprs.c b/powerpc/sprs.c
index ba4ddee..6ee6dba 100644
--- a/powerpc/sprs.c
+++ b/powerpc/sprs.c
@@ -117,6 +117,15 @@ static void set_sprs_book3s_300(uint64_t val)
mtspr(823, val);/* PSSCR */
  }
  
+/* SPRs from Power ISA Version 3.1B */

+static void set_sprs_book3s_31(uint64_t val)
+{
+   set_sprs_book3s_207(val);
+   mtspr(48, val); /* PIDR */
+   /* 3.1 removes TIDR */
+   mtspr(823, val);/* PSSCR */
+}
+
  static void set_sprs(uint64_t val)
  {
uint32_t pvr = mfspr(287);  /* Processor Version Register */
@@ -137,6 +146,9 @@ static void set_sprs(uint64_t val)
case 0x4e:  /* POWER9 */
set_sprs_book3s_300(val);
break;
+   case 0x80:  /* POWER10 */
+   set_sprs_book3s_31(val);
+   break;
default:
puts("Warning: Unknown processor version!\n");
}
@@ -220,6 +232,13 @@ static void get_sprs_book3s_300(uint64_t *v)
v[823] = mfspr(823);/* PSSCR */
  }
  
+static void get_sprs_book3s_31(uint64_t *v)

+{
+   get_sprs_book3s_207(v);
+   v[48] = mfspr(48);  /* PIDR */
+   v[823] = mfspr(823);/* PSSCR */
+}
+
  static void get_sprs(uint64_t *v)
  {
uint32_t pvr = mfspr(287);  /* Processor Version Register */
@@ -240,6 +259,9 @@ static void get_sprs(uint64_t *v)
case 0x4e:  /* POWER9 */
get_sprs_book3s_300(v);
break;
+   case 0x80:  /* POWER10 */
+   get_sprs_book3s_31(v);
+   break;
}
  }
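The new `case 0x80` label matches the PVR's version field. This hunk does not show the `switch` expression, so the minimal sketch below assumes the version is the upper 16 bits of the 32-bit PVR. For example, the POWER10 machine in the rcutorture report identifies itself as "(raw) 0x800200", and its version field is 0x80. `isa_level_for_pvr()` is a hypothetical helper that maps only the two cases visible here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a raw PVR to the ISA level the SPR test uses. */
static const char *isa_level_for_pvr(uint32_t pvr)
{
	switch (pvr >> 16) {	/* version field: upper 16 bits */
	case 0x4e:		/* POWER9  -> set_sprs_book3s_300() */
		return "3.0";
	case 0x80:		/* POWER10 -> set_sprs_book3s_31() */
		return "3.1";
	default:
		return NULL;	/* the test warns about an unknown version */
	}
}
```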


Looks like I accidentally replied to v1 a couple of minutes ago... I meant 
to reply here:


Reviewed-by: Thomas Huth 



Re: [PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Zi Yan
On 23 Mar 2023, at 6:37, Mike Rapoport wrote:

> On Thu, Mar 23, 2023 at 10:15:33AM +, Catalin Marinas wrote:
>> On Thu, Mar 23, 2023 at 11:21:44AM +0200, Mike Rapoport wrote:
>>> From: "Mike Rapoport (IBM)" 
>>>
>>> It is not a good idea to change fundamental parameters of core memory
>>> management. Having predefined ranges suggests that the values within
>>> those ranges are sensible, but one has to *really* understand
>>> implications of changing MAX_ORDER before actually amending it and
>>> ranges don't help here.
>>>
>>> Drop ranges in definition of ARCH_FORCE_MAX_ORDER
>>>
>>> Signed-off-by: Mike Rapoport (IBM) 
>>> ---
>>>  arch/arm64/Kconfig | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index e60baf7859d1..bab6483e4317 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -1489,9 +1489,7 @@ config XEN
>>>  config ARCH_FORCE_MAX_ORDER
>>> int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
>>> default "13" if ARM64_64K_PAGES
>>> -   range 11 13 if ARM64_16K_PAGES
>>> default "11" if ARM64_16K_PAGES
>>> -   range 10 15 if ARM64_4K_PAGES
>>> default "10"
>>
>> I don't mind rewriting the help text as in the subsequent patch but I'd
>> keep the ranges as a safety measure. It's less wasted time explaining to
>> people why some random max order doesn't work. Alternatively, we can
>> drop the ranges but make this option configurable only if EXPERT.
>
> I like the EXPERT alternative more. I'll add it in v2.

I got an error report from the kernel test robot, which set
ARCH_FORCE_MAX_ORDER to -1 via the random config generator[1].

Does the EXPERT option prevent the kernel test robot from generating such a
config? Or should we fix the random config generator?

[1] 
https://lore.kernel.org/linux-mm/91e887e4-0867-421f-9c75-fb9cff15c...@nvidia.com/
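The robot's warnings come from `MAX_ORDER_NR_PAGES`, which the report defines as `(1 << MAX_ORDER)`. With a value of -1, the shift count is negative, and that is undefined behaviour in C. The minimal sketch below computes the value with a range check. It borrows the bounds from the arm64 4K-page `range 10 15` line that the patch drops. `max_order_nr_pages()` is hypothetical, not kernel code; a Kconfig `range` or `if EXPERT` guard is the config-level counterpart.

```c
#include <stdint.h>

/* Bounds borrowed from the arm64 "range 10 15 if ARM64_4K_PAGES" line. */
#define MAX_ORDER_MIN 10
#define MAX_ORDER_MAX 15

/*
 * Pages in a maximal buddy block (1 << max_order), or 0 when max_order is
 * out of range. Checking first means the shift is never evaluated with a
 * negative or oversized count.
 */
static uint64_t max_order_nr_pages(int max_order)
{
	if (max_order < MAX_ORDER_MIN || max_order > MAX_ORDER_MAX)
		return 0;
	return UINT64_C(1) << max_order;
}
```

With the default of 10, this gives 1024 pages, which is 4 MiB with 4 KiB pages.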


--
Best Regards,
Yan, Zi




Re: [kvm-unit-tests PATCH 4/7] powerpc: Add ISA v3.1 (POWER10) support to SPR test

2023-03-23 Thread Thomas Huth

On 17/03/2023 13.36, Nicholas Piggin wrote:

This is a very basic detection that does not include all new SPRs.

Signed-off-by: Nicholas Piggin 
---
  powerpc/sprs.c | 22 ++
  1 file changed, 22 insertions(+)


Reviewed-by: Thomas Huth 



Re: [kvm-unit-tests v2 02/10] powerpc: add local variant of SPR test

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

This adds the non-migration variant of the SPR test to the matrix,
which can be simpler to run and debug.

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
  powerpc/unittests.cfg | 4 
  1 file changed, 4 insertions(+)

diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index 1e74948..3e41598 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -68,5 +68,9 @@ groups = h_cede_tm
  
  [sprs]

  file = sprs.elf
+groups = sprs


Looking at this again, I think you don't really need a "groups =" entry here 
... I'd suggest to drop that line.


 Thomas



+[sprs-migration]
+file = sprs.elf
  extra_params = -append '-w'
  groups = migration




[next-20230322] Kernel WARN at kernel/workqueue.c:3182 (rcutorture)

2023-03-23 Thread Sachin Sant
While running rcutorture tests from LTP on an IBM Power10 server booted with
6.3.0-rc3-next-20230322 following warning is observed:

[ 3629.242831] [ cut here ]
[ 3629.242835] WARNING: CPU: 8 PID: 614614 at kernel/workqueue.c:3182 __flush_work.isra.44+0x44/0x370
[ 3629.242845] Modules linked in: rcutorture(-) torture vmac poly1305_generic 
chacha_generic chacha20poly1305 n_gsm pps_ldisc ppp_synctty ppp_async 
ppp_generic serport slcan can_dev slip slhc snd_hrtimer snd_seq snd_seq_device 
snd_timer snd soundcore pcrypt crypto_user n_hdlc dummy veth tun nfsv3 nfs_acl 
nfs lockd grace fscache netfs brd overlay exfat vfat fat btrfs blake2b_generic 
xor raid6_pq zstd_compress xfs loop sctp ip6_udp_tunnel udp_tunnel libcrc32c 
dm_mod bonding rfkill tls sunrpc kmem device_dax nd_pmem nd_btt dax_pmem 
papr_scm pseries_rng libnvdimm vmx_crypto ext4 mbcache jbd2 sd_mod t10_pi 
crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth fuse [last 
unloaded: ltp_uaccess(O)]
[ 3629.242911] CPU: 8 PID: 614614 Comm: modprobe Tainted: G O 6.3.0-rc3-next-20230322 #1
[ 3629.242917] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.00 (NH1030_026) hv:phyp pSeries
[ 3629.242923] NIP: c018c204 LR: c022306c CTR: c02233c0
[ 3629.242927] REGS: c005c14e3880 TRAP: 0700 Tainted: G O (6.3.0-rc3-next-20230322)
[ 3629.242932] MSR: 8282b033  CR: 4800 XER: 000a
[ 3629.242943] CFAR: c018c5b0 IRQMASK: 0 
[ 3629.242943] GPR00: c022306c c005c14e3b20 c1401200 
c0080c4419e8 
[ 3629.242943] GPR04: 0001 0001 0011 
fffe 
[ 3629.242943] GPR08: c00efe9a8300 0001  
c0080c42afe0 
[ 3629.242943] GPR12: c02233c0 c00e6700  
 
[ 3629.242943] GPR16:    
 
[ 3629.242943] GPR20:    
 
[ 3629.242943] GPR24: c0080c443400 c0080c440f60 c0080c4418c8 
c2abb368 
[ 3629.242943] GPR28: c0080c440f58  c0080c4419e8 
c0080c443400 
[ 3629.242987] NIP [c018c204] __flush_work.isra.44+0x44/0x370
[ 3629.242993] LR [c022306c] cleanup_srcu_struct+0x6c/0x1e0
[ 3629.242998] Call Trace:
[ 3629.243000] [c005c14e3b20] [c0080c440f58] srcu9+0x0/0xfffef0a8 [rcutorture] (unreliable)
[ 3629.243009] [c005c14e3bb0] [c022306c] cleanup_srcu_struct+0x6c/0x1e0
[ 3629.243015] [c005c14e3c50] [c0223428] srcu_module_notify+0x68/0x180
[ 3629.243021] [c005c14e3c90] [c019a1e0] notifier_call_chain+0xc0/0x1b0
[ 3629.243027] [c005c14e3cf0] [c019ad24] blocking_notifier_call_chain+0x64/0xa0
[ 3629.243033] [c005c14e3d30] [c024a4c8] sys_delete_module+0x1f8/0x3c0
[ 3629.243039] [c005c14e3e10] [c0037480] system_call_exception+0x140/0x350
[ 3629.243044] [c005c14e3e50] [c000d6a0] system_call_common+0x160/0x2e4
[ 3629.243050] --- interrupt: c00 at 0x7fff8cd39558
[ 3629.243054] NIP: 7fff8cd39558 LR: 00010d800398 CTR: 
[ 3629.243057] REGS: c005c14e3e80 TRAP: 0c00 Tainted: G O (6.3.0-rc3-next-20230322)
[ 3629.243062] MSR: 8280f033  CR: 28008282 XER: 
[ 3629.243072] IRQMASK: 0 
[ 3629.243072] GPR00: 0081 7fffe99fd9c0 7fff8ce07300 
00013df30ec8 
[ 3629.243072] GPR04: 0800 000a 1999 
 
[ 3629.243072] GPR08: 7fff8cd98160   
 
[ 3629.243072] GPR12:  7fff8d5fcb50 00010d80a650 
00010d80a648 
[ 3629.243072] GPR16:  0001  
00010d80a428 
[ 3629.243072] GPR20: 00010d830068  7fffe99ff2f8 
00013df304f0 
[ 3629.243072] GPR24:  7fffe99ff2f8 00013df30ec8 
 
[ 3629.243072] GPR28:  00013df30e60 00013df30ec8 
00013df30e60 
[ 3629.243115] NIP [7fff8cd39558] 0x7fff8cd39558
[ 3629.243118] LR [00010d800398] 0x10d800398
[ 3629.243121] --- interrupt: c00
[ 3629.243123] Code: 89292e39 f821ff71 e94d0c78 f9410068 3940 69290001 0b09 fbc10080 7c7e1b78 e9230018 7d290074 7929d182 <0b09> 7c0802a6 fb810070 f80100a0 
[ 3629.243138] ---[ end trace  ]---

This is followed by the following traces:

[ 3629.243149] [ cut here ]
[ 3629.243152] WARNING: CPU: 8 PID: 614614 at kernel/rcu/srcutree.c:663 cleanup_srcu_struct+0x11c/0x1e0
[ 3629.243159] Modules linked in: rcutorture(-) torture vmac poly1305_generic 
chacha_generic chacha20poly1305 n_gsm pps_ldisc ppp_synctty ppp_async 
ppp_generic serport slcan can_dev slip slhc snd_hrtimer snd_seq snd_seq_device 
snd_timer snd soundcore pcrypt crypto_user n_hdlc dummy 

Re: [kvm-unit-tests v2 01/10] MAINTAINERS: Update powerpc list

2023-03-23 Thread Thomas Huth

On 20/03/2023 08.03, Nicholas Piggin wrote:

KVM development on powerpc has moved to the Linux on Power mailing list,
as per linux.git commit 19b27f37ca97d ("MAINTAINERS: Update powerpc KVM
entry").

Signed-off-by: Nicholas Piggin 
---
  MAINTAINERS | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 649de50..b545a45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -79,7 +79,7 @@ M: Laurent Vivier 
  M: Thomas Huth 
  S: Maintained
  L: k...@vger.kernel.org
-L: kvm-...@vger.kernel.org
+L: linuxppc-dev@lists.ozlabs.org
  F: powerpc/
  F: lib/powerpc/
  F: lib/ppc64/


Reviewed-by: Thomas Huth 



Re: [PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
On Thu, Mar 23, 2023 at 10:15:33AM +, Catalin Marinas wrote:
> On Thu, Mar 23, 2023 at 11:21:44AM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" 
> > 
> > It is not a good idea to change fundamental parameters of core memory
> > management. Having predefined ranges suggests that the values within
> > those ranges are sensible, but one has to *really* understand
> > implications of changing MAX_ORDER before actually amending it and
> > ranges don't help here.
> > 
> > Drop ranges in definition of ARCH_FORCE_MAX_ORDER
> > 
> > Signed-off-by: Mike Rapoport (IBM) 
> > ---
> >  arch/arm64/Kconfig | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e60baf7859d1..bab6483e4317 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1489,9 +1489,7 @@ config XEN
> >  config ARCH_FORCE_MAX_ORDER
> > int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
> > default "13" if ARM64_64K_PAGES
> > -   range 11 13 if ARM64_16K_PAGES
> > default "11" if ARM64_16K_PAGES
> > -   range 10 15 if ARM64_4K_PAGES
> > default "10"
> 
> I don't mind rewriting the help text as in the subsequent patch but I'd
> keep the ranges as a safety measure. It's less wasted time explaining to
> people why some random max order doesn't work. Alternatively, we can
> drop the ranges but make this option configurable only if EXPERT.

I like the EXPERT alternative more. I'll add it in v2.
 
> -- 
> Catalin

-- 
Sincerely yours,
Mike.


Re: [PATCH 4/9] powerpc/dexcr: Support userspace ROP protection

2023-03-23 Thread Michael Ellerman
Benjamin Gray  writes:
> The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR
> to hold a key used in the hash calculation. This key should be different
> for each process to make it harder for a malicious process to recreate
> valid hash values for a victim process.
>
> Add support for storing a per-thread hash key, and setting/clearing
> HASHKEYR appropriately.
>
> Signed-off-by: Benjamin Gray 
>
> ---
>
> v1:   * Guard HASHKEYR update behind change check
>   * HASHKEYR reset moved earlier to patch 2
> ---
>  arch/powerpc/include/asm/processor.h |  1 +
>  arch/powerpc/kernel/process.c| 17 +
>  2 files changed, 18 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index bad64d6a5d36..666d4e9804a8 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -264,6 +264,7 @@ struct thread_struct {
>   unsigned long   mmcr3;
>   unsigned long   sier2;
>   unsigned long   sier3;
> + unsigned long   hashkeyr;
  
hashkeyr is part of the thread state, so we should save it in core dumps.

The DEXCR also influences the thread's behaviour, at least by
enabling/disabling hashst/chk, so I think we should also include it in
core dumps.

Adding regs to the core dump is done by adding eg. a new NT_PPC_HASHKEY
entry in include/uapi/linux/elf.h and wiring it up in ptrace.

But those are a non-renewable resource, so if we're going to add
HASHKEYR and DEXCR it would be better to group them as a single note. I
think given that HASHKEYR doesn't exist without the DEXCR, grouping them
is OK. It could be called NT_PPC_DEXCF (F for facility)?

See NT_PPC_PKEY for an example.

I know HASHKEYR is security sensitive, but I think the existing ptrace
checks should be sufficient. A ptracer has more or less full control of
the tracee anyway.

To support checkpoint/restore we'd need to support setting NPHIE in the
DEXCR via ptrace. I think for starters we can just fail the ->set() if
the DEXCR doesn't match the current SPR value.
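The sketch below shows one way a grouped note payload and the conservative `->set()` policy described above could look. Everything in it is hypothetical: `NT_PPC_DEXCF` is only a proposed name, and the struct layout, `dexcf_get()`/`dexcf_set()`, the thread model, and the choice of `-EINVAL` are illustrative assumptions rather than an existing kernel ABI.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical payload for a proposed NT_PPC_DEXCF note. */
struct dexcf_note {
	uint64_t dexcr;
	uint64_t hashkeyr;
};

/* Hypothetical per-thread state. */
struct thread_model {
	uint64_t dexcr;		/* current DEXCR value for the thread */
	uint64_t hashkeyr;	/* per-thread hash key */
};

/* Core dump / PTRACE_GETREGSET side: report both registers together. */
static void dexcf_get(const struct thread_model *t, struct dexcf_note *out)
{
	out->dexcr = t->dexcr;
	out->hashkeyr = t->hashkeyr;
}

/*
 * Starter policy from the review: refuse any attempt to change the DEXCR
 * by failing when it does not match; otherwise restore HASHKEYR
 * (assumed to be needed for checkpoint/restore).
 */
static int dexcf_set(struct thread_model *t, const struct dexcf_note *in)
{
	if (in->dexcr != t->dexcr)
		return -EINVAL;
	t->hashkeyr = in->hashkeyr;
	return 0;
}
```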

cheers


Re: [PATCH 03/14] arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Catalin Marinas
On Thu, Mar 23, 2023 at 11:21:45AM +0200, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to
> describe this configuration option.
> 
> Update both to actually describe what this option does.
> 
> Signed-off-by: Mike Rapoport (IBM) 

Acked-by: Catalin Marinas 


Re: [PATCH 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Kirill A. Shutemov
On Thu, Mar 23, 2023 at 11:21:42AM +0200, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and
> they all have wrong and misleading prompt and help text for this option.
> 
> Besides, some define insane limits for possible values of
> ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of
> possible configurations, some make this option configurable by users for no
> good reason.
> 
> This set updates the prompt and help text everywhere and does its best to
> update actual definitions of ranges where applicable.
> 
> Mike Rapoport (IBM) (14):
>   arm: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER
>   arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   csky: drop ARCH_FORCE_MAX_ORDER
>   ia64: don't allow users to override ARCH_FORCE_MAX_ORDER
>   m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER
>   powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER
>   sh: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER
>   sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text
> 
>  arch/arm/Kconfig  | 16 +---
>  arch/arm64/Kconfig| 27 ---
>  arch/csky/Kconfig |  4 
>  arch/ia64/Kconfig |  3 +--
>  arch/m68k/Kconfig.cpu | 16 +---
>  arch/nios2/Kconfig| 17 +
>  arch/powerpc/Kconfig  | 22 +-
>  arch/sh/mm/Kconfig| 19 +--
>  arch/sparc/Kconfig| 16 +---
>  arch/xtensa/Kconfig   | 16 +---
>  10 files changed, 76 insertions(+), 80 deletions(-)

Acked-by: Kirill A. Shutemov 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


Re: [PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Catalin Marinas
On Thu, Mar 23, 2023 at 11:21:44AM +0200, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> It is not a good idea to change fundamental parameters of core memory
> management. Having predefined ranges suggests that the values within
> those ranges are sensible, but one has to *really* understand the
> implications of changing MAX_ORDER before actually amending it, and
> ranges don't help here.
> 
> Drop the ranges in the definition of ARCH_FORCE_MAX_ORDER.
> 
> Signed-off-by: Mike Rapoport (IBM) 
> ---
>  arch/arm64/Kconfig | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e60baf7859d1..bab6483e4317 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1489,9 +1489,7 @@ config XEN
>  config ARCH_FORCE_MAX_ORDER
>   int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
>   default "13" if ARM64_64K_PAGES
> - range 11 13 if ARM64_16K_PAGES
>   default "11" if ARM64_16K_PAGES
> - range 10 15 if ARM64_4K_PAGES
>   default "10"

I don't mind rewriting the help text as in the subsequent patch but I'd
keep the ranges as a safety measure. It's less wasted time explaining to
people why some random max order doesn't work. Alternatively, we can
drop the ranges but make this option configurable only if EXPERT.

-- 
Catalin


Probing nvme disks fails on Upstream kernels on powerpc Maxconfig

2023-03-23 Thread Srikar Dronamraju
Hi,

I am unable to boot upstream kernels, from v5.16 up to the latest, on a
maxconfig system (machine config details are given below).

At boot, we see a series of messages like the below.

dracut-initqueue[13917]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
dracut-initqueue[13917]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f93dc0767-18aa-467f-afa7-5b4e9c13108a.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
dracut-initqueue[13917]: [ -e "/dev/disk/by-uuid/93dc0767-18aa-467f-afa7-5b4e9c13108a" ]
dracut-initqueue[13917]: fi"


journalctl shows the below warning.

 WARNING: CPU: 242 PID: 1219 at /home/srikar/work/linux.git/arch/powerpc/kernel/iommu.c:227 iommu_range_alloc+0x3d4/0x450
 Modules linked in: lpfc(E+) nvmet_fc(E) nvmet(E) configfs(E) qla2xxx(E+) nvme_fc(E) nvme_fabrics(E) vmx_crypto(E) gf128mul(E) xhci_pci(E) xhci_pci_renesas(E) xhci_hcd(E) ipr(E+) nvme(E) usbcore(E) libata(E) nvme_core(E) t10_pi(E) scsi_transport_fc(E) usb_common(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) xor(E) raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E)
 CPU: 242 PID: 1219 Comm: kworker/u3843:0 Tainted: GW   EL 5.15.0-sp4+ #33 91e1c36ffe385108bbe4a3834506a047dc78552d
 Workqueue: nvme-reset-wq nvme_reset_work [nvme]
 NIP:  c005a134 LR: c005a128 CTR: 
 REGS: c7fd4c7eb580 TRAP: 0700   Tainted: GW   EL (5.15.0-sp4+)
 MSR:  80029033   CR: 24002424  XER: 
 CFAR: c020972c IRQMASK: 0
 GPR00: c005a128 c7fd4c7eb820 c2aa4b00 0001
 GPR04: c273d648 0003 0bfbcb21 c2d88390
 GPR08:   00f2 c2b05240
 GPR12: 2000 cbfbdfffcb00  c7fd4c9d1c40
 GPR16:    
 GPR20:   c2bab580 
 GPR24: c73b30c8   
 GPR28: c7fd7133  0001 0001
 NIP [c005a134] iommu_range_alloc+0x3d4/0x450
 LR [c005a128] iommu_range_alloc+0x3c8/0x450
 Call Trace:
 [c7fd4c7eb820] [c005a128] iommu_range_alloc+0x3c8/0x450 (unreliable)
 [c7fd4c7eb8e0] [c005a580] iommu_alloc+0x60/0x170
 [c7fd4c7eb930] [c005bd4c] iommu_alloc_coherent+0x11c/0x1d0
 [c7fd4c7eb9d0] [c00597e8] dma_iommu_alloc_coherent+0x38/0x50
 [c7fd4c7eb9f0] [c0249ce8] dma_alloc_attrs+0x128/0x180
 [c7fd4c7eba60] [c0080001093210d8] nvme_alloc_queue+0x90/0x2b0 [nvme]
 [c7fd4c7ebac0] [c008000109326034] nvme_reset_work+0x44c/0x1870 [nvme]
 [c7fd4c7ebc30] [c01870b8] process_one_work+0x388/0x730
 [c7fd4c7ebd10] [c01874d8] worker_thread+0x78/0x5b0
 [c7fd4c7ebda0] [c01945cc] kthread+0x1bc/0x1d0
 [c7fd4c7ebe10] [c000cee4] ret_from_kernel_thread+0x5c/0x64
 Instruction dump:
 6000 7b693e24 7d304a14 e9490100 f9490110 4bfffd44 3c62fe39 3863f0f8
 481af5d5 6000 2fa3 419e0050 <0fe0> f9c10030 f9e10038 fa010040
 ---[ end trace 01e0ce48acf1df9b ]---
 nvme nvme0: Removing after probe failure status: -12

Please note we are failing to probe nvme disks.

This corresponds to the following code in iommu_range_alloc():
	/* Sanity check */
	if (unlikely(npages == 0)) {
		if (printk_ratelimit())
			WARN_ON(1);
		return DMA_MAPPING_ERROR;
	}

So npages is 0 here.

We see similar messages for all 4 nvme disks. Since the nvme probe fails
and the root, boot and all other partitions are carved out of the nvme
disks, the kernel fails to boot.

Note that this problem happens only on a cold boot or a reboot; there are
no problems when kernels are started via kexec.

git bisect identifies commit 387273118714 ("powerps/pseries/dma: Add support
for 2M IOMMU page size") as the cause of the regression.

git bisect start
# bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
git bisect bad df0cc57e057f18e44dac8e6c18aba47ab53202f9
# good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
# good: [2219b0ceefe835b92a8a74a73fe964aa052742a2] Merge tag 'soc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 2219b0ceefe835b92a8a74a73fe964aa052742a2
# bad: [206825f50f908771934e1fba2bfc2e1f1138b36a] Merge tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
git bisect bad 206825f50f908771934e1fba2bfc2e1f1138b36a
# good: [5cd4dc44b8a0f656100e3b6916cf73b1623299eb] Merge tag 'staging-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 

Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()

2023-03-23 Thread Michael Ellerman
Nathan Lynch via B4 Relay  writes:
> From: Nathan Lynch 
>
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the intermediate
> status to user space.

This looks good in general.

One query ...

> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 47a2aa43d7d4..c330a22ccc70 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>  /* We assume to be passed big endian arguments */
>  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  {
> - struct pin_cookie cookie;
>   struct rtas_args args;
>   unsigned long flags;
>   char *buff_copy, *errbuf = NULL;
> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  
>   buff_copy = get_errorlog_buffer();
>  
> - raw_spin_lock_irqsave(_lock, flags);
> - cookie = lockdep_pin_lock(_lock);
> + do {
> + struct pin_cookie cookie;
>  
> - rtas_args = args;
> - do_enter_rtas(_args);
> - args = rtas_args;
> + raw_spin_lock_irqsave(_lock, flags);
> + cookie = lockdep_pin_lock(_lock);
>  
> - /* A -1 return code indicates that the last command couldn't
> -be completed due to a hardware error. */
> - if (be32_to_cpu(args.rets[0]) == -1)
> - errbuf = __fetch_rtas_last_error(buff_copy);
> + rtas_args = args;
> + do_enter_rtas(_args);
> + args = rtas_args;
>  
> - lockdep_unpin_lock(_lock, cookie);
> - raw_spin_unlock_irqrestore(_lock, flags);
> + /*
> +  * Handle error record retrieval before releasing the lock.
> +  */
> + if (be32_to_cpu(args.rets[0]) == -1)
> + errbuf = __fetch_rtas_last_error(buff_copy);
> +
> + lockdep_unpin_lock(_lock, cookie);
> + raw_spin_unlock_irqrestore(_lock, flags);
> + } while (rtas_busy_delay(be32_to_cpu(args.rets[0])));

rtas_busy_delay_early() has the successive_ext_delays case that will
break out eventually. But if we keep getting plain RTAS_BUSY back from
RTAS I *think* this loop will never terminate?

To avoid that, and just as good manners, I think we should have a
fatal_signal_pending() check, and if that returns true we bail out of
the syscall with -EINTR?

cheers


Re: [PATCH 14/14] xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Max Filippov
On Thu, Mar 23, 2023 at 2:24 AM Mike Rapoport  wrote:
>
> From: "Mike Rapoport (IBM)" 
>
> The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
> to describing this configuration option.
>
> Update both to actually describe what this option does.
>
> Signed-off-by: Mike Rapoport (IBM) 
> ---
>  arch/xtensa/Kconfig | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Max Filippov 

-- 
Thanks.
-- Max


[PATCH 14/14] xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/xtensa/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 3eee334ba873..3c6e5471f025 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -772,15 +772,17 @@ config HIGHMEM
  If unsure, say Y.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 endmenu
 
-- 
2.35.1



[PATCH 13/14] sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/sparc/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e3242bf5a8df..959e43a1aaca 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -270,15 +270,17 @@ config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
default "12"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 if SPARC64 || COMPILE_TEST
 source "kernel/power/Kconfig"
-- 
2.35.1



[PATCH 12/14] sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

sh defines insane ranges for ARCH_FORCE_MAX_ORDER allowing MAX_ORDER
up to 63, which implies a maximal contiguous allocation size of 2^63
pages.

Drop the bogus range definitions for ARCH_FORCE_MAX_ORDER and leave it a
simple integer with sensible defaults.

Users who *really* need to change the value of ARCH_FORCE_MAX_ORDER
will be able to do so, but they won't be misled by the bogus ranges.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/sh/mm/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index fb15ba1052ba..511c17aede4a 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -21,9 +21,7 @@ config PAGE_OFFSET
 config ARCH_FORCE_MAX_ORDER
int "Order of maximal physically contiguous allocations"
default "8" if PAGE_SIZE_16KB
-   range 6 63 if PAGE_SIZE_64KB
default "6" if PAGE_SIZE_64KB
-   range 10 63
default "13" if !MMU
default "10"
help
-- 
2.35.1



[PATCH 11/14] sh: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/sh/mm/Kconfig | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 40271090bd7d..fb15ba1052ba 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -19,8 +19,7 @@ config PAGE_OFFSET
default "0x"
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
-   range 8 63 if PAGE_SIZE_16KB
+   int "Order of maximal physically contiguous allocations"
default "8" if PAGE_SIZE_16KB
range 6 63 if PAGE_SIZE_64KB
default "6" if PAGE_SIZE_64KB
@@ -28,16 +27,18 @@ config ARCH_FORCE_MAX_ORDER
default "13" if !MMU
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
  The page size is not necessarily 4KB. Keep this in mind when
  choosing a value for this option.
 
+ Don't change if unsure.
+
 config MEMORY_START
hex "Physical memory start address"
default "0x0800"
-- 
2.35.1



[PATCH 10/14] powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

PowerPC defines ranges for ARCH_FORCE_MAX_ORDER, some of which
insanely allow MAX_ORDER up to 63, which implies a maximal contiguous
allocation size of 2^63 pages.

Drop the bogus range definitions for ARCH_FORCE_MAX_ORDER and leave it a
simple integer with sensible defaults.

Users who *really* need to change the value of ARCH_FORCE_MAX_ORDER
will be able to do so, but they won't be misled by the bogus ranges.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c0095bf795ca..419be4a71004 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -897,17 +897,11 @@ config DATA_SHIFT
 
 config ARCH_FORCE_MAX_ORDER
int "Order of maximal physically contiguous allocations"
-   range 7 8 if PPC64 && PPC_64K_PAGES
default "8" if PPC64 && PPC_64K_PAGES
-   range 12 12 if PPC64 && !PPC_64K_PAGES
default "12" if PPC64 && !PPC_64K_PAGES
-   range 8 63 if PPC32 && PPC_16K_PAGES
default "8" if PPC32 && PPC_16K_PAGES
-   range 6 63 if PPC32 && PPC_64K_PAGES
default "6" if PPC32 && PPC_64K_PAGES
-   range 4 63 if PPC32 && PPC_256K_PAGES
default "4" if PPC32 && PPC_256K_PAGES
-   range 10 63
default "10"
help
  The kernel page allocator limits the size of maximal physically
-- 
2.35.1



[PATCH 09/14] powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 24d56536b269..c0095bf795ca 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -896,7 +896,7 @@ config DATA_SHIFT
  8M pages will be pinned.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
range 7 8 if PPC64 && PPC_64K_PAGES
default "8" if PPC64 && PPC_64K_PAGES
range 12 12 if PPC64 && !PPC_64K_PAGES
@@ -910,17 +910,19 @@ config ARCH_FORCE_MAX_ORDER
range 10 63
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
  The page size is not necessarily 4KB.  For example, on 64-bit
  systems, 64KB pages can be enabled via CONFIG_PPC_64K_PAGES.  Keep
  this in mind when choosing a value for this option.
 
+ Don't change if unsure.
+
 config PPC_SUBPAGE_PROT
bool "Support setting protections for 4k subpages (subpage_prot syscall)"
default n
-- 
2.35.1



[PATCH 08/14] nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 defines a range for ARCH_FORCE_MAX_ORDER allowing MAX_ORDER
up to 19, which implies a maximal contiguous allocation size of 2^19
pages, or 2GiB.

Drop the bogus range definition for ARCH_FORCE_MAX_ORDER and leave it a
simple integer with a sensible default.

Users who *really* need to change the value of ARCH_FORCE_MAX_ORDER
will be able to do so, but they won't be misled by the bogus range.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index fcaa6bbda3fc..e5936417d3cd 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -46,7 +46,6 @@ source "kernel/Kconfig.hz"
 
 config ARCH_FORCE_MAX_ORDER
int "Order of maximal physically contiguous allocations"
-   range 8 19
default "10"
help
  The kernel page allocator limits the size of maximal physically
-- 
2.35.1



[PATCH 07/14] nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 89708b95978c..fcaa6bbda3fc 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -45,16 +45,18 @@ menu "Kernel features"
 source "kernel/Kconfig.hz"
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
range 8 19
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 endmenu
 
-- 
2.35.1



[PATCH 06/14] m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/m68k/Kconfig.cpu | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index c9df6572133f..e530bc8f240f 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -398,21 +398,23 @@ config SINGLE_MEMORY_CHUNK
  Say N if not sure.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if ADVANCED
+   int "Order of maximal physically contiguous allocations" if ADVANCED
depends on !SINGLE_MEMORY_CHUNK
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
  For systems that have holes in their physical address space this
  value also defines the minimal size of the hole that allows
  freeing unused memory map.
 
+ Don't change if unsure.
+
 config 060_WRITETHROUGH
bool "Use write-through caching for 68060 supervisor accesses"
depends on ADVANCED && M68060
-- 
2.35.1



[PATCH 05/14] ia64: don't allow users to override ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

It is enough to keep the default values for base and huge pages without
letting users override ARCH_FORCE_MAX_ORDER.

Drop the prompt to make the option invisible in *config.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/ia64/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 0d2f41fa56ee..b61437cae162 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -202,8 +202,7 @@ config IA64_CYCLONE
  If you're unsure, answer N.
 
 config ARCH_FORCE_MAX_ORDER
-   int "MAX_ORDER (10 - 16)"  if !HUGETLB_PAGE
-   range 10 16  if !HUGETLB_PAGE
+   int
default "16" if HUGETLB_PAGE
default "10"
 
-- 
2.35.1



[PATCH 04/14] csky: drop ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The default value of ARCH_FORCE_MAX_ORDER matches the generic default
defined in the MM code and the architecture does not support huge pages,
so there is no need to keep the ARCH_FORCE_MAX_ORDER option available.

Drop it.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/csky/Kconfig | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index c694fac43bed..00379a843c37 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -332,10 +332,6 @@ config HIGHMEM
select KMAP_LOCAL
default y
 
-config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
-   default "10"
-
 config DRAM_BASE
hex "DRAM start addr (the same with memory-section in dts)"
default 0x0
-- 
2.35.1



[PATCH 03/14] arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/Kconfig | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bab6483e4317..75af4c329224 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1487,24 +1487,24 @@ config XEN
 # 16K |   27  |  14  |   13| 11 |
 # 64K |   29  |  16  |   13| 13 |
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
+   int "Order of maximal physically contiguous allocations" if ARM64_4K_PAGES || ARM64_16K_PAGES
default "13" if ARM64_64K_PAGES
default "11" if ARM64_16K_PAGES
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
 
- We make sure that we can allocate up to a HugePage size for each configuration.
- Hence we have :
-   MAX_ORDER = PMD_SHIFT - PAGE_SHIFT  => PAGE_SHIFT - 3
+ The maximal size of allocation cannot exceed the size of the
+ section, so the value of MAX_ORDER should satisfy
 
- However for 4K, we choose a higher default value, 10 as opposed to 9, giving us
- 4M allocations matching the default size used by generic code.
+   MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
+
+ Don't change if unsure.
 
 config UNMAP_KERNEL_AT_EL0
bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
@@ -2298,4 +2298,3 @@ endmenu # "CPU Power Management"
 source "drivers/acpi/Kconfig"
 
 source "arch/arm64/kvm/Kconfig"
-
-- 
2.35.1



[PATCH 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

It is not a good idea to change fundamental parameters of core memory
management. Having predefined ranges suggests that the values within
those ranges are sensible, but one has to *really* understand the
implications of changing MAX_ORDER before actually amending it, and
ranges don't help here.

Drop the ranges in the definition of ARCH_FORCE_MAX_ORDER.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e60baf7859d1..bab6483e4317 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1489,9 +1489,7 @@ config XEN
 config ARCH_FORCE_MAX_ORDER
int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
default "13" if ARM64_64K_PAGES
-   range 11 13 if ARM64_16K_PAGES
default "11" if ARM64_16K_PAGES
-   range 10 15 if ARM64_4K_PAGES
default "10"
help
  The kernel memory allocator divides physically contiguous memory
-- 
2.35.1



[PATCH 01/14] arm: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The prompt and help text of ARCH_FORCE_MAX_ORDER do not even come close
to describing this configuration option.

Update both to actually describe what this option does.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm/Kconfig | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 929e646e84b9..0b15384c62e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1354,17 +1354,19 @@ config ARM_MODULE_PLTS
  configurations. If unsure, say y.
 
 config ARCH_FORCE_MAX_ORDER
-   int "Maximum zone order"
+   int "Order of maximal physically contiguous allocations"
default "11" if SOC_AM33XX
default "8" if SA
default "10"
help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages.  This option selects the largest power of two that the kernel
- keeps in the memory allocator.  If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
 
 config ALIGNMENT_TRAP
def_bool CPU_CP15_MMU
-- 
2.35.1



[PATCH 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

Several architectures define ARCH_FORCE_MAX_ORDER in their Kconfig, and
they all have a wrong and misleading prompt and help text for this option.

Besides, some define insane limits for possible values of
ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of
possible configurations, some make this option configurable by users for no
good reason.

This set updates the prompt and help text everywhere and does its best to
update actual definitions of ranges where applicable.

Mike Rapoport (IBM) (14):
  arm: reword ARCH_FORCE_MAX_ORDER prompt and help text
  arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER
  arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text
  csky: drop ARCH_FORCE_MAX_ORDER
  ia64: don't allow users to override ARCH_FORCE_MAX_ORDER
  m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text
  nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text
  nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER
  powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text
  powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER
  sh: reword ARCH_FORCE_MAX_ORDER prompt and help text
  sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER
  sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text
  xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text

 arch/arm/Kconfig  | 16 +---
 arch/arm64/Kconfig| 27 ---
 arch/csky/Kconfig |  4 
 arch/ia64/Kconfig |  3 +--
 arch/m68k/Kconfig.cpu | 16 +---
 arch/nios2/Kconfig| 17 +
 arch/powerpc/Kconfig  | 22 +-
 arch/sh/mm/Kconfig| 19 +--
 arch/sparc/Kconfig| 16 +---
 arch/xtensa/Kconfig   | 16 +---
 10 files changed, 76 insertions(+), 80 deletions(-)


base-commit: 51551d71edbc998fd8c8afa7312db3d270f5998e
-- 
2.35.1



[PATCH] arch/powerpc/kvm: kvmppc_core_vcpu_create_hv: check for kzalloc failure

2023-03-23 Thread Kautuk Consul
kvmppc_vcore_create() might not be able to allocate memory through
kzalloc(). In that case, kvm->arch.online_vcores shouldn't be
incremented.
Add a check for kzalloc() failure and return -ENOMEM from
kvmppc_core_vcpu_create_hv().

Signed-off-by: Kautuk Consul 
---
 arch/powerpc/kvm/book3s_hv.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6ba68dd6190b..e29ee755c920 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2968,13 +2968,17 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
pr_devel("KVM: collision on id %u", id);
vcore = NULL;
} else if (!vcore) {
+   vcore = kvmppc_vcore_create(kvm,
+   id & ~(kvm->arch.smt_mode - 1));
+   if (unlikely(!vcore)) {
+   mutex_unlock(&kvm->lock);
+   return -ENOMEM;
+   }
+
/*
 * Take mmu_setup_lock for mutual exclusion
 * with kvmppc_update_lpcr().
 */
-   err = -ENOMEM;
-   vcore = kvmppc_vcore_create(kvm,
-   id & ~(kvm->arch.smt_mode - 1));
mutex_lock(&kvm->arch.mmu_setup_lock);
kvm->arch.vcores[core] = vcore;
kvm->arch.online_vcores++;
-- 
2.39.2
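The ordering this patch adopts can be illustrated with a user-space sketch. The vcore is allocated first. If that fails, the outer lock is dropped and -ENOMEM is returned before the shared counter is touched. struct vm, vcore_create() and vcpu_create() are hypothetical stand-ins for struct kvm, kvmppc_vcore_create() and kvmppc_core_vcpu_create_hv(). The pthread mutexes stand in for kvm->lock and kvm->arch.mmu_setup_lock, and fail_alloc simulates a kzalloc() failure.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct kvmppc_vcore and struct kvm. */
struct vcore { int first_vcpuid; };

struct vm {
	pthread_mutex_t lock;           /* plays the role of kvm->lock */
	pthread_mutex_t mmu_setup_lock; /* plays kvm->arch.mmu_setup_lock */
	struct vcore *vcores[8];        /* callers must pass core < 8 */
	int online_vcores;
	bool fail_alloc;                /* simulate kzalloc() failure */
};

static struct vcore *vcore_create(struct vm *vm, int id)
{
	struct vcore *vc;

	if (vm->fail_alloc)
		return NULL;
	vc = calloc(1, sizeof(*vc));
	if (vc)
		vc->first_vcpuid = id;
	return vc;
}

/* Allocate first; on failure bail out before touching shared state. */
static int vcpu_create(struct vm *vm, int core, int id)
{
	struct vcore *vc;

	pthread_mutex_lock(&vm->lock);
	vc = vcore_create(vm, id);
	if (!vc) {
		pthread_mutex_unlock(&vm->lock);
		return -ENOMEM;         /* online_vcores left untouched */
	}
	pthread_mutex_lock(&vm->mmu_setup_lock);
	vm->vcores[core] = vc;
	vm->online_vcores++;
	pthread_mutex_unlock(&vm->mmu_setup_lock);
	pthread_mutex_unlock(&vm->lock);
	return 0;
}
```

The key point is that the failure path releases the outer lock it took, so a later caller can still make progress, and the counter only ever reflects vcores that actually exist.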



Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-23 Thread Christoph Hellwig
The series looks fine to me.  How should we merge it?


Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()

2023-03-23 Thread Andrew Donnellan
On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch 
> 
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the intermediate
> status to user space.
> 
> Justifications:
> 
> * Currently it's nondeterministic and quite variable in practice
>   whether a retry status is returned for any given invocation of
>   sys_rtas(). Therefore user space code cannot be expecting a retry
>   result without already being broken.
> 
> * This tends to significantly reduce the total number of system calls
>   issued by programs such as drmgr which make use of sys_rtas(),
>   improving the experience of tracing and debugging such
>   programs. This is the main motivation for me: I think this change
>   will make it easier for us to characterize current sys_rtas() use
>   cases as we move them to other interfaces over time.
> 
> * It reduces the number of opportunities for user space to leave
>   complex operations, such as those associated with DLPAR, incomplete
>   and difficult to recover.
> 
> * We can expect performance improvements for existing sys_rtas()
>   users, not only because of the overall reduction in the number of
>   system calls issued, but also due to the better handling of -2/990x in the
>   kernel. For example, librtas still sleeps for 1ms on -2, which is
>   completely unnecessary.

It would be good to see this fixed on the librtas side.

> 
> Performance differences for PHB add and remove on a small P10 PowerVM
> partition are included below. For add, elapsed time is slightly
> reduced. For remove, there are more significant improvements: the
> number of context switches is reduced by an order of magnitude, and
> elapsed time is reduced by over half.
> 
> (- before, + after):
> 
>   Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):
> 
> -  1,847.58 msec task-clock   #    0.135 CPUs utilized   ( +- 14.15% )
> -    10,867  cs   #    9.800 K/sec   ( +- 14.14% )
> +  1,901.15 msec task-clock   #    0.148 CPUs utilized   ( +- 14.13% )
> +    10,451  cs   #    9.158 K/sec   ( +- 14.14% )
> 
> - 13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
> +  12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )
> 
>   Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):
> 
> -  1,473.75 msec task-clock   #    0.092 CPUs utilized   ( +- 14.15% )
> - 2,652  cs   #    3.000 K/sec   ( +- 14.16% )
> +  1,444.55 msec task-clock   #    0.221 CPUs utilized   ( +- 14.14% )
> +   104  cs   #  119.957 /sec    ( +- 14.63% )
> 
> -  15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
> +   6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )
> 
> Move the existing rtas_lock-guarded critical section in sys_rtas()
> into a conventional rtas_busy_delay()-based loop, returning to user
> space only when a final success or failure result is available.
> 
> Signed-off-by: Nathan Lynch 
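A quick check of the quoted perf figures supports the summary given in the commit message. For remove, elapsed time drops by roughly 59% ("over half") and context switches by a factor of about 25 ("an order of magnitude"). The higher CPU utilization after the change follows from the same task-clock spread over a shorter elapsed time.

```latex
\begin{aligned}
\text{remove, elapsed:}\quad & \frac{6.54256}{15.99718} \approx 0.409 \;\Rightarrow\; \approx 59\%\ \text{less time}\\
\text{remove, context switches:}\quad & \frac{2652}{104} = 25.5\\
\text{remove, CPUs utilized after:}\quad & \frac{1444.55\,\text{ms}}{6542.56\,\text{ms}} \approx 0.221\\
\text{add, elapsed:}\quad & \frac{12.88080}{13.656557} \approx 0.943 \;\Rightarrow\; \approx 5.7\%\ \text{less time}
\end{aligned}
```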

Should there be some kind of timeout? I'm a bit worried about sleeping in
a syscall for an extended period.

-- 
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited
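The rtas_busy_delay()-based loop described in the quoted patch can be sketched in user space. The status values follow the RTAS convention the patch relies on: -2 means busy, and 9900 to 9905 (990x) suggest waiting 10^x milliseconds before retrying. fake_rtas, fake_rtas_call() and call_until_final() are hypothetical. The sketch adds up the requested delay instead of sleeping. It also retries -2 immediately, which is an assumption and not necessarily what the kernel's rtas_busy_delay() does.

```c
#include <stdbool.h>

#define RTAS_BUSY          (-2)  /* busy: call again */
#define RTAS_EXT_DELAY_MIN 9900  /* 990x: suggested delay of 10^x ms */
#define RTAS_EXT_DELAY_MAX 9905

/* Intermediate status asking the caller to retry? */
static bool is_retry_status(int status)
{
	return status == RTAS_BUSY ||
	       (status >= RTAS_EXT_DELAY_MIN && status <= RTAS_EXT_DELAY_MAX);
}

/* Suggested delay in ms: 0 for -2 (assumption), 10^(status - 9900) for 990x. */
static unsigned int suggested_delay_ms(int status)
{
	unsigned int ms = 1;
	int x;

	if (status == RTAS_BUSY)
		return 0;
	for (x = status - RTAS_EXT_DELAY_MIN; x > 0; x--)
		ms *= 10;
	return ms;
}

/* Hypothetical firmware stand-in returning scripted statuses in order. */
struct fake_rtas {
	const int *statuses;
	int n, next;
};

static int fake_rtas_call(struct fake_rtas *fw)
{
	return fw->next < fw->n ? fw->statuses[fw->next++] : 0;
}

/*
 * Keep calling until a final (non-retry) status arrives and return it.
 * The delay a real implementation would sleep is summed in *slept_ms.
 */
static int call_until_final(struct fake_rtas *fw, unsigned int *slept_ms)
{
	int status;

	*slept_ms = 0;
	for (;;) {
		status = fake_rtas_call(fw);
		if (!is_retry_status(status))
			return status;
		*slept_ms += suggested_delay_ms(status);
	}
}
```

For the status sequence -2, 9901, -2, 9902, 0 the loop makes five calls, accumulates 10 + 100 = 110 ms of suggested delay, and returns 0, so user space only ever sees the final result.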


Re: [PATCH 6/8] powerpc/rtas: lockdep annotations

2023-03-23 Thread Andrew Donnellan
On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch 
> 
> Add lockdep annotations for the following properties that must hold:
> 
> * Any error log retrieval must be atomically coupled with the prior
>   RTAS call, without a window for another RTAS call to occur before the
>   error log can be retrieved.
> 
> * All users of the core rtas_args parameter block must hold rtas_lock.
> 
> Move the definitions of rtas_lock and rtas_args up in the file so that
> __do_enter_rtas_trace() can refer to them.
> 
> Signed-off-by: Nathan Lynch 
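The second property, that users of the shared rtas_args block must hold rtas_lock, is the kind of rule lockdep_assert_held() lets the kernel check. A rough user-space analogue follows. struct owned_lock, fill_args() and args_block are hypothetical. The wrapper only records which thread holds the mutex, so the check is meaningful only when the calling thread asks about itself, and it is far weaker than lockdep.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mutex wrapper that remembers its owner thread. */
struct owned_lock {
	pthread_mutex_t mutex;
	pthread_t owner;
	bool held;
};

static void owned_lock_acquire(struct owned_lock *l)
{
	pthread_mutex_lock(&l->mutex);
	l->owner = pthread_self();
	l->held = true;
}

static void owned_lock_release(struct owned_lock *l)
{
	l->held = false;
	pthread_mutex_unlock(&l->mutex);
}

/* True only if the calling thread holds the lock (single-thread sketch). */
static bool owned_lock_held_by_me(struct owned_lock *l)
{
	return l->held && pthread_equal(l->owner, pthread_self());
}

static struct owned_lock args_lock = { PTHREAD_MUTEX_INITIALIZER };
static int args_block[4];   /* stands in for the shared rtas_args */

/* Refuse to touch the shared block unless the lock is held. */
static bool fill_args(int token)
{
	if (!owned_lock_held_by_me(&args_lock))
		return false;   /* under lockdep this would be a warning */
	args_block[0] = token;
	return true;
}
```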

I'm no lockdep expert and I haven't checked if every possible case that
can be annotated has been annotated, but these changes make sense as
far as I can tell from my limited inspection.

Reviewed-by: Andrew Donnellan 
