Re: [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()

2021-08-16 Thread Leonardo Brás
Hello Fred, thanks for this feedback!

On Tue, 2021-07-20 at 19:49 +0200, Frederic Barrat wrote:
> 
> > kfree(window);
> >   
> > -out_clear_window:
> > -   remove_ddw(pdn, true);
> > +out_del_prop:
> > +   of_remove_property(pdn, win64);
> >   
> >   out_free_prop:
> > kfree(win64->name);
> > kfree(win64->value);
> > kfree(win64);
> >   
> > +out_remove_win:
> > +   remove_ddw(pdn, true);
> 
> 
> I believe there's a small problem here. We jump directly to 
> out_remove_win if allocating the property failed. Yet, the first
> thing 
> remove_ddw() does is look for the property. So it will never find it
> and 
> the window is never removed by the hypervisor.
> 
>    Fred

That makes sense, thanks for catching this one!

What I intended here was just removing the DDW, so I think it should be
ok replacing remove_ddw() by a new helper that only does the rtas-call.

I will send a v6 with this change soon.

> 
> 
> > +
> >   out_failed:
> > if (default_win_removed)
> > reset_dma_window(dev, pdn);
> > 




Re: [PATCH v2 2/2] powerpc/perf: Return regs->nip as instruction pointer value when SIAR is 0

2021-08-16 Thread Madhavan Srinivasan



On 8/16/21 12:26 PM, Christophe Leroy wrote:



Le 16/08/2021 à 08:44, kajoljain a écrit :



On 8/14/21 6:14 PM, Michael Ellerman wrote:

Christophe Leroy  writes:

Le 13/08/2021 à 10:24, Kajol Jain a écrit :

Incase of random sampling, there can be scenarios where SIAR is not
latching sample address and results in 0 value. Since current code
directly returning the siar value, we could see multiple instruction
pointer values as 0 in perf report.


Can you please give more detail on that? What scenarios? On what CPUs?



Hi Michael,
 Sure I will update these details in my next patch-set.


Patch resolves this issue by adding a ternary condition to return
regs->nip incase SIAR is 0.


Your description seems rather similar to
https://github.com/linuxppc/linux/commit/2ca13a4cc56c920a6c9fc8ee45d02bccacd7f46c 



Does it mean that the problem occurs on more than the power10 DD1 ?

In that case, can the solution be common instead of doing something 
for power10 DD1 and something

for others ?


Agreed.

This change would seem to make that P10 DD1 logic superfluous.

Also we already have a fallback to regs->nip in the else case of the 
if,

so we should just use that rather than adding a ternary condition.

eg.

if (use_siar && siar_valid(regs) && siar)
    return siar + perf_ip_adjust(regs);
else if (use_siar)
    return 0;    // no valid instruction pointer
else
    return regs->nip;


I'm also not sure why we have that return 0 case, I can't think of why
we'd ever want to do that rather than using nip. So maybe we should do
another patch to drop that case.


Yeah make sense. I will remove return 0 case in my next version.



This was added by commit 
https://github.com/linuxppc/linux/commit/e6878835ac4794f25385522d29c634b7bbb7cca9


Are we sure it was an error to add it and it can be removed ?


pc having 0 is wrong (kernel does not execute at 0x0 or userspace).
yeah we should drop it.

Maddy


Christophe


Re: [PATCH v2 3/3] powerpc/configs: Regenerate mpc885_ads_defconfig

2021-08-16 Thread Christophe Leroy




Le 17/08/2021 à 06:54, Joel Stanley a écrit :

Regenerate atop v5.14-rc6 by doing a make savedefconfig.

The changes a re-ordering except for the following (which are still set
indirectly):

  - CONFIG_DEBUG_KERNEL=y selected by EXPERT

  - CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008 which is the default
setting

Signed-off-by: Joel Stanley 


Acked-by: Christophe Leroy 


---
  arch/powerpc/configs/mpc885_ads_defconfig | 46 +++
  1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index cd08f9ed2c8d..c74dc76b1d0d 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -1,19 +1,30 @@
-CONFIG_PPC_8xx=y
  # CONFIG_SWAP is not set
  CONFIG_SYSVIPC=y
  CONFIG_NO_HZ=y
  CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_JIT=y
+CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
  CONFIG_LOG_BUF_SHIFT=14
  CONFIG_EXPERT=y
  # CONFIG_ELF_CORE is not set
  # CONFIG_BASE_FULL is not set
  # CONFIG_FUTEX is not set
+CONFIG_PERF_EVENTS=y
  # CONFIG_VM_EVENT_COUNTERS is not set
-# CONFIG_BLK_DEV_BSG is not set
-CONFIG_PARTITION_ADVANCED=y
+CONFIG_PPC_8xx=y
+CONFIG_8xx_GPIO=y
+CONFIG_SMC_UCODE_PATCH=y
+CONFIG_PIN_TLB=y
  CONFIG_GEN_RTC=y
  CONFIG_HZ_100=y
+CONFIG_MATH_EMULATION=y
+CONFIG_PPC_16K_PAGES=y
+CONFIG_ADVANCED_OPTIONS=y
  # CONFIG_SECCOMP is not set
+CONFIG_STRICT_KERNEL_RWX=y
+CONFIG_MODULES=y
+# CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
  CONFIG_NET=y
  CONFIG_PACKET=y
  CONFIG_UNIX=y
@@ -46,38 +57,25 @@ CONFIG_DAVICOM_PHY=y
  # CONFIG_LEGACY_PTYS is not set
  CONFIG_SERIAL_CPM=y
  CONFIG_SERIAL_CPM_CONSOLE=y
+CONFIG_SPI=y
+CONFIG_SPI_FSL_SPI=y
  # CONFIG_HWMON is not set
+CONFIG_WATCHDOG=y
+CONFIG_8xxx_WDT=y
  # CONFIG_USB_SUPPORT is not set
  # CONFIG_DNOTIFY is not set
  CONFIG_TMPFS=y
  CONFIG_CRAMFS=y
  CONFIG_NFS_FS=y
  CONFIG_ROOT_NFS=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_DEV_TALITOS=y
  CONFIG_CRC32_SLICEBY4=y
  CONFIG_DEBUG_INFO=y
  CONFIG_MAGIC_SYSRQ=y
-CONFIG_DETECT_HUNG_TASK=y
-CONFIG_PPC_16K_PAGES=y
-CONFIG_DEBUG_KERNEL=y
  CONFIG_DEBUG_FS=y
-CONFIG_PPC_PTDUMP=y
-CONFIG_MODULES=y
-CONFIG_SPI=y
-CONFIG_SPI_FSL_SPI=y
-CONFIG_CRYPTO=y
-CONFIG_CRYPTO_DEV_TALITOS=y
-CONFIG_8xx_GPIO=y
-CONFIG_WATCHDOG=y
-CONFIG_8xxx_WDT=y
-CONFIG_SMC_UCODE_PATCH=y
-CONFIG_ADVANCED_OPTIONS=y
-CONFIG_PIN_TLB=y
-CONFIG_PERF_EVENTS=y
-CONFIG_MATH_EMULATION=y
-CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
-CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_BPF_JIT=y
  CONFIG_DEBUG_VM_PGTABLE=y
+CONFIG_DETECT_HUNG_TASK=y
  CONFIG_BDI_SWITCH=y
  CONFIG_PPC_EARLY_DEBUG=y
-CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008
+CONFIG_PPC_PTDUMP=y



Re: [PATCH v2 2/3] powerpc/config: Renable MTD_PHYSMAP_OF

2021-08-16 Thread Christophe Leroy




Le 17/08/2021 à 06:54, Joel Stanley a écrit :

CONFIG_MTD_PHYSMAP_OF is not longer enabled as it depends on
MTD_PHYSMAP which is not enabled.

This is a regression from commit 642b1e8dbed7 ("mtd: maps: Merge
physmap_of.c into physmap-core.c"), which added the extra dependency.
Add CONFIG_MTD_PHYSMAP=y so this stays in the config, as Christophe said
it is useful for build coverage.

Fixes: 642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c")
Signed-off-by: Joel Stanley 


Acked-by: Christophe Leroy 


---
  arch/powerpc/configs/mpc885_ads_defconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index 5cd17adf903f..cd08f9ed2c8d 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -33,6 +33,7 @@ CONFIG_MTD_CFI_GEOMETRY=y
  # CONFIG_MTD_CFI_I2 is not set
  CONFIG_MTD_CFI_I4=y
  CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_PHYSMAP=y
  CONFIG_MTD_PHYSMAP_OF=y
  # CONFIG_BLK_DEV is not set
  CONFIG_NETDEVICES=y



[PATCH v2 3/3] powerpc/configs: Regenerate mpc885_ads_defconfig

2021-08-16 Thread Joel Stanley
Regenerate atop v5.14-rc6 by doing a make savedefconfig.

The changes a re-ordering except for the following (which are still set
indirectly):

 - CONFIG_DEBUG_KERNEL=y selected by EXPERT

 - CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008 which is the default
   setting

Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/mpc885_ads_defconfig | 46 +++
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index cd08f9ed2c8d..c74dc76b1d0d 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -1,19 +1,30 @@
-CONFIG_PPC_8xx=y
 # CONFIG_SWAP is not set
 CONFIG_SYSVIPC=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_JIT=y
+CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_EXPERT=y
 # CONFIG_ELF_CORE is not set
 # CONFIG_BASE_FULL is not set
 # CONFIG_FUTEX is not set
+CONFIG_PERF_EVENTS=y
 # CONFIG_VM_EVENT_COUNTERS is not set
-# CONFIG_BLK_DEV_BSG is not set
-CONFIG_PARTITION_ADVANCED=y
+CONFIG_PPC_8xx=y
+CONFIG_8xx_GPIO=y
+CONFIG_SMC_UCODE_PATCH=y
+CONFIG_PIN_TLB=y
 CONFIG_GEN_RTC=y
 CONFIG_HZ_100=y
+CONFIG_MATH_EMULATION=y
+CONFIG_PPC_16K_PAGES=y
+CONFIG_ADVANCED_OPTIONS=y
 # CONFIG_SECCOMP is not set
+CONFIG_STRICT_KERNEL_RWX=y
+CONFIG_MODULES=y
+# CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
@@ -46,38 +57,25 @@ CONFIG_DAVICOM_PHY=y
 # CONFIG_LEGACY_PTYS is not set
 CONFIG_SERIAL_CPM=y
 CONFIG_SERIAL_CPM_CONSOLE=y
+CONFIG_SPI=y
+CONFIG_SPI_FSL_SPI=y
 # CONFIG_HWMON is not set
+CONFIG_WATCHDOG=y
+CONFIG_8xxx_WDT=y
 # CONFIG_USB_SUPPORT is not set
 # CONFIG_DNOTIFY is not set
 CONFIG_TMPFS=y
 CONFIG_CRAMFS=y
 CONFIG_NFS_FS=y
 CONFIG_ROOT_NFS=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_DEV_TALITOS=y
 CONFIG_CRC32_SLICEBY4=y
 CONFIG_DEBUG_INFO=y
 CONFIG_MAGIC_SYSRQ=y
-CONFIG_DETECT_HUNG_TASK=y
-CONFIG_PPC_16K_PAGES=y
-CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_FS=y
-CONFIG_PPC_PTDUMP=y
-CONFIG_MODULES=y
-CONFIG_SPI=y
-CONFIG_SPI_FSL_SPI=y
-CONFIG_CRYPTO=y
-CONFIG_CRYPTO_DEV_TALITOS=y
-CONFIG_8xx_GPIO=y
-CONFIG_WATCHDOG=y
-CONFIG_8xxx_WDT=y
-CONFIG_SMC_UCODE_PATCH=y
-CONFIG_ADVANCED_OPTIONS=y
-CONFIG_PIN_TLB=y
-CONFIG_PERF_EVENTS=y
-CONFIG_MATH_EMULATION=y
-CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
-CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_BPF_JIT=y
 CONFIG_DEBUG_VM_PGTABLE=y
+CONFIG_DETECT_HUNG_TASK=y
 CONFIG_BDI_SWITCH=y
 CONFIG_PPC_EARLY_DEBUG=y
-CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008
+CONFIG_PPC_PTDUMP=y
-- 
2.32.0



[PATCH v2 2/3] powerpc/config: Renable MTD_PHYSMAP_OF

2021-08-16 Thread Joel Stanley
CONFIG_MTD_PHYSMAP_OF is not longer enabled as it depends on
MTD_PHYSMAP which is not enabled.

This is a regression from commit 642b1e8dbed7 ("mtd: maps: Merge
physmap_of.c into physmap-core.c"), which added the extra dependency.
Add CONFIG_MTD_PHYSMAP=y so this stays in the config, as Christophe said
it is useful for build coverage.

Fixes: 642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c")
Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/mpc885_ads_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index 5cd17adf903f..cd08f9ed2c8d 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -33,6 +33,7 @@ CONFIG_MTD_CFI_GEOMETRY=y
 # CONFIG_MTD_CFI_I2 is not set
 CONFIG_MTD_CFI_I4=y
 CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_PHYSMAP=y
 CONFIG_MTD_PHYSMAP_OF=y
 # CONFIG_BLK_DEV is not set
 CONFIG_NETDEVICES=y
-- 
2.32.0



[PATCH v2 1/3] powerpc/config: Fix IPV6 warning in mpc855_ads

2021-08-16 Thread Joel Stanley
When building this config there's a warning:

  79:warning: override: reassigning to symbol IPV6

Commit 9a1762a4a4ff ("powerpc/8xx: Update mpc885_ads_defconfig to
improve CI") added CONFIG_IPV6=y, but left '# CONFIG_IPV6 is not set'
in.

IPV6 is default y, so remove both to clean up the build.

Acked-by: Christophe Leroy 
Fixes: 9a1762a4a4ff ("powerpc/8xx: Update mpc885_ads_defconfig to improve CI")
Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/mpc885_ads_defconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index d21f266cea9a..5cd17adf903f 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -21,7 +21,6 @@ CONFIG_INET=y
 CONFIG_IP_MULTICAST=y
 CONFIG_IP_PNP=y
 CONFIG_SYN_COOKIES=y
-# CONFIG_IPV6 is not set
 # CONFIG_FW_LOADER is not set
 CONFIG_MTD=y
 CONFIG_MTD_BLOCK=y
@@ -76,7 +75,6 @@ CONFIG_PERF_EVENTS=y
 CONFIG_MATH_EMULATION=y
 CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
 CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_IPV6=y
 CONFIG_BPF_JIT=y
 CONFIG_DEBUG_VM_PGTABLE=y
 CONFIG_BDI_SWITCH=y
-- 
2.32.0



[PATCH v2 0/3] powerpc: mpc855_ads defconfig fixes

2021-08-16 Thread Joel Stanley
v2: fix typos, split out mtd fix from savedefconfig patch

The first patch fixes a build warning I noticed when testing something
unrelated.

The second fixes a regression where the MTD partition support dropped out
of the config. I have enabled the dependency so this is still part of
the config.

The third patch is the result of doing a make savedefconfig with the
first two patches applied.

Joel Stanley (3):
  powerpc/config: Fix IPV6 warning in mpc855_ads
  powerpc/config: Renable MTD_PHYSMAP_OF
  powerpc/configs: Regenerate mpc885_ads_defconfig

 arch/powerpc/configs/mpc885_ads_defconfig | 49 +++
 1 file changed, 23 insertions(+), 26 deletions(-)

-- 
2.32.0



Re: [PATCH 2/2] powerpc/configs: Regenerate mpc885_ads_defconfig

2021-08-16 Thread Joel Stanley
On Mon, 16 Aug 2021 at 08:49, Christophe Leroy
 wrote:
>
>
>
> Le 16/08/2021 à 10:31, Joel Stanley a écrit :
> > Regenrate atop v5.14-rc6.
>
> Typos.
>
> You mean you did redo a "make savedefconfig" ?

Yes, I did.


[PATCH 3/7] powerpc: replace cc-option-yn uses with cc-option

2021-08-16 Thread Nick Desaulniers
cc-option-yn can be replaced with cc-option. ie.
Checking for support:
ifeq ($(call cc-option-yn,$(FLAG)),y)
becomes:
ifneq ($(call cc-option,$(FLAG)),)

Checking for lack of support:
ifeq ($(call cc-option-yn,$(FLAG)),n)
becomes:
ifeq ($(call cc-option,$(FLAG)),)

This allows us to pursue removing cc-option-yn.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nick Desaulniers 
---
 arch/powerpc/Makefile  | 12 ++--
 arch/powerpc/boot/Makefile |  5 +
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 9aaf1abbc641..85e224536cf7 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -12,12 +12,12 @@
 # Rewritten by Cort Dougan and Paul Mackerras
 #
 
-HAS_BIARCH := $(call cc-option-yn, -m32)
+HAS_BIARCH := $(call cc-option,-m32)
 
 # Set default 32 bits cross compilers for vdso and boot wrapper
 CROSS32_COMPILE ?=
 
-ifeq ($(HAS_BIARCH),y)
+ifeq ($(HAS_BIARCH),-m32)
 ifeq ($(CROSS32_COMPILE),)
 ifdef CONFIG_PPC32
 # These options will be overridden by any -mcpu option that the CPU
@@ -107,7 +107,7 @@ cflags-$(CONFIG_CPU_LITTLE_ENDIAN)  += -mlittle-endian
 aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call 
cc-option,-mbig-endian)
 aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mlittle-endian
 
-ifeq ($(HAS_BIARCH),y)
+ifeq ($(HAS_BIARCH),-m32)
 KBUILD_CFLAGS  += -m$(BITS)
 KBUILD_AFLAGS  += -m$(BITS) -Wl,-a$(BITS)
 KBUILD_LDFLAGS += -m elf$(BITS)$(LDEMULATION)
@@ -125,7 +125,9 @@ LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
 LDFLAGS_vmlinux:= $(LDFLAGS_vmlinux-y)
 
 ifdef CONFIG_PPC64
-ifeq ($(call cc-option-yn,-mcmodel=medium),y)
+ifeq ($(call cc-option,-mcmodel=medium),)
+   export NO_MINIMAL_TOC := -mno-minimal-toc
+else
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
@@ -135,8 +137,6 @@ ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.
KBUILD_CFLAGS_MODULE += -mcmodel=large
-else
-   export NO_MINIMAL_TOC := -mno-minimal-toc
 endif
 endif
 
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 10c0fb306f15..33e1de5d1c95 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -66,10 +66,7 @@ ifdef CONFIG_DEBUG_INFO
 BOOTCFLAGS += -g
 endif
 
-ifeq ($(call cc-option-yn, -fstack-protector),y)
-BOOTCFLAGS += -fno-stack-protector
-endif
-
+BOOTCFLAGS += $(call cc-option,-fstack-protector)
 BOOTCFLAGS += -I$(objtree)/$(obj) -I$(srctree)/$(obj)
 
 DTC_FLAGS  ?= -p 1024
-- 
2.33.0.rc1.237.g0d66db33f3-goog



Re: [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity

2021-08-16 Thread David Gibson
On Thu, Aug 12, 2021 at 06:52:23PM +0530, Aneesh Kumar K.V wrote:
> PAPR interface currently supports two different ways of communicating resource
> grouping details to the OS. These are referred to as Form 0 and Form 1
> associativity grouping. Form 0 is the older format and is now considered
> deprecated. This patch adds another resource grouping named FORM2.
> 
> Signed-off-by: Daniel Henrique Barboza 
> Signed-off-by: Aneesh Kumar K.V 

Reviewed-by: David Gibson 

Though there a couple of cosmetic issues and one bad memory access
issue (though only in the case of buggy firmware).

[snip]
> +Form 2
> +---
> +Form 2 associativity format adds separate device tree properties 
> representing NUMA node distance
> +thereby making the node distance computation flexible. Form 2 also allows 
> flexible primary
> +domain numbering. With numa distance computation now detached from the index 
> value in
> +"ibm,associativity-reference-points" property, Form 2 allows a large number 
> of primary domain
> +ids at the same domainID index representing resource groups of different 
> performance/latency
> +characteristics.
> +
> +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 
> in the
> +"ibm,architecture-vec-5" property.
> +
> +"ibm,numa-lookup-index-table" property contains a list of one or more 
> numbers representing
> +the domainIDs present in the system. The offset of the domainID in this 
> property is
> +used as an index while computing numa distance information via 
> "ibm,numa-distance-table".
> +
> +prop-encoded-array: The number N of the domainIDs encoded as with 
> encode-int, followed by
> +N domainID encoded as with encode-int
> +
> +For ex:
> +"ibm,numa-lookup-index-table" =  {4, 0, 8, 250, 252}. The offset of domainID 
> 8 (2) is used when

Since you're using dts syntax below, it probably makes sense to use it
here as well.

> +computing the distance of domain 8 from other domains present in the system. 
> For the rest of
> +this document, this offset will be referred to as domain distance offset.
> +
> +"ibm,numa-distance-table" property contains a list of one or more numbers 
> representing the NUMA
> +distance between resource groups/domains present in the system.
> +
> +prop-encoded-array: The number N of the distance values encoded as with 
> encode-int, followed by
> +N distance values encoded as with encode-bytes. The max distance value we 
> could encode is 255.
> +The number N must be equal to the square of m where m is the number of 
> domainIDs in the
> +numa-lookup-index-table.
> +
> +For ex:
> +ibm,numa-lookup-index-table = <3 0 8 40>;
> +ibm,numa-distace-table = <9>, /bits/ 8 < 10  20  80
> +  20  10 160
> +  80 160  10>;

[snip]
> +
> + /* FORM2 affinity  */
> + nid = of_node_to_nid_single(node);
> + if (nid == NUMA_NO_NODE)
> + return;
> +
> + /*
> +  * With FORM2 we expect NUMA distance of all possible NUMA
> +  * nodes to be provided during boot.
> +  */
> + WARN(numa_distance_table[nid][nid] == -1,
> +  "NUMA distance details for node %d not provided\n", nid);
> +}
> +
> +/*
> + * ibm,numa-lookup-index-table= {N, domainid1, domainid2, . domainidN}
> + * ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6,  N elements}

.. and here too.

> + */
> +static void initialize_form2_numa_distance_lookup_table(void)
> +{
> + int i, j;
> + struct device_node *root;
> + const __u8 *numa_dist_table;
> + const __be32 *numa_lookup_index;
> + int numa_dist_table_length;
> + int max_numa_index, distance_index;
> +
> + if (firmware_has_feature(FW_FEATURE_OPAL))
> + root = of_find_node_by_path("/ibm,opal");
> + else
> + root = of_find_node_by_path("/rtas");
> + if (!root)
> + root = of_find_node_by_path("/");
> +
> + numa_lookup_index = of_get_property(root, 
> "ibm,numa-lookup-index-table", NULL);
> + max_numa_index = of_read_number(_lookup_index[0], 1);
> +
> + /* first element of the array is the size and is encode-int */
> + numa_dist_table = of_get_property(root, "ibm,numa-distance-table", 
> NULL);
> + numa_dist_table_length = of_read_number((const __be32 
> *)_dist_table[0], 1);
> + /* Skip the size which is encoded int */
> + numa_dist_table += sizeof(__be32);
> +
> + pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n",
> +  numa_dist_table_length, max_numa_index);

You validate numa_dist_table_length below.  However, AFAICT you don't
anywhere check that the property actually has the length its first
element claims it does.  Yes, that represents a firmware bug, but it's
probably best if we don't ready past the end of the array in that
case, which I think is what will happen now.

Same applies for the lookup-index-table.

> + for (i = 0; i < max_numa_index; i++)
> + /* +1 skip the 

Re: [PATCH v8 3/5] powerpc/pseries: Consolidate different NUMA distance update code paths

2021-08-16 Thread David Gibson
On Thu, Aug 12, 2021 at 06:52:21PM +0530, Aneesh Kumar K.V wrote:
> The associativity details of the newly added resourced are collected from
> the hypervisor via "ibm,configure-connector" rtas call. Update the numa
> distance details of the newly added numa node after the above call.
> 
> Instead of updating NUMA distance every time we lookup a node id
> from the associativity property, add helpers that can be used
> during boot which does this only once. Also remove the distance
> update from node id lookup helpers.
> 
> Currently, we duplicate parsing code for ibm,associativity and
> ibm,associativity-lookup-arrays in the kernel. The associativity array 
> provided
> by these device tree properties are very similar and hence can use
> a helper to parse the node id and numa distance details.
> 
> Signed-off-by: Aneesh Kumar K.V 

Reviewed-by: David Gibson 

There are a handful of nits it would be nice to clean up as followups, though:

[snip]
> +static int get_nid_and_numa_distance(struct drmem_lmb *lmb)
> +{
> + struct assoc_arrays aa = { .arrays = NULL };
> + int default_nid = NUMA_NO_NODE;

I don't think there's any point to the 'default_nid' variable.

> + int nid = default_nid;
> + int rc, index;
> +
> + if ((primary_domain_index < 0) || !numa_enabled)
> + return default_nid;
> +
> + rc = of_get_assoc_arrays();
> + if (rc)
> + return default_nid;
> +
> + if (primary_domain_index <= aa.array_sz &&

You don't need this test any more - it's included in __associativity_to_nid().

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 2/3] powerpc: Fix undefined static key

2021-08-16 Thread Michael Ellerman
Joel Stanley  writes:
> When CONFIG_PPC_BARRIER_NOSPEC=n, security.c is not built leading to a
> missing definition of uaccess_flush_key.
>
>   LD  vmlinux.o
>   MODPOST vmlinux.symvers
>   MODINFO modules.builtin.modinfo
>   GEN modules.builtin
>   LD  .tmp_vmlinux.kallsyms1
> powerpc64le-linux-gnu-ld: arch/powerpc/kernel/align.o:(.toc+0x0): undefined 
> reference to `uaccess_flush_key'
> powerpc64le-linux-gnu-ld: arch/powerpc/kernel/signal_64.o:(.toc+0x0): 
> undefined reference to `uaccess_flush_key'
> powerpc64le-linux-gnu-ld: arch/powerpc/kernel/process.o:(.toc+0x0): undefined 
> reference to `uaccess_flush_key'
> powerpc64le-linux-gnu-ld: arch/powerpc/kernel/traps.o:(.toc+0x0): undefined 
> reference to `uaccess_flush_key'
> powerpc64le-linux-gnu-ld: 
> arch/powerpc/kernel/hw_breakpoint_constraints.o:(.toc+0x0): undefined 
> reference to `uaccess_flush_key'
> powerpc64le-linux-gnu-ld: arch/powerpc/kernel/ptrace/ptrace.o:(.toc+0x0): 
> more undefined references to `uaccess_flush_key' follow
> make[1]: *** [Makefile:1176: vmlinux] Error 1
>
> Hack one in to fix the build.

Yeah sorry that is a bit of a hack :)

I think the root cause here is that we don't have a CONFIG for "want
security workaround stuff", because so far we haven't had a CPU that
wants to turn that all off.

The generic code uses CONFIG_GENERIC_CPU_VULNERABILITIES, so I guess we
should follow that example and add PPC_CPU_VULNERABILITIES.

Then we'd use that to disable all the security.c stuff, and
PPC_BARRIER_NOSPEC would depend on it also.

Then we could allow configuring that off for Microwatt, or possibly for
all platforms if people want to do that.

cheers


Re: [PATCH 2/3] powerpc: Fix undefined static key

2021-08-16 Thread Joel Stanley
On Mon, 16 Aug 2021 at 08:39, Christophe Leroy
 wrote:
>
>
>
> Le 16/08/2021 à 10:24, Joel Stanley a écrit :
> > When CONFIG_PPC_BARRIER_NOSPEC=n, security.c is not built leading to a
> > missing definition of uaccess_flush_key.
> >
> >LD  vmlinux.o
> >MODPOST vmlinux.symvers
> >MODINFO modules.builtin.modinfo
> >GEN modules.builtin
> >LD  .tmp_vmlinux.kallsyms1
> > powerpc64le-linux-gnu-ld: arch/powerpc/kernel/align.o:(.toc+0x0): undefined 
> > reference to `uaccess_flush_key'
> > powerpc64le-linux-gnu-ld: arch/powerpc/kernel/signal_64.o:(.toc+0x0): 
> > undefined reference to `uaccess_flush_key'
> > powerpc64le-linux-gnu-ld: arch/powerpc/kernel/process.o:(.toc+0x0): 
> > undefined reference to `uaccess_flush_key'
> > powerpc64le-linux-gnu-ld: arch/powerpc/kernel/traps.o:(.toc+0x0): undefined 
> > reference to `uaccess_flush_key'
> > powerpc64le-linux-gnu-ld: 
> > arch/powerpc/kernel/hw_breakpoint_constraints.o:(.toc+0x0): undefined 
> > reference to `uaccess_flush_key'
> > powerpc64le-linux-gnu-ld: arch/powerpc/kernel/ptrace/ptrace.o:(.toc+0x0): 
> > more undefined references to `uaccess_flush_key' follow
> > make[1]: *** [Makefile:1176: vmlinux] Error 1
> >
> > Hack one in to fix the build.
> >
> > Signed-off-by: Joel Stanley 
> > ---
> >   arch/powerpc/include/asm/security_features.h | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/security_features.h 
> > b/arch/powerpc/include/asm/security_features.h
> > index 792eefaf230b..46ade7927a4c 100644
> > --- a/arch/powerpc/include/asm/security_features.h
> > +++ b/arch/powerpc/include/asm/security_features.h
> > @@ -39,6 +39,9 @@ static inline bool security_ftr_enabled(u64 feature)
> >   return !!(powerpc_security_features & feature);
> >   }
> >
> > +#ifndef CONFIG_PPC_BARRIER_NOSPEC
> > +DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
> > +#endif
>
> It will then be re-defined by each file that includes asm/security_features.h 
> 
>
> You can't use a DEFINE_ in a .h
>
> You have to fix the problem at its source.
>
> Cleanest way I see it to modify asm/book3s/64/kup.h and something like
>
> if (IS_ENABLED(CONFIG_PPC_BARRIER_NOSPEC) && 
> static_branch_unlikely(_flush_key)

This won't work either as there's no declaration for uaccess_flush_key:

arch/powerpc/include/asm/book3s/64/kup.h:411:78: error:
‘uaccess_flush_key’ undeclared (first use in this function)

We could add an extern for it, but that is distasteful as the static
key API suggests using the structs directly is deprecated, and the
macros we're supposed to use perform initialisation.

Could we assume microwatt-like platforms will not gain pkeys support,
and have a stubbed out set of prevent/restore_user_access for systems
that turn off either pkeys or BARRIER_NOSPEC?

Or do we get rid of PPC_BARRIER_NOSPEC as an option, and have machines
rely on disabling it at runtime?


Re: [PATCH 3/7] powerpc: replace cc-option-yn uses with cc-option

2021-08-16 Thread Michael Ellerman
Nick Desaulniers  writes:
> cc-option-yn can be replaced with cc-option. ie.
> Checking for support:
> ifeq ($(call cc-option-yn,$(FLAG)),y)
> becomes:
> ifneq ($(call cc-option,$(FLAG)),)
>
> Checking for lack of support:
> ifeq ($(call cc-option-yn,$(FLAG)),n)
> becomes:
> ifeq ($(call cc-option,$(FLAG)),)
>
> This allows us to pursue removing cc-option-yn.
>
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Nick Desaulniers 
> ---
>  arch/powerpc/Makefile  | 12 ++--
>  arch/powerpc/boot/Makefile |  5 +
>  2 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 9aaf1abbc641..85e224536cf7 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -12,12 +12,12 @@
>  # Rewritten by Cort Dougan and Paul Mackerras
>  #
>  
> -HAS_BIARCH   := $(call cc-option-yn, -m32)
> +HAS_BIARCH   := $(call cc-option,-m32)
>  
>  # Set default 32 bits cross compilers for vdso and boot wrapper
>  CROSS32_COMPILE ?=
>  
> -ifeq ($(HAS_BIARCH),y)
> +ifeq ($(HAS_BIARCH),-m32)

I don't love that we have to repeat "-m32" in each check.

I'm pretty sure you can use ifdef here, because HAS_BIARCH is a simple
variable (assigned with ":=").

ie, this can be:

  ifdef HAS_BIARCH


And that avoids having to spell out "-m32" everywhere.

cheers


Re: [PATCH] powerpc/xive: Do not mark xive_request_ipi() as __init

2021-08-16 Thread Nick Desaulniers
On Mon, Aug 16, 2021 at 12:06 PM Nathan Chancellor  wrote:
>
> Compiling ppc64le_defconfig with clang-14 shows a modpost warning:
>
> WARNING: modpost: vmlinux.o(.text+0xa74e0): Section mismatch in
> reference from the function xive_setup_cpu_ipi() to the function
> .init.text:xive_request_ipi()
> The function xive_setup_cpu_ipi() references
> the function __init xive_request_ipi().
> This is often because xive_setup_cpu_ipi lacks a __init
> annotation or the annotation of xive_request_ipi is wrong.
>
> xive_request_ipi() is called from xive_setup_cpu_ipi(), which is not
> __init, so xive_request_ipi() should not be marked __init. Remove the
> attribute so there is no more warning.
>
> Fixes: cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when creating 
> the IPIs")
> Signed-off-by: Nathan Chancellor 

Thanks for the patch!
Reviewed-by: Nick Desaulniers 

> ---
>  arch/powerpc/sysdev/xive/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/sysdev/xive/common.c 
> b/arch/powerpc/sysdev/xive/common.c
> index 943fd30095af..8183ca343675 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -1170,7 +1170,7 @@ static int __init xive_init_ipis(void)
> return ret;
>  }
>
> -static int __init xive_request_ipi(unsigned int cpu)
> +static int xive_request_ipi(unsigned int cpu)
>  {
> struct xive_ipi_desc *xid = _ipis[early_cpu_to_node(cpu)];
> int ret;
>
> base-commit: a2824f19e6065a0d3735acd9fe7155b104e7edf5
> --
> 2.33.0.rc2
>


-- 
Thanks,
~Nick Desaulniers


Re: [PATCH 3/7] powerpc: replace cc-option-yn uses with cc-option

2021-08-16 Thread Nathan Chancellor




On 8/16/2021 5:21 PM, 'Nick Desaulniers' via Clang Built Linux wrote:

cc-option-yn can be replaced with cc-option. ie.
Checking for support:
ifeq ($(call cc-option-yn,$(FLAG)),y)
becomes:
ifneq ($(call cc-option,$(FLAG)),)

Checking for lack of support:
ifeq ($(call cc-option-yn,$(FLAG)),n)
becomes:
ifeq ($(call cc-option,$(FLAG)),)

This allows us to pursue removing cc-option-yn.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nick Desaulniers 
---
  arch/powerpc/Makefile  | 12 ++--
  arch/powerpc/boot/Makefile |  5 +
  2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 9aaf1abbc641..85e224536cf7 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -12,12 +12,12 @@
  # Rewritten by Cort Dougan and Paul Mackerras
  #
  
-HAS_BIARCH	:= $(call cc-option-yn, -m32)

+HAS_BIARCH := $(call cc-option,-m32)
  
  # Set default 32 bits cross compilers for vdso and boot wrapper

  CROSS32_COMPILE ?=
  
-ifeq ($(HAS_BIARCH),y)

+ifeq ($(HAS_BIARCH),-m32)
  ifeq ($(CROSS32_COMPILE),)
  ifdef CONFIG_PPC32
  # These options will be overridden by any -mcpu option that the CPU
@@ -107,7 +107,7 @@ cflags-$(CONFIG_CPU_LITTLE_ENDIAN)  += -mlittle-endian
  aflags-$(CONFIG_CPU_BIG_ENDIAN)   += $(call 
cc-option,-mbig-endian)
  aflags-$(CONFIG_CPU_LITTLE_ENDIAN)+= -mlittle-endian
  
-ifeq ($(HAS_BIARCH),y)

+ifeq ($(HAS_BIARCH),-m32)
  KBUILD_CFLAGS += -m$(BITS)
  KBUILD_AFLAGS += -m$(BITS) -Wl,-a$(BITS)
  KBUILD_LDFLAGS+= -m elf$(BITS)$(LDEMULATION)
@@ -125,7 +125,9 @@ LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
  LDFLAGS_vmlinux   := $(LDFLAGS_vmlinux-y)
  
  ifdef CONFIG_PPC64

-ifeq ($(call cc-option-yn,-mcmodel=medium),y)
+ifeq ($(call cc-option,-mcmodel=medium),)
+   export NO_MINIMAL_TOC := -mno-minimal-toc
+else
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
@@ -135,8 +137,6 @@ ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.
KBUILD_CFLAGS_MODULE += -mcmodel=large
-else
-   export NO_MINIMAL_TOC := -mno-minimal-toc
  endif
  endif
  
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile

index 10c0fb306f15..33e1de5d1c95 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -66,10 +66,7 @@ ifdef CONFIG_DEBUG_INFO
  BOOTCFLAGS+= -g
  endif
  
-ifeq ($(call cc-option-yn, -fstack-protector),y)

-BOOTCFLAGS += -fno-stack-protector
-endif
-
+BOOTCFLAGS += $(call cc-option,-fstack-protector)


This was previously disabling the stack protector but now it is enabling 
it. Just remove the ifeq conditional altogether as the kernel assumes 
-fno-stack-protector is always supported after commit 893ab00439a4 
("kbuild: remove cc-option test of -fno-stack-protector").



  BOOTCFLAGS+= -I$(objtree)/$(obj) -I$(srctree)/$(obj)
  
  DTC_FLAGS	?= -p 1024




Re: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes

2021-08-16 Thread Valentin Schneider
On 16/08/21 16:03, Srikar Dronamraju wrote:
>>
>> Your version is much much better than mine.
>> And I have verified that it works as expected.
>>
>>
>
> Hey Peter/Valentin
>
> Are we waiting for any more feedback/testing for this?
>

I'm not overly fond of that last one, but AFAICT the only alternative is
doing a full-fledged NUMA topology rebuild on new-node onlining (i.e. make
calling sched_init_numa() more than once work). It's a lot more work for a
very particular usecase.

>
> --
> Thanks and Regards
> Srikar Dronamraju


[PATCH] powerpc/xive: Do not mark xive_request_ipi() as __init

2021-08-16 Thread Nathan Chancellor
Compiling ppc64le_defconfig with clang-14 shows a modpost warning:

WARNING: modpost: vmlinux.o(.text+0xa74e0): Section mismatch in
reference from the function xive_setup_cpu_ipi() to the function
.init.text:xive_request_ipi()
The function xive_setup_cpu_ipi() references
the function __init xive_request_ipi().
This is often because xive_setup_cpu_ipi lacks a __init
annotation or the annotation of xive_request_ipi is wrong.

xive_request_ipi() is called from xive_setup_cpu_ipi(), which is not
__init, so xive_request_ipi() should not be marked __init. Remove the
attribute so there is no more warning.

Fixes: cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when creating 
the IPIs")
Signed-off-by: Nathan Chancellor 
---
 arch/powerpc/sysdev/xive/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
index 943fd30095af..8183ca343675 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1170,7 +1170,7 @@ static int __init xive_init_ipis(void)
return ret;
 }
 
-static int __init xive_request_ipi(unsigned int cpu)
+static int xive_request_ipi(unsigned int cpu)
 {
struct xive_ipi_desc *xid = _ipis[early_cpu_to_node(cpu)];
int ret;

base-commit: a2824f19e6065a0d3735acd9fe7155b104e7edf5
-- 
2.33.0.rc2



Re: [PATCH v1 2/4] powerpc/64s/perf: add power_pmu_running to query whether perf is being used

2021-08-16 Thread kernel test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.14-rc6 next-20210816]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-interrupt-speedups/20210816-153424
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-wii_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/98b8a35aa718c93ace9df66df6274fe392633f80
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Nicholas-Piggin/powerpc-64s-interrupt-speedups/20210816-153424
git checkout 98b8a35aa718c93ace9df66df6274fe392633f80
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/powerpc/perf/core-book3s.c:2392:6: error: no previous prototype for 
>> 'power_pmu_running' [-Werror=missing-prototypes]
2392 | bool power_pmu_running(void)
 |  ^
   cc1: all warnings being treated as errors


vim +/power_pmu_running +2392 arch/powerpc/perf/core-book3s.c

  2391  
> 2392  bool power_pmu_running(void)
  2393  {
  2394  struct cpu_hw_events *cpuhw;
  2395  
  2396  /* Could this simply test local_paca->pmcregs_in_use? */
  2397  
  2398  if (!ppmu)
  2399  return false;
  2400  
  2401  cpuhw = this_cpu_ptr(_hw_events);
  2402  return cpuhw->n_events;
  2403  }
  2404  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH] powerpc/pseries/cpuhp: fix non-NUMA build

2021-08-16 Thread Nathan Lynch
Christophe Leroy  writes:

> Le 16/08/2021 à 17:17, Nathan Lynch a écrit :
>> With CONFIG_NUMA unset, direct references to node_to_cpumask_map don't
>> work:
>> 
>> arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 
>> 'pseries_cpu_hotplug_init':
 arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 
 'node_to_cpumask_map' undeclared (first use in this
 function)
>>  1022 |node_to_cpumask_map[node]);
>>   |^~~
>> 
>> Use cpumask_of_node() here instead.
>
>
> Isn't it the same as 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210816041032.2839343-1-...@ellerman.id.au/
>  ?

Yes, thanks.


Re: [PATCH] powerpc/pseries/cpuhp: fix non-NUMA build

2021-08-16 Thread Christophe Leroy

Le 16/08/2021 à 17:17, Nathan Lynch a écrit :

With CONFIG_NUMA unset, direct references to node_to_cpumask_map don't
work:

arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 
'pseries_cpu_hotplug_init':

arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 
'node_to_cpumask_map' undeclared (first use in this
function)

 1022 |node_to_cpumask_map[node]);
 |^~~

Use cpumask_of_node() here instead.



Isn't it the same as 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210816041032.2839343-1-...@ellerman.id.au/ ?


Christophe



Fixes: bd1dd4c5f528 ("powerpc/pseries: Prevent free CPU ids being reused on another 
node")
Reported-by: kernel test robot 
Signed-off-by: Nathan Lynch 
---
  arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1ef40ef699a6..d646c22e94ab 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -1021,7 +1021,7 @@ static int __init pseries_cpu_hotplug_init(void)
/* Record ids of CPU added at boot time */
cpumask_or(node_recorded_ids_map[node],
   node_recorded_ids_map[node],
-  node_to_cpumask_map[node]);
+  cpumask_of_node(node));
}
  
  		of_reconfig_notifier_register(_smp_nb);




[PATCH] powerpc/pseries/cpuhp: fix non-NUMA build

2021-08-16 Thread Nathan Lynch
With CONFIG_NUMA unset, direct references to node_to_cpumask_map don't
work:

   arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 
'pseries_cpu_hotplug_init':
>> arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 
>> 'node_to_cpumask_map' undeclared (first use in this
>> function)
1022 |node_to_cpumask_map[node]);
 |^~~

Use cpumask_of_node() here instead.

Fixes: bd1dd4c5f528 ("powerpc/pseries: Prevent free CPU ids being reused on 
another node")
Reported-by: kernel test robot 
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1ef40ef699a6..d646c22e94ab 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -1021,7 +1021,7 @@ static int __init pseries_cpu_hotplug_init(void)
/* Record ids of CPU added at boot time */
cpumask_or(node_recorded_ids_map[node],
   node_recorded_ids_map[node],
-  node_to_cpumask_map[node]);
+  cpumask_of_node(node));
}
 
of_reconfig_notifier_register(_smp_nb);
-- 
2.31.1



Re: [FSL P50x0] lscpu reports wrong values since the RC1 of kernel 5.13

2021-08-16 Thread Christian Zigotzky

Hi All,

I tested the RC6 of kernel 5.14 today and unfortunately the issue still 
exists. We have figured out that only P5040 SoCs are affected. [1]

P5020 SoCs display the correct values.
Please check the CPU changes in the PowerPC updates 5.13-1 and 5.13-2.

Thanks,
Christian

[1] https://forum.hyperion-entertainment.com/viewtopic.php?p=53775#p53775


On 09 August 2021 um 02:37 pm, Christian Zigotzky wrote:

Hi All,

Lscpu reports wrong values [1] since the RC1 of kernel 5.13 on my FSL 
P5040 Cyrus+ board (A-EON AmigaOne X5000). [2]

The differences are:

Since the RC1 of kernel 5.13 (wrong values):

Core(s) per socket: 1
Socket(s): 3

Before (correct values):

Core(s) per socket: 4
Socket(s): 1

Through the wrong values, I can't use "-smp 4" with a virtual e5500 
QEMU machine with KVM HV anymore.  [3]

"-smp 3" works with KVM HV.

Maybe the file_load_64 commit from the PowerPC updates 5.13-2 is the 
problem ( powerpc/kexec_file: Use current CPU info while setting up 
FDT). [4]


Or maybe this change (PowerPC updates 5.13-1):

-#ifdef CONFIG_PPC_BOOK3E_64
-    state->ctx_state = exception_enter();
-    if (user_mode(regs))
-    account_cpu_user_entry();
-#endif

---

Or maybe this change (PowerPC updates 5.13-1):

diff --git a/arch/powerpc/include/asm/smp.h 
b/arch/powerpc/include/asm/smp.h

index 7a13bc20f0a0c..03b3d010cbab6 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -31,6 +31,7 @@ extern u32 *cpu_to_phys_id;
 extern bool coregroup_enabled;

 extern int cpu_to_chip_id(int cpu);
+extern int *chip_id_lookup_table;

 #ifdef CONFIG_SMP

@@ -121,6 +122,11 @@ static inline struct cpumask 
*cpu_sibling_mask(int cpu)

 return per_cpu(cpu_sibling_map, cpu);
 }

+static inline struct cpumask *cpu_core_mask(int cpu)
+{
+    return per_cpu(cpu_core_map, cpu);
+}
+
 static inline struct cpumask *cpu_l2_cache_mask(int cpu)
 {
 return per_cpu(cpu_l2_cache_map, cpu);

---

I have found a lot of other changes in the PowerPC updates 5.13-1 
regarding the CPU.


Could you please check the CPU changes in the PowerPC updates 5.13-1 
and 5.13-2?


Please find attached the kernel 5.14-rc5 config.

Thanks,
Christian


[1]

lscpu with the correct values before the RC1 of kernel 5.13:

Architecture: ppc64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Model: 1.2 (pvr 8024 0012)
Model name: e5500
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 2 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Branch predictor state flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

---

lscpu with the wrong values since the RC1 of kernel 5.13:

Architecture: ppc64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 3
Model: 1.2 (pvr 8024 0012)
Model name: e5500
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 2 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Branch predictor state flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

---

[2] http://wiki.amiga.org/index.php?title=X5000

[3] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-May/229103.html

[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/arch/powerpc/kexec/file_load_64.c?id=ab159ac569fddf812c0a217d6dbffaa5d93ef88f




Re: [PATCH linux-next] module: remove duplicate include in interrupt.c

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 13:34, cgel@gmail.com a écrit :

From: Lv Ruyi 

'asm/interrupt.h' included in 'interrupt.c' is duplicated.


This patch has been submitted at least half a dozen of times already.

You should maintain alphabetic order in the include list.

But please don't post it again, we have it in the pipe already, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/1624329437-84730-1-git-send-email-jiapeng.ch...@linux.alibaba.com/


Next time please check at https://patchwork.ozlabs.org/project/linuxppc-dev/list/ before submitting 
a new patch.


Thanks
Christophe



Reported-by: Zeal Robot 
Signed-off-by: Lv Ruyi 
---
  arch/powerpc/kernel/interrupt.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 21bbd615ca41..8a936515e4e4 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -10,7 +10,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 



[PATCH linux-next] module: remove duplicate include in interrupt.c

2021-08-16 Thread cgel . zte
From: Lv Ruyi 

'asm/interrupt.h' included in 'interrupt.c' is duplicated.

Reported-by: Zeal Robot 
Signed-off-by: Lv Ruyi 
---
 arch/powerpc/kernel/interrupt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 21bbd615ca41..8a936515e4e4 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.25.1



Re: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes

2021-08-16 Thread Srikar Dronamraju
> 
> Your version is much much better than mine.
> And I have verified that it works as expected.
> 
> 

Hey Peter/Valentin

Are we waiting for any more feedback/testing for this?


-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 0/2] powerpc: mpc855_ads defconfig fixes

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 10:31, Joel Stanley a écrit :

The first was a build warning I noticed when testing something
unrelated.

I took a moment to look into it, and came up with the second patch which
updates the defconfig to make it easier to maintain in the future

It also fixes a regression where the MTD partition support dropped out
of the config. Given noone noticed the regression since v4.20 was
released, perhaps it could be left disabled?


Most likely nobody is using this board anymore. But that's a good to have it to perform CI builds. 
So we should leave that kind of config.





Joel Stanley (2):
   powerpc/config: Fix IPV6 warning in mpc855_ads
   powerpc/configs: Regenerate mpc885_ads_defconfig

  arch/powerpc/configs/mpc885_ads_defconfig | 49 +++
  1 file changed, 23 insertions(+), 26 deletions(-)



Re: [PATCH 2/2] powerpc/configs: Regenerate mpc885_ads_defconfig

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 10:31, Joel Stanley a écrit :

Regenrate atop v5.14-rc6.


Typos.

You mean you did redo a "make savedefconfig" ?



The chagnes are mostly re-ordering, except for the following which fall
out due to dependenacies:

  - CONFIG_DEBUG_KERNEL=y selected by EXPERT

  - CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008 which is the default
setting

CONFIG_MTD_PHYSMAP_OF is not longer enabled, as it depends on
MTD_PHYSMAP which is not enabled. This is a regression from commit
642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c"),
which added the extra dependency. Add CONFIG_MTD_PHYSMAP=y so this stays
in the config.

Signed-off-by: Joel Stanley 
---
  arch/powerpc/configs/mpc885_ads_defconfig | 47 +++
  1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index 5cd17adf903f..c74dc76b1d0d 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -1,19 +1,30 @@
-CONFIG_PPC_8xx=y
  # CONFIG_SWAP is not set
  CONFIG_SYSVIPC=y
  CONFIG_NO_HZ=y
  CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_JIT=y
+CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
  CONFIG_LOG_BUF_SHIFT=14
  CONFIG_EXPERT=y
  # CONFIG_ELF_CORE is not set
  # CONFIG_BASE_FULL is not set
  # CONFIG_FUTEX is not set
+CONFIG_PERF_EVENTS=y
  # CONFIG_VM_EVENT_COUNTERS is not set
-# CONFIG_BLK_DEV_BSG is not set
-CONFIG_PARTITION_ADVANCED=y
+CONFIG_PPC_8xx=y
+CONFIG_8xx_GPIO=y
+CONFIG_SMC_UCODE_PATCH=y
+CONFIG_PIN_TLB=y
  CONFIG_GEN_RTC=y
  CONFIG_HZ_100=y
+CONFIG_MATH_EMULATION=y
+CONFIG_PPC_16K_PAGES=y
+CONFIG_ADVANCED_OPTIONS=y
  # CONFIG_SECCOMP is not set
+CONFIG_STRICT_KERNEL_RWX=y
+CONFIG_MODULES=y
+# CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
  CONFIG_NET=y
  CONFIG_PACKET=y
  CONFIG_UNIX=y
@@ -33,6 +44,7 @@ CONFIG_MTD_CFI_GEOMETRY=y
  # CONFIG_MTD_CFI_I2 is not set
  CONFIG_MTD_CFI_I4=y
  CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_PHYSMAP=y
  CONFIG_MTD_PHYSMAP_OF=y
  # CONFIG_BLK_DEV is not set
  CONFIG_NETDEVICES=y
@@ -45,38 +57,25 @@ CONFIG_DAVICOM_PHY=y
  # CONFIG_LEGACY_PTYS is not set
  CONFIG_SERIAL_CPM=y
  CONFIG_SERIAL_CPM_CONSOLE=y
+CONFIG_SPI=y
+CONFIG_SPI_FSL_SPI=y
  # CONFIG_HWMON is not set
+CONFIG_WATCHDOG=y
+CONFIG_8xxx_WDT=y
  # CONFIG_USB_SUPPORT is not set
  # CONFIG_DNOTIFY is not set
  CONFIG_TMPFS=y
  CONFIG_CRAMFS=y
  CONFIG_NFS_FS=y
  CONFIG_ROOT_NFS=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_DEV_TALITOS=y
  CONFIG_CRC32_SLICEBY4=y
  CONFIG_DEBUG_INFO=y
  CONFIG_MAGIC_SYSRQ=y
-CONFIG_DETECT_HUNG_TASK=y
-CONFIG_PPC_16K_PAGES=y
-CONFIG_DEBUG_KERNEL=y
  CONFIG_DEBUG_FS=y
-CONFIG_PPC_PTDUMP=y
-CONFIG_MODULES=y
-CONFIG_SPI=y
-CONFIG_SPI_FSL_SPI=y
-CONFIG_CRYPTO=y
-CONFIG_CRYPTO_DEV_TALITOS=y
-CONFIG_8xx_GPIO=y
-CONFIG_WATCHDOG=y
-CONFIG_8xxx_WDT=y
-CONFIG_SMC_UCODE_PATCH=y
-CONFIG_ADVANCED_OPTIONS=y
-CONFIG_PIN_TLB=y
-CONFIG_PERF_EVENTS=y
-CONFIG_MATH_EMULATION=y
-CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
-CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_BPF_JIT=y
  CONFIG_DEBUG_VM_PGTABLE=y
+CONFIG_DETECT_HUNG_TASK=y
  CONFIG_BDI_SWITCH=y
  CONFIG_PPC_EARLY_DEBUG=y
-CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008
+CONFIG_PPC_PTDUMP=y



Re: [PATCH 1/2] powerpc/config: Fix IPV6 warning in mpc855_ads

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 10:31, Joel Stanley a écrit :

When building this config there's a warning:

   79:warning: override: reassigning to symbol IPV6

Commit 9a1762a4a4ff ("powerpc/8xx: Update mpc885_ads_defconfig to
improve CI") added CONFIG_IPV6=y, but left '# CONFIG_IPV6 is not set'
in.

IPV6 is default y, so remove both to clean up the build.

Signed-off-by: Joel Stanley 


Acked-by: Christophe Leroy 


---
  arch/powerpc/configs/mpc885_ads_defconfig | 2 --
  1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index d21f266cea9a..5cd17adf903f 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -21,7 +21,6 @@ CONFIG_INET=y
  CONFIG_IP_MULTICAST=y
  CONFIG_IP_PNP=y
  CONFIG_SYN_COOKIES=y
-# CONFIG_IPV6 is not set
  # CONFIG_FW_LOADER is not set
  CONFIG_MTD=y
  CONFIG_MTD_BLOCK=y
@@ -76,7 +75,6 @@ CONFIG_PERF_EVENTS=y
  CONFIG_MATH_EMULATION=y
  CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
  CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_IPV6=y
  CONFIG_BPF_JIT=y
  CONFIG_DEBUG_VM_PGTABLE=y
  CONFIG_BDI_SWITCH=y



Re: [PATCH 3/3] powerpc/microwatt: CPU doesn't (yet) have speculation bugs

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 10:24, Joel Stanley a écrit :

Signed-off-by: Joel Stanley 
---
  arch/powerpc/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 663766fbf505..d5af6667c206 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -279,6 +279,7 @@ config PPC_BARRIER_NOSPEC
bool
default y
depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
+   depends on !PPC_MICROWATT


Not sure it is a good idea to disable it completely when PPC_MICROWATT is selected. Don't we want to 
be able to build generic kernels that can run on any book3s/64 ?


Maybe you should change the default instead, something like:

bool
default y if !PPC_MICROWATT
depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E

  
  config EARLY_PRINTK

bool



Re: [PATCH v2 00/60] KVM: PPC: Book3S HV P9: entry/exit optimisations

2021-08-16 Thread Athira Rajeev



> On 11-Aug-2021, at 9:30 PM, Nicholas Piggin  wrote:
> 
> This reduces radix guest full entry/exit latency on POWER9 and POWER10
> by 2x.
> 
> Nested HV guests should see smaller improvements in their L1 entry/exit,
> but this is also combined with most L0 speedups also applying to nested
> entry. nginx localhost throughput test in a SMP nested guest is improved
> about 10% (in a direct guest it doesn't change much because it uses XIVE
> for IPIs) when L0 and L1 are patched.
> 
> It does this in several main ways:
> 
> - Rearrange code to optimise SPR accesses. Mainly, avoid scoreboard
>  stalls.
> 
> - Test SPR values to avoid mtSPRs where possible. mtSPRs are expensive.
> 
> - Reduce mftb. mftb is expensive.
> 
> - Demand fault certain facilities to avoid saving and/or restoring them
>  (at the cost of fault when they are used, but this is mitigated over
>  a number of entries, like the facilities when context switching 
>  processes). PM, TM, and EBB so far.
> 
> - Defer some sequences that are made just in case a guest is interrupted
>  in the middle of a critical section to the case where the guest is
>  scheduled on a different CPU, rather than every time (at the cost of
>  an extra IPI in this case). Namely the tlbsync sequence for radix with
>  GTSE, which is very expensive.
> 
> - Reduce locking, barriers, atomics related to the vcpus-per-vcore > 1
>  handling that the P9 path does not require.
> 
> Changes since v1:
> - Verified DPDES changes still work with msgsndp SMT emulation.
> - Fixed HMI handling bug.
> - Split softpatch handling fixes into smaller pieces.
> - Rebased with Fabiano's latest HV sanitising patches.
> - Fix TM demand faulting bug causing nested guest TM tests to TM Bad
>  Thing the host in rare cases.
> - Re-name new "pmu=" command line option to "pmu_override=" and update
>  documentation wording.

Hi Nick,

For the PMU related changes,

Reviewed-by: Athira Rajeev 

Thanks
Athira
> - Add default=y config option rather than unconditionally removing the
>  L0 nested PMU workaround.
> - Remove unnecessary MSR[RI] updates in entry/exit. Down to about 4700
>  cycles now.
> - Another bugfix from Alexey's testing.
> 
> Changes since RFC:
> - Rebased with Fabiano's HV sanitising patches at the front.
> - Several demand faulting bug fixes mostly relating to nested guests.
> - Removed facility demand-faulting from L0 nested entry/exit handler.
>  Demand faulting is still done in the L1, but not the L0. The reason
>  is to reduce complexity (although it's only a small amount of
>  complexity), reduce demand faulting overhead that may require several
> 
> Fabiano Rosas (3):
>  KVM: PPC: Book3S HV Nested: Sanitise vcpu registers
>  KVM: PPC: Book3S HV Nested: Stop forwarding all HFUs to L1
>  KVM: PPC: Book3S HV Nested: save_hv_return_state does not require trap
>argument
> 
> Nicholas Piggin (57):
>  KVM: PPC: Book3S HV: Initialise vcpu MSR with MSR_ME
>  KVM: PPC: Book3S HV: Remove TM emulation from POWER7/8 path
>  KVM: PPC: Book3S HV P9: Fixes for TM softpatch interrupt NIP
>  KVM: PPC: Book3S HV Nested: Fix TM softpatch HFAC interrupt emulation
>  KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible
>  KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest
>SPRs are live
>  powerpc/64s: Remove WORT SPR from POWER9/10
>  KMV: PPC: Book3S HV P9: Use set_dec to set decrementer to host
>  KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer
>read
>  KVM: PPC: Book3S HV P9: Use large decrementer for HDEC
>  KVM: PPC: Book3S HV P9: Reduce mftb per guest entry/exit
>  powerpc/time: add API for KVM to re-arm the host timer/decrementer
>  KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests
>  powerpc/64s: Keep AMOR SPR a constant ~0 at runtime
>  KVM: PPC: Book3S HV: Don't always save PMU for guest capable of
>nesting
>  powerpc/64s: Always set PMU control registers to frozen/disabled when
>not in use
>  powerpc/64s: Implement PMU override command line option
>  KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
>  KVM: PPC: Book3S HV P9: Factor PMU save/load into context switch
>functions
>  KVM: PPC: Book3S HV P9: Demand fault PMU SPRs when marked not inuse
>  KVM: PPC: Book3S HV P9: Factor out yield_count increment
>  KVM: PPC: Book3S HV: CTRL SPR does not require read-modify-write
>  KVM: PPC: Book3S HV P9: Move SPRG restore to restore_p9_host_os_sprs
>  KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save
>host SPRs
>  KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE]
>disable
>  KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match
>kvmppc_start_thread
>  KVM: PPC: Book3S HV: Change dec_expires to be relative to guest
>timebase
>  KVM: PPC: Book3S HV P9: Move TB updates
>  KVM: PPC: Book3S HV P9: Optimise timebase reads
>  KVM: PPC: Book3S HV P9: Avoid SPR scoreboard stalls
>  KVM: PPC: Book3S HV P9: Only execute mtSPR if the value changed
>  

Re: [PATCH 2/3] powerpc: Fix undefined static key

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 10:24, Joel Stanley a écrit :

When CONFIG_PPC_BARRIER_NOSPEC=n, security.c is not built leading to a
missing definition of uaccess_flush_key.

   LD  vmlinux.o
   MODPOST vmlinux.symvers
   MODINFO modules.builtin.modinfo
   GEN modules.builtin
   LD  .tmp_vmlinux.kallsyms1
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/align.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/signal_64.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/process.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/traps.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: 
arch/powerpc/kernel/hw_breakpoint_constraints.o:(.toc+0x0): undefined reference 
to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/ptrace/ptrace.o:(.toc+0x0): more 
undefined references to `uaccess_flush_key' follow
make[1]: *** [Makefile:1176: vmlinux] Error 1

Hack one in to fix the build.

Signed-off-by: Joel Stanley 
---
  arch/powerpc/include/asm/security_features.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/security_features.h 
b/arch/powerpc/include/asm/security_features.h
index 792eefaf230b..46ade7927a4c 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,9 @@ static inline bool security_ftr_enabled(u64 feature)
return !!(powerpc_security_features & feature);
  }
  
+#ifndef CONFIG_PPC_BARRIER_NOSPEC

+DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
+#endif


It will then be re-defined by each file that includes asm/security_features.h 


You can't use a DEFINE_ in a .h

You have to fix the problem at its source.

Cleanest way I see it to modify asm/book3s/64/kup.h and something like

if (IS_ENABLED(CONFIG_PPC_BARRIER_NOSPEC) && 
static_branch_unlikely(_flush_key)



  
  // Features indicating support for Spectre/Meltdown mitigations
  



[PATCH 2/2] powerpc/configs: Regenerate mpc885_ads_defconfig

2021-08-16 Thread Joel Stanley
Regenrate atop v5.14-rc6.

The chagnes are mostly re-ordering, except for the following which fall
out due to dependenacies:

 - CONFIG_DEBUG_KERNEL=y selected by EXPERT

 - CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008 which is the default
   setting

CONFIG_MTD_PHYSMAP_OF is not longer enabled, as it depends on
MTD_PHYSMAP which is not enabled. This is a regression from commit
642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c"),
which added the extra dependency. Add CONFIG_MTD_PHYSMAP=y so this stays
in the config.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/mpc885_ads_defconfig | 47 +++
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index 5cd17adf903f..c74dc76b1d0d 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -1,19 +1,30 @@
-CONFIG_PPC_8xx=y
 # CONFIG_SWAP is not set
 CONFIG_SYSVIPC=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_JIT=y
+CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_EXPERT=y
 # CONFIG_ELF_CORE is not set
 # CONFIG_BASE_FULL is not set
 # CONFIG_FUTEX is not set
+CONFIG_PERF_EVENTS=y
 # CONFIG_VM_EVENT_COUNTERS is not set
-# CONFIG_BLK_DEV_BSG is not set
-CONFIG_PARTITION_ADVANCED=y
+CONFIG_PPC_8xx=y
+CONFIG_8xx_GPIO=y
+CONFIG_SMC_UCODE_PATCH=y
+CONFIG_PIN_TLB=y
 CONFIG_GEN_RTC=y
 CONFIG_HZ_100=y
+CONFIG_MATH_EMULATION=y
+CONFIG_PPC_16K_PAGES=y
+CONFIG_ADVANCED_OPTIONS=y
 # CONFIG_SECCOMP is not set
+CONFIG_STRICT_KERNEL_RWX=y
+CONFIG_MODULES=y
+# CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
@@ -33,6 +44,7 @@ CONFIG_MTD_CFI_GEOMETRY=y
 # CONFIG_MTD_CFI_I2 is not set
 CONFIG_MTD_CFI_I4=y
 CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_PHYSMAP=y
 CONFIG_MTD_PHYSMAP_OF=y
 # CONFIG_BLK_DEV is not set
 CONFIG_NETDEVICES=y
@@ -45,38 +57,25 @@ CONFIG_DAVICOM_PHY=y
 # CONFIG_LEGACY_PTYS is not set
 CONFIG_SERIAL_CPM=y
 CONFIG_SERIAL_CPM_CONSOLE=y
+CONFIG_SPI=y
+CONFIG_SPI_FSL_SPI=y
 # CONFIG_HWMON is not set
+CONFIG_WATCHDOG=y
+CONFIG_8xxx_WDT=y
 # CONFIG_USB_SUPPORT is not set
 # CONFIG_DNOTIFY is not set
 CONFIG_TMPFS=y
 CONFIG_CRAMFS=y
 CONFIG_NFS_FS=y
 CONFIG_ROOT_NFS=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_DEV_TALITOS=y
 CONFIG_CRC32_SLICEBY4=y
 CONFIG_DEBUG_INFO=y
 CONFIG_MAGIC_SYSRQ=y
-CONFIG_DETECT_HUNG_TASK=y
-CONFIG_PPC_16K_PAGES=y
-CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_FS=y
-CONFIG_PPC_PTDUMP=y
-CONFIG_MODULES=y
-CONFIG_SPI=y
-CONFIG_SPI_FSL_SPI=y
-CONFIG_CRYPTO=y
-CONFIG_CRYPTO_DEV_TALITOS=y
-CONFIG_8xx_GPIO=y
-CONFIG_WATCHDOG=y
-CONFIG_8xxx_WDT=y
-CONFIG_SMC_UCODE_PATCH=y
-CONFIG_ADVANCED_OPTIONS=y
-CONFIG_PIN_TLB=y
-CONFIG_PERF_EVENTS=y
-CONFIG_MATH_EMULATION=y
-CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
-CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_BPF_JIT=y
 CONFIG_DEBUG_VM_PGTABLE=y
+CONFIG_DETECT_HUNG_TASK=y
 CONFIG_BDI_SWITCH=y
 CONFIG_PPC_EARLY_DEBUG=y
-CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008
+CONFIG_PPC_PTDUMP=y
-- 
2.32.0



[PATCH 1/2] powerpc/config: Fix IPV6 warning in mpc855_ads

2021-08-16 Thread Joel Stanley
When building this config there's a warning:

  79:warning: override: reassigning to symbol IPV6

Commit 9a1762a4a4ff ("powerpc/8xx: Update mpc885_ads_defconfig to
improve CI") added CONFIG_IPV6=y, but left '# CONFIG_IPV6 is not set'
in.

IPV6 is default y, so remove both to clean up the build.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/mpc885_ads_defconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index d21f266cea9a..5cd17adf903f 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -21,7 +21,6 @@ CONFIG_INET=y
 CONFIG_IP_MULTICAST=y
 CONFIG_IP_PNP=y
 CONFIG_SYN_COOKIES=y
-# CONFIG_IPV6 is not set
 # CONFIG_FW_LOADER is not set
 CONFIG_MTD=y
 CONFIG_MTD_BLOCK=y
@@ -76,7 +75,6 @@ CONFIG_PERF_EVENTS=y
 CONFIG_MATH_EMULATION=y
 CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
 CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_IPV6=y
 CONFIG_BPF_JIT=y
 CONFIG_DEBUG_VM_PGTABLE=y
 CONFIG_BDI_SWITCH=y
-- 
2.32.0



Re: [PATCH 1/3] powerpc/64s: Fix build when !PPC_BARRIER_NOSPEC

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 10:24, Joel Stanley a écrit :

When disabling PPC_BARRIER_NOSPEC the do_barrier_nospec_fixups_range
definition is still included, as well as a stub in asm/setup.h:

../arch/powerpc/lib/feature-fixups.c:502:6: error: redefinition of 
‘do_barrier_nospec_fixu>
   502 | void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void 
*fixup_en>
   |  ^~
In file included from ../arch/powerpc/lib/feature-fixups.c:23:
../arch/powerpc/include/asm/setup.h:70:20: note: previous definition of 
‘do_barrier_nospec>
70 | static inline void do_barrier_nospec_fixups_range(bool enable, void 
*start, void *>
   |^~

I assume the intent was to put the just do_barrier_nospec_fixups
behind PPC_BARRIER_NOSPEC and let the compiler drop _range when there
are no users. (There is a caller in module.c, but this is behind
PPC_BARRIER_NOSPEC).


The compiler won't drop do_barrier_nospec_fixups_range() because it is not 
static.



This makes PPC_BOOK3S_64 match how the PPC_FSL_BOOK3E build works.

Signed-off-by: Joel Stanley 
---
  arch/powerpc/include/asm/setup.h | 2 --
  1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 6c1a7d217d1a..71012284c044 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -66,8 +66,6 @@ extern bool barrier_nospec_enabled;
  
  #ifdef CONFIG_PPC_BARRIER_NOSPEC

  void do_barrier_nospec_fixups_range(bool enable, void *start, void *end);
-#else
-static inline void do_barrier_nospec_fixups_range(bool enable, void *start, 
void *end) { }
  #endif
  
  #ifdef CONFIG_PPC_FSL_BOOK3E




[PATCH 0/2] powerpc: mpc855_ads defconfig fixes

2021-08-16 Thread Joel Stanley
The first was a build warning I noticed when testing something
unrelated.

I took a moment to look into it, and came up with the second patch which
updates the defconfig to make it easier to maintain in the future

It also fixes a regression where the MTD partition support dropped out
of the config. Given noone noticed the regression since v4.20 was
released, perhaps it could be left disabled?

Joel Stanley (2):
  powerpc/config: Fix IPV6 warning in mpc855_ads
  powerpc/configs: Regenerate mpc885_ads_defconfig

 arch/powerpc/configs/mpc885_ads_defconfig | 49 +++
 1 file changed, 23 insertions(+), 26 deletions(-)

-- 
2.32.0



[PATCH 3/3] powerpc/microwatt: CPU doesn't (yet) have speculation bugs

2021-08-16 Thread Joel Stanley
Signed-off-by: Joel Stanley 
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 663766fbf505..d5af6667c206 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -279,6 +279,7 @@ config PPC_BARRIER_NOSPEC
bool
default y
depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
+   depends on !PPC_MICROWATT
 
 config EARLY_PRINTK
bool
-- 
2.32.0



[PATCH 2/3] powerpc: Fix undefined static key

2021-08-16 Thread Joel Stanley
When CONFIG_PPC_BARRIER_NOSPEC=n, security.c is not built leading to a
missing definition of uaccess_flush_key.

  LD  vmlinux.o
  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN modules.builtin
  LD  .tmp_vmlinux.kallsyms1
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/align.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/signal_64.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/process.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/traps.o:(.toc+0x0): undefined 
reference to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: 
arch/powerpc/kernel/hw_breakpoint_constraints.o:(.toc+0x0): undefined reference 
to `uaccess_flush_key'
powerpc64le-linux-gnu-ld: arch/powerpc/kernel/ptrace/ptrace.o:(.toc+0x0): more 
undefined references to `uaccess_flush_key' follow
make[1]: *** [Makefile:1176: vmlinux] Error 1

Hack one in to fix the build.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/include/asm/security_features.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/security_features.h 
b/arch/powerpc/include/asm/security_features.h
index 792eefaf230b..46ade7927a4c 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,9 @@ static inline bool security_ftr_enabled(u64 feature)
return !!(powerpc_security_features & feature);
 }
 
+#ifndef CONFIG_PPC_BARRIER_NOSPEC
+DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
+#endif
 
 // Features indicating support for Spectre/Meltdown mitigations
 
-- 
2.32.0



[PATCH 1/3] powerpc/64s: Fix build when !PPC_BARRIER_NOSPEC

2021-08-16 Thread Joel Stanley
When disabling PPC_BARRIER_NOSPEC the do_barrier_nospec_fixups_range
definition is still included, as well as a stub in asm/setup.h:

../arch/powerpc/lib/feature-fixups.c:502:6: error: redefinition of 
‘do_barrier_nospec_fixu>
  502 | void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, 
void *fixup_en>
  |  ^~
In file included from ../arch/powerpc/lib/feature-fixups.c:23:
../arch/powerpc/include/asm/setup.h:70:20: note: previous definition of 
‘do_barrier_nospec>
   70 | static inline void do_barrier_nospec_fixups_range(bool enable, void 
*start, void *>
  |^~

I assume the intent was to put the just do_barrier_nospec_fixups
behind PPC_BARRIER_NOSPEC and let the compiler drop _range when there
are no users. (There is a caller in module.c, but this is behind
PPC_BARRIER_NOSPEC).

This makes PPC_BOOK3S_64 match how the PPC_FSL_BOOK3E build works.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/include/asm/setup.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 6c1a7d217d1a..71012284c044 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -66,8 +66,6 @@ extern bool barrier_nospec_enabled;
 
 #ifdef CONFIG_PPC_BARRIER_NOSPEC
 void do_barrier_nospec_fixups_range(bool enable, void *start, void *end);
-#else
-static inline void do_barrier_nospec_fixups_range(bool enable, void *start, 
void *end) { }
 #endif
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
-- 
2.32.0



[PATCH 0/3] powerpc/64s: Fix PPC_BARRIER_NOSPEC=n

2021-08-16 Thread Joel Stanley
When disabling PPC_BARRIER_NOSPEC on Microwatt to see if it improved
boot time, I discovered the build was broken (first patch). This got
worse between when I first tried and now (second patch).

The third patch disables PPC_BARRIER_NOSPEC when building for Microwatt.
This one is optional, as it doesn't seem to change boot speed with the
current Microwatt design on an Arty.

Joel Stanley (3):
  powerpc/64s: Fix build when !PPC_BARRIER_NOSPEC
  powerpc: Fix undefined static key
  powerpc/microwatt: CPU doesn't (yet) have speculation bugs

 arch/powerpc/Kconfig | 1 +
 arch/powerpc/include/asm/security_features.h | 3 +++
 arch/powerpc/include/asm/setup.h | 2 --
 3 files changed, 4 insertions(+), 2 deletions(-)

-- 
2.32.0



[PATCH v1 4/4] powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts

2021-08-16 Thread Nicholas Piggin
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
a good idea to have for most synchronous interrupts, but for async ones
it is much less important.

Doorbell, external, and decrementer interrupts are the important
asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
because it might be a guest exit that requires CFAR preserved. But for
now the important pseries interrupts can avoid loading CFAR.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 63 
 1 file changed, 63 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 69a472c38f62..42badd7beaf0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -111,6 +111,8 @@ name:
 #define IAREA  .L_IAREA_\name\()   /* PACA save area */
 #define IVIRT  .L_IVIRT_\name\()   /* Has virt mode entry point */
 #define IISIDE .L_IISIDE_\name\()  /* Uses SRR0/1 not DAR/DSISR */
+#define ICFAR  .L_ICFAR_\name\()   /* Uses CFAR */
+#define ICFAR_IF_HVMODE.L_ICFAR_IF_HVMODE_\name\() /* Uses CFAR if HV 
*/
 #define IDAR   .L_IDAR_\name\()/* Uses DAR (or SRR0) */
 #define IDSISR .L_IDSISR_\name\()  /* Uses DSISR (or SRR1) */
 #define IBRANCH_TO_COMMON  .L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch 
to common */
@@ -150,6 +152,12 @@ do_define_int n
.ifndef IISIDE
IISIDE=0
.endif
+   .ifndef ICFAR
+   ICFAR=1
+   .endif
+   .ifndef ICFAR_IF_HVMODE
+   ICFAR_IF_HVMODE=0
+   .endif
.ifndef IDAR
IDAR=0
.endif
@@ -287,9 +295,21 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
HMT_MEDIUM
std r10,IAREA+EX_R10(r13)   /* save r10 - r12 */
+   .if ICFAR
 BEGIN_FTR_SECTION
mfspr   r10,SPRN_CFAR
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+   .elseif ICFAR_IF_HVMODE
+BEGIN_FTR_SECTION
+  BEGIN_FTR_SECTION_NESTED(69)
+   mfspr   r10,SPRN_CFAR
+  END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 69)
+FTR_SECTION_ELSE
+  BEGIN_FTR_SECTION_NESTED(69)
+   li  r10,0
+  END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 69)
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
+   .endif
.if \ool
.if !\virt
b   tramp_real_\name
@@ -305,9 +325,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 BEGIN_FTR_SECTION
std r9,IAREA+EX_PPR(r13)
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+   .if ICFAR || ICFAR_IF_HVMODE
 BEGIN_FTR_SECTION
std r10,IAREA+EX_CFAR(r13)
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+   .endif
INTERRUPT_TO_KERNEL
mfctr   r10
std r10,IAREA+EX_CTR(r13)
@@ -559,7 +581,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
.endif
 
 BEGIN_FTR_SECTION
+   .if ICFAR || ICFAR_IF_HVMODE
ld  r10,IAREA+EX_CFAR(r13)
+   .else
+   li  r10,0
+   .endif
std r10,ORIG_GPR3(r1)
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
ld  r10,IAREA+EX_CTR(r13)
@@ -1501,6 +1527,12 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
  *
  * If soft masked, the masked handler will note the pending interrupt for
  * replay, and clear MSR[EE] in the interrupted context.
+ *
+ * CFAR is not required because this is an asynchronous interrupt that in
+ * general won't have much bearing on the state of the CPU, with the possible
+ * exception of crash/debug IPIs, but those are generally moving to use SRESET
+ * IPIs. Unless this is an HV interrupt and KVM HV is possible, in which case
+ * it may be exiting the guest and need CFAR to be saved.
  */
 INT_DEFINE_BEGIN(hardware_interrupt)
IVEC=0x500
@@ -1508,6 +1540,10 @@ INT_DEFINE_BEGIN(hardware_interrupt)
IMASK=IRQS_DISABLED
IKVM_REAL=1
IKVM_VIRT=1
+   ICFAR=0
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   ICFAR_IF_HVMODE=1
+#endif
 INT_DEFINE_END(hardware_interrupt)
 
 EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100)
@@ -1726,6 +1762,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
  * If PPC_WATCHDOG is configured, the soft masked handler will actually set
  * things back up to run soft_nmi_interrupt as a regular interrupt handler
  * on the emergency stack.
+ *
+ * CFAR is not required because this is asynchronous (see hardware_interrupt).
+ * A watchdog interrupt may like to have CFAR, but usually the interesting
+ * branch is long gone by that point (e.g., infinite loop).
  */
 INT_DEFINE_BEGIN(decrementer)
IVEC=0x900
@@ -1733,6 +1773,7 @@ INT_DEFINE_BEGIN(decrementer)
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
 #endif
+   ICFAR=0
 INT_DEFINE_END(decrementer)
 
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
@@ -1808,6 +1849,8 @@ EXC_COMMON_BEGIN(hdecrementer_common)
  * If soft masked, the masked handler will note the pending interrupt for
  * 

[PATCH v1 3/4] powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use

2021-08-16 Thread Nicholas Piggin
Enabling MSR[EE] in interrupt handlers while interrupts are still soft
masked allows PMIs to profile interrupt handlers to some degree, beyond
what SIAR latching allows.

When perf is not being used, this is almost useless work. It requires an
extra mtmsrd in the irq handler, and it also opens the door to masked
interrupts hitting and requiring replay, which is more expensive than
just taking them directly. This effect can be noticable in high IRQ
workloads.

Avoid enabling MSR[EE] unless perf is currently in use. This saves about
60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
Replayed interrupts drop from 1.4% of interrupts to 0.003%.

This does prevent the soft-nmi interrupt being taken in these handlers,
but that's not too reliable anyway. The SMP watchdog will continue to be
the reliable way to catch lockups.

Cc: Madhavan Srinivasan 
Cc: Athira Rajeev 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h | 47 +--
 arch/powerpc/kernel/dbell.c   |  3 +-
 arch/powerpc/kernel/irq.c |  3 +-
 arch/powerpc/kernel/time.c| 30 ++--
 4 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 2d5c0d3ccbb6..e6644509c7af 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -309,17 +309,46 @@ static inline bool lazy_irq_pending_nocheck(void)
 bool power_pmu_running(void);
 
 /*
- * This is called by asynchronous interrupts to conditionally
- * re-enable hard interrupts after having cleared the source
- * of the interrupt. They are kept disabled if there is a different
- * soft-masked interrupt pending that requires hard masking.
+ * This is called by asynchronous interrupts to check whether to
+ * conditionally re-enable hard interrupts after having cleared
+ * the source of the interrupt. They are kept disabled if there
+ * is a different soft-masked interrupt pending that requires hard
+ * masking.
  */
-static inline void may_hard_irq_enable(void)
+static inline bool may_hard_irq_enable(void)
 {
-   if (!(get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK)) {
-   get_paca()->irq_happened &= ~PACA_IRQ_HARD_DIS;
-   __hard_irq_enable();
-   }
+#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
+   BUG_ON(mfmsr() & MSR_EE);
+#endif
+#ifdef CONFIG_PERF_EVENTS
+   if (!power_pmu_running())
+   return false;
+
+   if (get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK)
+   return false;
+
+   return true;
+#else
+   return false;
+#endif
+}
+
+/*
+ * Do the hard enabling, only call this if may_hard_irq_enable is true.
+ */
+static inline void do_hard_irq_enable(void)
+{
+#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
+   WARN_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
+   WARN_ON(get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK);
+   BUG_ON(mfmsr() & MSR_EE);
+#endif
+   /*
+* This allows PMI interrupts (and watchdog soft-NMIs) through.
+* There is no other reason to enable this way.
+*/
+   get_paca()->irq_happened &= ~PACA_IRQ_HARD_DIS;
+   __hard_irq_enable();
 }
 
 static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index 5545c9cd17c1..0edeb5e9fede 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -27,7 +27,8 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(doorbell_exception)
 
ppc_msgsync();
 
-   may_hard_irq_enable();
+   if (may_hard_irq_enable())
+   do_hard_irq_enable();
 
kvmppc_clear_host_ipi(smp_processor_id());
__this_cpu_inc(irq_stat.doorbell_irqs);
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 551b653228c4..745becbcd1ad 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -739,7 +739,8 @@ void __do_irq(struct pt_regs *regs)
irq = ppc_md.get_irq();
 
/* We can hard enable interrupts now to allow perf interrupts */
-   may_hard_irq_enable();
+   if (may_hard_irq_enable())
+   do_hard_irq_enable();
 
/* And finally process it */
if (unlikely(!irq))
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index c487ba5a6e11..ac67ec57f129 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -567,22 +567,22 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
return;
}
 
-   /* Ensure a positive value is written to the decrementer, or else
-* some CPUs will continue to take decrementer exceptions. When the
-* PPC_WATCHDOG (decrementer based) is configured, keep this at most
-* 31 bits, which is about 4 seconds on most systems, which gives
-* the watchdog a chance of catching timer interrupt hard lockups.
-*/
-   if 

[PATCH v1 2/4] powerpc/64s/perf: add power_pmu_running to query whether perf is being used

2021-08-16 Thread Nicholas Piggin
Interrupt handling code would like to know whether perf is enabled, to
know whether it should enable MSR[EE] to improve PMI coverage.

Cc: Madhavan Srinivasan 
Cc: Athira Rajeev 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h |  2 ++
 arch/powerpc/perf/core-book3s.c   | 13 +
 2 files changed, 15 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 21cc571ea9c2..2d5c0d3ccbb6 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -306,6 +306,8 @@ static inline bool lazy_irq_pending_nocheck(void)
return __lazy_irq_pending(local_paca->irq_happened);
 }
 
+bool power_pmu_running(void);
+
 /*
  * This is called by asynchronous interrupts to conditionally
  * re-enable hard interrupts after having cleared the source
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee716de91..76114a9afb2b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2380,6 +2380,19 @@ static void perf_event_interrupt(struct pt_regs *regs)
perf_sample_event_took(sched_clock() - start_clock);
 }
 
+bool power_pmu_running(void)
+{
+   struct cpu_hw_events *cpuhw;
+
+   /* Could this simply test local_paca->pmcregs_in_use? */
+
+   if (!ppmu)
+   return false;
+
+   cpuhw = this_cpu_ptr(_hw_events);
+   return cpuhw->n_events;
+}
+
 static int power_pmu_prepare_cpu(unsigned int cpu)
 {
struct cpu_hw_events *cpuhw = _cpu(cpu_hw_events, cpu);
-- 
2.23.0



[PATCH v1 0/4] powerpc/64s: interrupt speedups

2021-08-16 Thread Nicholas Piggin
Here's a few stragglers. The first patch was submitted already but had
some bugs with unrecoverable exceptions on HPT (current->blah being
accessed before MSR[RI] was enabled). Those should be fixed now.

The others are generally for helping asynch interrupts, which are a bit
harder to measure well but important for IO and IPIs.

After this series, the SPR accesses of the interrupt handlers for radix
are becoming pretty optimal except for PPR which we could improve on,
and virt CPU accounting which is very costly -- we might disable that
by default unless someone comes up with a good reason to keep it.

Thanks,
Nick

Nicholas Piggin (4):
  powerpc/64: handle MSR EE and RI in interrupt entry wrapper
  powerpc/64s/perf: add power_pmu_running to query whether perf is being
used
  powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless
perf is in use
  powerpc/64s/interrupt: avoid saving CFAR in some asynchronous
interrupts

 arch/powerpc/include/asm/hw_irq.h| 49 ---
 arch/powerpc/include/asm/interrupt.h | 31 --
 arch/powerpc/kernel/dbell.c  |  3 +-
 arch/powerpc/kernel/exceptions-64s.S | 93 +++-
 arch/powerpc/kernel/fpu.S|  5 ++
 arch/powerpc/kernel/irq.c|  3 +-
 arch/powerpc/kernel/time.c   | 30 -
 arch/powerpc/kernel/vector.S |  8 +++
 arch/powerpc/perf/core-book3s.c  | 13 
 9 files changed, 175 insertions(+), 60 deletions(-)

-- 
2.23.0



[PATCH v1 1/4] powerpc/64: handle MSR EE and RI in interrupt entry wrapper

2021-08-16 Thread Nicholas Piggin
Similarly to the system call change in the previous patch, the mtmsrd to
enable RI can be combined with the mtmsrd to enable EE for interrupts
which enable the latter, which tends to be the important synchronous
interrupts (i.e., page faults).

Do this by enabling EE and RI together at the beginning of the entry
wrapper if PACA_IRQ_HARD_DIS is clear, and just enabling RI if it is set
(which means something wanted EE=0).

Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
leave it unchanged, so by default they always get EE=1 unless they
interrupt a caller that has hard disabled. When the sync interrupt
later calls interrupt_cond_local_irq_enable(), that will not require
another mtmsrd because we already enabled here.

64e is conceptually unchanged, but it also sets MSR[EE]=1 now in the
interrupt wrapper for synchronous interrupts with the same code.

On 64s, saves one mtmsrd L=1 for synchronous interrupts on 64s, which
saves about 20 cycles. For kernel-mode interrupts, both synchronous and
asynchronous, this saves an additional ~40 cycles due to the mtmsrd
being moved ahead of mfspr SPRN_AMR, which prevents a SPR scoreboard
stall (on POWER9).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h | 31 
 arch/powerpc/kernel/exceptions-64s.S | 30 ---
 arch/powerpc/kernel/fpu.S|  5 +
 arch/powerpc/kernel/vector.S |  8 +++
 4 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 6b800d3e2681..e3228a911b35 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -148,9 +148,21 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs, struct interrup
 #endif
 
 #ifdef CONFIG_PPC64
-   if (irq_soft_mask_set_return(IRQS_ALL_DISABLED) == IRQS_ENABLED)
+   bool trace_enable = false;
+
+   if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS)) {
+   if (irq_soft_mask_set_return(IRQS_DISABLED) == IRQS_ENABLED)
+   trace_enable = true;
+   } else {
+   irq_soft_mask_set(IRQS_DISABLED);
+   }
+   /* If the interrupt was taken with HARD_DIS set, don't enable MSR[EE] */
+   if (local_paca->irq_happened & PACA_IRQ_HARD_DIS)
+   __hard_RI_enable();
+   else
+   __hard_irq_enable();
+   if (trace_enable)
trace_hardirqs_off();
-   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 
if (user_mode(regs)) {
CT_WARN_ON(ct_state() != CONTEXT_USER);
@@ -200,13 +212,20 @@ static inline void interrupt_exit_prepare(struct pt_regs 
*regs, struct interrupt
 
 static inline void interrupt_async_enter_prepare(struct pt_regs *regs, struct 
interrupt_state *state)
 {
+#ifdef CONFIG_PPC64
+   /* Ensure interrupt_enter_prepare does not enable MSR[EE] */
+   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+#endif
+   interrupt_enter_prepare(regs, state);
 #ifdef CONFIG_PPC_BOOK3S_64
+   /*
+* MSR[RI] is only enabled after interrupt_enter_prepare, so this
+* thread flags access has to come afterward.
+*/
if (cpu_has_feature(CPU_FTR_CTRL) &&
!test_thread_local_flags(_TLF_RUNLATCH))
__ppc64_runlatch_on();
 #endif
-
-   interrupt_enter_prepare(regs, state);
irq_enter();
 }
 
@@ -273,6 +292,8 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));
 
+   __hard_RI_enable();
+
/* Don't do any per-CPU operations until interrupt state is fixed */
 
if (nmi_disables_ftrace(regs)) {
@@ -370,6 +391,8 @@ interrupt_handler long func(struct pt_regs *regs)   
\
 {  \
long ret;   \
\
+   __hard_RI_enable(); \
+   \
ret = ##func (regs);\
\
return ret; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 4aec59a77d4c..69a472c38f62 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -113,7 +113,6 @@ name:
 #define IISIDE .L_IISIDE_\name\()  /* Uses SRR0/1 not DAR/DSISR */
 #define IDAR   .L_IDAR_\name\()/* Uses DAR (or SRR0) */
 #define IDSISR .L_IDSISR_\name\()  /* Uses DSISR (or 

Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE

2021-08-16 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On 8/12/21 6:19 PM, Michael Ellerman wrote:
>> "Puvichakravarthy Ramachandran"  writes:
 With shared mapping, even though we are unmapping a large range, the kernel
 will force a TLB flush with ptl lock held to avoid the race mentioned in
 commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and 
 memory freeing parts")
 This results in the kernel issuing a high number of TLB flushes even for a 
 large
 range. This can be improved by making sure the kernel switch to pid based 
 flush if the
 kernel is unmapping a 2M range.

 Signed-off-by: Aneesh Kumar K.V 
 ---
   arch/powerpc/mm/book3s64/radix_tlb.c | 8 
   1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c > 
 b/arch/powerpc/mm/book3s64/radix_tlb.c
 index aefc100d79a7..21d0f098e43b 100644
 --- a/arch/powerpc/mm/book3s64/radix_tlb.c
 +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
 @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
* invalidating a full PID, so it has a far lower threshold to change > 
 from
* individual page flushes to full-pid flushes.
*/
 -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
   static unsigned long tlb_local_single_page_flush_ceiling __read_mostly > 
 = POWER9_TLB_SETS_RADIX * 2;

   static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct > 
 mm_struct *mm,
if (fullmm)
flush_pid = true;
else if (type == FLUSH_TYPE_GLOBAL)
 - flush_pid = nr_pages > tlb_single_page_flush_ceiling;
 + flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
else
flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
>>>
>>> Additional details on the test environment. This was tested on a 2 Node/8
>>> socket Power10 system.
>>> The LPAR had 105 cores and the LPAR spanned across all the sockets.
>>>
>>> # perf stat -I 1000 -a -e cycles,instructions -e
>>> "{cpu/config=0x030008,name=PM_EXEC_STALL/}" -e
>>> "{cpu/config=0x02E01C,name=PM_EXEC_STALL_TLBIE/}" ./tlbie -i 10 -c 1  -t 1
>>>   Rate of work: = 176
>>> #   time counts unit events
>>>   1.029206442 4198594519  cycles
>>>   1.029206442 2458254252  instructions  # 0.59 
>>> insn per cycle
>>>   1.029206442 3004031488  PM_EXEC_STALL
>>>   1.029206442 1798186036  PM_EXEC_STALL_TLBIE
>>>   Rate of work: = 181
>>>   2.054288539 4183883450  cycles
>>>   2.054288539 2472178171  instructions  # 0.59 
>>> insn per cycle
>>>   2.054288539 3014609313  PM_EXEC_STALL
>>>   2.054288539 1797851642  PM_EXEC_STALL_TLBIE
>>>   Rate of work: = 180
>>>   3.078306883 4171250717  cycles
>>>   3.078306883 2468341094  instructions  # 0.59 
>>> insn per cycle
>>>   3.078306883 2993036205  PM_EXEC_STALL
>>>   3.078306883 1798181890  PM_EXEC_STALL_TLBIE
>>> .
>>> .
>>>
>>> # cat /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling
>>> 34
>>>
>>> # echo 32 > /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling
>>>
>>> # perf stat -I 1000 -a -e cycles,instructions -e
>>> "{cpu/config=0x030008,name=PM_EXEC_STALL/}" -e
>>> "{cpu/config=0x02E01C,name=PM_EXEC_STALL_TLBIE/}" ./tlbie -i 10 -c 1  -t 1
>>>   Rate of work: = 313
>>> #   time counts unit events
>>>   1.030310506 4206071143  cycles
>>>   1.030310506 4314716958  instructions  # 1.03 
>>> insn per cycle
>>>   1.030310506 2157762167  PM_EXEC_STALL
>>>   1.030310506  110825573  PM_EXEC_STALL_TLBIE
>>>   Rate of work: = 322
>>>   2.056034068 4331745630  cycles
>>>   2.056034068 4531658304  instructions  # 1.05 
>>> insn per cycle
>>>   2.056034068 2288971361  PM_EXEC_STALL
>>>   2.056034068  111267927  PM_EXEC_STALL_TLBIE
>>>   Rate of work: = 321
>>>   3.081216434 4327050349  cycles
>>>   3.081216434 4379679508  instructions  # 1.01 
>>> insn per cycle
>>>   3.081216434 2252602550  PM_EXEC_STALL
>>>   3.081216434  110974887  PM_EXEC_STALL_TLBIE
>> 
>> 
>> What is the tlbie test actually doing?
>> 
>> Does it do anything to measure the cost of refilling after the full mm flush?
>
> That is essentially
>
> for ()
> {
>shmat()
>fillshm()
>shmdt()
>
> }
>
> for a 256MB range. So it is not really a fair benchmark because it 
> 

Re: [PATCH v2 2/2] powerpc/perf: Return regs->nip as instruction pointer value when SIAR is 0

2021-08-16 Thread Christophe Leroy




Le 16/08/2021 à 08:44, kajoljain a écrit :



On 8/14/21 6:14 PM, Michael Ellerman wrote:

Christophe Leroy  writes:

Le 13/08/2021 à 10:24, Kajol Jain a écrit :

Incase of random sampling, there can be scenarios where SIAR is not
latching sample address and results in 0 value. Since current code
directly returning the siar value, we could see multiple instruction
pointer values as 0 in perf report.


Can you please give more detail on that? What scenarios? On what CPUs?



Hi Michael,
 Sure I will update these details in my next patch-set.


Patch resolves this issue by adding a ternary condition to return
regs->nip incase SIAR is 0.


Your description seems rather similar to
https://github.com/linuxppc/linux/commit/2ca13a4cc56c920a6c9fc8ee45d02bccacd7f46c

Does it mean that the problem occurs on more than the power10 DD1 ?

In that case, can the solution be common instead of doing something for power10 
DD1 and something
for others ?


Agreed.

This change would seem to make that P10 DD1 logic superfluous.

Also we already have a fallback to regs->nip in the else case of the if,
so we should just use that rather than adding a ternary condition.

eg.

if (use_siar && siar_valid(regs) && siar)
return siar + perf_ip_adjust(regs);
else if (use_siar)
return 0;   // no valid instruction pointer
else
return regs->nip;


I'm also not sure why we have that return 0 case, I can't think of why
we'd ever want to do that rather than using nip. So maybe we should do
another patch to drop that case.


Yeah make sense. I will remove return 0 case in my next version.



This was added by commit 
https://github.com/linuxppc/linux/commit/e6878835ac4794f25385522d29c634b7bbb7cca9


Are we sure it was an error to add it and it can be removed ?

Christophe


Re: [PATCH v2 2/2] powerpc/perf: Return regs->nip as instruction pointer value when SIAR is 0

2021-08-16 Thread kajoljain



On 8/13/21 3:04 PM, Christophe Leroy wrote:
> 
> 
> Le 13/08/2021 à 10:24, Kajol Jain a écrit :
>> Incase of random sampling, there can be scenarios where SIAR is not
>> latching sample address and results in 0 value. Since current code
>> directly returning the siar value, we could see multiple instruction
>> pointer values as 0 in perf report.
>> Patch resolves this issue by adding a ternary condition to return
>> regs->nip incase SIAR is 0.
> 
> Your description seems rather similar to 
> https://github.com/linuxppc/linux/commit/2ca13a4cc56c920a6c9fc8ee45d02bccacd7f46c
> 
> Does it mean that the problem occurs on more than the power10 DD1 ?
> 
> In that case, can the solution be common instead of doing something for 
> power10 DD1 and something for others ?

Hi Christophe,
Yes its better to have common check. I will make these updates.

Thanks,
Kajol Jain

> 
>>
>> Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
>> into perf_read_regs")
>> Signed-off-by: Kajol Jain 
>> ---
>>   arch/powerpc/perf/core-book3s.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index 1b464aad29c4..aeecaaf6810f 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs 
>> *regs)
>>   else
>>   return regs->nip;
>>   } else if (use_siar && siar_valid(regs))
>> -    return siar + perf_ip_adjust(regs);
>> +    return siar ? siar + perf_ip_adjust(regs) : regs->nip;
>>   else if (use_siar)
>>   return 0;    // no valid instruction pointer
>>   else
>>


Re: [PATCH v2 2/2] powerpc/perf: Return regs->nip as instruction pointer value when SIAR is 0

2021-08-16 Thread kajoljain



On 8/14/21 6:14 PM, Michael Ellerman wrote:
> Christophe Leroy  writes:
>> Le 13/08/2021 à 10:24, Kajol Jain a écrit :
>>> Incase of random sampling, there can be scenarios where SIAR is not
>>> latching sample address and results in 0 value. Since current code
>>> directly returning the siar value, we could see multiple instruction
>>> pointer values as 0 in perf report.
> 
> Can you please give more detail on that? What scenarios? On what CPUs?
> 

Hi Michael,
Sure I will update these details in my next patch-set.

>>> Patch resolves this issue by adding a ternary condition to return
>>> regs->nip incase SIAR is 0.
>>
>> Your description seems rather similar to 
>> https://github.com/linuxppc/linux/commit/2ca13a4cc56c920a6c9fc8ee45d02bccacd7f46c
>>
>> Does it mean that the problem occurs on more than the power10 DD1 ?
>>
>> In that case, can the solution be common instead of doing something for 
>> power10 DD1 and something 
>> for others ?
> 
> Agreed.
> 
> This change would seem to make that P10 DD1 logic superfluous.
> 
> Also we already have a fallback to regs->nip in the else case of the if,
> so we should just use that rather than adding a ternary condition.
> 
> eg.
> 
>   if (use_siar && siar_valid(regs) && siar)
>   return siar + perf_ip_adjust(regs);
>   else if (use_siar)
>   return 0;   // no valid instruction pointer
>   else
>   return regs->nip;
> 
> 
> I'm also not sure why we have that return 0 case, I can't think of why
> we'd ever want to do that rather than using nip. So maybe we should do
> another patch to drop that case.

Yeah make sense. I will remove return 0 case in my next version.

Thanks,
Kajol Jain
> 
> cheers
>