Re: Possible bug in linux-6.2/tools/testing/selftests/powerpc/pmu/sampling_tests/mmcra_thresh_marked_sample_test.c

2023-02-26 Thread Michael Ellerman
David Binderman  writes:
> Hello there,
>
> I ran the static analyser cppcheck over the linux-6.2 source code and got 
> this:
>
> linux-6.2/tools/testing/selftests/powerpc/pmu/sampling_tests/mmcra_thresh_marked_sample_test.c:68:10:
>  style: Same expression '0x3' found multiple times in chain of '&' operators. 
> [duplicateExpression]

Thanks.

> Source code is
>
> FAIL_IF(EV_CODE_EXTRACT(event.attr.config, sample & 0x3) !=
> get_mmcra_sample_mode(get_reg_value(intr_regs, "MMCRA"), 4));
>
> but
>
> #define EV_CODE_EXTRACT(x, y)   \
> ((x >> ev_shift_##y) & ev_mask_##y)
>
>
> Given the token pasting, I very much doubt an expression like "sample & 0x3"
> will work correctly. Same thing on the line above 
>
> FAIL_IF(EV_CODE_EXTRACT(event.attr.config, sample >> 2) !=
> get_mmcra_rand_samp_elig(get_reg_value(intr_regs, "MMCRA"), 4));
>
> "sample >> 2" doesn't look like a valid token to me.

It expands to:

 if event.attr.config >> ev_shift_sample >> 2) & ev_mask_sample >> 2) != 
get_mmcra_rand_samp_elig(get_reg_value(intr_regs, "MMCRA"), 4))) 

Which AFAICS is valid, and does compile.

Whether it's what the author actually intended is less clear.

And the other example with & 0x3 seems obviously wrong, it expands to:

  if event.attr.config >> ev_shift_sample & 0x3) & ev_mask_sample & 0x3) != 
get_mmcra_sample_mode(get_reg_value(intr_regs, "MMCRA"), 4)))

The shift is 24, so bitwise anding it with 0x3 gets 0 which doesn't seem
likely to be what was intended.

cheers


Re: [PATCH v7 5/5] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

2023-02-26 Thread Andrew Morton
On Fri,  3 Feb 2023 17:18:37 +1000 Nicholas Piggin  wrote:

> On a 16-socket 192-core POWER8 system, the context_switch1_threads
> benchmark from will-it-scale (see earlier changelog), upstream can
> achieve a rate of about 1 million context switches per second, due to
> contention on the mm refcount.
> 
> 64s meets the prerequisites for CONFIG_MMU_LAZY_TLB_SHOOTDOWN, so enable
> the option. This increases the above benchmark to 118 million context
> switches per second.

Is that the best you can do ;)

> This generates 314 additional IPI interrupts on a 144 CPU system doing
> a kernel compile, which is in the noise in terms of kernel cycles.
> 
> ...
>
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -265,6 +265,7 @@ config PPC
>   select MMU_GATHER_PAGE_SIZE
>   select MMU_GATHER_RCU_TABLE_FREE
>   select MMU_GATHER_MERGE_VMAS
> + select MMU_LAZY_TLB_SHOOTDOWN   if PPC_BOOK3S_64
>   select MODULES_USE_ELF_RELA
>   select NEED_DMA_MAP_STATE   if PPC64 || NOT_COHERENT_CACHE
>   select NEED_PER_CPU_EMBED_FIRST_CHUNK   if PPC64

Can we please have a summary of which other architectures might benefit
from this, and what must they do?

As this is powerpc-only, I expect it won't get a lot of testing in
mm.git or in linux-next.  The powerpc maintainers might choose to merge
in the mm-stable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm if this is a
concern.


Re: [PATCH mm-unstable v1 11/26] microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

2023-02-26 Thread Geert Uytterhoeven
Hi David,

On Fri, Jan 13, 2023 at 6:16 PM David Hildenbrand  wrote:
> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
> from the type. Generic MM currently only uses 5 bits for the type
> (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
>
> The shift by 2 when converting between PTE and arch-specific swap entry
> makes the swap PTE layout a little bit harder to decipher.
>
> While at it, drop the comment from paulus---copy-and-paste leftover
> from powerpc where we actually have _PAGE_HASHPTE---and mask the type in
> __swp_entry_to_pte() as well.
>
> Cc: Michal Simek 
> Signed-off-by: David Hildenbrand 

Thanks for your patch, which is now commit b5c88f21531c3457
("microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE") in

>  arch/m68k/include/asm/mcf_pgtable.h   |  4 +--

What is this m68k change doing here?
Sorry for not noticing this earlier.

Furthermore, several things below look strange to me...

>  arch/microblaze/include/asm/pgtable.h | 45 +--
>  2 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/arch/m68k/include/asm/mcf_pgtable.h 
> b/arch/m68k/include/asm/mcf_pgtable.h
> index 3f8f4d0e66dd..e573d7b649f7 100644
> --- a/arch/m68k/include/asm/mcf_pgtable.h
> +++ b/arch/m68k/include/asm/mcf_pgtable.h
> @@ -46,8 +46,8 @@
>  #define _CACHEMASK040  (~0x060)
>  #define _PAGE_GLOBAL0400x400   /* 68040 global bit, used for 
> kva descs */
>
> -/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
> -#define _PAGE_SWP_EXCLUSIVE0x080
> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> +#define _PAGE_SWP_EXCLUSIVECF_PAGE_NOCACHE

CF_PAGE_NOCACHE is 0x80, so this is still bit 7, thus the new comment
is wrong?

>
>  /*
>   * Externally used page protection values.
> diff --git a/arch/microblaze/include/asm/pgtable.h 
> b/arch/microblaze/include/asm/pgtable.h
> index 42f5988e998b..7e3de54bf426 100644
> --- a/arch/microblaze/include/asm/pgtable.h
> +++ b/arch/microblaze/include/asm/pgtable.h
> @@ -131,10 +131,10 @@ extern pte_t *va_to_pte(unsigned long address);
>   * of the 16 available.  Bit 24-26 of the TLB are cleared in the TLB
>   * miss handler.  Bit 27 is PAGE_USER, thus selecting the correct
>   * zone.
> - * - PRESENT *must* be in the bottom two bits because swap cache
> - * entries use the top 30 bits.  Because 4xx doesn't support SMP
> - * anyway, M is irrelevant so we borrow it for PAGE_PRESENT.  Bit 30
> - * is cleared in the TLB miss handler before the TLB entry is loaded.
> + * - PRESENT *must* be in the bottom two bits because swap PTEs use the top
> + * 30 bits.  Because 4xx doesn't support SMP anyway, M is irrelevant so we
> + * borrow it for PAGE_PRESENT.  Bit 30 is cleared in the TLB miss handler
> + * before the TLB entry is loaded.

So the PowerPC 4xx comment is still here?

>   * - All other bits of the PTE are loaded into TLBLO without
>   *  * modification, leaving us only the bits 20, 21, 24, 25, 26, 30 for
>   * software PTE bits.  We actually use bits 21, 24, 25, and
> @@ -155,6 +155,9 @@ extern pte_t *va_to_pte(unsigned long address);
>  #define _PAGE_ACCESSED 0x400   /* software: R: page referenced */
>  #define _PMD_PRESENT   PAGE_MASK
>
> +/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
> +#define _PAGE_SWP_EXCLUSIVE_PAGE_DIRTY

_PAGE_DIRTY is 0x80, so this is also bit 7, thus the new comment is
wrong?

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v6 01/10] dt-bindings: soc: fsl: cpm_qe: Add TSA controller

2023-02-26 Thread Rob Herring
On Fri, Feb 17, 2023 at 03:56:36PM +0100, Herve Codina wrote:
> Add support for the time slot assigner (TSA)
> available in some PowerQUICC SoC such as MPC885
> or MPC866.
> 
> Signed-off-by: Herve Codina 
> ---
>  .../bindings/soc/fsl/cpm_qe/fsl,cpm1-tsa.yaml | 215 ++
>  include/dt-bindings/soc/cpm1-fsl,tsa.h|  13 ++
>  2 files changed, 228 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/soc/fsl/cpm_qe/fsl,cpm1-tsa.yaml
>  create mode 100644 include/dt-bindings/soc/cpm1-fsl,tsa.h
> 
> diff --git 
> a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/fsl,cpm1-tsa.yaml 
> b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/fsl,cpm1-tsa.yaml
> new file mode 100644
> index ..332e902bcc21
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/fsl,cpm1-tsa.yaml
> @@ -0,0 +1,215 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/soc/fsl/cpm_qe/fsl,cpm1-tsa.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: PowerQUICC CPM Time-slot assigner (TSA) controller
> +
> +maintainers:
> +  - Herve Codina 
> +
> +description:
> +  The TSA is the time-slot assigner that can be found on some PowerQUICC SoC.
> +  Its purpose is to route some TDM time-slots to other internal serial
> +  controllers.
> +
> +properties:
> +  compatible:
> +items:
> +  - enum:
> +  - fsl,mpc885-tsa
> +  - fsl,mpc866-tsa
> +  - const: fsl,cpm1-tsa
> +
> +  reg:
> +items:
> +  - description: SI (Serial Interface) register base
> +  - description: SI RAM base
> +
> +  reg-names:
> +items:
> +  - const: si_regs
> +  - const: si_ram
> +
> +  '#address-cells':
> +const: 1
> +
> +  '#size-cells':
> +const: 0
> +
> +  '#fsl,serial-cells':

#foo-cells is for when there are differing foo providers which need 
different number of cells. That's not the case here.

> +$ref: /schemas/types.yaml#/definitions/uint32
> +const: 1
> +description:
> +  TSA consumers that use a phandle to TSA need to pass the serial 
> identifier
> +  with this phandle (defined in dt-bindings/soc/fsl,tsa.h).
> +  For instance "fsl,tsa-serial = <&tsa FSL_CPM_TSA_SCC4>;".