Re: [PATCH] powerpc: Enhance pmem DMA bypass handling

2021-10-25 Thread Alexey Kardashevskiy



On 10/26/21 01:40, Brian King wrote:
> On 10/23/21 7:18 AM, Alexey Kardashevskiy wrote:
>>
>>
>> On 23/10/2021 07:18, Brian King wrote:
>>> On 10/22/21 7:24 AM, Alexey Kardashevskiy wrote:


 On 22/10/2021 04:44, Brian King wrote:
> If ibm,pmemory is installed in the system, it can appear anywhere
> in the address space. This patch enhances how we handle DMA for devices
> when ibm,pmemory is present. In the case where we have enough DMA space
> to direct map all of RAM, but not ibm,pmemory, we use direct DMA for
> I/O to RAM and use the default window to dynamically map ibm,pmemory.
> In the case where we only have a single DMA window, this won't work, so
> if the window is not big enough to map the entire address range,
> we cannot direct map.

 but we want the pmem range to be mapped into the huge DMA window too if we 
 can, why skip it?
>>>
>>> This patch should simply do what the comment in this commit mentioned
>>> below suggests, which says that ibm,pmemory can appear anywhere in the
>>> address space. If the DMA window is large enough to map all of
>>> MAX_PHYSMEM_BITS, we will indeed simply do direct DMA for everything,
>>> including the pmem. If we do not have a big enough window to do that,
>>> we will do direct DMA for DRAM and dynamic mapping for pmem.
>>
>>
>> Right, and this is what we do already, do we not? Am I missing something here?
> 
> The upstream code does not work correctly as far as I can see. If I boot
> an upstream kernel with an nvme device and vpmem assigned to the LPAR,
> and enable dev_dbg in arch/powerpc/platforms/pseries/iommu.c, I see the
> following in the logs:
> 
> [2.157549] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 800 2121 returned 0
> [2.157561] nvme 0121:50:00.0: Skipping ibm,pmemory
> [2.157567] nvme 0121:50:00.0: can't map partition max 0x8 with 16777216 65536-sized pages
> [2.170150] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 800 2121 10 28 returned 0 (liobn = 0x7121 starting addr = 800 0)
> [2.170170] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for /pci@8002121/pci1014,683@0
> [2.356260] nvme 0121:50:00.0: node is /pci@8002121/pci1014,683@0
> 
> This means we are heading down the leg in enable_ddw where we do not set
> direct_mapping to true. We do create the DDW window, but don't do any
> direct DMA. This is because the window is not large enough to map 2PB of
> memory, which is what ddw_memory_hotplug_max returns without my patch.
> 
> With my patch applied, I get this in the logs:
> 
> [2.204866] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 800 2121 returned 0
> [2.204875] nvme 0121:50:00.0: Skipping ibm,pmemory
> [2.205058] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 800 2121 10 21 returned 0 (liobn = 0x7121 starting addr = 800 0)
> [2.205068] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for /pci@8002121/pci1014,683@0
> [2.215898] nvme 0121:50:00.0: iommu: 64-bit OK but direct DMA is limited by 802
> 


ah I see. then...
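
For reference, the window-sizing decision being discussed reduces to
roughly the following sketch (illustrative only, not the exact
enable_ddw() code; window_size stands in for the queried DDW size):

	if (window_size >= (1ULL << MAX_PHYSMEM_BITS))
		direct_mapping = true;	/* direct-map RAM and pmem */
	else if (window_size >= roundup_pow_of_two(max_ram_len))
		direct_mapping = true;	/* direct-map RAM, map pmem dynamically */
	else
		direct_mapping = false;	/* window too small: dynamic mapping only */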


> 
> Thanks,
> 
> Brian
> 
> 
>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/platforms/pseries/iommu.c?id=bf6e2d562bbc4d115cf322b0bca57fe5bbd26f48
>>>
>>>
>>> Thanks,
>>>
>>> Brian
>>>
>>>


>
> Signed-off-by: Brian King 
> ---
>    arch/powerpc/platforms/pseries/iommu.c | 19 ++-
>    1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 269f61d519c2..d9ae985d10a4 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1092,15 +1092,6 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>  	phys_addr_t max_addr = memory_hotplug_max();
>  	struct device_node *memory;
>  
> -	/*
> -	 * The "ibm,pmemory" can appear anywhere in the address space.
> -	 * Assuming it is still backed by page structs, set the upper limit
> -	 * for the huge DMA window as MAX_PHYSMEM_BITS.
> -	 */
> -	if (of_find_node_by_type(NULL, "ibm,pmemory"))
> -		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
> -			(phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
> -
>  	for_each_node_by_type(memory, "memory") {
>  		unsigned long start, size;
>  		int n_mem_addr_cells, n_mem_size_cells, len;
> @@ -1341,6 +1332,16 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  	 */
>  	len = max_ram_len;
>  	if (pmem_present) {
> +		if (default_win_removed) {
> +			/*
> +			 * If we 
[PATCH 3/3] powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs

2021-10-25 Thread Christophe Leroy
Building tqm8541_defconfig results in:

arch/powerpc/mm/nohash/fsl_book3e.c: In function 'settlbcam':
arch/powerpc/mm/nohash/fsl_book3e.c:126:40: error: '_PAGE_BAP_SX' undeclared (first use in this function)
  126 |         TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;
      |                                        ^~~~
arch/powerpc/mm/nohash/fsl_book3e.c:126:40: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [scripts/Makefile.build:277: arch/powerpc/mm/nohash/fsl_book3e.o] Error 1
make[2]: *** [scripts/Makefile.build:540: arch/powerpc/mm/nohash] Error 2
make[1]: *** [scripts/Makefile.build:540: arch/powerpc/mm] Error 2
make: *** [Makefile:1868: arch/powerpc] Error 2

This is because _PAGE_BAP_SX is not defined when using 32-bit PTEs.

Now that _PAGE_EXEC contains both _PAGE_BAP_SX and _PAGE_BAP_UX, it can be used 
instead.

Reported-by: kernel test robot 
Fixes: 01116e6e98b0 ("powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/nohash/fsl_book3e.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c b/arch/powerpc/mm/nohash/fsl_book3e.c
index 978e0bcdfa2c..b231a54f540c 100644
--- a/arch/powerpc/mm/nohash/fsl_book3e.c
+++ b/arch/powerpc/mm/nohash/fsl_book3e.c
@@ -123,7 +123,6 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys,
 	TLBCAM[index].MAS2 |= (flags & _PAGE_ENDIAN) ? MAS2_E : 0;
 
 	TLBCAM[index].MAS3 = (phys & MAS3_RPN) | MAS3_SR;
-	TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;
 	TLBCAM[index].MAS3 |= (flags & _PAGE_RW) ? MAS3_SW : 0;
 	if (mmu_has_feature(MMU_FTR_BIG_PHYS))
 		TLBCAM[index].MAS7 = (u64)phys >> 32;
@@ -133,6 +132,8 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys,
 		TLBCAM[index].MAS3 |= MAS3_UR;
 		TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_UX : 0;
 		TLBCAM[index].MAS3 |= (flags & _PAGE_RW) ? MAS3_UW : 0;
+	} else {
+		TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_SX : 0;
 	}
 
 	tlbcam_addrs[index].start = virt;
-- 
2.31.1



[PATCH 2/3] powerpc/book3e: Fix set_memory_x() and set_memory_nx()

2021-10-25 Thread Christophe Leroy
set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
set_memory_nx() calls pte_exprotect() which clears _PAGE_EXEC.

Book3e has 2 bits, UX and SX, which define the exec rights
for user (PR=1) and for kernel (PR=0) respectively.

_PAGE_EXEC is defined as UX only.

An executable kernel page is set with either _PAGE_KERNEL_RWX
or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.

So a set_memory_nx() call on an executable kernel page does
nothing, because UX is already cleared.

And set_memory_x() on a non-executable kernel page makes it
executable for the user but keeps it non-executable for the kernel.

Also, pte_exec() always returns 'false' on kernel pages, because
it checks _PAGE_EXEC which doesn't include SX, so for instance
the W+X check doesn't work.

To fix this:
- change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
- set both UX and SX in _PAGE_EXEC so that pte_exec() returns
true whenever one of the two bits is set and pte_exprotect()
clears both bits.
- define a book3e specific version of pte_mkexec() which sets
either SX or UX based on UR.
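
To illustrate the pre-patch breakage on a kernel page (a sketch, assuming
the old definition _PAGE_EXEC == _PAGE_BAP_UX):

	pte_t pte = __pte(_PAGE_KERNEL_ROX);	/* kernel text: SX set, UX clear */
	bool x = pte_exec(pte);			/* false: only UX is checked, so the
						 * W+X check never sees this page */
	pte = pte_exprotect(pte);		/* clears UX only: SX stays set and
						 * the page remains kernel-executable */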

Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Signed-off-by: Christophe Leroy 
---
v3: Removed pte_mkexec() from nohash/64/pgtable.h
v2: New
---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 ++
 arch/powerpc/include/asm/nohash/64/pgtable.h |  5 -
 arch/powerpc/include/asm/nohash/pte-book3e.h | 18 ++
 arch/powerpc/mm/nohash/tlb_low_64e.S |  8 
 4 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 11c6849f7864..b67742e2a9b2 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -193,10 +193,12 @@ static inline pte_t pte_wrprotect(pte_t pte)
 }
 #endif
 
+#ifndef pte_mkexec
 static inline pte_t pte_mkexec(pte_t pte)
 {
return __pte(pte_val(pte) | _PAGE_EXEC);
 }
+#endif
 
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index d081704b13fb..9d2905a47410 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -118,11 +118,6 @@ static inline pte_t pte_wrprotect(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_RW);
 }
 
-static inline pte_t pte_mkexec(pte_t pte)
-{
-   return __pte(pte_val(pte) | _PAGE_EXEC);
-}
-
 #define PMD_BAD_BITS   (PTE_TABLE_SIZE-1)
 #define PUD_BAD_BITS   (PMD_TABLE_SIZE-1)
 
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 813918f40765..f798640422c2 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -48,7 +48,7 @@
 #define _PAGE_WRITETHRU0x80 /* W: cache write-through */
 
 /* "Higher level" linux bit combinations */
-#define _PAGE_EXEC	_PAGE_BAP_UX /* .. and was cache cleaned */
+#define _PAGE_EXEC	(_PAGE_BAP_SX | _PAGE_BAP_UX) /* .. and was cache cleaned */
 #define _PAGE_RW	(_PAGE_BAP_SW | _PAGE_BAP_UW) /* User write permission */
 #define _PAGE_KERNEL_RW	(_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY)
 #define _PAGE_KERNEL_RO	(_PAGE_BAP_SR)
@@ -93,11 +93,11 @@
 /* Permission masks used to generate the __P and __S table */
 #define PAGE_NONE  __pgprot(_PAGE_BASE)
 #define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC)
+#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_BAP_UX)
 #define PAGE_COPY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_BAP_UX)
 #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_BAP_UX)
 
 #ifndef __ASSEMBLY__
 static inline pte_t pte_mkprivileged(pte_t pte)
@@ -113,6 +113,16 @@ static inline pte_t pte_mkuser(pte_t pte)
 }
 
 #define pte_mkuser pte_mkuser
+
+static inline pte_t pte_mkexec(pte_t pte)
+{
+   if (pte_val(pte) & _PAGE_BAP_UR)
+   return __pte((pte_val(pte) & ~_PAGE_BAP_SX) | _PAGE_BAP_UX);
+   else
+   return __pte((pte_val(pte) & ~_PAGE_BAP_UX) | _PAGE_BAP_SX);
+}
+#define pte_mkexec pte_mkexec
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S b/arch/powerpc/mm/nohash/tlb_low_64e.S
index bf24451f3e71..9235e720e357 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -222,7 +222,7 

[PATCH 1/3] powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()

2021-10-25 Thread Christophe Leroy
Commit 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
changed those two functions to use pte helpers to determine which
bits to clear and which bits to set.

This change was based on the assumption that bits to be set/cleared
are always the same and can be determined by applying the pte
manipulation helpers on __pte(0).

But on platforms like book3e, the bits depend on whether the page
is a user page or not.

For the time being it more or less works because _PAGE_EXEC is
used for user pages only and exec rights are always set on kernel
pages. But a following patch will clean that up, and the output of
pte_mkexec() will then depend on the page being a user or kernel page.

Instead of trying to make an even more complicated helper where bits
would become dependent on the final pte value, come back to a more
static situation like before commit 26973fa5ac0e ("powerpc/mm: use
pte helpers in generic code"), by introducing an 8xx specific
version of __ptep_set_access_flags() and ptep_set_wrprotect().
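
For reference, the generic trick from commit 26973fa5ac0e derived the
masks by applying the helper to an all-zeroes and an all-ones PTE, as in
the lines removed below:

	unsigned long clr = ~pte_val(pte_wrprotect(__pte(~0)));	/* bits the helper clears */
	unsigned long set = pte_val(pte_wrprotect(__pte(0)));	/* bits the helper sets */

	pte_update(mm, addr, ptep, clr, set, 0);

This only works while those bits are independent of the rest of the PTE,
which is no longer true once pte_mkexec() depends on the user/kernel bit.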

Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy 
---
v3: No change
v2: New
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 17 +++
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 22 
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 34ce50da1850..11c6849f7864 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -306,30 +306,29 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
+#ifndef ptep_set_wrprotect
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 				      pte_t *ptep)
 {
-	unsigned long clr = ~pte_val(pte_wrprotect(__pte(~0)));
-	unsigned long set = pte_val(pte_wrprotect(__pte(0)));
-
-	pte_update(mm, addr, ptep, clr, set, 0);
+	pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
 }
+#endif
 
+#ifndef __ptep_set_access_flags
 static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
 					   pte_t *ptep, pte_t entry,
 					   unsigned long address,
 					   int psize)
 {
-	pte_t pte_set = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(0)))));
-	pte_t pte_clr = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(~0)))));
-	unsigned long set = pte_val(entry) & pte_val(pte_set);
-	unsigned long clr = ~pte_val(entry) & ~pte_val(pte_clr);
+	unsigned long set = pte_val(entry) &
+			    (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
 	int huge = psize > mmu_virtual_psize ? 1 : 0;
 
-	pte_update(vma->vm_mm, address, ptep, clr, set, huge);
+	pte_update(vma->vm_mm, address, ptep, 0, set, huge);
 
 	flush_tlb_page(vma, address);
 }
+#endif
 
 static inline int pte_young(pte_t pte)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index fcc48d590d88..1a89ebdc3acc 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -136,6 +136,28 @@ static inline pte_t pte_mkhuge(pte_t pte)
 
 #define pte_mkhuge pte_mkhuge
 
+static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
+				     unsigned long clr, unsigned long set, int huge);
+
+static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+	pte_update(mm, addr, ptep, 0, _PAGE_RO, 0);
+}
+#define ptep_set_wrprotect ptep_set_wrprotect
+
+static inline void __ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
+					   pte_t entry, unsigned long address, int psize)
+{
+	unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_EXEC);
+	unsigned long clr = ~pte_val(entry) & _PAGE_RO;
+	int huge = psize > mmu_virtual_psize ? 1 : 0;
+
+	pte_update(vma->vm_mm, address, ptep, clr, set, huge);
+
+	flush_tlb_page(vma, address);
+}
+#define __ptep_set_access_flags __ptep_set_access_flags
+
 static inline unsigned long pgd_leaf_size(pgd_t pgd)
 {
 	if (pgd_val(pgd) & _PMD_PAGE_8M)
-- 
2.31.1



Re: [PATCH v11 2/3] tty: hvc: pass DMA capable memory to put_chars()

2021-10-25 Thread Jiri Slaby

On 15. 10. 21, 4:46, Xianting Tian wrote:

@@ -151,9 +142,11 @@ static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
  static void hvc_console_print(struct console *co, const char *b,
  unsigned count)
  {
-   char c[N_OUTBUF] __ALIGNED__;
+   char *c;
unsigned i = 0, n = 0;
int r, donecr = 0, index = co->index;
+   unsigned long flags;
+   struct hvc_struct *hp;
  
  	/* Console access attempt outside of acceptable console range. */

if (index >= MAX_NR_HVC_CONSOLES)
@@ -163,6 +156,13 @@ static void hvc_console_print(struct console *co, const char *b,
if (vtermnos[index] == -1)
return;
  
+	hp = cons_hvcs[index];
+	if (!hp)
+		return;


You effectively make the console unusable until someone calls
hvc_alloc() for this device, correct? This doesn't look right. Nor do
you describe this change of behaviour in the commit log.


regards,
--
js
suse labs


[PATCH v2] macintosh/via-pmu-led: make disk activity usage a parameter.

2021-10-25 Thread Hill Ma
Whether to use the LED as a disk activity indicator is a user preference.
Some like this usage while others find the LED too bright. So it
might be a good idea to make this choice a runtime parameter rather
than a compile-time config.

The default is set to disabled as OS X does not use the LED as a
disk activity indicator.
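
With this change the trigger can be selected at boot time, e.g. by adding
adb_pmu_led_disk=1 to the kernel command line (see the
kernel-parameters.txt entry below).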

Signed-off-by: Hill Ma 
---
 Documentation/admin-guide/kernel-parameters.txt |  6 ++
 drivers/macintosh/Kconfig   | 10 --
 drivers/macintosh/via-pmu-led.c | 11 ---
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..a656a51ba0a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -250,6 +250,12 @@
Use timer override. For some broken Nvidia NF5 boards
that require a timer override, but don't have HPET
 
+   adb_pmu_led_disk [PPC]
+   Use front LED as disk LED by default. Only applies to
+   PowerBook, iBook, PowerMac 7,2/7,3.
+   Format:   (1/Y/y=enable, 0/N/n=disable)
+   Default: disabled
+
add_efi_memmap  [EFI; X86] Include EFI memory map in
kernel's map of available physical RAM.
 
diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
index 5cdc361da37c..243215de563c 100644
--- a/drivers/macintosh/Kconfig
+++ b/drivers/macintosh/Kconfig
@@ -78,16 +78,6 @@ config ADB_PMU_LED
  behaviour of the old CONFIG_BLK_DEV_IDE_PMAC_BLINK, select this
  and the disk LED trigger and configure appropriately through sysfs.
 
-config ADB_PMU_LED_DISK
-   bool "Use front LED as DISK LED by default"
-   depends on ADB_PMU_LED
-   depends on LEDS_CLASS
-   select LEDS_TRIGGERS
-   select LEDS_TRIGGER_DISK
-   help
- This option makes the front LED default to the disk trigger
- so that it blinks on disk activity.
-
 config PMAC_SMU
bool "Support for SMU  based PowerMacs"
depends on PPC_PMAC64
diff --git a/drivers/macintosh/via-pmu-led.c b/drivers/macintosh/via-pmu-led.c
index ae067ab2373d..faf39a5962aa 100644
--- a/drivers/macintosh/via-pmu-led.c
+++ b/drivers/macintosh/via-pmu-led.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static spinlock_t pmu_blink_lock;
@@ -71,11 +72,10 @@ static void pmu_led_set(struct led_classdev *led_cdev,
	spin_unlock_irqrestore(&pmu_blink_lock, flags);
 }
 
+bool adb_pmu_led_disk;
+
 static struct led_classdev pmu_led = {
.name = "pmu-led::front",
-#ifdef CONFIG_ADB_PMU_LED_DISK
-   .default_trigger = "disk-activity",
-#endif
.brightness_set = pmu_led_set,
 };
 
@@ -106,6 +106,9 @@ static int __init via_pmu_led_init(void)
}
of_node_put(dt);
 
+   if (adb_pmu_led_disk)
+   pmu_led.default_trigger = "disk-activity";
+
	spin_lock_init(&pmu_blink_lock);
/* no outstanding req */
pmu_blink_req.complete = 1;
@@ -114,4 +117,6 @@ static int __init via_pmu_led_init(void)
	return led_classdev_register(NULL, &pmu_led);
 }
 
+core_param(adb_pmu_led_disk, adb_pmu_led_disk, bool, 0644);
+
 late_initcall(via_pmu_led_init);
-- 
2.33.1



[PATCH v5 2/2] ftrace: do CPU checking after preemption disabled

2021-10-25 Thread 王贇
With CONFIG_DEBUG_PREEMPT we observed reports like:

  BUG: using smp_processor_id() in preemptible
  caller is perf_ftrace_function_call+0x6f/0x2e0
  CPU: 1 PID: 680 Comm: a.out Not tainted
  Call Trace:
   
   dump_stack_lvl+0x8d/0xcf
   check_preemption_disabled+0x104/0x110
   ? optimize_nops.isra.7+0x230/0x230
   ? text_poke_bp_batch+0x9f/0x310
   perf_ftrace_function_call+0x6f/0x2e0
   ...
   __text_poke+0x5/0x620
   text_poke_bp_batch+0x9f/0x310

This tells us the CPU could be changed after the task is preempted, and
any check on the CPU done before preemption will be invalid.

Since ftrace_test_recursion_trylock() now helps to disable
preemption, this patch simply does the check after trylock() to address
the issue.

CC: Steven Rostedt 
Reported-by: Abaci 
Signed-off-by: Michael Wang 
---
 kernel/trace/trace_event_perf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 6aed10e..fba8cb7 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -441,13 +441,13 @@ void perf_trace_buf_update(void *record, u16 type)
if (!rcu_is_watching())
return;

-   if ((unsigned long)ops->private != smp_processor_id())
-   return;
-
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;

+   if ((unsigned long)ops->private != smp_processor_id())
+   goto out;
+
event = container_of(ops, struct perf_event, ftrace_ops);

/*
-- 
1.8.3.1



[PATCH v5 1/2] ftrace: disable preemption when recursion locked

2021-10-25 Thread 王贇
As the documentation explains, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the functions, which can easily be missed by mistake.

And since the internal uses of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption to be disabled, we
can just merge the logic.

This patch makes sure preemption has been disabled when
trace_test_and_set_recursion() returns bit >= 0, and
trace_clear_recursion() will re-enable preemption if it was previously
enabled.
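
The resulting caller pattern looks like this (a sketch mirroring the
kprobe_ftrace_handler() changes below):

	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;			/* recursion detected, nothing locked */

	/* preemption is guaranteed to be disabled here */

	ftrace_test_recursion_unlock(bit);	/* re-enables preemption */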

CC: Petr Mladek 
CC: Steven Rostedt 
CC: Miroslav Benes 
Reported-by: Abaci 
Suggested-by: Peter Zijlstra 
Signed-off-by: Michael Wang 
---
 arch/csky/kernel/probes/ftrace.c |  2 --
 arch/parisc/kernel/ftrace.c  |  2 --
 arch/powerpc/kernel/kprobes-ftrace.c |  2 --
 arch/riscv/kernel/probes/ftrace.c|  2 --
 arch/x86/kernel/kprobes/ftrace.c |  2 --
 include/linux/trace_recursion.h  | 11 ++-
 kernel/livepatch/patch.c | 13 +++--
 kernel/trace/ftrace.c| 15 +--
 kernel/trace/trace_functions.c   |  5 -
 9 files changed, 22 insertions(+), 32 deletions(-)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228..834cffc 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
return;

regs = ftrace_get_regs(fregs);
-   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (!p) {
p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
 out:
-   preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 7d14242..90c4345 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -210,7 +210,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
return;

regs = ftrace_get_regs(fregs);
-   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -239,7 +238,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
}
__this_cpu_write(current_kprobe, NULL);
 out:
-   preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58..072ebe7 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
return;

regs = ftrace_get_regs(fregs);
-   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
__this_cpu_write(current_kprobe, NULL);
}
 out:
-   preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a8..7142ec4 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
return;

-   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
 out:
-   preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f..dd2ec14 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
return;

-   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
__this_cpu_write(current_kprobe, NULL);
 

[PATCH v5 0/2] fix & prevent the missing preemption disabling

2021-10-25 Thread 王贇
Testing shows that perf_ftrace_function_call() is using smp_processor_id()
with preemption enabled; all the checks on the CPU could be wrong after
preemption.

As Peter pointed out, the section between the ftrace_test_recursion_trylock/unlock()
pair requires preemption to be disabled, as 'Documentation/trace/ftrace-uses.rst'
explains, but currently the work is done outside of the helpers.

And since the internal uses of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption to be disabled, we
can just merge the logic together.

Patch 1/2 makes sure preemption is disabled when the recursion lock succeeds;
patch 2/2 does the smp_processor_id() check after trylock() to address the
issue.

v1: 
https://lore.kernel.org/all/8c7de46d-9869-aa5e-2bb9-5dbc2eda3...@linux.alibaba.com/
v2: 
https://lore.kernel.org/all/b1d7fe43-ce84-0ed7-32f7-ea1d12d0b...@linux.alibaba.com/
v3: 
https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d...@linux.alibaba.com/
V4: 
https://lore.kernel.org/all/32a36348-69ee-6464-390c-3a8d6e9d2...@linux.alibaba.com/

Michael Wang (2):
  ftrace: disable preemption when recursion locked
  ftrace: do CPU checking after preemption disabled

 arch/csky/kernel/probes/ftrace.c |  2 --
 arch/parisc/kernel/ftrace.c  |  2 --
 arch/powerpc/kernel/kprobes-ftrace.c |  2 --
 arch/riscv/kernel/probes/ftrace.c|  2 --
 arch/x86/kernel/kprobes/ftrace.c |  2 --
 include/linux/trace_recursion.h  | 11 ++-
 kernel/livepatch/patch.c | 13 +++--
 kernel/trace/ftrace.c| 15 +--
 kernel/trace/trace_event_perf.c  |  6 +++---
 kernel/trace/trace_functions.c   |  5 -
 10 files changed, 25 insertions(+), 35 deletions(-)

-- 
1.8.3.1



Re: [PATCH v4 0/2] fix & prevent the missing preemption disabling

2021-10-25 Thread 王贇



On 2021/10/26 10:42 AM, Steven Rostedt wrote:
> On Tue, 26 Oct 2021 10:09:12 +0800
> 王贇  wrote:
> 
>> Just a ping, to see if there are any more comments :-P
> 
> I guess you missed what went into mainline (and your name found a bug
> in my perl script for importing patches ;-)
> 
>   https://lore.kernel.org/all/20211019091344.65629...@gandalf.local.home/

Cool~ Missing some Chinese font maybe, that's fine :-)

> 
> Which means patch 1 needs to change:
>> +/*
>> + * Disable preemption to fulfill the promise.
>> + *
>> + * Don't worry about the bit 0 cases, they indicate
>> + * the disabling behaviour has already been done by
>> + * internal call previously.
>> + */
>> +preempt_disable_notrace();
>> +
>>  return bit + 1;
>>  }
>>
>> +/*
>> + * Preemption will be enabled (if it was previously enabled).
>> + */
>>  static __always_inline void trace_clear_recursion(int bit)
>>  {
>>  if (!bit)
>>  return;
>>
>> +if (bit > 0)
>> +preempt_enable_notrace();
>> +
> 
> Where this won't work anymore.
> 
> Need to preempt disable and enable always.

Yup, will rebase on the latest changes~

Regards,
Michael Wang

> 
> -- Steve
> 
> 
>>  barrier();
>>  bit--;
>>  trace_recursion_clear(bit);
>> @@ -209,7 +227,7 @@ static __always_inline void trace_clear_recursion(int 
>> bit)
>>   * tracing recursed in the same context (normal vs interrupt),
>>   *


Re: [PATCH v4 0/2] fix & prevent the missing preemption disabling

2021-10-25 Thread Steven Rostedt
On Tue, 26 Oct 2021 10:09:12 +0800
王贇  wrote:

> Just a ping, to see if there are any more comments :-P

I guess you missed what went into mainline (and your name found a bug
in my perl script for importing patches ;-)

  https://lore.kernel.org/all/20211019091344.65629...@gandalf.local.home/

Which means patch 1 needs to change:

> + /*
> +  * Disable preemption to fulfill the promise.
> +  *
> +  * Don't worry about the bit 0 cases, they indicate
> +  * the disabling behaviour has already been done by
> +  * internal call previously.
> +  */
> + preempt_disable_notrace();
> +
>   return bit + 1;
>  }
> 
> +/*
> + * Preemption will be enabled (if it was previously enabled).
> + */
>  static __always_inline void trace_clear_recursion(int bit)
>  {
>   if (!bit)
>   return;
> 
> + if (bit > 0)
> + preempt_enable_notrace();
> +

Where this won't work anymore.

Need to preempt disable and enable always.

-- Steve


>   barrier();
>   bit--;
>   trace_recursion_clear(bit);
> @@ -209,7 +227,7 @@ static __always_inline void trace_clear_recursion(int bit)
>   * tracing recursed in the same context (normal vs interrupt),
>   *


Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-25 Thread Ralph Campbell



On 10/24/21 21:16, Alistair Popple wrote:

MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect(). If it
wasn't, then a second attempt is made to lock the page. However, if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity. So clean this up by removing the retry
and the MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.
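
With the flag gone, a driver filling in a dst entry reduces to (as the
hmm.rst hunk below shows):

	dst[i] = migrate_pfn(page_to_pfn(dpage));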

Signed-off-by: Alistair Popple 


You can add:
Reviewed-by: Ralph Campbell 


---
  Documentation/vm/hmm.rst |   2 +-
  arch/powerpc/kvm/book3s_hv_uvmem.c   |   4 +-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
  include/linux/migrate.h  |   1 -
  lib/test_hmm.c   |   5 +-
  mm/migrate.c | 145 +--
  7 files changed, 35 insertions(+), 128 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index a14c2938e7af..f2a59ed82ed3 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -360,7 +360,7 @@ between device driver specific code and shared common code:
    system memory page, locks the page with ``lock_page()``, and fills in the
    ``dst`` array entry with::
 
-     dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+     dst[i] = migrate_pfn(page_to_pfn(dpage));
 
 Now that the driver knows that this page is being migrated, it can
 invalidate device private MMU mappings and copy device private memory
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a7061ee3b157..28c436df9935 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
  gpa, 0, page_shift);
  
  	if (ret == U_SUCCESS)

-   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+   *mig.dst = migrate_pfn(pfn);
else {
unlock_page(dpage);
__free_page(dpage);
@@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
}
}
  
-	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+	*mig.dst = migrate_pfn(page_to_pfn(dpage));
migrate_vma_pages();
  out_finalize:
migrate_vma_finalize();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 4a16e3c257b9..41d9417f182b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
svm_migrate_get_vram_page(prange, migrate->dst[i]);
migrate->dst[i] = migrate_pfn(migrate->dst[i]);
-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
  DMA_TO_DEVICE);
r = dma_mapping_error(dev, src[i]);
@@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 			  dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
 
 		migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
-		migrate->dst[i] |= MIGRATE_PFN_LOCKED;
 		j++;
 	}
  
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c

index 92987daa5e17..3828aafd3ac4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
 		goto error_dma_unmap;
 	mutex_unlock(&svmm->mutex);
 
-	args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+	args->dst[0] = migrate_pfn(page_to_pfn(dpage));
return 0;
  
  error_dma_unmap:

@@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
if (src & MIGRATE_PFN_WRITE)
*pfn |= NVIF_VMM_PFNMAP_V0_W;
-   return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   return migrate_pfn(page_to_pfn(dpage));
  
  out_dma_unmap:

dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index c8077e936691..479b861ae490 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page,
   */
  #define MIGRATE_PFN_VALID 

linux-next: manual merge of the audit tree with the powerpc tree

2021-10-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the audit tree got conflicts in:

  arch/powerpc/kernel/audit.c
  arch/powerpc/kernel/compat_audit.c

between commit:

  566af8cda399 ("powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC")

from the powerpc tree and commits:

  42f355ef59a2 ("audit: replace magic audit syscall class numbers with macros")
  1c30e3af8a79 ("audit: add support for the openat2 syscall")

from the audit tree.

I fixed it up (I just removed the files like the former commit) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non-trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.



-- 
Cheers,
Stephen Rothwell




Re: [PATCH v4 0/2] fix & prevent the missing preemption disabling

2021-10-25 Thread 王贇
Just a ping, to see if there are any more comments :-P

Regards,
Michael Wang

On 2021/10/18 11:38 AM, 王贇 wrote:
> Testing shows that perf_ftrace_function_call() is using smp_processor_id()
> with preemption enabled; all the checks on the CPU could be wrong after
> preemption.
> 
> As Peter pointed out, the section between the ftrace_test_recursion_trylock/unlock()
> pair requires preemption to be disabled, as 'Documentation/trace/ftrace-uses.rst'
> explains, but currently the work is done outside of the helpers.
> 
> And since the internal uses of trace_test_and_set_recursion()
> and trace_clear_recursion() also require preemption to be disabled, we
> can just merge the logic together.
> 
> Patch 1/2 makes sure preemption is disabled when the recursion lock succeeds;
> patch 2/2 does the smp_processor_id() check after trylock() to address the
> issue.
> 
> v1: 
> https://lore.kernel.org/all/8c7de46d-9869-aa5e-2bb9-5dbc2eda3...@linux.alibaba.com/
> v2: 
> https://lore.kernel.org/all/b1d7fe43-ce84-0ed7-32f7-ea1d12d0b...@linux.alibaba.com/
> v3: 
> https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d...@linux.alibaba.com/
> 
> Michael Wang (2):
>   ftrace: disable preemption when recursion locked
>   ftrace: do CPU checking after preemption disabled
> 
>  arch/csky/kernel/probes/ftrace.c |  2 --
>  arch/parisc/kernel/ftrace.c  |  2 --
>  arch/powerpc/kernel/kprobes-ftrace.c |  2 --
>  arch/riscv/kernel/probes/ftrace.c|  2 --
>  arch/x86/kernel/kprobes/ftrace.c |  2 --
>  include/linux/trace_recursion.h  | 20 +++-
>  kernel/livepatch/patch.c | 13 +++--
>  kernel/trace/ftrace.c| 15 +--
>  kernel/trace/trace_event_perf.c  |  6 +++---
>  kernel/trace/trace_functions.c   |  5 -
>  10 files changed, 34 insertions(+), 35 deletions(-)
> 


Re: [PATCH] macintosh/via-pmu-led: make disk activity usage a parameter.

2021-10-25 Thread kernel test robot
Hi Hill,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.15-rc7 next-20211025]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Hill-Ma/macintosh-via-pmu-led-make-disk-activity-usage-a-parameter/20211025-152845
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
87066fdd2e30fe9dd531125d95257c118a74617e
config: powerpc-pmac32_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/9f9763c0836766055225087cee4126f8d2974252
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Hill-Ma/macintosh-via-pmu-led-make-disk-activity-usage-a-parameter/20211025-152845
        git checkout 9f9763c0836766055225087cee4126f8d2974252
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/macintosh/via-pmu-led.c:28:
   drivers/macintosh/via-pmu-led.c: In function '__check_adb_pmu_led_disk':
>> include/linux/moduleparam.h:329:34: error: returning 'int *' from a function with incompatible return type 'bool *' {aka '_Bool *'} [-Werror=incompatible-pointer-types]
     329 |         param_check_##type(name, &(var));                       \
         |                                  ^
   include/linux/moduleparam.h:409:75: note: in definition of macro '__param_check'
     409 |         static inline type __always_unused *__check_##name(void) { return(p); }
         |                                                                           ^
   include/linux/moduleparam.h:329:9: note: in expansion of macro 'param_check_bool'
     329 |         param_check_##type(name, &(var));                       \
         |         ^~~~
   drivers/macintosh/via-pmu-led.c:120:1: note: in expansion of macro 'core_param'
     120 | core_param(adb_pmu_led_disk, adb_pmu_led_disk, bool, 0644);
         | ^~~~~~~~~~
   cc1: some warnings being treated as errors


vim +329 include/linux/moduleparam.h

907b29eb41aa604 Rusty Russell   2010-08-11  314  
67e67ceaac5bf55 Rusty Russell   2008-10-22  315  #ifndef MODULE
67e67ceaac5bf55 Rusty Russell   2008-10-22  316  /**
67e67ceaac5bf55 Rusty Russell   2008-10-22  317   * core_param - define a historical core kernel parameter.
67e67ceaac5bf55 Rusty Russell   2008-10-22  318   * @name: the name of the cmdline and sysfs parameter (often the same as var)
67e67ceaac5bf55 Rusty Russell   2008-10-22  319   * @var: the variable
546970bc6afc7fb Rusty Russell   2010-08-11  320   * @type: the type of the parameter
67e67ceaac5bf55 Rusty Russell   2008-10-22  321   * @perm: visibility in sysfs
67e67ceaac5bf55 Rusty Russell   2008-10-22  322   *
67e67ceaac5bf55 Rusty Russell   2008-10-22  323   * core_param is just like module_param(), but cannot be modular and
67e67ceaac5bf55 Rusty Russell   2008-10-22  324   * doesn't add a prefix (such as "printk.").  This is for compatibility
67e67ceaac5bf55 Rusty Russell   2008-10-22  325   * with __setup(), and it makes sense as truly core parameters aren't
67e67ceaac5bf55 Rusty Russell   2008-10-22  326   * tied to the particular file they're in.
67e67ceaac5bf55 Rusty Russell   2008-10-22  327   */
67e67ceaac5bf55 Rusty Russell   2008-10-22  328  #define core_param(name, var, type, perm)		\
67e67ceaac5bf55 Rusty Russell   2008-10-22 @329  	param_check_##type(name, &(var));		\
91f9d330cc14932 Jani Nikula     2014-08-27  330  	__module_param_call("", name, &param_ops_##type, &var, perm, -1, 0)
ec0ccc16a09fc32 Dmitry Torokhov 2015-03-30  331  
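
In effect, core_param(adb_pmu_led_disk, adb_pmu_led_disk, bool, 0644)
expands param_check_bool() into a type check along these lines (a sketch
based on the excerpt above):

	static inline bool __always_unused *__check_adb_pmu_led_disk(void)
	{
		return &(adb_pmu_led_disk);	/* compile error unless the
						 * variable really is bool */
	}

so the build fails while adb_pmu_led_disk is declared with any other type;
the v2 of the patch declares it bool.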

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[powerpc:fixes-test] BUILD SUCCESS d853adc7adf601d7d6823afe3ed396065a3e08d1

2021-10-25 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
fixes-test
branch HEAD: d853adc7adf601d7d6823afe3ed396065a3e08d1  powerpc/pseries/iommu: 
Create huge DMA window if no MMIO32 is present

elapsed time: 735m

configs tested: 38
configs skipped: 100

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
i386 randconfig-c001-20211025
mips   ip28_defconfig
sh   se7206_defconfig
sh  defconfig
sh espt_defconfig
powerpc wii_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68kdefconfig
nios2   defconfig
nds32 allnoconfig
nds32   defconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
i386  debian-10.3
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64rhel-8.3-kselftests
um   x86_64_defconfig
um i386_defconfig

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] powerpc/bpf: fix write protecting JIT code

2021-10-25 Thread Daniel Borkmann

On 10/25/21 8:15 AM, Naveen N. Rao wrote:

Hari Bathini wrote:

Running a program with bpf-to-bpf function calls results in a data access
exception (0x300) with the below call trace:

    [c0113f28] bpf_int_jit_compile+0x238/0x750 (unreliable)
    [c037d2f8] bpf_check+0x2008/0x2710
    [c0360050] bpf_prog_load+0xb00/0x13a0
    [c0361d94] __sys_bpf+0x6f4/0x27c0
    [c0363f0c] sys_bpf+0x2c/0x40
    [c0032434] system_call_exception+0x164/0x330
    [c000c1e8] system_call_vectored_common+0xe8/0x278

as bpf_int_jit_compile() tries writing to the write-protected JIT code
location during the extra pass.

Fix it by holding off write protection of JIT code until the extra
pass, where branch target addresses fixup happens.

Cc: sta...@vger.kernel.org
Fixes: 62e3d4210ac9 ("powerpc/bpf: Write protect JIT code")
Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Thanks for the fix!

Reviewed-by: Naveen N. Rao 


LGTM, I presume this fix will be routed via Michael.

BPF selftests have plenty of BPF-to-BPF calls in there, too bad this was
caught so late. :/


Re: [PATCH v1 3/8] powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs

2021-10-25 Thread Christophe Leroy




On 22/10/2021 08:36, kernel test robot wrote:

Hi Christophe,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.15-rc6 next-20211021]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-booke-Disable-STRICT_KERNEL_RWX-DEBUG_PAGEALLOC-and-KFENCE/20211015-180535
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-tqm8541_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/159ed9a0b39712475dfebed64d1bb9387a0b9ad5
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Christophe-Leroy/powerpc-booke-Disable-STRICT_KERNEL_RWX-DEBUG_PAGEALLOC-and-KFENCE/20211015-180535
        git checkout 159ed9a0b39712475dfebed64d1bb9387a0b9ad5
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/mm/nohash/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

    arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for 'tlbcam_sz' [-Werror=missing-prototypes]
       63 | unsigned long tlbcam_sz(int idx)
          |               ^
    arch/powerpc/mm/nohash/fsl_book3e.c: In function 'settlbcam':
>> arch/powerpc/mm/nohash/fsl_book3e.c:126:40: error: '_PAGE_BAP_SX' undeclared (first use in this function)
      126 |         TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;
          |                                        ^~~~
    arch/powerpc/mm/nohash/fsl_book3e.c:126:40: note: each undeclared identifier is reported only once for each function it appears in
    cc1: all warnings being treated as errors



Thanks Robot for reporting that.

The problem is not trivial and is in fact deeper: we have a
misdefinition of _PAGE_EXEC on book3e.

I sent a v2 which adds two patches at the beginning of the series to
clear that problem, then fixed this patch 3 (which has become patch 5)
to use _PAGE_EXEC instead of _PAGE_BAP_SX.
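
I.e. with the fixed definition, the settlbcam() line flagged above becomes
(see patch 3/3 in this thread):

	TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_SX : 0;

in the kernel (else) branch, which also builds with 32-bit PTEs because
_PAGE_EXEC is always defined.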


Christophe



vim +/_PAGE_BAP_SX +126 arch/powerpc/mm/nohash/fsl_book3e.c

   114	
   115		TLBCAM[index].MAS0 = MAS0_TLBSEL(1) | MAS0_ESEL(index) | MAS0_NV(index+1);
   116		TLBCAM[index].MAS1 = MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
   117		TLBCAM[index].MAS2 = virt & PAGE_MASK;
   118	
   119		TLBCAM[index].MAS2 |= (flags & _PAGE_WRITETHRU) ? MAS2_W : 0;
   120		TLBCAM[index].MAS2 |= (flags & _PAGE_NO_CACHE) ? MAS2_I : 0;
   121		TLBCAM[index].MAS2 |= (flags & _PAGE_COHERENT) ? MAS2_M : 0;
   122		TLBCAM[index].MAS2 |= (flags & _PAGE_GUARDED) ? MAS2_G : 0;
   123		TLBCAM[index].MAS2 |= (flags & _PAGE_ENDIAN) ? MAS2_E : 0;
   124	
   125		TLBCAM[index].MAS3 = (phys & MAS3_RPN) | MAS3_SR;
 > 126		TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;
   127		TLBCAM[index].MAS3 |= (flags & _PAGE_RW) ? MAS3_SW : 0;
   128		if (mmu_has_feature(MMU_FTR_BIG_PHYS))
   129			TLBCAM[index].MAS7 = (u64)phys >> 32;
   130	
   131		/* Below is unlikely -- only for large user pages or similar */
   132		if (pte_user(__pte(flags))) {
   133			TLBCAM[index].MAS3 |= MAS3_UR;
   134			TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_UX : 0;
   135			TLBCAM[index].MAS3 |= (flags & _PAGE_RW) ? MAS3_UW : 0;
   136		}
   137	
   138		tlbcam_addrs[index].start = virt;
   139		tlbcam_addrs[index].limit = virt + size - 1;
   140		tlbcam_addrs[index].phys = phys;
   141	}
   142	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH v2 02/10] powerpc/book3e: Fix set_memory_x() and set_memory_nx()

2021-10-25 Thread Christophe Leroy




On 23/10/2021 13:47, Christophe Leroy wrote:

set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
set_memory_nx() calls pte_exprotect() which clears _PAGE_EXEC.

Book3e has 2 bits, UX and SX, which defines the exec rights
resp. for user (PR=1) and for kernel (PR=0).

_PAGE_EXEC is defined as UX only.

An executable kernel page is set with either _PAGE_KERNEL_RWX
or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.

So set_memory_nx() call for an executable kernel page does
nothing because UX is already cleared.

And set_memory_x() on a non-executable kernel page makes it
executable for the user and keeps it non-executable for kernel.

Also, pte_exec() always returns 'false' on kernel pages, because
it checks _PAGE_EXEC which doesn't include SX, so for instance
the W+X check doesn't work.

To fix this:
- change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
- set both UX and SX in _PAGE_EXEC so that pte_exec() returns
true whenever one of the two bits is set and pte_exprotect()
clears both bits.
- Define a book3e specific version of pte_mkexec() which sets
either SX or UX based on UR.

Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Signed-off-by: Christophe Leroy 
---
v2: New


pte_mkexec() in nohash/64/pgtable.h conflicts with the one in
nohash/pte-book3e.h.

We could guard it with #ifndef pte_mkexec, but as pte-book3e.h is the
only user on 64 bits, just remove it from there.

Sent v3 with only that change compared to v2.

Christophe


---
  arch/powerpc/include/asm/nohash/32/pgtable.h |  2 ++
  arch/powerpc/include/asm/nohash/pte-book3e.h | 18 ++
  arch/powerpc/mm/nohash/tlb_low_64e.S |  8 
  3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index ac0a5ff48c3a..d6ba821a56ce 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -193,10 +193,12 @@ static inline pte_t pte_wrprotect(pte_t pte)
  }
  #endif
  
+#ifndef pte_mkexec

  static inline pte_t pte_mkexec(pte_t pte)
  {
return __pte(pte_val(pte) | _PAGE_EXEC);
  }
+#endif
  
  #define pmd_none(pmd)		(!pmd_val(pmd))

  #define   pmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 813918f40765..f798640422c2 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -48,7 +48,7 @@
  #define _PAGE_WRITETHRU   0x80 /* W: cache write-through */
  
  /* "Higher level" linux bit combinations */

-#define _PAGE_EXEC	_PAGE_BAP_UX /* .. and was cache cleaned */
+#define _PAGE_EXEC	(_PAGE_BAP_SX | _PAGE_BAP_UX) /* .. and was cache cleaned */
  #define _PAGE_RW	(_PAGE_BAP_SW | _PAGE_BAP_UW) /* User write permission */
  #define _PAGE_KERNEL_RW	(_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY)
  #define _PAGE_KERNEL_RO	(_PAGE_BAP_SR)
@@ -93,11 +93,11 @@
  /* Permission masks used to generate the __P and __S table */
  #define PAGE_NONE __pgprot(_PAGE_BASE)
  #define PAGE_SHARED   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC)
+#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_BAP_UX)
  #define PAGE_COPY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_BAP_UX)
  #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_BAP_UX)
  
  #ifndef __ASSEMBLY__

  static inline pte_t pte_mkprivileged(pte_t pte)
@@ -113,6 +113,16 @@ static inline pte_t pte_mkuser(pte_t pte)
  }
  
  #define pte_mkuser pte_mkuser

+
+static inline pte_t pte_mkexec(pte_t pte)
+{
+   if (pte_val(pte) & _PAGE_BAP_UR)
+   return __pte((pte_val(pte) & ~_PAGE_BAP_SX) | _PAGE_BAP_UX);
+   else
+   return __pte((pte_val(pte) & ~_PAGE_BAP_UX) | _PAGE_BAP_SX);
+}
+#define pte_mkexec pte_mkexec
+
  #endif /* __ASSEMBLY__ */
  
  #endif /* __KERNEL__ */

diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S b/arch/powerpc/mm/nohash/tlb_low_64e.S
index bf24451f3e71..9235e720e357 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -222,7 +222,7 @@ tlb_miss_kernel_bolted:
  
  tlb_miss_fault_bolted:

/* We need to check if it was an instruction miss */
-   andi.   r10,r11,_PAGE_EXEC|_PAGE_BAP_SX
+   andi.   r10,r11,_PAGE_BAP_UX|_PAGE_BAP_SX
bne itlb_miss_fault_bolted
  dtlb_miss_fault_bolted:
tlb_epilog_bolted
@@ -239,7 +239,7 @@ 

[PATCH v3 02/10] powerpc/book3e: Fix set_memory_x() and set_memory_nx()

2021-10-25 Thread Christophe Leroy
set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
set_memory_nx() calls pte_exprotect() which clears _PAGE_EXEC.

Book3e has 2 bits, UX and SX, which defines the exec rights
resp. for user (PR=1) and for kernel (PR=0).

_PAGE_EXEC is defined as UX only.

An executable kernel page is set with either _PAGE_KERNEL_RWX
or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.

So set_memory_nx() call for an executable kernel page does
nothing because UX is already cleared.

And set_memory_x() on a non-executable kernel page makes it
executable for the user and keeps it non-executable for kernel.

Also, pte_exec() always returns 'false' on kernel pages, because
it checks _PAGE_EXEC which doesn't include SX, so for instance
the W+X check doesn't work.

To fix this:
- change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
- set both UX and SX in _PAGE_EXEC so that pte_exec() returns
true whenever one of the two bits is set and pte_exprotect()
clears both bits.
- Define a book3e specific version of pte_mkexec() which sets
either SX or UX based on UR.

Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Signed-off-by: Christophe Leroy 
---
v3: Removed pte_mkexec() from nohash/64/pgtable.h
v2: New
---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 ++
 arch/powerpc/include/asm/nohash/64/pgtable.h |  5 -
 arch/powerpc/include/asm/nohash/pte-book3e.h | 18 ++
 arch/powerpc/mm/nohash/tlb_low_64e.S |  8 
 4 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index ac0a5ff48c3a..d6ba821a56ce 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -193,10 +193,12 @@ static inline pte_t pte_wrprotect(pte_t pte)
 }
 #endif
 
+#ifndef pte_mkexec
 static inline pte_t pte_mkexec(pte_t pte)
 {
return __pte(pte_val(pte) | _PAGE_EXEC);
 }
+#endif
 
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index d081704b13fb..9d2905a47410 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -118,11 +118,6 @@ static inline pte_t pte_wrprotect(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_RW);
 }
 
-static inline pte_t pte_mkexec(pte_t pte)
-{
-   return __pte(pte_val(pte) | _PAGE_EXEC);
-}
-
 #define PMD_BAD_BITS   (PTE_TABLE_SIZE-1)
 #define PUD_BAD_BITS   (PMD_TABLE_SIZE-1)
 
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 813918f40765..f798640422c2 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -48,7 +48,7 @@
 #define _PAGE_WRITETHRU0x80 /* W: cache write-through */
 
 /* "Higher level" linux bit combinations */
-#define _PAGE_EXEC	_PAGE_BAP_UX /* .. and was cache cleaned */
+#define _PAGE_EXEC	(_PAGE_BAP_SX | _PAGE_BAP_UX) /* .. and was cache cleaned */
 #define _PAGE_RW	(_PAGE_BAP_SW | _PAGE_BAP_UW) /* User write permission */
 #define _PAGE_KERNEL_RW	(_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY)
 #define _PAGE_KERNEL_RO	(_PAGE_BAP_SR)
@@ -93,11 +93,11 @@
 /* Permission masks used to generate the __P and __S table */
 #define PAGE_NONE  __pgprot(_PAGE_BASE)
 #define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC)
+#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_BAP_UX)
 #define PAGE_COPY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_BAP_UX)
 #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_BAP_UX)
 
 #ifndef __ASSEMBLY__
 static inline pte_t pte_mkprivileged(pte_t pte)
@@ -113,6 +113,16 @@ static inline pte_t pte_mkuser(pte_t pte)
 }
 
 #define pte_mkuser pte_mkuser
+
+static inline pte_t pte_mkexec(pte_t pte)
+{
+   if (pte_val(pte) & _PAGE_BAP_UR)
+   return __pte((pte_val(pte) & ~_PAGE_BAP_SX) | _PAGE_BAP_UX);
+   else
+   return __pte((pte_val(pte) & ~_PAGE_BAP_UX) | _PAGE_BAP_SX);
+}
+#define pte_mkexec pte_mkexec
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S 
b/arch/powerpc/mm/nohash/tlb_low_64e.S
index bf24451f3e71..9235e720e357 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -222,7 +222,7 

[PATCH v3 07/10] powerpc/fsl_booke: Tell map_mem_in_cams() if init is done

2021-10-25 Thread Christophe Leroy
In order to be able to call map_mem_in_cams() once more
after init for STRICT_KERNEL_RWX, add an argument.

For now, map_mem_in_cams() is always called only during init.

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/mm/mmu_decl.h   |  2 +-
 arch/powerpc/mm/nohash/fsl_book3e.c  | 12 ++--
 arch/powerpc/mm/nohash/kaslr_booke.c |  2 +-
 arch/powerpc/mm/nohash/tlb.c |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index dd1cabc2ea0f..e13a3c0caa02 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -126,7 +126,7 @@ unsigned long mmu_mapin_ram(unsigned long base, unsigned 
long top);
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
 extern unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx,
-bool dryrun);
+bool dryrun, bool init);
 extern unsigned long calc_cam_sz(unsigned long ram, unsigned long virt,
 phys_addr_t phys);
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c 
b/arch/powerpc/mm/nohash/fsl_book3e.c
index 2668bb06e4fa..8ae1ba7985df 100644
--- a/arch/powerpc/mm/nohash/fsl_book3e.c
+++ b/arch/powerpc/mm/nohash/fsl_book3e.c
@@ -168,7 +168,7 @@ unsigned long calc_cam_sz(unsigned long ram, unsigned long 
virt,
 
 static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt,
unsigned long ram, int max_cam_idx,
-   bool dryrun)
+   bool dryrun, bool init)
 {
int i;
unsigned long amount_mapped = 0;
@@ -203,12 +203,12 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t 
phys, unsigned long virt,
return amount_mapped;
 }
 
-unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, bool dryrun)
+unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, bool dryrun, bool init)
 {
unsigned long virt = PAGE_OFFSET;
phys_addr_t phys = memstart_addr;
 
-   return map_mem_in_cams_addr(phys, virt, ram, max_cam_idx, dryrun);
+   return map_mem_in_cams_addr(phys, virt, ram, max_cam_idx, dryrun, init);
 }
 
 #ifdef CONFIG_PPC32
@@ -249,7 +249,7 @@ void __init adjust_total_lowmem(void)
ram = min((phys_addr_t)__max_low_memory, (phys_addr_t)total_lowmem);
 
i = switch_to_as1();
-   __max_low_memory = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, false);
+	__max_low_memory = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, false, true);
restore_to_as0(i, 0, 0, 1);
 
pr_info("Memory CAM mapping: ");
@@ -320,11 +320,11 @@ notrace void __init relocate_init(u64 dt_ptr, phys_addr_t 
start)
/* map a 64M area for the second relocation */
if (memstart_addr > start)
map_mem_in_cams(0x400, CONFIG_LOWMEM_CAM_NUM,
-   false);
+   false, true);
else
map_mem_in_cams_addr(start, PAGE_OFFSET + offset,
0x400, CONFIG_LOWMEM_CAM_NUM,
-   false);
+   false, true);
restore_to_as0(n, offset, __va(dt_ptr), 1);
/* We should never reach here */
panic("Relocation error");
diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c 
b/arch/powerpc/mm/nohash/kaslr_booke.c
index 4c74e8a5482b..8fc49b1b4a91 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -314,7 +314,7 @@ static unsigned long __init kaslr_choose_location(void 
*dt_ptr, phys_addr_t size
pr_warn("KASLR: No safe seed for randomizing the kernel 
base.\n");
 
ram = min_t(phys_addr_t, __max_low_memory, size);
-   ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
+   ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true, false);
linear_sz = min_t(unsigned long, ram, SZ_512M);
 
/* If the linear size is smaller than 64M, do not randmize */
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5872f69141d5..fc195b9f524b 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -643,7 +643,7 @@ static void early_init_this_mmu(void)
 
if (map)
linear_map_top = map_mem_in_cams(linear_map_top,
-num_cams, false);
+num_cams, false, true);
}
 #endif
 
@@ -764,7 +764,7 @@ void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
num_cams = (mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) / 4;
 
linear_sz = 

[PATCH v3 01/10] powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()

2021-10-25 Thread Christophe Leroy
Commit 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
changed those two functions to use pte helpers to determine which
bits to clear and which bits to set.

This change was based on the assumption that the bits to be set/cleared
are always the same and can be determined by applying the pte
manipulation helpers to __pte(0).

But on platforms like book3e, the bits depend on whether the page
is a user page or not.

For the time being it more or less works because _PAGE_EXEC is used
for user pages only and the exec right is always set on kernel pages.
But the following patch will clean that up, and the output of
pte_mkexec() will then depend on the page being a user or kernel page.

Instead of trying to make an even more complicated helper where bits
would become dependent on the final pte value, come back to a more
static situation like before commit 26973fa5ac0e ("powerpc/mm: use
pte helpers in generic code"), by introducing an 8xx specific
version of __ptep_set_access_flags() and ptep_set_wrprotect().
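
To see why the __pte(0) trick cannot survive that change, consider a
minimal model (bit values assumed as in pte-book3e.h, pte_mkexec() as
introduced later in this series):

	#include <stdio.h>

	#define _PAGE_BAP_SX	0x04
	#define _PAGE_BAP_UR	0x08
	#define _PAGE_BAP_UX	0x20

	/* page-dependent pte_mkexec(), as on book3e after this series */
	static unsigned long pte_mkexec(unsigned long pte)
	{
		return (pte & _PAGE_BAP_UR) ? (pte & ~_PAGE_BAP_SX) | _PAGE_BAP_UX
					    : (pte & ~_PAGE_BAP_UX) | _PAGE_BAP_SX;
	}

	int main(void)
	{
		/* the generic helpers precompute a fixed mask from __pte(0) ... */
		unsigned long fixed_set_mask = pte_mkexec(0);	/* == SX: kernel flavour */
		unsigned long user_pte = _PAGE_BAP_UR;

		/* ... which applies the kernel bit to a user page */
		printf("generic mask: %#lx (SX set, wrong for user)\n",
		       user_pte | fixed_set_mask);
		/* calling the helper on the real pte picks the right bit */
		printf("direct call:  %#lx (UX set, correct)\n", pte_mkexec(user_pte));
		return 0;
	}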

Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy 
---
v3: No change
v2: New
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 17 +++
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 22 
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index f06ae00f2a65..ac0a5ff48c3a 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -306,30 +306,29 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
+#ifndef ptep_set_wrprotect
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep)
 {
-   unsigned long clr = ~pte_val(pte_wrprotect(__pte(~0)));
-   unsigned long set = pte_val(pte_wrprotect(__pte(0)));
-
-   pte_update(mm, addr, ptep, clr, set, 0);
+   pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
 }
+#endif
 
+#ifndef __ptep_set_access_flags
 static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
   pte_t *ptep, pte_t entry,
   unsigned long address,
   int psize)
 {
-	pte_t pte_set = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(0)))));
-	pte_t pte_clr = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(~0)))));
-   unsigned long set = pte_val(entry) & pte_val(pte_set);
-   unsigned long clr = ~pte_val(entry) & ~pte_val(pte_clr);
+   unsigned long set = pte_val(entry) &
+			(_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
int huge = psize > mmu_virtual_psize ? 1 : 0;
 
-   pte_update(vma->vm_mm, address, ptep, clr, set, huge);
+   pte_update(vma->vm_mm, address, ptep, 0, set, huge);
 
flush_tlb_page(vma, address);
 }
+#endif
 
 static inline int pte_young(pte_t pte)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index fcc48d590d88..1a89ebdc3acc 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -136,6 +136,28 @@ static inline pte_t pte_mkhuge(pte_t pte)
 
 #define pte_mkhuge pte_mkhuge
 
+static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
+				     unsigned long clr, unsigned long set, int huge);
+
+static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+   pte_update(mm, addr, ptep, 0, _PAGE_RO, 0);
+}
+#define ptep_set_wrprotect ptep_set_wrprotect
+
+static inline void __ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
+					   pte_t entry, unsigned long address, int psize)
+{
+	unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_EXEC);
+   unsigned long clr = ~pte_val(entry) & _PAGE_RO;
+   int huge = psize > mmu_virtual_psize ? 1 : 0;
+
+   pte_update(vma->vm_mm, address, ptep, clr, set, huge);
+
+   flush_tlb_page(vma, address);
+}
+#define __ptep_set_access_flags __ptep_set_access_flags
+
 static inline unsigned long pgd_leaf_size(pgd_t pgd)
 {
if (pgd_val(pgd) & _PMD_PAGE_8M)
-- 
2.31.1



[PATCH v3 10/10] powerpc/fsl_booke: Enable STRICT_KERNEL_RWX

2021-10-25 Thread Christophe Leroy
Enable STRICT_KERNEL_RWX on fsl_booke.

For that, we need additional TLBCAMs dedicated to linear mapping,
based on the alignment of _sinittext.

By default, up to 768 Mbytes of memory are mapped.
It uses 3 TLBCAMs of size 256 Mbytes.

With a data alignment of 16, we need up to 9 TLBCAMs:
  16/16/16/16/64/64/64/256/256

With a data alignment of 4, we need up to 12 TLBCAMs:
  4/4/4/4/16/16/16/64/64/64/256/256

With a data alignment of 1, we need up to 15 TLBCAMs:
  1/1/1/1/4/4/4/16/16/16/64/64/64/256/256

By default, set a 16 Mbytes alignment as a compromise between memory
usage and number of TLBCAMs. This can be adjusted manually when needed.
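
The worst-case counts above can be reproduced with a small user-space
model of the allocation. It assumes, as calc_cam_sz() does, CAM sizes
that are powers of 4 between 1 Mbyte and 256 Mbytes, each naturally
aligned; the RO/RW boundary sits at the chosen data alignment:

	#include <stdio.h>

	/* largest power-of-4 size (in Mbytes) that fits in 'len'
	 * and is naturally aligned at 'virt' */
	static unsigned long cam_sz(unsigned long len, unsigned long virt)
	{
		unsigned long sz = 256;

		while (sz > 1 && (sz > len || (virt & (sz - 1))))
			sz /= 4;
		return sz;
	}

	int main(void)
	{
		unsigned long total = 768;	/* Mbytes mapped by default */
		unsigned long boundary = 16;	/* data alignment: try 16, 4 or 1 */
		unsigned long virt = 0;
		int n = 0;

		while (virt < total) {
			unsigned long limit = virt < boundary ? boundary : total;
			unsigned long sz = cam_sz(limit - virt, virt);

			printf("%s%lu", n++ ? "/" : "", sz);
			virt += sz;
		}
		printf(" => %d TLBCAMs\n", n);
		return 0;
	}

For boundary = 16 this prints 16/16/16/16/64/64/64/256/256 => 9,
matching the table above; boundary = 4 and 1 reproduce the 12 and 15
TLBCAM cases.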

For the time being, it doesn't work when the base is randomised.

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/Kconfig | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6b9f523882c5..939a47642a9c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -139,6 +139,7 @@ config PPC
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC_BOOK3S || PPC_8xx || 40x) && !HIBERNATION
+	select ARCH_HAS_STRICT_KERNEL_RWX	if FSL_BOOKE && !HIBERNATION && !RANDOMIZE_BASE
 	select ARCH_HAS_STRICT_MODULE_RWX	if ARCH_HAS_STRICT_KERNEL_RWX && !PPC_BOOK3S_32
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
@@ -778,7 +779,8 @@ config DATA_SHIFT_BOOL
bool "Set custom data alignment"
depends on ADVANCED_OPTIONS
depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE
-	depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && !STRICT_KERNEL_RWX)
+	depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && !STRICT_KERNEL_RWX) || \
+		   FSL_BOOKE
help
  This option allows you to set the kernel data alignment. When
  RAM is mapped by blocks, the alignment needs to fit the size and
@@ -791,11 +793,13 @@ config DATA_SHIFT
default 24 if STRICT_KERNEL_RWX && PPC64
 	range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32
 	range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_8xx
+	range 20 24 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_FSL_BOOKE
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 18 if (DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32
default 23 if STRICT_KERNEL_RWX && PPC_8xx
default 23 if (DEBUG_PAGEALLOC || KFENCE) && PPC_8xx && PIN_TLB_DATA
default 19 if (DEBUG_PAGEALLOC || KFENCE) && PPC_8xx
+   default 24 if STRICT_KERNEL_RWX && FSL_BOOKE
default PPC_PAGE_SHIFT
help
  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
@@ -1123,7 +1127,10 @@ config LOWMEM_CAM_NUM_BOOL
 config LOWMEM_CAM_NUM
depends on FSL_BOOKE
int "Number of CAMs to use to map low memory" if LOWMEM_CAM_NUM_BOOL
-   default 3
+   default 3 if !STRICT_KERNEL_RWX
+   default 9 if DATA_SHIFT >= 24
+   default 12 if DATA_SHIFT >= 22
+   default 15
 
 config DYNAMIC_MEMSTART
bool "Enable page aligned dynamic load address for kernel"
-- 
2.31.1



[PATCH v3 09/10] powerpc/fsl_booke: Update of TLBCAMs after init

2021-10-25 Thread Christophe Leroy
After init, set readonly memory as ROX and set readwrite
memory as RW (non-executable), if STRICT_KERNEL_RWX is enabled.

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/mm/mmu_decl.h  |  2 +-
 arch/powerpc/mm/nohash/fsl_book3e.c | 32 +
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index e13a3c0caa02..0dd4c18f8363 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -168,7 +168,7 @@ static inline phys_addr_t v_block_mapped(unsigned long va) 
{ return 0; }
 static inline unsigned long p_block_mapped(phys_addr_t pa) { return 0; }
 #endif
 
-#if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_PPC_8xx)
+#if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_PPC_8xx) || defined(CONFIG_PPC_FSL_BOOK3E)
 void mmu_mark_initmem_nx(void);
 void mmu_mark_rodata_ro(void);
 #else
diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c 
b/arch/powerpc/mm/nohash/fsl_book3e.c
index 88132cab3442..b231a54f540c 100644
--- a/arch/powerpc/mm/nohash/fsl_book3e.c
+++ b/arch/powerpc/mm/nohash/fsl_book3e.c
@@ -182,7 +182,7 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, 
unsigned long virt,
/* Calculate CAM values */
for (i = 0; boundary && i < max_cam_idx; i++) {
unsigned long cam_sz;
-   pgprot_t prot = PAGE_KERNEL_X;
+   pgprot_t prot = init ? PAGE_KERNEL_X : PAGE_KERNEL_ROX;
 
cam_sz = calc_cam_sz(boundary, virt, phys);
if (!dryrun)
@@ -195,7 +195,7 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, 
unsigned long virt,
}
for (ram -= amount_mapped; ram && i < max_cam_idx; i++) {
unsigned long cam_sz;
-   pgprot_t prot = PAGE_KERNEL_X;
+   pgprot_t prot = init ? PAGE_KERNEL_X : PAGE_KERNEL;
 
cam_sz = calc_cam_sz(ram, virt, phys);
if (!dryrun)
@@ -210,8 +210,13 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t 
phys, unsigned long virt,
if (dryrun)
return amount_mapped;
 
-   loadcam_multi(0, i, max_cam_idx);
-   tlbcam_index = i;
+   if (init) {
+   loadcam_multi(0, i, max_cam_idx);
+   tlbcam_index = i;
+   } else {
+   loadcam_multi(0, i, 0);
+   WARN_ON(i > tlbcam_index);
+   }
 
 #ifdef CONFIG_PPC64
get_paca()->tcd.esel_next = i;
@@ -280,6 +285,25 @@ void __init adjust_total_lowmem(void)
memblock_set_current_limit(memstart_addr + __max_low_memory);
 }
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+void mmu_mark_rodata_ro(void)
+{
+   /* Everything is done in mmu_mark_initmem_nx() */
+}
+#endif
+
+void mmu_mark_initmem_nx(void)
+{
+   unsigned long remapped;
+
+   if (!strict_kernel_rwx_enabled())
+   return;
+
+	remapped = map_mem_in_cams(__max_low_memory, CONFIG_LOWMEM_CAM_NUM, false, false);
+
+   WARN_ON(__max_low_memory != remapped);
+}
+
 void setup_initial_memory_limit(phys_addr_t first_memblock_base,
phys_addr_t first_memblock_size)
 {
-- 
2.31.1



[PATCH v3 03/10] powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE

2021-10-25 Thread Christophe Leroy
fsl_booke and 44x are not able to map kernel linear memory with
pages, so they can't support DEBUG_PAGEALLOC and KFENCE, and
STRICT_KERNEL_RWX is also a problem for now.

Enable those only on book3s (both 32 and 64 except KFENCE), 8xx and 40x.

Fixes: 88df6e90fa97 ("[POWERPC] DEBUG_PAGEALLOC for 32-bit")
Fixes: 95902e6c8864 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32")
Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ba5b66189358..6b9f523882c5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -138,7 +138,7 @@ config PPC
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
 	select ARCH_HAS_SET_MEMORY
-	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC_BOOK3S || PPC_8xx || 40x) && !HIBERNATION
 	select ARCH_HAS_STRICT_MODULE_RWX	if ARCH_HAS_STRICT_KERNEL_RWX && !PPC_BOOK3S_32
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
@@ -150,7 +150,7 @@ config PPC
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
-	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC32 || PPC_BOOK3S_64
+	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx || 40x
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
@@ -190,7 +190,7 @@ config PPC
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
-	select HAVE_ARCH_KFENCE			if PPC32
+	select HAVE_ARCH_KFENCE			if PPC_BOOK3S_32 || PPC_8xx || 40x
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
-- 
2.31.1



[PATCH v3 08/10] powerpc/fsl_booke: Allocate separate TLBCAMs for readonly memory

2021-10-25 Thread Christophe Leroy
Reorganise TLBCAM allocation so that, when STRICT_KERNEL_RWX is
enabled, readonly memory is mapped by its own dedicated TLBCAMs.

This results in an allocation looking like:

Memory CAM mapping: 4/4/4/1/1/1/1/16/16/16/64/64/64/256/256 Mb, residual: 256Mb

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/mm/nohash/fsl_book3e.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c 
b/arch/powerpc/mm/nohash/fsl_book3e.c
index 8ae1ba7985df..88132cab3442 100644
--- a/arch/powerpc/mm/nohash/fsl_book3e.c
+++ b/arch/powerpc/mm/nohash/fsl_book3e.c
@@ -172,15 +172,34 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t 
phys, unsigned long virt,
 {
int i;
unsigned long amount_mapped = 0;
+   unsigned long boundary;
+
+   if (strict_kernel_rwx_enabled())
+   boundary = (unsigned long)(_sinittext - _stext);
+   else
+   boundary = ram;
 
/* Calculate CAM values */
-   for (i = 0; ram && i < max_cam_idx; i++) {
+   for (i = 0; boundary && i < max_cam_idx; i++) {
+   unsigned long cam_sz;
+   pgprot_t prot = PAGE_KERNEL_X;
+
+   cam_sz = calc_cam_sz(boundary, virt, phys);
+   if (!dryrun)
+   settlbcam(i, virt, phys, cam_sz, pgprot_val(prot), 0);
+
+   boundary -= cam_sz;
+   amount_mapped += cam_sz;
+   virt += cam_sz;
+   phys += cam_sz;
+   }
+   for (ram -= amount_mapped; ram && i < max_cam_idx; i++) {
unsigned long cam_sz;
+   pgprot_t prot = PAGE_KERNEL_X;
 
cam_sz = calc_cam_sz(ram, virt, phys);
if (!dryrun)
-   settlbcam(i, virt, phys, cam_sz,
- pgprot_val(PAGE_KERNEL_X), 0);
+   settlbcam(i, virt, phys, cam_sz, pgprot_val(prot), 0);
 
ram -= cam_sz;
amount_mapped += cam_sz;
-- 
2.31.1



[PATCH v3 05/10] powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs

2021-10-25 Thread Christophe Leroy
Don't force MAS3_SX and MAS3_UX at all times. Take the
exec flag into account.

While at it, fix a couple of nearby style problems (indentation
with spaces and unnecessary parentheses) to improve readability.

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: Use the new _PAGE_EXEC to check executability of flags instead of 
_PAGE_BAP_SX (Error reported by robot with tqm8541_defconfig)
---
 arch/powerpc/mm/nohash/fsl_book3e.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c 
b/arch/powerpc/mm/nohash/fsl_book3e.c
index 03dacbe940e5..2668bb06e4fa 100644
--- a/arch/powerpc/mm/nohash/fsl_book3e.c
+++ b/arch/powerpc/mm/nohash/fsl_book3e.c
@@ -122,15 +122,18 @@ static void settlbcam(int index, unsigned long virt, 
phys_addr_t phys,
TLBCAM[index].MAS2 |= (flags & _PAGE_GUARDED) ? MAS2_G : 0;
TLBCAM[index].MAS2 |= (flags & _PAGE_ENDIAN) ? MAS2_E : 0;
 
-   TLBCAM[index].MAS3 = (phys & MAS3_RPN) | MAS3_SX | MAS3_SR;
-   TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_SW : 0);
+   TLBCAM[index].MAS3 = (phys & MAS3_RPN) | MAS3_SR;
+   TLBCAM[index].MAS3 |= (flags & _PAGE_RW) ? MAS3_SW : 0;
if (mmu_has_feature(MMU_FTR_BIG_PHYS))
TLBCAM[index].MAS7 = (u64)phys >> 32;
 
/* Below is unlikely -- only for large user pages or similar */
if (pte_user(__pte(flags))) {
-  TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
-  TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
+   TLBCAM[index].MAS3 |= MAS3_UR;
+   TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_UX : 0;
+   TLBCAM[index].MAS3 |= (flags & _PAGE_RW) ? MAS3_UW : 0;
+   } else {
+   TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_SX : 0;
}
 
tlbcam_addrs[index].start = virt;
-- 
2.31.1



[PATCH v3 06/10] powerpc/fsl_booke: Enable reloading of TLBCAM without switching to AS1

2021-10-25 Thread Christophe Leroy
Avoid switching to AS1 when reloading TLBCAM after init for
STRICT_KERNEL_RWX.

When we set up AS1 we expect the entire accessible memory to be
mapped through one entry; this is no longer the case at the end
of init.

We are not changing the size of TLBCAMs, only flags, so there is
no need to switch to AS1.

So change loadcam_multi() to not switch to AS1 when the given
temporary tlb entry is 0 (see the call-site pairing below).
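
With this change, the call sites from elsewhere in the series pair up
as follows (taken from patches 07 and 09):

	loadcam_multi(0, i, max_cam_idx);	/* boot: use a temporary AS1 entry */
	loadcam_multi(0, i, 0);			/* after init: rewrite flags in place */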

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/mm/nohash/tlb_low.S | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/nohash/tlb_low.S b/arch/powerpc/mm/nohash/tlb_low.S
index 5add4a51e51f..dd39074de9af 100644
--- a/arch/powerpc/mm/nohash/tlb_low.S
+++ b/arch/powerpc/mm/nohash/tlb_low.S
@@ -369,7 +369,7 @@ _GLOBAL(_tlbivax_bcast)
  * extern void loadcam_entry(unsigned int index)
  *
  * Load TLBCAM[index] entry in to the L2 CAM MMU
- * Must preserve r7, r8, r9, r10 and r11
+ * Must preserve r7, r8, r9, r10, r11, r12
  */
 _GLOBAL(loadcam_entry)
mflrr5
@@ -401,7 +401,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_BIG_PHYS)
  *
  * r3 = first entry to write
  * r4 = number of entries to write
- * r5 = temporary tlb entry
+ * r5 = temporary tlb entry (0 means no switch to AS1)
  */
 _GLOBAL(loadcam_multi)
mflrr8
@@ -409,6 +409,8 @@ _GLOBAL(loadcam_multi)
mfmsr   r11
andi.   r11,r11,MSR_IS
bne 10f
+   mr. r12, r5
+   beq 10f
 
/*
 * Set up temporary TLB entry that is the same as what we're
@@ -446,6 +448,8 @@ _GLOBAL(loadcam_multi)
/* Don't return to AS=0 if we were in AS=1 at function start */
andi.   r11,r11,MSR_IS
bne 3f
+   cmpwi   r12, 0
+   beq 3f
 
/* Return to AS=0 and clear the temporary entry */
mfmsr   r6
-- 
2.31.1



[PATCH v3 04/10] powerpc/fsl_booke: Rename fsl_booke.c to fsl_book3e.c

2021-10-25 Thread Christophe Leroy
We have a myriad of CONFIG symbols around different variants
of BOOKEs, which would be worth tidying up one day.

But at least, make file names and CONFIG option match:

We have CONFIG_FSL_BOOKE and CONFIG_PPC_FSL_BOOK3E.

fsl_booke.c is built by and only by CONFIG_PPC_FSL_BOOK3E.
So rename it to fsl_book3e.c to reduce confusion.

Signed-off-by: Christophe Leroy 
---
v3: No change
v2: No change
---
 arch/powerpc/mm/nohash/Makefile  | 4 ++--
 arch/powerpc/mm/nohash/{fsl_booke.c => fsl_book3e.c} | 0
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename arch/powerpc/mm/nohash/{fsl_booke.c => fsl_book3e.c} (100%)

diff --git a/arch/powerpc/mm/nohash/Makefile b/arch/powerpc/mm/nohash/Makefile
index 0424f6ce5bd8..b1f630d423d8 100644
--- a/arch/powerpc/mm/nohash/Makefile
+++ b/arch/powerpc/mm/nohash/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_PPC_BOOK3E_64) += tlb_low_64e.o 
book3e_pgtable.o
 obj-$(CONFIG_40x)  += 40x.o
 obj-$(CONFIG_44x)  += 44x.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
-obj-$(CONFIG_PPC_FSL_BOOK3E)   += fsl_booke.o
+obj-$(CONFIG_PPC_FSL_BOOK3E)   += fsl_book3e.o
 obj-$(CONFIG_RANDOMIZE_BASE)   += kaslr_booke.o
 ifdef CONFIG_HUGETLB_PAGE
 obj-$(CONFIG_PPC_FSL_BOOK3E)   += book3e_hugetlbpage.o
@@ -16,4 +16,4 @@ endif
 # Disable kcov instrumentation on sensitive code
 # This is necessary for booting with kcov enabled on book3e machines
 KCOV_INSTRUMENT_tlb.o := n
-KCOV_INSTRUMENT_fsl_booke.o := n
+KCOV_INSTRUMENT_fsl_book3e.o := n
diff --git a/arch/powerpc/mm/nohash/fsl_booke.c 
b/arch/powerpc/mm/nohash/fsl_book3e.c
similarity index 100%
rename from arch/powerpc/mm/nohash/fsl_booke.c
rename to arch/powerpc/mm/nohash/fsl_book3e.c
-- 
2.31.1



Re: [PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active

2021-10-25 Thread Athira Rajeev


> On 21-Oct-2021, at 11:03 PM, Nathan Lynch  wrote:
> 
> Nicholas Piggin  writes:
>> Excerpts from Athira Rajeev's message of July 11, 2021 10:25 pm:
>>> During Live Partition Migration (LPM), it is observed that perf
>>> counter values reports zero post migration completion. However
>>> 'perf stat' with workload continues to show counts post migration
>>> since PMU gets disabled/enabled during sched switches. But in case
>>> of system/cpu wide monitoring, zero counts were reported with 'perf
>>> stat' after migration completion.
>>> 
>>> Example:
>>> ./perf stat -e r1001e -I 1000
>>>   time counts unit events
>>> 1.001010437 22,137,414  r1001e
>>> 2.002495447 15,455,821  r1001e
>>> <<>> As seen in the logs below, the counter values show zero
>>>after migration is completed.
>>> <<>>
>>>86.142535370  129,392,333,440  r1001e
>>>87.144714617  0  r1001e
>>>88.146526636  0  r1001e
>>>89.148085029  0  r1001e
>>> 
>>> Here PMU is enabled during start of perf session and counter
>>> values are read at intervals. Counters are only disabled at the
>>> end of session. The powerpc mobility code presently does not handle
>>> disabling and enabling back of PMU counters during partition
>>> migration. Also since the PMU register values are not saved/restored
>>> during migration, PMU registers like Monitor Mode Control Register 0
>>> (MMCR0), Monitor Mode Control Register 1 (MMCR1) will not contain
>>> the value it was programmed with. Hence PMU counters will not be
>>> enabled correctly post migration.
>>> 
>>> Fix this in mobility code by handling disabling and enabling of
>>> PMU in all cpu's before and after migration. Patch introduces two
>>> functions 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>>> mobility_pmu_disable() is called before the processor threads goes
>>> to suspend state so as to disable the PMU counters. And disable is
>>> done only if there are any active events running on that cpu.
>>> mobility_pmu_enable() is called after the processor threads are
>>> back online to enable back the PMU counters.
>>> 
>>> Since the performance Monitor counters ( PMCs) are not
>>> saved/restored during LPM, results in PMC value being zero and the
>>> 'event->hw.prev_count' being non-zero value. This causes problem
>> 
>> Interesting. Are they defined to not be migrated, or may not be 
>> migrated?
> 
> PAPR may be silent on this... at least I haven't found anything yet. But
> I'm not very familiar with perf counters.
> 
> How much assurance do we have that hardware events we've programmed on
> the source can be reliably re-enabled on the destination, with the same
> semantics? Aren't there some model-specific counters that don't make
> sense to handle this way?
> 
> 
>>> diff --git a/arch/powerpc/include/asm/rtas.h 
>>> b/arch/powerpc/include/asm/rtas.h
>>> index 9dc97d2..cea72d7 100644
>>> --- a/arch/powerpc/include/asm/rtas.h
>>> +++ b/arch/powerpc/include/asm/rtas.h
>>> @@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
>>> static inline void read_24x7_sys_info(void) { }
>>> #endif
>>> 
>>> +#ifdef CONFIG_PPC_PERF_CTRS
>>> +void mobility_pmu_disable(void);
>>> +void mobility_pmu_enable(void);
>>> +#else
>>> +static inline void mobility_pmu_disable(void) { }
>>> +static inline void mobility_pmu_enable(void) { }
>>> +#endif
>>> +
>>> #endif /* __KERNEL__ */
>>> #endif /* _POWERPC_RTAS_H */
>> 
>> It's not implemented in rtas, maybe consider putting this into a perf 
>> header?
> 
> +1

Sure, I will move this to perf_event header file

Thanks
Athira

Re: [PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active

2021-10-25 Thread Athira Rajeev


> On 21-Oct-2021, at 10:47 PM, Nathan Lynch  wrote:
> 
> Athira Rajeev  writes:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values reports zero post migration completion. However
>> 'perf stat' with workload continues to show counts post migration
>> since PMU gets disabled/enabled during sched switches. But in case
>> of system/cpu wide monitoring, zero counts were reported with 'perf
>> stat' after migration completion.
>> 
>> Example:
>> ./perf stat -e r1001e -I 1000
>>   time counts unit events
>> 1.001010437 22,137,414  r1001e
>> 2.002495447 15,455,821  r1001e
>> <<>> As seen in the logs below, the counter values show zero
>>after migration is completed.
>> <<>>
>>86.142535370  129,392,333,440  r1001e
>>87.144714617  0  r1001e
>>88.146526636  0  r1001e
>>89.148085029  0  r1001e
> 
> Confirmed in my environment:
> 
>51.099987985300,338  cache-misses
>52.101839374296,586  cache-misses
>53.116089796263,150  cache-misses
>54.117949249232,290  cache-misses
>55.602029375 68,700,421,711  cache-misses
>56.610073969  0  cache-misses
>57.614732000  0  cache-misses
> 
> I wonder what it means that there is a very unlikely huge value before
> the counter stops working -- I believe your example has this phenomenon
> too.
> 
> 
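One plausible mechanism for that spike (an assumption, not something
verified against this trace): the PMCs are 32-bit, and
check_and_compute_delta() in arch/powerpc/perf/core-book3s.c computes
the delta modulo 2^32, so a counter that suddenly reads back as zero
turns the stale prev_count into one huge bogus increment. A
stand-alone illustration:

	#include <stdio.h>
	#include <stdint.h>

	/* simplified from check_and_compute_delta() in core-book3s.c */
	static uint64_t compute_delta(uint32_t prev, uint32_t val)
	{
		uint64_t delta = (uint32_t)(val - prev);

		/* the kernel only treats small backward jumps as rollback */
		if (prev > val && prev - val < 256)
			delta = 0;
		return delta;
	}

	int main(void)
	{
		uint32_t prev = 0xf0000000;	/* value cached before migration */
		uint32_t val = 0;		/* PMC reads back as zero afterwards */

		printf("bogus delta: %llu\n",
		       (unsigned long long)compute_delta(prev, val));	/* 268435456 */
		return 0;
	}
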
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
>> b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e089..ff7a77c 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>> retry:
>>  /* Must ensure MSR.EE off for H_JOIN. */
>>  hard_irq_disable();
>> +/* Disable PMU before suspend */
>> +mobility_pmu_disable();
>>  hvrc = plpar_hcall_norets(H_JOIN);
>> 
>>  switch (hvrc) {
>> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>>   * reset the watchdog.
>>   */
>>  touch_nmi_watchdog();
>> +/* Enable PMU after resuming */
>> +mobility_pmu_enable();
>>  return ret;
>> }
> 
> We should minimize calls into other subsystems from this context (the
> callback function we've passed to stop_machine); it's fairly sensitive.
> Can this be moved out to pseries_migrate_partition() or similar?

Hi Nathan

Thanks for the review.
I will move the callbacks to "pseries_migrate_partition" in the next version.

Athira.
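
For context, a rough sketch of where those calls might land
(hypothetical placement; pseries_migrate_partition() and the helpers
it calls exist in arch/powerpc/platforms/pseries/mobility.c, but this
exact arrangement is an assumption, not the posted patch):

	static int pseries_migrate_partition(u64 handle)
	{
		int ret;

		ret = wait_for_vasi_session_suspending(handle);
		if (ret)
			return ret;

		/* hypothetical: disable PMU before the stop_machine() callback runs */
		mobility_pmu_disable();

		ret = pseries_suspend(handle);
		if (ret == 0)
			post_mobility_fixup();

		/* hypothetical: re-enable once every CPU is back online */
		mobility_pmu_enable();

		return ret;
	}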



Re: [PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active

2021-10-25 Thread Athira Rajeev


> On 21-Oct-2021, at 4:24 PM, Nicholas Piggin  wrote:
> 
> Excerpts from Athira Rajeev's message of July 11, 2021 10:25 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values reports zero post migration completion. However
>> 'perf stat' with workload continues to show counts post migration
>> since PMU gets disabled/enabled during sched switches. But in case
>> of system/cpu wide monitoring, zero counts were reported with 'perf
>> stat' after migration completion.
>> 
>> Example:
>> ./perf stat -e r1001e -I 1000
>>   time counts unit events
>> 1.001010437 22,137,414  r1001e
>> 2.002495447 15,455,821  r1001e
>> <<>> As seen in the logs below, the counter values show zero
>>after migration is completed.
>> <<>>
>>86.142535370  129,392,333,440  r1001e
>>87.144714617  0  r1001e
>>88.146526636  0  r1001e
>>89.148085029  0  r1001e
>> 
>> Here PMU is enabled during start of perf session and counter
>> values are read at intervals. Counters are only disabled at the
>> end of session. The powerpc mobility code presently does not handle
>> disabling and enabling back of PMU counters during partition
>> migration. Also since the PMU register values are not saved/restored
>> during migration, PMU registers like Monitor Mode Control Register 0
>> (MMCR0), Monitor Mode Control Register 1 (MMCR1) will not contain
>> the value it was programmed with. Hence PMU counters will not be
>> enabled correctly post migration.
>> 
>> Fix this in mobility code by handling disabling and enabling of
>> PMU in all cpu's before and after migration. Patch introduces two
>> functions 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>> mobility_pmu_disable() is called before the processor threads goes
>> to suspend state so as to disable the PMU counters. And disable is
>> done only if there are any active events running on that cpu.
>> mobility_pmu_enable() is called after the processor threads are
>> back online to enable back the PMU counters.
>> 
>> Since the Performance Monitor Counters (PMCs) are not
>> saved/restored during LPM, the PMC value reads back as zero while
>> 'event->hw.prev_count' is still non-zero. This causes a problem
> 
> Interesting. Are they defined to not be migrated, or may not be 
> migrated?
> 
> I wonder what QEMU migration does with PMU registers.
> 
>> during the update of event->count, since we always accumulate
>> (PMC value - event->hw.prev_count) in event->count.  If
>> event->hw.prev_count is greater than the PMC value, event->count
>> goes wrong. Fix this by re-initialising 'prev_count' also for all
>> events while enabling back the events. A new variable 'migrate' is
>> introduced in 'struct cpu_hw_event' to achieve this for LPM cases
>> in power_pmu_enable. Use the 'migrate' value to clear the PMC
>> index (stored in event->hw.idx) for all events so that event
>> count settings will get re-initialised correctly.
>> 
>> Signed-off-by: Athira Rajeev 
>> [ Fixed compilation error reported by kernel test robot ]
>> Reported-by: kernel test robot 
>> ---
>> Change from v1 -> v2:
>> - Moved the mobility_pmu_enable and mobility_pmu_disable
>>   declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
>>   Also included 'asm/rtas.h' in core-book3s to fix the
>>   compilation warning reported by kernel test robot.
>> 
>> arch/powerpc/include/asm/rtas.h   |  8 ++
>> arch/powerpc/perf/core-book3s.c   | 44 
>> ---
>> arch/powerpc/platforms/pseries/mobility.c |  4 +++
>> 3 files changed, 53 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/rtas.h 
>> b/arch/powerpc/include/asm/rtas.h
>> index 9dc97d2..cea72d7 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
>> static inline void read_24x7_sys_info(void) { }
>> #endif
>> 
>> +#ifdef CONFIG_PPC_PERF_CTRS
>> +void mobility_pmu_disable(void);
>> +void mobility_pmu_enable(void);
>> +#else
>> +static inline void mobility_pmu_disable(void) { }
>> +static inline void mobility_pmu_enable(void) { }
>> +#endif
>> +
>> #endif /* __KERNEL__ */
>> #endif /* _POWERPC_RTAS_H */
> 
> It's not implemented in rtas, maybe consider putting this into a perf 
> header?

Hi Nick,
Thanks for the review comments.

Sure, I will move this to perf_event header file

> 
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index bb0ee71..90da7fa 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -18,6 +18,7 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> 
>> #ifdef CONFIG_PPC64
>> #include "internal.h"
>> @@ -58,6 +59,7 @@ struct cpu_hw_events {
>> 
>>  /* Store the PMC values */
>>  unsigned long pmcs[MAX_HWEVENTS];
>> +int migrate;
>> };
>> 

Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Waiman Long

On 10/25/21 11:44 AM, Arnd Bergmann wrote:

On Mon, Oct 25, 2021 at 5:28 PM Waiman Long  wrote:

On 10/25/21 9:06 AM, Arnd Bergmann wrote:

On s390, we pick between the cmpxchg() based directed-yield when
running on virtualized CPUs, and a normal qspinlock when running on a
dedicated CPU.

I am not aware that s390 is using qspinlocks at all as I don't see
ARCH_USE_QUEUED_SPINLOCKS being set anywhere under arch/s390. I only see
that it uses a cmpxchg based spinlock.

Sorry, I should not have said "normal" here. See arch/s390/lib/spinlock.c
for their custom queued spinlocks as implemented in arch_spin_lock_queued().
I don't know if that code actually does the same thing as the generic qspinlock,
but it seems at least similar.


Yes, you are right. Their queued lock code looks like a custom version 
of the pvqspinlock code.


Cheers,
Longman



[PATCH v5 5/5] PCI/AER: Include DEVCTL in aer_print_error()

2021-10-25 Thread Naveen Naidu
Print the contents of the Device Control Register of the device that
detected the error. This might help in faster error diagnosis.

It is easy to test this by using aer-inject:

  $ aer-inject -s 00:03:0 corr-err-file

The content of the corr-err-file is as below:

  AER
  COR_STATUS BAD_TLP
  HEADER_LOG 0 1 2 3

Sample output from dummy error injected by aer-inject:

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
  pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000, devctl=0x000f <-- devctl added to the error log
  pcieport 0000:00:03.0:    [ 6] BadTLP
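
For reference, devctl=0x000f in the sample above decodes to the four
error-reporting enables in the Device Control register (PCIe r5.0,
sec 7.5.3.4) all being set:

  bit 0  Correctable Error Reporting Enable    (PCI_EXP_DEVCTL_CERE)
  bit 1  Non-Fatal Error Reporting Enable      (PCI_EXP_DEVCTL_NFERE)
  bit 2  Fatal Error Reporting Enable          (PCI_EXP_DEVCTL_FERE)
  bit 3  Unsupported Request Reporting Enable  (PCI_EXP_DEVCTL_URRE)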

Signed-off-by: Naveen Naidu 
---
 drivers/pci/pci.h  |  2 ++
 drivers/pci/pcie/aer.c | 10 --
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index eb88d8bfeaf7..48ed7f91113b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -437,6 +437,8 @@ struct aer_err_info {
u32 status; /* COR/UNCOR Error Status */
u32 mask;   /* COR/UNCOR Error Mask */
struct aer_header_log_regs tlp; /* TLP Header */
+
+   u16 devctl;
 };
 
 /* Preliminary AER error information processed from Root port */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d3937f5384e4..fdeef9deb016 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -729,8 +729,8 @@ void aer_print_error(struct pci_dev *dev, struct 
aer_err_info *info)
   aer_error_severity_string[info->severity],
   aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-		   dev->vendor, dev->device, info->status, info->mask);
+	pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x, devctl=%#06x\n",
+		   dev->vendor, dev->device, info->status, info->mask, info->devctl);
 
__aer_print_error(dev, info);
 
@@ -1083,6 +1083,12 @@ int aer_get_device_error_info(struct pci_dev *dev, 
struct aer_err_info *info)
if (!aer)
return 0;
 
+   /*
+* Cache the value of Device Control Register now, because later the
+* device might not be available
+*/
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &info->devctl);
+
if (info->severity == AER_CORRECTABLE) {
pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS,
				      &info->status);
-- 
2.25.1



[PATCH v5 4/5] PCI/AER: Clear error device AER registers in aer_irq()

2021-10-25 Thread Naveen Naidu
Converge the APEI path and native AER path of clearing the AER registers
of the error device.

In APEI path, the system firmware clears the AER registers before
handing off the record to OS. But in "native AER" path, the execution
path of clearing the AER register is as follows:

  aer_isr_one_error
aer_print_port_info
  if (find_source_device())
aer_process_err_devices
  handle_error_source
pci_write_config_dword(dev, PCI_ERR_COR_STATUS, ...)

The above path has a bug: if find_source_device() fails, the AER
registers of the error device are never cleared. This means the
error device will keep reporting the same error again and again,
leading to a message spew.

Related Bug Report:
  https://lore.kernel.org/linux-pci/20151229155822.GA17321@localhost/
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173

The above bug can be avoided if the AER registers are cleared in
the AER IRQ handler aer_irq(), which guarantees that the AER error
registers are always cleared. This is similar to how APEI handles
these errors.

The main aim is that:

  When an interrupt handler deals with an interrupt, it must *always*
  clear the source of the interrupt.
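
A minimal sketch of that rule applied to the corrected-error case
(illustrative only: aer_ack_correctable() is a made-up helper name,
while the config accessors are the regular PCI API):

	static void aer_ack_correctable(struct pci_dev *dev)
	{
		int aer = dev->aer_cap;
		u32 status;

		if (!aer)
			return;

		/* read back and write the RW1C status bits so the device
		 * cannot re-signal the same error, even if the later
		 * find_source_device() walk fails */
		pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &status);
		pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, status);
	}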

Signed-off-by: Naveen Naidu 
---
 drivers/pci/pci.h  |  13 ++-
 drivers/pci/pcie/aer.c | 249 -
 2 files changed, 184 insertions(+), 78 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9be7a966fda7..eb88d8bfeaf7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -424,7 +424,6 @@ static inline bool pci_dev_is_added(const struct pci_dev 
*dev)
 #define AER_MAX_MULTI_ERR_DEVICES  5   /* Not likely to have more */
 
 struct aer_err_info {
-   struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
 
u16 id;
@@ -440,6 +439,18 @@ struct aer_err_info {
struct aer_header_log_regs tlp; /* TLP Header */
 };
 
+/* Preliminary AER error information processed from Root port */
+struct aer_devices_err_info {
+   struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
+   struct aer_err_info err_info;
+};
+
+/* AER information associated with each error device */
+struct aer_dev_err_info {
+   struct pci_dev *dev;
+   struct aer_err_info err_info;
+};
+
 int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 #endif /* CONFIG_PCIEAER */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 241ff361b43c..d3937f5384e4 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -36,6 +36,18 @@
 
 #define AER_ERROR_SOURCES_MAX  128
 
+/*
+ * There can be 128 maximum error sources (AER_ERROR_SOURCES_MAX) and each
+ * error source can have maximum of 5 error devices (AER_MAX_MULTI_ERR_DEVICES)
+ * so the maximum error devices we can report is:
+ *
+ * AER_ERROR_DEVICES_MAX = AER_ERROR_SOURCES_MAX * AER_MAX_MULTI_ERR_DEVICES == (128 * 5) == 640
+ *
+ * But since, the size in KFIFO should be a power of two, the closest value
+ * to 640 is 1024
+ */
+# define AER_ERROR_DEVICES_MAX 1024
+
 #define AER_MAX_TYPEOF_COR_ERRS		16	/* as per PCI_ERR_COR_STATUS */
 #define AER_MAX_TYPEOF_UNCOR_ERRS  27  /* as per PCI_ERR_UNCOR_STATUS*/
 
@@ -46,7 +58,7 @@ struct aer_err_source {
 
 struct aer_rpc {
struct pci_dev *rpd;/* Root Port device */
-   DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
+   DECLARE_KFIFO(aer_fifo, struct aer_dev_err_info, AER_ERROR_DEVICES_MAX);
 };
 
 /* AER stats for the device */
@@ -803,14 +815,14 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
 
 /**
  * add_error_device - list device to be handled
- * @e_info: pointer to error info
+ * @e_dev: pointer to error info
  * @dev: pointer to pci_dev to be added
  */
-static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
+static int add_error_device(struct aer_devices_err_info *e_dev, struct pci_dev *dev)
 {
-   if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
-   e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);
-   e_info->error_dev_num++;
+   if (e_dev->err_info.error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
+   e_dev->dev[e_dev->err_info.error_dev_num] = pci_dev_get(dev);
+   e_dev->err_info.error_dev_num++;
return 0;
}
return -ENOSPC;
@@ -877,18 +889,18 @@ static bool is_error_source(struct pci_dev *dev, struct 
aer_err_info *e_info)
 
 static int find_device_iter(struct pci_dev *dev, void *data)
 {
-   struct aer_err_info *e_info = (struct aer_err_info *)data;
+	struct aer_devices_err_info *e_dev = (struct aer_devices_err_info *)data;
 
-   if (is_error_source(dev, e_info)) {
+	if (is_error_source(dev, &e_dev->err_info)) {
/* List this device */
- 

[PATCH v5 3/5] PCI/DPC: Initialize info.id in dpc_process_error()

2021-10-25 Thread Naveen Naidu
In the dpc_process_error() path, info.id isn't initialized before being
passed to aer_print_error(). In the corresponding AER path, it is
initialized in aer_isr_one_error().

The error message shown during Coverity Scan is:

  Coverity #1461602
  CID 1461602 (#1 of 1): Uninitialized scalar variable (UNINIT)
  8. uninit_use_in_call: Using uninitialized value info.id when calling 
aer_print_error.

Also, per PCIe r5.0, sec 7.9.15.5, the Source ID is defined only when
the Trigger Reason indicates ERR_NONFATAL or ERR_FATAL. Initialize
"info.id" based on the trigger reason before passing it to
aer_print_error().
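
For reference, the Trigger Reason field (PCI_EXP_DPC_STATUS_TRIGGER_RSN,
bits 2:1 of the DPC Status register) decodes as:

  0  unmasked uncorrectable error   (Source ID undefined)
  1  ERR_NONFATAL received          (Source ID valid)
  2  ERR_FATAL received             (Source ID valid)
  3  DPC trigger extension          (see PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT)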

Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Naveen Naidu 
---
 drivers/pci/pcie/dpc.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index c556e7beafe3..6fa1b1eb4671 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -262,16 +262,24 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev 
*dev,
 
 void dpc_process_error(struct pci_dev *pdev)
 {
-   u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
+   u16 cap = pdev->dpc_cap, status, reason, ext_reason;
struct aer_err_info info;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
-	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
+   reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
+
+   /*
+* Per PCIe r5.0, sec 7.9.15.5, the Source ID is defined only when the
+* Trigger Reason indicates ERR_NONFATAL or ERR_FATAL.
+*/
+   if (reason == 1 || reason == 2)
+		pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &info.id);
+   else
+   info.id = 0;
 
pci_info(pdev, "containment event, status:%#06x source:%#06x\n",
-status, source);
+status, info.id);
 
-   reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
pci_warn(pdev, "%s detected\n",
 (reason == 0) ? "unmasked uncorrectable error" :
-- 
2.25.1



[PATCH v5 2/5] PCI: Cleanup struct aer_err_info

2021-10-25 Thread Naveen Naidu
The id, status and mask fields of struct aer_err_info come
directly from hardware registers, hence their sizes should be explicit.

The length of these registers are:
  - id: 16 bits - Represents the Error Source Requester ID
  - status: 32 bits - COR/UNCOR Error Status
  - mask: 32 bits - COR/UNCOR Error Mask

Since the above registers have fixed, well-defined widths, use u16
and u32 to represent their values.

Also remove the __pad fields.

"pahole" was run on the modified struct aer_err_info and the size
remains unchanged.
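
The size claim is easy to sanity-check outside the kernel too. A quick
user-space approximation (struct pci_dev pointers replaced by void *,
aer_header_log_regs stubbed as four u32s; both stand-ins are
assumptions of this sketch, not the kernel definitions):

	#include <stdint.h>
	#include <stdio.h>

	struct aer_header_log_regs { uint32_t dw[4]; };	/* stub */

	struct aer_err_info_old {
		void *dev[5];
		int error_dev_num;
		unsigned int id:16;
		unsigned int severity:2;
		unsigned int __pad1:5;
		unsigned int multi_error_valid:1;
		unsigned int first_error:5;
		unsigned int __pad2:2;
		unsigned int tlp_header_valid:1;
		unsigned int status;
		unsigned int mask;
		struct aer_header_log_regs tlp;
	};

	struct aer_err_info_new {
		void *dev[5];
		int error_dev_num;
		uint16_t id;
		unsigned int severity:2;
		unsigned int multi_error_valid:1;
		unsigned int first_error:5;
		unsigned int tlp_header_valid:1;
		uint32_t status;
		uint32_t mask;
		struct aer_header_log_regs tlp;
	};

	int main(void)
	{
		/* both print the same size (72 on a typical LP64 ABI) */
		printf("old %zu, new %zu\n",
		       sizeof(struct aer_err_info_old),
		       sizeof(struct aer_err_info_new));
		return 0;
	}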

Signed-off-by: Naveen Naidu 
---
 drivers/pci/pci.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1cce56c2aea0..9be7a966fda7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -427,18 +427,16 @@ struct aer_err_info {
struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
 
-   unsigned int id:16;
+   u16 id;
 
unsigned int severity:2;/* 0:NONFATAL | 1:FATAL | 2:COR */
-   unsigned int __pad1:5;
unsigned int multi_error_valid:1;
 
unsigned int first_error:5;
-   unsigned int __pad2:2;
unsigned int tlp_header_valid:1;
 
-   unsigned int status;/* COR/UNCOR Error Status */
-   unsigned int mask;  /* COR/UNCOR Error Mask */
+   u32 status; /* COR/UNCOR Error Status */
+   u32 mask;   /* COR/UNCOR Error Mask */
struct aer_header_log_regs tlp; /* TLP Header */
 };
 
-- 
2.25.1



[PATCH v5 1/5] PCI/AER: Remove ID from aer_agent_string[]

2021-10-25 Thread Naveen Naidu
Currently, we do not print the "id" field in the AER error logs. Yet the
aer_agent_string[] has the word "id" in it. The AER error log looks
like:

  pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link 
Layer, (Receiver ID)

Without the "id" field in the error log, the aer_agent_string[]
(e.g. "Receiver ID") does not make sense. A user reading the log
might look for an "id" field, and not finding one could lead to
confusion.

Remove the "ID" from the aer_agent_string[].

It is easy to reproduce this by using aer-inject:

  $ aer-inject -s 00:03:0 corr-err-file

The content of the corr-err-file file is as below:

  AER
  COR_STATUS BAD_TLP
  HEADER_LOG 0 1 2 3

The following are sample dummy errors inject via aer-inject.

Before
===

In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"),
the "id" field was removed from the AER error logs, so currently AER
logs look like:

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field
  pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000
  pcieport 0000:00:03.0:    [ 6] BadTLP

After
==

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
  pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000
  pcieport 0000:00:03.0:    [ 6] BadTLP

Link: 
https://lore.kernel.org/linux-pci/20211021170317.GA2700910@bhelgaas/T/#m618bda4e54042d95a1a83fccc01cdb423f7590dc
Signed-off-by: Naveen Naidu 
---
 drivers/pci/pcie/aer.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9784fdcf3006..241ff361b43c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = {
 };
 
 static const char *aer_agent_string[] = {
-   "Receiver ID",
-   "Requester ID",
-   "Completer ID",
-   "Transmitter ID"
+   "Receiver",
+   "Requester",
+   "Completer",
+   "Transmitter"
 };
 
 #define aer_stats_dev_attr(name, stats_array, strings_array,   \
@@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct 
aer_err_info *info)
const char *level;
 
if (!info->status) {
-		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n",
aer_error_severity_string[info->severity]);
goto out;
}
-- 
2.25.1



[PATCH v5 0/5] Fix long standing AER Error Handling Issues

2021-10-25 Thread Naveen Naidu
This patch series aims at fixing some of the AER error handling issues
we have.

Currently we have the following issues: 
  
  1. Confusing message in aer_print_error()
  2. aer_err_info not being initialized completely in DPC path before 
 we print the AER logs
  3. A bug [1] in clearing of AER registers in the native AER path

[1] https://lore.kernel.org/linux-pci/20151229155822.GA17321@localhost/

The patch series fixes the above things.

PATCH 1: 
  - Fixes the first issue
  - This patch is independent of the other patches and can be applied
    separately

PATCH 2 - 3:
  - Fixes the second issue
  - Patch 3 is depended on Patch 2 in the series

PATCH 4
  - Fixes the bug in clearing of AER registers which leads to
    AER message spew [1]

PATCH 5:
  - Adds extra information (devctl register) in AER error logs.
  - Patch 5 depends on Patch 4 of the series

Thanks,
Naveen Naidu

Changelog
=
v5:
- Edit the commit message of Patch 1 and Patch 5 to include how to
  test the AER messages using aer-inject.
- Edit Patch 3 to initialize info.id depending on the trigger
  reason.
    - Drop a few patches (v4 4/8, 5/8, 7/8) since they were wrong.

v4:
    - Fix a logical error in 6/8; in the previous version of the patch set
      there was a bug in how I added the devices to the queue.

v3:
- Edit the commit messages to be in imperative style and split the
  commits to be more atomic.

v2:
- Add [PATCH 7] which includes the device control register 
  information in AER error logs.

Naveen Naidu (5):
  [PATCH v5 1/5] PCI/AER: Remove ID from aer_agent_string[]
  [PATCH v5 2/5] PCI: Cleanup struct aer_err_info
  [PATCH v5 3/5] PCI/DPC: Initialize info.id in dpc_process_error()
  [PATCH v5 4/5] PCI/AER: Clear error device AER registers in aer_irq()
  [PATCH v5 5/5] PCI/AER: Include DEVCTL in aer_print_error()

 drivers/pci/pci.h  |  23 +++-
 drivers/pci/pcie/aer.c | 269 -
 drivers/pci/pcie/dpc.c |  16 ++-
 3 files changed, 214 insertions(+), 94 deletions(-)

-- 
2.25.1



Re: [PATCH 08/13] zram: add error handling support for add_disk()

2021-10-25 Thread Minchan Kim
On Fri, Oct 15, 2021 at 04:52:14PM -0700, Luis Chamberlain wrote:
> We never checked for errors on add_disk() as this function
> returned void. Now that this is fixed, use the shiny new
> error handling.
> 
> Signed-off-by: Luis Chamberlain 
Acked-by: Minchan Kim 


Re: [PATCH 00/13] block: add_disk() error handling stragglers

2021-10-25 Thread Luis Chamberlain
On Thu, Oct 21, 2021 at 08:10:49PM -0700, Geoff Levand wrote:
> Hi Luis,
> 
> On 10/18/21 9:15 AM, Luis Chamberlain wrote:
> > On Sun, Oct 17, 2021 at 08:26:33AM -0700, Geoff Levand wrote:
> >> Hi Luis,
> >>
> >> On 10/15/21 4:52 PM, Luis Chamberlain wrote:
> >>> This patch set consists of al the straggler drivers for which we have
> >>> have no patch reviews done for yet. I'd like to ask for folks to please
> >>> consider chiming in, specially if you're the maintainer for the driver.
> >>> Additionally if you can specify if you'll take the patch in yourself or
> >>> if you want Jens to take it, that'd be great too.
> >>
> >> Do you have a git repo with the patch set applied that I can use to test 
> >> with?
> > 
> > Sure, although the second to last patch is in a state of flux given
> > the ataflop driver currently is broken and so we're seeing how to fix
> > that first:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20211011-for-axboe-add-disk-error-handling
> 
> That branch has so many changes applied on top of the base v5.15-rc4
> that the patches I need to apply to test on PS3 with don't apply.
> 
> Do you have something closer to say v5.15-rc5?  Preferred would be
> just your add_disk() error handling patches plus what they depend
> on.

If you just want to test the ps3 changes, I've put this branch together
just for you; it's based on v5.15-rc6:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20211025-ps3-add-disk

  Luis


Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Arnd Bergmann
On Mon, Oct 25, 2021 at 5:28 PM Waiman Long  wrote:
> On 10/25/21 9:06 AM, Arnd Bergmann wrote:
> >
> > On s390, we pick between the cmpxchg() based directed-yield when
> > running on virtualized CPUs, and a normal qspinlock when running on a
> > dedicated CPU.
>
> I am not aware that s390 is using qspinlocks at all as I don't see
> ARCH_USE_QUEUED_SPINLOCKS being set anywhere under arch/s390. I only see
> that it uses a cmpxchg based spinlock.

Sorry, I should not have said "normal" here. See arch/s390/lib/spinlock.c
for their custom queued spinlocks as implemented in arch_spin_lock_queued().
I don't know if that code actually does the same thing as the generic qspinlock,
but it seems at least similar.

   Arnd


Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Waiman Long



On 10/25/21 9:06 AM, Arnd Bergmann wrote:
> On Mon, Oct 25, 2021 at 11:57 AM Peter Zijlstra  wrote:
> > On Sat, Oct 23, 2021 at 06:04:57PM +0200, Arnd Bergmann wrote:
> > > On Sat, Oct 23, 2021 at 3:37 AM Waiman Long  wrote:
> > > > On 10/22/21 7:59 AM, Arnd Bergmann wrote:
> > > > > From: Arnd Bergmann 
> > > > >
> > > > > As this is all dead code, just remove it and the helper functions built
> > > > > around it. For arch/ia64, the inline asm could be cleaned up, but
> > > > > it seems safer to leave it untouched.
> > > > >
> > > > > Signed-off-by: Arnd Bergmann 
> > > >
> > > > Does that mean we can also remove the GENERIC_LOCKBREAK config option
> > > > from the Kconfig files as well?
> > >
> > > I couldn't figure this out.
> > >
> > > What I see is that the only architectures setting GENERIC_LOCKBREAK are
> > > nds32, parisc, powerpc, s390, sh and sparc64, while the only architectures
> > > implementing arch_spin_is_contended() are arm32, csky and ia64.
> > >
> > > The part I don't understand is whether the option actually does anything
> > > useful any more after commit d89c70356acf ("locking/core: Remove break_lock
> > > field when CONFIG_GENERIC_LOCKBREAK=y").
> >
> > Urgh, what a mess.. AFAICT there's still code in
> > kernel/locking/spinlock.c that relies on it. Specifically when
> > GENERIC_LOCKBREAK=y we seem to create _lock*() variants that are
> > basically TaS locks which drop preempt/irq disable while spinning.
> >
> > Anybody having this on and not having native TaS locks is in for a rude
> > surprise I suppose... sparc64 being the obvious candidate there :/
>
> Is this a problem on s390 and powerpc, those two being the ones
> that matter in practice?
>
> On s390, we pick between the cmpxchg() based directed-yield when
> running on virtualized CPUs, and a normal qspinlock when running on a
> dedicated CPU.

I am not aware that s390 is using qspinlocks at all as I don't see
ARCH_USE_QUEUED_SPINLOCKS being set anywhere under arch/s390. I only see
that it uses a cmpxchg based spinlock.


Cheers,
Longman





Re: [PATCH] powerpc: Enhance pmem DMA bypass handling

2021-10-25 Thread Brian King
On 10/23/21 7:18 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 23/10/2021 07:18, Brian King wrote:
>> On 10/22/21 7:24 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 22/10/2021 04:44, Brian King wrote:
>>>> If ibm,pmemory is installed in the system, it can appear anywhere
>>>> in the address space. This patch enhances how we handle DMA for devices
>>>> when ibm,pmemory is present. In the case where we have enough DMA space to
>>>> direct map all of RAM, but not ibm,pmemory, we use direct DMA for
>>>> I/O to RAM and use the default window to dynamically map ibm,pmemory.
>>>> In the case where we only have a single DMA window, this won't work,
>>>> so if the window is not big enough to map the entire address range,
>>>> we cannot direct map.
>>>
>>> but we want the pmem range to be mapped into the huge DMA window too if we 
>>> can, why skip it?
>>
>> This patch should simply do what the comment in this commit mentioned below 
>> suggests, which says that
>> ibm,pmemory can appear anywhere in the address space. If the DMA window is 
>> large enough
>> to map all of MAX_PHYSMEM_BITS, we will indeed simply do direct DMA for 
>> everything,
>> including the pmem. If we do not have a big enough window to do that, we 
>> will do
>> direct DMA for DRAM and dynamic mapping for pmem.
> 
> 
> Right, and this is what we do already, do we not? Am I missing something here?

The upstream code does not work correctly as far as I can see. If I boot an
upstream kernel with an nvme device and vpmem assigned to the LPAR, and
enable dev_dbg in arch/powerpc/platforms/pseries/iommu.c, I see the
following in the logs:

[2.157549] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 800 
2121 returned 0
[2.157561] nvme 0121:50:00.0: Skipping ibm,pmemory
[2.157567] nvme 0121:50:00.0: can't map partition max 0x8 with 
16777216 65536-sized pages
[2.170150] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 800 
2121 10 28 returned 0 (liobn = 0x7121 starting addr = 800 0)
[2.170170] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for 
/pci@8002121/pci1014,683@0
[2.356260] nvme 0121:50:00.0: node is /pci@8002121/pci1014,683@0

This means we are heading down the leg in enable_ddw where we do not set
direct_mapping to true. We do create the DDW window, but don't do any
direct DMA. This is because the window is not large enough to map 2PB of
memory, which is what ddw_memory_hotplug_max returns without my patch.
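
For reference, the pmem leg of enable_ddw() being described here looks
roughly like this (abridged from the commit linked further down, not
quoted verbatim):

    len = max_ram_len;
    if (pmem_present) {
        /* pmem can sit anywhere below MAX_PHYSMEM_BITS, so only
         * direct map it if the window can cover that whole range. */
        if (query.largest_available_block >=
            (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
            len = MAX_PHYSMEM_BITS;
        else
            dev_info(&dev->dev, "Skipping ibm,pmemory");
    }

In the "Skipping ibm,pmemory" case, len stays at max_ram_len, which is
why direct mapping of RAM alone should still be possible.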

With my patch applied, I get this in the logs:

[2.204866] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 800 
2121 returned 0
[2.204875] nvme 0121:50:00.0: Skipping ibm,pmemory
[2.205058] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 800 
2121 10 21 returned 0 (liobn = 0x7121 starting addr = 800 0)
[2.205068] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for 
/pci@8002121/pci1014,683@0
[2.215898] nvme 0121:50:00.0: iommu: 64-bit OK but direct DMA is limited by 
802


Thanks,

Brian


> 
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/platforms/pseries/iommu.c?id=bf6e2d562bbc4d115cf322b0bca57fe5bbd26f48
>>
>>
>> Thanks,
>>
>> Brian
>>
>>
>>>
>>>

 Signed-off-by: Brian King 
 ---
    arch/powerpc/platforms/pseries/iommu.c | 19 ++-
    1 file changed, 10 insertions(+), 9 deletions(-)

 diff --git a/arch/powerpc/platforms/pseries/iommu.c 
 b/arch/powerpc/platforms/pseries/iommu.c
 index 269f61d519c2..d9ae985d10a4 100644
 --- a/arch/powerpc/platforms/pseries/iommu.c
 +++ b/arch/powerpc/platforms/pseries/iommu.c
 @@ -1092,15 +1092,6 @@ static phys_addr_t ddw_memory_hotplug_max(void)
    phys_addr_t max_addr = memory_hotplug_max();
    struct device_node *memory;
    -    /*
 - * The "ibm,pmemory" can appear anywhere in the address space.
 - * Assuming it is still backed by page structs, set the upper limit
 - * for the huge DMA window as MAX_PHYSMEM_BITS.
 - */
 -    if (of_find_node_by_type(NULL, "ibm,pmemory"))
 -    return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
 -    (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
 -
    for_each_node_by_type(memory, "memory") {
    unsigned long start, size;
    int n_mem_addr_cells, n_mem_size_cells, len;
 @@ -1341,6 +1332,16 @@ static bool enable_ddw(struct pci_dev *dev, struct 
 device_node *pdn)
     */
    len = max_ram_len;
    if (pmem_present) {
 +    if (default_win_removed) {
 +    /*
 + * If we only have one DMA window and have pmem present,
 + * then we need to be able to map the entire address
 + * range in order to be able to do direct DMA to RAM.
 + */
 +

Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Peter Zijlstra
On Mon, Oct 25, 2021 at 03:06:24PM +0200, Arnd Bergmann wrote:
> On Mon, Oct 25, 2021 at 11:57 AM Peter Zijlstra  wrote:
> > On Sat, Oct 23, 2021 at 06:04:57PM +0200, Arnd Bergmann wrote:
> > > On Sat, Oct 23, 2021 at 3:37 AM Waiman Long  wrote:
> > > > On 10/22/21 7:59 AM, Arnd Bergmann wrote:
> > > > > From: Arnd Bergmann 
> > > > >
> > > > > As this is all dead code, just remove it and the helper functions 
> > > > > built
> > > > > around it. For arch/ia64, the inline asm could be cleaned up, but
> > > > > it seems safer to leave it untouched.
> > > > >
> > > > > Signed-off-by: Arnd Bergmann 
> > > >
> > > > Does that mean we can also remove the GENERIC_LOCKBREAK config option
> > > > from the Kconfig files as well?
> > >
> > >  I couldn't figure this out.
> > >
> > > What I see is that the only architectures setting GENERIC_LOCKBREAK are
> > > nds32, parisc, powerpc, s390, sh and sparc64, while the only architectures
> > > implementing arch_spin_is_contended() are arm32, csky and ia64.
> > >
> > > The part I don't understand is whether the option actually does anything
> > > useful any more after commit d89c70356acf ("locking/core: Remove 
> > > break_lock
> > > field when CONFIG_GENERIC_LOCKBREAK=y").
> >
> > Urgh, what a mess.. AFAICT there's still code in
> > kernel/locking/spinlock.c that relies on it. Specifically when
> > GENERIC_LOCKBREAK=y we seem to create _lock*() variants that are
> > basically TaS locks which drop preempt/irq disable while spinning.
> >
> > Anybody having this on and not having native TaS locks is in for a rude
> > surprise I suppose... sparc64 being the obvious candidate there :/
> 
> Is this a problem on s390 and powerpc, those two being the ones
> that matter in practice?
> 
> On s390, we pick between the cmpxchg() based directed-yield when
> running on virtualized CPUs, and a normal qspinlock when running on a
> dedicated CPU.
> 
> On PowerPC, we pick at compile-time between either the qspinlock
> (default-enabled on Book3S-64, i.e. all server chips) or a ll/sc based
> spinlock plus vm_yield() (default on embedded and 32-bit mac).

Urgh, yeah, so this crud undermines the whole point of having a fair
lock. I'm thinking s390 and Power want to have this fixed.


Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Arnd Bergmann
On Mon, Oct 25, 2021 at 11:57 AM Peter Zijlstra  wrote:
> On Sat, Oct 23, 2021 at 06:04:57PM +0200, Arnd Bergmann wrote:
> > On Sat, Oct 23, 2021 at 3:37 AM Waiman Long  wrote:
> > > On 10/22/21 7:59 AM, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann 
> > > >
> > > > As this is all dead code, just remove it and the helper functions built
> > > > around it. For arch/ia64, the inline asm could be cleaned up, but
> > > > it seems safer to leave it untouched.
> > > >
> > > > Signed-off-by: Arnd Bergmann 
> > >
> > > Does that mean we can also remove the GENERIC_LOCKBREAK config option
> > > from the Kconfig files as well?
> >
> >  I couldn't figure this out.
> >
> > What I see is that the only architectures setting GENERIC_LOCKBREAK are
> > nds32, parisc, powerpc, s390, sh and sparc64, while the only architectures
> > implementing arch_spin_is_contended() are arm32, csky and ia64.
> >
> > The part I don't understand is whether the option actually does anything
> > useful any more after commit d89c70356acf ("locking/core: Remove break_lock
> > field when CONFIG_GENERIC_LOCKBREAK=y").
>
> Urgh, what a mess.. AFAICT there's still code in
> kernel/locking/spinlock.c that relies on it. Specifically when
> GENERIC_LOCKBREAK=y we seem to create _lock*() variants that are
> basically TaS locks which drop preempt/irq disable while spinning.
>
> Anybody having this on and not having native TaS locks is in for a rude
> surprise I suppose... sparc64 being the obvious candidate there :/

Is this a problem on s390 and powerpc, those two being the ones
that matter in practice?

On s390, we pick between the cmpxchg() based directed-yield when
running on virtualized CPUs, and a normal qspinlock when running on a
dedicated CPU.

On PowerPC, we pick at compile-time between either the qspinlock
(default-enabled on Book3S-64, i.e. all server chips) or a ll/sc based
spinlock plus vm_yield() (default on embedded and 32-bit mac).

   Arnd


Re: [PATCH v2] perf vendor events power10: Add metric events json file for power10 platform

2021-10-25 Thread Paul A. Clarke
On Mon, Oct 25, 2021 at 02:23:15PM +1100, Michael Ellerman wrote:
> "Paul A. Clarke"  writes:
> > Thanks for the changes!
> > More nits below (many left over from prior review)...
> >
> > On Fri, Oct 22, 2021 at 11:55:05AM +0530, Kajol Jain wrote:
> >> Add pmu metric json file for power10 platform.
> >> 
> >> Signed-off-by: Kajol Jain 
> >> ---
> >> Changelog v1 -> v2:
> >> - Did some nit changes in BriefDescription field
> >>   as suggested by Paul A. Clarke
> >> 
> >> - Link to the v1 patch: https://lkml.org/lkml/2021/10/6/131
> >> 
> >>  .../arch/powerpc/power10/metrics.json | 676 ++
> >>  1 file changed, 676 insertions(+)
> >>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/metrics.json
> >> 
> >> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json 
> >> b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
> >> new file mode 100644
> >> index ..8adab5cd9934
> >> --- /dev/null
> >> +++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
> >> @@ -0,0 +1,676 @@
> >> +[
> >> +{
> >> +"BriefDescription": "Percentage of cycles that are run cycles",
> >> +"MetricExpr": "PM_RUN_CYC / PM_CYC * 100",
> >> +"MetricGroup": "General",
> >> +"MetricName": "RUN_CYCLES_RATE",
> >> +"ScaleUnit": "1%"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per completed instruction",
> >> +"MetricExpr": "PM_CYC / PM_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "CYCLES_PER_INSTRUCTION"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled for any reason",
> >> +"MetricExpr": "PM_DISP_STALL_CYC / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled because there was a flush",
> >> +"MetricExpr": "PM_DISP_STALL_FLUSH / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_FLUSH_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled because the MMU was handling a translation miss",
> >> +"MetricExpr": "PM_DISP_STALL_TRANSLATION / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_TRANSLATION_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled waiting to resolve an instruction ERAT miss",
> >> +"MetricExpr": "PM_DISP_STALL_IERAT_ONLY_MISS / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_IERAT_ONLY_MISS_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled waiting to resolve an instruction TLB miss",
> >> +"MetricExpr": "PM_DISP_STALL_ITLB_MISS / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_ITLB_MISS_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled due to an icache miss",
> >> +"MetricExpr": "PM_DISP_STALL_IC_MISS / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_IC_MISS_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled while the instruction was fetched from the local L2",
> >> +"MetricExpr": "PM_DISP_STALL_IC_L2 / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_IC_L2_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled while the instruction was fetched from the local L3",
> >> +"MetricExpr": "PM_DISP_STALL_IC_L3 / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_IC_L3_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled while the instruction was fetched from any source beyond the 
> >> local L3",
> >> +"MetricExpr": "PM_DISP_STALL_IC_L3MISS / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_IC_L3MISS_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> was stalled due to an icache miss after a branch mispredict",
> >> +"MetricExpr": "PM_DISP_STALL_BR_MPRED_ICMISS / PM_RUN_INST_CMPL",
> >> +"MetricGroup": "CPI",
> >> +"MetricName": "DISPATCHED_BR_MPRED_ICMISS_CPI"
> >> +},
> >> +{
> >> +"BriefDescription": "Average cycles per instruction when dispatch 
> >> 

Linux kernel: powerpc: KVM guest can trigger host crash on Power8

2021-10-25 Thread Michael Ellerman
The Linux kernel for powerpc since v5.2 has a bug which allows a
malicious KVM guest to crash the host when the host is running on
Power8.

Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
metal, are affected by the bug. Machines running PowerVM are not
affected.

The bug was introduced in:

10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

Which was first released in v5.2.

The upstream fix is:

  cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if 
it went to guest")
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337

Which will be included in the v5.16 release.

Note to backporters, the following commits are required:

  73287caa9210ded6066833195f4335f7f688a46b
  ("powerpc64/idle: Fix SP offsets when saving GPRs")

  9b4416c5095c20e110c82ae602c254099b83b72f
  ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")

  cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
  ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to 
guest")

  496c5fe25c377ddb7815c4ce8ecfb676f051e9b6
  ("powerpc/idle: Don't corrupt back chain when going idle")


I have a test case to trigger the bug, which I can share privately with
anyone who would like to test the fix.

cheers


[PATCH] selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh

2021-10-25 Thread Russell Currey
The EPOCHSECONDS environment variable was added in bash 5.0 (released
2019).  Some distributions of the "stable" and "long-term" variety ship
older versions of bash than this, so swap to using the date command
instead.

"%s" was added to coreutils `date` in 1993 so we should be good, but who
knows, it is a GNU extension and not part of the POSIX spec for `date`.

Signed-off-by: Russell Currey 
---
 .../testing/selftests/powerpc/security/mitigation-patching.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/security/mitigation-patching.sh 
b/tools/testing/selftests/powerpc/security/mitigation-patching.sh
index 00197acb7ff1..b0b20e0b4e30 100755
--- a/tools/testing/selftests/powerpc/security/mitigation-patching.sh
+++ b/tools/testing/selftests/powerpc/security/mitigation-patching.sh
@@ -13,7 +13,7 @@ function do_one
 
 orig=$(cat "$mitigation")
 
-start=$EPOCHSECONDS
+start=$(date +%s)
 now=$start
 
 while [[ $((now-start)) -lt "$TIMEOUT" ]]
@@ -21,7 +21,7 @@ function do_one
 echo 0 > "$mitigation"
 echo 1 > "$mitigation"
 
-now=$EPOCHSECONDS
+now=$(date +%s)
 done
 
 echo "$orig" > "$mitigation"
-- 
2.33.1
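
For clarity: both spellings report the same quantity, seconds since the
Unix epoch, i.e. what time(2) returns. A standalone illustration in C
(not part of the patch):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        /* Prints the same value as bash 5's $EPOCHSECONDS or `date +%s`. */
        printf("%lld\n", (long long)time(NULL));
        return 0;
    }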



Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Peter Zijlstra
On Mon, Oct 25, 2021 at 11:57:28AM +0200, Peter Zijlstra wrote:
> On Sat, Oct 23, 2021 at 06:04:57PM +0200, Arnd Bergmann wrote:
> > On Sat, Oct 23, 2021 at 3:37 AM Waiman Long  wrote:
> > > On 10/22/21 7:59 AM, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann 
> > > >
> > > > As this is all dead code, just remove it and the helper functions built
> > > > around it. For arch/ia64, the inline asm could be cleaned up, but
> > > > it seems safer to leave it untouched.
> > > >
> > > > Signed-off-by: Arnd Bergmann 
> > >
> > > Does that mean we can also remove the GENERIC_LOCKBREAK config option
> > > from the Kconfig files as well?
> > 
> >  I couldn't figure this out.
> > 
> > What I see is that the only architectures setting GENERIC_LOCKBREAK are
> > nds32, parisc, powerpc, s390, sh and sparc64, while the only architectures
> > implementing arch_spin_is_contended() are arm32, csky and ia64.
> > 
> > The part I don't understand is whether the option actually does anything
> > useful any more after commit d89c70356acf ("locking/core: Remove break_lock
> > field when CONFIG_GENERIC_LOCKBREAK=y").
> 
> Urgh, what a mess.. AFAICT there's still code in
> kernel/locking/spinlock.c that relies on it. Specifically when
> GENERIC_LOCKBREAK=y we seem to create _lock*() variants that are
> basically TaS locks which drop preempt/irq disable while spinning.
> 
> Anybody having this on and not having native TaS locks is in for a rude
> surprise I suppose... sparc64 being the obvious candidate there :/

Something like the *totally* untested patch below would rip it all out.

---
 arch/ia64/Kconfig|  3 --
 arch/nds32/Kconfig   |  4 --
 arch/parisc/Kconfig  |  5 ---
 arch/powerpc/Kconfig |  5 ---
 arch/s390/Kconfig|  3 --
 arch/sh/Kconfig  |  4 --
 arch/sparc/Kconfig   |  6 ---
 include/linux/rwlock_api_smp.h   |  4 +-
 include/linux/spinlock_api_smp.h |  4 +-
 kernel/Kconfig.locks | 26 ++--
 kernel/locking/spinlock.c| 90 
 11 files changed, 17 insertions(+), 137 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 1e33666fa679..5ec3abba3c81 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -81,9 +81,6 @@ config MMU
 config STACKTRACE_SUPPORT
def_bool y
 
-config GENERIC_LOCKBREAK
-   def_bool n
-
 config GENERIC_CALIBRATE_DELAY
bool
default y
diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index aea26e739543..699008dbd6c2 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -59,10 +59,6 @@ config GENERIC_CSUM
 config GENERIC_HWEIGHT
def_bool y
 
-config GENERIC_LOCKBREAK
-   def_bool y
-   depends on PREEMPTION
-
 config STACKTRACE_SUPPORT
def_bool y
 
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 27a8b49af11f..afe70bcdde2c 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -86,11 +86,6 @@ config ARCH_DEFCONFIG
default "arch/parisc/configs/generic-32bit_defconfig" if !64BIT
default "arch/parisc/configs/generic-64bit_defconfig" if 64BIT
 
-config GENERIC_LOCKBREAK
-   bool
-   default y
-   depends on SMP && PREEMPTION
-
 config ARCH_HAS_ILOG2_U32
bool
default n
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ba5b66189358..e782c9ea3f81 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -98,11 +98,6 @@ config LOCKDEP_SUPPORT
bool
default y
 
-config GENERIC_LOCKBREAK
-   bool
-   default y
-   depends on SMP && PREEMPTION
-
 config GENERIC_HWEIGHT
bool
default y
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b86de61b8caa..e4ff05f5393b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -26,9 +26,6 @@ config GENERIC_BUG
 config GENERIC_BUG_RELATIVE_POINTERS
def_bool y
 
-config GENERIC_LOCKBREAK
-   def_bool y if PREEMPTION
-
 config PGSTE
def_bool y if KVM
 
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 6904f4bdbf00..26f1cf2c69a3 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -86,10 +86,6 @@ config GENERIC_HWEIGHT
 config GENERIC_CALIBRATE_DELAY
bool
 
-config GENERIC_LOCKBREAK
-   def_bool y
-   depends on SMP && PREEMPTION
-
 config ARCH_SUSPEND_POSSIBLE
def_bool n
 
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index b120ed947f50..e77e7254eaa0 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -246,12 +246,6 @@ config US3_MC
 
  If in doubt, say Y, as this information can be very useful.
 
-# Global things across all Sun machines.
-config GENERIC_LOCKBREAK
-   bool
-   default y
-   depends on SPARC64 && SMP && PREEMPTION
-
 config NUMA
bool "NUMA support"
depends on SPARC64 && SMP
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index abfb53ab11be..a281d81ef8ee 100644
--- 

Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Peter Zijlstra
On Sat, Oct 23, 2021 at 06:04:57PM +0200, Arnd Bergmann wrote:
> On Sat, Oct 23, 2021 at 3:37 AM Waiman Long  wrote:
> > On 10/22/21 7:59 AM, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > >
> > > As this is all dead code, just remove it and the helper functions built
> > > around it. For arch/ia64, the inline asm could be cleaned up, but
> > > it seems safer to leave it untouched.
> > >
> > > Signed-off-by: Arnd Bergmann 
> >
> > Does that mean we can also remove the GENERIC_LOCKBREAK config option
> > from the Kconfig files as well?
> 
>  I couldn't figure this out.
> 
> What I see is that the only architectures setting GENERIC_LOCKBREAK are
> nds32, parisc, powerpc, s390, sh and sparc64, while the only architectures
> implementing arch_spin_is_contended() are arm32, csky and ia64.
> 
> The part I don't understand is whether the option actually does anything
> useful any more after commit d89c70356acf ("locking/core: Remove break_lock
> field when CONFIG_GENERIC_LOCKBREAK=y").

Urgh, what a mess.. AFAICT there's still code in
kernel/locking/spinlock.c that relies on it. Specifically when
GENERIC_LOCKBREAK=y we seem to create _lock*() variants that are
basically TaS locks which drop preempt/irq disable while spinning.

Anybody having this on and not having native TaS locks is in for a rude
surprise I suppose... sparc64 being the obvious candidate there :/
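
For the record, the _lock*() variants in question are generated by the
BUILD_LOCK_OPS() macro in kernel/locking/spinlock.c; expanded for the
spin case, the slowpath is roughly this (a sketch, not verbatim source):

    void __lockfunc __raw_spin_lock(raw_spinlock_t *lock)
    {
        for (;;) {
            preempt_disable();
            if (likely(do_raw_spin_trylock(lock)))
                break;
            /* Drop preemption (the _irq variants also re-enable
             * interrupts here) while waiting, then retry. */
            preempt_enable();
            arch_spin_relax(&lock->raw_lock);
        }
    }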





[Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load

2021-10-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #18 from John Paul Adrian Glaubitz (glaub...@physik.fu-berlin.de) 
---
There seems to be a related discussion:

> https://yhbt.net/lore/all/20200831091523.gc29...@kitsune.suse.cz/T/

This suspects 10d91611f426d4bafd2a83d966c36da811b2f7ad to be the cause:

>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10d91611f426d4bafd2a83d966c36da811b2f7ad

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH] macintosh/via-pmu-led: make disk activity usage a parameter.

2021-10-25 Thread Hill Ma
Whether to use the LED as a disk activity indicator is a user preference.
Some like this usage while others find the LED too bright. So it might
be a good idea to make this choice a runtime parameter rather than a
compile-time config option.

The default is set to disabled as OS X does not use the LED as a
disk activity indicator.

Signed-off-by: Hill Ma 
---
 Documentation/admin-guide/kernel-parameters.txt |  6 ++
 drivers/macintosh/Kconfig   | 10 --
 drivers/macintosh/via-pmu-led.c | 11 ---
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..a656a51ba0a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -250,6 +250,12 @@
Use timer override. For some broken Nvidia NF5 boards
that require a timer override, but don't have HPET
 
+   adb_pmu_led_disk [PPC]
+   Use front LED as disk LED by default. Only applies to
+   PowerBook, iBook, PowerMac 7,2/7,3.
+   Format:   (1/Y/y=enable, 0/N/n=disable)
+   Default: disabled
+
add_efi_memmap  [EFI; X86] Include EFI memory map in
kernel's map of available physical RAM.
 
diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
index 5cdc361da37c..243215de563c 100644
--- a/drivers/macintosh/Kconfig
+++ b/drivers/macintosh/Kconfig
@@ -78,16 +78,6 @@ config ADB_PMU_LED
  behaviour of the old CONFIG_BLK_DEV_IDE_PMAC_BLINK, select this
  and the disk LED trigger and configure appropriately through sysfs.
 
-config ADB_PMU_LED_DISK
-   bool "Use front LED as DISK LED by default"
-   depends on ADB_PMU_LED
-   depends on LEDS_CLASS
-   select LEDS_TRIGGERS
-   select LEDS_TRIGGER_DISK
-   help
- This option makes the front LED default to the disk trigger
- so that it blinks on disk activity.
-
 config PMAC_SMU
bool "Support for SMU  based PowerMacs"
depends on PPC_PMAC64
diff --git a/drivers/macintosh/via-pmu-led.c b/drivers/macintosh/via-pmu-led.c
index ae067ab2373d..838dcf98f82e 100644
--- a/drivers/macintosh/via-pmu-led.c
+++ b/drivers/macintosh/via-pmu-led.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static spinlock_t pmu_blink_lock;
@@ -71,11 +72,10 @@ static void pmu_led_set(struct led_classdev *led_cdev,
	spin_unlock_irqrestore(&pmu_blink_lock, flags);
 }
 
+bool adb_pmu_led_disk;
+
 static struct led_classdev pmu_led = {
.name = "pmu-led::front",
-#ifdef CONFIG_ADB_PMU_LED_DISK
-   .default_trigger = "disk-activity",
-#endif
.brightness_set = pmu_led_set,
 };
 
@@ -106,6 +106,9 @@ static int __init via_pmu_led_init(void)
}
of_node_put(dt);
 
+   if (adb_pmu_led_disk)
+   pmu_led.default_trigger = "disk-activity";
+
	spin_lock_init(&pmu_blink_lock);
/* no outstanding req */
pmu_blink_req.complete = 1;
@@ -114,4 +117,6 @@ static int __init via_pmu_led_init(void)
	return led_classdev_register(NULL, &pmu_led);
 }
 
+core_param(adb_pmu_led_disk, adb_pmu_led_disk, bool, 0644);
+
 late_initcall(via_pmu_led_init);
-- 
2.33.1
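
With the patch applied, the disk trigger is requested at boot time via
the parameter documented in the kernel-parameters.txt hunk above, e.g.
adb_pmu_led_disk=1 on the kernel command line; the default stays off,
matching the OS X behaviour noted in the changelog.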



Re: [PATCH] powerpc/bpf: fix write protecting JIT code

2021-10-25 Thread Naveen N. Rao

Hari Bathini wrote:

Running a program with bpf-to-bpf function calls results in a data access
exception (0x300) with the below call trace:

[c0113f28] bpf_int_jit_compile+0x238/0x750 (unreliable)
[c037d2f8] bpf_check+0x2008/0x2710
[c0360050] bpf_prog_load+0xb00/0x13a0
[c0361d94] __sys_bpf+0x6f4/0x27c0
[c0363f0c] sys_bpf+0x2c/0x40
[c0032434] system_call_exception+0x164/0x330
[c000c1e8] system_call_vectored_common+0xe8/0x278

as bpf_int_jit_compile() tries writing to write protected JIT code
location during the extra pass.

Fix it by holding off write protection of JIT code until the extra
pass, where branch target addresses fixup happens.

Cc: sta...@vger.kernel.org
Fixes: 62e3d4210ac9 ("powerpc/bpf: Write protect JIT code")
Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Thanks for the fix!

Reviewed-by: Naveen N. Rao
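
For context, the ordering the fix establishes inside bpf_int_jit_compile()
looks roughly like this (helper names taken from include/linux/filter.h;
a sketch of the idea, not the literal diff):

    /*
     * With bpf-to-bpf calls, the verifier invokes the JIT again in an
     * extra pass, after all subprograms have been JITed, to fix up the
     * call addresses in the image. The image must stay writable until
     * that pass has run.
     */
    if (!fp->is_func || extra_pass) {
        /* Image is final: write-protect it now. */
        bpf_jit_binary_lock_ro(bpf_hdr);
    }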