Re: [PATCH 0/6] Memory corruption may occur due to incorrect TLB flush

2020-02-19 Thread Greg KH
On Thu, Feb 20, 2020 at 11:04:51AM +0530, Santosh Sivaraj wrote:
> The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
> flushes) may result in random memory corruption. Any concurrent page-table
> walk could end up with a Use-after-Free. Even on UP this might give issues, since
> mmu_gather is preemptible these days. An interrupt or preempted task accessing
> user pages might stumble into the free page if the hardware caches page
> directories.
> 
> The series is a backport of the fix sent by Peter [1].
> 
> The first three patches are dependencies for the last patch (avoid potential
> double flush). If the performance impact due to double flush is considered
> trivial then the first three patches and last patch may be dropped.
> 
> [1] https://patchwork.kernel.org/cover/11284843/

Can you resend these with the git commit ids of the upstream patches in
them, and say what stable tree(s) you wish to have them applied to?

thanks,

greg k-h


[PATCH 6/6] asm-generic/tlb: avoid potential double flush

2020-02-19 Thread Santosh Sivaraj
From: Peter Zijlstra 

Aneesh reported that:

tlb_flush_mmu()
  tlb_flush_mmu_tlbonly()
    tlb_flush()                 <-- #1
  tlb_flush_mmu_free()
    tlb_table_flush()
      tlb_table_invalidate()
        tlb_flush_mmu_tlbonly()
          tlb_flush()           <-- #2

does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not
clear tlb->end in that case.

Observe that any caller to __tlb_adjust_range() also sets at least one of
the tlb->freed_tables || tlb->cleared_p* bits, and those are
unconditionally cleared by __tlb_reset_range().

Change the condition for actually issuing TLBI to having one of those bits
set, as opposed to having tlb->end != 0.
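
For context, this is roughly what the 4.19 __tlb_reset_range() looks like
with the earlier patches of this series applied (an illustrative sketch,
not verbatim source):

static inline void __tlb_reset_range(struct mmu_gather *tlb)
{
	if (tlb->fullmm) {
		tlb->start = tlb->end = ~0;	/* end stays non-zero */
	} else {
		tlb->start = TASK_SIZE;
		tlb->end = 0;
	}
	tlb->freed_tables = 0;	/* these, however, always get cleared */
	tlb->cleared_ptes = 0;
	tlb->cleared_pmds = 0;
	tlb->cleared_puds = 0;
	tlb->cleared_p4ds = 0;
}

Because the fullmm path leaves tlb->end non-zero, keying the flush off
tlb->end lets the second tlb_flush_mmu_tlbonly() issue TLBI #2; the
freed/cleared bits do not have that problem.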

Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.ku...@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Aneesh Kumar K.V 
Reported-by: "Aneesh Kumar K.V" 
Cc:   # 4.19
Signed-off-by: Santosh Sivaraj 
[santosh: backported to 4.19 stable]
---
 include/asm-generic/tlb.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 19934cdd143e..427a70c56ddd 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -179,7 +179,12 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
-   if (!tlb->end)
+   /*
+* Anything calling __tlb_adjust_range() also sets at least one of
+* these bits.
+*/
+   if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
+ tlb->cleared_puds || tlb->cleared_p4ds))
return;
 
tlb_flush(tlb);
-- 
2.24.1



[PATCH 5/6] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush

2020-02-19 Thread Santosh Sivaraj
From: Peter Zijlstra 

Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER support multiple translation modes (hash
and radix), and in the case of POWER only the radix translation mode needs
the above TLBI.  This is because in hash translation mode the kernel wants
to avoid this extra flush, since there are no hardware walkers of the
Linux page table.  With radix translation, the hardware also walks the
Linux page table, so the kernel needs to make sure to TLB invalidate the
page walk cache before page table pages are freed.

More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")

The changes to sparc are to make sure we keep the old behavior, since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default for
tlb_needs_table_invalidate() is to always force an invalidate, and sparc
can avoid the table invalidate.  Hence we define
tlb_needs_table_invalidate() as false for the sparc architecture.
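
The mm/memory.c side of this patch (only visible in the diffstat below)
makes the generic table-invalidate path consult the new helper; it
effectively becomes (a sketch of the result):

static void tlb_table_invalidate(struct mmu_gather *tlb)
{
	if (tlb_needs_table_invalidate()) {
		/*
		 * Invalidate page-table caches used by hardware walkers.
		 * We still RCU-sched wait before freeing the pages because
		 * software walkers can still be in-flight.
		 */
		tlb_flush_mmu_tlbonly(tlb);
	}
}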

Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.ku...@linux.ibm.com
Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) 
Cc:   # 4.19
Signed-off-by: Santosh Sivaraj 
[santosh: backported to 4.19 stable]
---
 arch/Kconfig|  3 ---
 arch/powerpc/Kconfig|  1 -
 arch/powerpc/include/asm/tlb.h  | 11 +++
 arch/sparc/Kconfig  |  1 -
 arch/sparc/include/asm/tlb_64.h |  9 +
 include/asm-generic/tlb.h   | 15 +++
 mm/memory.c | 16 
 7 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 061a12b8140e..3abbdb0cea44 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,9 +363,6 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
bool
 
-config HAVE_RCU_TABLE_NO_INVALIDATE
-   bool
-
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fa231130eee1..b6429f53835e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,7 +216,6 @@ config PPC
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
-   select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index f0e571b2dc7c..63418275f402 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -30,6 +30,17 @@
 #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
 
 extern void tlb_flush(struct mmu_gather *tlb);
+/*
+ * book3s:
+ * Hash does not use the linux page-tables, so we can avoid
+ * the TLB invalidate for page-table freeing, Radix otoh does use the
+ * page-tables and needs the TLBI.
+ *
+ * nohash:
+ * We still do TLB invalidate in the __pte_free_tlb routine before we
+ * add the page table pages to mmu gather table batch.
+ */
+#define tlb_needs_table_invalidate()   radix_enabled()
 
 /* Get the generic bits... */
 #include 
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index d90d632868aa..e6f2a38d2e61 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,7 +64,6 @@ config SPARC64
select HAVE_KRETPROBES
select HAVE_KPROBES
select HAVE_RCU_TABLE_FREE if SMP
-   select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_DYNAMIC_FTRACE
diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
index a2f3fa61ee36..8cb8f3833239 100644
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -28,6 +28,15 @@ void flush_tlb_pending(void);
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 #define tlb_flush(tlb) flush_tlb_pending()
 
+/*
+ * SPARC64's hardware TLB fill does not use the Linux page-tables
+ * and therefore we don't need a TLBI when freeing page-table pages.
+ */
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_needs_table_invalidate()   (false)
+#endif
+
 #include 
 
 #endif /* _SPARC64_TLB_H */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index f2b9dc9cbaf8..19934cdd143e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -61,8 +61,23 @@ struct mmu_table_batch {
 extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
+/*
+ * This allows an architecture that does not use the linux page-tables for
+ * hardware to skip the TLBI when freeing page tables.
+ */
+#ifndef tlb_needs_table_invalidate
+#define tlb_needs_table_invalidate() (true)
 

[PATCH 4/6] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case

2020-02-19 Thread Santosh Sivaraj
From: "Aneesh Kumar K.V" 

Patch series "Fixup page directory freeing", v4.

This is a repost of the patch series from Peter with the arch-specific changes
except ppc64 dropped.  ppc64 changes are added here because we are redoing
the patch series on top of ppc64 changes.  This makes it easy to backport
these changes.  Only the first 2 patches need to be backported to stable.

The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:

 1) unhook page/directory
 2) TLB invalidate
 3) free page/directory

Without this, any concurrent page-table walk could end up with a
Use-after-Free.  This is especially trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or where the
hardware caches partial page-walks (i.e. caches page directories).

Even on UP this might give issues since mmu_gather is preemptible these
days.  An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.
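
For reference, the RCU half of that ordering already exists in 4.19's
mm/memory.c; roughly (a sketch of the existing code, with the TLB
invalidate of step 2 issued by tlb_table_flush() before the batch is
handed to RCU):

static void tlb_remove_table_rcu(struct rcu_head *head)
{
	struct mmu_table_batch *batch =
		container_of(head, struct mmu_table_batch, rcu);
	int i;

	/* step 3 runs only after a grace period, i.e. once any
	 * interrupts-off software walker has finished */
	for (i = 0; i < batch->nr; i++)
		__tlb_remove_table(batch->tables[i]);

	free_page((unsigned long)batch);
}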

This patch series fixes ppc64 and adds generic MMU_GATHER changes to
support the conversion of other architectures.  I haven't added patches
for the other architectures because they are yet to be acked.

This patch (of 9):

A followup patch is going to make sure we correctly invalidate page walk
cache before we free page table pages.  In order to keep things simple
enable RCU_TABLE_FREE even for !SMP so that we don't have to fix up the
!SMP case differently in the followup patch.

The !SMP case is right now broken for radix translation w.r.t. page walk
cache flush.  We can get interrupted in between the page table free and
the flush, which would leave page walk cache entries pointing to tables
which got freed already.  Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."

Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.ku...@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Cc:  # 4.19
Signed-off-by: Santosh Sivaraj 
[santosh: backported for 4.19 stable]
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 --
 arch/powerpc/mm/pgtable-book3s64.c   | 7 ---
 4 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f7f046ff6407..fa231130eee1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -215,7 +215,7 @@ config PPC
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_RCU_TABLE_FREE  if SMP
+   select HAVE_RCU_TABLE_FREE
select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 82e44b1a00ae..79ba3fbb512e 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -110,7 +110,6 @@ static inline void pgtable_free(void *table, unsigned index_size)
 #define check_pgt_cache()  do { } while (0)
 #define get_hugepd_cache_index(x)  (x)
 
-#ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
void *table, int shift)
 {
@@ -127,13 +126,6 @@ static inline void __tlb_remove_table(void *_table)
 
pgtable_free(table, shift);
 }
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
-{
-   pgtable_free(table, shift);
-}
-#endif
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f9019b579903..1013c0214213 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -47,9 +47,7 @@ extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
 extern void pte_fragment_free(unsigned long *, int);
 extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
-#ifdef CONFIG_SMP
 extern void __tlb_remove_table(void *_table);
-#endif
 
 static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 297db665d953..5b4e9fd8990c 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -432,7 +432,6 @@ static inline void 

[PATCH 3/6] asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE

2020-02-19 Thread Santosh Sivaraj
From: Peter Zijlstra 

Make issuing a TLB invalidate for page-table pages the normal case.

The reason is twofold:

 - too many invalidates is safer than too few,
 - most architectures use the linux page-tables natively
   and would thus require this.

Make it an opt-out, instead of an opt-in.

No change in behavior intended.

Signed-off-by: Peter Zijlstra (Intel) 
Cc:  # 4.19
Signed-off-by: Santosh Sivaraj 
[santosh: prerequisite for upcoming tlbflush backports]
---
 arch/Kconfig | 2 +-
 arch/powerpc/Kconfig | 1 +
 arch/sparc/Kconfig   | 1 +
 arch/x86/Kconfig | 1 -
 mm/memory.c  | 2 +-
 5 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a336548487e6..061a12b8140e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -363,7 +363,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
bool
 
-config HAVE_RCU_TABLE_INVALIDATE
+config HAVE_RCU_TABLE_NO_INVALIDATE
bool
 
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a80669209155..f7f046ff6407 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE  if SMP
+   select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e6f2a38d2e61..d90d632868aa 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,6 +64,7 @@ config SPARC64
select HAVE_KRETPROBES
select HAVE_KPROBES
select HAVE_RCU_TABLE_FREE if SMP
+   select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_DYNAMIC_FTRACE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index af35f5caadbe..181d0d522977 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,7 +181,6 @@ config X86
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE  if PARAVIRT
-   select HAVE_RCU_TABLE_INVALIDATEif HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
select HAVE_STACKPROTECTOR  if CC_HAS_SANE_STACKPROTECTOR
diff --git a/mm/memory.c b/mm/memory.c
index 1832c5ed6ac0..ba5689610c04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -327,7 +327,7 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
/*
 * Invalidate page-table caches used by hardware walkers. Then we still
 * need to RCU-sched wait while freeing the pages because software
-- 
2.24.1



[PATCH 2/6] asm-generic/tlb: Track which levels of the page tables have been cleared

2020-02-19 Thread Santosh Sivaraj
From: Will Deacon 

It is common for architectures with hugepage support to require only a
single TLB invalidation operation per hugepage during unmap(), rather than
iterating through the mapping at a PAGE_SIZE increment. Currently,
however, the level in the page table where the unmap() operation occurs
is not stored in the mmu_gather structure, therefore forcing
architectures to issue additional TLB invalidation operations or to give
up and over-invalidate by e.g. invalidating the entire TLB.

Ideally, we could add an interval rbtree to the mmu_gather structure,
which would allow us to associate the correct mapping granule with the
various sub-mappings within the range being invalidated. However, this
is costly in terms of book-keeping and memory management, so instead we
approximate by keeping track of the page table levels that are cleared
and provide a means to query the smallest granule required for invalidation.
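
An architecture can then size its invalidation stride from these bits
instead of assuming PAGE_SIZE.  A sketch of a consumer
(flush_tlb_range_stride() is an illustrative name, not a real API):

static inline void tlb_flush(struct mmu_gather *tlb)
{
	/* smallest granule that still covers everything unmapped so far */
	unsigned long stride = tlb_get_unmap_size(tlb);

	/* step by 'stride' rather than PAGE_SIZE: one TLBI per hugepage */
	flush_tlb_range_stride(tlb->mm, tlb->start, tlb->end, stride);
}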

Signed-off-by: Will Deacon 
Cc:  # 4.19
Signed-off-by: Santosh Sivaraj 
[santosh: prerequisite for upcoming tlbflush backports]
---
 include/asm-generic/tlb.h | 58 +--
 mm/memory.c   |  4 ++-
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 97306b32d8d2..f2b9dc9cbaf8 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -114,6 +114,14 @@ struct mmu_gather {
 */
unsigned intfreed_tables : 1;
 
+   /*
+* at which levels have we cleared entries?
+*/
+   unsigned intcleared_ptes : 1;
+   unsigned intcleared_pmds : 1;
+   unsigned intcleared_puds : 1;
+   unsigned intcleared_p4ds : 1;
+
struct mmu_gather_batch *active;
struct mmu_gather_batch local;
struct page *__pages[MMU_GATHER_BUNDLE];
@@ -148,6 +156,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->end = 0;
}
tlb->freed_tables = 0;
+   tlb->cleared_ptes = 0;
+   tlb->cleared_pmds = 0;
+   tlb->cleared_puds = 0;
+   tlb->cleared_p4ds = 0;
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -197,6 +209,25 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 }
 #endif
 
+static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb)
+{
+   if (tlb->cleared_ptes)
+   return PAGE_SHIFT;
+   if (tlb->cleared_pmds)
+   return PMD_SHIFT;
+   if (tlb->cleared_puds)
+   return PUD_SHIFT;
+   if (tlb->cleared_p4ds)
+   return P4D_SHIFT;
+
+   return PAGE_SHIFT;
+}
+
+static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb)
+{
+   return 1UL << tlb_get_unmap_shift(tlb);
+}
+
 /*
  * In the case of tlb vma handling, we can optimise these away in the
  * case where we're doing a full MM flush.  When we're doing a munmap,
@@ -230,13 +261,19 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_tlb_entry(tlb, ptep, address)   \
do {\
__tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   tlb->cleared_ptes = 1;  \
__tlb_remove_tlb_entry(tlb, ptep, address); \
} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)\
-   do { \
-   __tlb_adjust_range(tlb, address, huge_page_size(h)); \
-   __tlb_remove_tlb_entry(tlb, ptep, address);  \
+#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)   \
+   do {\
+   unsigned long _sz = huge_page_size(h);  \
+   __tlb_adjust_range(tlb, address, _sz);  \
+   if (_sz == PMD_SIZE)\
+   tlb->cleared_pmds = 1;  \
+   else if (_sz == PUD_SIZE)   \
+   tlb->cleared_puds = 1;  \
+   __tlb_remove_tlb_entry(tlb, ptep, address); \
} while (0)
 
 /**
@@ -250,6 +287,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address)   \
do {\
__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE);   \
+   tlb->cleared_pmds = 1;  \
__tlb_remove_pmd_tlb_entry(tlb, pmdp, address); \
} while (0)
 
@@ -264,6 +302,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define 

[PATCH 0/6] Memory corruption may occur due to incorrect TLB flush

2020-02-19 Thread Santosh Sivaraj
The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
flushes) may result in random memory corruption. Any concurrent page-table walk
could end up with a Use-after-Free. Even on UP this might give issues, since
mmu_gather is preemptible these days. An interrupt or preempted task accessing
user pages might stumble into the free page if the hardware caches page
directories.

The series is a backport of the fix sent by Peter [1].

The first three patches are dependencies for the last patch (avoid potential
double flush). If the performance impact due to double flush is considered
trivial then the first three patches and last patch may be dropped.

[1] https://patchwork.kernel.org/cover/11284843/
--
Aneesh Kumar K.V (1):
  powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case

Peter Zijlstra (4):
  asm-generic/tlb: Track freeing of page-table directories in struct
mmu_gather
  asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
  mm/mmu_gather: invalidate TLB correctly on batch allocation failure
and flush
  asm-generic/tlb: avoid potential double flush

Will Deacon (1):
  asm-generic/tlb: Track which levels of the page tables have been
cleared

 arch/Kconfig |   3 -
 arch/powerpc/Kconfig |   2 +-
 arch/powerpc/include/asm/book3s/32/pgalloc.h |   8 --
 arch/powerpc/include/asm/book3s/64/pgalloc.h |   2 -
 arch/powerpc/include/asm/tlb.h   |  11 ++
 arch/powerpc/mm/pgtable-book3s64.c   |   7 --
 arch/sparc/include/asm/tlb_64.h  |   9 ++
 arch/x86/Kconfig |   1 -
 include/asm-generic/tlb.h| 103 ---
 mm/memory.c  |  20 ++--
 10 files changed, 122 insertions(+), 44 deletions(-)

-- 
2.24.1



[PATCH 1/6] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather

2020-02-19 Thread Santosh Sivaraj
From: Peter Zijlstra 

Some architectures require different TLB invalidation instructions
depending on whether it is only the last-level of page table being
changed, or whether there are also changes to the intermediate
(directory) entries higher up the tree.

Add a new bit to the flags bitfield in struct mmu_gather so that the
architecture code can operate accordingly if it's the intermediate
levels being invalidated.
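
A sketch of the kind of decision this bit enables on the architecture
side (the helper names are hypothetical, loosely modelled on how
powerpc/radix treats its page-walk cache):

static inline void arch_tlb_flush(struct mmu_gather *tlb)
{
	if (tlb->freed_tables)
		/* intermediate levels were freed: also flush the PWC */
		flush_tlb_range_and_pwc(tlb->mm, tlb->start, tlb->end);
	else
		flush_tlb_range_only(tlb->mm, tlb->start, tlb->end);
}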

Signed-off-by: Peter Zijlstra 
Signed-off-by: Will Deacon 
Cc:  # 4.19
Signed-off-by: Santosh Sivaraj 
[santosh: prerequisite for tlbflush backports]
---
 include/asm-generic/tlb.h | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b3353e21f3b3..97306b32d8d2 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -97,12 +97,22 @@ struct mmu_gather {
 #endif
unsigned long   start;
unsigned long   end;
-   /* we are in the middle of an operation to clear
-* a full mm and can make some optimizations */
-   unsigned intfullmm : 1,
-   /* we have performed an operation which
-* requires a complete flush of the tlb */
-   need_flush_all : 1;
+   /*
+* we are in the middle of an operation to clear
+* a full mm and can make some optimizations
+*/
+   unsigned intfullmm : 1;
+
+   /*
+* we have performed an operation which
+* requires a complete flush of the tlb
+*/
+   unsigned intneed_flush_all : 1;
+
+   /*
+* we have removed page directories
+*/
+   unsigned intfreed_tables : 1;
 
struct mmu_gather_batch *active;
struct mmu_gather_batch local;
@@ -137,6 +147,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->start = TASK_SIZE;
tlb->end = 0;
}
+   tlb->freed_tables = 0;
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -278,6 +289,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pte_free_tlb(tlb, ptep, address)   \
do {\
__tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   tlb->freed_tables = 1;  \
__pte_free_tlb(tlb, ptep, address); \
} while (0)
 #endif
@@ -285,7 +297,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef pmd_free_tlb
 #define pmd_free_tlb(tlb, pmdp, address)   \
do {\
-   __tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   __tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   tlb->freed_tables = 1;  \
__pmd_free_tlb(tlb, pmdp, address); \
} while (0)
 #endif
@@ -295,6 +308,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pud_free_tlb(tlb, pudp, address)   \
do {\
__tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   tlb->freed_tables = 1;  \
__pud_free_tlb(tlb, pudp, address); \
} while (0)
 #endif
@@ -304,7 +318,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef p4d_free_tlb
 #define p4d_free_tlb(tlb, pudp, address)   \
do {\
-   __tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   __tlb_adjust_range(tlb, address, PAGE_SIZE);\
+   tlb->freed_tables = 1;  \
__p4d_free_tlb(tlb, pudp, address); \
} while (0)
 #endif
-- 
2.24.1



Re: [PATCH v3 0/6] implement KASLR for powerpc/fsl_booke/64

2020-02-19 Thread Jason Yan

ping...

on 2020/2/13 11:00, Jason Yan wrote:

Hi everyone, any comments or suggestions?

Thanks,
Jason

on 2020/2/6 10:58, Jason Yan wrote:

This is a try to implement KASLR for Freescale BookE64 which is based on
my earlier implementation for Freescale BookE32:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718

The implementation for Freescale BookE64 is similar to BookE32. One
difference is that Freescale BookE64 sets up a TLB mapping of 1G during
booting. Another difference is that ppc64 needs the kernel to be
64K-aligned. So we can randomize the kernel in this 1G mapping and make
it 64K-aligned. This saves some code that would otherwise create another
TLB mapping at early boot. The disadvantage is that we only have about
1G/64K = 16384 slots to put the kernel in.

 KERNELBASE

   64K |--> kernel <--|
    |  |  |
 +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
 |  |  |  ||  |  |  |  |  |  |  |  |  ||  |  |
 +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
 | |    1G
 |->   offset    <-|

   kernstart_virt_addr
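
A minimal sketch of the slot arithmetic described above (the names are
illustrative, not the actual kaslr_booke.c code):

#include <linux/sizes.h>

#define KASLR_REGION	SZ_1G	/* early 1G TLB mapping */
#define KASLR_ALIGN	SZ_64K	/* ppc64 alignment requirement */

static unsigned long pick_kaslr_offset(u64 seed)
{
	unsigned long slots = KASLR_REGION / KASLR_ALIGN;	/* 16384 */

	/* pick one 64K-aligned slot inside the 1G mapping */
	return (seed % slots) * KASLR_ALIGN;
}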

I'm not sure if the number of slots is enough or whether the design has
any defects. If you have better ideas, I would be happy to hear them.

Thank you all.

v2->v3:
   Fix build error when KASLR is disabled.
v1->v2:
   Add __kaslr_offset for the secondary cpu boot up.

Jason Yan (6):
   powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
 kaslr_early_init()
   powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
   powerpc/fsl_booke/64: implement KASLR for fsl_booke64
   powerpc/fsl_booke/64: do not clear the BSS for the second pass
   powerpc/fsl_booke/64: clear the original kernel if randomized
   powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
 and add 64bit part

  .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++--
  arch/powerpc/Kconfig  |  2 +-
  arch/powerpc/kernel/exceptions-64e.S  | 23 ++
  arch/powerpc/kernel/head_64.S | 14 
  arch/powerpc/kernel/setup_64.c    |  4 +-
  arch/powerpc/mm/mmu_decl.h    | 19 ++---
  arch/powerpc/mm/nohash/kaslr_booke.c  | 71 +--
  7 files changed, 132 insertions(+), 36 deletions(-)
  rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)


Re: [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing

2020-02-19 Thread Li Yang
On Thu, Feb 6, 2020 at 2:41 PM Roy Pledge  wrote:
>
> On 12/12/2019 12:01 PM, Youri Querry wrote:
> > This patch set consists of:
> > - We added an interface to enqueue several packets at a time and
> >improve performance.
> > - Make the algorithm decisions once at initialization and use
> >function pointers to improve performance.
> > - Replaced the QMAN enqueue array mode algorithm with a ring
> >mode algorithm. This is to make the enqueue of several frames
> >at a time more effective.
> >
> > Youri Querry (3):
> >soc: fsl: dpio: Adding QMAN multiple enqueue interface.
> >soc: fsl: dpio: QMAN performance improvement. Function pointer
> >  indirection.
> >soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue.
> >
> >   drivers/soc/fsl/dpio/dpio-service.c |  69 +++-
> >   drivers/soc/fsl/dpio/qbman-portal.c | 766 
> > 
> >   drivers/soc/fsl/dpio/qbman-portal.h | 158 +++-
> >   include/soc/fsl/dpaa2-io.h  |   6 +-
> >   4 files changed, 907 insertions(+), 92 deletions(-)
> >
> Acked-by: Roy Pledge 
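
The init-time function-pointer dispatch described in the cover letter can
be pictured like this (hypothetical names, not the actual qbman-portal.c
symbols):

struct portal {
	int (*enqueue)(struct portal *p, const void *frames, int num);
	bool ring_mode;
};

static int enqueue_array(struct portal *p, const void *frames, int num);
static int enqueue_ring(struct portal *p, const void *frames, int num);

static void portal_init(struct portal *p)
{
	/* decide once; the per-packet hot path is one indirect call */
	p->enqueue = p->ring_mode ? enqueue_ring : enqueue_array;
}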

Series applied with some cleanup and typo fixes in the title/commit message.

Regards,
Leo


Re: MCE handler gets NIP wrong on MPC8378

2020-02-19 Thread Radu Rendec
On 02/19/2020 at 4:21 PM Christophe Leroy  wrote:
> > Radu Rendec wrote:
> >> On 02/19/2020 at 10:11 AM Radu Rendec  wrote:
> >>> On 02/18/2020 at 1:08 PM Christophe Leroy  wrote:
>  On 18/02/2020 at 18:07, Radu Rendec wrote:
>  > The saved NIP seems to be broken inside machine_check_exception() on
>  > MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
>  > but I have seen other weird values.
>  >
>  > I've been able to track down the entry code to head_32.S (vector 
>  > 0x200),
>  > but I'm not sure where/how the NIP value (where the exception occurred)
>  > is captured.
> 
>  NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
>  saved into _NIP(r11) in transfer_to_handler in entry_32.S
> 
>  Can something clobber r12 at some point ?
> 
> >>>
> >>> I did something even simpler: I added the following
> >>>
> >>>  lis r12,0x1234
> >>>
> >>> ... right after
> >>>
> >>>  mfspr r12,SPRN_SRR0
> >>>
> >>> ... and now the NIP value I see in the crash dump is 0x1234. This
> >>> means r12 is not clobbered and most likely the NIP value I normally see
> >>> is the actual SRR0 value.
> >>
> >> I apologize for the noise. I just found out accidentally that the saved
> >> NIP value is correct if interrupts are disabled at the time when the
> >> faulty access that triggers the MCE occurs. This seems to happen
> >> consistently.
> >>
> >> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
> >> it's basically enough to wrap ioread32 to get the NIP value right.
> >>
> >> Does this make any sense? Maybe it's not a silicon bug after all, or
> >> maybe it is and I just found a workaround. Could this happen on other
> >> PowerPC CPUs as well?
> >
> > Interesting.
> >
> > 0x900 is the address of the timer interrupt.
> >
> > Would the MCE occur just after the timer interrupt?

I doubt that. I'm using a small test module to artificially trigger the
MCE. Basically it's just this (the full code is in my original post):

bad_addr_base = ioremap(0xf000, 0x100);
x = ioread32(bad_addr_base);

I find it hard to believe that every time I load the module the lwbrx
instruction that triggers the MCE is executed exactly after the timer
interrupt (or that the timer interrupt always occurs close to the lwbrx
instruction).

> >
> > Can you tell how your I/O buses are configured, etc.?

Nothing special. The device tree is mostly similar to mpc8379_rdb.dts,
but I can provide the actual dts if you think it's relevant.

> And what's the value of SERSR after the machine check?

I'm assuming you're talking about the IPIC SERSR register. I modified
machine_check_exception and added a call to ipic_get_mcp_status, which
seems to read IPIC_SERSR. The value is 0, both with interrupts enabled
and disabled (which makes sense, since disabling/enabling interrupts is
local to the CPU core).

> Do you use the local bus monitoring driver?

I don't. In fact, I'm not even aware of it. What driver is that?

Best regards,
Radu


Re: MCE handler gets NIP wrong on MPC8378

2020-02-19 Thread Christophe Leroy

Christophe Leroy wrote:


Radu Rendec wrote:


On 02/19/2020 at 10:11 AM Radu Rendec  wrote:

On 02/18/2020 at 1:08 PM Christophe Leroy  wrote:

On 18/02/2020 at 18:07, Radu Rendec wrote:
> The saved NIP seems to be broken inside machine_check_exception() on
> MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
> but I have seen other weird values.
>
> I've been able to track down the entry code to head_32.S (vector 0x200),
> but I'm not sure where/how the NIP value (where the exception occurred)
> is captured.

NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
saved into _NIP(r11) in transfer_to_handler in entry_32.S

Can something clobber r12 at some point ?



I did something even simpler: I added the following

 lis r12,0x1234

... right after

 mfspr r12,SPRN_SRR0

... and now the NIP value I see in the crash dump is 0x1234. This
means r12 is not clobbered and most likely the NIP value I normally see
is the actual SRR0 value.


I apologize for the noise. I just found out accidentally that the saved
NIP value is correct if interrupts are disabled at the time when the
faulty access that triggers the MCE occurs. This seems to happen
consistently.

By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
it's basically enough to wrap ioread32 to get the NIP value right.

Does this make any sense? Maybe it's not a silicon bug after all, or
maybe it is and I just found a workaround. Could this happen on other
PowerPC CPUs as well?


Interesting.

0x900 is the address of the timer interrupt.

Would the MCE occur just after the timer interrupt?

Can you tell how your I/O buses are configured, etc.?


And what's the value of SERSR after the machine check?

Do you use the local bus monitoring driver?

Christophe



Re: MCE handler gets NIP wrong on MPC8378

2020-02-19 Thread Christophe Leroy

Radu Rendec wrote:


On 02/19/2020 at 10:11 AM Radu Rendec  wrote:

On 02/18/2020 at 1:08 PM Christophe Leroy  wrote:
> On 18/02/2020 at 18:07, Radu Rendec wrote:
> > The saved NIP seems to be broken inside machine_check_exception() on
> > MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
> > but I have seen other weird values.
> >
> > I've been able to track down the entry code to head_32.S (vector 0x200),
> > but I'm not sure where/how the NIP value (where the exception occurred)
> > is captured.
>
> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
> saved into _NIP(r11) in transfer_to_handler in entry_32.S
>
> Can something clobber r12 at some point ?
>

I did something even simpler: I added the following

  lis r12,0x1234

... right after

  mfspr r12,SPRN_SRR0

... and now the NIP value I see in the crash dump is 0x1234. This
means r12 is not clobbered and most likely the NIP value I normally see
is the actual SRR0 value.


I apologize for the noise. I just found out accidentally that the saved
NIP value is correct if interrupts are disabled at the time when the
faulty access that triggers the MCE occurs. This seems to happen
consistently.

By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
it's basically enough to wrap ioread32 to get the NIP value right.

Does this make any sense? Maybe it's not a silicon bug after all, or
maybe it is and I just found a workaround. Could this happen on other
PowerPC CPUs as well?


Interesting.

0x900 is the address of the timer interrupt.

Would the MCE occur just after the timer interrupt?

Can you tell how your I/O buses are configured, etc.?

Christophe




Re: [PATCH v2 2/3] ASoC: dt-bindings: fsl_easrc: Add document for EASRC

2020-02-19 Thread Rob Herring
On Tue, Feb 18, 2020 at 02:39:36PM +0800, Shengjiu Wang wrote:
> EASRC (Enhanced Asynchronous Sample Rate Converter) is a new
> IP module found on i.MX8MN.
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  .../devicetree/bindings/sound/fsl,easrc.txt   | 57 +++
>  1 file changed, 57 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/sound/fsl,easrc.txt

Bindings are now in DT schema format. See Documentation/devicetree/writing-schema.rst.



Re: MCE handler gets NIP wrong on MPC8378

2020-02-19 Thread Radu Rendec
On 02/19/2020 at 10:11 AM Radu Rendec  wrote:
> On 02/18/2020 at 1:08 PM Christophe Leroy  wrote:
> > On 18/02/2020 at 18:07, Radu Rendec wrote:
> > > The saved NIP seems to be broken inside machine_check_exception() on
> > > MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
> > > but I have seen other weird values.
> > >
> > > I've been able to track down the entry code to head_32.S (vector 0x200),
> > > but I'm not sure where/how the NIP value (where the exception occurred)
> > > is captured.
> >
> > NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
> > saved into _NIP(r11) in transfer_to_handler in entry_32.S
> >
> > Can something clobber r12 at some point ?
> >
>
> I did something even simpler: I added the following
>
>   lis r12,0x1234
>
> ... right after
>
>   mfspr r12,SPRN_SRR0
>
> ... and now the NIP value I see in the crash dump is 0x1234. This
> means r12 is not clobbered and most likely the NIP value I normally see
> is the actual SRR0 value.

I apologize for the noise. I just found out accidentally that the saved
NIP value is correct if interrupts are disabled at the time when the
faulty access that triggers the MCE occurs. This seems to happen
consistently.

By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
it's basically enough to wrap ioread32 to get the NIP value right.
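
For illustration, the workaround amounts to a wrapper like this (a
sketch, not a proposed patch; the name is made up):

static u32 ioread32_irqsafe(void __iomem *addr)
{
	unsigned long flags;
	u32 val;

	/* keep interrupts off across the access so SRR0, and thus the
	 * reported NIP, still points at the faulting instruction */
	local_irq_save(flags);
	val = ioread32(addr);
	local_irq_restore(flags);

	return val;
}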

Does this make any sense? Maybe it's not a silicon bug after all, or
maybe it is and I just found a workaround. Could this happen on other
PowerPC CPUs as well?

Best regards,
Radu


Re: [PATCH] KVM: PPC: Book3S HV: Use RADIX_PTE_INDEX_SIZE in Radix MMU code

2020-02-19 Thread Leonardo Bras
Hello Michael,

On Tue, 2020-02-18 at 15:36 +1100, Michael Ellerman wrote:
> In kvmppc_unmap_free_pte() in book3s_64_mmu_radix.c, we use the
> non-constant value PTE_INDEX_SIZE to clear a PTE page.
> 
> We can instead use the constant RADIX_PTE_INDEX_SIZE, because we know
> this code will only be running when the Radix MMU is active.
> 
> Note that we already use RADIX_PTE_INDEX_SIZE for the allocation of
> kvm_pte_cache.
> 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 803940d79b73..134fbc1f029f 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -425,7 +425,7 @@ static void kvmppc_unmap_free_pte(struct kvm *kvm, pte_t *pte, bool full,
> unsigned int lpid)
>  {
>   if (full) {
> - memset(pte, 0, sizeof(long) << PTE_INDEX_SIZE);
> + memset(pte, 0, sizeof(long) << RADIX_PTE_INDEX_SIZE);
>   } else {
>   pte_t *p = pte;
>   unsigned long it;

Looks fine to me.

For book3s_64, pgtable.h says:
extern unsigned long __pte_index_size;
#define PTE_INDEX_SIZE  __pte_index_size

powerpc/mm/pgtable_64.c defines/export the variable:
unsigned long __pte_index_size;
EXPORT_SYMBOL(__pte_index_size);

And book3s64/radix_pgtable.c set the value in radix__early_init_mmu().
__pte_index_size = RADIX_PTE_INDEX_SIZE;

So I think it's ok to use the value directly in book3s_64_mmu_radix.c.
The include dependency looks fine for that to work.

FWIW:
Reviewed-by: Leonardo Bras 





[RESEND PATCH v2 9/9] ath5k: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
Acked-by: Kalle Valo 
---
 drivers/net/wireless/ath/ath5k/ahb.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/ahb.c b/drivers/net/wireless/ath/ath5k/ahb.c
index 2c9cec8b53d9..8bd01df369fb 100644
--- a/drivers/net/wireless/ath/ath5k/ahb.c
+++ b/drivers/net/wireless/ath/ath5k/ahb.c
@@ -138,18 +138,18 @@ static int ath_ahb_probe(struct platform_device *pdev)
 
if (bcfg->devid >= AR5K_SREV_AR2315_R6) {
/* Enable WMAC AHB arbitration */
-   reg = ioread32((void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
+   reg = ioread32((const void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
reg |= AR5K_AR2315_AHB_ARB_CTL_WLAN;
iowrite32(reg, (void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
 
/* Enable global WMAC swapping */
-   reg = ioread32((void __iomem *) AR5K_AR2315_BYTESWAP);
+   reg = ioread32((const void __iomem *) AR5K_AR2315_BYTESWAP);
reg |= AR5K_AR2315_BYTESWAP_WMAC;
iowrite32(reg, (void __iomem *) AR5K_AR2315_BYTESWAP);
} else {
/* Enable WMAC DMA access (assuming 5312 or 231x*/
/* TODO: check other platforms */
-   reg = ioread32((void __iomem *) AR5K_AR5312_ENABLE);
+   reg = ioread32((const void __iomem *) AR5K_AR5312_ENABLE);
if (to_platform_device(ah->dev)->id == 0)
reg |= AR5K_AR5312_ENABLE_WLAN0;
else
@@ -202,12 +202,12 @@ static int ath_ahb_remove(struct platform_device *pdev)
 
if (bcfg->devid >= AR5K_SREV_AR2315_R6) {
/* Disable WMAC AHB arbitration */
-   reg = ioread32((void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
+   reg = ioread32((const void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
reg &= ~AR5K_AR2315_AHB_ARB_CTL_WLAN;
iowrite32(reg, (void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
} else {
/*Stop DMA access */
-   reg = ioread32((void __iomem *) AR5K_AR5312_ENABLE);
+   reg = ioread32((const void __iomem *) AR5K_AR5312_ENABLE);
if (to_platform_device(ah->dev)->id == 0)
reg &= ~AR5K_AR5312_ENABLE_WLAN0;
else
-- 
2.17.1



[RESEND PATCH v2 8/9] media: fsl-viu: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/media/platform/fsl-viu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/fsl-viu.c b/drivers/media/platform/fsl-viu.c
index 81a8faedbba6..991d9dc82749 100644
--- a/drivers/media/platform/fsl-viu.c
+++ b/drivers/media/platform/fsl-viu.c
@@ -34,7 +34,7 @@
 /* Allow building this driver with COMPILE_TEST */
 #if !defined(CONFIG_PPC) && !defined(CONFIG_MICROBLAZE)
 #define out_be32(v, a) iowrite32be(a, (void __iomem *)v)
-#define in_be32(a) ioread32be((void __iomem *)a)
+#define in_be32(a) ioread32be((const void __iomem *)a)
 #endif
 
 #define BUFFER_TIMEOUT msecs_to_jiffies(500)  /* 0.5 seconds */
-- 
2.17.1



[RESEND PATCH v2 7/9] drm/nouveau: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 1b62ccc57aef..d95bdd65dbca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -613,7 +613,7 @@ nouveau_bo_rd32(struct nouveau_bo *nvbo, unsigned index)
mem += index;
 
if (is_iomem)
-   return ioread32_native((void __force __iomem *)mem);
+   return ioread32_native((const void __force __iomem *)mem);
else
return *mem;
 }
-- 
2.17.1



[RESEND PATCH v2 6/9] drm/mgag200: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Thomas Zimmermann 

---

Changes since v1:
1. Add Thomas' review.
---
 drivers/gpu/drm/mgag200/mgag200_drv.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h b/drivers/gpu/drm/mgag200/mgag200_drv.h
index aa32aad222c2..6512b3af4fb7 100644
--- a/drivers/gpu/drm/mgag200/mgag200_drv.h
+++ b/drivers/gpu/drm/mgag200/mgag200_drv.h
@@ -34,9 +34,9 @@
 
 #define MGAG200FB_CONN_LIMIT 1
 
-#define RREG8(reg) ioread8(((void __iomem *)mdev->rmmio) + (reg))
+#define RREG8(reg) ioread8(((const void __iomem *)mdev->rmmio) + (reg))
 #define WREG8(reg, v) iowrite8(v, ((void __iomem *)mdev->rmmio) + (reg))
-#define RREG32(reg) ioread32(((void __iomem *)mdev->rmmio) + (reg))
+#define RREG32(reg) ioread32(((const void __iomem *)mdev->rmmio) + (reg))
 #define WREG32(reg, v) iowrite32(v, ((void __iomem *)mdev->rmmio) + (reg))
 
 #define ATTR_INDEX 0x1fc0
-- 
2.17.1



[RESEND PATCH v2 5/9] arc: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the
address so they can be converted to a "const" version for const-safety
and consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
Acked-by: Alexey Brodkin 

---

Changes since v1:
1. Add Alexey's ack.
---
 arch/arc/plat-axs10x/axs10x.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/plat-axs10x/axs10x.c b/arch/arc/plat-axs10x/axs10x.c
index 63ea5a606ecd..180c260a8221 100644
--- a/arch/arc/plat-axs10x/axs10x.c
+++ b/arch/arc/plat-axs10x/axs10x.c
@@ -84,7 +84,7 @@ static void __init axs10x_print_board_ver(unsigned int creg, const char *str)
unsigned int val;
} board;
 
-   board.val = ioread32((void __iomem *)creg);
+   board.val = ioread32((const void __iomem *)creg);
pr_info("AXS: %s FPGA Date: %u-%u-%u\n", str, board.d, board.m,
board.y);
 }
@@ -95,7 +95,7 @@ static void __init axs10x_early_init(void)
char mb[32];
 
/* Determine motherboard version */
-   if (ioread32((void __iomem *) CREG_MB_CONFIG) & (1 << 28))
+   if (ioread32((const void __iomem *) CREG_MB_CONFIG) & (1 << 28))
mb_rev = 3; /* HT-3 (rev3.0) */
else
mb_rev = 2; /* HT-2 (rev2.0) */
-- 
2.17.1



[RESEND PATCH v2 4/9] virtio: pci: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Geert Uytterhoeven 

---

Changes since v1:
1. Add Geert's review.
---
 drivers/virtio/virtio_pci_modern.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 7abcc50838b8..fc58db4ab6c3 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -26,16 +26,16 @@
  * method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses
  * for 16-bit fields and 8-bit accesses for 8-bit fields.
  */
-static inline u8 vp_ioread8(u8 __iomem *addr)
+static inline u8 vp_ioread8(const u8 __iomem *addr)
 {
return ioread8(addr);
 }
-static inline u16 vp_ioread16 (__le16 __iomem *addr)
+static inline u16 vp_ioread16 (const __le16 __iomem *addr)
 {
return ioread16(addr);
 }
 
-static inline u32 vp_ioread32(__le32 __iomem *addr)
+static inline u32 vp_ioread32(const __le32 __iomem *addr)
 {
return ioread32(addr);
 }
-- 
2.17.1



[RESEND PATCH v2 3/9] ntb: intel: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Geert Uytterhoeven 
Acked-by: Dave Jiang 

---

Changes since v1:
1. Add Geert's review.
2. Add Dave's ack.
---
 drivers/ntb/hw/intel/ntb_hw_gen1.c  | 2 +-
 drivers/ntb/hw/intel/ntb_hw_gen3.h  | 2 +-
 drivers/ntb/hw/intel/ntb_hw_intel.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/hw/intel/ntb_hw_gen1.c b/drivers/ntb/hw/intel/ntb_hw_gen1.c
index bb57ec239029..9202502a9787 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen1.c
+++ b/drivers/ntb/hw/intel/ntb_hw_gen1.c
@@ -1202,7 +1202,7 @@ int intel_ntb_peer_spad_write(struct ntb_dev *ntb, int pidx, int sidx,
   ndev->peer_reg->spad);
 }
 
-static u64 xeon_db_ioread(void __iomem *mmio)
+static u64 xeon_db_ioread(const void __iomem *mmio)
 {
return (u64)ioread16(mmio);
 }
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen3.h b/drivers/ntb/hw/intel/ntb_hw_gen3.h
index 75fb86ca27bb..d1455f24ec99 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen3.h
+++ b/drivers/ntb/hw/intel/ntb_hw_gen3.h
@@ -91,7 +91,7 @@
 #define GEN3_DB_TOTAL_SHIFT33
 #define GEN3_SPAD_COUNT16
 
-static inline u64 gen3_db_ioread(void __iomem *mmio)
+static inline u64 gen3_db_ioread(const void __iomem *mmio)
 {
return ioread64(mmio);
 }
diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.h b/drivers/ntb/hw/intel/ntb_hw_intel.h
index e071e28bca3f..3c0a5a2da241 100644
--- a/drivers/ntb/hw/intel/ntb_hw_intel.h
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.h
@@ -102,7 +102,7 @@ struct intel_ntb_dev;
 struct intel_ntb_reg {
int (*poll_link)(struct intel_ntb_dev *ndev);
int (*link_is_up)(struct intel_ntb_dev *ndev);
-   u64 (*db_ioread)(void __iomem *mmio);
+   u64 (*db_ioread)(const void __iomem *mmio);
void (*db_iowrite)(u64 db_bits, void __iomem *mmio);
unsigned long   ntb_ctl;
resource_size_t db_size;
-- 
2.17.1



[RESEND PATCH v2 2/9] rtl818x: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() helpers have an inconsistent interface.  On some architectures
the void __iomem * address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Geert Uytterhoeven 
Acked-by: Kalle Valo 

---

Changes since v1:
1. Add Geert's review.
2. Add Kalle's ack. Fix subject prefix.
---
 drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h b/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
index 7948a2da195a..2ff00800d45b 100644
--- a/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
+++ b/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
@@ -150,17 +150,17 @@ void rtl8180_write_phy(struct ieee80211_hw *dev, u8 addr, u32 data);
 void rtl8180_set_anaparam(struct rtl8180_priv *priv, u32 anaparam);
 void rtl8180_set_anaparam2(struct rtl8180_priv *priv, u32 anaparam2);
 
-static inline u8 rtl818x_ioread8(struct rtl8180_priv *priv, u8 __iomem *addr)
+static inline u8 rtl818x_ioread8(struct rtl8180_priv *priv, const u8 __iomem *addr)
 {
return ioread8(addr);
 }
 
-static inline u16 rtl818x_ioread16(struct rtl8180_priv *priv, __le16 __iomem *addr)
+static inline u16 rtl818x_ioread16(struct rtl8180_priv *priv, const __le16 __iomem *addr)
 {
return ioread16(addr);
 }
 
-static inline u32 rtl818x_ioread32(struct rtl8180_priv *priv, __le32 __iomem *addr)
+static inline u32 rtl818x_ioread32(struct rtl8180_priv *priv, const __le32 __iomem *addr)
 {
return ioread32(addr);
 }
-- 
2.17.1



[RESEND PATCH v2 1/9] iomap: Constify ioreadX() iomem argument (as in generic implementation)

2020-02-19 Thread Krzysztof Kozlowski
The ioreadX() and ioreadX_rep() helpers have an inconsistent interface.  On
some architectures the void __iomem * address argument is a pointer to const,
on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Suggested-by: Geert Uytterhoeven 
Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Geert Uytterhoeven 
Reviewed-by: Arnd Bergmann 

---

Changes since v1:
1. Constify also ioreadX_rep() and mmio_insX(),
2. Squash lib+alpha+powerpc+parisc+sh into one patch for bisectability,
3. Add Geert's review.
4. Add Arnd's review.
---
 arch/alpha/include/asm/core_apecs.h   |  6 +--
 arch/alpha/include/asm/core_cia.h |  6 +--
 arch/alpha/include/asm/core_lca.h |  6 +--
 arch/alpha/include/asm/core_marvel.h  |  4 +-
 arch/alpha/include/asm/core_mcpcia.h  |  6 +--
 arch/alpha/include/asm/core_t2.h  |  2 +-
 arch/alpha/include/asm/io.h   | 12 ++---
 arch/alpha/include/asm/io_trivial.h   | 16 +++---
 arch/alpha/include/asm/jensen.h   |  2 +-
 arch/alpha/include/asm/machvec.h  |  6 +--
 arch/alpha/kernel/core_marvel.c   |  2 +-
 arch/alpha/kernel/io.c| 12 ++---
 arch/parisc/include/asm/io.h  |  4 +-
 arch/parisc/lib/iomap.c   | 72 +--
 arch/powerpc/kernel/iomap.c   | 28 +--
 arch/sh/kernel/iomap.c| 22 
 include/asm-generic/iomap.h   | 28 +--
 include/linux/io-64-nonatomic-hi-lo.h |  4 +-
 include/linux/io-64-nonatomic-lo-hi.h |  4 +-
 lib/iomap.c   | 30 +--
 20 files changed, 136 insertions(+), 136 deletions(-)

diff --git a/arch/alpha/include/asm/core_apecs.h b/arch/alpha/include/asm/core_apecs.h
index 0a07055bc0fe..2d9726fc02ef 100644
--- a/arch/alpha/include/asm/core_apecs.h
+++ b/arch/alpha/include/asm/core_apecs.h
@@ -384,7 +384,7 @@ struct el_apecs_procdata
}   \
} while (0)
 
-__EXTERN_INLINE unsigned int apecs_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -420,7 +420,7 @@ __EXTERN_INLINE void apecs_iowrite8(u8 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -456,7 +456,7 @@ __EXTERN_INLINE void apecs_iowrite16(u16 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < APECS_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_cia.h b/arch/alpha/include/asm/core_cia.h
index c706a7f2b061..cb22991f6761 100644
--- a/arch/alpha/include/asm/core_cia.h
+++ b/arch/alpha/include/asm/core_cia.h
@@ -342,7 +342,7 @@ struct el_CIA_sysdata_mcheck {
 #define vuip   volatile unsigned int __force *
 #define vulp   volatile unsigned long __force *
 
-__EXTERN_INLINE unsigned int cia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -374,7 +374,7 @@ __EXTERN_INLINE void cia_iowrite8(u8 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -404,7 +404,7 @@ __EXTERN_INLINE void cia_iowrite16(u16 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < CIA_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_lca.h b/arch/alpha/include/asm/core_lca.h
index 84d5e5b84f4f..ec86314418cb 100644
--- a/arch/alpha/include/asm/core_lca.h
+++ b/arch/alpha/include/asm/core_lca.h
@@ -230,7 +230,7 @@ union el_lca {
} while (0)
 
 
-__EXTERN_INLINE unsigned int lca_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -266,7 +266,7 @@ __EXTERN_INLINE void lca_iowrite8(u8 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) 

[RESEND PATCH v2 0/9] iomap: Constify ioreadX() iomem argument

2020-02-19 Thread Krzysztof Kozlowski
Hi,


Changes since v1

https://lore.kernel.org/lkml/1578415992-24054-1-git-send-email-k...@kernel.org/
1. Constify also ioreadX_rep() and mmio_insX(),
2. Squash lib+alpha+powerpc+parisc+sh into one patch for bisectability,
3. Add acks and reviews,
4. Re-order patches so all optional driver changes are at the end.


Description
===
The ioread8/16/32() functions and others have an inconsistent interface
among the architectures: some take the address as const, some do not.

It seems there is nothing really stopping all of them from taking a
pointer to const.

The patchset was only compile-tested on the affected architectures; no
real testing was done.
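
For illustration, the change boils down to const-qualifying the address
argument of the ioreadX() accessors. A sketch of the before/after
prototypes (the exact per-architecture declarations vary):

	/* before: address not const-qualified */
	extern unsigned int ioread32(void __iomem *);

	/* after: reads do not modify *addr, so take a pointer to const */
	extern unsigned int ioread32(const void __iomem *);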


volatile

There is still interface inconsistency between architectures around
"volatile" qualifier:
 - include/asm-generic/io.h:    static inline u32 ioread32(const volatile void __iomem *addr)
 - include/asm-generic/iomap.h: extern unsigned int ioread32(const void __iomem *);

This is still discussed and out of scope of this patchset.


Merging
===
Multiple architectures are affected by the first patch, so acks are welcome.

1. All patches depend on the first patch,
2. Patches 2-4 also unify the interface in a few drivers,
3. Patches 5-9 are optional cleanups, without functional impact.


Best regards,
Krzysztof


Krzysztof Kozlowski (9):
  iomap: Constify ioreadX() iomem argument (as in generic
implementation)
  rtl818x: Constify ioreadX() iomem argument (as in generic
implementation)
  ntb: intel: Constify ioreadX() iomem argument (as in generic
implementation)
  virtio: pci: Constify ioreadX() iomem argument (as in generic
implementation)
  arc: Constify ioreadX() iomem argument (as in generic implementation)
  drm/mgag200: Constify ioreadX() iomem argument (as in generic
implementation)
  drm/nouveau: Constify ioreadX() iomem argument (as in generic
implementation)
  media: fsl-viu: Constify ioreadX() iomem argument (as in generic
implementation)
  ath5k: Constify ioreadX() iomem argument (as in generic
implementation)

 arch/alpha/include/asm/core_apecs.h   |  6 +-
 arch/alpha/include/asm/core_cia.h |  6 +-
 arch/alpha/include/asm/core_lca.h |  6 +-
 arch/alpha/include/asm/core_marvel.h  |  4 +-
 arch/alpha/include/asm/core_mcpcia.h  |  6 +-
 arch/alpha/include/asm/core_t2.h  |  2 +-
 arch/alpha/include/asm/io.h   | 12 ++--
 arch/alpha/include/asm/io_trivial.h   | 16 ++---
 arch/alpha/include/asm/jensen.h   |  2 +-
 arch/alpha/include/asm/machvec.h  |  6 +-
 arch/alpha/kernel/core_marvel.c   |  2 +-
 arch/alpha/kernel/io.c| 12 ++--
 arch/arc/plat-axs10x/axs10x.c |  4 +-
 arch/parisc/include/asm/io.h  |  4 +-
 arch/parisc/lib/iomap.c   | 72 +--
 arch/powerpc/kernel/iomap.c   | 28 
 arch/sh/kernel/iomap.c| 22 +++---
 drivers/gpu/drm/mgag200/mgag200_drv.h |  4 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c  |  2 +-
 drivers/media/platform/fsl-viu.c  |  2 +-
 drivers/net/wireless/ath/ath5k/ahb.c  | 10 +--
 .../realtek/rtl818x/rtl8180/rtl8180.h |  6 +-
 drivers/ntb/hw/intel/ntb_hw_gen1.c|  2 +-
 drivers/ntb/hw/intel/ntb_hw_gen3.h|  2 +-
 drivers/ntb/hw/intel/ntb_hw_intel.h   |  2 +-
 drivers/virtio/virtio_pci_modern.c|  6 +-
 include/asm-generic/iomap.h   | 28 
 include/linux/io-64-nonatomic-hi-lo.h |  4 +-
 include/linux/io-64-nonatomic-lo-hi.h |  4 +-
 lib/iomap.c   | 30 
 30 files changed, 156 insertions(+), 156 deletions(-)

-- 
2.17.1



[PATCH] powerpc/kasan: Fix error detection on memory allocation

2020-02-19 Thread Christophe Leroy
In case (k_start & PAGE_MASK) doesn't equal k_start, 'va' will never be
NULL although 'block' is NULL.
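
A worked example of the failure mode (hypothetical values, for
illustration only):

	void *block = NULL;				/* memblock_alloc() failed */
	unsigned long k_start = 0xf8001234;		/* not page aligned */
	unsigned long k_cur = k_start & PAGE_MASK;	/* 0xf8001000 */
	void *va = block + k_cur - k_start;		/* (void *)-0x234: non-NULL */

So the 'if (!va)' check never fires even though the allocation failed.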

Check the return of memblock_alloc() directly instead of
the resulting address in the loop.

Fixes: 509cd3f2b473 ("powerpc/32: Simplify KASAN init")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index b533e7a8319d..e998c3576ae1 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -78,15 +78,14 @@ static int __init kasan_init_region(void *start, size_t 
size)
return ret;
 
block = memblock_alloc(k_end - k_start, PAGE_SIZE);
+   if (!block)
+   return -ENOMEM;
 
for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), 
k_cur);
void *va = block + k_cur - k_start;
pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
 
-   if (!va)
-   return -ENOMEM;
-
__set_pte_at(_mm, k_cur, pte_offset_kernel(pmd, k_cur), 
pte, 0);
}
flush_tlb_kernel_range(k_start, k_end);
-- 
2.25.0



[PATCH] powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC

2020-02-19 Thread Christophe Leroy
With CONFIG_KASAN_VMALLOC, new page tables are created at the time
shadow memory for the vmalloc area is unmapped. If some parts of the
page table still have entries pointing to the zero page shadow memory,
those entries are wrongly marked RW.

Make sure new page tables are populated with RO entries once
kasan_remap_early_shadow_ro() has run.

Fixes: 3d4247fcc938 ("powerpc/32: Add support of KASAN_VMALLOC")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 16dd95bd0749..b533e7a8319d 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -30,11 +30,13 @@ static void __init kasan_populate_pte(pte_t *ptep, pgprot_t 
prot)
__set_pte_at(_mm, va, ptep, pfn_pte(PHYS_PFN(pa), prot), 
0);
 }
 
-static int __init kasan_init_shadow_page_tables(unsigned long k_start, 
unsigned long k_end)
+static int __init
+kasan_init_shadow_page_tables(unsigned long k_start, unsigned long k_end, bool 
is_late)
 {
pmd_t *pmd;
unsigned long k_cur, k_next;
pte_t *new = NULL;
+   pgprot_t prot = is_late ? kasan_prot_ro() : PAGE_KERNEL;
 
pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
 
@@ -48,7 +50,7 @@ static int __init kasan_init_shadow_page_tables(unsigned long 
k_start, unsigned
 
if (!new)
return -ENOMEM;
-   kasan_populate_pte(new, PAGE_KERNEL);
+   kasan_populate_pte(new, prot);
 
smp_wmb(); /* See comment in __pte_alloc */
 
@@ -71,7 +73,7 @@ static int __init kasan_init_region(void *start, size_t size)
int ret;
void *block;
 
-   ret = kasan_init_shadow_page_tables(k_start, k_end);
+   ret = kasan_init_shadow_page_tables(k_start, k_end, false);
if (ret)
return ret;
 
@@ -121,7 +123,7 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
phys_addr_t pa = __pa(kasan_early_shadow_page);
 
if (!early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
-   int ret = kasan_init_shadow_page_tables(k_start, k_end);
+   int ret = kasan_init_shadow_page_tables(k_start, k_end, true);
 
if (ret)
panic("kasan: kasan_init_shadow_page_tables() failed");
@@ -144,7 +146,8 @@ void __init kasan_mmu_init(void)
struct memblock_region *reg;
 
if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
-   ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, 
KASAN_SHADOW_END);
+   ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, 
KASAN_SHADOW_END,
+   false);
 
if (ret)
panic("kasan: kasan_init_shadow_page_tables() failed");
-- 
2.25.0



Re: MCE handler gets NIP wrong on MPC8378

2020-02-19 Thread Radu Rendec
On 02/18/2020 at 1:08 PM Christophe Leroy  wrote:
> Le 18/02/2020 à 18:07, Radu Rendec a écrit :
> > The saved NIP seems to be broken inside machine_check_exception() on
> > MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
> > but I have seen other weird values.
> >
> > I've been able to track down the entry code to head_32.S (vector 0x200),
> > but I'm not sure where/how the NIP value (where the exception occurred)
> > is captured.
>
> NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and
> saved into _NIP(r11) in transfer_to_handler in entry_32.S

Thank you so much for the information, it is extremely helpful!

> Can something clobber r12 at some point ?
>
> Maybe add the following at some place to trap when it happens ?
>
> tweqi r12, 0x900
>
> If you put it just after reading SRR0, and just before writing into
> NIP(r11), you'll see if it's wrong from the beginning or if it is
> overwritten later.

I did something even simpler: I added the following

lis r12,0x1234

... right after

mfspr r12,SPRN_SRR0

... and now the NIP value I see in the crash dump is 0x1234. This
means r12 is not clobbered and most likely the NIP value I normally see
is the actual SRR0 value.

Just to be sure that SRR0 is not clobbered before it's even saved to r12
(very unlikely though) I changed the code to save SRR0 to r8 at the very
beginning of the handler (first instruction, at address 0x200) and then
load r12 from r8 later. This of course clobbers r8, but it's good for
testing. Now in the crash dump I see 0x900 in both NIP and r8.
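
A sketch of that test (the choice of r8 is arbitrary, and this is not
the actual head_32.S code):

	. = 0x200
	mfspr	r8,SPRN_SRR0	/* capture SRR0 in the very first instruction */
	/* ... rest of the prolog ... */
	mr	r12,r8		/* later, where r12 was loaded from SRR0 */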

So I think I ruled out any problem in the Linux MCE handler. MPC8378 has
an e300 core and I double checked with the e300 core reference manual
(e300coreRM.pdf from NXP). I couldn't find anything weird there either.
Quoting from the RM:

| 5.5.2.1 Machine Check Interrupt Enabled (MSR[ME] = 1)
|
| When a machine check interrupt is taken, registers are updated as
| shown in Table 5-14.
|
| Table 5-14. Machine Check Interrupt—Register Settings
|
| SRR0 Set to the address of the next instruction that would have been
|  completed in the interrupted instruction stream. Neither this
|  instruction nor any others beyond it will have been completed.
|  All preceding instructions will have been completed.

At this point I'm assuming a silicon bug, although I couldn't find
anything interesting in the Errata provided by NXP.

Best regards,
Radu


[PATCHv3] powerpc/crashkernel: take "mem=" option into account

2020-02-19 Thread Pingfan Liu
The "mem=" option is an easy way to put high pressure on memory during
testing. Hence, after applying the memory limit, the actual usable memory,
instead of the total memory, should be considered when reserving memory
for crashkernel. Otherwise boot-up may experience an OOM issue.

E.g. it would reserve 4G prior to the change and 512M afterward, if passing
crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and
mem=5G on a 256G machine.

This issue is powerpc specific because it puts higher priority on fadump
and kdump reservation than on "mem=". Refer to the following code:
if (fadump_reserve_mem() == 0)
reserve_crashkernel();
...
/* Ensure that total memory size is page-aligned. */
limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
memblock_enforce_memory_limit(limit);

On other arches, the effect of "mem=" takes a higher priority and passes
through memblock_phys_mem_size() before reserve_crashkernel() is called.

Signed-off-by: Pingfan Liu 
To: linuxppc-dev@lists.ozlabs.org
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
---
v2 -> v3: improve commit log
 arch/powerpc/kernel/machine_kexec.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec.c 
b/arch/powerpc/kernel/machine_kexec.c
index c4ed328..eec96dc 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image)
 
 void __init reserve_crashkernel(void)
 {
-   unsigned long long crash_size, crash_base;
+   unsigned long long crash_size, crash_base, total_mem_sz;
int ret;
 
+   total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
/* use common parsing */
-   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+   ret = parse_crashkernel(boot_command_line, total_mem_sz,
_size, _base);
if (ret == 0 && crash_size > 0) {
crashk_res.start = crash_base;
@@ -185,7 +186,7 @@ void __init reserve_crashkernel(void)
"for crashkernel (System RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
(unsigned long)(crashk_res.start >> 20),
-   (unsigned long)(memblock_phys_mem_size() >> 20));
+   (unsigned long)(total_mem_sz >> 20));
 
if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
memblock_reserve(crashk_res.start, crash_size)) {
-- 
2.7.5



Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables

2020-02-19 Thread Mike Rapoport
On Wed, Feb 19, 2020 at 01:07:55PM +0100, Christophe Leroy wrote:
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > diff --git a/arch/powerpc/mm/ptdump/ptdump.c 
> > b/arch/powerpc/mm/ptdump/ptdump.c
> > index 206156255247..7bd4b81d5b5d 100644
> > --- a/arch/powerpc/mm/ptdump/ptdump.c
> > +++ b/arch/powerpc/mm/ptdump/ptdump.c
> > @@ -277,9 +277,9 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, 
> > unsigned long start)
> > }
> >   }
> > -static void walk_pud(struct pg_state *st, pgd_t *pgd, unsigned long start)
> > +static void walk_pud(struct pg_state *st, p4d_t *p4d, unsigned long start)
> >   {
> > -   pud_t *pud = pud_offset(pgd, 0);
> > +   pud_t *pud = pud_offset(p4d, 0);
> > unsigned long addr;
> > unsigned int i;
> > @@ -293,6 +293,22 @@ static void walk_pud(struct pg_state *st, pgd_t *pgd, 
> > unsigned long start)
> > }
> >   }
> > +static void walk_p4d(struct pg_state *st, pgd_t *pgd, unsigned long start)
> > +{
> > +   p4d_t *p4d = p4d_offset(pgd, 0);
> > +   unsigned long addr;
> > +   unsigned int i;
> > +
> > +   for (i = 0; i < PTRS_PER_P4D; i++, p4d++) {
> > +   addr = start + i * P4D_SIZE;
> > +   if (!p4d_none(*p4d) && !p4d_is_leaf(*p4d))
> > +   /* p4d exists */
> > +   walk_pud(st, p4d, addr);
> > +   else
> > +   note_page(st, addr, 2, p4d_val(*p4d));
> 
> Level 2 is already used by walk_pud().
> 
> I think you have to increment the level used in walk_pud() and walk_pmd()
> and walk_pte()

Thanks for catching this!
I'll fix the numbers in the next version.
 
> > +   }
> > +}
> > +
> >   static void walk_pagetables(struct pg_state *st)
> >   {
> > unsigned int i;
> > @@ -306,7 +322,7 @@ static void walk_pagetables(struct pg_state *st)
> > for (i = pgd_index(addr); i < PTRS_PER_PGD; i++, pgd++, addr += 
> > PGDIR_SIZE) {
> > if (!pgd_none(*pgd) && !pgd_is_leaf(*pgd))
> > /* pgd exists */
> > -   walk_pud(st, pgd, addr);
> > +   walk_p4d(st, pgd, addr);
> > else
> > note_page(st, addr, 1, pgd_val(*pgd));
> > }
> 
> Christophe

-- 
Sincerely yours,
Mike.
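
For reference, the renumbering implied by Christophe's comment would give
each walker its own depth, along these lines (a sketch, not the actual
follow-up patch):

	note_page(st, addr, 1, pgd_val(*pgd));	/* walk_pagetables() */
	note_page(st, addr, 2, p4d_val(*p4d));	/* walk_p4d() */
	note_page(st, addr, 3, pud_val(*pud));	/* walk_pud() */
	note_page(st, addr, 4, pmd_val(*pmd));	/* walk_pmd() */
	note_page(st, addr, 5, pte_val(*pte));	/* walk_pte() */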


Re: Surprising code generated for vdso_read_begin()

2020-02-19 Thread Segher Boessenkool
On Wed, Feb 19, 2020 at 10:52:16AM +0100, Arnd Bergmann wrote:
> On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
>  wrote:
> > Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> > >  wrote:
> > >>
> > >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >  It looks like the compiler did loop peeling.  What GCC version is this?
> >  Please try current trunk (to become GCC 10), or at least GCC 9?
> > >>>
> > >>> It is with GCC 5.5
> > >>>
> > >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > >>> recent than 8.1
> > >>
> > >> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> > >> this hard and/or painful to do?
> > >
> > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 
> > > 9.2
> > > binaries, as well as a recent 10.0 snapshot.
> > >
> >
> > Thanks Arnd,
> >
> > I have built the VDSO with 9.2, I get a less performant result than with
> > 8.2 (same performance as with 5.5).
> >
> > After a quick look, I see:
> > - Irrelevant NOPs to align loops and stuff, although -mcpu=860 should
> > avoid that.
> > - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> > 8.1 doesn't need that, all VDSO functions are frameless with 8.1
> 
> If you think it should be fixed in gcc, maybe try to reproduce it in
> https://godbolt.org/

(Feel free to skip this step; and don't put links to godbolt (or anything
else external) in our bugzilla, please; such links go stale before you
know it.)

> and open a gcc bug against that.

Yes please :-)

> Also, please try the gcc-10 snapshot, which has the highest chance
> of getting fixes if it shows the same issue (or worse).

If it is a regression, chances are it will be backported.  (But not to
9.3, which is due in just a few weeks, just like 8.4).  If it is just a
side effect of some other change, it will probably *not* be undone, not
on trunk (GCC 10) either.  It depends.

But sure, always test trunk if you can.


Segher


Re: [PATCH] powerpc/entry: Fix a #if which should be a #ifdef in entry_32.S

2020-02-19 Thread Michael Ellerman
On Tue, 2020-02-18 at 14:09:29 UTC, Christophe Leroy wrote:
> Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/9eb425b2e04e0e3006adffea5bf5f227a896f128

cheers


Re: [PATCH] powerpc/xmon: Fix whitespace handling in getstring()

2020-02-19 Thread Michael Ellerman
On Mon, 2020-02-17 at 04:13:43 UTC, Oliver O'Halloran wrote:
> The ls (lookup symbol) and zr (reboot) commands use xmon's getstring()
> helper to read a string argument from the xmon prompt. This function skips
> over leading whitespace, but doesn't check if the first "non-whitespace"
> character is a newline, which causes some odd behaviour (<enter> indicates
> the enter key was pressed):
> 
>   0:mon> ls printk
>   printk: c01680c4
> 
>   0:mon> ls
>   printk
>   Symbol '
>   printk' not found.
>   0:mon>
> 
> With commit 2d9b332d99b ("powerpc/xmon: Allow passing an argument
> to ppc_md.restart()") we have a similar problem with the zr command.
> Previously zr took no arguments so "zr<enter>" would trigger a reboot.
> With that patch applied a second newline needs to be sent in order for
> the reboot to occur. Fix this by checking if the leading whitespace
> ended on a newline:
> 
>   0:mon> ls
>   Symbol '' not found.
> 
> Fixes: 2d9b332d99b ("powerpc/xmon: Allow passing an argument to 
> ppc_md.restart()")
> Reported-by: Michael Ellerman 
> Signed-off-by: Oliver O'Halloran 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/066bc3576e653b615ee3f5230a89d69c8ebeeb71

cheers
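
A minimal sketch of the fix described above, assuming xmon's existing
getstring()/skipbl() helpers (the applied patch may differ in detail):

	static void getstring(char *s, int size)
	{
		int c;

		c = skipbl();		/* skip leading whitespace */
		if (c == '\n') {	/* bare enter: return an empty string */
			*s = 0;
			return;
		}
		/* ... copy the token as before ... */
	}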


Re: [PATCH v4] powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK

2020-02-19 Thread Michael Ellerman
On Sat, 2020-02-15 at 10:14:25 UTC, Christophe Leroy wrote:
> hash_page() needs to read page tables from kernel memory. When entire
> kernel memory is mapped by BATs, which is normally the case when
> CONFIG_STRICT_KERNEL_RWX is not set, it works even if the page hosting
> the page table is not referenced in the MMU hash table.
> 
> However, if the page where the page table resides is not covered by
> a BAT, a DSI fault can be encountered from hash_page(), and it loops
> forever. This can happen when CONFIG_STRICT_KERNEL_RWX is selected
> and the alignment of the different regions is too small to allow
> covering the entire memory with BATs. This also happens when
> CONFIG_DEBUG_PAGEALLOC is selected or when booting with 'nobats'
> flag.
> 
> Also, if the page containing the kernel stack is not present in the
> MMU hash table, registers cannot be saved and a recursive DSI fault
> is encountered.
> 
> To allow hash_page() to properly do its job at all time and load the
> MMU hash table whenever needed, it must run with data MMU disabled.
> This means it must be called before re-enabling data MMU. To allow
> this, registers clobbered by hash_page() and create_hpte() have to
> be saved in the thread struct together with SRR0, SSR1, DAR and DSISR.
> It is also necessary to ensure that DSI prolog doesn't overwrite
> regs saved by prolog of the current running exception. That means:
> - DSI can only use SPRN_SPRG_SCRATCH0
> - Exceptions must free SPRN_SPRG_SCRATCH0 before writing to the stack.
> 
> This also fixes the Oops reported by Erhard when create_hpte() is
> called by add_hash_page().
> 
> Due to prolog size increase, a few more exceptions had to get split
> in two parts.
> 
> Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
> Reported-by: Erhard F. 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206501
> Signed-off-by: Christophe Leroy 
> Tested-by: Erhard F. 
> Tested-by: Larry Finger 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/232ca1eecafed8c54491017f0612c33d8c742d74

cheers


Re: [PATCH RESEND with Fixes: tag] powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK

2020-02-19 Thread Michael Ellerman
On Fri, 2020-02-14 at 08:39:50 UTC, Christophe Leroy wrote:
> With CONFIG_VMAP_STACK, data MMU has to be enabled
> to read data on the stack.
> 
> Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/5a528eb67908bcf493f3583cb4726fbd8e129605

cheers


Re: [PATCH] powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK

2020-02-19 Thread Michael Ellerman
On Fri, 2020-02-14 at 06:53:00 UTC, Christophe Leroy wrote:
> power_save_ppc32_restore() is called during exception entry, before
> re-enabling the MMU. It subtracts KERNELBASE from the address
> of nap_save_msscr0 to access it.
> 
> With CONFIG_VMAP_STACK enabled, data MMU translation has already been
> re-enabled, so power_save_ppc32_restore() has to access
> nap_save_msscr0 by its virtual address.
> 
> Reported-by: Larry Finger 
> Signed-off-by: Christophe Leroy 
> Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
> Tested-by: Larry Finger 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/477f3488a94e35380c82a7498d46f10fa5f3edd2

cheers


Re: [PATCH v3 1/3] powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery

2020-02-19 Thread Michael Ellerman
On Tue, 2020-02-11 at 03:38:29 UTC, Gustavo Luiz Duarte wrote:
> After a treclaim, we expect to be in non-transactional state. If we don't 
> clear
> the current thread's MSR[TS] before we get preempted, then
> tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
> suspended transaction state.
> 
> When handling a signal caught in transactional state, handle_rt_signal64()
> calls get_tm_stackpointer() that treclaims the transaction using
> tm_reclaim_current() but without clearing the thread's MSR[TS]. This can cause
> the TM Bad Thing exception below if later we pagefault and get preempted 
> trying
> to access the user's sigframe, using __put_user(). Afterwards, when we are
> rescheduled back into do_page_fault() (but now in suspended state since the
> thread's MSR[TS] was not cleared), upon executing 'rfid' after completion of
> the page fault handling, the exception is raised because a transition from
> suspended to non-transactional state is invalid.
> 
>   Unexpected TM Bad Thing exception at c000de44 (msr 
> 0x800302a03031) tm_scratch=80010280b033
>   Oops: Unrecoverable exception, sig: 6 [#1]
>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>   Modules linked in: nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
> nf_defrag_ipv4 ip6_tables ip_tables nft_compat ip_set nf_tables nfnetlink xts 
> vmx_crypto sg virtio_balloon
>   r_mod cdrom virtio_net net_failover virtio_blk virtio_scsi failover 
> dm_mirror dm_region_hash dm_log dm_mod
>   CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
>   NIP:  c000de44 LR: c0034728 CTR: 
>   REGS: c0003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
>   MSR:  800302a03031   CR: 44000884 
>  XER: 
>   CFAR: c000dda4 IRQMASK: 0
>   PACATMSCRATCH: 80010280b033
>   GPR00: c0034728 c00f65a17c80 c1662800 
> 7fffacf3fd78
>   GPR04: 1000 1000  
> c00f611f8af0
>   GPR08:  78006001  
> 000c
>   GPR12: c00f611f84b0 c0003ffcb200  
> 
>   GPR16:    
> 
>   GPR20:    
> c00f611f8140
>   GPR24:  7fffacf3fd68 c00f65a17d90 
> c00f611f7800
>   GPR28: c00f65a17e90 c00f65a17e90 c1685e18 
> 7fffacf3f000
>   NIP [c000de44] fast_exception_return+0xf4/0x1b0
>   LR [c0034728] handle_rt_signal64+0x78/0xc50
>   Call Trace:
>   [c00f65a17c80] [c0034710] handle_rt_signal64+0x60/0xc50 
> (unreliable)
>   [c00f65a17d30] [c0023640] do_notify_resume+0x330/0x460
>   [c00f65a17e20] [c000dcc4] ret_from_except_lite+0x70/0x74
>   Instruction dump:
>   7c4ff120 e8410170 7c5a03a6 3840 f8410060 e8010070 e8410080 e8610088
>   6000 6000 e8810090 e8210078 <4c24> 4800 e8610178 
> 88ed0989
>   ---[ end trace 93094aa44b442f87 ]---
> 
> The simplified sequence of events that triggers the above exception is:
> 
>   ... # userspace in NON-TRANSACTIONAL state
>   tbegin  # userspace in TRANSACTIONAL state
>   signal delivery # kernelspace in SUSPENDED state
>   handle_rt_signal64()
> get_tm_stackpointer()
>   treclaim# kernelspace in NON-TRANSACTIONAL state
> __put_user()
>   page fault happens. We will never get back here because of the TM Bad 
> Thing exception.
> 
>   page fault handling kicks in and we voluntarily preempt ourselves
>   do_page_fault()
> __schedule()
>   __switch_to(other_task)
> 
>   our task is rescheduled and we recheckpoint because the thread's MSR[TS] 
> was not cleared
>   __switch_to(our_task)
> switch_to_tm()
>   tm_recheckpoint_new_task()
> trechkpt  # kernelspace in SUSPENDED state
> 
>   The page fault handling resumes, but now we are in suspended transaction 
> state
>   do_page_fault()completes
>   rfid <- trying to get back where the page fault happened (we were 
> non-transactional back then)
>   TM Bad Thing# illegal transition from suspended to 
> non-transactional
> 
> This patch fixes that issue by clearing the current thread's MSR[TS] just 
> after
> treclaim in get_tm_stackpointer() so that we stay in non-transactional state 
> in
> case we are preempted. In order to make treclaim and clearing the thread's
> MSR[TS] atomic from a preemption perspective when CONFIG_PREEMPT is set,
> preempt_disable/enable() is used. It's also necessary to save the previous
> value of the thread's MSR before get_tm_stackpointer() is called so that it 
> can
> be exposed to the signal handler later in 

Re: [PATCH] powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss

2020-02-19 Thread Michael Ellerman
On Sun, 2020-02-09 at 18:14:42 UTC, Christophe Leroy wrote:
> In the ITLB miss handler, the line supposed to clear bits 20-23 of the
> L2 ITLB entry is buggy and does nothing, leading to an undefined
> value which could allow execution when it shouldn't.
> 
> Properly do the clearing with the relevant instruction.
> 
> Fixes: 74fabcadfd43 ("powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB 
> Miss handlers")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/a4031afb9d10d97f4d0285844abbc0ab04245304

cheers


Re: [PATCH v2] powerpc/hugetlb: Fix 8M hugepages on 8xx

2020-02-19 Thread Michael Ellerman
On Sun, 2020-02-09 at 16:02:41 UTC, Christophe Leroy wrote:
> With HW assistance all page tables must be 4k aligned, the 8xx
> drops the last 12 bits during the walk.
> 
> Redefine HUGEPD_SHIFT_MASK to mask last 12 bits out.
> HUGEPD_SHIFT_MASK is used for alignment of the page table cache.
> 
> Fixes: 22569b881d37 ("powerpc/8xx: Enable 8M hugepage support with HW 
> assistance")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/50a175dd18de7a647e72aca7daf4744e3a5a81e3

cheers


Re: [PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB

2020-02-19 Thread Michael Ellerman
On Fri, 2020-02-07 at 04:57:31 UTC, Sam Bobroff wrote:
> Recovering a dead PHB can currently cause a deadlock as the PCI
> rescan/remove lock is taken twice.
> 
> This is caused as part of an existing bug in
> eeh_handle_special_event(). The pe is processed while traversing the
> PHBs even though the pe is unrelated to the loop. This causes the pe
> to be incorrectly processed more than once.
> 
> Untangling this section can move the pe processing out of the loop and
> also outside the locked section, correcting both problems.
> 
> Signed-off-by: Sam Bobroff 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d4f194ed9eb9841a8f978710e4d24296f791a85b

cheers


Re: [PATCH] powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size

2020-02-19 Thread Michael Ellerman
On Thu, 2020-02-06 at 13:50:28 UTC, Christophe Leroy wrote:
> Commit 55c8fc3f4930 ("powerpc/8xx: reintroduce 16K pages with HW
> assistance") redefined pte_t as a struct of 4 pte_basic_t, because
> in 16K pages mode there are four identical entries in the
> page table. But the size of hugepage tables is calculated based on
> the size of (void *). Therefore, we end up with page tables
> of size 1k instead of 4k for 512k pages.
> 
> As 512k hugepage tables are the same size as standard page tables,
> ie 4k, use the standard page tables instead of PGT_CACHE tables.
> 
> Fixes: 3fb69c6a1a13 ("powerpc/8xx: Enable 512k hugepage support with HW 
> assistance")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/f2b67ef90b0d5eca0f2255e02cf2f620bc0ddcdb

cheers
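
The arithmetic behind the 1k vs 4k quoted above (ppc32, 16k pages mode):

	sizeof(pte_basic_t) = 4 bytes
	sizeof(pte_t)       = 4 * sizeof(pte_basic_t) = 16 bytes
	sizeof(void *)      = 4 bytes

so sizing the 512k hugepage tables by sizeof(void *) instead of
sizeof(pte_t) makes them four times too small: 1k instead of 4k.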


Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables

2020-02-19 Thread Christophe Leroy




Le 16/02/2020 à 09:18, Mike Rapoport a écrit :

From: Mike Rapoport 

Implement primitives necessary for the 4th level folding, add walks of p4d
level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.

Signed-off-by: Mike Rapoport 
Tested-by: Christophe Leroy  # 8xx and 83xx
---
  arch/powerpc/include/asm/book3s/32/pgtable.h  |  1 -
  arch/powerpc/include/asm/book3s/64/hash.h |  4 +-
  arch/powerpc/include/asm/book3s/64/pgalloc.h  |  4 +-
  arch/powerpc/include/asm/book3s/64/pgtable.h  | 58 ++
  arch/powerpc/include/asm/book3s/64/radix.h|  6 +-
  arch/powerpc/include/asm/nohash/32/pgtable.h  |  1 -
  arch/powerpc/include/asm/nohash/64/pgalloc.h  |  2 +-
  .../include/asm/nohash/64/pgtable-4k.h| 32 +-
  arch/powerpc/include/asm/nohash/64/pgtable.h  |  6 +-
  arch/powerpc/include/asm/pgtable.h|  8 +++
  arch/powerpc/kvm/book3s_64_mmu_radix.c| 59 ---
  arch/powerpc/lib/code-patching.c  |  7 ++-
  arch/powerpc/mm/book3s32/mmu.c|  2 +-
  arch/powerpc/mm/book3s32/tlb.c|  4 +-
  arch/powerpc/mm/book3s64/hash_pgtable.c   |  4 +-
  arch/powerpc/mm/book3s64/radix_pgtable.c  | 19 --
  arch/powerpc/mm/book3s64/subpage_prot.c   |  6 +-
  arch/powerpc/mm/hugetlbpage.c | 28 +
  arch/powerpc/mm/kasan/kasan_init_32.c |  8 +--
  arch/powerpc/mm/mem.c |  4 +-
  arch/powerpc/mm/nohash/40x.c  |  4 +-
  arch/powerpc/mm/nohash/book3e_pgtable.c   | 15 +++--
  arch/powerpc/mm/pgtable.c | 25 +++-
  arch/powerpc/mm/pgtable_32.c  | 28 +
  arch/powerpc/mm/pgtable_64.c  | 10 ++--
  arch/powerpc/mm/ptdump/hashpagetable.c| 20 ++-
  arch/powerpc/mm/ptdump/ptdump.c   | 22 ++-
  arch/powerpc/xmon/xmon.c  | 17 +-
  28 files changed, 284 insertions(+), 120 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 206156255247..7bd4b81d5b5d 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -277,9 +277,9 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, 
unsigned long start)
}
  }
  
-static void walk_pud(struct pg_state *st, pgd_t *pgd, unsigned long start)

+static void walk_pud(struct pg_state *st, p4d_t *p4d, unsigned long start)
  {
-   pud_t *pud = pud_offset(pgd, 0);
+   pud_t *pud = pud_offset(p4d, 0);
unsigned long addr;
unsigned int i;
  
@@ -293,6 +293,22 @@ static void walk_pud(struct pg_state *st, pgd_t *pgd, unsigned long start)

}
  }
  
+static void walk_p4d(struct pg_state *st, pgd_t *pgd, unsigned long start)

+{
+   p4d_t *p4d = p4d_offset(pgd, 0);
+   unsigned long addr;
+   unsigned int i;
+
+   for (i = 0; i < PTRS_PER_P4D; i++, p4d++) {
+   addr = start + i * P4D_SIZE;
+   if (!p4d_none(*p4d) && !p4d_is_leaf(*p4d))
+   /* p4d exists */
+   walk_pud(st, p4d, addr);
+   else
+   note_page(st, addr, 2, p4d_val(*p4d));


Level 2 is already used by walk_pud().

I think you have to increment the level used in walk_pud() and 
walk_pmd() and walk_pte()



+   }
+}
+
  static void walk_pagetables(struct pg_state *st)
  {
unsigned int i;
@@ -306,7 +322,7 @@ static void walk_pagetables(struct pg_state *st)
for (i = pgd_index(addr); i < PTRS_PER_PGD; i++, pgd++, addr += 
PGDIR_SIZE) {
if (!pgd_none(*pgd) && !pgd_is_leaf(*pgd))
/* pgd exists */
-   walk_pud(st, pgd, addr);
+   walk_p4d(st, pgd, addr);
else
note_page(st, addr, 1, pgd_val(*pgd));
}


Christophe


Re: powerpc Linux scv support and scv system call ABI proposal

2020-02-19 Thread Nicholas Piggin
Tulio Magno Quites Machado Filho's on January 30, 2020 1:51 am:
> Nicholas Piggin  writes:
> 
>> Adhemerval Zanella's on January 29, 2020 3:26 am:
>>> 
>>> We already had to push a similar hack where glibc used to abort transactions
>>> prior syscalls to avoid some side-effects on kernel (commit 56cf2763819d2f).
>>> It was eventually removed from syscall handling by f0458cf4f9ff3d870, where
>>> we only enable TLE if kernel suppors PPC_FEATURE2_HTM_NOSC.
>>> 
>>> The transaction syscall abort used to read a variable directly from TCB,
>>> so this could be an option. I would expect that we could optimize it where
>>> if glibc is building against a recent kernel and compiler is building
>>> for a ISA 3.0+ cpu we could remove the 'sc' code.
>>> 
>>
>> We would just have to be careful of running on ISA 3.0 CPUs on older
>> kernels which do not support scv.
> 
> Can we assume that, if a syscall is available through sc it's also available
> in scv 0?

Was on vacation, thanks for waiting.

Yes, except for the difference in calling convention, we would require
that the syscalls available to `sc` are exactly the same as for `scv 0`. This
happens as a natural consequence of the kernel implementation which
re-uses code to select the syscall.

> 
> Because if that's true, I believe your suggestion to interpret 
> PPC_FEATURE2_SCV
> as scv 0 support would be helpful to provide this support via IFUNC even
> when glibc is built using --with-cpu=power8, which is the most common scenario
> in ppc64le.
> 
> In that scenario, it seems new HWCAP bits for new vectors wouldn't be too
> frequent, which was the only downside of this proposal.

Okay good feedback, thanks.

Thanks,
Nick
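
A minimal userspace sketch of the HWCAP2-based selection discussed above
(the PPC_FEATURE2_SCV value is assumed from the kernel uapi headers; this
is not glibc's actual implementation):

	#include <sys/auxv.h>

	#ifndef PPC_FEATURE2_SCV
	#define PPC_FEATURE2_SCV	0x00100000	/* scv syscall available */
	#endif

	static int have_scv(void)
	{
		/* the kernel advertises scv 0 support via HWCAP2 */
		return !!(getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV);
	}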


[PATCH] powerpc/xmon: Lower limits on nidump and ndump

2020-02-19 Thread Michael Ellerman
In xmon we have two variables that are used by the dump commands.
There's ndump which is the number of bytes to dump using 'd', and
nidump which is the number of instructions to dump using 'di'.

ndump starts as 64 and nidump starts as 16, but both can be set by the
user.

It's fairly common to be pasting addresses into xmon when trying to
debug something, and if you inadvertently double paste an address like
so:

  0:mon> di c2101f6c c2101f6c

The second value is interpreted as the number of instructions to dump.

Luckily it doesn't dump 13 quintillion instructions; the value is
limited to MAX_DUMP (128K). But as each instruction is dumped on a
single line, that's still a lot of output. If you're on a slow console,
that can take multiple minutes to print. If you were "just popping in
and out of xmon quickly before the RCU/hardlockup detector fires" you
are now having a bad day.

Things are not as bad with 'd' because we print 16 bytes per line, so
it's fewer lines. But it's still quite a lot.

So shrink the maximum for 'd' to 64K (one page), which is 4096 lines.
For 'di' add a new limit which is the above / 4 - because instructions
are 4 bytes, meaning again we can dump one page.
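
Worked out, with one page here being 64K:

	'd'  : MAX_DUMP  = 64K bytes, at 16 bytes per line = 4096 lines
	'di' : MAX_IDUMP = 64K / 4 = 16384 instructions = one page of code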

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/xmon/xmon.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e8c84d265602..722bf7ed0eda 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -81,8 +81,9 @@ static bool xmon_is_ro = 
IS_ENABLED(CONFIG_XMON_DEFAULT_RO_MODE);
 
 static unsigned long adrs;
 static int size = 1;
-#define MAX_DUMP (128 * 1024)
+#define MAX_DUMP (64 * 1024)
 static unsigned long ndump = 64;
+#define MAX_IDUMP (MAX_DUMP >> 2)
 static unsigned long nidump = 16;
 static unsigned long ncsum = 4096;
 static int termch;
@@ -2756,8 +2757,8 @@ dump(void)
scanhex();
if (nidump == 0)
nidump = 16;
-   else if (nidump > MAX_DUMP)
-   nidump = MAX_DUMP;
+   else if (nidump > MAX_IDUMP)
+   nidump = MAX_IDUMP;
adrs += ppc_inst_dump(adrs, nidump, 1);
last_cmd = "di\n";
} else if (c == 'l') {
-- 
2.21.1



Re: Surprising code generated for vdso_read_begin()

2020-02-19 Thread Arnd Bergmann
On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
 wrote:
> Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> >  wrote:
> >>
> >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
>  It looks like the compiler did loop peeling.  What GCC version is this?
>  Please try current trunk (to become GCC 10), or at least GCC 9?
> >>>
> >>> It is with GCC 5.5
> >>>
> >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> >>> recent than 8.1
> >>
> >> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> >> this hard and/or painful to do?
> >
> > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> > binaries, as well as a recent 10.0 snapshot.
> >
>
> Thanks Arnd,
>
> I have built the VDSO with 9.2, I get a less performant result than with
> 8.2 (same performance as with 5.5).
>
> After a quick look, I see:
> - Irrelevant NOPs to align loops and stuff, although -mcpu=860 should
> avoid that.
> - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> 8.1 doesn't need that, all VDSO functions are frameless with 8.1

If you think it should be fixed in gcc, maybe try to reproduce it in
https://godbolt.org/ and open a gcc bug against that.

Also, please try the gcc-10 snapshot, which has the highest chance
of getting fixes if it shows the same issue (or worse).

 Arnd


Re: Surprising code generated for vdso_read_begin()

2020-02-19 Thread Christophe Leroy




Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :

On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
 wrote:


On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:

Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :

It looks like the compiler did loop peeling.  What GCC version is this?
Please try current trunk (to become GCC 10), or at least GCC 9?


It is with GCC 5.5

https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
recent than 8.1


Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
this hard and/or painful to do?


To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
binaries, as well as a recent 10.0 snapshot.



Thanks Arnd,

I have built the VDSO with 9.2, I get a less performant result than with 
8.2 (same performance as with 5.5).


After a quick look, I see:
- Irrelevant NOPs to align loops and stuff, although -mcpu=860 should 
avoid that.
- A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC 
8.1 doesn't need that, all VDSO functions are frameless with 8.1


Christophe


[PATCH] powerpc/kprobes: Blacklist functions running with MMU disabled on PPC32

2020-02-19 Thread Christophe Leroy
kprobes does not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

As already done for PPC64, do it for PPC32.
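
For comparison, C functions running in real mode get blacklisted with the
existing generic annotation (a reminder of the C-side equivalent, not part
of this diff):

	static void real_mode_helper(void)
	{
		/* runs with the MMU disabled */
	}
	NOKPROBE_SYMBOL(real_mode_helper);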

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ppc_asm.h   | 10 +++
 arch/powerpc/kernel/cpu_setup_6xx.S  |  4 +-
 arch/powerpc/kernel/entry_32.S   | 68 
 arch/powerpc/kernel/fpu.S|  1 +
 arch/powerpc/kernel/idle_6xx.S   |  2 +-
 arch/powerpc/kernel/idle_e500.S  |  2 +-
 arch/powerpc/kernel/l2cr_6xx.S   |  2 +-
 arch/powerpc/kernel/misc.S   |  2 +
 arch/powerpc/kernel/misc_32.S|  4 +-
 arch/powerpc/kernel/swsusp_32.S  |  6 +-
 arch/powerpc/kernel/vector.S |  1 +
 arch/powerpc/mm/book3s32/hash_low.S  | 38 +--
 arch/powerpc/mm/mem.c|  2 +
 arch/powerpc/platforms/52xx/lite5200_sleep.S |  2 +
 arch/powerpc/platforms/82xx/pq2.c|  1 +
 arch/powerpc/platforms/83xx/suspend-asm.S|  1 +
 arch/powerpc/platforms/powermac/cache.S  |  2 +
 arch/powerpc/platforms/powermac/sleep.S  | 13 ++--
 18 files changed, 85 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 6b03dff61a05..e8f34ba89497 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -267,8 +267,18 @@ GLUE(.,name):
.pushsection "_kprobe_blacklist","aw";  \
PPC_LONG (entry) ;  \
.popsection
+#define _NOKPROBE_ENTRY(entry) \
+   _ASM_NOKPROBE_SYMBOL(entry) \
+   _ENTRY(entry)
+#define _NOKPROBE_GLOBAL(entry)\
+   _ASM_NOKPROBE_SYMBOL(entry) \
+   _GLOBAL(entry)
 #else
 #define _ASM_NOKPROBE_SYMBOL(entry)
+#define _NOKPROBE_ENTRY(entry) \
+   _ENTRY(entry)
+#define _NOKPROBE_GLOBAL(entry)\
+   _GLOBAL(entry)
 #endif
 
 #define FUNC_START(name)   _GLOBAL(name)
diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S 
b/arch/powerpc/kernel/cpu_setup_6xx.S
index f6517f67265a..1cb947268546 100644
--- a/arch/powerpc/kernel/cpu_setup_6xx.S
+++ b/arch/powerpc/kernel/cpu_setup_6xx.S
@@ -276,7 +276,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NO_DPM)
  * in some 750 cpus where using a not yet initialized FPU register after
  * power on reset may hang the CPU
  */
-_GLOBAL(__init_fpu_registers)
+_NOKPROBE_GLOBAL(__init_fpu_registers)
mfmsr   r10
ori r11,r10,MSR_FP
mtmsr   r11
@@ -381,7 +381,7 @@ _GLOBAL(__save_cpu_setup)
  * restore CPU state as backed up by the previous
  * function. This does not include cache setting
  */
-_GLOBAL(__restore_cpu_setup)
+_NOKPROBE_GLOBAL(__restore_cpu_setup)
/* Some CR fields are volatile, we back it up all */
mfcrr7
 
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0713daa651d9..cf9a7640abf0 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -44,24 +44,21 @@
.align  12
 
 #ifdef CONFIG_BOOKE
-   .globl  mcheck_transfer_to_handler
-mcheck_transfer_to_handler:
+_NOKPROBE_ENTRY(mcheck_transfer_to_handler)
mfspr   r0,SPRN_DSRR0
stw r0,_DSRR0(r11)
mfspr   r0,SPRN_DSRR1
stw r0,_DSRR1(r11)
/* fall through */
 
-   .globl  debug_transfer_to_handler
-debug_transfer_to_handler:
+_NOKPROBE_ENTRY(debug_transfer_to_handler)
mfspr   r0,SPRN_CSRR0
stw r0,_CSRR0(r11)
mfspr   r0,SPRN_CSRR1
stw r0,_CSRR1(r11)
/* fall through */
 
-   .globl  crit_transfer_to_handler
-crit_transfer_to_handler:
+_NOKPROBE_ENTRY(crit_transfer_to_handler)
 #ifdef CONFIG_PPC_BOOK3E_MMU
mfspr   r0,SPRN_MAS0
stw r0,MAS0(r11)
@@ -97,8 +94,7 @@ crit_transfer_to_handler:
 #endif
 
 #ifdef CONFIG_40x
-   .globl  crit_transfer_to_handler
-crit_transfer_to_handler:
+_NOKPROBE_ENTRY(crit_transfer_to_handler)
lwz r0,crit_r10@l(0)
stw r0,GPR10(r11)
lwz r0,crit_r11@l(0)
@@ -124,13 +120,11 @@ crit_transfer_to_handler:
  * Note that we rely on the caller having set cr0.eq iff the exception
  * occurred in kernel mode (i.e. MSR:PR = 0).
  */
-   .globl  transfer_to_handler_full
-transfer_to_handler_full:
+_NOKPROBE_ENTRY(transfer_to_handler_full)
SAVE_NVGPRS(r11)
/* fall through */
 
-   .globl  transfer_to_handler
-transfer_to_handler:
+_NOKPROBE_ENTRY(transfer_to_handler)
stw r2,GPR2(r11)
stw r12,_NIP(r11)
stw r9,_MSR(r11)
@@ -194,8 +188,7 @@ transfer_to_handler:
bt- 31-TLF_NAPPING,4f
bt- 31-TLF_SLEEPING,7f
 #endif /* CONFIG_PPC_BOOK3S_32 || CONFIG_E500 */
-   .globl 

[PATCH v2] powerpc/kprobes: Remove redundant code

2020-02-19 Thread Christophe Leroy
At the moment we have something like

if (something) {
p = get();
if (p) {
if (something_wrong)
goto out;
...
return;
} else if (a != b) {
if (some_error)
goto out;
...
}
goto out;
}
p = get();
if (!p) {
if (a != b) {
if (some_error)
goto out;
...
}
goto out;
}

This is similar to

p = get();
if (!p) {
if (a != b) {
if (some_error)
goto out;
...
}
goto out;
}
if (something) {
if (something_wrong)
goto out;
...
return;
}

Signed-off-by: Christophe Leroy 
---
v2: Reverse the logic by testing (!p) before kprobe_running() as suggested by 
Naveen.
---
 arch/powerpc/kernel/kprobes.c | 80 ++-
 1 file changed, 32 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9b340af02c38..84567406b53d 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -274,54 +274,6 @@ int kprobe_handler(struct pt_regs *regs)
preempt_disable();
kcb = get_kprobe_ctlblk();
 
-   /* Check we're not actually recursing */
-   if (kprobe_running()) {
-   p = get_kprobe(addr);
-   if (p) {
-   kprobe_opcode_t insn = *p->ainsn.insn;
-   if (kcb->kprobe_status == KPROBE_HIT_SS &&
-   is_trap(insn)) {
-   /* Turn off 'trace' bits */
-   regs->msr &= ~MSR_SINGLESTEP;
-   regs->msr |= kcb->kprobe_saved_msr;
-   goto no_kprobe;
-   }
-   /* We have reentered the kprobe_handler(), since
-* another probe was hit while within the handler.
-* We here save the original kprobes variables and
-* just single step on the instruction of the new probe
-* without calling any user handlers.
-*/
-   save_previous_kprobe(kcb);
-   set_current_kprobe(p, regs, kcb);
-   kprobes_inc_nmissed_count(p);
-   kcb->kprobe_status = KPROBE_REENTER;
-   if (p->ainsn.boostable >= 0) {
-   ret = try_to_emulate(p, regs);
-
-   if (ret > 0) {
-   restore_previous_kprobe(kcb);
-   preempt_enable_no_resched();
-   return 1;
-   }
-   }
-   prepare_singlestep(p, regs);
-   return 1;
-   } else if (*addr != BREAKPOINT_INSTRUCTION) {
-   /* If trap variant, then it belongs not to us */
-   kprobe_opcode_t cur_insn = *addr;
-
-   if (is_trap(cur_insn))
-   goto no_kprobe;
-   /* The breakpoint instruction was removed by
-* another cpu right after we hit, no further
-* handling of this interrupt is appropriate
-*/
-   ret = 1;
-   }
-   goto no_kprobe;
-   }
-
p = get_kprobe(addr);
if (!p) {
if (*addr != BREAKPOINT_INSTRUCTION) {
@@ -346,6 +298,38 @@ int kprobe_handler(struct pt_regs *regs)
goto no_kprobe;
}
 
+   /* Check we're not actually recursing */
+   if (kprobe_running()) {
+   kprobe_opcode_t insn = *p->ainsn.insn;
+   if (kcb->kprobe_status == KPROBE_HIT_SS && is_trap(insn)) {
+   /* Turn off 'trace' bits */
+   regs->msr &= ~MSR_SINGLESTEP;
+   regs->msr |= kcb->kprobe_saved_msr;
+   goto no_kprobe;
+   }
+   /* We have reentered the kprobe_handler(), since
+* another probe was hit while within the handler.
+* We here save the original kprobes variables and
+* just single step on the instruction of the new probe
+* without calling any user handlers.
+*/
+