Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-07 Thread Christoph Hellwig
On Tue, Sep 08, 2020 at 11:51:06AM +1000, Alexey Kardashevskiy wrote:
> What is dma_get_required_mask() for anyway? What "requires" what here?

Yes, it is a really odd API.  It comes from classic old PCI where
64-bit addressing required an additional bus cycle, and various devices
had different addressing schemes, with the smaller addresses being
more efficient.  So this allows the driver to request the "required"
addressing mode to address all memory.  "preferred" might be a better
name as we'll bounce buffer if it isn't met.  I also don't really see
why a driver would ever want to use it for a modern PCIe device.
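
For illustration, a minimal sketch of how a driver might use the API at
probe time (foo_probe and the fallback policy are made up, not from this
thread; dma_get_required_mask() and dma_set_mask_and_coherent() are the
real kernel interfaces):

#include <linux/dma-mapping.h>
#include <linux/pci.h>

static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	/* Ask for the "required" (i.e. preferred) mask first... */
	u64 required = dma_get_required_mask(&pdev->dev);

	if (dma_set_mask_and_coherent(&pdev->dev, required) == 0)
		return 0;

	/* ...and fall back to 32-bit, accepting possible bounce buffering. */
	return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
}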


Re: [RFC PATCH v2 0/3] mm/gup: fix gup_fast with dynamic page table folding

2020-09-07 Thread Christophe Leroy




Le 07/09/2020 à 22:12, Mike Rapoport a écrit :

On Mon, Sep 07, 2020 at 08:00:55PM +0200, Gerald Schaefer wrote:

This is v2 of an RFC previously discussed here:
https://lore.kernel.org/lkml/20200828140314.8556-1-gerald.schae...@linux.ibm.com/

Patch 1 is a fix for a regression in gup_fast on s390, after our conversion
to common gup_fast code. It will introduce special helper functions
pXd_addr_end_folded(), which have to be used in places where pagetable walk
is done w/o lock and with READ_ONCE, so currently only in gup_fast.

Patch 2 is an attempt to make that more generic, i.e. change pXd_addr_end()
themselves by adding an extra pXd value parameter. That was suggested by
Jason during v1 discussion, because he is already thinking of some other
places where he might want to switch to the READ_ONCE logic for pagetable
walks. In general, that would be the cleanest / safest solution, but there
is some impact on other architectures and common code, hence the new and
greatly enlarged recipient list.

Patch 3 is a "nice to have" add-on, which makes pXd_addr_end() inline
functions instead of #defines, so that we get some type checking for the
new pXd value parameter.

Not sure about Fixes/stable tags for the generic solution. Only patch 1
fixes a real bug on s390, and has Fixes/stable tags. Patches 2 + 3 might
still be nice to have in stable, to ease future backports, but I guess
"nice to have" does not really qualify for stable backports.


I also think that adding the pXd parameter to pXd_addr_end() is a cleaner
way, and with it patch 1 is not really required. I would even merge
patches 2 and 3 into a single patch and use only that as the fix.


Merging patches 2 and 3 is fine, but I would keep patch 1 separate and
move it after the generic changes, so that we first do the generic
changes, then the s390-specific use of them.


Christophe


Re: [RFC PATCH v2 3/3] mm: make generic pXd_addr_end() macros inline functions

2020-09-07 Thread Christophe Leroy




Le 07/09/2020 à 20:00, Gerald Schaefer a écrit :

From: Alexander Gordeev 

Since pXd_addr_end() macros take a pXd page-table entry as a
parameter, it makes sense to check the entry type at compile time.
Even though most archs do not make use of page-table entries
in pXd_addr_end() calls, checking the type in traversal code
paths could help to avoid subtle bugs.
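
As an illustration of the kind of bug this catches (a sketch, not part of
the patch): with the inline version, passing the wrong entry type becomes
a compile error, where the old macro silently ignored the argument. The
walker below is hypothetical:

/* Hypothetical walker fragment, for illustration only. */
static unsigned long walk_pud(pud_t *pudp, unsigned long addr, unsigned long end)
{
	pud_t pud = READ_ONCE(*pudp);

	/* Accidentally passing e.g. a pmd_t here still compiles with the
	 * macro version; with the inline function it is a type error.
	 */
	return pud_addr_end(pud, addr, end);
}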

Signed-off-by: Alexander Gordeev 
Signed-off-by: Gerald Schaefer 
---
  include/linux/pgtable.h | 36 
  1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 67ebc22cf83d..d9e7d16c2263 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -656,31 +656,35 @@ static inline int arch_unmap_one(struct mm_struct *mm,
   */
  
  #ifndef pgd_addr_end

-#define pgd_addr_end(pgd, addr, end)   \
-({ unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end); \
-})
+#define pgd_addr_end pgd_addr_end


I think that #define is pointless; usually there is no such #define for
the default case.



+static inline unsigned long pgd_addr_end(pgd_t pgd, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}


Please use the standard layout, i.e. entry { and exit } alone on their
own lines, and a blank line between the local var declarations and the rest.


Also remove the leading __ in front of the var names, as it's not needed
once it is not a macro anymore.


f_name()
{
some_local_var;

do_something();
}
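
Applied to the first helper, that would give something like this
(untested sketch):

static inline unsigned long pgd_addr_end(pgd_t pgd, unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}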


  #endif
  
  #ifndef p4d_addr_end

-#define p4d_addr_end(p4d, addr, end)   \
-({ unsigned long __boundary = ((addr) + P4D_SIZE) & P4D_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end); \
-})
+#define p4d_addr_end p4d_addr_end
+static inline unsigned long p4d_addr_end(p4d_t p4d, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + P4D_SIZE) & P4D_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
  #endif
  
  #ifndef pud_addr_end

-#define pud_addr_end(pud, addr, end)   \
-({ unsigned long __boundary = ((addr) + PUD_SIZE) & PUD_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end); \
-})
+#define pud_addr_end pud_addr_end
+static inline unsigned long pud_addr_end(pud_t pud, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + PUD_SIZE) & PUD_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
  #endif
  
  #ifndef pmd_addr_end

-#define pmd_addr_end(pmd, addr, end)   \
-({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end); \
-})
+#define pmd_addr_end pmd_addr_end
+static inline unsigned long pmd_addr_end(pmd_t pmd, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + PMD_SIZE) & PMD_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
  #endif
  
  /*




Re: [RFC PATCH v2 2/3] mm: make pXd_addr_end() functions page-table entry aware

2020-09-07 Thread Christophe Leroy




Le 07/09/2020 à 20:00, Gerald Schaefer a écrit :

From: Alexander Gordeev 

Unlike all other page-table abstractions, pXd_addr_end() does not take
into account the particular table entry in whose context the function
is called. On architectures with dynamic page-table folding that
might lead to a lack of necessary information that is difficult to
obtain other than from the table entry itself. That already led to
a subtle memory corruption issue on s390.

By letting the pXd_addr_end() functions know about the page-table entry
we allow archs to make not only extra checks, but also optimizations.

As a result of this change, the pXd_addr_end_folded() functions used
in the gup_fast traversal code become unnecessary and get replaced with
universal pXd_addr_end() variants.

The arch-specific updates not only add dereferencing of the page-table
entry pointers, but also small changes to the code flow to make those
dereferences possible, at least for x86 and powerpc. Also for arm64,
but in a way that should not have any impact.



[...]



Signed-off-by: Alexander Gordeev 
Signed-off-by: Gerald Schaefer 
---
  arch/arm/include/asm/pgtable-2level.h|  2 +-
  arch/arm/mm/idmap.c  |  6 ++--
  arch/arm/mm/mmu.c|  8 ++---
  arch/arm64/kernel/hibernate.c| 16 ++
  arch/arm64/kvm/mmu.c | 16 +-
  arch/arm64/mm/kasan_init.c   |  8 ++---
  arch/arm64/mm/mmu.c  | 25 +++
  arch/powerpc/mm/book3s64/radix_pgtable.c |  7 ++---
  arch/powerpc/mm/hugetlbpage.c|  6 ++--


It seems you forgot arch/powerpc/mm/book3s64/subpage_prot.c.


  arch/s390/include/asm/pgtable.h  |  8 ++---
  arch/s390/mm/page-states.c   |  8 ++---
  arch/s390/mm/pageattr.c  |  8 ++---
  arch/s390/mm/vmem.c  |  8 ++---
  arch/sparc/mm/hugetlbpage.c  |  6 ++--
  arch/um/kernel/tlb.c |  8 ++---
  arch/x86/mm/init_64.c| 15 -
  arch/x86/mm/kasan_init_64.c  | 16 +-
  include/asm-generic/pgtable-nop4d.h  |  2 +-
  include/asm-generic/pgtable-nopmd.h  |  2 +-
  include/asm-generic/pgtable-nopud.h  |  2 +-
  include/linux/pgtable.h  | 26 ---
  mm/gup.c |  8 ++---
  mm/ioremap.c |  8 ++---
  mm/kasan/init.c  | 17 +-
  mm/madvise.c |  4 +--
  mm/memory.c  | 40 
  mm/mlock.c   | 18 ---
  mm/mprotect.c|  8 ++---
  mm/pagewalk.c|  8 ++---
  mm/swapfile.c|  8 ++---
  mm/vmalloc.c | 16 +-
  31 files changed, 165 insertions(+), 173 deletions(-)


Christophe


Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding

2020-09-07 Thread Christophe Leroy




Le 07/09/2020 à 20:00, Gerald Schaefer a écrit :

From: Alexander Gordeev 

Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
code") introduced a subtle but severe bug on s390 with gup_fast, due to
dynamic page table folding.

The question "What would it require for the generic code to work for s390"
has already been discussed here
https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
and ended with a promising approach here
https://lkml.kernel.org/r/20190419153307.4f2911b5@mschwideX1
which in the end unfortunately didn't quite work completely.

We tried to mimic static level folding by changing pgd_offset to always
calculate top level page table offset, and do nothing in folded pXd_offset.
What has been overlooked is that PxD_SIZE/MASK and thus pXd_addr_end do
not reflect this dynamic behaviour, and still act like static 5-level
page tables.



[...]



Fix this by introducing new pXd_addr_end_folded helpers, which take an
additional pXd entry value parameter that can be used on s390
to determine the correct page table level and return the corresponding
end / boundary. With that, the pointer iteration will always
happen in gup_pgd_range for s390. No change is introduced for other
architectures.


Not sure pXd_addr_end_folded() is the most understandable name,
although I don't have any alternative suggestion at the moment.
Maybe it could be something like pXd_addr_end_fixup(), as it will
disappear in the next patch, or pXd_addr_end_gup()?


Also, if it happens to be acceptable to get patch 2 into stable, I think
you should swap patch 1 and patch 2 to avoid the intermediate step
through pXd_addr_end_folded().





Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
Cc: <stable@vger.kernel.org> # 5.2+
Reviewed-by: Gerald Schaefer 
Signed-off-by: Alexander Gordeev 
Signed-off-by: Gerald Schaefer 
---
  arch/s390/include/asm/pgtable.h | 42 +
  include/linux/pgtable.h | 16 +
  mm/gup.c|  8 +++
  3 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7eb01a5459cd..027206e4959d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -512,6 +512,48 @@ static inline bool mm_pmd_folded(struct mm_struct *mm)
  }
  #define mm_pmd_folded(mm) mm_pmd_folded(mm)
  
+/*

+ * With dynamic page table levels on s390, the static pXd_addr_end() functions
+ * will not return corresponding dynamic boundaries. This is no problem as long
+ * as only pXd pointers are passed down during page table walk, because
+ * pXd_offset() will simply return the given pointer for folded levels, and the
+ * pointer iteration over a range simply happens at the correct page table
+ * level.
+ * It is however a problem with gup_fast, or other places walking the page
+ * tables w/o locks using READ_ONCE(), and passing down the pXd values instead
+ * of pointers. In this case, the pointer given to pXd_offset() is a pointer to
+ * a stack variable, which cannot be used for pointer iteration at the correct
+ * level. Instead, the iteration then has to happen by going up to pgd level
+ * again. To allow this, provide pXd_addr_end_folded() functions with an
+ * additional pXd value parameter, which can be used on s390 to determine the
+ * folding level and return the corresponding boundary.
+ */
+static inline unsigned long rste_addr_end_folded(unsigned long rste, unsigned long addr, unsigned long end)


What does 'rste' stand for?

Isn't this line a bit long?


+{
+   unsigned long type = (rste & _REGION_ENTRY_TYPE_MASK) >> 2;
+   unsigned long size = 1UL << (_SEGMENT_SHIFT + type * 11);
+   unsigned long boundary = (addr + size) & ~(size - 1);
+
+   /*
+* FIXME The below check is for internal testing only, to be removed
+*/
+   VM_BUG_ON(type < (_REGION_ENTRY_TYPE_R3 >> 2));
+
+   return (boundary - 1) < (end - 1) ? boundary : end;
+}
+
+#define pgd_addr_end_folded pgd_addr_end_folded
+static inline unsigned long pgd_addr_end_folded(pgd_t pgd, unsigned long addr, unsigned long end)
+{
+   return rste_addr_end_folded(pgd_val(pgd), addr, end);
+}
+
+#define p4d_addr_end_folded p4d_addr_end_folded
+static inline unsigned long p4d_addr_end_folded(p4d_t p4d, unsigned long addr, unsigned long end)
+{
+   return rste_addr_end_folded(p4d_val(p4d), addr, end);
+}
+
  static inline int mm_has_pgste(struct mm_struct *mm)
  {
  #ifdef CONFIG_PGSTE
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e8cbc2e795d5..981c4c2a31fe 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -681,6 +681,22 @@ static inline int arch_unmap_one(struct mm_struct *mm,
  })
  #endif
  
+#ifndef pgd_addr_end_folded

+#define pgd_addr_end_folded(pgd, addr, end)	pgd_addr_end(addr, end)
+#endif
+
+#ifndef p4d_addr_end_folded
+#define p4d_addr_end_folded(p4d, addr, end)	p4d_addr_end(addr, end)

Re: [RFC PATCH v2 0/3] mm/gup: fix gup_fast with dynamic page table folding

2020-09-07 Thread Christophe Leroy




Le 07/09/2020 à 20:00, Gerald Schaefer a écrit :

This is v2 of an RFC previously discussed here:
https://lore.kernel.org/lkml/20200828140314.8556-1-gerald.schae...@linux.ibm.com/

Patch 1 is a fix for a regression in gup_fast on s390, after our conversion
to common gup_fast code. It will introduce special helper functions
pXd_addr_end_folded(), which have to be used in places where pagetable walk
is done w/o lock and with READ_ONCE, so currently only in gup_fast.

Patch 2 is an attempt to make that more generic, i.e. change pXd_addr_end()
themselves by adding an extra pXd value parameter. That was suggested by
Jason during v1 discussion, because he is already thinking of some other
places where he might want to switch to the READ_ONCE logic for pagetable
walks. In general, that would be the cleanest / safest solution, but there
is some impact on other architectures and common code, hence the new and
greatly enlarged recipient list.

Patch 3 is a "nice to have" add-on, which makes pXd_addr_end() inline
functions instead of #defines, so that we get some type checking for the
new pXd value parameter.

Not sure about Fixes/stable tags for the generic solution. Only patch 1
fixes a real bug on s390, and has Fixes/stable tags. Patches 2 + 3 might
still be nice to have in stable, to ease future backports, but I guess
"nice to have" does not really qualify for stable backports.


If one day you have to backport a fix that requires patch 2 and/or 3, 
just mark it "depends-on:" and the patches will go in stable at the 
relevant time.


Christophe


[PATCH v2] kbuild: preprocess module linker script

2020-09-07 Thread Masahiro Yamada
There was a request to preprocess the module linker script like we
do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512)

The difference between vmlinux.lds and module.lds is that the latter
is needed for external module builds, thus must be cleaned up by
'make mrproper' instead of 'make clean'. Also, it must be created
by 'make modules_prepare'.

You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by
'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to
arch/$(SRCARCH)/include/asm/module.lds.h, which is included from
scripts/module.lds.S.

scripts/module.lds is fine because 'make clean' keeps all the
build artifacts under scripts/.

You can add arch-specific sections in <asm/module.lds.h>.
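
For illustration, the mechanism looks roughly like this (a sketch of the
shape only, not the actual file contents):

/* scripts/module.lds.S -- run through the C preprocessor like
 * vmlinux.lds.S, so each arch can contribute sections via its
 * <asm/module.lds.h> header.
 */
#include <asm/module.lds.h>

SECTIONS {
	/DISCARD/ : {
		*(.discard)
		*(.discard.*)
	}
}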

Signed-off-by: Masahiro Yamada 
Tested-by: Jessica Yu 
Acked-by: Will Deacon 
---

Changes in v2:
  - Fix the race between the two targets 'scripts' and 'asm-generic'

 Makefile   | 10 ++
 arch/arm/Makefile  |  4 
 .../{kernel/module.lds => include/asm/module.lds.h}|  2 ++
 arch/arm64/Makefile|  4 
 .../{kernel/module.lds => include/asm/module.lds.h}|  2 ++
 arch/ia64/Makefile |  1 -
 arch/ia64/{module.lds => include/asm/module.lds.h} |  0
 arch/m68k/Makefile |  1 -
 .../{kernel/module.lds => include/asm/module.lds.h}|  0
 arch/powerpc/Makefile  |  1 -
 .../{kernel/module.lds => include/asm/module.lds.h}|  0
 arch/riscv/Makefile|  3 ---
 .../{kernel/module.lds => include/asm/module.lds.h}|  3 ++-
 arch/um/include/asm/Kbuild |  1 +
 include/asm-generic/Kbuild |  1 +
 include/asm-generic/module.lds.h   | 10 ++
 scripts/.gitignore |  1 +
 scripts/Makefile   |  3 +++
 scripts/Makefile.modfinal  |  5 ++---
 scripts/{module-common.lds => module.lds.S}|  3 +++
 scripts/package/builddeb   |  2 +-
 21 files changed, 34 insertions(+), 23 deletions(-)
 rename arch/arm/{kernel/module.lds => include/asm/module.lds.h} (72%)
 rename arch/arm64/{kernel/module.lds => include/asm/module.lds.h} (76%)
 rename arch/ia64/{module.lds => include/asm/module.lds.h} (100%)
 rename arch/m68k/{kernel/module.lds => include/asm/module.lds.h} (100%)
 rename arch/powerpc/{kernel/module.lds => include/asm/module.lds.h} (100%)
 rename arch/riscv/{kernel/module.lds => include/asm/module.lds.h} (84%)
 create mode 100644 include/asm-generic/module.lds.h
 rename scripts/{module-common.lds => module.lds.S} (93%)

diff --git a/Makefile b/Makefile
index 37739ee53f27..97b1dae1783b 100644
--- a/Makefile
+++ b/Makefile
@@ -505,7 +505,6 @@ KBUILD_CFLAGS_KERNEL :=
 KBUILD_AFLAGS_MODULE  := -DMODULE
 KBUILD_CFLAGS_MODULE  := -DMODULE
 KBUILD_LDFLAGS_MODULE :=
-export KBUILD_LDS_MODULE := $(srctree)/scripts/module-common.lds
 KBUILD_LDFLAGS :=
 CLANG_FLAGS :=
 
@@ -1395,7 +1394,7 @@ endif
 # using awk while concatenating to the final file.
 
 PHONY += modules
-modules: $(if $(KBUILD_BUILTIN),vmlinux) modules_check
+modules: $(if $(KBUILD_BUILTIN),vmlinux) modules_check modules_prepare
$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 
 PHONY += modules_check
@@ -1412,6 +1411,7 @@ targets += modules.order
 # Target to prepare building external modules
 PHONY += modules_prepare
 modules_prepare: prepare
+   $(Q)$(MAKE) $(build)=scripts scripts/module.lds
 
 # Target to install modules
 PHONY += modules_install
@@ -1743,7 +1743,9 @@ help:
	@echo  '  clean   - remove generated files in module directory only'
@echo  ''
 
-PHONY += prepare
+# no-op for external module builds
+PHONY += prepare modules_prepare
+
 endif # KBUILD_EXTMOD
 
 # Single targets
@@ -1776,7 +1778,7 @@ MODORDER := .modules.tmp
 endif
 
 PHONY += single_modpost
-single_modpost: $(single-no-ko)
+single_modpost: $(single-no-ko) modules_prepare
	$(Q){ $(foreach m, $(single-ko), echo $(extmod-prefix)$m;) } > $(MODORDER)
$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4e877354515f..a0cb15de9677 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -16,10 +16,6 @@ LDFLAGS_vmlinux  += --be8
 KBUILD_LDFLAGS_MODULE  += --be8
 endif
 
-ifeq ($(CONFIG_ARM_MODULE_PLTS),y)
-KBUILD_LDS_MODULE  += $(srctree)/arch/arm/kernel/module.lds
-endif
-
 GZFLAGS:=-9
 #KBUILD_CFLAGS +=-pipe
 
diff --git a/arch/arm/kernel/module.lds b/arch/arm/include/asm/module.lds.h
similarity index 72%
rename from arch/arm/kernel/module.lds
rename to arch/arm/include/asm/module.lds.h
index 79cb6af565e5..0e7cb4e314b4 100644
--- a/arch/arm/kernel/module.lds
+++ 

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-09-07 Thread Alexey Kardashevskiy




On 04/09/2020 16:04, Leonardo Bras wrote:

On Thu, 2020-09-03 at 14:41 +1000, Alexey Kardashevskiy wrote:

I am new to this, so I am trying to understand how a memory page mapped
as DMA, and used for something else, could be a problem.


From the device perspective, there is PCI space and everything from 0
till 1<<64 is accessible, but what that is mapped to the device does
not know. The PHB's IOMMU is the thing that notices an invalid access
and raises EEH, but the PHB only knows about the PCI->physical memory
mapping (with IOMMU pages), nothing about the host kernel pages. Does
this help? Thanks,


According to our conversation on Slack:
1- There is a problem if a hypervisor gives its VMs contiguous
memory blocks that are not aligned to IOMMU pages, because then an
iommu_map_page() could map some memory in this VM and some memory in
another VM / process.
2- To guarantee this, we should have system pagesize >= iommu_pagesize

One way to get (2) is by doing this in enable_ddw():
if ((query.page_size & 4) && PAGE_SHIFT >= 24) {


You won't ever (well, not soon) see PAGE_SHIFT==24; it is either 4K or
64K. However, 16MB IOMMU pages are fine - if the hypervisor uses huge
pages for the VMs' RAM, it also then advertises huge IOMMU pages in
ddw-query. So for the 1:1 case there must be no "PAGE_SHIFT >= 24".




page_shift = 24; /* 16MB */
} else if ((query.page_size & 2) && PAGE_SHIFT >= 16) {
page_shift = 16; /* 64kB */
} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {
page_shift = 12; /* 4kB */
[...]

Another way of solving this would be adding to the LoPAR documentation
that the block size of contiguous memory the hypervisor gives a VM
should always be aligned to the IOMMU page size offered.


I think this is assumed already by the design of the DDW API.



I think the best approach would be first sending the above patch, which
is faster, and then working on adding that to the documentation, so
hypervisors guarantee this.

If this gets into the docs, we can revert the patch.

What do you think?

I think we diverged from the original patch :) I am not quite sure what
you were fixing there. Thanks,



--
Alexey


[PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask

2020-09-07 Thread Alexey Kardashevskiy
There are 2 problems with it:
1. "<" vs expected "<<"
2. the shifted value is an IOMMU page number mask, not an address mask,
as the IOMMU page shift is missing.

This did not hit us before f1565c24b596 ("powerpc: use the generic
dma_ops_bypass mode") because we had additional code there to handle the
bypass mask, so this chunk (almost?) never executed. However there
were reports that aacraid does not work with "iommu=nobypass".
After f1565c24b596, aacraid (and probably others which call
dma_get_required_mask() before setting the mask) was unable to
enable 64bit DMA and fell back to using the IOMMU, which was known not to
work; one of the problems is a double free of an IOMMU page.

This fixes DMA for aacraid, both with and without "iommu=nobypass"
in the kernel command line. Verified with "stress-ng -d 4".

Fixes: f1565c24b596 ("powerpc: use the generic dma_ops_bypass mode")
Signed-off-by: Alexey Kardashevskiy 
---

The original code dates from Jun 24 2011:
6a5c7be5e484 ("powerpc: Override dma_get_required_mask by platform hook and ops")


What is dma_get_required_mask() for anyway? What "requires" what here?

Even though it works for now (due to huge - >4GB - default DMA window),
I am still not convinced we do not want this chunk here
(this is what f1565c24b596 removed):

if (dev_is_pci(dev)) {
u64 bypass_mask = dma_direct_get_required_mask(dev);

if (dma_iommu_bypass_supported(dev, bypass_mask))
return bypass_mask;
}
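
To see both problems at once, here is a worked example with made-up table
geometry (illustration only, not from the patch):

/* Made-up geometry: a 4K-page IOMMU window of 0x7ffff pages, i.e.
 * addresses up to just under 2GB.
 */
static u64 required_mask_example(void)
{
	unsigned long pages = 0x7ffff;	/* tbl->it_offset + tbl->it_size */
	unsigned int page_shift = 12;	/* tbl->it_page_shift */
	u64 mask;

	/* Broken: "<" is a comparison, so the old expression
	 *	1ULL < (fls_long(pages) - 1)
	 * evaluates to 1, and the IOMMU page shift is missing anyway.
	 */

	/* Fixed: fls_long(0x7ffff) = 19, so the shift is 19 + 12 - 1 = 30 */
	mask = 1ULL << (fls_long(pages) + page_shift - 1);
	mask += mask - 1;	/* 0x40000000 + 0x3fffffff = 0x7fffffff */

	return mask;
}
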
---
 arch/powerpc/kernel/dma-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 569fecd7b5b2..9053fc9d20c7 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -120,7 +120,8 @@ u64 dma_iommu_get_required_mask(struct device *dev)
if (!tbl)
return 0;
 
-   mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);
+   mask = 1ULL << (fls_long(tbl->it_offset + tbl->it_size) +
+   tbl->it_page_shift - 1);
mask += mask - 1;
 
return mask;
-- 
2.17.1



Re: [PATCH AUTOSEL 5.8 14/53] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset

2020-09-07 Thread Sasha Levin

On Mon, Sep 07, 2020 at 02:10:26PM -0700, Jakub Kicinski wrote:

On Mon,  7 Sep 2020 12:31:40 -0400 Sasha Levin wrote:

[ Upstream commit 9f13457377907fa253aef560e1a37e1ca4197f9b ]



@@ -2024,10 +2033,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
} else {
rc = reset_tx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset tx pools failed 
(%d)\n",
+   rc);
goto out;

rc = reset_rx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset rx pools failed 
(%d)\n",
+   rc);
goto out;
}
ibmvnic_disable_irqs(adapter);


Hi Sasha!

I just pushed this to net:

8ae4dff882eb ("ibmvnic: add missing parenthesis in do_reset()")

You definitely want to pull that in if you decide to backport this one.


Will do, thanks!

--
Thanks,
Sasha


Re: [PATCH AUTOSEL 5.8 14/53] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset

2020-09-07 Thread Jakub Kicinski
On Mon,  7 Sep 2020 12:31:40 -0400 Sasha Levin wrote:
> [ Upstream commit 9f13457377907fa253aef560e1a37e1ca4197f9b ]

> @@ -2024,10 +2033,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>   } else {
>   rc = reset_tx_pools(adapter);
>   if (rc)
> + netdev_dbg(adapter->netdev, "reset tx pools failed (%d)\n",
> + rc);
>   goto out;
>  
>   rc = reset_rx_pools(adapter);
>   if (rc)
> + netdev_dbg(adapter->netdev, "reset rx pools failed (%d)\n",
> + rc);
>   goto out;
>   }
>   ibmvnic_disable_irqs(adapter);

Hi Sasha!

I just pushed this to net:

8ae4dff882eb ("ibmvnic: add missing parenthesis in do_reset()")

You definitely want to pull that in if you decide to backport this one.


Re: [RFC PATCH v2 3/3] mm: make generic pXd_addr_end() macros inline functions

2020-09-07 Thread Mike Rapoport
Hi,

Some style comments below.

On Mon, Sep 07, 2020 at 08:00:58PM +0200, Gerald Schaefer wrote:
> From: Alexander Gordeev 
> 
> Since pXd_addr_end() macros take a pXd page-table entry as a
> parameter, it makes sense to check the entry type at compile time.
> Even though most archs do not make use of page-table entries
> in pXd_addr_end() calls, checking the type in traversal code
> paths could help to avoid subtle bugs.
> 
> Signed-off-by: Alexander Gordeev 
> Signed-off-by: Gerald Schaefer 
> ---
>  include/linux/pgtable.h | 36 
>  1 file changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 67ebc22cf83d..d9e7d16c2263 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -656,31 +656,35 @@ static inline int arch_unmap_one(struct mm_struct *mm,
>   */
>  
>  #ifndef pgd_addr_end
> -#define pgd_addr_end(pgd, addr, end) \
> -({   unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;  \
> - (__boundary - 1 < (end) - 1)? __boundary: (end);\
> -})
> +#define pgd_addr_end pgd_addr_end
> +static inline unsigned long pgd_addr_end(pgd_t pgd, unsigned long addr, unsigned long end)
> +{	unsigned long __boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

The code should be on a separate line from the curly brace.
Besides, since this is not a macro anymore, I think it would be nicer to
use 'boundary' without underscores.
This applies to the changes below as well.

> + return (__boundary - 1 < end - 1) ? __boundary : end;
> +}
>  #endif
>  
>  #ifndef p4d_addr_end
> -#define p4d_addr_end(p4d, addr, end) \
> -({   unsigned long __boundary = ((addr) + P4D_SIZE) & P4D_MASK;  \
> - (__boundary - 1 < (end) - 1)? __boundary: (end);\
> -})
> +#define p4d_addr_end p4d_addr_end
> +static inline unsigned long p4d_addr_end(p4d_t p4d, unsigned long addr, unsigned long end)
> +{	unsigned long __boundary = (addr + P4D_SIZE) & P4D_MASK;
> + return (__boundary - 1 < end - 1) ? __boundary : end;
> +}
>  #endif
>  
>  #ifndef pud_addr_end
> -#define pud_addr_end(pud, addr, end) \
> -({   unsigned long __boundary = ((addr) + PUD_SIZE) & PUD_MASK;  \
> - (__boundary - 1 < (end) - 1)? __boundary: (end);\
> -})
> +#define pud_addr_end pud_addr_end
> +static inline unsigned long pud_addr_end(pud_t pud, unsigned long addr, unsigned long end)
> +{	unsigned long __boundary = (addr + PUD_SIZE) & PUD_MASK;
> + return (__boundary - 1 < end - 1) ? __boundary : end;
> +}
>  #endif
>  
>  #ifndef pmd_addr_end
> -#define pmd_addr_end(pmd, addr, end) \
> -({   unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;  \
> - (__boundary - 1 < (end) - 1)? __boundary: (end);\
> -})
> +#define pmd_addr_end pmd_addr_end
> +static inline unsigned long pmd_addr_end(pmd_t pmd, unsigned long addr, unsigned long end)
> +{	unsigned long __boundary = (addr + PMD_SIZE) & PMD_MASK;
> + return (__boundary - 1 < end - 1) ? __boundary : end;
> +}
>  #endif
>  
>  /*
> -- 
> 2.17.1
> 

-- 
Sincerely yours,
Mike.


Re: [RFC PATCH v2 0/3] mm/gup: fix gup_fast with dynamic page table folding

2020-09-07 Thread Mike Rapoport
On Mon, Sep 07, 2020 at 08:00:55PM +0200, Gerald Schaefer wrote:
> This is v2 of an RFC previously discussed here:
> https://lore.kernel.org/lkml/20200828140314.8556-1-gerald.schae...@linux.ibm.com/
> 
> Patch 1 is a fix for a regression in gup_fast on s390, after our conversion
> to common gup_fast code. It will introduce special helper functions
> pXd_addr_end_folded(), which have to be used in places where pagetable walk
> is done w/o lock and with READ_ONCE, so currently only in gup_fast.
> 
> Patch 2 is an attempt to make that more generic, i.e. change pXd_addr_end()
> themselves by adding an extra pXd value parameter. That was suggested by
> Jason during v1 discussion, because he is already thinking of some other
> places where he might want to switch to the READ_ONCE logic for pagetable
> walks. In general, that would be the cleanest / safest solution, but there
> is some impact on other architectures and common code, hence the new and
> greatly enlarged recipient list.
> 
> Patch 3 is a "nice to have" add-on, which makes pXd_addr_end() inline
> functions instead of #defines, so that we get some type checking for the
> new pXd value parameter.
> 
> Not sure about Fixes/stable tags for the generic solution. Only patch 1
> fixes a real bug on s390, and has Fixes/stable tags. Patches 2 + 3 might
> still be nice to have in stable, to ease future backports, but I guess
> "nice to have" does not really qualify for stable backports.

I also think that adding the pXd parameter to pXd_addr_end() is a cleaner
way, and with it patch 1 is not really required. I would even merge
patches 2 and 3 into a single patch and use only that as the fix.

[ /me apologises to stable@ team :-) ]

> Changes in v2:
> - Pick option 2 from v1 discussion (pXd_addr_end_folded helpers)
> - Add patch 2 + 3 for more generic approach
> 
> Alexander Gordeev (3):
>   mm/gup: fix gup_fast with dynamic page table folding
>   mm: make pXd_addr_end() functions page-table entry aware
>   mm: make generic pXd_addr_end() macros inline functions
> 
>  arch/arm/include/asm/pgtable-2level.h|  2 +-
>  arch/arm/mm/idmap.c  |  6 ++--
>  arch/arm/mm/mmu.c|  8 ++---
>  arch/arm64/kernel/hibernate.c| 16 +
>  arch/arm64/kvm/mmu.c | 16 -
>  arch/arm64/mm/kasan_init.c   |  8 ++---
>  arch/arm64/mm/mmu.c  | 25 +++---
>  arch/powerpc/mm/book3s64/radix_pgtable.c |  7 ++--
>  arch/powerpc/mm/hugetlbpage.c|  6 ++--
>  arch/s390/include/asm/pgtable.h  | 42 
>  arch/s390/mm/page-states.c   |  8 ++---
>  arch/s390/mm/pageattr.c  |  8 ++---
>  arch/s390/mm/vmem.c  |  8 ++---
>  arch/sparc/mm/hugetlbpage.c  |  6 ++--
>  arch/um/kernel/tlb.c |  8 ++---
>  arch/x86/mm/init_64.c| 15 -
>  arch/x86/mm/kasan_init_64.c  | 16 -
>  include/asm-generic/pgtable-nop4d.h  |  2 +-
>  include/asm-generic/pgtable-nopmd.h  |  2 +-
>  include/asm-generic/pgtable-nopud.h  |  2 +-
>  include/linux/pgtable.h  | 38 -
>  mm/gup.c |  8 ++---
>  mm/ioremap.c |  8 ++---
>  mm/kasan/init.c  | 17 +-
>  mm/madvise.c |  4 +--
>  mm/memory.c  | 40 +++---
>  mm/mlock.c   | 18 +++---
>  mm/mprotect.c|  8 ++---
>  mm/pagewalk.c|  8 ++---
>  mm/swapfile.c|  8 ++---
>  mm/vmalloc.c | 16 -
>  31 files changed, 219 insertions(+), 165 deletions(-)
> 
> -- 
> 2.17.1
> 

-- 
Sincerely yours,
Mike.


[RFC PATCH v2 0/3] mm/gup: fix gup_fast with dynamic page table folding

2020-09-07 Thread Gerald Schaefer
This is v2 of an RFC previously discussed here:
https://lore.kernel.org/lkml/20200828140314.8556-1-gerald.schae...@linux.ibm.com/

Patch 1 is a fix for a regression in gup_fast on s390, after our conversion
to common gup_fast code. It will introduce special helper functions
pXd_addr_end_folded(), which have to be used in places where pagetable walk
is done w/o lock and with READ_ONCE, so currently only in gup_fast.

Patch 2 is an attempt to make that more generic, i.e. change pXd_addr_end()
themselves by adding an extra pXd value parameter. That was suggested by
Jason during v1 discussion, because he is already thinking of some other
places where he might want to switch to the READ_ONCE logic for pagetable
walks. In general, that would be the cleanest / safest solution, but there
is some impact on other architectures and common code, hence the new and
greatly enlarged recipient list.

Patch 3 is a "nice to have" add-on, which makes pXd_addr_end() inline
functions instead of #defines, so that we get some type checking for the
new pXd value parameter.

Not sure about Fixes/stable tags for the generic solution. Only patch 1
fixes a real bug on s390, and has Fixes/stable tags. Patches 2 + 3 might
still be nice to have in stable, to ease future backports, but I guess
"nice to have" does not really qualify for stable backports.

Changes in v2:
- Pick option 2 from v1 discussion (pXd_addr_end_folded helpers)
- Add patch 2 + 3 for more generic approach

Alexander Gordeev (3):
  mm/gup: fix gup_fast with dynamic page table folding
  mm: make pXd_addr_end() functions page-table entry aware
  mm: make generic pXd_addr_end() macros inline functions

 arch/arm/include/asm/pgtable-2level.h|  2 +-
 arch/arm/mm/idmap.c  |  6 ++--
 arch/arm/mm/mmu.c|  8 ++---
 arch/arm64/kernel/hibernate.c| 16 +
 arch/arm64/kvm/mmu.c | 16 -
 arch/arm64/mm/kasan_init.c   |  8 ++---
 arch/arm64/mm/mmu.c  | 25 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c |  7 ++--
 arch/powerpc/mm/hugetlbpage.c|  6 ++--
 arch/s390/include/asm/pgtable.h  | 42 
 arch/s390/mm/page-states.c   |  8 ++---
 arch/s390/mm/pageattr.c  |  8 ++---
 arch/s390/mm/vmem.c  |  8 ++---
 arch/sparc/mm/hugetlbpage.c  |  6 ++--
 arch/um/kernel/tlb.c |  8 ++---
 arch/x86/mm/init_64.c| 15 -
 arch/x86/mm/kasan_init_64.c  | 16 -
 include/asm-generic/pgtable-nop4d.h  |  2 +-
 include/asm-generic/pgtable-nopmd.h  |  2 +-
 include/asm-generic/pgtable-nopud.h  |  2 +-
 include/linux/pgtable.h  | 38 -
 mm/gup.c |  8 ++---
 mm/ioremap.c |  8 ++---
 mm/kasan/init.c  | 17 +-
 mm/madvise.c |  4 +--
 mm/memory.c  | 40 +++---
 mm/mlock.c   | 18 +++---
 mm/mprotect.c|  8 ++---
 mm/pagewalk.c|  8 ++---
 mm/swapfile.c|  8 ++---
 mm/vmalloc.c | 16 -
 31 files changed, 219 insertions(+), 165 deletions(-)

-- 
2.17.1



[RFC PATCH v2 2/3] mm: make pXd_addr_end() functions page-table entry aware

2020-09-07 Thread Gerald Schaefer
From: Alexander Gordeev 

Unlike all other page-table abstractions, pXd_addr_end() does not take
into account the particular table entry in whose context the function
is called. On architectures with dynamic page-table folding that
might lead to a lack of necessary information that is difficult to
obtain other than from the table entry itself. That already led to
a subtle memory corruption issue on s390.

By letting the pXd_addr_end() functions know about the page-table entry
we allow archs to make not only extra checks, but also optimizations.

As a result of this change, the pXd_addr_end_folded() functions used
in the gup_fast traversal code become unnecessary and get replaced with
universal pXd_addr_end() variants.

The arch-specific updates not only add dereferencing of the page-table
entry pointers, but also small changes to the code flow to make those
dereferences possible, at least for x86 and powerpc. Also for arm64,
but in a way that should not have any impact.

So, even though the dereferenced page-table entries are not used on
archs other than s390, and are optimized out by the compiler, there
is a small change in kernel size and this is what bloat-o-meter reports:

x86:
add/remove: 0/0 grow/shrink: 2/0 up/down: 10/0 (10)
Function old new   delta
vmemmap_populate 587 592  +5
munlock_vma_pages_range  556 561  +5
Total: Before=15534694, After=15534704, chg +0.00%

powerpc:
add/remove: 0/0 grow/shrink: 1/0 up/down: 4/0 (4)
Function old new   delta
.remove_pagetable   16481652  +4
Total: Before=21478240, After=21478244, chg +0.00%

arm64:
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Function old new   delta
Total: Before=20240851, After=20240851, chg +0.00%

sparc:
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Function old new   delta
Total: Before=4907262, After=4907262, chg +0.00%

Signed-off-by: Alexander Gordeev 
Signed-off-by: Gerald Schaefer 
---
 arch/arm/include/asm/pgtable-2level.h|  2 +-
 arch/arm/mm/idmap.c  |  6 ++--
 arch/arm/mm/mmu.c|  8 ++---
 arch/arm64/kernel/hibernate.c| 16 ++
 arch/arm64/kvm/mmu.c | 16 +-
 arch/arm64/mm/kasan_init.c   |  8 ++---
 arch/arm64/mm/mmu.c  | 25 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c |  7 ++---
 arch/powerpc/mm/hugetlbpage.c|  6 ++--
 arch/s390/include/asm/pgtable.h  |  8 ++---
 arch/s390/mm/page-states.c   |  8 ++---
 arch/s390/mm/pageattr.c  |  8 ++---
 arch/s390/mm/vmem.c  |  8 ++---
 arch/sparc/mm/hugetlbpage.c  |  6 ++--
 arch/um/kernel/tlb.c |  8 ++---
 arch/x86/mm/init_64.c| 15 -
 arch/x86/mm/kasan_init_64.c  | 16 +-
 include/asm-generic/pgtable-nop4d.h  |  2 +-
 include/asm-generic/pgtable-nopmd.h  |  2 +-
 include/asm-generic/pgtable-nopud.h  |  2 +-
 include/linux/pgtable.h  | 26 ---
 mm/gup.c |  8 ++---
 mm/ioremap.c |  8 ++---
 mm/kasan/init.c  | 17 +-
 mm/madvise.c |  4 +--
 mm/memory.c  | 40 
 mm/mlock.c   | 18 ---
 mm/mprotect.c|  8 ++---
 mm/pagewalk.c|  8 ++---
 mm/swapfile.c|  8 ++---
 mm/vmalloc.c | 16 +-
 31 files changed, 165 insertions(+), 173 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 3502c2f746ca..5e6416b339f4 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -209,7 +209,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
} while (0)
 
 /* we don't need complex calculations here as the pmd is folded into the pgd */
-#define pmd_addr_end(addr,end) (end)
+#define pmd_addr_end(pmd,addr,end) (end)
 
 #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
 
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 448e57c6f653..5437f943ca8b 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -46,7 +46,7 @@ static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
pmd = pmd_offset(pud, addr);
 
do {
-   next = pmd_addr_end(addr, end);
+   next = pmd_addr_end(*pmd, addr, end);
*pmd = __pmd((addr & PMD_MASK) | prot);
flush_pmd_entry(pmd);
} while (pmd++, addr = next, addr != end);
@@ 

[RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding

2020-09-07 Thread Gerald Schaefer
From: Alexander Gordeev 

Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
code") introduced a subtle but severe bug on s390 with gup_fast, due to
dynamic page table folding.

The question "What would it require for the generic code to work for s390"
has already been discussed here
https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
and ended with a promising approach here
https://lkml.kernel.org/r/20190419153307.4f2911b5@mschwideX1
which in the end unfortunately didn't quite work completely.

We tried to mimic static level folding by changing pgd_offset to always
calculate top level page table offset, and do nothing in folded pXd_offset.
What has been overlooked is that PxD_SIZE/MASK and thus pXd_addr_end do
not reflect this dynamic behaviour, and still act like static 5-level
page tables.

Here is an example of what happens with gup_fast on s390, for a task with
3-levels paging, crossing a 2 GB pud boundary:

// addr = 0x1007ffff000, end = 0x10080001000
static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 unsigned int flags, struct page **pages, int *nr)
{
unsigned long next;
pud_t *pudp;

	// pud_offset returns &p4d itself (a pointer to a value on stack)
	pudp = pud_offset(&p4d, addr);
do {
		// on second iteration reading "random" stack value
pud_t pud = READ_ONCE(*pudp);

		// next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
next = pud_addr_end(addr, end);
...
	} while (pudp++, addr = next, addr != end); // pudp++ iterating over stack

return 1;
}

pud_addr_end = 0x10080000000 is correct, but the previous pgd/p4d_addr_end
should also have returned that limit, instead of the 5-level static
pgd/p4d limits with PUD_SIZE/MASK != PGDIR_SIZE/MASK. Then the "end"
parameter for gup_pud_range would also have been 0x10080000000, and we
would not iterate further in gup_pud_range, but rather go back and
(correctly) do it in gup_pgd_range.

So, for the second iteration in gup_pud_range, we will increase pudp,
which pointed to a stack value and not the real pud table. This new pudp
will then point to whatever lies behind the p4d stack value. In general,
this happens to be the previously read pgd, but it probably could also
be something different, depending on compiler decisions.

Most unfortunately, if it happens to be the pgd value, which is the
same as the p4d / pud due to folding, it is a valid and present entry.
So after the increment, we would still point to the same pud entry.
The addr however has been increased in the second iteration, so that we
now have different pmd/pte_index values, which will result in very wrong
behaviour for the remaining gup_pmd/pte_range calls. We will effectively
operate on an address minus 2 GB, due to missing pudp increase.

In the "good case", if nothing is mapped there, we will fall back to
the slow gup path. But if something is mapped there, and valid
for gup_fast, we will end up (silently) getting references on the wrong
pages and also add the wrong pages to the **pages result array. This
can cause data corruption.

Fix this by introducing new pXd_addr_end_folded helpers, which take an
additional pXd entry value parameter that can be used on s390
to determine the correct page table level and return the corresponding
end / boundary. With that, the pointer iteration will always
happen in gup_pgd_range for s390. No change is introduced for other
architectures.

Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
Cc: <stable@vger.kernel.org> # 5.2+
Reviewed-by: Gerald Schaefer 
Signed-off-by: Alexander Gordeev 
Signed-off-by: Gerald Schaefer 
---
 arch/s390/include/asm/pgtable.h | 42 +
 include/linux/pgtable.h | 16 +
 mm/gup.c|  8 +++
 3 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7eb01a5459cd..027206e4959d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -512,6 +512,48 @@ static inline bool mm_pmd_folded(struct mm_struct *mm)
 }
 #define mm_pmd_folded(mm) mm_pmd_folded(mm)
 
+/*
+ * With dynamic page table levels on s390, the static pXd_addr_end() functions
+ * will not return corresponding dynamic boundaries. This is no problem as long
+ * as only pXd pointers are passed down during page table walk, because
+ * pXd_offset() will simply return the given pointer for folded levels, and the
+ * pointer iteration over a range simply happens at the correct page table
+ * level.
+ * It is however a problem with gup_fast, or other places walking the page
+ * tables w/o locks using READ_ONCE(), and passing down the pXd values instead
+ * of pointers. In this case, the pointer given to pXd_offset() is a pointer to
+ * a stack variable, which cannot be used for 

[RFC PATCH v2 3/3] mm: make generic pXd_addr_end() macros inline functions

2020-09-07 Thread Gerald Schaefer
From: Alexander Gordeev 

Since pXd_addr_end() macros take a pXd page-table entry as a
parameter, it makes sense to check the entry type at compile time.
Even though most archs do not make use of page-table entries
in pXd_addr_end() calls, checking the type in traversal code
paths could help to avoid subtle bugs.

Signed-off-by: Alexander Gordeev 
Signed-off-by: Gerald Schaefer 
---
 include/linux/pgtable.h | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 67ebc22cf83d..d9e7d16c2263 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -656,31 +656,35 @@ static inline int arch_unmap_one(struct mm_struct *mm,
  */
 
 #ifndef pgd_addr_end
-#define pgd_addr_end(pgd, addr, end)   \
-({ unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end);\
-})
+#define pgd_addr_end pgd_addr_end
+static inline unsigned long pgd_addr_end(pgd_t pgd, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
 #endif
 
 #ifndef p4d_addr_end
-#define p4d_addr_end(p4d, addr, end)   \
-({ unsigned long __boundary = ((addr) + P4D_SIZE) & P4D_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end);\
-})
+#define p4d_addr_end p4d_addr_end
+static inline unsigned long p4d_addr_end(p4d_t p4d, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + P4D_SIZE) & P4D_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
 #endif
 
 #ifndef pud_addr_end
-#define pud_addr_end(pud, addr, end)   \
-({ unsigned long __boundary = ((addr) + PUD_SIZE) & PUD_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end);\
-})
+#define pud_addr_end pud_addr_end
+static inline unsigned long pud_addr_end(pud_t pud, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + PUD_SIZE) & PUD_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
 #endif
 
 #ifndef pmd_addr_end
-#define pmd_addr_end(pmd, addr, end)   \
-({ unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;  \
-   (__boundary - 1 < (end) - 1)? __boundary: (end);\
-})
+#define pmd_addr_end pmd_addr_end
+static inline unsigned long pmd_addr_end(pmd_t pmd, unsigned long addr, unsigned long end)
+{  unsigned long __boundary = (addr + PMD_SIZE) & PMD_MASK;
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
 #endif
 
 /*
-- 
2.17.1



[PATCH AUTOSEL 5.4 11/43] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset

2020-09-07 Thread Sasha Levin
From: Mingming Cao 

[ Upstream commit 9f13457377907fa253aef560e1a37e1ca4197f9b ]

At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
and rx_pools to avoid re-allocating the long term buffer. However
there is a window inside do_reset where the tx_pools and
rx_pools were freed before being re-initialized, making it possible to
dereference null pointers.

This patch fixes the issue by always checking that the tx_pool
and rx_pool are not NULL after ibmvnic_login. If they are, the pools
are re-allocated. This avoids calling reset_tx/rx_pools with a
NULL adapter tx_pools/rx_pools pointer. Also add null pointer checks in
reset_tx_pools and reset_rx_pools to safely handle the NULL pointer case.

Signed-off-by: Mingming Cao 
Signed-off-by: Dany Madden 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 2d20a48f0ba0a..de45b3709c14e 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -416,6 +416,9 @@ static int reset_rx_pools(struct ibmvnic_adapter *adapter)
int i, j, rc;
u64 *size_array;
 
+   if (!adapter->rx_pool)
+   return -1;
+
size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
 
@@ -586,6 +589,9 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
int tx_scrqs;
int i, rc;
 
+   if (!adapter->tx_pool)
+   return -1;
+
tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
for (i = 0; i < tx_scrqs; i++) {
	rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]);
@@ -1918,7 +1924,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
adapter->req_rx_add_entries_per_subcrq !=
old_num_rx_slots ||
adapter->req_tx_entries_per_subcrq !=
-   old_num_tx_slots) {
+   old_num_tx_slots ||
+   !adapter->rx_pool ||
+   !adapter->tso_pool ||
+   !adapter->tx_pool) {
release_rx_pools(adapter);
release_tx_pools(adapter);
release_napi(adapter);
@@ -1931,10 +1940,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
} else {
rc = reset_tx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset tx pools 
failed (%d)\n",
+   rc);
goto out;
 
rc = reset_rx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset rx pools 
failed (%d)\n",
+   rc);
goto out;
}
ibmvnic_disable_irqs(adapter);
-- 
2.25.1



[PATCH AUTOSEL 5.8 14/53] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset

2020-09-07 Thread Sasha Levin
From: Mingming Cao 

[ Upstream commit 9f13457377907fa253aef560e1a37e1ca4197f9b ]

At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
and rx_pools to avoid re-allocating the long term buffer. However
there is a window inside do_reset where the tx_pools and
rx_pools were freed before being re-initialized, making it possible to
dereference null pointers.

This patch fixes the issue by always checking that the tx_pool
and rx_pool are not NULL after ibmvnic_login. If they are, the pools
are re-allocated. This avoids calling reset_tx/rx_pools with a
NULL adapter tx_pools/rx_pools pointer. Also add null pointer checks in
reset_tx_pools and reset_rx_pools to safely handle the NULL pointer case.

Signed-off-by: Mingming Cao 
Signed-off-by: Dany Madden 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5afb3c9c52d20..d3a774331afc7 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -479,6 +479,9 @@ static int reset_rx_pools(struct ibmvnic_adapter *adapter)
int i, j, rc;
u64 *size_array;
 
+   if (!adapter->rx_pool)
+   return -1;
+
size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
 
@@ -649,6 +652,9 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
int tx_scrqs;
int i, rc;
 
+   if (!adapter->tx_pool)
+   return -1;
+
tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
for (i = 0; i < tx_scrqs; i++) {
	rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]);
@@ -2011,7 +2017,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
adapter->req_rx_add_entries_per_subcrq !=
old_num_rx_slots ||
adapter->req_tx_entries_per_subcrq !=
-   old_num_tx_slots) {
+   old_num_tx_slots ||
+   !adapter->rx_pool ||
+   !adapter->tso_pool ||
+   !adapter->tx_pool) {
release_rx_pools(adapter);
release_tx_pools(adapter);
release_napi(adapter);
@@ -2024,10 +2033,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
} else {
rc = reset_tx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset tx pools 
failed (%d)\n",
+   rc);
goto out;
 
rc = reset_rx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset rx pools 
failed (%d)\n",
+   rc);
goto out;
}
ibmvnic_disable_irqs(adapter);
-- 
2.25.1



Re: fsl_espi errors on v5.7.15

2020-09-07 Thread Joakim Tjernlund
[SNIP]
> > 

> > > Would you be able to ftrace the interrupt handler function and see if you
> > > can see a difference in number or timing of interrupts? I'm at a bit of
> > > a loss.
> > 
> > I tried ftrace but I really wasn't sure what I was looking for.
> > Capturing a "bad" case was pretty tricky. But I think I've identified a
> > fix (I'll send it as a proper patch shortly). The gist is
> > 
> > diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c
> > index 7e7c92cafdbb..cb120b68c0e2 100644
> > --- a/drivers/spi/spi-fsl-espi.c
> > +++ b/drivers/spi/spi-fsl-espi.c
> > @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi *espi, u32 events)
> >   static irqreturn_t fsl_espi_irq(s32 irq, void *context_data)
> >   {
> >  struct fsl_espi *espi = context_data;
> > -   u32 events;
> > +   u32 events, mask;
> > 
> >  spin_lock(&espi->lock);
> > 
> >  /* Get interrupt events(tx/rx) */
> >  events = fsl_espi_read_reg(espi, ESPI_SPIE);
> > -   if (!events) {
> > +   mask = fsl_espi_read_reg(espi, ESPI_SPIM);
> > +   if (!(events & mask)) {
> >  spin_unlock(&espi->lock);
> >  return IRQ_NONE;
> >  }
> > 
> > The SPIE register contains the TXCNT so events is pretty much always
> > going to have something set. By checking events against what we've
> > actually requested interrupts for we don't see any spurious events.
> > 
> > I've tested this on the T2080RDB and on our custom hardware and it seems
> > to resolve the problem.
> > 
> 
> I looked at fsl_espi_irq() too and noticed that the clearing of the IRQ
> events is done after processing TX/RX. That looks a bit odd to me.

I should have been more specific. I think you can lose IRQs the way
fsl_espi_irq() works now.
Consider this:
1) You get TX IRQ and enter fsl_espi_irq()
2) Enter fsl_espi_fill_tx_fifo() to process any chars until done.
3) Now you get one more TX IRQ
4) fsl_espi_irq() clears events -> the IRQ from 3) is lost.
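
A common way to close that window is to ack the events before processing
them and loop until no requested event remains. A rough sketch of such a
restructuring (hypothetical, not a tested patch; it reuses the driver's
fsl_espi_read_reg/fsl_espi_write_reg/fsl_espi_cpu_irq helpers):

static irqreturn_t fsl_espi_irq_sketch(s32 irq, void *context_data)
{
	struct fsl_espi *espi = context_data;
	irqreturn_t ret = IRQ_NONE;
	u32 events, mask;

	spin_lock(&espi->lock);
	while (1) {
		events = fsl_espi_read_reg(espi, ESPI_SPIE);
		mask = fsl_espi_read_reg(espi, ESPI_SPIM);
		if (!(events & mask))
			break;
		/* ack first, so an event arriving during processing
		 * re-asserts SPIE and is seen on the next iteration
		 */
		fsl_espi_write_reg(espi, ESPI_SPIE, events);
		fsl_espi_cpu_irq(espi, events);
		ret = IRQ_HANDLED;
	}
	spin_unlock(&espi->lock);

	return ret;
}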

 Jocke


[PATCH 2/2] powerpc/32: Fix vmap stack - Properly set r1 before activating MMU

2020-09-07 Thread Christophe Leroy
We need r1 to be properly set before activating the MMU, otherwise any new
exception taken while saving registers into the stack in the exception
prologs will use the user stack, which is wrong and will even lock up
or crash when KUAP is selected.

Do that by switching the meaning of r11 and r1 until we have saved r1
to the stack: copy r1 into r11 and set up the new stack pointer in r1.
To avoid complicating and impacting all generic and specific prolog
code (and more), copy r1 back into r11 once r11 is saved onto
the stack.

We could get rid of copying r1 back and forth at the cost of
rewriting everything to use r1 instead of r11 all the way when
CONFIG_VMAP_STACK is set, but the effort is probably not worth it.

Fixes: 028474876f47 ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h | 43 +++
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 21effebb9277..cc36998c5541 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -39,15 +39,24 @@
 .endm
 
 .macro EXCEPTION_PROLOG_1 for_rtas=0
+#ifdef CONFIG_VMAP_STACK
+   mr  r11, r1
+   subi    r1, r1, INT_FRAME_SIZE  /* use r1 if kernel */
+   beq 1f
+   mfspr   r1,SPRN_SPRG_THREAD
+   lwz r1,TASK_STACK-THREAD(r1)
+   addi    r1, r1, THREAD_SIZE - INT_FRAME_SIZE
+#else
    subi    r11, r1, INT_FRAME_SIZE /* use r1 if kernel */
beq 1f
mfspr   r11,SPRN_SPRG_THREAD
lwz r11,TASK_STACK-THREAD(r11)
    addi    r11, r11, THREAD_SIZE - INT_FRAME_SIZE
+#endif
 1:
tophys_novmstack r11, r11
 #ifdef CONFIG_VMAP_STACK
-   mtcrf   0x7f, r11
+   mtcrf   0x7f, r1
bt  32 - THREAD_ALIGN_SHIFT, stack_overflow
 #endif
 .endm
@@ -62,6 +71,15 @@
stw r10,_CCR(r11)   /* save registers */
 #endif
mfspr   r10, SPRN_SPRG_SCRATCH0
+#ifdef CONFIG_VMAP_STACK
+   stw r11,GPR1(r1)
+   stw r11,0(r1)
+   mr  r11, r1
+#else
+   stw r1,GPR1(r11)
+   stw r1,0(r11)
+   tovirt(r1, r11) /* set new kernel sp */
+#endif
stw r12,GPR12(r11)
stw r9,GPR9(r11)
stw r10,GPR10(r11)
@@ -89,9 +107,6 @@
mfspr   r12,SPRN_SRR0
mfspr   r9,SPRN_SRR1
 #endif
-   stw r1,GPR1(r11)
-   stw r1,0(r11)
-   tovirt_novmstack r1, r11 /* set new kernel sp */
 #ifdef CONFIG_40x
rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
 #else
@@ -309,19 +324,19 @@
 .macro vmap_stack_overflow_exception
 #ifdef CONFIG_VMAP_STACK
 #ifdef CONFIG_SMP
-   mfspr   r11, SPRN_SPRG_THREAD
-   lwz r11, TASK_CPU - THREAD(r11)
-   slwi    r11, r11, 3
-   addis   r11, r11, emergency_ctx@ha
+   mfspr   r1, SPRN_SPRG_THREAD
+   lwz r1, TASK_CPU - THREAD(r1)
+   slwi    r1, r1, 3
+   addis   r1, r1, emergency_ctx@ha
 #else
-   lis r11, emergency_ctx@ha
+   lis r1, emergency_ctx@ha
 #endif
-   lwz r11, emergency_ctx@l(r11)
-   cmpwi   cr1, r11, 0
+   lwz r1, emergency_ctx@l(r1)
+   cmpwi   cr1, r1, 0
bne cr1, 1f
-   lis r11, init_thread_union@ha
-   addi    r11, r11, init_thread_union@l
-1: addi    r11, r11, THREAD_SIZE - INT_FRAME_SIZE
+   lis r1, init_thread_union@ha
+   addi    r1, r1, init_thread_union@l
+1: addi    r1, r1, THREAD_SIZE - INT_FRAME_SIZE
EXCEPTION_PROLOG_2
SAVE_NVGPRS(r11)
	addi    r3, r1, STACK_FRAME_OVERHEAD
-- 
2.25.0



[PATCH 1/2] powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct

2020-09-07 Thread Christophe Leroy
We need r1 to be properly set before activating MMU, so
reading task_struct->stack must be done with MMU off.

This means we need an additional register to play with MSR
bits while r11 now points to the stack. For that, move r10
back to CR (as is already done for hash MMU) and use r10.

We still don't have r1 correct when we activate the MMU.
That is done in the following patch.

Fixes: 028474876f47 ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S |  6 --
 arch/powerpc/kernel/head_32.h | 31 ++-
 2 files changed, 6 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index f3ab94d73936..d967266d62e8 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -274,14 +274,8 @@ __secondary_hold_acknowledge:
DO_KVM  0x200
 MachineCheck:
EXCEPTION_PROLOG_0
-#ifdef CONFIG_VMAP_STACK
-   li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
-   mtmsr   r11
-   isync
-#endif
 #ifdef CONFIG_PPC_CHRP
mfspr   r11, SPRN_SPRG_THREAD
-   tovirt_vmstack r11, r11
lwz r11, RTAS_SP(r11)
cmpwi   cr1, r11, 0
bne cr1, 7f
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 9abec6cd099c..21effebb9277 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -39,24 +39,13 @@
 .endm
 
 .macro EXCEPTION_PROLOG_1 for_rtas=0
-#ifdef CONFIG_VMAP_STACK
-   .ifeq   \for_rtas
-   li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
-   mtmsr   r11
-   isync
-   .endif
subir11, r1, INT_FRAME_SIZE /* use r1 if kernel */
-#else
-   tophys(r11,r1)  /* use tophys(r1) if kernel */
-   subir11, r11, INT_FRAME_SIZE/* alloc exc. frame */
-#endif
beq 1f
mfspr   r11,SPRN_SPRG_THREAD
-   tovirt_vmstack r11, r11
lwz r11,TASK_STACK-THREAD(r11)
addir11, r11, THREAD_SIZE - INT_FRAME_SIZE
-   tophys_novmstack r11, r11
 1:
+   tophys_novmstack r11, r11
 #ifdef CONFIG_VMAP_STACK
mtcrf   0x7f, r11
bt  32 - THREAD_ALIGN_SHIFT, stack_overflow
@@ -64,12 +53,11 @@
 .endm
 
 .macro EXCEPTION_PROLOG_2 handle_dar_dsisr=0
-#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S)
-BEGIN_MMU_FTR_SECTION
+#ifdef CONFIG_VMAP_STACK
mtcrr10
-FTR_SECTION_ELSE
-   stw r10, _CCR(r11)
-ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
+   li  r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
+   mtmsr   r10
+   isync
 #else
stw r10,_CCR(r11)   /* save registers */
 #endif
@@ -77,11 +65,9 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
stw r12,GPR12(r11)
stw r9,GPR9(r11)
stw r10,GPR10(r11)
-#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S)
-BEGIN_MMU_FTR_SECTION
+#ifdef CONFIG_VMAP_STACK
mfcrr10
stw r10, _CCR(r11)
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 #endif
mfspr   r12,SPRN_SPRG_SCRATCH1
stw r12,GPR11(r11)
@@ -97,11 +83,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
stw r10, _DSISR(r11)
.endif
lwz r9, SRR1(r12)
-#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S)
-BEGIN_MMU_FTR_SECTION
andi.   r10, r9, MSR_PR
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
-#endif
lwz r12, SRR0(r12)
 #else
mfspr   r12,SPRN_SRR0
@@ -328,7 +310,6 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 #ifdef CONFIG_VMAP_STACK
 #ifdef CONFIG_SMP
mfspr   r11, SPRN_SPRG_THREAD
-   tovirt(r11, r11)
lwz r11, TASK_CPU - THREAD(r11)
slwir11, r11, 3
addis   r11, r11, emergency_ctx@ha
-- 
2.25.0



Re: [PATCH] kbuild: preprocess module linker script

2020-09-07 Thread Will Deacon
On Fri, Sep 04, 2020 at 10:31:21PM +0900, Masahiro Yamada wrote:
> There was a request to preprocess the module linker script like we do
> for the vmlinux one (https://lkml.org/lkml/2020/8/21/512).
> 
> The difference between vmlinux.lds and module.lds is that the latter
> is needed for external module builds, thus must be cleaned up by
> 'make mrproper' instead of 'make clean' (also, it must be created by
> 'make modules_prepare').
> 
> You cannot put it in arch/*/kernel/ because 'make clean' descends into
> it. I moved arch/*/kernel/module.lds to arch/*/include/asm/module.lds.h,
> which is included from scripts/module.lds.S.
> 
> scripts/module.lds is fine because 'make clean' keeps all the build
> artifacts under scripts/.
> 
> You can add arch-specific sections in <asm/module.lds.h>.
> 
> Signed-off-by: Masahiro Yamada 
> Tested-by: Jessica Yu 
> ---

For the arm64 bits:

Acked-by: Will Deacon 

Thanks,

Will


Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2020-09-07 Thread Michal Suchánek
On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
> > Michal Suchánek  writes:
> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> >>> > Hello,
> >>> > 
> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> >>> > Reimplement book3s idle code in C").
> >>> > 
> >>> > The symptom is host locking up completely after some hours of KVM
> >>> > workload with messages like
> >>> > 
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >>> > 
> >>> > printed before the host locks up.
> >>> > 
> >>> > The machines run sandboxed builds which is a mixed workload resulting in
> >>> > IO/single core/multiple core load over time, and there are periods of no
> >>> > activity with no VMs running as well. The VMs are short-lived so VM
> >>> > setup/teardown is somewhat exercised as well.
> >>> > 
> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
> >>> > 
> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> >>> > stable.
> >>> > 
> >>> > Config is attached.
> >>> > 
> >>> > I cannot easily revert this commit, especially if I want to use the same
> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> >>> > only to the new idle code.
> >>> > 
> >>> > Any idea what can be the problem?
> >>> 
> >>> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
> >>> BMC unfortunately.
> >>
> >> It may be possible to set up fadump with a later kernel version that
> >> supports it on powernv and dump the whole kernel.
> > 
> > Your firmware won't support it AFAIK.
> > 
> > You could try kdump, but if we have CPUs stuck in KVM then there's a
> > good chance it won't work :/
> 
> I haven't had any luck reproducing this yet. Testing with various
> different subcore combinations, etc. I'll keep trying though.
> 
> I don't know if there's much we can add to debug it. Can we run pdbg
> on the BMCs on these things?

I suppose it depends on the machine type?

Thanks

Michal


Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2020-09-07 Thread Nicholas Piggin
Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
> Michal Suchánek  writes:
>> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
>>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
>>> > Hello,
>>> > 
>>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
>>> > Reimplement book3s idle code in C").
>>> > 
>>> > The symptom is host locking up completely after some hours of KVM
>>> > workload with messages like
>>> > 
>>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>>> > 
>>> > printed before the host locks up.
>>> > 
>>> > The machines run sandboxed builds which is a mixed workload resulting in
>>> > IO/single core/multiple core load over time, and there are periods of no
>>> > activity with no VMs running as well. The VMs are short-lived so VM
>>> > setup/teardown is somewhat exercised as well.
>>> > 
>>> > POWER9 with the new guest entry fast path does not seem to be affected.
>>> > 
>>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
>>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
>>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
>>> > stable.
>>> > 
>>> > Config is attached.
>>> > 
>>> > I cannot easily revert this commit, especially if I want to use the same
>>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
>>> > only to the new idle code.
>>> > 
>>> > Any idea what can be the problem?
>>> 
>>> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
>>> those threads. I wonder what they are doing. POWER8 doesn't have a good
>>> NMI IPI and I don't know if it supports pdbg dumping registers from the
>>> BMC unfortunately.
>>
>> It may be possible to set up fadump with a later kernel version that
>> supports it on powernv and dump the whole kernel.
> 
> Your firmware won't support it AFAIK.
> 
> You could try kdump, but if we have CPUs stuck in KVM then there's a
> good chance it won't work :/

I haven't had any luck reproducing this yet. Testing with various
different subcore combinations, etc. I'll keep trying though.

I don't know if there's much we can add to debug it. Can we run pdbg
on the BMCs on these things?

Thanks,
Nick
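
For context, the "couldn't grab cpu" message comes from the path that tries
to pull a secondary hardware thread out of idle; roughly (a simplified sketch
of kvmppc_grab_hwthread(), not the exact source):

/* sketch: secondary thread must leave the kernel before KVM can use it */
static int kvmppc_grab_hwthread_sketch(int cpu)
{
	struct paca_struct *tpaca = paca_ptrs[cpu];
	long timeout = 10000;

	tpaca->kvm_hstate.hwthread_req = 1;	/* ask the napping thread to enter KVM */
	while (tpaca->kvm_hstate.hwthread_state == KVM_HWTHREAD_IN_KERNEL) {
		if (--timeout <= 0) {
			pr_err("KVM: couldn't grab cpu %d\n", cpu);
			return -EBUSY;
		}
		udelay(1);
	}
	return 0;
}

If the idle rework leaves hwthread_state stuck in KVM_HWTHREAD_IN_KERNEL
instead of returning it to KVM_HWTHREAD_IN_IDLE, this loop times out exactly
as in the logs above.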


Re: [RFC PATCH 12/12] powerpc/64s: power4 nap fixup in C

2020-09-07 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of September 7, 2020 2:48 pm:
> 
> 
> Le 07/09/2020 à 06:02, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of September 6, 2020 5:32 pm:
>>>
>>>
>>> Le 05/09/2020 à 19:43, Nicholas Piggin a écrit :
 There is no need for this to be in asm, use the new interrupt entry wrapper.

 Signed-off-by: Nicholas Piggin 
 ---
arch/powerpc/include/asm/interrupt.h   | 14 
arch/powerpc/include/asm/processor.h   |  1 +
arch/powerpc/include/asm/thread_info.h |  6 
arch/powerpc/kernel/exceptions-64s.S   | 45 --
arch/powerpc/kernel/idle_book3s.S  |  4 +++
5 files changed, 25 insertions(+), 45 deletions(-)

 diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
 index ed0d633ab5aa..3da1dba91386 100644
 --- a/arch/powerpc/include/asm/processor.h
 +++ b/arch/powerpc/include/asm/processor.h
 @@ -424,6 +424,7 @@ extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val);
extern unsigned long isa206_idle_insn_mayloss(unsigned long type);
#ifdef CONFIG_PPC_970_NAP
extern void power4_idle_nap(void);
 +extern void power4_idle_nap_return(void);
>>>
>>> Please please please, 'extern' keyword is pointless and deprecated for
>>> function prototypes. Don't add new ones.
>>>
>>> Also, put it outside the #ifdef, so that you can use IS_ENABLED()
>>> instead of #ifdef when using it.
>> 
>> I just copy-paste and forget to remove it. I expect someone will do a
>> "cleanup" patch to get rid of them in one go; I find a random assortment
>> of extern and not extern to be even uglier :(
> 
> If we don't want to make fixes backporting a huge headache, some 
> transition with random assortment is the price to pay.
> 
> One day, when 'extern' have become the minority, we can get rid of the 
> few last ones.
> 
> But if someone believe it is not such a problem with backporting, I can 
> provide a cleanup patch now.

I can't really see declarations being a big problem for backporting
or git history. But maybe Michael won't like a patch.

I will try to remember externs though.

Thanks,
Nick


Re: [PATCH] arch: vdso: add vdso linker script to 'targets' instead of extra-y

2020-09-07 Thread Masahiro Yamada
On Tue, Sep 1, 2020 at 3:23 AM Masahiro Yamada  wrote:
>
> The vdso linker script is preprocessed on demand.
> Adding it to 'targets' is enough to include the .cmd file.
>
> Signed-off-by: Masahiro Yamada 
> ---


Applied to linux-kbuild.


>  arch/arm64/kernel/vdso/Makefile | 2 +-
>  arch/arm64/kernel/vdso32/Makefile   | 2 +-
>  arch/nds32/kernel/vdso/Makefile | 2 +-
>  arch/powerpc/kernel/vdso32/Makefile | 2 +-
>  arch/powerpc/kernel/vdso64/Makefile | 2 +-
>  arch/s390/kernel/vdso64/Makefile| 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index 45d5cfe46429..7cd8aafbe96e 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -54,7 +54,7 @@ endif
>  GCOV_PROFILE := n
>
>  obj-y += vdso.o
> -extra-y += vdso.lds
> +targets += vdso.lds
>  CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>
>  # Force dependency (incbin is bad)
> diff --git a/arch/arm64/kernel/vdso32/Makefile b/arch/arm64/kernel/vdso32/Makefile
> index d6adb4677c25..572475b7b7ed 100644
> --- a/arch/arm64/kernel/vdso32/Makefile
> +++ b/arch/arm64/kernel/vdso32/Makefile
> @@ -155,7 +155,7 @@ asm-obj-vdso := $(addprefix $(obj)/, $(asm-obj-vdso))
>  obj-vdso := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso)
>
>  obj-y += vdso.o
> -extra-y += vdso.lds
> +targets += vdso.lds
>  CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>
>  # Force dependency (vdso.s includes vdso.so through incbin)
> diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> index 7c3c1ccb196e..55df25ef0057 100644
> --- a/arch/nds32/kernel/vdso/Makefile
> +++ b/arch/nds32/kernel/vdso/Makefile
> @@ -20,7 +20,7 @@ GCOV_PROFILE := n
>
>
>  obj-y += vdso.o
> -extra-y += vdso.lds
> +targets += vdso.lds
>  CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>
>  # Force dependency
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 87ab1152d5ce..fd5072a4c73c 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -29,7 +29,7 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
>  asflags-y := -D__VDSO32__ -s
>
>  obj-y += vdso32_wrapper.o
> -extra-y += vdso32.lds
> +targets += vdso32.lds
>  CPPFLAGS_vdso32.lds += -P -C -Upowerpc
>
>  # Force dependency (incbin is bad)
> diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> index 38c317f25141..c737b3ea3207 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -17,7 +17,7 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
>  asflags-y := -D__VDSO64__ -s
>
>  obj-y += vdso64_wrapper.o
> -extra-y += vdso64.lds
> +targets += vdso64.lds
>  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
>  # Force dependency (incbin is bad)
> diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
> index 4a66a1cb919b..d0d406cfffa9 100644
> --- a/arch/s390/kernel/vdso64/Makefile
> +++ b/arch/s390/kernel/vdso64/Makefile
> @@ -25,7 +25,7 @@ $(targets:%=$(obj)/%.dbg): KBUILD_CFLAGS = $(KBUILD_CFLAGS_64)
>  $(targets:%=$(obj)/%.dbg): KBUILD_AFLAGS = $(KBUILD_AFLAGS_64)
>
>  obj-y += vdso64_wrapper.o
> -extra-y += vdso64.lds
> +targets += vdso64.lds
>  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
>  # Disable gcov profiling, ubsan and kasan for VDSO code
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH] scsi: ibmvfc: Fix error return in ibmvfc_probe()

2020-09-07 Thread Jing Xiangfeng
Return the error code from PTR_ERR() in the error handling path instead
of 0.

Signed-off-by: Jing Xiangfeng 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ea7c8930592d..70daa0605082 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4928,6 +4928,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
if (IS_ERR(vhost->work_thread)) {
dev_err(dev, "Couldn't create kernel thread: %ld\n",
PTR_ERR(vhost->work_thread));
+   rc = PTR_ERR(vhost->work_thread);
goto free_host_mem;
}
 
-- 
2.17.1
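
For reference, the pattern behind this class of fix, sketched rather than
quoted from the driver (local names are assumptions): kthread_run() reports
failure through ERR_PTR(), so a probe routine whose rc still holds 0 from an
earlier step must update it before the goto:

	vhost->work_thread = kthread_run(ibmvfc_work, vhost, "%s", IBMVFC_NAME);
	if (IS_ERR(vhost->work_thread)) {
		dev_err(dev, "Couldn't create kernel thread: %ld\n",
			PTR_ERR(vhost->work_thread));
		rc = PTR_ERR(vhost->work_thread);	/* without this, probe returns 0 */
		goto free_host_mem;
	}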



Re: [RFC PATCH 02/12] powerpc: remove arguments from interrupt handler functions

2020-09-07 Thread Christophe Leroy
On Mon, 2020-09-07 at 11:20 +0200, Christophe Leroy wrote:
> 
> Le 05/09/2020 à 19:43, Nicholas Piggin a écrit :
> > Make interrupt handlers all just take the pt_regs * argument and load
> > DAR/DSISR etc from that. Make those that return a value return long.
> 
> I like this, it will likely simplify the VMAP_STACK mess a bit.
> 
> Not sure it is that easy. My board is stuck after the start of init.
> 
> 
> On the 8xx, on Instruction TLB Error exception, we do
> 
>   andis.  r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
> 
> On book3s/32, on ISI exception we do:
>   andis.  r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
> 
> On 40x and bookE, on ISI exception we do:
>   li  r5,0/* Pass zero as arg3 */
> 
> 
> And regs->dsisr will just contain nothing
> 
> So it means we should at least write back r5 into regs->dsisr from there 
> ? The performance impact should be minimal as we already write _DAR so 
> the cache line should already be in the cache.
> 
> A hacky 'stw r5, _DSISR(r1)' in handle_page_fault() does the trick, 
> although we don't want to do it for both ISI and DSI at the end, so 
> you'll have to do it in every head_xxx.S

To get your series to build and work, I did the following hacks:

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index acfcc7d5779b..c11045d3113a 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -93,7 +93,9 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
 {
nmi_exit();
 
+#ifdef CONFIG_PPC64
this_cpu_set_ftrace_enabled(state->ftrace_enabled);
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* Check we didn't change the pending interrupt mask. */
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index f4d0af8e1136..66f7adbe1076 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -663,6 +663,7 @@ ppc_swapcontext:
  */
.globl  handle_page_fault
 handle_page_fault:
+   stw r5,_DSISR(r1)
addir3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_BOOK3S_32
andis.  r0,r5,DSISR_DABRMATCH@h
---

Christophe
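
With that hack in place, the C side can read everything from pt_regs; a
minimal sketch of the handler shape the series is aiming for (the
__do_page_fault() split is assumed for illustration):

long do_page_fault(struct pt_regs *regs)
{
	unsigned long address = regs->dar;	/* stored by the asm prolog */
	unsigned long error_code = regs->dsisr;	/* valid only once the prolog
						 * writes r5 back, as above  */

	return __do_page_fault(regs, address, error_code);
}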



[PATCH v2] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute

2020-09-07 Thread Vaibhav Jain
The newly introduced 'perf_stats' attribute uses the default access
mode of 0444, letting non-root users access performance stats of an
nvdimm and potentially forcing the kernel to issue a large number of
expensive HCALLs. Since the information exposed by this attribute
cannot be cached, it is better to ward off access to this attribute
from users who don't need these performance statistics.

Hence this patch updates access mode of 'perf_stats' attribute to
be only readable by root users.

Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance stats from PHYP')
Reported-by: Aneesh Kumar K.V 
Signed-off-by: Vaibhav Jain 
---
Change-log:

v2:
* Instead of checking for perfmon_capable() inside show_perf_stats()
  set the attribute as DEVICE_ATTR_ADMIN_RO [ Aneesh ]
* Update patch description
---
 arch/powerpc/platforms/pseries/papr_scm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f439f0dfea7d1..a88a707a608aa 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -822,7 +822,7 @@ static ssize_t perf_stats_show(struct device *dev,
kfree(stats);
return rc ? rc : seq_buf_used();
 }
-DEVICE_ATTR_RO(perf_stats);
+DEVICE_ATTR_ADMIN_RO(perf_stats);
 
 static ssize_t flags_show(struct device *dev,
  struct device_attribute *attr, char *buf)
-- 
2.26.2
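
For reference, the distinction the patch relies on, sketched from
include/linux/device.h (simplified; check the header for the exact
definitions):

#define DEVICE_ATTR_RO(_name) \
	struct device_attribute dev_attr_##_name = __ATTR_RO(_name)	/* 0444 */

#define DEVICE_ATTR_ADMIN_RO(_name) \
	struct device_attribute dev_attr_##_name = __ATTR_RO_MODE(_name, 0400)

So the one-line change flips the sysfs mode from world-readable 0444 to
root-only 0400 without touching the show() routine.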



Re: fsl_espi errors on v5.7.15

2020-09-07 Thread Joakim Tjernlund
On Thu, 2020-09-03 at 23:58 +0000, Chris Packham wrote:
> On 1/09/20 6:14 pm, Nicholas Piggin wrote:
> > Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
> > > On 1/09/20 12:33 am, Heiner Kallweit wrote:
> > > > On 30.08.2020 23:59, Chris Packham wrote:
> > > > > On 31/08/20 9:41 am, Heiner Kallweit wrote:
> > > > > > On 30.08.2020 23:00, Chris Packham wrote:
> > > > > > > On 31/08/20 12:30 am, Nicholas Piggin wrote:
> > > > > > > > Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
> > > > > > >
> > > > > > >
> > > > > > > > > > > > > I've also now seen the RX FIFO not empty error on the T2080RDB
> > > > > > > > > > > > >
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> > > > > > > > > > > > > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> > > > > > > > > > > > >
> > > > > > > > > > > > > With my current workaround of emptying the RX FIFO. It seems
> > > > > > > > > > > > > survivable. Interestingly it only ever seems to be 1 extra byte in the
> > > > > > > > > > > > > RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
> > > > > > > > > > > > >
> > > > > > > > > > > > > fsl_espi ffe11.spi: tx 70
> > > > > > > > > > > > > fsl_espi ffe11.spi: rx 03
> > > > > > > > > > > > > fsl_espi ffe11.spi: Extra RX 00
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> > > > > > > > > > > > > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> > > > > > > > > > > > > fsl_espi ffe11.spi: tx 05
> > > > > > > > > > > > > fsl_espi ffe11.spi: rx 00
> > > > > > > > > > > > > fsl_espi ffe11.spi: Extra RX 03
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> > > > > > > > > > > > > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> > > > > > > > > > > > > fsl_espi ffe11.spi: tx 05
> > > > > > > > > > > > > fsl_espi ffe11.spi: rx 00
> > > > > > > > > > > > > fsl_espi ffe11.spi: Extra RX 03
> > > > > > > > > > > > >
> > > > > > > > > > > > > From all the Micron SPI-NOR datasheets I've got access to it is
> > > > > > > > > > > > > possible to continually read the SR/FSR. But I've no idea why it
> > > > > > > > > > > > > happens some times and not others.
> > > > > > > > > > > > So I think I've got a reproduction and I think I've bisected the problem
> > > > > > > > > > > > to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
> > > > > > > > > > > > C"). My day is just finishing now so I haven't applied too much scrutiny
> > > > > > > > > > > > to this result. Given the various rabbit holes I've been down on this
> > > > > > > > > > > > issue already I'd take this information with a good degree of skepticism.
> > > > > > > > > > > >
> > > > > > > > > > > OK, so an easy test should be to re-test with a 5.4 kernel.
> > > > > > > > > > > It doesn't have yet the change you're referring to, and the fsl-espi driver
> > > > > > > > > > > is basically the same as in 5.7 (just two small changes in 5.7).
> > > > > > > > > > There's 6cc0c16d82f88 and maybe also other interrupt related patches
> > > > > > > > > > around this time that could affect book E, so it's good if that exact
> > > > > > > > > > patch is confirmed.
> > > > > > > > > My confirmation is basically that I can induce the issue in a 5.4 kernel
> > > > > > > > > by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
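
For reference, the "emptying the RX FIFO" workaround mentioned near the top
of this thread could look roughly like this (register and helper names are
assumptions in the driver's style, not the actual change):

/* sketch: drain whatever is left in the RX FIFO after a transfer */
static void fsl_espi_drain_rx_sketch(struct fsl_espi *espi)
{
	while (SPIE_RXCNT(fsl_espi_read_reg(espi, ESPI_SPIE)) > 0) {
		u8 extra = fsl_espi_read_reg8(espi, ESPI_SPIRF);

		dev_dbg(espi->dev, "Extra RX %02x\n", extra);
	}
}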

Re: [RFC PATCH 02/12] powerpc: remove arguments from interrupt handler functions

2020-09-07 Thread Christophe Leroy




Le 05/09/2020 à 19:43, Nicholas Piggin a écrit :

Make interrupt handlers all just take the pt_regs * argument and load
DAR/DSISR etc from that. Make those that return a value return long.


I like this, it will likely simplify the VMAP_STACK mess a bit.

Not sure it is that easy. My board is stuck after the start of init.


On the 8xx, on Instruction TLB Error exception, we do

andis.  r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */

On book3s/32, on ISI exception we do:
andis.  r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */

On 40x and bookE, on ISI exception we do:
li  r5,0/* Pass zero as arg3 */


And regs->dsisr will just contain nothing

So it means we should at least write back r5 into regs->dsisr from there 
? The performance impact should be minimal as we already write _DAR so 
the cache line should already be in the cache.


A hacky 'stw r5, _DSISR(r1)' in handle_page_fault() does the trick, 
although we don't want to do it for both ISI and DSI at the end, so 
you'll have to do it in every head_xxx.S



While you are at it, it would probably also make sense to remove the 
address param of bad_page_fault(): there is no point in loading back 
regs->dar in handle_page_fault() and machine_check_8xx() and 
alignment_exception(); just read regs->dar in bad_page_fault().
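
A sketch of that suggestion, assuming the signature change is acceptable
(illustrative only):

void bad_page_fault(struct pt_regs *regs, int sig)
{
	unsigned long address = regs->dar;	/* read here, not passed in */

	/* ... existing body unchanged, using the local 'address' ... */
}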


The case of do_break() should also be looked at.

Why change the return code from int to long?

Christophe



This is done to make the function signatures match more closely, which
will help with a future patch to add wrappers. Explicit arguments could
be re-added for performance in future but that would require more
complex wrapper macros.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
  arch/powerpc/include/asm/bug.h|  4 ++--
  arch/powerpc/kernel/exceptions-64e.S  |  2 --
  arch/powerpc/kernel/exceptions-64s.S  | 14 ++
  arch/powerpc/mm/book3s64/hash_utils.c |  8 +---
  arch/powerpc/mm/book3s64/slb.c| 11 +++
  arch/powerpc/mm/fault.c   | 16 +---
  7 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index d714d83bbc7c..2fa0cf6c6011 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -111,8 +111,8 @@
  #ifndef __ASSEMBLY__
  
  struct pt_regs;

-extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
-extern int hash__do_page_fault(struct pt_regs *, unsigned long, unsigned long);
+extern long do_page_fault(struct pt_regs *);
+extern long hash__do_page_fault(struct pt_regs *);


no extern


  extern void bad_page_fault(struct pt_regs *, unsigned long, int);
  extern void _exception(int, struct pt_regs *, int, unsigned long);
  extern void _exception_pkey(struct pt_regs *, unsigned long, int);


Christophe


Re: [RFC PATCH 09/12] powerpc: move NMI entry/exit code into wrapper

2020-09-07 Thread Christophe Leroy




On 9/5/20 5:43 PM, Nicholas Piggin wrote:

This moves the common NMI entry and exit code into the interrupt handler
wrappers.

This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and
also MCE interrupts on 64e, by adding missing parts of the NMI entry to
them.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/interrupt.h | 26 +++
  arch/powerpc/kernel/mce.c| 12 -
  arch/powerpc/kernel/traps.c  | 38 +---
  arch/powerpc/kernel/watchdog.c   | 10 +++-
  4 files changed, 37 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 83fe1d64cf23..69eb8a432984 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -31,6 +31,27 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs)
  }
  #endif /* CONFIG_PPC_BOOK3S_64 */
  
+struct interrupt_nmi_state {

+#ifdef CONFIG_PPC64
+   u8 ftrace_enabled;
+#endif
+};
+
+static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
+{
+   this_cpu_set_ftrace_enabled(0);
+
+   nmi_enter();
+}
+
+static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
+{
+   nmi_exit();
+
+   this_cpu_set_ftrace_enabled(state->ftrace_enabled);


PPC32 build:

In file included from arch/powerpc/kernel/irq.c:57:0:
./arch/powerpc/include/asm/interrupt.h: In function ‘interrupt_nmi_exit_prepare’:
./arch/powerpc/include/asm/interrupt.h:96:35: error: ‘struct interrupt_nmi_state’ has no member named ‘ftrace_enabled’

  this_cpu_set_ftrace_enabled(state->ftrace_enabled);
   ^


+}
+
+
  /**
   * DECLARE_INTERRUPT_HANDLER_RAW - Declare raw interrupt handler function
   * @func: Function name of the entry point
@@ -177,10 +198,15 @@ static __always_inline long ___##func(struct pt_regs *regs);  \
\
  __visible noinstr long func(struct pt_regs *regs) \
  { \
+   struct interrupt_nmi_state state;   \
long ret;   \
\
+   interrupt_nmi_enter_prepare(regs, );  \
+   \
ret = ___##func (regs); \
\
+   interrupt_nmi_exit_prepare(regs, );   \
+   \
return ret; \
  } \
\


Christophe
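
One way to make it build on PPC32, in line with the hack shown earlier in
this thread, is to guard every access to the field the same way the field
itself is guarded (a sketch, assuming the series' helper names):

static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs,
					       struct interrupt_nmi_state *state)
{
#ifdef CONFIG_PPC64
	/* save, then disable, ftrace around the NMI */
	state->ftrace_enabled = this_cpu_get_ftrace_enabled();
	this_cpu_set_ftrace_enabled(0);
#endif
	nmi_enter();
}

static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs,
					      struct interrupt_nmi_state *state)
{
	nmi_exit();
#ifdef CONFIG_PPC64
	this_cpu_set_ftrace_enabled(state->ftrace_enabled);
#endif
}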


Re: [PATCH -next] powerpc/book3s64: fix link error with CONFIG_PPC_RADIX_MMU=n

2020-09-07 Thread Christophe Leroy




Le 07/09/2020 à 03:51, Yang Yingliang a écrit :


On 2020/9/6 14:50, Christophe Leroy wrote:



Le 05/09/2020 à 13:25, Yang Yingliang a écrit :

Fix link error when CONFIG_PPC_RADIX_MMU is disabled:
powerpc64-linux-gnu-ld: arch/powerpc/platforms/pseries/lpar.o:(.toc+0x0): undefined reference to `mmu_pid_bits'


Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
---
  arch/powerpc/mm/book3s64/mmu_context.c | 4 


In your commit log, you are just mentionning 
arch/powerpc/platforms/pseries/lpar.o, which is right.


You shouldn't need to modify arch/powerpc/mm/book3s64/mmu_context.c at 
all, see below.



  arch/powerpc/platforms/pseries/lpar.c  | 2 ++
  2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index 0ba30b8b935b..a8e292cd88f0 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -152,6 +152,7 @@ void hash__setup_new_exec(void)
    static int radix__init_new_context(struct mm_struct *mm)
  {
+#ifdef CONFIG_PPC_RADIX_MMU


This shouldn't be required. radix__init_new_context() is only called 
when radix_enabled() returns true.
As it is a static function, when it is not called it gets optimised 
away, so you will never get an undefined reference to `mmu_pid_bits` 
there.
powerpc64-linux-gnu-ld: arch/powerpc/mm/book3s64/mmu_context.o:(.toc+0x0): undefined reference to `mmu_pid_bits'
powerpc64-linux-gnu-ld: arch/powerpc/mm/book3s64/mmu_context.o:(.toc+0x8): undefined reference to `mmu_base_pid'



mmu_context.c is always compiled, it uses mmu_pid_bits and mmu_base_pid.


Yes, mmu_context.c is always compiled, but as I explained, 
radix__init_new_context() is defined as 'static' so it is optimised out 
when radix_enabled() returns false because there is no caller in that case.
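
In other words (a simplified sketch of the mechanism, not the file's exact
contents):

static int radix__init_new_context(struct mm_struct *mm)
{
	/* references mmu_pid_bits and mmu_base_pid */
	return ida_alloc_range(&mmu_context_ida, mmu_base_pid,
			       (1 << mmu_pid_bits) - 1, GFP_KERNEL);
}

int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
{
	if (radix_enabled())	/* compile-time false with CONFIG_PPC_RADIX_MMU=n */
		return radix__init_new_context(mm);	/* call and callee discarded */
	return hash__init_new_context(mm);
}

With the call eliminated at compile time, the static function and its
references to mmu_pid_bits/mmu_base_pid never reach the linker.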


I just made the test with ppc64_defconfig + CONFIG_PPC_RADIX_MMU=n (GCC 8.1)

The only failure I got was on lpar.c, which I fixed by enclosing the 
entire radix_init_pseries() in an #ifdef.


Once this is fixed, the build is OK, without any modification to 
mmu_context.c


powerpc64-linux-objdump -x arch/powerpc/mm/book3s64/mmu_context.o shows 
only the following objects in the .toc:


RELOCATION RECORDS FOR [.toc]:
OFFSET   TYPE  VALUE
 R_PPC64_ADDR64kmalloc_caches
0008 R_PPC64_ADDR64vmemmap
0010 R_PPC64_ADDR64__pmd_frag_nr
0018 R_PPC64_ADDR64__pmd_frag_size_shift

mmu_pid_bits and mmu_base_pid are not part of the undefined objects:

 *UND*   vmemmap
 *UND*   .mm_iommu_init
 *UND*   __pmd_frag_nr
 *UND*   .ida_alloc_range
 *UND*   .slb_setup_new_exec
 *UND*   mmu_feature_keys
 *UND*   .memset
 *UND*   .memcpy
 *UND*   .slice_init_new_context_exec
 *UND*   ._mcount
 *UND*   .__free_pages
 *UND*   __pmd_frag_size_shift
 *UND*   .slice_setup_new_exec
 *UND*   .ida_free
 *UND*   .pte_frag_destroy
 *UND*   .kfree
 *UND*   .pkey_mm_init
 *UND*   .kmem_cache_alloc_trace
 *UND*   .__warn_printk
 *UND*   _mcount
 *UND*   kmalloc_caches

Christophe


[PATCH] powerpc/book3s64/hash: Align start/end address correctly with bolt mapping

2020-09-07 Thread Aneesh Kumar K.V
This ensures we don't do a partial mapping of memory. With nvdimm, when
creating namespaces with size not aligned to 16MB, the kernel ends up partially
mapping the pages. This can result in kernel adding multiple hash page table
entries for the same range. A new namespace will result in
create_section_mapping() with start and end overlapping an already existing
bolted hash page table entry.

commit: 6acd7d5ef264 ("libnvdimm/namespace: Enforce memremap_compat_align()")
made sure that we always create namespaces aligned to 16MB. But we can do
better by avoiding mapping pages that are not aligned. This helps to catch
access to these partially mapped pages early.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/hash_utils.c| 12 +---
 arch/powerpc/mm/book3s64/radix_pgtable.c |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index c663e7ba801f..7185bc43b24f 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -260,8 +260,12 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
DBG("htab_bolt_mapping(%lx..%lx -> %lx (%lx,%d,%d)\n",
vstart, vend, pstart, prot, psize, ssize);
 
-   for (vaddr = vstart, paddr = pstart; vaddr < vend;
-vaddr += step, paddr += step) {
+   /* Carefully map only the possible range */
+   vaddr = ALIGN(vstart, step);
+   paddr = ALIGN(pstart, step);
+   vend  = ALIGN_DOWN(vend, step);
+
+   for (; vaddr < vend; vaddr += step, paddr += step) {
unsigned long hash, hpteg;
unsigned long vsid = get_kernel_vsid(vaddr, ssize);
unsigned long vpn  = hpt_vpn(vaddr, vsid, ssize);
@@ -343,7 +347,9 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend,
if (!mmu_hash_ops.hpte_removebolted)
return -ENODEV;
 
-   for (vaddr = vstart; vaddr < vend; vaddr += step) {
+   /* Unmap the full range specified */
+   vaddr = ALIGN_DOWN(vstart, step);
+   for (; vaddr < vend; vaddr += step) {
rc = mmu_hash_ops.hpte_removebolted(vaddr, psize, ssize);
if (rc == -ENOENT) {
ret = -ENOENT;
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index d5f0c10d752a..5c8adeb8c955 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -276,6 +276,7 @@ static int __meminit create_physical_mapping(unsigned long start,
int psize;
 
start = ALIGN(start, PAGE_SIZE);
+   end   = ALIGN_DOWN(end, PAGE_SIZE);
for (addr = start; addr < end; addr += mapping_size) {
unsigned long gap, previous_size;
int rc;
-- 
2.26.2
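
A worked example of what the rounding buys, assuming a 16MB (0x1000000) step:

unsigned long step   = 0x1000000;	/* 16MB mapping step      */
unsigned long vstart = 0x1100000;	/* 16MB + 1MB, unaligned  */
unsigned long vend   = 0x3500000;	/* 48MB + 5MB, unaligned  */

vstart = ALIGN(vstart, step);		/* rounds up   to 0x2000000 */
vend   = ALIGN_DOWN(vend, step);	/* rounds down to 0x3000000 */
/* only [0x2000000, 0x3000000) is bolted: no partial 16MB entries,
 * and accesses to the trimmed edges fault early instead of hitting
 * a half-mapped region */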



[Bug 209029] kernel 5.9-rc2 fails to boot on a PowerMac G5 11,2 - BUG: Kernel NULL pointer dereference on read at 0x00000020

2020-09-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=209029

Zorro Lang (zl...@redhat.com) changed:

   What|Removed |Added

 CC||zl...@redhat.com

--- Comment #5 from Zorro Lang (zl...@redhat.com) ---
*** Bug 209181 has been marked as a duplicate of this bug. ***


[Bug 209181] kernel BUG at arch/powerpc/mm/pgtable.c:304!

2020-09-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=209181

Zorro Lang (zl...@redhat.com) changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Zorro Lang (zl...@redhat.com) ---


*** This bug has been marked as a duplicate of bug 209029 ***


[Bug 209181] kernel BUG at arch/powerpc/mm/pgtable.c:304!

2020-09-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=209181

--- Comment #2 from Zorro Lang (zl...@redhat.com) ---
(In reply to Christophe Leroy from comment #1)
> See https://bugzilla.kernel.org/show_bug.cgi?id=209029
> 
> Patch at
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200902040122.136414-1-aneesh.ku...@linux.ibm.com/
> to deactivate CONFIG_DEBUG_VM_PGTABLE on powerpc until the issue is fixed.

Thanks for this info
