Re: [PATCH v1 08/10] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()

2020-09-01 Thread Leonardo Bras
On Mon, 2020-08-31 at 14:34 +1000, Alexey Kardashevskiy wrote:
> 
> On 29/08/2020 01:25, Leonardo Bras wrote:
> > On Mon, 2020-08-24 at 15:07 +1000, Alexey Kardashevskiy wrote:
> > > On 18/08/2020 09:40, Leonardo Bras wrote:
> > > > Code used to create a ddw property that was previously scattered in
> > > > enable_ddw() is now gathered in ddw_property_create(), which deals with
> > > > allocation and filling of the property, leaving it ready for
> > > > of_add_property(), which now occurs in sequence.
> > > > 
> > > > This created an opportunity to reorganize the second part of 
> > > > enable_ddw():
> > > > 
> > > > Without this patch enable_ddw() does, in order:
> > > > kzalloc() property & members, create_ddw(), fill ddwprop inside 
> > > > property,
> > > > ddw_list_add(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> > > > of_add_property().
> > > > 
> > > > With this patch enable_ddw() does, in order:
> > > > create_ddw(), ddw_property_create(), of_add_property(), ddw_list_add(),
> > > > do tce_setrange_multi_pSeriesLP_walk in all memory.
> > > > 
> > > > This change requires of_remove_property() in case anything fails after
> > > > of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
> > > > in all memory, which looks like the most expensive operation, only if
> > > > everything else succeeds.
> > > > 
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >  arch/powerpc/platforms/pseries/iommu.c | 97 +++---
> > > >  1 file changed, 57 insertions(+), 40 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > > > b/arch/powerpc/platforms/pseries/iommu.c
> > > > index 4031127c9537..3a1ef02ad9d5 100644
> > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > @@ -1123,6 +1123,31 @@ static void reset_dma_window(struct pci_dev 
> > > > *dev, struct device_node *par_dn)
> > > >  ret);
> > > >  }
> > > >  
> > > > +static int ddw_property_create(struct property **ddw_win, const char 
> > > > *propname,
> > > 
> > > @propname is always the same, do you really want to pass it every time?
> > 
> > I think it reads better, like "create a ddw property with this name".
> 
> This reads as "there are at least two ddw properties".
> 
> > Also, it makes possible to create ddw properties with other names, in
> > case we decide to create properties with different names depending on
> > the window created.
> 
> It is one window at any given moment, why call it different names... I
> get the part that it is not always "direct" anymore but still...
> 

That seems to be the case with one of the options you suggested on patch [09/10]:

>> I suspect it breaks kexec (from older kernel to this one) so you
>> either need to check for both DT names or just keep the old one.

> 
> > Also, it's probably optimized / inlined at this point.
> > Is it ok doing it like this?
> > 
> > > > +  u32 liobn, u64 dma_addr, u32 page_shift, 
> > > > u32 window_shift)
> > > > +{
> > > > +   struct dynamic_dma_window_prop *ddwprop;
> > > > +   struct property *win64;
> > > > +
> > > > +   *ddw_win = win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
> > > > +   if (!win64)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   win64->name = kstrdup(propname, GFP_KERNEL);
> > > 
> > > Not clear why "win64->name = DIRECT64_PROPNAME" would not work here; the
> > > generic OF code does not try to kfree() it, but that is probably out of
> > > scope here.
> > 
> > Yeah, I had that question too. 
> > Previous code was like that, and I was trying not to mess too much with
> > how it's done.
> > 
> > > > +   ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> > > > +   win64->value = ddwprop;
> > > > +   win64->length = sizeof(*ddwprop);
> > > > +   if (!win64->name || !win64->value)
> > > > +   return -ENOMEM;
> > > 
> > > Up to 2 memory leaks here. I see the cleanup at "out_free_prop:" but
> > > still looks fragile. Instead you could simply return win64 as the only
> > > error possible here is -ENOMEM and returning NULL is equally good.
> > 
> > I agree. It's better if this function has its own cleanup routine.
> > It will be fixed for next version.
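
Something like this untested sketch is what I have in mind for the next
version: return the property directly (NULL on failure) and do all the
cleanup locally, so the caller no longer needs the out_free_prop: label
for this path. Names match the patch above.

static struct property *ddw_property_create(const char *propname, u32 liobn,
					    u64 dma_addr, u32 page_shift,
					    u32 window_shift)
{
	struct dynamic_dma_window_prop *ddwprop;
	struct property *win64;

	win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
	if (!win64)
		return NULL;

	win64->name = kstrdup(propname, GFP_KERNEL);
	ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
	win64->value = ddwprop;
	win64->length = sizeof(*ddwprop);
	if (!win64->name || !win64->value) {
		/* clean up locally instead of leaking name/value */
		kfree(win64->name);
		kfree(win64->value);
		kfree(win64);
		return NULL;
	}

	ddwprop->liobn = cpu_to_be32(liobn);
	ddwprop->dma_base = cpu_to_be64(dma_addr);
	ddwprop->tce_shift = cpu_to_be32(page_shift);
	ddwprop->window_shift = cpu_to_be32(window_shift);

	return win64;
}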
> > 
> > > 
> > > > +
> > > > +   ddwprop->liobn = cpu_to_be32(liobn);
> > > > +   ddwprop->dma_base = cpu_to_be64(dma_addr);
> > > > +   ddwprop->tce_shift = cpu_to_be32(page_shift);
> > > > +   ddwprop->window_shift = cpu_to_be32(window_shift);
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > >  /*
> > > >   * If the PE supports dynamic dma windows, and there is space for a 
> > > > table
> > > >   * that can map all pages in a linear offset, then setup such a table,
> > > > @@ -1140,12 +1165,11 @@ static bool enable_ddw(struct pci_dev *dev, 
> > > > struct device_node *pdn)
> > > > struct ddw_query_response query;
> > > > struct ddw_create_response 

Re: [PATCH 0/2] link vdso with linker

2020-09-01 Thread Nathan Chancellor
On Tue, Sep 01, 2020 at 03:25:21PM -0700, Nick Desaulniers wrote:
> Kees Cook is working on a series that adds --orphan-section=warn to arm,
> arm64, and x86.  I noticed that the ppc vdsos were still using cc-ldoption
> for these, which I removed.  It seems this results in that flag being
> silently dropped.
> 
> I'm very confident with the first patch, but the second needs closer
> review around the error mentioned below the fold related to the .got
> section.
> 
> Nick Desaulniers (2):
>   powerpc/vdso64: link vdso64 with linker
>   powerpc/vdso32: link vdso32 with linker
> 
>  arch/powerpc/include/asm/vdso.h | 17 ++---
>  arch/powerpc/kernel/vdso32/Makefile |  7 +--
>  arch/powerpc/kernel/vdso32/vdso32.lds.S |  3 ++-
>  arch/powerpc/kernel/vdso64/Makefile |  8 ++--
>  arch/powerpc/kernel/vdso64/vdso64.lds.S |  1 -
>  5 files changed, 15 insertions(+), 21 deletions(-)
> 
> -- 
> 2.28.0.402.g5ffc5be6b7-goog
> 

ppc44x_defconfig and powernv_defconfig start failing with this series
when LD=ld.lld is used.


$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- LLVM=1 
O=out/ppc32 distclean ppc44x_defconfig uImage
ld.lld: error: relocation R_PPC_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso32/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso32/gettimeofday.o:(__kernel_gettimeofday)

ld.lld: error: relocation R_PPC_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso32/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso32/gettimeofday.o:(__kernel_clock_gettime)

ld.lld: error: relocation R_PPC_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso32/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso32/gettimeofday.o:(__kernel_clock_getres)

ld.lld: error: relocation R_PPC_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso32/datapage.o
>>> referenced by arch/powerpc/kernel/vdso32/gettimeofday.o:(__kernel_time)
...


$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 
O=out/ppc64le distclean powernv_defconfig zImage.epapr
ld.lld: error: relocation R_PPC64_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso64/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso64/gettimeofday.o:(__kernel_gettimeofday)

ld.lld: error: relocation R_PPC64_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso64/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso64/gettimeofday.o:(__kernel_clock_gettime)

ld.lld: error: relocation R_PPC64_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso64/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso64/gettimeofday.o:(__kernel_clock_getres)

ld.lld: error: relocation R_PPC64_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso64/datapage.o
>>> referenced by arch/powerpc/kernel/vdso64/gettimeofday.o:(__kernel_time)

ld.lld: error: relocation R_PPC64_REL16_LO cannot be used against symbol 
__kernel_datapage_offset; recompile with -fPIC
>>> defined in arch/powerpc/kernel/vdso64/datapage.o
>>> referenced by 
>>> arch/powerpc/kernel/vdso64/cacheflush.o:(__kernel_sync_dicache)
...


We need Fangrui's patch to fix ppc44x_defconfig:

https://lore.kernel.org/lkml/20200205005054.k72fuikf6rwrg...@google.com/

That exact same fix is needed in arch/powerpc/kernel/vdso64/datapage.S
to fix powernv_defconfig.
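
If I'm remembering Fangrui's patch correctly, the gist of it is marking
__kernel_datapage_offset as hidden so that the linker treats the symbol
as non-preemptible and accepts the REL16 relocations, i.e. something
like this in each datapage.S (sketch from memory, see the lore link
above for the real hunk):

 	.text
+	.hidden __kernel_datapage_offset
 	.global	__kernel_datapage_offset;
 __kernel_datapage_offset:
 	.long	0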

Cheers,
Nathan


Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests

2020-09-01 Thread Ravi Bangoria

Hi Paul,

On 9/2/20 8:02 AM, Paul Mackerras wrote:

On Thu, Jul 23, 2020 at 03:50:51PM +0530, Ravi Bangoria wrote:

Patches #1, #2 and #3 enable the p10 2nd DAWR feature for Book3S kvm guests.
DAWR is a hypervisor resource and thus the H_SET_MODE hcall is used to
set/unset it. A new case, H_SET_MODE_RESOURCE_SET_DAWR1, is introduced in
the H_SET_MODE hcall for setting/unsetting the 2nd DAWR. Also, a new
capability, KVM_CAP_PPC_DAWR1, has been added to query 2nd DAWR support
via kvm ioctl.
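
The new H_SET_MODE case ends up looking roughly like this (simplified
sketch; the exact checks are in patch #2 itself):

	case H_SET_MODE_RESOURCE_SET_DAWR1:
		if (!kvmppc_power8_compatible(vcpu))
			return H_P2;
		if (!ppc_breakpoint_available())
			return H_P2;
		if (!cpu_has_feature(CPU_FTR_DAWR1))
			return H_P2;
		if (mflags)
			return H_UNSUPPORTED_FLAG_START;
		if (value2 & DABRX_HYP)
			return H_P4;
		vcpu->arch.dawr1 = value1;
		vcpu->arch.dawrx1 = value2;
		return H_SUCCESS;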

This feature also needs to be enabled in Qemu to really use it. I'll reply
with a link to the qemu patches once I post them on the qemu-devel mailing
list.

Patches #4, #5, #6 and #7 add selftests to test the 2nd DAWR.


If/when you resubmit these patches, please split the KVM patches into
a separate series, since the KVM patches would go via my tree whereas
I expect the selftests/powerpc patches would go through Michael
Ellerman's tree.


Sure. Will split it.

Thanks,
Ravi


Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR

2020-09-01 Thread Ravi Bangoria

Hi Paul,


diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 33793444144c..03f401d7be41 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -538,6 +538,8 @@ struct hv_guest_state {
s64 tb_offset;
u64 dawr0;
u64 dawrx0;
+   u64 dawr1;
+   u64 dawrx1;
u64 ciabr;
u64 hdec_expiry;
u64 purr;


After this struct there is a macro, HV_GUEST_STATE_VERSION. I guess that
also needs to be incremented because I'm adding new members to the struct?
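
i.e. something like this, assuming the macro currently sits at 1:

-#define HV_GUEST_STATE_VERSION	1
+#define HV_GUEST_STATE_VERSION	2	/* grew dawr1/dawrx1 */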

Thanks,
Ravi


Re: [PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables

2020-09-01 Thread Ravi Bangoria

Hi Paul,

On 9/2/20 7:19 AM, Paul Mackerras wrote:

On Thu, Jul 23, 2020 at 03:50:52PM +0530, Ravi Bangoria wrote:

Power10 is introducing second DAWR. Use real register names (with
suffix 0) from ISA for current macros and variables used by kvm.


Most of this looks fine, but I think we should not change the existing
names in arch/powerpc/include/uapi/asm/kvm.h (and therefore also
Documentation/virt/kvm/api.rst).


I missed that I'm changing uapi. I'll rename only those macros/variables
which are not uapi.

Thanks,
Ravi


Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR

2020-09-01 Thread Ravi Bangoria

Hi Paul,

On 9/2/20 7:31 AM, Paul Mackerras wrote:

On Thu, Jul 23, 2020 at 03:50:53PM +0530, Ravi Bangoria wrote:

kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.


Is this the same interface as will be defined in PAPR and available
under PowerVM, or is it a new/different interface for KVM?


Yes, the kvm hcall interface for the 2nd DAWR is the same as PowerVM's, as defined in PAPR.




Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.


In general QEMU wants to be able to control all aspects of the virtual
machine presented to the guest, meaning that just because a host has a
particular hardware capability does not mean we should automatically
present that capability to the guest.

In this case, QEMU will want a way to control whether the guest sees
the availability of the second DAWR/X registers or not, i.e. whether a
H_SET_MODE to set DAWR[X]1 will succeed or fail.


Patch #3 adds a new kvm capability, KVM_CAP_PPC_DAWR1, that can be checked
by Qemu. Also, as suggested by David in the Qemu patch[1], I'm planning to
add a new machine capability in Qemu:

  -machine cap-dawr1=ON/OFF

cap-dawr1 will default to ON when PPC_FEATURE2_ARCH_3_10 is set and OFF
otherwise.

Is this the correct approach?

[1]: https://lore.kernel.org/kvm/20200724045613.ga8...@umbus.fritz.box

Thanks,
Ravi


Re: [PATCH -next] powerpc: Convert to DEFINE_SHOW_ATTRIBUTE

2020-09-01 Thread Paul Mackerras
On Thu, Jul 16, 2020 at 05:07:12PM +0800, Qinglang Miao wrote:
> From: Chen Huang 
> 
> Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Chen Huang 

For the arch/powerpc/kvm part:

Acked-by: Paul Mackerras 

I expect Michael Ellerman will take the patch through his tree.

Paul.


[PATCH v6 8/8] powerpc/watchpoint/selftests: Tests for kernel accessing user memory

2020-09-01 Thread Ravi Bangoria
Introduce tests to cover simple scenarios where the user is watching
memory which can also be accessed by the kernel. We also support
_MODE_EXACT with the _SETHWDEBUG interface. Move those testcases
outside of the _BP_RANGE condition. This will help to test _MODE_EXACT
scenarios when CONFIG_HAVE_HW_BREAKPOINT is not set, eg:

  $ ./ptrace-hwbreak
  ...
  PTRACE_SET_DEBUGREG, Kernel Access Userspace, len: 8: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace, len: 1: Ok
  success: ptrace-hwbreak

Suggested-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 ++-
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index fc477dfe86a2..2e0d86e0687e 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "ptrace.h"
 
 #define SPRN_PVR   0x11F
@@ -44,6 +46,7 @@ struct gstruct {
 };
 static volatile struct gstruct gstruct __attribute__((aligned(512)));
 
+static volatile char cwd[PATH_MAX] __attribute__((aligned(8)));
 
 static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo)
 {
@@ -138,6 +141,9 @@ static void test_workload(void)
write_var(len);
}
 
+   /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */
+   syscall(__NR_getcwd, &cwd, PATH_MAX);
+
/* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO test */
write_var(1);
 
@@ -150,6 +156,9 @@ static void test_workload(void)
else
read_var(1);
 
+   /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */
+   syscall(__NR_getcwd, &cwd, PATH_MAX);
+
/* PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO test */
gstruct.a[rand() % A_LEN] = 'a';
 
@@ -293,6 +302,24 @@ static int test_set_debugreg(pid_t child_pid)
return 0;
 }
 
+static int test_set_debugreg_kernel_userspace(pid_t child_pid)
+{
+   unsigned long wp_addr = (unsigned long)cwd;
+   char *name = "PTRACE_SET_DEBUGREG";
+
+   /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */
+   wp_addr &= ~0x7UL;
+   wp_addr |= (1Ul << DABR_READ_SHIFT);
+   wp_addr |= (1UL << DABR_WRITE_SHIFT);
+   wp_addr |= (1UL << DABR_TRANSLATION_SHIFT);
+   ptrace_set_debugreg(child_pid, wp_addr);
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "Kernel Access Userspace", wp_addr, 8);
+
+   ptrace_set_debugreg(child_pid, 0);
+   return 0;
+}
+
 static void get_ppc_hw_breakpoint(struct ppc_hw_breakpoint *info, int type,
  unsigned long addr, int len)
 {
@@ -338,6 +365,22 @@ static void test_sethwdebug_exact(pid_t child_pid)
ptrace_delhwdebug(child_pid, wh);
 }
 
+static void test_sethwdebug_exact_kernel_userspace(pid_t child_pid)
+{
+   struct ppc_hw_breakpoint info;
+   unsigned long wp_addr = (unsigned long)&cwd;
+   char *name = "PPC_PTRACE_SETHWDEBUG, MODE_EXACT";
+   int len = 1; /* hardcoded in kernel */
+   int wh;
+
+   /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */
+   get_ppc_hw_breakpoint(&info, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr, 0);
+   wh = ptrace_sethwdebug(child_pid, &info);
+   ptrace(PTRACE_CONT, child_pid, NULL, 0);
+   check_success(child_pid, name, "Kernel Access Userspace", wp_addr, len);
+   ptrace_delhwdebug(child_pid, wh);
+}
+
 static void test_sethwdebug_range_aligned(pid_t child_pid)
 {
struct ppc_hw_breakpoint info;
@@ -452,9 +495,10 @@ static void
 run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr)
 {
test_set_debugreg(child_pid);
+   test_set_debugreg_kernel_userspace(child_pid);
+   test_sethwdebug_exact(child_pid);
+   test_sethwdebug_exact_kernel_userspace(child_pid);
if (dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_RANGE) {
-   test_sethwdebug_exact(child_pid);
-
test_sethwdebug_range_aligned(child_pid);
if (dawr || is_8xx) {
test_sethwdebug_range_unaligned(child_pid);
-- 
2.26.2



[PATCH v6 7/8] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31

2020-09-01 Thread Ravi Bangoria
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether
we are running on an ISA 3.1 compliant machine, which is needed to
determine DAR behaviour, the 512 byte boundary limit, etc. This was
requested by Pedro Miraglia Franco de Carvalho for extending
watchpoint features in gdb. Note that availability of the 2nd DAWR is
independent of this flag and should be checked using
ppc_debug_info->num_data_bps.
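
A debugger would check it along these lines (minimal sketch, error
handling omitted):

	struct ppc_debug_info dbginfo;

	ptrace(PPC_PTRACE_GETHWDBGINFO, pid, NULL, &dbginfo);
	if (dbginfo.features & PPC_DEBUG_FEATURE_DATA_BP_ARCH_31)
		/* apply ISA 3.1 rules: DAR behaviour, 512 byte boundary */;
	if (dbginfo.num_data_bps > 1)
		/* a 2nd DAWR is available */;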

Signed-off-by: Ravi Bangoria 
---
 Documentation/powerpc/ptrace.rst  | 1 +
 arch/powerpc/include/uapi/asm/ptrace.h| 1 +
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst
index 864d4b61..77725d69eb4a 100644
--- a/Documentation/powerpc/ptrace.rst
+++ b/Documentation/powerpc/ptrace.rst
@@ -46,6 +46,7 @@ features will have bits indicating whether there is support 
for::
   #define PPC_DEBUG_FEATURE_DATA_BP_RANGE  0x4
   #define PPC_DEBUG_FEATURE_DATA_BP_MASK   0x8
   #define PPC_DEBUG_FEATURE_DATA_BP_DAWR   0x10
+  #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31   0x20
 
 2. PTRACE_SETHWDEBUG
 
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
b/arch/powerpc/include/uapi/asm/ptrace.h
index f5f1ccc740fc..7004cfea3f5f 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -222,6 +222,7 @@ struct ppc_debug_info {
 #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004
 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008
 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010
+#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31  0x0020
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 48c52426af80..aa36fcad36cd 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo)
} else {
dbginfo->features = 0;
}
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31;
 }
 
 int ptrace_get_debugreg(struct task_struct *child, unsigned long addr,
-- 
2.26.2



[PATCH v6 6/8] powerpc/watchpoint: Add hw_len wherever missing

2020-09-01 Thread Ravi Bangoria
There are a couple of places where we set len but not hw_len. For
ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len
will be calculated and set internally while parsing the watchpoint.
But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set
'hw_len'. Similarly for xmon, hw_len needs to be set directly.

Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned 
target")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 1 +
 arch/powerpc/xmon/xmon.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index c9122ed91340..48c52426af80 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -219,6 +219,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct 
ppc_hw_breakpoint *bp_inf
brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE);
brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL;
brk.len = DABR_MAX_LEN;
+   brk.hw_len = DABR_MAX_LEN;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
brk.type |= HW_BRK_TYPE_READ;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index df7bca00f5ec..55c43a6c9111 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -969,6 +969,7 @@ static void insert_cpu_bpts(void)
brk.address = dabr[i].address;
brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | 
HW_BRK_TYPE_PRIV_ALL;
brk.len = 8;
+   brk.hw_len = 8;
__set_breakpoint(i, &brk);
}
}
-- 
2.26.2



[PATCH v6 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N

2020-09-01 Thread Ravi Bangoria
On powerpc, a ptrace watchpoint works in one-shot mode, i.e. the kernel
disables the event every time it fires and the user has to re-enable it.
Also, in the case of a ptrace watchpoint, the kernel notifies the ptrace
user before executing the instruction.

With CONFIG_HAVE_HW_BREAKPOINT=N, the kernel fails to disable the
ptrace event and thus causes an infinite loop of exceptions. This is
especially harmful when the user watches data which is also read/written
by the kernel, eg syscall parameters. In such cases, the infinite
exceptions happen in kernel mode, which causes a soft-lockup.

Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR 
breakpoint registers")
Reported-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hw_breakpoint.h  |  3 ++
 arch/powerpc/kernel/process.c | 48 +++
 arch/powerpc/kernel/ptrace/ptrace-noadv.c |  4 +-
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 81872c420476..abebfbee5b1c 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -18,6 +18,7 @@ struct arch_hw_breakpoint {
u16 type;
u16 len; /* length of the target data symbol */
u16 hw_len; /* length programmed in hw */
+   u8  flags;
 };
 
 /* Note: Don't change the first 6 bits below as they are in the same order
@@ -37,6 +38,8 @@ struct arch_hw_breakpoint {
 #define HW_BRK_TYPE_PRIV_ALL   (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
 HW_BRK_TYPE_HYP)
 
+#define HW_BRK_FLAG_DISABLED   0x1
+
 /* Minimum granularity */
 #ifdef CONFIG_PPC_8xx
 #define HW_BREAKPOINT_SIZE  0x4
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..160fbbf41d40 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long 
address,
(void __user *)address);
 }
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
+
+static void do_break_handler(struct pt_regs *regs)
+{
+   struct arch_hw_breakpoint null_brk = {0};
+   struct arch_hw_breakpoint *info;
+   struct ppc_inst instr = ppc_inst(0);
+   int type = 0;
+   int size = 0;
+   unsigned long ea;
+   int i;
+
+   /*
+* If underneath hw supports only one watchpoint, we know it
+* caused exception. 8xx also falls into this category.
+*/
+   if (nr_wp_slots() == 1) {
+   __set_breakpoint(0, &null_brk);
+   current->thread.hw_brk[0] = null_brk;
+   current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED;
+   return;
+   }
+
+   /* Otherwise findout which DAWR caused exception and disable it. */
+   wp_get_instr_detail(regs, &instr, &type, &size, &ea);
+
+   for (i = 0; i < nr_wp_slots(); i++) {
+   info = &current->thread.hw_brk[i];
+   if (!info->address)
+   continue;
+
+   if (wp_check_constraints(regs, instr, ea, type, size, info)) {
+   __set_breakpoint(i, &null_brk);
+   current->thread.hw_brk[i] = null_brk;
+   current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED;
+   }
+   }
+}
+
 void do_break (struct pt_regs *regs, unsigned long address,
unsigned long error_code)
 {
@@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address,
if (debugger_break_match(regs))
return;
 
+   /*
+* We reach here only when watchpoint exception is generated by ptrace
+* event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set,
+* watchpoint is already handled by hw_breakpoint_handler() so we don't
+* have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set,
+* we need to manually handle the watchpoint here.
+*/
+   if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT))
+   do_break_handler(regs);
+
/* Deliver the signal to userspace */
force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address);
 }
diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 57a0ab822334..c9122ed91340 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -286,11 +286,13 @@ long ppc_del_hwdebug(struct task_struct *child, long data)
}
return ret;
 #else /* CONFIG_HAVE_HW_BREAKPOINT */
-   if (child->thread.hw_brk[data - 1].address == 0)
+   if (!(child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) &&
+   child->thread.hw_brk[data - 1].address == 0)
return -ENOENT;
 
child->thread.hw_brk[data - 1].address = 0;

[PATCH v6 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c

2020-09-01 Thread Ravi Bangoria
Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused
the exception. So we have a sw logic to detect that in hw_breakpoint.c.
But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y.
Move DAWR detection logic outside of hw_breakpoint.c so that it can be
reused when CONFIG_HAVE_HW_BREAKPOINT is not set.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hw_breakpoint.h  |   8 +
 arch/powerpc/kernel/Makefile  |   3 +-
 arch/powerpc/kernel/hw_breakpoint.c   | 159 +
 .../kernel/hw_breakpoint_constraints.c| 162 ++
 4 files changed, 174 insertions(+), 158 deletions(-)
 create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 9b68eafebf43..81872c420476 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -10,6 +10,7 @@
 #define _PPC_BOOK3S_64_HW_BREAKPOINT_H
 
 #include 
+#include 
 
 #ifdef __KERNEL__
 struct arch_hw_breakpoint {
@@ -52,6 +53,13 @@ static inline int nr_wp_slots(void)
return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1;
 }
 
+bool wp_check_constraints(struct pt_regs *regs, struct ppc_inst instr,
+ unsigned long ea, int type, int size,
+ struct arch_hw_breakpoint *info);
+
+void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
+int *type, int *size, unsigned long *ea);
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include 
 #include 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index cbf41fb4ee89..a5550c2b24c4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -45,7 +45,8 @@ obj-y := cputable.o syscalls.o \
   signal.o sysfs.o cacheinfo.o time.o \
   prom.o traps.o setup-common.o \
   udbg.o misc.o io.o misc_$(BITS).o \
-  of_platform.o prom_parse.o firmware.o
+  of_platform.o prom_parse.o firmware.o \
+  hw_breakpoint_constraints.o
 obj-y  += ptrace/
 obj-$(CONFIG_PPC64)+= setup_64.o \
   paca.o nvram_64.o note.o syscall_64.o
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index f6b24838ca3c..f4e8f21046f5 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -494,161 +494,6 @@ void thread_change_pc(struct task_struct *tsk, struct 
pt_regs *regs)
}
 }
 
-static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint 
*info)
-{
-   return ((info->address <= dar) && (dar - info->address < info->len));
-}
-
-static bool ea_user_range_overlaps(unsigned long ea, int size,
-  struct arch_hw_breakpoint *info)
-{
-   return ((ea < info->address + info->len) &&
-   (ea + size > info->address));
-}
-
-static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info)
-{
-   unsigned long hw_start_addr, hw_end_addr;
-
-   hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE);
-   hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE);
-
-   return ((hw_start_addr <= dar) && (hw_end_addr > dar));
-}
-
-static bool ea_hw_range_overlaps(unsigned long ea, int size,
-struct arch_hw_breakpoint *info)
-{
-   unsigned long hw_start_addr, hw_end_addr;
-   unsigned long align_size = HW_BREAKPOINT_SIZE;
-
-   /*
-* On p10 predecessors, quadword is handle differently then
-* other instructions.
-*/
-   if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16)
-   align_size = HW_BREAKPOINT_SIZE_QUADWORD;
-
-   hw_start_addr = ALIGN_DOWN(info->address, align_size);
-   hw_end_addr = ALIGN(info->address + info->len, align_size);
-
-   return ((ea < hw_end_addr) && (ea + size > hw_start_addr));
-}
-
-/*
- * If hw has multiple DAWR registers, we also need to check all
- * dawrx constraint bits to confirm this is _really_ a valid event.
- * If type is UNKNOWN, but privilege level matches, consider it as
- * a positive match.
- */
-static bool check_dawrx_constraints(struct pt_regs *regs, int type,
-   struct arch_hw_breakpoint *info)
-{
-   if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ))
-   return false;
-
-   /*
-* The Cache Management instructions other than dcbz never
-* cause a match. i.e. if type is CACHEOP, the instruction
-* is dcbz, and dcbz is treated as Store.
-*/
-   if ((OP_IS_STORE(type) || type == CACHEOP) && !(info->type & 

[PATCH v6 3/8] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N

2020-09-01 Thread Ravi Bangoria
When the kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, the user
can still create a watchpoint using PPC_PTRACE_SETHWDEBUG, with limited
functionality. But such watchpoints never fire because of the missing
privilege settings. Fix that.

It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak
any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will
also help to find scenarios when kernel accesses user memory.

Reported-by: Pedro Miraglia Franco de Carvalho 
Suggested-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 697c7e4b5877..57a0ab822334 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct 
ppc_hw_breakpoint *bp_inf
return -EIO;
 
brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE);
-   brk.type = HW_BRK_TYPE_TRANSLATE;
+   brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL;
brk.len = DABR_MAX_LEN;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
brk.type |= HW_BRK_TYPE_READ;
-- 
2.26.2



[PATCH v6 2/8] powerpc/watchpoint: Fix handling of vector instructions

2020-09-01 Thread Ravi Bangoria
Vector load/store instructions are special because they are always
aligned. Thus an unaligned EA needs to be aligned down before comparing
it with the watch ranges. Otherwise we might consider a valid event
invalid.

Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than 
one watchpoint")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/hw_breakpoint.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 9f7df1c37233..f6b24838ca3c 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -644,6 +644,8 @@ static void get_instr_detail(struct pt_regs *regs, struct 
ppc_inst *instr,
if (*type == CACHEOP) {
*size = cache_op_size();
*ea &= ~(*size - 1);
+   } else if (*type == LOAD_VMX || *type == STORE_VMX) {
+   *ea &= ~(*size - 1);
}
 }
 
-- 
2.26.2



[PATCH v6 1/8] powerpc/watchpoint: Fix quadword instruction handling on p10 predecessors

2020-09-01 Thread Ravi Bangoria
On p10 predecessors, a watchpoint with quadword access is compared at
quadword length. If the watch range is a doubleword or less within the
first half of a quadword-aligned 16 bytes, and there is an unaligned
quadword access which touches only the 2nd half, the handler should
consider it extraneous and emulate/single-step it before continuing.

Reported-by: Pedro Miraglia Franco de Carvalho 
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than 
one watchpoint")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hw_breakpoint.h |  1 +
 arch/powerpc/kernel/hw_breakpoint.c  | 12 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index db206a7f38e2..9b68eafebf43 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -42,6 +42,7 @@ struct arch_hw_breakpoint {
 #else
 #define HW_BREAKPOINT_SIZE  0x8
 #endif
+#define HW_BREAKPOINT_SIZE_QUADWORD0x10
 
 #define DABR_MAX_LEN   8
 #define DAWR_MAX_LEN   512
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 1f4a1efa0074..9f7df1c37233 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -520,9 +520,17 @@ static bool ea_hw_range_overlaps(unsigned long ea, int 
size,
 struct arch_hw_breakpoint *info)
 {
unsigned long hw_start_addr, hw_end_addr;
+   unsigned long align_size = HW_BREAKPOINT_SIZE;
 
-   hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE);
-   hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE);
+   /*
+* On p10 predecessors, quadword is handle differently then
+* other instructions.
+*/
+   if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16)
+   align_size = HW_BREAKPOINT_SIZE_QUADWORD;
+
+   hw_start_addr = ALIGN_DOWN(info->address, align_size);
+   hw_end_addr = ALIGN(info->address + info->len, align_size);
 
return ((ea < hw_end_addr) && (ea + size > hw_start_addr));
 }
-- 
2.26.2



[PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag

2020-09-01 Thread Ravi Bangoria
Patch #1 fixes an issue with quadword instructions on p10 predecessors.
Patch #2 fixes an issue with vector instructions.
Patch #3 fixes a bug where a watchpoint created with ptrace
 PPC_PTRACE_SETHWDEBUG does not fire when CONFIG_HAVE_HW_BREAKPOINT=N.
 The fix uses HW_BRK_TYPE_PRIV_ALL for the ptrace user which, I
 guess, should be fine because we don't leak any kernel
 addresses and PRIV_ALL will also help to cover scenarios when
 the kernel accesses user memory.
Patches #4 and #5 fix an infinite exception bug; again, the bug happens
 only with CONFIG_HAVE_HW_BREAKPOINT=N.
Patch #6 fixes two places where we fail to set hw_len.
Patch #7 introduces the new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31,
 which will be set when running on an ISA 3.1 compliant machine.
Patch #8 finally adds selftests to test scenarios fixed by patches #2 and
 #3, and also moves the MODE_EXACT tests outside of the BP_RANGE
 condition.

Christophe, let me know if this series breaks something for 8xx.

v5: 
https://lore.kernel.org/r/20200825043617.1073634-1-ravi.bango...@linux.ibm.com

v5->v6:
 - Fix build failure reported by kernel test robot
 - patch #5. Use more compact if condition, suggested by Christophe


Ravi Bangoria (8):
  powerpc/watchpoint: Fix quadword instruction handling on p10
predecessors
  powerpc/watchpoint: Fix handling of vector instructions
  powerpc/watchpoint/ptrace: Fix SETHWDEBUG when
CONFIG_HAVE_HW_BREAKPOINT=N
  powerpc/watchpoint: Move DAWR detection logic outside of
hw_breakpoint.c
  powerpc/watchpoint: Fix exception handling for
CONFIG_HAVE_HW_BREAKPOINT=N
  powerpc/watchpoint: Add hw_len wherever missing
  powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
  powerpc/watchpoint/selftests: Tests for kernel accessing user memory

 Documentation/powerpc/ptrace.rst  |   1 +
 arch/powerpc/include/asm/hw_breakpoint.h  |  12 ++
 arch/powerpc/include/uapi/asm/ptrace.h|   1 +
 arch/powerpc/kernel/Makefile  |   3 +-
 arch/powerpc/kernel/hw_breakpoint.c   | 149 +---
 .../kernel/hw_breakpoint_constraints.c| 162 ++
 arch/powerpc/kernel/process.c |  48 ++
 arch/powerpc/kernel/ptrace/ptrace-noadv.c |   9 +-
 arch/powerpc/xmon/xmon.c  |   1 +
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c |  48 +-
 10 files changed, 282 insertions(+), 152 deletions(-)
 create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c

-- 
2.26.2



[PATCH v3] powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc

2020-09-01 Thread Aneesh Kumar K.V
The test is broken w.r.t. the page table update rules and results in a
kernel crash as below. Disable the support until the tests are updated.

[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V 
---
 Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 2 +-
 arch/powerpc/Kconfig   | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt 
b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
index 53da483c8326..1c49723e7534 100644
--- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -22,7 +22,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: |  ok  |
+| powerpc: | TODO |
 |   riscv: |  ok  |
 |s390: |  ok  |
 |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
-- 
2.26.2



Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Aneesh Kumar K.V

On 9/2/20 9:19 AM, Anshuman Khandual wrote:



On 09/01/2020 03:28 PM, Aneesh Kumar K.V wrote:

On 9/1/20 1:08 PM, Anshuman Khandual wrote:



On 09/01/2020 12:07 PM, Aneesh Kumar K.V wrote:

On 9/1/20 8:55 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

pte_clear_tests operate on an existing pte entry. Make sure that is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
    mm/debug_vm_pgtable.c | 6 --
    1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 21329c7d672f..8527ebb75f2c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
    static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr)
    {
-    pte_t pte = ptep_get(ptep);
+    pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);


Seems like ptep_get_and_clear() here just clears the entry in preparation
for a following set_pte_at() which otherwise would have been a problem on
ppc64 as you had pointed out earlier i.e set_pte_at() should not update an
existing valid entry. So the commit message here is bit misleading.



and also fetch the pte value which is used further.



      pr_debug("Validating PTE clear\n");
    pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
    p4d_t *p4dp, *saved_p4dp;
    pud_t *pudp, *saved_pudp;
    pmd_t *pmdp, *saved_pmdp, pmd;
-    pte_t *ptep;
+    pte_t *ptep, pte;
    pgtable_t saved_ptep;
    pgprot_t prot, protnone;
    phys_addr_t paddr;
@@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
     */
      ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+    pte = pfn_pte(pte_aligned, prot);
+    set_pte_at(mm, vaddr, ptep, pte);


Not here, creating and populating an entry must be done in respective
test functions itself. Besides, this seems bit redundant as well. The
test pte_clear_tests() with the above change added, already

- Clears the PTEP entry with ptep_get_and_clear()


and fetch the old value set previously.


In that case, please move above two lines i.e

pte = pfn_pte(pte_aligned, prot);
set_pte_at(mm, vaddr, ptep, pte);

from debug_vm_pgtable() to pte_clear_tests() and update it's arguments
as required.



Frankly, I don't understand what these tests are testing. It all looks like 
some random clear and set.


The idea here is to have some value with some randomness preferably, in
a given PTEP before attempting to clear the entry, in order to make sure
that pte_clear() is indeed clearing something of non-zero value.



static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
    unsigned long vaddr, unsigned long pfn,
    pgprot_t prot)
{

 pte_t pte = pfn_pte(pfn, prot);
 set_pte_at(mm, vaddr, ptep, pte);

 pte =  ptep_get_and_clear(mm, vaddr, ptep);


Looking at this again, this preceding pfn_pte() followed by set_pte_at()
is not really required. Its reasonable to start with what ever was there
in the PTEP as a seed value which anyway gets added with RANDOM_ORVALUE.
s/ptep_get/ptep_get_and_clear is sufficient to take care of the powerpc
set_pte_at() constraint.



But the way the test is written, we had a none pte before. That is why I
added that set_pte_at() to put something there. With a none pte, the
below sequence fails.


pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);


because nobody is marking a _PAGE_PTE there.

pte_t pte = pfn_pte(pfn, prot);

pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));

will that work for you?

-aneesh


[PATCH] powerpc/powernv/pci: Drop VF MPS fixup

2020-09-01 Thread Oliver O'Halloran
The MPS field in the VF config space is marked as reserved in current
versions of the SR-IOV spec. In other words, this fixup doesn't do
anything.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 9af8c3b98853..0cabe4e632e3 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1689,24 +1689,6 @@ static struct eeh_ops pnv_eeh_ops = {
.notify_resume  = NULL
 };
 
-#ifdef CONFIG_PCI_IOV
-static void pnv_pci_fixup_vf_mps(struct pci_dev *pdev)
-{
-   struct pci_dn *pdn = pci_get_pdn(pdev);
-   int parent_mps;
-
-   if (!pdev->is_virtfn)
-   return;
-
-   /* Synchronize MPS for VF and PF */
-   parent_mps = pcie_get_mps(pdev->physfn);
-   if ((128 << pdev->pcie_mpss) >= parent_mps)
-   pcie_set_mps(pdev, parent_mps);
-   pdn->mps = pcie_get_mps(pdev);
-}
-DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pnv_pci_fixup_vf_mps);
-#endif /* CONFIG_PCI_IOV */
-
 /**
  * eeh_powernv_init - Register platform dependent EEH operations
  *
-- 
2.26.2



[PATCH] powerpc/pci: Remove unimplemented prototypes

2020-09-01 Thread Oliver O'Halloran
The corresponding definitions were deleted in commit 3d5134ee8341
("[POWERPC] Rewrite IO allocation & mapping on powerpc64") which
was merged a mere 13 years ago.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/ppc-pci.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index 7f4be5a05eb3..0745422a8e57 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -13,10 +13,6 @@
 
 extern unsigned long isa_io_base;
 
-extern void pci_setup_phb_io(struct pci_controller *hose, int primary);
-extern void pci_setup_phb_io_dynamic(struct pci_controller *hose, int primary);
-
-
 extern struct list_head hose_list;
 
 extern struct pci_dev *isa_bridge_pcidev;  /* may be NULL if no ISA bus */
-- 
2.26.2



[PATCH] powerpc/pci: Delete traverse_pci_dn()

2020-09-01 Thread Oliver O'Halloran
Nothing uses it.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/ppc-pci.h |  3 ---
 arch/powerpc/kernel/pci_dn.c   | 40 --
 2 files changed, 43 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index 0745422a8e57..2b9edbf6e929 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -28,9 +28,6 @@ struct pci_dn;
 void *pci_traverse_device_nodes(struct device_node *start,
void *(*fn)(struct device_node *, void *),
void *data);
-void *traverse_pci_dn(struct pci_dn *root,
- void *(*fn)(struct pci_dn *, void *),
- void *data);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index e99b7c547d7e..54e240597fd9 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -443,46 +443,6 @@ void *pci_traverse_device_nodes(struct device_node *start,
 }
 EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
 
-static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
- struct pci_dn *pdn)
-{
-   struct list_head *next = pdn->child_list.next;
-
-   if (next != &pdn->child_list)
-   return list_entry(next, struct pci_dn, list);
-
-   while (1) {
-   if (pdn == root)
-   return NULL;
-
-   next = pdn->list.next;
-   if (next != &pdn->parent->child_list)
-   break;
-
-   pdn = pdn->parent;
-   }
-
-   return list_entry(next, struct pci_dn, list);
-}
-
-void *traverse_pci_dn(struct pci_dn *root,
- void *(*fn)(struct pci_dn *, void *),
- void *data)
-{
-   struct pci_dn *pdn = root;
-   void *ret;
-
-   /* Only scan the child nodes */
-   for (pdn = pci_dn_next_one(root, pdn); pdn;
-pdn = pci_dn_next_one(root, pdn)) {
-   ret = fn(pdn, data);
-   if (ret)
-   return ret;
-   }
-
-   return NULL;
-}
-
 static void *add_pdn(struct device_node *dn, void *data)
 {
struct pci_controller *hose = data;
-- 
2.26.2



Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 03:28 PM, Aneesh Kumar K.V wrote:
> On 9/1/20 1:08 PM, Anshuman Khandual wrote:
>>
>>
>> On 09/01/2020 12:07 PM, Aneesh Kumar K.V wrote:
>>> On 9/1/20 8:55 AM, Anshuman Khandual wrote:


 On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> pte_clear_tests operate on an existing pte entry. Make sure that is not a 
> none
> pte entry.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>    mm/debug_vm_pgtable.c | 6 --
>    1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 21329c7d672f..8527ebb75f2c 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct 
> mm_struct *mm, pgd_t *pgdp,
>    static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>   unsigned long vaddr)
>    {
> -    pte_t pte = ptep_get(ptep);
> +    pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);

 Seems like ptep_get_and_clear() here just clears the entry in preparation
 for a following set_pte_at() which otherwise would have been a problem on
 ppc64 as you had pointed out earlier i.e set_pte_at() should not update an
 existing valid entry. So the commit message here is bit misleading.

>>>
>>> and also fetch the pte value which is used further.
>>>
>>>
>      pr_debug("Validating PTE clear\n");
>    pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> @@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
>    p4d_t *p4dp, *saved_p4dp;
>    pud_t *pudp, *saved_pudp;
>    pmd_t *pmdp, *saved_pmdp, pmd;
> -    pte_t *ptep;
> +    pte_t *ptep, pte;
>    pgtable_t saved_ptep;
>    pgprot_t prot, protnone;
>    phys_addr_t paddr;
> @@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
>     */
>      ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
> +    pte = pfn_pte(pte_aligned, prot);
> +    set_pte_at(mm, vaddr, ptep, pte);

 Not here, creating and populating an entry must be done in respective
 test functions itself. Besides, this seems bit redundant as well. The
 test pte_clear_tests() with the above change added, already

 - Clears the PTEP entry with ptep_get_and_clear()
>>>
>>> and fetch the old value set previously.
>>
>> In that case, please move above two lines i.e
>>
>> pte = pfn_pte(pte_aligned, prot);
>> set_pte_at(mm, vaddr, ptep, pte);
>>
>> from debug_vm_pgtable() to pte_clear_tests() and update it's arguments
>> as required.
>>
> 
> Frankly, I don't understand what these tests are testing. It all looks like 
> some random clear and set.

The idea here is to have some value with some randomness preferably, in
a given PTEP before attempting to clear the entry, in order to make sure
that pte_clear() is indeed clearing something of non-zero value.

> 
> static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>    unsigned long vaddr, unsigned long pfn,
>    pgprot_t prot)
> {
> 
> pte_t pte = pfn_pte(pfn, prot);
> set_pte_at(mm, vaddr, ptep, pte);
> 
> pte =  ptep_get_and_clear(mm, vaddr, ptep);

Looking at this again, this preceding pfn_pte() followed by set_pte_at()
is not really required. Its reasonable to start with what ever was there
in the PTEP as a seed value which anyway gets added with RANDOM_ORVALUE.
s/ptep_get/ptep_get_and_clear is sufficient to take care of the powerpc
set_pte_at() constraint.


Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 01:21 PM, Christophe Leroy wrote:
> 
> 
> Le 01/09/2020 à 09:40, Aneesh Kumar K.V a écrit :
>> On 9/1/20 12:20 PM, Christophe Leroy wrote:
>>>
>>>
>>> Le 01/09/2020 à 08:25, Aneesh Kumar K.V a écrit :
 On 9/1/20 8:52 AM, Anshuman Khandual wrote:
>
>
>
> There is a checkpatch.pl warning here.
>
> WARNING: Possible unwrapped commit description (prefer a maximum 75 chars 
> per line)
> #7:
> Architectures like ppc64 use deposited page table while updating the huge 
> pte
>
> total: 0 errors, 1 warnings, 40 lines checked
>

 I will ignore all these, because they are not really important IMHO.

>>>
>>> When doing a git log in a 80 chars terminal window, having wrapping lines 
>>> is not really convenient. It should be easy to avoid it.
>>>
>>
>> We have been ignoring that for a long time, haven't we?
>>
>> For example ppc64 checkpatch already had
>> --max-line-length=90
>>
>>
>> There was also recent discussion whether 80 character limit is valid any 
>> more. But I do keep it restricted to 80 character where ever it is easy/make 
>> sense.
>>
> 
> Here we are not talking about the code, but the commit log.
> 
> As far as I know, the discussions about 80 character lines, 90 lines in 
> powerpc etc ... is for the code.
> 
> We still aim at keeping lines not longer than 75 chars in the commit log.

Agreed.


Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 01:25 PM, Aneesh Kumar K.V wrote:
> On 9/1/20 1:16 PM, Anshuman Khandual wrote:
>>
>>
>> On 09/01/2020 01:06 PM, Aneesh Kumar K.V wrote:
>>> On 9/1/20 1:02 PM, Anshuman Khandual wrote:


 On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:
> On 9/1/20 8:45 AM, Anshuman Khandual wrote:
>>
>>
>> On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
>>> ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting
>>> that bit in
>>> random value.
>>>
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>     mm/debug_vm_pgtable.c | 13 ++---
>>>     1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 086309fb9b6f..bbf9df0e64c6 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -44,10 +44,17 @@
>>>  * entry type. But these bits might affect the ability to clear 
>>> entries with
>>>  * pxx_clear() because of how dynamic page table folding works on 
>>> s390. So
>>>  * while loading up the entries do not change the lower 4 bits. It 
>>> does not
>>> - * have affect any other platform.
>>> + * have affect any other platform. Also avoid the 62nd bit on ppc64 
>>> that is
>>> + * used to mark a pte entry.
>>>  */
>>> -#define S390_MASK_BITS    4
>>> -#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
>>> +#define S390_SKIP_MASK    GENMASK(3, 0)
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> +#define PPC64_SKIP_MASK    GENMASK(62, 62)
>>> +#else
>>> +#define PPC64_SKIP_MASK    0x0
>>> +#endif
>>
>> Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate 
>> skip
>> bits for a s390 platform requirement and can also do so for ppc64 as 
>> well. As
>> mentioned before, please avoid adding any platform specific constructs 
>> in the
>> test.
>>
>
>
> that is needed so that it can be built on 32 bit architectures. I did face
> build errors with arch-linux

 Could not (#if __BITS_PER_LONG == 32) be used instead or something like
 that. But should be a generic conditional check identifying 32 bit arch
 not anything platform specific.

>>>
>>> that _PAGE_PTE bit is pretty much specific to PPC BOOK3S_64.  Not sure why 
>>> other architectures need to bothered about ignoring bit 62.
>>
>> Thats okay as long it does not adversely affect other architectures, ignoring
>> some more bits is acceptable. Like existing S390_MASK_BITS gets ignored on 
>> all
>> other platforms even if it is a s390 specific constraint. Not having platform
>> specific #ifdef here, is essential.
>>
> 
> Why is it essential?
IIRC, I might have already replied to this a couple of times. But let me
try once more.
It is a generic test aimed at finding inconsistencies between different
architectures in terms of the page table helper semantics. Any platform
specific construct here, added to 'make things work', has the potential
to hide such inconsistencies and defeat the very purpose. The test/file
here follows this rule consistently, i.e. there is not a single platform
specific #ifdef right now, and we would really like to continue
maintaining this property unless absolutely necessary. The current
situation here wrt 32 bit archs can easily be accommodated with a generic
check such as __BITS_PER_LONG.
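
i.e. something like this (untested sketch):

#define S390_SKIP_MASK		GENMASK(3, 0)
#if __BITS_PER_LONG == 64
#define PPC64_SKIP_MASK		GENMASK(62, 62)
#else
#define PPC64_SKIP_MASK		0x0
#endif
#define ARCH_SKIP_MASK		(S390_SKIP_MASK | PPC64_SKIP_MASK)
#define RANDOM_ORVALUE		(GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)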


[powerpc:next-test] BUILD SUCCESS 64cc8701ff3161ff1c257f90375a1a3a8a083d78

2020-09-01 Thread kernel test robot
allyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a004-20200901
x86_64   randconfig-a006-20200901
x86_64   randconfig-a003-20200901
x86_64   randconfig-a005-20200901
x86_64   randconfig-a001-20200901
x86_64   randconfig-a002-20200901
i386 randconfig-a004-20200901
i386 randconfig-a005-20200901
i386 randconfig-a006-20200901
i386 randconfig-a002-20200901
i386 randconfig-a001-20200901
i386 randconfig-a003-20200901
i386 randconfig-a016-20200901
i386 randconfig-a015-20200901
i386 randconfig-a011-20200901
i386 randconfig-a013-20200901
i386 randconfig-a014-20200901
i386 randconfig-a012-20200901
i386 randconfig-a016-20200902
i386 randconfig-a015-20200902
i386 randconfig-a011-20200902
i386 randconfig-a013-20200902
i386 randconfig-a014-20200902
i386 randconfig-a012-20200902
riscvallyesconfig
riscv   defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a013-20200901
x86_64   randconfig-a016-20200901
x86_64   randconfig-a011-20200901
x86_64   randconfig-a012-20200901
x86_64   randconfig-a015-20200901
x86_64   randconfig-a014-20200901

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[powerpc:merge] BUILD SUCCESS 35f066fda170dde0a31f1447547a5d30b83c3920

2020-09-01 Thread kernel test robot
allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a004-20200901
x86_64   randconfig-a006-20200901
x86_64   randconfig-a003-20200901
x86_64   randconfig-a005-20200901
x86_64   randconfig-a001-20200901
x86_64   randconfig-a002-20200901
i386 randconfig-a004-20200901
i386 randconfig-a005-20200901
i386 randconfig-a006-20200901
i386 randconfig-a002-20200901
i386 randconfig-a001-20200901
i386 randconfig-a003-20200901
i386 randconfig-a016-20200901
i386 randconfig-a015-20200901
i386 randconfig-a011-20200901
i386 randconfig-a013-20200901
i386 randconfig-a014-20200901
i386 randconfig-a012-20200901
riscvallyesconfig
riscv   defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a013-20200901
x86_64   randconfig-a016-20200901
x86_64   randconfig-a011-20200901
x86_64   randconfig-a012-20200901
x86_64   randconfig-a015-20200901
x86_64   randconfig-a014-20200901

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[powerpc:fixes-test] BUILD SUCCESS fc1f178cdb31783ff37296ecae817a1045a1a513

2020-09-01 Thread kernel test robot
 cobalt_defconfig
powerpc   maple_defconfig
sh  rsk7201_defconfig
arm nhk8815_defconfig
mips bigsur_defconfig
armrealview_defconfig
ia64 allmodconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nds32 allnoconfig
c6x  allyesconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
xtensa   allyesconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a004-20200901
x86_64   randconfig-a006-20200901
x86_64   randconfig-a003-20200901
x86_64   randconfig-a005-20200901
x86_64   randconfig-a001-20200901
x86_64   randconfig-a002-20200901
i386 randconfig-a004-20200901
i386 randconfig-a005-20200901
i386 randconfig-a006-20200901
i386 randconfig-a002-20200901
i386 randconfig-a001-20200901
i386 randconfig-a003-20200901
i386 randconfig-a016-20200901
i386 randconfig-a015-20200901
i386 randconfig-a011-20200901
i386 randconfig-a013-20200901
i386 randconfig-a014-20200901
i386 randconfig-a012-20200901
riscvallyesconfig
riscv   defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a013-20200901
x86_64   randconfig-a016-20200901
x86_64   randconfig-a011-20200901
x86_64   randconfig-a012-20200901
x86_64   randconfig-a015-20200901
x86_64   randconfig-a014-20200901

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] powerpc: Fix random segfault when freeing hugetlb range

2020-09-01 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> The following random segfault is observed from time to time with
> map_hugetlb selftest:
>
> root@localhost:~# ./map_hugetlb 1 19
> 524288 kB hugepages
> Mapping 1 Mbytes
> Segmentation fault
>
> [   31.219972] map_hugetlb[365]: segfault (11) at 117 nip 77974f8c lr 779a6834 code 1 in ld-2.23.so[77966000+21000]
> [   31.220192] map_hugetlb[365]: code: 9421ffc0 480318d1 93410028 90010044 9361002c 93810030 93a10034 93c10038
> [   31.220307] map_hugetlb[365]: code: 93e1003c 93210024 8123007c 81430038 <80e90004> 814a0004 7f443a14 813a0004
> [   31.221911] BUG: Bad rss-counter state mm:(ptrval) type:MM_FILEPAGES val:33
> [   31.229362] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:5
>
> This fault is due to hugetlb_free_pgd_range() freeing page tables
> that are also used by regular pages.
>
> As explained in the comment at the beginning of
> hugetlb_free_pgd_range(), the verification done in free_pgd_range()
> on floor and ceiling is not done here, which means
> hugetlb_free_pte_range() can free outside the expected range.
>
> As the verification cannot be done in hugetlb_free_pgd_range(), it
> must be done in hugetlb_free_pte_range().
>

Reviewed-by: Aneesh Kumar K.V 

> Fixes: b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages.")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/mm/hugetlbpage.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 26292544630f..e7ae2a2c4545 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -330,10 +330,24 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
>get_hugepd_cache_index(pdshift - shift));
>  }
>  
> -static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr)
> +static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> +unsigned long addr, unsigned long end,
> +unsigned long floor, unsigned long ceiling)
>  {
> + unsigned long start = addr;
>   pgtable_t token = pmd_pgtable(*pmd);
>  
> + start &= PMD_MASK;
> + if (start < floor)
> + return;
> + if (ceiling) {
> + ceiling &= PMD_MASK;
> + if (!ceiling)
> + return;
> + }
> + if (end - 1 > ceiling - 1)
> + return;
> +

We do repeat that check for the pud/pmd/pte hugetlb free-range variants. Can we
consolidate it into one helper, with a comment explaining that we are checking
whether the pgtable entry maps memory outside the requested range? (See the
sketch below for one possible shape.)
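
A minimal sketch of what such a consolidated check could look like (the helper
name and exact signature are illustrative, not from the posted patch):

static bool hugetlb_free_range_outside(unsigned long addr, unsigned long end,
				       unsigned long floor, unsigned long ceiling,
				       unsigned long mask)
{
	unsigned long start = addr & mask;

	/* The page table page also maps memory below 'floor' */
	if (start < floor)
		return true;
	if (ceiling) {
		ceiling &= mask;
		/* Masking wrapped the ceiling to zero: nothing may be freed here */
		if (!ceiling)
			return true;
	}
	/* The page table page also maps memory at or beyond 'ceiling' */
	return end - 1 > ceiling - 1;
}

Each of the hugetlb pte/pmd/pud free paths could then call it with the matching
PMD_MASK/PUD_MASK/PGDIR_MASK.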

>   pmd_clear(pmd);
>   pte_free_tlb(tlb, token, addr);
>   mm_dec_nr_ptes(tlb->mm);
> @@ -363,7 +377,7 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>*/
>   WARN_ON(!IS_ENABLED(CONFIG_PPC_8xx));
>  
> - hugetlb_free_pte_range(tlb, pmd, addr);
> + hugetlb_free_pte_range(tlb, pmd, addr, end, floor, ceiling);
>  
>   continue;
>   }
> -- 
> 2.25.0


Re: [PATCH 2/2] powerpc/vdso32: link vdso64 with linker

2020-09-01 Thread Kees Cook
I think $subject needs a typo update... vdso32...

On Tue, Sep 01, 2020 at 03:25:23PM -0700, Nick Desaulniers wrote:
> Rather than invoke the compiler as the driver, use the linker. That way
> we can check --orphan-handling=warn support correctly, as cc-ldoption
> was removed in
> commit 055efab3120b ("kbuild: drop support for cc-ldoption").
> 
> Requires dropping the .got section.  I couldn't find how it was used in
> the vdso32.
> 
> Fixes: commit f2af201002a8 ("powerpc/build: vdso linker warning for orphan sections")
> Link: https://lore.kernel.org/lkml/cakwvodnn3wxydjomvnveyd_njwrku3fabwt_bs92duihhyw...@mail.gmail.com/
> Signed-off-by: Nick Desaulniers 
> ---
> Not sure whether removing .got is a good idea or not.  Otherwise I observe
> the following link error:
> powerpc-linux-gnu-ld: warning: orphan section `.got' from `arch/powerpc/kernel/vdso32/sigtramp.o' being placed in section `.got'
> powerpc-linux-gnu-ld: _GLOBAL_OFFSET_TABLE_ not defined in linker created .got
> powerpc-linux-gnu-ld: final link failed: bad value

If it's like the x86 and arm toolchains, I think you'll be required to
keep .got, but you can assert it to a 0 size, e.g.:

/*
 * Sections that should stay zero sized, which is safer to
 * explicitly check instead of blindly discarding.
 */
.got : {
*(.got)
}
ASSERT(SIZEOF(.got) == 0, "Unexpected GOT entries detected!")

(and put that at the end of the linker script)


-Kees

> 
> sigtramp.c doesn't mention anything from the GOT AFAICT, and doesn't
> look like it contains relocations that do, so I'm not sure where
> references to _GLOBAL_OFFSET_TABLE_ are coming from.
> 
>  arch/powerpc/kernel/vdso32/Makefile | 7 +--
>  arch/powerpc/kernel/vdso32/vdso32.lds.S | 3 ++-
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 87ab1152d5ce..611a5951945a 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -27,6 +27,9 @@ UBSAN_SANITIZE := n
>  ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
>   -Wl,-soname=linux-vdso32.so.1 -Wl,--hash-style=both
>  asflags-y := -D__VDSO32__ -s
> +ldflags-y := -shared -soname linux-vdso32.so.1 \
> + $(call ld-option, --eh-frame-hdr) \
> + $(call ld-option, --orphan-handling=warn) -T
>  
>  obj-y += vdso32_wrapper.o
>  extra-y += vdso32.lds
> @@ -49,8 +52,8 @@ $(obj-vdso32): %.o: %.S FORCE
>   $(call if_changed_dep,vdso32as)
>  
>  # actual build commands
> -quiet_cmd_vdso32ld = VDSO32L $@
> -  cmd_vdso32ld = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ $(call cc-ldoption, -Wl$(comma)--orphan-handling=warn) -Wl,-T$(filter %.lds,$^) $(filter %.o,$^)
> +quiet_cmd_vdso32ld = LD  $@
> +  cmd_vdso32ld = $(cmd_ld)
>  quiet_cmd_vdso32as = VDSO32A $@
>cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) -c -o $@ $<
>  
> diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
> index 4c985467a668..0ccdebad18b8 100644
> --- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
> +++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
> @@ -61,7 +61,6 @@ SECTIONS
>   .fixup  : { *(.fixup) }
>  
>   .dynamic: { *(.dynamic) }   :text   :dynamic
> - .got: { *(.got) }   :text
>   .plt: { *(.plt) }
>  
>   _end = .;
> @@ -108,7 +107,9 @@ SECTIONS
>   .debug_varnames  0 : { *(.debug_varnames) }
>  
>   /DISCARD/   : {
> + *(.got)
>   *(.note.GNU-stack)
> + *(.branch_lt)
>   *(.data .data.* .gnu.linkonce.d.* .sdata*)
>   *(.bss .sbss .dynbss .dynsbss)
>   *(.glink .iplt .plt .rela*)
> -- 
> 2.28.0.402.g5ffc5be6b7-goog
> 

-- 
Kees Cook


Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests

2020-09-01 Thread Paul Mackerras
On Thu, Jul 23, 2020 at 03:50:51PM +0530, Ravi Bangoria wrote:
> Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR
> is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it.
> A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall
> for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has
> been added to query 2nd DAWR support via kvm ioctl.
> 
> This feature also needs to be enabled in Qemu to really use it. I'll reply
> link to qemu patches once I post them in qemu-devel mailing list.
> 
> Patch #4, #5, #6 and #7 adds selftests to test 2nd DAWR.

If/when you resubmit these patches, please split the KVM patches into
a separate series, since the KVM patches would go via my tree whereas
I expect the selftests/powerpc patches would go through Michael
Ellerman's tree.

Paul.


[PATCH 2/2] powerpc/vdso32: link vdso64 with linker

2020-09-01 Thread Nick Desaulniers
Rather than invoke the compiler as the driver, use the linker. That way
we can check --orphan-handling=warn support correctly, as cc-ldoption
was removed in
commit 055efab3120b ("kbuild: drop support for cc-ldoption").

Requires dropping the .got section.  I couldn't find how it was used in
the vdso32.

Fixes: commit f2af201002a8 ("powerpc/build: vdso linker warning for orphan sections")
Link: https://lore.kernel.org/lkml/cakwvodnn3wxydjomvnveyd_njwrku3fabwt_bs92duihhyw...@mail.gmail.com/
Signed-off-by: Nick Desaulniers 
---
Not sure whether removing .got is a good idea or not.  Otherwise I observe the
following link error:
powerpc-linux-gnu-ld: warning: orphan section `.got' from `arch/powerpc/kernel/vdso32/sigtramp.o' being placed in section `.got'
powerpc-linux-gnu-ld: _GLOBAL_OFFSET_TABLE_ not defined in linker created .got
powerpc-linux-gnu-ld: final link failed: bad value

sigtramp.c doesn't mention anything from the GOT AFAICT, and doesn't
look like it contains relocations that do, so I'm not sure where
references to _GLOBAL_OFFSET_TABLE_ are coming from.

 arch/powerpc/kernel/vdso32/Makefile | 7 +--
 arch/powerpc/kernel/vdso32/vdso32.lds.S | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 87ab1152d5ce..611a5951945a 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -27,6 +27,9 @@ UBSAN_SANITIZE := n
 ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
-Wl,-soname=linux-vdso32.so.1 -Wl,--hash-style=both
 asflags-y := -D__VDSO32__ -s
+ldflags-y := -shared -soname linux-vdso32.so.1 \
+   $(call ld-option, --eh-frame-hdr) \
+   $(call ld-option, --orphan-handling=warn) -T
 
 obj-y += vdso32_wrapper.o
 extra-y += vdso32.lds
@@ -49,8 +52,8 @@ $(obj-vdso32): %.o: %.S FORCE
$(call if_changed_dep,vdso32as)
 
 # actual build commands
-quiet_cmd_vdso32ld = VDSO32L $@
-  cmd_vdso32ld = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ $(call cc-ldoption, -Wl$(comma)--orphan-handling=warn) -Wl,-T$(filter %.lds,$^) $(filter %.o,$^)
+quiet_cmd_vdso32ld = LD  $@
+  cmd_vdso32ld = $(cmd_ld)
 quiet_cmd_vdso32as = VDSO32A $@
   cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) -c -o $@ $<
 
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 4c985467a668..0ccdebad18b8 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -61,7 +61,6 @@ SECTIONS
.fixup  : { *(.fixup) }
 
.dynamic: { *(.dynamic) }   :text   :dynamic
-   .got: { *(.got) }   :text
.plt: { *(.plt) }
 
_end = .;
@@ -108,7 +107,9 @@ SECTIONS
.debug_varnames  0 : { *(.debug_varnames) }
 
/DISCARD/   : {
+   *(.got)
*(.note.GNU-stack)
+   *(.branch_lt)
*(.data .data.* .gnu.linkonce.d.* .sdata*)
*(.bss .sbss .dynbss .dynsbss)
*(.glink .iplt .plt .rela*)
-- 
2.28.0.402.g5ffc5be6b7-goog



Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR

2020-09-01 Thread Paul Mackerras
On Thu, Jul 23, 2020 at 03:50:53PM +0530, Ravi Bangoria wrote:
> kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
> DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
> unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.

Is this the same interface as will be defined in PAPR and available
under PowerVM, or is it a new/different interface for KVM?

> Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

In general QEMU wants to be able to control all aspects of the virtual
machine presented to the guest, meaning that just because a host has a
particular hardware capability does not mean we should automatically
present that capability to the guest.

In this case, QEMU will want a way to control whether the guest sees
the availability of the second DAWR/X registers or not, i.e. whether a
H_SET_MODE to set DAWR[X]1 will succeed or fail.
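
One hypothetical shape for that control (the flag name and return code below
are assumptions for illustration, not taken from the posted series) would be
to gate the hcall on a capability that userspace has to enable explicitly:

	case H_SET_MODE_RESOURCE_SET_DAWR1:
		/* Refuse unless QEMU opted in via KVM_CAP_PPC_DAWR1 */
		if (!vcpu->kvm->arch.dawr1_enabled)
			return H_P2;
		...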

Paul.


Re: [PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables

2020-09-01 Thread Paul Mackerras
On Thu, Jul 23, 2020 at 03:50:52PM +0530, Ravi Bangoria wrote:
> Power10 is introducing second DAWR. Use real register names (with
> suffix 0) from ISA for current macros and variables used by kvm.

Most of this looks fine, but I think we should not change the existing
names in arch/powerpc/include/uapi/asm/kvm.h (and therefore also
Documentation/virt/kvm/api.rst).

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 426f94582b7a..4dc18fe6a2bf 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -2219,8 +2219,8 @@ registers, find a list below:
>PPC KVM_REG_PPC_BESCR   64
>PPC KVM_REG_PPC_TAR 64
>PPC KVM_REG_PPC_DPDES   64
> -  PPC KVM_REG_PPC_DAWR64
> -  PPC KVM_REG_PPC_DAWRX   64
> +  PPC KVM_REG_PPC_DAWR0   64
> +  PPC KVM_REG_PPC_DAWRX0  64
>PPC KVM_REG_PPC_CIABR   64
>PPC KVM_REG_PPC_IC  64
>PPC KVM_REG_PPC_VTB 64
...
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
> b/arch/powerpc/include/uapi/asm/kvm.h
> index 264e266a85bf..38d61b73f5ed 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -608,8 +608,8 @@ struct kvm_ppc_cpu_char {
>  #define KVM_REG_PPC_BESCR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa7)
>  #define KVM_REG_PPC_TAR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa8)
>  #define KVM_REG_PPC_DPDES(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa9)
> -#define KVM_REG_PPC_DAWR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa)
> -#define KVM_REG_PPC_DAWRX(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab)
> +#define KVM_REG_PPC_DAWR0(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa)
> +#define KVM_REG_PPC_DAWRX0   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab)
>  #define KVM_REG_PPC_CIABR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xac)
>  #define KVM_REG_PPC_IC   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xad)
>  #define KVM_REG_PPC_VTB  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xae)

The existing names are an API, and if you change them you will break
compilation of existing userspace programs.  I don't see that adding
the '0' on the end is so important that we need to break userspace.
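
If the ISA-style names are still wanted inside the kernel, one sketch of a
userspace-compatible approach (illustrative only, not part of the posted
patch) is to keep the old uapi identifiers as aliases of the new ones:

#define KVM_REG_PPC_DAWR0	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa)
#define KVM_REG_PPC_DAWRX0	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab)
/* Preserve the original names so existing userspace keeps compiling */
#define KVM_REG_PPC_DAWR	KVM_REG_PPC_DAWR0
#define KVM_REG_PPC_DAWRX	KVM_REG_PPC_DAWRX0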

Paul.


[PATCH v2] powerpc/powernv/pci: Drop pnv_phb->initialized

2020-09-01 Thread Oliver O'Halloran
The pnv_phb->initialized flag is an odd beast. It was added back in 2012 in
commit db1266c85261 ("powerpc/powernv: Skip check on PE if necessary") to
allow devices to be enabled even if the device had not yet been assigned to
a PE. Allowing the device to be enabled before the PE is configured may
cause spurious EEH events since none of the IOMMU context has been setup.

I'm not entirely sure why this was ever necessary. My best guess is that it
was a workaround for a bug or some other undesirable behaviour from the
PCI core. Either way, it's unnecessary now since as of commit dc3d8f85bb57
("powerpc/powernv/pci: Re-work bus PE configuration") we can guarantee that
the PE will be configured before the PCI core will allow drivers to bind to
the device.

It's also worth pointing out that the ->initialized flag is only set in
pnv_pci_ioda_create_dbgfs(). That function has its entire body wrapped
in #ifdef CONFIG_DEBUG_FS. As a result, for kernels built without debugfs
(i.e. petitboot) the other checks in pnv_pci_enable_device_hook() are
bypassed entirely.

Signed-off-by: Oliver O'Halloran 
---
v2: reword commit message a bit
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 17 -
 arch/powerpc/platforms/powernv/pci.h  |  1 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 023a4f987bb2..6ac3c637b313 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2410,9 +2410,6 @@ static void pnv_pci_ioda_create_dbgfs(void)
list_for_each_entry_safe(hose, tmp, _list, list_node) {
phb = hose->private_data;
 
-   /* Notify initialization of PHB done */
-   phb->initialized = 1;
-
sprintf(name, "PCI%04x", hose->global_number);
phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
 
@@ -2609,17 +2606,8 @@ static resource_size_t pnv_pci_default_alignment(void)
  */
 static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 {
-   struct pnv_phb *phb = pci_bus_to_pnvhb(dev->bus);
struct pci_dn *pdn;
 
-   /* The function is probably called while the PEs have
-* not be created yet. For example, resource reassignment
-* during PCI probe period. We just skip the check if
-* PEs isn't ready.
-*/
-   if (!phb->initialized)
-   return true;
-
pdn = pci_get_pdn(dev);
if (!pdn || pdn->pe_number == IODA_INVALID_PE)
return false;
@@ -2629,14 +2617,9 @@ static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 
 static bool pnv_ocapi_enable_device_hook(struct pci_dev *dev)
 {
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-   struct pnv_phb *phb = hose->private_data;
struct pci_dn *pdn;
struct pnv_ioda_pe *pe;
 
-   if (!phb->initialized)
-   return true;
-
pdn = pci_get_pdn(dev);
if (!pdn)
return false;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 739a0b3b72e1..36d22920f5a3 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -119,7 +119,6 @@ struct pnv_phb {
int flags;
void __iomem*regs;
u64 regs_phys;
-   int initialized;
spinlock_t  lock;
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.26.2



Re: [PATCH] soc: fsl: Remove bogus packed attributes from qman.h

2020-09-01 Thread Herbert Xu
On Tue, Sep 01, 2020 at 04:40:16PM -0500, Li Yang wrote:
>
> Looks like the CAAM driver and the dependent QBMAN driver don't support
> COMPILE_TEST yet.  Are you trying to add support for it?

Yes.

> I think this is a valid concern that if the parent structure doesn't
> meet certain alignment requirements, the alignment for the
> sub-structure cannot be guaranteed.  If we just remove the __packed
> attribute from the parent structure, the compiler could try to add
> padding in the parent structure to fulfill the alignment requirements
> of the sub structure which is not good.  I think the following changes
> are a better fix for the warnings:

This works for me.  Thanks!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] cpuidle-pseries: Fix CEDE latency conversion from tb to us

2020-09-01 Thread Joel Stanley
On Tue, 1 Sep 2020 at 14:09, Gautham R. Shenoy  wrote:
>
> From: "Gautham R. Shenoy" 
>
> commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> of the Extended CEDE states advertised by the platform. The values
> advertised by the platform are in timebase ticks. However the cpuidle
> framework requires the latency values in microseconds.
>
> If the tb-ticks value advertised by the platform corresponds to a value
> smaller than 1us, the conversion from tb-ticks to microseconds in the
> current code truncates the result to zero. This is incorrect, as it
> puts a CEDE state on par with the snooze state.
>
> This patch fixes this by rounding up the result obtained while
> converting the latency value from tb-ticks to microseconds.
>
> Fixes: commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
>
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Joel Stanley 

Should you check for the zero case and print a warning?
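
For instance, something like this after the conversion (a sketch of the check
being suggested; the message text is made up):

	u64 latency_us = DIV_ROUND_UP_ULL(tb_to_ns(latency_tb), NSEC_PER_USEC);

	/* With the round-up, this can only trigger if the record itself is 0 */
	if (!latency_us)
		pr_warn("cpuidle-pseries: xcede record %d advertises zero latency\n", i);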

> ---
>  drivers/cpuidle/cpuidle-pseries.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index ff6d99e..9043358 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -361,7 +361,7 @@ static void __init fixup_cede0_latency(void)
> for (i = 0; i < nr_xcede_records; i++) {
> struct xcede_latency_record *record = >records[i];
> u64 latency_tb = be64_to_cpu(record->latency_ticks);
> -   u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> +   u64 latency_us = DIV_ROUND_UP_ULL(tb_to_ns(latency_tb), NSEC_PER_USEC);
>
> if (latency_us < min_latency_us)
> min_latency_us = latency_us;
> --
> 1.9.4
>


Re: ptrace_syscall_32 is failing

2020-09-01 Thread Andy Lutomirski
On Tue, Sep 1, 2020 at 4:50 PM Thomas Gleixner  wrote:
>
> On Sun, Aug 30 2020 at 08:52, Andy Lutomirski wrote:
> >> > [RUN]SYSCALL
> >> > [FAIL]Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)
> >> > [RUN]SYSCALL
> >> > [OK]Args after SIGUSR1 are correct (ax = -514)
> >> > [OK]Child got SIGUSR1
> >> > [RUN]Step again
> >> > [OK]pause(2) restarted correctly
> >>
> >> Bisected to commit 0b085e68f407 ("x86/entry: Consolidate 32/64 bit
> >> syscall entry").
> >> It looks like it is because syscall_enter_from_user_mode() is called
> >> before reading the 6th argument from the user stack.
>
> Bah. I don't know how I managed to miss that part, and interestingly
> enough none of the robots caught it either.
>
> > Thomas, can we revert the syscall_enter() and syscall_exit() part of
> > the series?
>
> Hrm.
>
> > I think that they almost work for x86, but not quite as
> > indicated by this bug.  Even if we imagine we can somehow hack around
> > this bug, I imagine we're going to find other problems with this
> > model, e.g. the potential upcoming exit problem I noted in my review.
>
> What's the upcoming problem?

If we ever want to get single-stepping fully correct across syscalls,
we might need to inject SIGTRAP on syscall return. This would be more
awkward if we can't run instrumentable code after the syscall part of
the syscall is done.

>
> > I really think the model should be:
> >
> > void do_syscall_whatever(...)
> > {
> >   irqentry_enter(...);
> >   instrumentation_begin();
> >
> >   /* Do whatever arch ABI oddities are needed on entry. */
> >
> >   Then either:
> >   syscall_begin(arch, nr, regs);
> >   dispatch the syscall;
> >   syscall_end(arch, nr, regs);
> >
> >   Or just:
> >   generic_do_syscall(arch, nr, regs);
> >
> >   /* Do whatever arch ABI oddities are needed on exit from the syscall. */
> >
> >   instrumentation_end();
> >   irqentry_exit(...);
> > }
>
> I don't think we want that in general. The current variant is perfectly
> fine for everything except the 32bit fast syscall nonsense. Also
> irqentry_entry/exit is not equivalent to the syscall_enter/exit
> counterparts.

If there are any architectures in which actual work is needed to
figure out whether something is a syscall in the first place, they'll
want to do the usual kernel entry work before the syscall entry work.
Maybe your patch actually makes this possible -- I haven't digested
all the details yet.

Who advised you to drop the arch parameter?

> ---
>  arch/x86/entry/common.c  |   29 
>  include/linux/entry-common.h |   51 +++
>  kernel/entry/common.c|   35 -
>  3 files changed, 91 insertions(+), 24 deletions(-)
>
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -60,16 +60,10 @@
>  #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
>  static __always_inline unsigned int syscall_32_enter(struct pt_regs *regs)
>  {
> -   unsigned int nr = (unsigned int)regs->orig_ax;
> -
> if (IS_ENABLED(CONFIG_IA32_EMULATION))
> current_thread_info()->status |= TS_COMPAT;
> -   /*
> -* Subtlety here: if ptrace pokes something larger than 2^32-1 into
> -* orig_ax, the unsigned int return value truncates it.  This may
> -* or may not be necessary, but it matches the old asm behavior.
> -*/
> -   return (unsigned int)syscall_enter_from_user_mode(regs, nr);
> +
> +   return (unsigned int)regs->orig_ax;
>  }
>
>  /*
> @@ -91,15 +85,29 @@ static __always_inline void do_syscall_3
>  {
> unsigned int nr = syscall_32_enter(regs);
>
> +   /*
> +* Subtlety here: if ptrace pokes something larger than 2^32-1 into
> +* orig_ax, the unsigned int return value truncates it.  This may
> +* or may not be necessary, but it matches the old asm behavior.
> +*/
> +   nr = (unsigned int)syscall_enter_from_user_mode(regs, nr);
> +
> do_syscall_32_irqs_on(regs, nr);
> syscall_exit_to_user_mode(regs);
>  }
>
>  static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
>  {
> -   unsigned int nr = syscall_32_enter(regs);
> +   unsigned int nr = syscall_32_enter(regs);
> int res;
>
> +   /*
> +* This cannot use syscall_enter_from_user_mode() as it has to
> +* fetch EBP before invoking any of the syscall entry work
> +* functions.
> +*/
> +   syscall_enter_from_user_mode_prepare(regs);

I'm getting lost in all these "enter" functions...


[PATCH v3] powerpc: Warn about use of smt_snooze_delay

2020-09-01 Thread Joel Stanley
It's not done anything for a long time. Save the percpu variable, and
emit a warning to remind users to not expect it to do anything.

This uses pr_warn_once instead of pr_warn_ratelimit as testing
'ppc64_cpu --smt=off' on a 24 core / 4 SMT system showed the warning to
be noisy, as the online/offline loop is slow.

Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
Cc: sta...@vger.kernel.org # v3.14
Acked-by: Gautham R. Shenoy 
Signed-off-by: Joel Stanley 
--
v3:
 pr_warn_once instead of pr_warn_ratelimited
 Update messages with mpe's suggestions
v2:
 Use pr_warn instead of WARN
 Reword and print process name with pid in message
 Leave CPU_FTR_SMT test in
---
 arch/powerpc/kernel/sysfs.c | 42 +++--
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 46b4ebc33db7..5dea98fa2f93 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -32,29 +32,27 @@
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
 
-/*
- * SMT snooze delay stuff, 64-bit only for now
- */
-
 #ifdef CONFIG_PPC64
 
-/* Time in microseconds we delay before sleeping in the idle loop */
-static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
+/*
+ * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
+ * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
+ * 2014:
+ *
+ *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
+ *  up the kernel code."
+ *
+ * powerpc-utils stopped using it as of 1.3.8. At some point in the future this
+ * code should be removed.
+ */
 
 static ssize_t store_smt_snooze_delay(struct device *dev,
  struct device_attribute *attr,
  const char *buf,
  size_t count)
 {
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
-   ssize_t ret;
-   long snooze;
-
-   ret = sscanf(buf, "%ld", );
-   if (ret != 1)
-   return -EINVAL;
-
-   per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
+   pr_warn_once("%s (%d) stored to unsupported smt_snooze_delay, which has 
no effect.\n",
+current->comm, current->pid);
return count;
 }
 
@@ -62,9 +60,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
 struct device_attribute *attr,
 char *buf)
 {
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
-
-   return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
+   pr_warn_once("%s (%d) read from unsupported smt_snooze_delay\n",
+current->comm, current->pid);
+   return sprintf(buf, "100\n");
 }
 
 static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
@@ -72,16 +70,10 @@ static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
 
 static int __init setup_smt_snooze_delay(char *str)
 {
-   unsigned int cpu;
-   long snooze;
-
if (!cpu_has_feature(CPU_FTR_SMT))
return 1;
 
-   snooze = simple_strtol(str, NULL, 10);
-   for_each_possible_cpu(cpu)
-   per_cpu(smt_snooze_delay, cpu) = snooze;
-
+   pr_warn("smt-snooze-delay command line option has no effect\n");
return 1;
 }
 __setup("smt-snooze-delay=", setup_smt_snooze_delay);
-- 
2.28.0



Re: ptrace_syscall_32 is failing

2020-09-01 Thread Thomas Gleixner
On Sun, Aug 30 2020 at 08:52, Andy Lutomirski wrote:
>> > [RUN]SYSCALL
>> > [FAIL]Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)
>> > [RUN]SYSCALL
>> > [OK]Args after SIGUSR1 are correct (ax = -514)
>> > [OK]Child got SIGUSR1
>> > [RUN]Step again
>> > [OK]pause(2) restarted correctly
>>
>> Bisected to commit 0b085e68f407 ("x86/entry: Consolidate 32/64 bit
>> syscall entry").
>> It looks like it is because syscall_enter_from_user_mode() is called
>> before reading the 6th argument from the user stack.

Bah. I don't know how I managed to miss that part, and interestingly
enough none of the robots caught it either.

> Thomas, can we revert the syscall_enter() and syscall_exit() part of
> the series?

Hrm.

> I think that they almost work for x86, but not quite as
> indicated by this bug.  Even if we imagine we can somehow hack around
> this bug, I imagine we're going to find other problems with this
> model, e.g. the potential upcoming exit problem I noted in my review.

What's the upcoming problem?

> I really think the model should be:
>
> void do_syscall_whatever(...)
> {
>   irqentry_enter(...);
>   instrumentation_begin();
>
>   /* Do whatever arch ABI oddities are needed on entry. */
>
>   Then either:
>   syscall_begin(arch, nr, regs);
>   dispatch the syscall;
>   syscall_end(arch, nr, regs);
>
>   Or just:
>   generic_do_syscall(arch, nr, regs);
>
>   /* Do whatever arch ABI oddities are needed on exit from the syscall. */
>
>   instrumentation_end();
>   irqentry_exit(...);
> }

I don't think we want that in general. The current variant is perfectly
fine for everything except the 32bit fast syscall nonsense. Also
irqentry_entry/exit is not equivalent to the syscall_enter/exit
counterparts.

> x86 has an ABI oddity needed on entry: this fast syscall argument
> fixup.  We also might end up with ABI oddities on exit if we ever try
> to make single-stepping of syscalls work fully correctly.  x86 sort of
> gets away without specifying arch because the arch helpers that get
> called for audit, etc can deduce the arch, but this is kind of gross.
> I suppose we could omit arch as an explicit parameter.

I had that in one of the early versions and was advised to drop that.

> Or I suppose we could try to rejigger the API in time for 5.9.
> Fortunately only x86 uses the new APIs so far.  I cc'd a bunch of
> other arch maintainers to see if other architectures fit well in the
> new syscall_enter() model, but I feel like the fact that x86 is
> already broken indicates that we messed it up a bit.

It's not unfixable and the fix is close to what you suggested above
except that it preserves the straight forward stuff for the !32bit fast
syscall case. Completely untested patch below. I run proper tests
tomorrow with brain awake.

Thanks,

tglx
---
 arch/x86/entry/common.c  |   29 
 include/linux/entry-common.h |   51 +++
 kernel/entry/common.c|   35 -
 3 files changed, 91 insertions(+), 24 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -60,16 +60,10 @@
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
 static __always_inline unsigned int syscall_32_enter(struct pt_regs *regs)
 {
-   unsigned int nr = (unsigned int)regs->orig_ax;
-
if (IS_ENABLED(CONFIG_IA32_EMULATION))
current_thread_info()->status |= TS_COMPAT;
-   /*
-* Subtlety here: if ptrace pokes something larger than 2^32-1 into
-* orig_ax, the unsigned int return value truncates it.  This may
-* or may not be necessary, but it matches the old asm behavior.
-*/
-   return (unsigned int)syscall_enter_from_user_mode(regs, nr);
+
+   return (unsigned int)regs->orig_ax;
 }
 
 /*
@@ -91,15 +85,29 @@ static __always_inline void do_syscall_3
 {
unsigned int nr = syscall_32_enter(regs);
 
+   /*
+* Subtlety here: if ptrace pokes something larger than 2^32-1 into
+* orig_ax, the unsigned int return value truncates it.  This may
+* or may not be necessary, but it matches the old asm behavior.
+*/
+   nr = (unsigned int)syscall_enter_from_user_mode(regs, nr);
+
do_syscall_32_irqs_on(regs, nr);
syscall_exit_to_user_mode(regs);
 }
 
 static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
 {
-   unsigned int nr = syscall_32_enter(regs);
+   unsigned int nr = syscall_32_enter(regs);
int res;
 
+   /*
+* This cannot use syscall_enter_from_user_mode() as it has to
+* fetch EBP before invoking any of the syscall entry work
+* functions.
+*/
+   syscall_enter_from_user_mode_prepare(regs);
+
instrumentation_begin();
/* Fetch EBP from where the vDSO stashed it. */
if (IS_ENABLED(CONFIG_X86_64)) {
@@ -122,6 +130,9 @@ static noinstr bool __do_fast_syscall_32
 

Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Chris Packham

On 2/09/20 11:29 am, Chris Packham wrote:
>
> On 1/09/20 6:14 pm, Nicholas Piggin wrote:
>> Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>>> On 1/09/20 12:33 am, Heiner Kallweit wrote:
 On 30.08.2020 23:59, Chris Packham wrote:
> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>> On 30.08.2020 23:00, Chris Packham wrote:
>>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
 Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>>> 
>>>
> I've also now seen the RX FIFO not empty error on the 
> T2080RDB
>
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> With my current workaround of emptying the RX FIFO. It seems
> survivable. Interestingly it only ever seems to be 1 extra byte in the
> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>
> fsl_espi ffe11.spi: tx 70
> fsl_espi ffe11.spi: rx 03
> fsl_espi ffe11.spi: Extra RX 00
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: tx 05
> fsl_espi ffe11.spi: rx 00
> fsl_espi ffe11.spi: Extra RX 03
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: tx 05
> fsl_espi ffe11.spi: rx 00
> fsl_espi ffe11.spi: Extra RX 03
>
>  From all the Micron SPI-NOR datasheets I've got access to, it is
> possible to continually read the SR/FSR. But I've no idea why it
> happens sometimes and not others.
 So I think I've got a reproduction and I think I've bisected the problem
 to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
 C"). My day is just finishing now so I haven't applied too much scrutiny
 to this result. Given the various rabbit holes I've been down on this
 issue already I'd take this information with a good degree of skepticism.

>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>>> It doesn't yet have the change you're referring to, and the fsl-espi
>>> driver is basically the same as in 5.7 (just two small changes in 5.7).
>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>> around this time that could affect book E, so it's good if that exact
>> patch is confirmed.
> My confirmation is basically that I can induce the issue in a 5.4 kernel
> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
> 5.9-rc2 by reverting that one commit.
>
> In both cases it's not exactly a clean cherry-pick/revert so I also
> confirmed the bisection result by building at 3282a3da25bd (which sees
> the issue) and the commit just before (which does not).
 Thanks for testing, that confirms it well.

 [snip patch]

> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
> didn't report anything (either with or without the change above).
 Okay, it was a bit of a shot in the dark. I still can't see what
 else has changed.

 What would cause this, a lost interrupt? A spurious interrupt? Or
 higher interrupt latency?

 I don't think the patch should cause significantly worse latency,
 (it's supposed to be a bit better if anything because it doesn't set
 up the full interrupt frame). But it's possible.
>>> My working theory is that the SPI_DON indication is all about the TX
>>> direction and now that the interrupts are faster we're hitting an error
>>> because there is still RX activity going on. Heiner disagrees with my
>>> 

Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Chris Packham

On 1/09/20 6:14 pm, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:59, Chris Packham wrote:
 On 31/08/20 9:41 am, Heiner Kallweit wrote:
> On 30.08.2020 23:00, Chris Packham wrote:
>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>> 
>>
 I've also now seen the RX FIFO not empty error on the T2080RDB

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32

 With my current workaround of emptying the RX FIFO. It seems
 survivable. Interestingly it only ever seems to be 1 extra byte in the
 RX FIFO and it seems to be after either a READ_SR or a READ_FSR.

 fsl_espi ffe11.spi: tx 70
 fsl_espi ffe11.spi: rx 03
 fsl_espi ffe11.spi: Extra RX 00
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03

  From all the Micron SPI-NOR datasheets I've got access to, it is
 possible to continually read the SR/FSR. But I've no idea why it
 happens sometimes and not others.
>>> So I think I've got a reproduction and I think I've bisected the problem
>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>>> C"). My day is just finishing now so I haven't applied too much scrutiny
>>> to this result. Given the various rabbit holes I've been down on this
>>> issue already I'd take this information with a good degree of skepticism.
>>>
>> OK, so an easy test should be to re-test with a 5.4 kernel.
>> It doesn't yet have the change you're referring to, and the fsl-espi
>> driver is basically the same as in 5.7 (just two small changes in 5.7).
> There's 6cc0c16d82f88 and maybe also other interrupt related patches
> around this time that could affect book E, so it's good if that exact
> patch is confirmed.
 My confirmation is basically that I can induce the issue in a 5.4 kernel
 by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
 5.9-rc2 by reverting that one commit.

 In both cases it's not exactly a clean cherry-pick/revert so I also
 confirmed the bisection result by building at 3282a3da25bd (which sees
 the issue) and the commit just before (which does not).
>>> Thanks for testing, that confirms it well.
>>>
>>> [snip patch]
>>>
 I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
 didn't report anything (either with or without the change above).
>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>> else has changed.
>>>
>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>> higher interrupt latency?
>>>
>>> I don't think the patch should cause significantly worse latency,
>>> (it's supposed to be a bit better if anything because it doesn't set
>>> up the full interrupt frame). But it's possible.
>> My working theory is that the SPI_DON indication is all about the TX
>> direction and now that the interrupts are faster we're hitting an error
>> because there is still RX activity going on. Heiner disagrees with my
>> interpretation of the SPI_DON indication and the fact that it doesn't
>> happen every time does throw doubt on it.
>>
> It's right that the eSPI spec can be interpreted such that SPI_DON refers to
> TX only. However, this wouldn't really make sense, because for RX we also
> program the frame 

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-09-01 Thread Leonardo Bras
On Mon, 2020-08-31 at 10:47 +1000, Alexey Kardashevskiy wrote:
> > 
> > Maybe testing with host 64k pagesize and IOMMU 16MB pagesize in qemu
> > should be enough, is there any chance to get indirect mapping in qemu
> > like this? (DDW but with smaller DMA window available) 
> 
> You will have to hack the guest kernel to always do indirect mapping or
> hack QEMU's rtas_ibm_query_pe_dma_window() to return a small number of
> available TCEs. But you will be testing QEMU/KVM which behave quite
> differently to pHyp in this particular case.
> 

As you suggested before, building for 4k cpu pagesize should be the
best approach. It would allow testing for both pHyp and qemu scenarios.

> > > > > Because if we want the former (==support), then we'll have to align the
> > > > > size up to the bigger page size when allocating/zeroing system pages,
> > > > > etc.
> > > > 
> > > > This part I don't understand. Why do we need to align everything to the
> > > > bigger pagesize? 
> > > > 
> > > > I mean, is not that enough that the range [ret, ret + size[ is both
> > > > allocated by mm and mapped on a iommu range?
> > > > 
> > > > Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
> > > > IOMMU_PAGE_SIZE() == 64k.
> > > > Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough? 
> > > > All the space the user asked for is allocated and mapped for DMA.
> > > 
> > > The user asked to map 16K, the rest - 48K - is used for something else
> > > (maybe even mapped to another device) but you are making all 64K
> > > accessible by the device which only should be able to access 16K.
> > > 
> > > In practice, if this happens, H_PUT_TCE will simply fail.
> > 
> > I have noticed mlx5 driver getting a few bytes in a buffer, and using
> > iommu_map_page(). It does map a whole page for as few bytes as the user
> 
> Whole 4K system page or whole 64K iommu page?

I tested it in 64k system page + 64k iommu page.

The 64K system page may be used for anything, and a small portion of it
(say 128 bytes) needs to be used for DMA. 
The whole page is mapped by IOMMU, and the driver gets info of the
memory range it should access / modify.

> 
> > wants mapped, and the other bytes get used for something else, or just
> > mapped on another DMA page.
> > It seems to work fine.  
> 
> 
> With 4K system page and 64K IOMMU page? In practice it would take an
> effort or/and bad luck to see it crashing. Thanks,

I haven't tested it yet. On a 64k system page and 4k/64k iommu page, it
works as described above.

I am new to this, so I am trying to understand how a memory page mapped
as DMA, and used for something else could be a problem.

Thanks!

> 
> > > 
> > > > > Bigger pages are not the case here as I understand it.
> > > > 
> > > > I did not get this part, what do you mean?
> > > 
> > > Possible IOMMU page sizes are 4K, 64K, 2M, 16M, 256M, 1GB, and the
> > > supported set of sizes is different for P8/P9 and type of IO (PHB,
> > > NVLink/CAPI).
> > > 
> > > 
> > > > > > Update those functions to guarantee alignment with requested size
> > > > > > using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
> > > > > > 
> > > > > > Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
> > > > > > with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
> > > > > > 
> > > > > > Signed-off-by: Leonardo Bras 
> > > > > > ---
> > > > > >  arch/powerpc/kernel/iommu.c | 17 +
> > > > > >  1 file changed, 9 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/powerpc/kernel/iommu.c 
> > > > > > b/arch/powerpc/kernel/iommu.c
> > > > > > index 9704f3f76e63..d7086087830f 100644
> > > > > > --- a/arch/powerpc/kernel/iommu.c
> > > > > > +++ b/arch/powerpc/kernel/iommu.c
> > > > > > @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
> > > > > > }
> > > > > >  
> > > > > > if (dev)
> > > > > > -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > > > > > - 1 << tbl->it_page_shift);
> > > > > > +   boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
> > > > > 
> > > > > Run checkpatch.pl, should complain about a long line.
> > > > 
> > > > It's 86 columns long, which is less than the new limit of 100 columns
> > > > Linus announced a few weeks ago. checkpatch.pl was updated too:
> > > > https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col
> > > 
> > > Yay finally :) Thanks,
> > 
> > :)
> > 
> > > 
> > > > > > else
> > > > > > -   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
> > > > > > +   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
> > > > > > /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> > > > > >  
> > > > > > n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> > > > > > @@ -858,6 +857,7 @@ void 

[PATCH 2/2] dma-mapping: set default segment_boundary_mask to ULONG_MAX

2020-09-01 Thread Nicolin Chen
The default segment_boundary_mask was set to DMA_BIT_MASK(32)
a decade ago by referencing the SCSI/block subsystem, as a 32-bit
mask was good enough for most of the devices.

Now more and more drivers set dma_masks above DMA_BIT_MASK(32)
while only a handful of them call dma_set_seg_boundary(). This
means that most drivers have a 4GB segmentation boundary because
the DMA API returns a 32-bit default value, though they might not
really have such a limit.

The default segment_boundary_mask should mean "no limit" since
the device doesn't explicitly set the mask. But a 32-bit mask
certainly limits those devices capable of 32+ bits addressing.

So this patch sets default segment_boundary_mask to ULONG_MAX.

Signed-off-by: Nicolin Chen 
---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index faab0a8210b9..df0bff2ea750 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -629,7 +629,7 @@ static inline unsigned long dma_get_seg_boundary(struct 
device *dev)
 {
if (dev->dma_parms && dev->dma_parms->segment_boundary_mask)
return dev->dma_parms->segment_boundary_mask;
-   return DMA_BIT_MASK(32);
+   return ULONG_MAX;
 }
 
 /**
-- 
2.17.1



[PATCH 0/2] dma-mapping: update default segment_boundary_mask

2020-09-01 Thread Nicolin Chen
These two patches are to update default segment_boundary_mask.

PATCH-1 fixes overflow issues in callers of dma_get_seg_boundary.
Previous version was a series: https://lkml.org/lkml/2020/8/31/1026

Then PATCH-2 sets default segment_boundary_mask to ULONG_MAX.

Nicolin Chen (2):
  dma-mapping: introduce dma_get_seg_boundary_nr_pages()
  dma-mapping: set default segment_boundary_mask to ULONG_MAX

 arch/alpha/kernel/pci_iommu.c|  7 +--
 arch/ia64/hp/common/sba_iommu.c  |  3 +--
 arch/powerpc/kernel/iommu.c  |  9 ++---
 arch/s390/pci/pci_dma.c  |  6 ++
 arch/sparc/kernel/iommu-common.c | 10 +++---
 arch/sparc/kernel/iommu.c|  3 +--
 arch/sparc/kernel/pci_sun4v.c|  3 +--
 arch/x86/kernel/amd_gart_64.c|  3 +--
 drivers/parisc/ccio-dma.c|  3 +--
 drivers/parisc/sba_iommu.c   |  3 +--
 include/linux/dma-mapping.h  | 21 -
 11 files changed, 34 insertions(+), 37 deletions(-)

-- 
2.17.1



[PATCH 1/2] dma-mapping: introduce dma_get_seg_boundary_nr_pages()

2020-09-01 Thread Nicolin Chen
We found that callers of dma_get_seg_boundary mostly do an ALIGN
with page mask and then do a page shift to get number of pages:
ALIGN(boundary + 1, 1 << shift) >> shift

However, the boundary might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1" or
passing it to ALIGN() would potentially overflow.

According to kernel defines:
#define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here into a helper function doing:
  ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

This patch introduces and applies dma_get_seg_boundary_nr_pages()
as an overflow-free helper for the dma_get_seg_boundary() callers
to get numbers of pages. It also takes care of the NULL dev case
for non-DMA API callers.
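
For reference, a sketch of such a helper following the simplification above
(the !dev fallback keeps the 32-bit default boundary that the existing callers
assume; treat the exact form as illustrative):

static inline unsigned long dma_get_seg_boundary_nr_pages(struct device *dev,
					unsigned int page_shift)
{
	if (!dev)
		return (U32_MAX >> page_shift) + 1;

	return (dma_get_seg_boundary(dev) >> page_shift) + 1;
}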

Suggested-by: Christoph Hellwig 
Signed-off-by: Nicolin Chen 
---
 arch/alpha/kernel/pci_iommu.c|  7 +--
 arch/ia64/hp/common/sba_iommu.c  |  3 +--
 arch/powerpc/kernel/iommu.c  |  9 ++---
 arch/s390/pci/pci_dma.c  |  6 ++
 arch/sparc/kernel/iommu-common.c | 10 +++---
 arch/sparc/kernel/iommu.c|  3 +--
 arch/sparc/kernel/pci_sun4v.c|  3 +--
 arch/x86/kernel/amd_gart_64.c|  3 +--
 drivers/parisc/ccio-dma.c|  3 +--
 drivers/parisc/sba_iommu.c   |  3 +--
 include/linux/dma-mapping.h  | 19 +++
 11 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 81037907268d..6f7de4f4e191 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -141,12 +141,7 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
unsigned long boundary_size;
 
base = arena->dma_base >> PAGE_SHIFT;
-   if (dev) {
-   boundary_size = dma_get_seg_boundary(dev) + 1;
-   boundary_size >>= PAGE_SHIFT;
-   } else {
-   boundary_size = 1UL << (32 - PAGE_SHIFT);
-   }
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT);
 
/* Search forward for the first mask-aligned sequence of N free ptes */
ptes = arena->ptes;
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 656a4888c300..b49b73a95067 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -485,8 +485,7 @@ sba_search_bitmap(struct ioc *ioc, struct device *dev,
	ASSERT(((unsigned long) ioc->res_hint & (sizeof(unsigned long) - 1UL)) == 0);
ASSERT(res_ptr < res_end);
 
-   boundary_size = (unsigned long long)dma_get_seg_boundary(dev) + 1;
-   boundary_size = ALIGN(boundary_size, 1ULL << iovp_shift) >> iovp_shift;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, iovp_shift);
 
BUG_ON(ioc->ibase & ~iovp_mask);
shift = ioc->ibase >> iovp_shift;
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9704f3f76e63..cbc2e62db597 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -236,15 +236,10 @@ static unsigned long iommu_range_alloc(struct device *dev,
}
}
 
-   if (dev)
-   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
- 1 << tbl->it_page_shift);
-   else
-   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
-   /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, tbl->it_page_shift);
 
n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
-boundary_size >> tbl->it_page_shift, align_mask);
+boundary_size, align_mask);
if (n == -1) {
if (likely(pass == 0)) {
/* First try the pool from the start */
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 64b1399a73f0..4a37d8f4de9d 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -261,13 +261,11 @@ static unsigned long __dma_alloc_iommu(struct device *dev,
   unsigned long start, int size)
 {
struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
-   unsigned long boundary_size;
 
-   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
- PAGE_SIZE) >> PAGE_SHIFT;
return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
start, size, zdev->start_dma >> PAGE_SHIFT,
-   boundary_size, 0);
+   dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT),
+   0);
 }
 
 static dma_addr_t 

Re: [PATCH] soc: fsl: Remove bogus packed attributes from qman.h

2020-09-01 Thread Li Yang
On Mon, Aug 31, 2020 at 8:57 PM Herbert Xu  wrote:
>
> On Tue, Sep 01, 2020 at 01:50:38AM +, Leo Li wrote:
> >
> > Sorry for the late response.  I missed this email previously.
> >
> > These structures are descriptors used by hardware, we cannot have _ANY_ 
> > padding from the compiler.  The compiled result might be the same with or 
> > without the __packed attribute for now, but I think keeping it there is
> > probably safer for dealing with unexpected alignment requirements from the
> > compiler in the future.
> >
> > Having a conflicting alignment requirements warning might mean something is
> > wrong with the structure in certain scenarios.  I just tried an ARM64 build
> > but didn't see the warnings.  Could you share the warning you got and the 
> > build setup?  Thanks.
>
> Just do a COMPILE_TEST build on x86-64:
>
> In file included from ../drivers/crypto/caam/qi.c:12:

Looks like the CAAM driver and the dependent QBMAN driver don't support
COMPILE_TEST yet.  Are you trying to add support for it?

I changed the Kconfig to enable the COMPILE_TEST anyway and updated my
toolchain to gcc-10 trying to duplicate the issue.  The issues can
only be reproduced with "W=1".

> ../include/soc/fsl/qman.h:259:1: warning: alignment 1 of ‘struct 
> qm_dqrr_entry’ is less than 8 [-Wpacked-not-aligned]
>  } __packed;
>  ^
> ../include/soc/fsl/qman.h:292:2: warning: alignment 1 of ‘struct <anonymous>’ 
> is less than 8 [-Wpacked-not-aligned]
>   } __packed ern;
>   ^

I think this is a valid concern: if the parent structure doesn't
meet certain alignment requirements, the alignment of the
sub-structure cannot be guaranteed.  If we just remove the __packed
attribute from the parent structure, the compiler could add
padding in the parent structure to fulfill the alignment requirements
of the sub-structure, which is not good.  I think the following changes
are a better fix for the warnings:

diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
index cfe00e08e85b..9f484113cfda 100644
--- a/include/soc/fsl/qman.h
+++ b/include/soc/fsl/qman.h
@@ -256,7 +256,7 @@ struct qm_dqrr_entry {
__be32 context_b;
struct qm_fd fd;
u8 __reserved4[32];
-} __packed;
+} __packed __aligned(64);
 #define QM_DQRR_VERB_VBIT  0x80
 #define QM_DQRR_VERB_MASK  0x7f/* where the verb contains; */
 #define QM_DQRR_VERB_FRAME_DEQUEUE 0x60/* "this format" */
@@ -289,7 +289,7 @@ union qm_mr_entry {
__be32 tag;
struct qm_fd fd;
u8 __reserved1[32];
-   } __packed ern;
+   } __packed __aligned(64) ern;
struct {
u8 verb;
u8 fqs; /* Frame Queue Status */
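
For illustration, the intended effect of combining the two attributes, as a
hypothetical minimal example (not taken from the patch):

	struct a { u8 v; __be64 x; } __packed;               /* alignof == 1 */
	struct b { u8 v; __be64 x; } __packed __aligned(64); /* alignof == 64 */

Both have no interior padding, but only the second lets the compiler assume
the 64-byte alignment the hardware descriptor requires, which is what
silences -Wpacked-not-aligned.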


Regards,
Leo


Re: [PATCH v1 01/10] powerpc/pseries/iommu: Replace hard-coded page shift

2020-09-01 Thread Leonardo Bras
On Mon, 2020-08-31 at 13:48 +1000, Alexey Kardashevskiy wrote:
> > > > Well, I created this TCE_RPN_BITS = 52 because the previous mask was a
> > > > hardcoded 40-bit mask (0xfful), for hard-coded 12-bit (4k)
> > > > pagesize, and on PAPR+/LoPAR also defines TCE as having bits 0-51
> > > > described as RPN, as described before.
> > > > 
> > > > IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figure 3.4 and 3.5.
> > > > shows system memory mapping into a TCE, and the TCE also has bits 0-51
> > > > for the RPN (52 bits). "Table 3.6. TCE Definition" also shows it.
> > > > In fact, by the looks of those figures, the RPN_MASK should always be a
> > > > 52-bit mask, and RPN = (page >> tceshift) & RPN_MASK.
> > > 
> > > I suspect the mask is there in the first place for extra protection
> > > against too big addresses going to the TCE table (or/and for virtual vs
> > > physical addresses). Using a 52bit mask makes no sense for anything, you
> > > could just drop the mask and let the C compiler deal with 64bit "uint" as it
> > > is basically a 4K page address anywhere in the 64bit space. Thanks,
> > 
> > Assuming 4K pages you need 52 RPN bits to cover the whole 64bit
> > physical address space. The IODA3 spec does explicitly say the upper
> > bits are optional and the implementation only needs to support enough
> > to cover up to the physical address limit, which is 56bits of P9 /
> > PHB4. If you want to validate that the address will fit inside of
> > MAX_PHYSMEM_BITS then fine, but I think that should be done as a
> > WARN_ON or similar rather than just silently masking off the bits.
> 
> We can do this and probably should anyway but I am also pretty sure we
> can just ditch the mask and have the hypervisor return an error which
> will show up in dmesg.

Ok then, ditching the mask.
Thanks!
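
For orientation, a rough sketch of the two alternatives discussed above
(identifier names are hypothetical; the real code also folds in the
protection bits):

	/* with an explicit 52-bit RPN mask (the approach being dropped) */
	rpn = (page >> tceshift) & ((1UL << 52) - 1);

	/* without the mask: pass the page number through unmodified and
	 * let the hypervisor fail the TCE insertion for out-of-range
	 * addresses, which then shows up as an error in dmesg */
	rpn = page >> tceshift;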



Re: [RESEND][PATCH 1/7] powerpc/iommu: Avoid overflow at boundary_size

2020-09-01 Thread Nicolin Chen
On Tue, Sep 01, 2020 at 11:27:36PM +1000, Michael Ellerman wrote:
> Nicolin Chen  writes:
> > The boundary_size might be as large as ULONG_MAX, which means
> > that a device has no specific boundary limit. So either "+ 1"
> > or passing it to ALIGN() would potentially overflow.
> >
> > According to kernel defines:
> > #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
> > #define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1)
> >
> > We can simplify the logic here:
> >   ALIGN(boundary + 1, 1 << shift) >> shift
> > = ALIGN_MASK(b + 1, (1 << s) - 1) >> s
> > = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
> > = [b + 1 + (1 << s) - 1] >> s
> > = [b + (1 << s)] >> s
> > = (b >> s) + 1
> >
> > So fixing a potential overflow with the safer shortcut.
> >
> > Reported-by: Stephen Rothwell 
> > Signed-off-by: Nicolin Chen 
> > Cc: Christoph Hellwig 
> > ---
> >  arch/powerpc/kernel/iommu.c | 11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> Are you asking for acks, or for maintainers to merge the patches
> individually?

I was expecting that, but Christoph just suggested that I squash them
into one so he would merge it: https://lkml.org/lkml/2020/9/1/159

Though I feel it'd be nice to get maintainers' acks before merging.

> > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > index 9704f3f76e63..c01ccbf8afdd 100644
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -236,15 +236,14 @@ static unsigned long iommu_range_alloc(struct device 
> > *dev,
> > }
> > }
> >  
> > -   if (dev)
> > -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > - 1 << tbl->it_page_shift);
> > -   else
> > -   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
> > /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> > +   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
> 
> Is there any path that passes a NULL dev anymore?
> 
> Both iseries_hv_alloc() and iseries_hv_map() were removed years ago.
> See:
>   8ee3e0d69623 ("powerpc: Remove the main legacy iSerie platform code")
> 
> 
> So maybe we should do a lead-up patch that drops the NULL dev support,
> which will then make this patch simpler.

The next version of this change will follow Christoph's suggestion
by having a helper function that takes care of !dev internally.

Thanks
Nic

> 
> 
> > +   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> > +   boundary_size = (boundary_size >> tbl->it_page_shift) + 1;
> >  
> > n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> > -boundary_size >> tbl->it_page_shift, align_mask);
> > +boundary_size, align_mask);
> > if (n == -1) {
> > if (likely(pass == 0)) {
> > /* First try the pool from the start */
> > -- 
> > 2.17.1


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 19:25, Al Viro wrote:

On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:


 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero


Interesting...  Could you get an instruction-level profile inside 
iov_iter_zero(),
along with the disassembly of that sucker?



As a comparison, hereunder is the perf annotate of the 5.9-rc2 without 
the series:


 Percent |	Source code & Disassembly of vmlinux for cpu-clock (2581 samples)
-------------------------------------------------------------------------------
 :
 :
 :
 :  Disassembly of section .text:
 :
 :  c02cbb80 <iov_iter_zero>:
 :  iov_iter_zero():
3.22 :c02cbb80:   stwu    r1,-80(r1)
3.25 :c02cbb84:   stw r30,72(r1)
0.00 :c02cbb88:   mr  r30,r4
2.91 :c02cbb8c:   stw r31,76(r1)
0.00 :c02cbb90:   mr  r31,r3
0.19 :c02cbb94:   stw r27,60(r1)
 :  iov_iter_type():
1.82 :c02cbb98:   lwz r10,0(r4)
0.54 :c02cbb9c:   rlwinm  r9,r10,0,0,30
 :  iov_iter_zero():
1.98 :c02cbba0:   cmpwi   r9,32
0.00 :c02cbba4:   lwz r9,624(r2)
0.35 :c02cbba8:   stw r9,28(r1)
0.00 :c02cbbac:   li  r9,0
0.00 :c02cbbb0:   beq c02cbd00 
2.67 :c02cbbb4:   lwz r9,8(r4)
1.98 :c02cbbb8:   cmplw   r9,r3
0.00 :c02cbbbc:   mr  r27,r9
0.00 :c02cbbc0:   bgt c02cbce8 
0.31 :c02cbbc4:   cmpwi   r9,0
0.00 :c02cbbc8:   beq c02cbcbc 
3.22 :c02cbbcc:   andi.   r8,r10,16
1.70 :c02cbbd0:   lwz r31,4(r30)
0.00 :c02cbbd4:   bne c02cbe10 
0.31 :c02cbbd8:   andi.   r8,r10,8
0.00 :c02cbbdc:   bne c02cbf64 
1.82 :c02cbbe0:   andi.   r10,r10,64
0.00 :c02cbbe4:   bne c02cc080 
0.27 :c02cbbe8:   stw r29,68(r1)
1.94 :c02cbbec:   stw r28,64(r1)
1.98 :c02cbbf0:   lwz r28,12(r30)
0.31 :c02cbbf4:   lwz r7,4(r28)
2.13 :c02cbbf8:   subf    r29,r31,r7
1.78 :c02cbbfc:   cmplw   r29,r27
0.08 :c02cbc00:   bgt c02cbcf8 
   28.24 :c02cbc04:   cmpwi   r29,0
0.00 :c02cbc08:   beq c02cc08c 
2.01 :c02cbc0c:   lwz r3,0(r28)
3.10 :c02cbc10:   lwz r10,1208(r2)
0.00 :c02cbc14:   add r3,r3,r31
 :  __access_ok():
0.00 :c02cbc18:   cmplw   r3,r10
0.00 :c02cbc1c:   bgt c02cbc7c 
3.37 :c02cbc20:   subf    r10,r3,r10
0.00 :c02cbc24:   addi    r8,r29,-1
3.14 :c02cbc28:   cmplw   r8,r10
0.08 :c02cbc2c:   mflr    r0
0.00 :c02cbc30:   stw r0,84(r1)
0.00 :c02cbc34:   bgt c02cbd40 
 :  clear_user():
0.00 :c02cbc38:   mr  r4,r29
2.40 :c02cbc3c:   bl  c001a428 <__arch_clear_user>
 :  iov_iter_zero():
1.55 :c02cbc40:   add r31,r31,r29
0.00 :c02cbc44:   cmpwi   r3,0
1.94 :c02cbc48:   subf    r29,r29,r27
0.00 :c02cbc4c:   subf    r31,r3,r31
0.00 :c02cbc50:   add r29,r29,r3
0.00 :c02cbc54:   beq c02cc0ac 
0.00 :c02cbc58:   lwz r9,8(r30)
0.00 :c02cbc5c:   subf    r10,r27,r29
0.00 :c02cbc60:   lwz r0,84(r1)
0.00 :c02cbc64:   subf    r27,r29,r27
0.00 :c02cbc68:   add r9,r10,r9
0.00 :c02cbc6c:   lwz r7,4(r28)
0.00 :c02cbc70:   lwz r10,12(r30)
0.00 :c02cbc74:   mtlr    r0
0.00 :c02cbc78:   b   c02cbc84 
 :  __access_ok():
0.00 :c02cbc7c:   li  r27,0
0.00 :c02cbc80:   mr  r10,r28
 :  iov_iter_zero():
0.00 :c02cbc84:   cmplw   r31,r7
0.00 :c02cbc88:   bne c02cbc94 
0.93 :c02cbc8c:   addi    r28,r28,8
0.00 :c02cbc90:   li  r31,0
1.28 :c02cbc94:   lwz r8,16(r30)
0.00 :c02cbc98:   subf    r10,r10,r28
1.05 :c02cbc9c:   srawi   r10,r10,3
0.00 :c02cbca0:   stw r28,12(r30)
0.00 :c02cbca4:   subf    r10,r10,r8
0.93 :c02cbca8:   stw r10,16(r30)
0.04 :c02cbcac:   lwz r28,64(r1)
0.00 :c02cbcb0:   lwz r29,68(r1)
1.05 :c02cbcb4:   stw r9,8(r30)
0.00 :c02cbcb8: 

Re: [PATCH 05/10] lkdtm: disable set_fs-based tests for !CONFIG_SET_FS

2020-09-01 Thread Kees Cook
On Sat, Aug 29, 2020 at 11:24:06AM +0200, Christoph Hellwig wrote:
> On Thu, Aug 27, 2020 at 11:06:28AM -0700, Linus Torvalds wrote:
> > On Thu, Aug 27, 2020 at 8:00 AM Christoph Hellwig  wrote:
> > >
> > > Once we can't manipulate the address limit, we also can't test what
> > > happens when the manipulation is abused.
> > 
> > Just remove these tests entirely.
> > 
> > Once set_fs() doesn't exist on x86, the tests no longer make any sense
> > what-so-ever, because test coverage will be basically zero.
> > 
> > So don't make the code uglier just to maintain a fiction that
> > something is tested when it isn't really.
> 
> Sure fine with me unless Kees screams.

To clarify: if any of x86, arm64, arm, powerpc, riscv, and s390 are
using set_fs(), I want to keep this test. "ugly" is fine in lkdtm. :)

-- 
Kees Cook


Re: [PATCH 05/10] lkdtm: disable set_fs-based tests for !CONFIG_SET_FS

2020-09-01 Thread Kees Cook
On Sat, Aug 29, 2020 at 11:24:06AM +0200, Christoph Hellwig wrote:
> On Thu, Aug 27, 2020 at 11:06:28AM -0700, Linus Torvalds wrote:
> > On Thu, Aug 27, 2020 at 8:00 AM Christoph Hellwig  wrote:
> > >
> > > Once we can't manipulate the address limit, we also can't test what
> > > happens when the manipulation is abused.
> > 
> > Just remove these tests entirely.
> > 
> > Once set_fs() doesn't exist on x86, the tests no longer make any sense
> > what-so-ever, because test coverage will be basically zero.
> > 
> > So don't make the code uglier just to maintain a fiction that
> > something is tested when it isn't really.
> 
> Sure fine with me unless Kees screams.

If we don't have set_fs, we don't need the tests. :)

-- 
Kees Cook


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 19:25, Al Viro wrote:

On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:


 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero


Interesting...  Could you get an instruction-level profile inside 
iov_iter_zero(),
along with the disassembly of that sucker?



Output of perf annotate:


 Percent |	Source code & Disassembly of vmlinux for cpu-clock (3579 samples)
-------------------------------------------------------------------------------
 :
 :
 :
 :  Disassembly of section .text:
 :
 :  c02cb3a4 <iov_iter_zero>:
 :  iov_iter_zero():
2.24 :c02cb3a4:   stwu    r1,-80(r1)
0.31 :c02cb3a8:   stw r30,72(r1)
0.00 :c02cb3ac:   mr  r30,r4
0.11 :c02cb3b0:   stw r31,76(r1)
0.00 :c02cb3b4:   mr  r31,r3
1.06 :c02cb3b8:   stw r27,60(r1)
 :  iov_iter_type():
0.03 :c02cb3bc:   lwz r10,0(r4)
0.06 :c02cb3c0:   rlwinm  r9,r10,0,0,30
 :  iov_iter_zero():
0.03 :c02cb3c4:   cmpwi   r9,32
0.00 :c02cb3c8:   lwz r9,624(r2)
2.15 :c02cb3cc:   stw r9,28(r1)
0.00 :c02cb3d0:   li  r9,0
0.00 :c02cb3d4:   beq c02cb520 
0.14 :c02cb3d8:   lwz r9,8(r4)
0.08 :c02cb3dc:   cmplw   r9,r3
0.00 :c02cb3e0:   mr  r27,r9
0.03 :c02cb3e4:   bgt c02cb4fc 
1.34 :c02cb3e8:   cmpwi   r9,0
0.00 :c02cb3ec:   beq c02cb4d0 
0.11 :c02cb3f0:   andi.   r8,r10,16
0.17 :c02cb3f4:   lwz r31,4(r30)
1.79 :c02cb3f8:   bne c02cb61c 
0.00 :c02cb3fc:   andi.   r8,r10,8
0.06 :c02cb400:   bne c02cb770 
0.22 :c02cb404:   andi.   r10,r10,64
0.03 :c02cb408:   bne c02cb88c 
0.11 :c02cb40c:   stw r29,68(r1)
1.59 :c02cb410:   stw r28,64(r1)
0.03 :c02cb414:   lwz r28,12(r30)
0.00 :c02cb418:   lwz r7,4(r28)
1.87 :c02cb41c:   subf    r29,r31,r7
0.28 :c02cb420:   cmplw   r29,r27
0.03 :c02cb424:   bgt c02cb50c 
0.03 :c02cb428:   cmpwi   r29,0
0.00 :c02cb42c:   beq c02cb898 
1.34 :c02cb430:   lwz r3,0(r28)
 :  __access_ok():
0.00 :c02cb434:   lis r10,-16384
 :  iov_iter_zero():
0.36 :c02cb438:   add r3,r3,r31
 :  __access_ok():
0.03 :c02cb43c:   cmplw   r3,r10
1.79 :c02cb440:   bge c02cb514 
   13.19 :c02cb444:   subf    r10,r3,r10
 :  clear_user():
0.00 :c02cb448:   cmplw   r29,r10
4.41 :c02cb44c:   mflr    r0
0.00 :c02cb450:   stw r0,84(r1)
0.00 :c02cb454:   bgt c02cb8c4 
0.00 :c02cb458:   mr  r4,r29
0.00 :c02cb45c:   bl  c001a41c <__arch_clear_user>
 :  iov_iter_zero():
0.70 :c02cb460:   add r31,r31,r29
0.00 :c02cb464:   cmpwi   r3,0
   17.13 :c02cb468:   subf    r29,r29,r27
0.00 :c02cb46c:   subf    r31,r3,r31
1.20 :c02cb470:   add r29,r29,r3
0.00 :c02cb474:   beq c02cb8b8 
0.00 :c02cb478:   lwz r9,8(r30)
0.00 :c02cb47c:   subf    r10,r27,r29
0.00 :c02cb480:   lwz r0,84(r1)
0.00 :c02cb484:   subf    r27,r29,r27
0.00 :c02cb488:   add r9,r10,r9
0.00 :c02cb48c:   lwz r7,4(r28)
0.00 :c02cb490:   lwz r10,12(r30)
0.00 :c02cb494:   mtlr    r0
1.65 :c02cb498:   cmplw   r31,r7
   14.61 :c02cb49c:   bne c02cb4a8 
1.65 :c02cb4a0:   addi    r28,r28,8
0.00 :c02cb4a4:   li  r31,0
   14.92 :c02cb4a8:   lwz r8,16(r30)
0.00 :c02cb4ac:   subf    r10,r10,r28
1.12 :c02cb4b0:   srawi   r10,r10,3
0.56 :c02cb4b4:   stw r28,12(r30)
0.00 :c02cb4b8:   subf    r10,r10,r8
1.23 :c02cb4bc:   stw r10,16(r30)
0.00 :c02cb4c0:   lwz r28,64(r1)
0.56 :c02cb4c4:   lwz r29,68(r1)
0.00 :c02cb4c8:   stw r9,8(r30)
2.12 :c02cb4cc:   stw r31,4(r30)
0.00 :c02cb4d0:   lwz r9,28(r1)
0.61 :c02cb4d4:   lwz r10,624(r2)
0.00 :c02cb4d8:   xor.r9,r9,r10
0.00 :c02cb4dc:   li  r10,0
0.00 :c02cb4e0:   

Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Matthew Wilcox
On Tue, Sep 01, 2020 at 06:25:12PM +0100, Al Viro wrote:
> On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:
> 
> > 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero
> 
> Interesting...  Could you get an instruction-level profile inside 
> iov_iter_zero(),
> along with the disassembly of that sucker?

Also, does [1] make any difference?  Probably not since it's translating
O flags into IOCB flags instead of RWF flags into IOCB flags.  I wonder
if there's a useful trick we can play here ... something like:

static inline int iocb_flags(struct file *file)
{
int res = 0;
if (likely(!(file->f_flags & (O_APPEND | O_DIRECT | O_DSYNC | __O_SYNC)) &&
    !IS_SYNC(file->f_mapping->host)))
return res;
if (file->f_flags & O_APPEND)
res |= IOCB_APPEND;
if (file->f_flags & O_DIRECT)
res |= IOCB_DIRECT;
if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
res |= IOCB_DSYNC;
if (file->f_flags & __O_SYNC)
res |= IOCB_SYNC;
return res;
}

Can we do something like force O_DSYNC to be set if the inode IS_SYNC()
at the time of open?  Or is setting the sync bit on the inode required
to affect currently-open files?

[1] 
https://lore.kernel.org/linux-fsdevel/95de7ce4-9254-39f1-304f-4455f66bf...@kernel.dk/
 


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Al Viro
On Tue, Sep 01, 2020 at 07:13:00PM +0200, Christophe Leroy wrote:

> 10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero

Interesting...  Could you get an instruction-level profile inside 
iov_iter_zero(),
along with the disassembly of that sucker?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

2020-09-01 Thread Christophe Leroy

Hi Christoph,

On 27/08/2020 at 17:00, Christoph Hellwig wrote:

Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

The file system part has been posted a few times, and the read/write side
has been pretty much unchanged.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reason for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughly:

  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
  (2) implement __get_kernel_nofault and __put_kernel_nofault
  (3) remove the arch specific address limitation functionality

Changes since v1:
  - drop the patch to remove the non-iter ops for /dev/zero and
/dev/null as they caused a performance regression
  - don't enable user access in __get_kernel on powerpc
  - xfail the set_fs() based lkdtm tests

Diffstat:




I'm still sceptical about the results I get.

With 5.9-rc2:

root@vgoippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real0m 5.59s
user0m 1.40s
sys 0m 4.19s


With your series:

root@vgoippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real0m 7.79s
user0m 2.12s
sys 0m 5.66s




Top of perf report of a standard perf record:

With 5.9-rc2:

20.31%  dd   [kernel.kallsyms]  [k] __arch_clear_user
 8.37%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 7.37%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 6.95%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 5.72%  dd   [kernel.kallsyms]  [k] new_sync_read
 4.87%  dd   [kernel.kallsyms]  [k] vfs_write
 4.47%  dd   [kernel.kallsyms]  [k] vfs_read
 3.07%  dd   [kernel.kallsyms]  [k] ksys_write
 2.77%  dd   [kernel.kallsyms]  [k] ksys_read
 2.65%  dd   [kernel.kallsyms]  [k] __fget_light
 2.37%  dd   [kernel.kallsyms]  [k] __fdget_pos
 2.35%  dd   [kernel.kallsyms]  [k] memset
 1.53%  dd   [kernel.kallsyms]  [k] rw_verify_area
 1.52%  dd   [kernel.kallsyms]  [k] read_iter_zero

With your series:
19.60%  dd   [kernel.kallsyms]  [k] __arch_clear_user
10.92%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 9.50%  dd   [kernel.kallsyms]  [k] vfs_write
 8.97%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 5.46%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 5.42%  dd   [kernel.kallsyms]  [k] vfs_read
 3.58%  dd   [kernel.kallsyms]  [k] ksys_read
 2.84%  dd   [kernel.kallsyms]  [k] read_iter_zero
 2.24%  dd   [kernel.kallsyms]  [k] ksys_write
 1.80%  dd   [kernel.kallsyms]  [k] __fget_light
 1.34%  dd   [kernel.kallsyms]  [k] __fdget_pos
 0.91%  dd   [kernel.kallsyms]  [k] memset
 0.91%  dd   [kernel.kallsyms]  [k] rw_verify_area

Christophe


Re: [PATCH] selftests/powerpc: Skip PROT_SAO test in guests/LPARS

2020-09-01 Thread Sachin Sant



> On 01-Sep-2020, at 6:16 PM, Michael Ellerman  wrote:
> 
> In commit 9b725a90a8f1 ("powerpc/64s: Disallow PROT_SAO in LPARs by
> default") PROT_SAO was disabled in guests/LPARs by default. So skip
> the test if we are running in a guest to avoid a spurious failure.
> 
> Signed-off-by: Michael Ellerman 
> ---

Tested-by: Sachin Sant 

With the fix test is skipped while running in a guest

# ./prot_sao 
test: prot-sao
tags: git_version:unknown
[SKIP] Test skipped on line 25
skip: prot-sao
#



[PATCH v3 16/23] powerpc: use asm-generic/mmu_context.h for no-op implementations

2020-09-01 Thread Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman  (powerpc)
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/mmu_context.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 7f3658a97384..a3a12a8341b2 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -14,7 +14,9 @@
 /*
  * Most if the context management is out of line
  */
+#define init_new_context init_new_context
 extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+#define destroy_context destroy_context
 extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
@@ -235,27 +237,26 @@ static inline void switch_mm(struct mm_struct *prev, 
struct mm_struct *next,
 }
 #define switch_mm_irqs_off switch_mm_irqs_off
 
-
-#define deactivate_mm(tsk,mm)  do { } while (0)
-
 /*
  * After we have set current->mm to a new value, this activates
  * the context for the new mm so we see the new mappings.
  */
+#define activate_mm activate_mm
 static inline void activate_mm(struct mm_struct *prev, struct mm_struct *next)
 {
switch_mm(prev, next, current);
 }
 
 /* We don't currently use enter_lazy_tlb() for anything */
+#ifdef CONFIG_PPC_BOOK3E_64
+#define enter_lazy_tlb enter_lazy_tlb
 static inline void enter_lazy_tlb(struct mm_struct *mm,
  struct task_struct *tsk)
 {
/* 64-bit Book3E keeps track of current PGD in the PACA */
-#ifdef CONFIG_PPC_BOOK3E_64
get_paca()->pgd = NULL;
-#endif
 }
+#endif
 
 extern void arch_exit_mmap(struct mm_struct *mm);
 
@@ -298,5 +299,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
return 0;
 }
 
+#include 
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
-- 
2.23.0



[PATCH] cpuidle-pseries: Fix CEDE latency conversion from tb to us

2020-09-01 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
of the Extended CEDE states advertised by the platform. The values
advertised by the platform are in timebase ticks. However the cpuidle
framework requires the latency values in microseconds.

If the tb-ticks value advertised by the platform correspond to a value
smaller than 1us, during the conversion from tb-ticks to microseconds,
in the current code, the result becomes zero. This is incorrect as it
puts a CEDE state on par with the snooze state.

This patch fixes this by rounding up the result obtained while
converting the latency value from tb-ticks to microseconds.
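
As a hypothetical illustration: with a 512 MHz timebase, one tick is
~1.95 ns, so an advertised latency of 100 ticks is ~195 ns;
tb_to_ns(latency_tb) / NSEC_PER_USEC truncates that to 0 us, putting the
state on par with snooze, whereas DIV_ROUND_UP_ULL() yields 1 us.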

Fixes: d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")

Signed-off-by: Gautham R. Shenoy 
---
 drivers/cpuidle/cpuidle-pseries.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index ff6d99e..9043358 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -361,7 +361,7 @@ static void __init fixup_cede0_latency(void)
for (i = 0; i < nr_xcede_records; i++) {
struct xcede_latency_record *record = &payload->records[i];
u64 latency_tb = be64_to_cpu(record->latency_ticks);
-   u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+   u64 latency_us = DIV_ROUND_UP_ULL(tb_to_ns(latency_tb), NSEC_PER_USEC);
 
if (latency_us < min_latency_us)
min_latency_us = latency_us;
-- 
1.9.4



Re: [PATCH] powerpc/powernv/pci: Drop pnv_phb->initialized

2020-09-01 Thread Michael Ellerman
Oliver O'Halloran  writes:

> The pnv_phb->initialized flag is an odd beast. It was added back in 2012 in
> commit db1266c85261 ("powerpc/powernv: Skip check on PE if necessary") to
> allow devices to be enabled even if their PE assignments hadn't been
> completed yet. I can't think of any situation where we would (or should)
> have PCI devices being enabled before their PEs are assigned, so I can only
> assume it was a workaround for a bug or some other undesirable behaviour
> from the PCI core.
>
> Since commit dc3d8f85bb57 ("powerpc/powernv/pci: Re-work bus PE
> configuration") the PE setup occurs before the PCI core allows driver to
> attach to the device so the problem should no longer exist. Even it does
> allowing the device to be enabled before we have assigned the device to a
> PE is almost certainly broken and will cause spurious EEH events so we
> should probably just remove it.
>
> It's also worth pointing out that ->initialized flag is set in
> pnv_pci_ioda_create_dbgfs() which has the entire function body wrapped in
> flag.

"body wrapped in flag." ?

I guess you meant:

"wrapped in #ifdef CONFIG_DEBUG_FS" ?

> That has the fun side effect of bypassing any other checks in
> pnv_pci_enable_device_hook() which is probably not what anybody wants.

That would only be true for CONFIG_DEBUG_FS=n builds though.

cheers

> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 023a4f987bb2..6ac3c637b313 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2410,9 +2410,6 @@ static void pnv_pci_ioda_create_dbgfs(void)
>   list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>   phb = hose->private_data;
>  
> - /* Notify initialization of PHB done */
> - phb->initialized = 1;
> -
>   sprintf(name, "PCI%04x", hose->global_number);
>   phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
>  
> @@ -2609,17 +2606,8 @@ static resource_size_t pnv_pci_default_alignment(void)
>   */
>  static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
>  {
> - struct pnv_phb *phb = pci_bus_to_pnvhb(dev->bus);
>   struct pci_dn *pdn;
>  
> - /* The function is probably called while the PEs have
> -  * not be created yet. For example, resource reassignment
> -  * during PCI probe period. We just skip the check if
> -  * PEs isn't ready.
> -  */
> - if (!phb->initialized)
> - return true;
> -
>   pdn = pci_get_pdn(dev);
>   if (!pdn || pdn->pe_number == IODA_INVALID_PE)
>   return false;
> @@ -2629,14 +2617,9 @@ static bool pnv_pci_enable_device_hook(struct pci_dev 
> *dev)
>  
>  static bool pnv_ocapi_enable_device_hook(struct pci_dev *dev)
>  {
> - struct pci_controller *hose = pci_bus_to_host(dev->bus);
> - struct pnv_phb *phb = hose->private_data;
>   struct pci_dn *pdn;
>   struct pnv_ioda_pe *pe;
>  
> - if (!phb->initialized)
> - return true;
> -
>   pdn = pci_get_pdn(dev);
>   if (!pdn)
>   return false;
> diff --git a/arch/powerpc/platforms/powernv/pci.h 
> b/arch/powerpc/platforms/powernv/pci.h
> index 739a0b3b72e1..36d22920f5a3 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -119,7 +119,6 @@ struct pnv_phb {
>   int flags;
>   void __iomem*regs;
>   u64 regs_phys;
> - int initialized;
>   spinlock_t  lock;
>  
>  #ifdef CONFIG_DEBUG_FS
> -- 
> 2.26.2


Re: [RESEND][PATCH 1/7] powerpc/iommu: Avoid overflow at boundary_size

2020-09-01 Thread Michael Ellerman
Nicolin Chen  writes:
> The boundary_size might be as large as ULONG_MAX, which means
> that a device has no specific boundary limit. So either "+ 1"
> or passing it to ALIGN() would potentially overflow.
>
> According to kernel defines:
> #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
> #define ALIGN(x, a)   ALIGN_MASK(x, (typeof(x))(a) - 1)
>
> We can simplify the logic here:
>   ALIGN(boundary + 1, 1 << shift) >> shift
> = ALIGN_MASK(b + 1, (1 << s) - 1) >> s
> = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
> = [b + 1 + (1 << s) - 1] >> s
> = [b + (1 << s)] >> s
> = (b >> s) + 1
>
> So fixing a potential overflow with the safer shortcut.
>
> Reported-by: Stephen Rothwell 
> Signed-off-by: Nicolin Chen 
> Cc: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/iommu.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)

Are you asking for acks, or for maintainers to merge the patches
individually?

> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 9704f3f76e63..c01ccbf8afdd 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -236,15 +236,14 @@ static unsigned long iommu_range_alloc(struct device 
> *dev,
>   }
>   }
>  
> - if (dev)
> - boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> -   1 << tbl->it_page_shift);
> - else
> - boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
>   /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> + boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;

Is there any path that passes a NULL dev anymore?

Both iseries_hv_alloc() and iseries_hv_map() were removed years ago.
See:
  8ee3e0d69623 ("powerpc: Remove the main legacy iSerie platform code")


So maybe we should do a lead-up patch that drops the NULL dev support,
which will then make this patch simpler.

cheers


> + /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> + boundary_size = (boundary_size >> tbl->it_page_shift) + 1;
>  
>   n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> -  boundary_size >> tbl->it_page_shift, align_mask);
> +  boundary_size, align_mask);
>   if (n == -1) {
>   if (likely(pass == 0)) {
>   /* First try the pool from the start */
> -- 
> 2.17.1


[PATCH] selftests/powerpc: Skip PROT_SAO test in guests/LPARS

2020-09-01 Thread Michael Ellerman
In commit 9b725a90a8f1 ("powerpc/64s: Disallow PROT_SAO in LPARs by
default") PROT_SAO was disabled in guests/LPARs by default. So skip
the test if we are running in a guest to avoid a spurious failure.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/mm/prot_sao.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/mm/prot_sao.c 
b/tools/testing/selftests/powerpc/mm/prot_sao.c
index e0cf8ebbf8cd..30b71b1d78d5 100644
--- a/tools/testing/selftests/powerpc/mm/prot_sao.c
+++ b/tools/testing/selftests/powerpc/mm/prot_sao.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -18,9 +19,13 @@ int test_prot_sao(void)
 {
char *p;
 
-   /* SAO was introduced in 2.06 and removed in 3.1 */
+   /*
+* SAO was introduced in 2.06 and removed in 3.1. It's disabled in
+* guests/LPARs by default, so also skip if we are running in a guest.
+*/
SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06) ||
-   have_hwcap2(PPC_FEATURE2_ARCH_3_1));
+   have_hwcap2(PPC_FEATURE2_ARCH_3_1) ||
+   access("/proc/device-tree/rtas/ibm,hypertas-functions", F_OK) == 0);
 
/*
 * Ensure we can ask for PROT_SAO.
-- 
2.25.1



Re: [PATCH 4/4] powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm

2020-09-01 Thread Michael Ellerman
Nicholas Piggin  writes:
> Commit 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of
> single-threaded mm_cpumask") added a mechanism to trim the mm_cpumask of
> a process under certain conditions. One of the assumptions is that
> mm_users would not be incremented via a reference outside the process
> context with mmget_not_zero() then go on to kthread_use_mm() via that
> reference.
>
> That invariant was broken by io_uring code (see previous sparc64 fix),
> but I'll point Fixes: to the original powerpc commit because we are
> changing that assumption going forward, so this will make backports
> match up.
>
> Fix this by no longer relying on that assumption, but by having each CPU
> check the mm is not being used, and clearing their own bit from the mask
> if it's okay. This fix relies on commit 38cf307c1f20 ("mm: fix
> kthread_use_mm() vs TLB invalidate") to disable irqs over the mm switch,
> and ARCH_WANT_IRQS_OFF_ACTIVATE_MM to be enabled.

You could use:

Depends-on: 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB invalidate")

> Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of 
> single-threaded mm_cpumask")
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/tlb.h   | 13 -
>  arch/powerpc/mm/book3s64/radix_tlb.c | 23 ---
>  2 files changed, 16 insertions(+), 20 deletions(-)

One minor nit below if you're respinning anyway.

You know this stuff better than me, but I still reviewed it and it seems
good to me.

Reviewed-by: Michael Ellerman 

> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index fbc6f3002f23..d97f061fecac 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -66,19 +66,6 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
>   return false;
>   return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
>  }
> -static inline void mm_reset_thread_local(struct mm_struct *mm)
> -{
> - WARN_ON(atomic_read(&mm->context.copros) > 0);
> - /*
> -  * It's possible for mm_access to take a reference on mm_users to
> -  * access the remote mm from another thread, but it's not allowed
> -  * to set mm_cpumask, so mm_users may be > 1 here.
> -  */
> - WARN_ON(current->mm != mm);
> - atomic_set(&mm->context.active_cpus, 1);
> - cpumask_clear(mm_cpumask(mm));
> - cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
> -}
>  #else /* CONFIG_PPC_BOOK3S_64 */
>  static inline int mm_is_thread_local(struct mm_struct *mm)
>  {
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
> b/arch/powerpc/mm/book3s64/radix_tlb.c
> index 0d233763441f..a421a0e3f930 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -645,19 +645,29 @@ static void do_exit_flush_lazy_tlb(void *arg)
>   struct mm_struct *mm = arg;
>   unsigned long pid = mm->context.id;
>  
> + /*
> +  * A kthread could have done a mmget_not_zero() after the flushing CPU
> +  * checked mm_users == 1, and be in the process of kthread_use_mm when
^
in mm_is_singlethreaded()

Adding that reference would help join the dots for a new reader I think.

cheers

> +  * interrupted here. In that case, current->mm will be set to mm,
> +  * because kthread_use_mm() setting ->mm and switching to the mm is
> +  * done with interrupts off.
> +  */
>   if (current->mm == mm)
> - return; /* Local CPU */
> + goto out_flush;
>  
>   if (current->active_mm == mm) {
> - /*
> -  * Must be a kernel thread because sender is single-threaded.
> -  */
> - BUG_ON(current->mm);
> + WARN_ON_ONCE(current->mm != NULL);
> + /* Is a kernel thread and is using mm as the lazy tlb */
>   mmgrab(_mm);
> - switch_mm(mm, _mm, current);
>   current->active_mm = _mm;
> + switch_mm_irqs_off(mm, _mm, current);
>   mmdrop(mm);
>   }
> +
> + atomic_dec(&mm->context.active_cpus);
> + cpumask_clear_cpu(smp_processor_id(), mm_cpumask(mm));
> +
> +out_flush:
>   _tlbiel_pid(pid, RIC_FLUSH_ALL);
>  }
>  
> @@ -672,7 +682,6 @@ static void exit_flush_lazy_tlbs(struct mm_struct *mm)
>*/
>   smp_call_function_many(mm_cpumask(mm), do_exit_flush_lazy_tlb,
>   (void *)mm, 1);
> - mm_reset_thread_local(mm);
>  }
>  
>  void radix__flush_tlb_mm(struct mm_struct *mm)


Re: [PATCH] arch: vdso: add vdso linker script to 'targets' instead of extra-y

2020-09-01 Thread Greentime Hu
On Tue, Sep 1, 2020 at 2:23 AM, Masahiro Yamada wrote:
>
> The vdso linker script is preprocessed on demand.
> Adding it to 'targets' is enough to include the .cmd file.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/arm64/kernel/vdso/Makefile | 2 +-
>  arch/arm64/kernel/vdso32/Makefile   | 2 +-
>  arch/nds32/kernel/vdso/Makefile | 2 +-
>  arch/powerpc/kernel/vdso32/Makefile | 2 +-
>  arch/powerpc/kernel/vdso64/Makefile | 2 +-
>  arch/s390/kernel/vdso64/Makefile| 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index 45d5cfe46429..7cd8aafbe96e 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -54,7 +54,7 @@ endif
>  GCOV_PROFILE := n
>
>  obj-y += vdso.o
> -extra-y += vdso.lds
> +targets += vdso.lds
>  CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>
>  # Force dependency (incbin is bad)
> diff --git a/arch/arm64/kernel/vdso32/Makefile 
> b/arch/arm64/kernel/vdso32/Makefile
> index d6adb4677c25..572475b7b7ed 100644
> --- a/arch/arm64/kernel/vdso32/Makefile
> +++ b/arch/arm64/kernel/vdso32/Makefile
> @@ -155,7 +155,7 @@ asm-obj-vdso := $(addprefix $(obj)/, $(asm-obj-vdso))
>  obj-vdso := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso)
>
>  obj-y += vdso.o
> -extra-y += vdso.lds
> +targets += vdso.lds
>  CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>
>  # Force dependency (vdso.s includes vdso.so through incbin)
> diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> index 7c3c1ccb196e..55df25ef0057 100644
> --- a/arch/nds32/kernel/vdso/Makefile
> +++ b/arch/nds32/kernel/vdso/Makefile
> @@ -20,7 +20,7 @@ GCOV_PROFILE := n
>
>
>  obj-y += vdso.o
> -extra-y += vdso.lds
> +targets += vdso.lds
>  CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
>
>  # Force dependency
> diff --git a/arch/powerpc/kernel/vdso32/Makefile 
> b/arch/powerpc/kernel/vdso32/Makefile
> index 87ab1152d5ce..fd5072a4c73c 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -29,7 +29,7 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
>  asflags-y := -D__VDSO32__ -s
>
>  obj-y += vdso32_wrapper.o
> -extra-y += vdso32.lds
> +targets += vdso32.lds
>  CPPFLAGS_vdso32.lds += -P -C -Upowerpc
>
>  # Force dependency (incbin is bad)
> diff --git a/arch/powerpc/kernel/vdso64/Makefile 
> b/arch/powerpc/kernel/vdso64/Makefile
> index 38c317f25141..c737b3ea3207 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -17,7 +17,7 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
>  asflags-y := -D__VDSO64__ -s
>
>  obj-y += vdso64_wrapper.o
> -extra-y += vdso64.lds
> +targets += vdso64.lds
>  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
>  # Force dependency (incbin is bad)
> diff --git a/arch/s390/kernel/vdso64/Makefile 
> b/arch/s390/kernel/vdso64/Makefile
> index 4a66a1cb919b..d0d406cfffa9 100644
> --- a/arch/s390/kernel/vdso64/Makefile
> +++ b/arch/s390/kernel/vdso64/Makefile
> @@ -25,7 +25,7 @@ $(targets:%=$(obj)/%.dbg): KBUILD_CFLAGS = 
> $(KBUILD_CFLAGS_64)
>  $(targets:%=$(obj)/%.dbg): KBUILD_AFLAGS = $(KBUILD_AFLAGS_64)
>
>  obj-y += vdso64_wrapper.o
> -extra-y += vdso64.lds
> +targets += vdso64.lds
>  CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
>  # Disable gcov profiling, ubsan and kasan for VDSO code

For nds32:

Acked-by: Greentime Hu 


Re: [PATCH v2] powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 03:14 PM, Aneesh Kumar K.V wrote:
> The test is broken w.r.t page table update rules and results in kernel
> crash as below. Disable the support until we get the tests updated.
> 
> [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
> cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
> pc: c009a5ec: assert_pte_locked+0x14c/0x380
> lr: c05c: pte_update+0x11c/0x190
> sp: c00c6d1e7950
>msr: 82029033
>   current = 0xc00c6d172c80
>   paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
> pid   = 1, comm = swapper/0
> kernel BUG at arch/powerpc/mm/pgtable.c:304!
> [link register   ] c05c pte_update+0x11c/0x190
> [c00c6d1e7950] 0001 (unreliable)
> [c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
> [c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
> [c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
> [c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
> [c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
> [c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
> [c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
> 
> With DEBUG_VM disabled
> 
> [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
> [   20.530183] Faulting instruction address: 0xc00df330
> cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
> pc: c00df330: memset+0x68/0x104
> lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> sp: c00c6d19f990
>msr: 82009033
>dar: 0
>   current = 0xc00c6d177480
>   paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
> pid   = 1, comm = swapper/0
> [link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> [c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
> (unreliable)
> [c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
> [c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
> [c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
> [c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
> [c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
> [c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
> 33:mon>
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 65bed1fdeaad..787e829b6f25 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -116,7 +116,6 @@ config PPC
>   #
>   select ARCH_32BIT_OFF_T if PPC32
>   select ARCH_HAS_DEBUG_VIRTUAL
> - select ARCH_HAS_DEBUG_VM_PGTABLE
>   select ARCH_HAS_DEVMEM_IS_ALLOWED
>   select ARCH_HAS_ELF_RANDOMIZE
>   select ARCH_HAS_FORTIFY_SOURCE
> 

If support for powerpc is being dropped, please update the features file
here as well. They should be in sync.

Documentation/features/debug/debug-vm-pgtable/arch-support.txt
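
i.e. flipping the powerpc row in that table, roughly (a sketch of the
arch-support.txt format):

    |     powerpc: |  ok  |   becomes   |     powerpc: | TODO |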


Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()

2020-09-01 Thread Takashi Iwai
On Tue, 01 Sep 2020 12:04:53 +0200,
Allen wrote:
> 
> Takashi,
> > > > > These patches which I wasn't CCed on and which need their subject 
> > > > > lines
> > > > > fixing :( .  With the subject lines fixed I guess so so
> > >
> > > > Extremely sorry. I thought I had it covered. How would you like it
> > > > worded?
> > >
> > > ASoC:
> >
> > To be more exact, "ASoC:" prefix is for sound/soc/*, and for the rest
> > sound/*, use "ALSA:" prefix please.
> 
> I could not get the generic API accepted upstream. We would stick to
> from_tasklet()
> or container_of(). Could I go ahead and send out V2 using
> from_tasklet() with subject line fixed?

Yes, please submit whatever should go into 5.9.


thanks,

Takashi


Re: [PATCH 0/8] scsi: convert tasklets to use new tasklet_setup()

2020-09-01 Thread Allen
> > > >
> > > > Commit 12cc923f1ccc ("tasklet: Introduce new initialization
> > > > API")' introduced a new tasklet initialization API. This series
> > > > converts all the scsi drivers to use the new tasklet_setup() API
> > >
> > > I've got to say I agree with Jens, this was a silly obfuscation:
> > >
> > > +#define from_tasklet(var, callback_tasklet, tasklet_fieldname) \
> > > +   container_of(callback_tasklet, typeof(*var),
> > > tasklet_fieldname)
> > >
> > > Just use container_of directly since we all understand what it
> > > does.
> >
> > But then the lines get really long, wrapped, etc.
>
> I really don't think that's a problem but if you want to add a new
> generic container_of that does typeof instead of insisting on the type,
> I'd be sort of OK with that ... provided you don't gratuitously alter
> the argument order.
>
> The thing I object to is that this encourages everyone to roll their
> own unnecessary container_of type macros in spite of the fact that its
> function is wholly generic.  It's fine if you're eliminating one of the
> arguments, or actually making the macro specific to the type, but in
> this case you're not, you're making a completely generic macro where
> the name is the only thing that's specific to this case.
>
> >  This is what the timer_struct conversion did too (added a
> > container_of wrapper), so I think it makes sense here too.
>
> I didn't see that one to object to it ...

Since we could not get the generic API accepted, can I send out V2
which would use container_of()?
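
(For reference, the two spellings under discussion are equivalent; for a
hypothetical driver embedding a tasklet as member "tl":

	struct foo *f = from_tasklet(f, t, tl);
	struct foo *f = container_of(t, struct foo, tl);

where "t" is the struct tasklet_struct * passed to the callback.)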

Thanks,

-- 
   - Allen


Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()

2020-09-01 Thread Allen
Takashi,
> > > > These patches which I wasn't CCed on and which need their subject lines
> > > > fixing :( .  With the subject lines fixed I guess so so
> >
> > > Extremely sorry. I thought I had it covered. How would you like it
> > > worded?
> >
> > ASoC:
>
> To be more exact, "ASoC:" prefix is for sound/soc/*, and for the rest
> sound/*, use "ALSA:" prefix please.

I could not get the generic API accepted upstream. We would stick to
from_tasklet()
or container_of(). Could I go ahead and send out V2 using
from_tasklet() with subject line fixed?

Thanks,
-- 
   - Allen


Re: [PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 03:06 PM, Aneesh Kumar K.V wrote:
> 

 [   17.080644] ------------[ cut here ]------------
 [   17.081342] kernel BUG at mm/pgtable-generic.c:164!
 [   17.082091] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
 [   17.082977] Modules linked in:
 [   17.083481] CPU: 79 PID: 1 Comm: swapper/0 Tainted: G    W 
 5.9.0-rc2-00105-g740e72cd6717 #14
 [   17.084998] Hardware name: linux,dummy-virt (DT)
 [   17.085745] pstate: 6045 (nZCv daif +PAN -UAO BTYPE=--)
 [   17.086645] pc : pgtable_trans_huge_deposit+0x58/0x60
 [   17.087462] lr : debug_vm_pgtable+0x4dc/0x8f0
 [   17.088168] sp : 80001219bcf0
 [   17.088710] x29: 80001219bcf0 x28: 8000114ac000
 [   17.089574] x27: 8000114ac000 x26: 00200fd3
 [   17.090431] x25: 002081400fd3 x24: fe00175c12c0
 [   17.091286] x23: 0005df04d1a8 x22: f18d6e035000
 [   17.092143] x21: 0005dd79 x20: 0005df18e1a8
 [   17.093003] x19: 0005df04cb80 x18: 0014
 [   17.093859] x17: b76667d0 x16: fd4e6611
 [   17.094716] x15: 0001 x14: 0002
 [   17.095572] x13: 0055d966 x12: fe001755e400
 [   17.096429] x11: 0008 x10: 0005fcb6e210
 [   17.097292] x9 : 0005fcb6e210 x8 : 002081590fd3
 [   17.098149] x7 : 0005dd71e0c0 x6 : 0005df18e1a8
 [   17.099006] x5 : 002081590fd3 x4 : 002081590fd3
 [   17.099862] x3 : 0005df18e1a8 x2 : fe00175c12c0
 [   17.100718] x1 : fe00175c1300 x0 : 
 [   17.101583] Call trace:
 [   17.101993]  pgtable_trans_huge_deposit+0x58/0x60
 [   17.102754]  do_one_initcall+0x74/0x1cc
 [   17.103381]  kernel_init_freeable+0x1d0/0x238
 [   17.104089]  kernel_init+0x14/0x118
 [   17.104658]  ret_from_fork+0x10/0x34
 [   17.105251] Code: f9000443 f9000843 f9000822 d65f03c0 (d421)
 [   17.106303] ---[ end trace e63471e00f8c147f ]---

>>>
>>> IIUC, this should happen even without the series right? This is
>>>
>>>  assert_spin_locked(pmd_lockptr(mm, pmdp));
>>
>> Crash does not happen without this series. A previous patch [PATCH v3 08/13]
>> added pgtable_trans_huge_deposit/withdraw() in pmd_advanced_tests(). arm64
>> does not define __HAVE_ARCH_PGTABLE_DEPOSIT and hence falls back on generic
>> pgtable_trans_huge_deposit(), which asserts the lock.
>>
> 
> 
> I fixed that by moving the pgtable deposit after adding the pmd locks 
> correctly.
> 
> mm/debug_vm_pgtable/locks: Move non page table modifying test together
> mm/debug_vm_pgtable/locks: Take correct page table lock
> mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

Right, that does fix it. But then both those patches should be folded/merged
in order to preserve git bisectability, besides the test classification
reasons mentioned in a previous response and a possibly redundant movement
of hugetlb_basic_tests().


Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:08 PM, Anshuman Khandual wrote:



On 09/01/2020 12:07 PM, Aneesh Kumar K.V wrote:

On 9/1/20 8:55 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

pte_clear_tests operate on an existing pte entry. Make sure that is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
   mm/debug_vm_pgtable.c | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 21329c7d672f..8527ebb75f2c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
   static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
  unsigned long vaddr)
   {
-    pte_t pte = ptep_get(ptep);
+    pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);


Seems like ptep_get_and_clear() here just clears the entry in preparation
for a following set_pte_at(), which otherwise would have been a problem on
ppc64 as you had pointed out earlier, i.e. set_pte_at() should not update an
existing valid entry. So the commit message here is a bit misleading.



and also fetch the pte value which is used further.



     pr_debug("Validating PTE clear\n");
   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
   p4d_t *p4dp, *saved_p4dp;
   pud_t *pudp, *saved_pudp;
   pmd_t *pmdp, *saved_pmdp, pmd;
-    pte_t *ptep;
+    pte_t *ptep, pte;
   pgtable_t saved_ptep;
   pgprot_t prot, protnone;
   phys_addr_t paddr;
@@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
    */
  ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+    pte = pfn_pte(pte_aligned, prot);
+    set_pte_at(mm, vaddr, ptep, pte);


Not here, creating and populating an entry must be done in respective
test functions itself. Besides, this seems bit redundant as well. The
test pte_clear_tests() with the above change added, already

- Clears the PTEP entry with ptep_get_and_clear()


and fetch the old value set previously.


In that case, please move above two lines i.e

pte = pfn_pte(pte_aligned, prot);
set_pte_at(mm, vaddr, ptep, pte);

from debug_vm_pgtable() to pte_clear_tests() and update it's arguments
as required.



Frankly, I don't understand what these tests are testing. It all looks 
like some random clear and set.


static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr, unsigned long pfn,
   pgprot_t prot)
{

pte_t pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);

pte =  ptep_get_and_clear(mm, vaddr, ptep);

pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
}
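
(With the extra pfn/prot arguments above, the call site in
debug_vm_pgtable() would then shrink to roughly the following sketch:

	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
	pte_clear_tests(mm, ptep, vaddr, pte_aligned, prot);
	pte_unmap_unlock(ptep, ptl);

so the set_pte_at() priming no longer leaks into the caller.)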


-aneesh


[PATCH v2] powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc

2020-09-01 Thread Aneesh Kumar K.V
The test is broken w.r.t page table update rules and results in kernel
crash as below. Disable the support until we get the tests updated.

[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
-- 
2.26.2



Re: [PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-01 Thread Aneesh Kumar K.V





[   17.080644] ------------[ cut here ]------------
[   17.081342] kernel BUG at mm/pgtable-generic.c:164!
[   17.082091] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   17.082977] Modules linked in:
[   17.083481] CPU: 79 PID: 1 Comm: swapper/0 Tainted: G    W 
5.9.0-rc2-00105-g740e72cd6717 #14
[   17.084998] Hardware name: linux,dummy-virt (DT)
[   17.085745] pstate: 6045 (nZCv daif +PAN -UAO BTYPE=--)
[   17.086645] pc : pgtable_trans_huge_deposit+0x58/0x60
[   17.087462] lr : debug_vm_pgtable+0x4dc/0x8f0
[   17.088168] sp : 80001219bcf0
[   17.088710] x29: 80001219bcf0 x28: 8000114ac000
[   17.089574] x27: 8000114ac000 x26: 00200fd3
[   17.090431] x25: 002081400fd3 x24: fe00175c12c0
[   17.091286] x23: 0005df04d1a8 x22: f18d6e035000
[   17.092143] x21: 0005dd79 x20: 0005df18e1a8
[   17.093003] x19: 0005df04cb80 x18: 0014
[   17.093859] x17: b76667d0 x16: fd4e6611
[   17.094716] x15: 0001 x14: 0002
[   17.095572] x13: 0055d966 x12: fe001755e400
[   17.096429] x11: 0008 x10: 0005fcb6e210
[   17.097292] x9 : 0005fcb6e210 x8 : 002081590fd3
[   17.098149] x7 : 0005dd71e0c0 x6 : 0005df18e1a8
[   17.099006] x5 : 002081590fd3 x4 : 002081590fd3
[   17.099862] x3 : 0005df18e1a8 x2 : fe00175c12c0
[   17.100718] x1 : fe00175c1300 x0 : 
[   17.101583] Call trace:
[   17.101993]  pgtable_trans_huge_deposit+0x58/0x60
[   17.102754]  do_one_initcall+0x74/0x1cc
[   17.103381]  kernel_init_freeable+0x1d0/0x238
[   17.104089]  kernel_init+0x14/0x118
[   17.104658]  ret_from_fork+0x10/0x34
[   17.105251] Code: f9000443 f9000843 f9000822 d65f03c0 (d421)
[   17.106303] ---[ end trace e63471e00f8c147f ]---



IIUC, this should happen even without the series right? This is

 assert_spin_locked(pmd_lockptr(mm, pmdp));


Crash does not happen without this series. A previous patch [PATCH v3 08/13]
added pgtable_trans_huge_deposit/withdraw() in pmd_advanced_tests(). arm64
does not define __HAVE_ARCH_PGTABLE_DEPOSIT and hence falls back on generic
pgtable_trans_huge_deposit(), which asserts the lock.




I fixed that by moving the pgtable deposit after taking the pmd locks
correctly, in the following patches (a rough sketch of the locking follows
the list):


mm/debug_vm_pgtable/locks: Move non page table modifying test together
mm/debug_vm_pgtable/locks: Take correct page table lock
mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
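
Roughly, the locking sketch mentioned above (illustrative only; the exact
hunks are in the patches listed):

	ptl = pmd_lock(mm, pmdp);
	pmd_clear_tests(mm, pmdp);
	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
	pmd_populate_tests(mm, pmdp, saved_ptep);
	spin_unlock(ptl);

With the pmd lock held across the deposit/withdraw path, the
assert_spin_locked() in the generic pgtable_trans_huge_deposit() is
satisfied.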

-aneesh




Re: [PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 12:08 PM, Aneesh Kumar K.V wrote:
> On 9/1/20 9:11 AM, Anshuman Khandual wrote:
>>
>>
>> On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
>>> This will help in adding proper locks in a later patch
>>
>> It really makes sense to classify these tests here as static and dynamic.
>> Static are the ones that test via page table entry value modification
>> without changing anything in the actual page table itself, whereas the
>> dynamic tests do change the page table. Static tests would probably never
>> require any lock synchronization, but the dynamic ones will.
>>
>>>
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>   mm/debug_vm_pgtable.c | 52 ---
>>>   1 file changed, 29 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 0ce5c6a24c5b..78c8af3445ac 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -992,7 +992,7 @@ static int __init debug_vm_pgtable(void)
>>>   p4dp = p4d_alloc(mm, pgdp, vaddr);
>>>   pudp = pud_alloc(mm, p4dp, vaddr);
>>>   pmdp = pmd_alloc(mm, pudp, vaddr);
>>> -    ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
>>> +    ptep = pte_alloc_map(mm, pmdp, vaddr);
>>>     /*
>>>    * Save all the page table page addresses as the page table
>>> @@ -1012,33 +1012,12 @@ static int __init debug_vm_pgtable(void)
>>>   p4d_basic_tests(p4d_aligned, prot);
>>>   pgd_basic_tests(pgd_aligned, prot);
>>>   -    pte_clear_tests(mm, ptep, vaddr);
>>> -    pmd_clear_tests(mm, pmdp);
>>> -    pud_clear_tests(mm, pudp);
>>> -    p4d_clear_tests(mm, p4dp);
>>> -    pgd_clear_tests(mm, pgdp);
>>> -
>>> -    pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>> -    pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, 
>>> saved_ptep);
>>> -    pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
>>> -    hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>> -
>>>   pmd_leaf_tests(pmd_aligned, prot);
>>>   pud_leaf_tests(pud_aligned, prot);
>>>   -    pmd_huge_tests(pmdp, pmd_aligned, prot);
>>> -    pud_huge_tests(pudp, pud_aligned, prot);
>>> -
>>>   pte_savedwrite_tests(pte_aligned, protnone);
>>>   pmd_savedwrite_tests(pmd_aligned, protnone);
>>>   -    pte_unmap_unlock(ptep, ptl);
>>> -
>>> -    pmd_populate_tests(mm, pmdp, saved_ptep);
>>> -    pud_populate_tests(mm, pudp, saved_pmdp);
>>> -    p4d_populate_tests(mm, p4dp, saved_pudp);
>>> -    pgd_populate_tests(mm, pgdp, saved_p4dp);
>>> -
>>>   pte_special_tests(pte_aligned, prot);
>>>   pte_protnone_tests(pte_aligned, protnone);
>>>   pmd_protnone_tests(pmd_aligned, protnone);
>>> @@ -1056,11 +1035,38 @@ static int __init debug_vm_pgtable(void)
>>>   pmd_swap_tests(pmd_aligned, prot);
>>>     swap_migration_tests();
>>> -    hugetlb_basic_tests(pte_aligned, prot);
>>>     pmd_thp_tests(pmd_aligned, prot);
>>>   pud_thp_tests(pud_aligned, prot);
>>>   +    /*
>>> + * Page table modifying tests
>>> + */
>>
>> In this comment, please do add some more details about how these tests
>> need proper locking mechanism unlike the previous static ones. Also add
>> a similar comment section for the static tests that dont really change
>> page table and need not require corresponding locking mechanism. Both
>> comment sections will make this classification clear.
>>
>>> +    pte_clear_tests(mm, ptep, vaddr);
>>> +    pmd_clear_tests(mm, pmdp);
>>> +    pud_clear_tests(mm, pudp);
>>> +    p4d_clear_tests(mm, p4dp);
>>> +    pgd_clear_tests(mm, pgdp);
>>> +
>>> +    ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
>>> +    pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>> +    pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, 
>>> saved_ptep);
>>> +    pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
>>> +    hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>> +
>>> +
>>> +    pmd_huge_tests(pmdp, pmd_aligned, prot);
>>> +    pud_huge_tests(pudp, pud_aligned, prot);
>>> +
>>> +    pte_unmap_unlock(ptep, ptl);
>>> +
>>> +    pmd_populate_tests(mm, pmdp, saved_ptep);
>>> +    pud_populate_tests(mm, pudp, saved_pmdp);
>>> +    p4d_populate_tests(mm, p4dp, saved_pudp);
>>> +    pgd_populate_tests(mm, pgdp, saved_p4dp);
>>> +
>>> +    hugetlb_basic_tests(pte_aligned, prot);
>>
>> hugetlb_basic_tests() is a non page table modifying static test and
>> should be classified accordingly.
>>
>>> +
>>>   p4d_free(mm, saved_p4dp);
>>>   pud_free(mm, saved_pudp);
>>>   pmd_free(mm, saved_pmdp);
>>>
>>
>> Changes till this patch fails to boot on arm64 platform. This should be
>> folded with the next patch.
>>
>> [   17.080644] [ cut here ]
>> [   17.081342] kernel BUG at mm/pgtable-generic.c:164!
>> [   17.082091] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> [   17.082977] Modules linked in:
>> [   17.083481] CPU: 79 PID: 1 Comm: 

Re: [PATCH] arch: vdso: add vdso linker script to 'targets' instead of extra-y

2020-09-01 Thread Catalin Marinas
On Tue, Sep 01, 2020 at 03:22:39AM +0900, Masahiro Yamada wrote:
> The vdso linker script is preprocessed on demand.
> Adding it to 'targets' is enough to include the .cmd file.
> 
> Signed-off-by: Masahiro Yamada 

For arm64:

Acked-by: Catalin Marinas 


Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 2:40 PM, Christophe Leroy wrote:



On 01/09/2020 at 10:15, Christophe Leroy wrote:



On 01/09/2020 at 10:12, Aneesh Kumar K.V wrote:

On 9/1/20 1:40 PM, Christophe Leroy wrote:



On 01/09/2020 at 10:02, Aneesh Kumar K.V wrote:

The test is broken w.r.t. page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
  #
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
-    select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for 
both PPC64 and PPC32. Should you replace the line by:


Does it work on PPC32 with DEBUG_VM enabled? I was expecting it would
be broken there too.


I was wondering. I have just started a build to test that on my 8xx. 
I'll tell you before noon (Paris).




There are warnings but it boots OK. So up to you, but if you deactivate
it for both PPC32 and PPC64, say so in the subject line.




I will update the subject line to indicate powerpc instead of ppc64

-aneesh


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 01:33 PM, Christophe Leroy wrote:
> 
> 
> On 21/08/2020 at 10:51, Anshuman Khandual wrote:
>>
>> On 08/19/2020 06:30 PM, Aneesh Kumar K.V wrote:
>>> This patch series includes fixes for the debug_vm_pgtable test code so
>>> that it follows page table update rules correctly. The first two patches
>>> introduce changes w.r.t. ppc64. The patches are included in this series
>>> for completeness. We can merge them via the ppc64 tree if required.
>>>
>>> The hugetlb test is disabled on ppc64 because that needs a larger change
>>> to satisfy page table update rules.
>>>
> 
>>
>> Changes proposed here will impact other enabled platforms as well.
>> Adding the following folks and mailing lists, and hoping to get a
>> broader review and test coverage. Please do include them in the
>> next iteration as well.
>>
>> + linux-arm-ker...@lists.infradead.org
>> + linux-s...@vger.kernel.org
>> + linux-snps-...@lists.infradead.org
>> + x...@kernel.org
>> + linux-a...@vger.kernel.org
>>
>> + Gerald Schaefer 
>> + Christophe Leroy 
> 
> Please don't use the above address anymore. Only use the one below.
> 
>> + Christophe Leroy 

Sure, noted.

>> + Vineet Gupta 
>> + Mike Rapoport 
>> + Qian Cai 
>>
> 
> Thanks
> Christophe
> 
>


Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 10:15, Christophe Leroy wrote:



On 01/09/2020 at 10:12, Aneesh Kumar K.V wrote:

On 9/1/20 1:40 PM, Christophe Leroy wrote:



On 01/09/2020 at 10:02, Aneesh Kumar K.V wrote:

The test is broken w.r.t. page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
  #
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
-    select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for both 
PPC64 and PPC32. Should you replace the line by:


Does it work on PPC32 with DEBUG_VM enabled? I was expecting it would
be broken there too.


I was wondering. I have just started a build to test that on my 8xx. 
I'll tell you before noon (Paris).




There are warnings but it boots OK. So up to you, but if you deactivate
it for both PPC32 and PPC64, say so in the subject line.


[7.065399] debug_vm_pgtable: [debug_vm_pgtable ]: Validating 
architecture page table helpers

[7.075155] [ cut here ]
[7.079590] WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:185 
set_pte_at+0x20/0xf4
[7.087542] CPU: 0 PID: 1 Comm: swapper Not tainted 
5.9.0-rc2-s3k-dev-01348-g283e890ee4ad #3933

[7.096122] NIP:  c000f634 LR: c07440f8 CTR: 
[7.101124] REGS: c9023c50 TRAP: 0700   Not tainted 
(5.9.0-rc2-s3k-dev-01348-g283e890ee4ad)

[7.109445] MSR:  00029032   CR: 53000339  XER: 80006100
[7.116072]
[7.116072] GPR00: c07440f8 c9023d08 c60e4000 c6ba4000 1efac000 
c6ba8eb0 c9023da8 0001
[7.116072] GPR08: 0004 007346c9 c6ba8ebc  93000333 
 c000390c 
[7.116072] GPR16: c084 0ec9 01ec 00734000 06ba8000 
c6bb c05f43e8 1efac000
[7.116072] GPR24: fff0 c6b96d08 c6ba8eac c6ba4000 1efac000 
007346c9 c6ba8eb0 007346c9

[7.150796] NIP [c000f634] set_pte_at+0x20/0xf4
[7.155274] LR [c07440f8] pte_advanced_tests+0xec/0x2bc
[7.160401] Call Trace:
[7.162831] [c9023d08] [c080db94] 0xc080db94 (unreliable)
[7.168183] [c9023d28] [c07440f8] pte_advanced_tests+0xec/0x2bc
[7.174036] [c9023dd8] [c0744498] debug_vm_pgtable+0x1d0/0x668
[7.179827] [c9023e98] [c0734cd4] do_one_initcall+0x8c/0x1cc
[7.185405] [c9023ef8] [c0735008] kernel_init_freeable+0x178/0x1d0
[7.191511] [c9023f28] [c0003920] kernel_init+0x14/0x114
[7.196763] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[7.202818] Instruction dump:
[7.205754] bba10014 7c0803a6 38210020 4e800020 7c0802a6 9421ffe0 
bfc10018 90010024
[7.213412] 83e6 8125 71270001 41820008 <0fe0> 73e90040 
41820080 73ea0001

[7.221249] ---[ end trace 95bbebcafa22d0f7 ]---
[7.226049] [ cut here ]
[7.230438] WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:185 
set_pte_at+0x20/0xf4
[7.238410] CPU: 0 PID: 1 Comm: swapper Tainted: GW 
5.9.0-rc2-s3k-dev-01348-g283e890ee4ad #3933

[7.248363] NIP:  c000f634 LR: c0744218 CTR: 
[7.253368] REGS: c9023c50 TRAP: 0700   Tainted: GW 
(5.9.0-rc2-s3k-dev-01348-g283e890ee4ad)

[7.263064] MSR:  00029032   CR: 53000335  XER: a0006100
[7.269690]
[7.269690] GPR00: c0744218 c9023d08 c60e4000 c6ba4000 1efac000 
c6ba8eb0 c9023da8 0001
[7.269690] GPR08:  007341c9  007341c9 93000333 
 c000390c 
[7.269690] GPR16: c084 0ec9 01ec 00734000 06ba8000 
c6bb c05f43e8 1efac000
[7.269690] GPR24: fff0 c6b96d08 c6ba8eac c6ba4000 1efac000 
007346c9 c6ba8eb0 007346c9

[7.304418] NIP [c000f634] set_pte_at+0x20/0xf4
[7.308892] LR [c0744218] pte_advanced_tests+0x20c/0x2bc
[7.314105] Call Trace:
[7.316535] [c9023d08] [c080db94] 0xc080db94 (unreliable)
[7.321888] [c9023d28] [c0744218] pte_advanced_tests+0x20c/0x2bc
[7.327826] [c9023dd8] [c0744498] debug_vm_pgtable+0x1d0/0x668
[7.333613] [c9023e98] [c0734cd4] do_one_initcall+0x8c/0x1cc
[7.339196] [c9023ef8] [c0735008] kernel_init_freeable+0x178/0x1d0
[7.345300] [c9023f28] [c0003920] kernel_init+0x14/0x114
[7.350551] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[7.356609] Instruction dump:
[7.359545] bba10014 7c0803a6 38210020 4e800020 7c0802a6 9421ffe0 
bfc10018 90010024
[7.367203] 83e6 8125 71270001 41820008 <0fe0> 73e90040 
41820080 73ea0001

[7.375039] ---[ end trace 95bbebcafa22d0f8 ]---
[7.379783] [ cut here ]
[7.384228] WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:276 
set_huge_pte_at+0x104/0x134
[7.392803] CPU: 0 PID: 1 Comm: swapper Tainted: GW 

Re: [RESEND][PATCH 0/7] Avoid overflow at boundary_size

2020-09-01 Thread Christoph Hellwig
On Tue, Sep 01, 2020 at 12:54:01AM -0700, Nicolin Chen wrote:
> Hi Christoph,
> 
> On Tue, Sep 01, 2020 at 09:36:23AM +0200, Christoph Hellwig wrote:
> > I really don't like all the open coded smarts in the various drivers.
> > What do you think about a helper like the one in the untested patch
> 
> A helper function will be actually better. I was thinking of
> one yet not very sure about the naming and where to put it.
> 
> > below (on top of your series).  Also please include the original
> > segment boundary patch with the next resend so that the series has
> > the full context.
> 
> I will use your change instead and resend with the ULONG_MAX
> change. But in that case, should I make separate changes for
> different files like this series, or just one single change
> like yours?
> 
> Asking this as I was expecting that those changes would get
> applied by different maintainers. But now it feels like you
> will merge it to your tree at once?

I guess one patch is fine.  I can queue it up in the dma-mapping
tree as a prep patch for the default boundary change.
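
For readers without the patch in front of them: the helper under discussion
presumably has roughly the shape below. This is a sketch inferred from the
open-coded pattern it replaces, including the U32_MAX default used by the
dev-less call sites; the shift-then-add form avoids the overflow that
ALIGN(boundary + 1, 1 << shift) >> shift hits when boundary is ULONG_MAX.

/* Sketch: the segment boundary expressed in units of pages */
static inline unsigned long dma_get_seg_boundary_nr_pages(struct device *dev,
		unsigned int page_shift)
{
	if (!dev)
		return (U32_MAX >> page_shift) + 1;
	return (dma_get_seg_boundary(dev) >> page_shift) + 1;
}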


Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 10:12, Aneesh Kumar K.V wrote:

On 9/1/20 1:40 PM, Christophe Leroy wrote:



On 01/09/2020 at 10:02, Aneesh Kumar K.V wrote:

The test is broken w.r.t. page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
  #
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
-    select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for both 
PPC64 and PPC32. Should you replace the line by:


Does it work on PPC32 with DEBUG_VM enabled? I was expecting it would be
broken there too.


I was wondering. I have just started a build to test that on my 8xx. 
I'll tell you before noon (Paris).


Christophe





 select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32


  select ARCH_HAS_DEVMEM_IS_ALLOWED
  select ARCH_HAS_ELF_RANDOMIZE
  select ARCH_HAS_FORTIFY_SOURCE



What about 
Documentation/features/debug/debug-vm-pgtable/arch-support.txt ?




I am hoping we can enable the config once we resolve the test issues,
maybe in the next merge window.


-aneesh





Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:40 PM, Christophe Leroy wrote:



On 01/09/2020 at 10:02, Aneesh Kumar K.V wrote:

The test is broken w.r.t. page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
  #
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
-    select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for both 
PPC64 and PPC32. Should you replace the line by:


Does it work on PPC32 with DEBUG_VM enabled? I was expecting it would be
broken there too.




 select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32


  select ARCH_HAS_DEVMEM_IS_ALLOWED
  select ARCH_HAS_ELF_RANDOMIZE
  select ARCH_HAS_FORTIFY_SOURCE



What about Documentation/features/debug/debug-vm-pgtable/arch-support.txt ?



I am hoping we can enable the config once we resolve the test issues,
maybe in the next merge window.


-aneesh





Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 10:02, Aneesh Kumar K.V wrote:

The test is broken w.r.t. page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for both 
PPC64 and PPC32. Should you replace the line by:


select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32


select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE



What about Documentation/features/debug/debug-vm-pgtable/arch-support.txt ?

Christophe


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-09-01 Thread Christophe Leroy




On 21/08/2020 at 10:51, Anshuman Khandual wrote:


On 08/19/2020 06:30 PM, Aneesh Kumar K.V wrote:

This patch series includes fixes for the debug_vm_pgtable test code so that
it follows page table update rules correctly. The first two patches
introduce changes w.r.t. ppc64. The patches are included in this series for
completeness; we can merge them via the ppc64 tree if required.

The hugetlb test is disabled on ppc64 because that needs a larger change to
satisfy page table update rules.





Changes proposed here will impact other enabled platforms as well.
Adding the following folks and mailing lists, and hoping to get a
broader review and test coverage. Please do include them in the
next iteration as well.

+ linux-arm-ker...@lists.infradead.org
+ linux-s...@vger.kernel.org
+ linux-snps-...@lists.infradead.org
+ x...@kernel.org
+ linux-a...@vger.kernel.org

+ Gerald Schaefer 
+ Christophe Leroy 


Please don't use the above address anymore. Only use the one below.


+ Christophe Leroy 
+ Vineet Gupta 
+ Mike Rapoport 
+ Qian Cai 



Thanks
Christophe


[PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Aneesh Kumar K.V
The test is broken w.r.t. page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.

[   21.083506] [ cut here ]
[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
Linux version 5.9.0-rc2-34902-g4da73871507c (kvaneesh@ltczz75-lp2) (gcc (Ubuntu 
9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #301 SMP Tue Sep 
1 02:28:29 CDT 2020
enter ? for help
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
Linux version 5.9.0-rc2-34902-g4da73871507c (kvaneesh@ltczz75-lp2) (gcc (Ubuntu 
9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #302 SMP Tue Sep 
1 02:56:02 CDT 2020
enter ? for help
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
-- 
2.26.2



Re: [PATCH v3 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 08:30, Aneesh Kumar K.V wrote:




I actually wanted to add #ifdef BROKEN. That test is completely broken.
In fact, I would suggest removing that test completely.





#ifdef will not be required here as there would be a stub definition
for hugetlb_advanced_tests() when CONFIG_PPC_BOOK3S_64 is enabled.
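
A stub along these lines is presumably what is meant (hedged sketch; the
argument list mirrors the existing call site):

#ifdef CONFIG_PPC_BOOK3S_64
/* Disabled on ppc64 until the test follows the page table update rules */
static void __init hugetlb_advanced_tests(struct mm_struct *mm,
					  struct vm_area_struct *vma,
					  pte_t *ptep, unsigned long pfn,
					  unsigned long vaddr, pgprot_t prot)
{
}
#else
/* ... real test body, unchanged ... */
#endif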


  spin_lock(&mm->page_table_lock);
  p4d_clear_tests(mm, p4dp);



But again, we should really try and avoid taking this path.



To be frank, I am kind of frustrated with how this patch series is being
looked at. We pushed a completely broken test upstream, and right now we
have code upstream that crashes when booted on ppc64. My attempt has been
to make progress here, and you definitely seem not to be in agreement with
that.


At this point I am tempted to suggest we remove the DEBUG_VM_PGTABLE 
support on ppc64 because AFAIU it doesn't add any value.




Note that a bug has been filed at 
https://bugzilla.kernel.org/show_bug.cgi?id=209029


Christophe


Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:16 PM, Anshuman Khandual wrote:



On 09/01/2020 01:06 PM, Aneesh Kumar K.V wrote:

On 9/1/20 1:02 PM, Anshuman Khandual wrote:



On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:

On 9/1/20 8:45 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit
in the random value.

Signed-off-by: Aneesh Kumar K.V 
---
    mm/debug_vm_pgtable.c | 13 ++---
    1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
     * entry type. But these bits might affect the ability to clear entries with
     * pxx_clear() because of how dynamic page table folding works on s390. So
     * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
     */
-#define S390_MASK_BITS    4
-#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK    GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASK    GENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK    0x0
+#endif


Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
bits for a s390 platform requirement and can also do so for ppc64 as well. As
mentioned before, please avoid adding any platform specific constructs in the
test.




that is needed so that it can be built on 32-bit architectures. I did face
build errors with arch-linux


Could (#if __BITS_PER_LONG == 32) not be used instead, or something like
that? It should be a generic conditional check identifying a 32-bit arch,
not anything platform specific.



that _PAGE_PTE bit is pretty much specific to PPC BOOK3S_64. Not sure why
other architectures need to be bothered about ignoring bit 62.


That's okay; as long as it does not adversely affect other architectures,
ignoring some more bits is acceptable. The existing S390_MASK_BITS likewise
gets ignored on all other platforms even though it is an s390-specific
constraint. Not having a platform-specific #ifdef here is essential.



Why is it essential?

-aneesh
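
For context, the skip masks above get combined into the final OR-value
roughly as follows; the __BITS_PER_LONG guard is the generic alternative
suggested in this thread (a sketch of the approach, details may differ in
the final patch):

#define S390_SKIP_MASK		GENMASK(3, 0)
#if __BITS_PER_LONG == 64
#define PPC64_SKIP_MASK		GENMASK(62, 62)
#else
#define PPC64_SKIP_MASK		0x0
#endif
#define ARCH_SKIP_MASK	(S390_SKIP_MASK | PPC64_SKIP_MASK)
#define RANDOM_ORVALUE	(GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)

On 64-bit builds this keeps bit 62 clear in the random pattern, so a test
value can never be mistaken for a real pte on ppc64, while on 32-bit the
extra mask is 0 and nothing changes (GENMASK(62, 62) on a 32-bit long is
what broke the build).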


Re: [RESEND][PATCH 0/7] Avoid overflow at boundary_size

2020-09-01 Thread Nicolin Chen
Hi Christoph,

On Tue, Sep 01, 2020 at 09:36:23AM +0200, Christoph Hellwig wrote:
> I really don't like all the open coded smarts in the various drivers.
> What do you think about a helper like the one in the untested patch

A helper function will be actually better. I was thinking of
one yet not very sure about the naming and where to put it.

> below (on top of your series).  Also please include the original
> segment boundary patch with the next resend so that the series has
> the full context.

I will use your change instead and resend with the ULONG_MAX
change. But in that case, should I make separate changes for
different files like this series, or just one single change
like yours?

Asking this as I was expecting that those changes would get
applied by different maintainers. But now it feels like you
will merge it to your tree at once?

Thanks
Nic

> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index 1ef2c647bd3ec2..6f7de4f4e191e7 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -141,10 +141,7 @@ iommu_arena_find_pages(struct device *dev, struct 
> pci_iommu_arena *arena,
>   unsigned long boundary_size;
>  
>   base = arena->dma_base >> PAGE_SHIFT;
> -
> - boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
> - /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> - boundary_size = (boundary_size >> PAGE_SHIFT) + 1;
> + boundary_size = dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT);
>  
>   /* Search forward for the first mask-aligned sequence of N free ptes */
>   ptes = arena->ptes;
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index 945954903bb0ba..b49b73a95067d2 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -485,8 +485,7 @@ sba_search_bitmap(struct ioc *ioc, struct device *dev,
>   ASSERT(((unsigned long) ioc->res_hint & (sizeof(unsigned long) - 1UL)) 
> == 0);
>   ASSERT(res_ptr < res_end);
>  
> - /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> - boundary_size = (dma_get_seg_boundary(dev) >> iovp_shift) + 1;
> + boundary_size = dma_get_seg_boundary_nr_pages(dev, iovp_shift);
>  
>   BUG_ON(ioc->ibase & ~iovp_mask);
>   shift = ioc->ibase >> iovp_shift;
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c01ccbf8afdd42..cbc2e62db597cf 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -236,11 +236,7 @@ static unsigned long iommu_range_alloc(struct device 
> *dev,
>   }
>   }
>  
> - /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> - boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
> -
> - /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> - boundary_size = (boundary_size >> tbl->it_page_shift) + 1;
> + boundary_size = dma_get_seg_boundary_nr_pages(dev, tbl->it_page_shift);
>  
>   n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
>boundary_size, align_mask);
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index ecb067acc6d532..4a37d8f4de9d9d 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -261,13 +261,11 @@ static unsigned long __dma_alloc_iommu(struct device 
> *dev,
>  unsigned long start, int size)
>  {
>   struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
> - unsigned long boundary_size;
>  
> - /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> - boundary_size = (dma_get_seg_boundary(dev) >> PAGE_SHIFT) + 1;
>   return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
>   start, size, zdev->start_dma >> PAGE_SHIFT,
> - boundary_size, 0);
> + dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT),
> + 0);
>  }
>  
>  static dma_addr_t dma_alloc_address(struct device *dev, int size)
> diff --git a/arch/sparc/kernel/iommu-common.c 
> b/arch/sparc/kernel/iommu-common.c
> index 843e71894d0482..e6139c99762e11 100644
> --- a/arch/sparc/kernel/iommu-common.c
> +++ b/arch/sparc/kernel/iommu-common.c
> @@ -166,10 +166,6 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
>   }
>   }
>  
> - boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
> -
> - /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
> - boundary_size = (boundary_size >> iommu->table_shift) + 1;
>   /*
>* if the skip_span_boundary_check had been set during init, we set
>* things up so that iommu_is_span_boundary() merely checks if the
> @@ -178,7 +174,11 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
>   if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
>   shift = 0;
>   

Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 09:40, Aneesh Kumar K.V wrote:

On 9/1/20 12:20 PM, Christophe Leroy wrote:



On 01/09/2020 at 08:25, Aneesh Kumar K.V wrote:

On 9/1/20 8:52 AM, Anshuman Khandual wrote:




There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 
chars per line)

#7:
Architectures like ppc64 use deposited page table while updating the 
huge pte


total: 0 errors, 1 warnings, 40 lines checked



I will ignore all these, because they are not really important IMHO.



When doing a git log in an 80-char terminal window, having wrapping
lines is not really convenient. It should be easy to avoid.




We have been ignoring that for a long time, haven't we?

For example ppc64 checkpatch already had
--max-line-length=90


There was also a recent discussion about whether the 80-character limit is
still valid. But I do keep it restricted to 80 characters wherever it is
easy / makes sense.




Here we are not talking about the code, but the commit log.

As far as I know, the discussions about 80-character lines, 90 in
powerpc, etc. are about the code.

We still aim at keeping lines no longer than 75 chars in the commit log.

Christophe



Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 01:06 PM, Aneesh Kumar K.V wrote:
> On 9/1/20 1:02 PM, Anshuman Khandual wrote:
>>
>>
>> On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:
>>> On 9/1/20 8:45 AM, Anshuman Khandual wrote:


 On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
> ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that
> bit in the random value.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>    mm/debug_vm_pgtable.c | 13 ++---
>    1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 086309fb9b6f..bbf9df0e64c6 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -44,10 +44,17 @@
>     * entry type. But these bits might affect the ability to clear 
> entries with
>     * pxx_clear() because of how dynamic page table folding works on 
> s390. So
>     * while loading up the entries do not change the lower 4 bits. It 
> does not
> - * have affect any other platform.
> + * have affect any other platform. Also avoid the 62nd bit on ppc64 that 
> is
> + * used to mark a pte entry.
>     */
> -#define S390_MASK_BITS    4
> -#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
> +#define S390_SKIP_MASK    GENMASK(3, 0)
> +#ifdef CONFIG_PPC_BOOK3S_64
> +#define PPC64_SKIP_MASK    GENMASK(62, 62)
> +#else
> +#define PPC64_SKIP_MASK    0x0
> +#endif

 Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate 
 skip
 bits for a s390 platform requirement and can also do so for ppc64 as well. 
 As
 mentioned before, please avoid adding any platform specific constructs in 
 the
 test.

>>>
>>>
>>> that is needed so that it can be built on 32-bit architectures. I did face
>>> build errors with arch-linux
>>
>> Could (#if __BITS_PER_LONG == 32) not be used instead, or something like
>> that? It should be a generic conditional check identifying a 32-bit arch,
>> not anything platform specific.
>>
> 
> that _PAGE_PTE bit is pretty much specific to PPC BOOK3S_64. Not sure why
> other architectures need to be bothered about ignoring bit 62.

That's okay; as long as it does not adversely affect other architectures,
ignoring some more bits is acceptable. The existing S390_MASK_BITS likewise
gets ignored on all other platforms even though it is an s390-specific
constraint. Not having a platform-specific #ifdef here is essential.


Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 12:20 PM, Christophe Leroy wrote:



On 01/09/2020 at 08:25, Aneesh Kumar K.V wrote:

On 9/1/20 8:52 AM, Anshuman Khandual wrote:




There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 
chars per line)

#7:
Architectures like ppc64 use deposited page table while updating the 
huge pte


total: 0 errors, 1 warnings, 40 lines checked



I will ignore all these, because they are not really important IMHO.



When doing a git log in an 80-char terminal window, having wrapping
lines is not really convenient. It should be easy to avoid.




We have been ignoring that for a long time, haven't we?

For example ppc64 checkpatch already had
--max-line-length=90


There was also a recent discussion about whether the 80-character limit is
still valid. But I do keep it restricted to 80 characters wherever it is
easy / makes sense.


-aneesh



Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 12:07 PM, Aneesh Kumar K.V wrote:
> On 9/1/20 8:55 AM, Anshuman Khandual wrote:
>>
>>
>> On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
>>> pte_clear_tests operates on an existing pte entry. Make sure that it is
>>> not a none pte entry.
>>>
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>   mm/debug_vm_pgtable.c | 6 --
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 21329c7d672f..8527ebb75f2c 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct 
>>> *mm, pgd_t *pgdp,
>>>   static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>>>  unsigned long vaddr)
>>>   {
>>> -    pte_t pte = ptep_get(ptep);
>>> +    pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);
>>
>> Seems like ptep_get_and_clear() here just clears the entry in preparation
>> for a following set_pte_at() which otherwise would have been a problem on
>> ppc64 as you had pointed out earlier i.e set_pte_at() should not update an
>> existing valid entry. So the commit message here is bit misleading.
>>
> 
> and also fetch the pte value which is used further.
> 
> 
>>>     pr_debug("Validating PTE clear\n");
>>>   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> @@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
>>>   p4d_t *p4dp, *saved_p4dp;
>>>   pud_t *pudp, *saved_pudp;
>>>   pmd_t *pmdp, *saved_pmdp, pmd;
>>> -    pte_t *ptep;
>>> +    pte_t *ptep, pte;
>>>   pgtable_t saved_ptep;
>>>   pgprot_t prot, protnone;
>>>   phys_addr_t paddr;
>>> @@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
>>>    */
>>>     ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
>>> +    pte = pfn_pte(pte_aligned, prot);
>>> +    set_pte_at(mm, vaddr, ptep, pte);
>>
>> Not here, creating and populating an entry must be done in respective
>> test functions itself. Besides, this seems bit redundant as well. The
>> test pte_clear_tests() with the above change added, already
>>
>> - Clears the PTEP entry with ptep_get_and_clear()
> 
> and fetch the old value set previously.

In that case, please move the above two lines, i.e.

pte = pfn_pte(pte_aligned, prot);
set_pte_at(mm, vaddr, ptep, pte);

from debug_vm_pgtable() to pte_clear_tests() and update its arguments
as required.
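
Folding those two lines in would give pte_clear_tests() roughly this shape
(a sketch assuming pfn and prot get passed down; the final patch may
differ):

static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
				   unsigned long pfn, unsigned long vaddr,
				   pgprot_t prot)
{
	pte_t pte = pfn_pte(pfn, prot);

	pr_debug("Validating PTE clear\n");
	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
	/* populate first, so the clear below acts on a present entry */
	set_pte_at(mm, vaddr, ptep, pte);
	barrier();
	pte_clear(mm, vaddr, ptep);
	pte = ptep_get(ptep);
	WARN_ON(!pte_none(pte));
}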


Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Heiner Kallweit
On 01.09.2020 04:52, Chris Packham wrote:
> 
> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>> On 30.08.2020 23:59, Chris Packham wrote:
>>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
 On 30.08.2020 23:00, Chris Packham wrote:
> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
> 
>
>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>
>>> With my current workaround of emptying the RX FIFO. It seems
>>> survivable. Interestingly it only ever seems to be 1 extra byte in 
>>> the
>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>
>>> fsl_espi ffe11.spi: tx 70
>>> fsl_espi ffe11.spi: rx 03
>>> fsl_espi ffe11.spi: Extra RX 00
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>>
>>> From all the Micron SPI-NOR datasheets I've got access to it is
>>> possible to continually read the SR/FSR. But I've no idea why it
>>> happens some times and not others.
>> So I think I've got a reproduction and I think I've bisected the 
>> problem
>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay 
>> in
>> C"). My day is just finishing now so I haven't applied too much 
>> scrutiny
>> to this result. Given the various rabbit holes I've been down on this
>> issue already I'd take this information with a good degree of 
>> skepticism.
>>
> OK, so an easy test should be to re-test with a 5.4 kernel.
> It doesn't yet have the change you're referring to, and the fsl-espi driver
> is basically the same as in 5.7 (just two small changes in 5.7).
 There's 6cc0c16d82f88 and maybe also other interrupt related patches
 around this time that could affect book E, so it's good if that exact
 patch is confirmed.
>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>> 5.9-rc2 by reverting that one commit.
>>>
>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>> the issue) and the commit just before (which does not).
>> Thanks for testing, that confirms it well.
>>
>> [snip patch]
>>
>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>> didn't report anything (either with or without the change above).
>> Okay, it was a bit of a shot in the dark. I still can't see what
>> else has changed.
>>
>> What would cause this, a lost interrupt? A spurious interrupt? Or
>> higher interrupt latency?
>>
>> I don't think the patch should cause significantly worse latency,
>> (it's supposed to be a bit better if anything because it doesn't set
>> up the full interrupt frame). But it's possible.
> My working theory is that the SPI_DON indication is all about the TX
> direction, and now that the interrupts are faster we're hitting an error
> because there is still RX activity going on. Heiner disagrees with my
> interpretation of the SPI_DON indication and the fact that it doesn't
> happen every time does throw doubt on it.
>
 It's right that the eSPI spec can be read as meaning that SPI_DON refers to
 TX only. However, this wouldn't really make sense, because we also program
 the frame length for RX, and therefore want to be notified once the full
 frame has been received. Practical experience also shows that SPI_DON is
 set after RX-only transfers.
 The typical SPI NOR use case is that you write a read command + start 

Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:02 PM, Anshuman Khandual wrote:



On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:

On 9/1/20 8:45 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit
in the random value.

Signed-off-by: Aneesh Kumar K.V 
---
   mm/debug_vm_pgtable.c | 13 ++---
   1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
    * entry type. But these bits might affect the ability to clear entries with
    * pxx_clear() because of how dynamic page table folding works on s390. So
    * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
    */
-#define S390_MASK_BITS    4
-#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK    GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASK    GENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK    0x0
+#endif


Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
bits for a s390 platform requirement and can also do so for ppc64 as well. As
mentioned before, please avoid adding any platform specific constructs in the
test.




that is needed so that it can be built on 32-bit architectures. I did face
build errors with arch-linux


Could (#if __BITS_PER_LONG == 32) not be used instead, or something like
that? It should be a generic conditional check identifying a 32-bit arch,
not anything platform specific.



that _PAGE_PTE bit is pretty much specific to PPC BOOK3S_64. Not sure
why other architectures need to be bothered about ignoring bit 62.


-aneesh


Re: [RESEND][PATCH 0/7] Avoid overflow at boundary_size

2020-09-01 Thread Christoph Hellwig
I really don't like all the open-coded smarts in the various drivers.
What do you think about a helper like the one in the untested patch
below (on top of your series)? Also please include the original
segment boundary patch with the next resend so that the series has
the full context.

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 1ef2c647bd3ec2..6f7de4f4e191e7 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -141,10 +141,7 @@ iommu_arena_find_pages(struct device *dev, struct 
pci_iommu_arena *arena,
unsigned long boundary_size;
 
base = arena->dma_base >> PAGE_SHIFT;
-
-   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (boundary_size >> PAGE_SHIFT) + 1;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT);
 
/* Search forward for the first mask-aligned sequence of N free ptes */
ptes = arena->ptes;
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 945954903bb0ba..b49b73a95067d2 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -485,8 +485,7 @@ sba_search_bitmap(struct ioc *ioc, struct device *dev,
ASSERT(((unsigned long) ioc->res_hint & (sizeof(unsigned long) - 1UL)) 
== 0);
ASSERT(res_ptr < res_end);
 
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (dma_get_seg_boundary(dev) >> iovp_shift) + 1;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, iovp_shift);
 
BUG_ON(ioc->ibase & ~iovp_mask);
shift = ioc->ibase >> iovp_shift;
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c01ccbf8afdd42..cbc2e62db597cf 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -236,11 +236,7 @@ static unsigned long iommu_range_alloc(struct device *dev,
}
}
 
-   /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
-   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
-
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (boundary_size >> tbl->it_page_shift) + 1;
+   boundary_size = dma_get_seg_boundary_nr_pages(dev, tbl->it_page_shift);
 
n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
 boundary_size, align_mask);
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index ecb067acc6d532..4a37d8f4de9d9d 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -261,13 +261,11 @@ static unsigned long __dma_alloc_iommu(struct device *dev,
   unsigned long start, int size)
 {
struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
-   unsigned long boundary_size;
 
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (dma_get_seg_boundary(dev) >> PAGE_SHIFT) + 1;
return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
start, size, zdev->start_dma >> PAGE_SHIFT,
-   boundary_size, 0);
+   dma_get_seg_boundary_nr_pages(dev, PAGE_SHIFT),
+   0);
 }
 
 static dma_addr_t dma_alloc_address(struct device *dev, int size)
diff --git a/arch/sparc/kernel/iommu-common.c b/arch/sparc/kernel/iommu-common.c
index 843e71894d0482..e6139c99762e11 100644
--- a/arch/sparc/kernel/iommu-common.c
+++ b/arch/sparc/kernel/iommu-common.c
@@ -166,10 +166,6 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
}
}
 
-   boundary_size = dev ? dma_get_seg_boundary(dev) : U32_MAX;
-
-   /* Overflow-free shortcut for: ALIGN(b + 1, 1 << s) >> s */
-   boundary_size = (boundary_size >> iommu->table_shift) + 1;
/*
 * if the skip_span_boundary_check had been set during init, we set
 * things up so that iommu_is_span_boundary() merely checks if the
@@ -178,7 +174,11 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
shift = 0;
boundary_size = iommu->poolsize * iommu->nr_pools;
+   } else {
+   boundary_size = dma_get_seg_boundary_nr_pages(dev,
+   iommu->table_shift);
}
+
n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
 boundary_size, align_mask);
if (n == -1) {
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index d981c37305ae31..c3e4e2df26a8b8 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -472,8 +472,7 @@ static int dma_4u_map_sg(struct device *dev, struct 
scatterlist *sglist,
outs->dma_length = 0;
 
max_seg_size = 

[PATCH] powerpc/powermac: Fix low_sleep_handler with KUAP and KUEP

2020-09-01 Thread Christophe Leroy
low_sleep_handler() has a hardcoded restore of segment registers
that doesn't take KUAP and KUEP into account.

Use head_32's load_segment_registers() routine instead.

Signed-off-by: Christophe Leroy 
Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access 
Protection")
Fixes: 31ed2b13c48d ("powerpc/32s: Implement Kernel Userspace Execution 
Prevention.")
Cc: sta...@vger.kernel.org
---
 arch/powerpc/platforms/powermac/sleep.S | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/sleep.S 
b/arch/powerpc/platforms/powermac/sleep.S
index f9a680fdd9c4..51bfdfe85058 100644
--- a/arch/powerpc/platforms/powermac/sleep.S
+++ b/arch/powerpc/platforms/powermac/sleep.S
@@ -294,14 +294,7 @@ grackle_wake_up:
 * we do any r1 memory access as we are not sure they
 * are in a sane state above the first 256Mb region
 */
-   li  r0,16   /* load up segment register values */
-   mtctr   r0  /* for context 0 */
-   lis r3,0x2000   /* Ku = 1, VSID = 0 */
-   li  r4,0
-3: mtsrin  r3,r4
-   addir3,r3,0x111 /* increment VSID */
-   addis   r4,r4,0x1000/* address of next segment */
-   bdnz3b
+   bl  load_segment_registers
sync
isync
 
-- 
2.25.0
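
For readers unfamiliar with head_32.S: load_segment_registers is essentially
the same 16-iteration loop, except that the initial segment register value
carries the Ks/Nx protection bits when KUAP/KUEP are configured, which is
exactly what the hardcoded Ku=1 loop being removed above loses. Roughly
(paraphrased from memory; consult head_32.S for the authoritative version):

load_segment_registers:
	li	r0, 16		/* load up segment register values */
	mtctr	r0		/* for context 0 */
	li	r3, 0		/* Kp = 0, Ks = 0, VSID = 0 */
#ifdef CONFIG_PPC_KUEP
	oris	r3, r3, SR_NX@h	/* set Nx: no execution from user segments */
#endif
#ifdef CONFIG_PPC_KUAP
	oris	r3, r3, SR_KS@h	/* set Ks: gate kernel access to user
				 * segments, i.e. KUAP */
#endif
	li	r4, 0
3:	mtsrin	r3, r4
	addi	r3, r3, 0x111	/* increment VSID */
	addis	r4, r4, 0x1000	/* address of next segment */
	bdnz	3b
	blr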



Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value

2020-09-01 Thread Anshuman Khandual



On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:
> On 9/1/20 8:45 AM, Anshuman Khandual wrote:
>>
>>
>> On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
>>> ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that
>>> bit in the random value.
>>>
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>   mm/debug_vm_pgtable.c | 13 ++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 086309fb9b6f..bbf9df0e64c6 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -44,10 +44,17 @@
>>>    * entry type. But these bits might affect the ability to clear entries 
>>> with
>>>    * pxx_clear() because of how dynamic page table folding works on s390. So
>>>    * while loading up the entries do not change the lower 4 bits. It does 
>>> not
>>> - * have affect any other platform.
>>> + * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
>>> + * used to mark a pte entry.
>>>    */
>>> -#define S390_MASK_BITS    4
>>> -#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
>>> +#define S390_SKIP_MASK    GENMASK(3, 0)
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> +#define PPC64_SKIP_MASK    GENMASK(62, 62)
>>> +#else
>>> +#define PPC64_SKIP_MASK    0x0
>>> +#endif
>>
>> Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
>> bits for a s390 platform requirement and can also do so for ppc64 as well. As
>> mentioned before, please avoid adding any platform specific constructs in the
>> test.
>>
> 
> 
> that is needed so that it can be built on 32-bit architectures. I did face
> build errors with arch-linux

Could (#if __BITS_PER_LONG == 32) not be used instead, or something like
that? It should be a generic conditional check identifying a 32-bit arch,
not anything platform specific.


Re: [PATCH v2 16/23] powerpc: use asm-generic/mmu_context.h for no-op implementations

2020-09-01 Thread Michael Ellerman
Nicholas Piggin  writes:
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/mmu_context.h | 22 +++---
>  1 file changed, 7 insertions(+), 15 deletions(-)

Acked-by: Michael Ellerman  (powerpc)

cheers


> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index 7f3658a97384..bc22e247ab55 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -14,7 +14,9 @@
>  /*
>   * Most if the context management is out of line
>   */
> +#define init_new_context init_new_context
>  extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
> +#define destroy_context destroy_context
>  extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
> @@ -235,27 +237,15 @@ static inline void switch_mm(struct mm_struct *prev, 
> struct mm_struct *next,
>  }
>  #define switch_mm_irqs_off switch_mm_irqs_off
>  
> -
> -#define deactivate_mm(tsk,mm)do { } while (0)
> -
> -/*
> - * After we have set current->mm to a new value, this activates
> - * the context for the new mm so we see the new mappings.
> - */
> -static inline void activate_mm(struct mm_struct *prev, struct mm_struct 
> *next)
> -{
> - switch_mm(prev, next, current);
> -}
> -
> -/* We don't currently use enter_lazy_tlb() for anything */
> +#ifdef CONFIG_PPC_BOOK3E_64
> +#define enter_lazy_tlb enter_lazy_tlb
>  static inline void enter_lazy_tlb(struct mm_struct *mm,
> struct task_struct *tsk)
>  {
>   /* 64-bit Book3E keeps track of current PGD in the PACA */
> -#ifdef CONFIG_PPC_BOOK3E_64
>   get_paca()->pgd = NULL;
> -#endif
>  }
> +#endif
>  
>  extern void arch_exit_mmap(struct mm_struct *mm);
>  
> @@ -298,5 +288,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>   return 0;
>  }
>  
> +#include 
> +
>  #endif /* __KERNEL__ */
>  #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> -- 
> 2.23.0
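
The asm-generic fallback pattern being relied on here is worth spelling out
(a sketch of the generic header's shape, inferred from the #define-guard
convention visible in the patch):

/*
 * asm-generic/mmu_context.h supplies a no-op only when the arch has not
 * already defined the hook under the same name:
 */
#ifndef enter_lazy_tlb
static inline void enter_lazy_tlb(struct mm_struct *mm,
				  struct task_struct *tsk)
{
}
#endif

#ifndef activate_mm
static inline void activate_mm(struct mm_struct *prev_mm,
			       struct mm_struct *next_mm)
{
	switch_mm(prev_mm, next_mm, current);
}
#endif

That is why the patch adds '#define init_new_context init_new_context' and
friends before the powerpc definitions: the macro doubles as the opt-out
flag for the generic version.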


Re: [PATCH v11] Fixup for "powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32"

2020-09-01 Thread Christophe Leroy



On 01/09/2020 at 09:19, Michal Suchánek wrote:

Hello,

can you add Fixes: ?


That's a commit which is still in powerpc/next-test.
My intention was to provide something that Michael can squash/fixup into 
the culprit commit.


Christophe




Thanks

Michal

On Tue, Sep 01, 2020 at 05:28:57AM +, Christophe Leroy wrote:

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/vdso/gettimeofday.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 59a609a48b63..8da84722729b 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -186,6 +186,8 @@ int __c_kernel_clock_getres(clockid_t clock_id, struct 
__kernel_timespec *res,
  #else
  int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
 const struct vdso_data *vd);
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+  const struct vdso_data *vd);
  int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
const struct vdso_data *vd);
  #endif
--
2.25.0



Re: [PATCH v11] Fixup for "powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32"

2020-09-01 Thread Michal Suchánek
Hello,

can you add Fixes: ?

Thanks

Michal

On Tue, Sep 01, 2020 at 05:28:57AM +, Christophe Leroy wrote:
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/vdso/gettimeofday.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> b/arch/powerpc/include/asm/vdso/gettimeofday.h
> index 59a609a48b63..8da84722729b 100644
> --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> @@ -186,6 +186,8 @@ int __c_kernel_clock_getres(clockid_t clock_id, struct 
> __kernel_timespec *res,
>  #else
>  int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
>const struct vdso_data *vd);
> +int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
> +const struct vdso_data *vd);
>  int __c_kernel_clock_getres(clockid_t clock_id, struct old_timespec32 *res,
>   const struct vdso_data *vd);
>  #endif
> -- 
> 2.25.0
> 


Re: [fs] ef30fb3c60: kernel write not supported for file /sys/kernel/softlockup_panic

2020-09-01 Thread Christoph Hellwig
Looks like since the start of this series we've grown new code to
use kernel_write on sysctl files based on boot parameters.  The good
news is that this just means I need to resurrect the sysctl series
as all that work was done already.


Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-09-01 Thread Christophe Leroy




On 01/09/2020 at 08:25, Aneesh Kumar K.V wrote:

On 9/1/20 8:52 AM, Anshuman Khandual wrote:




There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 
chars per line)

#7:
Architectures like ppc64 use deposited page table while updating the 
huge pte


total: 0 errors, 1 warnings, 40 lines checked



I will ignore all these, because they are not really important IMHO.



When doing a git log in an 80-character terminal window, wrapped lines 
are not really convenient. It should be easy to avoid them.


Christophe


Re: [PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 9:11 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

This will help in adding proper locks in a later patch


It really makes sense to classify these tests here as static and dynamic.
Static tests are the ones that operate on page table entry values without
changing anything in the actual page table itself, whereas the dynamic
tests do change the page table. Static tests would probably never
require any lock synchronization, but the dynamic ones will.
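
To make the distinction concrete, here is a minimal sketch with
hypothetical helper names, following the calling conventions visible in
the diff below: a static test only builds and checks an entry value,
while a dynamic test writes through to the live page table and therefore
takes the page table lock.

#include <linux/mm.h>

/* Static: operates purely on a pte value; no page table is touched,
 * so no lock is needed. */
static void pte_static_test_example(unsigned long pfn, pgprot_t prot)
{
	pte_t pte = pfn_pte(pfn, prot);

	WARN_ON(!pte_same(pte, pte));
}

/* Dynamic: clears a live entry, so the PTE lock must be held. */
static void pte_dynamic_test_example(struct mm_struct *mm, pmd_t *pmdp,
				     unsigned long vaddr)
{
	spinlock_t *ptl;
	pte_t *ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);

	if (!ptep)
		return;
	pte_clear(mm, vaddr, ptep);
	pte_unmap_unlock(ptep, ptl);
}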



Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 52 +++---
  1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 0ce5c6a24c5b..78c8af3445ac 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -992,7 +992,7 @@ static int __init debug_vm_pgtable(void)
p4dp = p4d_alloc(mm, pgdp, vaddr);
pudp = pud_alloc(mm, p4dp, vaddr);
pmdp = pmd_alloc(mm, pudp, vaddr);
-   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   ptep = pte_alloc_map(mm, pmdp, vaddr);
  
  	/*
	 * Save all the page table page addresses as the page table
@@ -1012,33 +1012,12 @@ static int __init debug_vm_pgtable(void)
p4d_basic_tests(p4d_aligned, prot);
pgd_basic_tests(pgd_aligned, prot);
  
-	pte_clear_tests(mm, ptep, vaddr);

-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
-
-   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);
  
-	pmd_huge_tests(pmdp, pmd_aligned, prot);

-   pud_huge_tests(pudp, pud_aligned, prot);
-
pte_savedwrite_tests(pte_aligned, protnone);
pmd_savedwrite_tests(pmd_aligned, protnone);
  
-	pte_unmap_unlock(ptep, ptl);

-
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
-   p4d_populate_tests(mm, p4dp, saved_pudp);
-   pgd_populate_tests(mm, pgdp, saved_p4dp);
-
pte_special_tests(pte_aligned, prot);
pte_protnone_tests(pte_aligned, protnone);
pmd_protnone_tests(pmd_aligned, protnone);
@@ -1056,11 +1035,38 @@ static int __init debug_vm_pgtable(void)
pmd_swap_tests(pmd_aligned, prot);
  
  	swap_migration_tests();

-   hugetlb_basic_tests(pte_aligned, prot);
  
  	pmd_thp_tests(pmd_aligned, prot);

pud_thp_tests(pud_aligned, prot);
  
+	/*
+	 * Page table modifying tests
+	 */


In this comment, please do add some more details about how these tests
need a proper locking mechanism, unlike the previous static ones. Also add
a similar comment section for the static tests that don't really change
the page table and don't require a corresponding locking mechanism. Both
comment sections will make this classification clear.
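
A sketch of what the two requested comment sections might say, assuming
the static/dynamic split discussed above (wording is illustrative, not
from the patch):

	/*
	 * Static tests: these only operate on page table entry values
	 * handed to them and never touch the page table pages
	 * themselves, so no lock synchronization is required.
	 */

	/*
	 * Page table modifying tests: these walk and update the live
	 * page table, so they must run under the proper page table
	 * lock, taken below via pte_alloc_map_lock().
	 */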


+   pte_clear_tests(mm, ptep, vaddr);
+   pmd_clear_tests(mm, pmdp);
+   pud_clear_tests(mm, pudp);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
+
+   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+
+
+   pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pud_huge_tests(pudp, pud_aligned, prot);
+
+   pte_unmap_unlock(ptep, ptl);
+
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   p4d_populate_tests(mm, p4dp, saved_pudp);
+   pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+   hugetlb_basic_tests(pte_aligned, prot);


hugetlb_basic_tests() is a static test that does not modify the page
table and should be classified accordingly.


+
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
pmd_free(mm, saved_pmdp);



Changes up to this patch fail to boot on the arm64 platform. This should be
folded into the next patch.

[   17.080644] [ cut here ]
[   17.081342] kernel BUG at mm/pgtable-generic.c:164!
[   17.082091] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   17.082977] Modules linked in:
[   17.083481] CPU: 79 PID: 1 Comm: swapper/0 Tainted: GW 
5.9.0-rc2-00105-g740e72cd6717 #14
[   17.084998] Hardware name: linux,dummy-virt (DT)
[   17.085745] pstate: 6045 (nZCv daif +PAN -UAO BTYPE=--)
[   17.086645] pc : pgtable_trans_huge_deposit+0x58/0x60
[   17.087462] lr : debug_vm_pgtable+0x4dc/0x8f0
[   17.088168] sp : 80001219bcf0
[   17.088710] 
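
For context on this BUG: mm/pgtable-generic.c:164 lands in the generic
pgtable_trans_huge_deposit(), which asserts that the PMD lock is held
before depositing a page table. A sketch of that helper as it appears in
mainline (the exact line correspondence is an assumption):

void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
				pgtable_t pgtable)
{
	/* Fires when a caller deposits without holding the PMD lock,
	 * which is what the reordered tests above appear to hit. */
	assert_spin_locked(pmd_lockptr(mm, pmdp));

	/* FIFO */
	if (!pmd_huge_pte(mm, pmdp))
		INIT_LIST_HEAD(&pgtable->lru);
	else
		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
	pmd_huge_pte(mm, pmdp) = pgtable;
}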
