Re: [PATCH] powerpc/powernv/idle: Restore IAMR after idle

2019-02-19 Thread Akshay Adiga
On Wed, Feb 06, 2019 at 05:28:37PM +1100, Russell Currey wrote:
> Without restoring the IAMR after idle, execution prevention on POWER9
> with Radix MMU is overwritten and the kernel can freely execute
> userspace without faulting.
> 
> This is necessary when returning from any stop state that modifies user
> state, as well as hypervisor state.
> 
> To test how this fails without this patch, load the lkdtm driver and
> do the following:
> 
>    echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
> 
> which won't fault, then boot the kernel with powersave=off, where it
> will fault.  Applying this patch will fix this.
> 
> Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space")
> Cc: 
> Signed-off-by: Russell Currey 
> ---
>  arch/powerpc/include/asm/cpuidle.h |  1 +
>  arch/powerpc/kernel/asm-offsets.c  |  1 +
>  arch/powerpc/kernel/idle_book3s.S  | 20 
>  3 files changed, 22 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/cpuidle.h 
> b/arch/powerpc/include/asm/cpuidle.h
> index 43e5f31fe64d..ad67dbe59498 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -77,6 +77,7 @@ struct stop_sprs {
>   u64 mmcr1;
>   u64 mmcr2;
>   u64 mmcra;
> + u64 iamr;
>  };
> 
>  #define PNV_IDLE_NAME_LEN	16
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 9ffc72ded73a..10e0314c2b0d 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -774,6 +774,7 @@ int main(void)
>   STOP_SPR(STOP_MMCR1, mmcr1);
>   STOP_SPR(STOP_MMCR2, mmcr2);
>   STOP_SPR(STOP_MMCRA, mmcra);
> + STOP_SPR(STOP_IAMR, iamr);
>  #endif
> 
>   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 7f5ac2e8581b..bb4f552f6c7e 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -200,6 +200,12 @@ pnv_powersave_common:
>   /* Continue saving state */
>   SAVE_GPR(2, r1)
>   SAVE_NVGPRS(r1)
> +
> +BEGIN_FTR_SECTION
> + mfspr   r5, SPRN_IAMR
> + std r5, STOP_IAMR(r13)
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> +

Are we trying to add this for both POWER8 and POWER9?
POWER9 would be CPU_FTR_ARCH_300.
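
For reference, a minimal C-level sketch of the distinction being raised
(the patch itself uses asm feature sections; CPU_FTR_* are the real kernel
flags, the helper names here are made up):

#include <linux/types.h>
#include <asm/cpu_has_feature.h>

/*
 * CPU_FTR_ARCH_207S covers ISA 2.07 (POWER8) *and later*, so a feature
 * section gated on it also fires on POWER9.  CPU_FTR_ARCH_300 is ISA
 * 3.0 (POWER9) and later only.
 */
static bool is_power9_or_later(void)
{
	return cpu_has_feature(CPU_FTR_ARCH_300);
}

static bool is_power8_or_later(void)
{
	return cpu_has_feature(CPU_FTR_ARCH_207S);
}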



[PATCH 3/3] powerpc: sstep: Add tests for addc[.] instruction

2019-02-19 Thread Sandipan Das
This adds test cases for the addc[.] instruction.

Signed-off-by: Sandipan Das 
---
 arch/powerpc/include/asm/ppc-opcode.h |   1 +
 arch/powerpc/lib/test_emulate_step.c  | 192 ++
 2 files changed, 193 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 19a8834e0398..87b73aa56b53 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -326,6 +326,7 @@
 #define PPC_INST_ADDI  0x38000000
 #define PPC_INST_ADDIS 0x3c000000
 #define PPC_INST_ADD   0x7c000214
+#define PPC_INST_ADDC  0x7c000014
 #define PPC_INST_SUB   0x7c000050
 #define PPC_INST_BLR   0x4e800020
 #define PPC_INST_BLRL  0x4e800021
diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index bf88b20e53d7..1c13b3bebeca 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -53,6 +53,10 @@
___PPC_RA(a) | ___PPC_RB(b))
 #define TEST_ADD_DOT(t, a, b)  (PPC_INST_ADD | ___PPC_RT(t) |  \
___PPC_RA(a) | ___PPC_RB(b) | 0x1)
+#define TEST_ADDC(t, a, b) (PPC_INST_ADDC | ___PPC_RT(t) | \
+   ___PPC_RA(a) | ___PPC_RB(b))
+#define TEST_ADDC_DOT(t, a, b) (PPC_INST_ADDC | ___PPC_RT(t) | \
+   ___PPC_RA(a) | ___PPC_RB(b) | 0x1)
 
 #define MAX_SUBTESTS   16
 
@@ -649,6 +653,194 @@ static struct compute_test compute_tests[] = {
}
}
}
+   },
+   {
+   .mnemonic = "addc",
+   .subtests = {
+   {
+   .descr = "RA = LONG_MIN, RB = LONG_MIN",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = LONG_MIN,
+   .gpr[22] = LONG_MIN,
+   }
+   },
+   {
+   .descr = "RA = LONG_MIN, RB = LONG_MAX",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = LONG_MIN,
+   .gpr[22] = LONG_MAX,
+   }
+   },
+   {
+   .descr = "RA = LONG_MAX, RB = LONG_MAX",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = LONG_MAX,
+   .gpr[22] = LONG_MAX,
+   }
+   },
+   {
+   .descr = "RA = ULONG_MAX, RB = ULONG_MAX",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = ULONG_MAX,
+   .gpr[22] = ULONG_MAX,
+   }
+   },
+   {
+   .descr = "RA = ULONG_MAX, RB = 0x1",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = ULONG_MAX,
+   .gpr[22] = 0x1,
+   }
+   },
+   {
+   .descr = "RA = INT_MIN, RB = INT_MIN",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = INT_MIN,
+   .gpr[22] = INT_MIN,
+   }
+   },
+   {
+   .descr = "RA = INT_MIN, RB = INT_MAX",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = INT_MIN,
+   .gpr[22] = INT_MAX,
+   }
+   },
+   {
+   .descr = "RA = INT_MAX, RB = INT_MAX",
+   .instr = TEST_ADDC(20, 21, 22),
+   .regs = {
+   .gpr[21] = INT_MAX,
+   .gpr[22] = INT_MAX,
+   }
+   },
+   {
+   

[PATCH 2/3] powerpc: sstep: Add tests for add[.] instruction

2019-02-19 Thread Sandipan Das
This adds test cases for the add[.] instruction.

Signed-off-by: Sandipan Das 
---
 arch/powerpc/lib/test_emulate_step.c | 176 +++
 1 file changed, 176 insertions(+)

diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 3d7f7bae51cc..bf88b20e53d7 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -49,6 +49,10 @@
___PPC_RA(a) | ___PPC_RB(b))
 #define TEST_LXVD2X(s, a, b)   (PPC_INST_LXVD2X | VSX_XX1((s), R##a, R##b))
 #define TEST_STXVD2X(s, a, b)  (PPC_INST_STXVD2X | VSX_XX1((s), R##a, R##b))
+#define TEST_ADD(t, a, b)  (PPC_INST_ADD | ___PPC_RT(t) |  \
+   ___PPC_RA(a) | ___PPC_RB(b))
+#define TEST_ADD_DOT(t, a, b)  (PPC_INST_ADD | ___PPC_RT(t) |  \
+   ___PPC_RA(a) | ___PPC_RB(b) | 0x1)
 
 #define MAX_SUBTESTS   16
 
@@ -473,6 +477,178 @@ static struct compute_test compute_tests[] = {
}
}
}
+   },
+   {
+   .mnemonic = "add",
+   .subtests = {
+   {
+   .descr = "RA = LONG_MIN, RB = LONG_MIN",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = LONG_MIN,
+   .gpr[22] = LONG_MIN,
+   }
+   },
+   {
+   .descr = "RA = LONG_MIN, RB = LONG_MAX",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = LONG_MIN,
+   .gpr[22] = LONG_MAX,
+   }
+   },
+   {
+   .descr = "RA = LONG_MAX, RB = LONG_MAX",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = LONG_MAX,
+   .gpr[22] = LONG_MAX,
+   }
+   },
+   {
+   .descr = "RA = ULONG_MAX, RB = ULONG_MAX",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = ULONG_MAX,
+   .gpr[22] = ULONG_MAX,
+   }
+   },
+   {
+   .descr = "RA = ULONG_MAX, RB = 0x1",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = ULONG_MAX,
+   .gpr[22] = 0x1,
+   }
+   },
+   {
+   .descr = "RA = INT_MIN, RB = INT_MIN",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = INT_MIN,
+   .gpr[22] = INT_MIN,
+   }
+   },
+   {
+   .descr = "RA = INT_MIN, RB = INT_MAX",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = INT_MIN,
+   .gpr[22] = INT_MAX,
+   }
+   },
+   {
+   .descr = "RA = INT_MAX, RB = INT_MAX",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = INT_MAX,
+   .gpr[22] = INT_MAX,
+   }
+   },
+   {
+   .descr = "RA = UINT_MAX, RB = UINT_MAX",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = UINT_MAX,
+   .gpr[22] = UINT_MAX,
+   }
+   },
+   {
+   .descr = "RA = UINT_MAX, RB = 0x1",
+   .instr = TEST_ADD(20, 21, 22),
+   .regs = {
+   .gpr[21] = UINT_MAX,
+   .gpr[22] = 0x1,
+ 

[PATCH 1/3] powerpc: sstep: Add tests for compute type instructions

2019-02-19 Thread Sandipan Das
This enhances the current selftest framework for validating
the in-kernel instruction emulation infrastructure by adding
support for compute type instructions, i.e. integer ALU-based
instructions. Originally, this framework was limited to only
testing load and store instructions.

While most of the GPRs can be validated, support for SPRs is
limited to LR, CR and XER for now.

When writing the test cases, one must ensure that the Stack
Pointer (GPR1) and the Thread Pointer (GPR13) are not touched
by any means, as these are vital non-volatile registers.
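
As an illustration of those constraints, a hypothetical subtest entry could
look like this (a sketch only: the register choices and values are made up,
IGNORE_XER is defined in this patch, and TEST_ADD is added later in the
series):

	{
		.descr = "RA = -1, RB = 1 (XER result may vary)",
		/* skip comparing XER between emulated and executed runs */
		.flags = IGNORE_XER,
		.instr = TEST_ADD(20, 21, 22),
		.regs = {
			/* stick to r20-r22: GPR1 (SP) and GPR13 (thread
			 * pointer) must never be touched */
			.gpr[21] = -1UL,
			.gpr[22] = 1UL,
		}
	},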

Signed-off-by: Sandipan Das 
---
 arch/powerpc/lib/Makefile |   3 +-
 arch/powerpc/lib/test_emulate_step.c  | 167 +-
 .../lib/test_emulate_step_exec_instr.S| 150 
 3 files changed, 315 insertions(+), 5 deletions(-)
 create mode 100644 arch/powerpc/lib/test_emulate_step_exec_instr.S

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3bf9fc6fd36c..79396e184bca 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -30,7 +30,8 @@ obj64-y   += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
 
 obj64-$(CONFIG_SMP)+= locks.o
 obj64-$(CONFIG_ALTIVEC)+= vmx-helper.o
-obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
+obj64-$(CONFIG_KPROBES_SANITY_TEST)+= test_emulate_step.o \
+  test_emulate_step_exec_instr.o
 
 obj-y  += checksum_$(BITS).o checksum_wrappers.o \
   string_$(BITS).o memcmp_$(BITS).o
diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 6c47daa61614..3d7f7bae51cc 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1,5 +1,5 @@
 /*
- * Simple sanity test for emulate_step load/store instructions.
+ * Simple sanity tests for instruction emulation infrastructure.
  *
  * Copyright IBM Corp. 2016
  *
@@ -14,6 +14,7 @@
 #include <linux/ptrace.h>
 #include <asm/sstep.h>
 #include <asm/ppc-opcode.h>
+#include <asm/code-patching.h>
 
 #define IMM_L(i)   ((uintptr_t)(i) & 0xffff)
 
@@ -49,6 +50,11 @@
 #define TEST_LXVD2X(s, a, b)   (PPC_INST_LXVD2X | VSX_XX1((s), R##a, R##b))
 #define TEST_STXVD2X(s, a, b)  (PPC_INST_STXVD2X | VSX_XX1((s), R##a, R##b))
 
+#define MAX_SUBTESTS   16
+
+#define IGNORE_GPR(n)  (0x1UL << (n))
+#define IGNORE_XER (0x1UL << 32)
+#define IGNORE_CCR (0x1UL << 33)
 
 static void __init init_pt_regs(struct pt_regs *regs)
 {
@@ -72,9 +78,15 @@ static void __init init_pt_regs(struct pt_regs *regs)
msr_cached = true;
 }
 
-static void __init show_result(char *ins, char *result)
+static void __init show_result(char *mnemonic, char *result)
 {
-   pr_info("%-14s : %s\n", ins, result);
+   pr_info("%-14s : %s\n", mnemonic, result);
+}
+
+static void __init show_result_with_descr(char *mnemonic, char *descr,
+ char *result)
+{
+   pr_info("%-14s : %-50s %s\n", mnemonic, descr, result);
 }
 
 static void __init test_ld(void)
@@ -426,7 +438,7 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
-static int __init test_emulate_step(void)
+static void __init run_tests_load_store(void)
 {
test_ld();
test_lwz();
@@ -437,6 +449,153 @@ static int __init test_emulate_step(void)
test_lfdx_stfdx();
test_lvx_stvx();
test_lxvd2x_stxvd2x();
+}
+
+struct compute_test {
+   char *mnemonic;
+   struct {
+   char *descr;
+   unsigned long flags;
+   unsigned int instr;
+   struct pt_regs regs;
+   } subtests[MAX_SUBTESTS + 1];
+};
+
+static struct compute_test compute_tests[] = {
+   {
+   .mnemonic = "nop",
+   .subtests = {
+   {
+   .descr = "R0 = LONG_MAX",
+   .instr = PPC_INST_NOP,
+   .regs = {
+   .gpr[0] = LONG_MAX,
+   }
+   }
+   }
+   }
+};
+
+static int __init emulate_compute_instr(struct pt_regs *regs,
+   unsigned int instr)
+{
+   struct instruction_op op;
+
+   if (!regs || !instr)
+   return -EINVAL;
+
+   if (analyse_instr(&op, regs, instr) != 1 ||
+   GETTYPE(op.type) != COMPUTE) {
+   pr_info("emulation failed, instruction = 0x%08x\n", instr);
+   return -EFAULT;
+   }
+
+   emulate_update_regs(regs, &op);
+   return 0;
+}
+
+static int __init execute_compute_instr(struct pt_regs *regs,
+   unsigned int instr)
+{
+   extern unsigned int exec_instr_execute[];
+   extern int exec_instr(struct pt_regs *regs);
+
+   if (!regs || !instr)
+   return -EINVAL;
+
+   /* Patch the NOP with the actual instruction */
+   

[PATCH 0/3] powerpc: sstep: Emulation test infrastructure

2019-02-19 Thread Sandipan Das
This aims to extend the current test infrastructure for in-kernel
instruction emulation by adding support for validating basic integer
operations and will verify the GPRs, LR, XER and CR.

There can be multiple test cases for each instruction. Each test case
has to be provided with the initial register state (in the form of a
pt_regs) and the 32-bit instruction to test.

Apart from verifying the end result, problems with the behaviour of
certain instructions for things like setting certain bits in CR or
XER (which can also be processor dependent) can be identified.

For example, the newly introduced CA32 bit in XER, exclusive to P9
CPUs as of now, was not being set when expected for some of the
arithmetic and shift instructions. With this infrastructure, it will
be easier to identify such problems and rectify them. The test cases
for the addc[.] instruction demonstrate this for different scenarios
where the CA and CA32 bits of XER should be set.
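
As a rough illustration, the expected carry bits for addc can be derived
like this (a standalone sketch; the XER mask values follow the kernel's
reg.h definitions but should be treated as assumptions here):

#include <stdint.h>

#define XER_CA    0x20000000UL	/* carry out of the full 64-bit add */
#define XER_CA32  0x00040000UL	/* ISA 3.0: carry out of the low 32 bits */

/* expected XER carry bits after "addc rt, ra, rb" */
static unsigned long expected_carries(uint64_t ra, uint64_t rb)
{
	unsigned long xer = 0;

	if (ra + rb < ra)				/* 64-bit carry */
		xer |= XER_CA;
	if ((uint32_t)ra + (uint32_t)rb < (uint32_t)ra)	/* 32-bit carry */
		xer |= XER_CA32;
	return xer;
}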

Changelog:
  RFC -> v1:
- Integrate with current test infrastructure that already tests
  some load and store instructions.
- Remove first two patches that introduce new instructions fields
  in favour of extending the macros in the current infrastructure.
- Add a message to indicate that the tests are being run based on
  suggestions from Daniel.

Sandipan Das (3):
  powerpc: sstep: Add tests for compute type instructions
  powerpc: sstep: Add tests for add[.] instruction
  powerpc: sstep: Add tests for addc[.] instruction

 arch/powerpc/include/asm/ppc-opcode.h |   1 +
 arch/powerpc/lib/Makefile |   3 +-
 arch/powerpc/lib/test_emulate_step.c  | 535 +-
 .../lib/test_emulate_step_exec_instr.S| 150 +
 4 files changed, 684 insertions(+), 5 deletions(-)
 create mode 100644 arch/powerpc/lib/test_emulate_step_exec_instr.S

-- 
2.19.2



Re: [PATCH v2] powerpc/prom_init: add __init markers to all functions

2019-02-19 Thread Christophe Leroy




On 20/02/2019 at 06:53, Masahiro Yamada wrote:

It is fragile to rely on the compiler's optimization to avoid the
section mismatch. Some functions may not be necessarily inlined
when the compiler's inlining heuristic changes.

Add __init markers consistently.

As for prom_getprop() and prom_getproplen(), they are marked as
'inline', so inlining is guaranteed because PowerPC never enables
CONFIG_OPTIMIZE_INLINING. However, it would be better to leave the
inlining decision to the compiler. I replaced 'inline' with __init.
I added __maybe_unused to prom_getproplen() because it is currently
relying on the side-effect of 'inline'; GCC does not report
-Wunused-function warnings for functions with 'inline' marker.


__maybe_unused is really a bad trick that should be avoided, as it hides 
unused functions.


Why is it a problem to keep prom_getproplen() as 'static inline'? Most
small helpers are defined that way. Usually they are in an included
header file, but what's really the problem with having it here directly?


Christophe



Signed-off-by: Masahiro Yamada 
---

Changes in v2:
   - Add __maybe_unused to prom_getproplen()
   - Add __init to enter_prom() as well

  arch/powerpc/kernel/prom_init.c | 29 +++--
  1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f33ff41..1bad0ac 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -138,9 +138,9 @@ extern void __start(unsigned long r3, unsigned long r4, 
unsigned long r5,
unsigned long r9);
  
  #ifdef CONFIG_PPC64

-extern int enter_prom(struct prom_args *args, unsigned long entry);
+extern int __init enter_prom(struct prom_args *args, unsigned long entry);
  #else
-static inline int enter_prom(struct prom_args *args, unsigned long entry)
+static int __init enter_prom(struct prom_args *args, unsigned long entry)
  {
return ((int (*)(struct prom_args *))entry)(args);
  }
@@ -501,19 +501,20 @@ static int __init prom_next_node(phandle *nodep)
}
  }
  
-static inline int prom_getprop(phandle node, const char *pname,
+static int __init prom_getprop(phandle node, const char *pname,
   void *value, size_t valuelen)
  {
return call_prom("getprop", 4, 1, node, ADDR(pname),
 (u32)(unsigned long) value, (u32) valuelen);
  }
  
-static inline int prom_getproplen(phandle node, const char *pname)
+static int __init __maybe_unused prom_getproplen(phandle node,
+const char *pname)
  {
return call_prom("getproplen", 2, 1, node, ADDR(pname));
  }
  
-static void add_string(char **str, const char *q)
+static void __init add_string(char **str, const char *q)
  {
char *p = *str;
  
@@ -523,7 +524,7 @@ static void add_string(char **str, const char *q)

*str = p;
  }
  
-static char *tohex(unsigned int x)
+static char __init *tohex(unsigned int x)
  {
static const char digits[] __initconst = "0123456789abcdef";
static char result[9] __prombss;
@@ -570,7 +571,7 @@ static int __init prom_setprop(phandle node, const char 
*nodename,
  #define islower(c)	('a' <= (c) && (c) <= 'z')
  #define toupper(c)	(islower(c) ? ((c) - 'a' + 'A') : (c))
  
-static unsigned long prom_strtoul(const char *cp, const char **endp)
+static unsigned long __init prom_strtoul(const char *cp, const char **endp)
  {
unsigned long result = 0, base = 10, value;
  
@@ -595,7 +596,7 @@ static unsigned long prom_strtoul(const char *cp, const char **endp)

return result;
  }
  
-static unsigned long prom_memparse(const char *ptr, const char **retptr)
+static unsigned long __init prom_memparse(const char *ptr, const char **retptr)
  {
unsigned long ret = prom_strtoul(ptr, retptr);
int shift = 0;
@@ -2924,7 +2925,7 @@ static void __init fixup_device_tree_pasemi(void)
prom_setprop(iob, name, "device_type", "isa", sizeof("isa"));
  }
  #else /* !CONFIG_PPC_PASEMI_NEMO */
-static inline void fixup_device_tree_pasemi(void) { }
+static void __init fixup_device_tree_pasemi(void) { }
  #endif
  
  static void __init fixup_device_tree(void)

@@ -2986,15 +2987,15 @@ static void __init prom_check_initrd(unsigned long r3, 
unsigned long r4)
  
  #ifdef CONFIG_PPC64

  #ifdef CONFIG_RELOCATABLE
-static void reloc_toc(void)
+static void __init reloc_toc(void)
  {
  }
  
-static void unreloc_toc(void)
+static void __init unreloc_toc(void)
  {
  }
  #else
-static void __reloc_toc(unsigned long offset, unsigned long nr_entries)
+static void __init __reloc_toc(unsigned long offset, unsigned long nr_entries)
  {
unsigned long i;
unsigned long *toc_entry;
@@ -3008,7 +3009,7 @@ static void __reloc_toc(unsigned long offset, unsigned 
long nr_entries)
}
  }
  
-static void reloc_toc(void)
+static void __init reloc_toc(void)
  {

Re: [PATCH] powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64

2019-02-19 Thread Scott Wood
On Wed, 2019-02-20 at 01:14 +1100, Michael Ellerman wrote:
> Christophe Leroy  writes:
> 
> > On 02/08/2019 12:34 PM, Michael Ellerman wrote:
> > > In commit 7820856a4fcd ("powerpc/mm/book3e/64: Remove unsupported
> > > 64Kpage size from 64bit booke") we dropped the 64K page size support
> > > from the 64-bit nohash (Book3E) code.
> > > 
> > > But we didn't update the dependencies of the PPC_64K_PAGES option,
> > > meaning a randconfig can still trigger this code and cause a build
> > > breakage, eg:
> > >arch/powerpc/include/asm/nohash/64/pgtable.h:14:2: error: #error
> > > "Page size not supported"
> > >arch/powerpc/include/asm/nohash/mmu-book3e.h:275:2: error: #error
> > > Unsupported page size
> > > 
> > > So remove PPC_BOOK3E_64 from the dependencies. This also means we
> > > don't need to worry about PPC_FSL_BOOK3E, because that was just trying
> > > to prevent the PPC_BOOK3E_64=y && PPC_FSL_BOOK3E=y case.
> > 
> > Does it means some cleanup could be done, for instance:
> > 
> > arch/powerpc/include/asm/nohash/64/pgalloc.h:#ifndef CONFIG_PPC_64K_PAGES
> > arch/powerpc/include/asm/nohash/64/pgalloc.h:#endif /* 
> > CONFIG_PPC_64K_PAGES */
> > arch/powerpc/include/asm/nohash/64/pgtable.h:#ifdef CONFIG_PPC_64K_PAGES
> > arch/powerpc/include/asm/nohash/64/slice.h:#ifdef CONFIG_PPC_64K_PAGES
> > arch/powerpc/include/asm/nohash/64/slice.h:#else /* CONFIG_PPC_64K_PAGES
> > */
> > arch/powerpc/include/asm/nohash/64/slice.h:#endif /* 
> > !CONFIG_PPC_64K_PAGES */
> > arch/powerpc/include/asm/nohash/pte-book3e.h:#ifdef CONFIG_PPC_64K_PAGES
> > 
> > arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
> > arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
> > arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
> > arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> > arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> 
> Probably.
> 
> Some of the FSL chips do support 64K pages, at least according to some
> datasheets. I don't know what would be required to get it working, or if
> it even works in practice.
> 
> So it would be nice to get 64K working on those chips, but probably no
> one has time or motivation to do it. In which case yeah all that code
> should be removed.

The primary TLB (TLB0) on these chips only supports 4K pages.  TLB1 supports
many different sizes but is much smaller, hardware tablewalk only loads into
TLB0, etc.

-Scott




Re: [PATCH] powerpc/powernv/idle: Restore IAMR after idle

2019-02-19 Thread Akshay Adiga
On Tue, Feb 19, 2019 at 02:21:04PM +1000, Nicholas Piggin wrote:
> Michael Ellerman's on February 8, 2019 11:04 am:
> > Nicholas Piggin  writes:
> >> Russell Currey's on February 6, 2019 4:28 pm:
> >>> Without restoring the IAMR after idle, execution prevention on POWER9
> >>> with Radix MMU is overwritten and the kernel can freely execute
> >>> userspace without faulting.
> >>> 
> >>> This is necessary when returning from any stop state that modifies user
> >>> state, as well as hypervisor state.
> >>> 
> >>> To test how this fails without this patch, load the lkdtm driver and
> >>> do the following:
> >>> 
> >>>    echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
> >>> 
> >>> which won't fault, then boot the kernel with powersave=off, where it
> >>> will fault.  Applying this patch will fix this.
> >>> 
> >>> Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space")
> >>> Cc: 
> >>> Signed-off-by: Russell Currey 
> >>
> >> Good catch and debugging. This really should be a quirk, we don't want 
> >> to have to restore this thing on a thread switch.
> > 
> > I'm not sure I follow. We don't context switch it on Radix, but we do
> > on hash if pkeys are enabled.
> 
> Badly worded, I mean a hardware quirk. It should follow thread
> switches. Still, avoiding it for the no-loss case is better than
> nothing. We can just revisit it as an optimization if future
> hardware does not require the restore.

Apparently, the POWER9 Processor User’s Manual v2.0 documents that
the IAMR can be lost, and it is not the only SPR affected.

Pasting an excerpt from "Section 23.5.9.2 State Loss and Restoration, Page 309":

  On the POWER9 core, the only state that can be lost for
  Stop levels less than four, when PSSCR[ESL] = ‘1’ are the
  following SPRs: CR, FPSCR, VSCR, XER, DSCR, AMR, IAMR, UAMOR,
  AMOR, DAWR, DAWRX.

My observation is that AMOR is used in the kernel as of today, and
AMOR is also lost (and would need to be restored in the same scenarios
where the IAMR is lost).
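
In other words, a conservative fix looks roughly like the following around
any stop level that can lose state (a C sketch only; the real code lives in
the idle entry/exit asm, and enter_stop() is a hypothetical stand-in):

#include <asm/reg.h>

static void stop_with_sprs_saved(void)
{
	/* save the SPRs the manual says can be lost in shallow stop */
	unsigned long iamr = mfspr(SPRN_IAMR);
	unsigned long amor = mfspr(SPRN_AMOR);

	enter_stop();	/* hypothetical; state loss depends on stop level */

	/* restore unconditionally on wakeup */
	mtspr(SPRN_IAMR, iamr);
	mtspr(SPRN_AMOR, amor);
}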



[PATCH v2] powerpc/prom_init: add __init markers to all functions

2019-02-19 Thread Masahiro Yamada
It is fragile to rely on the compiler's optimization to avoid the
section mismatch. Some functions may not be necessarily inlined
when the compiler's inlining heuristic changes.

Add __init markers consistently.

As for prom_getprop() and prom_getproplen(), they are marked as
'inline', so inlining is guaranteed because PowerPC never enables
CONFIG_OPTIMIZE_INLINING. However, it would be better to leave the
inlining decision to the compiler. I replaced 'inline' with __init.
I added __maybe_unused to prom_getproplen() because it is currently
relying on the side-effect of 'inline'; GCC does not report
-Wunused-function warnings for functions with 'inline' marker.
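
The GCC behaviour being relied on can be seen in a standalone example
(a sketch; in the kernel, __maybe_unused expands to __attribute__((unused))):

/* with gcc -Wall, only unused_plain() triggers -Wunused-function */
static int unused_plain(void) { return 0; }			/* warns */
static inline int unused_inl(void) { return 0; }		/* silent */
static __attribute__((unused)) int unused_marked(void) { return 0; } /* silent */

int main(void) { return 0; }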

Signed-off-by: Masahiro Yamada 
---

Changes in v2:
  - Add __maybe_unused to prom_getproplen()
  - Add __init to enter_prom() as well

 arch/powerpc/kernel/prom_init.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f33ff41..1bad0ac 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -138,9 +138,9 @@ extern void __start(unsigned long r3, unsigned long r4, 
unsigned long r5,
unsigned long r9);
 
 #ifdef CONFIG_PPC64
-extern int enter_prom(struct prom_args *args, unsigned long entry);
+extern int __init enter_prom(struct prom_args *args, unsigned long entry);
 #else
-static inline int enter_prom(struct prom_args *args, unsigned long entry)
+static int __init enter_prom(struct prom_args *args, unsigned long entry)
 {
return ((int (*)(struct prom_args *))entry)(args);
 }
@@ -501,19 +501,20 @@ static int __init prom_next_node(phandle *nodep)
}
 }
 
-static inline int prom_getprop(phandle node, const char *pname,
+static int __init prom_getprop(phandle node, const char *pname,
   void *value, size_t valuelen)
 {
return call_prom("getprop", 4, 1, node, ADDR(pname),
 (u32)(unsigned long) value, (u32) valuelen);
 }
 
-static inline int prom_getproplen(phandle node, const char *pname)
+static int __init __maybe_unused prom_getproplen(phandle node,
+const char *pname)
 {
return call_prom("getproplen", 2, 1, node, ADDR(pname));
 }
 
-static void add_string(char **str, const char *q)
+static void __init add_string(char **str, const char *q)
 {
char *p = *str;
 
@@ -523,7 +524,7 @@ static void add_string(char **str, const char *q)
*str = p;
 }
 
-static char *tohex(unsigned int x)
+static char __init *tohex(unsigned int x)
 {
static const char digits[] __initconst = "0123456789abcdef";
static char result[9] __prombss;
@@ -570,7 +571,7 @@ static int __init prom_setprop(phandle node, const char 
*nodename,
 #define islower(c) ('a' <= (c) && (c) <= 'z')
 #define toupper(c) (islower(c) ? ((c) - 'a' + 'A') : (c))
 
-static unsigned long prom_strtoul(const char *cp, const char **endp)
+static unsigned long __init prom_strtoul(const char *cp, const char **endp)
 {
unsigned long result = 0, base = 10, value;
 
@@ -595,7 +596,7 @@ static unsigned long prom_strtoul(const char *cp, const 
char **endp)
return result;
 }
 
-static unsigned long prom_memparse(const char *ptr, const char **retptr)
+static unsigned long __init prom_memparse(const char *ptr, const char **retptr)
 {
unsigned long ret = prom_strtoul(ptr, retptr);
int shift = 0;
@@ -2924,7 +2925,7 @@ static void __init fixup_device_tree_pasemi(void)
prom_setprop(iob, name, "device_type", "isa", sizeof("isa"));
 }
 #else  /* !CONFIG_PPC_PASEMI_NEMO */
-static inline void fixup_device_tree_pasemi(void) { }
+static void __init fixup_device_tree_pasemi(void) { }
 #endif
 
 static void __init fixup_device_tree(void)
@@ -2986,15 +2987,15 @@ static void __init prom_check_initrd(unsigned long r3, 
unsigned long r4)
 
 #ifdef CONFIG_PPC64
 #ifdef CONFIG_RELOCATABLE
-static void reloc_toc(void)
+static void __init reloc_toc(void)
 {
 }
 
-static void unreloc_toc(void)
+static void __init unreloc_toc(void)
 {
 }
 #else
-static void __reloc_toc(unsigned long offset, unsigned long nr_entries)
+static void __init __reloc_toc(unsigned long offset, unsigned long nr_entries)
 {
unsigned long i;
unsigned long *toc_entry;
@@ -3008,7 +3009,7 @@ static void __reloc_toc(unsigned long offset, unsigned 
long nr_entries)
}
 }
 
-static void reloc_toc(void)
+static void __init reloc_toc(void)
 {
unsigned long offset = reloc_offset();
unsigned long nr_entries =
@@ -3019,7 +3020,7 @@ static void reloc_toc(void)
mb();
 }
 
-static void unreloc_toc(void)
+static void __init unreloc_toc(void)
 {
unsigned long offset = reloc_offset();
unsigned long nr_entries =
-- 
2.7.4



[RESEND PATCH 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool'

2019-02-19 Thread ira . weiny
From: Ira Weiny 

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
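
The per-call-site conversion is mechanical; a hedged sketch of the pattern
(pin_pages() is a made-up wrapper):

#include <linux/mm.h>

static int pin_pages(unsigned long addr, int nr_pages, struct page **pages,
		     bool write)
{
	/* before: get_user_pages_fast(addr, nr_pages, write, pages);
	 * after: the bool becomes a gup_flags word, with FOLL_WRITE
	 * carrying the old meaning */
	return get_user_pages_fast(addr, nr_pages,
				   write ? FOLL_WRITE : 0, pages);
}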

Signed-off-by: Ira Weiny 
---
 arch/mips/mm/gup.c | 11 ++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c|  4 ++--
 arch/powerpc/kvm/e500_mmu.c|  2 +-
 arch/powerpc/mm/mmu_context_iommu.c|  4 ++--
 arch/s390/kvm/interrupt.c  |  2 +-
 arch/s390/mm/gup.c | 12 ++--
 arch/sh/mm/gup.c   | 11 ++-
 arch/sparc/mm/gup.c|  9 +
 arch/x86/kvm/paging_tmpl.h |  2 +-
 arch/x86/kvm/svm.c |  2 +-
 drivers/fpga/dfl-afu-dma-region.c  |  2 +-
 drivers/gpu/drm/via/via_dmablit.c  |  3 ++-
 drivers/infiniband/hw/hfi1/user_pages.c|  3 ++-
 drivers/misc/genwqe/card_utils.c   |  2 +-
 drivers/misc/vmw_vmci/vmci_host.c  |  2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c|  6 --
 drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
 drivers/rapidio/devices/rio_mport_cdev.c   |  4 +++-
 drivers/sbus/char/oradax.c |  2 +-
 drivers/scsi/st.c  |  3 ++-
 drivers/staging/gasket/gasket_page_table.c |  4 ++--
 drivers/tee/tee_shm.c  |  2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c|  3 ++-
 drivers/vhost/vhost.c  |  2 +-
 drivers/video/fbdev/pvr2fb.c   |  2 +-
 drivers/virt/fsl_hypervisor.c  |  2 +-
 drivers/xen/gntdev.c   |  2 +-
 fs/orangefs/orangefs-bufmap.c  |  2 +-
 include/linux/mm.h |  4 ++--
 kernel/futex.c |  2 +-
 lib/iov_iter.c |  7 +--
 mm/gup.c   | 10 +-
 mm/util.c  |  8 
 net/ceph/pagevec.c |  2 +-
 net/rds/info.c |  2 +-
 net/rds/rdma.c |  3 ++-
 36 files changed, 81 insertions(+), 65 deletions(-)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 0d14e0d8eacf..4c2b4483683c 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -235,7 +235,7 @@ int __get_user_pages_fast(unsigned long start, int 
nr_pages, int write,
  * get_user_pages_fast() - pin user pages in memory
  * @start: starting user address
  * @nr_pages:  number of pages from start to pin
- * @write: whether pages will be written to
+ * @gup_flags: flags modifying pin behaviour
  * @pages: array that receives pointers to the pages pinned.
  * Should be at least nr_pages long.
  *
@@ -247,8 +247,8 @@ int __get_user_pages_fast(unsigned long start, int 
nr_pages, int write,
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
  * were pinned, returns -errno.
  */
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-   struct page **pages)
+int get_user_pages_fast(unsigned long start, int nr_pages,
+   unsigned int gup_flags, struct page **pages)
 {
struct mm_struct *mm = current->mm;
unsigned long addr, len, end;
@@ -273,7 +273,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, 
int write,
next = pgd_addr_end(addr, end);
if (pgd_none(pgd))
goto slow;
-   if (!gup_pud_range(pgd, addr, next, write, pages, ))
+   if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
+  pages, ))
goto slow;
} while (pgdp++, addr = next, addr != end);
local_irq_enable();
@@ -289,7 +290,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, 
int write,
pages += nr;
 
ret = get_user_pages_unlocked(start, (end - start) >> PAGE_SHIFT,
- pages, write ? FOLL_WRITE : 0);
+ pages, gup_flags);
 
/* Have to be a bit careful with return values */
if (nr > 0) {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index bd2dcfbf00cd..8fcb0a921e46 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -582,7 +582,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
/* If writing != 0, then the HPTE must allow writing, if we get here */
write_ok = writing;
hva = gfn_to_hva_memslot(memslot, gfn);
-   npages = get_user_pages_fast(hva, 1, writing, pages);
+   npages = get_user_pages_fast(hva, 1, 

[RESEND PATCH 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast

2019-02-19 Thread ira . weiny
From: Ira Weiny 

DAX pages were previously unprotected from longterm pins when users
called get_user_pages_fast().

Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall
back to regular GUP processing if a DEVMAP page is encountered.

Signed-off-by: Ira Weiny 
---
 mm/gup.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6f32d36b3c5b..f7e759c523bb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1439,6 +1439,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
goto pte_unmap;
 
if (pte_devmap(pte)) {
+   if (unlikely(flags & FOLL_LONGTERM))
+   goto pte_unmap;
+
pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
if (unlikely(!pgmap)) {
undo_dev_pagemap(nr, nr_start, pages);
@@ -1578,8 +1581,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, 
unsigned long addr,
if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
return 0;
 
-   if (pmd_devmap(orig))
+   if (pmd_devmap(orig)) {
+   if (unlikely(flags & FOLL_LONGTERM))
+   return 0;
return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
+   }
 
refs = 0;
page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
@@ -1904,8 +1910,20 @@ int get_user_pages_fast(unsigned long start, int 
nr_pages,
start += nr << PAGE_SHIFT;
pages += nr;
 
-   ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
- gup_flags);
+   if (gup_flags & FOLL_LONGTERM) {
+   down_read(&current->mm->mmap_sem);
+   ret = __gup_longterm_locked(current, current->mm,
+   start, nr_pages - nr,
+   pages, NULL, gup_flags);
+   up_read(&current->mm->mmap_sem);
+   } else {
+   /*
+* retain FAULT_FOLL_ALLOW_RETRY optimization if
+* possible
+*/
+   ret = get_user_pages_unlocked(start, nr_pages - nr,
+ pages, gup_flags);
+   }
 
/* Have to be a bit careful with return values */
if (nr > 0) {
-- 
2.20.1



[RESEND PATCH 2/7] mm/gup: Change write parameter to flags in fast walk

2019-02-19 Thread ira . weiny
From: Ira Weiny 

In order to support more options in the GUP fast walk, change
the write parameter to flags throughout the call stack.

This patch does not change functionality and passes FOLL_WRITE
where write was previously used.

Signed-off-by: Ira Weiny 
---
 mm/gup.c | 52 ++--
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index ee96eaff118c..681388236106 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1417,7 +1417,7 @@ static void undo_dev_pagemap(int *nr, int nr_start, 
struct page **pages)
 
 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
-int write, struct page **pages, int *nr)
+unsigned int flags, struct page **pages, int *nr)
 {
struct dev_pagemap *pgmap = NULL;
int nr_start = *nr, ret = 0;
@@ -1435,7 +1435,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
if (pte_protnone(pte))
goto pte_unmap;
 
-   if (!pte_access_permitted(pte, write))
+   if (!pte_access_permitted(pte, flags & FOLL_WRITE))
goto pte_unmap;
 
if (pte_devmap(pte)) {
@@ -1487,7 +1487,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
  * useful to have gup_huge_pmd even if we can't operate on ptes.
  */
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
-int write, struct page **pages, int *nr)
+unsigned int flags, struct page **pages, int *nr)
 {
return 0;
 }
@@ -1570,12 +1570,12 @@ static int __gup_device_huge_pud(pud_t pud, pud_t 
*pudp, unsigned long addr,
 #endif
 
 static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
-   unsigned long end, int write, struct page **pages, int *nr)
+   unsigned long end, unsigned int flags, struct page **pages, int 
*nr)
 {
struct page *head, *page;
int refs;
 
-   if (!pmd_access_permitted(orig, write))
+   if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
return 0;
 
if (pmd_devmap(orig))
@@ -1608,12 +1608,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, 
unsigned long addr,
 }
 
 static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
-   unsigned long end, int write, struct page **pages, int *nr)
+   unsigned long end, unsigned int flags, struct page **pages, int *nr)
 {
struct page *head, *page;
int refs;
 
-   if (!pud_access_permitted(orig, write))
+   if (!pud_access_permitted(orig, flags & FOLL_WRITE))
return 0;
 
if (pud_devmap(orig))
@@ -1646,13 +1646,13 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, 
unsigned long addr,
 }
 
 static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
-   unsigned long end, int write,
+   unsigned long end, unsigned int flags,
struct page **pages, int *nr)
 {
int refs;
struct page *head, *page;
 
-   if (!pgd_access_permitted(orig, write))
+   if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
return 0;
 
BUILD_BUG_ON(pgd_devmap(orig));
@@ -1683,7 +1683,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned 
long addr,
 }
 
 static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
-   int write, struct page **pages, int *nr)
+   unsigned int flags, struct page **pages, int *nr)
 {
unsigned long next;
pmd_t *pmdp;
@@ -1705,7 +1705,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, 
unsigned long end,
if (pmd_protnone(pmd))
return 0;
 
-   if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+   if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
pages, nr))
return 0;
 
@@ -1715,9 +1715,9 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, 
unsigned long end,
 * pmd format and THP pmd format
 */
if (!gup_huge_pd(__hugepd(pmd_val(pmd)), addr,
-PMD_SHIFT, next, write, pages, nr))
+PMD_SHIFT, next, flags, pages, nr))
return 0;
-   } else if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+   } else if (!gup_pte_range(pmd, addr, next, flags, pages, nr))
return 0;
} while (pmdp++, addr = next, addr != end);
 
@@ -1725,7 +1725,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, 
unsigned long end,
 }
 
 static int 

[RESEND PATCH 7/7] IB/mthca: Use the new FOLL_LONGTERM flag to get_user_pages_fast()

2019-02-19 Thread ira . weiny
From: Ira Weiny 

Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
FS DAX pages being mapped.

Signed-off-by: Ira Weiny 
---
 drivers/infiniband/hw/mthca/mthca_memfree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c 
b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 112d2f38e0de..8ff0e90d7564 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,7 +472,8 @@ int mthca_map_user_db(struct mthca_dev *dev, struct 
mthca_uar *uar,
goto out;
}
 
-   ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
+   ret = get_user_pages_fast(uaddr & PAGE_MASK, 1,
+ FOLL_WRITE | FOLL_LONGTERM, pages);
if (ret < 0)
goto out;
 
-- 
2.20.1



[RESEND PATCH 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM

2019-02-19 Thread ira . weiny
From: Ira Weiny 

Rather than have a separate get_user_pages_longterm() call,
introduce FOLL_LONGTERM and change the longterm callers to use
it.

This patch does not change any functionality.

FOLL_LONGTERM can only be supported with get_user_pages() as it
requires vmas to determine if DAX is in use.
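
A rough sketch of why vmas matter here (vma_is_fsdax() is the real helper;
the surrounding logic is simplified and the function name is made up):

#include <linux/fs.h>
#include <linux/mm.h>

/* a long-term pin must reject FS DAX mappings, and only the vma can
 * identify one; the lockless fast path never sees vmas */
static bool longterm_pin_allowed(struct vm_area_struct *vma,
				 unsigned int gup_flags)
{
	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
		return false;
	return true;
}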

Signed-off-by: Ira Weiny 
---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   8 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   9 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  |   6 +-
 drivers/vfio/vfio_iommu_type1.c|   3 +-
 include/linux/mm.h |  13 +-
 mm/gup.c   | 138 -
 mm/gup_benchmark.c |   5 +-
 8 files changed, 101 insertions(+), 86 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index b69d3efa8712..120a40df91b4 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -185,10 +185,11 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
 
while (npages) {
	down_read(&mm->mmap_sem);
-   ret = get_user_pages_longterm(cur_base,
+   ret = get_user_pages(cur_base,
 min_t(unsigned long, npages,
   PAGE_SIZE / sizeof (struct page *)),
-gup_flags, page_list, vma_list);
+gup_flags | FOLL_LONGTERM,
+page_list, vma_list);
if (ret < 0) {
	up_read(&mm->mmap_sem);
goto umem_release;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index ef8bcf366ddc..1b9368261035 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -114,10 +114,10 @@ int qib_get_user_pages(unsigned long start_page, size_t 
num_pages,
 
	down_read(&current->mm->mmap_sem);
for (got = 0; got < num_pages; got += ret) {
-   ret = get_user_pages_longterm(start_page + got * PAGE_SIZE,
- num_pages - got,
- FOLL_WRITE | FOLL_FORCE,
- p + got, NULL);
+   ret = get_user_pages(start_page + got * PAGE_SIZE,
+num_pages - got,
+FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE,
+p + got, NULL);
if (ret < 0) {
	up_read(&current->mm->mmap_sem);
goto bail_release;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 06862a6af185..1d9a182ac163 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -143,10 +143,11 @@ static int usnic_uiom_get_pages(unsigned long addr, 
size_t size, int writable,
ret = 0;
 
while (npages) {
-   ret = get_user_pages_longterm(cur_base,
-   min_t(unsigned long, npages,
-   PAGE_SIZE / sizeof(struct page *)),
-   gup_flags, page_list, NULL);
+   ret = get_user_pages(cur_base,
+min_t(unsigned long, npages,
+PAGE_SIZE / sizeof(struct page *)),
+gup_flags | FOLL_LONGTERM,
+page_list, NULL);
 
if (ret < 0)
goto out;
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c 
b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 08929c087e27..870a2a526e0b 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -186,12 +186,12 @@ static int videobuf_dma_init_user_locked(struct 
videobuf_dmabuf *dma,
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
 
-   err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
-flags, dma->pages, NULL);
+   err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+flags | FOLL_LONGTERM, dma->pages, NULL);
 
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
-   dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+   dprintk(1, "get_user_pages: err=%d [%d]\n", err,
dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 73652e21efec..1500bd0bb6da 100644
--- 

[RESEND PATCH 6/7] IB/qib: Use the new FOLL_LONGTERM flag to get_user_pages_fast()

2019-02-19 Thread ira . weiny
From: Ira Weiny 

Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
FS DAX pages being mapped.

Signed-off-by: Ira Weiny 
---
 drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c 
b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 31c523b2a9f5..b53cc0240e02 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -673,7 +673,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata 
*dd,
else
j = npages;
 
-   ret = get_user_pages_fast(addr, j, 0, pages);
+   ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages);
if (ret != j) {
i = 0;
j = ret;
-- 
2.20.1



[RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it

2019-02-19 Thread ira . weiny
From: Ira Weiny 

Resending these, as I had only one minor comment, which I believe has been
addressed in this series.  I was anticipating these going through the mm
tree, as they depend on a cleanup patch there and the IB changes are very
minor, but they could just as well go through the IB tree.

NOTE: This series depends on my clean up patch to remove the write parameter
from gup_fast_permitted()[1]

HFI1, qib, and mthca use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping of FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[2]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and
remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/2/11/237
[2] https://lkml.org/lkml/2019/2/11/1789
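
At a call site the end result is just one extra flag; a sketch (the
wrapper function is made up):

#include <linux/mm.h>

static int pin_user_buffer(unsigned long addr, int nr_pages,
			   struct page **pages)
{
	/* the fast path keeps its performance, but with FOLL_LONGTERM it
	 * now refuses FS DAX pages instead of pinning them indefinitely */
	return get_user_pages_fast(addr, nr_pages,
				   FOLL_WRITE | FOLL_LONGTERM, pages);
}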

Ira Weiny (7):
  mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  mm/gup: Change write parameter to flags in fast walk
  mm/gup: Change GUP fast to use flags rather than a write 'bool'
  mm/gup: Add FOLL_LONGTERM capability to GUP fast
  IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  IB/qib: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  IB/mthca: Use the new FOLL_LONGTERM flag to get_user_pages_fast()

 arch/mips/mm/gup.c  |  11 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   4 +-
 arch/powerpc/kvm/e500_mmu.c |   2 +-
 arch/powerpc/mm/mmu_context_iommu.c |   4 +-
 arch/s390/kvm/interrupt.c   |   2 +-
 arch/s390/mm/gup.c  |  12 +-
 arch/sh/mm/gup.c|  11 +-
 arch/sparc/mm/gup.c |   9 +-
 arch/x86/kvm/paging_tmpl.h  |   2 +-
 arch/x86/kvm/svm.c  |   2 +-
 drivers/fpga/dfl-afu-dma-region.c   |   2 +-
 drivers/gpu/drm/via/via_dmablit.c   |   3 +-
 drivers/infiniband/core/umem.c  |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c |   5 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |   3 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  |   8 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |   2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c|   9 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c   |   6 +-
 drivers/misc/genwqe/card_utils.c|   2 +-
 drivers/misc/vmw_vmci/vmci_host.c   |   2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c |   6 +-
 drivers/platform/goldfish/goldfish_pipe.c   |   3 +-
 drivers/rapidio/devices/rio_mport_cdev.c|   4 +-
 drivers/sbus/char/oradax.c  |   2 +-
 drivers/scsi/st.c   |   3 +-
 drivers/staging/gasket/gasket_page_table.c  |   4 +-
 drivers/tee/tee_shm.c   |   2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c |   3 +-
 drivers/vfio/vfio_iommu_type1.c |   3 +-
 drivers/vhost/vhost.c   |   2 +-
 drivers/video/fbdev/pvr2fb.c|   2 +-
 drivers/virt/fsl_hypervisor.c   |   2 +-
 drivers/xen/gntdev.c|   2 +-
 fs/orangefs/orangefs-bufmap.c   |   2 +-
 include/linux/mm.h  |  17 +-
 kernel/futex.c  |   2 +-
 lib/iov_iter.c  |   7 +-
 mm/gup.c| 220 
 mm/gup_benchmark.c  |   5 +-
 mm/util.c   |   8 +-
 net/ceph/pagevec.c  |   2 +-
 net/rds/info.c  |   2 +-
 net/rds/rdma.c  |   3 +-
 44 files changed, 232 insertions(+), 180 deletions(-)

-- 
2.20.1



[RESEND PATCH 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()

2019-02-19 Thread ira . weiny
From: Ira Weiny 

Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
FS DAX pages being mapped.

Signed-off-by: Ira Weiny 
---
 drivers/infiniband/hw/hfi1/user_pages.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index 78ccacaf97d0..6a7f9cd5a94e 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -104,9 +104,11 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
bool writable, struct page **pages)
 {
int ret;
+   unsigned int gup_flags = writable ? FOLL_WRITE : 0;
 
-   ret = get_user_pages_fast(vaddr, npages, writable ? FOLL_WRITE : 0,
- pages);
+   gup_flags |= FOLL_LONGTERM;
+
+   ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
if (ret < 0)
return ret;
 
-- 
2.20.1



[PATCH] powerpc/kvm: Save and restore AMR instead of zeroing

2019-02-19 Thread Russell Currey
When using the hash MMU on P7+, the AMR is used for pkeys.  It's important
that the host and guest never end up with each other's AMR value, since
this could disrupt operations and break things.

The AMR gets correctly restored on context switch, however before this
happens (i.e. in a program like qemu) having the host value of the AMR
be zero would interfere with that program using pkeys.

In addition, the AMR on Radix can control kernel access to userspace
data, which you wouldn't want to be zeroed.

So, just save and restore it like the other registers that get saved and
restored.
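
Conceptually the change is the following (a C sketch of what the asm below
does; run_guest() is a stand-in, and the host AMR actually lives in a stack
slot):

#include <asm/reg.h>

static void hv_entry_exit_amr_sketch(void)
{
	/* on guest entry: remember the host AMR instead of assuming zero */
	unsigned long host_amr = mfspr(SPRN_AMR);

	run_guest();	/* hypothetical; the guest AMR is saved to the vcpu */

	/* on guest exit: restore the host value rather than writing 0 */
	mtspr(SPRN_AMR, host_amr);
}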

Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Cc:  # v4.16+
Signed-off-by: Russell Currey 
---
I'm not entirely sure the stack frame numbers are correct, I've tested it
and it works but it'd be good if someone could double check this.

 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9b8d50a7cbaf..6291751c4ad9 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -47,7 +47,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 #define NAPPING_NOVCPU 2
 
 /* Stack frame offsets for kvmppc_hv_entry */
-#define SFS			208
+#define SFS			224	/* must be divisible by 16 */
 #define STACK_SLOT_TRAP		(SFS-4)
 #define STACK_SLOT_SHORT_PATH  (SFS-8)
 #define STACK_SLOT_TID (SFS-16)
@@ -58,8 +58,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 #define STACK_SLOT_DAWR(SFS-56)
 #define STACK_SLOT_DAWRX   (SFS-64)
 #define STACK_SLOT_HFSCR   (SFS-72)
+#define STACK_SLOT_AMR (SFS-80)
 /* the following is used by the P9 short path */
-#define STACK_SLOT_NVGPRS  (SFS-152)   /* 18 gprs */
+#define STACK_SLOT_NVGPRS  (SFS-160)   /* 18 gprs */
 
 /*
  * Call kvmppc_hv_entry in real mode.
@@ -743,6 +744,9 @@ BEGIN_FTR_SECTION
std r7, STACK_SLOT_DAWRX(r1)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
+   mfspr   r5, SPRN_AMR
+   std r5, STACK_SLOT_AMR(r1)
+
 BEGIN_FTR_SECTION
/* Set partition DABR */
/* Do this before re-enabling PMU to avoid P7 DABR corruption bug */
@@ -1640,13 +1644,14 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 8:
 
-   /* Save and reset AMR and UAMOR before turning on the MMU */
+   /* Save and restore/reset AMR and UAMOR before turning on the MMU */
mfspr   r5,SPRN_AMR
mfspr   r6,SPRN_UAMOR
std r5,VCPU_AMR(r9)
std r6,VCPU_UAMOR(r9)
+   ld  r5,STACK_SLOT_AMR(r1)
li  r6,0
-   mtspr   SPRN_AMR,r6
+   mtspr   SPRN_AMR, r5
mtspr   SPRN_UAMOR, r6
 
/* Switch DSCR back to host value */
-- 
2.20.1



Re: [PATCH v2 2/2] powerpc: Enable kcov

2019-02-19 Thread Andrew Donnellan

On 20/2/19 3:26 pm, Daniel Axtens wrote:

I needed the following diff to get this booting on a T4240RDB:

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 961f44eabb65..fbe9894d6305 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -3,6 +3,10 @@
  # Makefile for the linux kernel.
  #
  
+KCOV_INSTRUMENT_cputable.o := n
+KCOV_INSTRUMENT_setup_64.o := n
+KCOV_INSTRUMENT_paca.o := n
+
  CFLAGS_ptrace.o	+= -DUTS_MACHINE='"$(UTS_MACHINE)"'
  
  # Disable clang warning for using setjmp without setjmp.h header

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index f965fc33a8b7..0140e7e12c29 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,6 +5,9 @@
  
  ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)

+KCOV_INSTRUMENT_tlb_nohash.o := n
+KCOV_INSTRUMENT_fsl_booke_mmu.o := n
+
  CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
  
  obj-y  := fault.o mem.o pgtable.o mmap.o \



The change to kernel/ is required to get the kernel to even begin to
boot, and the change to mm/ is required to successfully set up SMP. I'm
not sure precisely why they cause issues.


Thanks for testing this - I'll roll this into v3.



I was then able to run kcovtrace and the results seem to make sense.

Perhaps in the future some further stuff should be trimmed down to make
the coverage results less noisy (restore_math is probably not telling us
anything interesting, for example), but certainly this is a great start.


I think syzkaller (as the main kcov consumer) can probably cope...



With those changes,
Tested-by: Daniel Axtens  # e6500



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v2 2/2] powerpc: Enable kcov

2019-02-19 Thread Daniel Axtens
Hi Andrew,

> kcov provides kernel coverage data that's useful for fuzzing tools like
> syzkaller.
>
> Wire up kcov support on powerpc. Disable kcov instrumentation on the same
> files where we currently disable gcov and UBSan instrumentation.
>
> Signed-off-by: Andrew Donnellan 
> Acked-by: Dmitry Vyukov 

I needed the following diff to get this booting on a T4240RDB:

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 961f44eabb65..fbe9894d6305 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -3,6 +3,10 @@
 # Makefile for the linux kernel.
 #
 
+KCOV_INSTRUMENT_cputable.o := n
+KCOV_INSTRUMENT_setup_64.o := n
+KCOV_INSTRUMENT_paca.o := n
+
 CFLAGS_ptrace.o	+= -DUTS_MACHINE='"$(UTS_MACHINE)"'
 
 # Disable clang warning for using setjmp without setjmp.h header
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index f965fc33a8b7..0140e7e12c29 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,6 +5,9 @@
 
 ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
 
+KCOV_INSTRUMENT_tlb_nohash.o := n
+KCOV_INSTRUMENT_fsl_booke_mmu.o := n
+
 CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
 
 obj-y  := fault.o mem.o pgtable.o mmap.o \


The change to kernel/ is required to get the kernel to even begin to
boot, and the change to mm/ is required to successfully set up SMP. I'm
not sure precisely why they cause issues.

I was then able to run kcovtrace and the results seem to make sense.

Perhaps in the future some further stuff should be trimmed down to make
the coverage results less noisy (restore_math is probably not telling us
anything interesting, for example), but certainly this is a great start.

With those changes,
Tested-by: Daniel Axtens  # e6500

Regards,

> ---
>  arch/powerpc/Kconfig| 1 +
>  arch/powerpc/kernel/Makefile| 7 ++-
>  arch/powerpc/kernel/trace/Makefile  | 3 ++-
>  arch/powerpc/kernel/vdso32/Makefile | 1 +
>  arch/powerpc/kernel/vdso64/Makefile | 1 +
>  arch/powerpc/xmon/Makefile  | 1 +
>  6 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2890d36eb531..d3698dae0e60 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -134,6 +134,7 @@ config PPC
>   select ARCH_HAS_ELF_RANDOMIZE
>   select ARCH_HAS_FORTIFY_SOURCE
>   select ARCH_HAS_GCOV_PROFILE_ALL
> + select ARCH_HAS_KCOV
>   select ARCH_HAS_PHYS_TO_DMA
>   select ARCH_HAS_PMEM_API	if PPC64
>   select ARCH_HAS_PTE_SPECIAL
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index cb7f0bb9ee71..961f44eabb65 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -142,16 +142,21 @@ endif
>  obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o
>  obj-$(CONFIG_KVM_GUEST)  += kvm.o kvm_emul.o
>  
> -# Disable GCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_prom_init.o := n
> +KCOV_INSTRUMENT_prom_init.o := n
>  UBSAN_SANITIZE_prom_init.o := n
>  GCOV_PROFILE_machine_kexec_64.o := n
> +KCOV_INSTRUMENT_machine_kexec_64.o := n
>  UBSAN_SANITIZE_machine_kexec_64.o := n
>  GCOV_PROFILE_machine_kexec_32.o := n
> +KCOV_INSTRUMENT_machine_kexec_32.o := n
>  UBSAN_SANITIZE_machine_kexec_32.o := n
>  GCOV_PROFILE_kprobes.o := n
> +KCOV_INSTRUMENT_kprobes.o := n
>  UBSAN_SANITIZE_kprobes.o := n
>  GCOV_PROFILE_kprobes-ftrace.o := n
> +KCOV_INSTRUMENT_kprobes-ftrace.o := n
>  UBSAN_SANITIZE_kprobes-ftrace.o := n
>  UBSAN_SANITIZE_vdso.o := n
>  
> diff --git a/arch/powerpc/kernel/trace/Makefile 
> b/arch/powerpc/kernel/trace/Makefile
> index b1725ad3e13d..858503775c58 100644
> --- a/arch/powerpc/kernel/trace/Makefile
> +++ b/arch/powerpc/kernel/trace/Makefile
> @@ -23,6 +23,7 @@ obj-$(CONFIG_TRACING)   += trace_clock.o
>  obj-$(CONFIG_PPC64)  += $(obj64-y)
>  obj-$(CONFIG_PPC32)  += $(obj32-y)
>  
> -# Disable GCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>  GCOV_PROFILE_ftrace.o := n
> +KCOV_INSTRUMENT_ftrace.o := n
>  UBSAN_SANITIZE_ftrace.o := n
> diff --git a/arch/powerpc/kernel/vdso32/Makefile 
> b/arch/powerpc/kernel/vdso32/Makefile
> index 50112d4473bb..ce199f6e4256 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -23,6 +23,7 @@ targets := $(obj-vdso32) vdso32.so vdso32.so.dbg
>  obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>  
>  GCOV_PROFILE := n
> +KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  
>  ccflags-y := -shared -fno-common -fno-builtin
> diff --git a/arch/powerpc/kernel/vdso64/Makefile 
> b/arch/powerpc/kernel/vdso64/Makefile
> index 69cecb346269..28e7d112aa2f 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ 

RE: [PATCHv6 3/4] pci: layerscape: Add the EP mode support.

2019-02-19 Thread Xiaowei Bao


-Original Message-
From: Lorenzo Pieralisi  
Sent: 2019年2月19日 19:27
To: Xiaowei Bao 
Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com; 
shawn...@kernel.org; Leo Li ; kis...@ti.com; a...@arndb.de; 
gre...@linuxfoundation.org; M.h. Lian ; Mingkai Hu 
; Roy Zang ; 
kstew...@linuxfoundation.org; cyrille.pitc...@free-electrons.com; 
pombreda...@nexb.com; shawn@rock-chips.com; linux-...@vger.kernel.org; 
devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCHv6 3/4] pci: layerscape: Add the EP mode support.

On Tue, Jan 22, 2019 at 02:33:27PM +0800, Xiaowei Bao wrote:
> Add the PCIe EP mode support for layerscape platform.
> 
> Signed-off-by: Xiaowei Bao 
> Reviewed-by: Minghuan Lian 
> Reviewed-by: Zhiqiang Hou 
> Reviewed-by: Kishon Vijay Abraham I 
> ---
> depends on: https://patchwork.kernel.org/project/linux-pci/list/?series=66177
> 
> v2:
>  - remove the EP mode check function.
> v3:
>  - modify the return value when entering the default case.
> v4:
>  - no change.
> v5:
>  - no change.
> v6:
>  - modify the code base on the submit patch of the EP framework.

Can I apply this series to my pci/endpoint branch (where I queued Kishon's EP 
features rework patches) ? Can you check please ?
[Xiaowei Bao] Of course. However, I found a compile warning in my patch after 
this series was approved by you: the unused variable "struct pci_epc *epc = 
ep->epc;" in the ls_pcie_ep_init() function. I am not sure how best to proceed, 
could you help me remove this line? Thanks a lot.
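
(For the archives: the fixup being asked for is presumably just dropping
the unused local from ls_pcie_ep_init(), i.e.

	-	struct pci_epc *epc = ep->epc;

since nothing else in that function uses epc.)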

Thanks,
Lorenzo

>  drivers/pci/controller/dwc/Makefile|2 +-
>  drivers/pci/controller/dwc/pci-layerscape-ep.c |  157 
> 
>  2 files changed, 158 insertions(+), 1 deletions(-)  create mode 
> 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c
> 
> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index 7bcdcdf..b5f3b83 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone.o
> -obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
> +obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
>  obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
>  obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
> obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> new file mode 100644
> index 000..ddc2dbb
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -0,0 +1,157 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe controller EP driver for Freescale Layerscape SoCs
> + *
> + * Copyright (C) 2018 NXP Semiconductor.
> + *
> + * Author: Xiaowei Bao
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "pcie-designware.h"
> +
> +#define PCIE_DBI2_OFFSET 0x1000  /* DBI2 base address*/
> +
> +struct ls_pcie_ep {
> + struct dw_pcie  *pci;
> +};
> +
> +#define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> +
> +static int ls_pcie_establish_link(struct dw_pcie *pci)
> +{
> + return 0;
> +}
> +
> +static const struct dw_pcie_ops ls_pcie_ep_ops = {
> + .start_link = ls_pcie_establish_link,
> +};
> +
> +static const struct of_device_id ls_pcie_ep_of_match[] = {
> + { .compatible = "fsl,ls-pcie-ep",},
> + { },
> +};
> +
> +static const struct pci_epc_features ls_pcie_epc_features = {
> + .linkup_notifier = false,
> + .msi_capable = true,
> + .msix_capable = false,
> +};
> +
> +static const struct pci_epc_features*
> +ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
> +{
> + return &ls_pcie_epc_features;
> +}
> +
> +static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct pci_epc *epc = ep->epc;
> + enum pci_barno bar;
> +
> + for (bar = BAR_0; bar <= BAR_5; bar++)
> + dw_pcie_ep_reset_bar(pci, bar);
> +}
> +
> +static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> + enum pci_epc_irq_type type, u16 interrupt_num)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +
> + switch (type) {
> + case PCI_EPC_IRQ_LEGACY:
> + return dw_pcie_ep_raise_legacy_irq(ep, func_no);
> + case PCI_EPC_IRQ_MSI:
> + return dw_pcie_ep_raise_msi_irq(ep, 

[PATCH 1/2] KVM: PPC: Book3S HV: Simplify machine check handling

2019-02-19 Thread Paul Mackerras
This makes the handling of machine check interrupts that occur inside
a guest simpler and more robust, with less done in assembler code and
in real mode.

Now, when a machine check occurs inside a guest, we always get the
machine check event struct and put a copy in the vcpu struct for the
vcpu where the machine check occurred.  We no longer call
machine_check_queue_event() from kvmppc_realmode_mc_power7(), because
on POWER8, when a vcpu is running on an offline secondary thread and
we call machine_check_queue_event(), that calls irq_work_queue(),
which doesn't work because the CPU is offline, but instead triggers
the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which
fires again and again because nothing clears the condition).

All that machine_check_queue_event() actually does is to cause the
event to be printed to the console.  For a machine check occurring in
the guest, we now print the event in kvmppc_handle_exit_hv()
instead.

The assembly code at label machine_check_realmode now just calls C
code and then continues exiting the guest.  We no longer either
synthesize a machine check for the guest in assembly code or return
to the guest without a machine check.

The code in kvmppc_handle_exit_hv() is extended to handle the case
where the guest is not FWNMI-capable.  In that case we now always
synthesize a machine check interrupt for the guest.  Previously, if
the host thinks it has recovered the machine check fully, it would
return to the guest without any notification that the machine check
had occurred.  If the machine check was caused by some action of the
guest (such as creating duplicate SLB entries), it is much better to
tell the guest that it has caused a problem.  Therefore we now always
generate a machine check interrupt for guests that are not
FWNMI-capable.
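
Sketched in C, the non-FWNMI case described above ends up roughly as
follows. This is a paraphrase, not the verbatim patch (the hunk below is
cut off mid-line), and the exact SRR1 flag mask is an assumption:

	case BOOK3S_INTERRUPT_MACHINE_CHECK:
		/* Print the MCE event to host console. */
		machine_check_print_event_info(&vcpu->arch.mce_evt, false);

		if (!vcpu->kvm->arch.fwnmi_enabled) {
			/*
			 * Guest can't take an FWNMI: synthesize a 0x200
			 * machine check interrupt, passing the SRR1 flag
			 * bits taken from the guest MSR.
			 */
			ulong flags = vcpu->arch.shregs.msr & 0x083c0000;

			kvmppc_core_queue_machine_check(vcpu, flags);
			r = RESUME_GUEST;
			break;
		}
		/*
		 * Otherwise fall through to the existing path that exits
		 * to userspace (KVM_EXIT_NMI) so it can deliver an FWNMI.
		 */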

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_ppc.h  |  3 +-
 arch/powerpc/kvm/book3s.c   |  7 +
 arch/powerpc/kvm/book3s_hv.c| 18 +--
 arch/powerpc/kvm/book3s_hv_ras.c| 56 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 40 ++-
 5 files changed, 42 insertions(+), 82 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index b3bf4f6..d283d31 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -143,6 +143,7 @@ extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_machine_check(struct kvm_vcpu *vcpu, ulong 
flags);
 extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags);
 extern void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_vec_unavail(struct kvm_vcpu *vcpu);
@@ -646,7 +647,7 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
 unsigned int yield_count);
 long kvmppc_h_random(struct kvm_vcpu *vcpu);
 void kvmhv_commence_exit(int trap);
-long kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu);
+void kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu);
 void kvmppc_subcore_enter_guest(void);
 void kvmppc_subcore_exit_guest(void);
 long kvmppc_realmode_hmi_handler(void);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 22a46c6..10c5579 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -195,6 +195,13 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, 
unsigned int vec)
 }
 EXPORT_SYMBOL_GPL(kvmppc_book3s_queue_irqprio);
 
+void kvmppc_core_queue_machine_check(struct kvm_vcpu *vcpu, ulong flags)
+{
+   /* might as well deliver this straight away */
+   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_MACHINE_CHECK, flags);
+}
+EXPORT_SYMBOL_GPL(kvmppc_core_queue_machine_check);
+
 void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
/* might as well deliver this straight away */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1860c0b..d8bf05a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1215,6 +1215,22 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
+   /* Print the MCE event to host console. */
+   machine_check_print_event_info(&vcpu->arch.mce_evt, false);
+
+   /*
+* If the guest can do FWNMI, exit to userspace so it can
+* deliver a FWNMI to the guest.
+* Otherwise we synthesize a machine check for the guest
+* so that it knows that the machine check occurred.
+*/
+   if (!vcpu->kvm->arch.fwnmi_enabled) {
+   ulong flags = vcpu->arch.shregs.msr 

[PATCH 2/2] powerpc/64s: Better printing of machine check info for guest MCEs

2019-02-19 Thread Paul Mackerras
This adds an "in_guest" parameter to machine_check_print_event_info()
so that we can avoid trying to translate guest NIP values into
symbolic form using the host kernel's symbol table.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/mce.h| 2 +-
 arch/powerpc/kernel/mce.c | 8 +---
 arch/powerpc/kvm/book3s_hv.c  | 4 ++--
 arch/powerpc/platforms/powernv/opal.c | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index a8b8903..17996bc 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -209,7 +209,7 @@ extern int get_mce_event(struct machine_check_event *mce, 
bool release);
 extern void release_mce_event(void);
 extern void machine_check_queue_event(void);
 extern void machine_check_print_event_info(struct machine_check_event *evt,
-  bool user_mode);
+  bool user_mode, bool in_guest);
 #ifdef CONFIG_PPC_BOOK3S_64
 void flush_and_reload_slb(void);
 #endif /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index bd933a7..d01b690 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -301,13 +301,13 @@ static void machine_check_process_queued_event(struct 
irq_work *work)
while (__this_cpu_read(mce_queue_count) > 0) {
index = __this_cpu_read(mce_queue_count) - 1;
evt = this_cpu_ptr(&mce_event_queue[index]);
-   machine_check_print_event_info(evt, false);
+   machine_check_print_event_info(evt, false, false);
__this_cpu_dec(mce_queue_count);
}
 }
 
 void machine_check_print_event_info(struct machine_check_event *evt,
-   bool user_mode)
+   bool user_mode, bool in_guest)
 {
const char *level, *sevstr, *subtype;
static const char *mc_ue_types[] = {
@@ -387,7 +387,9 @@ void machine_check_print_event_info(struct 
machine_check_event *evt,
   evt->disposition == MCE_DISPOSITION_RECOVERED ?
   "Recovered" : "Not recovered");
 
-   if (user_mode) {
+   if (in_guest) {
+   printk("%s  Guest NIP: %016llx\n", evt->srr0);
+   } else if (user_mode) {
printk("%s  NIP: [%016llx] PID: %d Comm: %s\n", level,
evt->srr0, current->pid, current->comm);
} else {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d8bf05a..81cba4b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1216,7 +1216,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
/* Print the MCE event to host console. */
-   machine_check_print_event_info(&vcpu->arch.mce_evt, false);
+   machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
 
/*
 * If the guest can do FWNMI, exit to userspace so it can
@@ -1406,7 +1406,7 @@ static int kvmppc_handle_nested_exit(struct kvm_run *run, 
struct kvm_vcpu *vcpu)
/* Pass the machine check to the L1 guest */
r = RESUME_HOST;
/* Print the MCE event to host console. */
-   machine_check_print_event_info(&vcpu->arch.mce_evt, false);
+   machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
break;
/*
 * We get these next two if the guest accesses a page which it thinks
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 79586f1..05c85be 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -587,7 +587,7 @@ int opal_machine_check(struct pt_regs *regs)
   evt.version);
return 0;
}
-   machine_check_print_event_info(&evt, user_mode(regs));
+   machine_check_print_event_info(&evt, user_mode(regs), false);
 
if (opal_recover_mce(regs, &evt))
return 1;
-- 
2.7.4



Re: [PATCH v6] powerpc/64s: reimplement book3s idle code in C

2019-02-19 Thread Paul Mackerras
On Tue, Feb 19, 2019 at 02:13:51PM +1000, Nicholas Piggin wrote:
> Paul Mackerras's on February 18, 2019 9:06 am:
> > On Sat, Oct 13, 2018 at 10:04:09PM +1000, Nicholas Piggin wrote:
> >> Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
> >> speific HV idle code to the powernv platform code.
> >> 
> > 
> > [...]
> > 
> >> @@ -2760,21 +2744,47 @@ BEGIN_FTR_SECTION
> >>li  r4, LPCR_PECE_HVEE@higher
> >>sldir4, r4, 32
> >>or  r5, r5, r4
> >> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> >> +FTR_SECTION_ELSE
> >> +  li  r3, PNV_THREAD_NAP
> >> +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> >>mtspr   SPRN_LPCR,r5
> >>isync
> >> -  li  r0, 0
> >> -  std r0, HSTATE_SCRATCH0(r13)
> >> -  ptesync
> >> -  ld  r0, HSTATE_SCRATCH0(r13)
> >> -1:cmpdr0, r0
> >> -  bne 1b
> >> +
> >> +  mr  r0, r1
> >> +  ld  r1, PACAEMERGSP(r13)
> >> +  subir1, r1, STACK_FRAME_OVERHEAD
> >> +  std r0, 0(r1)
> >> +  ld  r0, PACAR1(r13)
> >> +  std r0, 8(r1)
> > 
> > This bit seems wrong to me.  If this is a secondary thread on POWER8,
> > we were already on the emergency stack, and now we've reset r1 back to
> > the top of the emergency stack and we're overwriting it.
> 
> I'll have to find some time to take another look at this stuff. The KVM
> stuff was a bit hasty.
> 
> > I wonder why you didn't see secondary threads going off into lala land
> > in your tests?
> 
> It must be because I wasn't testing guest SMT properly -- I did get it
> to break trivially sometime after posting this patch. So we were already
> on the emergency stack here; that should make things easier, and may be
> what's wrong.

In fact I don't see why you need to load up a new stack here at all;
you could just use whatever stack we're currently on AFAICS.

Paul.


Re: [PATCH 1/6] powerpc sstep: Add maddhd, maddhdu, maddld instruction emulation

2019-02-19 Thread Michael Ellerman
Sandipan Das  writes:

> This adds emulation support for the following integer instructions:
>   * Multiply-Add High Doubleword (maddhd)
>   * Multiply-Add High Doubleword Unsigned (maddhdu)
>   * Multiply-Add Low Doubleword (maddld)

This doesn't build with old binutils.

{standard input}:2089: Error: Unrecognized opcode: `maddld'
{standard input}:2104: Error: Unrecognized opcode: `maddhdu'
{standard input}:1141: Error: Unrecognized opcode: `maddhd'


You'll need to add hand built versions, see ppc-opcode.h for examples.
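
For these opcode-4 VA-form instructions that means something along the
lines of the sketch below. The encodings are derived from ISA 3.0
(primary opcode 4, extended opcode in the low six bits, RC field at bit
6), and the ___PPC_RC() helper is assumed not to exist yet -- double-check
both against the ISA and the header:

	#define ___PPC_RC(c)		(((c) & 0x1f) << 6)

	#define PPC_INST_MADDHD		0x10000030
	#define PPC_INST_MADDHDU	0x10000031
	#define PPC_INST_MADDLD		0x10000033

	#define PPC_MADDHD(t, a, b, c)	stringify_in_c(.long PPC_INST_MADDHD | \
						___PPC_RT(t) | ___PPC_RA(a) |  \
						___PPC_RB(b) | ___PPC_RC(c))

The inline asm in sstep.c then uses PPC_MADDHD(%0, %1, %2, %3) in place
of the mnemonic, so old assemblers only ever see a .long.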

cheers

> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index d81568f783e5..b40ec18515bd 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1169,7 +1169,7 @@ static nokprobe_inline int trap_compare(long v1, long 
> v2)
>  int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
> unsigned int instr)
>  {
> - unsigned int opcode, ra, rb, rd, spr, u;
> + unsigned int opcode, ra, rb, rc, rd, spr, u;
>   unsigned long int imm;
>   unsigned long int val, val2;
>   unsigned int mb, me, sh;
> @@ -1292,6 +1292,7 @@ int analyse_instr(struct instruction_op *op, const 
> struct pt_regs *regs,
>   rd = (instr >> 21) & 0x1f;
>   ra = (instr >> 16) & 0x1f;
>   rb = (instr >> 11) & 0x1f;
> + rc = (instr >> 6) & 0x1f;
>  
>   switch (opcode) {
>  #ifdef __powerpc64__
> @@ -1305,6 +1306,38 @@ int analyse_instr(struct instruction_op *op, const 
> struct pt_regs *regs,
>   goto trap;
>   return 1;
>  
> +#ifdef __powerpc64__
> + case 4:
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + return -1;
> +
> + switch (instr & 0x3f) {
> + case 48:/* maddhd */
> + asm("maddhd %0,%1,%2,%3" : "=r" (op->val) :
> + "r" (regs->gpr[ra]), "r" (regs->gpr[rb]),
> + "r" (regs->gpr[rc]));
> + goto compute_done;
> +
> + case 49:/* maddhdu */
> + asm("maddhdu %0,%1,%2,%3" : "=r" (op->val) :
> + "r" (regs->gpr[ra]), "r" (regs->gpr[rb]),
> + "r" (regs->gpr[rc]));
> + goto compute_done;
> +
> + case 51:/* maddld */
> + asm("maddld %0,%1,%2,%3" : "=r" (op->val) :
> + "r" (regs->gpr[ra]), "r" (regs->gpr[rb]),
> + "r" (regs->gpr[rc]));
> + goto compute_done;
> + }
> +
> + /*
> +  * There are other instructions from ISA 3.0 with the same
> +  * primary opcode which do not have emulation support yet.
> +  */
> + return -1;
> +#endif
> +
>   case 7: /* mulli */
>   op->val = regs->gpr[ra] * (short) instr;
>   goto compute_done;
> -- 
> 2.14.4


Re: [PATCH 2/6] powerpc sstep: Add darn instruction emulation

2019-02-19 Thread Michael Ellerman
Sandipan Das  writes:

> This adds emulation support for the following integer instructions:
>   * Deliver A Random Number (darn)

This doesn't build with old binutils. We need to support old binutils.

{standard input}:4343: Error: Unrecognized opcode: `darn'


You need to use PPC_DARN().
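
i.e. something like this in sstep.c (a sketch; PPC_DARN(t, l) expands to
a raw .long, and %0 still works inside it because the compiler
substitutes a bare register number before the assembler evaluates the
expression):

	case 0:
		/* 32-bit conditioned random number */
		asm volatile(PPC_DARN(%0, 0) : "=r" (op->val));
		goto compute_done;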

cheers

> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index b40ec18515bd..18ac0a26c4fc 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1728,6 +1728,25 @@ int analyse_instr(struct instruction_op *op, const 
> struct pt_regs *regs,
>   (int) regs->gpr[rb];
>   goto arith_done;
>  
> + case 755:   /* darn */
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + return -1;
> + switch (ra & 0x3) {
> + case 0:
> + asm("darn %0,0" : "=r" (op->val));
> + goto compute_done;
> +
> + case 1:
> + asm("darn %0,1" : "=r" (op->val));
> + goto compute_done;
> +
> + case 2:
> + asm("darn %0,2" : "=r" (op->val));
> + goto compute_done;
> + }
> +
> + return -1;
> +
>  
>  /*
>   * Logical instructions
> -- 
> 2.14.4


Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-19 Thread Andrea Parri
On Mon, Feb 11, 2019 at 03:38:59PM +0100, Petr Mladek wrote:
> On Mon 2019-02-11 13:50:35, Andrea Parri wrote:
> > Hi Michael,
> > 
> > 
> > On Thu, Feb 07, 2019 at 11:46:29PM +1100, Michael Ellerman wrote:
> > > Arch code can set a "dump stack arch description string" which is
> > > displayed with oops output to describe the hardware platform.
> > > 
> > > It is useful to initialise this as early as possible, so that an early
> > > oops will have the hardware description.
> > > 
> > > However in practice we discover the hardware platform in stages, so it
> > > would be useful to be able to incrementally fill in the hardware
> > > description as we discover it.
> > > 
> > > This patch adds that ability, by creating dump_stack_add_arch_desc().
> > > 
> > > If there is no existing string it behaves exactly like
> > > dump_stack_set_arch_desc(). However if there is an existing string it
> > > appends to it, with a leading space.
> > > 
> > > This makes it easy to call it multiple times from different parts of the
> > > code and get a reasonable looking result.
> > > 
> > > Signed-off-by: Michael Ellerman 
> > > ---
> > >  include/linux/printk.h |  5 
> > >  lib/dump_stack.c   | 58 ++
> > >  2 files changed, 63 insertions(+)
> > > 
> > > v3: No change, just widened Cc list.
> > > 
> > > v2: Add a smp_wmb() and comment.
> > > 
> > > v1 is here for reference 
> > > https://lore.kernel.org/lkml/1430824337-15339-1-git-send-email-...@ellerman.id.au/
> > > 
> > > I'll take this series via the powerpc tree if no one minds?
> > > 
> > > 
> > > diff --git a/include/linux/printk.h b/include/linux/printk.h
> > > index 77740a506ebb..d5fb4f960271 100644
> > > --- a/include/linux/printk.h
> > > +++ b/include/linux/printk.h
> > > @@ -198,6 +198,7 @@ u32 log_buf_len_get(void);
> > >  void log_buf_vmcoreinfo_setup(void);
> > >  void __init setup_log_buf(int early);
> > >  __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
> > > +__printf(1, 2) void dump_stack_add_arch_desc(const char *fmt, ...);
> > >  void dump_stack_print_info(const char *log_lvl);
> > >  void show_regs_print_info(const char *log_lvl);
> > >  extern asmlinkage void dump_stack(void) __cold;
> > > @@ -256,6 +257,10 @@ static inline __printf(1, 2) void 
> > > dump_stack_set_arch_desc(const char *fmt, ...)
> > >  {
> > >  }
> > >  
> > > +static inline __printf(1, 2) void dump_stack_add_arch_desc(const char 
> > > *fmt, ...)
> > > +{
> > > +}
> > > +
> > >  static inline void dump_stack_print_info(const char *log_lvl)
> > >  {
> > >  }
> > > diff --git a/lib/dump_stack.c b/lib/dump_stack.c
> > > index 5cff72f18c4a..69b710ff92b5 100644
> > > --- a/lib/dump_stack.c
> > > +++ b/lib/dump_stack.c
> > > @@ -35,6 +35,64 @@ void __init dump_stack_set_arch_desc(const char *fmt, 
> > > ...)
> > >   va_end(args);
> > >  }
> > >  
> > > +/**
> > > + * dump_stack_add_arch_desc - add arch-specific info to show with task 
> > > dumps
> > > + * @fmt: printf-style format string
> > > + * @...: arguments for the format string
> > > + *
> > > + * See dump_stack_set_arch_desc() for why you'd want to use this.
> > > + *
> > > + * This version adds to any existing string already created with either
> > > + * dump_stack_set_arch_desc() or dump_stack_add_arch_desc(). If there is 
> > > an
> > > + * existing string a space will be prepended to the passed string.
> > > + */
> > > +void __init dump_stack_add_arch_desc(const char *fmt, ...)
> > > +{
> > > + va_list args;
> > > + int pos, len;
> > > + char *p;
> > > +
> > > + /*
> > > +  * If there's an existing string we snprintf() past the end of it, and
> > > +  * then turn the terminating NULL of the existing string into a space
> > > +  * to create one string separated by a space.
> > > +  *
> > > +  * If there's no existing string we just snprintf() to the buffer, like
> > > +  * dump_stack_set_arch_desc(), but without calling it because we'd need
> > > +  * a varargs version.
> > > +  */
> > > + len = strnlen(dump_stack_arch_desc_str, 
> > > sizeof(dump_stack_arch_desc_str));
> > > + pos = len;
> > > +
> > > + if (len)
> > > + pos++;
> > > +
> > > + if (pos >= sizeof(dump_stack_arch_desc_str))
> > > + return; /* Ran out of space */
> > > +
> > > + p = &dump_stack_arch_desc_str[pos];
> > > +
> > > + va_start(args, fmt);
> > > + vsnprintf(p, sizeof(dump_stack_arch_desc_str) - pos, fmt, args);
> > > + va_end(args);
> > > +
> > > + if (len) {
> > > + /*
> > > +  * Order the stores above in vsnprintf() vs the store of the
> > > +  * space below which joins the two strings. Note this doesn't
> > > +  * make the code truly race free because there is no barrier on
> > > +  * the read side. ie. Another CPU might load the uninitialised
> > > +  * tail of the buffer first and then the space below (rather
> > > +  * than the NULL that was there previously), and so print the
> > > +  * uninitialised tail. 

Re: [PATCH][next] ptp_qoriq: don't pass a large struct by value but instead pass it by reference

2019-02-19 Thread David Miller
From: Colin King 
Date: Tue, 19 Feb 2019 14:21:20 +

> From: Colin Ian King 
> 
> Passing the struct ptp_clock_info caps by parameter is passing over 130 bytes
> of data by value on the stack. Optimize this by passing it by reference 
> instead.
> Also shinks the object code size:
> 
> Before:
>text  data bss dec hex filename
>   12596  2160  64   1482039e4 drivers/ptp/ptp_qoriq.o
> 
> After:
>text  data bss dec hex filename
>   12567  2160  64   1479139c7 drivers/ptp/ptp_qoriq.o
> 
> Signed-off-by: Colin Ian King 

Looks good, applied, thanks.


Re: [PATCH 02/11] riscv: remove the HAVE_KPROBES option

2019-02-19 Thread Palmer Dabbelt

On Tue, 19 Feb 2019 07:17:59 PST (-0800), Christoph Hellwig wrote:

On Fri, Feb 15, 2019 at 06:32:07PM +0900, Masahiro Yamada wrote:

On Thu, Feb 14, 2019 at 2:40 AM Christoph Hellwig  wrote:
>
> HAVE_KPROBES is defined genericly in arch/Kconfig and architectures
> should just select it if supported.
>
> Signed-off-by: Christoph Hellwig 

Do you want this patch picked up by me?

Or, by Palmer?


Given that I don't think I'll have the rest of this series respun in time
for this merge window:  Palmer, can you pick it up?


It's on my for-next.


Re: [PATCH kernel] vfio/spapr_tce: Skip unsetting already unset table

2019-02-19 Thread Alex Williamson
On Wed, 13 Feb 2019 11:18:21 +1100
Alexey Kardashevskiy  wrote:

> On 13/02/2019 07:52, Alex Williamson wrote:
> > On Mon, 11 Feb 2019 18:49:17 +1100
> > Alexey Kardashevskiy  wrote:
> >   
> >> VFIO TCE IOMMU v2 owns IOMMU tables so when detach a IOMMU group from
> >> a container, we need to unset those from a group so we call unset_window()
> >> so do we unconditionally. We also unset tables when removing a DMA window  
> > 
> > Patch looks ok, but this first sentence trails off into a bit of a word
> > salad.  Care to refine a bit?  Thanks,  
> 
> Fair comment, sorry for the salad. How about this?
> 
> ===
> VFIO TCE IOMMU v2 owns IOMMU tables. When we detach an IOMMU group from
> a container, we need to unset these tables from the group which we do by
> calling unset_window(). We also unset tables when removing a DMA window
> via the VFIO_IOMMU_SPAPR_TCE_REMOVE ioctl.
> ===


Applied to vfio next branch with updated commit log and David's R-b.
Thanks,

Alex

> >   
> >> via the VFIO_IOMMU_SPAPR_TCE_REMOVE ioctl.
> >>
> >> The window removal checks if the table actually exists (hidden inside
> >> tce_iommu_find_table()) but the group detaching does not so the user
> >> may see duplicating messages:
> >> pci 0009:03 : [PE# fd] Removing DMA window #0
> >> pci 0009:03 : [PE# fd] Removing DMA window #1
> >> pci 0009:03 : [PE# fd] Removing DMA window #0
> >> pci 0009:03 : [PE# fd] Removing DMA window #1
> >>
> >> At the moment this is not a problem as the second invocation
> >> of unset_window() writes zeroes to the HW registers again and exits early
> >> as there is no table.
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>
> >> When doing VFIO PCI hot unplug, first we remove the DMA window and
> >> set container->tables[num] - this is a first couple of messages.
> >> Then we detach the group and then we see another couple of the same
> >> messages which confused myself.
> >> ---
> >>  drivers/vfio/vfio_iommu_spapr_tce.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> >> b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> index c424913..8dbb270 100644
> >> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -1235,7 +1235,8 @@ static void tce_iommu_release_ownership_ddw(struct 
> >> tce_container *container,
> >>}
> >>  
> >>for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
> >> -  table_group->ops->unset_window(table_group, i);
> >> +  if (container->tables[i])
> >> +  table_group->ops->unset_window(table_group, i);
> >>  
> >>table_group->ops->release_ownership(table_group);
> >>  }  
> >   
> 



RE: [PATCH][next] soc: fsl: dpio: fix memory leak of a struct qbman on error exit path

2019-02-19 Thread Leo Li


> -Original Message-
> From: Colin King 
> Sent: Tuesday, February 19, 2019 8:05 AM
> To: Roy Pledge ; Leo Li ;
> linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org
> Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH][next] soc: fsl: dpio: fix memory leak of a struct qbman on
> error exit path
> 
> From: Colin Ian King 
> 
> Currently the error check for a null reg leaks a struct qbman that was
> allocated earlier. Fix this by kfree'ing p on the error exit path.
> 
> Signed-off-by: Colin Ian King 

Applied for next.  Thanks.

> ---
>  drivers/soc/fsl/dpio/qbman-portal.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/soc/fsl/dpio/qbman-portal.c
> b/drivers/soc/fsl/dpio/qbman-portal.c
> index 0bddb85c0ae5..5a73397ae79e 100644
> --- a/drivers/soc/fsl/dpio/qbman-portal.c
> +++ b/drivers/soc/fsl/dpio/qbman-portal.c
> @@ -180,6 +180,7 @@ struct qbman_swp *qbman_swp_init(const struct
> qbman_swp_desc *d)
>   reg = qbman_read_register(p, QBMAN_CINH_SWP_CFG);
>   if (!reg) {
>   pr_err("qbman: the portal is not enabled!\n");
> + kfree(p);
>   return NULL;
>   }
> 
> --
> 2.20.1



Re: [PATCH] powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()

2019-02-19 Thread Segher Boessenkool
On Mon, Feb 18, 2019 at 11:49:18AM +1100, Michael Ellerman wrote:
> Balbir Singh  writes:
> > Fair enough, my point was that the compiler can help out. I'll see what
> > -Wconversion finds on my local build :)
> 
> I get about 43MB of warnings here :)

Yes, -Wconversion complains about a lot of things that are idiomatic C.
There is a reason -Wconversion is not in -Wall or -Wextra.


Segher


Re: [PATCH] powerpc/pseries: Fix dn reference error in dlpar_cpu_remove_by_index

2019-02-19 Thread Tyrel Datwyler
On 02/19/2019 07:46 AM, Michael Bringmann wrote:
> powerpc/pseries: Fix dn reference error in dlpar_cpu_remove_by_index()
> 
> A reference to the device node of the CPU to be removed is released
> upon successful removal of the associated CPU device.  If the call
> to remove the CPU device fails, dlpar_cpu_remove_by_index() still
> frees the reference and this leads to miscomparisons and/or
> addressing errors later on.
> 
> This problem may be observed when trying to DLPAR 'hot-remove' a CPU
> from a system that has only a single CPU.  The operation will fail
> because there is no other CPU to which the kernel operations may be
> migrated, but the refcount will still be decremented.
> 
> Signed-off-by: Michael Bringmann 
> 
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 97feb6e..9537bb9 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -635,7 +635,8 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
>   }
> 
>   rc = dlpar_cpu_remove(dn, drc_index);
> - of_node_put(dn);
> + if (!rc)
> + of_node_put(dn);
>   return rc;
>  }
> 

NACK!

The logic here is wrong. Here is the full function.

static int dlpar_cpu_remove_by_index(u32 drc_index)
{
struct device_node *dn;
int rc;

dn = cpu_drc_index_to_dn(drc_index);
if (!dn) {
pr_warn("Cannot find CPU (drc index %x) to remove\n",
drc_index);
return -ENODEV;
}

rc = dlpar_cpu_remove(dn, drc_index);
of_node_put(dn);
return rc;
}

The call to cpu_drc_index_to_dn() returns a device_node with the reference count
incremented. So, regardless of the success or failure of the call to
dlpar_cpu_remove() you need to release that reference.

If there is a reference counting issue it is somewhere else.
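
To spell the convention out: cpu_drc_index_to_dn(), like the of_find_*
helpers, returns the node with its reference count already elevated, so
the caller's of_node_put() must be unconditional. Making it conditional,
as the patch does, is the anti-pattern:

	rc = dlpar_cpu_remove(dn, drc_index);
	if (!rc)
		of_node_put(dn);
	return rc;	/* on failure, dn's refcount stays elevated forever */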

-Tyrel



Re: [PATCH] ASoC: fsl_spdif: fix sysclk_df type

2019-02-19 Thread Nicolin Chen
On Mon, Feb 18, 2019 at 03:25:00PM +, Viorel Suman wrote:
> According to RM SPDIF STC SYSCLK_DF field is 9-bit wide, values
> being in 0..511 range. Use a proper type to handle sysclk_df.
> 
> Signed-off-by: Viorel Suman 

Acked-by: Nicolin Chen 

> ---
>  sound/soc/fsl/fsl_spdif.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
> index a26686e..4842e6d 100644
> --- a/sound/soc/fsl/fsl_spdif.c
> +++ b/sound/soc/fsl/fsl_spdif.c
> @@ -96,7 +96,7 @@ struct fsl_spdif_priv {
>   bool dpll_locked;
>   u32 txrate[SPDIF_TXRATE_MAX];
>   u8 txclk_df[SPDIF_TXRATE_MAX];
> - u8 sysclk_df[SPDIF_TXRATE_MAX];
> + u16 sysclk_df[SPDIF_TXRATE_MAX];
>   u8 txclk_src[SPDIF_TXRATE_MAX];
>   u8 rxclk_src;
>   struct clk *txclk[SPDIF_TXRATE_MAX];
> @@ -376,7 +376,8 @@ static int spdif_set_sample_rate(struct snd_pcm_substream 
> *substream,
>   struct platform_device *pdev = spdif_priv->pdev;
>   unsigned long csfs = 0;
>   u32 stc, mask, rate;
> - u8 clk, txclk_df, sysclk_df;
> + u16 sysclk_df;
> + u8 clk, txclk_df;
>   int ret;
>  
>   switch (sample_rate) {
> @@ -1109,8 +1110,9 @@ static u32 fsl_spdif_txclk_caldiv(struct fsl_spdif_priv 
> *spdif_priv,
>   static const u32 rate[] = { 32000, 44100, 48000, 96000, 192000 };
>   bool is_sysclk = clk_is_match(clk, spdif_priv->sysclk);
>   u64 rate_ideal, rate_actual, sub;
> - u32 sysclk_dfmin, sysclk_dfmax;
> - u32 txclk_df, sysclk_df, arate;
> + u32 arate;
> + u16 sysclk_dfmin, sysclk_dfmax, sysclk_df;
> + u8 txclk_df;
>  
>   /* The sysclk has an extra divisor [2, 512] */
>   sysclk_dfmin = is_sysclk ? 2 : 1;
> -- 
> 2.7.4
> 


Re: [PATCH] ASoC: fsl_spdif: fix TXCLK_DF mask

2019-02-19 Thread Nicolin Chen
On Mon, Feb 18, 2019 at 02:12:17PM +, Viorel Suman wrote:
> According to RM SPDIF TXCLK_DF mask is 7-bit wide.
> 
> Signed-off-by: Viorel Suman 

Acked-by: Nicolin Chen 

> ---
>  sound/soc/fsl/fsl_spdif.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sound/soc/fsl/fsl_spdif.h b/sound/soc/fsl/fsl_spdif.h
> index 7666dab..e6c61e0 100644
> --- a/sound/soc/fsl/fsl_spdif.h
> +++ b/sound/soc/fsl/fsl_spdif.h
> @@ -152,7 +152,7 @@ enum spdif_gainsel {
>  #define STC_TXCLK_ALL_EN_MASK(1 << STC_TXCLK_ALL_EN_OFFSET)
>  #define STC_TXCLK_ALL_EN (1 << STC_TXCLK_ALL_EN_OFFSET)
>  #define STC_TXCLK_DF_OFFSET  0
> -#define STC_TXCLK_DF_MASK(0x7ff << STC_TXCLK_DF_OFFSET)
> +#define STC_TXCLK_DF_MASK(0x7f << STC_TXCLK_DF_OFFSET)
>  #define STC_TXCLK_DF(x)  ((((x) - 1) << STC_TXCLK_DF_OFFSET) & STC_TXCLK_DF_MASK)
>  #define STC_TXCLK_SRC_MAX8
>  
> -- 
> 2.7.4
> 


Re: [PATCH v5 3/3] powerpc/32: Add KASAN support

2019-02-19 Thread Christophe Leroy




Le 18/02/2019 à 10:27, Michael Ellerman a écrit :

Christophe Leroy  writes:


diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index e0637730a8e7..dba2c1038363 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -251,6 +251,10 @@ GLUE(.,name):
  
  #define _GLOBAL_TOC(name) _GLOBAL(name)
  
+#define KASAN_OVERRIDE(x, y) \
+   .weak x; \
+   .set x, y
+


Can you add a comment describing what that does and why?


It's gone. Hope the new approach is more clear. It's now in a dedicated 
patch.





diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 879b36602748..fc4c42262694 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -16,8 +16,9 @@ CFLAGS_prom_init.o  += -fPIC
  CFLAGS_btext.o+= -fPIC
  endif
  
-CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
-CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
+CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
+CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) -DDISABLE_BRANCH_PROFILING
+CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) -DDISABLE_BRANCH_PROFILING


Why do we need to disable branch profiling now?


Recommended by Andrey, see https://patchwork.ozlabs.org/patch/1023887/

Maybe it should be only when KASAN is active ? For prom_init it should 
probably be all the time, for the others I don't know. Can't remember 
why I did it that way.




I'd probably be happier if all the CFLAGS changes were done in a leadup
patch to make them more obvious.


Oops, I forgot to read your mail entirely before sending out v6; I only 
read the first part. Anyway, that's probably not the last run.





diff --git a/arch/powerpc/kernel/prom_init_check.sh 
b/arch/powerpc/kernel/prom_init_check.sh
index 667df97d2595..da6bb16e0876 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -16,8 +16,16 @@
  # If you really need to reference something from prom_init.o add
  # it to the list below:
  
+grep CONFIG_KASAN=y .config >/dev/null


Just to be safe "^CONFIG_KASAN=y$" ?


ok




+if [ $? -eq 0 ]
+then
+   MEMFCT="__memcpy __memset"
+else
+   MEMFCT="memcpy memset"
+fi


MEM_FUNCS ?


Yes, I change it now before I forget.




diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3bf9fc6fd36c..ce8d4a9f810a 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -8,6 +8,14 @@ ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
  CFLAGS_REMOVE_code-patching.o = $(CC_FLAGS_FTRACE)
  CFLAGS_REMOVE_feature-fixups.o = $(CC_FLAGS_FTRACE)
  
+KASAN_SANITIZE_code-patching.o := n

+KASAN_SANITIZE_feature-fixups.o := n
+
+ifdef CONFIG_KASAN
+CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING
+CFLAGS_feature-fixups.o += -DDISABLE_BRANCH_PROFILING
+endif


There's that branch profiling again, though here it's only if KASAN is enabled.


diff --git a/arch/powerpc/mm/kasan_init.c b/arch/powerpc/mm/kasan_init.c
new file mode 100644
index ..bd8e0a263e12
--- /dev/null
+++ b/arch/powerpc/mm/kasan_init.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define DISABLE_BRANCH_PROFILING
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void __init kasan_early_init(void)
+{
+   unsigned long addr = KASAN_SHADOW_START;
+   unsigned long end = KASAN_SHADOW_END;
+   unsigned long next;
+   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(addr), addr), addr);


Can none of those fail?


map_kernel_page() in pgtable_32.c does exactly the same.

pud_offset() and pmd_offset() are no-ops and only serve as type 
modifiers, so pmd will get the value returned by pgd_offset_k() which 
should always be valid unless init_mm->pgd is bad.


Christophe




cheers



[RFC WIP PATCH 3/3] powerpc: KASAN for 64bit Book3E

2019-02-19 Thread Christophe Leroy
From: Daniel Axtens 

Wire up KASAN. Only outline instrumentation is supported.

The KASAN shadow area is mapped into vmemmap space:
0x8000 0400   to 0x8000 0600  .
To do this we require that vmemmap be disabled. (This is the default
in the kernel config that QorIQ provides for the machine in their
SDK anyway - they use flat memory.)

Only the kernel linear mapping (0xc000...) is checked. The vmalloc and
ioremap areas (also in 0x800...) are all mapped to a zero page. As
with the Book3S hash series, this requires overriding the memory <->
shadow mapping.

Also, as with both previous 64-bit series, early instrumentation is not
supported.  It would allow us to drop the kasan_arch_is_ready()
hook in the KASAN core, but it's tricky to get it set up early enough:
we need it setup before the first call to instrumented code like printk().
Perhaps in the future.

Only KASAN_MINIMAL works.

Lightly tested on e6500. KVM, kexec and xmon have not been tested.

The test_kasan module fires warnings as expected, except for the
following tests:

 - Expected/by design:
kasan test: memcg_accounted_kmem_cache allocate memcg accounted object

 - Due to only supporting KASAN_MINIMAL:
kasan test: kasan_stack_oob out-of-bounds on stack
kasan test: kasan_global_oob out-of-bounds global variable
kasan test: kasan_alloca_oob_left out-of-bounds to left on alloca
kasan test: kasan_alloca_oob_right out-of-bounds to right on alloca
kasan test: use_after_scope_test use-after-scope on int
kasan test: use_after_scope_test use-after-scope on array

Thanks to those who have done the heavy lifting over the past several years:
 - Christophe's 32 bit series: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-February/185379.html
 - Aneesh's Book3S hash series: https://lwn.net/Articles/655642/
 - Balbir's Book3S radix series: https://patchwork.ozlabs.org/patch/795211/

Cc: Christophe Leroy 
Cc: Aneesh Kumar K.V 
Cc: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/Makefile|  2 +
 arch/powerpc/include/asm/kasan.h | 56 +++-
 arch/powerpc/mm/Makefile |  2 +
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/kasan_init_book3e_64.c | 51 +
 6 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3e_64.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 850b06def84f..2c7c20d52778 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -176,6 +176,7 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
+   select HAVE_ARCH_KASAN  if PPC_BOOK3E_64 && 
!SPARSEMEM_VMEMMAP
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 06d085558d21..d1943d892db2 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -428,6 +428,7 @@ endif
 endif
 
 ifdef CONFIG_KASAN
+ifdef CONFIG_PPC32
 ifndef CONFIG_PPC_BOOK3S_32
 prepare: kasan_prepare
 
@@ -435,6 +436,7 @@ kasan_prepare: prepare0
$(eval KASAN_SHADOW_OFFSET = $(shell awk '{if ($$2 == 
"KASAN_SHADOW_OFFSET") print $$3;}' include/generated/asm-offsets.h))
 endif
 endif
+endif
 
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 97b5ccf0702f..6b2cd5ade185 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -15,6 +15,7 @@
 #ifndef __ASSEMBLY__
 
 #include 
+#include 
 #include 
 #include 
 
@@ -25,6 +26,7 @@
 
 #define KASAN_SHADOW_END   (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
 
+#ifdef CONFIG_PPC32
 #include 
 
 #define KASAN_SHADOW_START (ALIGN_DOWN(FIXADDR_START - KASAN_SHADOW_SIZE, \
@@ -33,13 +35,15 @@
 #define KASAN_SHADOW_SIZE  ((~0UL - PAGE_OFFSET + 1) >> 
KASAN_SHADOW_SCALE_SHIFT)
 
 void kasan_early_init(void);
+#endif /* CONFIG_PPC32 */
+
 void kasan_init(void);
 
 extern struct static_key_false powerpc_kasan_enabled_key;
 
 static inline bool kasan_arch_is_ready(void)
 {
-   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_32))
+   if (IS_ENABLED(CONFIG_PPC32) && !IS_ENABLED(CONFIG_PPC_BOOK3S_32))
return true;
if (static_branch_likely(&powerpc_kasan_enabled_key))
return true;
@@ -47,5 +51,55 @@ static inline bool kasan_arch_is_ready(void)
 }
 #define kasan_arch_is_ready kasan_arch_is_ready
 
+#ifdef CONFIG_PPC_BOOK3E_64
+#define KASAN_SHADOW_START VMEMMAP_BASE
+#define KASAN_SHADOW_SIZE  (KERN_VIRT_SIZE >> KASAN_SHADOW_SCALE_SHIFT)
+
+static inline void *kasan_mem_to_shadow_book3e(const void *addr)
+{
+   if ((unsigned long)addr >= KERN_VIRT_START &&
+ 
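
(The helper is cut off above. Judging from the commit message it
presumably continues along these lines -- a sketch, not the posted code:

	static inline void *kasan_mem_to_shadow_book3e(const void *addr)
	{
		/*
		 * vmalloc/ioremap space (0x8000...) all shadows onto the
		 * zero page; only the linear map gets real shadow.
		 */
		if ((unsigned long)addr >= KERN_VIRT_START &&
		    (unsigned long)addr < KERN_VIRT_START + KERN_VIRT_SIZE)
			return (void *)KASAN_SHADOW_START;

		return (void *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
				+ KASAN_SHADOW_OFFSET);
	}

)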

[RFC WIP PATCH 2/3] kasan: allow architectures to manage the memory-to-shadow mapping

2019-02-19 Thread Christophe Leroy
From: Daniel Axtens 

Currently, shadow addresses are always addr >> shift + offset.
However, for powerpc, the virtual address space is fragmented in
ways that make this simple scheme impractical.

Allow architectures to override:
 - kasan_shadow_to_mem
 - kasan_mem_to_shadow
 - addr_has_shadow

Rename addr_has_shadow to kasan_addr_has_shadow as if it is
overridden it will be available in more places, increasing the
risk of collisions.

If architectures do not #define their own versions, the generic
code will continue to run as usual.
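
As a concrete (entirely hypothetical) illustration, an architecture with
a fragmented address space could then do something like this in its
asm/kasan.h -- the ARCH_LINEAR_* names are made up:

	/*
	 * Only ARCH_LINEAR_START..ARCH_LINEAR_END gets real shadow;
	 * everything else aliases the zero shadow area.
	 */
	static inline void *arch_kasan_mem_to_shadow(const void *addr)
	{
		unsigned long a = (unsigned long)addr;

		if (a >= ARCH_LINEAR_START && a < ARCH_LINEAR_END)
			return (void *)((a >> KASAN_SHADOW_SCALE_SHIFT)
					+ KASAN_SHADOW_OFFSET);
		return (void *)KASAN_SHADOW_START;
	}
	#define kasan_mem_to_shadow arch_kasan_mem_to_shadow

and the #ifndef guard in the generic header picks the override up
automatically.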

Signed-off-by: Daniel Axtens 
Reviewed-by: Dmitry Vyukov 
---
 include/linux/kasan.h | 2 ++
 mm/kasan/generic.c| 2 +-
 mm/kasan/generic_report.c | 2 +-
 mm/kasan/kasan.h  | 6 +-
 mm/kasan/report.c | 6 +++---
 mm/kasan/tags.c   | 2 +-
 6 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b91c40af9f31..a630d53f1a36 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -27,11 +27,13 @@ extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 int kasan_populate_early_shadow(const void *shadow_start,
const void *shadow_end);
 
+#ifndef kasan_mem_to_shadow
 static inline void *kasan_mem_to_shadow(const void *addr)
 {
return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+ KASAN_SHADOW_OFFSET;
 }
+#endif
 
 /* Enable reporting bugs after kasan_disable_current() */
 extern void kasan_enable_current(void);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 08cbc7ed8953..6c6c30643d51 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -176,7 +176,7 @@ static __always_inline void 
check_memory_region_inline(unsigned long addr,
if (unlikely(size == 0))
return;
 
-   if (unlikely(!addr_has_shadow((void *)addr))) {
+   if (unlikely(!kasan_addr_has_shadow((void *)addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
diff --git a/mm/kasan/generic_report.c b/mm/kasan/generic_report.c
index 5e12035888f2..854f4de1fe10 100644
--- a/mm/kasan/generic_report.c
+++ b/mm/kasan/generic_report.c
@@ -110,7 +110,7 @@ static const char *get_wild_bug_type(struct 
kasan_access_info *info)
 
 const char *get_bug_type(struct kasan_access_info *info)
 {
-   if (addr_has_shadow(info->access_addr))
+   if (kasan_addr_has_shadow(info->access_addr))
return get_shadow_bug_type(info);
return get_wild_bug_type(info);
 }
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index ea51b2d898ec..57ec24cf7bd1 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -111,16 +111,20 @@ struct kasan_alloc_meta *get_alloc_info(struct kmem_cache 
*cache,
 struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
const void *object);
 
+#ifndef kasan_shadow_to_mem
 static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
 {
return (void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
<< KASAN_SHADOW_SCALE_SHIFT);
 }
+#endif
 
-static inline bool addr_has_shadow(const void *addr)
+#ifndef kasan_addr_has_shadow
+static inline bool kasan_addr_has_shadow(const void *addr)
 {
return (addr >= kasan_shadow_to_mem((void *)KASAN_SHADOW_START));
 }
+#endif
 
 void kasan_poison_shadow(const void *address, size_t size, u8 value);
 
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index ca9418fe9232..bc3355ee2dd0 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -298,7 +298,7 @@ void kasan_report(unsigned long addr, size_t size,
untagged_addr = reset_tag(tagged_addr);
 
info.access_addr = tagged_addr;
-   if (addr_has_shadow(untagged_addr))
+   if (kasan_addr_has_shadow(untagged_addr))
info.first_bad_addr = find_first_bad_addr(tagged_addr, size);
else
info.first_bad_addr = untagged_addr;
@@ -309,11 +309,11 @@ void kasan_report(unsigned long addr, size_t size,
start_report();
 
print_error_description();
-   if (addr_has_shadow(untagged_addr))
+   if (kasan_addr_has_shadow(untagged_addr))
print_tags(get_tag(tagged_addr), info.first_bad_addr);
pr_err("\n");
 
-   if (addr_has_shadow(untagged_addr)) {
+   if (kasan_addr_has_shadow(untagged_addr)) {
print_address_description(untagged_addr);
pr_err("\n");
print_shadow_for_address(info.first_bad_addr);
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index bc759f8f1c67..cdefd0fe1f5d 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -109,7 +109,7 @@ void check_memory_region(unsigned long addr, size_t size, 
bool write,
return;
 
untagged_addr = reset_tag((const void *)addr);
-   if (unlikely(!addr_has_shadow(untagged_addr))) {
+   if (unlikely(!kasan_addr_has_shadow(untagged_addr))) {
  

[RFC WIP PATCH 0/3] Your series rebased

2019-02-19 Thread Christophe Leroy
Hi Daniel,

Find here your series rebased on my v6. No test done, not even compiled.

Christophe

Christophe Leroy (3):
  kasan: do not open-code addr_has_shadow
  kasan: allow architectures to manage the memory-to-shadow mapping
  powerpc: KASAN for 64bit Book3E

 arch/powerpc/Kconfig |  1 +
 arch/powerpc/Makefile|  2 +
 arch/powerpc/include/asm/kasan.h | 56 +++-
 arch/powerpc/mm/Makefile |  2 +
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/kasan_init_book3e_64.c | 51 +
 include/linux/kasan.h|  2 +
 mm/kasan/generic.c   |  3 +-
 mm/kasan/generic_report.c|  2 +-
 mm/kasan/kasan.h |  6 ++-
 mm/kasan/report.c|  6 +--
 mm/kasan/tags.c  |  3 +-
 12 files changed, 125 insertions(+), 10 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3e_64.c

-- 
2.13.3



[RFC WIP PATCH 1/3] kasan: do not open-code addr_has_shadow

2019-02-19 Thread Christophe Leroy
From: Daniel Axtens 

We have a couple of places checking for the existence of a shadow
mapping for an address by open-coding the inverse of the check in
addr_has_shadow.

Replace the open-coded versions with the helper. This will be
needed in future to allow architectures to override the layout
of the shadow mapping.

Signed-off-by: Daniel Axtens 
Reviewed-by: Andrew Donnellan 
Reviewed-by: Dmitry Vyukov 
---
 mm/kasan/generic.c | 3 +--
 mm/kasan/tags.c| 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 696c2f5b902b..08cbc7ed8953 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -176,8 +176,7 @@ static __always_inline void 
check_memory_region_inline(unsigned long addr,
if (unlikely(size == 0))
return;
 
-   if (unlikely((void *)addr <
-   kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
+   if (unlikely(!addr_has_shadow((void *)addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 0777649e07c4..bc759f8f1c67 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -109,8 +109,7 @@ void check_memory_region(unsigned long addr, size_t size, 
bool write,
return;
 
untagged_addr = reset_tag((const void *)addr);
-   if (unlikely(untagged_addr <
-   kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
+   if (unlikely(!addr_has_shadow(untagged_addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
-- 
2.13.3



[PATCH v6 6/6] powerpc/32: enable CONFIG_KASAN for book3s hash

2019-02-19 Thread Christophe Leroy
The challenge with book3s/32 is that the hash table management
has to be set up before being able to use KASAN.

This patch adds a kasan_arch_is_ready() helper to defer
the activation of KASAN until paging is ready.

This limits KASAN to KASAN_MINIMAL mode. The downside is that the 603,
which doesn't use a hash table, also gets downgraded to KASAN_MINIMAL,
because there is no way to activate full support dynamically: that
choice is compiled in.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile |  2 ++
 arch/powerpc/include/asm/kasan.h  | 13 +
 arch/powerpc/mm/kasan/kasan_init_32.c | 27 +--
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f0738099e31e..06d085558d21 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -428,11 +428,13 @@ endif
 endif
 
 ifdef CONFIG_KASAN
+ifndef CONFIG_PPC_BOOK3S_32
 prepare: kasan_prepare
 
 kasan_prepare: prepare0
$(eval KASAN_SHADOW_OFFSET = $(shell awk '{if ($$2 == 
"KASAN_SHADOW_OFFSET") print $$3;}' include/generated/asm-offsets.h))
 endif
+endif
 
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 0bc9148f5d87..97b5ccf0702f 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
@@ -34,5 +35,17 @@
 void kasan_early_init(void);
 void kasan_init(void);
 
+extern struct static_key_false powerpc_kasan_enabled_key;
+
+static inline bool kasan_arch_is_ready(void)
+{
+   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_32))
+   return true;
+   if (static_branch_likely(&powerpc_kasan_enabled_key))
+   return true;
+   return false;
+}
+#define kasan_arch_is_ready kasan_arch_is_ready
+
 #endif /* __ASSEMBLY */
 #endif
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 495c908d6ee6..f24f8f56d450 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -9,6 +9,9 @@
 #include 
 #include 
 
+/* Used by BOOK3S_32 only */
+DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
+
 void __init kasan_early_init(void)
 {
unsigned long addr = KASAN_SHADOW_START;
@@ -21,7 +24,7 @@ void __init kasan_early_init(void)
BUILD_BUG_ON(KASAN_SHADOW_START & ~PGDIR_MASK);
 
if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
-   panic("KASAN not supported with Hash MMU\n");
+   return;
 
for (i = 0; i < PTRS_PER_PTE; i++)
__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
@@ -32,6 +35,22 @@ void __init kasan_early_init(void)
next = pgd_addr_end(addr, end);
pmd_populate_kernel(&init_mm, pmd, kasan_early_shadow_pte);
} while (pmd++, addr = next, addr != end);
+
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_32)) {
+   jump_label_init();
+   static_branch_enable(&powerpc_kasan_enabled_key);
+   }
+}
+
+static void __init kasan_late_init(void)
+{
+   unsigned long addr;
+   phys_addr_t pa = __pa(kasan_early_shadow_page);
+
+   for (addr = KASAN_SHADOW_START; addr < KASAN_SHADOW_END; addr += 
PAGE_SIZE)
+   map_kernel_page(addr, pa, PAGE_KERNEL_RO);
+
+   static_branch_enable(&powerpc_kasan_enabled_key);
 }
 
 static void __ref *kasan_get_one_page(void)
@@ -113,6 +132,9 @@ void __init kasan_init(void)
 {
struct memblock_region *reg;
 
+   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
+   kasan_late_init();
+
for_each_memblock(memory, reg) {
int ret = kasan_init_region(__va(reg->base), reg->size);
 
@@ -120,7 +142,8 @@ void __init kasan_init(void)
panic("kasan: kasan_init_region() failed");
}
 
-   kasan_remap_early_shadow_ro();
+   if (!early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
+   kasan_remap_early_shadow_ro();
 
clear_page(kasan_early_shadow_page);
 
-- 
2.13.3



[PATCH v6 5/6] kasan: allow architectures to provide an outline readiness check

2019-02-19 Thread Christophe Leroy
From: Daniel Axtens 

In powerpc (as I understand it), we spend a lot of time in boot
running in real mode before MMU paging is initalised. During
this time we call a lot of generic code, including printk(). If
we try to access the shadow region during this time, things fail.

My attempts to move early init before the first printk have not
been successful. (Both previous RFCs for ppc64 - by 2 different
people - have needed this trick too!)

So, allow architectures to define a kasan_arch_is_ready()
hook that bails out of check_memory_region_inline() unless the
arch has done all of the init.

Link: https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
Link: https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
Originally-by: Balbir Singh 
Cc: Aneesh Kumar K.V 
Signed-off-by: Daniel Axtens 
[check_return_arch_not_ready() ==> static inline kasan_arch_is_ready()]
Signed-off-by: Christophe Leroy 
---
 include/linux/kasan.h | 4 
 mm/kasan/generic.c| 3 +++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b40ea104dd36..b91c40af9f31 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -14,6 +14,10 @@ struct task_struct;
 #include 
 #include 
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
 extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index ccb6207276e3..696c2f5b902b 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -170,6 +170,9 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return;
+
if (unlikely(size == 0))
return;
 
-- 
2.13.3



[PATCH v6 4/6] powerpc/32: Add KASAN support

2019-02-19 Thread Christophe Leroy
This patch adds KASAN support for PPC32.

The KASAN shadow area is located between the vmalloc area and the
fixmap area.

KASAN_SHADOW_OFFSET is calculated in asm/kasan.h and extracted
by the Makefile prepare rule via asm-offsets.h.
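
For context, generic KASAN computes the shadow address of a kernel address
as below (as in include/linux/kasan.h; one shadow byte covers 2^3 = 8 bytes
of memory), which is why the offset must be known as a constant at build
time:

static inline void *kasan_mem_to_shadow(const void *addr)
{
	return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
		+ KASAN_SHADOW_OFFSET;
}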

For modules, the shadow area is allocated at module_alloc().

Note that on book3s it will only work on the 603, because the other
ones use a hash table and can therefore not share a single PTE table
covering the entire early KASAN shadow area.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/Makefile |   7 ++
 arch/powerpc/include/asm/book3s/32/pgtable.h  |   2 +
 arch/powerpc/include/asm/highmem.h|  10 +-
 arch/powerpc/include/asm/kasan.h  |  23 
 arch/powerpc/include/asm/nohash/32/pgtable.h  |   2 +
 arch/powerpc/include/asm/setup.h  |   5 +
 arch/powerpc/kernel/Makefile  |   9 +-
 arch/powerpc/kernel/asm-offsets.c |   4 +
 arch/powerpc/kernel/head_32.S |   3 +
 arch/powerpc/kernel/head_40x.S|   3 +
 arch/powerpc/kernel/head_44x.S|   3 +
 arch/powerpc/kernel/head_8xx.S|   3 +
 arch/powerpc/kernel/head_fsl_booke.S  |   3 +
 arch/powerpc/kernel/setup-common.c|   2 +
 arch/powerpc/lib/Makefile |   8 ++
 arch/powerpc/mm/Makefile  |   1 +
 arch/powerpc/mm/kasan/Makefile|   5 +
 arch/powerpc/mm/kasan/kasan_init_32.c | 147 ++
 arch/powerpc/mm/mem.c |   4 +
 arch/powerpc/mm/ptdump/dump_linuxpagetables.c |   8 ++
 arch/powerpc/purgatory/Makefile   |   3 +
 arch/powerpc/xmon/Makefile|   1 +
 23 files changed, 253 insertions(+), 4 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/Makefile
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_32.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 08908219fba9..850b06def84f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -175,6 +175,7 @@ config PPC
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
+   select HAVE_ARCH_KASAN  if PPC32
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ac033341ed55..f0738099e31e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
 endif
 endif
 
+ifdef CONFIG_KASAN
+prepare: kasan_prepare
+
+kasan_prepare: prepare0
+   $(eval KASAN_SHADOW_OFFSET = $(shell awk '{if ($$2 == "KASAN_SHADOW_OFFSET") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 49d76adb9bc5..4543016f80ca 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -141,6 +141,8 @@ static inline bool pte_user(pte_t pte)
  */
 #ifdef CONFIG_HIGHMEM
 #define KVIRT_TOP  PKMAP_BASE
+#elif defined(CONFIG_KASAN)
+#define KVIRT_TOP  KASAN_SHADOW_START
 #else
 #define KVIRT_TOP  (0xfe00UL)  /* for now, could be FIXMAP_BASE ? */
 #endif
diff --git a/arch/powerpc/include/asm/highmem.h 
b/arch/powerpc/include/asm/highmem.h
index a4b65b186ec6..483b90025bef 100644
--- a/arch/powerpc/include/asm/highmem.h
+++ b/arch/powerpc/include/asm/highmem.h
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include <asm/kasan.h>
 
 extern pte_t *kmap_pte;
 extern pgprot_t kmap_prot;
@@ -50,10 +51,15 @@ extern pte_t *pkmap_page_table;
 #define PKMAP_ORDER9
 #endif
 #define LAST_PKMAP (1 << PKMAP_ORDER)
+#ifdef CONFIG_KASAN
+#define PKMAP_TOP  KASAN_SHADOW_START
+#else
+#define PKMAP_TOP  FIXADDR_START
+#endif
 #ifndef CONFIG_PPC_4K_PAGES
-#define PKMAP_BASE (FIXADDR_START - PAGE_SIZE*(LAST_PKMAP + 1))
+#define PKMAP_BASE (PKMAP_TOP - PAGE_SIZE*(LAST_PKMAP + 1))
 #else
-#define PKMAP_BASE ((FIXADDR_START - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK)
+#define PKMAP_BASE ((PKMAP_TOP - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK)
 #endif
 #define LAST_PKMAP_MASK(LAST_PKMAP-1)
 #define PKMAP_NR(virt)  ((virt-PKMAP_BASE) >> PAGE_SHIFT)
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 2efd0e42cfc9..0bc9148f5d87 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -12,4 +12,27 @@
 #define EXPORT_SYMBOL_KASAN(fn)	EXPORT_SYMBOL(fn)
 #endif
 
+#ifndef __ASSEMBLY__
+
+#include 
+#include 
+
+#define KASAN_SHADOW_SCALE_SHIFT   3
+
+#define KASAN_SHADOW_OFFSET	(KASAN_SHADOW_START - \
+				 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))

[PATCH v6 3/6] powerpc: prepare string/mem functions for KASAN

2019-02-19 Thread Christophe Leroy
CONFIG_KASAN implements wrappers for memcpy(), memmove() and memset().
Those wrappers do the verification and then call __memcpy(), __memmove()
and __memset() respectively. The arches are therefore expected to rename
their optimised functions that way.

For files on which KASAN is inhibited, #defines are used to allow
them to directly call optimised versions of the functions without
going through the KASAN wrappers.

See 393f203f5fd5 ("x86_64: kasan: add interceptors for
memset/memmove/memcpy functions") for details.

Other string / mem functions do not (yet) have kasan wrappers; we
therefore have to fall back to the generic versions when KASAN is
active, otherwise KASAN checks would be skipped.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kasan.h   | 15 +++
 arch/powerpc/include/asm/string.h  | 26 --
 arch/powerpc/kernel/prom_init_check.sh | 10 +-
 arch/powerpc/lib/Makefile  |  8 ++--
 arch/powerpc/lib/copy_32.S | 13 +++--
 arch/powerpc/lib/mem_64.S  |  8 
 arch/powerpc/lib/memcpy_64.S   |  4 ++--
 7 files changed, 67 insertions(+), 17 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kasan.h

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
new file mode 100644
index ..2efd0e42cfc9
--- /dev/null
+++ b/arch/powerpc/include/asm/kasan.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KASAN_H
+#define __ASM_KASAN_H
+
+#ifdef CONFIG_KASAN
+#define _GLOBAL_KASAN(fn)  .weak fn ; _GLOBAL(__##fn) ; _GLOBAL(fn)
+#define _GLOBAL_KASAN_TOC(fn)  .weak fn ; _GLOBAL_TOC(__##fn) ; _GLOBAL_TOC(fn)
+#define EXPORT_SYMBOL_KASAN(fn)	EXPORT_SYMBOL(__##fn) ; EXPORT_SYMBOL(fn)
+#else
+#define _GLOBAL_KASAN(fn)  _GLOBAL(fn)
+#define _GLOBAL_KASAN_TOC(fn)  _GLOBAL_TOC(fn)
+#define EXPORT_SYMBOL_KASAN(fn)	EXPORT_SYMBOL(fn)
+#endif
+
+#endif
diff --git a/arch/powerpc/include/asm/string.h 
b/arch/powerpc/include/asm/string.h
index 1647de15a31e..2aa9ea6751cd 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -4,13 +4,16 @@
 
 #ifdef __KERNEL__
 
+#ifndef CONFIG_KASAN
 #define __HAVE_ARCH_STRNCPY
 #define __HAVE_ARCH_STRNCMP
+#define __HAVE_ARCH_MEMCHR
+#define __HAVE_ARCH_MEMCMP
+#endif
+
 #define __HAVE_ARCH_MEMSET
 #define __HAVE_ARCH_MEMCPY
 #define __HAVE_ARCH_MEMMOVE
-#define __HAVE_ARCH_MEMCMP
-#define __HAVE_ARCH_MEMCHR
 #define __HAVE_ARCH_MEMSET16
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
 
@@ -27,6 +30,25 @@ extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
 extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
 
+void *__memset(void *s, int c, __kernel_size_t count);
+void *__memcpy(void *to, const void *from, __kernel_size_t n);
+void *__memmove(void *to, const void *from, __kernel_size_t n);
+
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+/*
+ * For files that are not instrumented (e.g. mm/slub.c) we
+ * should use not instrumented version of mem* functions.
+ */
+#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#define memmove(dst, src, len) __memmove(dst, src, len)
+#define memset(s, c, n) __memset(s, c, n)
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
+#endif
+
 #ifdef CONFIG_PPC64
 #define __HAVE_ARCH_MEMSET32
 #define __HAVE_ARCH_MEMSET64
diff --git a/arch/powerpc/kernel/prom_init_check.sh 
b/arch/powerpc/kernel/prom_init_check.sh
index 667df97d2595..da6bb16e0876 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -16,8 +16,16 @@
 # If you really need to reference something from prom_init.o add
 # it to the list below:
 
+grep CONFIG_KASAN=y .config >/dev/null
+if [ $? -eq 0 ]
+then
+   MEMFCT="__memcpy __memset"
+else
+   MEMFCT="memcpy memset"
+fi
+
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
-_end enter_prom memcpy memset reloc_offset __secondary_hold
+_end enter_prom $MEMFCT reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
 strcmp strcpy strlcpy strlen strncmp strstr kstrtobool logo_linux_clut224
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3bf9fc6fd36c..ee08a7e1bcdf 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -8,7 +8,11 @@ ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 CFLAGS_REMOVE_code-patching.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_feature-fixups.o = $(CC_FLAGS_FTRACE)
 
-obj-y += string.o alloc.o code-patching.o feature-fixups.o
+obj-y += alloc.o code-patching.o feature-fixups.o
+
+ifndef CONFIG_KASAN
+obj-y  +=  string.o memcmp_$(BITS).o
+endif
 
 obj-$(CONFIG_PPC32)+= div64.o copy_32.o crtsavres.o strlen_32.o
 
@@ -33,7 +37,7 

[PATCH v6 1/6] powerpc/mm: prepare kernel for KAsan on PPC32

2019-02-19 Thread Christophe Leroy
In kernel/cputable.c, explicitly use memcpy() in order
to allow GCC to replace it with __memcpy() when KASAN is
selected.

Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
enabled"), memset() can be used before activation of the cache,
so no need to use memset_io() for zeroing the BSS.

Acked-by: Dmitry Vyukov 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cputable.c | 13 ++---
 arch/powerpc/kernel/setup_32.c |  6 ++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 1eab54bc6ee9..cd12f362b61f 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2147,7 +2147,11 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
	struct cpu_spec *t = &the_cpu_spec;
 
t = PTRRELOC(t);
-   *t = *s;
+   /*
+* use memcpy() instead of *t = *s so that GCC replaces it
+* by __memcpy() when KASAN is active
+*/
+   memcpy(t, s, sizeof(*t));
 
	*PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
 }
@@ -2161,8 +2165,11 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
t = PTRRELOC(t);
old = *t;
 
-   /* Copy everything, then do fixups */
-   *t = *s;
+   /*
+* Copy everything, then do fixups. Use memcpy() instead of *t = *s
+* so that GCC replaces it by __memcpy() when KASAN is active
+*/
+   memcpy(t, s, sizeof(*t));
 
/*
 * If we are overriding a previous value derived from the real
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 947f904688b0..5e761eb16a6d 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -73,10 +73,8 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
 {
unsigned long offset = reloc_offset();
 
-   /* First zero the BSS -- use memset_io, some platforms don't have
-* caches on yet */
-   memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
-   __bss_stop - __bss_start);
+   /* First zero the BSS */
+   memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
 
/*
 * Identify the CPU type and fix up code sections
-- 
2.13.3



[PATCH v6 2/6] powerpc/32: Move early_init() in a separate file

2019-02-19 Thread Christophe Leroy
In preparation for KASAN, move early_init() into a separate
file in order to allow deactivation of KASAN for that function.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/early_32.c | 36 
 arch/powerpc/kernel/setup_32.c | 26 --
 3 files changed, 37 insertions(+), 27 deletions(-)
 create mode 100644 arch/powerpc/kernel/early_32.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index cb7f0bb9ee71..879b36602748 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -93,7 +93,7 @@ extra-y   += vmlinux.lds
 
 obj-$(CONFIG_RELOCATABLE)  += reloc_$(BITS).o
 
-obj-$(CONFIG_PPC32)+= entry_32.o setup_32.o
+obj-$(CONFIG_PPC32)+= entry_32.o setup_32.o early_32.o
 obj-$(CONFIG_PPC64)+= dma-iommu.o iommu.o
 obj-$(CONFIG_KGDB) += kgdb.o
 obj-$(CONFIG_BOOTX_TEXT)   += btext.o
diff --git a/arch/powerpc/kernel/early_32.c b/arch/powerpc/kernel/early_32.c
new file mode 100644
index ..3482118ffe76
--- /dev/null
+++ b/arch/powerpc/kernel/early_32.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Early init before relocation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * We're called here very early in the boot.
+ *
+ * Note that the kernel may be running at an address which is different
+ * from the address that it was linked at, so we must use RELOC/PTRRELOC
+ * to access static data (including strings).  -- paulus
+ */
+notrace unsigned long __init early_init(unsigned long dt_ptr)
+{
+   unsigned long offset = reloc_offset();
+
+   /* First zero the BSS */
+   memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
+
+   /*
+* Identify the CPU type and fix up code sections
+* that depend on which cpu we have.
+*/
+   identify_cpu(offset, mfspr(SPRN_PVR));
+
+   apply_feature_fixups();
+
+   return KERNELBASE + offset;
+}
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 5e761eb16a6d..b46a9a33225b 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -63,32 +63,6 @@ EXPORT_SYMBOL(DMA_MODE_READ);
 EXPORT_SYMBOL(DMA_MODE_WRITE);
 
 /*
- * We're called here very early in the boot.
- *
- * Note that the kernel may be running at an address which is different
- * from the address that it was linked at, so we must use RELOC/PTRRELOC
- * to access static data (including strings).  -- paulus
- */
-notrace unsigned long __init early_init(unsigned long dt_ptr)
-{
-   unsigned long offset = reloc_offset();
-
-   /* First zero the BSS */
-   memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
-
-   /*
-* Identify the CPU type and fix up code sections
-* that depend on which cpu we have.
-*/
-   identify_cpu(offset, mfspr(SPRN_PVR));
-
-   apply_feature_fixups();
-
-   return KERNELBASE + offset;
-}
-
-
-/*
  * This is run before start_kernel(), the kernel has been relocated
  * and we are running with enough of the MMU enabled to have our
  * proper kernel virtual addresses
-- 
2.13.3



[PATCH v6 0/6] KASAN for powerpc/32

2019-02-19 Thread Christophe Leroy
This series adds KASAN support to powerpc/32.

Tested on nohash/32 (8xx) and book3s/32 (mpc832x ie 603).
Boot tested on qemu mac99

Changes in v6:
- Fixed oops on module loading (due to access to RO shadow zero area).
- Added support for hash book3s/32, thanks to Daniel's patch to defer KASAN 
activation.
- Reworked handling of optimised string functions (dedicated patch for it)
- Reordered some files to ease adding of book3e/64 support.

Changes in v5:
- Added KASAN_SHADOW_OFFSET in Makefile, otherwise we fallback to KASAN_MINIMAL
and some stuff like stack instrumentation is not performed
- Moved calls to kasan_early_init() in head.S because stack instrumentation
in machine_init was performed before the call to kasan_early_init()
- Mapping kasan_early_shadow_page RW in kasan_early_init() and
remapping it RO later in kasan_init()
- Allocating a big memblock() for shadow area, falling back to PAGE_SIZE blocks 
in case of failure.

Changes in v4:
- Comments from Andrey (DISABLE_BRANCH_PROFILING, Activation of reports)
- Proper initialisation of shadow area in kasan_init()
- Panic in case Hash table is required.
- Added comments in patch one to explain why *t = *s becomes memcpy(t, s, ...)
- Call of kasan_init_tags()

Changes in v3:
- Removed the printk() in kasan_early_init() to avoid build failure (see 
https://github.com/linuxppc/issues/issues/218)
- Added necessary changes in asm/book3s/32/pgtable.h to get it working on the 
powerpc 603 family
- Added a few KASAN_SANITIZE_xxx.o := n to successfully boot on powerpc 603 
family

Changes in v2:
- Rebased.
- Using __set_pte_at() to build the early table.
- Worked around and got rid of the patch adding asm/page.h in 
asm/pgtable-types.h
==> might be fixed independently but not needed for this series.

For book3s/32 we have to stick to KASAN_MINIMAL because hash table
management is not active early enough for the time being.

Christophe Leroy (6):
  powerpc/mm: prepare kernel for KAsan on PPC32
  powerpc/32: Move early_init() in a separate file
  powerpc: prepare string/mem functions for KASAN
  powerpc/32: Add KASAN support
  kasan: allow architectures to provide an outline readiness check
  powerpc/32: enable CONFIG_KASAN for book3s hash

 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/Makefile |   9 ++
 arch/powerpc/include/asm/book3s/32/pgtable.h  |   2 +
 arch/powerpc/include/asm/highmem.h|  10 +-
 arch/powerpc/include/asm/kasan.h  |  51 
 arch/powerpc/include/asm/nohash/32/pgtable.h  |   2 +
 arch/powerpc/include/asm/setup.h  |   5 +
 arch/powerpc/include/asm/string.h |  26 +++-
 arch/powerpc/kernel/Makefile  |  11 +-
 arch/powerpc/kernel/asm-offsets.c |   4 +
 arch/powerpc/kernel/cputable.c|  13 +-
 arch/powerpc/kernel/early_32.c|  36 ++
 arch/powerpc/kernel/head_32.S |   3 +
 arch/powerpc/kernel/head_40x.S|   3 +
 arch/powerpc/kernel/head_44x.S|   3 +
 arch/powerpc/kernel/head_8xx.S|   3 +
 arch/powerpc/kernel/head_fsl_booke.S  |   3 +
 arch/powerpc/kernel/prom_init_check.sh|  10 +-
 arch/powerpc/kernel/setup-common.c|   2 +
 arch/powerpc/kernel/setup_32.c|  28 -
 arch/powerpc/lib/Makefile |  16 ++-
 arch/powerpc/lib/copy_32.S|  13 +-
 arch/powerpc/lib/mem_64.S |   8 +-
 arch/powerpc/lib/memcpy_64.S  |   4 +-
 arch/powerpc/mm/Makefile  |   1 +
 arch/powerpc/mm/kasan/Makefile|   5 +
 arch/powerpc/mm/kasan/kasan_init_32.c | 170 ++
 arch/powerpc/mm/mem.c |   4 +
 arch/powerpc/mm/ptdump/dump_linuxpagetables.c |   8 ++
 arch/powerpc/purgatory/Makefile   |   3 +
 arch/powerpc/xmon/Makefile|   1 +
 include/linux/kasan.h |   4 +
 mm/kasan/generic.c|   3 +
 33 files changed, 412 insertions(+), 53 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kasan.h
 create mode 100644 arch/powerpc/kernel/early_32.c
 create mode 100644 arch/powerpc/mm/kasan/Makefile
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_32.c

-- 
2.13.3



[PATCH] powerpc/pseries: Fix dn reference error in dlpar_cpu_remove_by_index

2019-02-19 Thread Michael Bringmann
powerpc/pseries: Fix dn reference error in dlpar_cpu_remove_by_index()

A reference to the device node of the CPU to be removed should only be
released upon successful removal of the associated CPU device.  If the
call to remove the CPU device fails, dlpar_cpu_remove_by_index()
currently drops the reference anyway, and this leads to miscomparisons
and/or addressing errors later on.

This problem may be observed when trying to DLPAR 'hot-remove' a CPU
from a system that has only a single CPU.  The operation will fail
because there is no other CPU to which the kernel operations may be
migrated, but the refcount will still be decremented.

Signed-off-by: Michael Bringmann 


diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e..9537bb9 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -635,7 +635,8 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
}
 
rc = dlpar_cpu_remove(dn, drc_index);
-   of_node_put(dn);
+   if (!rc)
+   of_node_put(dn);
return rc;
 }
 



Re: [PATCH 02/11] riscv: remove the HAVE_KPROBES option

2019-02-19 Thread Christoph Hellwig
On Fri, Feb 15, 2019 at 06:32:07PM +0900, Masahiro Yamada wrote:
> On Thu, Feb 14, 2019 at 2:40 AM Christoph Hellwig  wrote:
> >
> > HAVE_KPROBES is defined genericly in arch/Kconfig and architectures
> > should just select it if supported.
> >
> > Signed-off-by: Christoph Hellwig 
> 
> Do you want this patch picked up by me?
> 
> Or, by Palmer?

Given that I don't think I'll have the rest of this series respun in time
for this merge window:  Palmer, can you pick it up?


Re: [PATCH] powerpc: drop unused GENERIC_CSUM Kconfig item

2019-02-19 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


[PATCH][next] ptp_qoriq: don't pass a large struct by value but instead pass it by reference

2019-02-19 Thread Colin King
From: Colin Ian King 

Passing struct ptp_clock_info caps as a parameter copies over 130 bytes
of data by value onto the stack. Optimize this by passing it by reference
instead. This also shrinks the object code size:

Before:
   text    data     bss     dec     hex  filename
  12596    2160      64   14820    39e4  drivers/ptp/ptp_qoriq.o

After:
   text    data     bss     dec     hex  filename
  12567    2160      64   14791    39c7  drivers/ptp/ptp_qoriq.o

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/freescale/enetc/enetc_ptp.c | 2 +-
 drivers/ptp/ptp_qoriq.c  | 6 +++---
 include/linux/fsl/ptp_qoriq.h| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ptp.c 
b/drivers/net/ethernet/freescale/enetc/enetc_ptp.c
index dc2f58a7c9e5..8c1497e7d9c5 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ptp.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ptp.c
@@ -92,7 +92,7 @@ static int enetc_ptp_probe(struct pci_dev *pdev,
 
	ptp_qoriq->dev = &pdev->dev;
 
-   err = ptp_qoriq_init(ptp_qoriq, base, enetc_ptp_caps);
+   err = ptp_qoriq_init(ptp_qoriq, base, &enetc_ptp_caps);
if (err)
goto err_no_clock;
 
diff --git a/drivers/ptp/ptp_qoriq.c b/drivers/ptp/ptp_qoriq.c
index 42d3654f77f0..53775362aac6 100644
--- a/drivers/ptp/ptp_qoriq.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -459,7 +459,7 @@ static int ptp_qoriq_auto_config(struct ptp_qoriq *ptp_qoriq,
 }
 
 int ptp_qoriq_init(struct ptp_qoriq *ptp_qoriq, void __iomem *base,
-  const struct ptp_clock_info caps)
+  const struct ptp_clock_info *caps)
 {
struct device_node *node = ptp_qoriq->dev->of_node;
struct ptp_qoriq_registers *regs;
@@ -468,7 +468,7 @@ int ptp_qoriq_init(struct ptp_qoriq *ptp_qoriq, void __iomem *base,
u32 tmr_ctrl;
 
ptp_qoriq->base = base;
-   ptp_qoriq->caps = caps;
+   ptp_qoriq->caps = *caps;
 
	if (of_property_read_u32(node, "fsl,cksel", &ptp_qoriq->cksel))
ptp_qoriq->cksel = DEFAULT_CKSEL;
@@ -605,7 +605,7 @@ static int ptp_qoriq_probe(struct platform_device *dev)
goto no_ioremap;
}
 
-   err = ptp_qoriq_init(ptp_qoriq, base, ptp_qoriq_caps);
+   err = ptp_qoriq_init(ptp_qoriq, base, &ptp_qoriq_caps);
if (err)
goto no_clock;
 
diff --git a/include/linux/fsl/ptp_qoriq.h b/include/linux/fsl/ptp_qoriq.h
index f127adb71041..992bf9fa1729 100644
--- a/include/linux/fsl/ptp_qoriq.h
+++ b/include/linux/fsl/ptp_qoriq.h
@@ -183,7 +183,7 @@ static inline void qoriq_write_le(unsigned __iomem *addr, u32 val)
 
 irqreturn_t ptp_qoriq_isr(int irq, void *priv);
 int ptp_qoriq_init(struct ptp_qoriq *ptp_qoriq, void __iomem *base,
-  const struct ptp_clock_info caps);
+  const struct ptp_clock_info *caps);
 void ptp_qoriq_free(struct ptp_qoriq *ptp_qoriq);
 int ptp_qoriq_adjfine(struct ptp_clock_info *ptp, long scaled_ppm);
 int ptp_qoriq_adjtime(struct ptp_clock_info *ptp, s64 delta);
-- 
2.20.1



Re: [PATCH] powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64

2019-02-19 Thread Michael Ellerman
Christophe Leroy  writes:

> On 02/08/2019 12:34 PM, Michael Ellerman wrote:
>> In commit 7820856a4fcd ("powerpc/mm/book3e/64: Remove unsupported
>> 64Kpage size from 64bit booke") we dropped the 64K page size support
>> from the 64-bit nohash (Book3E) code.
>> 
>> But we didn't update the dependencies of the PPC_64K_PAGES option,
>> meaning a randconfig can still trigger this code and cause a build
>> breakage, eg:
>>arch/powerpc/include/asm/nohash/64/pgtable.h:14:2: error: #error "Page 
>> size not supported"
>>arch/powerpc/include/asm/nohash/mmu-book3e.h:275:2: error: #error 
>> Unsupported page size
>> 
>> So remove PPC_BOOK3E_64 from the dependencies. This also means we
>> don't need to worry about PPC_FSL_BOOK3E, because that was just trying
>> to prevent the PPC_BOOK3E_64=y && PPC_FSL_BOOK3E=y case.
>
> Does it means some cleanup could be done, for instance:
>
> arch/powerpc/include/asm/nohash/64/pgalloc.h:#ifndef CONFIG_PPC_64K_PAGES
> arch/powerpc/include/asm/nohash/64/pgalloc.h:#endif /* 
> CONFIG_PPC_64K_PAGES */
> arch/powerpc/include/asm/nohash/64/pgtable.h:#ifdef CONFIG_PPC_64K_PAGES
> arch/powerpc/include/asm/nohash/64/slice.h:#ifdef CONFIG_PPC_64K_PAGES
> arch/powerpc/include/asm/nohash/64/slice.h:#else /* CONFIG_PPC_64K_PAGES */
> arch/powerpc/include/asm/nohash/64/slice.h:#endif /* 
> !CONFIG_PPC_64K_PAGES */
> arch/powerpc/include/asm/nohash/pte-book3e.h:#ifdef CONFIG_PPC_64K_PAGES
>
> arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
> arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
> arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#endif /* CONFIG_PPC_64K_PAGES */
> arch/powerpc/mm/tlb_low_64e.S:#ifndef CONFIG_PPC_64K_PAGES
> arch/powerpc/mm/tlb_low_64e.S:#ifdef CONFIG_PPC_64K_PAGES

Probably.

Some of the FSL chips do support 64K pages at least according to some
datasheets. I don't know what would be required to get it working, or if
it even works in practice.

So it would be nice to get 64K working on those chips, but probably no
one has time or motivation to do it. In which case yeah all that code
should be removed.

cheers


[PATCH][next] soc: fsl: dpio: fix memory leak of a struct qbman on error exit path

2019-02-19 Thread Colin King
From: Colin Ian King 

Currently the error check for a null reg leaks the struct qbman_swp
that was allocated earlier. Fix this by kfree'ing p on the error exit
path.

Signed-off-by: Colin Ian King 
---
 drivers/soc/fsl/dpio/qbman-portal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c 
b/drivers/soc/fsl/dpio/qbman-portal.c
index 0bddb85c0ae5..5a73397ae79e 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -180,6 +180,7 @@ struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
reg = qbman_read_register(p, QBMAN_CINH_SWP_CFG);
if (!reg) {
pr_err("qbman: the portal is not enabled!\n");
+   kfree(p);
return NULL;
}
 
-- 
2.20.1



Re: [PATCH] powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()

2019-02-19 Thread Balbir Singh
On Mon, Feb 18, 2019 at 11:49:18AM +1100, Michael Ellerman wrote:
> Balbir Singh  writes:
> > On Sun, Feb 17, 2019 at 07:34:20PM +1100, Michael Ellerman wrote:
> >> Balbir Singh  writes:
> >> > On Sat, Feb 16, 2019 at 08:22:12AM -0600, Segher Boessenkool wrote:
> >> >> On Sat, Feb 16, 2019 at 09:55:11PM +1100, Balbir Singh wrote:
> >> >> > On Thu, Feb 14, 2019 at 05:23:39PM +1100, Michael Ellerman wrote:
> >> >> > > In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT
> >> >> > > rather than just checking that the value is non-zero, e.g.:
> >> >> > > 
> >> >> > >   static inline int pgd_present(pgd_t pgd)
> >> >> > >   {
> >> >> > >  -   return !pgd_none(pgd);
> >> >> > >  +   return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
> >> >> > >   }
> >> >> > > 
> >> >> > > Unfortunately this is broken on big endian, as the result of the
> >> >> > > bitwise && is truncated to int, which is always zero because
> >> >> 
> >> >> (Bitwise "&" of course).
> >> >> 
> >> >> > Not sure why that should happen, why is the result an int? What
> >> >> > causes the casting of pgd_t & be64 to be truncated to an int.
> >> >> 
> >> >> Yes, it's not obvious as written...  It's simply that the return type of
> >> >> pgd_present is int.  So it is truncated _after_ the bitwise and.
> >> >>
> >> >
> >> > Thanks, I am surprised the compiler does not complain about the 
> >> > truncation
> >> > of bits. I wonder if we are missing -Wconversion
> >> 
> >> Good luck with that :)
> >> 
> >> What I should start doing is building with it enabled and then comparing
> >> the output before and after commits to make sure we're not introducing
> >> new cases.
> >
> > Fair enough, my point was that the compiler can help out. I'll see what
> > -Wconversion finds on my local build :)
> 
> I get about 43MB of warnings here :)
>

I got about 181M with a failed build :(, but the warnings pointed to some cases
that can be a good project for cleanup

For example

1. 
static inline long regs_return_value(struct pt_regs *regs)
{
if (is_syscall_success(regs))
return regs->gpr[3];
else
return -regs->gpr[3];
}

In the case of is_syscall_success() returning false, we should ensure that
regs->gpr[3] is negative and capped within a certain limit, but it might
be an expensive check

2.
static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
unsigned int index, unsigned int hidx)
{
hpte_slot_array[index] = (hidx << 1) | 0x1;
}

hidx is 3 bits, but the argument is unsigned int. The caller probably does a
hidx & 0x7, but it's not clear from the code

3. hash__pmd_bad (pmd_bad) and hash__pud_bad (pud_bad) have issues similar to
what was found, but since the page table indices are below 32, the macros are
safe :) (a minimal illustration of the truncation follows below)

And a few more, but I am not sure why I spent time looking at possible issues,
maybe I am being stupid or overly pessimistic :)
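
For reference, a self-contained illustration of the truncation that bit
pgd_present() (constants hard-coded for the example; on big endian
cpu_to_be64() is a no-op, so the present bit sits in the high word that
the implicit conversion to int discards):

#include <stdint.h>

#define MY_PAGE_PRESENT	0x8000000000000000ULL	/* bit 63, as on book3s/64 */

static int pgd_present_buggy(uint64_t pgd_be)
{
	/* the & yields a 64-bit value, but the int return type keeps
	 * only the low 32 bits: always 0 on big endian.
	 */
	return pgd_be & MY_PAGE_PRESENT;
}

static int pgd_present_fixed(uint64_t pgd_be)
{
	return !!(pgd_be & MY_PAGE_PRESENT);	/* collapse to 0/1 first */
}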

Balbir




Applied "SoC: imx-sgtl5000: add missing put_device()" to the asoc tree

2019-02-19 Thread Mark Brown
The patch

   SoC: imx-sgtl5000: add missing put_device()

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 8fa857da9744f513036df1c43ab57f338941ae7d Mon Sep 17 00:00:00 2001
From: Wen Yang 
Date: Mon, 18 Feb 2019 15:13:47 +
Subject: [PATCH] SoC: imx-sgtl5000: add missing put_device()

The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.

Detected by coccinelle with the following warnings:
./sound/soc/fsl/imx-sgtl5000.c:169:1-7: ERROR: missing put_device;
call of_find_device_by_node on line 105, but without a corresponding
object release within this function.
./sound/soc/fsl/imx-sgtl5000.c:177:1-7: ERROR: missing put_device;
call of_find_device_by_node on line 105, but without a corresponding
object release within this function.
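
The pattern being enforced, as a minimal sketch (variable names follow
this driver; error handling elided):

	ssi_pdev = of_find_device_by_node(ssi_np);	/* takes a reference */
	if (!ssi_pdev)
		goto fail;
	/* ... probe uses ssi_pdev ... */
	put_device(&ssi_pdev->dev);			/* release the reference */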

Signed-off-by: Wen Yang 
Cc: Timur Tabi 
Cc: Nicolin Chen 
Cc: Xiubo Li 
Cc: Fabio Estevam 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: NXP Linux Team 
Cc: alsa-de...@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/imx-sgtl5000.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/imx-sgtl5000.c b/sound/soc/fsl/imx-sgtl5000.c
index b6cb80480b60..bf8597f57dce 100644
--- a/sound/soc/fsl/imx-sgtl5000.c
+++ b/sound/soc/fsl/imx-sgtl5000.c
@@ -108,6 +108,7 @@ static int imx_sgtl5000_probe(struct platform_device *pdev)
ret = -EPROBE_DEFER;
goto fail;
}
+   put_device(&ssi_pdev->dev);
codec_dev = of_find_i2c_device_by_node(codec_np);
if (!codec_dev) {
	dev_dbg(&pdev->dev, "failed to find codec platform device\n");
-- 
2.20.1



Re: [PATCHv6 3/4] pci: layerscape: Add the EP mode support.

2019-02-19 Thread Lorenzo Pieralisi
On Tue, Jan 22, 2019 at 02:33:27PM +0800, Xiaowei Bao wrote:
> Add PCIe EP mode support for the layerscape platform.
> 
> Signed-off-by: Xiaowei Bao 
> Reviewed-by: Minghuan Lian 
> Reviewed-by: Zhiqiang Hou 
> Reviewed-by: Kishon Vijay Abraham I 
> ---
> depends on: https://patchwork.kernel.org/project/linux-pci/list/?series=66177
> 
> v2:
>  - remove the EP mode check function.
> v3:
>  - modify the return value when entering the default case.
> v4:
>  - no change.
> v5:
>  - no change.
> v6:
>  - modify the code based on the submitted patches of the EP framework.

Can I apply this series to my pci/endpoint branch (where I queued
Kishon's EP features rework patches)? Can you check, please?

Thanks,
Lorenzo

>  drivers/pci/controller/dwc/Makefile|2 +-
>  drivers/pci/controller/dwc/pci-layerscape-ep.c |  157 
> 
>  2 files changed, 158 insertions(+), 1 deletions(-)
>  create mode 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c
> 
> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index 7bcdcdf..b5f3b83 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone.o
> -obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
> +obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
>  obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
>  obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
>  obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> new file mode 100644
> index 000..ddc2dbb
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -0,0 +1,157 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe controller EP driver for Freescale Layerscape SoCs
> + *
> + * Copyright (C) 2018 NXP Semiconductor.
> + *
> + * Author: Xiaowei Bao 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "pcie-designware.h"
> +
> +#define PCIE_DBI2_OFFSET 0x1000  /* DBI2 base address*/
> +
> +struct ls_pcie_ep {
> + struct dw_pcie  *pci;
> +};
> +
> +#define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> +
> +static int ls_pcie_establish_link(struct dw_pcie *pci)
> +{
> + return 0;
> +}
> +
> +static const struct dw_pcie_ops ls_pcie_ep_ops = {
> + .start_link = ls_pcie_establish_link,
> +};
> +
> +static const struct of_device_id ls_pcie_ep_of_match[] = {
> + { .compatible = "fsl,ls-pcie-ep",},
> + { },
> +};
> +
> +static const struct pci_epc_features ls_pcie_epc_features = {
> + .linkup_notifier = false,
> + .msi_capable = true,
> + .msix_capable = false,
> +};
> +
> +static const struct pci_epc_features*
> +ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
> +{
> + return &ls_pcie_epc_features;
> +}
> +
> +static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct pci_epc *epc = ep->epc;
> + enum pci_barno bar;
> +
> + for (bar = BAR_0; bar <= BAR_5; bar++)
> + dw_pcie_ep_reset_bar(pci, bar);
> +}
> +
> +static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> +   enum pci_epc_irq_type type, u16 interrupt_num)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +
> + switch (type) {
> + case PCI_EPC_IRQ_LEGACY:
> + return dw_pcie_ep_raise_legacy_irq(ep, func_no);
> + case PCI_EPC_IRQ_MSI:
> + return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> + case PCI_EPC_IRQ_MSIX:
> + return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> + default:
> + dev_err(pci->dev, "UNKNOWN IRQ type\n");
> + return -EINVAL;
> + }
> +}
> +
> +static struct dw_pcie_ep_ops pcie_ep_ops = {
> + .ep_init = ls_pcie_ep_init,
> + .raise_irq = ls_pcie_ep_raise_irq,
> + .get_features = ls_pcie_ep_get_features,
> +};
> +
> +static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
> + struct platform_device *pdev)
> +{
> + struct dw_pcie *pci = pcie->pci;
> + struct device *dev = pci->dev;
> + struct dw_pcie_ep *ep;
> + struct resource *res;
> + int ret;
> +
> + ep = &pcie->ep;
> + ep->ops = &pcie_ep_ops;
> +
> + res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
> + if (!res)
> + return -EINVAL;
> +
> + ep->phys_base = res->start;
> + ep->addr_size = resource_size(res);
> +
> + ret = dw_pcie_ep_init(ep);
> + if (ret) {
> + dev_err(dev, "failed to initialize endpoint\n");
> + return ret;
> + }
> +
> + return 0;
> +}
> 

Re: [PATCH] powerpc/mm: Handle mmap_min_addr correctly in get_unmapped_area callback

2019-02-19 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> After we ALIGN up the address we need to make sure we didn't overflow
> and end up with a zero address. In that case, we need to make sure that
> the returned address is greater than mmap_min_addr.
>
> Also when doing a top-down search the low_limit is not PAGE_SIZE but rather
> max(PAGE_SIZE, mmap_min_addr). This handles cases in which mmap_min_addr >
> PAGE_SIZE.
>
> This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
> run as non root user for
>
> mmap(-1, MAP_HUGETLB)
> mmap(-1, MAP_HUGETLB)
>
> We also avoid the first mmap(-1, MAP_HUGETLB) returning a NULL address as
> the mmap address with this change.

So we think this is not a security issue, because it only affects
whether we choose an address below mmap_min_addr, not whether we
actually allow that address to be mapped.

ie. there are existing capability checks to prevent a user mapping below
mmap_min_addr and those will still be honoured even without this fix.

However there is a bug in that a non-root user requesting address -1
will be given address 0 which will then fail, whereas they should have
been given something else that would have succeeded.

Did I get that all right?
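
(For reference, the overflow being described, using a simplified form of
the kernel's ALIGN() macro; the 16M huge page size is just an example:)

#define ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))

/* hint address -1 aligned up to a 16M huge page wraps to 0 */
unsigned long addr = ALIGN(0xffffffffffffffffUL, 0x1000000UL);	/* == 0 */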

> CC: Laurent Dufour 
> Signed-off-by: Aneesh Kumar K.V 

Seems like this should have a Fixes: tag?

cheers

> ---
>  arch/powerpc/mm/hugetlbpage-radix.c |  5 +++--
>  arch/powerpc/mm/slice.c | 10 ++
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage-radix.c 
> b/arch/powerpc/mm/hugetlbpage-radix.c
> index 2486bee0f93e..97c7a39ebc00 100644
> --- a/arch/powerpc/mm/hugetlbpage-radix.c
> +++ b/arch/powerpc/mm/hugetlbpage-radix.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -73,7 +74,7 @@ radix__hugetlb_get_unmapped_area(struct file *file, 
> unsigned long addr,
>   if (addr) {
>   addr = ALIGN(addr, huge_page_size(h));
>   vma = find_vma(mm, addr);
> - if (high_limit - len >= addr &&
> + if (high_limit - len >= addr && addr >= mmap_min_addr &&
>   (!vma || addr + len <= vm_start_gap(vma)))
>   return addr;
>   }
> @@ -83,7 +84,7 @@ radix__hugetlb_get_unmapped_area(struct file *file, 
> unsigned long addr,
>*/
>   info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>   info.length = len;
> - info.low_limit = PAGE_SIZE;
> + info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>   info.high_limit = mm->mmap_base + (high_limit - DEFAULT_MAP_WINDOW);
>   info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>   info.align_offset = 0;
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 06898c13901d..aec91dbcdc0b 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -377,6 +378,7 @@ static unsigned long slice_find_area_topdown(struct 
> mm_struct *mm,
>   int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
>   unsigned long addr, found, prev;
>   struct vm_unmapped_area_info info;
> + unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr);
>  
>   info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>   info.length = len;
> @@ -393,7 +395,7 @@ static unsigned long slice_find_area_topdown(struct 
> mm_struct *mm,
>   if (high_limit > DEFAULT_MAP_WINDOW)
>   addr += mm->context.slb_addr_limit - DEFAULT_MAP_WINDOW;
>  
> - while (addr > PAGE_SIZE) {
> + while (addr > min_addr) {
>   info.high_limit = addr;
>   if (!slice_scan_available(addr - 1, available, 0, &addr))
>   continue;
> @@ -405,8 +407,8 @@ static unsigned long slice_find_area_topdown(struct 
> mm_struct *mm,
>* Check if we need to reduce the range, or if we can
>* extend it to cover the previous available slice.
>*/
> - if (addr < PAGE_SIZE)
> - addr = PAGE_SIZE;
> + if (addr < min_addr)
> + addr = min_addr;
>   else if (slice_scan_available(addr - 1, available, 0, &addr)) {
>   addr = prev;
>   goto prev_slice;
> @@ -528,7 +530,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
> unsigned long len,
>   addr = _ALIGN_UP(addr, page_size);
>   slice_dbg(" aligned addr=%lx\n", addr);
>   /* Ignore hint if it's too large or overlaps a VMA */
> - if (addr > high_limit - len ||
> + if (addr > high_limit - len || addr < mmap_min_addr ||
>   !slice_area_is_free(mm, addr, len))
>   addr = 0;
>   }
> -- 
> 2.20.1


Reading `/sys/kernel/debug/kmemleak` takes 3 s and content not shown

2019-02-19 Thread Paul Menzel
Dear Linux folks,


On an IBM S822LC (8335-GTA) with Ubuntu 18.10 and Linux 5.0-rc5+,
accessing `/sys/kernel/debug/kmemleak` takes a long time. According to
strace, it takes three seconds.

```
$ sudo strace -tt -T cat /sys/kernel/debug/kmemleak
10:35:49.861641 execve("/bin/cat", ["cat", "/sys/kernel/debug/kmemleak"], 
0x7dbcb518 /* 16 vars */) = 0 <0.000293>
10:35:49.862112 brk(NULL)   = 0x75b12a5 <0.12>
10:35:49.862190 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory) <0.15>
10:35:49.862261 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or 
directory) <0.15>
10:35:49.862324 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 
<0.18>
10:35:49.862389 fstat(3, {st_mode=S_IFREG|0644, st_size=143482, ...}) = 0 
<0.11>
10:35:49.862444 mmap(NULL, 143482, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x7ce4a115 <0.17>
10:35:49.862501 close(3)= 0 <0.11>
10:35:49.862550 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory) <0.15>
10:35:49.862615 openat(AT_FDCWD, "/lib/powerpc64le-linux-gnu/libc.so.6", 
O_RDONLY|O_CLOEXEC) = 3 <0.19>
10:35:49.862676 read(3, 
"\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0pN\2\0\0\0\0\0"..., 832) = 832 
<0.11>
10:35:49.862731 fstat(3, {st_mode=S_IFREG|0755, st_size=2310856, ...}) = 0 
<0.11>
10:35:49.862783 mmap(NULL, 2380672, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ce4a0f0 <0.18>
10:35:49.862842 mprotect(0x7ce4a112, 65536, PROT_NONE) = 0 <0.19>
10:35:49.862899 mmap(0x7ce4a113, 131072, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22) = 0x7ce4a113 <0.19>
10:35:49.862990 close(3)= 0 <0.10>
10:35:49.863110 mprotect(0x7ce4a113, 65536, PROT_READ) = 0 <0.17>
10:35:49.863192 mprotect(0x75ad43b, 65536, PROT_READ) = 0 <0.16>
10:35:49.863252 mprotect(0x7ce4a11e, 65536, PROT_READ) = 0 <0.15>
10:35:49.863305 munmap(0x7ce4a115, 143482) = 0 <0.22>
10:35:49.863446 brk(NULL)   = 0x75b12a5 <0.11>
10:35:49.863495 brk(0x75b12a8)  = 0x75b12a8 <0.14>
10:35:49.863561 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", 
O_RDONLY|O_CLOEXEC) = 3 <0.19>
10:35:49.863624 fstat(3, {st_mode=S_IFREG|0644, st_size=6035920, ...}) = 0 
<0.10>
10:35:49.863677 mmap(NULL, 6035920, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x7ce4a093 <0.17>
10:35:49.863736 close(3)= 0 <0.11>
10:35:49.863828 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) 
= 0 <0.10>
10:35:49.863881 openat(AT_FDCWD, "/sys/kernel/debug/kmemleak", O_RDONLY) = 3 
<0.34>
10:35:49.863956 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.29>
10:35:49.864028 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.11>
10:35:49.864076 mmap(NULL, 262144, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ce4a08f <0.17>
10:35:49.864146 read(3, "", 131072) = 0 <3.528503>
10:35:53.392797 munmap(0x7ce4a08f, 262144) = 0 <0.92>
10:35:53.392957 close(3)= 0 <0.29>
10:35:53.393038 close(1)= 0 <0.10>
10:35:53.393078 close(2)= 0 <0.09>
10:35:53.393123 exit_group(0)   = ?
10:35:53.393280 +++ exited with 0 +++
$ uname -a
Linux flughafenberlinbrandenburgwillybrandt 5.0.0-rc5+ #1 SMP Thu Feb 7 
11:23:11 CET 2019 ppc64le ppc64le ppc64le GNU/Linux
$ more /proc/version
Linux version 5.0.0-rc5+ (pmenzel@flughafenberlinbrandenburgwillybrandt) (gcc 
version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)) #1 SMP Thu Feb 7 11:23:11 CET 2019
$ more /proc/cmdline
root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro rootflags=subvol=@ quiet 
splash
$ grep KMEMLEAK /boot/config-5.0.0-rc5+
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=1
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
$ grep KMEMLEAK /boot/config-4.18.0-rc4+
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=1
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
$ dmesg | grep leak
[4.407957] kmemleak: Kernel memory leak detector initialized
[4.407959] kmemleak: Automatic memory scanning thread started
[745989.625624] kmemleak: 1 new suspected memory leaks (see 
/sys/kernel/debug/kmemleak)
[1002619.951902] kmemleak: 1 new suspected memory leaks (see 
/sys/kernel/debug/kmemleak)
```

Unfortunately, the leaks supposedly stored in that file are not shown
either.
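
For what it's worth, the file only reports what the last scan recorded,
and a scan can be requested explicitly through the same debugfs file
(standard kmemleak interface, see Documentation/dev-tools/kmemleak.rst):

```
# trigger an immediate scan, then read the results
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
```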


Kind regards,

Paul
[0.00] hash-mmu: Page sizes from device-tree:
[0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, 
avpnm=0x, tlbiel=1, penc=0
[0.00] hash-mmu: base_shift=12: shift=16, sllp=0x, 
avpnm=0x, tlbiel=1, penc=7
[0.00] hash-mmu: base_shift=12: shift=24, sllp=0x, 

Re: [PATCH] powerpc: fix 32-bit KVM-PR lockup and panic with MacOS guest

2019-02-19 Thread Mark Cave-Ayland
On 19/02/2019 04:55, Michael Ellerman wrote:

> Mark Cave-Ayland  writes:
>> On 11/02/2019 00:30, Benjamin Herrenschmidt wrote:
>>
>>> On Fri, 2019-02-08 at 14:51 +, Mark Cave-Ayland wrote:

 Indeed, but there are still some questions to be asked here:

 1) Why were these bits removed from the original bitmask in the first 
 place without
 it being documented in the commit message?

 2) Is this the right fix? I'm told that MacOS guests already run without 
 this patch
 on a G5 under 64-bit KVM-PR which may suggest that this is a workaround 
 for another
 bug elsewhere in the 32-bit powerpc code.


 If you think that these points don't matter, then I'm happy to resubmit 
 the patch
 as-is based upon your comments above.
>>>
>>> We should write a test case to verify that FE0/FE1 are properly
>>> preserved/context-switched etc... I bet if we accidentally wiped them,
>>> we wouldn't notice 99.9% of the time.
>>
>> Right, I guess it's more likely to cause an issue in the KVM PR case because
>> the guest can alter the flags in a way that doesn't go through the normal
>> process switch mechanism.
>>
>> The original patchset at
>> https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg98326.html 
>> does include
>> some tests in the first few patches, but AFAICT they are concerned with the 
>> contents
>> of the FP registers rather than the related MSRs.
> 
> fpu_preempt.c should be able to be adapted to also check the MSR bits.
> 
>> Who is the right person to ask about fixing issues related to context 
>> switching with
>> KVM PR?
> 
> KVM PR doesn't really have a maintainer TBH. Feel like volunteering? :)

Well, I only have a 32-bit Mac Mini here which I'm using to help flush out
bugs in QEMU's emulation, so I can keep an occasional eye on the 32-bit side
of things, but as it's a hobby project, time is quite limited.

As/when time allows I'd be interested to figure out what MacOS 9 does that
causes KVM PR to bail, and whether it's possible to run KVM PR on an SMP
kernel, but certainly I'd need some help from the very knowledgeable people
on these lists.

>> I did add the original author's email address to my first few emails but have
>> had no response back :/
> 
> Cyril who wrote the original FPU patch has moved on to other things.

Ah okay then.


ATB,

Mark.