Re: [PATCH v3 6/7] powerpc/mm: implement set_memory_attr()

2020-02-10 Thread kbuild test robot
Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.6-rc1 next-20200211]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify
the base tree in git format-patch; see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-mm-Implement-set_memory-routines/20200204-030618
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-skiroot_defconfig (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 7.5.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.5.0 make.cross ARCH=powerpc

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/page.h:304:0,
from arch/powerpc/include/asm/mmu.h:132,
from arch/powerpc/include/asm/lppaca.h:47,
from arch/powerpc/include/asm/paca.h:17,
from arch/powerpc/include/asm/current.h:13,
from include/linux/thread_info.h:21,
from include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from arch/powerpc/mm/pageattr.c:9:
   arch/powerpc/mm/pageattr.c: In function 'set_page_attr':
>> arch/powerpc/mm/pageattr.c:85:27: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
 pgprot_t prot = __pgprot((int)data);
  ^
   arch/powerpc/include/asm/pgtable-be-types.h:69:36: note: in definition of macro '__pgprot'
#define __pgprot(x) ((pgprot_t) { (x) })
   ^
   cc1: all warnings being treated as errors

vim +85 arch/powerpc/mm/pageattr.c

75  
76  /*
77   * Set the attributes of a page:
78   *
79   * This function is used by PPC32 at the end of init to set final kernel memory
80   * protection. It includes changing the mapping of the page it is executing from
81   * and data pages it is using.
82   */
83  static int set_page_attr(pte_t *ptep, unsigned long addr, void *data)
84  {
  > 85  pgprot_t prot = __pgprot((int)data);
86  
87  spin_lock(&init_mm.page_table_lock);
88  
89  set_pte_at(&init_mm, addr, ptep, pte_modify(*ptep, prot));
90  flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
91  
92  spin_unlock(&init_mm.page_table_lock);
93  
94  return 0;
95  }
96  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: Problem booting a PowerBook G4 Aluminum after commit cd08f109 with CONFIG_VMAP_STACK=y

2020-02-10 Thread Christophe Leroy




On 10/02/2020 at 13:55, Larry Finger wrote:

On 2/9/20 12:19 PM, Christophe Leroy wrote:

Do you have CONFIG_TRACE_IRQFLAGS in your config ?
If so, can you try the patch below ?

https://patchwork.ozlabs.org/patch/1235081/

Otherwise, can you send me your .config and tell me exactly where it 
stops during the boot.


Christophe,

That patch did not work. My .config is attached.

It does boot if CONFIG_VMAP_STACK is not set.

The console display ends with the "DMA ranges" output. A screen shot is 
also appended.


Larry



Hi,

I tried your config under QEMU, it works.

In fact your console display is looping back on itself; it ends at
"printk: bootconsole [udbg0] disabled".


It looks like it gets stuck when switching to graphics mode. We need
to understand why.


Christophe


Re: [PATCH v2 10/13] powerpc/kprobes: Support kprobes on prefixed instructions

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 06:33, Jordan Niethe wrote:

A prefixed instruction is composed of a word prefix followed by a word
suffix. It does not make sense to be able to have a kprobe on the suffix
of a prefixed instruction, so make this impossible.

Kprobes work by replacing an instruction with a trap and saving that
instruction to be single stepped out of place later. Currently there is
not enough space allocated to keep a prefixed instruction for single
stepping. Increase the amount of space allocated for holding the
instruction copy.

kprobe_post_handler() expects all instructions to be 4 bytes long which
means that it does not function correctly for prefixed instructions.
Add checks for prefixed instructions which will use a length of 8 bytes
instead.

For optprobes we normally patch in loading the instruction we put a
probe on into r4 before calling emulate_step(). We now make space and
patch in loading the suffix into r5 as well.

Signed-off-by: Jordan Niethe 
---
  arch/powerpc/include/asm/kprobes.h   |  5 +--
  arch/powerpc/kernel/kprobes.c| 47 +---
  arch/powerpc/kernel/optprobes.c  | 32 ++-
  arch/powerpc/kernel/optprobes_head.S |  6 
  4 files changed, 63 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/kprobes.h 
b/arch/powerpc/include/asm/kprobes.h
index 66b3f2983b22..0d44ce8a3163 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,12 +38,13 @@ extern kprobe_opcode_t optprobe_template_entry[];
  extern kprobe_opcode_t optprobe_template_op_address[];
  extern kprobe_opcode_t optprobe_template_call_handler[];
  extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_suffix[];
  extern kprobe_opcode_t optprobe_template_call_emulate[];
  extern kprobe_opcode_t optprobe_template_ret[];
  extern kprobe_opcode_t optprobe_template_end[];
  
-/* Fixed instruction size for powerpc */

-#define MAX_INSN_SIZE  1
+/* Prefixed instructions are two words */
+#define MAX_INSN_SIZE  2
  #define MAX_OPTIMIZED_LENGTH  sizeof(kprobe_opcode_t) /* 4 bytes */
  #define MAX_OPTINSN_SIZE  (optprobe_template_end - 
optprobe_template_entry)
  #define RELATIVEJUMP_SIZE sizeof(kprobe_opcode_t) /* 4 bytes */
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 24a56f062d9e..b061deba4fe7 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -104,17 +104,30 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
  
  int arch_prepare_kprobe(struct kprobe *p)

  {
+   int len;
int ret = 0;
+   struct kprobe *prev;
kprobe_opcode_t insn = *p->addr;
+   kprobe_opcode_t prefix = *(p->addr - 1);
  
+	preempt_disable();

if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
ret = -EINVAL;
} else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
ret = -EINVAL;
+   } else if (IS_PREFIX(prefix)) {
+   printk("Cannot register a kprobe on the second word of prefixed instruction\n");
+   ret = -EINVAL;
+   }
+   prev = get_kprobe(p->addr - 1);
+   if (prev && IS_PREFIX(*prev->ainsn.insn)) {
+   printk("Cannot register a kprobe on the second word of prefixed instruction\n");
+   ret = -EINVAL;
}
  
+

/* insn must be on a special executable page on ppc64.  This is
 * not explicitly required on ppc32 (right now), but it doesn't hurt */
if (!ret) {
@@ -124,14 +137,18 @@ int arch_prepare_kprobe(struct kprobe *p)
}
  
  	if (!ret) {

-   memcpy(p->ainsn.insn, p->addr,
-   MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   if (IS_PREFIX(insn))
+   len = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
+   else
+   len = sizeof(kprobe_opcode_t);
+   memcpy(p->ainsn.insn, p->addr, len);


This code is about to get changed, see 
https://patchwork.ozlabs.org/patch/1232619/



p->opcode = *p->addr;
flush_icache_range((unsigned long)p->ainsn.insn,
(unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
}
  
  	p->ainsn.boostable = 0;

+   preempt_enable_no_resched();
return ret;
  }
  NOKPROBE_SYMBOL(arch_prepare_kprobe);
@@ -216,10 +233,11 @@ NOKPROBE_SYMBOL(arch_prepare_kretprobe);
  static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
  {
int ret;
-   unsigned int insn = *p->ainsn.insn;
+   unsigned int insn = p->ainsn.insn[0];
+   unsigned int suffix = p->ainsn.insn[1];
  
  	/* regs->nip is also adjusted if emulate_step returns 1 */

-   ret = 

Re: [PATCH V13] mm/debug: Add tests validating architecture page table helpers

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 03:25, Anshuman Khandual wrote:



On 02/10/2020 04:36 PM, Russell King - ARM Linux admin wrote:

There are good reasons for the way ARM does stuff.  The generic crap was
written without regard for the circumstances that ARM has, and thus is
entirely unsuitable for 32-bit ARM.


Since we don't have an agreement here, let's just settle with disabling the
test for now on platforms where the build fails. CONFIG_EXPERT is enabling
this test for better adaptability and coverage, hence how about reframing
the config like this? This at least conveys the fact that EXPERT only
works when the platform is neither IA64 nor ARM.


Agreed



config DEBUG_VM_PGTABLE
bool "Debug arch page table for semantics compliance"
depends on MMU
depends on ARCH_HAS_DEBUG_VM_PGTABLE || (EXPERT &&  !(IA64 || ARM))


I think it may be better to have a dedicated 'depends' line:

depends on !IA64 && !ARM
depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT

The day arm and/or ia64 are ready for building the test, we can remove
that 'depends' line.



default n if !ARCH_HAS_DEBUG_VM_PGTABLE
default y if DEBUG_VM



Christophe


Re: [PATCH v2 09/13] powerpc/xmon: Dump prefixed instructions

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 06:33, Jordan Niethe wrote:

Currently when xmon is dumping instructions it reads a word at a time
and then prints that instruction (either as a hex number or by
disassembling it). For prefixed instructions it would be nice to show
the prefix and suffix together. Use read_instr() so that if a prefix
is encountered its suffix is loaded too. Then print these in the form:
 prefix:suffix
Xmon uses the disassembly routines from GNU binutils. These currently do
not support prefixed instructions so we will not disassemble the
prefixed instructions yet.

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
---
  arch/powerpc/xmon/xmon.c | 50 +++-
  1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 0b085642bbe7..513901ee18b0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2903,6 +2903,21 @@ prdump(unsigned long adrs, long ndump)
}
  }
  
+static bool instrs_are_equal(unsigned long insta, unsigned long suffixa,

+unsigned long instb, unsigned long suffixb)
+{
+   if (insta != instb)
+   return false;
+
+   if (!IS_PREFIX(insta) && !IS_PREFIX(instb))
+   return true;
+
+   if (IS_PREFIX(insta) && IS_PREFIX(instb))
+   return suffixa == suffixb;
+
+   return false;
+}
+
  typedef int (*instruction_dump_func)(unsigned long inst, unsigned long addr);
  
  static int

@@ -2911,12 +2926,11 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
  {
int nr, dotted;
unsigned long first_adr;
-   unsigned int inst, last_inst = 0;
-   unsigned char val[4];
+   unsigned int inst, suffix, last_inst = 0, last_suffix = 0;
  
  	dotted = 0;

-   for (first_adr = adr; count > 0; --count, adr += 4) {
-   nr = mread(adr, val, 4);
+   for (first_adr = adr; count > 0; --count, adr += nr) {
+   nr = read_instr(adr, &inst, &suffix);
if (nr == 0) {
if (praddr) {
const char *x = fault_chars[fault_type];
@@ -2924,8 +2938,9 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
}
break;
}
-   inst = GETWORD(val);
-   if (adr > first_adr && inst == last_inst) {
+   if (adr > first_adr && instrs_are_equal(inst, suffix,
+   last_inst,
+   last_suffix)) {
if (!dotted) {
printf(" ...\n");
dotted = 1;
@@ -2934,11 +2949,24 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
}
dotted = 0;
last_inst = inst;
-   if (praddr)
-   printf(REG"  %.8x", adr, inst);
-   printf("\t");
-   dump_func(inst, adr);
-   printf("\n");
+   last_suffix = suffix;
+   if (IS_PREFIX(inst)) {
+   if (praddr)
+   printf(REG"  %.8x:%.8x", adr, inst, suffix);
+   printf("\t");
+   /*
+* Just use this until binutils ppc disassembly
+* prints prefixed instructions.
+*/
+   printf("%.8x:%.8x", inst, suffix);
+   printf("\n");
+   } else {
+   if (praddr)
+   printf(REG"  %.8x", adr, inst);
+   printf("\t");
+   dump_func(inst, adr);
+   printf("\n");
+   }


What about:


if (praddr) {
printf(REG"  %.8x", adr, inst);
if (IS_PREFIX(inst))
printf(":%.8x", suffix);
}
printf("\t");
if (IS_PREFIX(inst))
printf("%.8x:%.8x", inst, suffix);
else
dump_func(inst, adr);
printf("\n");


}
return adr - first_adr;
  }



Christophe


Re: [PATCH v2 08/13] powerpc/xmon: Add initial support for prefixed instructions

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 06:33, Jordan Niethe wrote:

A prefixed instruction is composed of a word prefix and a word suffix.
It does not make sense to be able to have a breakpoint on the suffix of
a prefixed instruction, so make this impossible.

When leaving xmon_core() we check to see if we are currently at a
breakpoint. If this is the case, the breakpoint needs to be proceeded
from. Initially emulate_step() is tried, but if this fails then we need
to execute the saved instruction out of line. The NIP is set to the
address of bpt::instr[] for the current breakpoint.  bpt::instr[]
contains the instruction replaced by the breakpoint, followed by a trap
instruction.  After bpt::instr[0] is executed and we hit the trap we
enter back into xmon_bpt(). We know that if we got here and the offset
indicates we are at bpt::instr[1] then we have just executed out of line
so we can put the NIP back to the instruction after the breakpoint
location and continue on.

Adding prefixed instructions complicates this as the bpt::instr[1] needs
to be used to hold the suffix. To deal with this make bpt::instr[] big
enough for three word instructions.  bpt::instr[2] contains the trap,
and in the case of word instructions pad bpt::instr[1] with a noop.

No support for disassembling prefixed instructions.

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
---
  arch/powerpc/xmon/xmon.c | 82 ++--
  1 file changed, 71 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 897e512c6379..0b085642bbe7 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -97,7 +97,8 @@ static long *xmon_fault_jmp[NR_CPUS];
  /* Breakpoint stuff */
  struct bpt {
unsigned long   address;
-   unsigned intinstr[2];
+   /* Prefixed instructions cannot cross 64-byte boundaries */
+   unsigned intinstr[3] __aligned(64);
atomic_tref_count;
int enabled;
unsigned long   pad;
@@ -113,6 +114,7 @@ static struct bpt bpts[NBPTS];
  static struct bpt dabr;
  static struct bpt *iabr;
   static unsigned bpinstr = 0x7fe00008; /* trap */
+static unsigned nopinstr = 0x60000000; /* nop */


Use PPC_INST_NOP instead of 0x60000000

And this nopinstr variable will never change. Why not use PPC_INST_NOP
directly in the code?


  
  #define BP_NUM(bp)	((bp) - bpts + 1)
  
@@ -120,6 +122,7 @@ static unsigned bpinstr = 0x7fe8;	/* trap */

  static int cmds(struct pt_regs *);
  static int mread(unsigned long, void *, int);
  static int mwrite(unsigned long, void *, int);
+static int read_instr(unsigned long, unsigned int *, unsigned int *);
  static int handle_fault(struct pt_regs *);
  static void byterev(unsigned char *, int);
  static void memex(void);
@@ -706,7 +709,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
bp = at_breakpoint(regs->nip);
if (bp != NULL) {
int stepped = emulate_step(regs, bp->instr[0],
-  PPC_NO_SUFFIX);
+  bp->instr[1]);
if (stepped == 0) {
regs->nip = (unsigned long) &bp->instr[0];
atomic_inc(&bp->ref_count);
@@ -761,8 +764,8 @@ static int xmon_bpt(struct pt_regs *regs)
  
  	/* Are we at the trap at bp->instr[1] for some bp? */

bp = in_breakpoint_table(regs->nip, &offset);
-   if (bp != NULL && offset == 4) {
-   regs->nip = bp->address + 4;
+   if (bp != NULL && (offset == 4 || offset == 8)) {
+   regs->nip = bp->address + offset;
atomic_dec(&bp->ref_count);
return 1;
}
@@ -864,7 +867,8 @@ static struct bpt *in_breakpoint_table(unsigned long nip, 
unsigned long *offp)
return NULL;
off %= sizeof(struct bpt);
if (off != offsetof(struct bpt, instr[0])
-   && off != offsetof(struct bpt, instr[1]))
+   && off != offsetof(struct bpt, instr[1])
+   && off != offsetof(struct bpt, instr[2]))
return NULL;
*offp = off - offsetof(struct bpt, instr[0]);
return (struct bpt *) (nip - off);
@@ -881,9 +885,18 @@ static struct bpt *new_breakpoint(unsigned long a)
  
	for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {

if (!bp->enabled && atomic_read(>ref_count) == 0) {
+   /*
+* Prefixed instructions are two words, but regular
+* instructions are only one. Use a nop to pad out the
+* regular instructions so that we can place the trap
+* at the same plac. For prefixed instructions the nop


plac ==> place


+* will get overwritten during insert_bpts().
+*/
bp->address = a;
-   

Re: [PATCH v2 06/13] powerpc: Support prefixed instructions in alignment handler

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 06:33, Jordan Niethe wrote:

Alignment interrupts can be caused by prefixed instructions accessing
memory. In the alignment handler the instruction that caused the
exception is loaded and emulation is attempted. If the instruction is a
prefixed instruction, load both the prefix and suffix to emulate. After
emulating, increment the NIP by 8.

Prefixed instructions are not permitted to cross 64-byte boundaries. If
they do the alignment interrupt is invoked with SRR1 BOUNDARY bit set.
If this occurs send a SIGBUS to the offending process if in user mode.
If in kernel mode call bad_page_fault().

Signed-off-by: Jordan Niethe 
---
v2: - Move __get_user_instr() and __get_user_instr_inatomic() to this
commit (previously in "powerpc sstep: Prepare to support prefixed
instructions").
 - Rename sufx to suffix
 - Use a macro for calculating instruction length
---
  arch/powerpc/include/asm/uaccess.h | 30 ++
  arch/powerpc/kernel/align.c|  8 +---
  arch/powerpc/kernel/traps.c| 21 -
  3 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 2f500debae21..30f63a81c8d8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -474,4 +474,34 @@ static __must_check inline bool user_access_begin(const 
void __user *ptr, size_t
  #define unsafe_copy_to_user(d, s, l, e) \
unsafe_op_wrap(raw_copy_to_user_allowed(d, s, l), e)
  


Could it go close to the other __get_user() and friends instead of being
at the end of the file?



+/*
+ * When reading an instruction iff it is a prefix, the suffix needs to be also
+ * loaded.
+ */
+#define __get_user_instr(x, y, ptr)\
+({ \
+   long __gui_ret = 0; \
+   y = 0;  \
+   __gui_ret = __get_user(x, ptr); \
+   if (!__gui_ret) {   \
+   if (IS_PREFIX(x))   \


Does this apply to PPC32?
If not, can we make sure IS_PREFIX is constant 0 on PPC32 so that the
second read gets dropped at compile time?


Can we instead do :

if (!__gui_ret && IS_PREFIX(x))


+   __gui_ret = __get_user(y, ptr + 1); \
+   }   \
+   \
+   __gui_ret;  \
+})
+
+#define __get_user_instr_inatomic(x, y, ptr)   \
+({ \
+   long __gui_ret = 0; \
+   y = 0;  \
+   __gui_ret = __get_user_inatomic(x, ptr);\
+   if (!__gui_ret) {   \
+   if (IS_PREFIX(x))   \


Same comments as above


+   __gui_ret = __get_user_inatomic(y, ptr + 1);\
+   }   \
+   \
+   __gui_ret;  \
+})
+
  #endif/* _ARCH_POWERPC_UACCESS_H */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index ba3bf5c3ab62..e42cfaa616d3 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -293,7 +293,7 @@ static int emulate_spe(struct pt_regs *regs, unsigned int 
reg,
  
  int fix_alignment(struct pt_regs *regs)

  {
-   unsigned int instr;
+   unsigned int instr, suffix;
struct instruction_op op;
int r, type;
  
@@ -303,13 +303,15 @@ int fix_alignment(struct pt_regs *regs)

 */
CHECK_FULL_REGS(regs);
  
-	if (unlikely(__get_user(instr, (unsigned int __user *)regs->nip)))

+   if (unlikely(__get_user_instr(instr, suffix,
+(unsigned int __user *)regs->nip)))
return -EFAULT;
if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
/* We don't handle PPC little-endian any more... */
if (cpu_has_feature(CPU_FTR_PPC_LE))
return -EIO;
instr = swab32(instr);
+   suffix = swab32(suffix);
}
  
  #ifdef CONFIG_SPE

@@ -334,7 +336,7 @@ int fix_alignment(struct pt_regs *regs)
if ((instr & 0xfc0006fe) == (PPC_INST_COPY & 0xfc0006fe))
return -EIO;
  
-	r = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);

+   r = analyse_instr(&op, regs, instr, suffix);
if (r < 0)
return -EINVAL;
  
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c

index 82a3438300fd..d80b82fc1ae3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -583,6 +583,10 @@ static inline int check_io_access(struct 

Re: [PATCH v2 04/13] powerpc sstep: Add support for prefixed load/stores

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 06:33, Jordan Niethe wrote:

This adds emulation support for the following prefixed integer
load/stores:
   * Prefixed Load Byte and Zero (plbz)
   * Prefixed Load Halfword and Zero (plhz)
   * Prefixed Load Halfword Algebraic (plha)
   * Prefixed Load Word and Zero (plwz)
   * Prefixed Load Word Algebraic (plwa)
   * Prefixed Load Doubleword (pld)
   * Prefixed Store Byte (pstb)
   * Prefixed Store Halfword (psth)
   * Prefixed Store Word (pstw)
   * Prefixed Store Doubleword (pstd)
   * Prefixed Load Quadword (plq)
   * Prefixed Store Quadword (pstq)

the follow prefixed floating-point load/stores:
   * Prefixed Load Floating-Point Single (plfs)
   * Prefixed Load Floating-Point Double (plfd)
   * Prefixed Store Floating-Point Single (pstfs)
   * Prefixed Store Floating-Point Double (pstfd)

and for the following prefixed VSX load/stores:
   * Prefixed Load VSX Scalar Doubleword (plxsd)
   * Prefixed Load VSX Scalar Single-Precision (plxssp)
   * Prefixed Load VSX Vector [0|1]  (plxv, plxv0, plxv1)
   * Prefixed Store VSX Scalar Doubleword (pstxsd)
   * Prefixed Store VSX Scalar Single-Precision (pstxssp)
   * Prefixed Store VSX Vector [0|1] (pstxv, pstxv0, pstxv1)

Signed-off-by: Jordan Niethe 
---
v2: - Combine all load/store patches
 - Fix the name of Type 01 instructions
 - Remove sign extension flag from pstd/pld
 - Rename sufx -> suffix
---
  arch/powerpc/lib/sstep.c | 165 +++
  1 file changed, 165 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 65143ab1bf64..0e21c21ff2be 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -187,6 +187,44 @@ static nokprobe_inline unsigned long xform_ea(unsigned int 
instr,
return ea;
  }
  
+/*

+ * Calculate effective address for a MLS:D-form / 8LS:D-form
+ * prefixed instruction
+ */
+static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
+ unsigned int suffix,
+ const struct pt_regs *regs)
+{
+   int ra, prefix_r;
+   unsigned int  dd;
+   unsigned long ea, d0, d1, d;
+
+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+
+   d0 = instr & 0x3ffff;
+   d1 = suffix & 0xffff;
+   d = (d0 << 16) | d1;
+
+   /*
+* sign extend a 34 bit number
+*/
+   dd = (unsigned int) (d >> 2);
+   ea = (signed int) dd;
+   ea = (ea << 2) | (d & 0x3);
+
+   if (!prefix_r && ra)
+   ea += regs->gpr[ra];
+   else if (!prefix_r && !ra)
+   ; /* Leave ea as is */
+   else if (prefix_r && !ra)
+   ea += regs->nip;
+   else if (prefix_r && ra)
+   ; /* Invalid form. Should already be checked for by caller! */
+
+   return ea;
+}
+
  /*
   * Return the largest power of 2, not greater than sizeof(unsigned long),
   * such that x is a multiple of it.
@@ -1166,6 +1204,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
  unsigned int instr, unsigned int suffix)
  {
unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int suffixopcode, prefixtype, prefix_r;
unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;
@@ -2652,6 +2691,132 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
  
  	}
  
+/*

+ * Prefixed instructions
+ */
+   switch (opcode) {
+   case 1:


Why not include it in the above switch ()?

Should it be enclosed by #ifdef __powerpc64__, or will this new ISA also
apply to 32-bit processors?



+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+   op->update_reg = ra;
+   rd = (suffix >> 21) & 0x1f;
+   op->reg = rd;
+   op->val = regs->gpr[rd];
+
+   suffixopcode = suffix >> 26;
+   prefixtype = (instr >> 24) & 0x3;
+   switch (prefixtype) {
+   case 0: /* Type 00  Eight-Byte Load/Store */
+   if (prefix_r && ra)
+   break;
+   op->ea = mlsd_8lsd_ea(instr, suffix, regs);
+   switch (suffixopcode) {
+   case 41:/* plwa */
+   op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
+   break;
+   case 42:/* plxsd */
+   op->reg = rd + 32;
+   op->type = MKOP(LOAD_VSX, PREFIXED, 8);
+   op->element_size = 8;
+   op->vsx_flags = VSX_CHECK_VEC;
+   break;
+   case 43:/* plxssp */
+   op->reg = rd + 32;
+   

Re: [PATCH v2 03/13] powerpc sstep: Prepare to support prefixed instructions

2020-02-10 Thread Christophe Leroy




On 11/02/2020 at 06:33, Jordan Niethe wrote:

Currently all instructions are a single word long. A future ISA version
will include prefixed instructions which have a double word length. The
functions used for analysing and emulating instructions need to be
modified so that they can handle these new instruction types.

A prefixed instruction is a word prefix followed by a word suffix. All
prefixes uniquely have the primary op-code 1. Suffixes may be valid word
instructions or instructions that only exist as suffixes.

In handling prefixed instructions it will be convenient to treat the
suffix and prefix as separate words. To facilitate this modify
analyse_instr() and emulate_step() to take a suffix as a
parameter. For word instructions it does not matter what is passed in
here - it will be ignored.

We also define a new flag, PREFIXED, to be used in instruction_op:type.
This flag will indicate when emulating an analysed instruction if the
NIP should be advanced by word length or double word length.

The callers of analyse_instr() and emulate_step() will need their own
changes to be able to support prefixed instructions. For now modify them
to pass in 0 as a suffix.

Note that at this point no prefixed instructions are emulated or
analysed - this is just making it possible to do so.

Signed-off-by: Jordan Niethe 
---
v2: - Move definition of __get_user_instr() and
__get_user_instr_inatomic() to "powerpc: Support prefixed instructions
in alignment handler."
 - Use a macro for returning the length of an op
 - Rename sufx -> suffix
 - Define and use PPC_NO_SUFFIX instead of 0
---
  arch/powerpc/include/asm/ppc-opcode.h |  5 +
  arch/powerpc/include/asm/sstep.h  |  9 ++--
  arch/powerpc/kernel/align.c   |  2 +-
  arch/powerpc/kernel/hw_breakpoint.c   |  4 ++--
  arch/powerpc/kernel/kprobes.c |  2 +-
  arch/powerpc/kernel/mce_power.c   |  2 +-
  arch/powerpc/kernel/optprobes.c   |  3 ++-
  arch/powerpc/kernel/uprobes.c |  2 +-
  arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
  arch/powerpc/lib/sstep.c  | 12 ++-
  arch/powerpc/lib/test_emulate_step.c  | 30 +--
  arch/powerpc/xmon/xmon.c  |  5 +++--
  12 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index c1df75edde44..72783bc92e50 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -377,6 +377,11 @@
  #define PPC_INST_VCMPEQUD 0x100000c7
  #define PPC_INST_VCMPEQUB 0x10000006
  
+/* macro to check if a word is a prefix */

+#define IS_PREFIX(x)   (((x) >> 26) == 1)


Can you add an OP_PREFIX to the OP list and use it instead of '1'?


+#define PPC_NO_SUFFIX  0
+#define PPC_INST_LENGTH(x) (IS_PREFIX(x) ? 8 : 4)
+
  /* macros to insert fields into opcodes */
  #define ___PPC_RA(a)  (((a) & 0x1f) << 16)
  #define ___PPC_RB(b)  (((b) & 0x1f) << 11)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 769f055509c9..9ea8904a1549 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -89,11 +89,15 @@ enum instruction_type {
  #define VSX_LDLEFT4   /* load VSX register from left */
  #define VSX_CHECK_VEC 8   /* check MSR_VEC not MSR_VSX for reg >= 32 */
  
+/* Prefixed flag, ORed in with type */

+#define PREFIXED   0x800
+
  /* Size field in type word */
  #define SIZE(n)   ((n) << 12)
  #define GETSIZE(w)((w) >> 12)
  
  #define GETTYPE(t)	((t) & INSTR_TYPE_MASK)

+#define OP_LENGTH(t)   (((t) & PREFIXED) ? 8 : 4)


Is it worth naming it OP_LENGTH? Can't it be mistaken for one of the
OP_xxx values from the list in asm/ppc-opcode.h?


What about GETLENGTH() instead, to be consistent with the above lines?

Christophe


[PATCH v2 13/13] powerpc: Add prefix support to mce_find_instr_ea_and_pfn()

2020-02-10 Thread Jordan Niethe
mce_find_instr_ea_and_pfn analyses an instruction to determine the
effective address that caused the machine check. Update this to load and
pass the suffix to analyse_instr for prefixed instructions.

Signed-off-by: Jordan Niethe 
---
v2: - Rename sufx to suffix
---
 arch/powerpc/kernel/mce_power.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 824eda536f5d..091bab4a5464 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -365,7 +365,7 @@ static int mce_find_instr_ea_and_phys(struct pt_regs *regs, 
uint64_t *addr,
 * in real-mode is tricky and can lead to recursive
 * faults
 */
-   int instr;
+   int instr, suffix = 0;
unsigned long pfn, instr_addr;
struct instruction_op op;
struct pt_regs tmp = *regs;
@@ -374,7 +374,9 @@ static int mce_find_instr_ea_and_phys(struct pt_regs *regs, 
uint64_t *addr,
if (pfn != ULONG_MAX) {
instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
instr = *(unsigned int *)(instr_addr);
-   if (!analyse_instr(&op, &tmp, instr, PPC_NO_SUFFIX)) {
+   if (IS_PREFIX(instr))
+   suffix = *(unsigned int *)(instr_addr + 4);
+   if (!analyse_instr(&op, &tmp, instr, suffix)) {
pfn = addr_to_pfn(regs, op.ea);
*addr = op.ea;
*phys_addr = (pfn << PAGE_SHIFT);
-- 
2.17.1



[PATCH v2 12/13] powerpc/hw_breakpoints: Initial support for prefixed instructions

2020-02-10 Thread Jordan Niethe
Currently when getting an instruction to emulate in
hw_breakpoint_handler() we do not load the suffix of a prefixed
instruction. Ensure we load the suffix if the instruction we need to
emulate is a prefixed instruction.

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
---
 arch/powerpc/kernel/hw_breakpoint.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 3a7ec6760dab..c69189641b05 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -243,15 +243,15 @@ dar_range_overlaps(unsigned long dar, int size, struct 
arch_hw_breakpoint *info)
 static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp,
 struct arch_hw_breakpoint *info)
 {
-   unsigned int instr = 0;
+   unsigned int instr = 0, suffix = 0;
int ret, type, size;
struct instruction_op op;
unsigned long addr = info->address;
 
-   if (__get_user_inatomic(instr, (unsigned int *)regs->nip))
+   if (__get_user_instr_inatomic(instr, suffix, (unsigned int *)regs->nip))
goto fail;
 
-   ret = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
+   ret = analyse_instr(&op, regs, instr, suffix);
type = GETTYPE(op.type);
size = GETSIZE(op.type);
 
@@ -275,7 +275,7 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event *bp,
return false;
}
 
-   if (!emulate_step(regs, instr, PPC_NO_SUFFIX))
+   if (!emulate_step(regs, instr, suffix))
goto fail;
 
return true;
-- 
2.17.1



[PATCH v2 11/13] powerpc/uprobes: Add support for prefixed instructions

2020-02-10 Thread Jordan Niethe
Uprobes can execute instructions out of line. Increase the size of the
buffer used for this so that it works for prefixed instructions. Take
into account the length of prefixed instructions when fixing up the NIP.

Signed-off-by: Jordan Niethe 
---
v2: - Fix typo
- Use macro for instruction length
---
 arch/powerpc/include/asm/uprobes.h | 16 
 arch/powerpc/kernel/uprobes.c  |  4 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/uprobes.h 
b/arch/powerpc/include/asm/uprobes.h
index 2bbdf27d09b5..5516ab27db47 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -14,18 +14,26 @@
 
 typedef ppc_opcode_t uprobe_opcode_t;
 
+/*
+ * Ensure we have enough space for prefixed instructions, which
+ * are double the size of a word instruction, i.e. 8 bytes.
+ */
 #define MAX_UINSN_BYTES 4
-#define UPROBE_XOL_SLOT_BYTES  (MAX_UINSN_BYTES)
+#define UPROBE_XOL_SLOT_BYTES  (2 * MAX_UINSN_BYTES)
 
 /* The following alias is needed for reference from arch-agnostic code */
 #define UPROBE_SWBP_INSN   BREAKPOINT_INSTRUCTION
 #define UPROBE_SWBP_INSN_SIZE  4 /* swbp insn size in bytes */
 
 struct arch_uprobe {
+/*
+ * Ensure there is enough space for prefixed instructions. Prefixed
+ * instructions must not cross 64-byte boundaries.
+ */
union {
-   u32 insn;
-   u32 ixol;
-   };
+   uprobe_opcode_t insn[2];
+   uprobe_opcode_t ixol[2];
+   } __aligned(64);
 };
 
 struct arch_uprobe_task {
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 4ab40c4b576f..7e0334ad5cfe 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -111,7 +111,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 * support doesn't exist and have to fix-up the next instruction
 * to be executed.
 */
-   regs->nip = utask->vaddr + MAX_UINSN_BYTES;
+   regs->nip = utask->vaddr + PPC_INST_LENGTH(auprobe->insn[0]);
 
user_disable_single_step(current);
return 0;
@@ -173,7 +173,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 * emulate_step() returns 1 if the insn was successfully emulated.
 * For all other cases, we need to single-step in hardware.
 */
-   ret = emulate_step(regs, auprobe->insn, PPC_NO_SUFFIX);
+   ret = emulate_step(regs, auprobe->insn[0], auprobe->insn[1]);
if (ret > 0)
return true;
 
-- 
2.17.1



[PATCH v2 10/13] powerpc/kprobes: Support kprobes on prefixed instructions

2020-02-10 Thread Jordan Niethe
A prefixed instruction is composed of a word prefix followed by a word
suffix. It does not make sense to be able to have a kprobe on the suffix
of a prefixed instruction, so make this impossible.

Kprobes work by replacing an instruction with a trap and saving that
instruction to be single stepped out of place later. Currently there is
not enough space allocated to keep a prefixed instruction for single
stepping. Increase the amount of space allocated for holding the
instruction copy.

kprobe_post_handler() expects all instructions to be 4 bytes long which
means that it does not function correctly for prefixed instructions.
Add checks for prefixed instructions which will use a length of 8 bytes
instead.

For optprobes we normally patch in code that loads the instruction we
put a probe on into r4 before calling emulate_step(). We now make space
and patch in loading the suffix into r5 as well.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/include/asm/kprobes.h   |  5 +--
 arch/powerpc/kernel/kprobes.c| 47 +---
 arch/powerpc/kernel/optprobes.c  | 32 ++-
 arch/powerpc/kernel/optprobes_head.S |  6 
 4 files changed, 63 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/kprobes.h 
b/arch/powerpc/include/asm/kprobes.h
index 66b3f2983b22..0d44ce8a3163 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,12 +38,13 @@ extern kprobe_opcode_t optprobe_template_entry[];
 extern kprobe_opcode_t optprobe_template_op_address[];
 extern kprobe_opcode_t optprobe_template_call_handler[];
 extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_suffix[];
 extern kprobe_opcode_t optprobe_template_call_emulate[];
 extern kprobe_opcode_t optprobe_template_ret[];
 extern kprobe_opcode_t optprobe_template_end[];
 
-/* Fixed instruction size for powerpc */
-#define MAX_INSN_SIZE  1
+/* Prefixed instructions are two words */
+#define MAX_INSN_SIZE  2
 #define MAX_OPTIMIZED_LENGTH   sizeof(kprobe_opcode_t) /* 4 bytes */
 #define MAX_OPTINSN_SIZE   (optprobe_template_end - 
optprobe_template_entry)
 #define RELATIVEJUMP_SIZE  sizeof(kprobe_opcode_t) /* 4 bytes */
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 24a56f062d9e..b061deba4fe7 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -104,17 +104,30 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
 
 int arch_prepare_kprobe(struct kprobe *p)
 {
+   int len;
int ret = 0;
+   struct kprobe *prev;
kprobe_opcode_t insn = *p->addr;
+   kprobe_opcode_t prefix = *(p->addr - 1);
 
+   preempt_disable();
if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
ret = -EINVAL;
} else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
ret = -EINVAL;
+   } else if (IS_PREFIX(prefix)) {
+   printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
+   ret = -EINVAL;
+   }
+   prev = get_kprobe(p->addr - 1);
+   if (prev && IS_PREFIX(*prev->ainsn.insn)) {
+   printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
+   ret = -EINVAL;
}
 
+
/* insn must be on a special executable page on ppc64.  This is
 * not explicitly required on ppc32 (right now), but it doesn't hurt */
if (!ret) {
@@ -124,14 +137,18 @@ int arch_prepare_kprobe(struct kprobe *p)
}
 
if (!ret) {
-   memcpy(p->ainsn.insn, p->addr,
-   MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   if (IS_PREFIX(insn))
+   len = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
+   else
+   len = sizeof(kprobe_opcode_t);
+   memcpy(p->ainsn.insn, p->addr, len);
p->opcode = *p->addr;
flush_icache_range((unsigned long)p->ainsn.insn,
(unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
}
 
p->ainsn.boostable = 0;
+   preempt_enable_no_resched();
return ret;
 }
 NOKPROBE_SYMBOL(arch_prepare_kprobe);
@@ -216,10 +233,11 @@ NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
 {
int ret;
-   unsigned int insn = *p->ainsn.insn;
+   unsigned int insn = p->ainsn.insn[0];
+   unsigned int suffix = p->ainsn.insn[1];
 
/* regs->nip is also adjusted if emulate_step returns 1 */
-   ret = emulate_step(regs, insn, PPC_NO_SUFFIX);
+   ret = emulate_step(regs, insn, suffix);
if (ret > 0) {
/*
 * Once this 

[PATCH v2 09/13] powerpc/xmon: Dump prefixed instructions

2020-02-10 Thread Jordan Niethe
Currently when xmon is dumping instructions it reads a word at a time
and then prints that instruction (either as a hex number or by
disassembling it). For prefixed instructions it would be nice to show
the prefix and suffix together. Use read_instr() so that if a prefix
is encountered, its suffix is loaded too. Then print these in the form:
prefix:suffix
Xmon uses the disassembly routines from GNU binutils. These currently do
not support prefixed instructions so we will not disassemble the
prefixed instructions yet.

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
---
 arch/powerpc/xmon/xmon.c | 50 +++-
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 0b085642bbe7..513901ee18b0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2903,6 +2903,21 @@ prdump(unsigned long adrs, long ndump)
}
 }
 
+static bool instrs_are_equal(unsigned long insta, unsigned long suffixa,
+unsigned long instb, unsigned long suffixb)
+{
+   if (insta != instb)
+   return false;
+
+   if (!IS_PREFIX(insta) && !IS_PREFIX(instb))
+   return true;
+
+   if (IS_PREFIX(insta) && IS_PREFIX(instb))
+   return suffixa == suffixb;
+
+   return false;
+}
+
 typedef int (*instruction_dump_func)(unsigned long inst, unsigned long addr);
 
 static int
@@ -2911,12 +2926,11 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
 {
int nr, dotted;
unsigned long first_adr;
-   unsigned int inst, last_inst = 0;
-   unsigned char val[4];
+   unsigned int inst, suffix, last_inst = 0, last_suffix = 0;
 
dotted = 0;
-   for (first_adr = adr; count > 0; --count, adr += 4) {
-   nr = mread(adr, val, 4);
+   for (first_adr = adr; count > 0; --count, adr += nr) {
+   nr = read_instr(adr, &inst, &suffix);
if (nr == 0) {
if (praddr) {
const char *x = fault_chars[fault_type];
@@ -2924,8 +2938,9 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
}
break;
}
-   inst = GETWORD(val);
-   if (adr > first_adr && inst == last_inst) {
+   if (adr > first_adr && instrs_are_equal(inst, suffix,
+   last_inst,
+   last_suffix)) {
if (!dotted) {
printf(" ...\n");
dotted = 1;
@@ -2934,11 +2949,24 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
}
dotted = 0;
last_inst = inst;
-   if (praddr)
-   printf(REG"  %.8x", adr, inst);
-   printf("\t");
-   dump_func(inst, adr);
-   printf("\n");
+   last_suffix = suffix;
+   if (IS_PREFIX(inst)) {
+   if (praddr)
+   printf(REG"  %.8x:%.8x", adr, inst, suffix);
+   printf("\t");
+   /*
+* Just use this until binutils ppc disassembly
+* prints prefixed instructions.
+*/
+   printf("%.8x:%.8x", inst, suffix);
+   printf("\n");
+   } else {
+   if (praddr)
+   printf(REG"  %.8x", adr, inst);
+   printf("\t");
+   dump_func(inst, adr);
+   printf("\n");
+   }
}
return adr - first_adr;
 }
-- 
2.17.1



[PATCH v2 08/13] powerpc/xmon: Add initial support for prefixed instructions

2020-02-10 Thread Jordan Niethe
A prefixed instruction is composed of a word prefix and a word suffix.
It does not make sense to be able to have a breakpoint on the suffix of
a prefixed instruction, so make this impossible.

When leaving xmon_core() we check to see if we are currently at a
breakpoint. If so, the breakpoint needs to be stepped over. Initially
emulate_step() is tried, but if this fails then we need
to execute the saved instruction out of line. The NIP is set to the
address of bpt::instr[] for the current breakpoint.  bpt::instr[]
contains the instruction replaced by the breakpoint, followed by a trap
instruction.  After bpt::instr[0] is executed and we hit the trap we
enter back into xmon_bpt(). We know that if we got here and the offset
indicates we are at bpt::instr[1] then we have just executed out of line
so we can put the NIP back to the instruction after the breakpoint
location and continue on.

Adding prefixed instructions complicates this, as bpt::instr[1] needs
to hold the suffix. To deal with this, make bpt::instr[] big enough for
three words: bpt::instr[2] contains the trap, and for word instructions
bpt::instr[1] is padded with a nop.

There is no support for disassembling prefixed instructions yet.

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
---
 arch/powerpc/xmon/xmon.c | 82 ++--
 1 file changed, 71 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 897e512c6379..0b085642bbe7 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -97,7 +97,8 @@ static long *xmon_fault_jmp[NR_CPUS];
 /* Breakpoint stuff */
 struct bpt {
unsigned long   address;
-   unsigned intinstr[2];
+   /* Prefixed instructions can not cross 64-byte boundaries */
+   unsigned intinstr[3] __aligned(64);
atomic_tref_count;
int enabled;
unsigned long   pad;
@@ -113,6 +114,7 @@ static struct bpt bpts[NBPTS];
 static struct bpt dabr;
 static struct bpt *iabr;
static unsigned bpinstr = 0x7fe00008;  /* trap */
+static unsigned nopinstr = 0x60000000; /* nop */
 
 #define BP_NUM(bp) ((bp) - bpts + 1)
 
@@ -120,6 +122,7 @@ static unsigned bpinstr = 0x7fe8;   /* trap */
 static int cmds(struct pt_regs *);
 static int mread(unsigned long, void *, int);
 static int mwrite(unsigned long, void *, int);
+static int read_instr(unsigned long, unsigned int *, unsigned int *);
 static int handle_fault(struct pt_regs *);
 static void byterev(unsigned char *, int);
 static void memex(void);
@@ -706,7 +709,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
bp = at_breakpoint(regs->nip);
if (bp != NULL) {
int stepped = emulate_step(regs, bp->instr[0],
-  PPC_NO_SUFFIX);
+  bp->instr[1]);
if (stepped == 0) {
regs->nip = (unsigned long) &bp->instr[0];
atomic_inc(&bp->ref_count);
@@ -761,8 +764,8 @@ static int xmon_bpt(struct pt_regs *regs)
 
/* Are we at the trap at bp->instr[1] for some bp? */
bp = in_breakpoint_table(regs->nip, &offset);
-   if (bp != NULL && offset == 4) {
-   regs->nip = bp->address + 4;
+   if (bp != NULL && (offset == 4 || offset == 8)) {
+   regs->nip = bp->address + offset;
atomic_dec(&bp->ref_count);
return 1;
}
@@ -864,7 +867,8 @@ static struct bpt *in_breakpoint_table(unsigned long nip, 
unsigned long *offp)
return NULL;
off %= sizeof(struct bpt);
if (off != offsetof(struct bpt, instr[0])
-   && off != offsetof(struct bpt, instr[1]))
+   && off != offsetof(struct bpt, instr[1])
+   && off != offsetof(struct bpt, instr[2]))
return NULL;
*offp = off - offsetof(struct bpt, instr[0]);
return (struct bpt *) (nip - off);
@@ -881,9 +885,18 @@ static struct bpt *new_breakpoint(unsigned long a)
 
for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {
if (!bp->enabled && atomic_read(>ref_count) == 0) {
+   /*
+* Prefixed instructions are two words, but regular
+* instructions are only one. Use a nop to pad out the
+* regular instructions so that we can place the trap
+* at the same place. For prefixed instructions the nop
+* will get overwritten during insert_bpts().
+*/
bp->address = a;
-   bp->instr[1] = bpinstr;
+   bp->instr[1] = nopinstr;
store_inst(>instr[1]);
+   bp->instr[2] = bpinstr;
+   

[PATCH v2 07/13] powerpc/traps: Check for prefixed instructions in facility_unavailable_exception()

2020-02-10 Thread Jordan Niethe
If prefixed instructions are made unavailable by the [H]FSCR, attempting
to use them will cause a facility unavailable exception. Add "PREFIX" to
the facility_strings[].

Currently there are no prefixed instructions that are actually emulated
by emulate_instruction() within facility_unavailable_exception().
However, when caused by a prefixed instruction the SRR1 PREFIXED bit is
set. Prepare for dealing with emulated prefixed instructions by checking
for this bit.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d80b82fc1ae3..cd8b3043c268 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1739,6 +1739,7 @@ void facility_unavailable_exception(struct pt_regs *regs)
[FSCR_TAR_LG] = "TAR",
[FSCR_MSGP_LG] = "MSGP",
[FSCR_SCV_LG] = "SCV",
+   [FSCR_PREFIX_LG] = "PREFIX",
};
char *facility = "unknown";
u64 value;
-- 
2.17.1



[PATCH v2 06/13] powerpc: Support prefixed instructions in alignment handler

2020-02-10 Thread Jordan Niethe
Alignment interrupts can be caused by prefixed instructions accessing
memory. In the alignment handler the instruction that caused the
exception is loaded and emulation is attempted. If the instruction is a
prefixed instruction, load both the prefix and the suffix to emulate it.
After emulating, increment the NIP by 8.

Prefixed instructions are not permitted to cross 64-byte boundaries. If
they do, the alignment interrupt is invoked with the SRR1 BOUNDARY bit
set. If this occurs, send a SIGBUS to the offending process if in user
mode. If in kernel mode, call bad_page_fault().

Signed-off-by: Jordan Niethe 
---
v2: - Move __get_user_instr() and __get_user_instr_inatomic() to this
commit (previously in "powerpc sstep: Prepare to support prefixed
instructions").
- Rename sufx to suffix
- Use a macro for calculating instruction length
---
 arch/powerpc/include/asm/uaccess.h | 30 ++
 arch/powerpc/kernel/align.c|  8 +---
 arch/powerpc/kernel/traps.c| 21 -
 3 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 2f500debae21..30f63a81c8d8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -474,4 +474,34 @@ static __must_check inline bool user_access_begin(const 
void __user *ptr, size_t
 #define unsafe_copy_to_user(d, s, l, e) \
unsafe_op_wrap(raw_copy_to_user_allowed(d, s, l), e)
 
+/*
+ * When reading an instruction, if it is a prefix, the suffix also needs
+ * to be loaded.
+ */
+#define __get_user_instr(x, y, ptr)\
+({ \
+   long __gui_ret = 0; \
+   y = 0;  \
+   __gui_ret = __get_user(x, ptr); \
+   if (!__gui_ret) {   \
+   if (IS_PREFIX(x))   \
+   __gui_ret = __get_user(y, ptr + 1); \
+   }   \
+   \
+   __gui_ret;  \
+})
+
+#define __get_user_instr_inatomic(x, y, ptr)   \
+({ \
+   long __gui_ret = 0; \
+   y = 0;  \
+   __gui_ret = __get_user_inatomic(x, ptr);\
+   if (!__gui_ret) {   \
+   if (IS_PREFIX(x))   \
+   __gui_ret = __get_user_inatomic(y, ptr + 1);\
+   }   \
+   \
+   __gui_ret;  \
+})
+
 #endif /* _ARCH_POWERPC_UACCESS_H */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index ba3bf5c3ab62..e42cfaa616d3 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -293,7 +293,7 @@ static int emulate_spe(struct pt_regs *regs, unsigned int 
reg,
 
 int fix_alignment(struct pt_regs *regs)
 {
-   unsigned int instr;
+   unsigned int instr, suffix;
struct instruction_op op;
int r, type;
 
@@ -303,13 +303,15 @@ int fix_alignment(struct pt_regs *regs)
 */
CHECK_FULL_REGS(regs);
 
-   if (unlikely(__get_user(instr, (unsigned int __user *)regs->nip)))
+   if (unlikely(__get_user_instr(instr, suffix,
+(unsigned int __user *)regs->nip)))
return -EFAULT;
if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
/* We don't handle PPC little-endian any more... */
if (cpu_has_feature(CPU_FTR_PPC_LE))
return -EIO;
instr = swab32(instr);
+   suffix = swab32(suffix);
}
 
 #ifdef CONFIG_SPE
@@ -334,7 +336,7 @@ int fix_alignment(struct pt_regs *regs)
if ((instr & 0xfc0006fe) == (PPC_INST_COPY & 0xfc0006fe))
return -EIO;
 
-   r = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
+   r = analyse_instr(&op, regs, instr, suffix);
if (r < 0)
return -EINVAL;
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 82a3438300fd..d80b82fc1ae3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -583,6 +583,10 @@ static inline int check_io_access(struct pt_regs *regs)
 #define REASON_ILLEGAL (ESR_PIL | ESR_PUO)
 #define REASON_PRIVILEGED  ESR_PPR
#define REASON_TRAP    ESR_PTR
+#define REASON_PREFIXED    0
+#define REASON_BOUNDARY    0
+
+#define inst_length(reason)    4
 
 /* single-step stuff */
 #define single_stepping(regs)  (current->thread.debug.dbcr0 & DBCR0_IC)
@@ -597,6 +601,10 @@ static inline int 

[PATCH v2 05/13] powerpc sstep: Add support for prefixed fixed-point arithmetic

2020-02-10 Thread Jordan Niethe
This adds emulation support for the following prefixed Fixed-Point
Arithmetic instructions:
  * Prefixed Add Immediate (paddi)

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/lib/sstep.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 0e21c21ff2be..8ba74c10bc03 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2777,6 +2777,10 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
break;
op->ea = mlsd_8lsd_ea(instr, suffix, regs);
switch (suffixopcode) {
+   case 14:/* paddi */
+   op->type = COMPUTE | PREFIXED;
+   op->val = op->ea;
+   goto compute_done;
case 32:/* plwz */
op->type = MKOP(LOAD, PREFIXED, 4);
break;
-- 
2.17.1



[PATCH v2 04/13] powerpc sstep: Add support for prefixed load/stores

2020-02-10 Thread Jordan Niethe
This adds emulation support for the following prefixed integer
load/stores:
  * Prefixed Load Byte and Zero (plbz)
  * Prefixed Load Halfword and Zero (plhz)
  * Prefixed Load Halfword Algebraic (plha)
  * Prefixed Load Word and Zero (plwz)
  * Prefixed Load Word Algebraic (plwa)
  * Prefixed Load Doubleword (pld)
  * Prefixed Store Byte (pstb)
  * Prefixed Store Halfword (psth)
  * Prefixed Store Word (pstw)
  * Prefixed Store Doubleword (pstd)
  * Prefixed Load Quadword (plq)
  * Prefixed Store Quadword (pstq)

the follow prefixed floating-point load/stores:
  * Prefixed Load Floating-Point Single (plfs)
  * Prefixed Load Floating-Point Double (plfd)
  * Prefixed Store Floating-Point Single (pstfs)
  * Prefixed Store Floating-Point Double (pstfd)

and for the following prefixed VSX load/stores:
  * Prefixed Load VSX Scalar Doubleword (plxsd)
  * Prefixed Load VSX Scalar Single-Precision (plxssp)
  * Prefixed Load VSX Vector [0|1]  (plxv, plxv0, plxv1)
  * Prefixed Store VSX Scalar Doubleword (pstxsd)
  * Prefixed Store VSX Scalar Single-Precision (pstxssp)
  * Prefixed Store VSX Vector [0|1] (pstxv, pstxv0, pstxv1)

Signed-off-by: Jordan Niethe 
---
v2: - Combine all load/store patches
- Fix the name of Type 01 instructions
- Remove sign extension flag from pstd/pld
- Rename sufx -> suffix
---
 arch/powerpc/lib/sstep.c | 165 +++
 1 file changed, 165 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 65143ab1bf64..0e21c21ff2be 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -187,6 +187,44 @@ static nokprobe_inline unsigned long xform_ea(unsigned int 
instr,
return ea;
 }
 
+/*
+ * Calculate effective address for a MLS:D-form / 8LS:D-form
+ * prefixed instruction
+ */
+static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
+ unsigned int suffix,
+ const struct pt_regs *regs)
+{
+   int ra, prefix_r;
+   unsigned int  dd;
+   unsigned long ea, d0, d1, d;
+
+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+
+   d0 = instr & 0x3ffff;
+   d1 = suffix & 0xffff;
+   d = (d0 << 16) | d1;
+
+   /*
+* sign extend a 34 bit number
+*/
+   dd = (unsigned int) (d >> 2);
+   ea = (signed int) dd;
+   ea = (ea << 2) | (d & 0x3);
+
+   if (!prefix_r && ra)
+   ea += regs->gpr[ra];
+   else if (!prefix_r && !ra)
+   ; /* Leave ea as is */
+   else if (prefix_r && !ra)
+   ea += regs->nip;
+   else if (prefix_r && ra)
+   ; /* Invalid form. Should already be checked for by caller! */
+
+   return ea;
+}
+
 /*
  * Return the largest power of 2, not greater than sizeof(unsigned long),
  * such that x is a multiple of it.
@@ -1166,6 +1204,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
  unsigned int instr, unsigned int suffix)
 {
unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int suffixopcode, prefixtype, prefix_r;
unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;
@@ -2652,6 +2691,132 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
 
}
 
+/*
+ * Prefixed instructions
+ */
+   switch (opcode) {
+   case 1:
+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+   op->update_reg = ra;
+   rd = (suffix >> 21) & 0x1f;
+   op->reg = rd;
+   op->val = regs->gpr[rd];
+
+   suffixopcode = suffix >> 26;
+   prefixtype = (instr >> 24) & 0x3;
+   switch (prefixtype) {
+   case 0: /* Type 00  Eight-Byte Load/Store */
+   if (prefix_r && ra)
+   break;
+   op->ea = mlsd_8lsd_ea(instr, suffix, regs);
+   switch (suffixopcode) {
+   case 41:/* plwa */
+   op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
+   break;
+   case 42:/* plxsd */
+   op->reg = rd + 32;
+   op->type = MKOP(LOAD_VSX, PREFIXED, 8);
+   op->element_size = 8;
+   op->vsx_flags = VSX_CHECK_VEC;
+   break;
+   case 43:/* plxssp */
+   op->reg = rd + 32;
+   op->type = MKOP(LOAD_VSX, PREFIXED, 4);
+   op->element_size = 8;
+   op->vsx_flags = VSX_FPCONV | VSX_CHECK_VEC;
+   break;
+   case 46:  

[PATCH v2 03/13] powerpc sstep: Prepare to support prefixed instructions

2020-02-10 Thread Jordan Niethe
Currently all instructions are a single word long. A future ISA version
will include prefixed instructions which have a double word length. The
functions used for analysing and emulating instructions need to be
modified so that they can handle these new instruction types.

A prefixed instruction is a word prefix followed by a word suffix. All
prefixes uniquely have the primary op-code 1. Suffixes may be valid word
instructions or instructions that only exist as suffixes.

In handling prefixed instructions it will be convenient to treat the
suffix and prefix as separate words. To facilitate this modify
analyse_instr() and emulate_step() to take a suffix as a
parameter. For word instructions it does not matter what is passed in
here - it will be ignored.

We also define a new flag, PREFIXED, to be used in instruction_op:type.
This flag will indicate when emulating an analysed instruction if the
NIP should be advanced by word length or double word length.

The callers of analyse_instr() and emulate_step() will need their own
changes to be able to support prefixed instructions. For now modify them
to pass in 0 as a suffix.

Note that at this point no prefixed instructions are emulated or
analysed - this is just making it possible to do so.

Signed-off-by: Jordan Niethe 
---
v2: - Move definition of __get_user_instr() and
__get_user_instr_inatomic() to "powerpc: Support prefixed instructions
in alignment handler."
- Use a macro for returning the length of an op
- Rename sufx -> suffix
- Define and use PPC_NO_SUFFIX instead of 0
---
 arch/powerpc/include/asm/ppc-opcode.h |  5 +
 arch/powerpc/include/asm/sstep.h  |  9 ++--
 arch/powerpc/kernel/align.c   |  2 +-
 arch/powerpc/kernel/hw_breakpoint.c   |  4 ++--
 arch/powerpc/kernel/kprobes.c |  2 +-
 arch/powerpc/kernel/mce_power.c   |  2 +-
 arch/powerpc/kernel/optprobes.c   |  3 ++-
 arch/powerpc/kernel/uprobes.c |  2 +-
 arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
 arch/powerpc/lib/sstep.c  | 12 ++-
 arch/powerpc/lib/test_emulate_step.c  | 30 +--
 arch/powerpc/xmon/xmon.c  |  5 +++--
 12 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index c1df75edde44..72783bc92e50 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -377,6 +377,11 @@
#define PPC_INST_VCMPEQUD  0x100000c7
#define PPC_INST_VCMPEQUB  0x10000006
 
+/* macro to check if a word is a prefix */
+#define IS_PREFIX(x)   (((x) >> 26) == 1)
+#define PPC_NO_SUFFIX  0
+#define PPC_INST_LENGTH(x) (IS_PREFIX(x) ? 8 : 4)
+
 /* macros to insert fields into opcodes */
 #define ___PPC_RA(a)   (((a) & 0x1f) << 16)
 #define ___PPC_RB(b)   (((b) & 0x1f) << 11)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 769f055509c9..9ea8904a1549 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -89,11 +89,15 @@ enum instruction_type {
 #define VSX_LDLEFT 4   /* load VSX register from left */
 #define VSX_CHECK_VEC  8   /* check MSR_VEC not MSR_VSX for reg >= 32 */
 
+/* Prefixed flag, ORed in with type */
+#define PREFIXED   0x800
+
 /* Size field in type word */
 #define SIZE(n)((n) << 12)
 #define GETSIZE(w) ((w) >> 12)
 
 #define GETTYPE(t) ((t) & INSTR_TYPE_MASK)
+#define OP_LENGTH(t)   (((t) & PREFIXED) ? 8 : 4)
 
 #define MKOP(t, f, s)  ((t) | (f) | SIZE(s))
 
@@ -132,7 +136,7 @@ union vsx_reg {
  * otherwise.
  */
 extern int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
-unsigned int instr);
+unsigned int instr, unsigned int suffix);
 
 /*
  * Emulate an instruction that can be executed just by updating
@@ -149,7 +153,8 @@ void emulate_update_regs(struct pt_regs *reg, struct 
instruction_op *op);
  * 0 if it could not be emulated, or -1 for an instruction that
  * should not be emulated (rfid, mtmsrd clearing MSR_RI, etc.).
  */
-extern int emulate_step(struct pt_regs *regs, unsigned int instr);
+extern int emulate_step(struct pt_regs *regs, unsigned int instr,
+   unsigned int suffix);
 
 /*
  * Emulate a load or store instruction by reading/writing the
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index 92045ed64976..ba3bf5c3ab62 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -334,7 +334,7 @@ int fix_alignment(struct pt_regs *regs)
if ((instr & 0xfc0006fe) == (PPC_INST_COPY & 0xfc0006fe))
return -EIO;
 
-   r = analyse_instr(&op, regs, instr);
+   r = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
if (r < 0)
return -EINVAL;
 
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c

[PATCH v2 02/13] powerpc: Define new SRR1 bits for a future ISA version

2020-02-10 Thread Jordan Niethe
Add the BOUNDARY SRR1 bit definition for when the cause of an alignment
exception is a prefixed instruction that crosses a 64-byte boundary.
Add the PREFIXED SRR1 bit definition for exceptions caused by prefixed
instructions.

Bit 35 of SRR1 is called SRR1_ISI_N_OR_G. This name comes from it being
used to indicate that an ISI was due to the access being no-exec or
guarded. A future ISA version adds another purpose: it is also set if
there is an access to a cache-inhibited location by a prefixed
instruction. Rename SRR1_ISI_N_OR_G to SRR1_ISI_N_G_OR_CIP.

Signed-off-by: Jordan Niethe 
---
v2: Combined all the commits concerning SRR1 bits.
---
 arch/powerpc/include/asm/reg.h  | 4 +++-
 arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c7758c2ccc5f..173f33df4fab 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -762,7 +762,7 @@
 #endif
 
#define   SRR1_ISI_NOPT    0x40000000 /* ISI: Not found in hash */
-#define   SRR1_ISI_N_OR_G  0x10000000 /* ISI: Access is no-exec or G */
+#define   SRR1_ISI_N_G_OR_CIP  0x10000000 /* ISI: Access is no-exec or G or CI for a prefixed instruction */
#define   SRR1_ISI_PROT    0x08000000 /* ISI: Other protection fault */
#define   SRR1_WAKEMASK    0x00380000 /* reason for wakeup */
#define   SRR1_WAKEMASK_P8 0x003c0000 /* reason for wakeup on POWER8 and 9 */
@@ -789,6 +789,8 @@
#define   SRR1_PROGADDR    0x00010000 /* SRR0 contains subsequent addr */

#define   SRR1_MCE_MCP     0x00080000 /* Machine check signal caused interrupt */
+#define   SRR1_BOUNDARY    0x10000000 /* Prefixed instruction crosses 64-byte boundary */
+#define   SRR1_PREFIXED    0x20000000 /* Exception caused by prefixed instruction */
 
 #define SPRN_HSRR0 0x13A   /* Save/Restore Register 0 */
 #define SPRN_HSRR1 0x13B   /* Save/Restore Register 1 */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index dc97e5be76f6..6ab685227574 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1169,7 +1169,7 @@ static int kvmhv_translate_addr_nested(struct kvm_vcpu 
*vcpu,
} else if (vcpu->arch.trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
/* Can we execute? */
if (!gpte_p->may_execute) {
-   flags |= SRR1_ISI_N_OR_G;
+   flags |= SRR1_ISI_N_G_OR_CIP;
goto forward_to_l1;
}
} else {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 220305454c23..b53a9f1c1a46 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1260,7 +1260,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
status &= ~DSISR_NOHPTE;/* DSISR_NOHPTE == SRR1_ISI_NOPT */
if (!data) {
if (gr & (HPTE_R_N | HPTE_R_G))
-   return status | SRR1_ISI_N_OR_G;
+   return status | SRR1_ISI_N_G_OR_CIP;
if (!hpte_read_permission(pp, slb_v & key))
return status | SRR1_ISI_PROT;
} else if (status & DSISR_ISSTORE) {
-- 
2.17.1



[PATCH v2 01/13] powerpc: Enable Prefixed Instructions

2020-02-10 Thread Jordan Niethe
From: Alistair Popple 

Prefixed instructions have their own FSCR bit which needs to be enabled via
a CPU feature. The kernel will save the FSCR for problem state, but it
needs to be enabled initially.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/include/asm/reg.h|  3 +++
 arch/powerpc/kernel/dt_cpu_ftrs.c | 23 +++
 2 files changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1aa46dff0957..c7758c2ccc5f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -397,6 +397,7 @@
 #define SPRN_RWMR  0x375   /* Region-Weighting Mode Register */
 
 /* HFSCR and FSCR bit numbers are the same */
+#define FSCR_PREFIX_LG 13  /* Enable Prefix Instructions */
 #define FSCR_SCV_LG 12  /* Enable System Call Vectored */
 #define FSCR_MSGP_LG   10  /* Enable MSGP */
 #define FSCR_TAR_LG 8   /* Enable Target Address Register */
@@ -408,11 +409,13 @@
 #define FSCR_VECVSX_LG 1   /* Enable VMX/VSX  */
 #define FSCR_FP_LG 0   /* Enable Floating Point */
 #define SPRN_FSCR  0x099   /* Facility Status & Control Register */
+#define   FSCR_PREFIX  __MASK(FSCR_PREFIX_LG)
 #define   FSCR_SCV __MASK(FSCR_SCV_LG)
 #define   FSCR_TAR __MASK(FSCR_TAR_LG)
 #define   FSCR_EBB __MASK(FSCR_EBB_LG)
 #define   FSCR_DSCR__MASK(FSCR_DSCR_LG)
 #define SPRN_HFSCR 0xbe/* HV=1 Facility Status & Control Register */
+#define   HFSCR_PREFIX __MASK(FSCR_PREFIX_LG)
 #define   HFSCR_MSGP   __MASK(FSCR_MSGP_LG)
 #define   HFSCR_TAR__MASK(FSCR_TAR_LG)
 #define   HFSCR_EBB__MASK(FSCR_EBB_LG)
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 182b4047c1ef..396f2c6c588e 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -553,6 +553,28 @@ static int __init feat_enable_large_ci(struct dt_cpu_feature *f)
return 1;
 }
 
+static int __init feat_enable_prefix(struct dt_cpu_feature *f)
+{
+   u64 fscr, hfscr;
+
+   if (f->usable_privilege & USABLE_HV) {
+   hfscr = mfspr(SPRN_HFSCR);
+   hfscr |= HFSCR_PREFIX;
+   mtspr(SPRN_HFSCR, hfscr);
+   }
+
+   if (f->usable_privilege & USABLE_OS) {
+   fscr = mfspr(SPRN_FSCR);
+   fscr |= FSCR_PREFIX;
+   mtspr(SPRN_FSCR, fscr);
+
+   if (f->usable_privilege & USABLE_PR)
+   current->thread.fscr |= FSCR_PREFIX;
+   }
+
+   return 1;
+}
+
 struct dt_cpu_feature_match {
const char *name;
int (*enable)(struct dt_cpu_feature *f);
@@ -626,6 +648,7 @@ static struct dt_cpu_feature_match __initdata
{"vector-binary128", feat_enable, 0},
{"vector-binary16", feat_enable, 0},
{"wait-v3", feat_enable, 0},
+   {"prefix-instructions", feat_enable_prefix, 0},
 };
 
 static bool __initdata using_dt_cpu_ftrs;
-- 
2.17.1



[PATCH v2 00/13] Initial Prefixed Instruction support

2020-02-10 Thread Jordan Niethe
A future revision of the ISA will introduce prefixed instructions. A
prefixed instruction is composed of a 4-byte prefix followed by a
4-byte suffix.

All prefixes have the major opcode 1. A prefix will never be a valid
word instruction. A suffix may be an existing word instruction or a
new instruction.

This series enables prefixed instructions and extends the instruction
emulation to support them. Then the places where prefixed instructions
might need to be emulated are updated.

This v2 incorporates feedback from Daniel Axtens and Balamuruhan
S. The major changes are:
- Squashing together all commits about SRR1 bits
- Squashing all commits for supporting prefixed load stores
- Changing abbreviated references to sufx/prfx -> suffix/prefix
- Introducing macros for returning the length of an instruction
- Removing sign extension flag from pstd/pld in sstep.c
- Dropping patch "powerpc/fault: Use analyse_instr() to check for
  store with updates to sp" from the series; it did not really fit
  with prefixed enablement in the first place and, as reported by Greg
  Kurz, did not work correctly.


Alistair Popple (1):
  powerpc: Enable Prefixed Instructions

Jordan Niethe (12):
  powerpc: Define new SRR1 bits for a future ISA version
  powerpc sstep: Prepare to support prefixed instructions
  powerpc sstep: Add support for prefixed load/stores
  powerpc sstep: Add support for prefixed fixed-point arithmetic
  powerpc: Support prefixed instructions in alignment handler
  powerpc/traps: Check for prefixed instructions in
facility_unavailable_exception()
  powerpc/xmon: Add initial support for prefixed instructions
  powerpc/xmon: Dump prefixed instructions
  powerpc/kprobes: Support kprobes on prefixed instructions
  powerpc/uprobes: Add support for prefixed instructions
  powerpc/hw_breakpoints: Initial support for prefixed instructions
  powerpc: Add prefix support to mce_find_instr_ea_and_pfn()

 arch/powerpc/include/asm/kprobes.h|   5 +-
 arch/powerpc/include/asm/ppc-opcode.h |   5 +
 arch/powerpc/include/asm/reg.h|   7 +-
 arch/powerpc/include/asm/sstep.h  |   9 +-
 arch/powerpc/include/asm/uaccess.h|  30 +
 arch/powerpc/include/asm/uprobes.h|  16 ++-
 arch/powerpc/kernel/align.c   |   8 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  23 
 arch/powerpc/kernel/hw_breakpoint.c   |   8 +-
 arch/powerpc/kernel/kprobes.c |  47 +--
 arch/powerpc/kernel/mce_power.c   |   6 +-
 arch/powerpc/kernel/optprobes.c   |  31 +++--
 arch/powerpc/kernel/optprobes_head.S  |   6 +
 arch/powerpc/kernel/traps.c   |  22 +++-
 arch/powerpc/kernel/uprobes.c |   4 +-
 arch/powerpc/kvm/book3s_hv_nested.c   |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   2 +-
 arch/powerpc/kvm/emulate_loadstore.c  |   2 +-
 arch/powerpc/lib/sstep.c  | 181 +-
 arch/powerpc/lib/test_emulate_step.c  |  30 ++---
 arch/powerpc/xmon/xmon.c  | 133 +++
 21 files changed, 487 insertions(+), 90 deletions(-)

-- 
2.17.1



Re: [PATCH] powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss

2020-02-10 Thread Leonardo Bras
Christophe Leroy  writes:

> In ITLB miss handled the line supposed to clear bits 20-23 on the
> L2 ITLB entry is buggy and does indeed nothing, leading to undefined
> value which could allow execution when it shouldn't.
>
> Properly do the clearing with the relevant instruction.
>
> Fixes: 74fabcadfd43 ("powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/head_8xx.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 9922306ae512..073a651787df 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -256,7 +256,7 @@ InstructionTLBMiss:
>* set.  All other Linux PTE bits control the behavior
>* of the MMU.
>*/
> - rlwimi  r10, r10, 0, 0x0f00 /* Clear bits 20-23 */
> + rlwinm  r10, r10, 0, ~0x0f00/* Clear bits 20-23 */
>   rlwimi  r10, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
>   ori r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */
>   mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
> -- 
> 2.25.0

Looks like a valid change.
rlwimi  r10, r10, 0, 0x0f00 means: 
r10 = ((r10 << 0) & 0x0f00) | (r10 & ~0x0f00) which ends up being
r10 = r10 

In the ISA, rlwinm is recommended for clearing high-order bits.
rlwinm  r10, r10, 0, ~0x0f00 means:
r10 = (r10 << 0) & ~0x0f00

This does exactly what the comment suggests.

FWIW:
Reviewed-by: Leonardo Bras 


signature.asc
Description: This is a digitally signed message part


QEMU/KVM snapshot restore bug

2020-02-10 Thread dftxbs3e
Hello,

I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian) host 
using `virsh snapshot-create-as --domain  --name `

Then I restarted my system and tried restoring the snapshot:

# virsh snapshot-revert --domain  --snapshotname 
error: internal error: process exited while connecting to monitor: 
2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: 
Group 3 attr 0x1309: Device or resource busy
2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state for 
instance 0x0 of device 'spapr'
2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM state

And dmesg shows each time the restore command is executed:

[  180.176606] WARNING: CPU: 16 PID: 5528 at arch/powerpc/kvm/book3s_xive.c:345 xive_try_pick_queue+0x40/0xb8 [kvm]
[  180.176608] Modules linked in: vhost_net vhost tap kvm_hv kvm xt_CHECKSUM 
xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge 8021q garp mrp stp llc 
rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT 
nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat 
ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security 
iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables 
ip6table_filter ip6_tables iptable_filter sunrpc raid1 at24 regmap_i2c 
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg joydev snd_hda_codec 
snd_hda_core ofpart snd_hwdep crct10dif_vpmsum snd_seq ipmi_powernv 
powernv_flash ipmi_devintf snd_seq_device mtd ipmi_msghandler rtc_opal snd_pcm 
opal_prd i2c_opal snd_timer snd soundcore lz4 lz4_compress zram ip_tables xfs 
libcrc32c dm_crypt amdgpu ast drm_vram_helper mfd_core i2c_algo_bit gpu_sched 
drm_kms_helper mpt3sas
[  180.176652]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
vmx_crypto tg3 crc32c_vpmsum nvme raid_class scsi_transport_sas nvme_core 
drm_panel_orientation_quirks i2c_core fuse
[  180.176663] CPU: 16 PID: 5528 Comm: qemu-system-ppc Not tainted 
5.4.17-200.fc31.ppc64le #1
[  180.176665] NIP:  c0080a883c80 LR: c0080a886db8 CTR: c0080a88a9e0
[  180.176667] REGS: c00767a17890 TRAP: 0700   Not tainted  
(5.4.17-200.fc31.ppc64le)
[  180.176668] MSR:  90029033   CR: 48224248  
XER: 2004
[  180.176673] CFAR: c0080a886db4 IRQMASK: 0 
   GPR00: c0080a886db8 c00767a17b20 c0080a8aed00 c0002005468a4480 
   GPR04:    0001 
   GPR08: c0002007142b2400 c0002007142b2400  c0080a8910f0 
   GPR12: c0080a88a488 c007fffed000   
   GPR16: 000149524180 739bda78 739bda30 025c 
   GPR20:  0003 c0002006f13a  
   GPR24: 1359  c002f8c96c38 c002f8c8 
   GPR28:  c0002006f13a c0002006f13a4038 c00767a17be4 
[  180.176688] NIP [c0080a883c80] xive_try_pick_queue+0x40/0xb8 [kvm]
[  180.176693] LR [c0080a886db8] kvmppc_xive_select_target+0x100/0x210 [kvm]
[  180.176694] Call Trace:
[  180.176696] [c00767a17b20] [c00767a17b70] 0xc00767a17b70 
(unreliable)
[  180.176701] [c00767a17b70] [c0080a88b420] kvmppc_xive_native_set_attr+0xf98/0x1760 [kvm]
[  180.176705] [c00767a17cc0] [c0080a86392c] kvm_device_ioctl+0xf4/0x180 [kvm]
[  180.176710] [c00767a17d10] [c05380b0] do_vfs_ioctl+0xaa0/0xd90
[  180.176712] [c00767a17dd0] [c0538464] sys_ioctl+0xc4/0x110
[  180.176716] [c00767a17e20] [c000b9d0] system_call+0x5c/0x68
[  180.176717] Instruction dump:
[  180.176719] 794ad182 0b0a 2c29 41820080 89490010 2c0a 41820074 78883664 
[  180.176723] 7d094214 e9480070 7d470074 78e7d182 <0b07> 2c2a 41820054 81480078 
[  180.176727] ---[ end trace 056a6dd275e20684 ]---

Let me know if I can provide more information

Thanks



signature.asc
Description: OpenPGP digital signature


Re: [PATCH V5 06/14] powerpc/vas: Setup thread IRQ handler per VAS instance

2020-02-10 Thread Michael Neuling
On Sun, 2020-02-09 at 21:17 -0800, Haren Myneni wrote:
> On Fri, 2020-02-07 at 16:57 +1100, Michael Neuling wrote:
> > >  /*
> > > + * Process CRBs that we receive on the fault window.
> > > + */
> > > +irqreturn_t vas_fault_handler(int irq, void *data)
> > > +{
> > > + struct vas_instance *vinst = data;
> > > + struct coprocessor_request_block buf, *crb;
> > > + struct vas_window *window;
> > > + void *fifo;
> > > +
> > > + /*
> > > +  * VAS can interrupt with multiple page faults. So process all
> > > +  * valid CRBs within fault FIFO until reaches invalid CRB.
> > > +  * NX updates nx_fault_stamp in CRB and pastes in fault FIFO.
> > > +  * kernel retrieves send window from partition send window ID
> > > +  * (pswid) in nx_fault_stamp. So pswid should be non-zero and
> > > +  * use this to check whether CRB is valid.
> > > +  * After reading CRB entry, it is reset with 0's in fault FIFO.
> > > +  *
> > > +  * In case kernel receives another interrupt with different page
> > > +  * fault and CRBs are processed by the previous handling, will be
> > > +  * returned from this function when it sees invalid CRB (means 0's).
> > > +  */
> > > + do {
> > > + mutex_lock(>mutex);
> > 
> > This isn't going to work.
> > 
> > From Documentation/locking/mutex-design.rst
> > 
> > - Mutexes may not be used in hardware or software interrupt
> >   contexts such as tasklets and timers.
> 
> Initially I used a kernel thread per VAS instance and later switched to
> an IRQ thread.
> 
> vas_fault_handler() is the IRQ thread function, not the IRQ handler, so I
> thought we could use mutex_lock() in the thread function.

Sorry, I missed that it was a threaded IRQ handler, so I think it is ok to use
mutex_lock() in there.

You should run with CONFIG_DEBUG_MUTEXES and CONFIG_LOCKDEP enabled to give you
some more confidence.

It would be good to document how this mutex is used, and to note at the start of
the function that it must remain a threaded handler so it doesn't get changed
later to a non-threaded one. 

Mikey


Re: [PATCH V13] mm/debug: Add tests validating architecture page table helpers

2020-02-10 Thread Anshuman Khandual



On 02/10/2020 04:36 PM, Russell King - ARM Linux admin wrote:
> On Mon, Feb 10, 2020 at 11:46:23AM +0100, Christophe Leroy wrote:
>>
>>
>> Le 10/02/2020 à 11:02, Russell King - ARM Linux admin a écrit :
>>> On Mon, Feb 10, 2020 at 07:38:38AM +0100, Christophe Leroy wrote:


 Le 10/02/2020 à 06:35, Anshuman Khandual a écrit :
>
>
> On 02/10/2020 10:22 AM, Andrew Morton wrote:
>> On Thu, 6 Feb 2020 13:49:35 +0530 Anshuman Khandual 
>>  wrote:
>>
>>>
>>> On 02/06/2020 04:40 AM, kbuild test robot wrote:
 Hi Anshuman,

 Thank you for the patch! Yet something to improve:

 [auto build test ERROR on powerpc/next]
 [also build test ERROR on s390/features linus/master arc/for-next v5.5]
 [cannot apply to mmotm/master tip/x86/core arm64/for-next/core 
 next-20200205]
 [if your patch is applied to the wrong git tree, please drop us a note 
 to help
 improve the system. BTW, we also suggest to use '--base' option to 
 specify the
 base tree in git format-patch, please see 
 https://stackoverflow.com/a/37406982]

 url:
 https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-validating-architecture-page-table-helpers/20200205-215507
 base:   
 https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
 config: ia64-allmodconfig (attached as .config)
 compiler: ia64-linux-gcc (GCC) 7.5.0
 reproduce:
   wget 
 https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
  -O ~/bin/make.cross
   chmod +x ~/bin/make.cross
   # save the attached .config to linux build tree
   GCC_VERSION=7.5.0 make.cross ARCH=ia64

 If you fix the issue, kindly add following tag
 Reported-by: kbuild test robot 

 All error/warnings (new ones prefixed by >>):

  In file included from include/asm-generic/pgtable-nopud.h:8:0,
   from arch/ia64/include/asm/pgtable.h:586,
   from include/linux/mm.h:99,
   from include/linux/highmem.h:8,
   from mm/debug_vm_pgtable.c:14:
  mm/debug_vm_pgtable.c: In function 'pud_clear_tests':
>> include/asm-generic/pgtable-nop4d-hack.h:47:32: error: implicit 
>> declaration of function '__pgd'; did you mean '__p4d'? 
>> [-Werror=implicit-function-declaration]
   #define __pud(x)((pud_t) { __pgd(x) })
  ^
>> mm/debug_vm_pgtable.c:141:8: note: in expansion of macro '__pud'
pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
  ^
>> include/asm-generic/pgtable-nop4d-hack.h:47:22: warning: missing 
>> braces around initializer [-Wmissing-braces]
   #define __pud(x)((pud_t) { __pgd(x) })
^
>> mm/debug_vm_pgtable.c:141:8: note: in expansion of macro '__pud'
pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
  ^
  cc1: some warnings being treated as errors
>>>
>>> This build failure is expected now given that we have allowed 
>>> DEBUG_VM_PGTABLE
>>> with EXPERT without platform requiring ARCH_HAS_DEBUG_VM_PGTABLE. This 
>>> problem
>>> i.e build failure caused without a platform __pgd(), is known to exist 
>>> both on
>>> ia64 and arm (32bit) platforms. Please refer 
>>> https://lkml.org/lkml/2019/9/24/314
>>> for details where this was discussed earlier.
>>>
>>
>> I'd prefer not to merge a patch which is known to cause build
>> regressions.  Is there some temporary thing we can do to prevent these
>> errors until arch maintainers(?) get around to implementing the
>> long-term fixes?
>
> We could explicitly disable CONFIG_DEBUG_VM_PGTABLE on ia64 and arm 
> platforms
> which will ensure that others can still use the EXPERT path.
>
> config DEBUG_VM_PGTABLE
>   bool "Debug arch page table for semantics compliance"
>   depends on MMU
>   depends on !(IA64 || ARM)
>   depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>   default n if !ARCH_HAS_DEBUG_VM_PGTABLE
>   default y if DEBUG_VM
>

 On both ia64 and arm, the fix is trivial.

 Can we include the fix within this patch, just the same way as the
 mm_p4d_folded() fix for x86 ?
>>>
>>> Why should arm include a macro for something that nothing (apart from
>>> this checker) requires?  If the checker requires it but the rest of
>>> the kernel does not, it suggests that the checker isn't actually
>>> correct, and the results can't be relied upon.
>>>
>>
>> As far as I can see, 

[PATCH v3 3/3] selftests/powerpc: Don't rely on segfault to rerun the test

2020-02-10 Thread Gustavo Luiz Duarte
The test case tm-signal-context-force-tm expects a segfault to happen on
returning from the signal handler, and then does a setcontext() to run the
test again. However, the test doesn't always segfault, causing it to run only
a single time.

This patch fixes the test by putting it within a loop and jumping, via
setcontext, just prior to the loop in case it segfaults. This way we get the
desired behavior (run the test COUNT_MAX times) regardless of whether it
segfaults or not. This also reduces the use of setcontext for control-flow
logic, keeping it only in the segfault handler.

Also, since 'count' is changed within the signal handler, it is declared
volatile to prevent compiler optimizations from assuming it cannot change
asynchronously.

Signed-off-by: Gustavo Luiz Duarte 
---
 .../powerpc/tm/tm-signal-context-force-tm.c   | 79 +--
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c
index 31717625f318..9ff7bdb6d47a 100644
--- a/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c
@@ -42,9 +42,10 @@
 #endif
 
 /* Setting contexts because the test will crash and we want to recover */
-ucontext_t init_context, main_context;
+ucontext_t init_context;
 
-static int count, first_time;
+/* count is changed in the signal handler, so it must be volatile */
+static volatile int count;
 
 void usr_signal_handler(int signo, siginfo_t *si, void *uc)
 {
@@ -98,11 +99,6 @@ void usr_signal_handler(int signo, siginfo_t *si, void *uc)
 
 void seg_signal_handler(int signo, siginfo_t *si, void *uc)
 {
-   if (count == COUNT_MAX) {
-   /* Return to tm_signal_force_msr() and exit */
-   setcontext(_context);
-   }
-
count++;
 
/* Reexecute the test */
@@ -126,37 +122,40 @@ void tm_trap_test(void)
 */
getcontext(_context);
 
-   /* Allocated an alternative signal stack area */
-   ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
-   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
-   ss.ss_size = SIGSTKSZ;
-   ss.ss_flags = 0;
-
-   if (ss.ss_sp == (void *)-1) {
-   perror("mmap error\n");
-   exit(-1);
-   }
-
-   /* Force the allocation through a page fault */
-   if (madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED)) {
-   perror("madvise\n");
-   exit(-1);
-   }
-
-   /* Setting an alternative stack to generate a page fault when
-* the signal is raised.
-*/
-   if (sigaltstack(, NULL)) {
-   perror("sigaltstack\n");
-   exit(-1);
+   while (count < COUNT_MAX) {
+   /* Allocated an alternative signal stack area */
+   ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   ss.ss_size = SIGSTKSZ;
+   ss.ss_flags = 0;
+
+   if (ss.ss_sp == (void *)-1) {
+   perror("mmap error\n");
+   exit(-1);
+   }
+
+   /* Force the allocation through a page fault */
+   if (madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED)) {
+   perror("madvise\n");
+   exit(-1);
+   }
+
+   /* Setting an alternative stack to generate a page fault when
+* the signal is raised.
+*/
+   if (sigaltstack(, NULL)) {
+   perror("sigaltstack\n");
+   exit(-1);
+   }
+
+   /* The signal handler will enable MSR_TS */
+   sigaction(SIGUSR1, _sa, NULL);
+   /* If it does not crash, it might segfault, avoid it to retest */
+   sigaction(SIGSEGV, _sa, NULL);
+
+   raise(SIGUSR1);
+   count++;
}
-
-   /* The signal handler will enable MSR_TS */
-   sigaction(SIGUSR1, _sa, NULL);
-   /* If it does not crash, it will segfault, avoid it to retest */
-   sigaction(SIGSEGV, _sa, NULL);
-
-   raise(SIGUSR1);
 }
 
 int tm_signal_context_force_tm(void)
@@ -169,11 +168,7 @@ int tm_signal_context_force_tm(void)
 */
SKIP_IF(!is_ppc64le());
 
-   /* Will get back here after COUNT_MAX interactions */
-   getcontext(_context);
-
-   if (!first_time++)
-   tm_trap_test();
+   tm_trap_test();
 
return EXIT_SUCCESS;
 }
-- 
2.21.1



[PATCH v3 2/3] selftests/powerpc: Add tm-signal-pagefault test

2020-02-10 Thread Gustavo Luiz Duarte
This test triggers a TM Bad Thing by raising a signal in transactional state
and forcing a pagefault to happen in kernelspace when the kernel signal
handling code first touches the user signal stack.

This is inspired by the test tm-signal-context-force-tm but uses userfaultfd to
make the test deterministic. While this test always triggers the bug in one
run, I had to execute tm-signal-context-force-tm several times (the test runs
5000 times each execution) to trigger the same bug.

tm-signal-context-force-tm is kept instead of replaced because, while this test
is more reliable and triggers the same bug, tm-signal-context-force-tm has
better coverage, in the sense that running it several times might trigger
the pagefault and/or preemption at different places.

v3: skip test if userfaultfd is unavailable.

Signed-off-by: Gustavo Luiz Duarte 
---
 tools/testing/selftests/powerpc/tm/.gitignore |   1 +
 tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
 .../powerpc/tm/tm-signal-pagefault.c  | 284 ++
 3 files changed, 287 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-pagefault.c

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index 98f2708d86cc..e1c72a4a3e91 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -13,6 +13,7 @@ tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
 tm-signal-context-force-tm
 tm-signal-sigreturn-nt
+tm-signal-pagefault
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index b15a1a325bd0..b1d99736f8b8 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -5,7 +5,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable 
tm-trap \
$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \
-   tm-signal-context-force-tm tm-poison
+   tm-signal-context-force-tm tm-poison tm-signal-pagefault
 
 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -22,6 +22,7 @@ $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
 $(OUTPUT)/tm-signal-context-force-tm: CFLAGS += -pthread -m64
+$(OUTPUT)/tm-signal-pagefault: CFLAGS += -pthread -m64
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-pagefault.c b/tools/testing/selftests/powerpc/tm/tm-signal-pagefault.c
new file mode 100644
index ..5908bc6abe60
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-pagefault.c
@@ -0,0 +1,284 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020, Gustavo Luiz Duarte, IBM Corp.
+ *
+ * This test starts a transaction and triggers a signal, forcing a pagefault to
+ * happen when the kernel signal handling code touches the user signal stack.
+ *
+ * In order to avoid pre-faulting the signal stack memory and to force the
+ * pagefault to happen precisely in the kernel signal handling code, the
+ * pagefault handling is done in userspace using the userfaultfd facility.
+ *
+ * Further pagefaults are triggered by crafting the signal handler's ucontext
+ * to point to additional memory regions managed by the userfaultfd, so using
+ * the same mechanism used to avoid pre-faulting the signal stack memory.
+ *
+ * On failure (bug is present) kernel crashes or never returns control back to
+ * userspace. If bug is not present, tests completes almost immediately.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "tm.h"
+
+
+#define UF_MEM_SIZE 655360 /* 10 x 64k pages */
+
+/* Memory handled by userfaultfd */
+static char *uf_mem;
+static size_t uf_mem_offset = 0;
+
+/*
+ * Data that will be copied into the faulting pages (instead of zero-filled
+ * pages). This is used to make the test more reliable and avoid segfaulting
+ * when we return from the signal handler. Since we are making the signal
+ * handler's ucontext point to newly allocated memory, when that memory is
+ * paged-in it will contain the expected content.
+ */
+static char backing_mem[UF_MEM_SIZE];
+
+static size_t pagesize;
+
+/*
+ * Return a chunk of at least 'size' bytes of memory that will be handled by
+ * userfaultfd. If 'backing_data' is not NULL, its content will be saved to
+ * 'backing_mem' and then copied into the faulting pages when the page fault
+ * is handled.
+ */
+void 

[PATCH v3 1/3] powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery

2020-02-10 Thread Gustavo Luiz Duarte
After a treclaim, we expect to be in non-transactional state. If we don't clear
the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state, handle_rt_signal64()
calls get_tm_stackpointer() that treclaims the transaction using
tm_reclaim_current() but without clearing the thread's MSR[TS]. This can cause
the TM Bad Thing exception below if later we pagefault and get preempted trying
to access the user's sigframe, using __put_user(). Afterwards, when we are
rescheduled back into do_page_fault() (but now in suspended state since the
thread's MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition from
suspended to non-transactional state is invalid.

Unexpected TM Bad Thing exception at c000de44 (msr 0x800302a03031) tm_scratch=80010280b033
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 ip6_tables ip_tables nft_compat ip_set nf_tables nfnetlink xts 
vmx_crypto sg virtio_balloon
r_mod cdrom virtio_net net_failover virtio_blk virtio_scsi failover 
dm_mirror dm_region_hash dm_log dm_mod
CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
NIP:  c000de44 LR: c0034728 CTR: 
REGS: c0003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
MSR:  800302a03031   CR: 44000884 
 XER: 
CFAR: c000dda4 IRQMASK: 0
PACATMSCRATCH: 80010280b033
GPR00: c0034728 c00f65a17c80 c1662800 7fffacf3fd78
GPR04: 1000 1000  c00f611f8af0
GPR08:  78006001  000c
GPR12: c00f611f84b0 c0003ffcb200  
GPR16:    
GPR20:    c00f611f8140
GPR24:  7fffacf3fd68 c00f65a17d90 c00f611f7800
GPR28: c00f65a17e90 c00f65a17e90 c1685e18 7fffacf3f000
NIP [c000de44] fast_exception_return+0xf4/0x1b0
LR [c0034728] handle_rt_signal64+0x78/0xc50
Call Trace:
[c00f65a17c80] [c0034710] handle_rt_signal64+0x60/0xc50 (unreliable)
[c00f65a17d30] [c0023640] do_notify_resume+0x330/0x460
[c00f65a17e20] [c000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
7c4ff120 e8410170 7c5a03a6 3840 f8410060 e8010070 e8410080 e8610088
6000 6000 e8810090 e8210078 <4c24> 4800 e8610178 88ed0989
---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...   # userspace in NON-TRANSACTIONAL state
  tbegin# userspace in TRANSACTIONAL state
  signal delivery   # kernelspace in SUSPENDED state
  handle_rt_signal64()
get_tm_stackpointer()
  treclaim  # kernelspace in NON-TRANSACTIONAL state
__put_user()
  page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
__schedule()
  __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was 
not cleared
  __switch_to(our_task)
switch_to_tm()
  tm_recheckpoint_new_task()
trechkpt# kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault() completes
  rfid <- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing  # illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS] just after
treclaim in get_tm_stackpointer() so that we stay in non-transactional state in
case we are preempted. In order to make treclaim and clearing the thread's
MSR[TS] atomic from a preemption perspective when CONFIG_PREEMPT is set,
preempt_disable/enable() is used. It's also necessary to save the previous
value of the thread's MSR before get_tm_stackpointer() is called so that it can
be exposed to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

v3: Subject and comment improvements.
v2: Fix build failure when tm is disabled.


[PATCH 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents

2020-02-10 Thread Pingfan Liu
A bug is observed on pseries by taking the following steps on RHEL:
1. drmgr -c mem -r -q 5
2. echo c > /proc/sysrq-trigger

And then, the failure looks like:
kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
 Checking for memory holes : [  0.0 %] /
   Checking for memory holes : [100.0 %] |  
 Excluding unnecessary pages   : [100.0 %] \
   Copying data  : [  0.3 %] -  
eta: 38s
[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 access=0x8004 current=makedumpfile
[   44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc0005504
[   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 access=0x8004 current=makedumpfile
[   44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc0005504
[   44.337708] makedumpfile[469]: unhandled signal 7 at 7fffba40 nip 7fffbbc4d7fc lr 00011356ca3c code 2
[   44.338548] Core dump to |/bin/false pipe failed
/lib/kdump-lib-initramfs.sh: line 98:   469 Bus error   $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
kdump: saving vmcore failed

* Root cause *
  After analysis, it turns out that in the current implementation, when
hot-removing an lmb, the KOBJ_REMOVE event is emitted before the device tree
is updated, because __remove_memory() is called before drmem_update_dt().

From the viewpoint of listener and publisher, the publisher notifies the
listener before the data is ready. This introduces a problem: udev launches
kexec-tools (due to KOBJ_REMOVE), which loads a stale device tree before the
update happens. Then, in the capture kernel, makedumpfile accesses memory
based on the stale device tree info and hits a SIGBUS error due to an lmb
that no longer exists.

* Fix *
  In order to fix this issue, update the device tree before __remove_memory(),
and apply the same rule in the hot-add path.

This introduces extra device tree update overhead for each involved lmb
during hotplug, but that should be fine since drmem_update_dt() is a
memory-based operation and hotplug is not a hot path.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Hari Bathini 
To: linuxppc-dev@lists.ozlabs.org
Cc: ke...@lists.infradead.org
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index a3a9353..1f623c3 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -392,6 +392,9 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+   rtas_hp_event = true;
+   drmem_update_dt();
+   rtas_hp_event = false;
 
__remove_memory(nid, base_addr, block_sz);
 
@@ -665,6 +668,9 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
lmb_set_nid(lmb);
lmb->flags |= DRCONF_MEM_ASSIGNED;
+   rtas_hp_event = true;
+   drmem_update_dt();
+   rtas_hp_event = false;
 
block_sz = memory_block_size_bytes();
 
@@ -683,6 +689,9 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+   rtas_hp_event = true;
+   drmem_update_dt();
+   rtas_hp_event = false;
 
__remove_memory(nid, base_addr, block_sz);
}
@@ -939,12 +948,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
break;
}
 
-   if (!rc) {
-   rtas_hp_event = true;
-   rc = drmem_update_dt();
-   rtas_hp_event = false;
-   }
-
unlock_device_hotplug();
return rc;
 }
-- 
2.7.5



[PATCH 1/2] powerpc/pseries: group lmb operation and memblock's

2020-02-10 Thread Pingfan Liu
This patch prepares for the following patch, which swaps the order of the
KOBJ_ uevent and the device tree update.

It has no functional effect; it just groups the lmb operations with the
memblock ones so that the device tree update can be inserted easily, and it
makes the change easier to review.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Hari Bathini 
To: linuxppc-dev@lists.ozlabs.org
Cc: ke...@lists.infradead.org
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 26 -
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c126b94..a3a9353 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -375,7 +375,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
unsigned long block_sz;
-   int rc;
+   phys_addr_t base_addr;
+   int rc, nid;
 
if (!lmb_is_removable(lmb))
return -EINVAL;
@@ -384,17 +385,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
if (rc)
return rc;
 
+   base_addr = lmb->base_addr;
+   nid = lmb->nid;
block_sz = pseries_memory_block_size();
 
-   __remove_memory(lmb->nid, lmb->base_addr, block_sz);
-
-   /* Update memory regions for memory remove */
-   memblock_remove(lmb->base_addr, block_sz);
-
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
+   __remove_memory(nid, base_addr, block_sz);
+
+   /* Update memory regions for memory remove */
+   memblock_remove(base_addr, block_sz);
+
return 0;
 }
 
@@ -661,6 +664,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
}
 
lmb_set_nid(lmb);
+   lmb->flags |= DRCONF_MEM_ASSIGNED;
+
block_sz = memory_block_size_bytes();
 
/* Add the memory */
@@ -672,11 +677,14 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
rc = dlpar_online_lmb(lmb);
if (rc) {
-   __remove_memory(lmb->nid, lmb->base_addr, block_sz);
+   int nid = lmb->nid;
+   phys_addr_t base_addr = lmb->base_addr;
+
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
-   } else {
-   lmb->flags |= DRCONF_MEM_ASSIGNED;
+   lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+
+   __remove_memory(nid, base_addr, block_sz);
}
 
return rc;
-- 
2.7.5



Re: [PATCH 2/2] selftests: vm: Fix 64-bit test builds for powerpc64le

2020-02-10 Thread Michael Ellerman
Sandipan Das  writes:
> Some tests are built only for 64-bit systems. This makes
> sure that these tests are built for both big and little
> endian variants of powerpc64.
>
> Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit 
> arch")
> Reviewed-by: Kamalesh Babulal 
> Signed-off-by: Sandipan Das 
> ---
>  tools/testing/selftests/vm/Makefile| 2 +-
>  tools/testing/selftests/vm/run_vmtests | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Tested-by: Michael Ellerman 

cheers

> diff --git a/tools/testing/selftests/vm/Makefile 
> b/tools/testing/selftests/vm/Makefile
> index 3f2e2f0ccbc9..8074340c6b3a 100644
> --- a/tools/testing/selftests/vm/Makefile
> +++ b/tools/testing/selftests/vm/Makefile
> @@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen
>  TEST_GEN_FILES += transhuge-stress
>  TEST_GEN_FILES += userfaultfd
>  
> -ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x 
> sh64 sparc64 x86_64))
> +ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 
> s390x sh64 sparc64 x86_64))
>  TEST_GEN_FILES += va_128TBswitch
>  TEST_GEN_FILES += virtual_address_range
>  endif
> diff --git a/tools/testing/selftests/vm/run_vmtests 
> b/tools/testing/selftests/vm/run_vmtests
> index a692ea828317..db8e0d1c7b39 100755
> --- a/tools/testing/selftests/vm/run_vmtests
> +++ b/tools/testing/selftests/vm/run_vmtests
> @@ -59,7 +59,7 @@ else
>  fi
>  
>  #filter 64bit architectures
> -ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 
> x86_64"
> +ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 
> sparc64 x86_64"
>  if [ -z $ARCH ]; then
>ARCH=`uname -m 2>/dev/null | sed -e 's/aarch64.*/arm64/'`
>  fi
> -- 
> 2.17.1


Re: [PATCH 1/2] selftests: vm: Do not override definition of ARCH

2020-02-10 Thread Michael Ellerman
Sandipan Das  writes:
> Independent builds of the vm selftests are currently broken
> because commit 7549b3364201 overrides the value of ARCH with
> the machine name from uname. This does not always match the
> architecture names used for tasks like header installation.
>
> E.g. for building tests on powerpc64, we need ARCH=powerpc
> and not ARCH=ppc64 or ARCH=ppc64le. Otherwise, the build
> fails as shown below.
>
>   $ uname -m
>   ppc64le
>
>   $ make -C tools/testing/selftests/vm
>   make: Entering directory '/home/sandipan/linux/tools/testing/selftests/vm'
>   make --no-builtin-rules ARCH=ppc64le -C ../../../.. headers_install
>   make[1]: Entering directory '/home/sandipan/linux'
>   Makefile:653: arch/ppc64le/Makefile: No such file or directory
>   make[1]: *** No rule to make target 'arch/ppc64le/Makefile'.  Stop.
>   make[1]: Leaving directory '/home/sandipan/linux'
>   ../lib.mk:50: recipe for target 'khdr' failed
>   make: *** [khdr] Error 2
>   make: Leaving directory '/home/sandipan/linux/tools/testing/selftests/vm'
>
> Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit 
> arch")
> Signed-off-by: Sandipan Das 
> ---
>  tools/testing/selftests/vm/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Tested-by: Michael Ellerman 

cheers

> diff --git a/tools/testing/selftests/vm/Makefile 
> b/tools/testing/selftests/vm/Makefile
> index 7f9a8a8c31da..3f2e2f0ccbc9 100644
> --- a/tools/testing/selftests/vm/Makefile
> +++ b/tools/testing/selftests/vm/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  # Makefile for vm selftests
>  uname_M := $(shell uname -m 2>/dev/null || echo not)
> -ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/')
> +MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/')
>  
>  CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS)
>  LDLIBS = -lrt
> @@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen
>  TEST_GEN_FILES += transhuge-stress
>  TEST_GEN_FILES += userfaultfd
>  
> -ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
> sparc64 x86_64))
> +ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x 
> sh64 sparc64 x86_64))
>  TEST_GEN_FILES += va_128TBswitch
>  TEST_GEN_FILES += virtual_address_range
>  endif
> -- 
> 2.17.1


Re: linux-next: DTS build warnings

2020-02-10 Thread Stephen Rothwell
Hi all,

On Wed, 31 Jul 2019 15:39:36 +1000 Stephen Rothwell  
wrote:
>
> I have been getting the following warnings from a couple of powerpc
> builds for quite a while now.  I was hoping someone might have time to
> look at them and maybe even fix them up :-)

Today's list (from an allyesconfig build) looks like this:

arch/powerpc/boot/dts/icon.dts:318.26-357.5: Warning (pci_bridge): 
/plb/pciex@d: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/icon.dts:359.26-398.5: Warning (pci_bridge): 
/plb/pciex@d2000: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/icon.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/virtex440-ml510.dts:335.37-439.6: Warning (pci_bridge): 
/plb@0/plbv46-pci@85e0: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/virtex440-ml510.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/mpc5200b.dtsi:267.20-280.4: Warning (pci_bridge): 
/pci@fd00: missing ranges for PCI bridge (or not a bridge)
arch/powerpc/boot/dts/o2mnt.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/mpc5200b.dtsi:182.18-186.5: Warning (spi_bus_bridge): 
/soc5200@f000/psc@2000: node name for SPI buses should be 'spi'
  also defined at arch/powerpc/boot/dts/o2d.dtsi:32.12-43.5
arch/powerpc/boot/dts/o2mnt.dtb: Warning (spi_bus_reg): Failed prerequisite 
'spi_bus_bridge'
arch/powerpc/boot/dts/makalu.dts:271.25-310.5: Warning (pci_bridge): 
/plb/pciex@a000: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/makalu.dts:312.25-351.5: Warning (pci_bridge): 
/plb/pciex@c000: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/makalu.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/mgcoge.dts:230.14-234.7: Warning (spi_bus_reg): 
/soc@f000/cpm@119c0/spi@11aa0/ds3106@1: SPI bus unit address format error, 
expected "0"
arch/powerpc/boot/dts/mpc5200b.dtsi:267.20-280.4: Warning (pci_bridge): 
/pci@fd00: missing ranges for PCI bridge (or not a bridge)
  also defined at arch/powerpc/boot/dts/motionpro.dts:84.15-86.4
arch/powerpc/boot/dts/motionpro.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/glacier.dts:492.26-532.5: Warning (pci_bridge): 
/plb/pciex@d: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/glacier.dts:534.26-574.5: Warning (pci_bridge): 
/plb/pciex@d2000: node name is not "pci" or "pcie"
arch/powerpc/boot/dts/glacier.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/fsl/p2020rdb.dts:251.22-254.4: Warning (pci_bridge): 
/pcie@ffe08000: missing ranges for PCI bridge (or not a bridge)
  also defined at arch/powerpc/boot/dts/fsl/p2020si-post.dtsi:43.7-68.3
arch/powerpc/boot/dts/fsl/p2020si-post.dtsi:52.9-67.4: Warning (pci_bridge): 
/pcie@ffe08000/pcie@0: missing ranges for PCI bridge (or not a bridge)
arch/powerpc/boot/dts/fsl/p2020rdb.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/fsl/mvme7100.dts:135.22-137.4: Warning (pci_bridge): 
/pcie@f1008000: missing ranges for PCI bridge (or not a bridge)
  also defined at arch/powerpc/boot/dts/fsl/mpc8641si-post.dtsi:92.7-117.3
arch/powerpc/boot/dts/fsl/mpc8641si-post.dtsi:102.9-116.4: Warning 
(pci_bridge): /pcie@f1008000/pcie@0: missing ranges for PCI bridge (or not a 
bridge)
arch/powerpc/boot/dts/fsl/mvme7100.dts:139.22-141.4: Warning (pci_bridge): 
/pcie@f1009000: missing ranges for PCI bridge (or not a bridge)
  also defined at arch/powerpc/boot/dts/fsl/mpc8641si-post.dtsi:119.7-144.3
arch/powerpc/boot/dts/fsl/mpc8641si-post.dtsi:129.9-143.4: Warning 
(pci_bridge): /pcie@f1009000/pcie@0: missing ranges for PCI bridge (or not a 
bridge)
arch/powerpc/boot/dts/fsl/mvme7100.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/fsl/mvme7100.dts:30.11-32.6: Warning (i2c_bus_reg): 
/soc@f100/i2c@3000/rtc@68: missing or empty reg property
arch/powerpc/boot/dts/fsl/pq3-mpic.dtsi:38.2-25: Warning (interrupts_property): 
/soc@fffe0/mdio@24000/ethernet-phy@0:#interrupt-cells: size is (8), 
expected multiple of 16
arch/powerpc/boot/dts/fsl/pq3-mpic.dtsi:38.2-25: Warning (interrupts_property): 
/soc@fffe0/mdio@24000/ethernet-phy@1:#interrupt-cells: size is (8), 
expected multiple of 16
arch/powerpc/boot/dts/mpc5200b.dtsi:267.20-280.4: Warning (pci_bridge): 
/pci@fd00: missing ranges for PCI bridge (or not a bridge)
  also defined at arch/powerpc/boot/dts/uc101.dts:100.15-102.4
arch/powerpc/boot/dts/uc101.dtb: Warning (pci_device_bus_num): Failed 
prerequisite 'pci_bridge'
arch/powerpc/boot/dts/fsl/pq3-mpic.dtsi:38.2-25: Warning (interrupts_property): 
/soc@fffe0/mdio@24000/ethernet-phy@0:#interrupt-cells: size is (8), 
expected multiple of 16
arch/powerpc/boot/dts/fsl/pq3-mpic.dtsi:38.2-25: Warning (interrupts_property): 

Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device

2020-02-10 Thread Dan Williams
On Mon, Feb 10, 2020 at 6:20 AM Aneesh Kumar K.V
 wrote:
>
> Dan Williams  writes:
>
> > On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
> >  wrote:
> >>
> >> Currently, kernel shows the below values
> >> "persistence_domain":"cpu_cache"
> >> "persistence_domain":"memory_controller"
> >> "persistence_domain":"unknown"
> >>
> >> "cpu_cache" indicates no extra instructions are needed to ensure the 
> >> persistence
> >> of data in the pmem media on power failure.
> >>
> >> "memory_controller" indicates platform provided instructions need to be 
> >> issued
> >
> > No, it does not. The only requirement implied by "memory_controller"
> > is global visibility outside the cpu cache. If there are special
> > instructions beyond that then it isn't persistent memory, at least not
> > pmem that is safe for dax. virtio-pmem is an example of pmem-like
> > memory that is not enabled for userspace flushing (MAP_SYNC disabled).
> >
>
> Can you explain this more? The way I was expecting the application to
> interpret the value was, a regular store instruction doesn't guarantee
> persistence if you find the "memory_controller" value for
> persistence_domain. Instead, we need to make sure we flush data to the
> controller at which point the platform will take care of the persistence in
> case of power loss. How we flush data to the controller will also be
> defined by the platform.

If the platform requires any flush mechanism outside of the base cpu
ISA of cache flushes and memory barriers then MAP_SYNC needs to be
explicitly disabled to force the application to call fsync()/msync().
Then those platform specific mechanisms need to be triggered through a
platform-aware driver.

>
>
> >> as per documented sequence to make sure data get flushed so that it is
> >> guaranteed to be on pmem media in case of system power loss.
> >>
> >> Based on the above use memory_controller for non volatile regions on ppc64.
> >>
> >> Signed-off-by: Aneesh Kumar K.V 
> >> ---
> >>  arch/powerpc/platforms/pseries/papr_scm.c | 7 ++-
> >>  drivers/nvdimm/of_pmem.c  | 4 +++-
> >>  include/linux/libnvdimm.h | 1 -
> >>  3 files changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> >> b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 7525635a8536..ffcd0d7a867c 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv 
> >> *p)
> >>
> >> if (p->is_volatile)
> >> p->region = nvdimm_volatile_region_create(p->bus, 
> >> _desc);
> >> -   else
> >> +   else {
> >> +   /*
> >> +* We need to flush things correctly to guarantee 
> >> persistence
> >> +*/
> >
> > There are never guarantees. If you're going to comment what does
> > software need to flush, and how?
>
> Can you explain why you say there are never guarantees? If you follow the 
> platform
> recommended instruction sequence to flush data, we can be sure of data
> persistence in the pmem media.

Because storage can always fail. You can reduce risk, but never
eliminate it. This is similar to SSDs that use latent capacitance to
flush their write caches on drive power loss. Even if the application
successfully flushes its writes to buffers that are protected by that
capacitance that power source can still (and in practice does) fail.

>
>
> >
> >> +   set_bit(ND_REGION_PERSIST_MEMCTRL, _desc.flags);
> >> p->region = nvdimm_pmem_region_create(p->bus, _desc);
> >> +   }
> >> if (!p->region) {
> >> dev_err(dev, "Error registering region %pR from %pOF\n",
> >> ndr_desc.res, p->dn);
> >> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> >> index 8224d1431ea9..6826a274a1f1 100644
> >> --- a/drivers/nvdimm/of_pmem.c
> >> +++ b/drivers/nvdimm/of_pmem.c
> >> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device 
> >> *pdev)
> >>
> >> if (is_volatile)
> >> region = nvdimm_volatile_region_create(bus, 
> >> _desc);
> >> -   else
> >> +   else {
> >> +   set_bit(ND_REGION_PERSIST_MEMCTRL, 
> >> _desc.flags);
> >> region = nvdimm_pmem_region_create(bus, _desc);
> >> +   }
> >>
> >> if (!region)
> >> dev_warn(>dev, "Unable to register region 
> >> %pR from %pOF\n",
> >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >> index 0f366706b0aa..771d888a5ed7 100644
> >> --- a/include/linux/libnvdimm.h
> >> +++ b/include/linux/libnvdimm.h
> >> @@ -54,7 +54,6 @@ enum {
> >> /*
> >>  * Platform provides mechanisms to automatically flush outstanding
> >>  * write data from 

[PATCH RESEND] macintosh: convert to i2c_new_scanned_device

2020-02-10 Thread Wolfram Sang
Move from the deprecated i2c_new_probed_device() to the new
i2c_new_scanned_device(). No functional change for this driver because
it doesn't check the return code anyhow.

Signed-off-by: Wolfram Sang 
---

I can take this via I2C tree if this makes things easier...

 drivers/macintosh/therm_windtunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/therm_windtunnel.c 
b/drivers/macintosh/therm_windtunnel.c
index 8c744578122a..f15fec5e1cb6 100644
--- a/drivers/macintosh/therm_windtunnel.c
+++ b/drivers/macintosh/therm_windtunnel.c
@@ -321,10 +321,10 @@ do_attach( struct i2c_adapter *adapter )
 
memset(, 0, sizeof(struct i2c_board_info));
strlcpy(info.type, "therm_ds1775", I2C_NAME_SIZE);
-   i2c_new_probed_device(adapter, , scan_ds1775, NULL);
+   i2c_new_scanned_device(adapter, , scan_ds1775, NULL);
 
strlcpy(info.type, "therm_adm1030", I2C_NAME_SIZE);
-   i2c_new_probed_device(adapter, , scan_adm1030, NULL);
+   i2c_new_scanned_device(adapter, , scan_adm1030, NULL);
 
if( x.thermostat && x.fan ) {
x.running = 1;
-- 
2.20.1



Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-02-10 Thread Catalin Marinas
On Tue, Jan 28, 2020 at 06:57:53AM +0530, Anshuman Khandual wrote:
> This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe to this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.

It may be worth posting the next version to linux-arch to reach out to
other arch maintainers.

Also I've seen that you posted a v13 but it hasn't reached
linux-arm-kernel (likely held in moderation because of the large number
of addresses cc'ed) and I don't normally follow LKML. I'm not cc'ed to
this patch either (which is fine as long as you post to a list that I
read).

Since I started the reply on v12 about a week ago, I'll follow up here.
When you post a v14, please trim the people on cc only to those strictly
necessary (e.g. arch maintainers, linux-mm, linux-arch and lkml).

> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt 
> b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> new file mode 100644
> index ..f3f8111edbe3
> --- /dev/null
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -0,0 +1,35 @@
> +#
> +# Feature name:  debug-vm-pgtable
> +# Kconfig:   ARCH_HAS_DEBUG_VM_PGTABLE
> +# description:   arch supports pgtable tests for semantics compliance
> +#
> +---
> +| arch |status|
> +---
> +|   alpha: | TODO |
> +| arc: |  ok  |
> +| arm: | TODO |

I'm sure you can find some arm32 hardware around (or a VM) to give this
a try ;).

> diff --git a/arch/x86/include/asm/pgtable_64.h 
> b/arch/x86/include/asm/pgtable_64.h
> index 0b6c4042942a..fb0e76d254b3 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
[...]
> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>   sched_init_smp();
>  
>   page_alloc_init_late();
> + debug_vm_pgtable();
>   /* Initialize page ext after all struct pages are initialized. */
>   page_ext_init();

I guess you could even make debug_vm_pgtable() an early_initcall(). I
don't have a strong opinion either way.

> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> new file mode 100644
> index ..0f37f32d15f1
> --- /dev/null
> +++ b/mm/debug_vm_pgtable.c
> @@ -0,0 +1,388 @@
[...]
> +/*
> + * Basic operations
> + *
> + * mkold(entry)  = An old and not a young entry
> + * mkyoung(entry)= A young and not an old entry
> + * mkdirty(entry)= A dirty and not a clean entry
> + * mkclean(entry)= A clean and not a dirty entry
> + * mkwrite(entry)= A write and not a write protected entry
> + * wrprotect(entry)  = A write protected and not a write entry
> + * pxx_bad(entry)= A mapped and non-table entry
> + * pxx_same(entry1, entry2)  = Both entries hold the exact same value
> + */
> +#define VMFLAGS  (VM_READ|VM_WRITE|VM_EXEC)
> +
> +/*
> + * On s390 platform, the lower 12 bits are used to identify given page table
> + * entry type and for other arch specific requirements. But these bits might
> + * affect the ability to clear entries with pxx_clear(). So while loading up
> + * the entries skip all lower 12 bits in order to accommodate s390 platform.
> + * It does not affect any other platform.
> + */
> +#define RANDOM_ORVALUE   (0xfffffffffffff000UL)

I'd suggest you generate this mask with something like
GENMASK(BITS_PER_LONG, PAGE_SHIFT).

> +#define RANDOM_NZVALUE   (0xff)
> +
> +static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pte_t pte = pfn_pte(pfn, prot);
> +
> + WARN_ON(!pte_same(pte, pte));
> + WARN_ON(!pte_young(pte_mkyoung(pte)));
> + WARN_ON(!pte_dirty(pte_mkdirty(pte)));
> + WARN_ON(!pte_write(pte_mkwrite(pte)));
> + WARN_ON(pte_young(pte_mkold(pte)));
> + WARN_ON(pte_dirty(pte_mkclean(pte)));
> + WARN_ON(pte_write(pte_wrprotect(pte)));

Given that you start with rwx permissions set,
some of these ops would not have any effect. For example, on arm64 at
least, mkwrite clears a bit already cleared here. You could try with
multiple rwx combinations values (e.g. all set and all cleared) or maybe
something like below:

WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte))));

You could also try something like this:

WARN_ON(!pte_same(pte_wrprotect(pte), pte_wrprotect(pte_mkwrite(pte))));

though the above approach may not work for arm64 ptep_set_wrprotect() on
a dirty pte (if you extend these tests later).

> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd = pfn_pmd(pfn, 

Re: [PATCH 6/6] powerpc: powernv: no need to check return value of debugfs_create functions

2020-02-10 Thread Greg Kroah-Hartman
On Tue, Feb 11, 2020 at 02:01:53AM +1100, Oliver O'Halloran wrote:
> On Mon, Feb 10, 2020 at 12:12 AM Greg Kroah-Hartman
>  wrote:
> >
> > When calling debugfs functions, there is no need to ever check the
> > return value.  The function can work or not, but the code logic should
> > never do something different based on this.
> 
> For memtrace debugfs is the only way to actually use the feature. It'd
> be nice if it still printed out *something* if it failed to create the
> files rather than just being mysteriously absent, but maybe debugfs
> itself does that. Looks fine otherwise.

No, debugfs will only spit out an error message to the log if a
file/directory is attempted to be created for an already present
file/directory.

For other failures, no error will be printed, other than the normal
lower-level "out of memory" issues that might rarely happen.

thanks,

greg k-h


Re: [PATCH 6/6] powerpc: powernv: no need to check return value of debugfs_create functions

2020-02-10 Thread Oliver O'Halloran
On Mon, Feb 10, 2020 at 12:12 AM Greg Kroah-Hartman
 wrote:
>
> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic should
> never do something different based on this.

For memtrace debugfs is the only way to actually use the feature. It'd
be nice if it still printed out *something* if it failed to create the
files rather than just being mysteriously absent, but maybe debugfs
itself does that. Looks fine otherwise.

Reviewed-by: Oliver O'Halloran 

> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Sukadev Bhattiprolu 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c  |  7 
>  arch/powerpc/platforms/powernv/opal-imc.c  | 24 --
>  arch/powerpc/platforms/powernv/pci-ioda.c  |  5 ---
>  arch/powerpc/platforms/powernv/vas-debug.c | 37 ++
>  4 files changed, 10 insertions(+), 63 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index eb2e75dac369..d6d64f8718e6 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -187,11 +187,6 @@ static int memtrace_init_debugfs(void)
>
> snprintf(ent->name, 16, "%08x", ent->nid);
> dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
> -   if (!dir) {
> -   pr_err("Failed to create debugfs directory for node 
> %d\n",
> -   ent->nid);
> -   return -1;
> -   }
>
> ent->dir = dir;
> debugfs_create_file("trace", 0400, dir, ent, _fops);
> @@ -314,8 +309,6 @@ static int memtrace_init(void)
>  {
> memtrace_debugfs_dir = debugfs_create_dir("memtrace",
>   powerpc_debugfs_root);
> -   if (!memtrace_debugfs_dir)
> -   return -1;
>
> debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
> NULL, _init_fops);
> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
> b/arch/powerpc/platforms/powernv/opal-imc.c
> index 000b350d4060..968b9a4d1cd9 100644
> --- a/arch/powerpc/platforms/powernv/opal-imc.c
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -35,11 +35,10 @@ static int imc_mem_set(void *data, u64 val)
>  }
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_imc_x64, imc_mem_get, imc_mem_set, 
> "0x%016llx\n");
>
> -static struct dentry *imc_debugfs_create_x64(const char *name, umode_t mode,
> -struct dentry *parent, u64  
> *value)
> +static void imc_debugfs_create_x64(const char *name, umode_t mode,
> +  struct dentry *parent, u64  *value)
>  {
> -   return debugfs_create_file_unsafe(name, mode, parent,
> - value, _imc_x64);
> +   debugfs_create_file_unsafe(name, mode, parent, value, _imc_x64);
>  }
>
>  /*
> @@ -59,9 +58,6 @@ static void export_imc_mode_and_cmd(struct device_node 
> *node,
>
> imc_debugfs_parent = debugfs_create_dir("imc", powerpc_debugfs_root);
>
> -   if (!imc_debugfs_parent)
> -   return;
> -
> if (of_property_read_u32(node, "cb_offset", _offset))
> cb_offset = IMC_CNTL_BLK_OFFSET;
>
> @@ -69,21 +65,15 @@ static void export_imc_mode_and_cmd(struct device_node 
> *node,
> loc = (u64)(ptr->vbase) + cb_offset;
> imc_mode_addr = (u64 *)(loc + IMC_CNTL_BLK_MODE_OFFSET);
> sprintf(mode, "imc_mode_%d", (u32)(ptr->id));
> -   if (!imc_debugfs_create_x64(mode, 0600, imc_debugfs_parent,
> -   imc_mode_addr))
> -   goto err;
> +   imc_debugfs_create_x64(mode, 0600, imc_debugfs_parent,
> +  imc_mode_addr);
>
> imc_cmd_addr = (u64 *)(loc + IMC_CNTL_BLK_CMD_OFFSET);
> sprintf(cmd, "imc_cmd_%d", (u32)(ptr->id));
> -   if (!imc_debugfs_create_x64(cmd, 0600, imc_debugfs_parent,
> -   imc_cmd_addr))
> -   goto err;
> +   imc_debugfs_create_x64(cmd, 0600, imc_debugfs_parent,
> +  imc_cmd_addr);
> ptr++;
> }
> -   return;
> -
> -err:
> -   debugfs_remove_recursive(imc_debugfs_parent);
>  }
>
>  /*
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 22c22cd7bd82..57d3a6af1d52 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3174,11 +3174,6 @@ static void pnv_pci_ioda_create_dbgfs(void)
>
> sprintf(name, "PCI%04x", 

Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device

2020-02-10 Thread Aneesh Kumar K.V
Dan Williams  writes:

> On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
>  wrote:
>>
>> Currently, kernel shows the below values
>> "persistence_domain":"cpu_cache"
>> "persistence_domain":"memory_controller"
>> "persistence_domain":"unknown"
>>
>> "cpu_cache" indicates no extra instructions are needed to ensure the 
>> persistence
>> of data in the pmem media on power failure.
>>
>> "memory_controller" indicates platform provided instructions need to be 
>> issued
>
> No, it does not. The only requirement implied by "memory_controller"
> is global visibility outside the cpu cache. If there are special
> instructions beyond that then it isn't persistent memory, at least not
> pmem that is safe for dax. virtio-pmem is an example of pmem-like
> memory that is not enabled for userspace flushing (MAP_SYNC disabled).
>

Can you explain this more? The way I expected the application to
interpret the value was: if persistence_domain reports
"memory_controller", a regular store instruction doesn't guarantee
persistence. Instead, we need to make sure we flush data to the
controller, at which point the platform will take care of persistence in
case of power loss. How we flush data to the controller is also defined
by the platform.


>> as per the documented sequence to make sure data gets flushed so that it is
>> guaranteed to be on the pmem media in case of system power loss.
>>
>> Based on the above use memory_controller for non volatile regions on ppc64.
>>
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/platforms/pseries/papr_scm.c | 7 ++-
>>  drivers/nvdimm/of_pmem.c  | 4 +++-
>>  include/linux/libnvdimm.h | 1 -
>>  3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 7525635a8536..ffcd0d7a867c 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>
>> if (p->is_volatile)
>> p->region = nvdimm_volatile_region_create(p->bus, _desc);
>> -   else
>> +   else {
>> +   /*
>> +* We need to flush things correctly to guarantee persistance
>> +*/
>
> There are never guarantees. If you're going to comment what does
> software need to flush, and how?

Can you explain why you say there are never guarantees? If you follow the
platform-recommended instruction sequence to flush data, we can be sure
of data persistence in the pmem media.


>
>> +   set_bit(ND_REGION_PERSIST_MEMCTRL, _desc.flags);
>> p->region = nvdimm_pmem_region_create(p->bus, _desc);
>> +   }
>> if (!p->region) {
>> dev_err(dev, "Error registering region %pR from %pOF\n",
>> ndr_desc.res, p->dn);
>> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
>> index 8224d1431ea9..6826a274a1f1 100644
>> --- a/drivers/nvdimm/of_pmem.c
>> +++ b/drivers/nvdimm/of_pmem.c
>> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device 
>> *pdev)
>>
>> if (is_volatile)
>> region = nvdimm_volatile_region_create(bus, 
>> _desc);
>> -   else
>> +   else {
>> +   set_bit(ND_REGION_PERSIST_MEMCTRL, _desc.flags);
>> region = nvdimm_pmem_region_create(bus, _desc);
>> +   }
>>
>> if (!region)
>> dev_warn(>dev, "Unable to register region %pR 
>> from %pOF\n",
>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> index 0f366706b0aa..771d888a5ed7 100644
>> --- a/include/linux/libnvdimm.h
>> +++ b/include/linux/libnvdimm.h
>> @@ -54,7 +54,6 @@ enum {
>> /*
>>  * Platform provides mechanisms to automatically flush outstanding
>>  * write data from memory controler to pmem on system power loss.
>> -* (ADR)
>
> I'd rather not delete critical terminology for a developer / platform
> owner to be able to consult documentation, or their vendor. Can you
> instead add the PowerPC equivalent term for this capability? I.e. list
> (x86: ADR PowerPC: foo ...).

The Power ISA doesn't clearly call out what mechanism will be used to
ensure that a load following power loss returns the previously flushed
data, so there is no description of details like Asynchronous DRAM
Refresh. The only details specified concern the flush sequence that
ensures a load following power loss returns the value stored.

-aneesh


[powerpc:merge] BUILD SUCCESS a5bc6e124219546a81ce334dc9b16483d55e9abf

2020-02-10 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: a5bc6e124219546a81ce334dc9b16483d55e9abf  Automatic merge of 
branches 'master', 'next' and 'fixes' into merge

elapsed time: 2918m

configs tested: 177
configs skipped: 0

The following configs have been built successfully.
More configs may be tested in the coming days.

arm  allmodconfig
arm   allnoconfig
arm  allyesconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
armmulti_v5_defconfig
armmulti_v7_defconfig
armshmobile_defconfig
arm   sunxi_defconfig
arm64allmodconfig
arm64 allnoconfig
arm64allyesconfig
arm64   defconfig
sparcallyesconfig
sh   allmodconfig
riscv   defconfig
arc defconfig
nds32   defconfig
um  defconfig
xtensa   common_defconfig
s390  debug_defconfig
sparc64   allnoconfig
um   x86_64_defconfig
m68k   m5475evb_defconfig
ia64 allmodconfig
ia64 allyesconfig
parisc b180_defconfig
i386defconfig
sparc   defconfig
sh  rsk7269_defconfig
sparc64  allyesconfig
s390 allyesconfig
ia64defconfig
powerpc   ppc64_defconfig
nios2 3c120_defconfig
shtitan_defconfig
sparc64  allmodconfig
parisc   allyesconfig
riscvallmodconfig
mips  fuloong2e_defconfig
microblazenommu_defconfig
m68k  multi_defconfig
riscv allnoconfig
pariscallnoconfig
c6x  allyesconfig
s390defconfig
i386  allnoconfig
i386 allyesconfig
i386 alldefconfig
ia64 alldefconfig
ia64  allnoconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
openriscor1ksim_defconfig
openrisc simple_smp_defconfig
xtensa  iss_defconfig
alpha   defconfig
cskydefconfig
nds32 allnoconfig
h8300 edosk2674_defconfig
h8300h8300h-sim_defconfig
h8300   h8s-sim_defconfig
m68k allmodconfig
m68k   sun3_defconfig
arc  allyesconfig
microblaze  mmu_defconfig
powerpc   allnoconfig
powerpc defconfig
powerpc  rhel-kconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips allyesconfig
mips  malta_kvm_defconfig
pariscc3000_defconfig
parisc  defconfig
x86_64   randconfig-a001-20200210
x86_64   randconfig-a002-20200210
x86_64   randconfig-a003-20200210
i386 randconfig-a001-20200210
i386 randconfig-a002-20200210
i386 randconfig-a003-20200210
alpharandconfig-a001-20200208
parisc   randconfig-a001-20200208
m68k randconfig-a001-20200208
nds32randconfig-a001-20200208
mips randconfig-a001-20200208
riscvrandconfig-a001-20200208
c6x  randconfig-a001-20200210
h8300randconfig-a001-20200210
microblaze   randconfig-a001-20200210
nios2randconfig-a001-20200210
sparc64  randconfig-a001-20200210
h8300randconfig-a001-20200208
nios2randconfig-a001-20200208
microblaze   randconfig-a001-20200208
sparc64  randconfig-a001-20200208
c6x  randconfig-a001-20200208
x86_64

Re: [PATCH V13] mm/debug: Add tests validating architecture page table helpers

2020-02-10 Thread Russell King - ARM Linux admin
On Mon, Feb 10, 2020 at 11:46:23AM +0100, Christophe Leroy wrote:
> 
> 
> Le 10/02/2020 à 11:02, Russell King - ARM Linux admin a écrit :
> > On Mon, Feb 10, 2020 at 07:38:38AM +0100, Christophe Leroy wrote:
> > > 
> > > 
> > > Le 10/02/2020 à 06:35, Anshuman Khandual a écrit :
> > > > 
> > > > 
> > > > On 02/10/2020 10:22 AM, Andrew Morton wrote:
> > > > > On Thu, 6 Feb 2020 13:49:35 +0530 Anshuman Khandual 
> > > > >  wrote:
> > > > > 
> > > > > > 
> > > > > > On 02/06/2020 04:40 AM, kbuild test robot wrote:
> > > > > > > Hi Anshuman,
> > > > > > > 
> > > > > > > Thank you for the patch! Yet something to improve:
> > > > > > > 
> > > > > > > [auto build test ERROR on powerpc/next]
> > > > > > > [also build test ERROR on s390/features linus/master arc/for-next 
> > > > > > > v5.5]
> > > > > > > [cannot apply to mmotm/master tip/x86/core arm64/for-next/core 
> > > > > > > next-20200205]
> > > > > > > [if your patch is applied to the wrong git tree, please drop us a 
> > > > > > > note to help
> > > > > > > improve the system. BTW, we also suggest to use '--base' option 
> > > > > > > to specify the
> > > > > > > base tree in git format-patch, please see 
> > > > > > > https://stackoverflow.com/a/37406982]
> > > > > > > 
> > > > > > > url:
> > > > > > > https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-validating-architecture-page-table-helpers/20200205-215507
> > > > > > > base:   
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> > > > > > > next
> > > > > > > config: ia64-allmodconfig (attached as .config)
> > > > > > > compiler: ia64-linux-gcc (GCC) 7.5.0
> > > > > > > reproduce:
> > > > > > >   wget 
> > > > > > > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> > > > > > >  -O ~/bin/make.cross
> > > > > > >   chmod +x ~/bin/make.cross
> > > > > > >   # save the attached .config to linux build tree
> > > > > > >   GCC_VERSION=7.5.0 make.cross ARCH=ia64
> > > > > > > 
> > > > > > > If you fix the issue, kindly add following tag
> > > > > > > Reported-by: kbuild test robot 
> > > > > > > 
> > > > > > > All error/warnings (new ones prefixed by >>):
> > > > > > > 
> > > > > > >  In file included from 
> > > > > > > include/asm-generic/pgtable-nopud.h:8:0,
> > > > > > >   from arch/ia64/include/asm/pgtable.h:586,
> > > > > > >   from include/linux/mm.h:99,
> > > > > > >   from include/linux/highmem.h:8,
> > > > > > >   from mm/debug_vm_pgtable.c:14:
> > > > > > >  mm/debug_vm_pgtable.c: In function 'pud_clear_tests':
> > > > > > > > > include/asm-generic/pgtable-nop4d-hack.h:47:32: error: 
> > > > > > > > > implicit declaration of function '__pgd'; did you mean 
> > > > > > > > > '__p4d'? [-Werror=implicit-function-declaration]
> > > > > > >   #define __pud(x)((pud_t) { __pgd(x) })
> > > > > > >  ^
> > > > > > > > > mm/debug_vm_pgtable.c:141:8: note: in expansion of macro 
> > > > > > > > > '__pud'
> > > > > > >pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
> > > > > > >  ^
> > > > > > > > > include/asm-generic/pgtable-nop4d-hack.h:47:22: warning: 
> > > > > > > > > missing braces around initializer [-Wmissing-braces]
> > > > > > >   #define __pud(x)((pud_t) { __pgd(x) })
> > > > > > >^
> > > > > > > > > mm/debug_vm_pgtable.c:141:8: note: in expansion of macro 
> > > > > > > > > '__pud'
> > > > > > >pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
> > > > > > >  ^
> > > > > > >  cc1: some warnings being treated as errors
> > > > > > 
> > > > > > This build failure is expected now given that we have allowed 
> > > > > > DEBUG_VM_PGTABLE
> > > > > > with EXPERT without platform requiring ARCH_HAS_DEBUG_VM_PGTABLE. 
> > > > > > This problem
> > > > > > i.e build failure caused without a platform __pgd(), is known to 
> > > > > > exist both on
> > > > > > ia64 and arm (32bit) platforms. Please refer 
> > > > > > https://lkml.org/lkml/2019/9/24/314
> > > > > > for details where this was discussed earlier.
> > > > > > 
> > > > > 
> > > > > I'd prefer not to merge a patch which is known to cause build
> > > > > regressions.  Is there some temporary thing we can do to prevent these
> > > > > errors until arch maintainers(?) get around to implementing the
> > > > > long-term fixes?
> > > > 
> > > > We could explicitly disable CONFIG_DEBUG_VM_PGTABLE on ia64 and arm 
> > > > platforms
> > > > which will ensure that others can still use the EXPERT path.
> > > > 
> > > > config DEBUG_VM_PGTABLE
> > > > bool "Debug arch page table for semantics compliance"
> > > > depends on MMU
> > > > depends on !(IA64 || ARM)
> > > > depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> > > > default n if !ARCH_HAS_DEBUG_VM_PGTABLE
> > > > default y if DEBUG_VM

Re: [PATCH] powerpc/vdso32: mark __kernel_datapage_offset as STV_PROTECTED

2020-02-10 Thread Michael Ellerman
Fangrui Song  writes:
> A PC-relative relocation (R_PPC_REL16_LO in this case) referencing a
> preemptible symbol in a -shared link is not allowed.  GNU ld's powerpc
> port is permissive and allows it [1], but lld will report an error after
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=ec0895f08f99515194e9fcfe1338becf6f759d38
>
> Make the symbol protected so that it is non-preemptible but still
> exported.

"preemptible" means something different to me, and I assume we're not
using it to mean the same thing.

Can you explain it using small words that a kernel developer can
understand? :)

cheers

> [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=25500
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/851
> Signed-off-by: Fangrui Song 

> ---
>  arch/powerpc/kernel/vdso32/datapage.S | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/vdso32/datapage.S 
> b/arch/powerpc/kernel/vdso32/datapage.S
> index 217bb630f8f9..2831a8676365 100644
> --- a/arch/powerpc/kernel/vdso32/datapage.S
> +++ b/arch/powerpc/kernel/vdso32/datapage.S
> @@ -13,7 +13,8 @@
>  #include 
>  
>   .text
> - .global __kernel_datapage_offset;
> + .global __kernel_datapage_offset
> + .protected  __kernel_datapage_offset
>  __kernel_datapage_offset:
>   .long   0
>  
> -- 
> 2.25.0.341.g760bfbb309-goog


Re: [PATCH V13] mm/debug: Add tests validating architecture page table helpers

2020-02-10 Thread Christophe Leroy




Le 10/02/2020 à 11:02, Russell King - ARM Linux admin a écrit :

On Mon, Feb 10, 2020 at 07:38:38AM +0100, Christophe Leroy wrote:



Le 10/02/2020 à 06:35, Anshuman Khandual a écrit :



On 02/10/2020 10:22 AM, Andrew Morton wrote:

On Thu, 6 Feb 2020 13:49:35 +0530 Anshuman Khandual  
wrote:



On 02/06/2020 04:40 AM, kbuild test robot wrote:

Hi Anshuman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on s390/features linus/master arc/for-next v5.5]
[cannot apply to mmotm/master tip/x86/core arm64/for-next/core next-20200205]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-validating-architecture-page-table-helpers/20200205-215507
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.5.0
reproduce:
  wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
  chmod +x ~/bin/make.cross
  # save the attached .config to linux build tree
  GCC_VERSION=7.5.0 make.cross ARCH=ia64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>):

 In file included from include/asm-generic/pgtable-nopud.h:8:0,
  from arch/ia64/include/asm/pgtable.h:586,
  from include/linux/mm.h:99,
  from include/linux/highmem.h:8,
  from mm/debug_vm_pgtable.c:14:
 mm/debug_vm_pgtable.c: In function 'pud_clear_tests':

include/asm-generic/pgtable-nop4d-hack.h:47:32: error: implicit declaration of 
function '__pgd'; did you mean '__p4d'? [-Werror=implicit-function-declaration]

  #define __pud(x)((pud_t) { __pgd(x) })
 ^

mm/debug_vm_pgtable.c:141:8: note: in expansion of macro '__pud'

   pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
 ^

include/asm-generic/pgtable-nop4d-hack.h:47:22: warning: missing braces around 
initializer [-Wmissing-braces]

  #define __pud(x)((pud_t) { __pgd(x) })
   ^

mm/debug_vm_pgtable.c:141:8: note: in expansion of macro '__pud'

   pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
 ^
 cc1: some warnings being treated as errors


This build failure is expected now given that we have allowed DEBUG_VM_PGTABLE
with EXPERT without platform requiring ARCH_HAS_DEBUG_VM_PGTABLE. This problem
i.e build failure caused without a platform __pgd(), is known to exist both on
ia64 and arm (32bit) platforms. Please refer https://lkml.org/lkml/2019/9/24/314
for details where this was discussed earlier.



I'd prefer not to merge a patch which is known to cause build
regressions.  Is there some temporary thing we can do to prevent these
errors until arch maintainers(?) get around to implementing the
long-term fixes?


We could explicitly disable CONFIG_DEBUG_VM_PGTABLE on ia64 and arm platforms
which will ensure that others can still use the EXPERT path.

config DEBUG_VM_PGTABLE
bool "Debug arch page table for semantics compliance"
depends on MMU
depends on !(IA64 || ARM)
depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
default n if !ARCH_HAS_DEBUG_VM_PGTABLE
default y if DEBUG_VM



On both ia32 and arm, the fix is trivial.

Can we include the fix within this patch, just the same way as the
mm_p4d_folded() fix for x86 ?


Why should arm include a macro for something that nothing (apart from
this checker) requires?  If the checker requires it but the rest of
the kernel does not, it suggests that the checker isn't actually
correct, and the results can't be relied upon.



As far as I can see, the problem is that ARM open-codes part of the API
instead of including asm-generic/pgtable-nopmd.h.

Here, ARM has two levels, i.e. only PGD and PTE. But instead of defining
__pgd and __pte and getting everything else from asm-generic, it defines
a __pmd and then redefines the folded levels like the pud, etc.

That's exactly what the checker aims at detecting: architectures that do
not properly use the standard Linux page table structures.


Christophe


Re: [PATCH V13] mm/debug: Add tests validating architecture page table helpers

2020-02-10 Thread Russell King - ARM Linux admin
On Mon, Feb 10, 2020 at 07:38:38AM +0100, Christophe Leroy wrote:
> 
> 
> Le 10/02/2020 à 06:35, Anshuman Khandual a écrit :
> > 
> > 
> > On 02/10/2020 10:22 AM, Andrew Morton wrote:
> > > On Thu, 6 Feb 2020 13:49:35 +0530 Anshuman Khandual 
> > >  wrote:
> > > 
> > > > 
> > > > On 02/06/2020 04:40 AM, kbuild test robot wrote:
> > > > > Hi Anshuman,
> > > > > 
> > > > > Thank you for the patch! Yet something to improve:
> > > > > 
> > > > > [auto build test ERROR on powerpc/next]
> > > > > [also build test ERROR on s390/features linus/master arc/for-next 
> > > > > v5.5]
> > > > > [cannot apply to mmotm/master tip/x86/core arm64/for-next/core 
> > > > > next-20200205]
> > > > > [if your patch is applied to the wrong git tree, please drop us a 
> > > > > note to help
> > > > > improve the system. BTW, we also suggest to use '--base' option to 
> > > > > specify the
> > > > > base tree in git format-patch, please see 
> > > > > https://stackoverflow.com/a/37406982]
> > > > > 
> > > > > url:
> > > > > https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-validating-architecture-page-table-helpers/20200205-215507
> > > > > base:   
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> > > > > config: ia64-allmodconfig (attached as .config)
> > > > > compiler: ia64-linux-gcc (GCC) 7.5.0
> > > > > reproduce:
> > > > >  wget 
> > > > > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> > > > >  -O ~/bin/make.cross
> > > > >  chmod +x ~/bin/make.cross
> > > > >  # save the attached .config to linux build tree
> > > > >  GCC_VERSION=7.5.0 make.cross ARCH=ia64
> > > > > 
> > > > > If you fix the issue, kindly add following tag
> > > > > Reported-by: kbuild test robot 
> > > > > 
> > > > > All error/warnings (new ones prefixed by >>):
> > > > > 
> > > > > In file included from include/asm-generic/pgtable-nopud.h:8:0,
> > > > >  from arch/ia64/include/asm/pgtable.h:586,
> > > > >  from include/linux/mm.h:99,
> > > > >  from include/linux/highmem.h:8,
> > > > >  from mm/debug_vm_pgtable.c:14:
> > > > > mm/debug_vm_pgtable.c: In function 'pud_clear_tests':
> > > > > > > include/asm-generic/pgtable-nop4d-hack.h:47:32: error: implicit 
> > > > > > > declaration of function '__pgd'; did you mean '__p4d'? 
> > > > > > > [-Werror=implicit-function-declaration]
> > > > >  #define __pud(x)((pud_t) { __pgd(x) })
> > > > > ^
> > > > > > > mm/debug_vm_pgtable.c:141:8: note: in expansion of macro '__pud'
> > > > >   pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
> > > > > ^
> > > > > > > include/asm-generic/pgtable-nop4d-hack.h:47:22: warning: missing 
> > > > > > > braces around initializer [-Wmissing-braces]
> > > > >  #define __pud(x)((pud_t) { __pgd(x) })
> > > > >   ^
> > > > > > > mm/debug_vm_pgtable.c:141:8: note: in expansion of macro '__pud'
> > > > >   pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
> > > > > ^
> > > > > cc1: some warnings being treated as errors
> > > > 
> > > > This build failure is expected now given that we have allowed 
> > > > DEBUG_VM_PGTABLE
> > > > with EXPERT without platform requiring ARCH_HAS_DEBUG_VM_PGTABLE. This 
> > > > problem
> > > > i.e build failure caused without a platform __pgd(), is known to exist 
> > > > both on
> > > > ia64 and arm (32bit) platforms. Please refer 
> > > > https://lkml.org/lkml/2019/9/24/314
> > > > for details where this was discussed earlier.
> > > > 
> > > 
> > > I'd prefer not to merge a patch which is known to cause build
> > > regressions.  Is there some temporary thing we can do to prevent these
> > > errors until arch maintainers(?) get around to implementing the
> > > long-term fixes?
> > 
> > We could explicitly disable CONFIG_DEBUG_VM_PGTABLE on ia64 and arm 
> > platforms
> > which will ensure that others can still use the EXPERT path.
> > 
> > config DEBUG_VM_PGTABLE
> > bool "Debug arch page table for semantics compliance"
> > depends on MMU
> > depends on !(IA64 || ARM)
> > depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> > default n if !ARCH_HAS_DEBUG_VM_PGTABLE
> > default y if DEBUG_VM
> > 
> 
> On both ia32 and arm, the fix is trivial.
> 
> Can we include the fix within this patch, just the same way as the
> mm_p4d_folded() fix for x86 ?

Why should arm include a macro for something that nothing (apart from
this checker) requires?  If the checker requires it but the rest of
the kernel does not, it suggests that the checker isn't actually
correct, and the results can't be relied upon.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RESEND][PATCH] selftests/vm: Fix vm tests build and run

2020-02-10 Thread Michael Ellerman
Harish  writes:
> A recent change overrides the ARCH env variable, and hence running
> tests with make fails with the following:
>
> $ make -C vm/
> make: Entering directory '/home/harish/linux/tools/testing/selftests/vm'
> make --no-builtin-rules ARCH=ppc64le -C ../../../.. headers_install
> make[1]: Entering directory '/home/harish/linux'
> Makefile:652: arch/ppc64le/Makefile: No such file or directory
> make[1]: *** No rule to make target 'arch/ppc64le/Makefile'.  Stop.
> make[1]: Leaving directory '/home/harish/linux'
> make: *** [../lib.mk:50: khdr] Error 2
> make: Leaving directory '/home/harish/linux/tools/testing/selftests/vm'
>
> The patch fixes this issue and also handles the ppc64/ppc64le archs to
> enable a few tests.
>
> Signed-off-by: Harish 
> ---
>  tools/testing/selftests/vm/Makefile| 4 ++--
>  tools/testing/selftests/vm/run_vmtests | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

I don't maintain the vm tests. You need to send the patch to the right
people.

But as Christophe points out I think Sandipan has already done that.

cheers


Re: [PATCH V5 09/14] powerpc/vas: Update CSB and notify process for fault CRBs

2020-02-10 Thread Michael Neuling


> > > +
> > > + csb.cc = CSB_CC_TRANSLATION;
> > > + csb.ce = CSB_CE_TERMINATION;
> > > + csb.cs = 0;
> > > + csb.count = 0;
> > > +
> > > + /*
> > > +  * Returns the fault address in CPU format since it is passed with
> > > +  * signal. But if the user space expects BE format, need changes.
> > > +  * i.e either kernel (here) or user should convert to CPU format.
> > > +  * Not both!
> > > +  */
> > > + csb.address = be64_to_cpu(crb->stamp.nx.fault_storage_addr);
> > 
> > This looks wrong and I don't understand the comment. You need to convert
> > this
> > back to be64 to write it to csb.address. ie.
> > 
> >   csb.address = cpu_to_be64(be64_to_cpu(crb->stamp.nx.fault_storage_addr));
> > 
> > Which I think you can just avoid the endian conversion all together.
> 
> NX pastes the fault CRB in big-endian, so we pass this address in CPU format
> to user space; otherwise the library has to convert.

OK, then please change the definition in struct coprocessor_status_block to just
__u64.

struct coprocessor_status_block {
u8 flags;
u8 cs;
u8 cc;
u8 ce;
__be32 count;
__be64 address;
} __packed __aligned(CSB_ALIGN);

Big but:

I thought "struct coprocessor_status_block" was also written by hardware. If
that's the case then it needs to be __be64, and you need the kernel to
synthesize exactly what the hardware is doing. Hence the struct definition is
correct and the kernel needs to convert to __be64 on writing.

> What is the standard way for passing to user space? 

CPU endian.

> > > +  * process will be polling on csb.flags after request is sent to
> > > +  * NX. So generally CSB update should not fail except when an
> > > +  * application does not follow the process properly. So an error
> > > +  * message will be displayed and leave it to user space whether
> > > +  * to ignore or handle this signal.
> > > +  */
> > > + rcu_read_lock();
> > > + rc = kill_pid_info(SIGSEGV, , pid);
> > > + rcu_read_unlock();
> > 
> > why the rcu_read_un/lock() here?
> 
> Used same as in kill_proc_info()/kill_something_info()

Please document.

Mikey


Re: [RESEND][PATCH] selftests/vm: Fix vm tests build and run

2020-02-10 Thread Christophe Leroy




Le 10/02/2020 à 08:35, Harish a écrit :

A recent change overrides the ARCH env variable, and hence running
tests with make fails with the following:

$ make -C vm/
make: Entering directory '/home/harish/linux/tools/testing/selftests/vm'
make --no-builtin-rules ARCH=ppc64le -C ../../../.. headers_install
make[1]: Entering directory '/home/harish/linux'
Makefile:652: arch/ppc64le/Makefile: No such file or directory
make[1]: *** No rule to make target 'arch/ppc64le/Makefile'.  Stop.
make[1]: Leaving directory '/home/harish/linux'
make: *** [../lib.mk:50: khdr] Error 2
make: Leaving directory '/home/harish/linux/tools/testing/selftests/vm'

The patch fixes this issue and also handles the ppc64/ppc64le archs to
enable a few tests.


Isn't it the same as 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=155975 ?


Christophe



Signed-off-by: Harish 
---
  tools/testing/selftests/vm/Makefile| 4 ++--
  tools/testing/selftests/vm/run_vmtests | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 7f9a8a8c31da..49bb15be1447 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,7 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
  # Makefile for vm selftests
  uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/')
+ARCH_USED ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 
's/ppc64.*/ppc64/')
  
  CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS)

  LDLIBS = -lrt
@@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen
  TEST_GEN_FILES += transhuge-stress
  TEST_GEN_FILES += userfaultfd
  
-ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64))

+ifneq (,$(filter $(ARCH_USED),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x 
sh64 sparc64 x86_64))
  TEST_GEN_FILES += va_128TBswitch
  TEST_GEN_FILES += virtual_address_range
  endif
diff --git a/tools/testing/selftests/vm/run_vmtests 
b/tools/testing/selftests/vm/run_vmtests
index a692ea828317..da63dfb9713a 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -61,7 +61,7 @@ fi
  #filter 64bit architectures
  ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64"
  if [ -z $ARCH ]; then
-  ARCH=`uname -m 2>/dev/null | sed -e 's/aarch64.*/arm64/'`
+  ARCH=`uname -m 2>/dev/null | sed -e 's/aarch64.*/arm64/' -e 
's/ppc64.*/ppc64/'`
  fi
  VADDR64=0
  echo "$ARCH64STR" | grep $ARCH && VADDR64=1
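The sed normalization the patch adds can be checked in isolation. A
hedged sketch of the mapping (the exact `uname -m` strings vary by
distro, so the patterns here only cover the cases the patch names):

```shell
#!/bin/sh
# Normalize `uname -m` output the way the patched Makefile and
# run_vmtests do: aarch64* -> arm64, ppc64le/ppc64* -> ppc64.
normalize_arch() {
	echo "$1" | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/'
}

normalize_arch ppc64le   # -> ppc64 (avoids arch/ppc64le/Makefile error)
normalize_arch aarch64   # -> arm64
normalize_arch x86_64    # -> x86_64 (unchanged)
```

Without the ppc64 rule, `ARCH=ppc64le` is passed straight to the kernel
Makefile, which has no arch/ppc64le/ directory, producing the "No rule to
make target 'arch/ppc64le/Makefile'" error quoted above.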