[PATCH] powerpc/perf: Quiet PMU registration message

2018-10-08 Thread Joel Stanley
On a Power9 box we get a few screens full of these on boot. Drop
them to pr_debug.

[5.993645] nest_centaur6_imc performance monitor hardware support registered
[5.993728] nest_centaur7_imc performance monitor hardware support registered
[5.996510] core_imc performance monitor hardware support registered
[5.996569] nest_mba0_imc performance monitor hardware support registered
[5.996631] nest_mba1_imc performance monitor hardware support registered
[5.996685] nest_mba2_imc performance monitor hardware support registered

Signed-off-by: Joel Stanley 
---
 arch/powerpc/perf/core-book3s.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 81f8a0c838ae..a01c521694e8 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2249,8 +2249,8 @@ int register_power_pmu(struct power_pmu *pmu)
return -EBUSY;  /* something's already registered */
 
ppmu = pmu;
-   pr_info("%s performance monitor hardware support registered\n",
-   pmu->name);
+   pr_debug("%s performance monitor hardware support registered\n",
+pmu->name);
 
power_pmu.attr_groups = ppmu->attr_groups;
 
-- 
2.17.1



Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

2018-10-08 Thread Nicholas Piggin
On Tue, 9 Oct 2018 06:46:30 +0200
Christophe LEROY  wrote:

> On 09/10/2018 06:32, Nicholas Piggin wrote:
> > On Mon, 8 Oct 2018 17:39:11 +0200
> > Christophe LEROY  wrote:
> >   
> >> Hi Nick,
> >>
> >> On 19/07/2017 08:59, Nicholas Piggin wrote:  
> >>> Use nmi_enter similarly to system reset interrupts. This uses the
> >>> printk NMI buffers and turns off various debugging facilities that
> >>> help avoid tripping on ourselves or other CPUs.
> >>>
> >>> Signed-off-by: Nicholas Piggin 
> >>> ---
> >>>arch/powerpc/kernel/traps.c | 9 ++---
> >>>1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index 2849c4f50324..6d31f9d7c333 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>
> >>>void machine_check_exception(struct pt_regs *regs)
> >>>{
> >>> - enum ctx_state prev_state = exception_enter();
> >>>   int recover = 0;
> >>> + bool nested = in_nmi();
> >>> + if (!nested)
> >>> + nmi_enter();  
> >>
> >> This alters preempt_count, then when die() is called
> >> in_interrupt() returns true although the trap didn't happen in
> >> an interrupt, so oops_end() panics with "Fatal exception in interrupt"
> >> instead of gently sending SIGBUS to the faulting app.  
> > 
> > Thanks for tracking that down.
> >   
> >> Any idea on how to fix this ?  
> > 
> > I would say we have to deliver the sigbus by hand.
> > 
> >     if (user_mode(regs))
> >             _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >     else
> >             die("Machine check", regs, SIGBUS);
> >   
> 
> And what about all the other things done by 'die()' ?
> 
> And what if it is a kernel thread ?
> 
> In one of my boards, I have a kernel thread regularly checking the HW, 
> and if it gets a machine check I expect it to gently stop and the die 
> notification to be delivered to all registered notifiers.
> 
> Until before this patch, it was working well.

I guess the alternative is we could check regs->trap for machine
check in the die test. The complication is having to account for an MCE
taken while already in an interrupt handler.

   if (in_interrupt()) {
if (!IS_MCHECK_EXC(regs) ||
    (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
panic("Fatal exception in interrupt");
   }

Something like that might work for you? We need a ppc64 macro for the
MCE test, and can probably add something like in_nmi_from_interrupt() for
the second part of the test.

Thanks,
Nick


[PATCH] powerpc/mm: make NULL pointer dereferences explicit on bad page faults.

2018-10-08 Thread Christophe Leroy
Like several other arches, including x86, this patch makes it explicit
that a bad page fault is a NULL pointer dereference when the fault
address is lower than PAGE_SIZE.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/fault.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d51cf5f4e45e..501a1eadb3e9 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
switch (TRAP(regs)) {
case 0x300:
case 0x380:
-   printk(KERN_ALERT "Unable to handle kernel paging request for "
-   "data at address 0x%08lx\n", regs->dar);
+   pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
+regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
+"paging request",
+regs->dar);
break;
case 0x400:
case 0x480:
-   printk(KERN_ALERT "Unable to handle kernel paging request for "
-   "instruction fetch\n");
+   pr_alert("Unable to handle kernel %s for instruction fetch\n",
+regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
+"paging request");
break;
case 0x600:
printk(KERN_ALERT "Unable to handle kernel paging request for "
-- 
2.13.3



Re: [PATCH 5/5] dma-direct: always allow dma mask <= physical memory size

2018-10-08 Thread Benjamin Herrenschmidt
On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote:
> > -* Because 32-bit DMA masks are so common we expect every architecture
> > -* to be able to satisfy them - either by not supporting more physical
> > -* memory, or by providing a ZONE_DMA32.  If neither is the case, the
> > -* architecture needs to use an IOMMU instead of the direct mapping.
> > -*/
> > -   if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
> > +   u64 min_mask;
> > +
> > +   if (IS_ENABLED(CONFIG_ZONE_DMA))
> > +   min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
> > +   else
> > +   min_mask = DMA_BIT_MASK(32);
> > +
> > +   min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
> > +
> > +   if (mask >= phys_to_dma(dev, min_mask))
> >  return 0;
> > -#endif
> >  return 1;
> >   }
> 
> So I believe I have run into the same issue that Guenter reported, on
> an x86_64 system with an Intel IOMMU: I wasn't able to complete boot and
> all probe attempts for various devices were failing with -EIO errors.
> 
> I believe the last mask check should be "if (mask < phys_to_dma(dev,
> min_mask))" not a ">=" check.

Right, that test is backwards. I needed to change it here too (powermac
with the rest of the powerpc series).

Cheers,
Ben.




[PATCH v7 9/9] powerpc: clean stack pointers naming

2018-10-08 Thread Christophe Leroy
Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c  | 17 +++--
 arch/powerpc/kernel/setup_64.c | 20 ++--
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 62cfccf4af89..754f0efc507b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *curtp, *irqtp, *sirqtp;
+   void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-   irqtp = hardirq_ctx[raw_smp_processor_id()];
-   sirqtp = softirq_ctx[raw_smp_processor_id()];
+   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   irqsp = hardirq_ctx[raw_smp_processor_id()];
+   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? */
-   if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+   if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
set_irq_regs(old_regs);
return;
}
/* Switch stack and call */
-   call_do_irq(regs, irqtp);
+   call_do_irq(regs, irqsp);
 
set_irq_regs(old_regs);
 }
@@ -732,10 +732,7 @@ void irq_ctx_init(void)
 
 void do_softirq_own_stack(void)
 {
-   void *irqtp;
-
-   irqtp = softirq_ctx[smp_processor_id()];
-   call_do_softirq(irqtp);
+   call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6792e9c90689..4912ec0320b8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -717,22 +717,22 @@ void __init emergency_stack_init(void)
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   void *ti;
+   void *sp;
 
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->emergency_sp = sp + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->nmi_emergency_sp = sp + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->mc_emergency_sp = sp + THREAD_SIZE;
 #endif
}
 }
-- 
2.13.3



[PATCH v7 8/9] powerpc/64: Remove CURRENT_THREAD_INFO

2018-10-08 Thread Christophe Leroy
Now that current_thread_info is located at the beginning of the 'current'
task struct, the CURRENT_THREAD_INFO macro is not really needed any more.

This patch replaces it with loads of the value at PACACURRENT(r13).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  4 
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a86fead0..ca3af3e9015e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -680,7 +680,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   ld  r3, PACACURRENT(r13);   \
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -730,7 +730,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 361bb45b8990..2ee9e248c933 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6fce0f8fd8c4..06d9a7c084a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -158,7 +158,7 @@ system_call:/* label this so stack traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   ld  r11, PACACURRENT(r13)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -205,7 +205,7 @@ system_call:/* label this so stack traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   ld  r12, PACACURRENT(r13)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   ld  r10, PACACURRENT(r13)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -735,7 +735,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -849,7 +849,7 @@ resume_kernel:
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 231d066b4a3d..dfafcd0af009 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -469,7 +469,7 @@ exc_##n##_bad_stack:
\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()
\
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r10,TI_LOCAL_FLAGS(r11);\
andi.   r9,r10,_TLF_NAPPING;\
beq+1f; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b9239dbf6d59..f776f30ecfcc 100644

[PATCH v7 6/9] powerpc: 'current_set' is now a table of task_struct pointers

2018-10-08 Thread Christophe Leroy
The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they were pointing to the stack, and current was taken from the
'task' field of the thread_info.

Now that thread_info sits at the beginning of task_struct, the
pointers of the 'current_set' table are both pointers to
task_struct and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 9bc98c239305..ab0541f9da42 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 44dfd73b2a62..ba0341bd5a00 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -842,9 +842,9 @@ __secondary_start:
 #endif /* CONFIG_6xx */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c7e90f36358..48e4de4dfd0c 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index b8a2b789677e..0d27bfff52dd 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1076,8 +1076,8 @@ __secondary_start:
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f22fcbeb9898..00193643f0da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,7 +74,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
@@ -644,7 +644,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -724,7 +724,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -809,15 +809,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
  THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3



[PATCH v7 7/9] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2018-10-08 Thread Christophe Leroy
Now that thread_info is similar to task_struct, its address is in r2
so the CURRENT_THREAD_INFO() macro is useless. This patch removes it.

At the same time, as the 'cpu' field is no longer in thread_info,
this patch renames TI_CPU to TASK_CPU.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 43 --
 arch/powerpc/kernel/epapr_hcalls.S |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S |  8 +++
 arch/powerpc/kernel/idle_e500.S|  8 +++
 arch/powerpc/kernel/misc_32.S  |  3 +--
 arch/powerpc/mm/hash_low_32.S  | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S  |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 02e7ca1c15d4..f1e2d7f7b022 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -426,7 +426,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 61c8747cd926..361bb45b8990 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 768ce602d624..31be6eb9c0d4 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -97,7 +97,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index bd3b146e18a3..d0c546ce387e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
slwir9,r9,3
add r11,r11,r9
 #endif
@@ -180,8 +179,7 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
+   tophys(r9, r2)
ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
 
@@ -195,8 +193,7 @@ transfer_to_handler:
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_6xx) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
+   tophys(r9,r2)   /* check local flags */
lwz r12,TI_LOCAL_FLAGS(r9)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
mtmsr   r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-   CURRENT_THREAD_INFO(r10, r1)
-   lwz r11,TI_FLAGS(r10)
+   lwz r11,TI_FLAGS(r2)
andi.   r11,r11,_TIF_SYSCALL_DOTRACE
bne-syscall_dotrace
 syscall_dotrace_cont:
@@ -379,13 +375,12 @@ ret_from_syscall:
lwz r3,GPR3(r1)
 #endif
mr  r6,r3
-   CURRENT_THREAD_INFO(r12, r1)
/* disable interrupts so current_thread_info()->flags can't change */
LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */
/* Note: We don't bother telling lockdep about it */
SYNC
MTMSRD(r10)
-   lwz r9,TI_FLAGS(r12)
+   lwz r9,TI_FLAGS(r2)
li  r8,-MAX_ERRNO
andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne-syscall_exit_work
@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
andi.   r4,r8,MSR_PR
beq 3f
-   CURRENT_THREAD_INFO(r4, r1)
-   ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+   ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
lwz r4,_LINK(r1)
@@ -526,7 +520,7 @@ syscall_exit_work:
/* Clear per-syscall TIF flags if any are set.  */
 
li  r11,_TIF_PERSYSCALL_MASK
-   addir12,r12,TI_FLAGS
+   addi

[PATCH v7 5/9] powerpc: regain entire stack space

2018-10-08 Thread Christophe Leroy
thread_info is no longer in the stack, so the entire stack
can now be used.

There is also no risk anymore of corrupting task_cpu(p) with a
stack overflow, so the patch removes the test.

When doing this, an explicit test for a NULL stack pointer is
needed in validate_sp(), as it is no longer implicitly covered
by the sizeof(thread_info) gap.

In the meantime, with the previous patch, all pointers to the stacks
are no longer pointers to thread_info, so this patch changes them
to void*.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 --
 arch/powerpc/kernel/irq.c| 19 +--
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c| 32 +---
 arch/powerpc/kernel/setup_64.c   |  8 
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..966ddd4d2414 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,9 +48,9 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 extern void exc_lvl_ctx_init(void);
 #else
 #define exc_lvl_ctx_init()
@@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void);
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b225c7f7c5a4..e763342265a2 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -331,8 +331,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
#define INIT_SP(sizeof(init_stack) + (unsigned long) &init_stack)
-#define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT  ((unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 833d189df04c..768ce602d624 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -93,7 +93,6 @@ int main(void)
DEFINE(NMI_MASK, NMI_MASK);
OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index fa7a69ffb37a..bd3b146e18a3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3fdb6b6973cf..62cfccf4af89 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - 

[PATCH v7 4/9] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopy of the thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h, to avoid circular inclusion, and without
including asm/asm-offsets.h, to avoid symbol name duplication
between ASM constants and C constants.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  8 +-
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 ++--
 arch/powerpc/kernel/asm-offsets.c  |  7 +++--
 arch/powerpc/kernel/entry_32.S |  9 +++
 arch/powerpc/kernel/exceptions-64e.S   | 11 
 arch/powerpc/kernel/head_32.S  |  6 ++---
 arch/powerpc/kernel/head_44x.S |  4 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +-
 arch/powerpc/kernel/head_fsl_booke.S   |  7 +++--
 arch/powerpc/kernel/irq.c  | 47 +-
 arch/powerpc/kernel/kgdb.c | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 ++---
 arch/powerpc/kernel/setup_64.c | 21 ---
 arch/powerpc/kernel/smp.c  |  2 +-
 arch/powerpc/net/bpf_jit32.h   |  5 ++--
 19 files changed, 52 insertions(+), 155 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 602eea723624..3b958cd4e284 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -238,6 +238,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 81552c7b46eb..02e7ca1c15d4 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -422,6 +422,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
 TOUT   := .tmp_gas_check
@@ -439,4 +446,3 @@ checkbin:
 
 
 CLEAN_FILES += $(TOUT)
-
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 447cbd1bee99..3a7e5561630b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -120,7 +120,7 @@ extern int ptrace_put_reg(struct task_struct *task, int 
regno,
  unsigned long data);
 
 #define current_pt_regs() \
-   ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)
+   ((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 95b66a0c639b..93a8cd120663 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];
 
-#define raw_smp_processor_id() (current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of task_struct here, but we need access to the CPU this task is running on.
+ * Instead of using task_struct we're using _TASK_CPU which is extracted from
+ * asm-offsets.h by kbuild to get the current processor ID.
+ *
+ * This also needs to be safeguarded when building asm-offsets.s because at
+ * that time _TASK_CPU is not defined yet. It could have been guarded by
+ * _TASK_CPU itself, but we want the build to fail if _TASK_CPU is missing
+ * when building something else than asm-offsets.s
+ */
+#ifdef GENERATING_ASM_OFFSETS
+#define raw_smp_processor_id() (0)
+#else
+#define raw_smp_processor_id() (*(unsigned int *)((void *)current + _TASK_CPU))

[PATCH v7 3/9] powerpc: Prepare for moving thread_info into task_struct

2018-10-08 Thread Christophe Leroy
This patch cleans the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point to the new stack ==> change it to void* and
rename it 'sp'
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current
- Remove unnecessary casts to thread_info, as they'll become invalid
once thread_info is not in the stack anymore.
- Rename THREAD_INFO to TASK_STACK: as it is in fact the offset of the
pointer to the stack in task_struct, this pointer will not be impacted
by the move of THREAD_INFO.
- Make TASK_STACK available to PPC64. PPC64 will need it to get the
stack pointer from current once thread_info has been moved.
- Modify klp_init_thread_info() to take a task_struct pointer argument.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/irq.h   |  4 ++--
 arch/powerpc/include/asm/livepatch.h |  7 ---
 arch/powerpc/include/asm/processor.h |  4 ++--
 arch/powerpc/include/asm/reg.h   |  2 +-
 arch/powerpc/kernel/asm-offsets.c|  2 +-
 arch/powerpc/kernel/entry_32.S   |  2 +-
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/head_32.S|  4 ++--
 arch/powerpc/kernel/head_40x.S   |  4 ++--
 arch/powerpc/kernel/head_44x.S   |  2 +-
 arch/powerpc/kernel/head_8xx.S   |  2 +-
 arch/powerpc/kernel/head_booke.h |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S |  4 ++--
 arch/powerpc/kernel/irq.c|  2 +-
 arch/powerpc/kernel/misc_32.S|  4 ++--
 arch/powerpc/kernel/process.c|  8 
 arch/powerpc/kernel/setup-common.c   |  2 +-
 arch/powerpc/kernel/setup_32.c   | 15 +--
 arch/powerpc/kernel/smp.c|  4 +++-
 19 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..8a81d10ccc82 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,14 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
+   struct thread_info *ti = task_thread_info(p);
/* + 1 to account for STACK_END_MAGIC */
-   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+   ti->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 13589274fe9b..b225c7f7c5a4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -332,7 +332,7 @@ struct thread_struct {
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
 #define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) &init_stack)
+	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 640a4d818772..d2528a0b2f5b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1058,7 +1058,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a6d70fd2e499..c583a02e5a21 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -91,10 +91,10 @@ int main(void)
DEFINE(NMI_MASK, NMI_MASK);
OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);

[PATCH v7 2/9] powerpc: Only use task_struct 'cpu' field on SMP

2018-10-08 Thread Christophe Leroy
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that task_struct 'cpu' field is not used directly out of SMP code.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S| 4 
 arch/powerpc/xmon/xmon.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index e2750b856c8f..05b574f416b3 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ set_ivor:
li  r0,0
	stwu	r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
 
+#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r22, r1)
stw r24, TI_CPU(r22)
+#endif
 
bl  early_init
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 695b24a2d954..2f0fe8bfc078 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
or  r4,r4,r5
mtspr   SPRN_HID1,r4
 
+#ifdef CONFIG_SMP
/* Store new HID1 image */
CURRENT_THREAD_INFO(r6, r1)
lwz r6,TI_CPU(r6)
	slwi	r6,r6,2
+#else
+   li  r6, 0
+#endif
addis   r6,r6,nap_save_hid1@ha
stw r4,nap_save_hid1@l(r6)
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c70d17c9a6ba..1731793e1277 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2986,7 +2986,7 @@ static void show_task(struct task_struct *tsk)
printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp,
tsk->pid, tsk->parent->pid,
-   state, task_thread_info(tsk)->cpu,
+   state, task_cpu(tsk),
tsk->comm);
 }
 
-- 
2.13.3



[PATCH v7 1/9] book3s/64: avoid circular header inclusion in mmu-hash.h

2018-10-08 Thread Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h
includes asm/current.h. This generates a circular dependency.
To avoid that, asm/processor.h shall not be included in mmu-hash.h

In order to do that, this patch moves into a new header called
asm/task_size_user64.h the information from asm/processor.h required
by mmu-hash.h

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h  | 34 +-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c  |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_user64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index e0e4ce8f77d6..02955d867067 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 52fadded5c1e..13589274fe9b 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -101,40 +101,8 @@ void release_thread(struct task_struct *);
 #endif
 
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x0000400000000000UL)
-#define TASK_SIZE_128TB (0x0000800000000000UL)
-#define TASK_SIZE_512TB (0x0002000000000000UL)
-#define TASK_SIZE_1PB   (0x0004000000000000UL)
-#define TASK_SIZE_2PB   (0x0008000000000000UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010000000000000UL)
 
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
+#include 
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size_user64.h b/arch/powerpc/include/asm/task_size_user64.h
new file mode 100644
index ..a4043075864b
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size_user64.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H
+#define _ASM_POWERPC_TASK_SIZE_USER64_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x0000400000000000UL)
+#define TASK_SIZE_128TB (0x0000800000000000UL)
+#define TASK_SIZE_512TB (0x0002000000000000UL)
+#define TASK_SIZE_1PB   (0x0004000000000000UL)
+#define TASK_SIZE_2PB   (0x0008000000000000UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010000000000000UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64   TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void wait_for_subcore_guest_exit(void)
 {
-- 
2.13.3



[PATCH v7 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Christophe Leroy
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes since v6:
 - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch 
(early crash with CONFIG_KMEMLEAK)

Changes since v5:
 - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
 - Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
 - Fixed a build failure on 32-bit SMP when include/generated/asm-offsets.h does
 not already exist; it was due to spaces instead of a tab in the Makefile

Changes since RFC v3: (based on Nick's review)
 - Renamed task_size.h to task_size_user64.h to better relate to what it 
contains.
 - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs 
moved to a separate patch.
 - Removed CURRENT_THREAD_INFO macro completely.
 - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
defined.
 - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
 - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
 - Fixed a few commit logs
 - Fixed checkpatch report.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (9):
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Remove CURRENT_THREAD_INFO
  powerpc: clean stack pointers naming


 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  8 ++-
 arch/powerpc/include/asm/asm-prototypes.h  |  4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |  2 +-
 arch/powerpc/include/asm/exception-64s.h   |  4 +-
 arch/powerpc/include/asm/irq.h | 14 ++---
 arch/powerpc/include/asm/livepatch.h   |  7 ++-
 arch/powerpc/include/asm/processor.h   | 39 +
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/reg.h |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +-
 arch/powerpc/include/asm/task_size_user64.h| 42 ++
 arch/powerpc/include/asm/thread_info.h | 19 ---
 arch/powerpc/kernel/asm-offsets.c  | 10 ++--
 arch/powerpc/kernel/entry_32.S | 66 --
 arch/powerpc/kernel/entry_64.S | 12 ++--
 arch/powerpc/kernel/epapr_hcalls.S |  5 +-
 arch/powerpc/kernel/exceptions-64e.S   | 13 +
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/head_32.S  | 14 ++---
 arch/powerpc/kernel/head_40x.S |  4 +-
 arch/powerpc/kernel/head_44x.S |  8 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_8xx.S |  2 +-
 arch/powerpc/kernel/head_booke.h   | 12 +---
 arch/powerpc/kernel/head_fsl_booke.S   | 16 +++---
 arch/powerpc/kernel/idle_6xx.S |  8 +--
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_e500.S|  8 +--
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/irq.c  | 77 +-
 arch/powerpc/kernel/kgdb.c | 28 --
 

Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

2018-10-08 Thread Christophe LEROY




Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :

On Mon, 8 Oct 2018 17:39:11 +0200
Christophe LEROY  wrote:


Hi Nick,

Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :

Use nmi_enter similarly to system reset interrupts. This uses NMI
printk NMI buffers and turns off various debugging facilities that
help avoid tripping on ourselves or other CPUs.

Signed-off-by: Nicholas Piggin 
---
   arch/powerpc/kernel/traps.c | 9 ++---
   1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2849c4f50324..6d31f9d7c333 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
   
   void machine_check_exception(struct pt_regs *regs)

   {
-   enum ctx_state prev_state = exception_enter();
int recover = 0;
+   bool nested = in_nmi();
+   if (!nested)
+   nmi_enter();


This alters preempt_count, then when die() is called
in_interrupt() returns true although the trap didn't happen in
interrupt, so oops_end() panics with "fatal exception in interrupt"
instead of gently sending SIGBUS to the faulting app.


Thanks for tracking that down.


Any idea on how to fix this ?


I would say we have to deliver the sigbus by hand.

 if ((user_mode(regs)))
 _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
 else
 die("Machine check", regs, SIGBUS);



And what about all the other things done by 'die()' ?

And what if it is a kernel thread ?

In one of my boards, I have a kernel thread regularly checking the HW, 
and if it gets a machine check I expect it to gently stop and the die 
notification to be delivered to all registered notifiers.


Until before this patch, it was working well.

Christophe


Re: [PATCH] powerpc/xmon/ppc-opc: Use ARRAY_SIZE macro

2018-10-08 Thread Joe Perches
On Tue, 2018-10-09 at 14:43 +1100, Michael Ellerman wrote:
> Joe Perches  writes:
> 
> > On Thu, 2018-10-04 at 19:10 +0200, Gustavo A. R. Silva wrote:
> > > Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element.
> > []
> > > diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c
> > []
> > > @@ -966,8 +966,7 @@ const struct powerpc_operand powerpc_operands[] =
> > >{ 0xff, 11, NULL, NULL, PPC_OPERAND_SIGNOPT },
> > >  };
> > >  
> > > -const unsigned int num_powerpc_operands = (sizeof (powerpc_operands)
> > > -/ sizeof (powerpc_operands[0]));
> > > +const unsigned int num_powerpc_operands = ARRAY_SIZE(powerpc_operands);
> > 
> > It seems this is unused and could be deleted.
> 
> The code in this file is copied from binutils.
> 
> We don't want to needlessly diverge it.
> 
> I've said this before:
> 
>   
> https://lore.kernel.org/linuxppc-dev/874lfxjnzl@concordia.ellerman.id.au/

Don't expect people to remember this.

> Is there some way we can blacklist this file from checkpatch, Coccinelle
> etc?

Modify both to look for some specific tag
in a file and then update the scripts to
read the file when looking at patches too.

Otherwise, no.





Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

2018-10-08 Thread Nicholas Piggin
On Mon, 8 Oct 2018 17:39:11 +0200
Christophe LEROY  wrote:

> Hi Nick,
> 
> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
> > Use nmi_enter similarly to system reset interrupts. This uses NMI
> > printk NMI buffers and turns off various debugging facilities that
> > help avoid tripping on ourselves or other CPUs.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >   arch/powerpc/kernel/traps.c | 9 ++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 2849c4f50324..6d31f9d7c333 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >   
> >   void machine_check_exception(struct pt_regs *regs)
> >   {
> > -   enum ctx_state prev_state = exception_enter();
> > int recover = 0;
> > +   bool nested = in_nmi();
> > +   if (!nested)
> > +   nmi_enter();  
> 
> This alters preempt_count, then when die() is called
> in_interrupt() returns true although the trap didn't happen in 
> interrupt, so oops_end() panics with "fatal exception in interrupt" 
> instead of gently sending SIGBUS to the faulting app.

Thanks for tracking that down.

> Any idea on how to fix this ?

I would say we have to deliver the sigbus by hand.

if ((user_mode(regs)))
_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
else
die("Machine check", regs, SIGBUS);


Re: [PATCH] powerpc/xmon/ppc-opc: Use ARRAY_SIZE macro

2018-10-08 Thread Michael Ellerman
Joe Perches  writes:

> On Thu, 2018-10-04 at 19:10 +0200, Gustavo A. R. Silva wrote:
>> Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element.
> []
>> diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c
> []
>> @@ -966,8 +966,7 @@ const struct powerpc_operand powerpc_operands[] =
>>{ 0xff, 11, NULL, NULL, PPC_OPERAND_SIGNOPT },
>>  };
>>  
>> -const unsigned int num_powerpc_operands = (sizeof (powerpc_operands)
>> -   / sizeof (powerpc_operands[0]));
>> +const unsigned int num_powerpc_operands = ARRAY_SIZE(powerpc_operands);
>
> It seems this is unused and could be deleted.

The code in this file is copied from binutils.

We don't want to needlessly diverge it.

I've said this before:

  https://lore.kernel.org/linuxppc-dev/874lfxjnzl@concordia.ellerman.id.au/

Is there some way we can blacklist this file from checkpatch, Coccinelle
etc?

cheers


Re: [PATCH] powerpc/xmon/ppc-opc: Use ARRAY_SIZE macro

2018-10-08 Thread Joe Perches
On Thu, 2018-10-04 at 19:10 +0200, Gustavo A. R. Silva wrote:
> Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element.
[]
> diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c
[]
> @@ -966,8 +966,7 @@ const struct powerpc_operand powerpc_operands[] =
>{ 0xff, 11, NULL, NULL, PPC_OPERAND_SIGNOPT },
>  };
>  
> -const unsigned int num_powerpc_operands = (sizeof (powerpc_operands)
> -/ sizeof (powerpc_operands[0]));
> +const unsigned int num_powerpc_operands = ARRAY_SIZE(powerpc_operands);

It seems this is unused and could be deleted.

>  /* The functions used to insert and extract complicated operands.  */
>  
> @@ -6980,8 +6979,7 @@ const struct powerpc_opcode powerpc_opcodes[] = {
>  {"fcfidu.",  XRC(63,974,1),  XRA_MASK, POWER7|PPCA2, PPCVLE, {FRT, 
> FRB}},
>  };
>  
> -const int powerpc_num_opcodes =
> -  sizeof (powerpc_opcodes) / sizeof (powerpc_opcodes[0]);
> +const int powerpc_num_opcodes = ARRAY_SIZE(powerpc_opcodes);

This is used once and should probably be replaced where
it is used with ARRAY_SIZE

>  /* The VLE opcode table.
>  
> @@ -7219,8 +7217,7 @@ const struct powerpc_opcode vle_opcodes[] = {
>  {"se_bl",BD8(58,0,1),BD8_MASK,   PPCVLE, 0,  {B8}},
>  };
>  
> -const int vle_num_opcodes =
> -  sizeof (vle_opcodes) / sizeof (vle_opcodes[0]);
> +const int vle_num_opcodes = ARRAY_SIZE(vle_opcodes);

Also apparently unused and could be deleted.

>  
>  /* The macro table.  This is only used by the assembler.  */
>  
> @@ -7288,5 +7285,4 @@ const struct powerpc_macro powerpc_macros[] = {
>  {"e_clrlslwi",4, PPCVLE, "e_rlwinm %0,%1,%3,(%2)-(%3),31-(%3)"},
>  };
>  
> -const int powerpc_num_macros =
> -  sizeof (powerpc_macros) / sizeof (powerpc_macros[0]);
> +const int powerpc_num_macros = ARRAY_SIZE(powerpc_macros);

Also apparently unused and could be deleted.




Re: [PATCH v5 06/33] KVM: PPC: Book3S HV: Simplify real-mode interrupt handling

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:30:52PM +1100, Paul Mackerras wrote:
> This streamlines the first part of the code that handles a hypervisor
> interrupt that occurred in the guest.  With this, all of the real-mode
> handling that occurs is done before the "guest_exit_cont" label; once
> we get to that label we are committed to exiting to host virtual mode.
> Thus the machine check and HMI real-mode handling is moved before that
> label.
> 
> Also, the code to handle external interrupts is moved out of line, as
> is the code that calls kvmppc_realmode_hmi_handler().
> 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_hv_ras.c|   8 ++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 220 
> 
>  2 files changed, 119 insertions(+), 109 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
> index b11043b..ee564b6 100644
> --- a/arch/powerpc/kvm/book3s_hv_ras.c
> +++ b/arch/powerpc/kvm/book3s_hv_ras.c
> @@ -331,5 +331,13 @@ long kvmppc_realmode_hmi_handler(void)
>   } else {
>   wait_for_tb_resync();
>   }
> +
> + /*
> +  * Reset tb_offset_applied so the guest exit code won't try
> +  * to subtract the previous timebase offset from the timebase.
> +  */
> + if (local_paca->kvm_hstate.kvm_vcore)
> + local_paca->kvm_hstate.kvm_vcore->tb_offset_applied = 0;
> +
>   return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 5b2ae34..fc360b5 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1018,8 +1018,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
>  no_xive:
>  #endif /* CONFIG_KVM_XICS */
>  
> -deliver_guest_interrupt:
> -kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */
> +deliver_guest_interrupt: /* r4 = vcpu, r13 = paca */
>   /* Check if we can deliver an external or decrementer interrupt now */
>   ld  r0, VCPU_PENDING_EXC(r4)
>  BEGIN_FTR_SECTION
> @@ -1269,18 +1268,26 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>   std r3, VCPU_CTR(r9)
>   std r4, VCPU_XER(r9)
>  
> -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> - /* For softpatch interrupt, go off and do TM instruction emulation */
> - cmpwi   r12, BOOK3S_INTERRUPT_HV_SOFTPATCH
> - beq kvmppc_tm_emul
> -#endif
> + /* Save more register state  */
> + mfdar   r3
> + mfdsisr r4
> + std r3, VCPU_DAR(r9)
> + stw r4, VCPU_DSISR(r9)
>  
>   /* If this is a page table miss then see if it's theirs or ours */
>   cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
>   beq kvmppc_hdsi
> + std r3, VCPU_FAULT_DAR(r9)
> + stw r4, VCPU_FAULT_DSISR(r9)
>   cmpwi   r12, BOOK3S_INTERRUPT_H_INST_STORAGE
>   beq kvmppc_hisi
>  
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> + /* For softpatch interrupt, go off and do TM instruction emulation */
> + cmpwi   r12, BOOK3S_INTERRUPT_HV_SOFTPATCH
> + beq kvmppc_tm_emul
> +#endif
> +
>   /* See if this is a leftover HDEC interrupt */
>   cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
>   bne 2f
> @@ -1303,7 +1310,7 @@ BEGIN_FTR_SECTION
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>   lbz r0, HSTATE_HOST_IPI(r13)
>   cmpwi   r0, 0
> - beq 4f
> + beq maybe_reenter_guest
>   b   guest_exit_cont
>  3:
>   /* If it's a hypervisor facility unavailable interrupt, save HFSCR */
> @@ -1315,82 +1322,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>  14:
>   /* External interrupt ? */
>   cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
> - bne+guest_exit_cont
> -
> - /* External interrupt, first check for host_ipi. If this is
> -  * set, we know the host wants us out so let's do it now
> -  */
> - bl  kvmppc_read_intr
> -
> - /*
> -  * Restore the active volatile registers after returning from
> -  * a C function.
> -  */
> - ld  r9, HSTATE_KVM_VCPU(r13)
> - li  r12, BOOK3S_INTERRUPT_EXTERNAL
> -
> - /*
> -  * kvmppc_read_intr return codes:
> -  *
> -  * Exit to host (r3 > 0)
> -  *   1 An interrupt is pending that needs to be handled by the host
> -  * Exit guest and return to host by branching to guest_exit_cont
> -  *
> -  *   2 Passthrough that needs completion in the host
> -  * Exit guest and return to host by branching to guest_exit_cont
> -  * However, we also set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
> -  * to indicate to the host to complete handling the interrupt
> -  *
> -  * Before returning to guest, we check if any CPU is heading out
> -  * to the host and if so, we head out also. If no CPUs are heading
> -  * check return values <= 0.
> -  *
> -  * Return to guest (r3 <= 0)
> -  *  0 No external interrupt is pending
> 

Re: [PATCH v5 22/33] KVM: PPC: Book3S HV: Introduce rmap to track nested guest mappings

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:31:08PM +1100, Paul Mackerras wrote:
> From: Suraj Jitindar Singh 
> 
> When a host (L0) page which is mapped into a (L1) guest is in turn
> mapped through to a nested (L2) guest we keep a reverse mapping (rmap)
> so that these mappings can be retrieved later.
> 
> Whenever we create an entry in a shadow_pgtable for a nested guest we
> create a corresponding rmap entry and add it to the list for the
> L1 guest memslot at the index of the L1 guest page it maps. This means
> at the L1 guest memslot we end up with lists of rmaps.
> 
> When we are notified of a host page being invalidated which has been
> mapped through to a (L1) guest, we can then walk the rmap list for that
> guest page, and find and invalidate all of the corresponding
> shadow_pgtable entries.
> 
> In order to reduce memory consumption, we compress the information for
> each rmap entry down to 52 bits -- 12 bits for the LPID and 40 bits
> for the guest real page frame number -- which will fit in a single
> unsigned long.  To avoid a scenario where a guest can trigger
> unbounded memory allocations, we scan the list when adding an entry to
> see if there is already an entry with the contents we need.  This can
> occur, because we don't ever remove entries from the middle of a list.
> 
> A struct nested guest rmap is a list pointer and an rmap entry;
>
> ----------------
> | next pointer |
> ----------------
> | rmap entry   |
> ----------------
> Thus the rmap pointer for each guest frame number in the memslot can be
> either NULL, a single entry, or a pointer to a list of nested rmap entries.
> 
> gfn      memslot rmap array
>         -------------------------
>  0      | NULL                  |   (no rmap entry)
>         -------------------------
>  1      | single rmap entry     |   (rmap entry with low bit set)
>         -------------------------
>  2      | list head pointer     |   (list of rmap entries)
>         -------------------------
> 
> The final entry always has the lowest bit set and is stored in the next
> pointer of the last list entry, or as a single rmap entry.
> With a list of rmap entries looking like;
>
> -----------------     -----------------     ---------------------
> | list head ptr | --> | next pointer  | --> | single rmap entry |
> -----------------     -----------------     ---------------------
>                       | rmap entry    |     | rmap entry        |
>                       -----------------     ---------------------
> 
> Signed-off-by: Suraj Jitindar Singh 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/kvm_book3s.h|   3 +
>  arch/powerpc/include/asm/kvm_book3s_64.h |  69 +++-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c   |  44 +++---
>  arch/powerpc/kvm/book3s_hv.c |   1 +
>  arch/powerpc/kvm/book3s_hv_nested.c  | 138 
> ++-
>  5 files changed, 240 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 63f7ccf..d7aeb6f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -196,6 +196,9 @@ extern int kvmppc_mmu_radix_translate_table(struct 
> kvm_vcpu *vcpu, gva_t eaddr,
>   int table_index, u64 *pte_ret_p);
>  extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>   struct kvmppc_pte *gpte, bool data, bool iswrite);
> +extern void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa,
> + unsigned int shift, struct kvm_memory_slot *memslot,
> + unsigned int lpid);
>  extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable,
>   bool writing, unsigned long gpa,
>   unsigned int lpid);
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 5496152..c2a9146 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -53,6 +53,66 @@ struct kvm_nested_guest {
>   struct kvm_nested_guest *next;
>  };
>  
> +/*
> + * We define a nested rmap entry as a single 64-bit quantity
> + * 0xFFF0000000000000	12-bit lpid field
> + * 0x000FFFFFFFFFF000	40-bit guest 4k page frame number
> + * 0x0000000000000001	1-bit  single entry flag
> + */
> +#define RMAP_NESTED_LPID_MASK		0xFFF0000000000000UL
> +#define RMAP_NESTED_LPID_SHIFT		(52)
> +#define RMAP_NESTED_GPA_MASK		0x000FFFFFFFFFF000UL
> +#define RMAP_NESTED_IS_SINGLE_ENTRY	0x0000000000000001UL
> +
> +/* Structure for a nested guest rmap entry */
> +struct rmap_nested {
> + struct llist_node list;
> + u64 rmap;
> +};
> +
> +/*
> + * for_each_nest_rmap_safe - iterate over the list of nested rmap entries
> + *   

Re: [PATCH v5 17/33] KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:31:03PM +1100, Paul Mackerras wrote:
> This starts the process of adding the code to support nested HV-style
> virtualization.  It defines a new H_SET_PARTITION_TABLE hypercall which
> a nested hypervisor can use to set the base address and size of a
> partition table in its memory (analogous to the PTCR register).
> On the host (level 0 hypervisor) side, the H_SET_PARTITION_TABLE
> hypercall from the guest is handled by code that saves the virtual
> PTCR value for the guest.
> 
> This also adds code for creating and destroying nested guests and for
> reading the partition table entry for a nested guest from L1 memory.
> Each nested guest has its own shadow LPID value, different in general
> from the LPID value used by the nested hypervisor to refer to it.  The
> shadow LPID value is allocated at nested guest creation time.
> 
> Nested hypervisor functionality is only available for a radix guest,
> which therefore means a radix host on a POWER9 (or later) processor.
> 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 


> ---
>  arch/powerpc/include/asm/hvcall.h |   5 +
>  arch/powerpc/include/asm/kvm_book3s.h |  10 +-
>  arch/powerpc/include/asm/kvm_book3s_64.h  |  33 
>  arch/powerpc/include/asm/kvm_book3s_asm.h |   3 +
>  arch/powerpc/include/asm/kvm_host.h   |   5 +
>  arch/powerpc/kvm/Makefile |   3 +-
>  arch/powerpc/kvm/book3s_hv.c  |  31 ++-
>  arch/powerpc/kvm/book3s_hv_nested.c   | 301 
> ++
>  8 files changed, 384 insertions(+), 7 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_hv_nested.c
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h 
> b/arch/powerpc/include/asm/hvcall.h
> index a0b17f9..c95c651 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -322,6 +322,11 @@
>  #define H_GET_24X7_DATA  0xF07C
>  #define H_GET_PERF_COUNTER_INFO  0xF080
>  
> +/* Platform-specific hcalls used for nested HV KVM */
> +#define H_SET_PARTITION_TABLE0xF800
> +#define H_ENTER_NESTED   0xF804
> +#define H_TLB_INVALIDATE 0xF808
> +
>  /* Values for 2nd argument to H_SET_MODE */
>  #define H_SET_MODE_RESOURCE_SET_CIABR1
>  #define H_SET_MODE_RESOURCE_SET_DAWR 2
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 91c9779..43f212e 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -274,6 +274,13 @@ static inline void kvmppc_save_tm_sprs(struct kvm_vcpu 
> *vcpu) {}
>  static inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu) {}
>  #endif
>  
> +long kvmhv_nested_init(void);
> +void kvmhv_nested_exit(void);
> +void kvmhv_vm_nested_init(struct kvm *kvm);
> +long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
> +void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
> +void kvmhv_release_all_nested(struct kvm *kvm);
> +
>  void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
>  
>  extern int kvm_irq_bypass;
> @@ -387,9 +394,6 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu 
> *vcpu);
>  /* TO = 31 for unconditional trap */
>  #define INS_TW   0x7fe8
>  
> -/* LPIDs we support with this build -- runtime limit may be lower */
> -#define KVMPPC_NR_LPIDS  (LPID_RSVD + 1)
> -
>  #define SPLIT_HACK_MASK  0xff00
>  #define SPLIT_HACK_OFFS  0xfb00
>  
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 5c0e2d9..6d67b6a 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -23,6 +23,39 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
> +#ifdef CONFIG_PPC_PSERIES
> +static inline bool kvmhv_on_pseries(void)
> +{
> + return !cpu_has_feature(CPU_FTR_HVMODE);
> +}
> +#else
> +static inline bool kvmhv_on_pseries(void)
> +{
> + return false;
> +}
> +#endif
> +
> +/*
> + * Structure for a nested guest, that is, for a guest that is managed by
> + * one of our guests.
> + */
> +struct kvm_nested_guest {
> + struct kvm *l1_host;/* L1 VM that owns this nested guest */
> + int l1_lpid;/* lpid L1 guest thinks this guest is */
> + int shadow_lpid;/* real lpid of this nested guest */
> + pgd_t *shadow_pgtable;  /* our page table for this guest */
> + u64 l1_gr_to_hr;/* L1's addr of part'n-scoped table */
> + u64 process_table;  /* process table entry for this guest */
> + long refcnt;/* number of pointers to this struct */
> + struct mutex tlb_lock;  /* serialize page faults and tlbies */
> + struct kvm_nested_guest *next;
> +};
> +
> +struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, 

Re: [PATCH v5 33/33] KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:31:19PM +1100, Paul Mackerras wrote:
> This adds a KVM_PPC_NO_HASH flag to the flags field of the
> kvm_ppc_smmu_info struct, and arranges for it to be set when
> running as a nested hypervisor, as an unambiguous indication
> to userspace that HPT guests are not supported.  Reporting the
> KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as
> indicating only that the new HPT features in ISA V3.0 are not
> supported, leaving it ambiguous whether pre-V3.0 HPT features
> are supported.
> 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 

> ---
>  Documentation/virtual/kvm/api.txt | 4 
>  arch/powerpc/kvm/book3s_hv.c  | 4 
>  include/uapi/linux/kvm.h  | 1 +
>  3 files changed, 9 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index fde48b6..df98b63 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2270,6 +2270,10 @@ The supported flags are:
>  The emulated MMU supports 1T segments in addition to the
>  standard 256M ones.
>  
> +- KVM_PPC_NO_HASH
> + This flag indicates that HPT guests are not supported by KVM,
> + thus all guests must use radix MMU mode.
> +
>  The "slb_size" field indicates how many SLB entries are supported
>  
>  The "sps" array contains 8 entries indicating the supported base
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index fa61647..f565403 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4245,6 +4245,10 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm 
> *kvm,
>   kvmppc_add_seg_page_size(, 16, SLB_VSID_L | SLB_VSID_LP_01);
>   kvmppc_add_seg_page_size(, 24, SLB_VSID_L);
>  
> + /* If running as a nested hypervisor, we don't support HPT guests */
> + if (kvmhv_on_pseries())
> + info->flags |= KVM_PPC_NO_HASH;
> +
>   return 0;
>  }
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9cec6b..7f2ff3a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -719,6 +719,7 @@ struct kvm_ppc_one_seg_page_size {
>  
>  #define KVM_PPC_PAGE_SIZES_REAL  0x0001
>  #define KVM_PPC_1T_SEGMENTS  0x0002
> +#define KVM_PPC_NO_HASH  0x0004
>  
>  struct kvm_ppc_smmu_info {
>   __u64 flags;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v5 09/33] KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:30:55PM +1100, Paul Mackerras wrote:
> This creates an alternative guest entry/exit path which is used for
> radix guests on POWER9 systems when we have indep_threads_mode=Y.  In
> these circumstances there is exactly one vcpu per vcore and there is
> no coordination required between vcpus or vcores; the vcpu can enter
> the guest without needing to synchronize with anything else.
> 
> The new fast path is implemented almost entirely in C in book3s_hv.c
> and runs with the MMU on until the guest is entered.  On guest exit
> we use the existing path until the point where we are committed to
> exiting the guest (as distinct from handling an interrupt in the
> low-level code and returning to the guest) and we have pulled the
> guest context from the XIVE.  At that point we check a flag in the
> stack frame to see whether we came in via the old path and the new
> path; if we came in via the new path then we go back to C code to do
> the rest of the process of saving the guest context and restoring the
> host context.
> 
> The C code is split into separate functions for handling the
> OS-accessible state and the hypervisor state, with the idea that the
> latter can be replaced by a hypercall when we implement nested
> virtualization.
> 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/asm-prototypes.h |   2 +
>  arch/powerpc/include/asm/kvm_ppc.h|   2 +
>  arch/powerpc/kvm/book3s_hv.c  | 429 
> +-
>  arch/powerpc/kvm/book3s_hv_ras.c  |   2 +
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  95 ++-
>  arch/powerpc/kvm/book3s_xive.c|  63 +
>  6 files changed, 589 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
> b/arch/powerpc/include/asm/asm-prototypes.h
> index 0c1a2b0..5c9b00c 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -165,4 +165,6 @@ void kvmhv_load_host_pmu(void);
>  void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
>  void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
>  
> +int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
> +
>  #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 83d61b8..245e564 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -585,6 +585,7 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 
> icpval);
>  
>  extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
>  int level, bool line_status);
> +extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>  #else
>  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
>  u32 priority) { return -1; }
> @@ -607,6 +608,7 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu 
> *vcpu, u64 icpval) { retur
>  
>  static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, 
> u32 irq,
> int level, bool line_status) { return 
> -ENODEV; }
> +static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>  #endif /* CONFIG_KVM_XIVE */
>  
>  /*
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 0e17593..0c1dd76 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3080,6 +3080,269 @@ static noinline void kvmppc_run_core(struct 
> kvmppc_vcore *vc)
>  }
>  
>  /*
> + * Load up hypervisor-mode registers on P9.
> + */
> +static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit)
> +{
> + struct kvmppc_vcore *vc = vcpu->arch.vcore;
> + s64 hdec;
> + u64 tb, purr, spurr;
> + int trap;
> + unsigned long host_hfscr = mfspr(SPRN_HFSCR);
> + unsigned long host_ciabr = mfspr(SPRN_CIABR);
> + unsigned long host_dawr = mfspr(SPRN_DAWR);
> + unsigned long host_dawrx = mfspr(SPRN_DAWRX);
> + unsigned long host_psscr = mfspr(SPRN_PSSCR);
> + unsigned long host_pidr = mfspr(SPRN_PID);
> +
> + hdec = time_limit - mftb();
> + if (hdec < 0)
> + return BOOK3S_INTERRUPT_HV_DECREMENTER;
> + mtspr(SPRN_HDEC, hdec);
> +
> + if (vc->tb_offset) {
> + u64 new_tb = mftb() + vc->tb_offset;
> + mtspr(SPRN_TBU40, new_tb);
> + tb = mftb();
> + if ((tb & 0xff) < (new_tb & 0xff))
> + mtspr(SPRN_TBU40, new_tb + 0x100);
> + vc->tb_offset_applied = vc->tb_offset;
> + }
> +
> + if (vc->pcr)
> + mtspr(SPRN_PCR, vc->pcr);
> + mtspr(SPRN_DPDES, vc->dpdes);
> + mtspr(SPRN_VTB, vc->vtb);
> +
> + local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
> + local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR);
> + mtspr(SPRN_PURR, 

Re: [PATCH v5 32/33] KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:31:18PM +1100, Paul Mackerras wrote:
> With this, userspace can enable a KVM-HV guest to run nested guests
> under it.
> 
> The administrator can control whether any nested guests can be run;
> setting the "nested" module parameter to false prevents any guests
> becoming nested hypervisors (that is, any attempt to enable the nested
> capability on a guest will fail).  Guests which are already nested
> hypervisors will continue to be so.
> 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 

> ---
>  Documentation/virtual/kvm/api.txt  | 14 ++
>  arch/powerpc/include/asm/kvm_ppc.h |  1 +
>  arch/powerpc/kvm/book3s_hv.c   | 39 
> +-
>  arch/powerpc/kvm/powerpc.c | 12 
>  include/uapi/linux/kvm.h   |  1 +
>  5 files changed, 58 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 2f5f9b7..fde48b6 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4532,6 +4532,20 @@ With this capability, a guest may read the 
> MSR_PLATFORM_INFO MSR. Otherwise,
>  a #GP would be raised when the guest tries to access. Currently, this
>  capability does not enable write permissions of this MSR for the guest.
>  
> +7.16 KVM_CAP_PPC_NESTED_HV
> +
> +Architectures: ppc
> +Parameters: none
> +Returns: 0 on success, -EINVAL when the implementation doesn't support
> +  nested-HV virtualization.
> +
> +HV-KVM on POWER9 and later systems allows for "nested-HV"
> +virtualization, which provides a way for a guest VM to run guests that
> +can run using the CPU's supervisor mode (privileged non-hypervisor
> +state).  Enabling this capability on a VM depends on the CPU having
> +the necessary functionality and on the facility being enabled with a
> +kvm-hv module parameter.
> +
>  8. Other capabilities.
>  --
>  
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 245e564..b3796bd 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -327,6 +327,7 @@ struct kvmppc_ops {
>   int (*set_smt_mode)(struct kvm *kvm, unsigned long mode,
>   unsigned long flags);
>   void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr);
> + int (*enable_nested)(struct kvm *kvm);
>  };
>  
>  extern struct kvmppc_ops *kvmppc_hv_ops;
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 152bf75..fa61647 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -118,6 +118,16 @@ module_param_cb(h_ipi_redirect, _param_ops, 
> _ipi_redirect, 0644);
>  MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host 
> core");
>  #endif
>  
> +/* If set, guests are allowed to create and control nested guests */
> +static bool nested = true;
> +module_param(nested, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)");
> +
> +static inline bool nesting_enabled(struct kvm *kvm)
> +{
> + return kvm->arch.nested_enable && kvm_is_radix(kvm);
> +}
> +
>  /* If set, the threads on each CPU core have to be in the same MMU mode */
>  static bool no_mixing_hpt_and_radix;
>  
> @@ -959,12 +969,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  
>   case H_SET_PARTITION_TABLE:
>   ret = H_FUNCTION;
> - if (vcpu->kvm->arch.nested_enable)
> + if (nesting_enabled(vcpu->kvm))
>   ret = kvmhv_set_partition_table(vcpu);
>   break;
>   case H_ENTER_NESTED:
>   ret = H_FUNCTION;
> - if (!vcpu->kvm->arch.nested_enable)
> + if (!nesting_enabled(vcpu->kvm))
>   break;
>   ret = kvmhv_enter_nested_guest(vcpu);
>   if (ret == H_INTERRUPT) {
> @@ -974,9 +984,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>   break;
>   case H_TLB_INVALIDATE:
>   ret = H_FUNCTION;
> - if (!vcpu->kvm->arch.nested_enable)
> - break;
> - ret = kvmhv_do_nested_tlbie(vcpu);
> + if (nesting_enabled(vcpu->kvm))
> + ret = kvmhv_do_nested_tlbie(vcpu);
>   break;
>  
>   default:
> @@ -4496,10 +4505,8 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu 
> *vcpu)
>  /* Must be called with kvm->lock held and mmu_ready = 0 and no vcpus running 
> */
>  int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
>  {
> - if (kvm->arch.nested_enable) {
> - kvm->arch.nested_enable = false;
> + if (nesting_enabled(kvm))
>   kvmhv_release_all_nested(kvm);
> - }
>   kvmppc_free_radix(kvm);
>   kvmppc_update_lpcr(kvm, LPCR_VPM1,
>  LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR);
> @@ -4776,7 

Re: [PATCH v5 30/33] KVM: PPC: Book3S HV: Allow HV module to load without hypervisor mode

2018-10-08 Thread David Gibson
On Mon, Oct 08, 2018 at 04:31:16PM +1100, Paul Mackerras wrote:
> With this, the KVM-HV module can be loaded in a guest running under
> KVM-HV, and if the hypervisor supports nested virtualization, this
> guest can now act as a nested hypervisor and run nested guests.
> 
> This also adds some checks to inform userspace that HPT guests are not
> supported by nested hypervisors (by returning false for the
> KVM_CAP_PPC_MMU_HASH_V3 capability), and to prevent userspace from
> configuring a guest to use HPT mode.
> 
> Signed-off-by: Paul Mackerras 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_hv.c | 16 
>  arch/powerpc/kvm/powerpc.c   |  3 ++-
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 127bb5f..152bf75 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4807,11 +4807,15 @@ static int kvmppc_core_emulate_mfspr_hv(struct 
> kvm_vcpu *vcpu, int sprn,
>  
>  static int kvmppc_core_check_processor_compat_hv(void)
>  {
> - if (!cpu_has_feature(CPU_FTR_HVMODE) ||
> - !cpu_has_feature(CPU_FTR_ARCH_206))
> - return -EIO;
> + if (cpu_has_feature(CPU_FTR_HVMODE) &&
> + cpu_has_feature(CPU_FTR_ARCH_206))
> + return 0;
>  
> - return 0;
> + /* POWER9 in radix mode is capable of being a nested hypervisor. */
> + if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
> + return 0;
> +
> + return -EIO;
>  }
>  
>  #ifdef CONFIG_KVM_XICS
> @@ -5129,6 +5133,10 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct 
> kvm_ppc_mmuv3_cfg *cfg)
>   if (radix && !radix_enabled())
>   return -EINVAL;
>  
> + /* If we're a nested hypervisor, we currently only support radix */
> + if (kvmhv_on_pseries() && !radix)
> + return -EINVAL;
> +
>   mutex_lock(>lock);
>   if (radix != kvm_is_radix(kvm)) {
>   if (kvm->arch.mmu_ready) {
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index eba5756..1f4b128 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -594,7 +594,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   r = !!(hv_enabled && radix_enabled());
>   break;
>   case KVM_CAP_PPC_MMU_HASH_V3:
> - r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300));
> + r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) &&
> +cpu_has_feature(CPU_FTR_HVMODE));
>   break;
>  #endif
>   case KVM_CAP_SYNC_MMU:

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells

2018-10-08 Thread Frank Rowand
On 10/08/18 11:46, Alan Tull wrote:
> On Mon, Oct 8, 2018 at 10:57 AM Alan Tull  wrote:
>>
>> On Thu, Oct 4, 2018 at 11:14 PM  wrote:
>>>
>>> From: Frank Rowand 
>>>
>>> If overlay properties #address-cells or #size-cells are already in
>>> the live devicetree for any given node, then the values in the
>>> overlay must match the values in the live tree.
>>
>> Hi Frank,
>>
>> I'm starting some FPGA testing on this patchset applied to v4.19-rc7.
>> That applied cleanly; if that's not the best base to test against,
>> please let me know.

I would expect -rc7 to be ok to test against.  I'm doing the development
of it on -rc1.

Thanks for the testing.


>> On a very simple overlay, I'm seeing this patch's warning catching
>> things other than #address-cells or #size-cells.

#address-cells and #size-cells escape the warning, as a special case, for
properties on an existing (non-overlay) node when the existing node already
contains them.  Those two properties are needed in the overlay to
avoid dtc compiler warnings.
the base devicetree and have the same values as in the overlay then
there is no need to add property update changeset entries in the overlay
changeset.  Since there will not be changeset entries for those two
properties, there will be no memory leak when the changeset is removed.

The special casing of #address-cells and #size-cells is part of the
fix patches that are a result of the validation patches.  Thus a little
bit less memory leaking than we have today.


> What it's warning about are new properties being added to an existing
> node.  So !prop is true and !of_node_check_flag(target->np,
> OF_OVERLAY) also is true.  Is that a potential memory leak as you are
> warning?  If so, your code is working as planned and you'll just need
> to document that also in the header.

Yes, you are accurately describing what the check is catching.

The memory leak (on release) is because the memory allocated for overlay
properties is released when the reference count of the node they are
attached is decremented to zero, but only if the node is a dynamic flagged
node (as overlays are).  The memory allocated for the overlay properties
will not be freed in this case because the node is not a dynamic node.


>> I'm just getting
>> started looking at this, will spend time understanding this better and
>> I'll test other overlays.  The warnings were:
>>
>> Applying dtbo: socfpga_overlay.dtb
>> [   33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera
>> SOCFPGA FPGA Manager
>> [   33.575223] OF: overlay: WARNING: add_changeset_property(), memory
>> leak will occur if overlay removed.  Property:
>> /soc/base-fpga-region/firmware-name
>> [   33.588584] OF: overlay: WARNING: add_changeset_property(), memory
>> leak will occur if overlay removed.  Property:
>> /soc/base-fpga-region/fpga-bridges
>> [   33.601856] OF: overlay: WARNING: add_changeset_property(), memory
>> leak will occur if overlay removed.  Property:
>> /soc/base-fpga-region/ranges

Are there properties in /soc/base-fpga-region/ in the base devicetree?

If not, then that node could be removed from the base devicetree and first 
created
in an overlay.

If so, is it possible to add an additional level of node, 
/soc/base-fpga-region/foo,
which would contain the properties that are warned about above?  Then the 
properties
would be children of an overlay node and the memory would be freed on overlay
release.

This is not actually a suggestion that should be implemented right now, just 
trying
to understand the possible alternatives, because this would result in an 
arbitrary
fake level in the tree (which I don't like).

My intent is to leave these validation checks as warnings while we figure out 
the
best way to solve the underlying memory leak issue.  Note that some of the
validation checks result in errors and cause an overlay apply to fail.  If I
did those checks correctly, they should only catch cases where the live tree
after applying the overlay was a "corrupt" tree instead of the desired changes.

I expect that Plumbers will be a good place to explore these things.


>> Here's part of that overlay including the properties it's complaining about:
>>
>> /dts-v1/;
>> /plugin/;
>> / {
>> fragment@0 {
>> target = <_fpga_region>;
>> #address-cells = <1>;
>> #size-cells = <1>;
>> __overlay__ {
>> #address-cells = <1>;
>> #size-cells = <1>;
>>
>> firmware-name = "soc_system.rbf";
>> fpga-bridges = <_bridge1>;
>> ranges = <0x2 0xff20 0x10>,
>> <0x0 0xc000 0x2000>;
>>
>> gpio@10040 {
>> so on...
>>
>> By the way, I didn't get any warnings when I subsequently removed this 
>> overlay.

Yes, I did not add any check that could catch this at release time.

-Frank


Re: Looking for architecture papers

2018-10-08 Thread Gustavo Romero

Hi Raz,

On 10/04/2018 04:41 AM, Raz wrote:

Frankly, the more I read the more perplexed I get. For example,
according to BOOK III-S, chapter 3,
the MSR bits are differ from the ones described in
arch/powerpc/include/asm/reg.h.
Bit zero, is LE, but in the book it is 64-bit mode.

Would someone be kind to explain what I do not understand?


Yes, I know that can be confusing at first sight when one is used to, for
instance, x86.

x86 documents use LSB 0 notation, which means (as others already pointed out)
that the least significant bit of a value is marked as being bit 0.

On the other hand Power documents use MSB 0 notation, which means that the most
significant bit of a value is marked as being bit 0 and as a consequence the
least significant bit in that notation in a 64-bit platform is bit 63, not bit
0. MSB 0 notation is also known as IBM bit notation/bit numbering.

Historically LSB 0 notation tend to be used on docs about little-endian
architectures (for instance, x86), whilst MSB 0 notation tend to be used on docs
about big-endian architectures (for instance, Power - Power is actually a little
different because it's now bi-endian actually).

However LSB 0 and MSB 0 are only different notations, so LSB 0 can be employed
on a big-endian architecture documentation, and vice versa.

It happens that kernel code is written in C, and for shifts etc. the LSB 0
notation is the convenient one, not MSB 0, so LSB 0 notation is used when
creating a mask (as in arch/powerpc/include/asm/reg.h), i.e. a bit position
from the ISA documentation is employed as '63 - bitnum'.

So, as another example, in the following gcc macro '_TEXASR_EXTRACT_BITS' takes
a bit position 'BITNUM' as found in the PowerISA documentation but then for the
shift right it uses '63 - BITNUM':

https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/htmintrin.h#L44-L45

I think it's also important to mention that in the PowerISA the elements also
follow the MSB 0 notation.  So byte, word, and dword elements in a register,
when referred to as element 0 in the instruction descriptions, are the elements
"at the left tip", i.e. "the most significant elements", so to speak.  For
instance, take the instruction "vperm": the doc says 'index' takes bits 3:7 of
a byte from byte element 'i'.  So byte element i=0 means the most significant
byte ("on the left tip") of vector register operand 'VRC'.  Moreover, the
specified bits in that byte element, i.e. bits 3:7, also follow MSB 0, so in
little-endian terms they are bits 4:0 (LSB 0 notation).

Now, if bits 4:0 = 0b00011 (decimal 3), we grab byte element 3 from 'src'
(256-bit).  However byte element 3 is also in MSB 0 notation, so it means the
third byte of 'src', counting bytes from 0 and from left to right (which IMO
looks more natural, since we count, for instance, the natural numbers on the
'x' axis similarly).

Hence, it's as if the 'vperm' instruction in a certain sense has "big-endian
semantics" for the byte indices.  The 'vpermr' instruction introduced by
PowerISA v3.0 is meant to cope with that, so 'vpermr' byte indices have
"little-endian semantics": for bits 3:7 MSB 0 (or bits 4:0 in LSB 0 notation) =
0b00011 (decimal 3), the 'vpermr' instruction means we must count bytes
starting from the right, as in the LSB 0 notation, and grab the third byte
element from right to left.

So, for instance:

vr0 uint128 = 0x00000000000000000000000000000000
vr1 uint128 = 0x00102030405060708090a0b0c0d0e0f0
vr2 uint128 = 0x00112233445566778899aabbccddeeff
vr3 uint128 = 0x03000000000000000000000000000000

we have 'src' as:

MSB 0:  v--- byte 0, 1, 2, 3, ...
LSB 0:                                                  ...  3, 2, 1, byte 0 ---v
src = vr1 || vr2 = 00 10 20 30 40 50 60 70 80 90 A0 B0 C0 D0 E0 F0 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF

vperm   vr0, vr1, vr2, vr3 result is:
vr0 uint128 = 0x30000000000000000000000000000000
byte 3 in MSB 0 of src = 0x30, and 0x00 (byte 0 in MSB 0) is copied to the
remaining bytes

whilst with vpermr (PowerISA v3.0 / POWER9):
vpermr  vr0, vr1, vr2, vr3 result is:
vr0 uint128 = 0xccffffffffffffffffffffffffffffff
byte 3 in LSB 0 of src = 0xCC, and 0xFF (byte 0 in LSB 0) is copied to the
remaining bytes


Anyway, vperm/vpermr was just an example about notation not being restricted to
bits on Power ISA. So read the docs carefully :) GDB is always useful for 
checking
if one's understanding about a given Power instruction is correct.

HTH.


Regards,
Gustavo



Re: Looking for architecture papers

2018-10-08 Thread Segher Boessenkool
On Mon, Oct 08, 2018 at 07:44:12PM +0300, Raz wrote:
> Both systemsim and my powerpc server boots with MSR_HV=1, i.e, hypervisor 
> state.
> Is there away to fix that ? writing to the MSR cannot work according
> the documentation ( and reality ).

But that is what you do: you write HV=0 in MSR.  After doing other setup,
of course.

On some hardware you cannot set HV=0.  You cannot do logical partitioning
on such hardware.  PowerMac G5 comes to mind.


Segher


[PATCH 4.18 125/168] sched/topology: Set correct NUMA topology type

2018-10-08 Thread Greg Kroah-Hartman
4.18-stable review patch.  If anyone has any objections, please let me know.

--

From: Srikar Dronamraju 

[ Upstream commit e5e96fafd9028b1478b165db78c52d981c14f471 ]

With the following commit:

  051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain")

the scheduler introduced a new NUMA level.  However, this leads to the NUMA
topology on 2-node systems no longer being marked as NUMA_DIRECT.

After this commit, it gets reported as NUMA_BACKPLANE, because
sched_domains_numa_level is now 2 on 2 node systems.

Fix this by allowing setting systems that have up to 2 NUMA levels as
NUMA_DIRECT.

While here remove code that assumes that level can be 0.

Signed-off-by: Srikar Dronamraju 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andre Wild 
Cc: Heiko Carstens 
Cc: Linus Torvalds 
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Suravee Suthikulpanit 
Cc: Thomas Gleixner 
Cc: linuxppc-dev 
Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain"
Link: 
http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-sri...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 kernel/sched/topology.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1295,7 +1295,7 @@ static void init_numa_topology_type(void
 
n = sched_max_numa_distance;
 
-   if (sched_domains_numa_levels <= 1) {
+   if (sched_domains_numa_levels <= 2) {
sched_numa_topology_type = NUMA_DIRECT;
return;
}
@@ -1380,9 +1380,6 @@ void sched_init_numa(void)
break;
}
 
-   if (!level)
-   return;
-
/*
 * 'level' contains the number of unique distances
 *




Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells

2018-10-08 Thread Alan Tull
On Mon, Oct 8, 2018 at 10:57 AM Alan Tull  wrote:
>
> On Thu, Oct 4, 2018 at 11:14 PM  wrote:
> >
> > From: Frank Rowand 
> >
> > If overlay properties #address-cells or #size-cells are already in
> > the live devicetree for any given node, then the values in the
> > overlay must match the values in the live tree.
>
> Hi Frank,
>
> I'm starting some FPGA testing on this patchset applied to v4.19-rc7.
> That applied cleanly; if that's not the best base to test against,
> please let me know.
>
> On a very simple overlay, I'm seeing this patch's warning catching
> things other than #address-cells or #size-cells.

What it's warning about is new properties being added to an existing
node.  So !prop is true and !of_node_check_flag(target->np,
OF_OVERLAY) is also true.
warning?  If so, your code is working as planned and you'll just need
to document that also in the header.
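If it helps to make the rule concrete, here is a rough userspace sketch of the decision as I read it from the patch. The names mirror drivers/of/overlay.c, but the flow is paraphrased for discussion; it is not the actual kernel code.

```python
# Toy model of add_changeset_property(): what happens to one overlay
# property of one target node.  "node_created_by_overlay" stands in for
# of_node_check_flag(target->np, OF_OVERLAY).

SPECIAL = {"name", "phandle", "linux,phandle"}

def changeset_action(existing_props, node_created_by_overlay, prop_name):
    """Return the action the changeset takes for this property."""
    if prop_name in SPECIAL:
        return "skip"
    if prop_name not in existing_props:
        # New property on a node that pre-existed the overlay: the
        # property memory cannot be freed on overlay removal, hence
        # the "memory leak will occur if overlay removed" warning.
        if not node_created_by_overlay:
            return "add+warn"
        return "add"
    return "update"

# The three warnings in the log above are all this case:
assert changeset_action({"compatible"}, False, "firmware-name") == "add+warn"
```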

> I'm just getting
> started looking at this, will spend time understanding this better and
> I'll test other overlays.  The warnings were:
>
> Applying dtbo: socfpga_overlay.dtb
> [   33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera
> SOCFPGA FPGA Manager
> [   33.575223] OF: overlay: WARNING: add_changeset_property(), memory
> leak will occur if overlay removed.  Property:
> /soc/base-fpga-region/firmware-name
> [   33.588584] OF: overlay: WARNING: add_changeset_property(), memory
> leak will occur if overlay removed.  Property:
> /soc/base-fpga-region/fpga-bridges
> [   33.601856] OF: overlay: WARNING: add_changeset_property(), memory
> leak will occur if overlay removed.  Property:
> /soc/base-fpga-region/ranges
>
> Here's part of that overlay including the properties it's complaining about:
>
> /dts-v1/;
> /plugin/;
> / {
> fragment@0 {
> target = <_fpga_region>;
> #address-cells = <1>;
> #size-cells = <1>;
> __overlay__ {
> #address-cells = <1>;
> #size-cells = <1>;
>
> firmware-name = "soc_system.rbf";
> fpga-bridges = <_bridge1>;
> ranges = <0x2 0xff20 0x10>,
> <0x0 0xc000 0x2000>;
>
> gpio@10040 {
> so on...
>
> By the way, I didn't get any warnings when I subsequently removed this 
> overlay.
>
> Alan
>
> >
> > If the properties are already in the live tree then there is no
> > need to create a changeset entry to add them since they must
> > have the same value.  This reduces the memory used by the
> > changeset and eliminates a possible memory leak.  This is
> > verified by 12 fewer warnings during the devicetree unittest,
> > as the possible memory leak warnings about #address-cells and
> >
> > Signed-off-by: Frank Rowand 
> > ---
> >  drivers/of/overlay.c | 38 +++---
> >  1 file changed, 35 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > index 29c33a5c533f..e6fb3ffe9d93 100644
> > --- a/drivers/of/overlay.c
> > +++ b/drivers/of/overlay.c
> > @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop(
> >   * @target may be either in the live devicetree or in a new subtree that
> >   * is contained in the changeset.
> >   *
> > - * Some special properties are not updated (no error returned).
> > + * Some special properties are not added or updated (no error returned):
> > + * "name", "phandle", "linux,phandle".
> > + *
> > + * Properties "#address-cells" and "#size-cells" are not updated if they
> > + * are already in the live tree, but if present in the live tree, the 
> > values
> > + * in the overlay must match the values in the live tree.
> >   *
> >   * Update of property in symbols node is not allowed.
> >   *
> > @@ -300,6 +305,7 @@ static int add_changeset_property(struct 
> > overlay_changeset *ovcs,
> >  {
> > struct property *new_prop = NULL, *prop;
> > int ret = 0;
> > +   bool check_for_non_overlay_node = false;
> >
> > if (!of_prop_cmp(overlay_prop->name, "name") ||
> > !of_prop_cmp(overlay_prop->name, "phandle") ||
> > @@ -322,13 +328,39 @@ static int add_changeset_property(struct 
> > overlay_changeset *ovcs,
> > if (!new_prop)
> > return -ENOMEM;
> >
> > -   if (!prop)
> > +   if (!prop) {
> > +
> > +   check_for_non_overlay_node = true;
> > ret = of_changeset_add_property(>cset, target->np,
> > new_prop);
> > -   else
> > +
> > +   } else if (!of_prop_cmp(prop->name, "#address-cells")) {
> > +
> > +   if (prop->length != 4 || new_prop->length != 4 ||
> > +   *(u32 *)prop->value != *(u32 *)new_prop->value)
> > +   pr_err("ERROR: overlay and/or live tree 
> > #address-cells invalid in node %pOF\n",
> > +

Re: [PATCH 0/8] add generic builtin command line

2018-10-08 Thread Maksym Kokhan
Hi, Daniel

On Sat, Sep 29, 2018 at 9:17 PM  wrote:
>
> On Thu, Sep 27, 2018 at 07:55:08PM +0300, Maksym Kokhan wrote:
> > Daniel Walker (7):
> >   add generic builtin command line
> >   drivers: of: ifdef out cmdline section
> >   x86: convert to generic builtin command line
> >   arm: convert to generic builtin command line
> >   arm64: convert to generic builtin command line
> >   mips: convert to generic builtin command line
> >   powerpc: convert to generic builtin command line
> >
>
> When I originally submitted these I had a very good conversation with Rob 
> Herring
> on the device tree changes. It seemed fairly clear that my approach in these
> changes could be done better. It affected specifically arm64, but a lot of 
> other
> platforms use the device tree integrally. With arm64 you can reduce the 
> changes
> down to only Kconfig changes, and that would likely be the case for many of 
> the
> other architecture. I made patches to do this a while back, but have not had
> time to test them and push them out.

Can you please share these patches? I could test them and use them to improve this
generic command line implementation.

> In terms of mips I think there's a fair amount of work needed to pull out 
> their
> architecture specific mangling into something generic. Part of my motivation 
> for
> these was to take the architecture specific feature and open that up for all 
> the
> architecture. So it makes sense that the mips changes should become part of
> that.

This really makes sense, and we intend to implement it
afterward. It would be easier to merge this simple
implementation initially and then develop it step by step.

> The only changes which have no comments are the generic changes, x86, and
> powerpc. Those patches have been used at Cisco for years with no issues.
> I added those changes into my -next tree for a round of testing. Assuming 
> there
> are no issues I can work out the merging with the architecture maintainers.
> As for the other changes I think they can be done in time; as long as the
> generic parts are upstream, the rest can be worked on by any of the architecture
> developers.

Thanks,
Maksym


Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-08 Thread Rob Herring
On Mon, Oct 8, 2018 at 10:13 AM Geert Uytterhoeven  wrote:
>
> Hi Rob,
>
> On Mon, Oct 8, 2018 at 4:57 PM Rob Herring  wrote:
> > On Mon, Oct 8, 2018 at 2:47 AM Geert Uytterhoeven  
> > wrote:
> > > On Fri, Oct 5, 2018 at 6:59 PM Rob Herring  wrote:
> > > > Convert Renesas SoC bindings to DT schema format using json-schema.
>
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml
> > > > @@ -0,0 +1,205 @@
>
> > > > +  - description: Kingfisher (SBEV-RCAR-KF-M03)
> > > > +items:
> > > > +  - const: shimafuji,kingfisher
> > > > +  - enum:
> > > > +  - renesas,h3ulcb
> > > > +  - renesas,m3ulcb
> > > > +  - enum:
> > > > +  - renesas,r8a7795
> > > > +  - renesas,r8a7796
> > >
> > > This looks a bit funny: all other entries have the "const" last, and
> > > use it for the
> > > SoC number. May be correct, though.
> > > To clarify, this is an extension board that can fit both the [HM]3ULCB
> > > boards (actually also the new M3NULCB, I think).
> >
> > This being Kingfisher?
>
> Correct.
>
> > I wrote this based on dts files in the tree. There's 2 combinations that I 
> > see:
> >
> > "shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795"
> > "shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796"
> >
> > The schema allows 4 combinations (1 * 2 * 2). I have no idea if the
> > other combinations are possible. If not, then we could rewrite this as
> > 2 entries with 3 const values each.
>
> I expect there will soon be a third one:
>
> "shimafuji,kingfisher", "renesas,m3nulcb", "renesas,r8a77965"
>
> Technically, {h3,m3,m3n}ulcb are the same board (although there may be
> minor revision differences), with a different SiP mounted.
> But they are named and marketed differently depending on which SiP is mounted.
>
> And on top of that, you can plug in a Kingfisher daughterboard. Could be an
> overlay ;-)

We probably shouldn't have put kingfisher as a top-level compatible
then. But we did, so not really much point to discuss that now.

As to whether there's a better way to express it in the schema, I'm
not sure. I don't think there's a way with json-schema to express a
list, but the 1st item is optional.

Rob


Re: Looking for architecture papers

2018-10-08 Thread Raz
Both systemsim and my powerpc server boot with MSR_HV=1, i.e., hypervisor state.
Is there a way to fix that? Writing to the MSR cannot work according to
the documentation (and reality).



On Sat, Oct 6, 2018 at 3:27 PM Segher Boessenkool
 wrote:
>
> On Sat, Oct 06, 2018 at 12:19:45PM +0300, Raz wrote:
> > Hey
> > How does HVSC works ?
> > I looked in the code and LoPAR documentation. It looks like there is
> > vector called
> > system_call_pSeries ( at 0xc00 ) that is supposed to be called when we
> > invoke HVSC from kernel
> > mode.
> > Now, I wrote a NULL call HSVC and patched the exceptions-64s.S to
> > return RFID immediately.
> > This does not work.
> > Would you be so kind to explain how HVSC works ?
> > thank you
>
> If your kernel is not running in hypervisor mode, sc 1 does not call the
> kernel (but the hypervisor, instead).  If your kernel _is_ running in
> hypervisor mode, sc 1 does the same as sc 0, a normal system call.
>
> I don't know which it is for you; you didn't say.
>
> I have no idea what "a NULL call HSVC" means.  If you make exception c00
> return immediately (as you suggest) then you have made all system calls
> non-functional, which indeed is unlikely to work as you want.
>
>
> Segher
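Segher's explanation above reduces to a tiny dispatch table; a sketch of it (the level numbers are the ones named in the thread, everything else is illustrative):

```python
def sc_handler(lev, kernel_in_hv_mode):
    """Who services an `sc lev` instruction, per the explanation above.

    lev 0 is a normal system call; lev 1 (HVSC) goes to the hypervisor
    unless the kernel itself is already running in hypervisor mode, in
    which case it behaves like a normal system call.
    """
    if lev == 0:
        return "kernel"
    if lev == 1:
        return "kernel" if kernel_in_hv_mode else "hypervisor"
    raise ValueError("sc level not covered by this sketch")

# A guest kernel (MSR_HV=0) doing `sc 1` calls into the hypervisor:
assert sc_handler(1, kernel_in_hv_mode=False) == "hypervisor"
```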


Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

2018-10-08 Thread Christophe LEROY

Hi Nick,

On 19/07/2017 at 08:59, Nicholas Piggin wrote:

Use nmi_enter similarly to system reset interrupts. This uses NMI
printk NMI buffers and turns off various debugging facilities that
helps avoid tripping on ourselves or other CPUs.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/kernel/traps.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2849c4f50324..6d31f9d7c333 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
  
  void machine_check_exception(struct pt_regs *regs)

  {
-   enum ctx_state prev_state = exception_enter();
int recover = 0;
+   bool nested = in_nmi();
+   if (!nested)
+   nmi_enter();


This alters preempt_count, then when die() is called
in_interrupt() returns true although the trap didn't happen in 
interrupt, so oops_end() panics for "fatal exception in interrupt" 
instead of gently sending SIGBUS to the faulting app.
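A rough userspace model of why this happens (the bit layout and offsets are quoted from memory from include/linux/preempt.h; treat the exact values as illustrative):

```python
# Sketch of the preempt_count bookkeeping: nmi_enter() bumps the NMI
# (and hardirq) part of preempt_count, and in_interrupt() tests exactly
# those bits, so after nmi_enter() the oops path believes it is in
# interrupt context regardless of where the machine check actually hit.
SOFTIRQ_MASK   = 0x0000ff00
HARDIRQ_MASK   = 0x000f0000
NMI_MASK       = 0x00100000
HARDIRQ_OFFSET = 0x00010000
NMI_OFFSET     = 0x00100000

preempt_count = 0

def in_interrupt():
    return bool(preempt_count & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK))

def nmi_enter():
    global preempt_count
    preempt_count += NMI_OFFSET + HARDIRQ_OFFSET

# An MCE taken while running plain process context:
assert not in_interrupt()   # die()/oops_end() would send SIGBUS here
nmi_enter()
assert in_interrupt()       # ...but now oops_end() sees "in interrupt"
```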


Any idea on how to fix this ?

Christophe

  
  	__this_cpu_inc(irq_stat.mce_exceptions);
  
@@ -820,10 +822,11 @@ void machine_check_exception(struct pt_regs *regs)
  
  	/* Must die if the interrupt is not recoverable */

if (!(regs->msr & MSR_RI))
-   panic("Unrecoverable Machine check");
+   nmi_panic(regs, "Unrecoverable Machine check");
  
  bail:

-   exception_exit(prev_state);
+   if (!nested)
+   nmi_exit();
  }
  
  void SMIException(struct pt_regs *regs)




Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells

2018-10-08 Thread Alan Tull
On Thu, Oct 4, 2018 at 11:14 PM  wrote:
>
> From: Frank Rowand 
>
> If overlay properties #address-cells or #size-cells are already in
> the live devicetree for any given node, then the values in the
> overlay must match the values in the live tree.

Hi Frank,

I'm starting some FPGA testing on this patchset applied to v4.19-rc7.
That applied cleanly; if that's not the best base to test against,
please let me know.

On a very simple overlay, I'm seeing this patch's warning catching
things other than #address-cells or #size-cells.  I'm just getting
started looking at this, will spend time understanding this better and
I'll test other overlays.  The warnings were:

Applying dtbo: socfpga_overlay.dtb
[   33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera
SOCFPGA FPGA Manager
[   33.575223] OF: overlay: WARNING: add_changeset_property(), memory
leak will occur if overlay removed.  Property:
/soc/base-fpga-region/firmware-name
[   33.588584] OF: overlay: WARNING: add_changeset_property(), memory
leak will occur if overlay removed.  Property:
/soc/base-fpga-region/fpga-bridges
[   33.601856] OF: overlay: WARNING: add_changeset_property(), memory
leak will occur if overlay removed.  Property:
/soc/base-fpga-region/ranges

Here's part of that overlay including the properties it's complaining about:

/dts-v1/;
/plugin/;
/ {
fragment@0 {
target = <_fpga_region>;
#address-cells = <1>;
#size-cells = <1>;
__overlay__ {
#address-cells = <1>;
#size-cells = <1>;

firmware-name = "soc_system.rbf";
fpga-bridges = <_bridge1>;
ranges = <0x2 0xff20 0x10>,
<0x0 0xc000 0x2000>;

gpio@10040 {
so on...

By the way, I didn't get any warnings when I subsequently removed this overlay.

Alan

>
> If the properties are already in the live tree then there is no
> need to create a changeset entry to add them since they must
> have the same value.  This reduces the memory used by the
> changeset and eliminates a possible memory leak.  This is
> verified by 12 fewer warnings during the devicetree unittest,
> as the possible memory leak warnings about #address-cells and
>
> Signed-off-by: Frank Rowand 
> ---
>  drivers/of/overlay.c | 38 +++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 29c33a5c533f..e6fb3ffe9d93 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop(
>   * @target may be either in the live devicetree or in a new subtree that
>   * is contained in the changeset.
>   *
> - * Some special properties are not updated (no error returned).
> + * Some special properties are not added or updated (no error returned):
> + * "name", "phandle", "linux,phandle".
> + *
> + * Properties "#address-cells" and "#size-cells" are not updated if they
> + * are already in the live tree, but if present in the live tree, the values
> + * in the overlay must match the values in the live tree.
>   *
>   * Update of property in symbols node is not allowed.
>   *
> @@ -300,6 +305,7 @@ static int add_changeset_property(struct 
> overlay_changeset *ovcs,
>  {
> struct property *new_prop = NULL, *prop;
> int ret = 0;
> +   bool check_for_non_overlay_node = false;
>
> if (!of_prop_cmp(overlay_prop->name, "name") ||
> !of_prop_cmp(overlay_prop->name, "phandle") ||
> @@ -322,13 +328,39 @@ static int add_changeset_property(struct 
> overlay_changeset *ovcs,
> if (!new_prop)
> return -ENOMEM;
>
> -   if (!prop)
> +   if (!prop) {
> +
> +   check_for_non_overlay_node = true;
> ret = of_changeset_add_property(>cset, target->np,
> new_prop);
> -   else
> +
> +   } else if (!of_prop_cmp(prop->name, "#address-cells")) {
> +
> +   if (prop->length != 4 || new_prop->length != 4 ||
> +   *(u32 *)prop->value != *(u32 *)new_prop->value)
> +   pr_err("ERROR: overlay and/or live tree 
> #address-cells invalid in node %pOF\n",
> +  target->np);
> +
> +   } else if (!of_prop_cmp(prop->name, "#size-cells")) {
> +
> +   if (prop->length != 4 || new_prop->length != 4 ||
> +   *(u32 *)prop->value != *(u32 *)new_prop->value)
> +   pr_err("ERROR: overlay and/or live tree #size-cells 
> invalid in node %pOF\n",
> +  target->np);
> +
> +   } else {
> +
> +   check_for_non_overlay_node = true;
> ret = of_changeset_update_property(>cset, target->np,
>   

Patch "sched/topology: Set correct NUMA topology type" has been added to the 4.18-stable tree

2018-10-08 Thread gregkh


This is a note to let you know that I've just added the patch titled

sched/topology: Set correct NUMA topology type

to the 4.18-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 sched-topology-set-correct-numa-topology-type.patch
and it can be found in the queue-4.18 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


>From foo@baz Mon Oct  8 17:39:53 CEST 2018
From: Srikar Dronamraju 
Date: Fri, 10 Aug 2018 22:30:18 +0530
Subject: sched/topology: Set correct NUMA topology type

From: Srikar Dronamraju 

[ Upstream commit e5e96fafd9028b1478b165db78c52d981c14f471 ]

With the following commit:

  051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain")

the scheduler introduced a new NUMA level. However this leads to the NUMA 
topology
on 2 node systems to not be marked as NUMA_DIRECT anymore.

After this commit, it gets reported as NUMA_BACKPLANE, because
sched_domains_numa_level is now 2 on 2 node systems.

Fix this by allowing setting systems that have up to 2 NUMA levels as
NUMA_DIRECT.

While here remove code that assumes that level can be 0.

Signed-off-by: Srikar Dronamraju 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andre Wild 
Cc: Heiko Carstens 
Cc: Linus Torvalds 
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Suravee Suthikulpanit 
Cc: Thomas Gleixner 
Cc: linuxppc-dev 
Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain"
Link: 
http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-sri...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 kernel/sched/topology.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1295,7 +1295,7 @@ static void init_numa_topology_type(void
 
n = sched_max_numa_distance;
 
-   if (sched_domains_numa_levels <= 1) {
+   if (sched_domains_numa_levels <= 2) {
sched_numa_topology_type = NUMA_DIRECT;
return;
}
@@ -1380,9 +1380,6 @@ void sched_init_numa(void)
break;
}
 
-   if (!level)
-   return;
-
/*
 * 'level' contains the number of unique distances
 *


Patches currently in stable-queue which might be from sri...@linux.vnet.ibm.com 
are

queue-4.18/sched-topology-set-correct-numa-topology-type.patch
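For anyone skimming the queue, the regression and fix reduce to the following deliberately stripped-down sketch (not the real function, which also inspects node distances):

```python
def topology_type(levels, fixed=True):
    """Toy reduction of init_numa_topology_type().

    With the identity NUMA level introduced by 051f3ca02e46, a plain
    2-node box has sched_domains_numa_levels == 2, so the old
    "<= 1" test misclassified it.
    """
    limit = 2 if fixed else 1
    if levels <= limit:
        return "NUMA_DIRECT"
    # The real code goes on to distinguish NUMA_GLUELESS_MESH from
    # NUMA_BACKPLANE; collapsed here for brevity.
    return "NUMA_BACKPLANE"

# The regression: a 2-node system reported as NUMA_BACKPLANE.
assert topology_type(2, fixed=False) == "NUMA_BACKPLANE"
# After the fix it is NUMA_DIRECT again.
assert topology_type(2, fixed=True) == "NUMA_DIRECT"
```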


Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-08 Thread Geert Uytterhoeven
Hi Rob,

On Mon, Oct 8, 2018 at 4:57 PM Rob Herring  wrote:
> On Mon, Oct 8, 2018 at 2:47 AM Geert Uytterhoeven  
> wrote:
> > On Fri, Oct 5, 2018 at 6:59 PM Rob Herring  wrote:
> > > Convert Renesas SoC bindings to DT schema format using json-schema.

> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml
> > > @@ -0,0 +1,205 @@

> > > +  - description: Kingfisher (SBEV-RCAR-KF-M03)
> > > +items:
> > > +  - const: shimafuji,kingfisher
> > > +  - enum:
> > > +  - renesas,h3ulcb
> > > +  - renesas,m3ulcb
> > > +  - enum:
> > > +  - renesas,r8a7795
> > > +  - renesas,r8a7796
> >
> > This looks a bit funny: all other entries have the "const" last, and
> > use it for the
> > SoC number. May be correct, though.
> > To clarify, this is an extension board that can fit both the [HM]3ULCB
> > boards (actually also the new M3NULCB, I think).
>
> This being Kingfisher?

Correct.

> I wrote this based on dts files in the tree. There's 2 combinations that I 
> see:
>
> "shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795"
> "shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796"
>
> The schema allows 4 combinations (1 * 2 * 2). I have no idea if the
> other combinations are possible. If not, then we could rewrite this as
> 2 entries with 3 const values each.

I expect there will soon be a third one:

"shimafuji,kingfisher", "renesas,m3nulcb", "renesas,r8a77965"

Technically, {h3,m3,m3n}ulcb are the same board (although there may be
minor revision differences), with a different SiP mounted.
But they are named and marketed differently depending on which SiP is mounted.

And on top of that, you can plug in a Kingfisher daughterboard. Could be an
overlay ;-)

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 05/36] dt-bindings: arm: renesas: Move 'renesas,prr' binding to its own doc

2018-10-08 Thread Rob Herring
On Mon, Oct 8, 2018 at 2:05 AM Geert Uytterhoeven  wrote:
>
> Hi Rob,
>
> On Fri, Oct 5, 2018 at 6:58 PM Rob Herring  wrote:
> > In preparation to convert board-level bindings to json-schema, move
> > various misc SoC bindings out to their own file.
> >
> > Cc: Mark Rutland 
> > Cc: Simon Horman 
> > Cc: Magnus Damm 
> > Cc: devicet...@vger.kernel.org
> > Cc: linux-renesas-...@vger.kernel.org
> > Signed-off-by: Rob Herring 
>
> Looks good to me, but needs a rebase, as the PRR section has been extended
> in -next.

Is this something you all can apply still for 4.20?

Rob


Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-08 Thread Rob Herring
On Mon, Oct 8, 2018 at 2:47 AM Geert Uytterhoeven  wrote:
>
> Hi Rob,
>
> On Fri, Oct 5, 2018 at 6:59 PM Rob Herring  wrote:
> > Convert Renesas SoC bindings to DT schema format using json-schema.
> >
> > Cc: Simon Horman 
> > Cc: Magnus Damm 
> > Cc: Mark Rutland 
> > Cc: linux-renesas-...@vger.kernel.org
> > Cc: devicet...@vger.kernel.org
> > Signed-off-by: Rob Herring 
>
> Thanks for your patch!
>
> Note that this will need a rebase, as more SoCs/boards have been added
> in -next.
>
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml
> > @@ -0,0 +1,205 @@
> > +# SPDX-License-Identifier: None
>
> The old file didn't have an SPDX header, so it was GPL-2.0, implicitly?

Right. I meant to update this with something. I'd prefer it be dual
licensed as these aren't just kernel files, but I don't really want to
try to gather permissions from all the copyright holders. And who is
the copyright holder when it is implicit? Everyone listed by git
blame?

> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/bindings/arm/shmobile.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings
> > +
> > +maintainers:
> > +  - Geert Uytterhoeven 
>
> Simon Horman  (supporter:ARM/SHMOBILE ARM ARCHITECTURE)
> Magnus Damm  (supporter:ARM/SHMOBILE ARM ARCHITECTURE)
>
> You had it right in the CC list, though...

I generated it here from git log rather than get_maintainers.pl because
get_maintainers.pl just lists me for a bunch of them.

> > +  - description: RZ/G1M (R8A77430)
> > +items:
> > +  - enum:
> > +  # iWave Systems RZ/G1M Qseven Development Platform 
> > (iW-RainboW-G20D-Qseven)
> > +  - iwave,g20d
> > +  - const: iwave,g20m
> > +  - const: renesas,r8a7743
> > +
> > +  - items:
> > +  - enum:
> > +  # iWave Systems RZ/G1M Qseven System On Module 
> > (iW-RainboW-G20M-Qseven)
> > +  - iwave,g20m
> > +  - const: renesas,r8a7743
> > +
> > +  - description: RZ/G1N (R8A77440)
> > +items:
> > +  - enum:
> > +  - renesas,sk-rzg1m # SK-RZG1M (YR8A77430S000BE)
>
> This board belongs under the RZ/G1M section above
> (see also the 7743 in the part number).

Indeed. Not sure how I screwed that one up.

> > +  - const: renesas,r8a7744
>
> > +  - description: Kingfisher (SBEV-RCAR-KF-M03)
> > +items:
> > +  - const: shimafuji,kingfisher
> > +  - enum:
> > +  - renesas,h3ulcb
> > +  - renesas,m3ulcb
> > +  - enum:
> > +  - renesas,r8a7795
> > +  - renesas,r8a7796
>
> This looks a bit funny: all other entries have the "const" last, and
> use it for the
> SoC number. May be correct, though.
> To clarify, this is an extension board that can fit both the [HM]3ULCB
> boards (actually also the new M3NULCB, I think).

This being Kingfisher?

I wrote this based on dts files in the tree. There's 2 combinations that I see:

"shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795"
"shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796"

The schema allows 4 combinations (1 * 2 * 2). I have no idea if the
other combinations are possible. If not, then we could rewrite this as
2 entries with 3 const values each.
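Written out, the difference between the enum/enum form and the in-tree pairings is just a cross product (compatible strings as quoted in this thread; the strict set is the hypothetical "2 entries with 3 const values each" alternative):

```python
from itertools import product

boards = ["renesas,h3ulcb", "renesas,m3ulcb"]
socs   = ["renesas,r8a7795", "renesas,r8a7796"]

# What the current schema accepts: 1 * 2 * 2 = 4 combinations.
schema_allows = [("shimafuji,kingfisher", b, s)
                 for b, s in product(boards, socs)]

# What the dts files in the tree actually use: only 2 pairings.
in_tree = [("shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795"),
           ("shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796")]

assert len(schema_allows) == 4 and all(c in schema_allows for c in in_tree)
```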

Rob


Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-08 Thread Rob Herring
On Mon, Oct 8, 2018 at 3:02 AM Simon Horman  wrote:
>
> On Fri, Oct 05, 2018 at 11:58:41AM -0500, Rob Herring wrote:
> > Convert Renesas SoC bindings to DT schema format using json-schema.
> >
> > Cc: Simon Horman 
> > Cc: Magnus Damm 
> > Cc: Mark Rutland 
> > Cc: linux-renesas-...@vger.kernel.org
> > Cc: devicet...@vger.kernel.org
> > Signed-off-by: Rob Herring 
>
> This seems fine to me other than that it does not seem
> to apply cleanly to next.
>
> shmobile.txt sees a couple of updates per release cycle so from my point of
> view it would ideal if this change could hit -rc1 to allow patches for
> v4.21 to be accepted smoothly (already one from Sergei will need rebasing).

When we get to the point of merging (which isn't going to be 4.20),
you and other maintainers can probably take all these patches. Other
than the few restructuring patches, the only dependency is the build
support which isn't a dependency to apply it, but build it. I plan to
build any patches as part of reviewing at least early on. OTOH, the
build support is small enough and self contained that maybe it can
just be applied for 4.20.

Rob


Re: [PATCH 28/36] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema

2018-10-08 Thread Rob Herring
On Mon, Oct 8, 2018 at 4:45 AM Heiko Stuebner  wrote:
>
> Hi Rob,
>
> either I'm misunderstanding that, or something did go a bit wrong during
> the conversion, as pointed out below:
>
> Am Freitag, 5. Oktober 2018, 18:58:40 CEST schrieb Rob Herring:
> > Convert Rockchip SoC bindings to DT schema format using json-schema.
> >
> > Cc: Mark Rutland 
> > Cc: Heiko Stuebner 
> > Cc: devicet...@vger.kernel.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-rockc...@lists.infradead.org
> > Signed-off-by: Rob Herring 
> > ---
> >  .../devicetree/bindings/arm/rockchip.txt  | 220 
> >  .../devicetree/bindings/arm/rockchip.yaml | 242 ++
> >  2 files changed, 242 insertions(+), 220 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/arm/rockchip.txt
> >  create mode 100644 Documentation/devicetree/bindings/arm/rockchip.yaml
> >
>
>
>
> > +properties:
> > +  $nodename:
> > +const: '/'
> > +  compatible:
> > +oneOf:
> > +  - items:
> > +  - enum:
> > +  - amarula,vyasa-rk3288
> > +  - asus,rk3288-tinker
> > +  - radxa,rock2-square
> > +  - chipspark,popmetal-rk3288
> > +  - netxeon,r89
> > +  - firefly,firefly-rk3288
> > +  - firefly,firefly-rk3288-beta
> > +  - firefly,firefly-rk3288-reload
> > +  - mqmaker,miqi
> > +  - rockchip,rk3288-fennec
> > +  - const: rockchip,rk3288
>
> These are very much distinct boards, so shouldn't they also get
> individual entries including their existing description like the phytec
> or google boards below?

It is grouped by SoC compatible and # of compatible strings. So this
one is all the cases that have 2 compatible strings. It is simply
saying the 1st compatible string must be one of the enums and the 2nd
compatible string must be "rockchip,rk3288".

>
> Similarly why is it an enum for those, while the Google boards get a
> const for each compatible string?

Because each Google board is a fixed list of strings.

> Most non-google boards below also lost their description and were lumped
> together into combined entries. Was that intentional?

If the description was just repeating the compatible string with
spaces and capitalization, then yes it was intentional. If your
description matches what you have for 'model', then I'd prefer to see
model added as a property schema.

Rob


Re: [PATCH 22/36] dt-bindings: arm: Convert FSL board/soc bindings to json-schema

2018-10-08 Thread Rob Herring
On Mon, Oct 8, 2018 at 2:02 AM Shawn Guo  wrote:
>
> On Fri, Oct 05, 2018 at 11:58:34AM -0500, Rob Herring wrote:
> > Convert Freescale SoC bindings to DT schema format using json-schema.

> > +properties:
> > +  $nodename:
> > +const: '/'
> > +  compatible:
> > +oneOf:
> > +  - description: i.MX23 based Boards
> > +items:
> > +  - enum:
> > +  - fsl,imx23-evk
> > +  - olimex,imx23-olinuxino
> > +  - const: fsl,imx23
> > +
> > +  - description: i.MX25 Product Development Kit
> > +items:
> > +  - enum:
> > +  - fsl,imx25-pdk
> > +  - const: fsl,imx25
> > +
> > +  - description: i.MX27 Product Development Kit
> > +items:
> > +  - enum:
> > +  - fsl,imx27-pdk
> > +  - const: fsl,imx27
> > +
> > +  - description: i.MX28 based Boards
> > +items:
> > +  - enum:
> > +  - fsl,imx28-evk
> > +  - i2se,duckbill
> > +  - i2se,duckbill-2
> > +  - technologic,imx28-ts4600
> > +  - const: fsl,imx28
> > +  - items:
>
> The schema is new to me.  This line looks unusual to me, so you may want
> to double check.

It's fine. There's just no description schema on this one as it's a
continuation of the previous one (logically, but not from a schema
perspective). Perhaps add "i.MX28 I2SE Duckbill 2 based boards".

> > +  - enum:
> > +  - i2se,duckbill-2-485
> > +  - i2se,duckbill-2-enocean
> > +  - i2se,duckbill-2-spi
> > +  - const: i2se,duckbill-2
> > +  - const: fsl,imx28
> > +
> > +  - description: i.MX51 Babbage Board


Re: [PATCH v5 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Christophe Leroy




On 10/08/2018 11:06 AM, Michael Ellerman wrote:

Christophe Leroy  writes:


The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.


This is blowing up pretty nicely with CONFIG_KMEMLEAK enabled, haven't
had time to dig further:


Nice :)

I have the same issue on PPC32.
Seems like when descending the stack, save_context_stack() calls 
validate_sp(), which in turn calls valid_irq_stack() when the first test 
fails.

But that early, hardirq_ctx[cpu] is NULL.
With sp = 0, valid_irq_stack() used to return false because it expected 
sp to be above the thread_info. But now that thread_info is gone, sp = 0 
is valid when stack = NULL.


The following fixes it:

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index afe76f7f316c..3e534147fd8f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2006,6 +2006,9 @@ int validate_sp(unsigned long sp, struct 
task_struct *p,

 {
unsigned long stack_page = (unsigned long)task_stack_page(p);

+   if (sp < THREAD_SIZE)
+   return 0;
+
if (sp >= stack_page + sizeof(struct thread_struct)
&& sp <= stack_page + THREAD_SIZE - nbytes)
return 1;


Looking at this I also realise I forgot to remove the sizeof(struct 
thread_struct) from here. And this sizeof() was buggy; it should have 
been thread_info instead of thread_struct, but never mind, as it is going 
away.
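To illustrate the sp = 0 case for anyone following along, a rough userspace model of the check (the thread size and offset are stand-in values, not the kernel's):

```python
THREAD_SIZE = 16384  # illustrative only; the real value is config-dependent

def valid_irq_stack(sp, irq_stack, info_offset, guard=False, nbytes=32):
    """Toy model of valid_irq_stack().

    irq_stack stands in for hardirq_ctx[cpu], which is still NULL
    (None here) this early in boot.  info_offset models the old
    "sp must be above the thread_info" lower bound, which goes away
    with CONFIG_THREAD_INFO_IN_TASK.
    """
    if guard and sp < THREAD_SIZE:   # the fix proposed above
        return False
    base = 0 if irq_stack is None else irq_stack
    return base + info_offset <= sp <= base + THREAD_SIZE - nbytes

# Before: sp = 0 rejected because it is below base + sizeof(thread_info).
assert not valid_irq_stack(0, None, info_offset=64)
# After thread_info moved into task_struct: sp = 0 wrongly looks valid.
assert valid_irq_stack(0, None, info_offset=0)
# With the sp < THREAD_SIZE guard, it is rejected again.
assert not valid_irq_stack(0, None, info_offset=0, guard=True)
```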


Christophe


Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0022064
Oops: Kernel access of bad area, sig: 11 [#9]
LE SMP NR_CPUS=32 NUMA
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 
4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338 #268
NIP:  c0022064 LR: c00220c0 CTR: c001f5c0
REGS: c1244a50 TRAP: 0380   Not tainted  
(4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338)
MSR:  80001033   CR: 48022244  XER: 2000
CFAR: c00220c4 IRQMASK: 1
GPR00: c00220c0 c1244cd0 c124b200 0001
GPR04: c1201180 0070 c1275ef8 
GPR08:  0001 3f90 2b6e6f6d6d6f635f
GPR12: c001f5c0 c145  02e2be38
GPR16: 7dc54c70 02d854b8  c0d87f00
GPR20: c0d87ef0 c0d87ee0 c0d87f08 c006c1a8
GPR24: c0d87ec8  7265677368657265 c0062a04
GPR28: 0006 c1201180  
NIP [c0022064] show_stack+0xe4/0x2b0
LR [c00220c0] show_stack+0x140/0x2b0
Call Trace:
[c1244cd0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1244da0] [c002245c] show_regs+0x22c/0x430
[c1244e50] [c002ae8c] __die+0xfc/0x140
[c1244ed0] [c002b954] die+0x74/0xf0
[c1244f10] [c006e0f8] bad_page_fault+0xe8/0x180
[c1244f80] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1244fc0] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
 LR = show_stack+0x140/0x2b0
[c12452b0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245380] [c002245c] show_regs+0x22c/0x430
[c1245430] [c002ae8c] __die+0xfc/0x140
[c12454b0] [c002b954] die+0x74/0xf0
[c12454f0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1245560] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c12455a0] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
 LR = show_stack+0x140/0x2b0
[c1245890] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245960] [c002245c] show_regs+0x22c/0x430
[c1245a10] [c002ae8c] __die+0xfc/0x140
[c1245a90] [c002b954] die+0x74/0xf0
[c1245ad0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1245b40] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1245b80] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
 LR = show_stack+0x140/0x2b0
[c1245e70] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245f40] [c002245c] show_regs+0x22c/0x430
[c1245ff0] [c002ae8c] __die+0xfc/0x140
[c1246070] [c002b954] die+0x74/0xf0
[c12460b0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246120] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246160] [c0008ce8] large_addr_slb+0x158/0x160
--- 

Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-08 Thread Simon Horman
On Fri, Oct 05, 2018 at 11:58:41AM -0500, Rob Herring wrote:
> Convert Renesas SoC bindings to DT schema format using json-schema.
> 
> Cc: Simon Horman 
> Cc: Magnus Damm 
> Cc: Mark Rutland 
> Cc: linux-renesas-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Herring 

This seems fine to me other than that it does not seem
to apply cleanly to next.

shmobile.txt sees a couple of updates per release cycle so from my point of
view it would be ideal if this change could hit -rc1 to allow patches for
v4.21 to be accepted smoothly (already one from Sergei will need rebasing).

> ---
>  .../devicetree/bindings/arm/shmobile.txt  | 143 
>  .../devicetree/bindings/arm/shmobile.yaml | 205 ++
>  2 files changed, 205 insertions(+), 143 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/arm/shmobile.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/shmobile.yaml
> 
> diff --git a/Documentation/devicetree/bindings/arm/shmobile.txt 
> b/Documentation/devicetree/bindings/arm/shmobile.txt
> deleted file mode 100644
> index 619b765e5bee..
> --- a/Documentation/devicetree/bindings/arm/shmobile.txt
> +++ /dev/null
> @@ -1,143 +0,0 @@
> -Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings
> -
> -
> -SoCs:
> -
> -  - Emma Mobile EV2
> -compatible = "renesas,emev2"
> -  - RZ/A1H (R7S72100)
> -compatible = "renesas,r7s72100"
> -  - SH-Mobile AG5 (R8A73A00/SH73A0)
> -compatible = "renesas,sh73a0"
> -  - R-Mobile APE6 (R8A73A40)
> -compatible = "renesas,r8a73a4"
> -  - R-Mobile A1 (R8A77400)
> -compatible = "renesas,r8a7740"
> -  - RZ/G1H (R8A77420)
> -compatible = "renesas,r8a7742"
> -  - RZ/G1M (R8A77430)
> -compatible = "renesas,r8a7743"
> -  - RZ/G1N (R8A77440)
> -compatible = "renesas,r8a7744"
> -  - RZ/G1E (R8A77450)
> -compatible = "renesas,r8a7745"
> -  - RZ/G1C (R8A77470)
> -compatible = "renesas,r8a77470"
> -  - R-Car M1A (R8A77781)
> -compatible = "renesas,r8a7778"
> -  - R-Car H1 (R8A77790)
> -compatible = "renesas,r8a7779"
> -  - R-Car H2 (R8A77900)
> -compatible = "renesas,r8a7790"
> -  - R-Car M2-W (R8A77910)
> -compatible = "renesas,r8a7791"
> -  - R-Car V2H (R8A77920)
> -compatible = "renesas,r8a7792"
> -  - R-Car M2-N (R8A77930)
> -compatible = "renesas,r8a7793"
> -  - R-Car E2 (R8A77940)
> -compatible = "renesas,r8a7794"
> -  - R-Car H3 (R8A77950)
> -compatible = "renesas,r8a7795"
> -  - R-Car M3-W (R8A77960)
> -compatible = "renesas,r8a7796"
> -  - R-Car M3-N (R8A77965)
> -compatible = "renesas,r8a77965"
> -  - R-Car V3M (R8A77970)
> -compatible = "renesas,r8a77970"
> -  - R-Car V3H (R8A77980)
> -compatible = "renesas,r8a77980"
> -  - R-Car E3 (R8A77990)
> -compatible = "renesas,r8a77990"
> -  - R-Car D3 (R8A77995)
> -compatible = "renesas,r8a77995"
> -  - RZ/N1D (R9A06G032)
> -compatible = "renesas,r9a06g032"
> -
> -Boards:
> -
> -  - Alt (RTP0RC7794SEB00010S)
> -compatible = "renesas,alt", "renesas,r8a7794"
> -  - APE6-EVM
> -compatible = "renesas,ape6evm", "renesas,r8a73a4"
> -  - Atmark Techno Armadillo-800 EVA
> -compatible = "renesas,armadillo800eva", "renesas,r8a7740"
> -  - Blanche (RTP0RC7792SEB00010S)
> -compatible = "renesas,blanche", "renesas,r8a7792"
> -  - BOCK-W
> -compatible = "renesas,bockw", "renesas,r8a7778"
> -  - Condor (RTP0RC77980SEB0010SS/RTP0RC77980SEB0010SA01)
> -compatible = "renesas,condor", "renesas,r8a77980"
> -  - Draak (RTP0RC77995SEB0010S)
> -compatible = "renesas,draak", "renesas,r8a77995"
> -  - Eagle (RTP0RC77970SEB0010S)
> -compatible = "renesas,eagle", "renesas,r8a77970"
> -  - Ebisu (RTP0RC77990SEB0010S)
> -compatible = "renesas,ebisu", "renesas,r8a77990"
> -  - Genmai (RTK772100BC0BR)
> -compatible = "renesas,genmai", "renesas,r7s72100"
> -  - GR-Peach (X28A-M01-E/F)
> -compatible = "renesas,gr-peach", "renesas,r7s72100"
> -  - Gose (RTP0RC7793SEB00010S)
> -compatible = "renesas,gose", "renesas,r8a7793"
> -  - H3ULCB (R-Car Starter Kit Premier, RTP0RC7795SKBX0010SA00 (H3 ES1.1))
> -H3ULCB (R-Car Starter Kit Premier, RTP0RC77951SKBX010SA00 (H3 ES2.0))
> -compatible = "renesas,h3ulcb", "renesas,r8a7795"
> -  - Henninger
> -compatible = "renesas,henninger", "renesas,r8a7791"
> -  - iWave Systems RZ/G1C Single Board Computer (iW-RainboW-G23S)
> -compatible = "iwave,g23s", "renesas,r8a77470"
> -  - iWave Systems RZ/G1E SODIMM SOM Development Platform (iW-RainboW-G22D)
> -compatible = "iwave,g22d", "iwave,g22m", "renesas,r8a7745"
> -  - iWave Systems RZ/G1E SODIMM System On Module (iW-RainboW-G22M-SM)
> -compatible = "iwave,g22m", "renesas,r8a7745"
> -  - iWave Systems RZ/G1M Qseven Development Platform (iW-RainboW-G20D-Qseven)
> -compatible = "iwave,g20d", "iwave,g20m", "renesas,r8a7743"
> -  - iWave Systems RZ/G1M 

Re: [PATCH v5 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Michael Ellerman
Christophe Leroy  writes:

> The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.
>
> Moving thread_info into task_struct has the following advantages:
> - It protects thread_info from corruption in the case of stack
> overflows.
> - Its address is harder to determine if stack addresses are
> leaked, making a number of attacks more difficult.

This is blowing up pretty nicely with CONFIG_KMEMLEAK enabled, haven't
had time to dig further:

Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0022064
Oops: Kernel access of bad area, sig: 11 [#9]
LE SMP NR_CPUS=32 NUMA 
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 
4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338 #268
NIP:  c0022064 LR: c00220c0 CTR: c001f5c0
REGS: c1244a50 TRAP: 0380   Not tainted  
(4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338)
MSR:  80001033   CR: 48022244  XER: 2000
CFAR: c00220c4 IRQMASK: 1 
GPR00: c00220c0 c1244cd0 c124b200 0001 
GPR04: c1201180 0070 c1275ef8  
GPR08:  0001 3f90 2b6e6f6d6d6f635f 
GPR12: c001f5c0 c145  02e2be38 
GPR16: 7dc54c70 02d854b8  c0d87f00 
GPR20: c0d87ef0 c0d87ee0 c0d87f08 c006c1a8 
GPR24: c0d87ec8  7265677368657265 c0062a04 
GPR28: 0006 c1201180   
NIP [c0022064] show_stack+0xe4/0x2b0
LR [c00220c0] show_stack+0x140/0x2b0
Call Trace:
[c1244cd0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1244da0] [c002245c] show_regs+0x22c/0x430
[c1244e50] [c002ae8c] __die+0xfc/0x140
[c1244ed0] [c002b954] die+0x74/0xf0
[c1244f10] [c006e0f8] bad_page_fault+0xe8/0x180
[c1244f80] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1244fc0] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
LR = show_stack+0x140/0x2b0
[c12452b0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245380] [c002245c] show_regs+0x22c/0x430
[c1245430] [c002ae8c] __die+0xfc/0x140
[c12454b0] [c002b954] die+0x74/0xf0
[c12454f0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1245560] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c12455a0] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
LR = show_stack+0x140/0x2b0
[c1245890] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245960] [c002245c] show_regs+0x22c/0x430
[c1245a10] [c002ae8c] __die+0xfc/0x140
[c1245a90] [c002b954] die+0x74/0xf0
[c1245ad0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1245b40] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1245b80] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
LR = show_stack+0x140/0x2b0
[c1245e70] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245f40] [c002245c] show_regs+0x22c/0x430
[c1245ff0] [c002ae8c] __die+0xfc/0x140
[c1246070] [c002b954] die+0x74/0xf0
[c12460b0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246120] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246160] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
LR = show_stack+0x140/0x2b0
[c1246450] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1246520] [c002245c] show_regs+0x22c/0x430
[c12465d0] [c002ae8c] __die+0xfc/0x140
[c1246650] [c002b954] die+0x74/0xf0
[c1246690] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246700] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246740] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
LR = show_stack+0x140/0x2b0
[c1246a30] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1246b00] [c002245c] show_regs+0x22c/0x430
[c1246bb0] [c002ae8c] __die+0xfc/0x140
[c1246c30] [c002b954] die+0x74/0xf0
[c1246c70] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246ce0] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246d20] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
LR = show_stack+0x140/0x2b0
[c1247010] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c12470e0] [c002245c] show_regs+0x22c/0x430

Re: [PATCH v6 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Benjamin Herrenschmidt
On Mon, 2018-10-08 at 09:16 +, Christophe Leroy wrote:
> The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.

We need to make sure we don't have code that assumes that we don't take
faults on TI access.

On ppc64, the stack SLB entries are bolted, which means the TI is too.

We might have code that assumes that we don't get SLB faults when
accessing TI. If not, we're fine but that needs a close look.

Ben.

> Moving thread_info into task_struct has the following advantages:
> - It protects thread_info from corruption in the case of stack
> overflows.
> - Its address is harder to determine if stack addresses are
> leaked, making a number of attacks more difficult.
> 
> Changes since v5:
>  - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
>  - Fixed PPC_BPF_LOAD_CPU() macro
> 
> Changes since v4:
>  - Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h 
> is not
>  already existing, was due to spaces instead of a tab in the Makefile
> 
> Changes since RFC v3: (based on Nick's review)
>  - Renamed task_size.h to task_size_user64.h to better relate to what it 
> contains.
>  - Handling of the isolation of thread_info cpu field inside CONFIG_SMP 
> #ifdefs moved to a separate patch.
>  - Removed CURRENT_THREAD_INFO macro completely.
>  - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
> defined.
>  - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
>  - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
>  - Fixed a few commit logs
>  - Fixed checkpatch report.
> 
> Changes since RFC v2:
>  - Removed the modification of names in asm-offsets
>  - Created a rule in arch/powerpc/Makefile to append the offset of 
> current->cpu in CFLAGS
>  - Modified asm/smp.h to use the offset set in CFLAGS
>  - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
>  - Moved the modification of current_pt_regs in the patch activating 
> CONFIG_THREAD_INFO_IN_TASK
> 
> Changes since RFC v1:
>  - Removed the first patch which was modifying header inclusion order in timer
>  - Modified some names in asm-offsets to avoid conflicts when including 
> asm-offsets in C files
>  - Modified asm/smp.h to avoid having to include linux/sched.h (using 
> asm-offsets instead)
>  - Moved some changes from the activation patch to the preparation patch.
> 
> Christophe Leroy (9):
>   book3s/64: avoid circular header inclusion in mmu-hash.h
>   powerpc: Only use task_struct 'cpu' field on SMP
>   powerpc: Prepare for moving thread_info into task_struct
>   powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
>   powerpc: regain entire stack space
>   powerpc: 'current_set' is now a table of task_struct pointers
>   powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
>   powerpc/64: Remove CURRENT_THREAD_INFO
>   powerpc: clean stack pointers naming
> 
>  arch/powerpc/Kconfig   |  1 +
>  arch/powerpc/Makefile  |  8 ++-
>  arch/powerpc/include/asm/asm-prototypes.h  |  4 +-
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h  |  2 +-
>  arch/powerpc/include/asm/exception-64s.h   |  4 +-
>  arch/powerpc/include/asm/irq.h | 14 ++---
>  arch/powerpc/include/asm/livepatch.h   |  7 ++-
>  arch/powerpc/include/asm/processor.h   | 39 +
>  arch/powerpc/include/asm/ptrace.h  |  2 +-
>  arch/powerpc/include/asm/reg.h |  2 +-
>  arch/powerpc/include/asm/smp.h | 17 +-
>  arch/powerpc/include/asm/task_size_user64.h| 42 ++
>  arch/powerpc/include/asm/thread_info.h | 19 ---
>  arch/powerpc/kernel/asm-offsets.c  | 10 ++--
>  arch/powerpc/kernel/entry_32.S | 66 --
>  arch/powerpc/kernel/entry_64.S | 12 ++--
>  arch/powerpc/kernel/epapr_hcalls.S |  5 +-
>  arch/powerpc/kernel/exceptions-64e.S   | 13 +
>  arch/powerpc/kernel/exceptions-64s.S   |  2 +-
>  arch/powerpc/kernel/head_32.S  | 14 ++---
>  arch/powerpc/kernel/head_40x.S |  4 +-
>  arch/powerpc/kernel/head_44x.S |  8 +--
>  arch/powerpc/kernel/head_64.S  |  1 +
>  arch/powerpc/kernel/head_8xx.S |  2 +-
>  arch/powerpc/kernel/head_booke.h   | 12 +---
>  arch/powerpc/kernel/head_fsl_booke.S   | 16 +++---
>  arch/powerpc/kernel/idle_6xx.S |  8 +--
>  arch/powerpc/kernel/idle_book3e.S  |  2 +-
>  arch/powerpc/kernel/idle_e500.S|  8 +--
>  arch/powerpc/kernel/idle_power4.S  |  2 +-
>  arch/powerpc/kernel/irq.c  | 77 
> +-
>  arch/powerpc/kernel/kgdb.c | 28 --
>  arch/powerpc/kernel/machine_kexec_64.c |  6 +-
>  

Re: [PATCH 28/36] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema

2018-10-08 Thread Heiko Stuebner
Hi Rob,

either I'm misunderstanding that, or something did go a bit wrong during
the conversion, as pointed out below:

Am Freitag, 5. Oktober 2018, 18:58:40 CEST schrieb Rob Herring:
> Convert Rockchip SoC bindings to DT schema format using json-schema.
> 
> Cc: Mark Rutland 
> Cc: Heiko Stuebner 
> Cc: devicet...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-rockc...@lists.infradead.org
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/bindings/arm/rockchip.txt  | 220 
>  .../devicetree/bindings/arm/rockchip.yaml | 242 ++
>  2 files changed, 242 insertions(+), 220 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/arm/rockchip.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/rockchip.yaml
> 



> +properties:
> +  $nodename:
> +const: '/'
> +  compatible:
> +oneOf:
> +  - items:
> +  - enum:
> +  - amarula,vyasa-rk3288
> +  - asus,rk3288-tinker
> +  - radxa,rock2-square
> +  - chipspark,popmetal-rk3288
> +  - netxeon,r89
> +  - firefly,firefly-rk3288
> +  - firefly,firefly-rk3288-beta
> +  - firefly,firefly-rk3288-reload
> +  - mqmaker,miqi
> +  - rockchip,rk3288-fennec
> +  - const: rockchip,rk3288

These are very much distinct boards, so shouldn't they also get
individual entries including their existing description like the phytec
or google boards below?

Similarly why is it an enum for those, while the Google boards get a
const for each compatible string?


Most non-google boards below also lost their description and were lumped
together into combined entries. Was that intentional?


Thanks
Heiko

> +
> +  - description: Phytec phyCORE-RK3288 Rapid Development Kit
> +items:
> +  - const: phytec,rk3288-pcm-947
> +  - const: phytec,rk3288-phycore-som
> +  - const: rockchip,rk3288
> +
> +  - description: Google Mickey (Asus Chromebit CS10)
> +items:
> +  - const: google,veyron-mickey-rev8
> +  - const: google,veyron-mickey-rev7
> +  - const: google,veyron-mickey-rev6
> +  - const: google,veyron-mickey-rev5
> +  - const: google,veyron-mickey-rev4
> +  - const: google,veyron-mickey-rev3
> +  - const: google,veyron-mickey-rev2
> +  - const: google,veyron-mickey-rev1
> +  - const: google,veyron-mickey-rev0
> +  - const: google,veyron-mickey
> +  - const: google,veyron
> +  - const: rockchip,rk3288
> +
> +  - description: Google Minnie (Asus Chromebook Flip C100P)
> +items:
> +  - const: google,veyron-minnie-rev4
> +  - const: google,veyron-minnie-rev3
> +  - const: google,veyron-minnie-rev2
> +  - const: google,veyron-minnie-rev1
> +  - const: google,veyron-minnie-rev0
> +  - const: google,veyron-minnie
> +  - const: google,veyron
> +  - const: rockchip,rk3288
> +
> +  - description: Google Pinky (dev-board)
> +items:
> +  - const: google,veyron-pinky-rev2
> +  - const: google,veyron-pinky
> +  - const: google,veyron
> +  - const: rockchip,rk3288
> +
> +  - description: Google Speedy (Asus C201 Chromebook)
> +items:
> +  - const: google,veyron-speedy-rev9
> +  - const: google,veyron-speedy-rev8
> +  - const: google,veyron-speedy-rev7
> +  - const: google,veyron-speedy-rev6
> +  - const: google,veyron-speedy-rev5
> +  - const: google,veyron-speedy-rev4
> +  - const: google,veyron-speedy-rev3
> +  - const: google,veyron-speedy-rev2
> +  - const: google,veyron-speedy
> +  - const: google,veyron
> +  - const: rockchip,rk3288
> +
> +  - description: Google Jaq (Haier Chromebook 11 and more)
> +items:
> +  - const: google,veyron-jaq-rev5
> +  - const: google,veyron-jaq-rev4
> +  - const: google,veyron-jaq-rev3
> +  - const: google,veyron-jaq-rev2
> +  - const: google,veyron-jaq-rev1
> +  - const: google,veyron-jaq
> +  - const: google,veyron
> +  - const: rockchip,rk3288
> +
> +  - description: Google Jerry (Hisense Chromebook C11 and more)
> +items:
> +  - const: google,veyron-jerry-rev7
> +  - const: google,veyron-jerry-rev6
> +  - const: google,veyron-jerry-rev5
> +  - const: google,veyron-jerry-rev4
> +  - const: google,veyron-jerry-rev3
> +  - const: google,veyron-jerry
> +  - const: google,veyron
> +  - const: rockchip,rk3288
> +
> +  - description: Google Brain (dev-board)
> +items:
> +  - const: google,veyron-brain-rev0
> +  - const: google,veyron-brain
> +  - const: google,veyron
> +  - const: 

Re: [RFC PATCH kernel] vfio/spapr_tce: Get rid of possible infinite loop

2018-10-08 Thread Michael Ellerman
Serhii Popovych  writes:
> Alexey Kardashevskiy wrote:
>> As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered
>> memory. If there is a bug in memory release, the loop in
>> tce_iommu_release() becomes infinite; this actually happened to me.
>> 
>> This makes the loop finite and prints a warning on every failure to make
>> the code less bug prone.
>> 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  drivers/vfio/vfio_iommu_spapr_tce.c | 10 +++---
>>  1 file changed, 3 insertions(+), 7 deletions(-)
>> 
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
>> b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index b1a8ab3..ece0651 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -393,13 +394,8 @@ static void tce_iommu_release(void *iommu_data)
>>  tce_iommu_free_table(container, tbl);
>>  }
>>  
>> -while (!list_empty(&container->prereg_list)) {
>> -struct tce_iommu_prereg *tcemem;
>> -
>> -tcemem = list_first_entry(&container->prereg_list,
>> -struct tce_iommu_prereg, next);
>> -WARN_ON_ONCE(tce_iommu_prereg_free(container, tcemem));
>> -}
>> +list_for_each_entry_safe(tcemem, tmtmp, &container->prereg_list, next)
>> +WARN_ON(tce_iommu_prereg_free(container, tcemem));
>
> I'm not sure that tce_iommu_prereg_free() call under WARN_ON() is good
> idea because WARN_ON() is a preprocessor macro:
>
>   if CONFIG_WARN=n is added by the analogy with CONFIG_BUG=n defining
  WARN_ON() as empty we will lose the call to tce_iommu_prereg_free()
>   leaking resources.

I don't think that's likely to ever happen though, we have a large
number of uses that would need to be checked one-by-one:

  $ git grep "if (WARN_ON(" | wc -l
  2853


So if we ever did add CONFIG_WARN, I think it would still need to
evaluate the condition, just not emit a warning.

cheers


Re: [PATCH] powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y

2018-10-08 Thread Benjamin Herrenschmidt
On Mon, 2018-10-08 at 17:04 +1000, Nicholas Piggin wrote:
> On Mon, 08 Oct 2018 15:08:31 +1100
> Benjamin Herrenschmidt  wrote:
> 
> > HMIs will crash the kernel due to
> > 
> > BRANCH_LINK_TO_FAR(hmi_exception_realmode)
> > 
> > Calling into the OPD instead of the actual code.
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > ---
> > 
> > This hack fixes it for me, but it's not great. Nick, any better idea ?
> 
> Is it a hack because the ifdef gunk, or because there's something
> deeper wrong with using the .sym?

I'd say ifdef gunk, also the KVM use doesn't need it because the kvm entry
isn't an OPD.

> I guess all those handlers that load label address by hand could have
> the bug silently creep in. Can we have them use the DOTSYM() macro?

The KVM one doesn't have a DOTSYM, does it?

Also should we load the TOC from the OPD ?

> Thanks,
> Nick
> 
> > 
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> > b/arch/powerpc/kernel/exceptions-64s.S
> > index ea04dfb..752709cc8 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1119,7 +1119,11 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
> > EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
> > EXCEPTION_PROLOG_COMMON_3(0xe60)
> > addir3,r1,STACK_FRAME_OVERHEAD
> > +#ifdef PPC64_ELF_ABI_v1
> > +   BRANCH_LINK_TO_FAR(.hmi_exception_realmode) /* Function call ABI */
> > +#else
> > BRANCH_LINK_TO_FAR(hmi_exception_realmode) /* Function call ABI */
> > +#endif
> > cmpdi   cr0,r3,0
> >  
> > /* Windup the stack. */
> > 
> > 



[PATCH v6 9/9] powerpc: clean stack pointers naming

2018-10-08 Thread Christophe Leroy
Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq.c  | 17 +++--
 arch/powerpc/kernel/setup_64.c | 20 ++--
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 62cfccf4af89..754f0efc507b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *curtp, *irqtp, *sirqtp;
+   void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-   irqtp = hardirq_ctx[raw_smp_processor_id()];
-   sirqtp = softirq_ctx[raw_smp_processor_id()];
+   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   irqsp = hardirq_ctx[raw_smp_processor_id()];
+   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? */
-   if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+   if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
set_irq_regs(old_regs);
return;
}
/* Switch stack and call */
-   call_do_irq(regs, irqtp);
+   call_do_irq(regs, irqsp);
 
set_irq_regs(old_regs);
 }
@@ -732,10 +732,7 @@ void irq_ctx_init(void)
 
 void do_softirq_own_stack(void)
 {
-   void *irqtp;
-
-   irqtp = softirq_ctx[smp_processor_id()];
-   call_do_softirq(irqtp);
+   call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6792e9c90689..4912ec0320b8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -717,22 +717,22 @@ void __init emergency_stack_init(void)
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   void *ti;
+   void *sp;
 
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->emergency_sp = sp + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->nmi_emergency_sp = sp + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = alloc_stack(limit, i);
-   memset(ti, 0, THREAD_SIZE);
-   paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+   sp = alloc_stack(limit, i);
+   memset(sp, 0, THREAD_SIZE);
+   paca_ptrs[i]->mc_emergency_sp = sp + THREAD_SIZE;
 #endif
}
 }
-- 
2.13.3



[PATCH v6 8/9] powerpc/64: Remove CURRENT_THREAD_INFO

2018-10-08 Thread Christophe Leroy
Now that current_thread_info is located at the beginning of the 'current'
task struct, the CURRENT_THREAD_INFO macro is not really needed any more.

This patch replaces it by loads of the value at PACACURRENT(r13).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  4 
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index a86fead0..ca3af3e9015e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -680,7 +680,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   ld  r3, PACACURRENT(r13);   \
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -730,7 +730,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 361bb45b8990..2ee9e248c933 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6fce0f8fd8c4..06d9a7c084a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -158,7 +158,7 @@ system_call:/* label this so stack 
traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   ld  r11, PACACURRENT(r13)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -205,7 +205,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   ld  r12, PACACURRENT(r13)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   ld  r10, PACACURRENT(r13)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -735,7 +735,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -849,7 +849,7 @@ resume_kernel:
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACACURRENT(r13)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 231d066b4a3d..dfafcd0af009 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -469,7 +469,7 @@ exc_##n##_bad_stack:
\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()
\
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACACURRENT(r13);  \
ld  r10,TI_LOCAL_FLAGS(r11);\
andi.   r9,r10,_TLF_NAPPING;\
beq+1f; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index b9239dbf6d59..f776f30ecfcc 100644

[PATCH v6 7/9] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2018-10-08 Thread Christophe Leroy
Now that thread_info is similar to task_struct, its address is in r2
so CURRENT_THREAD_INFO() macro is useless. This patch removes it.

At the same time, as the 'cpu' field is no longer in thread_info,
this patch renames it to TASK_CPU.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 43 --
 arch/powerpc/kernel/epapr_hcalls.S |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S |  8 +++
 arch/powerpc/kernel/idle_e500.S|  8 +++
 arch/powerpc/kernel/misc_32.S  |  3 +--
 arch/powerpc/mm/hash_low_32.S  | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S  |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 02e7ca1c15d4..f1e2d7f7b022 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -426,7 +426,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 61c8747cd926..361bb45b8990 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(ld dest, PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 768ce602d624..31be6eb9c0d4 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -97,7 +97,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index bd3b146e18a3..d0c546ce387e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
tophys(r11,r11)
	addi	r11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
	slwi	r9,r9,3
add r11,r11,r9
 #endif
@@ -180,8 +179,7 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
+   tophys(r9, r2)
ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
 
@@ -195,8 +193,7 @@ transfer_to_handler:
	ble-	stack_ovf	/* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_6xx) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
+   tophys(r9,r2)   /* check local flags */
lwz r12,TI_LOCAL_FLAGS(r9)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
mtmsr   r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-   CURRENT_THREAD_INFO(r10, r1)
-   lwz r11,TI_FLAGS(r10)
+   lwz r11,TI_FLAGS(r2)
andi.   r11,r11,_TIF_SYSCALL_DOTRACE
	bne-	syscall_dotrace
 syscall_dotrace_cont:
@@ -379,13 +375,12 @@ ret_from_syscall:
lwz r3,GPR3(r1)
 #endif
mr  r6,r3
-   CURRENT_THREAD_INFO(r12, r1)
/* disable interrupts so current_thread_info()->flags can't change */
LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */
/* Note: We don't bother telling lockdep about it */
SYNC
MTMSRD(r10)
-   lwz r9,TI_FLAGS(r12)
+   lwz r9,TI_FLAGS(r2)
li  r8,-MAX_ERRNO
	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
	bne-	syscall_exit_work
@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
andi.   r4,r8,MSR_PR
beq 3f
-   CURRENT_THREAD_INFO(r4, r1)
-   ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+   ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
lwz r4,_LINK(r1)
@@ -526,7 +520,7 @@ syscall_exit_work:
/* Clear per-syscall TIF flags if any are set.  */
 
li  r11,_TIF_PERSYSCALL_MASK
-	addi	r12,r12,TI_FLAGS
+   addi

[PATCH v6 6/9] powerpc: 'current_set' is now a table of task_struct pointers

2018-10-08 Thread Christophe Leroy
The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they were pointing to the stack and current was taken from the
'task' field of the thread_info.

Now, as thread_info sits at the beginning of task_struct, the
pointers in the 'current_set' table are both pointers to task_struct
and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 9bc98c239305..ab0541f9da42 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 44dfd73b2a62..ba0341bd5a00 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -842,9 +842,9 @@ __secondary_start:
 #endif /* CONFIG_6xx */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c7e90f36358..48e4de4dfd0c 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index b8a2b789677e..0d27bfff52dd 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1076,8 +1076,8 @@ __secondary_start:
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f22fcbeb9898..00193643f0da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,7 +74,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
@@ -644,7 +644,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -724,7 +724,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -809,15 +809,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
  THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3



[PATCH v6 5/9] powerpc: regain entire stack space

2018-10-08 Thread Christophe Leroy
thread_info is no longer in the stack, so the entire stack
can now be used.

In the meantime, with the previous patch, pointers to the stacks
are no longer pointers to thread_info, so this patch changes them
to void*.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 --
 arch/powerpc/kernel/irq.c| 19 +--
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c|  9 +++--
 arch/powerpc/kernel/setup_64.c   |  8 
 8 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..966ddd4d2414 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,9 +48,9 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 extern void exc_lvl_ctx_init(void);
 #else
 #define exc_lvl_ctx_init()
@@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void);
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index b225c7f7c5a4..e763342265a2 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -331,8 +331,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
-#define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT	((unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 833d189df04c..768ce602d624 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -93,7 +93,6 @@ int main(void)
DEFINE(NMI_MASK, NMI_MASK);
OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index fa7a69ffb37a..bd3b146e18a3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3fdb6b6973cf..62cfccf4af89 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - sizeof(struct thread_info));
+   if (unlikely(sp < 2048)) {
+   pr_err("do_IRQ: stack overflow: %ld\n", sp);
dump_stack();
}
 #endif
@@ -660,7 +659,7 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs 

[PATCH v6 4/9] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h to avoid circular inclusion and without
including asm/asm-offsets.h to avoid symbol names duplication
between ASM constants and C constants.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  8 +-
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 ++--
 arch/powerpc/kernel/asm-offsets.c  |  7 +++--
 arch/powerpc/kernel/entry_32.S |  9 +++
 arch/powerpc/kernel/exceptions-64e.S   | 11 
 arch/powerpc/kernel/head_32.S  |  6 ++---
 arch/powerpc/kernel/head_44x.S |  4 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +-
 arch/powerpc/kernel/head_fsl_booke.S   |  7 +++--
 arch/powerpc/kernel/irq.c  | 47 +-
 arch/powerpc/kernel/kgdb.c | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 ++---
 arch/powerpc/kernel/setup_64.c | 21 ---
 arch/powerpc/kernel/smp.c  |  2 +-
 arch/powerpc/net/bpf_jit32.h   |  5 ++--
 19 files changed, 52 insertions(+), 155 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 602eea723624..3b958cd4e284 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -238,6 +238,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 81552c7b46eb..02e7ca1c15d4 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -422,6 +422,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
 TOUT   := .tmp_gas_check
@@ -439,4 +446,3 @@ checkbin:
 
 
 CLEAN_FILES += $(TOUT)
-
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 447cbd1bee99..3a7e5561630b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -120,7 +120,7 @@ extern int ptrace_put_reg(struct task_struct *task, int 
regno,
  unsigned long data);
 
 #define current_pt_regs() \
-	((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)
+	((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 95b66a0c639b..93a8cd120663 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];
 
-#define raw_smp_processor_id() (current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of task_struct here, but we need access to the CPU this task is running on.
+ * Instead of using task_struct we're using _TASK_CPU which is extracted from
+ * asm-offsets.h by kbuild to get the current processor ID.
+ *
+ * This also needs to be safeguarded when building asm-offsets.s because at
+ * that time _TASK_CPU is not defined yet. It could have been guarded by
+ * _TASK_CPU itself, but we want the build to fail if _TASK_CPU is missing
+ * when building something else than asm-offsets.s
+ */
+#ifdef GENERATING_ASM_OFFSETS
+#define raw_smp_processor_id() (0)
+#else
+#define raw_smp_processor_id() (*(unsigned int *)((void *)current + _TASK_CPU))

[PATCH v6 3/9] powerpc: Prepare for moving thread_info into task_struct

2018-10-08 Thread Christophe Leroy
This patch cleans the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point to the new stack, so change it to void* and
rename it 'sp'.
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current.
- Remove unnecessary casts to thread_info, as they'll become invalid
once thread_info is not in the stack anymore.
- Rename THREAD_INFO to TASK_STACK: as it is in fact the offset of the
pointer to the stack in task_struct, this field will not be impacted
by the move of thread_info.
- Make TASK_STACK available to PPC64. PPC64 will need it to get the
stack pointer from current once thread_info has been moved.
- Modify klp_init_thread_info() to take a task_struct pointer argument.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/irq.h   |  4 ++--
 arch/powerpc/include/asm/livepatch.h |  7 ---
 arch/powerpc/include/asm/processor.h |  4 ++--
 arch/powerpc/include/asm/reg.h   |  2 +-
 arch/powerpc/kernel/asm-offsets.c|  2 +-
 arch/powerpc/kernel/entry_32.S   |  2 +-
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/head_32.S|  4 ++--
 arch/powerpc/kernel/head_40x.S   |  4 ++--
 arch/powerpc/kernel/head_44x.S   |  2 +-
 arch/powerpc/kernel/head_8xx.S   |  2 +-
 arch/powerpc/kernel/head_booke.h |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S |  4 ++--
 arch/powerpc/kernel/irq.c|  2 +-
 arch/powerpc/kernel/misc_32.S|  4 ++--
 arch/powerpc/kernel/process.c|  8 
 arch/powerpc/kernel/setup-common.c   |  2 +-
 arch/powerpc/kernel/setup_32.c   | 15 +--
 arch/powerpc/kernel/smp.c|  4 +++-
 19 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..8a81d10ccc82 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,14 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
+   struct thread_info *ti = task_thread_info(p);
/* + 1 to account for STACK_END_MAGIC */
-   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+   ti->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 13589274fe9b..b225c7f7c5a4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -332,7 +332,7 @@ struct thread_struct {
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
 #define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) &init_stack)
+	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 640a4d818772..d2528a0b2f5b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1058,7 +1058,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index a6d70fd2e499..c583a02e5a21 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -91,10 +91,10 @@ int main(void)
DEFINE(NMI_MASK, NMI_MASK);
OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);

[PATCH v6 2/9] powerpc: Only use task_struct 'cpu' field on SMP

2018-10-08 Thread Christophe Leroy
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that task_struct 'cpu' field is not used directly out of SMP code.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S| 4 
 arch/powerpc/xmon/xmon.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index e2750b856c8f..05b574f416b3 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ set_ivor:
li  r0,0
	stwu	r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
 
+#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r22, r1)
stw r24, TI_CPU(r22)
+#endif
 
bl  early_init
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 695b24a2d954..2f0fe8bfc078 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
or  r4,r4,r5
mtspr   SPRN_HID1,r4
 
+#ifdef CONFIG_SMP
/* Store new HID1 image */
CURRENT_THREAD_INFO(r6, r1)
lwz r6,TI_CPU(r6)
	slwi	r6,r6,2
+#else
+   li  r6, 0
+#endif
addis   r6,r6,nap_save_hid1@ha
stw r4,nap_save_hid1@l(r6)
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c70d17c9a6ba..1731793e1277 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2986,7 +2986,7 @@ static void show_task(struct task_struct *tsk)
printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp,
tsk->pid, tsk->parent->pid,
-   state, task_thread_info(tsk)->cpu,
+   state, task_cpu(tsk),
tsk->comm);
 }
 
-- 
2.13.3



[PATCH v6 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-10-08 Thread Christophe Leroy
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK, which
moves thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes since v5:
 - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
 - Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
 - Fixed a build failure on 32-bit SMP when include/generated/asm-offsets.h
   does not already exist; it was due to spaces instead of a tab in the Makefile

Changes since RFC v3: (based on Nick's review)
 - Renamed task_size.h to task_size_user64.h to better relate to what it 
contains.
 - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs 
moved to a separate patch.
 - Removed CURRENT_THREAD_INFO macro completely.
 - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
defined.
 - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
 - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
 - Fixed a few commit logs
 - Fixed checkpatch report.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (9):
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Remove CURRENT_THREAD_INFO
  powerpc: clean stack pointers naming

 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  8 ++-
 arch/powerpc/include/asm/asm-prototypes.h  |  4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |  2 +-
 arch/powerpc/include/asm/exception-64s.h   |  4 +-
 arch/powerpc/include/asm/irq.h | 14 ++---
 arch/powerpc/include/asm/livepatch.h   |  7 ++-
 arch/powerpc/include/asm/processor.h   | 39 +
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/reg.h |  2 +-
 arch/powerpc/include/asm/smp.h | 17 +-
 arch/powerpc/include/asm/task_size_user64.h| 42 ++
 arch/powerpc/include/asm/thread_info.h | 19 ---
 arch/powerpc/kernel/asm-offsets.c  | 10 ++--
 arch/powerpc/kernel/entry_32.S | 66 --
 arch/powerpc/kernel/entry_64.S | 12 ++--
 arch/powerpc/kernel/epapr_hcalls.S |  5 +-
 arch/powerpc/kernel/exceptions-64e.S   | 13 +
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/head_32.S  | 14 ++---
 arch/powerpc/kernel/head_40x.S |  4 +-
 arch/powerpc/kernel/head_44x.S |  8 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_8xx.S |  2 +-
 arch/powerpc/kernel/head_booke.h   | 12 +---
 arch/powerpc/kernel/head_fsl_booke.S   | 16 +++---
 arch/powerpc/kernel/idle_6xx.S |  8 +--
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_e500.S|  8 +--
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/irq.c  | 77 +-
 arch/powerpc/kernel/kgdb.c | 28 --
 arch/powerpc/kernel/machine_kexec_64.c |  6 +-
 arch/powerpc/kernel/misc_32.S  | 17 +++---
 arch/powerpc/kernel/process.c  | 17 +++---
 arch/powerpc/kernel/setup-common.c |  2 +-
 arch/powerpc/kernel/setup_32.c | 15 ++---
 arch/powerpc/kernel/setup_64.c | 41 --
 arch/powerpc/kernel/smp.c  | 16 +++---
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +-
 arch/powerpc/kvm/book3s_hv_hmi.c   |  1 +
 arch/powerpc/mm/hash_low_32.S  | 14 ++---
 arch/powerpc/net/bpf_jit32.h   

[PATCH v6 1/9] book3s/64: avoid circular header inclusion in mmu-hash.h

2018-10-08 Thread Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h
includes asm/current.h. This generates a circular dependency.
To avoid that, asm/processor.h must not be included in mmu-hash.h.

In order to do that, this patch moves the information from
asm/processor.h required by mmu-hash.h into a new header called
asm/task_size_user64.h.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h  | 34 +-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c  |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_user64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index e0e4ce8f77d6..02955d867067 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include <asm/processor.h>
+#include <asm/task_size_user64.h>
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 52fadded5c1e..13589274fe9b 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -101,40 +101,8 @@ void release_thread(struct task_struct *);
 #endif
 
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x0000400000000000UL)
-#define TASK_SIZE_128TB (0x0000800000000000UL)
-#define TASK_SIZE_512TB (0x0002000000000000UL)
-#define TASK_SIZE_1PB   (0x0004000000000000UL)
-#define TASK_SIZE_2PB   (0x0008000000000000UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010000000000000UL)
 
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
+#include 
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size_user64.h 
b/arch/powerpc/include/asm/task_size_user64.h
new file mode 100644
index ..a4043075864b
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size_user64.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H
+#define _ASM_POWERPC_TASK_SIZE_USER64_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x0000400000000000UL)
+#define TASK_SIZE_128TB (0x0000800000000000UL)
+#define TASK_SIZE_512TB (0x0002000000000000UL)
+#define TASK_SIZE_1PB   (0x0004000000000000UL)
+#define TASK_SIZE_2PB   (0x0008000000000000UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010000000000000UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64   TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void wait_for_subcore_guest_exit(void)
 {
-- 
2.13.3



Re: [PATCH -next] powerpc/powernv: Fix debugfs_simple_attr.cocci warnings

2018-10-08 Thread Michael Ellerman
YueHaibing  writes:
> Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
> for debugfs files.
>
> Semantic patch information:
> Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
> imposes some significant overhead as compared to
> DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Sorry this isn't detailed enough for me to actually understand the
pros/cons of this patch.

Perhaps I'm expected to know it, but I don't.

I had a look at what each macro produces and it wasn't obvious to me
what the benefit is.

cheers

> Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
>
> Signed-off-by: YueHaibing 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 84d038e..0cb6548 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -311,8 +311,8 @@ static int memtrace_enable_get(void *data, u64 *val)
>   return 0;
>  }
>  
> -DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
> - memtrace_enable_set, "0x%016llx\n");
> +DEFINE_DEBUGFS_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
> +  memtrace_enable_set, "0x%016llx\n");
>  
>  static int memtrace_init(void)
>  {
> @@ -321,8 +321,8 @@ static int memtrace_init(void)
>   if (!memtrace_debugfs_dir)
>   return -1;
>  
> - debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
> - NULL, &memtrace_init_fops);
> + debugfs_create_file_unsafe("enable", 0600, memtrace_debugfs_dir, NULL,
> +   &memtrace_init_fops);
>  
>   return 0;
>  }


Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()

2018-10-08 Thread Michael Ellerman
Jann Horn  writes:
> On Fri, Oct 5, 2018 at 3:21 PM Michael Ellerman  wrote:
>> Recently we implemented show_user_instructions() which dumps the code
>> around the NIP when a user space process dies with an unhandled
>> signal. This was modelled on the x86 code, and we even went so far as
>> to implement the exact same bug, namely that if the user process
>> crashed with its NIP pointing into the kernel we will dump kernel text
>> to dmesg. eg:
>>
>>   bad-bctr[2996]: segfault (11) at c001 nip c001 lr 
>> 12d0b0894 code 1
>>   bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 
>> 38810020 fb810050 7f8802a6
>>   bad-bctr[2996]: code: 3860001c f8010080 48242371 6000 <7c7b1b79> 
>> 4082002c e8010080 eb610048
>>
>> This was discovered on x86 by Jann Horn and fixed in commit
>> 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode 
>> RIP").
>>
>> Fix it by checking the adjusted NIP value (pc) and number of
>> instructions against USER_DS, and bail if we fail the check, eg:
>
> This fix looks good to me.

Thanks.

> In the long term, I think it is somewhat awkward to use
> probe_kernel_address(), which uses set_fs(KERNEL_DS), when you
> actually just want to access userspace memory. It might make sense to
> provide a better helper for explicitly accessing memory with USER_DS.

Yes I agree, it's a bit messy. A probe_user_read() that sets USER_DS and
does the access_ok() check would be less error prone I think.

cheers


Re: [PATCH v5 05/33] KVM: PPC: Book3S HV: Extract PMU save/restore operations as C-callable functions

2018-10-08 Thread Madhavan Srinivasan




On Monday 08 October 2018 11:00 AM, Paul Mackerras wrote:

This pulls out the assembler code that is responsible for saving and
restoring the PMU state for the host and guest into separate functions
so they can be used from an alternate entry path.  The calling
convention is made compatible with C.


Reviewed-by: Madhavan Srinivasan 



Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
  arch/powerpc/include/asm/asm-prototypes.h |   5 +
  arch/powerpc/kvm/book3s_hv_interrupts.S   |  95 
  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 363 --
  3 files changed, 253 insertions(+), 210 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1f4691c..024e8fc 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -150,4 +150,9 @@ extern s32 patch__memset_nocache, patch__memcpy_nocache;

  extern long flush_count_cache;

+void kvmhv_save_host_pmu(void);
+void kvmhv_load_host_pmu(void);
+void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
+void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
+
  #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S 
b/arch/powerpc/kvm/book3s_hv_interrupts.S
index 666b91c..a6d1001 100644
--- a/arch/powerpc/kvm/book3s_hv_interrupts.S
+++ b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -64,52 +64,7 @@ BEGIN_FTR_SECTION
  END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)

/* Save host PMU registers */
-BEGIN_FTR_SECTION
-   /* Work around P8 PMAE bug */
-   li  r3, -1
-   clrrdi  r3, r3, 10
-   mfspr   r8, SPRN_MMCR2
-   mtspr   SPRN_MMCR2, r3  /* freeze all counters using MMCR2 */
-   isync
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
-   li  r3, 1
-   sldir3, r3, 31  /* MMCR0_FC (freeze counters) bit */
-   mfspr   r7, SPRN_MMCR0  /* save MMCR0 */
-   mtspr   SPRN_MMCR0, r3  /* freeze all counters, disable 
interrupts */
-   mfspr   r6, SPRN_MMCRA
-   /* Clear MMCRA in order to disable SDAR updates */
-   li  r5, 0
-   mtspr   SPRN_MMCRA, r5
-   isync
-   lbz r5, PACA_PMCINUSE(r13)  /* is the host using the PMU? */
-   cmpwi   r5, 0
-   beq 31f /* skip if not */
-   mfspr   r5, SPRN_MMCR1
-   mfspr   r9, SPRN_SIAR
-   mfspr   r10, SPRN_SDAR
-   std r7, HSTATE_MMCR0(r13)
-   std r5, HSTATE_MMCR1(r13)
-   std r6, HSTATE_MMCRA(r13)
-   std r9, HSTATE_SIAR(r13)
-   std r10, HSTATE_SDAR(r13)
-BEGIN_FTR_SECTION
-   mfspr   r9, SPRN_SIER
-   std r8, HSTATE_MMCR2(r13)
-   std r9, HSTATE_SIER(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
-   mfspr   r3, SPRN_PMC1
-   mfspr   r5, SPRN_PMC2
-   mfspr   r6, SPRN_PMC3
-   mfspr   r7, SPRN_PMC4
-   mfspr   r8, SPRN_PMC5
-   mfspr   r9, SPRN_PMC6
-   stw r3, HSTATE_PMC1(r13)
-   stw r5, HSTATE_PMC2(r13)
-   stw r6, HSTATE_PMC3(r13)
-   stw r7, HSTATE_PMC4(r13)
-   stw r8, HSTATE_PMC5(r13)
-   stw r9, HSTATE_PMC6(r13)
-31:
+   bl  kvmhv_save_host_pmu

/*
 * Put whatever is in the decrementer into the
@@ -161,3 +116,51 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
ld  r0, PPC_LR_STKOFF(r1)
mtlrr0
blr
+
+_GLOBAL(kvmhv_save_host_pmu)
+BEGIN_FTR_SECTION
+   /* Work around P8 PMAE bug */
+   li  r3, -1
+   clrrdi  r3, r3, 10
+   mfspr   r8, SPRN_MMCR2
+   mtspr   SPRN_MMCR2, r3  /* freeze all counters using MMCR2 */
+   isync
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+   li  r3, 1
+   sldir3, r3, 31  /* MMCR0_FC (freeze counters) bit */
+   mfspr   r7, SPRN_MMCR0  /* save MMCR0 */
+   mtspr   SPRN_MMCR0, r3  /* freeze all counters, disable 
interrupts */
+   mfspr   r6, SPRN_MMCRA
+   /* Clear MMCRA in order to disable SDAR updates */
+   li  r5, 0
+   mtspr   SPRN_MMCRA, r5
+   isync
+   lbz r5, PACA_PMCINUSE(r13)  /* is the host using the PMU? */
+   cmpwi   r5, 0
+   beq 31f /* skip if not */
+   mfspr   r5, SPRN_MMCR1
+   mfspr   r9, SPRN_SIAR
+   mfspr   r10, SPRN_SDAR
+   std r7, HSTATE_MMCR0(r13)
+   std r5, HSTATE_MMCR1(r13)
+   std r6, HSTATE_MMCRA(r13)
+   std r9, HSTATE_SIAR(r13)
+   std r10, HSTATE_SDAR(r13)
+BEGIN_FTR_SECTION
+   mfspr   r9, SPRN_SIER
+   std r8, HSTATE_MMCR2(r13)
+   std r9, HSTATE_SIER(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+   mfspr   r3, SPRN_PMC1
+   mfspr   r5, SPRN_PMC2
+   mfspr   r6, SPRN_PMC3
+   mfspr   r7, SPRN_PMC4
+   mfspr   r8, SPRN_PMC5
+   mfspr   r9, SPRN_PMC6
+   stw r3, HSTATE_PMC1(r13)
+ 

Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()

2018-10-08 Thread Michael Ellerman
Christophe LEROY  writes:
> Le 05/10/2018 à 15:21, Michael Ellerman a écrit :
>> Recently we implemented show_user_instructions() which dumps the code
...
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index 913c5725cdb2..bb6ac471a784 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1306,6 +1306,16 @@ void show_user_instructions(struct pt_regs *regs)
>>   
>>  pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));
>>   
>> +/*
>> + * Make sure the NIP points at userspace, not kernel text/data or
>> + * elsewhere.
>> + */
>> +if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) {
>> +pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
>> +current->comm, current->pid);
>> +return;
>> +}
>> +
>
> This will conflict with my serie 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=64611 
> which changes instructions_to_print to a constant. Will you merge it or 
> do you expect me to rebase my serie ?

I can fix it up.

But I see you've already rebased it and resent, you're too quick for me :)

cheers


Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-08 Thread Geert Uytterhoeven
Hi Rob,

On Fri, Oct 5, 2018 at 6:59 PM Rob Herring  wrote:
> Convert Renesas SoC bindings to DT schema format using json-schema.
>
> Cc: Simon Horman 
> Cc: Magnus Damm 
> Cc: Mark Rutland 
> Cc: linux-renesas-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Herring 

Thanks for your patch!

Note that this will need a rebase, as more SoCs/boards have been added
in -next.

> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml
> @@ -0,0 +1,205 @@
> +# SPDX-License-Identifier: None

The old file didn't have an SPDX header, so it was GPL-2.0, implicitly?

> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/bindings/arm/shmobile.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings
> +
> +maintainers:
> +  - Geert Uytterhoeven 

Simon Horman  (supporter:ARM/SHMOBILE ARM ARCHITECTURE)
Magnus Damm  (supporter:ARM/SHMOBILE ARM ARCHITECTURE)

You had it right in the CC list, though...

> +  - description: RZ/G1M (R8A77430)
> +items:
> +  - enum:
> +  # iWave Systems RZ/G1M Qseven Development Platform 
> (iW-RainboW-G20D-Qseven)
> +  - iwave,g20d
> +  - const: iwave,g20m
> +  - const: renesas,r8a7743
> +
> +  - items:
> +  - enum:
> +  # iWave Systems RZ/G1M Qseven System On Module 
> (iW-RainboW-G20M-Qseven)
> +  - iwave,g20m
> +  - const: renesas,r8a7743
> +
> +  - description: RZ/G1N (R8A77440)
> +items:
> +  - enum:
> +  - renesas,sk-rzg1m # SK-RZG1M (YR8A77430S000BE)

This board belongs under the RZ/G1M section above
(see also the 7743 in the part number).

> +  - const: renesas,r8a7744

> +  - description: Kingfisher (SBEV-RCAR-KF-M03)
> +items:
> +  - const: shimafuji,kingfisher
> +  - enum:
> +  - renesas,h3ulcb
> +  - renesas,m3ulcb
> +  - enum:
> +  - renesas,r8a7795
> +  - renesas,r8a7796

This looks a bit funny: all other entries have the "const" last, and
use it for the
SoC number. May be correct, though.
To clarify, this is an extension board that can fit both the [HM]3ULCB
boards (actually also the new M3NULCB, I think).

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC PATCH kernel] vfio/spapr_tce: Get rid of possible infinite loop

2018-10-08 Thread Serhii Popovych
Alexey Kardashevskiy wrote:
> As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered
> memory. If there is a bug in memory release, the loop in
> tce_iommu_release() becomes infinite; this actually happened to me.
> 
> This makes the loop finite and prints a warning on every failure, to
> make such bugs easier to spot.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index b1a8ab3..ece0651 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -371,6 +371,7 @@ static void tce_iommu_release(void *iommu_data)
>  {
>   struct tce_container *container = iommu_data;
>   struct tce_iommu_group *tcegrp;
> + struct tce_iommu_prereg *tcemem, *tmtmp;
>   long i;
>  
>   while (tce_groups_attached(container)) {
> @@ -393,13 +394,8 @@ static void tce_iommu_release(void *iommu_data)
>   tce_iommu_free_table(container, tbl);
>   }
>  
> - while (!list_empty(&container->prereg_list)) {
> - struct tce_iommu_prereg *tcemem;
> -
> - tcemem = list_first_entry(&container->prereg_list,
> - struct tce_iommu_prereg, next);
> - WARN_ON_ONCE(tce_iommu_prereg_free(container, tcemem));
> - }
> + list_for_each_entry_safe(tcemem, tmtmp, &container->prereg_list, next)
> + WARN_ON(tce_iommu_prereg_free(container, tcemem));

I'm not sure that tce_iommu_prereg_free() call under WARN_ON() is good
idea because WARN_ON() is a preprocessor macro:

  if a CONFIG_WARN=n option were ever added, by analogy with
  CONFIG_BUG=n defining WARN_ON() as empty, we would lose the call to
  tce_iommu_prereg_free(), leaking resources.

There is no problem at the moment: WARN_ON() is defined for PPC in
arch/powerpc/include/asm/bug.h unconditionally.

So your first version with intermediate variable looks better to me.

>  
>   tce_iommu_disable(container);
>   if (container->mm)
> 


-- 
Thanks,
Serhii





Re: [PATCH 36/36] dt-bindings: arm: Convert ZTE board/soc bindings to json-schema

2018-10-08 Thread Shawn Guo
On Fri, Oct 05, 2018 at 11:58:48AM -0500, Rob Herring wrote:
> Convert ZTE SoC bindings to DT schema format using json-schema.
> 
> Cc: Jun Nie 
> Cc: Baoyou Xie 
> Cc: Shawn Guo 
> Cc: Mark Rutland 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Herring 

Acked-by: Shawn Guo 


Re: [PATCH 05/36] dt-bindings: arm: renesas: Move 'renesas,prr' binding to its own doc

2018-10-08 Thread Geert Uytterhoeven
Hi Rob,

On Fri, Oct 5, 2018 at 6:58 PM Rob Herring  wrote:
> In preparation to convert board-level bindings to json-schema, move
> various misc SoC bindings out to their own file.
>
> Cc: Mark Rutland 
> Cc: Simon Horman 
> Cc: Magnus Damm 
> Cc: devicet...@vger.kernel.org
> Cc: linux-renesas-...@vger.kernel.org
> Signed-off-by: Rob Herring 

Looks good to me, but needs a rebase, as the PRR section has been extended
in -next.

Gr{oetje,eeting}s,

Geert



Re: [PATCH] powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y

2018-10-08 Thread Nicholas Piggin
On Mon, 08 Oct 2018 15:08:31 +1100
Benjamin Herrenschmidt  wrote:

> HMIs will crash the kernel due to
> 
>   BRANCH_LINK_TO_FAR(hmi_exception_realmode)
> 
> Calling into the OPD instead of the actual code.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> This hack fixes it for me, but it's not great. Nick, any better idea ?

Is it a hack because the ifdef gunk, or because there's something
deeper wrong with using the .sym?

I guess all those handlers that load label address by hand could have
the bug silently creep in. Can we have them use the DOTSYM() macro?

Thanks,
Nick

> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index ea04dfb..752709cc8 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1119,7 +1119,11 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
>   EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
>   EXCEPTION_PROLOG_COMMON_3(0xe60)
>   addir3,r1,STACK_FRAME_OVERHEAD
> +#ifdef PPC64_ELF_ABI_v1
> + BRANCH_LINK_TO_FAR(.hmi_exception_realmode) /* Function call ABI */
> +#else
>   BRANCH_LINK_TO_FAR(hmi_exception_realmode) /* Function call ABI */
> +#endif
>   cmpdi   cr0,r3,0
>  
>   /* Windup the stack. */
> 
> 



Re: [PATCH 22/36] dt-bindings: arm: Convert FSL board/soc bindings to json-schema

2018-10-08 Thread Shawn Guo
On Fri, Oct 05, 2018 at 11:58:34AM -0500, Rob Herring wrote:
> Convert Freescale SoC bindings to DT schema format using json-schema.
> 
> Cc: Shawn Guo 
> Cc: Mark Rutland 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/bindings/arm/armadeus.txt  |   6 -
>  Documentation/devicetree/bindings/arm/bhf.txt |   6 -
>  .../bindings/arm/compulab-boards.txt  |  25 ---
>  Documentation/devicetree/bindings/arm/fsl.txt | 185 --
>  .../devicetree/bindings/arm/fsl.yaml  | 166 
>  .../devicetree/bindings/arm/i2se.txt  |  22 ---
>  .../devicetree/bindings/arm/olimex.txt|  10 -
>  .../devicetree/bindings/arm/technologic.txt   |  23 ---
>  8 files changed, 166 insertions(+), 277 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/arm/armadeus.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/bhf.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/compulab-boards.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/fsl.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/fsl.yaml
>  delete mode 100644 Documentation/devicetree/bindings/arm/i2se.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/olimex.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/technologic.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/armadeus.txt 
> b/Documentation/devicetree/bindings/arm/armadeus.txt
> deleted file mode 100644
> index 9821283ff516..000000000000
> --- a/Documentation/devicetree/bindings/arm/armadeus.txt
> +++ /dev/null
> @@ -1,6 +0,0 @@
> -Armadeus i.MX Platforms Device Tree Bindings
> 
> -
> -APF51: i.MX51 based module.
> -Required root node properties:
> -- compatible = "armadeus,imx51-apf51", "fsl,imx51";
> diff --git a/Documentation/devicetree/bindings/arm/bhf.txt 
> b/Documentation/devicetree/bindings/arm/bhf.txt
> deleted file mode 100644
> index 886b503caf9c..000000000000
> --- a/Documentation/devicetree/bindings/arm/bhf.txt
> +++ /dev/null
> @@ -1,6 +0,0 @@
> -Beckhoff Automation Platforms Device Tree Bindings
> ---
> -
> -CX9020 Embedded PC
> -Required root node properties:
> -- compatible = "bhf,cx9020", "fsl,imx53";
> diff --git a/Documentation/devicetree/bindings/arm/compulab-boards.txt 
> b/Documentation/devicetree/bindings/arm/compulab-boards.txt
> deleted file mode 100644
> index 42a10285af9c..000000000000
> --- a/Documentation/devicetree/bindings/arm/compulab-boards.txt
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -CompuLab SB-SOM is a multi-module baseboard capable of carrying:
> - - CM-T43
> - - CM-T54
> - - CM-QS600
> - - CL-SOM-AM57x
> - - CL-SOM-iMX7
> -modules with minor modifications to the SB-SOM assembly.
> -
> -Required root node properties:
> -- compatible = should be "compulab,sb-som"
> -
> -Compulab CL-SOM-iMX7 is a miniature System-on-Module (SoM) based on
> -Freescale i.MX7 ARM Cortex-A7 System-on-Chip.
> -
> -Required root node properties:
> -- compatible = "compulab,cl-som-imx7", "fsl,imx7d";
> -
> -Compulab SBC-iMX7 is a single board computer based on the
> -Freescale i.MX7 system-on-chip. SBC-iMX7 is implemented with
> -the CL-SOM-iMX7 System-on-Module providing most of the functions,
> -and SB-SOM-iMX7 carrier board providing additional peripheral
> -functions and connectors.
> -
> -Required root node properties:
> -- compatible = "compulab,sbc-imx7", "compulab,cl-som-imx7", "fsl,imx7d";
> diff --git a/Documentation/devicetree/bindings/arm/fsl.txt 
> b/Documentation/devicetree/bindings/arm/fsl.txt
> deleted file mode 100644
> index 1e775aaa5c5b..000000000000
> --- a/Documentation/devicetree/bindings/arm/fsl.txt
> +++ /dev/null
> @@ -1,185 +0,0 @@
> -Freescale i.MX Platforms Device Tree Bindings
> 
> -
> -i.MX23 Evaluation Kit
> -Required root node properties:
> -- compatible = "fsl,imx23-evk", "fsl,imx23";
> -
> -i.MX25 Product Development Kit
> -Required root node properties:
> -- compatible = "fsl,imx25-pdk", "fsl,imx25";
> -
> -i.MX27 Product Development Kit
> -Required root node properties:
> -- compatible = "fsl,imx27-pdk", "fsl,imx27";
> -
> -i.MX28 Evaluation Kit
> -Required root node properties:
> -- compatible = "fsl,imx28-evk", "fsl,imx28";
> -
> -i.MX51 Babbage Board
> -Required root node properties:
> -- compatible = "fsl,imx51-babbage", "fsl,imx51";
> -
> -i.MX53 Automotive Reference Design Board
> -Required root node properties:
> -- compatible = "fsl,imx53-ard", "fsl,imx53";
> -
> -i.MX53 Evaluation Kit
> -Required root node properties:
> -- compatible = "fsl,imx53-evk", "fsl,imx53";
> -
> -i.MX53 Quick Start Board
> -Required root node properties:
> -- compatible = "fsl,imx53-qsb", "fsl,imx53";
> -
> -i.MX53 Smart Mobile Reference Design Board
> -Required root node properties:
> -- 

Re: [PATCH 06/36] dt-bindings: arm: zte: Move sysctrl bindings to their own doc

2018-10-08 Thread Shawn Guo
On Fri, Oct 05, 2018 at 11:58:18AM -0500, Rob Herring wrote:
> In preparation to convert board-level bindings to json-schema, move
> various misc SoC bindings out to their own file.
> 
> Cc: Mark Rutland 
> Cc: Jun Nie 
> Cc: Baoyou Xie 
> Cc: Shawn Guo 
> Cc: devicet...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/bindings/arm/zte-sysctrl.txt   | 30 +++

zte,sysctrl.txt to be consistent with other files like
fsl,layerscape-dcfg.txt?  I'm fine with either way, but just want to
see more consistent naming convention?  Other than that,

Acked-by: Shawn Guo 


Re: [PATCH 04/36] dt-bindings: arm: fsl: Move DCFG and SCFG bindings to their own docs

2018-10-08 Thread Shawn Guo
On Fri, Oct 05, 2018 at 11:58:16AM -0500, Rob Herring wrote:
> In preparation to convert board-level bindings to json-schema, move
> various misc SoC bindings out to their own file.
> 
> Cc: Shawn Guo 
> Cc: Mark Rutland 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Herring 

Acked-by: Shawn Guo 


Re: [PATCH v4 6/6] arm64: dts: add LX2160ARDB board support

2018-10-08 Thread Shawn Guo
On Thu, Oct 04, 2018 at 06:33:51AM +0530, Vabhav Sharma wrote:
> LX2160A reference design board (RDB) is a high-performance
> computing, evaluation, and development platform with LX2160A
> SoC.
> 
> Signed-off-by: Priyanka Jain 
> Signed-off-by: Sriram Dash 
> Signed-off-by: Vabhav Sharma 
> ---
>  arch/arm64/boot/dts/freescale/Makefile|   1 +
>  arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 100 
> ++
>  2 files changed, 101 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> 
> diff --git a/arch/arm64/boot/dts/freescale/Makefile 
> b/arch/arm64/boot/dts/freescale/Makefile
> index 86e18ad..445b72b 100644
> --- a/arch/arm64/boot/dts/freescale/Makefile
> +++ b/arch/arm64/boot/dts/freescale/Makefile
> @@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
>  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
>  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
>  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
> +dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts 
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> new file mode 100644
> index 0000000..1483071
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +//
> +// Device Tree file for LX2160ARDB
> +//
> +// Copyright 2018 NXP
> +
> +/dts-v1/;
> +
> +#include "fsl-lx2160a.dtsi"
> +
> +/ {
> + model = "NXP Layerscape LX2160ARDB";
> + compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
> +
> + chosen {
> + stdout-path = "serial0:115200n8";
> + };
> +
> + sb_3v3: regulator-fixed {

The node should probably be named something like regulator-sb3v3, so
that the pattern can be followed when another fixed regulator is
added.

> + compatible = "regulator-fixed";
> + regulator-name = "fixed-3.3V";

The name should be something we can find on board schematics.

> + regulator-min-microvolt = <3300000>;
> + regulator-max-microvolt = <3300000>;
> + regulator-boot-on;
> + regulator-always-on;
> + };
> +
> +};
> +
> + {
> + status = "okay";
> +};
> +
> + {
> + status = "okay";
> +};
> +
> + {

Please keep these labeled nodes sorted alphabetically.

> + status = "okay";

Have a newline between properties and child node.

> + i2c-mux@77 {
> + compatible = "nxp,pca9547";
> + reg = <0x77>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + i2c@2 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <0x2>;
> +
> + power-monitor@40 {
> + compatible = "ti,ina220";
> + reg = <0x40>;
> + shunt-resistor = <1000>;
> + };
> + };
> +
> + i2c@3 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <0x3>;
> +
> + temperature-sensor@4c {
> + compatible = "nxp,sa56004";
> + reg = <0x4c>;
> + vcc-supply = <&sb_3v3>;
> + };
> +
> + temperature-sensor@4d {
> + compatible = "nxp,sa56004";
> + reg = <0x4d>;
> + vcc-supply = <&sb_3v3>;
> + };
> + };
> + };
> +};
> +
> + {
> + status = "okay";
> +
> + rtc@51 {
> + compatible = "nxp,pcf2129";
> + reg = <0x51>;
> + // IRQ10_B
> + interrupts = <0 150 0x4>;
> + };

Bad indentation.

Shawn

> +
> +};
> +
> + {
> + status = "okay";
> +};
> +
> + {
> + status = "okay";
> +};
> +
> + {
> + status = "okay";
> +};
> -- 
> 2.7.4
> 


Re: [PATCH v4 5/6] arm64: dts: add QorIQ LX2160A SoC support

2018-10-08 Thread Shawn Guo
On Thu, Oct 04, 2018 at 06:33:50AM +0530, Vabhav Sharma wrote:
> LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.
> 
> LX2160A features 16 advanced 64-bit ARM v8 Cortex-A72 processor cores
> in 8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C
> controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA and 4 PL011
> SBSA UARTs.
> 
> Signed-off-by: Ramneek Mehresh 
> Signed-off-by: Zhang Ying-22455 
> Signed-off-by: Nipun Gupta 
> Signed-off-by: Priyanka Jain 
> Signed-off-by: Yogesh Gaur 
> Signed-off-by: Sriram Dash 
> Signed-off-by: Vabhav Sharma 
> ---
>  arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 702 
> +
>  1 file changed, 702 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> new file mode 100644
> index 0000000..c758268
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -0,0 +1,702 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +//
> +// Device Tree Include file for Layerscape-LX2160A family SoC.
> +//
> +// Copyright 2018 NXP
> +
> +#include 
> +
> +/memreserve/ 0x8000 0x0001;
> +
> +/ {
> + compatible = "fsl,lx2160a";
> + interrupt-parent = <>;
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + // 8 clusters having 2 Cortex-A72 cores each
> + cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + enable-method = "psci";
> + reg = <0x0>;
> + clocks = < 1 0>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + i-cache-size = <0xC000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <192>;
> + next-level-cache = <_l2>;
> + };
> +
> + cpu@1 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + enable-method = "psci";
> + reg = <0x1>;
> + clocks = < 1 0>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + i-cache-size = <0xC000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <192>;
> + next-level-cache = <_l2>;
> + };
> +
> + cpu@100 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + enable-method = "psci";
> + reg = <0x100>;
> + clocks = < 1 1>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + i-cache-size = <0xC000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <192>;
> + next-level-cache = <_l2>;
> + };
> +
> + cpu@101 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + enable-method = "psci";
> + reg = <0x101>;
> + clocks = < 1 1>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + i-cache-size = <0xC000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <192>;
> + next-level-cache = <_l2>;
> + };
> +
> + cpu@200 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + enable-method = "psci";
> + reg = <0x200>;
> + clocks = < 1 2>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + i-cache-size = <0xC000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <192>;
> + next-level-cache = <_l2>;
> + };
> +
> + cpu@201 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + enable-method = "psci";
> + reg = <0x201>;
> + clocks = < 1 2>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> +  

[PATCH v5 33/33] KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result

2018-10-08 Thread Paul Mackerras
This adds a KVM_PPC_NO_HASH flag to the flags field of the
kvm_ppc_smmu_info struct, and arranges for it to be set when
running as a nested hypervisor, as an unambiguous indication
to userspace that HPT guests are not supported.  Reporting the
KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as
indicating only that the new HPT features in ISA V3.0 are not
supported, leaving it ambiguous whether pre-V3.0 HPT features
are supported.

Signed-off-by: Paul Mackerras 
---
 Documentation/virtual/kvm/api.txt | 4 
 arch/powerpc/kvm/book3s_hv.c  | 4 
 include/uapi/linux/kvm.h  | 1 +
 3 files changed, 9 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index fde48b6..df98b63 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2270,6 +2270,10 @@ The supported flags are:
 The emulated MMU supports 1T segments in addition to the
 standard 256M ones.
 
+- KVM_PPC_NO_HASH
+   This flag indicates that HPT guests are not supported by KVM,
+   thus all guests must use radix MMU mode.
+
 The "slb_size" field indicates how many SLB entries are supported
 
 The "sps" array contains 8 entries indicating the supported base
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fa61647..f565403 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4245,6 +4245,10 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm *kvm,
	kvmppc_add_seg_page_size(&sps, 16, SLB_VSID_L | SLB_VSID_LP_01);
	kvmppc_add_seg_page_size(&sps, 24, SLB_VSID_L);
 
+   /* If running as a nested hypervisor, we don't support HPT guests */
+   if (kvmhv_on_pseries())
+   info->flags |= KVM_PPC_NO_HASH;
+
return 0;
 }
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9cec6b..7f2ff3a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -719,6 +719,7 @@ struct kvm_ppc_one_seg_page_size {
 
 #define KVM_PPC_PAGE_SIZES_REAL	0x00000001
 #define KVM_PPC_1T_SEGMENTS	0x00000002
+#define KVM_PPC_NO_HASH	0x00000004
 
 struct kvm_ppc_smmu_info {
__u64 flags;
-- 
2.7.4



[PATCH v5 32/33] KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization

2018-10-08 Thread Paul Mackerras
With this, userspace can enable a KVM-HV guest to run nested guests
under it.

The administrator can control whether any nested guests can be run;
setting the "nested" module parameter to false prevents any guests
becoming nested hypervisors (that is, any attempt to enable the nested
capability on a guest will fail).  Guests which are already nested
hypervisors will continue to be so.

Signed-off-by: Paul Mackerras 
---
 Documentation/virtual/kvm/api.txt  | 14 ++
 arch/powerpc/include/asm/kvm_ppc.h |  1 +
 arch/powerpc/kvm/book3s_hv.c   | 39 +-
 arch/powerpc/kvm/powerpc.c | 12 
 include/uapi/linux/kvm.h   |  1 +
 5 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 2f5f9b7..fde48b6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4532,6 +4532,20 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
 a #GP would be raised when the guest tries to access. Currently, this
 capability does not enable write permissions of this MSR for the guest.
 
+7.16 KVM_CAP_PPC_NESTED_HV
+
+Architectures: ppc
+Parameters: none
+Returns: 0 on success, -EINVAL when the implementation doesn't support
+nested-HV virtualization.
+
+HV-KVM on POWER9 and later systems allows for "nested-HV"
+virtualization, which provides a way for a guest VM to run guests that
+can run using the CPU's supervisor mode (privileged non-hypervisor
+state).  Enabling this capability on a VM depends on the CPU having
+the necessary functionality and on the facility being enabled with a
+kvm-hv module parameter.
+
 8. Other capabilities.
 --
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 245e564..b3796bd 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -327,6 +327,7 @@ struct kvmppc_ops {
int (*set_smt_mode)(struct kvm *kvm, unsigned long mode,
unsigned long flags);
void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr);
+   int (*enable_nested)(struct kvm *kvm);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 152bf75..fa61647 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect, 0644);
 MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
 #endif
 
+/* If set, guests are allowed to create and control nested guests */
+static bool nested = true;
+module_param(nested, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)");
+
+static inline bool nesting_enabled(struct kvm *kvm)
+{
+   return kvm->arch.nested_enable && kvm_is_radix(kvm);
+}
+
 /* If set, the threads on each CPU core have to be in the same MMU mode */
 static bool no_mixing_hpt_and_radix;
 
@@ -959,12 +969,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 
case H_SET_PARTITION_TABLE:
ret = H_FUNCTION;
-   if (vcpu->kvm->arch.nested_enable)
+   if (nesting_enabled(vcpu->kvm))
ret = kvmhv_set_partition_table(vcpu);
break;
case H_ENTER_NESTED:
ret = H_FUNCTION;
-   if (!vcpu->kvm->arch.nested_enable)
+   if (!nesting_enabled(vcpu->kvm))
break;
ret = kvmhv_enter_nested_guest(vcpu);
if (ret == H_INTERRUPT) {
@@ -974,9 +984,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
case H_TLB_INVALIDATE:
ret = H_FUNCTION;
-   if (!vcpu->kvm->arch.nested_enable)
-   break;
-   ret = kvmhv_do_nested_tlbie(vcpu);
+   if (nesting_enabled(vcpu->kvm))
+   ret = kvmhv_do_nested_tlbie(vcpu);
break;
 
default:
@@ -4496,10 +4505,8 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 /* Must be called with kvm->lock held and mmu_ready = 0 and no vcpus running */
 int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
 {
-   if (kvm->arch.nested_enable) {
-   kvm->arch.nested_enable = false;
+   if (nesting_enabled(kvm))
kvmhv_release_all_nested(kvm);
-   }
kvmppc_free_radix(kvm);
kvmppc_update_lpcr(kvm, LPCR_VPM1,
   LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR);
@@ -4776,7 +4783,7 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 
/* Perform global invalidation and return lpid to the pool */
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
-   if (kvm->arch.nested_enable)
+   if (nesting_enabled(kvm))
 

[PATCH v5 30/33] KVM: PPC: Book3S HV: Allow HV module to load without hypervisor mode

2018-10-08 Thread Paul Mackerras
With this, the KVM-HV module can be loaded in a guest running under
KVM-HV, and if the hypervisor supports nested virtualization, this
guest can now act as a nested hypervisor and run nested guests.

This also adds some checks to inform userspace that HPT guests are not
supported by nested hypervisors (by returning false for the
KVM_CAP_PPC_MMU_HASH_V3 capability), and to prevent userspace from
configuring a guest to use HPT mode.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 16 
 arch/powerpc/kvm/powerpc.c   |  3 ++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 127bb5f..152bf75 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4807,11 +4807,15 @@ static int kvmppc_core_emulate_mfspr_hv(struct kvm_vcpu *vcpu, int sprn,
 
 static int kvmppc_core_check_processor_compat_hv(void)
 {
-   if (!cpu_has_feature(CPU_FTR_HVMODE) ||
-   !cpu_has_feature(CPU_FTR_ARCH_206))
-   return -EIO;
+   if (cpu_has_feature(CPU_FTR_HVMODE) &&
+   cpu_has_feature(CPU_FTR_ARCH_206))
+   return 0;
 
-   return 0;
+   /* POWER9 in radix mode is capable of being a nested hypervisor. */
+   if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
+   return 0;
+
+   return -EIO;
 }
 
 #ifdef CONFIG_KVM_XICS
@@ -5129,6 +5133,10 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
if (radix && !radix_enabled())
return -EINVAL;
 
+   /* If we're a nested hypervisor, we currently only support radix */
+   if (kvmhv_on_pseries() && !radix)
+   return -EINVAL;
+
	mutex_lock(&kvm->lock);
if (radix != kvm_is_radix(kvm)) {
if (kvm->arch.mmu_ready) {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index eba5756..1f4b128 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -594,7 +594,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !!(hv_enabled && radix_enabled());
break;
case KVM_CAP_PPC_MMU_HASH_V3:
-   r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300));
+   r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) &&
+  cpu_has_feature(CPU_FTR_HVMODE));
break;
 #endif
case KVM_CAP_SYNC_MMU:
-- 
2.7.4



[PATCH v5 31/33] KVM: PPC: Book3S HV: Add nested shadow page tables to debugfs

2018-10-08 Thread Paul Mackerras
This adds a list of valid shadow PTEs for each nested guest to
the 'radix' file for the guest in debugfs.  This can be useful for
debugging.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 39 +---
 arch/powerpc/kvm/book3s_hv_nested.c  | 15 
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 83d4def..6d29814 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -120,6 +120,7 @@ struct rmap_nested {
 struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
  bool create);
 void kvmhv_put_nested(struct kvm_nested_guest *gp);
+int kvmhv_nested_next_lpid(struct kvm *kvm, int lpid);
 
 /* Encoding of first parameter for H_TLB_INVALIDATE */
 #define H_TLBIE_P1_ENC(ric, prs, r)(___PPC_RIC(ric) | ___PPC_PRS(prs) | \
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index ae0e3ed..43b21e8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1002,6 +1002,7 @@ struct debugfs_radix_state {
struct kvm  *kvm;
struct mutexmutex;
unsigned long   gpa;
+   int lpid;
int chars_left;
int buf_index;
charbuf[128];
@@ -1043,6 +1044,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
struct kvm *kvm;
unsigned long gpa;
pgd_t *pgt;
+   struct kvm_nested_guest *nested;
pgd_t pgd, *pgdp;
pud_t pud, *pudp;
pmd_t pmd, *pmdp;
@@ -1077,10 +1079,39 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
}
 
gpa = p->gpa;
-   pgt = kvm->arch.pgtable;
-   while (len != 0 && gpa < RADIX_PGTABLE_RANGE) {
+   nested = NULL;
+   pgt = NULL;
+   while (len != 0 && p->lpid >= 0) {
+   if (gpa >= RADIX_PGTABLE_RANGE) {
+   gpa = 0;
+   pgt = NULL;
+   if (nested) {
+   kvmhv_put_nested(nested);
+   nested = NULL;
+   }
+   p->lpid = kvmhv_nested_next_lpid(kvm, p->lpid);
+   p->hdr = 0;
+   if (p->lpid < 0)
+   break;
+   }
+   if (!pgt) {
+   if (p->lpid == 0) {
+   pgt = kvm->arch.pgtable;
+   } else {
+   nested = kvmhv_get_nested(kvm, p->lpid, false);
+   if (!nested) {
+   gpa = RADIX_PGTABLE_RANGE;
+   continue;
+   }
+   pgt = nested->shadow_pgtable;
+   }
+   }
+   n = 0;
if (!p->hdr) {
-   n = scnprintf(p->buf, sizeof(p->buf),
+   if (p->lpid > 0)
+   n = scnprintf(p->buf, sizeof(p->buf),
+ "\nNested LPID %d: ", p->lpid);
+   n += scnprintf(p->buf + n, sizeof(p->buf) - n,
  "pgdir: %lx\n", (unsigned long)pgt);
p->hdr = 1;
goto copy;
@@ -1146,6 +1177,8 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
}
}
p->gpa = gpa;
+   if (nested)
+   kvmhv_put_nested(nested);
 
  out:
	mutex_unlock(&p->mutex);
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 3f21f78..401d2ec 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1274,3 +1274,18 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu)
	mutex_unlock(&gp->tlb_lock);
return ret;
 }
+
+int kvmhv_nested_next_lpid(struct kvm *kvm, int lpid)
+{
+   int ret = -1;
+
	spin_lock(&kvm->mmu_lock);
+   while (++lpid <= kvm->arch.max_nested_lpid) {
+   if (kvm->arch.nested_guests[lpid]) {
+   ret = lpid;
+   break;
+   }
+   }
	spin_unlock(&kvm->mmu_lock);
+   return ret;
+}
-- 
2.7.4



[PATCH v5 29/33] KVM: PPC: Book3S HV: Handle differing endianness for H_ENTER_NESTED

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

The hcall H_ENTER_NESTED takes two parameters: the address in L1 guest
memory of a hv_regs struct and the address of a pt_regs struct.  The
hcall requests the L0 hypervisor to use the register values in these
structs to run a L2 guest and to return the exit state of the L2 guest
in these structs.  These are in the endianness of the L1 guest, rather
than being always big-endian as is usually the case for PAPR
hypercalls.

This is convenient because it means that the L1 guest can pass the
address of the regs field in its kvm_vcpu_arch struct.  This also
improves performance slightly by avoiding the need for two copies of
the pt_regs struct.

When reading/writing these structures, this patch handles the case
where the endianness of the L1 guest differs from that of the L0
hypervisor, by byteswapping the structures after reading and before
writing them back.

Since all the fields of the pt_regs are of the same type, i.e.,
unsigned long, we treat it as an array of unsigned longs.  The fields
of struct hv_guest_state are not all the same, so its fields are
byteswapped individually.

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 51 -
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index e2305962..3f21f78 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -51,6 +51,48 @@ void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
hr->ppr = vcpu->arch.ppr;
 }
 
+static void byteswap_pt_regs(struct pt_regs *regs)
+{
+   unsigned long *addr = (unsigned long *) regs;
+
+   for (; addr < ((unsigned long *) (regs + 1)); addr++)
+   *addr = swab64(*addr);
+}
+
+static void byteswap_hv_regs(struct hv_guest_state *hr)
+{
+   hr->version = swab64(hr->version);
+   hr->lpid = swab32(hr->lpid);
+   hr->vcpu_token = swab32(hr->vcpu_token);
+   hr->lpcr = swab64(hr->lpcr);
+   hr->pcr = swab64(hr->pcr);
+   hr->amor = swab64(hr->amor);
+   hr->dpdes = swab64(hr->dpdes);
+   hr->hfscr = swab64(hr->hfscr);
+   hr->tb_offset = swab64(hr->tb_offset);
+   hr->dawr0 = swab64(hr->dawr0);
+   hr->dawrx0 = swab64(hr->dawrx0);
+   hr->ciabr = swab64(hr->ciabr);
+   hr->hdec_expiry = swab64(hr->hdec_expiry);
+   hr->purr = swab64(hr->purr);
+   hr->spurr = swab64(hr->spurr);
+   hr->ic = swab64(hr->ic);
+   hr->vtb = swab64(hr->vtb);
+   hr->hdar = swab64(hr->hdar);
+   hr->hdsisr = swab64(hr->hdsisr);
+   hr->heir = swab64(hr->heir);
+   hr->asdr = swab64(hr->asdr);
+   hr->srr0 = swab64(hr->srr0);
+   hr->srr1 = swab64(hr->srr1);
+   hr->sprg[0] = swab64(hr->sprg[0]);
+   hr->sprg[1] = swab64(hr->sprg[1]);
+   hr->sprg[2] = swab64(hr->sprg[2]);
+   hr->sprg[3] = swab64(hr->sprg[3]);
+   hr->pidr = swab64(hr->pidr);
+   hr->cfar = swab64(hr->cfar);
+   hr->ppr = swab64(hr->ppr);
+}
+
 static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
 struct hv_guest_state *hr)
 {
@@ -175,6 +217,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
  sizeof(struct hv_guest_state));
if (err)
return H_PARAMETER;
+   if (kvmppc_need_byteswap(vcpu))
+		byteswap_hv_regs(&l2_hv);
if (l2_hv.version != HV_GUEST_STATE_VERSION)
return H_P2;
 
@@ -183,7 +227,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
  sizeof(struct pt_regs));
if (err)
return H_PARAMETER;
-
+   if (kvmppc_need_byteswap(vcpu))
+		byteswap_pt_regs(&l2_regs);
if (l2_hv.vcpu_token >= NR_CPUS)
return H_PARAMETER;
 
@@ -255,6 +300,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
kvmhv_put_nested(l2);
 
/* copy l2_hv_state and regs back to guest */
+   if (kvmppc_need_byteswap(vcpu)) {
+		byteswap_hv_regs(&l2_hv);
+		byteswap_pt_regs(&l2_regs);
+   }
	err = kvm_vcpu_write_guest(vcpu, hv_ptr, &l2_hv,
   sizeof(struct hv_guest_state));
if (err)
-- 
2.7.4



[PATCH v5 28/33] KVM: PPC: Book3S HV: Sanitise hv_regs on nested guest entry

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

restore_hv_regs() is used to copy the hv_regs L1 wants to set to run the
nested (L2) guest into the vcpu structure. We need to sanitise these
values to ensure we don't let the L1 guest hypervisor do things we don't
want it to.

We don't let data address watchpoints or completed instruction address
breakpoints be set to match in hypervisor state.

We also don't let L1 enable features in the hypervisor facility status
and control register (HFSCR) for L2 which we have disabled for L1. That
is L2 will get the subset of features which the L0 hypervisor has
enabled for L1 and the features L1 wants to enable for L2. This could
mean we give L1 a hypervisor facility unavailable interrupt for a
facility it thinks it has enabled, however it shouldn't have enabled a
facility it itself doesn't have for the L2 guest.

We sanitise the registers when copying in the L2 hv_regs. We don't need
to sanitise when copying back the L1 hv_regs since these shouldn't be
able to contain invalid values as they're just what was copied out.

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/reg.h  |  1 +
 arch/powerpc/kvm/book3s_hv_nested.c | 17 +
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 6fda746..c9069897 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -415,6 +415,7 @@
 #define   HFSCR_DSCR   __MASK(FSCR_DSCR_LG)
 #define   HFSCR_VECVSX __MASK(FSCR_VECVSX_LG)
 #define   HFSCR_FP __MASK(FSCR_FP_LG)
+#define   HFSCR_INTR_CAUSE (ASM_CONST(0xFF) << 56) /* interrupt cause */
 #define SPRN_TAR   0x32f   /* Target Address Register */
 #define SPRN_LPCR  0x13E   /* LPAR Control Register */
 #define   LPCR_VPM0ASM_CONST(0x8000)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index a876dc3..e2305962 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -86,6 +86,22 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
}
 }
 
+static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
+{
+   /*
+* Don't let L1 enable features for L2 which we've disabled for L1,
+* but preserve the interrupt cause field.
+*/
+   hr->hfscr &= (HFSCR_INTR_CAUSE | vcpu->arch.hfscr);
+
+   /* Don't let data address watchpoint match in hypervisor state */
+   hr->dawrx0 &= ~DAWRX_HYP;
+
+   /* Don't let completed instruction address breakpt match in HV state */
+   if ((hr->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
+   hr->ciabr &= ~CIABR_PRIV;
+}
+
 static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -198,6 +214,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
LPCR_LPES | LPCR_MER;
lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
+	sanitise_hv_regs(vcpu, &l2_hv);
	restore_hv_regs(vcpu, &l2_hv);
 
vcpu->arch.ret = RESUME_GUEST;
-- 
2.7.4



[PATCH v5 27/33] KVM: PPC: Book3S HV: Add one-reg interface to virtual PTCR register

2018-10-08 Thread Paul Mackerras
This adds a one-reg register identifier which can be used to read and
set the virtual PTCR for the guest.  This register identifies the
address and size of the virtual partition table for the guest, which
contains information about the nested guests under this guest.

Migrating this value is the only extra requirement for migrating a
guest which has nested guests (assuming of course that the destination
host supports nested virtualization in the kvm-hv module).

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 Documentation/virtual/kvm/api.txt   | 1 +
 arch/powerpc/include/uapi/asm/kvm.h | 1 +
 arch/powerpc/kvm/book3s_hv.c| 6 ++
 3 files changed, 8 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 647f941..2f5f9b7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1922,6 +1922,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TIDR  | 64
   PPC   | KVM_REG_PPC_PSSCR | 64
   PPC   | KVM_REG_PPC_DEC_EXPIRY| 64
+  PPC   | KVM_REG_PPC_PTCR  | 64
   PPC   | KVM_REG_PPC_TM_GPR0   | 64
   ...
   PPC   | KVM_REG_PPC_TM_GPR31  | 64
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 1b32b56..8c876c1 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -634,6 +634,7 @@ struct kvm_ppc_cpu_char {
 
 #define KVM_REG_PPC_DEC_EXPIRY (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbe)
 #define KVM_REG_PPC_ONLINE (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf)
+#define KVM_REG_PPC_PTCR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc0)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b8f14ea..127bb5f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1710,6 +1710,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_ONLINE:
*val = get_reg_val(id, vcpu->arch.online);
break;
+   case KVM_REG_PPC_PTCR:
+   *val = get_reg_val(id, vcpu->kvm->arch.l1_ptcr);
+   break;
default:
r = -EINVAL;
break;
@@ -1941,6 +1944,9 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
		atomic_dec(&vcpu->arch.vcore->online_count);
vcpu->arch.online = i;
break;
+   case KVM_REG_PPC_PTCR:
+   vcpu->kvm->arch.l1_ptcr = set_reg_val(id, *val);
+   break;
default:
r = -EINVAL;
break;
-- 
2.7.4



[PATCH v5 26/33] KVM: PPC: Book3S HV: Don't access HFSCR, LPIDR or LPCR when running nested

2018-10-08 Thread Paul Mackerras
When running as a nested hypervisor, this avoids reading hypervisor
privileged registers (specifically HFSCR, LPIDR and LPCR) at startup;
instead reasonable default values are used.  This also avoids writing
LPIDR in the single-vcpu entry/exit path.

Also, this removes the check for CPU_FTR_HVMODE in kvmppc_mmu_hv_init()
since its only caller already checks this.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  7 +++
 arch/powerpc/kvm/book3s_hv.c| 33 +
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 68e14af..c615617 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -268,14 +268,13 @@ int kvmppc_mmu_hv_init(void)
 {
unsigned long host_lpid, rsvd_lpid;
 
-   if (!cpu_has_feature(CPU_FTR_HVMODE))
-   return -EINVAL;
-
if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE))
return -EINVAL;
 
/* POWER7 has 10-bit LPIDs (12-bit in POWER8) */
-   host_lpid = mfspr(SPRN_LPID);
+   host_lpid = 0;
+   if (cpu_has_feature(CPU_FTR_HVMODE))
+   host_lpid = mfspr(SPRN_LPID);
rsvd_lpid = LPID_RSVD;
 
kvmppc_init_lpid(rsvd_lpid + 1);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 24a6683..b8f14ea 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2174,15 +2174,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 * Set the default HFSCR for the guest from the host value.
 * This value is only used on POWER9.
 * On POWER9, we want to virtualize the doorbell facility, so we
-* turn off the HFSCR bit, which causes those instructions to trap.
+* don't set the HFSCR_MSGP bit, and that causes those instructions
+* to trap and then we emulate them.
 */
-   vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
-   if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
+   HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
+   if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   vcpu->arch.hfscr |= HFSCR_TM;
+   }
+   if (cpu_has_feature(CPU_FTR_TM_COMP))
vcpu->arch.hfscr |= HFSCR_TM;
-   else if (!cpu_has_feature(CPU_FTR_TM_COMP))
-   vcpu->arch.hfscr &= ~HFSCR_TM;
-   if (cpu_has_feature(CPU_FTR_ARCH_300))
-   vcpu->arch.hfscr &= ~HFSCR_MSGP;
 
kvmppc_mmu_book3s_hv_init(vcpu);
 
@@ -4002,8 +4005,10 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
 
srcu_read_unlock(>srcu, srcu_idx);
 
-   mtspr(SPRN_LPID, kvm->arch.host_lpid);
-   isync();
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   mtspr(SPRN_LPID, kvm->arch.host_lpid);
+   isync();
+   }
 
trace_hardirqs_off();
set_irq_happened(trap);
@@ -4630,9 +4635,13 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
 
/* Init LPCR for virtual RMA mode */
-   kvm->arch.host_lpid = mfspr(SPRN_LPID);
-   kvm->arch.host_lpcr = lpcr = mfspr(SPRN_LPCR);
-   lpcr &= LPCR_PECE | LPCR_LPES;
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   kvm->arch.host_lpid = mfspr(SPRN_LPID);
+   kvm->arch.host_lpcr = lpcr = mfspr(SPRN_LPCR);
+   lpcr &= LPCR_PECE | LPCR_LPES;
+   } else {
+   lpcr = 0;
+   }
lpcr |= (4UL << LPCR_DPFD_SH) | LPCR_HDICE |
LPCR_VPM0 | LPCR_VPM1;
kvm->arch.vrma_slb_v = SLB_VSID_B_1T |
-- 
2.7.4



[PATCH v5 25/33] KVM: PPC: Book3S HV: Invalidate TLB when nested vcpu moves physical cpu

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

This is only done at level 0, since only level 0 knows which physical
CPU a vcpu is running on.  This does for nested guests what L0 already
did for its own guests, which is to flush the TLB on a pCPU when it
goes to run a vCPU there, and there is another vCPU in the same VM
which previously ran on this pCPU and has now started to run on another
pCPU.  This is to handle the situation where the other vCPU touched
a mapping, moved to another pCPU and did a tlbiel (local-only tlbie)
on that new pCPU and thus left behind a stale TLB entry on this pCPU.

This introduces a limit on the vcpu_token values used in the
H_ENTER_NESTED hcall -- they must now be less than NR_CPUS.

[pau...@ozlabs.org - made prev_cpu array be short[] to reduce
 memory consumption.]

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   3 +
 arch/powerpc/kvm/book3s_hv.c | 101 +++
 arch/powerpc/kvm/book3s_hv_nested.c  |   5 ++
 3 files changed, 71 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 719b31723..83d4def 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -52,6 +52,9 @@ struct kvm_nested_guest {
long refcnt;/* number of pointers to this struct */
struct mutex tlb_lock;  /* serialize page faults and tlbies */
struct kvm_nested_guest *next;
+   cpumask_t need_tlb_flush;
+   cpumask_t cpu_in_guest;
+   short prev_cpu[NR_CPUS];
 };
 
 /*
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 49f07de..24a6683 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2397,10 +2397,18 @@ static void kvmppc_release_hwthread(int cpu)
 
 static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 {
+   struct kvm_nested_guest *nested = vcpu->arch.nested;
+   cpumask_t *cpu_in_guest;
int i;
 
cpu = cpu_first_thread_sibling(cpu);
-	cpumask_set_cpu(cpu, &kvm->arch.need_tlb_flush);
+	if (nested) {
+		cpumask_set_cpu(cpu, &nested->need_tlb_flush);
+		cpu_in_guest = &nested->cpu_in_guest;
+	} else {
+		cpumask_set_cpu(cpu, &kvm->arch.need_tlb_flush);
+		cpu_in_guest = &kvm->arch.cpu_in_guest;
+   }
/*
 * Make sure setting of bit in need_tlb_flush precedes
 * testing of cpu_in_guest bits.  The matching barrier on
@@ -2408,13 +2416,23 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 */
smp_mb();
for (i = 0; i < threads_per_core; ++i)
-		if (cpumask_test_cpu(cpu + i, &kvm->arch.cpu_in_guest))
+   if (cpumask_test_cpu(cpu + i, cpu_in_guest))
smp_call_function_single(cpu + i, do_nothing, NULL, 1);
 }
 
 static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 {
+   struct kvm_nested_guest *nested = vcpu->arch.nested;
struct kvm *kvm = vcpu->kvm;
+   int prev_cpu;
+
+   if (!cpu_has_feature(CPU_FTR_HVMODE))
+   return;
+
+   if (nested)
+   prev_cpu = nested->prev_cpu[vcpu->arch.nested_vcpu_id];
+   else
+   prev_cpu = vcpu->arch.prev_cpu;
 
/*
 * With radix, the guest can do TLB invalidations itself,
@@ -2428,12 +2446,46 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 * ran to flush the TLB.  The TLB is shared between threads,
 * so we use a single bit in .need_tlb_flush for all 4 threads.
 */
-   if (vcpu->arch.prev_cpu != pcpu) {
-   if (vcpu->arch.prev_cpu >= 0 &&
-   cpu_first_thread_sibling(vcpu->arch.prev_cpu) !=
+   if (prev_cpu != pcpu) {
+   if (prev_cpu >= 0 &&
+   cpu_first_thread_sibling(prev_cpu) !=
cpu_first_thread_sibling(pcpu))
-   radix_flush_cpu(kvm, vcpu->arch.prev_cpu, vcpu);
-   vcpu->arch.prev_cpu = pcpu;
+   radix_flush_cpu(kvm, prev_cpu, vcpu);
+   if (nested)
+   nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu;
+   else
+   vcpu->arch.prev_cpu = pcpu;
+   }
+}
+
+static void kvmppc_radix_check_need_tlb_flush(struct kvm *kvm, int pcpu,
+ struct kvm_nested_guest *nested)
+{
+   cpumask_t *need_tlb_flush;
+   int lpid;
+
+   if (!cpu_has_feature(CPU_FTR_HVMODE))
+   return;
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   pcpu &= ~0x3UL;
+
+   if (nested) {
+   lpid = nested->shadow_lpid;
+		need_tlb_flush = &nested->need_tlb_flush;
+   } else {
+   

[PATCH v5 24/33] KVM: PPC: Book3S HV: Use hypercalls for TLB invalidation when nested

2018-10-08 Thread Paul Mackerras
This adds code to call the H_TLB_INVALIDATE hypercall when running as
a guest, in the cases where we need to invalidate TLBs (or other MMU
caches) as part of managing the mappings for a nested guest.  Calling
H_TLB_INVALIDATE lets the nested hypervisor inform the parent
hypervisor about changes to partition-scoped page tables or the
partition table without needing to do hypervisor-privileged tlbie
instructions.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  5 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 30 --
 arch/powerpc/kvm/book3s_hv_nested.c  | 30 --
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index c2a9146..719b31723 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_PPC_PSERIES
 static inline bool kvmhv_on_pseries(void)
@@ -117,6 +118,10 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
  bool create);
 void kvmhv_put_nested(struct kvm_nested_guest *gp);
 
+/* Encoding of first parameter for H_TLB_INVALIDATE */
+#define H_TLBIE_P1_ENC(ric, prs, r)(___PPC_RIC(ric) | ___PPC_PRS(prs) | \
+___PPC_R(r))
+
 /* Power architecture requires HPT is at least 256kiB, at most 64TiB */
 #define PPC_MIN_HPT_ORDER  18
 #define PPC_MAX_HPT_ORDER  46
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 4c1eccb..ae0e3ed 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -201,17 +201,43 @@ static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
unsigned int pshift, unsigned int lpid)
 {
unsigned long psize = PAGE_SIZE;
+   int psi;
+   long rc;
+   unsigned long rb;
 
if (pshift)
psize = 1UL << pshift;
+   else
+   pshift = PAGE_SHIFT;
 
addr &= ~(psize - 1);
-   radix__flush_tlb_lpid_page(lpid, addr, psize);
+
+   if (!kvmhv_on_pseries()) {
+   radix__flush_tlb_lpid_page(lpid, addr, psize);
+   return;
+   }
+
+   psi = shift_to_mmu_psize(pshift);
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   if (rc)
+   pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
 
 static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
 {
-   radix__flush_pwc_lpid(lpid);
+   long rc;
+
+   if (!kvmhv_on_pseries()) {
+   radix__flush_pwc_lpid(lpid);
+   return;
+   }
+
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   if (rc)
+   pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
 
 static unsigned long kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep,
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index c83c13d..486d900 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -299,14 +299,32 @@ void kvmhv_nested_exit(void)
}
 }
 
+static void kvmhv_flush_lpid(unsigned int lpid)
+{
+   long rc;
+
+   if (!kvmhv_on_pseries()) {
+   radix__flush_tlb_lpid(lpid);
+   return;
+   }
+
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   if (rc)
+   pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
+}
+
 void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1)
 {
-   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   if (!kvmhv_on_pseries()) {
mmu_partition_table_set_entry(lpid, dw0, dw1);
-   } else {
-   pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
-   pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+   return;
}
+
+   pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+   pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+   /* L0 will do the necessary barriers */
+   kvmhv_flush_lpid(lpid);
 }
 
 static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp)
@@ -493,7 +511,7 @@ static void kvmhv_flush_nested(struct kvm_nested_guest *gp)
	spin_lock(&kvm->mmu_lock);
kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable, gp->shadow_lpid);
	spin_unlock(&kvm->mmu_lock);
-   radix__flush_tlb_lpid(gp->shadow_lpid);
+   

[PATCH v5 22/33] KVM: PPC: Book3S HV: Introduce rmap to track nested guest mappings

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

When a host (L0) page which is mapped into a (L1) guest is in turn
mapped through to a nested (L2) guest we keep a reverse mapping (rmap)
so that these mappings can be retrieved later.

Whenever we create an entry in a shadow_pgtable for a nested guest we
create a corresponding rmap entry and add it to the list for the
L1 guest memslot at the index of the L1 guest page it maps. This means
at the L1 guest memslot we end up with lists of rmaps.

When we are notified of a host page being invalidated which has been
mapped through to a (L1) guest, we can then walk the rmap list for that
guest page, and find and invalidate all of the corresponding
shadow_pgtable entries.

In order to reduce memory consumption, we compress the information for
each rmap entry down to 52 bits -- 12 bits for the LPID and 40 bits
for the guest real page frame number -- which will fit in a single
unsigned long.  To avoid a scenario where a guest can trigger
unbounded memory allocations, we scan the list when adding an entry to
see if there is already an entry with the contents we need.  This can
occur, because we don't ever remove entries from the middle of a list.

A struct nested guest rmap is a list pointer and an rmap entry:

----------------
| next pointer |
----------------
| rmap entry   |
----------------

Thus the rmap pointer for each guest frame number in the memslot can be
either NULL, a single entry, or a pointer to a list of nested rmap entries.

gfn  memslot rmap array
    ---------------------
 0  | NULL              |   (no rmap entry)
    ---------------------
 1  | single rmap entry |   (rmap entry with low bit set)
    ---------------------
 2  | list head pointer |   (list of rmap entries)
    ---------------------

The final entry always has the lowest bit set and is stored in the next
pointer of the last list entry, or as a single rmap entry.
With a list of rmap entries looking like:

-----------------     -----------------     ---------------------
| list head ptr | --> | next pointer  | --> | single rmap entry |
-----------------     -----------------     ---------------------
                      | rmap entry    |
                      -----------------
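The compressed entry layout can be sketched in plain C. The mask values below are reconstructed from the layout described in this commit message; the helper names (rmap_encode and friends) are illustrative only and are not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the patch: 12-bit lpid in bits 63:52, 40-bit guest 4k page
 * frame number (address bits 51:12), and the low bit tagging a bare single
 * entry stored directly in the memslot slot instead of a list pointer. */
#define RMAP_NESTED_LPID_MASK       0xFFF0000000000000UL
#define RMAP_NESTED_LPID_SHIFT      52
#define RMAP_NESTED_GPA_MASK        0x000FFFFFFFFFF000UL
#define RMAP_NESTED_IS_SINGLE_ENTRY 0x0000000000000001UL

/* Illustrative helpers packing/unpacking a single-entry rmap value */
static uint64_t rmap_encode(uint64_t lpid, uint64_t gpa)
{
	return ((lpid << RMAP_NESTED_LPID_SHIFT) & RMAP_NESTED_LPID_MASK) |
	       (gpa & RMAP_NESTED_GPA_MASK) | RMAP_NESTED_IS_SINGLE_ENTRY;
}

static uint64_t rmap_lpid(uint64_t rmap)
{
	return (rmap & RMAP_NESTED_LPID_MASK) >> RMAP_NESTED_LPID_SHIFT;
}

static uint64_t rmap_gpa(uint64_t rmap)
{
	return rmap & RMAP_NESTED_GPA_MASK;
}

static int rmap_is_single_entry(uint64_t rmap)
{
	return (rmap & RMAP_NESTED_IS_SINGLE_ENTRY) != 0;
}
```

Because the 52 bits of payload plus the tag bit fit in one unsigned long, the memslot rmap array needs no per-gfn allocation until a second mapping for the same gfn appears.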

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|   3 +
 arch/powerpc/include/asm/kvm_book3s_64.h |  69 +++-
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  44 +++---
 arch/powerpc/kvm/book3s_hv.c |   1 +
 arch/powerpc/kvm/book3s_hv_nested.c  | 138 ++-
 5 files changed, 240 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 63f7ccf..d7aeb6f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -196,6 +196,9 @@ extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu 
*vcpu, gva_t eaddr,
int table_index, u64 *pte_ret_p);
 extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data, bool iswrite);
+extern void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa,
+   unsigned int shift, struct kvm_memory_slot *memslot,
+   unsigned int lpid);
 extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable,
bool writing, unsigned long gpa,
unsigned int lpid);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 5496152..c2a9146 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -53,6 +53,66 @@ struct kvm_nested_guest {
struct kvm_nested_guest *next;
 };
 
+/*
+ * We define a nested rmap entry as a single 64-bit quantity
+ * 0xFFF0000000000000	12-bit lpid field
+ * 0x000FFFFFFFFFF000	40-bit guest 4k page frame number
+ * 0x0000000000000001	1-bit  single entry flag
+ */
+#define RMAP_NESTED_LPID_MASK		0xFFF0000000000000UL
+#define RMAP_NESTED_LPID_SHIFT		(52)
+#define RMAP_NESTED_GPA_MASK		0x000FFFFFFFFFF000UL
+#define RMAP_NESTED_IS_SINGLE_ENTRY	0x0000000000000001UL
+
+/* Structure for a nested guest rmap entry */
+struct rmap_nested {
+   struct llist_node list;
+   u64 rmap;
+};
+
+/*
+ * for_each_nest_rmap_safe - iterate over the list of nested rmap entries
+ *  safe against removal of the list entry or NULL list
+ * @pos:   a (struct rmap_nested *) to use as a loop cursor
+ * @node:  pointer to the first entry
+ * NOTE: this can be NULL
+ * @rmapp: an (unsigned long *) in which to return the rmap entries on each
+ 

[PATCH v5 23/33] KVM: PPC: Book3S HV: Implement H_TLB_INVALIDATE hcall

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

When running a nested (L2) guest the guest (L1) hypervisor will use
the H_TLB_INVALIDATE hcall when it needs to change the partition
scoped page tables or the partition table which it manages.  It will
use this hcall in the situations where it would use a partition-scoped
tlbie instruction if it were running in hypervisor mode.

The H_TLB_INVALIDATE hcall can invalidate different scopes:

Invalidate TLB for a given target address:
- This invalidates a single L2 -> L1 pte
- We need to invalidate any L2 -> L0 shadow_pgtable ptes which map the L2
  address space which is being invalidated. This is because a single
  L2 -> L1 pte may have been mapped with more than one pte in the
  L2 -> L0 page tables.

Invalidate the entire TLB for a given LPID or for all LPIDs:
- Invalidate the entire shadow_pgtable for a given nested guest, or
  for all nested guests.

Invalidate the PWC (page walk cache) for a given LPID or for all LPIDs:
- We don't cache the PWC, so nothing to do.

Invalidate the entire TLB, PWC and partition table for a given/all LPIDs:
- Here we re-read the partition table entry and remove the nested state
  for any nested guest for which the first doubleword of the partition
  table entry is now zero.

The H_TLB_INVALIDATE hcall takes as parameters the tlbie instruction
word (of which only the RIC, PRS and R fields are used), the rS value
(giving the lpid, where required) and the rB value (giving the IS, AP
and EPN values).
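A rough sketch of the dispatch this implies is below. The RIC/PRS/R bit positions follow the Power ISA tlbie instruction encoding as extracted elsewhere in this series; the classify helper and the action names are purely illustrative, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction from the tlbie instruction word */
static int get_ric(uint32_t instr) { return (instr >> 18) & 0x3; }
static int get_prs(uint32_t instr) { return (instr >> 17) & 0x1; }
static int get_r(uint32_t instr)   { return (instr >> 16) & 0x1; }

/* Illustrative action names for the invalidation scopes listed above */
enum nested_inv_action {
	INV_BAD,	/* unsupported combination (e.g. HPT, R=0) */
	INV_TLB,	/* RIC=0: invalidate shadow_pgtable translations */
	INV_PWC_NOP,	/* RIC=1: PWC is not cached here, nothing to do */
	INV_ALL_STATE,	/* RIC=2: TLB + PWC + partition table state */
};

static enum nested_inv_action classify(uint32_t instr)
{
	if (!get_r(instr))	/* only radix is supported for nested guests */
		return INV_BAD;
	switch (get_ric(instr)) {
	case 0: return INV_TLB;
	case 1: return INV_PWC_NOP;
	case 2: return INV_ALL_STATE;
	default: return INV_BAD;
	}
}
```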

[pau...@ozlabs.org - adapted to having the partition table in guest
memory, added the H_TLB_INVALIDATE implementation, removed tlbie
instruction emulation, reworded the commit message.]

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  12 ++
 arch/powerpc/include/asm/kvm_book3s.h |   1 +
 arch/powerpc/include/asm/ppc-opcode.h |   1 +
 arch/powerpc/kvm/book3s_emulate.c |   1 -
 arch/powerpc/kvm/book3s_hv.c  |   3 +
 arch/powerpc/kvm/book3s_hv_nested.c   | 196 +-
 6 files changed, 212 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index b3520b5..66db23e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -203,6 +203,18 @@ static inline unsigned int mmu_psize_to_shift(unsigned int 
mmu_psize)
BUG();
 }
 
+static inline unsigned int ap_to_shift(unsigned long ap)
+{
+   int psize;
+
+   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
+   if (mmu_psize_defs[psize].ap == ap)
+   return mmu_psize_defs[psize].shift;
+   }
+
+   return -1;
+}
+
 static inline unsigned long get_sllp_encoding(int psize)
 {
unsigned long sllp;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index d7aeb6f..09f8e9b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -301,6 +301,7 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
 void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
+long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
 int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 665af14..6093bc8 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -104,6 +104,7 @@
 #define OP_31_XOP_LHZUX 311
 #define OP_31_XOP_MSGSNDP   142
 #define OP_31_XOP_MSGCLRP   174
+#define OP_31_XOP_TLBIE 306
 #define OP_31_XOP_MFSPR 339
 #define OP_31_XOP_LWAX  341
 #define OP_31_XOP_LHAX  343
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 2654df2..8c7e933 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -36,7 +36,6 @@
 #define OP_31_XOP_MTSR 210
 #define OP_31_XOP_MTSRIN   242
 #define OP_31_XOP_TLBIEL   274
-#define OP_31_XOP_TLBIE		306
 /* Opcode is officially reserved, reuse it as sc 1 when sc 1 doesn't trap */
 #define OP_31_XOP_FAKE_SC1 308
 #define OP_31_XOP_SLBMTE   402
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cb9e738..49f07de 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -974,6 +974,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
case H_TLB_INVALIDATE:
ret = H_FUNCTION;
+   if (!vcpu->kvm->arch.nested_enable)
+ 

[PATCH v5 21/33] KVM: PPC: Book3S HV: Handle page fault for a nested guest

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

Consider a normal (L1) guest running under the main hypervisor (L0),
and then a nested guest (L2) running under the L1 guest which is acting
as a nested hypervisor. L0 has page tables to map the address space for
L1 providing the translation from L1 real address -> L0 real address;

L1
|
| (L1 -> L0)
|
----> L0

There are also page tables in L1 used to map the address space for L2
providing the translation from L2 real address -> L1 real address. Since
the hardware can only walk a single level of page table, we need to
maintain in L0 a "shadow_pgtable" for L2 which provides the translation
from L2 real address -> L0 real address. Which looks like;

L2                 L2
|                  |
| (L2 -> L1)       |
|                  |
----> L1           | (L2 -> L0)
      |            |
      | (L1 -> L0) |
      |            |
      ----> L0     ----> L0

When a page fault occurs while running a nested (L2) guest we need to
insert a pte into this "shadow_pgtable" for the L2 -> L0 mapping. To
do this we need to:

1. Walk the pgtable in L1 memory to find the L2 -> L1 mapping, and
   provide a page fault to L1 if this mapping doesn't exist.
2. Use our L1 -> L0 pgtable to convert this L1 address to an L0 address,
   or try to insert a pte for that mapping if it doesn't exist.
3. Now we have a L2 -> L0 mapping, insert this into our shadow_pgtable
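A toy model of the composition in step 3, assuming the two lookups above succeeded (the page-frame arrays stand in for the real radix trees, and the function name is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define NO_MAPPING ((uint64_t)-1)

/* Pretend page-frame tables: index = input pfn, value = translated pfn.
 * Entries left at 0 are just toy defaults. */
static uint64_t l2_to_l1[8] = {
	[2] = 0x101,		/* L2 pfn 2 maps to L1 pfn 0x101 */
	[3] = NO_MAPPING,	/* no L2 -> L1 mapping: fault goes to L1 */
};
static uint64_t l1_to_l0[512] = {
	[0x101] = 0x90,		/* L1 pfn 0x101 maps to host (L0) pfn 0x90 */
};

/* Compose the two translations to build the shadow (L2 -> L0) entry */
static uint64_t shadow_translate(uint64_t l2_pfn)
{
	uint64_t l1_pfn = l2_to_l1[l2_pfn];

	if (l1_pfn == NO_MAPPING)
		return NO_MAPPING;	/* would reflect a fault to L1 */
	return l1_to_l0[l1_pfn];	/* would go into the shadow_pgtable */
}
```

The real handler also has to cope with the L1 -> L0 mapping being absent (insert it first) and with differing page sizes between the two levels, which the toy arrays gloss over.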

Once this mapping exists we can take rc faults when hardware is unable
to automatically set the reference and change bits in the pte. On these
we need to:

1. Check the rc bits on the L2 -> L1 pte match, and otherwise reflect
   the fault down to L1.
2. Set the rc bits in the L1 -> L0 pte which corresponds to the same
   host page.
3. Set the rc bits in the L2 -> L0 pte.

As we reuse a large number of functions in book3s_64_mmu_radix.c for
this we also needed to refactor a number of these functions to take
an lpid parameter so that the correct lpid is used for tlb invalidations.
The functionality however has remained the same.

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |   1 +
 arch/powerpc/include/asm/kvm_book3s.h  |  17 ++
 arch/powerpc/include/asm/kvm_book3s_64.h   |   4 +
 arch/powerpc/include/asm/kvm_host.h|   2 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 194 ++--
 arch/powerpc/kvm/book3s_hv_nested.c| 332 -
 arch/powerpc/mm/tlb-radix.c|   9 +
 7 files changed, 473 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 1154a6d..671316f 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -53,6 +53,7 @@ extern void radix__flush_tlb_lpid_page(unsigned int lpid,
unsigned long addr,
unsigned long page_size);
 extern void radix__flush_pwc_lpid(unsigned int lpid);
+extern void radix__flush_tlb_lpid(unsigned int lpid);
 extern void radix__local_flush_tlb_lpid(unsigned int lpid);
 extern void radix__local_flush_tlb_lpid_guest(unsigned int lpid);
 
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 093fd70..63f7ccf 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -188,17 +188,34 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm 
*kvm, unsigned long hc);
 extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
struct kvm_vcpu *vcpu,
unsigned long ea, unsigned long dsisr);
+extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
+ struct kvmppc_pte *gpte, u64 root,
+ u64 *pte_ret_p);
 extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, u64 table,
int table_index, u64 *pte_ret_p);
 extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data, bool iswrite);
+extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable,
+   bool writing, unsigned long gpa,
+   unsigned int lpid);
+extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
+   unsigned long gpa,
+   struct kvm_memory_slot *memslot,
+

[PATCH v5 20/33] KVM: PPC: Book3S HV: Handle hypercalls correctly when nested

2018-10-08 Thread Paul Mackerras
When we are running as a nested hypervisor, we use a hypercall to
enter the guest rather than code in book3s_hv_rmhandlers.S.  This means
that the hypercall handlers listed in hcall_real_table never get called.
There are some hypercalls that are handled there and not in
kvmppc_pseries_do_hcall(), which therefore won't get processed for
a nested guest.

To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those
hypercalls, with the following exceptions:

- The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because
  we only support radix mode for nested guests.

- H_CEDE has to be handled specially because the cede logic in
  kvmhv_run_single_vcpu assumes that it has been processed by the time
  that kvmhv_p9_guest_entry() returns.  Therefore we put a special
  case for H_CEDE in kvmhv_p9_guest_entry().

For the XICS hypercalls, if real-mode processing is enabled, then the
virtual-mode handlers assume that they are being called only to finish
up the operation.  Therefore we turn off the real-mode flag in the XICS
code when running as a nested hypervisor.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 +++
 arch/powerpc/kvm/book3s_hv.c  | 43 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  2 ++
 arch/powerpc/kvm/book3s_xics.c|  3 ++-
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 5c9b00c..c55ba3b 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -167,4 +167,8 @@ void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
 
 int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
 
+long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);
+long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr,
+   unsigned long dabrx);
+
 #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index dd84252..dc25461 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -915,6 +916,19 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
}
return RESUME_HOST;
+   case H_SET_DABR:
+   ret = kvmppc_h_set_dabr(vcpu, kvmppc_get_gpr(vcpu, 4));
+   break;
+   case H_SET_XDABR:
+   ret = kvmppc_h_set_xdabr(vcpu, kvmppc_get_gpr(vcpu, 4),
+   kvmppc_get_gpr(vcpu, 5));
+   break;
+   case H_GET_TCE:
+   ret = kvmppc_h_get_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+   kvmppc_get_gpr(vcpu, 5));
+   if (ret == H_TOO_HARD)
+   return RESUME_HOST;
+   break;
case H_PUT_TCE:
ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
kvmppc_get_gpr(vcpu, 5),
@@ -938,6 +952,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
if (ret == H_TOO_HARD)
return RESUME_HOST;
break;
+   case H_RANDOM:
+	if (!powernv_get_random_long(&vcpu->arch.regs.gpr[4]))
+   ret = H_HARDWARE;
+   break;
 
case H_SET_PARTITION_TABLE:
ret = H_FUNCTION;
@@ -966,6 +984,24 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
 }
 
+/*
+ * Handle H_CEDE in the nested virtualization case where we haven't
+ * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
+ * This has to be done early, not in kvmppc_pseries_do_hcall(), so
+ * that the cede logic in kvmppc_run_single_vcpu() works properly.
+ */
+static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.shregs.msr |= MSR_EE;
+   vcpu->arch.ceded = 1;
+   smp_mb();
+   if (vcpu->arch.prodded) {
+   vcpu->arch.prodded = 0;
+   smp_mb();
+   vcpu->arch.ceded = 0;
+   }
+}
+
 static int kvmppc_hcall_impl_hv(unsigned long cmd)
 {
switch (cmd) {
@@ -3422,6 +3458,13 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
+
+   /* H_CEDE has to be handled now, not later */
+   if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
+   kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
+   kvmppc_nested_cede(vcpu);
+   trap = 0;
+   }
} else {
trap = 

[PATCH v5 19/33] KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor

2018-10-08 Thread Paul Mackerras
This adds code to call the H_IPI and H_EOI hypercalls when we are
running as a nested hypervisor (i.e. without the CPU_FTR_HVMODE cpu
feature) and we would otherwise access the XICS interrupt controller
directly or via an OPAL call.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c |  7 +-
 arch/powerpc/kvm/book3s_hv_builtin.c | 44 +---
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  8 +++
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d58a4a6..dd84252 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -173,6 +173,10 @@ static bool kvmppc_ipi_thread(int cpu)
 {
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 
+   /* If we're a nested hypervisor, fall back to ordinary IPIs for now */
+   if (kvmhv_on_pseries())
+   return false;
+
/* On POWER9 we can use msgsnd to IPI any cpu */
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
msg |= get_hard_smp_processor_id(cpu);
@@ -5173,7 +5177,8 @@ static int kvmppc_book3s_init_hv(void)
 * indirectly, via OPAL.
 */
 #ifdef CONFIG_SMP
-   if (!xive_enabled() && !local_paca->kvm_hstate.xics_phys) {
+   if (!xive_enabled() && !kvmhv_on_pseries() &&
+   !local_paca->kvm_hstate.xics_phys) {
struct device_node *np;
 
np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index ccfea5b..a71e2fc 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -231,6 +231,15 @@ void kvmhv_rm_send_ipi(int cpu)
void __iomem *xics_phys;
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 
+   /* For a nested hypervisor, use the XICS via hcall */
+   if (kvmhv_on_pseries()) {
+   unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+   plpar_hcall_raw(H_IPI, retbuf, get_hard_smp_processor_id(cpu),
+   IPI_PRIORITY);
+   return;
+   }
+
/* On POWER9 we can use msgsnd for any destination cpu. */
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
msg |= get_hard_smp_processor_id(cpu);
@@ -460,12 +469,19 @@ static long kvmppc_read_one_intr(bool *again)
return 1;
 
/* Now read the interrupt from the ICP */
-   xics_phys = local_paca->kvm_hstate.xics_phys;
-   rc = 0;
-   if (!xics_phys)
-   rc = opal_int_get_xirr(&xirr, false);
-   else
-   xirr = __raw_rm_readl(xics_phys + XICS_XIRR);
+   if (kvmhv_on_pseries()) {
+   unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+   rc = plpar_hcall_raw(H_XIRR, retbuf, 0xFF);
+   xirr = cpu_to_be32(retbuf[0]);
+   } else {
+   xics_phys = local_paca->kvm_hstate.xics_phys;
+   rc = 0;
+   if (!xics_phys)
+   rc = opal_int_get_xirr(&xirr, false);
+   else
+   xirr = __raw_rm_readl(xics_phys + XICS_XIRR);
+   }
if (rc < 0)
return 1;
 
@@ -494,7 +510,13 @@ static long kvmppc_read_one_intr(bool *again)
 */
if (xisr == XICS_IPI) {
rc = 0;
-   if (xics_phys) {
+   if (kvmhv_on_pseries()) {
+   unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+   plpar_hcall_raw(H_IPI, retbuf,
+   hard_smp_processor_id(), 0xff);
+   plpar_hcall_raw(H_EOI, retbuf, h_xirr);
+   } else if (xics_phys) {
__raw_rm_writeb(0xff, xics_phys + XICS_MFRR);
__raw_rm_writel(xirr, xics_phys + XICS_XIRR);
} else {
@@ -520,7 +542,13 @@ static long kvmppc_read_one_intr(bool *again)
/* We raced with the host,
 * we need to resend that IPI, bummer
 */
-   if (xics_phys)
+   if (kvmhv_on_pseries()) {
+   unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+   plpar_hcall_raw(H_IPI, retbuf,
+   hard_smp_processor_id(),
+   IPI_PRIORITY);
+   } else if (xics_phys)
__raw_rm_writeb(IPI_PRIORITY,
xics_phys + XICS_MFRR);
else
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 8b9f356..b3f5786 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -767,6 +767,14 @@ static void 

[PATCH v5 18/33] KVM: PPC: Book3S HV: Nested guest entry via hypercall

2018-10-08 Thread Paul Mackerras
This adds a new hypercall, H_ENTER_NESTED, which is used by a nested
hypervisor to enter one of its nested guests.  The hypercall supplies
register values in two structs.  Those values are copied by the level 0
(L0) hypervisor (the one which is running in hypervisor mode) into the
vcpu struct of the L1 guest, and then the guest is run until an
interrupt or error occurs which needs to be reported to L1 via the
hypercall return value.

Currently this assumes that the L0 and L1 hypervisors are the same
endianness, and the structs passed as arguments are in native
endianness.  If they are of different endianness, the version number
check will fail and the hcall will be rejected.

Nested hypervisors do not support indep_threads_mode=N, so this adds
code to print a warning message if the administrator has set
indep_threads_mode=N, and to treat it as Y.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/hvcall.h   |  36 +
 arch/powerpc/include/asm/kvm_book3s.h   |   7 +
 arch/powerpc/include/asm/kvm_host.h |   5 +
 arch/powerpc/kernel/asm-offsets.c   |   1 +
 arch/powerpc/kvm/book3s_hv.c| 214 -
 arch/powerpc/kvm/book3s_hv_nested.c | 230 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   8 ++
 7 files changed, 471 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index c95c651..45e8789 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -466,6 +466,42 @@ struct h_cpu_char_result {
u64 behaviour;
 };
 
+/* Register state for entering a nested guest with H_ENTER_NESTED */
+struct hv_guest_state {
+   u64 version;/* version of this structure layout */
+   u32 lpid;
+   u32 vcpu_token;
+   /* These registers are hypervisor privileged (at least for writing) */
+   u64 lpcr;
+   u64 pcr;
+   u64 amor;
+   u64 dpdes;
+   u64 hfscr;
+   s64 tb_offset;
+   u64 dawr0;
+   u64 dawrx0;
+   u64 ciabr;
+   u64 hdec_expiry;
+   u64 purr;
+   u64 spurr;
+   u64 ic;
+   u64 vtb;
+   u64 hdar;
+   u64 hdsisr;
+   u64 heir;
+   u64 asdr;
+   /* These are OS privileged but need to be set late in guest entry */
+   u64 srr0;
+   u64 srr1;
+   u64 sprg[4];
+   u64 pidr;
+   u64 cfar;
+   u64 ppr;
+};
+
+/* Latest version of hv_guest_state structure */
+#define HV_GUEST_STATE_VERSION 1
+
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 43f212e..093fd70 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -280,6 +280,13 @@ void kvmhv_vm_nested_init(struct kvm *kvm);
 long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
 void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
+long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
+int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu,
+ u64 time_limit, unsigned long lpcr);
+void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
+void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu,
+  struct hv_guest_state *hr);
+long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
 
 void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index c35d4f2..ceb9f20 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -95,6 +95,7 @@ struct dtl_entry;
 
 struct kvmppc_vcpu_book3s;
 struct kvmppc_book3s_shadow_vcpu;
+struct kvm_nested_guest;
 
 struct kvm_vm_stat {
ulong remote_tlb_flush;
@@ -786,6 +787,10 @@ struct kvm_vcpu_arch {
u32 emul_inst;
 
u32 online;
+
+   /* For support of nested guests */
+   struct kvm_nested_guest *nested;
+   u32 nested_vcpu_id;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 7c3738d..d0abcbb 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -503,6 +503,7 @@ int main(void)
OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
OFFSET(VCPU_VPA_DIRTY, kvm_vcpu, arch.vpa.dirty);
OFFSET(VCPU_HEIR, kvm_vcpu, arch.emul_inst);
+   OFFSET(VCPU_NESTED, kvm_vcpu, arch.nested);
OFFSET(VCPU_CPU, kvm_vcpu, cpu);
OFFSET(VCPU_THREAD_CPU, kvm_vcpu, arch.thread_cpu);
 #endif
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4c72f2f..d58a4a6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -942,6 +942,13 @@ int 

[PATCH v5 17/33] KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

2018-10-08 Thread Paul Mackerras
This starts the process of adding the code to support nested HV-style
virtualization.  It defines a new H_SET_PARTITION_TABLE hypercall which
a nested hypervisor can use to set the base address and size of a
partition table in its memory (analogous to the PTCR register).
On the host (level 0 hypervisor) side, the H_SET_PARTITION_TABLE
hypercall from the guest is handled by code that saves the virtual
PTCR value for the guest.

This also adds code for creating and destroying nested guests and for
reading the partition table entry for a nested guest from L1 memory.
Each nested guest has its own shadow LPID value, different in general
from the LPID value used by the nested hypervisor to refer to it.  The
shadow LPID value is allocated at nested guest creation time.

Nested hypervisor functionality is only available for a radix guest,
which therefore means a radix host on a POWER9 (or later) processor.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/hvcall.h |   5 +
 arch/powerpc/include/asm/kvm_book3s.h |  10 +-
 arch/powerpc/include/asm/kvm_book3s_64.h  |  33 
 arch/powerpc/include/asm/kvm_book3s_asm.h |   3 +
 arch/powerpc/include/asm/kvm_host.h   |   5 +
 arch/powerpc/kvm/Makefile |   3 +-
 arch/powerpc/kvm/book3s_hv.c  |  31 ++-
 arch/powerpc/kvm/book3s_hv_nested.c   | 301 ++
 8 files changed, 384 insertions(+), 7 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_nested.c

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index a0b17f9..c95c651 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -322,6 +322,11 @@
#define H_GET_24X7_DATA		0xF07C
#define H_GET_PERF_COUNTER_INFO	0xF080
 
+/* Platform-specific hcalls used for nested HV KVM */
+#define H_SET_PARTITION_TABLE  0xF800
+#define H_ENTER_NESTED 0xF804
+#define H_TLB_INVALIDATE   0xF808
+
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR  1
 #define H_SET_MODE_RESOURCE_SET_DAWR   2
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 91c9779..43f212e 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -274,6 +274,13 @@ static inline void kvmppc_save_tm_sprs(struct kvm_vcpu 
*vcpu) {}
 static inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu) {}
 #endif
 
+long kvmhv_nested_init(void);
+void kvmhv_nested_exit(void);
+void kvmhv_vm_nested_init(struct kvm *kvm);
+long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
+void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
+void kvmhv_release_all_nested(struct kvm *kvm);
+
 void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
 
 extern int kvm_irq_bypass;
@@ -387,9 +394,6 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu);
 /* TO = 31 for unconditional trap */
#define INS_TW 0x7fe00008
 
-/* LPIDs we support with this build -- runtime limit may be lower */
-#define KVMPPC_NR_LPIDS	(LPID_RSVD + 1)
-
#define SPLIT_HACK_MASK		0xff000000
#define SPLIT_HACK_OFFS		0xfb000000
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 5c0e2d9..6d67b6a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -23,6 +23,39 @@
 #include 
 #include 
 #include 
+#include 
+
+#ifdef CONFIG_PPC_PSERIES
+static inline bool kvmhv_on_pseries(void)
+{
+   return !cpu_has_feature(CPU_FTR_HVMODE);
+}
+#else
+static inline bool kvmhv_on_pseries(void)
+{
+   return false;
+}
+#endif
+
+/*
+ * Structure for a nested guest, that is, for a guest that is managed by
+ * one of our guests.
+ */
+struct kvm_nested_guest {
+   struct kvm *l1_host;/* L1 VM that owns this nested guest */
+   int l1_lpid;/* lpid L1 guest thinks this guest is */
+   int shadow_lpid;/* real lpid of this nested guest */
+   pgd_t *shadow_pgtable;  /* our page table for this guest */
+   u64 l1_gr_to_hr;/* L1's addr of part'n-scoped table */
+   u64 process_table;  /* process table entry for this guest */
+   long refcnt;/* number of pointers to this struct */
+   struct mutex tlb_lock;  /* serialize page faults and tlbies */
+   struct kvm_nested_guest *next;
+};
+
+struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
+ bool create);
+void kvmhv_put_nested(struct kvm_nested_guest *gp);
 
 /* Power architecture requires HPT is at least 256kiB, at most 64TiB */
 #define PPC_MIN_HPT_ORDER  18
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 

[PATCH v5 15/33] KVM: PPC: Book3S HV: Refactor radix page fault handler

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

The radix page fault handler accounts for all cases, including just
needing to insert a pte.  This breaks it up into separate functions for
the two main cases: setting rc and inserting a pte.

This allows us to make the setting of rc and inserting of a pte
generic for any pgtable, not specific to the one for this guest.

[pau...@ozlabs.org - reduced diffs from previous code]

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 210 +++--
 1 file changed, 123 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index f2976f4..47f2b18 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -400,8 +400,9 @@ static void kvmppc_unmap_free_pud_entry_table(struct kvm 
*kvm, pud_t *pud,
  */
 #define PTE_BITS_MUST_MATCH (~(_PAGE_WRITE | _PAGE_DIRTY | _PAGE_ACCESSED))
 
-static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
-unsigned int level, unsigned long mmu_seq)
+static int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
+unsigned long gpa, unsigned int level,
+unsigned long mmu_seq)
 {
pgd_t *pgd;
pud_t *pud, *new_pud = NULL;
@@ -410,7 +411,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, 
unsigned long gpa,
int ret;
 
/* Traverse the guest's 2nd-level tree, allocate new levels needed */
-   pgd = kvm->arch.pgtable + pgd_index(gpa);
+   pgd = pgtable + pgd_index(gpa);
pud = NULL;
if (pgd_present(*pgd))
pud = pud_offset(pgd, gpa);
@@ -565,95 +566,49 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, 
unsigned long gpa,
return ret;
 }
 
-int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
-  unsigned long ea, unsigned long dsisr)
+static bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable,
+   bool writing, unsigned long gpa)
+{
+   unsigned long pgflags;
+   unsigned int shift;
+   pte_t *ptep;
+
+   /*
+* Need to set an R or C bit in the 2nd-level tables;
+* since we are just helping out the hardware here,
+* it is sufficient to do what the hardware does.
+*/
+   pgflags = _PAGE_ACCESSED;
+   if (writing)
+   pgflags |= _PAGE_DIRTY;
+   /*
+* We are walking the secondary (partition-scoped) page table here.
+* We can do this without disabling irq because the Linux MM
+* subsystem doesn't do THP splits and collapses on this tree.
+*/
+   ptep = __find_linux_pte(pgtable, gpa, NULL, &shift);
+   if (ptep && pte_present(*ptep) && (!writing || pte_write(*ptep))) {
+   kvmppc_radix_update_pte(kvm, ptep, 0, pgflags, gpa, shift);
+   return true;
+   }
+   return false;
+}
+
+static int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
+   unsigned long gpa,
+   struct kvm_memory_slot *memslot,
+   bool writing, bool kvm_ro,
+   pte_t *inserted_pte, unsigned int *levelp)
 {
struct kvm *kvm = vcpu->kvm;
-   unsigned long mmu_seq;
-   unsigned long gpa, gfn, hva;
-   struct kvm_memory_slot *memslot;
struct page *page = NULL;
-   long ret;
-   bool writing;
+   unsigned long mmu_seq;
+   unsigned long hva, gfn = gpa >> PAGE_SHIFT;
bool upgrade_write = false;
bool *upgrade_p = &upgrade_write;
pte_t pte, *ptep;
-   unsigned long pgflags;
unsigned int shift, level;
-
-   /* Check for unusual errors */
-   if (dsisr & DSISR_UNSUPP_MMU) {
-   pr_err("KVM: Got unsupported MMU fault\n");
-   return -EFAULT;
-   }
-   if (dsisr & DSISR_BADACCESS) {
-   /* Reflect to the guest as DSI */
-   pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
-   kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
-   return RESUME_GUEST;
-   }
-
-   /* Translate the logical address and get the page */
-   gpa = vcpu->arch.fault_gpa & ~0xfffUL;
-   gpa &= ~0xF000000000000000ul;
-   gfn = gpa >> PAGE_SHIFT;
-   if (!(dsisr & DSISR_PRTABLE_FAULT))
-   gpa |= ea & 0xfff;
-   memslot = gfn_to_memslot(kvm, gfn);
-
-   /* No memslot means it's an emulated MMIO region */
-   if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
-   if (dsisr & (DSISR_PRTABLE_FAULT | DSISR_BADACCESS |
-DSISR_SET_RC)) {
-   /*
-* Bad address in guest 

[PATCH v5 16/33] KVM: PPC: Book3S HV: Use kvmppc_unmap_pte() in kvm_unmap_radix()

2018-10-08 Thread Paul Mackerras
kvmppc_unmap_pte() does a sequence of operations that are open-coded in
kvm_unmap_radix().  This extends kvmppc_unmap_pte() a little so that it
can be used by kvm_unmap_radix(), and makes kvm_unmap_radix() call it.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 47f2b18..bd06a95 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -240,19 +240,22 @@ static void kvmppc_pmd_free(pmd_t *pmdp)
 }
 
 static void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte,
-unsigned long gpa, unsigned int shift)
+unsigned long gpa, unsigned int shift,
+struct kvm_memory_slot *memslot)
 
 {
-   unsigned long page_size = 1ul << shift;
unsigned long old;
 
old = kvmppc_radix_update_pte(kvm, pte, ~0UL, 0, gpa, shift);
kvmppc_radix_tlbie_page(kvm, gpa, shift);
if (old & _PAGE_DIRTY) {
unsigned long gfn = gpa >> PAGE_SHIFT;
-   struct kvm_memory_slot *memslot;
+   unsigned long page_size = PAGE_SIZE;
 
-   memslot = gfn_to_memslot(kvm, gfn);
+   if (shift)
+   page_size = 1ul << shift;
+   if (!memslot)
+   memslot = gfn_to_memslot(kvm, gfn);
if (memslot && memslot->dirty_bitmap)
kvmppc_update_dirty_map(memslot, gfn, page_size);
}
@@ -282,7 +285,7 @@ static void kvmppc_unmap_free_pte(struct kvm *kvm, pte_t 
*pte, bool full)
WARN_ON_ONCE(1);
kvmppc_unmap_pte(kvm, p,
 pte_pfn(*p) << PAGE_SHIFT,
-PAGE_SHIFT);
+PAGE_SHIFT, NULL);
}
}
 
@@ -304,7 +307,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full)
WARN_ON_ONCE(1);
kvmppc_unmap_pte(kvm, (pte_t *)p,
 pte_pfn(*(pte_t *)p) << PAGE_SHIFT,
-PMD_SHIFT);
+PMD_SHIFT, NULL);
}
} else {
pte_t *pte;
@@ -468,7 +471,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pgd_t 
*pgtable, pte_t pte,
goto out_unlock;
}
/* Valid 1GB page here already, remove it */
-   kvmppc_unmap_pte(kvm, (pte_t *)pud, hgpa, PUD_SHIFT);
+   kvmppc_unmap_pte(kvm, (pte_t *)pud, hgpa, PUD_SHIFT, NULL);
}
if (level == 2) {
if (!pud_none(*pud)) {
@@ -517,7 +520,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pgd_t 
*pgtable, pte_t pte,
goto out_unlock;
}
/* Valid 2MB page here already, remove it */
-   kvmppc_unmap_pte(kvm, pmdp_ptep(pmd), lgpa, PMD_SHIFT);
+   kvmppc_unmap_pte(kvm, pmdp_ptep(pmd), lgpa, PMD_SHIFT, NULL);
}
if (level == 1) {
if (!pmd_none(*pmd)) {
@@ -780,20 +783,10 @@ int kvm_unmap_radix(struct kvm *kvm, struct 
kvm_memory_slot *memslot,
pte_t *ptep;
unsigned long gpa = gfn << PAGE_SHIFT;
unsigned int shift;
-   unsigned long old;
 
ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
-   if (ptep && pte_present(*ptep)) {
-   old = kvmppc_radix_update_pte(kvm, ptep, ~0UL, 0,
- gpa, shift);
-   kvmppc_radix_tlbie_page(kvm, gpa, shift);
-   if ((old & _PAGE_DIRTY) && memslot->dirty_bitmap) {
-   unsigned long psize = PAGE_SIZE;
-   if (shift)
-   psize = 1ul << shift;
-   kvmppc_update_dirty_map(memslot, gfn, psize);
-   }
-   }
+   if (ptep && pte_present(*ptep))
+   kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot);
return 0;   
 }
 
-- 
2.7.4



[PATCH v5 14/33] KVM: PPC: Book3S HV: Make kvmppc_mmu_radix_xlate process/partition table agnostic

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

kvmppc_mmu_radix_xlate() is used to translate an effective address
through the process tables. The process table and partition tables have
identical layout. Exploit this fact to make the kvmppc_mmu_radix_xlate()
function able to translate either an effective address through the
process tables or a guest real address through the partition tables.

[pau...@ozlabs.org - reduced diffs from previous code]

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h  |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 109 +++--
 2 files changed, 78 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index dd18d81..91c9779 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, 
unsigned long hc);
 extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
struct kvm_vcpu *vcpu,
unsigned long ea, unsigned long dsisr);
+extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
+   struct kvmppc_pte *gpte, u64 table,
+   int table_index, u64 *pte_ret_p);
 extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data, bool iswrite);
 extern int kvmppc_init_vm_radix(struct kvm *kvm);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 71951b5..f2976f4 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -29,83 +29,92 @@
  */
 static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 };
 
-int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
-  struct kvmppc_pte *gpte, bool data, bool iswrite)
+/*
+ * Used to walk a partition or process table radix tree in guest memory
+ * Note: We exploit the fact that a partition table and a process
+ * table have the same layout, a partition-scoped page table and a
+ * process-scoped page table have the same layout, and the 2nd
+ * doubleword of a partition table entry has the same layout as
+ * the PTCR register.
+ */
+int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
+struct kvmppc_pte *gpte, u64 table,
+int table_index, u64 *pte_ret_p)
 {
struct kvm *kvm = vcpu->kvm;
-   u32 pid;
int ret, level, ps;
-   __be64 prte, rpte;
-   unsigned long ptbl;
-   unsigned long root, pte, index;
+   unsigned long ptbl, root;
unsigned long rts, bits, offset;
-   unsigned long gpa;
-   unsigned long proc_tbl_size;
+   unsigned long size, index;
+   struct prtb_entry entry;
+   u64 pte, base, gpa;
+   __be64 rpte;
 
-   /* Work out effective PID */
-   switch (eaddr >> 62) {
-   case 0:
-   pid = vcpu->arch.pid;
-   break;
-   case 3:
-   pid = 0;
-   break;
-   default:
+   if ((table & PRTS_MASK) > 24)
return -EINVAL;
-   }
-   proc_tbl_size = 1 << ((kvm->arch.process_table & PRTS_MASK) + 12);
-   if (pid * 16 >= proc_tbl_size)
+   size = 1ul << ((table & PRTS_MASK) + 12);
+
+   /* Is the table big enough to contain this entry? */
+   if ((table_index * sizeof(entry)) >= size)
return -EINVAL;
 
-   /* Read partition table to find root of tree for effective PID */
-   ptbl = (kvm->arch.process_table & PRTB_MASK) + (pid * 16);
-   ret = kvm_read_guest(kvm, ptbl, &prte, sizeof(prte));
+   /* Read the table to find the root of the radix tree */
+   ptbl = (table & PRTB_MASK) + (table_index * sizeof(entry));
+   ret = kvm_read_guest(kvm, ptbl, &entry, sizeof(entry));
if (ret)
return ret;
 
-   root = be64_to_cpu(prte);
+   /* Root is stored in the first double word */
+   root = be64_to_cpu(entry.prtb0);
rts = ((root & RTS1_MASK) >> (RTS1_SHIFT - 3)) |
((root & RTS2_MASK) >> RTS2_SHIFT);
bits = root & RPDS_MASK;
-   root = root & RPDB_MASK;
+   base = root & RPDB_MASK;
 
offset = rts + 31;
 
-   /* current implementations only support 52-bit space */
+   /* Current implementations only support 52-bit space */
if (offset != 52)
return -EINVAL;
 
+   /* Walk each level of the radix tree */
for (level = 3; level >= 0; --level) {
+   /* Check a valid size */
if (level && bits != p9_supported_radix_bits[level])
return -EINVAL;
if (level == 0 && !(bits == 5 || bits == 9))

[PATCH v5 13/33] KVM: PPC: Book3S HV: Clear partition table entry on vm teardown

2018-10-08 Thread Paul Mackerras
From: Suraj Jitindar Singh 

When destroying a VM we return the LPID to the pool; however, we never
zero the partition table entry. This is instead done when we reallocate
the LPID.

Zero the partition table entry on VM teardown before returning the LPID
to the pool. This means if we were running as a nested hypervisor the
real hypervisor could use this to determine when it can free resources.

Reviewed-by: David Gibson 
Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 123bd18..8425d72 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4505,13 +4505,19 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 
kvmppc_free_vcores(kvm);
 
-   kvmppc_free_lpid(kvm->arch.lpid);
 
if (kvm_is_radix(kvm))
kvmppc_free_radix(kvm);
else
kvmppc_free_hpt(&kvm->arch.hpt);
 
+   /* Perform global invalidation and return lpid to the pool */
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   kvm->arch.process_table = 0;
+   kvmppc_setup_partition_table(kvm);
+   }
+   kvmppc_free_lpid(kvm->arch.lpid);
+
kvmppc_free_pimap(kvm);
 }
 
-- 
2.7.4



[PATCH v5 12/33] KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct

2018-10-08 Thread Paul Mackerras
When the 'regs' field was added to struct kvm_vcpu_arch, the code
was changed to use several of the fields inside regs (e.g., gpr, lr,
etc.) but not the ccr field, because the ccr field in struct pt_regs
is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is
only 32 bits.  This changes the code to use the regs.ccr field
instead of cr, and changes the assembly code on 64-bit platforms to
use 64-bit loads and stores instead of 32-bit ones.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h|  4 ++--
 arch/powerpc/include/asm/kvm_book3s_64.h |  4 ++--
 arch/powerpc/include/asm/kvm_booke.h |  4 ++--
 arch/powerpc/include/asm/kvm_host.h  |  2 --
 arch/powerpc/kernel/asm-offsets.c|  4 ++--
 arch/powerpc/kvm/book3s_emulate.c| 12 ++--
 arch/powerpc/kvm/book3s_hv.c |  4 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  4 ++--
 arch/powerpc/kvm/book3s_hv_tm.c  |  6 +++---
 arch/powerpc/kvm/book3s_hv_tm_builtin.c  |  5 +++--
 arch/powerpc/kvm/book3s_pr.c |  4 ++--
 arch/powerpc/kvm/bookehv_interrupts.S|  8 
 arch/powerpc/kvm/emulate_loadstore.c |  1 -
 13 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 83a9aa3..dd18d81 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -301,12 +301,12 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, 
int num)
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
-   vcpu->arch.cr = val;
+   vcpu->arch.regs.ccr = val;
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.cr;
+   return vcpu->arch.regs.ccr;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index af25aaa..5c0e2d9 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -483,7 +483,7 @@ static inline u64 sanitize_msr(u64 msr)
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.cr  = vcpu->arch.cr_tm;
+   vcpu->arch.regs.ccr  = vcpu->arch.cr_tm;
vcpu->arch.regs.xer = vcpu->arch.xer_tm;
vcpu->arch.regs.link  = vcpu->arch.lr_tm;
vcpu->arch.regs.ctr = vcpu->arch.ctr_tm;
@@ -500,7 +500,7 @@ static inline void copy_from_checkpoint(struct kvm_vcpu 
*vcpu)
 
 static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.cr_tm  = vcpu->arch.cr;
+   vcpu->arch.cr_tm  = vcpu->arch.regs.ccr;
vcpu->arch.xer_tm = vcpu->arch.regs.xer;
vcpu->arch.lr_tm  = vcpu->arch.regs.link;
vcpu->arch.ctr_tm = vcpu->arch.regs.ctr;
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index d513e3e..f0cef62 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -46,12 +46,12 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, 
int num)
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
-   vcpu->arch.cr = val;
+   vcpu->arch.regs.ccr = val;
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.cr;
+   return vcpu->arch.regs.ccr;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index a3d4f61..c9cc42f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -538,8 +538,6 @@ struct kvm_vcpu_arch {
ulong tar;
 #endif
 
-   u32 cr;
-
 #ifdef CONFIG_PPC_BOOK3S
ulong hflags;
ulong guest_owned_ext;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 89cf155..7c3738d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -438,7 +438,7 @@ int main(void)
 #ifdef CONFIG_PPC_BOOK3S
OFFSET(VCPU_TAR, kvm_vcpu, arch.tar);
 #endif
-   OFFSET(VCPU_CR, kvm_vcpu, arch.cr);
+   OFFSET(VCPU_CR, kvm_vcpu, arch.regs.ccr);
OFFSET(VCPU_PC, kvm_vcpu, arch.regs.nip);
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
OFFSET(VCPU_MSR, kvm_vcpu, arch.shregs.msr);
@@ -695,7 +695,7 @@ int main(void)
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 #else /* CONFIG_PPC_BOOK3S */
-   OFFSET(VCPU_CR, kvm_vcpu, arch.cr);
+   OFFSET(VCPU_CR, kvm_vcpu, arch.regs.ccr);
OFFSET(VCPU_XER, kvm_vcpu, arch.regs.xer);
OFFSET(VCPU_LR, kvm_vcpu, arch.regs.link);
OFFSET(VCPU_CTR, kvm_vcpu, arch.regs.ctr);
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 36b11c5..2654df2 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ 

[PATCH v5 11/33] KVM: PPC: Book3S HV: Add a debugfs file to dump radix mappings

2018-10-08 Thread Paul Mackerras
This adds a file called 'radix' in the debugfs directory for the
guest, which when read gives all of the valid leaf PTEs in the
partition-scoped radix tree for a radix guest, in human-readable
format.  It is analogous to the existing 'htab' file which dumps
the HPT entries for a HPT guest.

Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   1 +
 arch/powerpc/include/asm/kvm_host.h  |   1 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 179 +++
 arch/powerpc/kvm/book3s_hv.c |   2 +
 4 files changed, 183 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index dc435a5..af25aaa 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -435,6 +435,7 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 }
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+extern void kvmhv_radix_debugfs_init(struct kvm *kvm);
 
 extern void kvmhv_rm_send_ipi(int cpu);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3cd0b9f..a3d4f61 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -291,6 +291,7 @@ struct kvm_arch {
u64 process_table;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   struct dentry *radix_dentry;
struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 933c574..71951b5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -10,6 +10,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -853,6 +856,182 @@ static void pmd_ctor(void *addr)
memset(addr, 0, RADIX_PMD_TABLE_SIZE);
 }
 
+struct debugfs_radix_state {
+   struct kvm  *kvm;
+   struct mutexmutex;
+   unsigned long   gpa;
+   int chars_left;
+   int buf_index;
+   charbuf[128];
+   u8  hdr;
+};
+
+static int debugfs_radix_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_radix_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_radix_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_radix_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_radix_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long n;
+   struct kvm *kvm;
+   unsigned long gpa;
+   pgd_t *pgt;
+   pgd_t pgd, *pgdp;
+   pud_t pud, *pudp;
+   pmd_t pmd, *pmdp;
+   pte_t *ptep;
+   int shift;
+   unsigned long pte;
+
+   kvm = p->kvm;
+   if (!kvm_is_radix(kvm))
+   return 0;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   gpa = p->gpa;
+   pgt = kvm->arch.pgtable;
+   while (len != 0 && gpa < RADIX_PGTABLE_RANGE) {
+   if (!p->hdr) {
+   n = scnprintf(p->buf, sizeof(p->buf),
+ "pgdir: %lx\n", (unsigned long)pgt);
+   p->hdr = 1;
+   goto copy;
+   }
+
+   pgdp = pgt + pgd_index(gpa);
+   pgd = READ_ONCE(*pgdp);
+   if (!(pgd_val(pgd) & _PAGE_PRESENT)) {
+   gpa = (gpa & PGDIR_MASK) + PGDIR_SIZE;
+   continue;
+   }
+
+   pudp = pud_offset(&pgd, gpa);
+   pud = READ_ONCE(*pudp);
+   if (!(pud_val(pud) & _PAGE_PRESENT)) {
+   gpa = (gpa & PUD_MASK) + PUD_SIZE;
+   continue;
+   }
+