[PATCH] powerpc/perf: Quiet PMU registration message
On a Power9 box we get a few screens full of these on boot. Drop them
to pr_debug.

[5.993645] nest_centaur6_imc performance monitor hardware support registered
[5.993728] nest_centaur7_imc performance monitor hardware support registered
[5.996510] core_imc performance monitor hardware support registered
[5.996569] nest_mba0_imc performance monitor hardware support registered
[5.996631] nest_mba1_imc performance monitor hardware support registered
[5.996685] nest_mba2_imc performance monitor hardware support registered

Signed-off-by: Joel Stanley
---
 arch/powerpc/perf/core-book3s.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 81f8a0c838ae..a01c521694e8 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2249,8 +2249,8 @@ int register_power_pmu(struct power_pmu *pmu)
 		return -EBUSY;		/* something's already registered */
 
 	ppmu = pmu;
-	pr_info("%s performance monitor hardware support registered\n",
-		pmu->name);
+	pr_debug("%s performance monitor hardware support registered\n",
+		 pmu->name);
 
 	power_pmu.attr_groups = ppmu->attr_groups;
-- 
2.17.1
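The practical effect of the change can be mimicked in plain C: pr_debug() compiles to a no-op unless DEBUG (or dynamic debug) is enabled, so the registration message no longer reaches a default boot log. A minimal userspace sketch — the macro names mirror the kernel's, but the `printed` counter and `register_pmu()` are ours for illustration:

```c
#include <assert.h>
#include <stdio.h>

static int printed;	/* counts messages that reach the "console" */

#define pr_info(fmt, ...)  (printed++, printf(fmt, ##__VA_ARGS__))
#ifdef DEBUG
#define pr_debug(fmt, ...) (printed++, printf(fmt, ##__VA_ARGS__))
#else
/* without DEBUG the call expands to nothing, as in the kernel */
#define pr_debug(fmt, ...) do { } while (0)
#endif

static int register_pmu(const char *name)
{
	pr_debug("%s performance monitor hardware support registered\n", name);
	return 0;
}
```

With DEBUG undefined, calling `register_pmu()` for every IMC unit leaves `printed` at zero — the screens of messages disappear while the string is still available for dynamic debug builds.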
Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
On Tue, 9 Oct 2018 06:46:30 +0200 Christophe LEROY wrote:

> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
> > On Mon, 8 Oct 2018 17:39:11 +0200 Christophe LEROY wrote:
> >
> >> Hi Nick,
> >>
> >> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
> >>> Use nmi_enter similarly to system reset interrupts. This uses NMI
> >>> printk NMI buffers and turns off various debugging facilities that
> >>> helps avoid tripping on ourselves or other CPUs.
> >>>
> >>> Signed-off-by: Nicholas Piggin
> >>> ---
> >>>  arch/powerpc/kernel/traps.c | 9 ++---
> >>>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index 2849c4f50324..6d31f9d7c333 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>
> >>>  void machine_check_exception(struct pt_regs *regs)
> >>>  {
> >>> -	enum ctx_state prev_state = exception_enter();
> >>>  	int recover = 0;
> >>> +	bool nested = in_nmi();
> >>> +	if (!nested)
> >>> +		nmi_enter();
> >>
> >> This alters preempt_count, then when die() is called in_interrupt()
> >> returns true although the trap didn't happen in interrupt, so
> >> oops_end() panics with "fatal exception in interrupt" instead of
> >> gently sending SIGBUS to the faulting app.
> >
> > Thanks for tracking that down.
> >
> >> Any idea on how to fix this ?
> >
> > I would say we have to deliver the sigbus by hand.
> >
> > 	if ((user_mode(regs)))
> > 		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> > 	else
> > 		die("Machine check", regs, SIGBUS);
>
> And what about all the other things done by 'die()' ?
>
> And what if it is a kernel thread ?
>
> In one of my boards, I have a kernel thread regularly checking the HW,
> and if it gets a machine check I expect it to gently stop and the die
> notification to be delivered to all registered notifiers.
>
> Until before this patch, it was working well.
I guess the alternative is we could check regs->trap for machine check
in the die test. The complication is having to account for an MCE that
hits in an interrupt handler.

	if (in_interrupt()) {
		if (!IS_MCHECK_EXC(regs) ||
		    (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
			panic("Fatal exception in interrupt");
	}

Something like that might work for you? We need a ppc64 macro for the
MCE, and can probably add something like in_nmi_from_interrupt() for
the second part of the test.

Thanks,
Nick
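The helper Nick sketches can be modelled in userspace C. This is only an illustration of the arithmetic, not the kernel implementation: the real field layout lives in linux/preempt.h, and `preempt_count`, the mask values and `in_nmi_from_interrupt()` here are simplified stand-ins. nmi_enter() adds NMI_OFFSET + HARDIRQ_OFFSET, so anything left in the irq counts beyond that one increment means the machine check landed on top of a real interrupt:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the preempt_count layout: hardirq count in
 * bits 16-19, NMI count above it (the real kernel field widths and
 * positions are defined in linux/preempt.h). */
#define HARDIRQ_OFFSET	(1u << 16)
#define HARDIRQ_MASK	0x000f0000u
#define NMI_OFFSET	(1u << 20)
#define NMI_MASK	0x00f00000u

static unsigned int preempt_count;	/* per-task in the real kernel */

static unsigned int irq_count(void)
{
	return preempt_count & (NMI_MASK | HARDIRQ_MASK);
}

/* True when, after subtracting the nmi_enter() increment itself,
 * some interrupt context remains: the NMI interrupted an irq handler. */
static bool in_nmi_from_interrupt(void)
{
	return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
}
```

With this, an MCE taken in process context (only the nmi_enter() increment present) would not trip the "fatal exception in interrupt" panic, while an MCE taken inside a hardirq handler still would.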
[PATCH] powerpc/mm: make NULL pointer dereferences explicit on bad page faults
Like several other arches, including x86, make it explicit that a bad
page fault is a NULL pointer dereference when the fault address is
lower than PAGE_SIZE.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/fault.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d51cf5f4e45e..501a1eadb3e9 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 	switch (TRAP(regs)) {
 	case 0x300:
 	case 0x380:
-		printk(KERN_ALERT "Unable to handle kernel paging request for "
-			"data at address 0x%08lx\n", regs->dar);
+		pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
+			 regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
+			 "paging request",
+			 regs->dar);
 		break;
 	case 0x400:
 	case 0x480:
-		printk(KERN_ALERT "Unable to handle kernel paging request for "
-			"instruction fetch\n");
+		pr_alert("Unable to handle kernel %s for instruction fetch\n",
+			 regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
+			 "paging request");
 		break;
 	case 0x600:
 		printk(KERN_ALERT "Unable to handle kernel paging request for "
-- 
2.13.3
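The message selection the patch adds is a one-line classification on the fault address. A standalone sketch of the same logic — `fault_kind()` is our name for the demo, the kernel simply inlines the ternary into the pr_alert() call:

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096UL	/* typical; configurable on powerpc */

/* Mirrors the patch: a kernel access anywhere in the first page is
 * almost certainly a NULL pointer (base NULL plus a small struct
 * offset), so report it as such instead of a generic paging request. */
static const char *fault_kind(unsigned long addr)
{
	return addr < PAGE_SIZE ? "NULL pointer dereference"
				: "paging request";
}
```

So a fault on, say, address 0x10 (a field at offset 16 of a NULL struct pointer) is labelled a NULL pointer dereference, while a wild pointer high in the address space still reports a paging request.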
Re: [PATCH 5/5] dma-direct: always allow dma mask <= physical memory size
On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote:
> > -	 * Because 32-bit DMA masks are so common we expect every architecture
> > -	 * to be able to satisfy them - either by not supporting more physical
> > -	 * memory, or by providing a ZONE_DMA32. If neither is the case, the
> > -	 * architecture needs to use an IOMMU instead of the direct mapping.
> > -	 */
> > -	if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
> > +	u64 min_mask;
> > +
> > +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> > +		min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
> > +	else
> > +		min_mask = DMA_BIT_MASK(32);
> > +
> > +	min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
> > +
> > +	if (mask >= phys_to_dma(dev, min_mask))
> >  		return 0;
> > -#endif
> >  	return 1;
> >  }
>
> So I believe I have run into the same issue that Guenter reported. On
> an x86_64 system w/ Intel IOMMU, I wasn't able to complete boot and
> all probe attempts for various devices were failing with -EIO errors.
>
> I believe the last mask check should be "if (mask < phys_to_dma(dev,
> min_mask))", not a ">=" check.

Right, that test is backwards. I needed to change it here too (powermac
with the rest of the powerpc series).

Cheers,
Ben.
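The min_mask computation itself is easy to model. This sketch uses an identity phys_to_dma and folds the logic into one function — `dma_mask_supported()`, the `zone_dma_bits`/`max_phys` parameters and the boolean return are all ours for the demo; the kernel works on `struct device` and returns 0/1 with the opposite sense:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* A device mask is usable when it covers the smallest address range
 * the allocator may hand out: the ZONE_DMA limit, clamped to the top
 * of physical memory.  Note the '>=' — the direction the thread is
 * about: the original patch had the comparison inverted. */
static bool dma_mask_supported(uint64_t mask, unsigned int zone_dma_bits,
			       uint64_t max_phys)
{
	uint64_t min_mask = DMA_BIT_MASK(zone_dma_bits);

	if (min_mask > max_phys - 1)
		min_mask = max_phys - 1;	/* min_t(u64, ...) clamp */

	return mask >= min_mask;		/* identity phys_to_dma */
}
```

A 32-bit mask then passes against a 31-bit ZONE_DMA on a 16GB machine, while a 24-bit ISA-style mask is rejected, which matches the intent of the patch once the comparison direction is fixed.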
[PATCH v7 9/9] powerpc: clean stack pointers naming
Some stack pointers used to also be thread_info pointers and were
called tp. Now that they are only stack pointers, rename them sp.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/irq.c      | 17 +++--
 arch/powerpc/kernel/setup_64.c | 20 ++--
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 62cfccf4af89..754f0efc507b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	void *curtp, *irqtp, *sirqtp;
+	void *cursp, *irqsp, *sirqsp;
 
 	/* Switch to the irq stack to handle this */
-	curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-	irqtp = hardirq_ctx[raw_smp_processor_id()];
-	sirqtp = softirq_ctx[raw_smp_processor_id()];
+	cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+	irqsp = hardirq_ctx[raw_smp_processor_id()];
+	sirqsp = softirq_ctx[raw_smp_processor_id()];
 
 	/* Already there ? */
-	if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+	if (unlikely(cursp == irqsp || cursp == sirqsp)) {
 		__do_irq(regs);
 		set_irq_regs(old_regs);
 		return;
 	}
 	/* Switch stack and call */
-	call_do_irq(regs, irqtp);
+	call_do_irq(regs, irqsp);
 
 	set_irq_regs(old_regs);
 }
@@ -732,10 +732,7 @@ void irq_ctx_init(void)
 
 void do_softirq_own_stack(void)
 {
-	void *irqtp;
-
-	irqtp = softirq_ctx[smp_processor_id()];
-	call_do_softirq(irqtp);
+	call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6792e9c90689..4912ec0320b8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -717,22 +717,22 @@ void __init emergency_stack_init(void)
 	limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
 	for_each_possible_cpu(i) {
-		void *ti;
+		void *sp;
 
-		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
-		paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+		sp = alloc_stack(limit, i);
+		memset(sp, 0, THREAD_SIZE);
+		paca_ptrs[i]->emergency_sp = sp + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
 		/* emergency stack for NMI exception handling. */
-		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
-		paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+		sp = alloc_stack(limit, i);
+		memset(sp, 0, THREAD_SIZE);
+		paca_ptrs[i]->nmi_emergency_sp = sp + THREAD_SIZE;
 
 		/* emergency stack for machine check exception handling. */
-		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
-		paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+		sp = alloc_stack(limit, i);
+		memset(sp, 0, THREAD_SIZE);
+		paca_ptrs[i]->mc_emergency_sp = sp + THREAD_SIZE;
 #endif
 	}
 }
-- 
2.13.3
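The `cursp` computation in do_IRQ() relies on stacks being THREAD_SIZE-aligned: masking off the low bits of the stack pointer yields the base of the current stack, which can then be compared against the irq stack bases. A small standalone sketch of that masking — `stack_base()` and the constants are ours for the demo:

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SHIFT	14
#define THREAD_SIZE	(1UL << THREAD_SHIFT)	/* 16KB stacks, for the demo */

/* do_IRQ() finds the base of the current stack by clearing the low
 * bits of r1 — kernel stacks are allocated THREAD_SIZE aligned, so
 * every address inside a stack maps to the same base. */
static uintptr_t stack_base(uintptr_t sp)
{
	return sp & ~(THREAD_SIZE - 1);
}
```

Two stack pointers are on the same stack exactly when their `stack_base()` values are equal, which is the "Already there?" test in the patch.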
[PATCH v7 8/9] powerpc/64: Remove CURRENT_THREAD_INFO
Now that current_thread_info is located at the beginning of 'current'
task struct, the CURRENT_THREAD_INFO macro is not really needed any
more. This patch replaces it by loads of the value at PACACURRENT(r13).

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/exception-64s.h       |  4 ++--
 arch/powerpc/include/asm/thread_info.h         |  4 ----
 arch/powerpc/kernel/entry_64.S                 | 10 +-
 arch/powerpc/kernel/exceptions-64e.S           |  2 +-
 arch/powerpc/kernel/exceptions-64s.S           |  2 +-
 arch/powerpc/kernel/idle_book3e.S              |  2 +-
 arch/powerpc/kernel/idle_power4.S              |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a86fead0..ca3af3e9015e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -680,7 +680,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON				\
 BEGIN_FTR_SECTION				\
-	CURRENT_THREAD_INFO(r3, r1);		\
+	ld	r3, PACACURRENT(r13);		\
 	ld	r4,TI_LOCAL_FLAGS(r3);		\
 	andi.	r0,r4,_TLF_RUNLATCH;		\
 	beql	ppc64_runlatch_on_trampoline;	\
@@ -730,7 +730,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP				\
 BEGIN_FTR_SECTION				\
-	CURRENT_THREAD_INFO(r11, r1);		\
+	ld	r11, PACACURRENT(r13);		\
 	ld	r9,TI_LOCAL_FLAGS(r11);		\
 	andi.	r10,r9,_TLF_NAPPING;		\
 	bnel	power4_fixup_nap;		\
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 361bb45b8990..2ee9e248c933 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE		(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(ld dest, PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include
 #include
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6fce0f8fd8c4..06d9a7c084a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -158,7 +158,7 @@ system_call:		/* label this so stack traces look sane */
 	li	r10,IRQS_ENABLED
 	std	r10,SOFTE(r1)
 
-	CURRENT_THREAD_INFO(r11, r1)
+	ld	r11, PACACURRENT(r13)
 	ld	r10,TI_FLAGS(r11)
 	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
 	bne	.Lsyscall_dotrace		/* does not return */
@@ -205,7 +205,7 @@ system_call:		/* label this so stack traces look sane */
 	ld	r3,RESULT(r1)
 #endif
 
-	CURRENT_THREAD_INFO(r12, r1)
+	ld	r12, PACACURRENT(r13)
 
 	ld	r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	/* Repopulate r9 and r10 for the syscall path */
 	addi	r9,r1,STACK_FRAME_OVERHEAD
-	CURRENT_THREAD_INFO(r10, r1)
+	ld	r10, PACACURRENT(r13)
 	ld	r10,TI_FLAGS(r10)
 
 	cmpldi	r0,NR_syscalls
@@ -735,7 +735,7 @@ _GLOBAL(ret_from_except_lite)
 	mtmsrd	r10,1		  /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-	CURRENT_THREAD_INFO(r9, r1)
+	ld	r9, PACACURRENT(r13)
 	ld	r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
 	ld	r10,PACACURRENT(r13)
@@ -849,7 +849,7 @@ resume_kernel:
 1:	bl	preempt_schedule_irq
 
 	/* Re-test flags and eventually loop */
-	CURRENT_THREAD_INFO(r9, r1)
+	ld	r9, PACACURRENT(r13)
 	ld	r4,TI_FLAGS(r9)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	bne	1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 231d066b4a3d..dfafcd0af009 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -469,7 +469,7 @@ exc_##n##_bad_stack:					\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()					\
-	CURRENT_THREAD_INFO(r11, r1);			\
+	ld	r11, PACACURRENT(r13);			\
 	ld	r10,TI_LOCAL_FLAGS(r11);		\
 	andi.	r9,r10,_TLF_NAPPING;			\
 	beq+	1f;					\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b9239dbf6d59..f776f30ecfcc 100644
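Conceptually, each `ld rN, PACACURRENT(r13)` replaces "mask the stack pointer to find thread_info" with "read current from the paca, the per-CPU data block r13 always points at". A C model of the after state — `struct paca` and `get_current()` here are illustrative stand-ins, not the kernel's `struct paca_struct`:

```c
#include <assert.h>

/* Stand-in for the ppc64 paca: a per-CPU block whose address lives
 * permanently in r13.  PACACURRENT is the offset of __current. */
struct paca {
	void *current_task;	/* models paca->__current */
};

/* Models 'ld rN, PACACURRENT(r13)': one dependent load, no stack
 * masking, valid regardless of which stack we are running on. */
static void *get_current(const struct paca *paca)
{
	return paca->current_task;
}
```

Since thread_info now sits at the start of task_struct, the same pointer serves for both `TI_FLAGS`-style accesses and `current`, which is why the macro can disappear entirely.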
[PATCH v7 6/9] powerpc: 'current_set' is now a table of task_struct pointers
The table of pointers 'current_set' has been used for retrieving the
stack and current. They used to be thread_info pointers as they were
pointing to the stack and current was taken from the 'task' field of
the thread_info.

Now, the pointers of the 'current_set' table are both pointers to
task_struct and pointers to thread_info. As they are used to get
current, and the stack pointer is retrieved from current's stack field,
this patch changes their type to task_struct, and renames secondary_ti
to secondary_current.

Reviewed-by: Nicholas Piggin
Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S             |  6 +++---
 arch/powerpc/kernel/head_44x.S            |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S      |  4 ++--
 arch/powerpc/kernel/smp.c                 | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 9bc98c239305..ab0541f9da42 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 44dfd73b2a62..ba0341bd5a00 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -842,9 +842,9 @@ __secondary_start:
 #endif /* CONFIG_6xx */
 
 	/* get current's stack and current */
-	lis	r1,secondary_ti@ha
-	tophys(r1,r1)
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	tophys(r2,r2)
+	lwz	r2,secondary_current@l(r2)
 	tophys(r1,r2)
 	lwz	r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c7e90f36358..48e4de4dfd0c 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x)
 	/* Now we can get our task struct and real stack pointer */
 
 	/* Get current's stack and current */
-	lis	r1,secondary_ti@ha
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	lwz	r2,secondary_current@l(r2)
 	lwz	r1,TASK_STACK(r2)
 
 	/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index b8a2b789677e..0d27bfff52dd 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1076,8 +1076,8 @@ __secondary_start:
 	bl	call_setup_cpu
 
 	/* get current's stack and current */
-	lis	r1,secondary_ti@ha
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	lwz	r2,secondary_current@l(r2)
 	lwz	r1,TASK_STACK(r2)
 
 	/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f22fcbeb9898..00193643f0da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,7 +74,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
@@ -644,7 +644,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -724,7 +724,7 @@ void smp_prepare_boot_cpu(void)
 	paca_ptrs[boot_cpuid]->__current = current;
 #endif
 	set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-	current_set[boot_cpuid] = task_thread_info(current);
+	current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -809,15 +809,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-	struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
 	paca_ptrs[cpu]->__current = idle;
 	paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 				 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
 	idle->cpu = cpu;
-	secondary_ti = current_set[cpu] = ti;
+	secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3
[PATCH v7 7/9] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
Now that thread_info is at the beginning of task_struct, its address is
in r2 so the CURRENT_THREAD_INFO() macro is useless. This patch removes
it.

At the same time, as the 'cpu' field is no longer in thread_info, this
patch renames it to TASK_CPU.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Makefile                  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c      |  2 +-
 arch/powerpc/kernel/entry_32.S         | 43 --
 arch/powerpc/kernel/epapr_hcalls.S     |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S         |  8 +++
 arch/powerpc/kernel/idle_e500.S        |  8 +++
 arch/powerpc/kernel/misc_32.S          |  3 +--
 arch/powerpc/mm/hash_low_32.S          | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S      |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 02e7ca1c15d4..f1e2d7f7b022 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -426,7 +426,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 61c8747cd926..361bb45b8990 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(ld dest, PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 768ce602d624..31be6eb9c0d4 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -97,7 +97,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
 	OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-	OFFSET(TI_CPU, task_struct, cpu);
+	OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index bd3b146e18a3..d0c546ce387e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
 	tophys(r11,r11)
 	addi	r11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-	CURRENT_THREAD_INFO(r9, r1)
-	lwz	r9,TI_CPU(r9)
+	lwz	r9,TASK_CPU(r2)
 	slwi	r9,r9,3
 	add	r11,r11,r9
 #endif
@@ -180,8 +179,7 @@ transfer_to_handler:
 	stw	r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-	CURRENT_THREAD_INFO(r9, r1)
-	tophys(r9, r9)
+	tophys(r9, r2)
 	ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
 
@@ -195,8 +193,7 @@ transfer_to_handler:
 	ble-	stack_ovf		/* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_6xx) || defined(CONFIG_E500)
-	CURRENT_THREAD_INFO(r9, r1)
-	tophys(r9,r9)			/* check local flags */
+	tophys(r9,r2)			/* check local flags */
 	lwz	r12,TI_LOCAL_FLAGS(r9)
 	mtcrf	0x01,r12
 	bt-	31-TLF_NAPPING,4f
@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
 	mtmsr	r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-	CURRENT_THREAD_INFO(r10, r1)
-	lwz	r11,TI_FLAGS(r10)
+	lwz	r11,TI_FLAGS(r2)
 	andi.	r11,r11,_TIF_SYSCALL_DOTRACE
 	bne-	syscall_dotrace
 syscall_dotrace_cont:
@@ -379,13 +375,12 @@ ret_from_syscall:
 	lwz	r3,GPR3(r1)
 #endif
 	mr	r6,r3
-	CURRENT_THREAD_INFO(r12, r1)
 	/* disable interrupts so current_thread_info()->flags can't change */
 	LOAD_MSR_KERNEL(r10,MSR_KERNEL)	/* doesn't include MSR_EE */
 	/* Note: We don't bother telling lockdep about it */
 	SYNC
 	MTMSRD(r10)
-	lwz	r9,TI_FLAGS(r12)
+	lwz	r9,TI_FLAGS(r2)
 	li	r8,-MAX_ERRNO
 	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
 	bne-	syscall_exit_work
@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	andi.	r4,r8,MSR_PR
 	beq	3f
-	CURRENT_THREAD_INFO(r4, r1)
-	ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+	ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
 	lwz	r4,_LINK(r1)
@@ -526,7 +520,7 @@ syscall_exit_work:
 	/* Clear per-syscall TIF flags if any are set.  */
 
 	li	r11,_TIF_PERSYSCALL_MASK
-	addi	r12,r12,TI_FLAGS
+	addi
[PATCH v7 5/9] powerpc: regain entire stack space
thread_info is no longer in the stack, so the entire stack can now be
used. There is also no risk anymore of corrupting task_cpu(p) with a
stack overflow, so this patch removes the test.

When doing this, an explicit test for a NULL stack pointer is needed in
validate_sp() as it is no longer implicitly covered by the
sizeof(thread_info) gap.

In the meantime, with the previous patch all pointers to the stacks are
no longer pointers to thread_info, so this patch changes them to void*.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/irq.h       | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c    |  1 -
 arch/powerpc/kernel/entry_32.S       | 14 --
 arch/powerpc/kernel/irq.c            | 19 +--
 arch/powerpc/kernel/misc_32.S        |  6 ++
 arch/powerpc/kernel/process.c        | 32 +---
 arch/powerpc/kernel/setup_64.c       |  8
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..966ddd4d2414 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,9 +48,9 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 extern void exc_lvl_ctx_init(void);
 #else
 #define exc_lvl_ctx_init()
@@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void);
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b225c7f7c5a4..e763342265a2 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -331,8 +331,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
-#define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT	((unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 833d189df04c..768ce602d624 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -93,7 +93,6 @@ int main(void)
 	DEFINE(NMI_MASK, NMI_MASK);
 	OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-	DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
 	OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
 	OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index fa7a69ffb37a..bd3b146e18a3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
 	mfspr	r0,SPRN_SRR1
 	stw	r0,_SRR1(r11)
 
-	/* set the stack limit to the current stack
-	 * and set the limit to protect the thread_info
-	 * struct
-	 */
+	/* set the stack limit to the current stack */
 	mfspr	r8,SPRN_SPRG_THREAD
 	lwz	r0,KSP_LIMIT(r8)
 	stw	r0,SAVED_KSP_LIMIT(r11)
-	rlwimi	r0,r1,0,0,(31-THREAD_SHIFT)
+	rlwinm	r0,r1,0,0,(31 - THREAD_SHIFT)
 	stw	r0,KSP_LIMIT(r8)
 	/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
 	mfspr	r0,SPRN_SRR1
 	stw	r0,crit_srr1@l(0)
 
-	/* set the stack limit to the current stack
-	 * and set the limit to protect the thread_info
-	 * struct
-	 */
+	/* set the stack limit to the current stack */
 	mfspr	r8,SPRN_SPRG_THREAD
 	lwz	r0,KSP_LIMIT(r8)
 	stw	r0,saved_ksp_limit@l(0)
-	rlwimi	r0,r1,0,0,(31-THREAD_SHIFT)
+	rlwinm	r0,r1,0,0,(31 - THREAD_SHIFT)
 	stw	r0,KSP_LIMIT(r8)
 	/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3fdb6b6973cf..62cfccf4af89 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
 	sp = current_stack_pointer() & (THREAD_SIZE - 1);
 
 	/* check for stack overflow: is there less than 2KB free? */
-	if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-		pr_err("do_IRQ: stack overflow: %ld\n",
-			sp -
[PATCH v7 4/9] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the
thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
  overflows.
- Its address is harder to determine if stack addresses are leaked,
  making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
  CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
  including linux/sched.h to avoid circular inclusion and without
  including asm/asm-offsets.h to avoid symbol name duplication
  between ASM constants and C constants.

Signed-off-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/Kconfig                   |  1 +
 arch/powerpc/Makefile                  |  8 +-
 arch/powerpc/include/asm/ptrace.h      |  2 +-
 arch/powerpc/include/asm/smp.h         | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 ++--
 arch/powerpc/kernel/asm-offsets.c      |  7 +++--
 arch/powerpc/kernel/entry_32.S         |  9 +++
 arch/powerpc/kernel/exceptions-64e.S   | 11
 arch/powerpc/kernel/head_32.S          |  6 ++---
 arch/powerpc/kernel/head_44x.S         |  4 +--
 arch/powerpc/kernel/head_64.S          |  1 +
 arch/powerpc/kernel/head_booke.h       |  8 +-
 arch/powerpc/kernel/head_fsl_booke.S   |  7 +++--
 arch/powerpc/kernel/irq.c              | 47 +-
 arch/powerpc/kernel/kgdb.c             | 28
 arch/powerpc/kernel/machine_kexec_64.c |  6 ++---
 arch/powerpc/kernel/setup_64.c         | 21 ---
 arch/powerpc/kernel/smp.c              |  2 +-
 arch/powerpc/net/bpf_jit32.h           |  5 ++--
 19 files changed, 52 insertions(+), 155 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 602eea723624..3b958cd4e284 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -238,6 +238,7 @@ config PPC
 	select RTC_LIB
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
+	select THREAD_INFO_IN_TASK
 	select VIRT_TO_BUS if !PPC64
 	#
 	# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 81552c7b46eb..02e7ca1c15d4 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -422,6 +422,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
 TOUT	:= .tmp_gas_check
@@ -439,4 +446,3 @@ checkbin:
 
 CLEAN_FILES += $(TOUT)
-
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 447cbd1bee99..3a7e5561630b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -120,7 +120,7 @@ extern int ptrace_put_reg(struct task_struct *task, int regno,
 			  unsigned long data);
 
 #define current_pt_regs() \
-	((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)
+	((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 95b66a0c639b..93a8cd120663 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];
 
-#define raw_smp_processor_id()	(current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of task_struct here, but we need access to the CPU this task is running on.
+ * Instead of using task_struct we're using _TASK_CPU which is extracted from
+ * asm-offsets.h by kbuild to get the current processor ID.
+ *
+ * This also needs to be safeguarded when building asm-offsets.s because at
+ * that time _TASK_CPU is not defined yet. It could have been guarded by
+ * _TASK_CPU itself, but we want the build to fail if _TASK_CPU is missing
+ * when building something else than asm-offsets.s
+ */
+#ifdef GENERATING_ASM_OFFSETS
+#define raw_smp_processor_id()	(0)
+#else
+#define raw_smp_processor_id()	(*(unsigned int *)((void *)current + _TASK_CPU))
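The "_TASK_CPU" trick can be demonstrated in userspace: a header that cannot see the struct definition can still reach a field if the build exports its byte offset. In this sketch `struct task` stands in for task_struct and the offset is computed with offsetof for the demo (in the kernel it comes via asm-offsets.h and a -D_TASK_CPU compiler flag):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for task_struct; only the layout matters here. */
struct task {
	long state;
	unsigned int cpu;	/* the field asm-offsets exports */
};

/* In the kernel this value is injected by kbuild as -D_TASK_CPU=...;
 * here we derive it directly since we do have the definition. */
#define _TASK_CPU offsetof(struct task, cpu)

/* Byte-offset access through an opaque pointer — exactly the shape of
 * the raw_smp_processor_id() macro in the patch. */
static unsigned int raw_smp_processor_id(const void *current_task)
{
	return *(const unsigned int *)((const char *)current_task + _TASK_CPU);
}
```

The cost of the trick is fragility — if the exported offset and the real layout ever disagree, the read silently returns garbage — which is why the kernel makes the build fail when _TASK_CPU is missing rather than defaulting it.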
[PATCH v7 3/9] powerpc: Prepare for moving thread_info into task_struct
This patch cleans the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
  call_do_irq() is to point to the new stack, so change it to void*
  and rename it 'sp'.
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current.
- Remove unnecessary casts to thread_info, as they'll become invalid
  once thread_info is not in the stack anymore.
- Rename THREAD_INFO to TASK_STACK: as it is in fact the offset of the
  pointer to the stack in task_struct, this pointer will not be
  impacted by the move of THREAD_INFO.
- Make TASK_STACK available to PPC64. PPC64 will need it to get the
  stack pointer from current once thread_info has been moved.
- Modify klp_init_thread_info() to take a task_struct pointer argument.

Signed-off-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/include/asm/irq.h       |  4 ++--
 arch/powerpc/include/asm/livepatch.h |  7 ---
 arch/powerpc/include/asm/processor.h |  4 ++--
 arch/powerpc/include/asm/reg.h       |  2 +-
 arch/powerpc/kernel/asm-offsets.c    |  2 +-
 arch/powerpc/kernel/entry_32.S       |  2 +-
 arch/powerpc/kernel/entry_64.S       |  2 +-
 arch/powerpc/kernel/head_32.S        |  4 ++--
 arch/powerpc/kernel/head_40x.S       |  4 ++--
 arch/powerpc/kernel/head_44x.S       |  2 +-
 arch/powerpc/kernel/head_8xx.S       |  2 +-
 arch/powerpc/kernel/head_booke.h     |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S |  4 ++--
 arch/powerpc/kernel/irq.c            |  2 +-
 arch/powerpc/kernel/misc_32.S        |  4 ++--
 arch/powerpc/kernel/process.c        |  8
 arch/powerpc/kernel/setup-common.c   |  2 +-
 arch/powerpc/kernel/setup_32.c       | 15 +--
 arch/powerpc/kernel/smp.c            |  4 +++-
 19 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..8a81d10ccc82 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,14 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
 	return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
+	struct thread_info *ti = task_thread_info(p);
+
 	/* + 1 to account for STACK_END_MAGIC */
-	ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+	ti->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 13589274fe9b..b225c7f7c5a4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 #ifndef __ASSEMBLY__
 #include
-#include
+#include
 #include
 #include
@@ -332,7 +332,7 @@ struct thread_struct {
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
 
 #define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) &init_stack)
+	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 640a4d818772..d2528a0b2f5b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1058,7 +1058,7 @@
  *	- SPRG9 debug exception scratch
  *
  * All 32-bit:
- *	- SPRG3 current thread_info pointer
+ *	- SPRG3 current thread_struct physical addr pointer
  *		(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a6d70fd2e499..c583a02e5a21 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -91,10 +91,10 @@ int main(void)
 	DEFINE(NMI_MASK, NMI_MASK);
 	OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-	OFFSET(THREAD_INFO, task_struct, stack);
[PATCH v7 2/9] powerpc: Only use task_struct 'cpu' field on SMP
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field gets moved into task_struct and only defined when CONFIG_SMP is set. This patch ensures that TI_CPU is only used when CONFIG_SMP is set and that task_struct 'cpu' field is not used directly out of SMP code. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/head_fsl_booke.S | 2 ++ arch/powerpc/kernel/misc_32.S| 4 arch/powerpc/xmon/xmon.c | 2 +- 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index e2750b856c8f..05b574f416b3 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -243,8 +243,10 @@ set_ivor: li r0,0 stwur0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) +#ifdef CONFIG_SMP CURRENT_THREAD_INFO(r22, r1) stw r24, TI_CPU(r22) +#endif bl early_init diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 695b24a2d954..2f0fe8bfc078 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll) or r4,r4,r5 mtspr SPRN_HID1,r4 +#ifdef CONFIG_SMP /* Store new HID1 image */ CURRENT_THREAD_INFO(r6, r1) lwz r6,TI_CPU(r6) slwir6,r6,2 +#else + li r6, 0 +#endif addis r6,r6,nap_save_hid1@ha stw r4,nap_save_hid1@l(r6) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index c70d17c9a6ba..1731793e1277 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2986,7 +2986,7 @@ static void show_task(struct task_struct *tsk) printf("%px %016lx %6d %6d %c %2d %s\n", tsk, tsk->thread.ksp, tsk->pid, tsk->parent->pid, - state, task_thread_info(tsk)->cpu, + state, task_cpu(tsk), tsk->comm); } -- 2.13.3
[PATCH v7 1/9] book3s/64: avoid circular header inclusion in mmu-hash.h
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h In order to do that, this patch moves into a new header called asm/task_size_user64.h the information from asm/processor.h required by mmu-hash.h Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/processor.h | 34 +- arch/powerpc/include/asm/task_size_user64.h | 42 +++ arch/powerpc/kvm/book3s_hv_hmi.c | 1 + 4 files changed, 45 insertions(+), 34 deletions(-) create mode 100644 arch/powerpc/include/asm/task_size_user64.h diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index e0e4ce8f77d6..02955d867067 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -23,7 +23,7 @@ */ #include #include -#include +#include #include /* diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 52fadded5c1e..13589274fe9b 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -101,40 +101,8 @@ void release_thread(struct task_struct *); #endif #ifdef CONFIG_PPC64 -/* - * 64-bit user address space can have multiple limits - * For now supported values are: - */ -#define TASK_SIZE_64TB (0x4000UL) -#define TASK_SIZE_128TB (0x8000UL) -#define TASK_SIZE_512TB (0x0002UL) -#define TASK_SIZE_1PB (0x0004UL) -#define TASK_SIZE_2PB (0x0008UL) -/* - * With 52 bits in the address we can support - * upto 4PB of range. - */ -#define TASK_SIZE_4PB (0x0010UL) -/* - * For now 512TB is only supported with book3s and 64K linux page size. 
- */ -#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES) -/* - * Max value currently used: - */ -#define TASK_SIZE_USER64 TASK_SIZE_4PB -#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB -#define TASK_CONTEXT_SIZE TASK_SIZE_512TB -#else -#define TASK_SIZE_USER64 TASK_SIZE_64TB -#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB -/* - * We don't need to allocate extended context ids for 4K page size, because - * we limit the max effective address on this config to 64TB. - */ -#define TASK_CONTEXT_SIZE TASK_SIZE_64TB -#endif +#include /* * 32-bit user address space is 4GB - 1 page diff --git a/arch/powerpc/include/asm/task_size_user64.h b/arch/powerpc/include/asm/task_size_user64.h new file mode 100644 index ..a4043075864b --- /dev/null +++ b/arch/powerpc/include/asm/task_size_user64.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H +#define _ASM_POWERPC_TASK_SIZE_USER64_H + +#ifdef CONFIG_PPC64 +/* + * 64-bit user address space can have multiple limits + * For now supported values are: + */ +#define TASK_SIZE_64TB (0x4000UL) +#define TASK_SIZE_128TB (0x8000UL) +#define TASK_SIZE_512TB (0x0002UL) +#define TASK_SIZE_1PB (0x0004UL) +#define TASK_SIZE_2PB (0x0008UL) +/* + * With 52 bits in the address we can support + * upto 4PB of range. + */ +#define TASK_SIZE_4PB (0x0010UL) + +/* + * For now 512TB is only supported with book3s and 64K linux page size. + */ +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES) +/* + * Max value currently used: + */ +#define TASK_SIZE_USER64 TASK_SIZE_4PB +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB +#define TASK_CONTEXT_SIZE TASK_SIZE_512TB +#else +#define TASK_SIZE_USER64 TASK_SIZE_64TB +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB +/* + * We don't need to allocate extended context ids for 4K page size, because + * we limit the max effective address on this config to 64TB. 
+ */ +#define TASK_CONTEXT_SIZE TASK_SIZE_64TB +#endif + +#endif /* CONFIG_PPC64 */ +#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */ diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c index e3f738eb1cac..64b5011475c7 100644 --- a/arch/powerpc/kvm/book3s_hv_hmi.c +++ b/arch/powerpc/kvm/book3s_hv_hmi.c @@ -24,6 +24,7 @@ #include #include #include +#include void wait_for_subcore_guest_exit(void) { -- 2.13.3
[PATCH v7 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK, which moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack overflows.
- Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult.

Changes since v6:
- Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch (early crash with CONFIG_KMEMLEAK)

Changes since v5:
- Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
- Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
- Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h is not already existing, was due to spaces instead of a tab in the Makefile

Changes since RFC v3: (based on Nick's review)
- Renamed task_size.h to task_size_user64.h to better relate to what it contains.
- Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs moved to a separate patch.
- Removed CURRENT_THREAD_INFO macro completely.
- Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is defined.
- Added a patch at the end to rename 'tp' pointers to 'sp' pointers
- Renamed 'tp' into 'sp' pointers in preparation patch when relevant
- Fixed a few commit logs
- Fixed checkpatch report.
Changes since RFC v2:
- Removed the modification of names in asm-offsets
- Created a rule in arch/powerpc/Makefile to append the offset of current->cpu in CFLAGS
- Modified asm/smp.h to use the offset set in CFLAGS
- Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
- Moved the modification of current_pt_regs in the patch activating CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
- Removed the first patch which was modifying header inclusion order in timer
- Modified some names in asm-offsets to avoid conflicts when including asm-offsets in C files
- Modified asm/smp.h to avoid having to include linux/sched.h (using asm-offsets instead)
- Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (9):
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Remove CURRENT_THREAD_INFO
  powerpc: clean stack pointers naming

 arch/powerpc/Kconfig | 1 +
 arch/powerpc/Makefile | 8 ++-
 arch/powerpc/include/asm/asm-prototypes.h | 4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +-
 arch/powerpc/include/asm/exception-64s.h | 4 +-
 arch/powerpc/include/asm/irq.h | 14 ++---
 arch/powerpc/include/asm/livepatch.h | 7 ++-
 arch/powerpc/include/asm/processor.h |
39 + arch/powerpc/include/asm/ptrace.h | 2 +- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/include/asm/smp.h | 17 +- arch/powerpc/include/asm/task_size_user64.h| 42 ++ arch/powerpc/include/asm/thread_info.h | 19 --- arch/powerpc/kernel/asm-offsets.c | 10 ++-- arch/powerpc/kernel/entry_32.S | 66 -- arch/powerpc/kernel/entry_64.S | 12 ++-- arch/powerpc/kernel/epapr_hcalls.S | 5 +- arch/powerpc/kernel/exceptions-64e.S | 13 + arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/head_32.S | 14 ++--- arch/powerpc/kernel/head_40x.S | 4 +- arch/powerpc/kernel/head_44x.S | 8 +-- arch/powerpc/kernel/head_64.S | 1 + arch/powerpc/kernel/head_8xx.S | 2 +- arch/powerpc/kernel/head_booke.h | 12 +--- arch/powerpc/kernel/head_fsl_booke.S | 16 +++--- arch/powerpc/kernel/idle_6xx.S | 8 +-- arch/powerpc/kernel/idle_book3e.S | 2 +- arch/powerpc/kernel/idle_e500.S| 8 +-- arch/powerpc/kernel/idle_power4.S | 2 +- arch/powerpc/kernel/irq.c | 77 +- arch/powerpc/kernel/kgdb.c | 28 --
Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :
> On Mon, 8 Oct 2018 17:39:11 +0200
> Christophe LEROY wrote:
>
>> Hi Nick,
>>
>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
>>> Use nmi_enter similarly to system reset interrupts. This uses NMI
>>> printk NMI buffers and turns off various debugging facilities that
>>> help avoid tripping on ourselves or other CPUs.
>>>
>>> Signed-off-by: Nicholas Piggin
>>> ---
>>>  arch/powerpc/kernel/traps.c | 9 ++---
>>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index 2849c4f50324..6d31f9d7c333 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>
>>>  void machine_check_exception(struct pt_regs *regs)
>>>  {
>>> -	enum ctx_state prev_state = exception_enter();
>>>  	int recover = 0;
>>> +	bool nested = in_nmi();
>>> +	if (!nested)
>>> +		nmi_enter();
>>
>> This alters preempt_count, so when die() is called in_interrupt()
>> returns true although the trap didn't happen in interrupt context, and
>> oops_end() panics with "fatal exception in interrupt" instead of gently
>> sending SIGBUS to the faulting app.
>
> Thanks for tracking that down.
>
>> Any idea on how to fix this ?
>
> I would say we have to deliver the sigbus by hand.
>
> 	if ((user_mode(regs)))
> 		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> 	else
> 		die("Machine check", regs, SIGBUS);

And what about all the other things done by 'die()' ?

And what if it is a kernel thread ?

In one of my boards, I have a kernel thread regularly checking the HW, and if it gets a machine check I expect it to gently stop and the die notification to be delivered to all registered notifiers.

Until before this patch, it was working well.

Christophe
Re: [PATCH] powerpc/xmon/ppc-opc: Use ARRAY_SIZE macro
On Tue, 2018-10-09 at 14:43 +1100, Michael Ellerman wrote: > Joe Perches writes: > > > On Thu, 2018-10-04 at 19:10 +0200, Gustavo A. R. Silva wrote: > > > Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element. > > [] > > > diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c > > [] > > > @@ -966,8 +966,7 @@ const struct powerpc_operand powerpc_operands[] = > > >{ 0xff, 11, NULL, NULL, PPC_OPERAND_SIGNOPT }, > > > }; > > > > > > -const unsigned int num_powerpc_operands = (sizeof (powerpc_operands) > > > -/ sizeof (powerpc_operands[0])); > > > +const unsigned int num_powerpc_operands = ARRAY_SIZE(powerpc_operands); > > > > It seems this is unused and could be deleted. > > The code in this file is copied from binutils. > > We don't want to needlessly diverge it. > > I've said this before: > > > https://lore.kernel.org/linuxppc-dev/874lfxjnzl@concordia.ellerman.id.au/ Don't expect people to remember this. > Is there some way we can blacklist this file from checkpatch, Coccinelle > etc? Modify both to look for some specific tag in a file and then update the scripts to read the file when looking at patches too. Otherwise, no.
Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
On Mon, 8 Oct 2018 17:39:11 +0200
Christophe LEROY wrote:

> Hi Nick,
>
> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :
> > Use nmi_enter similarly to system reset interrupts. This uses NMI
> > printk NMI buffers and turns off various debugging facilities that
> > help avoid tripping on ourselves or other CPUs.
> >
> > Signed-off-by: Nicholas Piggin
> > ---
> >  arch/powerpc/kernel/traps.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 2849c4f50324..6d31f9d7c333 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >
> >  void machine_check_exception(struct pt_regs *regs)
> >  {
> > -	enum ctx_state prev_state = exception_enter();
> >  	int recover = 0;
> > +	bool nested = in_nmi();
> > +	if (!nested)
> > +		nmi_enter();
>
> This alters preempt_count, so when die() is called in_interrupt()
> returns true although the trap didn't happen in interrupt context, and
> oops_end() panics with "fatal exception in interrupt" instead of gently
> sending SIGBUS to the faulting app.

Thanks for tracking that down.

> Any idea on how to fix this ?

I would say we have to deliver the sigbus by hand.

	if ((user_mode(regs)))
		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
	else
		die("Machine check", regs, SIGBUS);
Re: [PATCH] powerpc/xmon/ppc-opc: Use ARRAY_SIZE macro
Joe Perches writes: > On Thu, 2018-10-04 at 19:10 +0200, Gustavo A. R. Silva wrote: >> Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element. > [] >> diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c > [] >> @@ -966,8 +966,7 @@ const struct powerpc_operand powerpc_operands[] = >>{ 0xff, 11, NULL, NULL, PPC_OPERAND_SIGNOPT }, >> }; >> >> -const unsigned int num_powerpc_operands = (sizeof (powerpc_operands) >> - / sizeof (powerpc_operands[0])); >> +const unsigned int num_powerpc_operands = ARRAY_SIZE(powerpc_operands); > > It seems this is unused and could be deleted. The code in this file is copied from binutils. We don't want to needlessly diverge it. I've said this before: https://lore.kernel.org/linuxppc-dev/874lfxjnzl@concordia.ellerman.id.au/ Is there some way we can blacklist this file from checkpatch, Coccinelle etc? cheers
Re: [PATCH] powerpc/xmon/ppc-opc: Use ARRAY_SIZE macro
On Thu, 2018-10-04 at 19:10 +0200, Gustavo A. R. Silva wrote: > Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element. [] > diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c [] > @@ -966,8 +966,7 @@ const struct powerpc_operand powerpc_operands[] = >{ 0xff, 11, NULL, NULL, PPC_OPERAND_SIGNOPT }, > }; > > -const unsigned int num_powerpc_operands = (sizeof (powerpc_operands) > -/ sizeof (powerpc_operands[0])); > +const unsigned int num_powerpc_operands = ARRAY_SIZE(powerpc_operands); It seems this is unused and could be deleted. > /* The functions used to insert and extract complicated operands. */ > > @@ -6980,8 +6979,7 @@ const struct powerpc_opcode powerpc_opcodes[] = { > {"fcfidu.", XRC(63,974,1), XRA_MASK, POWER7|PPCA2, PPCVLE, {FRT, > FRB}}, > }; > > -const int powerpc_num_opcodes = > - sizeof (powerpc_opcodes) / sizeof (powerpc_opcodes[0]); > +const int powerpc_num_opcodes = ARRAY_SIZE(powerpc_opcodes); This is used once and should probably be replaced where it is used with ARRAY_SIZE > /* The VLE opcode table. > > @@ -7219,8 +7217,7 @@ const struct powerpc_opcode vle_opcodes[] = { > {"se_bl",BD8(58,0,1),BD8_MASK, PPCVLE, 0, {B8}}, > }; > > -const int vle_num_opcodes = > - sizeof (vle_opcodes) / sizeof (vle_opcodes[0]); > +const int vle_num_opcodes = ARRAY_SIZE(vle_opcodes); Also apparently unused and could be deleted. > > /* The macro table. This is only used by the assembler. */ > > @@ -7288,5 +7285,4 @@ const struct powerpc_macro powerpc_macros[] = { > {"e_clrlslwi",4, PPCVLE, "e_rlwinm %0,%1,%3,(%2)-(%3),31-(%3)"}, > }; > ld > -const int powerpc_num_macros = > - sizeof (powerpc_macros) / sizeof (powerpc_macros[0]); > +const int powerpc_num_macros = ARRAY_SIZE(powerpc_macros); Also apparently unused and could be deleted.
Re: [PATCH v5 06/33] KVM: PPC: Book3S HV: Simplify real-mode interrupt handling
On Mon, Oct 08, 2018 at 04:30:52PM +1100, Paul Mackerras wrote: > This streamlines the first part of the code that handles a hypervisor > interrupt that occurred in the guest. With this, all of the real-mode > handling that occurs is done before the "guest_exit_cont" label; once > we get to that label we are committed to exiting to host virtual mode. > Thus the machine check and HMI real-mode handling is moved before that > label. > > Also, the code to handle external interrupts is moved out of line, as > is the code that calls kvmppc_realmode_hmi_handler(). > > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > arch/powerpc/kvm/book3s_hv_ras.c| 8 ++ > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 220 > > 2 files changed, 119 insertions(+), 109 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_ras.c > b/arch/powerpc/kvm/book3s_hv_ras.c > index b11043b..ee564b6 100644 > --- a/arch/powerpc/kvm/book3s_hv_ras.c > +++ b/arch/powerpc/kvm/book3s_hv_ras.c > @@ -331,5 +331,13 @@ long kvmppc_realmode_hmi_handler(void) > } else { > wait_for_tb_resync(); > } > + > + /* > + * Reset tb_offset_applied so the guest exit code won't try > + * to subtract the previous timebase offset from the timebase. 
> + */ > + if (local_paca->kvm_hstate.kvm_vcore) > + local_paca->kvm_hstate.kvm_vcore->tb_offset_applied = 0; > + > return 0; > } > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index 5b2ae34..fc360b5 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -1018,8 +1018,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) > no_xive: > #endif /* CONFIG_KVM_XICS */ > > -deliver_guest_interrupt: > -kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */ > +deliver_guest_interrupt: /* r4 = vcpu, r13 = paca */ > /* Check if we can deliver an external or decrementer interrupt now */ > ld r0, VCPU_PENDING_EXC(r4) > BEGIN_FTR_SECTION > @@ -1269,18 +1268,26 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > std r3, VCPU_CTR(r9) > std r4, VCPU_XER(r9) > > -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM > - /* For softpatch interrupt, go off and do TM instruction emulation */ > - cmpwi r12, BOOK3S_INTERRUPT_HV_SOFTPATCH > - beq kvmppc_tm_emul > -#endif > + /* Save more register state */ > + mfdar r3 > + mfdsisr r4 > + std r3, VCPU_DAR(r9) > + stw r4, VCPU_DSISR(r9) > > /* If this is a page table miss then see if it's theirs or ours */ > cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE > beq kvmppc_hdsi > + std r3, VCPU_FAULT_DAR(r9) > + stw r4, VCPU_FAULT_DSISR(r9) > cmpwi r12, BOOK3S_INTERRUPT_H_INST_STORAGE > beq kvmppc_hisi > > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM > + /* For softpatch interrupt, go off and do TM instruction emulation */ > + cmpwi r12, BOOK3S_INTERRUPT_HV_SOFTPATCH > + beq kvmppc_tm_emul > +#endif > + > /* See if this is a leftover HDEC interrupt */ > cmpwi r12,BOOK3S_INTERRUPT_HV_DECREMENTER > bne 2f > @@ -1303,7 +1310,7 @@ BEGIN_FTR_SECTION > END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > lbz r0, HSTATE_HOST_IPI(r13) > cmpwi r0, 0 > - beq 4f > + beq maybe_reenter_guest > b guest_exit_cont > 3: > /* If it's a hypervisor facility unavailable interrupt, save HFSCR */ > @@ -1315,82 +1322,16 @@ 
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > 14: > /* External interrupt ? */ > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > - bne+guest_exit_cont > - > - /* External interrupt, first check for host_ipi. If this is > - * set, we know the host wants us out so let's do it now > - */ > - bl kvmppc_read_intr > - > - /* > - * Restore the active volatile registers after returning from > - * a C function. > - */ > - ld r9, HSTATE_KVM_VCPU(r13) > - li r12, BOOK3S_INTERRUPT_EXTERNAL > - > - /* > - * kvmppc_read_intr return codes: > - * > - * Exit to host (r3 > 0) > - * 1 An interrupt is pending that needs to be handled by the host > - * Exit guest and return to host by branching to guest_exit_cont > - * > - * 2 Passthrough that needs completion in the host > - * Exit guest and return to host by branching to guest_exit_cont > - * However, we also set r12 to BOOK3S_INTERRUPT_HV_RM_HARD > - * to indicate to the host to complete handling the interrupt > - * > - * Before returning to guest, we check if any CPU is heading out > - * to the host and if so, we head out also. If no CPUs are heading > - * check return values <= 0. > - * > - * Return to guest (r3 <= 0) > - * 0 No external interrupt is pending >
Re: [PATCH v5 22/33] KVM: PPC: Book3S HV: Introduce rmap to track nested guest mappings
On Mon, Oct 08, 2018 at 04:31:08PM +1100, Paul Mackerras wrote: > From: Suraj Jitindar Singh > > When a host (L0) page which is mapped into a (L1) guest is in turn > mapped through to a nested (L2) guest we keep a reverse mapping (rmap) > so that these mappings can be retrieved later. > > Whenever we create an entry in a shadow_pgtable for a nested guest we > create a corresponding rmap entry and add it to the list for the > L1 guest memslot at the index of the L1 guest page it maps. This means > at the L1 guest memslot we end up with lists of rmaps. > > When we are notified of a host page being invalidated which has been > mapped through to a (L1) guest, we can then walk the rmap list for that > guest page, and find and invalidate all of the corresponding > shadow_pgtable entries. > > In order to reduce memory consumption, we compress the information for > each rmap entry down to 52 bits -- 12 bits for the LPID and 40 bits > for the guest real page frame number -- which will fit in a single > unsigned long. To avoid a scenario where a guest can trigger > unbounded memory allocations, we scan the list when adding an entry to > see if there is already an entry with the contents we need. This can > occur, because we don't ever remove entries from the middle of a list. > > A struct nested guest rmap is a list pointer and an rmap entry; > > | next pointer | > > | rmap entry | > > > Thus the rmap pointer for each guest frame number in the memslot can be > either NULL, a single entry, or a pointer to a list of nested rmap entries. > > gfnmemslot rmap array > - > 0| NULL | (no rmap entry) > - > 1| single rmap entry | (rmap entry with low bit set) > - > 2| list head pointer | (list of rmap entries) > - > > The final entry always has the lowest bit set and is stored in the next > pointer of the last list entry, or as a single rmap entry. 
> With a list of rmap entries looking like; > > - - - > | list head ptr | > | next pointer | > | single rmap entry > | > - - - > | rmap entry| | rmap entry| > - - > > Signed-off-by: Suraj Jitindar Singh > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/kvm_book3s.h| 3 + > arch/powerpc/include/asm/kvm_book3s_64.h | 69 +++- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 44 +++--- > arch/powerpc/kvm/book3s_hv.c | 1 + > arch/powerpc/kvm/book3s_hv_nested.c | 138 > ++- > 5 files changed, 240 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index 63f7ccf..d7aeb6f 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -196,6 +196,9 @@ extern int kvmppc_mmu_radix_translate_table(struct > kvm_vcpu *vcpu, gva_t eaddr, > int table_index, u64 *pte_ret_p); > extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, > struct kvmppc_pte *gpte, bool data, bool iswrite); > +extern void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa, > + unsigned int shift, struct kvm_memory_slot *memslot, > + unsigned int lpid); > extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable, > bool writing, unsigned long gpa, > unsigned int lpid); > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h > b/arch/powerpc/include/asm/kvm_book3s_64.h > index 5496152..c2a9146 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_64.h > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h > @@ -53,6 +53,66 @@ struct kvm_nested_guest { > struct kvm_nested_guest *next; > }; > > +/* > + * We define a nested rmap entry as a single 64-bit quantity > + * 0xFFF012-bit lpid field > + * 0x000FF00040-bit guest 4k page frame number > + * 0x00011-bit single entry flag > + */ > +#define RMAP_NESTED_LPID_MASK0xFFF0UL > +#define RMAP_NESTED_LPID_SHIFT (52) > +#define RMAP_NESTED_GPA_MASK 0x000FF000UL > +#define 
RMAP_NESTED_IS_SINGLE_ENTRY 0x0001UL > + > +/* Structure for a nested guest rmap entry */ > +struct rmap_nested { > + struct llist_node list; > + u64 rmap; > +}; > + > +/* > + * for_each_nest_rmap_safe - iterate over the list of nested rmap entries > + *
Re: [PATCH v5 17/33] KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
On Mon, Oct 08, 2018 at 04:31:03PM +1100, Paul Mackerras wrote: > This starts the process of adding the code to support nested HV-style > virtualization. It defines a new H_SET_PARTITION_TABLE hypercall which > a nested hypervisor can use to set the base address and size of a > partition table in its memory (analogous to the PTCR register). > On the host (level 0 hypervisor) side, the H_SET_PARTITION_TABLE > hypercall from the guest is handled by code that saves the virtual > PTCR value for the guest. > > This also adds code for creating and destroying nested guests and for > reading the partition table entry for a nested guest from L1 memory. > Each nested guest has its own shadow LPID value, different in general > from the LPID value used by the nested hypervisor to refer to it. The > shadow LPID value is allocated at nested guest creation time. > > Nested hypervisor functionality is only available for a radix guest, > which therefore means a radix host on a POWER9 (or later) processor. 
> > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/hvcall.h | 5 + > arch/powerpc/include/asm/kvm_book3s.h | 10 +- > arch/powerpc/include/asm/kvm_book3s_64.h | 33 > arch/powerpc/include/asm/kvm_book3s_asm.h | 3 + > arch/powerpc/include/asm/kvm_host.h | 5 + > arch/powerpc/kvm/Makefile | 3 +- > arch/powerpc/kvm/book3s_hv.c | 31 ++- > arch/powerpc/kvm/book3s_hv_nested.c | 301 > ++ > 8 files changed, 384 insertions(+), 7 deletions(-) > create mode 100644 arch/powerpc/kvm/book3s_hv_nested.c > > diff --git a/arch/powerpc/include/asm/hvcall.h > b/arch/powerpc/include/asm/hvcall.h > index a0b17f9..c95c651 100644 > --- a/arch/powerpc/include/asm/hvcall.h > +++ b/arch/powerpc/include/asm/hvcall.h > @@ -322,6 +322,11 @@ > #define H_GET_24X7_DATA 0xF07C > #define H_GET_PERF_COUNTER_INFO 0xF080 > > +/* Platform-specific hcalls used for nested HV KVM */ > +#define H_SET_PARTITION_TABLE0xF800 > +#define H_ENTER_NESTED 0xF804 > +#define H_TLB_INVALIDATE 0xF808 > + > /* Values for 2nd argument to H_SET_MODE */ > #define H_SET_MODE_RESOURCE_SET_CIABR1 > #define H_SET_MODE_RESOURCE_SET_DAWR 2 > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index 91c9779..43f212e 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -274,6 +274,13 @@ static inline void kvmppc_save_tm_sprs(struct kvm_vcpu > *vcpu) {} > static inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu) {} > #endif > > +long kvmhv_nested_init(void); > +void kvmhv_nested_exit(void); > +void kvmhv_vm_nested_init(struct kvm *kvm); > +long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); > +void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); > +void kvmhv_release_all_nested(struct kvm *kvm); > + > void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); > > extern int kvm_irq_bypass; > @@ -387,9 +394,6 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu > *vcpu); > /* TO 
= 31 for unconditional trap */ > #define INS_TW 0x7fe8 > > -/* LPIDs we support with this build -- runtime limit may be lower */ > -#define KVMPPC_NR_LPIDS (LPID_RSVD + 1) > - > #define SPLIT_HACK_MASK 0xff00 > #define SPLIT_HACK_OFFS 0xfb00 > > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h > b/arch/powerpc/include/asm/kvm_book3s_64.h > index 5c0e2d9..6d67b6a 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_64.h > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h > @@ -23,6 +23,39 @@ > #include > #include > #include > +#include > + > +#ifdef CONFIG_PPC_PSERIES > +static inline bool kvmhv_on_pseries(void) > +{ > + return !cpu_has_feature(CPU_FTR_HVMODE); > +} > +#else > +static inline bool kvmhv_on_pseries(void) > +{ > + return false; > +} > +#endif > + > +/* > + * Structure for a nested guest, that is, for a guest that is managed by > + * one of our guests. > + */ > +struct kvm_nested_guest { > + struct kvm *l1_host;/* L1 VM that owns this nested guest */ > + int l1_lpid;/* lpid L1 guest thinks this guest is */ > + int shadow_lpid;/* real lpid of this nested guest */ > + pgd_t *shadow_pgtable; /* our page table for this guest */ > + u64 l1_gr_to_hr;/* L1's addr of part'n-scoped table */ > + u64 process_table; /* process table entry for this guest */ > + long refcnt;/* number of pointers to this struct */ > + struct mutex tlb_lock; /* serialize page faults and tlbies */ > + struct kvm_nested_guest *next; > +}; > + > +struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm,
Re: [PATCH v5 33/33] KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result
On Mon, Oct 08, 2018 at 04:31:19PM +1100, Paul Mackerras wrote: > This adds a KVM_PPC_NO_HASH flag to the flags field of the > kvm_ppc_smmu_info struct, and arranges for it to be set when > running as a nested hypervisor, as an unambiguous indication > to userspace that HPT guests are not supported. Reporting the > KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as > indicating only that the new HPT features in ISA V3.0 are not > supported, leaving it ambiguous whether pre-V3.0 HPT features > are supported. > > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > Documentation/virtual/kvm/api.txt | 4 > arch/powerpc/kvm/book3s_hv.c | 4 > include/uapi/linux/kvm.h | 1 + > 3 files changed, 9 insertions(+) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index fde48b6..df98b63 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2270,6 +2270,10 @@ The supported flags are: > The emulated MMU supports 1T segments in addition to the > standard 256M ones. > > +- KVM_PPC_NO_HASH > + This flag indicates that HPT guests are not supported by KVM, > + thus all guests must use radix MMU mode. 
> + > The "slb_size" field indicates how many SLB entries are supported > > The "sps" array contains 8 entries indicating the supported base > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index fa61647..f565403 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -4245,6 +4245,10 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm > *kvm, > kvmppc_add_seg_page_size(, 16, SLB_VSID_L | SLB_VSID_LP_01); > kvmppc_add_seg_page_size(, 24, SLB_VSID_L); > > + /* If running as a nested hypervisor, we don't support HPT guests */ > + if (kvmhv_on_pseries()) > + info->flags |= KVM_PPC_NO_HASH; > + > return 0; > } > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index d9cec6b..7f2ff3a 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -719,6 +719,7 @@ struct kvm_ppc_one_seg_page_size { > > #define KVM_PPC_PAGE_SIZES_REAL 0x0001 > #define KVM_PPC_1T_SEGMENTS 0x0002 > +#define KVM_PPC_NO_HASH 0x0004 > > struct kvm_ppc_smmu_info { > __u64 flags; -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH v5 09/33] KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests
On Mon, Oct 08, 2018 at 04:30:55PM +1100, Paul Mackerras wrote: > This creates an alternative guest entry/exit path which is used for > radix guests on POWER9 systems when we have indep_threads_mode=Y. In > these circumstances there is exactly one vcpu per vcore and there is > no coordination required between vcpus or vcores; the vcpu can enter > the guest without needing to synchronize with anything else. > > The new fast path is implemented almost entirely in C in book3s_hv.c > and runs with the MMU on until the guest is entered. On guest exit > we use the existing path until the point where we are committed to > exiting the guest (as distinct from handling an interrupt in the > low-level code and returning to the guest) and we have pulled the > guest context from the XIVE. At that point we check a flag in the > stack frame to see whether we came in via the old path or the new > path; if we came in via the new path then we go back to C code to do > the rest of the process of saving the guest context and restoring the > host context. > > The C code is split into separate functions for handling the > OS-accessible state and the hypervisor state, with the idea that the > latter can be replaced by a hypercall when we implement nested > virtualization. 
> > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/asm-prototypes.h | 2 + > arch/powerpc/include/asm/kvm_ppc.h| 2 + > arch/powerpc/kvm/book3s_hv.c | 429 > +- > arch/powerpc/kvm/book3s_hv_ras.c | 2 + > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 95 ++- > arch/powerpc/kvm/book3s_xive.c| 63 + > 6 files changed, 589 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/include/asm/asm-prototypes.h > b/arch/powerpc/include/asm/asm-prototypes.h > index 0c1a2b0..5c9b00c 100644 > --- a/arch/powerpc/include/asm/asm-prototypes.h > +++ b/arch/powerpc/include/asm/asm-prototypes.h > @@ -165,4 +165,6 @@ void kvmhv_load_host_pmu(void); > void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use); > void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu); > > +int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu); > + > #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */ > diff --git a/arch/powerpc/include/asm/kvm_ppc.h > b/arch/powerpc/include/asm/kvm_ppc.h > index 83d61b8..245e564 100644 > --- a/arch/powerpc/include/asm/kvm_ppc.h > +++ b/arch/powerpc/include/asm/kvm_ppc.h > @@ -585,6 +585,7 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 > icpval); > > extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, > int level, bool line_status); > +extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu); > #else > static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server, > u32 priority) { return -1; } > @@ -607,6 +608,7 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu > *vcpu, u64 icpval) { retur > > static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, > u32 irq, > int level, bool line_status) { return > -ENODEV; } > +static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { } > #endif /* CONFIG_KVM_XIVE */ > > /* > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 0e17593..0c1dd76 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ 
b/arch/powerpc/kvm/book3s_hv.c > @@ -3080,6 +3080,269 @@ static noinline void kvmppc_run_core(struct > kvmppc_vcore *vc) > } > > /* > + * Load up hypervisor-mode registers on P9. > + */ > +static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit) > +{ > + struct kvmppc_vcore *vc = vcpu->arch.vcore; > + s64 hdec; > + u64 tb, purr, spurr; > + int trap; > + unsigned long host_hfscr = mfspr(SPRN_HFSCR); > + unsigned long host_ciabr = mfspr(SPRN_CIABR); > + unsigned long host_dawr = mfspr(SPRN_DAWR); > + unsigned long host_dawrx = mfspr(SPRN_DAWRX); > + unsigned long host_psscr = mfspr(SPRN_PSSCR); > + unsigned long host_pidr = mfspr(SPRN_PID); > + > + hdec = time_limit - mftb(); > + if (hdec < 0) > + return BOOK3S_INTERRUPT_HV_DECREMENTER; > + mtspr(SPRN_HDEC, hdec); > + > + if (vc->tb_offset) { > + u64 new_tb = mftb() + vc->tb_offset; > + mtspr(SPRN_TBU40, new_tb); > + tb = mftb(); > + if ((tb & 0xff) < (new_tb & 0xff)) > + mtspr(SPRN_TBU40, new_tb + 0x100); > + vc->tb_offset_applied = vc->tb_offset; > + } > + > + if (vc->pcr) > + mtspr(SPRN_PCR, vc->pcr); > + mtspr(SPRN_DPDES, vc->dpdes); > + mtspr(SPRN_VTB, vc->vtb); > + > + local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR); > + local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR); > + mtspr(SPRN_PURR,
Re: [PATCH v5 32/33] KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization
On Mon, Oct 08, 2018 at 04:31:18PM +1100, Paul Mackerras wrote: > With this, userspace can enable a KVM-HV guest to run nested guests > under it. > > The administrator can control whether any nested guests can be run; > setting the "nested" module parameter to false prevents any guests > becoming nested hypervisors (that is, any attempt to enable the nested > capability on a guest will fail). Guests which are already nested > hypervisors will continue to be so. > > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > Documentation/virtual/kvm/api.txt | 14 ++ > arch/powerpc/include/asm/kvm_ppc.h | 1 + > arch/powerpc/kvm/book3s_hv.c | 39 > +- > arch/powerpc/kvm/powerpc.c | 12 > include/uapi/linux/kvm.h | 1 + > 5 files changed, 58 insertions(+), 9 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 2f5f9b7..fde48b6 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -4532,6 +4532,20 @@ With this capability, a guest may read the > MSR_PLATFORM_INFO MSR. Otherwise, > a #GP would be raised when the guest tries to access. Currently, this > capability does not enable write permissions of this MSR for the guest. > > +7.16 KVM_CAP_PPC_NESTED_HV > + > +Architectures: ppc > +Parameters: none > +Returns: 0 on success, -EINVAL when the implementation doesn't support > + nested-HV virtualization. > + > +HV-KVM on POWER9 and later systems allows for "nested-HV" > +virtualization, which provides a way for a guest VM to run guests that > +can run using the CPU's supervisor mode (privileged non-hypervisor > +state). Enabling this capability on a VM depends on the CPU having > +the necessary functionality and on the facility being enabled with a > +kvm-hv module parameter. > + > 8. Other capabilities. 
> -- > > diff --git a/arch/powerpc/include/asm/kvm_ppc.h > b/arch/powerpc/include/asm/kvm_ppc.h > index 245e564..b3796bd 100644 > --- a/arch/powerpc/include/asm/kvm_ppc.h > +++ b/arch/powerpc/include/asm/kvm_ppc.h > @@ -327,6 +327,7 @@ struct kvmppc_ops { > int (*set_smt_mode)(struct kvm *kvm, unsigned long mode, > unsigned long flags); > void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); > + int (*enable_nested)(struct kvm *kvm); > }; > > extern struct kvmppc_ops *kvmppc_hv_ops; > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 152bf75..fa61647 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -118,6 +118,16 @@ module_param_cb(h_ipi_redirect, _param_ops, > _ipi_redirect, 0644); > MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host > core"); > #endif > > +/* If set, guests are allowed to create and control nested guests */ > +static bool nested = true; > +module_param(nested, bool, S_IRUGO | S_IWUSR); > +MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)"); > + > +static inline bool nesting_enabled(struct kvm *kvm) > +{ > + return kvm->arch.nested_enable && kvm_is_radix(kvm); > +} > + > /* If set, the threads on each CPU core have to be in the same MMU mode */ > static bool no_mixing_hpt_and_radix; > > @@ -959,12 +969,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > > case H_SET_PARTITION_TABLE: > ret = H_FUNCTION; > - if (vcpu->kvm->arch.nested_enable) > + if (nesting_enabled(vcpu->kvm)) > ret = kvmhv_set_partition_table(vcpu); > break; > case H_ENTER_NESTED: > ret = H_FUNCTION; > - if (!vcpu->kvm->arch.nested_enable) > + if (!nesting_enabled(vcpu->kvm)) > break; > ret = kvmhv_enter_nested_guest(vcpu); > if (ret == H_INTERRUPT) { > @@ -974,9 +984,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > break; > case H_TLB_INVALIDATE: > ret = H_FUNCTION; > - if (!vcpu->kvm->arch.nested_enable) > - break; > - ret = kvmhv_do_nested_tlbie(vcpu); > 
+ if (nesting_enabled(vcpu->kvm)) > + ret = kvmhv_do_nested_tlbie(vcpu); > break; > > default: > @@ -4496,10 +4505,8 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu > *vcpu) > /* Must be called with kvm->lock held and mmu_ready = 0 and no vcpus running > */ > int kvmppc_switch_mmu_to_hpt(struct kvm *kvm) > { > - if (kvm->arch.nested_enable) { > - kvm->arch.nested_enable = false; > + if (nesting_enabled(kvm)) > kvmhv_release_all_nested(kvm); > - } > kvmppc_free_radix(kvm); > kvmppc_update_lpcr(kvm, LPCR_VPM1, > LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR); > @@ -4776,7
Re: [PATCH v5 30/33] KVM: PPC: Book3S HV: Allow HV module to load without hypervisor mode
On Mon, Oct 08, 2018 at 04:31:16PM +1100, Paul Mackerras wrote: > With this, the KVM-HV module can be loaded in a guest running under > KVM-HV, and if the hypervisor supports nested virtualization, this > guest can now act as a nested hypervisor and run nested guests. > > This also adds some checks to inform userspace that HPT guests are not > supported by nested hypervisors (by returning false for the > KVM_CAP_PPC_MMU_HASH_V3 capability), and to prevent userspace from > configuring a guest to use HPT mode. > > Signed-off-by: Paul Mackerras Reviewed-by: David Gibson > --- > arch/powerpc/kvm/book3s_hv.c | 16 > arch/powerpc/kvm/powerpc.c | 3 ++- > 2 files changed, 14 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 127bb5f..152bf75 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -4807,11 +4807,15 @@ static int kvmppc_core_emulate_mfspr_hv(struct > kvm_vcpu *vcpu, int sprn, > > static int kvmppc_core_check_processor_compat_hv(void) > { > - if (!cpu_has_feature(CPU_FTR_HVMODE) || > - !cpu_has_feature(CPU_FTR_ARCH_206)) > - return -EIO; > + if (cpu_has_feature(CPU_FTR_HVMODE) && > + cpu_has_feature(CPU_FTR_ARCH_206)) > + return 0; > > - return 0; > + /* POWER9 in radix mode is capable of being a nested hypervisor. 
*/ > + if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled()) > + return 0; > + > + return -EIO; > } > > #ifdef CONFIG_KVM_XICS > @@ -5129,6 +5133,10 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct > kvm_ppc_mmuv3_cfg *cfg) > if (radix && !radix_enabled()) > return -EINVAL; > > + /* If we're a nested hypervisor, we currently only support radix */ > + if (kvmhv_on_pseries() && !radix) > + return -EINVAL; > + > mutex_lock(>lock); > if (radix != kvm_is_radix(kvm)) { > if (kvm->arch.mmu_ready) { > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index eba5756..1f4b128 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -594,7 +594,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long > ext) > r = !!(hv_enabled && radix_enabled()); > break; > case KVM_CAP_PPC_MMU_HASH_V3: > - r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300)); > + r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) && > +cpu_has_feature(CPU_FTR_HVMODE)); > break; > #endif > case KVM_CAP_SYNC_MMU: -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells
On 10/08/18 11:46, Alan Tull wrote: > On Mon, Oct 8, 2018 at 10:57 AM Alan Tull wrote: >> >> On Thu, Oct 4, 2018 at 11:14 PM wrote: >>> >>> From: Frank Rowand >>> >>> If overlay properties #address-cells or #size-cells are already in >>> the live devicetree for any given node, then the values in the >>> overlay must match the values in the live tree. >> >> Hi Frank, >> >> I'm starting some FPGA testing on this patchset applied to v4.19-rc7. >> That applied cleanly; if that's not the best base to test against, >> please let me know. I would expect -rc7 to be ok to test against. I'm doing the development of it on -rc1. Thanks for the testing. >> On a very simple overlay, I'm seeing this patch's warning catching >> things other than #address-cells or #size-cells. #address-cells and #size-cells escape the warning for properties on an existing (non-overlay) node if the existing node already contains them as a special case. Those two properties are needed in the overlay to avoid dtc compiler warnings. If the same properties already exist in the base devicetree and have the same values as in the overlay then there is no need to add property update changeset entries in the overlay changeset. Since there will not be changeset entries for those two properties, there will be no memory leak when the changeset is removed. The special casing of #address-cells and #size-cells is part of the fix patches that are a result of the validation patches. Thus a little bit less memory leaking than we have today. > What it's warning about are new properties being added to an existing > node. So !prop is true and !of_node_check_flag(target->np, > OF_OVERLAY) also is true. Is that a potential memory leak as you are > warning? If so, your code is working as planned and you'll just need > to document that also in the header. Yes, you are accurately describing what the check is catching. 
The memory leak (on release) is because the memory allocated for overlay properties is released when the reference count of the node they are attached to is decremented to zero, but only if the node is a dynamic flagged node (as overlays are). The memory allocated for the overlay properties will not be freed in this case because the node is not a dynamic node. >> I'm just getting >> started looking at this, will spend time understanding this better and >> I'll test other overlays. The warnings were: >> >> Applying dtbo: socfpga_overlay.dtb >> [ 33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera >> SOCFPGA FPGA Manager >> [ 33.575223] OF: overlay: WARNING: add_changeset_property(), memory >> leak will occur if overlay removed. Property: >> /soc/base-fpga-region/firmware-name >> [ 33.588584] OF: overlay: WARNING: add_changeset_property(), memory >> leak will occur if overlay removed. Property: >> /soc/base-fpga-region/fpga-bridges >> [ 33.601856] OF: overlay: WARNING: add_changeset_property(), memory >> leak will occur if overlay removed. Property: >> /soc/base-fpga-region/ranges Are there properties in /soc/base-fpga-region/ in the base devicetree? If not, then that node could be removed from the base devicetree and first created in an overlay. If so, is it possible to add an additional level of node, /soc/base-fpga-region/foo, which would contain the properties that are warned about above? Then the properties would be children of an overlay node and the memory would be freed on overlay release. This is not actually a suggestion that should be implemented right now, just trying to understand the possible alternatives, because this would result in an arbitrary fake level in the tree (which I don't like). My intent is to leave these validation checks as warnings while we figure out the best way to solve the underlying memory leak issue. Note that some of the validation checks result in errors and cause an overlay apply to fail. 
If I did those checks correctly, they should only catch cases where the live tree after applying the overlay was a "corrupt" tree instead of the desired changes. I expect that Plumbers will be a good place to explore these things. >> Here's part of that overlay including the properties it's complaining about: >> >> /dts-v1/; >> /plugin/; >> / { >> fragment@0 { >> target = <_fpga_region>; >> #address-cells = <1>; >> #size-cells = <1>; >> __overlay__ { >> #address-cells = <1>; >> #size-cells = <1>; >> >> firmware-name = "soc_system.rbf"; >> fpga-bridges = <_bridge1>; >> ranges = <0x2 0xff20 0x10>, >> <0x0 0xc000 0x2000>; >> >> gpio@10040 { >> so on... >> >> By the way, I didn't get any warnings when I subsequently removed this >> overlay. Yes, I did not add any check that could catch this at release time. -Frank
Re: Looking for architecture papers
Hi Raz, On 10/04/2018 04:41 AM, Raz wrote: Frankly, the more I read the more perplexed I get. For example, according to BOOK III-S, chapter 3, the MSR bits differ from the ones described in arch/powerpc/include/asm/reg.h. Bit zero is LE, but in the book it is 64-bit mode. Would someone be so kind as to explain what I do not understand? Yes, I know that can be confusing at first sight when one is used to, for instance, x86. x86 documents use LSB 0 notation, which means (as others already pointed out) that the least significant bit of a value is marked as being bit 0. On the other hand, Power documents use MSB 0 notation, which means that the most significant bit of a value is marked as being bit 0, and as a consequence the least significant bit in that notation on a 64-bit platform is bit 63, not bit 0. MSB 0 notation is also known as IBM bit notation/bit numbering. Historically, LSB 0 notation tends to be used in docs about little-endian architectures (for instance, x86), whilst MSB 0 notation tends to be used in docs about big-endian architectures (for instance, Power - Power is actually a little different because it's now bi-endian). However, LSB 0 and MSB 0 are only different notations, so LSB 0 can be employed in big-endian architecture documentation, and vice versa. It happens that kernel code is written in C, and for shifts, etc., the LSB 0 notation is more convenient than the MSB 0 notation, so it's convenient to use LSB 0 notation when creating a mask, like in arch/powerpc/include/asm/reg.h, i.e. it's convenient to employ bit positions as '63 - '. So, as another example, the following gcc macro '_TEXASR_EXTRACT_BITS' takes a bit position 'BITNUM' as found in the PowerISA documentation, but then for the shift right it uses '63 - BITNUM': https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/htmintrin.h#L44-L45 I think it's also important to mention that on PowerISA the elements also follow the MSB 0 notation. 
So when byte, word, and dword elements in a register are referred to as element 0 in the instruction descriptions, they are the elements "at the left tip", i.e. "the most significant elements", so to speak. For instance, take the instruction "vperm": the doc says 'index' takes bits 3:7 of a byte from [byte] element 'i'. So for byte element i=0 it means the most significant byte ("on the left tip") of vector register operand 'VRC'. Moreover, the specified bits in that byte element, i.e. bits 3:7, also follow MSB 0 notation, so to a reader used to little-endian conventions they are bits 4:0 (LSB 0 notation). Now, if bits 4:0 = 0b00011 (decimal 3), we grab byte element 3 from 'src' (256-bit). However, byte element 3 is also in MSB 0 notation, so it means the third byte of 'src', counting bytes from 0 from left to right (which IMO looks indeed more natural, since we count, for instance, the Natural Numbers similarly along the 'x' axis). Hence, it's as if the 'vperm' instruction in a certain sense has "big-endian semantics" for the byte indices. The 'vpermr' instruction introduced by PowerISA v3.0 is meant to cope with that, so 'vpermr' byte indices have "little-endian semantics": for bits 3:7 MSB 0 (or bits 4:0 in LSB 0 notation) = 0b00011 (decimal 3), the 'vpermr' instruction really means we must count bytes starting from right to left, as in the LSB 0 notation, and grab the third byte element from right to left. So, for instance:

vr0 uint128 = 0x
vr1 uint128 = 0x00102030405060708090a0b0c0d0e0f0
vr2 uint128 = 0x0111223344556677aabbccddeeff
vr3 uint128 = 0x0300

we have 'src' as:

MSB 0: v--- byte 0, 1, 2, 3, ...
LSB 0: ... 3, 2, 1, byte 0 ---v
src = vr1 || vr2 = 00 10 20 30 40 50 60 70 80 90 A0 B0 C0 D0 E0 F0 01 11 22 33 44 55 66 77 99 99 AA BB CC DD EE FF

vperm vr0, vr1, vr2, vr3

result is:

vr0 uint128 = 0x3000
byte 3 in MSB 0 = 0x30 ---^
and 0x00 (byte 0 in MSB 0) copied to the remaining bytes

whilst with vpermr (PowerISA v3.0 / POWER9):

vpermr vr0, vr1, vr2, vr3

result is:

vr0 uint128 = 0xccff
byte 3 in LSB 0 = 0xCC ---^
and 0xFF (byte 0 in LSB 0) copied to the remaining bytes

Anyway, vperm/vpermr was just an example showing that the notation is not restricted to bits on Power ISA. So read the docs carefully :) GDB is always useful for checking if one's understanding of a given Power instruction is correct. HTH. Regards, Gustavo
Re: Looking for architecture papers
On Mon, Oct 08, 2018 at 07:44:12PM +0300, Raz wrote: > Both systemsim and my powerpc server boot with MSR_HV=1, i.e., hypervisor > state. > Is there a way to fix that? Writing to the MSR cannot work according > to the documentation (and reality). But that is what you do: you write HV=0 in MSR. After doing other setup, of course. On some hardware you cannot set HV=0. You cannot do logical partitioning on such hardware. PowerMac G5 comes to mind. Segher
[PATCH 4.18 125/168] sched/topology: Set correct NUMA topology type
4.18-stable review patch. If anyone has any objections, please let me know. -- From: Srikar Dronamraju [ Upstream commit e5e96fafd9028b1478b165db78c52d981c14f471 ] With the following commit: 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain") the scheduler introduced a new NUMA level. However this leads to the NUMA topology on 2 node systems to not be marked as NUMA_DIRECT anymore. After this commit, it gets reported as NUMA_BACKPLANE, because sched_domains_numa_level is now 2 on 2 node systems. Fix this by allowing setting systems that have up to 2 NUMA levels as NUMA_DIRECT. While here remove code that assumes that level can be 0. Signed-off-by: Srikar Dronamraju Signed-off-by: Peter Zijlstra (Intel) Cc: Andre Wild Cc: Heiko Carstens Cc: Linus Torvalds Cc: Mel Gorman Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Rik van Riel Cc: Suravee Suthikulpanit Cc: Thomas Gleixner Cc: linuxppc-dev Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain" Link: http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-sri...@linux.vnet.ibm.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/sched/topology.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1295,7 +1295,7 @@ static void init_numa_topology_type(void n = sched_max_numa_distance; - if (sched_domains_numa_levels <= 1) { + if (sched_domains_numa_levels <= 2) { sched_numa_topology_type = NUMA_DIRECT; return; } @@ -1380,9 +1380,6 @@ void sched_init_numa(void) break; } - if (!level) - return; - /* * 'level' contains the number of unique distances *
Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells
On Mon, Oct 8, 2018 at 10:57 AM Alan Tull wrote: > > On Thu, Oct 4, 2018 at 11:14 PM wrote: > > > > From: Frank Rowand > > > > If overlay properties #address-cells or #size-cells are already in > > the live devicetree for any given node, then the values in the > > overlay must match the values in the live tree. > > Hi Frank, > > I'm starting some FPGA testing on this patchset applied to v4.19-rc7. > That applied cleanly; if that's not the best base to test against, > please let me know. > > On a very simple overlay, I'm seeing this patch's warning catching > things other than #address-cells or #size-cells. What it's warning about are new properties being added to an existing node. So !prop is true and !of_node_check_flag(target->np, OF_OVERLAY) also is true. Is that a potential memory leak as you are warning? If so, your code is working as planned and you'll just need to document that also in the header. > I'm just getting > started looking at this, will spend time understanding this better and > I'll test other overlays. The warnings were: > > Applying dtbo: socfpga_overlay.dtb > [ 33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera > SOCFPGA FPGA Manager > [ 33.575223] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /soc/base-fpga-region/firmware-name > [ 33.588584] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /soc/base-fpga-region/fpga-bridges > [ 33.601856] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. 
Property: > /soc/base-fpga-region/ranges > > Here's part of that overlay including the properties it's complaining about: > > /dts-v1/; > /plugin/; > / { > fragment@0 { > target = <_fpga_region>; > #address-cells = <1>; > #size-cells = <1>; > __overlay__ { > #address-cells = <1>; > #size-cells = <1>; > > firmware-name = "soc_system.rbf"; > fpga-bridges = <_bridge1>; > ranges = <0x2 0xff20 0x10>, > <0x0 0xc000 0x2000>; > > gpio@10040 { > so on... > > By the way, I didn't get any warnings when I subsequently removed this > overlay. > > Alan > > > > > If the properties are already in the live tree then there is no > > need to create a changeset entry to add them since they must > > have the same value. This reduces the memory used by the > > changeset and eliminates a possible memory leak. This is > > verified by 12 fewer warnings during the devicetree unittest, > > as the possible memory leak warnings about #address-cells and > > > > Signed-off-by: Frank Rowand > > --- > > drivers/of/overlay.c | 38 +++--- > > 1 file changed, 35 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > index 29c33a5c533f..e6fb3ffe9d93 100644 > > --- a/drivers/of/overlay.c > > +++ b/drivers/of/overlay.c > > @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop( > > * @target may be either in the live devicetree or in a new subtree that > > * is contained in the changeset. > > * > > - * Some special properties are not updated (no error returned). > > + * Some special properties are not added or updated (no error returned): > > + * "name", "phandle", "linux,phandle". > > + * > > + * Properties "#address-cells" and "#size-cells" are not updated if they > > + * are already in the live tree, but if present in the live tree, the > > values > > + * in the overlay must match the values in the live tree. > > * > > * Update of property in symbols node is not allowed. 
> > * > > @@ -300,6 +305,7 @@ static int add_changeset_property(struct > > overlay_changeset *ovcs, > > { > > struct property *new_prop = NULL, *prop; > > int ret = 0; > > + bool check_for_non_overlay_node = false; > > > > if (!of_prop_cmp(overlay_prop->name, "name") || > > !of_prop_cmp(overlay_prop->name, "phandle") || > > @@ -322,13 +328,39 @@ static int add_changeset_property(struct > > overlay_changeset *ovcs, > > if (!new_prop) > > return -ENOMEM; > > > > - if (!prop) > > + if (!prop) { > > + > > + check_for_non_overlay_node = true; > > ret = of_changeset_add_property(>cset, target->np, > > new_prop); > > - else > > + > > + } else if (!of_prop_cmp(prop->name, "#address-cells")) { > > + > > + if (prop->length != 4 || new_prop->length != 4 || > > + *(u32 *)prop->value != *(u32 *)new_prop->value) > > + pr_err("ERROR: overlay and/or live tree > > #address-cells invalid in node %pOF\n", > > +
Re: [PATCH 0/8] add generic builtin command line
Hi, Daniel On Sat, Sep 29, 2018 at 9:17 PM wrote: > > On Thu, Sep 27, 2018 at 07:55:08PM +0300, Maksym Kokhan wrote: > > Daniel Walker (7): > > add generic builtin command line > > drivers: of: ifdef out cmdline section > > x86: convert to generic builtin command line > > arm: convert to generic builtin command line > > arm64: convert to generic builtin command line > > mips: convert to generic builtin command line > > powerpc: convert to generic builtin command line > > > > When I originally submitted these I had a very good conversation with Rob Herring > on the device tree changes. It seemed fairly clear that my approach in these > changes could be done better. It affected specifically arm64, but a lot of other > platforms use the device tree integrally. With arm64 you can reduce the > changes > down to only Kconfig changes, and that would likely be the case for many of > the > other architectures. I made patches to do this a while back, but have not had > time to test them and push them out. Can you please share these patches? I could test them and use them to improve this generic command line implementation. > In terms of mips I think there's a fair amount of work needed to pull out > their > architecture specific mangling into something generic. Part of my motivation > for > these was to take the architecture specific feature and open that up for all > the > architectures. So it makes sense that the mips changes should become part of > that. This really makes sense, and we intend to implement it afterward. It would be easier to initially merge this simple implementation and then develop it step by step. > The only changes which have no comments are the generic changes, x86, and > powerpc. Those patches have been used at Cisco for years with no issues. > I added those changes into my -next tree for a round of testing. Assuming > there > are no issues I can work out the merging with the architecture maintainers. 
> As for the other changes I think they can be done in time, as long as the > generic parts of upstream the rest can be worked on by any of the architecture > developers. Thanks, Maksym
Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
On Mon, Oct 8, 2018 at 10:13 AM Geert Uytterhoeven wrote: > > Hi Rob, > > On Mon, Oct 8, 2018 at 4:57 PM Rob Herring wrote: > > On Mon, Oct 8, 2018 at 2:47 AM Geert Uytterhoeven > > wrote: > > > On Fri, Oct 5, 2018 at 6:59 PM Rob Herring wrote: > > > > Convert Renesas SoC bindings to DT schema format using json-schema. > > > > > --- /dev/null > > > > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml > > > > @@ -0,0 +1,205 @@ > > > > > + - description: Kingfisher (SBEV-RCAR-KF-M03) > > > > +items: > > > > + - const: shimafuji,kingfisher > > > > + - enum: > > > > + - renesas,h3ulcb > > > > + - renesas,m3ulcb > > > > + - enum: > > > > + - renesas,r8a7795 > > > > + - renesas,r8a7796 > > > > > > This looks a bit funny: all other entries have the "const" last, and > > > use it for the > > > SoC number. May be correct, though. > > > To clarify, this is an extension board that can fit both the [HM]3ULCB > > > boards (actually also the new M3NULCB, I think). > > > > This being Kingfisher? > > Correct. > > > I wrote this based on dts files in the tree. There's 2 combinations that I > > see: > > > > "shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795" > > "shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796" > > > > The schema allows 4 combinations (1 * 2 * 2). I have no idea if the > > other combinations are possible. If not, then we could rewrite this as > > 2 entries with 3 const values each. > > I expect there will soon be a third one: > > "shimafuji,kingfisher", "renesas,m3nulcb", "renesas,r8a77965" > > Technically, {h3,m3,m3n}ulcb are the same board (although there may be > minor revision differences), with a different SiP mounted. > But they are called/marketed depending on which SiP is mounted. > > And on top of that, you can plug in a Kingfisher daughterboard. Could be an > overlay ;-) We probably shouldn't have put kingfisher as a top-level compatible then. But we did, so not really much point to discuss that now. 
As to whether there's a better way to express it in the schema, I'm not sure. I don't think there's a way with json-schema to express a list where the 1st item is optional. Rob
Re: Looking for architecture papers
Both systemsim and my powerpc server boot with MSR_HV=1, i.e., hypervisor state. Is there a way to fix that? Writing to the MSR cannot work, according to the documentation (and reality). On Sat, Oct 6, 2018 at 3:27 PM Segher Boessenkool wrote: > > On Sat, Oct 06, 2018 at 12:19:45PM +0300, Raz wrote: > > Hey > > How does HVSC work? > > I looked in the code and LoPAR documentation. It looks like there is > > a vector called > > system_call_pSeries ( at 0xc00 ) that is supposed to be called when we > > invoke HVSC from kernel > > mode. > > Now, I wrote a NULL call HVSC and patched the exceptions-64s.S to > > return RFID immediately. > > This does not work. > > Would you be so kind as to explain how HVSC works? > > thank you > > If your kernel is not running in hypervisor mode, sc 1 does not call the > kernel (but the hypervisor, instead). If your kernel _is_ running in > hypervisor mode, sc 1 does the same as sc 0, a normal system call. > > I don't know which it is for you; you didn't say. > > I have no idea what "a NULL call HVSC" means. If you make exception c00 > return immediately (as you suggest) then you have made all system calls > non-functional, which indeed is unlikely to work as you want. > > > Segher
Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
Hi Nick, On 19/07/2017 at 08:59, Nicholas Piggin wrote: Use nmi_enter similarly to system reset interrupts. This uses the NMI printk buffers and turns off various debugging facilities that help avoid tripping on ourselves or other CPUs. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/traps.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 2849c4f50324..6d31f9d7c333 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs) void machine_check_exception(struct pt_regs *regs) { - enum ctx_state prev_state = exception_enter(); int recover = 0; + bool nested = in_nmi(); + if (!nested) + nmi_enter(); This alters preempt_count, then when die() is called in_interrupt() returns true although the trap didn't happen in interrupt, so oops_end() panics for "fatal exception in interrupt" instead of gently sending SIGBUS to the faulting app. Any idea on how to fix this ? Christophe __this_cpu_inc(irq_stat.mce_exceptions); @@ -820,10 +822,11 @@ void machine_check_exception(struct pt_regs *regs) /* Must die if the interrupt is not recoverable */ if (!(regs->msr & MSR_RI)) - panic("Unrecoverable Machine check"); + nmi_panic(regs, "Unrecoverable Machine check"); bail: - exception_exit(prev_state); + if (!nested) + nmi_exit(); } void SMIException(struct pt_regs *regs)
Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells
On Thu, Oct 4, 2018 at 11:14 PM wrote: > > From: Frank Rowand > > If overlay properties #address-cells or #size-cells are already in > the live devicetree for any given node, then the values in the > overlay must match the values in the live tree. Hi Frank, I'm starting some FPGA testing on this patchset applied to v4.19-rc7. That applied cleanly; if that's not the best base to test against, please let me know. On a very simple overlay, I'm seeing this patch's warning catching things other than #address-cells or #size-cells. I'm just getting started looking at this, will spend time understanding this better and I'll test other overlays. The warnings were: Applying dtbo: socfpga_overlay.dtb [ 33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera SOCFPGA FPGA Manager [ 33.575223] OF: overlay: WARNING: add_changeset_property(), memory leak will occur if overlay removed. Property: /soc/base-fpga-region/firmware-name [ 33.588584] OF: overlay: WARNING: add_changeset_property(), memory leak will occur if overlay removed. Property: /soc/base-fpga-region/fpga-bridges [ 33.601856] OF: overlay: WARNING: add_changeset_property(), memory leak will occur if overlay removed. Property: /soc/base-fpga-region/ranges Here's part of that overlay including the properties it's complaining about: /dts-v1/; /plugin/; / { fragment@0 { target = <&base_fpga_region>; #address-cells = <1>; #size-cells = <1>; __overlay__ { #address-cells = <1>; #size-cells = <1>; firmware-name = "soc_system.rbf"; fpga-bridges = <&fpga_bridge1>; ranges = <0x2 0xff20 0x10>, <0x0 0xc000 0x2000>; gpio@10040 { so on... By the way, I didn't get any warnings when I subsequently removed this overlay. Alan > > If the properties are already in the live tree then there is no > need to create a changeset entry to add them since they must > have the same value. This reduces the memory used by the > changeset and eliminates a possible memory leak. 
This is > verified by 12 fewer warnings during the devicetree unittest, as the possible memory leak warnings about #address-cells and > #size-cells no longer occur. > > Signed-off-by: Frank Rowand > --- > drivers/of/overlay.c | 38 +++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 29c33a5c533f..e6fb3ffe9d93 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop( > * @target may be either in the live devicetree or in a new subtree that > * is contained in the changeset. > * > - * Some special properties are not updated (no error returned). > + * Some special properties are not added or updated (no error returned): > + * "name", "phandle", "linux,phandle". > + * > + * Properties "#address-cells" and "#size-cells" are not updated if they > + * are already in the live tree, but if present in the live tree, the values > + * in the overlay must match the values in the live tree. > * > * Update of property in symbols node is not allowed. 
> * > @@ -300,6 +305,7 @@ static int add_changeset_property(struct > overlay_changeset *ovcs, > { > struct property *new_prop = NULL, *prop; > int ret = 0; > + bool check_for_non_overlay_node = false; > > if (!of_prop_cmp(overlay_prop->name, "name") || > !of_prop_cmp(overlay_prop->name, "phandle") || > @@ -322,13 +328,39 @@ static int add_changeset_property(struct > overlay_changeset *ovcs, > if (!new_prop) > return -ENOMEM; > > - if (!prop) > + if (!prop) { > + > + check_for_non_overlay_node = true; > ret = of_changeset_add_property(&ovcs->cset, target->np, > new_prop); > - else > + > + } else if (!of_prop_cmp(prop->name, "#address-cells")) { > + > + if (prop->length != 4 || new_prop->length != 4 || > + *(u32 *)prop->value != *(u32 *)new_prop->value) > + pr_err("ERROR: overlay and/or live tree > #address-cells invalid in node %pOF\n", > + target->np); > + > + } else if (!of_prop_cmp(prop->name, "#size-cells")) { > + > + if (prop->length != 4 || new_prop->length != 4 || > + *(u32 *)prop->value != *(u32 *)new_prop->value) > + pr_err("ERROR: overlay and/or live tree #size-cells > invalid in node %pOF\n", > + target->np); > + > + } else { > + > + check_for_non_overlay_node = true; > ret = of_changeset_update_property(&ovcs->cset, target->np, >
Patch "sched/topology: Set correct NUMA topology type" has been added to the 4.18-stable tree
This is a note to let you know that I've just added the patch titled sched/topology: Set correct NUMA topology type to the 4.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: sched-topology-set-correct-numa-topology-type.patch and it can be found in the queue-4.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >From foo@baz Mon Oct 8 17:39:53 CEST 2018 From: Srikar Dronamraju Date: Fri, 10 Aug 2018 22:30:18 +0530 Subject: sched/topology: Set correct NUMA topology type From: Srikar Dronamraju [ Upstream commit e5e96fafd9028b1478b165db78c52d981c14f471 ] With the following commit: 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain") the scheduler introduced a new NUMA level. However this leads to the NUMA topology on 2 node systems to not be marked as NUMA_DIRECT anymore. After this commit, it gets reported as NUMA_BACKPLANE, because sched_domains_numa_level is now 2 on 2 node systems. Fix this by allowing setting systems that have up to 2 NUMA levels as NUMA_DIRECT. While here remove code that assumes that level can be 0. 
Signed-off-by: Srikar Dronamraju Signed-off-by: Peter Zijlstra (Intel) Cc: Andre Wild Cc: Heiko Carstens Cc: Linus Torvalds Cc: Mel Gorman Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Rik van Riel Cc: Suravee Suthikulpanit Cc: Thomas Gleixner Cc: linuxppc-dev Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain" Link: http://lkml.kernel.org/r/1533920419-17410-1-git-send-email-sri...@linux.vnet.ibm.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/sched/topology.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1295,7 +1295,7 @@ static void init_numa_topology_type(void) n = sched_max_numa_distance; - if (sched_domains_numa_levels <= 1) { + if (sched_domains_numa_levels <= 2) { sched_numa_topology_type = NUMA_DIRECT; return; } @@ -1380,9 +1380,6 @@ void sched_init_numa(void) break; } - if (!level) - return; - /* * 'level' contains the number of unique distances * Patches currently in stable-queue which might be from sri...@linux.vnet.ibm.com are queue-4.18/sched-topology-set-correct-numa-topology-type.patch
Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
Hi Rob, On Mon, Oct 8, 2018 at 4:57 PM Rob Herring wrote: > On Mon, Oct 8, 2018 at 2:47 AM Geert Uytterhoeven > wrote: > > On Fri, Oct 5, 2018 at 6:59 PM Rob Herring wrote: > > > Convert Renesas SoC bindings to DT schema format using json-schema. > > > --- /dev/null > > > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml > > > @@ -0,0 +1,205 @@ > > > + - description: Kingfisher (SBEV-RCAR-KF-M03) > > > +items: > > > + - const: shimafuji,kingfisher > > > + - enum: > > > + - renesas,h3ulcb > > > + - renesas,m3ulcb > > > + - enum: > > > + - renesas,r8a7795 > > > + - renesas,r8a7796 > > > > This looks a bit funny: all other entries have the "const" last, and > > use it for the > > SoC number. May be correct, though. > > To clarify, this is an extension board that can fit both the [HM]3ULCB > > boards (actually also the new M3NULCB, I think). > > This being Kingfisher? Correct. > I wrote this based on dts files in the tree. There's 2 combinations that I > see: > > "shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795" > "shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796" > > The schema allows 4 combinations (1 * 2 * 2). I have no idea if the > other combinations are possible. If not, then we could rewrite this as > 2 entries with 3 const values each. I expect there will soon be a third one: "shimafuji,kingfisher", "renesas,m3nulcb", "renesas,r8a77965" Technically, {h3,m3,m3n}ulcb are the same board (although there may be minor revision differences), with a different SiP mounted. But they are called/marketed depending on which SiP is mounted. And on top of that, you can plug in a Kingfisher daughterboard. Could be an overlay ;-) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH 05/36] dt-bindings: arm: renesas: Move 'renesas,prr' binding to its own doc
On Mon, Oct 8, 2018 at 2:05 AM Geert Uytterhoeven wrote: > > Hi Rob, > > On Fri, Oct 5, 2018 at 6:58 PM Rob Herring wrote: > > In preparation to convert board-level bindings to json-schema, move > > various misc SoC bindings out to their own file. > > > > Cc: Mark Rutland > > Cc: Simon Horman > > Cc: Magnus Damm > > Cc: devicet...@vger.kernel.org > > Cc: linux-renesas-...@vger.kernel.org > > Signed-off-by: Rob Herring > > Looks good to me, but needs a rebase, as the PRR section has been extended > in -next. Is this something you all can still apply for 4.20? Rob
Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
On Mon, Oct 8, 2018 at 2:47 AM Geert Uytterhoeven wrote: > > Hi Rob, > > On Fri, Oct 5, 2018 at 6:59 PM Rob Herring wrote: > > Convert Renesas SoC bindings to DT schema format using json-schema. > > > > Cc: Simon Horman > > Cc: Magnus Damm > > Cc: Mark Rutland > > Cc: linux-renesas-...@vger.kernel.org > > Cc: devicet...@vger.kernel.org > > Signed-off-by: Rob Herring > > Thanks for your patch! > > Note that this will need a rebase, as more SoCs/boards have been added > in -next. > > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml > > @@ -0,0 +1,205 @@ > > +# SPDX-License-Identifier: None > > The old file didn't have an SPDX header, so it was GPL-2.0, implicitly? Right. I meant to update this with something. I'd prefer it be dual licensed as these aren't just kernel files, but I don't really want to try to gather permissions from all the copyright holders. And who is the copyright holder when it is implicit? Everyone listed by git blame? > > +%YAML 1.2 > > +--- > > +$id: http://devicetree.org/schemas/bindings/arm/shmobile.yaml# > > +$schema: http://devicetree.org/meta-schemas/core.yaml# > > + > > +title: Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings > > + > > +maintainers: > > + - Geert Uytterhoeven > > Simon Horman (supporter:ARM/SHMOBILE ARM ARCHITECTURE) > Magnus Damm (supporter:ARM/SHMOBILE ARM ARCHITECTURE) > > You had it right in the CC list, though... I generated it here from git log rather than get_maintainers.pl because get_maintainers.pl just lists me for a bunch of them. 
> > + - description: RZ/G1M (R8A77430) > > +items: > > + - enum: > > + # iWave Systems RZ/G1M Qseven Development Platform > > (iW-RainboW-G20D-Qseven) > > + - iwave,g20d > > + - const: iwave,g20m > > + - const: renesas,r8a7743 > > + > > + - items: > > + - enum: > > + # iWave Systems RZ/G1M Qseven System On Module > > (iW-RainboW-G20M-Qseven) > > + - iwave,g20m > > + - const: renesas,r8a7743 > > + > > + - description: RZ/G1N (R8A77440) > > +items: > > + - enum: > > + - renesas,sk-rzg1m # SK-RZG1M (YR8A77430S000BE) > > This board belongs under the RZ/G1M section above > (see also the 7743 in the part number). Indeed. Not sure how I screwed that one up. > > + - const: renesas,r8a7744 > > > + - description: Kingfisher (SBEV-RCAR-KF-M03) > > +items: > > + - const: shimafuji,kingfisher > > + - enum: > > + - renesas,h3ulcb > > + - renesas,m3ulcb > > + - enum: > > + - renesas,r8a7795 > > + - renesas,r8a7796 > > This looks a bit funny: all other entries have the "const" last, and > use it for the > SoC number. May be correct, though. > To clarify, this is an extension board that can fit both the [HM]3ULCB > boards (actually also the new M3NULCB, I think). This being Kingfisher? I wrote this based on dts files in the tree. There's 2 combinations that I see: "shimafuji,kingfisher", "renesas,h3ulcb", "renesas,r8a7795" "shimafuji,kingfisher", "renesas,m3ulcb", "renesas,r8a7796" The schema allows 4 combinations (1 * 2 * 2). I have no idea if the other combinations are possible. If not, then we could rewrite this as 2 entries with 3 const values each. Rob
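For illustration, the "2 entries with 3 const values each" alternative mentioned above would look roughly like this. This is a sketch only: it hard-codes the two SoC/board pairings currently in the tree, and would need a third entry for the M3NULCB combination Geert expects:

```yaml
  - description: Kingfisher (SBEV-RCAR-KF-M03) on H3ULCB
    items:
      - const: shimafuji,kingfisher
      - const: renesas,h3ulcb
      - const: renesas,r8a7795

  - description: Kingfisher (SBEV-RCAR-KF-M03) on M3ULCB
    items:
      - const: shimafuji,kingfisher
      - const: renesas,m3ulcb
      - const: renesas,r8a7796
```

The trade-off is explicitness versus duplication: each new base-board/SoC pairing needs its own entry, whereas the enum form admits untested cross-combinations.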
Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
On Mon, Oct 8, 2018 at 3:02 AM Simon Horman wrote: > > On Fri, Oct 05, 2018 at 11:58:41AM -0500, Rob Herring wrote: > > Convert Renesas SoC bindings to DT schema format using json-schema. > > > > Cc: Simon Horman > > Cc: Magnus Damm > > Cc: Mark Rutland > > Cc: linux-renesas-...@vger.kernel.org > > Cc: devicet...@vger.kernel.org > > Signed-off-by: Rob Herring > > This seems fine to me other than that it does not seem > to apply cleanly to next. > > shmobile.txt sees a couple of updates per release cycle so from my point of > view it would be ideal if this change could hit -rc1 to allow patches for > v4.21 to be accepted smoothly (already one from Sergei will need rebasing). When we get to the point of merging (which isn't going to be 4.20), you and other maintainers can probably take all these patches. Other than the few restructuring patches, the only dependency is the build support which isn't a dependency to apply it, but build it. I plan to build any patches as part of reviewing at least early on. OTOH, the build support is small enough and self contained that maybe it can just be applied for 4.20. Rob
Re: [PATCH 28/36] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema
On Mon, Oct 8, 2018 at 4:45 AM Heiko Stuebner wrote: > > Hi Rob, > > either I'm misunderstanding that, or something did go a bit wrong during > the conversion, as pointed out below: > > On Friday, 5 October 2018 at 18:58:40 CEST, Rob Herring wrote: > > Convert Rockchip SoC bindings to DT schema format using json-schema. > > > > Cc: Mark Rutland > > Cc: Heiko Stuebner > > Cc: devicet...@vger.kernel.org > > Cc: linux-arm-ker...@lists.infradead.org > > Cc: linux-rockc...@lists.infradead.org > > Signed-off-by: Rob Herring > > --- > > .../devicetree/bindings/arm/rockchip.txt | 220 > > .../devicetree/bindings/arm/rockchip.yaml | 242 ++ > > 2 files changed, 242 insertions(+), 220 deletions(-) > > delete mode 100644 Documentation/devicetree/bindings/arm/rockchip.txt > > create mode 100644 Documentation/devicetree/bindings/arm/rockchip.yaml > > > > > > > +properties: > > + $nodename: > > +const: '/' > > + compatible: > > +oneOf: > > + - items: > > + - enum: > > + - amarula,vyasa-rk3288 > > + - asus,rk3288-tinker > > + - radxa,rock2-square > > + - chipspark,popmetal-rk3288 > > + - netxeon,r89 > > + - firefly,firefly-rk3288 > > + - firefly,firefly-rk3288-beta > > + - firefly,firefly-rk3288-reload > > + - mqmaker,miqi > > + - rockchip,rk3288-fennec > > + - const: rockchip,rk3288 > > These are very much distinct boards, so shouldn't they also get > individual entries including their existing description like the phytec > or google boards below? It is grouped by SoC compatible and # of compatible strings. So this one is all the cases that have 2 compatible strings. It is simply saying the 1st compatible string must be one of the enums and the 2nd compatible string must be "rockchip,rk3288". > > Similarly why is it an enum for those, while the Google boards get a > const for each compatible string? Because each Google board is a fixed list of strings. > Most non-google boards below also lost their description and were lumped > together into combined entries. 
Was that intentional? If the description was just repeating the compatible string with spaces and capitalization, then yes it was intentional. If your description matches what you have for 'model', then I'd prefer to see model added as a property schema. Rob
Re: [PATCH 22/36] dt-bindings: arm: Convert FSL board/soc bindings to json-schema
On Mon, Oct 8, 2018 at 2:02 AM Shawn Guo wrote: > > On Fri, Oct 05, 2018 at 11:58:34AM -0500, Rob Herring wrote: > > Convert Freescale SoC bindings to DT schema format using json-schema. > > +properties: > > + $nodename: > > +const: '/' > > + compatible: > > +oneOf: > > + - description: i.MX23 based Boards > > +items: > > + - enum: > > + - fsl,imx23-evk > > + - olimex,imx23-olinuxino > > + - const: fsl,imx23 > > + > > + - description: i.MX25 Product Development Kit > > +items: > > + - enum: > > + - fsl,imx25-pdk > > + - const: fsl,imx25 > > + > > + - description: i.MX27 Product Development Kit > > +items: > > + - enum: > > + - fsl,imx27-pdk > > + - const: fsl,imx27 > > + > > + - description: i.MX28 based Boards > > +items: > > + - enum: > > + - fsl,imx28-evk > > + - i2se,duckbill > > + - i2se,duckbill-2 > > + - technologic,imx28-ts4600 > > + - const: fsl,imx28 > > + - items: > > The schema is new to me. This line looks unusual to me, so you may want > to double check. It's fine. There's just no description schema on this one as it's a continuation of the previous one (logically, but not from a schema perspective). Perhaps add "i.MX28 I2SE Duckbill 2 based boards". > > + - enum: > > + - i2se,duckbill-2-485 > > + - i2se,duckbill-2-enocean > > + - i2se,duckbill-2-spi > > + - const: i2se,duckbill-2 > > + - const: fsl,imx28 > > + > > + - description: i.MX51 Babbage Board
Re: [PATCH v5 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
On 10/08/2018 11:06 AM, Michael Ellerman wrote: Christophe Leroy writes: The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This is blowing up pretty nicely with CONFIG_KMEMLEAK enabled, haven't had time to dig further: Nice :) I have the same issue on PPC32. Seems like when descending the stack, save_context_stack() calls validate_sp(), which in turn calls valid_irq_stack() when the first test fails. But that early, hardirq_ctx[cpu] is NULL. With sp = 0, valid_irq_stack() used to return false because it expected sp to be above the thread_info. But now that thread_info is gone, sp = 0 is valid when stack = NULL. The following fixes it: diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index afe76f7f316c..3e534147fd8f 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2006,6 +2006,9 @@ int validate_sp(unsigned long sp, struct task_struct *p, { unsigned long stack_page = (unsigned long)task_stack_page(p); + if (sp < THREAD_SIZE) + return 0; + if (sp >= stack_page + sizeof(struct thread_struct) && sp <= stack_page + THREAD_SIZE - nbytes) return 1; Looking at this I also realise I forgot to remove the sizeof(struct thread_struct) from here. And this sizeof() was buggy, it should have been thread_info instead of thread_struct, but never mind as it is going away. 
Christophe Unable to handle kernel paging request for data at address 0x Faulting instruction address: 0xc0022064 Oops: Kernel access of bad area, sig: 11 [#9] LE SMP NR_CPUS=32 NUMA Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338 #268 NIP: c0022064 LR: c00220c0 CTR: c001f5c0 REGS: c1244a50 TRAP: 0380 Not tainted (4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338) MSR: 80001033 CR: 48022244 XER: 2000 CFAR: c00220c4 IRQMASK: 1 GPR00: c00220c0 c1244cd0 c124b200 0001 GPR04: c1201180 0070 c1275ef8 GPR08: 0001 3f90 2b6e6f6d6d6f635f GPR12: c001f5c0 c145 02e2be38 GPR16: 7dc54c70 02d854b8 c0d87f00 GPR20: c0d87ef0 c0d87ee0 c0d87f08 c006c1a8 GPR24: c0d87ec8 7265677368657265 c0062a04 GPR28: 0006 c1201180 NIP [c0022064] show_stack+0xe4/0x2b0 LR [c00220c0] show_stack+0x140/0x2b0 Call Trace: [c1244cd0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable) [c1244da0] [c002245c] show_regs+0x22c/0x430 [c1244e50] [c002ae8c] __die+0xfc/0x140 [c1244ed0] [c002b954] die+0x74/0xf0 [c1244f10] [c006e0f8] bad_page_fault+0xe8/0x180 [c1244f80] [c0074f48] slb_miss_large_addr+0x68/0x2e0 [c1244fc0] [c0008ce8] large_addr_slb+0x158/0x160 --- interrupt: 380 at show_stack+0xe4/0x2b0 LR = show_stack+0x140/0x2b0 [c12452b0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable) [c1245380] [c002245c] show_regs+0x22c/0x430 [c1245430] [c002ae8c] __die+0xfc/0x140 [c12454b0] [c002b954] die+0x74/0xf0 [c12454f0] [c006e0f8] bad_page_fault+0xe8/0x180 [c1245560] [c0074f48] slb_miss_large_addr+0x68/0x2e0 [c12455a0] [c0008ce8] large_addr_slb+0x158/0x160 --- interrupt: 380 at show_stack+0xe4/0x2b0 LR = show_stack+0x140/0x2b0 [c1245890] [c002217c] show_stack+0x1fc/0x2b0 (unreliable) [c1245960] [c002245c] show_regs+0x22c/0x430 [c1245a10] [c002ae8c] __die+0xfc/0x140 [c1245a90] [c002b954] die+0x74/0xf0 [c1245ad0] [c006e0f8] bad_page_fault+0xe8/0x180 [c1245b40] [c0074f48] slb_miss_large_addr+0x68/0x2e0 [c1245b80] [c0008ce8] large_addr_slb+0x158/0x160 --- interrupt: 380 at 
show_stack+0xe4/0x2b0 LR = show_stack+0x140/0x2b0 [c1245e70] [c002217c] show_stack+0x1fc/0x2b0 (unreliable) [c1245f40] [c002245c] show_regs+0x22c/0x430 [c1245ff0] [c002ae8c] __die+0xfc/0x140 [c1246070] [c002b954] die+0x74/0xf0 [c12460b0] [c006e0f8] bad_page_fault+0xe8/0x180 [c1246120] [c0074f48] slb_miss_large_addr+0x68/0x2e0 [c1246160] [c0008ce8] large_addr_slb+0x158/0x160 ---
Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
On Fri, Oct 05, 2018 at 11:58:41AM -0500, Rob Herring wrote: > Convert Renesas SoC bindings to DT schema format using json-schema. > > Cc: Simon Horman > Cc: Magnus Damm > Cc: Mark Rutland > Cc: linux-renesas-...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Signed-off-by: Rob Herring This seems fine to me other than that it does not seem to apply cleanly to next. shmobile.txt sees a couple of updates per release cycle so from my point of view it would be ideal if this change could hit -rc1 to allow patches for v4.21 to be accepted smoothly (already one from Sergei will need rebasing). > --- > .../devicetree/bindings/arm/shmobile.txt | 143 > .../devicetree/bindings/arm/shmobile.yaml | 205 ++ > 2 files changed, 205 insertions(+), 143 deletions(-) > delete mode 100644 Documentation/devicetree/bindings/arm/shmobile.txt > create mode 100644 Documentation/devicetree/bindings/arm/shmobile.yaml > > diff --git a/Documentation/devicetree/bindings/arm/shmobile.txt > b/Documentation/devicetree/bindings/arm/shmobile.txt > deleted file mode 100644 > index 619b765e5bee.. 
> --- a/Documentation/devicetree/bindings/arm/shmobile.txt > +++ /dev/null > @@ -1,143 +0,0 @@ > -Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings > - > - > -SoCs: > - > - - Emma Mobile EV2 > -compatible = "renesas,emev2" > - - RZ/A1H (R7S72100) > -compatible = "renesas,r7s72100" > - - SH-Mobile AG5 (R8A73A00/SH73A0) > -compatible = "renesas,sh73a0" > - - R-Mobile APE6 (R8A73A40) > -compatible = "renesas,r8a73a4" > - - R-Mobile A1 (R8A77400) > -compatible = "renesas,r8a7740" > - - RZ/G1H (R8A77420) > -compatible = "renesas,r8a7742" > - - RZ/G1M (R8A77430) > -compatible = "renesas,r8a7743" > - - RZ/G1N (R8A77440) > -compatible = "renesas,r8a7744" > - - RZ/G1E (R8A77450) > -compatible = "renesas,r8a7745" > - - RZ/G1C (R8A77470) > -compatible = "renesas,r8a77470" > - - R-Car M1A (R8A77781) > -compatible = "renesas,r8a7778" > - - R-Car H1 (R8A77790) > -compatible = "renesas,r8a7779" > - - R-Car H2 (R8A77900) > -compatible = "renesas,r8a7790" > - - R-Car M2-W (R8A77910) > -compatible = "renesas,r8a7791" > - - R-Car V2H (R8A77920) > -compatible = "renesas,r8a7792" > - - R-Car M2-N (R8A77930) > -compatible = "renesas,r8a7793" > - - R-Car E2 (R8A77940) > -compatible = "renesas,r8a7794" > - - R-Car H3 (R8A77950) > -compatible = "renesas,r8a7795" > - - R-Car M3-W (R8A77960) > -compatible = "renesas,r8a7796" > - - R-Car M3-N (R8A77965) > -compatible = "renesas,r8a77965" > - - R-Car V3M (R8A77970) > -compatible = "renesas,r8a77970" > - - R-Car V3H (R8A77980) > -compatible = "renesas,r8a77980" > - - R-Car E3 (R8A77990) > -compatible = "renesas,r8a77990" > - - R-Car D3 (R8A77995) > -compatible = "renesas,r8a77995" > - - RZ/N1D (R9A06G032) > -compatible = "renesas,r9a06g032" > - > -Boards: > - > - - Alt (RTP0RC7794SEB00010S) > -compatible = "renesas,alt", "renesas,r8a7794" > - - APE6-EVM > -compatible = "renesas,ape6evm", "renesas,r8a73a4" > - - Atmark Techno Armadillo-800 EVA > -compatible = "renesas,armadillo800eva", "renesas,r8a7740" > - - Blanche 
(RTP0RC7792SEB00010S) > -compatible = "renesas,blanche", "renesas,r8a7792" > - - BOCK-W > -compatible = "renesas,bockw", "renesas,r8a7778" > - - Condor (RTP0RC77980SEB0010SS/RTP0RC77980SEB0010SA01) > -compatible = "renesas,condor", "renesas,r8a77980" > - - Draak (RTP0RC77995SEB0010S) > -compatible = "renesas,draak", "renesas,r8a77995" > - - Eagle (RTP0RC77970SEB0010S) > -compatible = "renesas,eagle", "renesas,r8a77970" > - - Ebisu (RTP0RC77990SEB0010S) > -compatible = "renesas,ebisu", "renesas,r8a77990" > - - Genmai (RTK772100BC0BR) > -compatible = "renesas,genmai", "renesas,r7s72100" > - - GR-Peach (X28A-M01-E/F) > -compatible = "renesas,gr-peach", "renesas,r7s72100" > - - Gose (RTP0RC7793SEB00010S) > -compatible = "renesas,gose", "renesas,r8a7793" > - - H3ULCB (R-Car Starter Kit Premier, RTP0RC7795SKBX0010SA00 (H3 ES1.1)) > -H3ULCB (R-Car Starter Kit Premier, RTP0RC77951SKBX010SA00 (H3 ES2.0)) > -compatible = "renesas,h3ulcb", "renesas,r8a7795" > - - Henninger > -compatible = "renesas,henninger", "renesas,r8a7791" > - - iWave Systems RZ/G1C Single Board Computer (iW-RainboW-G23S) > -compatible = "iwave,g23s", "renesas,r8a77470" > - - iWave Systems RZ/G1E SODIMM SOM Development Platform (iW-RainboW-G22D) > -compatible = "iwave,g22d", "iwave,g22m", "renesas,r8a7745" > - - iWave Systems RZ/G1E SODIMM System On Module (iW-RainboW-G22M-SM) > -compatible = "iwave,g22m", "renesas,r8a7745" > - - iWave Systems RZ/G1M Qseven Development Platform (iW-RainboW-G20D-Qseven) > -compatible = "iwave,g20d", "iwave,g20m", "renesas,r8a7743" > - - iWave Systems RZ/G1M
Re: [PATCH v5 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
Christophe Leroy writes:
> The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.
>
> Moving thread_info into task_struct has the following advantages:
> - It protects thread_info from corruption in the case of stack
>   overflows.
> - Its address is harder to determine if stack addresses are
>   leaked, making a number of attacks more difficult.

This is blowing up pretty nicely with CONFIG_KMEMLEAK enabled, haven't had
time to dig further:

Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0022064
Oops: Kernel access of bad area, sig: 11 [#9]
LE SMP NR_CPUS=32 NUMA
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338 #268
NIP: c0022064 LR: c00220c0 CTR: c001f5c0
REGS: c1244a50 TRAP: 0380 Not tainted (4.19.0-rc3-gcc-7.3.1-00103-gc795acc08338)
MSR: 80001033 CR: 48022244 XER: 2000
CFAR: c00220c4 IRQMASK: 1
GPR00: c00220c0 c1244cd0 c124b200 0001
GPR04: c1201180 0070 c1275ef8
GPR08: 0001 3f90 2b6e6f6d6d6f635f
GPR12: c001f5c0 c145 02e2be38
GPR16: 7dc54c70 02d854b8 c0d87f00
GPR20: c0d87ef0 c0d87ee0 c0d87f08 c006c1a8
GPR24: c0d87ec8 7265677368657265 c0062a04
GPR28: 0006 c1201180
NIP [c0022064] show_stack+0xe4/0x2b0
LR [c00220c0] show_stack+0x140/0x2b0
Call Trace:
[c1244cd0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1244da0] [c002245c] show_regs+0x22c/0x430
[c1244e50] [c002ae8c] __die+0xfc/0x140
[c1244ed0] [c002b954] die+0x74/0xf0
[c1244f10] [c006e0f8] bad_page_fault+0xe8/0x180
[c1244f80] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1244fc0] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
    LR = show_stack+0x140/0x2b0
[c12452b0] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245380] [c002245c] show_regs+0x22c/0x430
[c1245430] [c002ae8c] __die+0xfc/0x140
[c12454b0] [c002b954] die+0x74/0xf0
[c12454f0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1245560] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c12455a0] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
    LR = show_stack+0x140/0x2b0
[c1245890] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245960] [c002245c] show_regs+0x22c/0x430
[c1245a10] [c002ae8c] __die+0xfc/0x140
[c1245a90] [c002b954] die+0x74/0xf0
[c1245ad0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1245b40] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1245b80] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
    LR = show_stack+0x140/0x2b0
[c1245e70] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1245f40] [c002245c] show_regs+0x22c/0x430
[c1245ff0] [c002ae8c] __die+0xfc/0x140
[c1246070] [c002b954] die+0x74/0xf0
[c12460b0] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246120] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246160] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
    LR = show_stack+0x140/0x2b0
[c1246450] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1246520] [c002245c] show_regs+0x22c/0x430
[c12465d0] [c002ae8c] __die+0xfc/0x140
[c1246650] [c002b954] die+0x74/0xf0
[c1246690] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246700] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246740] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
    LR = show_stack+0x140/0x2b0
[c1246a30] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c1246b00] [c002245c] show_regs+0x22c/0x430
[c1246bb0] [c002ae8c] __die+0xfc/0x140
[c1246c30] [c002b954] die+0x74/0xf0
[c1246c70] [c006e0f8] bad_page_fault+0xe8/0x180
[c1246ce0] [c0074f48] slb_miss_large_addr+0x68/0x2e0
[c1246d20] [c0008ce8] large_addr_slb+0x158/0x160
--- interrupt: 380 at show_stack+0xe4/0x2b0
    LR = show_stack+0x140/0x2b0
[c1247010] [c002217c] show_stack+0x1fc/0x2b0 (unreliable)
[c12470e0] [c002245c] show_regs+0x22c/0x430
Re: [PATCH v6 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
On Mon, 2018-10-08 at 09:16 +, Christophe Leroy wrote:
> The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.

We need to make sure we don't have code that assumes we never take
faults on thread_info accesses.

On ppc64, the stack SLB entries are bolted, which means the thread_info
is too. We might have code that assumes we don't get SLB faults when
accessing it. If not, we're fine, but that needs a close look.

Ben.

> Moving thread_info into task_struct has the following advantages:
> - It protects thread_info from corruption in the case of stack
>   overflows.
> - Its address is harder to determine if stack addresses are
>   leaked, making a number of attacks more difficult.
>
> Changes since v5:
> - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
> - Fixed PPC_BPF_LOAD_CPU() macro
>
> Changes since v4:
> - Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h
>   is not already existing, was due to spaces instead of a tab in the Makefile
>
> Changes since RFC v3: (based on Nick's review)
> - Renamed task_size.h to task_size_user64.h to better relate to what it
>   contains.
> - Handling of the isolation of thread_info cpu field inside CONFIG_SMP
>   #ifdefs moved to a separate patch.
> - Removed CURRENT_THREAD_INFO macro completely.
> - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is
>   defined.
> - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
> - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
> - Fixed a few commit logs
> - Fixed checkpatch report.
>
> Changes since RFC v2:
> - Removed the modification of names in asm-offsets
> - Created a rule in arch/powerpc/Makefile to append the offset of
>   current->cpu in CFLAGS
> - Modified asm/smp.h to use the offset set in CFLAGS
> - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
> - Moved the modification of current_pt_regs in the patch activating
>   CONFIG_THREAD_INFO_IN_TASK
>
> Changes since RFC v1:
> - Removed the first patch which was modifying header inclusion order in timer
> - Modified some names in asm-offsets to avoid conflicts when including
>   asm-offsets in C files
> - Modified asm/smp.h to avoid having to include linux/sched.h (using
>   asm-offsets instead)
> - Moved some changes from the activation patch to the preparation patch.
>
> Christophe Leroy (9):
>   book3s/64: avoid circular header inclusion in mmu-hash.h
>   powerpc: Only use task_struct 'cpu' field on SMP
>   powerpc: Prepare for moving thread_info into task_struct
>   powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
>   powerpc: regain entire stack space
>   powerpc: 'current_set' is now a table of task_struct pointers
>   powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
>   powerpc/64: Remove CURRENT_THREAD_INFO
>   powerpc: clean stack pointers naming
>
>  arch/powerpc/Kconfig                          |  1 +
>  arch/powerpc/Makefile                         |  8 ++-
>  arch/powerpc/include/asm/asm-prototypes.h     |  4 +-
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
>  arch/powerpc/include/asm/exception-64s.h      |  4 +-
>  arch/powerpc/include/asm/irq.h                | 14 ++---
>  arch/powerpc/include/asm/livepatch.h          |  7 ++-
>  arch/powerpc/include/asm/processor.h          | 39 +
>  arch/powerpc/include/asm/ptrace.h             |  2 +-
>  arch/powerpc/include/asm/reg.h                |  2 +-
>  arch/powerpc/include/asm/smp.h                | 17 +-
>  arch/powerpc/include/asm/task_size_user64.h   | 42 ++
>  arch/powerpc/include/asm/thread_info.h        | 19 ---
>  arch/powerpc/kernel/asm-offsets.c             | 10 ++--
>  arch/powerpc/kernel/entry_32.S                | 66 --
>  arch/powerpc/kernel/entry_64.S                | 12 ++--
>  arch/powerpc/kernel/epapr_hcalls.S            |  5 +-
>  arch/powerpc/kernel/exceptions-64e.S          | 13 +
>  arch/powerpc/kernel/exceptions-64s.S          |  2 +-
>  arch/powerpc/kernel/head_32.S                 | 14 ++---
>  arch/powerpc/kernel/head_40x.S                |  4 +-
>  arch/powerpc/kernel/head_44x.S                |  8 +--
>  arch/powerpc/kernel/head_64.S                 |  1 +
>  arch/powerpc/kernel/head_8xx.S                |  2 +-
>  arch/powerpc/kernel/head_booke.h              | 12 +---
>  arch/powerpc/kernel/head_fsl_booke.S          | 16 +++---
>  arch/powerpc/kernel/idle_6xx.S                |  8 +--
>  arch/powerpc/kernel/idle_book3e.S             |  2 +-
>  arch/powerpc/kernel/idle_e500.S               |  8 +--
>  arch/powerpc/kernel/idle_power4.S             |  2 +-
>  arch/powerpc/kernel/irq.c                     | 77 +-
>  arch/powerpc/kernel/kgdb.c                    | 28 --
>  arch/powerpc/kernel/machine_kexec_64.c        |  6 +-
Re: [PATCH 28/36] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema
Hi Rob,

either I'm misunderstanding that, or something did go a bit wrong during
the conversion, as pointed out below:

Am Freitag, 5. Oktober 2018, 18:58:40 CEST schrieb Rob Herring:
> Convert Rockchip SoC bindings to DT schema format using json-schema.
>
> Cc: Mark Rutland
> Cc: Heiko Stuebner
> Cc: devicet...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-rockc...@lists.infradead.org
> Signed-off-by: Rob Herring
> ---
>  .../devicetree/bindings/arm/rockchip.txt  | 220
>  .../devicetree/bindings/arm/rockchip.yaml | 242 ++
>  2 files changed, 242 insertions(+), 220 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/arm/rockchip.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/rockchip.yaml
>
> +properties:
> +  $nodename:
> +    const: '/'
> +  compatible:
> +    oneOf:
> +      - items:
> +          - enum:
> +              - amarula,vyasa-rk3288
> +              - asus,rk3288-tinker
> +              - radxa,rock2-square
> +              - chipspark,popmetal-rk3288
> +              - netxeon,r89
> +              - firefly,firefly-rk3288
> +              - firefly,firefly-rk3288-beta
> +              - firefly,firefly-rk3288-reload
> +              - mqmaker,miqi
> +              - rockchip,rk3288-fennec
> +          - const: rockchip,rk3288

These are very much distinct boards, so shouldn't they also get individual
entries, including their existing descriptions, like the Phytec or Google
boards below?

Similarly, why is it an enum for those, while the Google boards get a
const for each compatible string?

Most non-Google boards below also lost their description and were lumped
together into combined entries. Was that intentional?

Thanks
Heiko

> +
> +      - description: Phytec phyCORE-RK3288 Rapid Development Kit
> +        items:
> +          - const: phytec,rk3288-pcm-947
> +          - const: phytec,rk3288-phycore-som
> +          - const: rockchip,rk3288
> +
> +      - description: Google Mickey (Asus Chromebit CS10)
> +        items:
> +          - const: google,veyron-mickey-rev8
> +          - const: google,veyron-mickey-rev7
> +          - const: google,veyron-mickey-rev6
> +          - const: google,veyron-mickey-rev5
> +          - const: google,veyron-mickey-rev4
> +          - const: google,veyron-mickey-rev3
> +          - const: google,veyron-mickey-rev2
> +          - const: google,veyron-mickey-rev1
> +          - const: google,veyron-mickey-rev0
> +          - const: google,veyron-mickey
> +          - const: google,veyron
> +          - const: rockchip,rk3288
> +
> +      - description: Google Minnie (Asus Chromebook Flip C100P)
> +        items:
> +          - const: google,veyron-minnie-rev4
> +          - const: google,veyron-minnie-rev3
> +          - const: google,veyron-minnie-rev2
> +          - const: google,veyron-minnie-rev1
> +          - const: google,veyron-minnie-rev0
> +          - const: google,veyron-minnie
> +          - const: google,veyron
> +          - const: rockchip,rk3288
> +
> +      - description: Google Pinky (dev-board)
> +        items:
> +          - const: google,veyron-pinky-rev2
> +          - const: google,veyron-pinky
> +          - const: google,veyron
> +          - const: rockchip,rk3288
> +
> +      - description: Google Speedy (Asus C201 Chromebook)
> +        items:
> +          - const: google,veyron-speedy-rev9
> +          - const: google,veyron-speedy-rev8
> +          - const: google,veyron-speedy-rev7
> +          - const: google,veyron-speedy-rev6
> +          - const: google,veyron-speedy-rev5
> +          - const: google,veyron-speedy-rev4
> +          - const: google,veyron-speedy-rev3
> +          - const: google,veyron-speedy-rev2
> +          - const: google,veyron-speedy
> +          - const: google,veyron
> +          - const: rockchip,rk3288
> +
> +      - description: Google Jaq (Haier Chromebook 11 and more)
> +        items:
> +          - const: google,veyron-jaq-rev5
> +          - const: google,veyron-jaq-rev4
> +          - const: google,veyron-jaq-rev3
> +          - const: google,veyron-jaq-rev2
> +          - const: google,veyron-jaq-rev1
> +          - const: google,veyron-jaq
> +          - const: google,veyron
> +          - const: rockchip,rk3288
> +
> +      - description: Google Jerry (Hisense Chromebook C11 and more)
> +        items:
> +          - const: google,veyron-jerry-rev7
> +          - const: google,veyron-jerry-rev6
> +          - const: google,veyron-jerry-rev5
> +          - const: google,veyron-jerry-rev4
> +          - const: google,veyron-jerry-rev3
> +          - const: google,veyron-jerry
> +          - const: google,veyron
> +          - const: rockchip,rk3288
> +
> +      - description: Google Brain (dev-board)
> +        items:
> +          - const: google,veyron-brain-rev0
> +          - const: google,veyron-brain
> +          - const: google,veyron
> +          - const:
Re: [RFC PATCH kernel] vfio/spapr_tce: Get rid of possible infinite loop
Serhii Popovych writes:
> Alexey Kardashevskiy wrote:
>> As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered
>> memory. If there is a bug in memory release, the loop in
>> tce_iommu_release() becomes infinite; this actually happened to me.
>>
>> This makes the loop finite and prints a warning on every failure, so
>> that such bugs are easier to notice.
>>
>> Signed-off-by: Alexey Kardashevskiy
>> ---
>>  drivers/vfio/vfio_iommu_spapr_tce.c | 10 +++---
>>  1 file changed, 3 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index b1a8ab3..ece0651 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -393,13 +394,8 @@ static void tce_iommu_release(void *iommu_data)
>>  		tce_iommu_free_table(container, tbl);
>>  	}
>>
>> -	while (!list_empty(&container->prereg_list)) {
>> -		struct tce_iommu_prereg *tcemem;
>> -
>> -		tcemem = list_first_entry(&container->prereg_list,
>> -				struct tce_iommu_prereg, next);
>> -		WARN_ON_ONCE(tce_iommu_prereg_free(container, tcemem));
>> -	}
>> +	list_for_each_entry_safe(tcemem, tmtmp, &container->prereg_list, next)
>> +		WARN_ON(tce_iommu_prereg_free(container, tcemem));
>
> I'm not sure that a tce_iommu_prereg_free() call under WARN_ON() is a
> good idea, because WARN_ON() is a preprocessor macro:
>
> if CONFIG_WARN=n is ever added, by analogy with CONFIG_BUG=n, defining
> WARN_ON() as empty, we will lose the call to tce_iommu_prereg_free(),
> leaking resources.

I don't think that's likely to ever happen though, we have a large number
of uses that would need to be checked one-by-one:

  $ git grep "if (WARN_ON(" | wc -l
  2853

So if we ever did add CONFIG_WARN, I think it would still need to
evaluate the condition, just not emit a warning.

cheers
Re: [PATCH] powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
On Mon, 2018-10-08 at 17:04 +1000, Nicholas Piggin wrote:
> On Mon, 08 Oct 2018 15:08:31 +1100
> Benjamin Herrenschmidt wrote:
>
> > HMIs will crash the kernel due to
> >
> >   BRANCH_LINK_TO_FAR(hmi_exception_realmode)
> >
> > calling into the OPD instead of the actual code.
> >
> > Signed-off-by: Benjamin Herrenschmidt
> > ---
> >
> > This hack fixes it for me, but it's not great. Nick, any better idea?
>
> Is it a hack because of the ifdef gunk, or because there's something
> deeper wrong with using the .sym?

I'd say the ifdef gunk; also the KVM use doesn't need it, because the
KVM entry isn't an OPD.

> I guess all those handlers that load label addresses by hand could have
> the bug silently creep in. Can we have them use the DOTSYM() macro?

The KVM one doesn't have a dotsym, does it? Also, should we load the TOC
from the OPD?

> Thanks,
> Nick
>
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S
> > b/arch/powerpc/kernel/exceptions-64s.S
> > index ea04dfb..752709cc8 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1119,7 +1119,11 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
> >  	EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
> >  	EXCEPTION_PROLOG_COMMON_3(0xe60)
> >  	addi	r3,r1,STACK_FRAME_OVERHEAD
> > +#ifdef PPC64_ELF_ABI_v1
> > +	BRANCH_LINK_TO_FAR(.hmi_exception_realmode) /* Function call ABI */
> > +#else
> >  	BRANCH_LINK_TO_FAR(hmi_exception_realmode) /* Function call ABI */
> > +#endif
> >  	cmpdi	cr0,r3,0
> >
> >  	/* Windup the stack. */
[PATCH v6 9/9] powerpc: clean stack pointers naming
Some stack pointers used to also be thread_info pointers and were
called 'tp'. Now that they are only stack pointers, rename them 'sp'.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/irq.c      | 17 +++--
 arch/powerpc/kernel/setup_64.c | 20 ++--
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 62cfccf4af89..754f0efc507b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	void *curtp, *irqtp, *sirqtp;
+	void *cursp, *irqsp, *sirqsp;

 	/* Switch to the irq stack to handle this */
-	curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-	irqtp = hardirq_ctx[raw_smp_processor_id()];
-	sirqtp = softirq_ctx[raw_smp_processor_id()];
+	cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+	irqsp = hardirq_ctx[raw_smp_processor_id()];
+	sirqsp = softirq_ctx[raw_smp_processor_id()];

 	/* Already there ? */
-	if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+	if (unlikely(cursp == irqsp || cursp == sirqsp)) {
 		__do_irq(regs);
 		set_irq_regs(old_regs);
 		return;
 	}

 	/* Switch stack and call */
-	call_do_irq(regs, irqtp);
+	call_do_irq(regs, irqsp);

 	set_irq_regs(old_regs);
 }

@@ -732,10 +732,7 @@ void irq_ctx_init(void)

 void do_softirq_own_stack(void)
 {
-	void *irqtp;
-
-	irqtp = softirq_ctx[smp_processor_id()];
-	call_do_softirq(irqtp);
+	call_do_softirq(softirq_ctx[smp_processor_id()]);
 }

 irq_hw_number_t virq_to_hw(unsigned int virq)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6792e9c90689..4912ec0320b8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -717,22 +717,22 @@ void __init emergency_stack_init(void)
 	limit = min(ppc64_bolted_size(), ppc64_rma_size);

 	for_each_possible_cpu(i) {
-		void *ti;
+		void *sp;

-		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
-		paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+		sp = alloc_stack(limit, i);
+		memset(sp, 0, THREAD_SIZE);
+		paca_ptrs[i]->emergency_sp = sp + THREAD_SIZE;

 #ifdef CONFIG_PPC_BOOK3S_64
 		/* emergency stack for NMI exception handling. */
-		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
-		paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+		sp = alloc_stack(limit, i);
+		memset(sp, 0, THREAD_SIZE);
+		paca_ptrs[i]->nmi_emergency_sp = sp + THREAD_SIZE;

 		/* emergency stack for machine check exception handling. */
-		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
-		paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+		sp = alloc_stack(limit, i);
+		memset(sp, 0, THREAD_SIZE);
+		paca_ptrs[i]->mc_emergency_sp = sp + THREAD_SIZE;
 #endif
 	}
 }
--
2.13.3
[PATCH v6 8/9] powerpc/64: Remove CURRENT_THREAD_INFO
Now that current_thread_info is located at the beginning of the 'current'
task struct, the CURRENT_THREAD_INFO macro is not really needed any more.
This patch replaces it by loads of the value at PACACURRENT(r13).

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/exception-64s.h       |  4 ++--
 arch/powerpc/include/asm/thread_info.h         |  4
 arch/powerpc/kernel/entry_64.S                 | 10 +-
 arch/powerpc/kernel/exceptions-64e.S           |  2 +-
 arch/powerpc/kernel/exceptions-64s.S           |  2 +-
 arch/powerpc/kernel/idle_book3e.S              |  2 +-
 arch/powerpc/kernel/idle_power4.S              |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a86fead0..ca3af3e9015e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -680,7 +680,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)

 #define RUNLATCH_ON				\
 BEGIN_FTR_SECTION				\
-	CURRENT_THREAD_INFO(r3, r1);		\
+	ld	r3, PACACURRENT(r13);		\
 	ld	r4,TI_LOCAL_FLAGS(r3);		\
 	andi.	r0,r4,_TLF_RUNLATCH;		\
 	beql	ppc64_runlatch_on_trampoline;	\

@@ -730,7 +730,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP				\
 BEGIN_FTR_SECTION				\
-	CURRENT_THREAD_INFO(r11, r1);		\
+	ld	r11, PACACURRENT(r13);		\
 	ld	r9,TI_LOCAL_FLAGS(r11);		\
 	andi.	r10,r9,_TLF_NAPPING;		\
 	bnel	power4_fixup_nap;		\

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 361bb45b8990..2ee9e248c933 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@

 #define THREAD_SIZE		(1 << THREAD_SHIFT)

-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(ld dest, PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include
 #include

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6fce0f8fd8c4..06d9a7c084a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -158,7 +158,7 @@ system_call:		/* label this so stack traces look sane */
 	li	r10,IRQS_ENABLED
 	std	r10,SOFTE(r1)

-	CURRENT_THREAD_INFO(r11, r1)
+	ld	r11, PACACURRENT(r13)
 	ld	r10,TI_FLAGS(r11)
 	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
 	bne	.Lsyscall_dotrace		/* does not return */

@@ -205,7 +205,7 @@ system_call:		/* label this so stack traces look sane */
 	ld	r3,RESULT(r1)
 #endif

-	CURRENT_THREAD_INFO(r12, r1)
+	ld	r12, PACACURRENT(r13)

 	ld	r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S

@@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	/* Repopulate r9 and r10 for the syscall path */
 	addi	r9,r1,STACK_FRAME_OVERHEAD
-	CURRENT_THREAD_INFO(r10, r1)
+	ld	r10, PACACURRENT(r13)
 	ld	r10,TI_FLAGS(r10)

 	cmpldi	r0,NR_syscalls

@@ -735,7 +735,7 @@ _GLOBAL(ret_from_except_lite)
 	mtmsrd	r10,1		  /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */

-	CURRENT_THREAD_INFO(r9, r1)
+	ld	r9, PACACURRENT(r13)
 	ld	r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
 	ld	r10,PACACURRENT(r13)

@@ -849,7 +849,7 @@ resume_kernel:
 1:	bl	preempt_schedule_irq

 	/* Re-test flags and eventually loop */
-	CURRENT_THREAD_INFO(r9, r1)
+	ld	r9, PACACURRENT(r13)
 	ld	r4,TI_FLAGS(r9)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	bne	1b

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 231d066b4a3d..dfafcd0af009 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -469,7 +469,7 @@ exc_##n##_bad_stack:					\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()					\
-	CURRENT_THREAD_INFO(r11, r1);			\
+	ld	r11, PACACURRENT(r13);			\
 	ld	r10,TI_LOCAL_FLAGS(r11);		\
 	andi.	r9,r10,_TLF_NAPPING;			\
 	beq+	1f;					\

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b9239dbf6d59..f776f30ecfcc 100644
[PATCH v6 7/9] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
Now that thread_info is at the start of task_struct, its address is in r2,
so the CURRENT_THREAD_INFO() macro is useless. This patch removes it.

At the same time, as the 'cpu' field is no longer in thread_info, this
patch renames TI_CPU to TASK_CPU.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Makefile                  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c      |  2 +-
 arch/powerpc/kernel/entry_32.S         | 43 --
 arch/powerpc/kernel/epapr_hcalls.S     |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S         |  8 +++
 arch/powerpc/kernel/idle_e500.S        |  8 +++
 arch/powerpc/kernel/misc_32.S          |  3 +--
 arch/powerpc/mm/hash_low_32.S          | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S      |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 02e7ca1c15d4..f1e2d7f7b022 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -426,7 +426,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare

 task_cpu_prepare: prepare0
-	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif

 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 61c8747cd926..361bb45b8990 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@

 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(ld dest, PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)	stringify_in_c(mr dest, r2)
 #endif

 #ifndef __ASSEMBLY__

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 768ce602d624..31be6eb9c0d4 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -97,7 +97,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
 	OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-	OFFSET(TI_CPU, task_struct, cpu);
+	OFFSET(TASK_CPU, task_struct, cpu);
 #endif

 #ifdef CONFIG_LIVEPATCH

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index bd3b146e18a3..d0c546ce387e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
 	tophys(r11,r11)
 	addi	r11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-	CURRENT_THREAD_INFO(r9, r1)
-	lwz	r9,TI_CPU(r9)
+	lwz	r9,TASK_CPU(r2)
 	slwi	r9,r9,3
 	add	r11,r11,r9
 #endif

@@ -180,8 +179,7 @@ transfer_to_handler:
 	stw	r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-	CURRENT_THREAD_INFO(r9, r1)
-	tophys(r9, r9)
+	tophys(r9, r2)
 	ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif

@@ -195,8 +193,7 @@ transfer_to_handler:
 	ble-	stack_ovf		/* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_6xx) || defined(CONFIG_E500)
-	CURRENT_THREAD_INFO(r9, r1)
-	tophys(r9,r9)			/* check local flags */
+	tophys(r9,r2)			/* check local flags */
 	lwz	r12,TI_LOCAL_FLAGS(r9)
 	mtcrf	0x01,r12
 	bt-	31-TLF_NAPPING,4f

@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
 	mtmsr	r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-	CURRENT_THREAD_INFO(r10, r1)
-	lwz	r11,TI_FLAGS(r10)
+	lwz	r11,TI_FLAGS(r2)
 	andi.	r11,r11,_TIF_SYSCALL_DOTRACE
 	bne-	syscall_dotrace
 syscall_dotrace_cont:

@@ -379,13 +375,12 @@ ret_from_syscall:
 	lwz	r3,GPR3(r1)
 #endif
 	mr	r6,r3
-	CURRENT_THREAD_INFO(r12, r1)
 	/* disable interrupts so current_thread_info()->flags can't change */
 	LOAD_MSR_KERNEL(r10,MSR_KERNEL)	/* doesn't include MSR_EE */
 	/* Note: We don't bother telling lockdep about it */
 	SYNC
 	MTMSRD(r10)
-	lwz	r9,TI_FLAGS(r12)
+	lwz	r9,TI_FLAGS(r2)
 	li	r8,-MAX_ERRNO
 	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
 	bne-	syscall_exit_work

@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	andi.	r4,r8,MSR_PR
 	beq	3f
-	CURRENT_THREAD_INFO(r4, r1)
-	ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+	ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
 	lwz	r4,_LINK(r1)

@@ -526,7 +520,7 @@ syscall_exit_work:
 	/* Clear per-syscall TIF flags if any are set.  */

 	li	r11,_TIF_PERSYSCALL_MASK
-	addi	r12,r12,TI_FLAGS
+	addi
[PATCH v6 6/9] powerpc: 'current_set' is now a table of task_struct pointers
The 'current_set' table of pointers has been used for retrieving the stack
and current. Its entries used to be thread_info pointers: they pointed at
the stack, and current was taken from the 'task' field of the thread_info.

Now that thread_info sits at the beginning of task_struct, the entries of
'current_set' are simultaneously task_struct pointers and thread_info
pointers. As they are used to get current, and the stack pointer is
retrieved from current's 'stack' field, this patch changes their type to
task_struct, and renames secondary_ti to secondary_current.

Reviewed-by: Nicholas Piggin
Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S             |  6 +++---
 arch/powerpc/kernel/head_44x.S            |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S      |  4 ++--
 arch/powerpc/kernel/smp.c                 | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 9bc98c239305..ab0541f9da42 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include

 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);

 /* kexec */

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 44dfd73b2a62..ba0341bd5a00 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -842,9 +842,9 @@ __secondary_start:
 #endif /* CONFIG_6xx */

 	/* get current's stack and current */
-	lis	r1,secondary_ti@ha
-	tophys(r1,r1)
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	tophys(r2,r2)
+	lwz	r2,secondary_current@l(r2)
 	tophys(r1,r2)
 	lwz	r1,TASK_STACK(r1)

diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c7e90f36358..48e4de4dfd0c 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x)
 	/* Now we can get our task struct and real stack pointer */

 	/* Get current's stack and current */
-	lis	r1,secondary_ti@ha
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	lwz	r2,secondary_current@l(r2)
 	lwz	r1,TASK_STACK(r2)

 	/* Current stack pointer */

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index b8a2b789677e..0d27bfff52dd 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1076,8 +1076,8 @@ __secondary_start:
 	bl	call_setup_cpu

 	/* get current's stack and current */
-	lis	r1,secondary_ti@ha
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	lwz	r2,secondary_current@l(r2)
 	lwz	r1,TASK_STACK(r2)	/* stack */

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f22fcbeb9898..00193643f0da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,7 +74,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif

-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;

 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);

@@ -644,7 +644,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */

-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];

 static void smp_store_cpu_info(int id)
 {

@@ -724,7 +724,7 @@ void smp_prepare_boot_cpu(void)
 	paca_ptrs[boot_cpuid]->__current = current;
 #endif
 	set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-	current_set[boot_cpuid] = task_thread_info(current);
+	current_set[boot_cpuid] = current;
 }

 #ifdef CONFIG_HOTPLUG_CPU

@@ -809,15 +809,13 @@ static bool secondaries_inhibited(void)

 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-	struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
 	paca_ptrs[cpu]->__current = idle;
 	paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
				 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
 	idle->cpu = cpu;
-	secondary_ti = current_set[cpu] = ti;
+	secondary_current = current_set[cpu] = idle;
 }

 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
--
2.13.3
[PATCH v6 5/9] powerpc: regain entire stack space
thread_info is no longer on the stack, so the entire stack can now be
used.

In the meantime, since the previous patch, pointers to the stacks are no
longer thread_info pointers, so this patch changes their type to void*.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/irq.h       | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c    |  1 -
 arch/powerpc/kernel/entry_32.S       | 14 --
 arch/powerpc/kernel/irq.c            | 19 +--
 arch/powerpc/kernel/misc_32.S        |  6 ++
 arch/powerpc/kernel/process.c        |  9 +++--
 arch/powerpc/kernel/setup_64.c       |  8
 8 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..966ddd4d2414 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,9 +48,9 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 extern void exc_lvl_ctx_init(void);
 #else
 #define exc_lvl_ctx_init()

@@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void);
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];

 extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b225c7f7c5a4..e763342265a2 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -331,8 +331,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN	16

 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
-#define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT	((unsigned long)&init_stack)

 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 833d189df04c..768ce602d624 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -93,7 +93,6 @@ int main(void)
 	DEFINE(NMI_MASK, NMI_MASK);
 	OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-	DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
 	OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
 	OFFSET(TASK_STACK, task_struct, stack);

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index fa7a69ffb37a..bd3b146e18a3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
 	mfspr	r0,SPRN_SRR1
 	stw	r0,_SRR1(r11)

-	/* set the stack limit to the current stack
-	 * and set the limit to protect the thread_info
-	 * struct
-	 */
+	/* set the stack limit to the current stack */
 	mfspr	r8,SPRN_SPRG_THREAD
 	lwz	r0,KSP_LIMIT(r8)
 	stw	r0,SAVED_KSP_LIMIT(r11)
-	rlwimi	r0,r1,0,0,(31-THREAD_SHIFT)
+	rlwinm	r0,r1,0,0,(31 - THREAD_SHIFT)
 	stw	r0,KSP_LIMIT(r8)
 	/* fall through */
 #endif

@@ -121,14 +118,11 @@ crit_transfer_to_handler:
 	mfspr	r0,SPRN_SRR1
 	stw	r0,crit_srr1@l(0)

-	/* set the stack limit to the current stack
-	 * and set the limit to protect the thread_info
-	 * struct
-	 */
+	/* set the stack limit to the current stack */
 	mfspr	r8,SPRN_SPRG_THREAD
 	lwz	r0,KSP_LIMIT(r8)
 	stw	r0,saved_ksp_limit@l(0)
-	rlwimi	r0,r1,0,0,(31-THREAD_SHIFT)
+	rlwinm	r0,r1,0,0,(31 - THREAD_SHIFT)
 	stw	r0,KSP_LIMIT(r8)
 	/* fall through */
 #endif

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3fdb6b6973cf..62cfccf4af89 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
 	sp = current_stack_pointer() & (THREAD_SIZE - 1);

 	/* check for stack overflow: is there less than 2KB free? */
-	if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-		pr_err("do_IRQ: stack overflow: %ld\n",
-			sp - sizeof(struct thread_info));
+	if (unlikely(sp < 2048)) {
+		pr_err("do_IRQ: stack overflow: %ld\n", sp);
 		dump_stack();
 	}
 #endif

@@ -660,7 +659,7 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs
[PATCH v6 4/9] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This has the following consequences: - thread_info is now located at the beginning of task_struct. - The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active. - thread_info no longer has the 'task' field. This patch: - Removes all copying of the thread_info struct when the stack changes. - Changes the CURRENT_THREAD_INFO() macro to point to current. - Selects CONFIG_THREAD_INFO_IN_TASK. - Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid duplication of symbol names between ASM constants and C constants. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 8 +- arch/powerpc/include/asm/ptrace.h | 2 +- arch/powerpc/include/asm/smp.h | 17 +++- arch/powerpc/include/asm/thread_info.h | 17 ++-- arch/powerpc/kernel/asm-offsets.c | 7 +++-- arch/powerpc/kernel/entry_32.S | 9 +++ arch/powerpc/kernel/exceptions-64e.S | 11 arch/powerpc/kernel/head_32.S | 6 ++--- arch/powerpc/kernel/head_44x.S | 4 +-- arch/powerpc/kernel/head_64.S | 1 + arch/powerpc/kernel/head_booke.h | 8 +- arch/powerpc/kernel/head_fsl_booke.S | 7 +++-- arch/powerpc/kernel/irq.c | 47 +- arch/powerpc/kernel/kgdb.c | 28 arch/powerpc/kernel/machine_kexec_64.c | 6 ++--- arch/powerpc/kernel/setup_64.c | 21 --- arch/powerpc/kernel/smp.c | 2 +- arch/powerpc/net/bpf_jit32.h | 5 ++-- 19 files changed, 52 insertions(+), 155 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 602eea723624..3b958cd4e284 100644 --- a/arch/powerpc/Kconfig +++ 
b/arch/powerpc/Kconfig @@ -238,6 +238,7 @@ config PPC select RTC_LIB select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE + select THREAD_INFO_IN_TASK select VIRT_TO_BUS if !PPC64 # # Please keep this list sorted alphabetically. diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 81552c7b46eb..02e7ca1c15d4 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -422,6 +422,13 @@ else endif endif +ifdef CONFIG_SMP +prepare: task_cpu_prepare + +task_cpu_prepare: prepare0 + $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h)) +endif + # Use the file '.tmp_gas_check' for binutils tests, as gas won't output # to stdout and these checks are run even on install targets. TOUT := .tmp_gas_check @@ -439,4 +446,3 @@ checkbin: CLEAN_FILES += $(TOUT) - diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 447cbd1bee99..3a7e5561630b 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -120,7 +120,7 @@ extern int ptrace_put_reg(struct task_struct *task, int regno, unsigned long data); #define current_pt_regs() \ - ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1) + ((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1) /* * We use the least-significant bit of the trap field to indicate * whether we have saved the full set of registers, or only a diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index 95b66a0c639b..93a8cd120663 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu); /* 32-bit */ extern int smp_hw_index[]; -#define raw_smp_processor_id() (current_thread_info()->cpu) +/* + * This is particularly ugly: it appears we can't actually get the definition + * of task_struct here, but we need access to the CPU this task is running on. 
+ * Instead of using task_struct we're using _TASK_CPU which is extracted from + * asm-offsets.h by kbuild to get the current processor ID. + * + * This also needs to be safeguarded when building asm-offsets.s because at + * that time _TASK_CPU is not defined yet. It could have been guarded by + * _TASK_CPU itself, but we want the build to fail if _TASK_CPU is missing + * when building something else than asm-offsets.s + */ +#ifdef GENERATING_ASM_OFFSETS +#define raw_smp_processor_id() (0) +#else +#define raw_smp_processor_id() (*(unsigned int *)((void *)current + _TASK_CPU))
[PATCH v6 3/9] powerpc: Prepare for moving thread_info into task_struct
This patch cleans the powerpc kernel before activating CONFIG_THREAD_INFO_IN_TASK: - The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point to the new stack; change it to void * and rename it 'sp'. - Don't use CURRENT_THREAD_INFO() to locate the stack. - Fix a few comments. - Replace current_thread_info()->task by current - Remove unnecessary casts to thread_info, as they'll become invalid once thread_info is no longer on the stack. - Rename THREAD_INFO to TASK_STACK: as it is in fact the offset of the pointer to the stack in task_struct, this pointer will not be impacted by the move of THREAD_INFO. - Makes TASK_STACK available to PPC64. PPC64 will need it to get the stack pointer from current once thread_info has been moved. - Modifies klp_init_thread_info() to take a task_struct pointer argument. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/irq.h | 4 ++-- arch/powerpc/include/asm/livepatch.h | 7 --- arch/powerpc/include/asm/processor.h | 4 ++-- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/kernel/asm-offsets.c| 2 +- arch/powerpc/kernel/entry_32.S | 2 +- arch/powerpc/kernel/entry_64.S | 2 +- arch/powerpc/kernel/head_32.S| 4 ++-- arch/powerpc/kernel/head_40x.S | 4 ++-- arch/powerpc/kernel/head_44x.S | 2 +- arch/powerpc/kernel/head_8xx.S | 2 +- arch/powerpc/kernel/head_booke.h | 4 ++-- arch/powerpc/kernel/head_fsl_booke.S | 4 ++-- arch/powerpc/kernel/irq.c| 2 +- arch/powerpc/kernel/misc_32.S| 4 ++-- arch/powerpc/kernel/process.c| 8 arch/powerpc/kernel/setup-common.c | 2 +- arch/powerpc/kernel/setup_32.c | 15 +-- arch/powerpc/kernel/smp.c| 4 +++- 19 files changed, 38 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index ee39ce56b2a2..2efbae8d93be 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS]; extern struct thread_info 
*softirq_ctx[NR_CPUS]; extern void irq_ctx_init(void); -extern void call_do_softirq(struct thread_info *tp); -extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp); +void call_do_softirq(void *sp); +void call_do_irq(struct pt_regs *regs, void *sp); extern void do_IRQ(struct pt_regs *regs); extern void __init init_IRQ(void); extern void __do_irq(struct pt_regs *regs); diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h index 47a03b9b528b..8a81d10ccc82 100644 --- a/arch/powerpc/include/asm/livepatch.h +++ b/arch/powerpc/include/asm/livepatch.h @@ -43,13 +43,14 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr) return ftrace_location_range(faddr, faddr + 16); } -static inline void klp_init_thread_info(struct thread_info *ti) +static inline void klp_init_thread_info(struct task_struct *p) { + struct thread_info *ti = task_thread_info(p); /* + 1 to account for STACK_END_MAGIC */ - ti->livepatch_sp = (unsigned long *)(ti + 1) + 1; + ti->livepatch_sp = end_of_stack(p) + 1; } #else -static void klp_init_thread_info(struct thread_info *ti) { } +static inline void klp_init_thread_info(struct task_struct *p) { } #endif /* CONFIG_LIVEPATCH */ #endif /* _ASM_POWERPC_LIVEPATCH_H */ diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 13589274fe9b..b225c7f7c5a4 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -40,7 +40,7 @@ #ifndef __ASSEMBLY__ #include -#include +#include #include #include @@ -332,7 +332,7 @@ struct thread_struct { #define INIT_SP(sizeof(init_stack) + (unsigned long) _stack) #define INIT_SP_LIMIT \ - (_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) _stack) + (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)_stack) #ifdef CONFIG_SPE #define SPEFSCR_INIT \ diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 640a4d818772..d2528a0b2f5b 100644 --- 
a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1058,7 +1058,7 @@ * - SPRG9 debug exception scratch * * All 32-bit: - * - SPRG3 current thread_info pointer + * - SPRG3 current thread_struct physical addr pointer *(virtual on BookE, physical on others) * * 32-bit classic: diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a6d70fd2e499..c583a02e5a21 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -91,10 +91,10 @@ int main(void) DEFINE(NMI_MASK, NMI_MASK); OFFSET(TASKTHREADPPR, task_struct, thread.ppr); #else - OFFSET(THREAD_INFO, task_struct, stack);
[PATCH v6 2/9] powerpc: Only use task_struct 'cpu' field on SMP
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field gets moved into task_struct and only defined when CONFIG_SMP is set. This patch ensures that TI_CPU is only used when CONFIG_SMP is set and that task_struct 'cpu' field is not used directly out of SMP code. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/head_fsl_booke.S | 2 ++ arch/powerpc/kernel/misc_32.S| 4 arch/powerpc/xmon/xmon.c | 2 +- 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index e2750b856c8f..05b574f416b3 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -243,8 +243,10 @@ set_ivor: li r0,0 stwur0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) +#ifdef CONFIG_SMP CURRENT_THREAD_INFO(r22, r1) stw r24, TI_CPU(r22) +#endif bl early_init diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 695b24a2d954..2f0fe8bfc078 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll) or r4,r4,r5 mtspr SPRN_HID1,r4 +#ifdef CONFIG_SMP /* Store new HID1 image */ CURRENT_THREAD_INFO(r6, r1) lwz r6,TI_CPU(r6) slwir6,r6,2 +#else + li r6, 0 +#endif addis r6,r6,nap_save_hid1@ha stw r4,nap_save_hid1@l(r6) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index c70d17c9a6ba..1731793e1277 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2986,7 +2986,7 @@ static void show_task(struct task_struct *tsk) printf("%px %016lx %6d %6d %c %2d %s\n", tsk, tsk->thread.ksp, tsk->pid, tsk->parent->pid, - state, task_thread_info(tsk)->cpu, + state, task_cpu(tsk), tsk->comm); } -- 2.13.3
[PATCH v6 0/9] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. Changes since v5: - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding - Fixed PPC_BPF_LOAD_CPU() macro Changes since v4: - Fixed a build failure on 32-bit SMP when include/generated/asm-offsets.h does not already exist; it was due to spaces instead of a tab in the Makefile Changes since RFC v3: (based on Nick's review) - Renamed task_size.h to task_size_user64.h to better relate to what it contains. - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs moved to a separate patch. - Removed CURRENT_THREAD_INFO macro completely. - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is defined. - Added a patch at the end to rename 'tp' pointers to 'sp' pointers - Renamed 'tp' to 'sp' pointers in the preparation patch where relevant - Fixed a few commit logs - Fixed checkpatch report. Changes since RFC v2: - Removed the modification of names in asm-offsets - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu in CFLAGS - Modified asm/smp.h to use the offset set in CFLAGS - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch - Moved the modification of current_pt_regs in the patch activating CONFIG_THREAD_INFO_IN_TASK Changes since RFC v1: - Removed the first patch which was modifying header inclusion order in timer - Modified some names in asm-offsets to avoid conflicts when including asm-offsets in C files - Modified asm/smp.h to avoid having to include linux/sched.h (using asm-offsets instead) - Moved some changes from the activation patch to the preparation patch. 
Christophe Leroy (9): book3s/64: avoid circular header inclusion in mmu-hash.h powerpc: Only use task_struct 'cpu' field on SMP powerpc: Prepare for moving thread_info into task_struct powerpc: Activate CONFIG_THREAD_INFO_IN_TASK powerpc: regain entire stack space powerpc: 'current_set' is now a table of task_struct pointers powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU powerpc/64: Remove CURRENT_THREAD_INFO powerpc: clean stack pointers naming arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 8 ++- arch/powerpc/include/asm/asm-prototypes.h | 4 +- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/exception-64s.h | 4 +- arch/powerpc/include/asm/irq.h | 14 ++--- arch/powerpc/include/asm/livepatch.h | 7 ++- arch/powerpc/include/asm/processor.h | 39 + arch/powerpc/include/asm/ptrace.h | 2 +- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/include/asm/smp.h | 17 +- arch/powerpc/include/asm/task_size_user64.h| 42 ++ arch/powerpc/include/asm/thread_info.h | 19 --- arch/powerpc/kernel/asm-offsets.c | 10 ++-- arch/powerpc/kernel/entry_32.S | 66 -- arch/powerpc/kernel/entry_64.S | 12 ++-- arch/powerpc/kernel/epapr_hcalls.S | 5 +- arch/powerpc/kernel/exceptions-64e.S | 13 + arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/head_32.S | 14 ++--- arch/powerpc/kernel/head_40x.S | 4 +- arch/powerpc/kernel/head_44x.S | 8 +-- arch/powerpc/kernel/head_64.S | 1 + arch/powerpc/kernel/head_8xx.S | 2 +- arch/powerpc/kernel/head_booke.h | 12 +--- arch/powerpc/kernel/head_fsl_booke.S | 16 +++--- arch/powerpc/kernel/idle_6xx.S | 8 +-- arch/powerpc/kernel/idle_book3e.S | 2 +- arch/powerpc/kernel/idle_e500.S| 8 +-- arch/powerpc/kernel/idle_power4.S | 2 +- arch/powerpc/kernel/irq.c | 77 +- arch/powerpc/kernel/kgdb.c | 28 -- arch/powerpc/kernel/machine_kexec_64.c | 6 +- arch/powerpc/kernel/misc_32.S | 17 +++--- arch/powerpc/kernel/process.c | 17 +++--- arch/powerpc/kernel/setup-common.c | 2 +- arch/powerpc/kernel/setup_32.c | 15 ++--- 
arch/powerpc/kernel/setup_64.c | 41 -- arch/powerpc/kernel/smp.c | 16 +++--- arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 6 +- arch/powerpc/kvm/book3s_hv_hmi.c | 1 + arch/powerpc/mm/hash_low_32.S | 14 ++--- arch/powerpc/net/bpf_jit32.h
[PATCH v6 1/9] book3s/64: avoid circular header inclusion in mmu-hash.h
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h. In order to do that, this patch moves the information from asm/processor.h required by mmu-hash.h into a new header called asm/task_size_user64.h. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/processor.h | 34 +- arch/powerpc/include/asm/task_size_user64.h | 42 +++ arch/powerpc/kvm/book3s_hv_hmi.c | 1 + 4 files changed, 45 insertions(+), 34 deletions(-) create mode 100644 arch/powerpc/include/asm/task_size_user64.h diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index e0e4ce8f77d6..02955d867067 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -23,7 +23,7 @@ */ #include #include -#include +#include #include /* diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 52fadded5c1e..13589274fe9b 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -101,40 +101,8 @@ void release_thread(struct task_struct *); #endif #ifdef CONFIG_PPC64 -/* - * 64-bit user address space can have multiple limits - * For now supported values are: - */ -#define TASK_SIZE_64TB (0x4000UL) -#define TASK_SIZE_128TB (0x8000UL) -#define TASK_SIZE_512TB (0x0002UL) -#define TASK_SIZE_1PB (0x0004UL) -#define TASK_SIZE_2PB (0x0008UL) -/* - * With 52 bits in the address we can support - * upto 4PB of range. - */ -#define TASK_SIZE_4PB (0x0010UL) -/* - * For now 512TB is only supported with book3s and 64K linux page size. 
- */ -#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES) -/* - * Max value currently used: - */ -#define TASK_SIZE_USER64 TASK_SIZE_4PB -#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB -#define TASK_CONTEXT_SIZE TASK_SIZE_512TB -#else -#define TASK_SIZE_USER64 TASK_SIZE_64TB -#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB -/* - * We don't need to allocate extended context ids for 4K page size, because - * we limit the max effective address on this config to 64TB. - */ -#define TASK_CONTEXT_SIZE TASK_SIZE_64TB -#endif +#include /* * 32-bit user address space is 4GB - 1 page diff --git a/arch/powerpc/include/asm/task_size_user64.h b/arch/powerpc/include/asm/task_size_user64.h new file mode 100644 index ..a4043075864b --- /dev/null +++ b/arch/powerpc/include/asm/task_size_user64.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H +#define _ASM_POWERPC_TASK_SIZE_USER64_H + +#ifdef CONFIG_PPC64 +/* + * 64-bit user address space can have multiple limits + * For now supported values are: + */ +#define TASK_SIZE_64TB (0x4000UL) +#define TASK_SIZE_128TB (0x8000UL) +#define TASK_SIZE_512TB (0x0002UL) +#define TASK_SIZE_1PB (0x0004UL) +#define TASK_SIZE_2PB (0x0008UL) +/* + * With 52 bits in the address we can support + * upto 4PB of range. + */ +#define TASK_SIZE_4PB (0x0010UL) + +/* + * For now 512TB is only supported with book3s and 64K linux page size. + */ +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES) +/* + * Max value currently used: + */ +#define TASK_SIZE_USER64 TASK_SIZE_4PB +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB +#define TASK_CONTEXT_SIZE TASK_SIZE_512TB +#else +#define TASK_SIZE_USER64 TASK_SIZE_64TB +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB +/* + * We don't need to allocate extended context ids for 4K page size, because + * we limit the max effective address on this config to 64TB. 
+ */ +#define TASK_CONTEXT_SIZE TASK_SIZE_64TB +#endif + +#endif /* CONFIG_PPC64 */ +#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */ diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c index e3f738eb1cac..64b5011475c7 100644 --- a/arch/powerpc/kvm/book3s_hv_hmi.c +++ b/arch/powerpc/kvm/book3s_hv_hmi.c @@ -24,6 +24,7 @@ #include #include #include +#include void wait_for_subcore_guest_exit(void) { -- 2.13.3
Re: [PATCH -next] powerpc/powernv: Fix debugfs_simple_attr.cocci warnings
YueHaibing writes: > Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE > for debugfs files. > > Semantic patch information: > Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() > imposes some significant overhead as compared to > DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe(). Sorry this isn't detailed enough for me to actually understand the pros/cons of this patch. Perhaps I'm expected to know it, but I don't. I had a look at what each macro produces and it wasn't obvious to me what the benefit is. cheers > Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci > > Signed-off-by: YueHaibing > --- > arch/powerpc/platforms/powernv/memtrace.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/memtrace.c > b/arch/powerpc/platforms/powernv/memtrace.c > index 84d038e..0cb6548 100644 > --- a/arch/powerpc/platforms/powernv/memtrace.c > +++ b/arch/powerpc/platforms/powernv/memtrace.c > @@ -311,8 +311,8 @@ static int memtrace_enable_get(void *data, u64 *val) > return 0; > } > > -DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get, > - memtrace_enable_set, "0x%016llx\n"); > +DEFINE_DEBUGFS_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get, > + memtrace_enable_set, "0x%016llx\n"); > > static int memtrace_init(void) > { > @@ -321,8 +321,8 @@ static int memtrace_init(void) > if (!memtrace_debugfs_dir) > return -1; > > - debugfs_create_file("enable", 0600, memtrace_debugfs_dir, > - NULL, _init_fops); > + debugfs_create_file_unsafe("enable", 0600, memtrace_debugfs_dir, NULL, > +_init_fops); > > return 0; > }
Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()
Jann Horn writes: > On Fri, Oct 5, 2018 at 3:21 PM Michael Ellerman wrote: >> Recently we implemented show_user_instructions() which dumps the code >> around the NIP when a user space process dies with an unhandled >> signal. This was modelled on the x86 code, and we even went so far as >> to implement the exact same bug, namely that if the user process >> crashed with its NIP pointing into the kernel we will dump kernel text >> to dmesg. eg: >> >> bad-bctr[2996]: segfault (11) at c001 nip c001 lr >> 12d0b0894 code 1 >> bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 >> 38810020 fb810050 7f8802a6 >> bad-bctr[2996]: code: 3860001c f8010080 48242371 6000 <7c7b1b79> >> 4082002c e8010080 eb610048 >> >> This was discovered on x86 by Jann Horn and fixed in commit >> 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode >> RIP"). >> >> Fix it by checking the adjusted NIP value (pc) and number of >> instructions against USER_DS, and bail if we fail the check, eg: > > This fix looks good to me. Thanks. > In the long term, I think it is somewhat awkward to use > probe_kernel_address(), which uses set_fs(KERNEL_DS), when you > actually just want to access userspace memory. It might make sense to > provide a better helper for explicitly accessing memory with USER_DS. Yes I agree, it's a bit messy. A probe_user_read() that sets USER_DS and does the access_ok() check would be less error prone I think. cheers
Re: [PATCH v5 05/33] KVM: PPC: Book3S HV: Extract PMU save/restore operations as C-callable functions
On Monday 08 October 2018 11:00 AM, Paul Mackerras wrote: This pulls out the assembler code that is responsible for saving and restoring the PMU state for the host and guest into separate functions so they can be used from an alternate entry path. The calling convention is made compatible with C. Reviewed-by: Madhavan Srinivasan Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/asm-prototypes.h | 5 + arch/powerpc/kvm/book3s_hv_interrupts.S | 95 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 363 -- 3 files changed, 253 insertions(+), 210 deletions(-) diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 1f4691c..024e8fc 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -150,4 +150,9 @@ extern s32 patch__memset_nocache, patch__memcpy_nocache; extern long flush_count_cache; +void kvmhv_save_host_pmu(void); +void kvmhv_load_host_pmu(void); +void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use); +void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu); + #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */ diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S index 666b91c..a6d1001 100644 --- a/arch/powerpc/kvm/book3s_hv_interrupts.S +++ b/arch/powerpc/kvm/book3s_hv_interrupts.S @@ -64,52 +64,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) /* Save host PMU registers */ -BEGIN_FTR_SECTION - /* Work around P8 PMAE bug */ - li r3, -1 - clrrdi r3, r3, 10 - mfspr r8, SPRN_MMCR2 - mtspr SPRN_MMCR2, r3 /* freeze all counters using MMCR2 */ - isync -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) - li r3, 1 - sldir3, r3, 31 /* MMCR0_FC (freeze counters) bit */ - mfspr r7, SPRN_MMCR0 /* save MMCR0 */ - mtspr SPRN_MMCR0, r3 /* freeze all counters, disable interrupts */ - mfspr r6, SPRN_MMCRA - /* Clear MMCRA in order to disable SDAR updates */ - li r5, 0 - mtspr SPRN_MMCRA, r5 - isync - lbz r5, 
PACA_PMCINUSE(r13) /* is the host using the PMU? */ - cmpwi r5, 0 - beq 31f /* skip if not */ - mfspr r5, SPRN_MMCR1 - mfspr r9, SPRN_SIAR - mfspr r10, SPRN_SDAR - std r7, HSTATE_MMCR0(r13) - std r5, HSTATE_MMCR1(r13) - std r6, HSTATE_MMCRA(r13) - std r9, HSTATE_SIAR(r13) - std r10, HSTATE_SDAR(r13) -BEGIN_FTR_SECTION - mfspr r9, SPRN_SIER - std r8, HSTATE_MMCR2(r13) - std r9, HSTATE_SIER(r13) -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) - mfspr r3, SPRN_PMC1 - mfspr r5, SPRN_PMC2 - mfspr r6, SPRN_PMC3 - mfspr r7, SPRN_PMC4 - mfspr r8, SPRN_PMC5 - mfspr r9, SPRN_PMC6 - stw r3, HSTATE_PMC1(r13) - stw r5, HSTATE_PMC2(r13) - stw r6, HSTATE_PMC3(r13) - stw r7, HSTATE_PMC4(r13) - stw r8, HSTATE_PMC5(r13) - stw r9, HSTATE_PMC6(r13) -31: + bl kvmhv_save_host_pmu /* * Put whatever is in the decrementer into the @@ -161,3 +116,51 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) ld r0, PPC_LR_STKOFF(r1) mtlrr0 blr + +_GLOBAL(kvmhv_save_host_pmu) +BEGIN_FTR_SECTION + /* Work around P8 PMAE bug */ + li r3, -1 + clrrdi r3, r3, 10 + mfspr r8, SPRN_MMCR2 + mtspr SPRN_MMCR2, r3 /* freeze all counters using MMCR2 */ + isync +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) + li r3, 1 + sldir3, r3, 31 /* MMCR0_FC (freeze counters) bit */ + mfspr r7, SPRN_MMCR0 /* save MMCR0 */ + mtspr SPRN_MMCR0, r3 /* freeze all counters, disable interrupts */ + mfspr r6, SPRN_MMCRA + /* Clear MMCRA in order to disable SDAR updates */ + li r5, 0 + mtspr SPRN_MMCRA, r5 + isync + lbz r5, PACA_PMCINUSE(r13) /* is the host using the PMU? 
*/ + cmpwi r5, 0 + beq 31f /* skip if not */ + mfspr r5, SPRN_MMCR1 + mfspr r9, SPRN_SIAR + mfspr r10, SPRN_SDAR + std r7, HSTATE_MMCR0(r13) + std r5, HSTATE_MMCR1(r13) + std r6, HSTATE_MMCRA(r13) + std r9, HSTATE_SIAR(r13) + std r10, HSTATE_SDAR(r13) +BEGIN_FTR_SECTION + mfspr r9, SPRN_SIER + std r8, HSTATE_MMCR2(r13) + std r9, HSTATE_SIER(r13) +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) + mfspr r3, SPRN_PMC1 + mfspr r5, SPRN_PMC2 + mfspr r6, SPRN_PMC3 + mfspr r7, SPRN_PMC4 + mfspr r8, SPRN_PMC5 + mfspr r9, SPRN_PMC6 + stw r3, HSTATE_PMC1(r13) +
Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()
Christophe LEROY writes: > Le 05/10/2018 à 15:21, Michael Ellerman a écrit : >> Recently we implemented show_user_instructions() which dumps the code ... >> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c >> index 913c5725cdb2..bb6ac471a784 100644 >> --- a/arch/powerpc/kernel/process.c >> +++ b/arch/powerpc/kernel/process.c >> @@ -1306,6 +1306,16 @@ void show_user_instructions(struct pt_regs *regs) >> >> pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int)); >> >> +/* >> + * Make sure the NIP points at userspace, not kernel text/data or >> + * elsewhere. >> + */ >> +if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) { >> +pr_info("%s[%d]: Bad NIP, not dumping instructions.\n", >> +current->comm, current->pid); >> +return; >> +} >> + > > This will conflict with my serie > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=64611 > which changes instructions_to_print to a constant. Will you merge it or > do you expect me to rebase my serie ? I can fix it up. But I see you've already rebased it and resent, you're too quick for me :) cheers
Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
Hi Rob, On Fri, Oct 5, 2018 at 6:59 PM Rob Herring wrote: > Convert Renesas SoC bindings to DT schema format using json-schema. > > Cc: Simon Horman > Cc: Magnus Damm > Cc: Mark Rutland > Cc: linux-renesas-...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Signed-off-by: Rob Herring Thanks for your patch! Note that this will need a rebase, as more SoCs/boards have been added in -next. > --- /dev/null > +++ b/Documentation/devicetree/bindings/arm/shmobile.yaml > @@ -0,0 +1,205 @@ > +# SPDX-License-Identifier: None The old file didn't have an SPDX header, so it was GPL-2.0, implicitly? > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/bindings/arm/shmobile.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings > + > +maintainers: > + - Geert Uytterhoeven Simon Horman (supporter:ARM/SHMOBILE ARM ARCHITECTURE) Magnus Damm (supporter:ARM/SHMOBILE ARM ARCHITECTURE) You had it right in the CC list, though... > + - description: RZ/G1M (R8A77430) > +items: > + - enum: > + # iWave Systems RZ/G1M Qseven Development Platform > (iW-RainboW-G20D-Qseven) > + - iwave,g20d > + - const: iwave,g20m > + - const: renesas,r8a7743 > + > + - items: > + - enum: > + # iWave Systems RZ/G1M Qseven System On Module > (iW-RainboW-G20M-Qseven) > + - iwave,g20m > + - const: renesas,r8a7743 > + > + - description: RZ/G1N (R8A77440) > +items: > + - enum: > + - renesas,sk-rzg1m # SK-RZG1M (YR8A77430S000BE) This board belongs under the RZ/G1M section above (see also the 7743 in the part number). > + - const: renesas,r8a7744 > + - description: Kingfisher (SBEV-RCAR-KF-M03) > +items: > + - const: shimafuji,kingfisher > + - enum: > + - renesas,h3ulcb > + - renesas,m3ulcb > + - enum: > + - renesas,r8a7795 > + - renesas,r8a7796 This looks a bit funny: all other entries have the "const" last, and use it for the SoC number. May be correct, though. 
To clarify, this is an extension board that can fit both the [HM]3ULCB boards (actually also the new M3NULCB, I think). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [RFC PATCH kernel] vfio/spapr_tce: Get rid of possible infinite loop
Alexey Kardashevskiy wrote: > As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered > memory. If there is a bug in memory release, the loop in > tce_iommu_release() becomes infinite; this actually happened to me. > > This makes the loop finite and prints a warning on every failure to make > the code more bug prone. > > Signed-off-by: Alexey Kardashevskiy > --- > drivers/vfio/vfio_iommu_spapr_tce.c | 10 +++--- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > b/drivers/vfio/vfio_iommu_spapr_tce.c > index b1a8ab3..ece0651 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -371,6 +371,7 @@ static void tce_iommu_release(void *iommu_data) > { > struct tce_container *container = iommu_data; > struct tce_iommu_group *tcegrp; > + struct tce_iommu_prereg *tcemem, *tmtmp; > long i; > > while (tce_groups_attached(container)) { > @@ -393,13 +394,8 @@ static void tce_iommu_release(void *iommu_data) > tce_iommu_free_table(container, tbl); > } > > - while (!list_empty(>prereg_list)) { > - struct tce_iommu_prereg *tcemem; > - > - tcemem = list_first_entry(>prereg_list, > - struct tce_iommu_prereg, next); > - WARN_ON_ONCE(tce_iommu_prereg_free(container, tcemem)); > - } > + list_for_each_entry_safe(tcemem, tmtmp, >prereg_list, next) > + WARN_ON(tce_iommu_prereg_free(container, tcemem)); I'm not sure that calling tce_iommu_prereg_free() under WARN_ON() is a good idea, because WARN_ON() is a preprocessor macro: if a CONFIG_WARN=n option were ever added, by analogy with CONFIG_BUG=n, defining WARN_ON() as empty, we would lose the call to tce_iommu_prereg_free(), leaking resources. There is no problem at the moment: WARN_ON() is defined unconditionally for PPC in arch/powerpc/include/asm/bug.h. So your first version with the intermediate variable looks better to me. > > tce_iommu_disable(container); > if (container->mm) > -- Thanks, Serhii
Re: [PATCH 36/36] dt-bindings: arm: Convert ZTE board/soc bindings to json-schema
On Fri, Oct 05, 2018 at 11:58:48AM -0500, Rob Herring wrote: > Convert ZTE SoC bindings to DT schema format using json-schema. > > Cc: Jun Nie > Cc: Baoyou Xie > Cc: Shawn Guo > Cc: Mark Rutland > Cc: linux-arm-ker...@lists.infradead.org > Cc: devicet...@vger.kernel.org > Signed-off-by: Rob Herring Acked-by: Shawn Guo
Re: [PATCH 05/36] dt-bindings: arm: renesas: Move 'renesas,prr' binding to its own doc
Hi Rob, On Fri, Oct 5, 2018 at 6:58 PM Rob Herring wrote: > In preparation to convert board-level bindings to json-schema, move > various misc SoC bindings out to their own file. > > Cc: Mark Rutland > Cc: Simon Horman > Cc: Magnus Damm > Cc: devicet...@vger.kernel.org > Cc: linux-renesas-...@vger.kernel.org > Signed-off-by: Rob Herring Looks good to me, but needs a rebase, as the PRR section has been extended in -next. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH] powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
On Mon, 08 Oct 2018 15:08:31 +1100
Benjamin Herrenschmidt wrote:
> HMIs will crash the kernel due to
>
> 	BRANCH_LINK_TO_FAR(hmi_exception_realmode)
>
> calling into the OPD instead of the actual code.
>
> Signed-off-by: Benjamin Herrenschmidt
> ---
>
> This hack fixes it for me, but it's not great. Nick, any better idea ?

Is it a hack because of the ifdef gunk, or because there's something deeper
wrong with using the .sym? I guess all those handlers that load a label
address by hand could have this bug silently creep in. Can we have them use
the DOTSYM() macro?

Thanks,
Nick

> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index ea04dfb..752709cc8 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1119,7 +1119,11 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
> 	EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
> 	EXCEPTION_PROLOG_COMMON_3(0xe60)
> 	addi	r3,r1,STACK_FRAME_OVERHEAD
> +#ifdef PPC64_ELF_ABI_v1
> +	BRANCH_LINK_TO_FAR(.hmi_exception_realmode)	/* Function call ABI */
> +#else
> 	BRANCH_LINK_TO_FAR(hmi_exception_realmode)	/* Function call ABI */
> +#endif
> 	cmpdi	cr0,r3,0
>
> 	/* Windup the stack. */
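[Editor's note] For readers unfamiliar with the problem: under the 64-bit ELF ABI v1, a function symbol such as hmi_exception_realmode names an "official procedure descriptor" (OPD) — a record in the data segment — while the dot-prefixed symbol (.hmi_exception_realmode, what DOTSYM() resolves to) names the actual code entry point. Branching to the descriptor address executes data as instructions. A hedged user-space model of the difference — the struct layout follows the ABI, but the objects and helpers are invented for illustration:

```c
#include <assert.h>

/* ELFv1 function descriptor: entry point, TOC pointer, environment. */
struct func_desc {
	unsigned long entry; /* address of the first instruction (".sym") */
	unsigned long toc;   /* TOC base the function expects in r2 */
	unsigned long env;
};

/* Pretend code word and descriptor for one function. */
static unsigned long code_hmi_realmode = 0x1234; /* fake instruction */

static struct func_desc opd_hmi_realmode = {
	.entry = (unsigned long)&code_hmi_realmode,
	.toc   = 0xdeadbeef,
};

/* What a correct ELFv1 call sequence must branch to. */
static unsigned long branch_target_elfv1(const struct func_desc *d)
{
	return d->entry; /* DOTSYM(sym): the code address */
}

/* The bug: using the descriptor's own address as the branch target. */
static unsigned long branch_target_buggy(const struct func_desc *d)
{
	return (unsigned long)d; /* points at data, not instructions */
}
```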
Re: [PATCH 22/36] dt-bindings: arm: Convert FSL board/soc bindings to json-schema
On Fri, Oct 05, 2018 at 11:58:34AM -0500, Rob Herring wrote: > Convert Freescale SoC bindings to DT schema format using json-schema. > > Cc: Shawn Guo > Cc: Mark Rutland > Cc: devicet...@vger.kernel.org > Signed-off-by: Rob Herring > --- > .../devicetree/bindings/arm/armadeus.txt | 6 - > Documentation/devicetree/bindings/arm/bhf.txt | 6 - > .../bindings/arm/compulab-boards.txt | 25 --- > Documentation/devicetree/bindings/arm/fsl.txt | 185 -- > .../devicetree/bindings/arm/fsl.yaml | 166 > .../devicetree/bindings/arm/i2se.txt | 22 --- > .../devicetree/bindings/arm/olimex.txt| 10 - > .../devicetree/bindings/arm/technologic.txt | 23 --- > 8 files changed, 166 insertions(+), 277 deletions(-) > delete mode 100644 Documentation/devicetree/bindings/arm/armadeus.txt > delete mode 100644 Documentation/devicetree/bindings/arm/bhf.txt > delete mode 100644 Documentation/devicetree/bindings/arm/compulab-boards.txt > delete mode 100644 Documentation/devicetree/bindings/arm/fsl.txt > create mode 100644 Documentation/devicetree/bindings/arm/fsl.yaml > delete mode 100644 Documentation/devicetree/bindings/arm/i2se.txt > delete mode 100644 Documentation/devicetree/bindings/arm/olimex.txt > delete mode 100644 Documentation/devicetree/bindings/arm/technologic.txt > > diff --git a/Documentation/devicetree/bindings/arm/armadeus.txt > b/Documentation/devicetree/bindings/arm/armadeus.txt > deleted file mode 100644 > index 9821283ff516.. > --- a/Documentation/devicetree/bindings/arm/armadeus.txt > +++ /dev/null > @@ -1,6 +0,0 @@ > -Armadeus i.MX Platforms Device Tree Bindings > > - > -APF51: i.MX51 based module. > -Required root node properties: > -- compatible = "armadeus,imx51-apf51", "fsl,imx51"; > diff --git a/Documentation/devicetree/bindings/arm/bhf.txt > b/Documentation/devicetree/bindings/arm/bhf.txt > deleted file mode 100644 > index 886b503caf9c.. 
> --- a/Documentation/devicetree/bindings/arm/bhf.txt > +++ /dev/null > @@ -1,6 +0,0 @@ > -Beckhoff Automation Platforms Device Tree Bindings > --- > - > -CX9020 Embedded PC > -Required root node properties: > -- compatible = "bhf,cx9020", "fsl,imx53"; > diff --git a/Documentation/devicetree/bindings/arm/compulab-boards.txt > b/Documentation/devicetree/bindings/arm/compulab-boards.txt > deleted file mode 100644 > index 42a10285af9c.. > --- a/Documentation/devicetree/bindings/arm/compulab-boards.txt > +++ /dev/null > @@ -1,25 +0,0 @@ > -CompuLab SB-SOM is a multi-module baseboard capable of carrying: > - - CM-T43 > - - CM-T54 > - - CM-QS600 > - - CL-SOM-AM57x > - - CL-SOM-iMX7 > -modules with minor modifications to the SB-SOM assembly. > - > -Required root node properties: > -- compatible = should be "compulab,sb-som" > - > -Compulab CL-SOM-iMX7 is a miniature System-on-Module (SoM) based on > -Freescale i.MX7 ARM Cortex-A7 System-on-Chip. > - > -Required root node properties: > -- compatible = "compulab,cl-som-imx7", "fsl,imx7d"; > - > -Compulab SBC-iMX7 is a single board computer based on the > -Freescale i.MX7 system-on-chip. SBC-iMX7 is implemented with > -the CL-SOM-iMX7 System-on-Module providing most of the functions, > -and SB-SOM-iMX7 carrier board providing additional peripheral > -functions and connectors. > - > -Required root node properties: > -- compatible = "compulab,sbc-imx7", "compulab,cl-som-imx7", "fsl,imx7d"; > diff --git a/Documentation/devicetree/bindings/arm/fsl.txt > b/Documentation/devicetree/bindings/arm/fsl.txt > deleted file mode 100644 > index 1e775aaa5c5b.. 
> --- a/Documentation/devicetree/bindings/arm/fsl.txt > +++ /dev/null > @@ -1,185 +0,0 @@ > -Freescale i.MX Platforms Device Tree Bindings > > - > -i.MX23 Evaluation Kit > -Required root node properties: > -- compatible = "fsl,imx23-evk", "fsl,imx23"; > - > -i.MX25 Product Development Kit > -Required root node properties: > -- compatible = "fsl,imx25-pdk", "fsl,imx25"; > - > -i.MX27 Product Development Kit > -Required root node properties: > -- compatible = "fsl,imx27-pdk", "fsl,imx27"; > - > -i.MX28 Evaluation Kit > -Required root node properties: > -- compatible = "fsl,imx28-evk", "fsl,imx28"; > - > -i.MX51 Babbage Board > -Required root node properties: > -- compatible = "fsl,imx51-babbage", "fsl,imx51"; > - > -i.MX53 Automotive Reference Design Board > -Required root node properties: > -- compatible = "fsl,imx53-ard", "fsl,imx53"; > - > -i.MX53 Evaluation Kit > -Required root node properties: > -- compatible = "fsl,imx53-evk", "fsl,imx53"; > - > -i.MX53 Quick Start Board > -Required root node properties: > -- compatible = "fsl,imx53-qsb", "fsl,imx53"; > - > -i.MX53 Smart Mobile Reference Design Board > -Required root node properties: > --
Re: [PATCH 06/36] dt-bindings: arm: zte: Move sysctrl bindings to their own doc
On Fri, Oct 05, 2018 at 11:58:18AM -0500, Rob Herring wrote: > In preparation to convert board-level bindings to json-schema, move > various misc SoC bindings out to their own file. > > Cc: Mark Rutland > Cc: Jun Nie > Cc: Baoyou Xie > Cc: Shawn Guo > Cc: devicet...@vger.kernel.org > Cc: linux-arm-ker...@lists.infradead.org > Signed-off-by: Rob Herring > --- > .../devicetree/bindings/arm/zte-sysctrl.txt | 30 +++ Should it be zte,sysctrl.txt, to be consistent with other files like fsl,layerscape-dcfg.txt? I'm fine with either way, but just want to see a more consistent naming convention. Other than that, Acked-by: Shawn Guo
Re: [PATCH 04/36] dt-bindings: arm: fsl: Move DCFG and SCFG bindings to their own docs
On Fri, Oct 05, 2018 at 11:58:16AM -0500, Rob Herring wrote: > In preparation to convert board-level bindings to json-schema, move > various misc SoC bindings out to their own file. > > Cc: Shawn Guo > Cc: Mark Rutland > Cc: devicet...@vger.kernel.org > Signed-off-by: Rob Herring Acked-by: Shawn Guo
Re: [PATCH v4 6/6] arm64: dts: add LX2160ARDB board support
On Thu, Oct 04, 2018 at 06:33:51AM +0530, Vabhav Sharma wrote: > LX2160A reference design board (RDB) is a high-performance > computing, evaluation, and development platform with LX2160A > SoC. > > Signed-off-by: Priyanka Jain > Signed-off-by: Sriram Dash > Signed-off-by: Vabhav Sharma > --- > arch/arm64/boot/dts/freescale/Makefile| 1 + > arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 100 > ++ > 2 files changed, 101 insertions(+) > create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > > diff --git a/arch/arm64/boot/dts/freescale/Makefile > b/arch/arm64/boot/dts/freescale/Makefile > index 86e18ad..445b72b 100644 > --- a/arch/arm64/boot/dts/freescale/Makefile > +++ b/arch/arm64/boot/dts/freescale/Makefile > @@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb > dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb > dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb > dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb > +dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > new file mode 100644 > index 000..1483071 > --- /dev/null > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > @@ -0,0 +1,100 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR MIT) > +// > +// Device Tree file for LX2160ARDB > +// > +// Copyright 2018 NXP > + > +/dts-v1/; > + > +#include "fsl-lx2160a.dtsi" > + > +/ { > + model = "NXP Layerscape LX2160ARDB"; > + compatible = "fsl,lx2160a-rdb", "fsl,lx2160a"; > + > + chosen { > + stdout-path = "serial0:115200n8"; > + }; > + > + sb_3v3: regulator-fixed { The node name should probably be named like regulator-sb3v3 or something, so that the pattern can be followed when we have another fixed regulator to be added. > + compatible = "regulator-fixed"; > + regulator-name = "fixed-3.3V"; The name should be something we can find on board schematics. 
> + regulator-min-microvolt = <330>; > + regulator-max-microvolt = <330>; > + regulator-boot-on; > + regulator-always-on; > + }; > + > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > + > + { Please keep these labeled nodes sorted alphabetically. > + status = "okay"; Have a newline between properties and child node. > + i2c-mux@77 { > + compatible = "nxp,pca9547"; > + reg = <0x77>; > + #address-cells = <1>; > + #size-cells = <0>; > + > + i2c@2 { > + #address-cells = <1>; > + #size-cells = <0>; > + reg = <0x2>; > + > + power-monitor@40 { > + compatible = "ti,ina220"; > + reg = <0x40>; > + shunt-resistor = <1000>; > + }; > + }; > + > + i2c@3 { > + #address-cells = <1>; > + #size-cells = <0>; > + reg = <0x3>; > + > + temperature-sensor@4c { > + compatible = "nxp,sa56004"; > + reg = <0x4c>; > + vcc-supply = <_3v3>; > + }; > + > + temperature-sensor@4d { > + compatible = "nxp,sa56004"; > + reg = <0x4d>; > + vcc-supply = <_3v3>; > + }; > + }; > + }; > +}; > + > + { > + status = "okay"; > + > + rtc@51 { > + compatible = "nxp,pcf2129"; > + reg = <0x51>; > + // IRQ10_B > + interrupts = <0 150 0x4>; > + }; Bad indentation. Shawn > + > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > -- > 2.7.4 >
Re: [PATCH v4 5/6] arm64: dts: add QorIQ LX2160A SoC support
On Thu, Oct 04, 2018 at 06:33:50AM +0530, Vabhav Sharma wrote: > LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture. > > LX2160A features an advanced 16 64-bit ARM v8 CortexA72 processor cores > in 8 cluster, CCN508, GICv3,two 64-bit DDR4 memory controller, 8 I2C > controllers, 3 dspi, 2 esdhc,2 USB 3.0, mmu 500, 3 SATA, 4 PL011 SBSA > UARTs etc. > > Signed-off-by: Ramneek Mehresh > Signed-off-by: Zhang Ying-22455 > Signed-off-by: Nipun Gupta > Signed-off-by: Priyanka Jain > Signed-off-by: Yogesh Gaur > Signed-off-by: Sriram Dash > Signed-off-by: Vabhav Sharma > --- > arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 702 > + > 1 file changed, 702 insertions(+) > create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > new file mode 100644 > index 000..c758268 > --- /dev/null > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > @@ -0,0 +1,702 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR MIT) > +// > +// Device Tree Include file for Layerscape-LX2160A family SoC. 
> +// > +// Copyright 2018 NXP > + > +#include > + > +/memreserve/ 0x8000 0x0001; > + > +/ { > + compatible = "fsl,lx2160a"; > + interrupt-parent = <>; > + #address-cells = <2>; > + #size-cells = <2>; > + > + cpus { > + #address-cells = <1>; > + #size-cells = <0>; > + > + // 8 clusters having 2 Cortex-A72 cores each > + cpu@0 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x0>; > + clocks = < 1 0>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <_l2>; > + }; > + > + cpu@1 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x1>; > + clocks = < 1 0>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <_l2>; > + }; > + > + cpu@100 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x100>; > + clocks = < 1 1>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <_l2>; > + }; > + > + cpu@101 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x101>; > + clocks = < 1 1>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <_l2>; > + }; > + > + cpu@200 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x200>; > + clocks = < 1 2>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size 
= <64>; > + i-cache-sets = <192>; > + next-level-cache = <_l2>; > + }; > + > + cpu@201 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x201>; > + clocks = < 1 2>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > +
[PATCH v5 33/33] KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result
This adds a KVM_PPC_NO_HASH flag to the flags field of the kvm_ppc_smmu_info struct, and arranges for it to be set when running as a nested hypervisor, as an unambiguous indication to userspace that HPT guests are not supported. Reporting the KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as indicating only that the new HPT features in ISA V3.0 are not supported, leaving it ambiguous whether pre-V3.0 HPT features are supported. Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 4 arch/powerpc/kvm/book3s_hv.c | 4 include/uapi/linux/kvm.h | 1 + 3 files changed, 9 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index fde48b6..df98b63 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2270,6 +2270,10 @@ The supported flags are: The emulated MMU supports 1T segments in addition to the standard 256M ones. +- KVM_PPC_NO_HASH + This flag indicates that HPT guests are not supported by KVM, + thus all guests must use radix MMU mode. 
+ The "slb_size" field indicates how many SLB entries are supported The "sps" array contains 8 entries indicating the supported base diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index fa61647..f565403 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4245,6 +4245,10 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm *kvm, kvmppc_add_seg_page_size(, 16, SLB_VSID_L | SLB_VSID_LP_01); kvmppc_add_seg_page_size(, 24, SLB_VSID_L); + /* If running as a nested hypervisor, we don't support HPT guests */ + if (kvmhv_on_pseries()) + info->flags |= KVM_PPC_NO_HASH; + return 0; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d9cec6b..7f2ff3a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -719,6 +719,7 @@ struct kvm_ppc_one_seg_page_size { #define KVM_PPC_PAGE_SIZES_REAL0x0001 #define KVM_PPC_1T_SEGMENTS0x0002 +#define KVM_PPC_NO_HASH0x0004 struct kvm_ppc_smmu_info { __u64 flags; -- 2.7.4
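[Editor's note] Userspace consuming this flag would test the new bit in kvm_ppc_smmu_info.flags after a KVM_PPC_GET_SMMU_INFO ioctl. A sketch of just the flag logic — the flag constants match the patch above, but the reduced struct and helper are illustrative, not the real uapi flow:

```c
#include <assert.h>

#define KVM_PPC_PAGE_SIZES_REAL 0x00000001ULL
#define KVM_PPC_1T_SEGMENTS     0x00000002ULL
#define KVM_PPC_NO_HASH         0x00000004ULL

/* Reduced stand-in for struct kvm_ppc_smmu_info; in real code the
 * flags come from the KVM_PPC_GET_SMMU_INFO ioctl on the VM fd. */
struct smmu_info_flags {
	unsigned long long flags;
};

/* Decide whether an HPT (hash MMU) guest may be configured. */
static int hpt_guest_supported(const struct smmu_info_flags *info)
{
	return !(info->flags & KVM_PPC_NO_HASH);
}
```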
[PATCH v5 32/33] KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization
With this, userspace can enable a KVM-HV guest to run nested guests under it. The administrator can control whether any nested guests can be run; setting the "nested" module parameter to false prevents any guests becoming nested hypervisors (that is, any attempt to enable the nested capability on a guest will fail). Guests which are already nested hypervisors will continue to be so. Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 14 ++ arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/book3s_hv.c | 39 +- arch/powerpc/kvm/powerpc.c | 12 include/uapi/linux/kvm.h | 1 + 5 files changed, 58 insertions(+), 9 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 2f5f9b7..fde48b6 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -4532,6 +4532,20 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise, a #GP would be raised when the guest tries to access. Currently, this capability does not enable write permissions of this MSR for the guest. +7.16 KVM_CAP_PPC_NESTED_HV + +Architectures: ppc +Parameters: none +Returns: 0 on success, -EINVAL when the implementation doesn't support +nested-HV virtualization. + +HV-KVM on POWER9 and later systems allows for "nested-HV" +virtualization, which provides a way for a guest VM to run guests that +can run using the CPU's supervisor mode (privileged non-hypervisor +state). Enabling this capability on a VM depends on the CPU having +the necessary functionality and on the facility being enabled with a +kvm-hv module parameter. + 8. Other capabilities. 
-- diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 245e564..b3796bd 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -327,6 +327,7 @@ struct kvmppc_ops { int (*set_smt_mode)(struct kvm *kvm, unsigned long mode, unsigned long flags); void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); + int (*enable_nested)(struct kvm *kvm); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 152bf75..fa61647 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -118,6 +118,16 @@ module_param_cb(h_ipi_redirect, _param_ops, _ipi_redirect, 0644); MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core"); #endif +/* If set, guests are allowed to create and control nested guests */ +static bool nested = true; +module_param(nested, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)"); + +static inline bool nesting_enabled(struct kvm *kvm) +{ + return kvm->arch.nested_enable && kvm_is_radix(kvm); +} + /* If set, the threads on each CPU core have to be in the same MMU mode */ static bool no_mixing_hpt_and_radix; @@ -959,12 +969,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) case H_SET_PARTITION_TABLE: ret = H_FUNCTION; - if (vcpu->kvm->arch.nested_enable) + if (nesting_enabled(vcpu->kvm)) ret = kvmhv_set_partition_table(vcpu); break; case H_ENTER_NESTED: ret = H_FUNCTION; - if (!vcpu->kvm->arch.nested_enable) + if (!nesting_enabled(vcpu->kvm)) break; ret = kvmhv_enter_nested_guest(vcpu); if (ret == H_INTERRUPT) { @@ -974,9 +984,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) break; case H_TLB_INVALIDATE: ret = H_FUNCTION; - if (!vcpu->kvm->arch.nested_enable) - break; - ret = kvmhv_do_nested_tlbie(vcpu); + if (nesting_enabled(vcpu->kvm)) + ret = kvmhv_do_nested_tlbie(vcpu); break; default: @@ -4496,10 +4505,8 @@ static int 
kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu) /* Must be called with kvm->lock held and mmu_ready = 0 and no vcpus running */ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm) { - if (kvm->arch.nested_enable) { - kvm->arch.nested_enable = false; + if (nesting_enabled(kvm)) kvmhv_release_all_nested(kvm); - } kvmppc_free_radix(kvm); kvmppc_update_lpcr(kvm, LPCR_VPM1, LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR); @@ -4776,7 +4783,7 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm) /* Perform global invalidation and return lpid to the pool */ if (cpu_has_feature(CPU_FTR_ARCH_300)) { - if (kvm->arch.nested_enable) + if (nesting_enabled(kvm))
[PATCH v5 30/33] KVM: PPC: Book3S HV: Allow HV module to load without hypervisor mode
With this, the KVM-HV module can be loaded in a guest running under KVM-HV, and if the hypervisor supports nested virtualization, this guest can now act as a nested hypervisor and run nested guests. This also adds some checks to inform userspace that HPT guests are not supported by nested hypervisors (by returning false for the KVM_CAP_PPC_MMU_HASH_V3 capability), and to prevent userspace from configuring a guest to use HPT mode. Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_hv.c | 16 arch/powerpc/kvm/powerpc.c | 3 ++- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 127bb5f..152bf75 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4807,11 +4807,15 @@ static int kvmppc_core_emulate_mfspr_hv(struct kvm_vcpu *vcpu, int sprn, static int kvmppc_core_check_processor_compat_hv(void) { - if (!cpu_has_feature(CPU_FTR_HVMODE) || - !cpu_has_feature(CPU_FTR_ARCH_206)) - return -EIO; + if (cpu_has_feature(CPU_FTR_HVMODE) && + cpu_has_feature(CPU_FTR_ARCH_206)) + return 0; - return 0; + /* POWER9 in radix mode is capable of being a nested hypervisor. 
*/ + if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled()) + return 0; + + return -EIO; } #ifdef CONFIG_KVM_XICS @@ -5129,6 +5133,10 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg) if (radix && !radix_enabled()) return -EINVAL; + /* If we're a nested hypervisor, we currently only support radix */ + if (kvmhv_on_pseries() && !radix) + return -EINVAL; + mutex_lock(>lock); if (radix != kvm_is_radix(kvm)) { if (kvm->arch.mmu_ready) { diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index eba5756..1f4b128 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -594,7 +594,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = !!(hv_enabled && radix_enabled()); break; case KVM_CAP_PPC_MMU_HASH_V3: - r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300)); + r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) && + cpu_has_feature(CPU_FTR_HVMODE)); break; #endif case KVM_CAP_SYNC_MMU: -- 2.7.4
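[Editor's note] The reordered compat check now admits two configurations: a real bare-metal hypervisor, or a POWER9 radix guest acting as a nested hypervisor. Modelled as a pure predicate — the feature flags are booleans here; in the kernel they are cpu_has_feature()/radix_enabled() tests:

```c
#include <assert.h>

#define EIO 5

/* Mirrors the logic of kvmppc_core_check_processor_compat_hv()
 * after the patch (names and parameterisation are illustrative). */
static int hv_compat(int hv_mode, int arch_206, int arch_300, int radix)
{
	if (hv_mode && arch_206)
		return 0;       /* bare-metal HV, as before */
	if (arch_300 && radix)
		return 0;       /* POWER9 radix guest: nested-HV capable */
	return -EIO;
}
```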
[PATCH v5 31/33] KVM: PPC: Book3S HV: Add nested shadow page tables to debugfs
This adds a list of valid shadow PTEs for each nested guest to the 'radix' file for the guest in debugfs. This can be useful for debugging. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 1 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 39 +--- arch/powerpc/kvm/book3s_hv_nested.c | 15 3 files changed, 52 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 83d4def..6d29814 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -120,6 +120,7 @@ struct rmap_nested { struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid, bool create); void kvmhv_put_nested(struct kvm_nested_guest *gp); +int kvmhv_nested_next_lpid(struct kvm *kvm, int lpid); /* Encoding of first parameter for H_TLB_INVALIDATE */ #define H_TLBIE_P1_ENC(ric, prs, r)(___PPC_RIC(ric) | ___PPC_PRS(prs) | \ diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index ae0e3ed..43b21e8 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -1002,6 +1002,7 @@ struct debugfs_radix_state { struct kvm *kvm; struct mutexmutex; unsigned long gpa; + int lpid; int chars_left; int buf_index; charbuf[128]; @@ -1043,6 +1044,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf, struct kvm *kvm; unsigned long gpa; pgd_t *pgt; + struct kvm_nested_guest *nested; pgd_t pgd, *pgdp; pud_t pud, *pudp; pmd_t pmd, *pmdp; @@ -1077,10 +1079,39 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf, } gpa = p->gpa; - pgt = kvm->arch.pgtable; - while (len != 0 && gpa < RADIX_PGTABLE_RANGE) { + nested = NULL; + pgt = NULL; + while (len != 0 && p->lpid >= 0) { + if (gpa >= RADIX_PGTABLE_RANGE) { + gpa = 0; + pgt = NULL; + if (nested) { + kvmhv_put_nested(nested); + nested = NULL; + } + p->lpid = kvmhv_nested_next_lpid(kvm, 
p->lpid); + p->hdr = 0; + if (p->lpid < 0) + break; + } + if (!pgt) { + if (p->lpid == 0) { + pgt = kvm->arch.pgtable; + } else { + nested = kvmhv_get_nested(kvm, p->lpid, false); + if (!nested) { + gpa = RADIX_PGTABLE_RANGE; + continue; + } + pgt = nested->shadow_pgtable; + } + } + n = 0; if (!p->hdr) { - n = scnprintf(p->buf, sizeof(p->buf), + if (p->lpid > 0) + n = scnprintf(p->buf, sizeof(p->buf), + "\nNested LPID %d: ", p->lpid); + n += scnprintf(p->buf + n, sizeof(p->buf) - n, "pgdir: %lx\n", (unsigned long)pgt); p->hdr = 1; goto copy; @@ -1146,6 +1177,8 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf, } } p->gpa = gpa; + if (nested) + kvmhv_put_nested(nested); out: mutex_unlock(>mutex); diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 3f21f78..401d2ec 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1274,3 +1274,18 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu) mutex_unlock(>tlb_lock); return ret; } + +int kvmhv_nested_next_lpid(struct kvm *kvm, int lpid) +{ + int ret = -1; + + spin_lock(>mmu_lock); + while (++lpid <= kvm->arch.max_nested_lpid) { + if (kvm->arch.nested_guests[lpid]) { + ret = lpid; + break; + } + } + spin_unlock(>mmu_lock); + return ret; +} -- 2.7.4
[PATCH v5 29/33] KVM: PPC: Book3S HV: Handle differing endianness for H_ENTER_NESTED
From: Suraj Jitindar Singh The hcall H_ENTER_NESTED takes two parameters: the address in L1 guest memory of a hv_regs struct and the address of a pt_regs struct. The hcall requests the L0 hypervisor to use the register values in these structs to run a L2 guest and to return the exit state of the L2 guest in these structs. These are in the endianness of the L1 guest, rather than being always big-endian as is usually the case for PAPR hypercalls. This is convenient because it means that the L1 guest can pass the address of the regs field in its kvm_vcpu_arch struct. This also improves performance slightly by avoiding the need for two copies of the pt_regs struct. When reading/writing these structures, this patch handles the case where the endianness of the L1 guest differs from that of the L0 hypervisor, by byteswapping the structures after reading and before writing them back. Since all the fields of the pt_regs are of the same type, i.e., unsigned long, we treat it as an array of unsigned longs. The fields of struct hv_guest_state are not all the same, so its fields are byteswapped individually. 
Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_hv_nested.c | 51 - 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index e2305962..3f21f78 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -51,6 +51,48 @@ void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr) hr->ppr = vcpu->arch.ppr; } +static void byteswap_pt_regs(struct pt_regs *regs) +{ + unsigned long *addr = (unsigned long *) regs; + + for (; addr < ((unsigned long *) (regs + 1)); addr++) + *addr = swab64(*addr); +} + +static void byteswap_hv_regs(struct hv_guest_state *hr) +{ + hr->version = swab64(hr->version); + hr->lpid = swab32(hr->lpid); + hr->vcpu_token = swab32(hr->vcpu_token); + hr->lpcr = swab64(hr->lpcr); + hr->pcr = swab64(hr->pcr); + hr->amor = swab64(hr->amor); + hr->dpdes = swab64(hr->dpdes); + hr->hfscr = swab64(hr->hfscr); + hr->tb_offset = swab64(hr->tb_offset); + hr->dawr0 = swab64(hr->dawr0); + hr->dawrx0 = swab64(hr->dawrx0); + hr->ciabr = swab64(hr->ciabr); + hr->hdec_expiry = swab64(hr->hdec_expiry); + hr->purr = swab64(hr->purr); + hr->spurr = swab64(hr->spurr); + hr->ic = swab64(hr->ic); + hr->vtb = swab64(hr->vtb); + hr->hdar = swab64(hr->hdar); + hr->hdsisr = swab64(hr->hdsisr); + hr->heir = swab64(hr->heir); + hr->asdr = swab64(hr->asdr); + hr->srr0 = swab64(hr->srr0); + hr->srr1 = swab64(hr->srr1); + hr->sprg[0] = swab64(hr->sprg[0]); + hr->sprg[1] = swab64(hr->sprg[1]); + hr->sprg[2] = swab64(hr->sprg[2]); + hr->sprg[3] = swab64(hr->sprg[3]); + hr->pidr = swab64(hr->pidr); + hr->cfar = swab64(hr->cfar); + hr->ppr = swab64(hr->ppr); +} + static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap, struct hv_guest_state *hr) { @@ -175,6 +217,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) sizeof(struct hv_guest_state)); if (err) return H_PARAMETER; 
+ if (kvmppc_need_byteswap(vcpu)) + byteswap_hv_regs(_hv); if (l2_hv.version != HV_GUEST_STATE_VERSION) return H_P2; @@ -183,7 +227,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) sizeof(struct pt_regs)); if (err) return H_PARAMETER; - + if (kvmppc_need_byteswap(vcpu)) + byteswap_pt_regs(_regs); if (l2_hv.vcpu_token >= NR_CPUS) return H_PARAMETER; @@ -255,6 +300,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) kvmhv_put_nested(l2); /* copy l2_hv_state and regs back to guest */ + if (kvmppc_need_byteswap(vcpu)) { + byteswap_hv_regs(_hv); + byteswap_pt_regs(_regs); + } err = kvm_vcpu_write_guest(vcpu, hv_ptr, _hv, sizeof(struct hv_guest_state)); if (err) -- 2.7.4
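[Editor's note] The pt_regs trick — treating a struct whose fields are all unsigned long as an array and swapping each element — works because the layout is homogeneous, so a pointer walk from the struct's start to one-past-its-end visits every field exactly once. A self-contained sketch with a fake three-field struct; the kernel's swab64() is modelled here with the GCC/Clang __builtin_bswap64:

```c
#include <assert.h>

struct fake_regs {              /* all fields the same 64-bit type */
	unsigned long long nip;
	unsigned long long msr;
	unsigned long long ctr;
};

static void byteswap_regs(struct fake_regs *regs)
{
	unsigned long long *addr = (unsigned long long *)regs;

	/* Walk from the first field to one-past-the-struct, exactly as
	 * byteswap_pt_regs() does in the patch. */
	for (; addr < (unsigned long long *)(regs + 1); addr++)
		*addr = __builtin_bswap64(*addr);
}
```

Applying the swap twice round-trips, which is why the same helper serves both the copy-in and copy-back paths.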
[PATCH v5 28/33] KVM: PPC: Book3S HV: Sanitise hv_regs on nested guest entry
From: Suraj Jitindar Singh restore_hv_regs() is used to copy the hv_regs L1 wants to set to run the nested (L2) guest into the vcpu structure. We need to sanitise these values to ensure we don't let the L1 guest hypervisor do things we don't want it to. We don't let data address watchpoints or completed instruction address breakpoints be set to match in hypervisor state. We also don't let L1 enable features in the hypervisor facility status and control register (HFSCR) for L2 which we have disabled for L1. That is L2 will get the subset of features which the L0 hypervisor has enabled for L1 and the features L1 wants to enable for L2. This could mean we give L1 a hypervisor facility unavailable interrupt for a facility it thinks it has enabled, however it shouldn't have enabled a facility it itself doesn't have for the L2 guest. We sanitise the registers when copying in the L2 hv_regs. We don't need to sanitise when copying back the L1 hv_regs since these shouldn't be able to contain invalid values as they're just what was copied out. 
Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/kvm/book3s_hv_nested.c | 17 + 2 files changed, 18 insertions(+) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 6fda746..c9069897 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -415,6 +415,7 @@ #define HFSCR_DSCR __MASK(FSCR_DSCR_LG) #define HFSCR_VECVSX __MASK(FSCR_VECVSX_LG) #define HFSCR_FP __MASK(FSCR_FP_LG) +#define HFSCR_INTR_CAUSE (ASM_CONST(0xFF) << 56) /* interrupt cause */ #define SPRN_TAR 0x32f /* Target Address Register */ #define SPRN_LPCR 0x13E /* LPAR Control Register */ #define LPCR_VPM0 ASM_CONST(0x8000000000000000) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index a876dc3..e2305962 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -86,6 +86,22 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap, } } +static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr) +{ + /* +* Don't let L1 enable features for L2 which we've disabled for L1, +* but preserve the interrupt cause field. +*/ + hr->hfscr &= (HFSCR_INTR_CAUSE | vcpu->arch.hfscr); + + /* Don't let data address watchpoint match in hypervisor state */ + hr->dawrx0 &= ~DAWRX_HYP; + + /* Don't let completed instruction address breakpt match in HV state */ + if ((hr->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER) + hr->ciabr &= ~CIABR_PRIV; +} + static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr) { struct kvmppc_vcore *vc = vcpu->arch.vcore; @@ -198,6 +214,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD | LPCR_LPES | LPCR_MER; lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask); + sanitise_hv_regs(vcpu, _hv); restore_hv_regs(vcpu, _hv); vcpu->arch.ret = RESUME_GUEST; -- 2.7.4
[PATCH v5 27/33] KVM: PPC: Book3S HV: Add one-reg interface to virtual PTCR register
This adds a one-reg register identifier which can be used to read and set the virtual PTCR for the guest. This register identifies the address and size of the virtual partition table for the guest, which contains information about the nested guests under this guest. Migrating this value is the only extra requirement for migrating a guest which has nested guests (assuming of course that the destination host supports nested virtualization in the kvm-hv module). Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 1 + arch/powerpc/include/uapi/asm/kvm.h | 1 + arch/powerpc/kvm/book3s_hv.c| 6 ++ 3 files changed, 8 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 647f941..2f5f9b7 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1922,6 +1922,7 @@ registers, find a list below: PPC | KVM_REG_PPC_TIDR | 64 PPC | KVM_REG_PPC_PSSCR | 64 PPC | KVM_REG_PPC_DEC_EXPIRY| 64 + PPC | KVM_REG_PPC_PTCR | 64 PPC | KVM_REG_PPC_TM_GPR0 | 64 ... 
PPC | KVM_REG_PPC_TM_GPR31 | 64 diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 1b32b56..8c876c1 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -634,6 +634,7 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_DEC_EXPIRY (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbe) #define KVM_REG_PPC_ONLINE (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf) +#define KVM_REG_PPC_PTCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc0) /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b8f14ea..127bb5f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1710,6 +1710,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_ONLINE: *val = get_reg_val(id, vcpu->arch.online); break; + case KVM_REG_PPC_PTCR: + *val = get_reg_val(id, vcpu->kvm->arch.l1_ptcr); + break; default: r = -EINVAL; break; @@ -1941,6 +1944,9 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, atomic_dec(&vcpu->arch.vcore->online_count); vcpu->arch.online = i; break; + case KVM_REG_PPC_PTCR: + vcpu->kvm->arch.l1_ptcr = set_reg_val(id, *val); + break; default: r = -EINVAL; break; -- 2.7.4
[PATCH v5 26/33] KVM: PPC: Book3S HV: Don't access HFSCR, LPIDR or LPCR when running nested
When running as a nested hypervisor, this avoids reading hypervisor privileged registers (specifically HFSCR, LPIDR and LPCR) at startup; instead reasonable default values are used. This also avoids writing LPIDR in the single-vcpu entry/exit path. Also, this removes the check for CPU_FTR_HVMODE in kvmppc_mmu_hv_init() since its only caller already checks this. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 7 +++ arch/powerpc/kvm/book3s_hv.c| 33 + 2 files changed, 24 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 68e14af..c615617 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -268,14 +268,13 @@ int kvmppc_mmu_hv_init(void) { unsigned long host_lpid, rsvd_lpid; - if (!cpu_has_feature(CPU_FTR_HVMODE)) - return -EINVAL; - if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE)) return -EINVAL; /* POWER7 has 10-bit LPIDs (12-bit in POWER8) */ - host_lpid = mfspr(SPRN_LPID); + host_lpid = 0; + if (cpu_has_feature(CPU_FTR_HVMODE)) + host_lpid = mfspr(SPRN_LPID); rsvd_lpid = LPID_RSVD; kvmppc_init_lpid(rsvd_lpid + 1); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 24a6683..b8f14ea 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2174,15 +2174,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm, * Set the default HFSCR for the guest from the host value. * This value is only used on POWER9. * On POWER9, we want to virtualize the doorbell facility, so we -* turn off the HFSCR bit, which causes those instructions to trap. +* don't set the HFSCR_MSGP bit, and that causes those instructions +* to trap and then we emulate them. 
*/ - vcpu->arch.hfscr = mfspr(SPRN_HFSCR); - if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) + vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB | + HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP; + if (cpu_has_feature(CPU_FTR_HVMODE)) { + vcpu->arch.hfscr &= mfspr(SPRN_HFSCR); + if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) + vcpu->arch.hfscr |= HFSCR_TM; + } + if (cpu_has_feature(CPU_FTR_TM_COMP)) vcpu->arch.hfscr |= HFSCR_TM; - else if (!cpu_has_feature(CPU_FTR_TM_COMP)) - vcpu->arch.hfscr &= ~HFSCR_TM; - if (cpu_has_feature(CPU_FTR_ARCH_300)) - vcpu->arch.hfscr &= ~HFSCR_MSGP; kvmppc_mmu_book3s_hv_init(vcpu); @@ -4002,8 +4005,10 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, srcu_read_unlock(&kvm->srcu, srcu_idx); - mtspr(SPRN_LPID, kvm->arch.host_lpid); - isync(); + if (cpu_has_feature(CPU_FTR_HVMODE)) { + mtspr(SPRN_LPID, kvm->arch.host_lpid); + isync(); + } trace_hardirqs_off(); set_irq_happened(trap); @@ -4630,9 +4635,13 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm) kvm->arch.host_sdr1 = mfspr(SPRN_SDR1); /* Init LPCR for virtual RMA mode */ - kvm->arch.host_lpid = mfspr(SPRN_LPID); - kvm->arch.host_lpcr = lpcr = mfspr(SPRN_LPCR); - lpcr &= LPCR_PECE | LPCR_LPES; + if (cpu_has_feature(CPU_FTR_HVMODE)) { + kvm->arch.host_lpid = mfspr(SPRN_LPID); + kvm->arch.host_lpcr = lpcr = mfspr(SPRN_LPCR); + lpcr &= LPCR_PECE | LPCR_LPES; + } else { + lpcr = 0; + } lpcr |= (4UL << LPCR_DPFD_SH) | LPCR_HDICE | LPCR_VPM0 | LPCR_VPM1; kvm->arch.vrma_slb_v = SLB_VSID_B_1T | -- 2.7.4
[PATCH v5 25/33] KVM: PPC: Book3S HV: Invalidate TLB when nested vcpu moves physical cpu
From: Suraj Jitindar Singh This is only done at level 0, since only level 0 knows which physical CPU a vcpu is running on. This does for nested guests what L0 already did for its own guests, which is to flush the TLB on a pCPU when it goes to run a vCPU there, and there is another vCPU in the same VM which previously ran on this pCPU and has now started to run on another pCPU. This is to handle the situation where the other vCPU touched a mapping, moved to another pCPU and did a tlbiel (local-only tlbie) on that new pCPU and thus left behind a stale TLB entry on this pCPU. This introduces a limit on the vcpu_token values used in the H_ENTER_NESTED hcall -- they must now be less than NR_CPUS. [pau...@ozlabs.org - made prev_cpu array be short[] to reduce memory consumption.] Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 3 + arch/powerpc/kvm/book3s_hv.c | 101 +++ arch/powerpc/kvm/book3s_hv_nested.c | 5 ++ 3 files changed, 71 insertions(+), 38 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 719b31723..83d4def 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -52,6 +52,9 @@ struct kvm_nested_guest { long refcnt;/* number of pointers to this struct */ struct mutex tlb_lock; /* serialize page faults and tlbies */ struct kvm_nested_guest *next; + cpumask_t need_tlb_flush; + cpumask_t cpu_in_guest; + short prev_cpu[NR_CPUS]; }; /* diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 49f07de..24a6683 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2397,10 +2397,18 @@ static void kvmppc_release_hwthread(int cpu) static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu) { + struct kvm_nested_guest *nested = vcpu->arch.nested; + cpumask_t *cpu_in_guest; int i; cpu = 
cpu_first_thread_sibling(cpu); - cpumask_set_cpu(cpu, &kvm->arch.need_tlb_flush); + if (nested) { + cpumask_set_cpu(cpu, &nested->need_tlb_flush); + cpu_in_guest = &nested->cpu_in_guest; + } else { + cpumask_set_cpu(cpu, &kvm->arch.need_tlb_flush); + cpu_in_guest = &kvm->arch.cpu_in_guest; + } /* * Make sure setting of bit in need_tlb_flush precedes * testing of cpu_in_guest bits. The matching barrier on @@ -2408,13 +2416,23 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu) */ smp_mb(); for (i = 0; i < threads_per_core; ++i) - if (cpumask_test_cpu(cpu + i, &kvm->arch.cpu_in_guest)) + if (cpumask_test_cpu(cpu + i, cpu_in_guest)) smp_call_function_single(cpu + i, do_nothing, NULL, 1); } static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu) { + struct kvm_nested_guest *nested = vcpu->arch.nested; struct kvm *kvm = vcpu->kvm; + int prev_cpu; + + if (!cpu_has_feature(CPU_FTR_HVMODE)) + return; + + if (nested) + prev_cpu = nested->prev_cpu[vcpu->arch.nested_vcpu_id]; + else + prev_cpu = vcpu->arch.prev_cpu; /* * With radix, the guest can do TLB invalidations itself, @@ -2428,12 +2446,46 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu) * ran to flush the TLB. The TLB is shared between threads, * so we use a single bit in .need_tlb_flush for all 4 threads. 
*/ - if (vcpu->arch.prev_cpu != pcpu) { - if (vcpu->arch.prev_cpu >= 0 && - cpu_first_thread_sibling(vcpu->arch.prev_cpu) != + if (prev_cpu != pcpu) { + if (prev_cpu >= 0 && + cpu_first_thread_sibling(prev_cpu) != cpu_first_thread_sibling(pcpu)) - radix_flush_cpu(kvm, vcpu->arch.prev_cpu, vcpu); - vcpu->arch.prev_cpu = pcpu; + radix_flush_cpu(kvm, prev_cpu, vcpu); + if (nested) + nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu; + else + vcpu->arch.prev_cpu = pcpu; + } +} + +static void kvmppc_radix_check_need_tlb_flush(struct kvm *kvm, int pcpu, + struct kvm_nested_guest *nested) +{ + cpumask_t *need_tlb_flush; + int lpid; + + if (!cpu_has_feature(CPU_FTR_HVMODE)) + return; + + if (cpu_has_feature(CPU_FTR_ARCH_300)) + pcpu &= ~0x3UL; + + if (nested) { + lpid = nested->shadow_lpid; + need_tlb_flush = &nested->need_tlb_flush; + } else { +
[PATCH v5 24/33] KVM: PPC: Book3S HV: Use hypercalls for TLB invalidation when nested
This adds code to call the H_TLB_INVALIDATE hypercall when running as a guest, in the cases where we need to invalidate TLBs (or other MMU caches) as part of managing the mappings for a nested guest. Calling H_TLB_INVALIDATE lets the nested hypervisor inform the parent hypervisor about changes to partition-scoped page tables or the partition table without needing to do hypervisor-privileged tlbie instructions. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 5 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 30 -- arch/powerpc/kvm/book3s_hv_nested.c | 30 -- 3 files changed, 57 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index c2a9146..719b31723 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -24,6 +24,7 @@ #include #include #include +#include #ifdef CONFIG_PPC_PSERIES static inline bool kvmhv_on_pseries(void) @@ -117,6 +118,10 @@ struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid, bool create); void kvmhv_put_nested(struct kvm_nested_guest *gp); +/* Encoding of first parameter for H_TLB_INVALIDATE */ +#define H_TLBIE_P1_ENC(ric, prs, r)(___PPC_RIC(ric) | ___PPC_PRS(prs) | \ +___PPC_R(r)) + /* Power architecture requires HPT is at least 256kiB, at most 64TiB */ #define PPC_MIN_HPT_ORDER 18 #define PPC_MAX_HPT_ORDER 46 diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 4c1eccb..ae0e3ed 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -201,17 +201,43 @@ static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr, unsigned int pshift, unsigned int lpid) { unsigned long psize = PAGE_SIZE; + int psi; + long rc; + unsigned long rb; if (pshift) psize = 1UL << pshift; + else + pshift = PAGE_SHIFT; addr &= ~(psize - 1); - radix__flush_tlb_lpid_page(lpid, addr, 
psize); + + if (!kvmhv_on_pseries()) { + radix__flush_tlb_lpid_page(lpid, addr, psize); + return; + } + + psi = shift_to_mmu_psize(pshift); + rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58)); + rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1), + lpid, rb); + if (rc) + pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc); } static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid) { - radix__flush_pwc_lpid(lpid); + long rc; + + if (!kvmhv_on_pseries()) { + radix__flush_pwc_lpid(lpid); + return; + } + + rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1), + lpid, TLBIEL_INVAL_SET_LPID); + if (rc) + pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc); } static unsigned long kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep, diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index c83c13d..486d900 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -299,14 +299,32 @@ void kvmhv_nested_exit(void) } } +static void kvmhv_flush_lpid(unsigned int lpid) +{ + long rc; + + if (!kvmhv_on_pseries()) { + radix__flush_tlb_lpid(lpid); + return; + } + + rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1), + lpid, TLBIEL_INVAL_SET_LPID); + if (rc) + pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc); +} + void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1) { - if (cpu_has_feature(CPU_FTR_HVMODE)) { + if (!kvmhv_on_pseries()) { mmu_partition_table_set_entry(lpid, dw0, dw1); - } else { - pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0); - pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1); + return; } + + pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0); + pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1); + /* L0 will do the necessary barriers */ + kvmhv_flush_lpid(lpid); } static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp) @@ -493,7 +511,7 @@ static void kvmhv_flush_nested(struct 
kvm_nested_guest *gp) spin_lock(&kvm->mmu_lock); kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable, gp->shadow_lpid); spin_unlock(&kvm->mmu_lock); - radix__flush_tlb_lpid(gp->shadow_lpid); +
[PATCH v5 22/33] KVM: PPC: Book3S HV: Introduce rmap to track nested guest mappings
From: Suraj Jitindar Singh When a host (L0) page which is mapped into a (L1) guest is in turn mapped through to a nested (L2) guest we keep a reverse mapping (rmap) so that these mappings can be retrieved later. Whenever we create an entry in a shadow_pgtable for a nested guest we create a corresponding rmap entry and add it to the list for the L1 guest memslot at the index of the L1 guest page it maps. This means at the L1 guest memslot we end up with lists of rmaps. When we are notified of a host page being invalidated which has been mapped through to a (L1) guest, we can then walk the rmap list for that guest page, and find and invalidate all of the corresponding shadow_pgtable entries. In order to reduce memory consumption, we compress the information for each rmap entry down to 52 bits -- 12 bits for the LPID and 40 bits for the guest real page frame number -- which will fit in a single unsigned long. To avoid a scenario where a guest can trigger unbounded memory allocations, we scan the list when adding an entry to see if there is already an entry with the contents we need. This can occur, because we don't ever remove entries from the middle of a list. A struct nested guest rmap is a list pointer and an rmap entry;

 ----------------
 | next pointer |
 ----------------
 | rmap entry   |
 ----------------

Thus the rmap pointer for each guest frame number in the memslot can be either NULL, a single entry, or a pointer to a list of nested rmap entries.

 gfn    memslot rmap array
        -------------------------
 0      | NULL                  |   (no rmap entry)
        -------------------------
 1      | single rmap entry     |   (rmap entry with low bit set)
        -------------------------
 2      | list head pointer     |   (list of rmap entries)
        -------------------------

The final entry always has the lowest bit set and is stored in the next pointer of the last list entry, or as a single rmap entry.
With a list of rmap entries looking like;

 -----------------        -----------------        -------------------------
 | list head ptr | -----> | next pointer  | -----> | single rmap entry     |
 -----------------        -----------------        -------------------------
                          | rmap entry    |        | rmap entry            |
                          -----------------        -------------------------

Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h| 3 + arch/powerpc/include/asm/kvm_book3s_64.h | 69 +++- arch/powerpc/kvm/book3s_64_mmu_radix.c | 44 +++--- arch/powerpc/kvm/book3s_hv.c | 1 + arch/powerpc/kvm/book3s_hv_nested.c | 138 ++- 5 files changed, 240 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 63f7ccf..d7aeb6f 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -196,6 +196,9 @@ extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr, int table_index, u64 *pte_ret_p); extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, bool data, bool iswrite); +extern void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa, + unsigned int shift, struct kvm_memory_slot *memslot, + unsigned int lpid); extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable, bool writing, unsigned long gpa, unsigned int lpid); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 5496152..c2a9146 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -53,6 +53,66 @@ struct kvm_nested_guest { struct kvm_nested_guest *next; }; +/* + * We define a nested rmap entry as a single 64-bit quantity + * 0xFFF0000000000000 12-bit lpid field + * 0x000FFFFFFFFFF000 40-bit guest 4k page frame number + * 0x0000000000000001 1-bit single entry flag + */ +#define RMAP_NESTED_LPID_MASK 0xFFF0000000000000UL +#define RMAP_NESTED_LPID_SHIFT (52) +#define RMAP_NESTED_GPA_MASK 0x000FFFFFFFFFF000UL +#define RMAP_NESTED_IS_SINGLE_ENTRY 0x0000000000000001UL + +/* Structure for a nested guest rmap entry */ +struct rmap_nested { + struct llist_node list; + u64 rmap; +}; 
+/* + * for_each_nest_rmap_safe - iterate over the list of nested rmap entries + * safe against removal of the list entry or NULL list + * @pos: a (struct rmap_nested *) to use as a loop cursor + * @node: pointer to the first entry + * NOTE: this can be NULL + * @rmapp: an (unsigned long *) in which to return the rmap entries on each +
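The 52-bit rmap packing described in this patch (12-bit LPID in bits 63:52, the 40-bit guest page frame carried in bits 51:12 of the gpa, low bit reserved as the single-entry flag) can be exercised on its own. This sketch uses uint64_t in place of the kernel's unsigned long, with the mask values written out to their full 64-bit width:

```c
#include <stdint.h>

#define RMAP_NESTED_LPID_MASK       0xFFF0000000000000ULL
#define RMAP_NESTED_LPID_SHIFT      52
#define RMAP_NESTED_GPA_MASK        0x000FFFFFFFFFF000ULL
#define RMAP_NESTED_IS_SINGLE_ENTRY 0x0000000000000001ULL

/* Pack an lpid and a (page-aligned) guest real address into one rmap word. */
static uint64_t rmap_encode(uint32_t lpid, uint64_t gpa)
{
    return (((uint64_t)lpid << RMAP_NESTED_LPID_SHIFT) & RMAP_NESTED_LPID_MASK) |
           (gpa & RMAP_NESTED_GPA_MASK);
}

static uint32_t rmap_lpid(uint64_t rmap)
{
    return (uint32_t)((rmap & RMAP_NESTED_LPID_MASK) >> RMAP_NESTED_LPID_SHIFT);
}

static uint64_t rmap_gpa(uint64_t rmap)
{
    return rmap & RMAP_NESTED_GPA_MASK;
}
```

Because the low bit is never set by the encoder, it stays free to distinguish a lone rmap entry from a pointer to a list head, exactly as the memslot diagram above shows.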
[PATCH v5 23/33] KVM: PPC: Book3S HV: Implement H_TLB_INVALIDATE hcall
From: Suraj Jitindar Singh When running a nested (L2) guest the guest (L1) hypervisor will use the H_TLB_INVALIDATE hcall when it needs to change the partition scoped page tables or the partition table which it manages. It will use this hcall in the situations where it would use a partition-scoped tlbie instruction if it were running in hypervisor mode. The H_TLB_INVALIDATE hcall can invalidate different scopes: Invalidate TLB for a given target address: - This invalidates a single L2 -> L1 pte - We need to invalidate any L2 -> L0 shadow_pgtable ptes which map the L2 address space which is being invalidated. This is because a single L2 -> L1 pte may have been mapped with more than one pte in the L2 -> L0 page tables. Invalidate the entire TLB for a given LPID or for all LPIDs: - Invalidate the entire shadow_pgtable for a given nested guest, or for all nested guests. Invalidate the PWC (page walk cache) for a given LPID or for all LPIDs: - We don't cache the PWC, so nothing to do. Invalidate the entire TLB, PWC and partition table for a given/all LPIDs: - Here we re-read the partition table entry and remove the nested state for any nested guest for which the first doubleword of the partition table entry is now zero. The H_TLB_INVALIDATE hcall takes as parameters the tlbie instruction word (of which only the RIC, PRS and R fields are used), the rS value (giving the lpid, where required) and the rB value (giving the IS, AP and EPN values). [pau...@ozlabs.org - adapted to having the partition table in guest memory, added the H_TLB_INVALIDATE implementation, removed tlbie instruction emulation, reworded the commit message.] 
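The first hcall parameter is the tlbie instruction word, from which only RIC, PRS and R are consumed. A standalone sketch of that field packing, assuming the usual placement used by the kernel's ___PPC_RIC/___PPC_PRS/___PPC_R macros (RIC at bit 18, PRS at bit 17, R at bit 16):

```c
#include <stdint.h>

/* Assumed field placement: RIC at bit 18, PRS at 17, R at 16. */
static uint32_t h_tlbie_p1_enc(uint32_t ric, uint32_t prs, uint32_t r)
{
    return ((ric & 0x3) << 18) | ((prs & 0x1) << 17) | ((r & 0x1) << 16);
}

/* Recover the RIC scope: 0 = TLB only, 1 = PWC, 2 = all caches + table. */
static uint32_t tlbie_ric(uint32_t word)
{
    return (word >> 18) & 0x3;
}
```

The L0 handler can dispatch on the recovered RIC value to pick between the per-address, whole-LPID and full-partition-table invalidation cases listed in the commit message.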
Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 12 ++ arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/kvm/book3s_emulate.c | 1 - arch/powerpc/kvm/book3s_hv.c | 3 + arch/powerpc/kvm/book3s_hv_nested.c | 196 +- 6 files changed, 212 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index b3520b5..66db23e 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -203,6 +203,18 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize) BUG(); } +static inline unsigned int ap_to_shift(unsigned long ap) +{ + int psize; + + for (psize = 0; psize < MMU_PAGE_COUNT; psize++) { + if (mmu_psize_defs[psize].ap == ap) + return mmu_psize_defs[psize].shift; + } + + return -1; +} + static inline unsigned long get_sllp_encoding(int psize) { unsigned long sllp; diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index d7aeb6f..09f8e9b 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -301,6 +301,7 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); void kvmhv_release_all_nested(struct kvm *kvm); long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); +long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu); int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr); void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 665af14..6093bc8 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -104,6 +104,7 @@ #define OP_31_XOP_LHZUX 
311 #define OP_31_XOP_MSGSNDP 142 #define OP_31_XOP_MSGCLRP 174 +#define OP_31_XOP_TLBIE 306 #define OP_31_XOP_MFSPR 339 #define OP_31_XOP_LWAX 341 #define OP_31_XOP_LHAX 343 diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 2654df2..8c7e933 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -36,7 +36,6 @@ #define OP_31_XOP_MTSR 210 #define OP_31_XOP_MTSRIN 242 #define OP_31_XOP_TLBIEL 274 -#define OP_31_XOP_TLBIE 306 /* Opcode is officially reserved, reuse it as sc 1 when sc 1 doesn't trap */ #define OP_31_XOP_FAKE_SC1 308 #define OP_31_XOP_SLBMTE 402 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cb9e738..49f07de 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -974,6 +974,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) break; case H_TLB_INVALIDATE: ret = H_FUNCTION; + if (!vcpu->kvm->arch.nested_enable) +
[PATCH v5 21/33] KVM: PPC: Book3S HV: Handle page fault for a nested guest
From: Suraj Jitindar Singh Consider a normal (L1) guest running under the main hypervisor (L0), and then a nested guest (L2) running under the L1 guest which is acting as a nested hypervisor. L0 has page tables to map the address space for L1 providing the translation from L1 real address -> L0 real address;

 L1
 |
 | (L1 -> L0)
 |
 ----> L0

There are also page tables in L1 used to map the address space for L2 providing the translation from L2 real address -> L1 real address. Since the hardware can only walk a single level of page table, we need to maintain in L0 a "shadow_pgtable" for L2 which provides the translation from L2 real address -> L0 real address. Which looks like;

 L2                    L2
 |                     |
 | (L2 -> L1)          |
 |                     |
 ----> L1              | (L2 -> L0)
       |               |
       | (L1 -> L0)    |
       |               |
       ----> L0        ----> L0

When a page fault occurs while running a nested (L2) guest we need to insert a pte into this "shadow_pgtable" for the L2 -> L0 mapping. To do this we need to:

1. Walk the pgtable in L1 memory to find the L2 -> L1 mapping, and provide a page fault to L1 if this mapping doesn't exist.
2. Use our L1 -> L0 pgtable to convert this L1 address to an L0 address, or try to insert a pte for that mapping if it doesn't exist.
3. Now we have a L2 -> L0 mapping, insert this into our shadow_pgtable.

Once this mapping exists we can take rc faults when hardware is unable to automatically set the reference and change bits in the pte. On these we need to:

1. Check the rc bits on the L2 -> L1 pte match, and otherwise reflect the fault down to L1.
2. Set the rc bits in the L1 -> L0 pte which corresponds to the same host page.
3. Set the rc bits in the L2 -> L0 pte.

As we reuse a large number of functions in book3s_64_mmu_radix.c for this we also needed to refactor a number of these functions to take an lpid parameter so that the correct lpid is used for tlb invalidations. The functionality however has remained the same.
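The two translation steps that build a shadow entry can be illustrated with toy lookup tables. These flat arrays are purely for illustration and bear no resemblance to the radix structures the patch actually walks:

```c
#include <stdint.h>

#define NPAGES     16
#define NO_MAPPING UINT64_MAX

/*
 * Toy single-level tables: l2_to_l1[] plays the role of the pgtable L1
 * keeps for L2, and l1_to_l0[] the pgtable L0 keeps for L1.  Composing
 * them yields the L2 -> L0 translation that the real code caches in the
 * shadow_pgtable.  (Unset entries default to 0, i.e. "maps to frame 0";
 * good enough for illustration.)
 */
static uint64_t compose(const uint64_t *l2_to_l1, const uint64_t *l1_to_l0,
                        uint64_t l2_pfn)
{
    uint64_t l1_pfn;

    if (l2_pfn >= NPAGES)
        return NO_MAPPING;
    l1_pfn = l2_to_l1[l2_pfn];
    if (l1_pfn == NO_MAPPING || l1_pfn >= NPAGES)
        return NO_MAPPING;       /* would reflect a page fault to L1 */
    return l1_to_l0[l1_pfn];     /* L0's own fault path would fill this */
}

/* Map L2 frame 3 -> L1 frame 7 -> L0 frame 11, then translate. */
static uint64_t demo(void)
{
    uint64_t l2_to_l1[NPAGES] = { [3] = 7 };
    uint64_t l1_to_l0[NPAGES] = { [7] = 11 };

    return compose(l2_to_l1, l1_to_l0, 3);
}
```

A miss in the first table corresponds to reflecting the fault down to L1; a miss in the second is handled by L0's ordinary page-fault path, after which the composed result is inserted into the shadow table.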
Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- .../powerpc/include/asm/book3s/64/tlbflush-radix.h | 1 + arch/powerpc/include/asm/kvm_book3s.h | 17 ++ arch/powerpc/include/asm/kvm_book3s_64.h | 4 + arch/powerpc/include/asm/kvm_host.h| 2 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 194 ++-- arch/powerpc/kvm/book3s_hv_nested.c| 332 - arch/powerpc/mm/tlb-radix.c| 9 + 7 files changed, 473 insertions(+), 86 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h index 1154a6d..671316f 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h @@ -53,6 +53,7 @@ extern void radix__flush_tlb_lpid_page(unsigned int lpid, unsigned long addr, unsigned long page_size); extern void radix__flush_pwc_lpid(unsigned int lpid); +extern void radix__flush_tlb_lpid(unsigned int lpid); extern void radix__local_flush_tlb_lpid(unsigned int lpid); extern void radix__local_flush_tlb_lpid_guest(unsigned int lpid); diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 093fd70..63f7ccf 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,17 +188,34 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr, + struct kvmppc_pte *gpte, u64 root, + u64 *pte_ret_p); extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, u64 table, int table_index, u64 *pte_ret_p); extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, bool data, bool iswrite); +extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable, + 
bool writing, unsigned long gpa, + unsigned int lpid); +extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, + unsigned long gpa, + struct kvm_memory_slot *memslot, +
[PATCH v5 20/33] KVM: PPC: Book3S HV: Handle hypercalls correctly when nested
When we are running as a nested hypervisor, we use a hypercall to enter the guest rather than code in book3s_hv_rmhandlers.S. This means that the hypercall handlers listed in hcall_real_table never get called. There are some hypercalls that are handled there and not in kvmppc_pseries_do_hcall(), which therefore won't get processed for a nested guest. To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those hypercalls, with the following exceptions: - The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because we only support radix mode for nested guests. - H_CEDE has to be handled specially because the cede logic in kvmhv_run_single_vcpu assumes that it has been processed by the time that kvmhv_p9_guest_entry() returns. Therefore we put a special case for H_CEDE in kvmhv_p9_guest_entry(). For the XICS hypercalls, if real-mode processing is enabled, then the virtual-mode handlers assume that they are being called only to finish up the operation. Therefore we turn off the real-mode flag in the XICS code when running as a nested hypervisor. 
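The early-cede special case above (cede the vcpu, but cancel the cede if a prod has already arrived) can be sketched in miniature. toy_vcpu is an illustrative stand-in for the relevant vcpu state; the kernel version additionally orders the stores with smp_mb() barriers, which a single-threaded sketch cannot show:

```c
#include <stdbool.h>

/* Illustrative stand-in for the two flags the real vcpu carries. */
struct toy_vcpu {
    bool ceded;
    bool prodded;
};

/*
 * Shape of the nested-cede logic: mark the vcpu ceded, but if a prod
 * already arrived, consume it and leave the vcpu runnable.
 */
static bool toy_cede(struct toy_vcpu *v)
{
    v->ceded = true;
    if (v->prodded) {
        v->prodded = false;
        v->ceded = false;
    }
    return v->ceded;
}

/* Helper: cede a fresh vcpu that may or may not have been prodded. */
static bool demo_cede(bool prodded)
{
    struct toy_vcpu v = { .ceded = false, .prodded = prodded };

    return toy_cede(&v);
}
```

Handling this before kvmppc_pseries_do_hcall() matters because the run loop checks the ceded flag immediately after guest exit, as the commit message explains.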
Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/asm-prototypes.h | 4 +++ arch/powerpc/kvm/book3s_hv.c | 43 +++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++ arch/powerpc/kvm/book3s_xics.c| 3 ++- 4 files changed, 51 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 5c9b00c..c55ba3b 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -167,4 +167,8 @@ void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu); int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu); +long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr); +long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr, + unsigned long dabrx); + #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */ diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index dd84252..dc25461 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -915,6 +916,19 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) break; } return RESUME_HOST; + case H_SET_DABR: + ret = kvmppc_h_set_dabr(vcpu, kvmppc_get_gpr(vcpu, 4)); + break; + case H_SET_XDABR: + ret = kvmppc_h_set_xdabr(vcpu, kvmppc_get_gpr(vcpu, 4), + kvmppc_get_gpr(vcpu, 5)); + break; + case H_GET_TCE: + ret = kvmppc_h_get_tce(vcpu, kvmppc_get_gpr(vcpu, 4), + kvmppc_get_gpr(vcpu, 5)); + if (ret == H_TOO_HARD) + return RESUME_HOST; + break; case H_PUT_TCE: ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4), kvmppc_get_gpr(vcpu, 5), @@ -938,6 +952,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (ret == H_TOO_HARD) return RESUME_HOST; break; + case H_RANDOM: + if (!powernv_get_random_long(&vcpu->arch.regs.gpr[4])) + ret = H_HARDWARE; + break; case H_SET_PARTITION_TABLE: ret = H_FUNCTION; @@ -966,6 +984,24 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) return RESUME_GUEST; 
} +/* + * Handle H_CEDE in the nested virtualization case where we haven't + * called the real-mode hcall handlers in book3s_hv_rmhandlers.S. + * This has to be done early, not in kvmppc_pseries_do_hcall(), so + * that the cede logic in kvmppc_run_single_vcpu() works properly. + */ +static void kvmppc_nested_cede(struct kvm_vcpu *vcpu) +{ + vcpu->arch.shregs.msr |= MSR_EE; + vcpu->arch.ceded = 1; + smp_mb(); + if (vcpu->arch.prodded) { + vcpu->arch.prodded = 0; + smp_mb(); + vcpu->arch.ceded = 0; + } +} + static int kvmppc_hcall_impl_hv(unsigned long cmd) { switch (cmd) { @@ -3422,6 +3458,13 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, vcpu->arch.shregs.msr = vcpu->arch.regs.msr; vcpu->arch.shregs.dar = mfspr(SPRN_DAR); vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR); + + /* H_CEDE has to be handled now, not later */ + if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested && + kvmppc_get_gpr(vcpu, 3) == H_CEDE) { + kvmppc_nested_cede(vcpu); + trap = 0; + } } else { trap =
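The cede/prod handshake in kvmppc_nested_cede() above (set ceded, then re-check prodded across a barrier so a concurrent H_PROD is not lost) can be sketched in plain C. This is an illustrative single-threaded userspace model with made-up types, not the kernel code; the MSR_EE value and field names are stand-ins:

```c
#include <assert.h>

/* Hypothetical stand-ins for the vcpu fields used by kvmppc_nested_cede(). */
struct fake_vcpu {
	unsigned long msr;
	int ceded;    /* vcpu has ceded and may be put to sleep */
	int prodded;  /* another vcpu sent H_PROD to wake this one */
};

#define MSR_EE 0x8000UL  /* illustrative bit value */

/*
 * Sketch of the H_CEDE handling: enable interrupts, mark the vcpu
 * ceded, then re-check prodded so a prod that raced with the cede is
 * consumed instead of lost.  In the kernel the stores and the check
 * are separated by smp_mb(); this sketch only shows the control flow.
 */
static void nested_cede(struct fake_vcpu *vcpu)
{
	vcpu->msr |= MSR_EE;
	vcpu->ceded = 1;
	/* smp_mb() here in the real code */
	if (vcpu->prodded) {
		vcpu->prodded = 0;
		/* smp_mb() here in the real code */
		vcpu->ceded = 0;	/* a pending prod cancels the cede */
	}
}
```

If prodded was already set when the vcpu cedes, the cede is cancelled and the vcpu keeps running, which is what lets the cede logic in kvmppc_run_single_vcpu() work without the real-mode handlers.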
[PATCH v5 19/33] KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor
This adds code to call the H_IPI and H_EOI hypercalls when we are running as a nested hypervisor (i.e. without the CPU_FTR_HVMODE cpu feature) and we would otherwise access the XICS interrupt controller directly or via an OPAL call. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_hv.c | 7 +- arch/powerpc/kvm/book3s_hv_builtin.c | 44 +--- arch/powerpc/kvm/book3s_hv_rm_xics.c | 8 +++ 3 files changed, 50 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d58a4a6..dd84252 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -173,6 +173,10 @@ static bool kvmppc_ipi_thread(int cpu) { unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); + /* If we're a nested hypervisor, fall back to ordinary IPIs for now */ + if (kvmhv_on_pseries()) + return false; + /* On POWER9 we can use msgsnd to IPI any cpu */ if (cpu_has_feature(CPU_FTR_ARCH_300)) { msg |= get_hard_smp_processor_id(cpu); @@ -5173,7 +5177,8 @@ static int kvmppc_book3s_init_hv(void) * indirectly, via OPAL. */ #ifdef CONFIG_SMP - if (!xive_enabled() && !local_paca->kvm_hstate.xics_phys) { + if (!xive_enabled() && !kvmhv_on_pseries() && + !local_paca->kvm_hstate.xics_phys) { struct device_node *np; np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc"); diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index ccfea5b..a71e2fc 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -231,6 +231,15 @@ void kvmhv_rm_send_ipi(int cpu) void __iomem *xics_phys; unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); + /* For a nested hypervisor, use the XICS via hcall */ + if (kvmhv_on_pseries()) { + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + + plpar_hcall_raw(H_IPI, retbuf, get_hard_smp_processor_id(cpu), + IPI_PRIORITY); + return; + } + /* On POWER9 we can use msgsnd for any destination cpu. 
*/ if (cpu_has_feature(CPU_FTR_ARCH_300)) { msg |= get_hard_smp_processor_id(cpu); @@ -460,12 +469,19 @@ static long kvmppc_read_one_intr(bool *again) return 1; /* Now read the interrupt from the ICP */ - xics_phys = local_paca->kvm_hstate.xics_phys; - rc = 0; - if (!xics_phys) - rc = opal_int_get_xirr(, false); - else - xirr = __raw_rm_readl(xics_phys + XICS_XIRR); + if (kvmhv_on_pseries()) { + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + + rc = plpar_hcall_raw(H_XIRR, retbuf, 0xFF); + xirr = cpu_to_be32(retbuf[0]); + } else { + xics_phys = local_paca->kvm_hstate.xics_phys; + rc = 0; + if (!xics_phys) + rc = opal_int_get_xirr(, false); + else + xirr = __raw_rm_readl(xics_phys + XICS_XIRR); + } if (rc < 0) return 1; @@ -494,7 +510,13 @@ static long kvmppc_read_one_intr(bool *again) */ if (xisr == XICS_IPI) { rc = 0; - if (xics_phys) { + if (kvmhv_on_pseries()) { + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + + plpar_hcall_raw(H_IPI, retbuf, + hard_smp_processor_id(), 0xff); + plpar_hcall_raw(H_EOI, retbuf, h_xirr); + } else if (xics_phys) { __raw_rm_writeb(0xff, xics_phys + XICS_MFRR); __raw_rm_writel(xirr, xics_phys + XICS_XIRR); } else { @@ -520,7 +542,13 @@ static long kvmppc_read_one_intr(bool *again) /* We raced with the host, * we need to resend that IPI, bummer */ - if (xics_phys) + if (kvmhv_on_pseries()) { + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + + plpar_hcall_raw(H_IPI, retbuf, + hard_smp_processor_id(), + IPI_PRIORITY); + } else if (xics_phys) __raw_rm_writeb(IPI_PRIORITY, xics_phys + XICS_MFRR); else diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 8b9f356..b3f5786 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -767,6 +767,14 @@ static void
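The fallback order this patch establishes for ICP access -- XICS hcalls when running as a nested hypervisor, direct MMIO when a XICS mapping is cached in the PACA, OPAL otherwise -- can be condensed into a small dispatch sketch. The helper and enum below are hypothetical, standing in for the checks spread across kvmhv_rm_send_ipi() and kvmppc_read_one_intr():

```c
#include <assert.h>

/* Which mechanism the interrupt-controller access ends up using. */
enum icp_access { ICP_HCALL, ICP_MMIO, ICP_OPAL };

/*
 * Sketch of the dispatch order: a nested hypervisor (no HV mode, so
 * kvmhv_on_pseries() is true) must go through H_IPI/H_EOI/H_XIRR
 * hcalls; otherwise a cached XICS MMIO address is used directly;
 * failing that, the access is routed via OPAL.
 */
static enum icp_access pick_icp_access(int on_pseries, int have_xics_phys)
{
	if (on_pseries)
		return ICP_HCALL;	/* plpar_hcall_raw(H_IPI/H_EOI, ...) */
	if (have_xics_phys)
		return ICP_MMIO;	/* __raw_rm_readl()/__raw_rm_writel() */
	return ICP_OPAL;		/* opal_int_get_xirr() and friends */
}
```

The nested check has to come first: without CPU_FTR_HVMODE neither the MMIO window nor OPAL is reachable.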
[PATCH v5 18/33] KVM: PPC: Book3S HV: Nested guest entry via hypercall
This adds a new hypercall, H_ENTER_NESTED, which is used by a nested hypervisor to enter one of its nested guests. The hypercall supplies register values in two structs. Those values are copied by the level 0 (L0) hypervisor (the one which is running in hypervisor mode) into the vcpu struct of the L1 guest, and then the guest is run until an interrupt or error occurs which needs to be reported to L1 via the hypercall return value. Currently this assumes that the L0 and L1 hypervisors are the same endianness, and the structs passed as arguments are in native endianness. If they are of different endianness, the version number check will fail and the hcall will be rejected. Nested hypervisors do not support indep_threads_mode=N, so this adds code to print a warning message if the administrator has set indep_threads_mode=N, and treat it as Y. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/hvcall.h | 36 + arch/powerpc/include/asm/kvm_book3s.h | 7 + arch/powerpc/include/asm/kvm_host.h | 5 + arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/book3s_hv.c| 214 - arch/powerpc/kvm/book3s_hv_nested.c | 230 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++ 7 files changed, 471 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index c95c651..45e8789 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -466,6 +466,42 @@ struct h_cpu_char_result { u64 behaviour; }; +/* Register state for entering a nested guest with H_ENTER_NESTED */ +struct hv_guest_state { + u64 version;/* version of this structure layout */ + u32 lpid; + u32 vcpu_token; + /* These registers are hypervisor privileged (at least for writing) */ + u64 lpcr; + u64 pcr; + u64 amor; + u64 dpdes; + u64 hfscr; + s64 tb_offset; + u64 dawr0; + u64 dawrx0; + u64 ciabr; + u64 hdec_expiry; + u64 purr; + u64 spurr; + u64 ic; + u64 vtb; + u64 hdar; + u64 hdsisr; + u64 heir; + u64 asdr; + 
/* These are OS privileged but need to be set late in guest entry */ + u64 srr0; + u64 srr1; + u64 sprg[4]; + u64 pidr; + u64 cfar; + u64 ppr; +}; + +/* Latest version of hv_guest_state structure */ +#define HV_GUEST_STATE_VERSION 1 + #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_HVCALL_H */ diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 43f212e..093fd70 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -280,6 +280,13 @@ void kvmhv_vm_nested_init(struct kvm *kvm); long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); void kvmhv_release_all_nested(struct kvm *kvm); +long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); +int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu, + u64 time_limit, unsigned long lpcr); +void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); +void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu, + struct hv_guest_state *hr); +long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu); void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index c35d4f2..ceb9f20 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -95,6 +95,7 @@ struct dtl_entry; struct kvmppc_vcpu_book3s; struct kvmppc_book3s_shadow_vcpu; +struct kvm_nested_guest; struct kvm_vm_stat { ulong remote_tlb_flush; @@ -786,6 +787,10 @@ struct kvm_vcpu_arch { u32 emul_inst; u32 online; + + /* For support of nested guests */ + struct kvm_nested_guest *nested; + u32 nested_vcpu_id; #endif #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 7c3738d..d0abcbb 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -503,6 
+503,7 @@ int main(void) OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr); OFFSET(VCPU_VPA_DIRTY, kvm_vcpu, arch.vpa.dirty); OFFSET(VCPU_HEIR, kvm_vcpu, arch.emul_inst); + OFFSET(VCPU_NESTED, kvm_vcpu, arch.nested); OFFSET(VCPU_CPU, kvm_vcpu, cpu); OFFSET(VCPU_THREAD_CPU, kvm_vcpu, arch.thread_cpu); #endif diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 4c72f2f..d58a4a6 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -942,6 +942,13 @@ int
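A minimal sketch of the role the hv_guest_state version field plays, including how it catches the endianness mismatch described in the commit message (a byte-swapped version number cannot match a supported value). The acceptance policy and error code here are assumptions for illustration; the kernel's H_ENTER_NESTED handler may use a different hcall return value:

```c
#include <assert.h>
#include <stdint.h>

#define HV_GUEST_STATE_VERSION 1

/* Assumed return values for the sketch. */
#define H_SUCCESS   0
#define H_PARAMETER (-4)

/*
 * The L0 hypervisor can only run a guest state whose layout it knows:
 * version 0 is invalid, and anything newer than what this L0 supports
 * must be rejected.  An opposite-endian L1 passes a byte-swapped
 * version (e.g. 1 becomes 1ULL << 56), which also fails this check,
 * so mixed-endian nesting is rejected rather than misinterpreted.
 */
static long check_guest_state_version(uint64_t version)
{
	if (version == 0 || version > HV_GUEST_STATE_VERSION)
		return H_PARAMETER;
	return H_SUCCESS;
}
```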
[PATCH v5 17/33] KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
This starts the process of adding the code to support nested HV-style virtualization. It defines a new H_SET_PARTITION_TABLE hypercall which a nested hypervisor can use to set the base address and size of a partition table in its memory (analogous to the PTCR register). On the host (level 0 hypervisor) side, the H_SET_PARTITION_TABLE hypercall from the guest is handled by code that saves the virtual PTCR value for the guest. This also adds code for creating and destroying nested guests and for reading the partition table entry for a nested guest from L1 memory. Each nested guest has its own shadow LPID value, different in general from the LPID value used by the nested hypervisor to refer to it. The shadow LPID value is allocated at nested guest creation time. Nested hypervisor functionality is only available for a radix guest, which therefore means a radix host on a POWER9 (or later) processor. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/hvcall.h | 5 + arch/powerpc/include/asm/kvm_book3s.h | 10 +- arch/powerpc/include/asm/kvm_book3s_64.h | 33 arch/powerpc/include/asm/kvm_book3s_asm.h | 3 + arch/powerpc/include/asm/kvm_host.h | 5 + arch/powerpc/kvm/Makefile | 3 +- arch/powerpc/kvm/book3s_hv.c | 31 ++- arch/powerpc/kvm/book3s_hv_nested.c | 301 ++ 8 files changed, 384 insertions(+), 7 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_hv_nested.c diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index a0b17f9..c95c651 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -322,6 +322,11 @@ #define H_GET_24X7_DATA0xF07C #define H_GET_PERF_COUNTER_INFO0xF080 +/* Platform-specific hcalls used for nested HV KVM */ +#define H_SET_PARTITION_TABLE 0xF800 +#define H_ENTER_NESTED 0xF804 +#define H_TLB_INVALIDATE 0xF808 + /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 #define H_SET_MODE_RESOURCE_SET_DAWR 2 diff --git 
a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 91c9779..43f212e 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -274,6 +274,13 @@ static inline void kvmppc_save_tm_sprs(struct kvm_vcpu *vcpu) {} static inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu) {} #endif +long kvmhv_nested_init(void); +void kvmhv_nested_exit(void); +void kvmhv_vm_nested_init(struct kvm *kvm); +long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); +void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); +void kvmhv_release_all_nested(struct kvm *kvm); + void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); extern int kvm_irq_bypass; @@ -387,9 +394,6 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu); /* TO = 31 for unconditional trap */ #define INS_TW 0x7fe8 -/* LPIDs we support with this build -- runtime limit may be lower */ -#define KVMPPC_NR_LPIDS(LPID_RSVD + 1) - #define SPLIT_HACK_MASK0xff00 #define SPLIT_HACK_OFFS0xfb00 diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 5c0e2d9..6d67b6a 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -23,6 +23,39 @@ #include #include #include +#include + +#ifdef CONFIG_PPC_PSERIES +static inline bool kvmhv_on_pseries(void) +{ + return !cpu_has_feature(CPU_FTR_HVMODE); +} +#else +static inline bool kvmhv_on_pseries(void) +{ + return false; +} +#endif + +/* + * Structure for a nested guest, that is, for a guest that is managed by + * one of our guests. 
+ */ +struct kvm_nested_guest { + struct kvm *l1_host;/* L1 VM that owns this nested guest */ + int l1_lpid;/* lpid L1 guest thinks this guest is */ + int shadow_lpid;/* real lpid of this nested guest */ + pgd_t *shadow_pgtable; /* our page table for this guest */ + u64 l1_gr_to_hr;/* L1's addr of part'n-scoped table */ + u64 process_table; /* process table entry for this guest */ + long refcnt;/* number of pointers to this struct */ + struct mutex tlb_lock; /* serialize page faults and tlbies */ + struct kvm_nested_guest *next; +}; + +struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid, + bool create); +void kvmhv_put_nested(struct kvm_nested_guest *gp); /* Power architecture requires HPT is at least 256kiB, at most 64TiB */ #define PPC_MIN_HPT_ORDER 18 diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h
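The refcounting scheme implied by the refcnt field and the kvmhv_get_nested()/kvmhv_put_nested() prototypes can be sketched in userspace C. This is a hypothetical cut-down model (singly linked list keyed by the L1-visible lpid, no locking -- the kernel serializes these operations):

```c
#include <assert.h>
#include <stdlib.h>

/* Cut-down stand-in for struct kvm_nested_guest. */
struct nested_guest {
	int l1_lpid;               /* lpid L1 thinks this guest is */
	long refcnt;               /* number of pointers outstanding */
	struct nested_guest *next; /* per-VM list of nested guests */
};

/* Look up a nested guest by L1 lpid, optionally creating it,
 * and take a reference on it. */
static struct nested_guest *get_nested(struct nested_guest **list,
				       int l1_lpid, int create)
{
	struct nested_guest *gp;

	for (gp = *list; gp; gp = gp->next)
		if (gp->l1_lpid == l1_lpid) {
			gp->refcnt++;
			return gp;
		}
	if (!create)
		return NULL;
	gp = calloc(1, sizeof(*gp));	/* sketch: allocation assumed to succeed */
	gp->l1_lpid = l1_lpid;
	gp->refcnt = 1;
	gp->next = *list;
	*list = gp;
	return gp;
}

/* Drop a reference; free and unlink on the last put. */
static void put_nested(struct nested_guest **list, struct nested_guest *gp)
{
	struct nested_guest **p;

	if (--gp->refcnt > 0)
		return;
	for (p = list; *p; p = &(*p)->next)
		if (*p == gp) {
			*p = gp->next;
			break;
		}
	free(gp);
}
```

In the kernel the shadow lpid is also allocated at creation time and the shadow page table torn down on the final put; both are omitted here.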
[PATCH v5 15/33] KVM: PPC: Book3S HV: Refactor radix page fault handler
From: Suraj Jitindar Singh The radix page fault handler accounts for all cases, including just needing to insert a pte. This breaks it up into separate functions for the two main cases; setting rc and inserting a pte. This allows us to make the setting of rc and inserting of a pte generic for any pgtable, not specific to the one for this guest. [pau...@ozlabs.org - reduced diffs from previous code] Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 210 +++-- 1 file changed, 123 insertions(+), 87 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index f2976f4..47f2b18 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -400,8 +400,9 @@ static void kvmppc_unmap_free_pud_entry_table(struct kvm *kvm, pud_t *pud, */ #define PTE_BITS_MUST_MATCH (~(_PAGE_WRITE | _PAGE_DIRTY | _PAGE_ACCESSED)) -static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, -unsigned int level, unsigned long mmu_seq) +static int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte, +unsigned long gpa, unsigned int level, +unsigned long mmu_seq) { pgd_t *pgd; pud_t *pud, *new_pud = NULL; @@ -410,7 +411,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, int ret; /* Traverse the guest's 2nd-level tree, allocate new levels needed */ - pgd = kvm->arch.pgtable + pgd_index(gpa); + pgd = pgtable + pgd_index(gpa); pud = NULL; if (pgd_present(*pgd)) pud = pud_offset(pgd, gpa); @@ -565,95 +566,49 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, return ret; } -int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, - unsigned long ea, unsigned long dsisr) +static bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable, + bool writing, unsigned long gpa) +{ + unsigned long pgflags; + unsigned int shift; + 
pte_t *ptep; + + /* +* Need to set an R or C bit in the 2nd-level tables; +* since we are just helping out the hardware here, +* it is sufficient to do what the hardware does. +*/ + pgflags = _PAGE_ACCESSED; + if (writing) + pgflags |= _PAGE_DIRTY; + /* +* We are walking the secondary (partition-scoped) page table here. +* We can do this without disabling irq because the Linux MM +* subsystem doesn't do THP splits and collapses on this tree. +*/ + ptep = __find_linux_pte(pgtable, gpa, NULL, ); + if (ptep && pte_present(*ptep) && (!writing || pte_write(*ptep))) { + kvmppc_radix_update_pte(kvm, ptep, 0, pgflags, gpa, shift); + return true; + } + return false; +} + +static int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, + unsigned long gpa, + struct kvm_memory_slot *memslot, + bool writing, bool kvm_ro, + pte_t *inserted_pte, unsigned int *levelp) { struct kvm *kvm = vcpu->kvm; - unsigned long mmu_seq; - unsigned long gpa, gfn, hva; - struct kvm_memory_slot *memslot; struct page *page = NULL; - long ret; - bool writing; + unsigned long mmu_seq; + unsigned long hva, gfn = gpa >> PAGE_SHIFT; bool upgrade_write = false; bool *upgrade_p = _write; pte_t pte, *ptep; - unsigned long pgflags; unsigned int shift, level; - - /* Check for unusual errors */ - if (dsisr & DSISR_UNSUPP_MMU) { - pr_err("KVM: Got unsupported MMU fault\n"); - return -EFAULT; - } - if (dsisr & DSISR_BADACCESS) { - /* Reflect to the guest as DSI */ - pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr); - kvmppc_core_queue_data_storage(vcpu, ea, dsisr); - return RESUME_GUEST; - } - - /* Translate the logical address and get the page */ - gpa = vcpu->arch.fault_gpa & ~0xfffUL; - gpa &= ~0xF000ul; - gfn = gpa >> PAGE_SHIFT; - if (!(dsisr & DSISR_PRTABLE_FAULT)) - gpa |= ea & 0xfff; - memslot = gfn_to_memslot(kvm, gfn); - - /* No memslot means it's an emulated MMIO region */ - if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) { - if (dsisr & (DSISR_PRTABLE_FAULT | DSISR_BADACCESS 
| -DSISR_SET_RC)) { - /* -* Bad address in guest
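The split introduced by this refactor -- try the cheap R/C-bit update first, and only fall back to instantiating a new PTE when that cannot help -- can be modelled with a toy PTE word. Flag values below are made up for the sketch; only the decision logic mirrors kvmppc_hv_handle_set_rc():

```c
#include <assert.h>

/* Assumed PTE flag values, for illustration only. */
#define _PAGE_ACCESSED 0x1UL
#define _PAGE_DIRTY    0x2UL
#define _PAGE_WRITE    0x4UL
#define _PAGE_PRESENT  0x8UL

/*
 * Fast path of the radix fault handler: if a PTE is already present
 * (and writable, for a store), the fault only needs the R (accessed)
 * and, for a store, C (dirty) bits set -- no new mapping is inserted.
 * Returns 1 when that fast path applied, 0 when the caller must fall
 * back to kvmppc_book3s_instantiate_page().
 */
static int try_set_rc(unsigned long *pte, int writing)
{
	unsigned long pgflags = _PAGE_ACCESSED;

	if (writing)
		pgflags |= _PAGE_DIRTY;
	if ((*pte & _PAGE_PRESENT) && (!writing || (*pte & _PAGE_WRITE))) {
		*pte |= pgflags;	/* kvmppc_radix_update_pte() in the kernel */
		return 1;
	}
	return 0;
}
```

Making this generic over any pgtable is what later lets the same helper serve a nested guest's shadow tree, not just kvm->arch.pgtable.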
[PATCH v5 16/33] KVM: PPC: Book3S HV: Use kvmppc_unmap_pte() in kvm_unmap_radix()
kvmppc_unmap_pte() does a sequence of operations that are open-coded in kvm_unmap_radix(). This extends kvmppc_unmap_pte() a little so that it can be used by kvm_unmap_radix(), and makes kvm_unmap_radix() call it. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 33 + 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 47f2b18..bd06a95 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -240,19 +240,22 @@ static void kvmppc_pmd_free(pmd_t *pmdp) } static void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, -unsigned long gpa, unsigned int shift) +unsigned long gpa, unsigned int shift, +struct kvm_memory_slot *memslot) { - unsigned long page_size = 1ul << shift; unsigned long old; old = kvmppc_radix_update_pte(kvm, pte, ~0UL, 0, gpa, shift); kvmppc_radix_tlbie_page(kvm, gpa, shift); if (old & _PAGE_DIRTY) { unsigned long gfn = gpa >> PAGE_SHIFT; - struct kvm_memory_slot *memslot; + unsigned long page_size = PAGE_SIZE; - memslot = gfn_to_memslot(kvm, gfn); + if (shift) + page_size = 1ul << shift; + if (!memslot) + memslot = gfn_to_memslot(kvm, gfn); if (memslot && memslot->dirty_bitmap) kvmppc_update_dirty_map(memslot, gfn, page_size); } @@ -282,7 +285,7 @@ static void kvmppc_unmap_free_pte(struct kvm *kvm, pte_t *pte, bool full) WARN_ON_ONCE(1); kvmppc_unmap_pte(kvm, p, pte_pfn(*p) << PAGE_SHIFT, -PAGE_SHIFT); +PAGE_SHIFT, NULL); } } @@ -304,7 +307,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t *pmd, bool full) WARN_ON_ONCE(1); kvmppc_unmap_pte(kvm, (pte_t *)p, pte_pfn(*(pte_t *)p) << PAGE_SHIFT, -PMD_SHIFT); +PMD_SHIFT, NULL); } } else { pte_t *pte; @@ -468,7 +471,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte, goto out_unlock; } /* Valid 1GB page here already, remove it */ - kvmppc_unmap_pte(kvm, (pte_t *)pud, hgpa, PUD_SHIFT); + 
kvmppc_unmap_pte(kvm, (pte_t *)pud, hgpa, PUD_SHIFT, NULL); } if (level == 2) { if (!pud_none(*pud)) { @@ -517,7 +520,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte, goto out_unlock; } /* Valid 2MB page here already, remove it */ - kvmppc_unmap_pte(kvm, pmdp_ptep(pmd), lgpa, PMD_SHIFT); + kvmppc_unmap_pte(kvm, pmdp_ptep(pmd), lgpa, PMD_SHIFT, NULL); } if (level == 1) { if (!pmd_none(*pmd)) { @@ -780,20 +783,10 @@ int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, pte_t *ptep; unsigned long gpa = gfn << PAGE_SHIFT; unsigned int shift; - unsigned long old; ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, ); - if (ptep && pte_present(*ptep)) { - old = kvmppc_radix_update_pte(kvm, ptep, ~0UL, 0, - gpa, shift); - kvmppc_radix_tlbie_page(kvm, gpa, shift); - if ((old & _PAGE_DIRTY) && memslot->dirty_bitmap) { - unsigned long psize = PAGE_SIZE; - if (shift) - psize = 1ul << shift; - kvmppc_update_dirty_map(memslot, gfn, psize); - } - } + if (ptep && pte_present(*ptep)) + kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot); return 0; } -- 2.7.4
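The page-size rule the extended kvmppc_unmap_pte() now applies when updating the dirty map -- a shift of 0 from __find_linux_pte() means a normal page, otherwise the mapping covers 1 << shift bytes (2 MB or 1 GB huge pages on radix) -- is simple enough to state as a one-liner:

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Granule of the dirty-bitmap update for a PTE found at a given
 * level: 0 means a base page, otherwise 1 << shift bytes. */
static unsigned long dirty_map_page_size(unsigned int shift)
{
	return shift ? 1UL << shift : PAGE_SIZE;
}
```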
[PATCH v5 14/33] KVM: PPC: Book3S HV: Make kvmppc_mmu_radix_xlate process/partition table agnostic
From: Suraj Jitindar Singh kvmppc_mmu_radix_xlate() is used to translate an effective address through the process tables. The process table and partition tables have identical layout. Exploit this fact to make the kvmppc_mmu_radix_xlate() function able to translate either an effective address through the process tables or a guest real address through the partition tables. [pau...@ozlabs.org - reduced diffs from previous code] Reviewed-by: David Gibson Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h | 3 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 109 +++-- 2 files changed, 78 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index dd18d81..91c9779 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr, + struct kvmppc_pte *gpte, u64 table, + int table_index, u64 *pte_ret_p); extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, bool data, bool iswrite); extern int kvmppc_init_vm_radix(struct kvm *kvm); diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 71951b5..f2976f4 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -29,83 +29,92 @@ */ static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 }; -int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, - struct kvmppc_pte *gpte, bool data, bool iswrite) +/* + * Used to walk a partition or process table radix tree in guest memory + * Note: We exploit the fact that a partition table and a process + * table have 
the same layout, a partition-scoped page table and a + * process-scoped page table have the same layout, and the 2nd + * doubleword of a partition table entry has the same layout as + * the PTCR register. + */ +int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr, +struct kvmppc_pte *gpte, u64 table, +int table_index, u64 *pte_ret_p) { struct kvm *kvm = vcpu->kvm; - u32 pid; int ret, level, ps; - __be64 prte, rpte; - unsigned long ptbl; - unsigned long root, pte, index; + unsigned long ptbl, root; unsigned long rts, bits, offset; - unsigned long gpa; - unsigned long proc_tbl_size; + unsigned long size, index; + struct prtb_entry entry; + u64 pte, base, gpa; + __be64 rpte; - /* Work out effective PID */ - switch (eaddr >> 62) { - case 0: - pid = vcpu->arch.pid; - break; - case 3: - pid = 0; - break; - default: + if ((table & PRTS_MASK) > 24) return -EINVAL; - } - proc_tbl_size = 1 << ((kvm->arch.process_table & PRTS_MASK) + 12); - if (pid * 16 >= proc_tbl_size) + size = 1ul << ((table & PRTS_MASK) + 12); + + /* Is the table big enough to contain this entry? 
*/ + if ((table_index * sizeof(entry)) >= size) return -EINVAL; - /* Read partition table to find root of tree for effective PID */ - ptbl = (kvm->arch.process_table & PRTB_MASK) + (pid * 16); - ret = kvm_read_guest(kvm, ptbl, &prte, sizeof(prte)); + /* Read the table to find the root of the radix tree */ + ptbl = (table & PRTB_MASK) + (table_index * sizeof(entry)); + ret = kvm_read_guest(kvm, ptbl, &entry, sizeof(entry)); if (ret) return ret; - root = be64_to_cpu(prte); + /* Root is stored in the first double word */ + root = be64_to_cpu(entry.prtb0); rts = ((root & RTS1_MASK) >> (RTS1_SHIFT - 3)) | ((root & RTS2_MASK) >> RTS2_SHIFT); bits = root & RPDS_MASK; - root = root & RPDB_MASK; + base = root & RPDB_MASK; offset = rts + 31; - /* current implementations only support 52-bit space */ + /* Current implementations only support 52-bit space */ if (offset != 52) return -EINVAL; + /* Walk each level of the radix tree */ for (level = 3; level >= 0; --level) { + /* Check a valid size */ if (level && bits != p9_supported_radix_bits[level]) return -EINVAL; if (level == 0 && !(bits == 5 || bits == 9))
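The table-size and bounds arithmetic used by the generalized walker can be checked in isolation. A sketch assuming PRTS occupies the low 5 bits of the table register and 16-byte (two-doubleword) entries, as the patch describes:

```c
#include <assert.h>
#include <stdint.h>

#define PRTS_MASK  0x1fUL	/* assumed PRTS field position */
#define ENTRY_SIZE 16UL		/* two doublewords per table entry */

/*
 * The partition/process table holds 1 << (PRTS + 12) bytes, so a
 * PRTS of 0 is a 4 KB table with 256 entries.  An index is valid only
 * if its entry fits inside that size.  Returns the byte offset of the
 * entry, or -1 for an invalid encoding or out-of-range index (the
 * kernel returns -EINVAL).
 */
static int64_t table_entry_offset(uint64_t table, uint64_t index)
{
	uint64_t size;

	if ((table & PRTS_MASK) > 24)
		return -1;		/* encoding larger than allowed */
	size = 1ULL << ((table & PRTS_MASK) + 12);
	if (index * ENTRY_SIZE >= size)
		return -1;		/* entry would fall off the table */
	return (int64_t)(index * ENTRY_SIZE);
}
```

Because the same layout serves both partition and process tables, this one bounds check replaces the process-table-specific `pid * 16 >= proc_tbl_size` test in the old code.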
[PATCH v5 13/33] KVM: PPC: Book3S HV: Clear partition table entry on vm teardown
From: Suraj Jitindar Singh

When destroying a VM we return the LPID to the pool, however we never
zero the partition table entry. This is instead done when we reallocate
the LPID.

Zero the partition table entry on VM teardown before returning the LPID
to the pool. This means if we were running as a nested hypervisor the
real hypervisor could use this to determine when it can free resources.

Reviewed-by: David Gibson
Signed-off-by: Suraj Jitindar Singh
Signed-off-by: Paul Mackerras
---
 arch/powerpc/kvm/book3s_hv.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 123bd18..8425d72 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4505,13 +4505,19 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 
 	kvmppc_free_vcores(kvm);
 
-	kvmppc_free_lpid(kvm->arch.lpid);
 	if (kvm_is_radix(kvm))
 		kvmppc_free_radix(kvm);
 	else
 		kvmppc_free_hpt(&kvm->arch.hpt);
+
+	/* Perform global invalidation and return lpid to the pool */
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		kvm->arch.process_table = 0;
+		kvmppc_setup_partition_table(kvm);
+	}
+	kvmppc_free_lpid(kvm->arch.lpid);
+
 	kvmppc_free_pimap(kvm);
 }
-- 
2.7.4
[PATCH v5 12/33] KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct
When the 'regs' field was added to struct kvm_vcpu_arch, the code was changed to use several of the fields inside regs (e.g., gpr, lr, etc.) but not the ccr field, because the ccr field in struct pt_regs is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is only 32 bits. This changes the code to use the regs.ccr field instead of cr, and changes the assembly code on 64-bit platforms to use 64-bit loads and stores instead of 32-bit ones. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s.h| 4 ++-- arch/powerpc/include/asm/kvm_book3s_64.h | 4 ++-- arch/powerpc/include/asm/kvm_booke.h | 4 ++-- arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/kernel/asm-offsets.c| 4 ++-- arch/powerpc/kvm/book3s_emulate.c| 12 ++-- arch/powerpc/kvm/book3s_hv.c | 4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++-- arch/powerpc/kvm/book3s_hv_tm.c | 6 +++--- arch/powerpc/kvm/book3s_hv_tm_builtin.c | 5 +++-- arch/powerpc/kvm/book3s_pr.c | 4 ++-- arch/powerpc/kvm/bookehv_interrupts.S| 8 arch/powerpc/kvm/emulate_loadstore.c | 1 - 13 files changed, 30 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 83a9aa3..dd18d81 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -301,12 +301,12 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val) { - vcpu->arch.cr = val; + vcpu->arch.regs.ccr = val; } static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) { - return vcpu->arch.cr; + return vcpu->arch.regs.ccr; } static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index af25aaa..5c0e2d9 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -483,7 +483,7 @@ static inline u64 
sanitize_msr(u64 msr) #ifdef CONFIG_PPC_TRANSACTIONAL_MEM static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu) { - vcpu->arch.cr = vcpu->arch.cr_tm; + vcpu->arch.regs.ccr = vcpu->arch.cr_tm; vcpu->arch.regs.xer = vcpu->arch.xer_tm; vcpu->arch.regs.link = vcpu->arch.lr_tm; vcpu->arch.regs.ctr = vcpu->arch.ctr_tm; @@ -500,7 +500,7 @@ static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu) static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu) { - vcpu->arch.cr_tm = vcpu->arch.cr; + vcpu->arch.cr_tm = vcpu->arch.regs.ccr; vcpu->arch.xer_tm = vcpu->arch.regs.xer; vcpu->arch.lr_tm = vcpu->arch.regs.link; vcpu->arch.ctr_tm = vcpu->arch.regs.ctr; diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index d513e3e..f0cef62 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -46,12 +46,12 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val) { - vcpu->arch.cr = val; + vcpu->arch.regs.ccr = val; } static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) { - return vcpu->arch.cr; + return vcpu->arch.regs.ccr; } static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index a3d4f61..c9cc42f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -538,8 +538,6 @@ struct kvm_vcpu_arch { ulong tar; #endif - u32 cr; - #ifdef CONFIG_PPC_BOOK3S ulong hflags; ulong guest_owned_ext; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 89cf155..7c3738d 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -438,7 +438,7 @@ int main(void) #ifdef CONFIG_PPC_BOOK3S OFFSET(VCPU_TAR, kvm_vcpu, arch.tar); #endif - OFFSET(VCPU_CR, kvm_vcpu, arch.cr); + OFFSET(VCPU_CR, kvm_vcpu, arch.regs.ccr); OFFSET(VCPU_PC, 
kvm_vcpu, arch.regs.nip); #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE OFFSET(VCPU_MSR, kvm_vcpu, arch.shregs.msr); @@ -695,7 +695,7 @@ int main(void) #endif /* CONFIG_PPC_BOOK3S_64 */ #else /* CONFIG_PPC_BOOK3S */ - OFFSET(VCPU_CR, kvm_vcpu, arch.cr); + OFFSET(VCPU_CR, kvm_vcpu, arch.regs.ccr); OFFSET(VCPU_XER, kvm_vcpu, arch.regs.xer); OFFSET(VCPU_LR, kvm_vcpu, arch.regs.link); OFFSET(VCPU_CTR, kvm_vcpu, arch.regs.ctr); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 36b11c5..2654df2 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++
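Why the accessors keep taking and returning u32 while the storage becomes 64-bit can be shown with stand-in types (illustrative only; the real fields live in struct pt_regs and struct kvm_vcpu_arch). The architected CR is 32 bits, but the pt_regs ccr slot is a doubleword on 64-bit platforms, which is why the assembly must switch from lwz/stw to ld/std:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the pt_regs-style 64-bit ccr slot that replaces the
 * old 32-bit vcpu->arch.cr field. */
struct new_regs { uint64_t ccr; };

/* Mirrors kvmppc_set_cr(): a 32-bit CR value zero-extends into the
 * 64-bit slot, so a full-doubleword std never stores stale upper bits. */
static void set_cr(struct new_regs *regs, uint32_t val)
{
	regs->ccr = val;
}

/* Mirrors kvmppc_get_cr(): only the low 32 bits are the CR. */
static uint32_t get_cr(const struct new_regs *regs)
{
	return (uint32_t)regs->ccr;
}
```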
[PATCH v5 11/33] KVM: PPC: Book3S HV: Add a debugfs file to dump radix mappings
This adds a file called 'radix' in the debugfs directory for the guest, which when read gives all of the valid leaf PTEs in the partition-scoped radix tree for a radix guest, in human-readable format. It is analogous to the existing 'htab' file which dumps the HPT entries for a HPT guest. Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_64.h | 1 + arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 179 +++ arch/powerpc/kvm/book3s_hv.c | 2 + 4 files changed, 183 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index dc435a5..af25aaa 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -435,6 +435,7 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm) } extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); +extern void kvmhv_radix_debugfs_init(struct kvm *kvm); extern void kvmhv_rm_send_ipi(int cpu); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 3cd0b9f..a3d4f61 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -291,6 +291,7 @@ struct kvm_arch { u64 process_table; struct dentry *debugfs_dir; struct dentry *htab_dentry; + struct dentry *radix_dentry; struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */ #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 933c574..71951b5 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -10,6 +10,9 @@ #include #include #include +#include +#include +#include #include #include @@ -853,6 +856,182 @@ static void pmd_ctor(void *addr) memset(addr, 0, RADIX_PMD_TABLE_SIZE); } +struct debugfs_radix_state { + struct kvm *kvm; + struct mutexmutex; + unsigned long 
gpa; + int chars_left; + int buf_index; + charbuf[128]; + u8 hdr; +}; + +static int debugfs_radix_open(struct inode *inode, struct file *file) +{ + struct kvm *kvm = inode->i_private; + struct debugfs_radix_state *p; + + p = kzalloc(sizeof(*p), GFP_KERNEL); + if (!p) + return -ENOMEM; + + kvm_get_kvm(kvm); + p->kvm = kvm; + mutex_init(>mutex); + file->private_data = p; + + return nonseekable_open(inode, file); +} + +static int debugfs_radix_release(struct inode *inode, struct file *file) +{ + struct debugfs_radix_state *p = file->private_data; + + kvm_put_kvm(p->kvm); + kfree(p); + return 0; +} + +static ssize_t debugfs_radix_read(struct file *file, char __user *buf, +size_t len, loff_t *ppos) +{ + struct debugfs_radix_state *p = file->private_data; + ssize_t ret, r; + unsigned long n; + struct kvm *kvm; + unsigned long gpa; + pgd_t *pgt; + pgd_t pgd, *pgdp; + pud_t pud, *pudp; + pmd_t pmd, *pmdp; + pte_t *ptep; + int shift; + unsigned long pte; + + kvm = p->kvm; + if (!kvm_is_radix(kvm)) + return 0; + + ret = mutex_lock_interruptible(>mutex); + if (ret) + return ret; + + if (p->chars_left) { + n = p->chars_left; + if (n > len) + n = len; + r = copy_to_user(buf, p->buf + p->buf_index, n); + n -= r; + p->chars_left -= n; + p->buf_index += n; + buf += n; + len -= n; + ret = n; + if (r) { + if (!n) + ret = -EFAULT; + goto out; + } + } + + gpa = p->gpa; + pgt = kvm->arch.pgtable; + while (len != 0 && gpa < RADIX_PGTABLE_RANGE) { + if (!p->hdr) { + n = scnprintf(p->buf, sizeof(p->buf), + "pgdir: %lx\n", (unsigned long)pgt); + p->hdr = 1; + goto copy; + } + + pgdp = pgt + pgd_index(gpa); + pgd = READ_ONCE(*pgdp); + if (!(pgd_val(pgd) & _PAGE_PRESENT)) { + gpa = (gpa & PGDIR_MASK) + PGDIR_SIZE; + continue; + } + + pudp = pud_offset(, gpa); + pud = READ_ONCE(*pudp); + if (!(pud_val(pud) & _PAGE_PRESENT)) { + gpa = (gpa & PUD_MASK) + PUD_SIZE; + continue; + } +
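The leftover-buffer handling at the top of debugfs_radix_read() -- hand out the tail of a previously formatted line before generating more output, since one formatted PTE line may be longer than the user's read() buffer -- can be modelled without copy_to_user. Here memcpy stands in and the state struct is a hypothetical cut-down version of struct debugfs_radix_state:

```c
#include <assert.h>
#include <string.h>

/* Cut-down stand-in for the carry-over state. */
struct read_state {
	int  chars_left;	/* bytes formatted but not yet returned */
	int  buf_index;		/* where the next byte to return sits */
	char buf[128];
};

/*
 * Return up to len bytes of the leftover formatted text, advancing
 * buf_index so the next call resumes where this one stopped.  The
 * kernel version additionally handles a partial copy_to_user().
 */
static size_t drain_leftover(struct read_state *p, char *out, size_t len)
{
	size_t n = (size_t)p->chars_left;

	if (n > len)
		n = len;
	memcpy(out, p->buf + p->buf_index, n);	/* copy_to_user() in the kernel */
	p->chars_left -= (int)n;
	p->buf_index += (int)n;
	return n;
}
```

Two short reads drain one formatted line without losing or duplicating bytes, which is what lets the dump be consumed with any buffer size.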