Re: [PATCH v4 11/23] powerpc/syscall: Rename syscall_64.c into syscall.c
Le 02/02/2021 à 07:38, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 2, 2021 4:15 pm:
>> Le 28/01/2021 à 00:50, Nicholas Piggin a écrit :
>>> Excerpts from David Laight's message of January 26, 2021 8:28 pm:
>>>> From: Nicholas Piggin
>>>> Sent: 26 January 2021 10:21
>>>>> Excerpts from Christophe Leroy's message of January 26, 2021 12:48 am:
>>>>>> syscall_64.c will be reused almost as is for PPC32.
>>>>>>
>>>>>> Rename it syscall.c
>>>>>
>>>>> Could you rename it to interrupt.c instead? A system call is an
>>>>> interrupt, and the file now also has code to return from other
>>>>> interrupts as well, and it matches the new asm/interrupt.h from
>>>>> the interrupts series.
>>>>
>>>> Hmmm
>>>>
>>>> That might make it harder for someone looking for the system call
>>>> entry code to find it.
>>>
>>> It's very grep'able.
>>>
>>>> In some sense interrupts are the simpler case.
>>>>
>>>> Especially when comparing with other architectures which have
>>>> special instructions for syscall entry.
>>>
>>> powerpc does have a special instruction for syscall, and it causes
>>> a system call interrupt.
>>>
>>> I'm not sure about other architectures, but for powerpc it's more
>>> sensible to call it interrupt.c than syscall.c.
>>
>> Many other architectures have a syscall.c, but for a different
>> purpose: it contains arch-specific system calls. We have that in
>> powerpc as well, it is called syscalls.c
>>
>> So to avoid confusion, I'll rename it. But I think "interrupt" is
>> maybe not the right name. An interrupt most of the time refers to an
>> IRQ.
>
> That depends on what you mean by interrupt and IRQ. Linux kind of
> considers any asynchronous maskable interrupt an irq
> (local_irq_disable()). But if you say irq it's more likely to mean a
> device interrupt, and "interrupt" usually refers to the asynchronous
> ones.
>
> But Linux doesn't really assign names to synchronous interrupts in
> core code. It doesn't say they aren't interrupts, it just doesn't
> really have a convention for them at all. Other architectures, e.g.
> x86, also have things like the interrupt descriptor table for
> synchronous interrupts as well. That's where I got the interrupt
> wrappers code from, actually.
>
> So it's really fine to use the proper arch-specific names for things
> in arch code. I'm trying to slowly change names from exception to
> interrupt.
>
>> For me a system call is not an interrupt, in the sense that it
>> doesn't unexpectedly interrupt the program flow. In powerpc manuals
>> it is generally called exceptions, so I'm more inclined to call it
>> exception.c
>
> Actually that's backwards. Powerpc manuals (at least the ones I look
> at) call them all interrupts, including system calls, and the system
> call interrupt is actually the only one that doesn't appear to be
> associated with an exception.
>
> Also there is no distinction about expected/unexpected -- a data
> storage interrupt is expected if you access a location without the
> right access permissions, for example, but it is still an interrupt.
>
> These handlers very specifically deal with the change to execution
> flow (i.e., the interrupt); they do *not* deal with the exception
> which may be associated with it (that is the job of the handler). And
> on the other hand you can deal with exceptions in some cases without
> taking an interrupt at all. For example, if you had MSR[EE]=0 you
> could change the decrementer, or execute msgclr, or change the HMER
> SPR, etc. to clear various exceptions without ever taking the
> interrupt.

Ok, let's call it interrupt.c then, to be consistent with the interrupt
wrapper story.

Christophe
Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()
Excerpts from Peter Zijlstra's message of February 1, 2021 10:09 pm:
> On Sat, Jan 30, 2021 at 04:11:23PM -0800, Nadav Amit wrote:
>
>> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
>> index 427bfcc6cdec..b97136b7010b 100644
>> --- a/include/asm-generic/tlb.h
>> +++ b/include/asm-generic/tlb.h
>> @@ -334,8 +334,8 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>>
>>  #ifdef CONFIG_MMU_GATHER_NO_RANGE
>>
>> -#if defined(tlb_flush) || defined(tlb_start_vma) || defined(tlb_end_vma)
>> -#error MMU_GATHER_NO_RANGE relies on default tlb_flush(), tlb_start_vma() and tlb_end_vma()
>> +#if defined(tlb_flush)
>> +#error MMU_GATHER_NO_RANGE relies on default tlb_flush()
>>  #endif
>>
>>  /*
>> @@ -362,10 +362,6 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
>>
>>  #ifndef tlb_flush
>>
>> -#if defined(tlb_start_vma) || defined(tlb_end_vma)
>> -#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
>> -#endif
>
> #ifdef CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING
> #error
> #endif
>
> goes here...
>
>>  static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
>>  {
>>  	if (tlb->fullmm)
>>  		return;
>>
>> +	if (IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING))
>> +		return;
>
> Also, can you please stick to the CONFIG_MMU_GATHER_* namespace?
>
> I also don't think AGRESSIVE_FLUSH_BATCHING quite captures what it does.
> How about:
>
>   CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH

Yes please, have to have descriptive names.

I didn't quite see why this was much of an improvement though. Maybe
follow up patches take advantage of it? I didn't see how they all fit
together.

Thanks,
Nick
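To make the proposed rename concrete, here is a minimal sketch of the generic tlb_end_vma() with Peter's suggested name folded in. This only illustrates the direction discussed above, not the actual patch: CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH is the proposed symbol, not one that exists yet, and the body mirrors the asm-generic/tlb.h helper quoted in the diff.

/*
 * Sketch only: assumes CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING is
 * renamed to CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH, per the review above.
 */
static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	if (tlb->fullmm)
		return;

	/* Architectures that batch TLB flushes across VMAs opt out here. */
	if (IS_ENABLED(CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH))
		return;

	/* Otherwise flush the gathered range at the VMA boundary. */
	tlb_flush_mmu_tlbonly(tlb);
}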
[PATCH] powerpc/perf: Fix the guest crash issue with trace-imc
When perf kvm record with a trace_imc event is attached to a guest pid
(with the -p option), the qemu process gets killed with a permission
issue. This happens because trace_imc events require admin privileges
to monitor a process. If qemu creates threads, by default the child
tasks also inherit the counters, and if there is no permission to
monitor the qemu threads, we return permission denied (EACCES).

Fix this by returning EACCES only if there is no CAP_SYS_ADMIN and the
event doesn't have inheritance.

Fixes: 012ae244845f ("powerpc/perf: Trace imc PMU functions")
Signed-off-by: Athira Rajeev
---
 arch/powerpc/perf/imc-pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index e106909ff9c3..cc5679bfd28b 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1429,7 +1429,7 @@ static int trace_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!perfmon_capable())
+	if (!perfmon_capable() && !event->attr.inherit)
 		return -EACCES;
 
 	/* Return if this is a couting event */
--
1.8.3.1
Re: [PATCH v7 00/42] powerpc: interrupt wrappers
Le 30/01/2021 à 14:08, Nicholas Piggin a écrit :
> This adds interrupt handler wrapper functions, similar to the generic
> / x86 code, and moves several common operations into them from either
> asm or open coded in the individual handlers.
>
> This series is based on powerpc fixes-test tree, there's another
> unrelated pending fix in patch 1 of the series which clashes a bit.

This series trivially conflicts with
https://github.com/linuxppc/linux/commit/11f9c1d2fb497f69f83d4fab6fb7fc8a6884eded
on the powerpc next tree.

> This includes more changes and fixes suggested by Christophe, a few
> minor bug fixes and a compile fix noticed by kbuild, and some NMI
> changes Athira asked about -- PMI interrupts don't block tracing when
> they are soft-NMI.
>
> Since v1:
> - Fixed a couple of compile issues
> - Fixed perf weirdness (sometimes NMI, sometimes not)
> - Also move irq_enter/exit into wrappers
>
> Since v2:
> - Rebased upstream
> - Took code in patch 3 from Christophe
> - Fixed some compile errors from 0day
>
> Since v3:
> - Rebased
> - Split Christophe's 32s DABR patch into its own patch
> - Fixed missing asm from 32s on patch 3 noticed by Christophe.
> - Moved changes around, split out one more patch (patch 9) to make
>   changes more logical and atomic.
> - Add comments explaining _RAW handlers (SLB, HPTE) interrupts better
>
> Since v4:
> - Rebased (on top of scv fallback flush fix)
> - Rearranged a few changes into different patches from Christophe,
>   e.g., the ___do_page_fault change from patch 2 to 10. I didn't do
>   everything (e.g., splitting to update __hash_page to drop the msr
>   argument before the bulk of patch 2 seemed like churn without much
>   improvement), and also other things like removing the new
>   ___do_page_fault variant if we can change hash fault context
>   tracking I didn't get time to completely investigate and implement.
>   I think this shouldn't be a showstopper though, we can make more
>   improvements as we go.
>
> Since v5:
> - Lots of good review suggestions from Christophe, see v5 email
>   threads.
> - Major change being do_break is left in asm and selected early as an
>   alternate interrupt handler now, which is a smaller step and
>   matches other subarchs better.
> - Rearranged patches, split, moved things, bug fixes, etc.
> - Converted a few more missed exception handlers for debug and ras
>
> Since v6:
> - Move related interrupt handler de-argify patches together [Christophe]
> - Split do_bad_page_fault patch [Christophe]
> - Change do_page_fault cleanup patch [Christophe]
> - entry_32.S can't avoid saving r4/r5 until later in the series [Christophe]
> - Soft-NMI decrementer and perf don't block ftrace [Athira]
> - Rebased on some fixes
> - Fixed mismerge / duplicate line in patch 40
> - Fix kbuild hash missing declaration bug
>
> Christophe Leroy (1):
>   powerpc/32s: move DABR match out of handle_page_fault
>
> Nicholas Piggin (41):
>   powerpc/64s: interrupt exit improve bounding of interrupt recursion
>   KVM: PPC: Book3S HV: Context tracking exit guest context before
>     enabling irqs
>   powerpc/64s: move DABR match out of handle_page_fault
>   powerpc/64s: move the hash fault handling logic to C
>   powerpc: remove arguments from fault handler functions
>   powerpc/fsl_booke/32: CacheLockingException remove args
>   powerpc: do_break get registers from regs
>   powerpc: DebugException remove args
>   powerpc/32: transfer can avoid saving r4/r5 over trace call
>   powerpc: bad_page_fault get registers from regs
>   powerpc/64s: add do_bad_page_fault_segv handler
>   powerpc: rearrange do_page_fault error case to be inside
>     exception_enter
>   powerpc/64s: move bad_page_fault handling to C
>   powerpc/64s: split do_hash_fault
>   powerpc/mm: Remove stale do_page_fault comment referring to SLB
>     faults
>   powerpc/64s: slb comment update
>   powerpc/traps: add NOKPROBE_SYMBOL for sreset and mce
>   powerpc/perf: move perf irq/nmi handling details into traps.c
>   powerpc/time: move timer_broadcast_interrupt prototype to asm/time.h
>   powerpc: add and use unknown_async_exception
>   powerpc/cell: tidy up pervasive declarations
>   powerpc: introduce die_mce
>   powerpc/mce: ensure machine check handler always tests RI
>   powerpc: improve handling of unrecoverable system reset
>   powerpc: interrupt handler wrapper functions
>   powerpc: add interrupt wrapper entry / exit stub functions
>   powerpc: convert interrupt handlers to use wrappers
>   powerpc: add interrupt_cond_local_irq_enable helper
>   powerpc/64: context tracking remove _TIF_NOHZ
>   powerpc/64s/hash: improve context tracking of hash faults
>   powerpc/64: context tracking move to interrupt wrappers
>   powerpc/64: add context tracking to asynchronous interrupts
>   powerpc: handle irq_enter/irq_exit in interrupt handler wrappers
>   powerpc/64s: move context tracking exit to interrupt exit path
>   powerpc/64s: reconcile interrupts in C
>   powerpc/64: move account_stolen_time into its own function
>   powerpc/64: entry cpu time accounting in C
>   powerpc: move NMI entry/exit code into wrapper
>   powerpc/64s: move NMI
[PATCH] cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN
This flag is set by one of the drivers but it isn't used in the code
otherwise. Remove the unused flag and update the driver.

Signed-off-by: Viresh Kumar
---
Rebased over:
https://lore.kernel.org/lkml/a59bb322b22c247d570b70a8e94067804287623b.1612241683.git.viresh.ku...@linaro.org/

 drivers/cpufreq/pmac32-cpufreq.c |  3 +--
 include/linux/cpufreq.h          | 13 +++++--------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/pmac32-cpufreq.c b/drivers/cpufreq/pmac32-cpufreq.c
index 73621bc11976..4f20c6a9108d 100644
--- a/drivers/cpufreq/pmac32-cpufreq.c
+++ b/drivers/cpufreq/pmac32-cpufreq.c
@@ -439,8 +439,7 @@ static struct cpufreq_driver pmac_cpufreq_driver = {
 	.init		= pmac_cpufreq_cpu_init,
 	.suspend	= pmac_cpufreq_suspend,
 	.resume		= pmac_cpufreq_resume,
-	.flags		= CPUFREQ_PM_NO_WARN |
-			  CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING,
+	.flags		= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING,
 	.attr		= cpufreq_generic_attr,
 	.name		= "powermac",
 };
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index c8e40e91fe9b..353969c7acd3 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -398,8 +398,11 @@ struct cpufreq_driver {
 /* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
 #define CPUFREQ_CONST_LOOPS	BIT(1)
 
-/* don't warn on suspend/resume speed mismatches */
-#define CPUFREQ_PM_NO_WARN	BIT(2)
+/*
+ * Set by drivers that want the core to automatically register the cpufreq
+ * driver as a thermal cooling device.
+ */
+#define CPUFREQ_IS_COOLING_DEV	BIT(2)
 
 /*
  * This should be set by platforms having multiple clock-domains, i.e.
@@ -431,12 +434,6 @@ struct cpufreq_driver {
  */
 #define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING	BIT(6)
 
-/*
- * Set by drivers that want the core to automatically register the cpufreq
- * driver as a thermal cooling device.
- */
-#define CPUFREQ_IS_COOLING_DEV	BIT(7)
-
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
--
2.25.0.rc1.19.g042ed3e048af
Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
Nicholas Piggin writes:

> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>> Christophe Leroy writes:
>>> +Aneesh
>>>
>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>> ..
>>>> [   96.200296] [ cut here ]
>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>> [   96.200734] NIP [c0849424] fault_in_pages_readable+0x104/0x350
>>>> [   96.200741] LR [c084952c] fault_in_pages_readable+0x20c/0x350
>>>> [   96.200747] --- interrupt: 300
>>>
>>> Problem happens in a section where userspace access is supposed to
>>> be granted, so the patch you proposed is definitely not the right
>>> fix.
>>>
>>> c0849408:  2c 01 00 4c  isync
>>> c084940c:  a6 03 3d 7d  mtspr  29,r9     <== granting userspace access permission
>>> c0849410:  2c 01 00 4c  isync
>>> c0849414:  00 00 36 e9  ld     r9,0(r22)
>>> c0849418:  20 00 29 81  lwz    r9,32(r9)
>>> c084941c:  00 02 29 71  andi.  r9,r9,512
>>> c0849420:  78 d3 5e 7f  mr     r30,r26
>>> ==> c0849424:  00 00 bf 8b  lbz    r29,0(r31)  <== accessing userspace
>>> c0849428:  10 00 82 41  beq    c0849438
>>> c084942c:  2c 01 00 4c  isync
>>> c0849430:  a6 03 bd 7e  mtspr  29,r21    <== clearing userspace access permission
>>> c0849434:  2c 01 00 4c  isync
>>>
>>> My first guess is that the problem is linked to the following
>>> function, see the comment:
>>>
>>> /*
>>>  * For kernel thread that doesn't have thread.regs return
>>>  * default AMR/IAMR values.
>>>  */
>>> static inline u64 current_thread_amr(void)
>>> {
>>> 	if (current->thread.regs)
>>> 		return current->thread.regs->amr;
>>> 	return AMR_KUAP_BLOCKED;
>>> }
>>>
>>> The above function was introduced by commit 48a8ab4eeb82
>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode")
>>
>> Yeah that's a bit of a curly one.
>>
>> At some point io_uring did kthread_use_mm(), which is supposed to
>> mean the kthread can operate on behalf of the original process that
>> submitted the IO.
>>
>> But because KUAP is implemented using memory protection keys, it
>> depends on the value of the AMR register, which is not part of the
>> mm, it's in thread.regs->amr.
>>
>> And what's worse, by the time we're in kthread_use_mm() we no longer
>> have access to the thread.regs->amr of the original process that
>> submitted the IO.
>>
>> We also can't simply move the AMR into the mm, precisely because
>> it's per thread, not per mm.
>>
>> So TBH I don't know how we're going to fix this.
>>
>> I guess we could return AMR=unblocked for kernel threads, but that's
>> arguably a bug because it allows a process to circumvent memory keys
>> by asking the kernel to do the access.
>
> We shouldn't need to inherit AMR, should we? We only need it to be
> locked for kernel threads until it's explicitly unlocked -- nothing
> mm specific there. I think current_thread_amr could return 0 for
> kernel threads? Or I would even avoid using that function for
> allow_user_access and open code the kthread case and remove it from
> current_thread_amr().
>
> Thanks,
> Nick

Can we try this?

commit 9a193d38b1a0a52364bc70ec953e0685993603b6
Author: Aneesh Kumar K.V
Date:   Tue Feb 2 09:23:38 2021 +0530

    powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm

    This fixes the bad fault reported by KUAP when io_wqe_worker accesses
    userspace.

    Bug: Read fault blocked by KUAP!
    WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
    NIP [c009e7e4] __do_page_fault+0x6b4/0xcd0
    LR [c009e7e0] __do_page_fault+0x6b0/0xcd0
    ..
    Call Trace:
    [c00016367330] [c009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
    [c000163673e0] [c009ee3c] do_page_fault+0x3c/0x120
    [c00016367430] [c000c848] handle_page_fault+0x10/0x2c
    --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
    ..
    NIP [c08e8228] iov_iter_fault_in_readable+0x148/0x6f0
    LR [c08e834c] iov_iter_fault_in_readable+0x26c/0x6f0
    interrupt: 300
    [c000163677e0] [c07154a0] iomap_write_actor+0xc0/0x280
    [c00016367880] [c070fc94] iomap_apply+0x1c4/0x780
    [c00016367990] [c0710330] iomap_file_buffered_write+0xa0/0x120
    [c000163679e0] [c0080040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
    [c00016367a90] [c06d74bc] io_write+0x10c/0x460
    [c00016367bb0] [c06d80e4] io_issue_sqe+0x8d4/0x1200
    [c00016367c70] [c06d8ad0] io_wq_submit_work+0xc0/0x250
    [c00016367cb0] [c06e2578]
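For illustration, a minimal sketch of the direction Nick suggests above: a kernel thread's AMR need not inherit anything from the submitting process, it only needs the usual lock/unlock around user accesses once it has taken a user mm. This is a sketch under those assumptions, not Aneesh's actual v2 patch (which is truncated above).

/*
 * Sketch of the suggestion only: kernel threads have no thread.regs, and
 * returning AMR_KUAP_BLOCKED for them made kthread_use_mm() workers such
 * as io_wqe_worker trip KUAP on legitimate user accesses.
 */
static inline u64 current_thread_amr(void)
{
	if (current->thread.regs)
		return current->thread.regs->amr;

	/* Assumed: kernel thread (e.g. io_uring worker after kthread_use_mm()) */
	return 0;
}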
Re: [PATCH] dma-mapping: remove unneeded semicolon
On 2/1/21 7:41 PM, Yang Li wrote:
> Eliminate the following coccicheck warning:
> ./arch/powerpc/platforms/ps3/system-bus.c:606:2-3: Unneeded semicolon
> ./arch/powerpc/platforms/ps3/system-bus.c:765:2-3: Unneeded semicolon
>
> Reported-by: Abaci Robot
> Signed-off-by: Yang Li
> ---
>  arch/powerpc/platforms/ps3/system-bus.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks for your patch, it looks good.

Acked-by: Geoff Levand

-Geoff
[PATCH] powerpc/book3s64: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/platforms/pseries/lpar.c:1632:2-3: Unneeded semicolon
./arch/powerpc/platforms/pseries/lpar.c:1663:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/platforms/pseries/lpar.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 764170f..24889b8 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1629,7 +1629,7 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
 		}
 		msleep(delay);
 		rc = plpar_resize_hpt_prepare(0, shift);
-	};
+	}
 
 	switch (rc) {
 	case H_SUCCESS:
@@ -1663,7 +1663,7 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
 			pr_warn("Unexpected error %d from H_RESIZE_HPT_COMMIT\n",
 				state.commit_rc);
 			return -EIO;
-		};
+		}
 	}
 
 	pr_info("HPT resize to shift %lu complete (%lld ms / %lld ms)\n",
--
1.8.3.1
Re: [PATCH] powerpc/eeh: remove unneeded semicolon
On Tue, Feb 2, 2021 at 2:21 PM Yang Li wrote:
>
> Eliminate the following coccicheck warning:
> ./arch/powerpc/kernel/eeh.c:782:2-3: Unneeded semicolon

Doesn't appear to be a load-bearing semicolon. It's hard to tell with EEH.

Reviewed-by: Oliver O'Halloran

>
> Reported-by: Abaci Robot
> Signed-off-by: Yang Li
> ---
>  arch/powerpc/kernel/eeh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 813713c..02c058d 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -779,7 +779,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
>  	default:
>  		eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED, true);
>  		return -EINVAL;
> -	};
> +	}
>
>  	return 0;
>  }
> --
> 1.8.3.1
[PATCH] dma-mapping: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/platforms/ps3/system-bus.c:606:2-3: Unneeded semicolon
./arch/powerpc/platforms/ps3/system-bus.c:765:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/platforms/ps3/system-bus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c
index b431f41..5c73926 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -603,7 +603,7 @@ static dma_addr_t ps3_ioc0_map_page(struct device *_dev, struct page *page,
 	default:
 		/* not happned */
 		BUG();
-	};
+	}
 
 	result = ps3_dma_map(dev->d_region, (unsigned long)ptr, size,
 			     &bus_addr, iopte_flag);
@@ -762,7 +762,7 @@ int ps3_system_bus_device_register(struct ps3_system_bus_device *dev)
 		break;
 	default:
 		BUG();
-	};
+	}
 
 	dev->core.of_node = NULL;
 	set_dev_node(&dev->core, 0);
--
1.8.3.1
[PATCH] powerpc/64s: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/platforms/powernv/setup.c:160:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 4426a10..167d6ed 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -157,7 +157,7 @@ static void __init pnv_check_guarded_cores(void)
 	for_each_node_by_type(dn, "cpu") {
 		if (of_property_match_string(dn, "status", "bad") >= 0)
 			bad_count++;
-	};
+	}
 
 	if (bad_count) {
 		printk(" _ ___\n");
--
1.8.3.1
[PATCH] KVM: PPC: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/kvm/booke.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 288a982..f38ae3e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -698,7 +698,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 		kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
 		r = 1;
-	};
+	}
 
 	return r;
 }
--
1.8.3.1
[PATCH] powerpc/prom_init: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/kernel/prom_init.c:2990:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/kernel/prom_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index e9d4eb6..2d111bb 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2987,7 +2987,7 @@ static void __init fixup_device_tree_efika_add_phy(void)
 			" 0x3 encode-int encode+"
 			" s\" interrupts\" property"
 			" finish-device");
-	};
+	}
 
 	/* Check for a PHY device node - if missing then create one and
 	 * give it's phandle to the ethernet node */
--
1.8.3.1
[PATCH] powerpc/eeh: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/kernel/eeh.c:782:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 813713c..02c058d 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -779,7 +779,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
 	default:
 		eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED, true);
 		return -EINVAL;
-	};
+	}
 
 	return 0;
 }
--
1.8.3.1
[PATCH] crypto: powerpc: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/crypto/sha256-spe-glue.c:132:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 arch/powerpc/crypto/sha256-spe-glue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/crypto/sha256-spe-glue.c b/arch/powerpc/crypto/sha256-spe-glue.c
index a6e650a..ffedea7 100644
--- a/arch/powerpc/crypto/sha256-spe-glue.c
+++ b/arch/powerpc/crypto/sha256-spe-glue.c
@@ -129,7 +129,7 @@ static int ppc_spe_sha256_update(struct shash_desc *desc, const u8 *data,
 
 		src += bytes;
 		len -= bytes;
-	};
+	}
 
 	memcpy((char *)sctx->buf, src, len);
 	return 0;
--
1.8.3.1
[RFC PATCH 9/9] KVM: PPC: Book3S HV: Implement the rest of the P9 entry/exit handling in C
Almost all logic is moved to C, by introducing a new in_guest mode that
selects and branches very early in the interrupt handler to the P9 exit
code.

The remaining asm is a low level stack setup, with VCPU vs host
register save and restore. Only the GPRs that might be touched by the
compiler are handled in asm.

More consolidation could be made in future by inlining more of
__kvmhv_vcpu_entry_p9 into its callers and handling some bits there
perhaps, although there is something to be said for keeping the P9 and
old path looking similar, so that needs to be done carefully if at all.

Some bits are left out for now, ultravisor return for example. I
haven't added any loop to return quickly back into the guest, no simple
"realmode" hcalls, decrementer checks, etc. It remains to be seen
whether these would actually matter to add -- ISA v3.0 (radix, large
decrementer, etc) + XIVE should cut down on guest exits by a huge
amount, and of those that remain I would say a large portion of them
will need to go back out to Linux. Radix also makes this much more
light-weight because we are already in host MMU mode.

There are a few bits like LPID/LPCR/PID switching and XIVE pulling and
pushing which would still be desirable to avoid, so if it happens a lot
we could probably handle some simple common cases. I hope we won't have
to at all.
---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +-
 arch/powerpc/include/asm/kvm_asm.h        |   3 +-
 arch/powerpc/include/asm/kvm_book3s_64.h  |   2 +
 arch/powerpc/include/asm/kvm_ppc.h        |   2 +
 arch/powerpc/kvm/Makefile                 |   3 +
 arch/powerpc/kvm/book3s_64_entry.S        | 135 ++
 arch/powerpc/kvm/book3s_hv.c              |  21 ++-
 arch/powerpc/kvm/book3s_hv_interrupt.c    | 164 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 123 ++--
 arch/powerpc/kvm/book3s_xive.c            |  32 +
 10 files changed, 364 insertions(+), 123 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_interrupt.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index d0b832cbbec8..ba15ce78ffe5 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -171,7 +171,7 @@ void kvmhv_load_host_pmu(void);
 void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
 void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
 
-int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
+void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu);
 
 long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);
 long kvmppc_h_set_xdabr(struct kvm_vcpu *vcpu, unsigned long dabr,
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index a3633560493b..b4f9996bd331 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -146,7 +146,8 @@
 #define KVM_GUEST_MODE_GUEST	1
 #define KVM_GUEST_MODE_SKIP	2
 #define KVM_GUEST_MODE_GUEST_HV	3
-#define KVM_GUEST_MODE_HOST_HV	4
+#define KVM_GUEST_MODE_GUEST_HV_FAST	4 /* ISA v3.0 with host radix mode */
+#define KVM_GUEST_MODE_HOST_HV	5
 
 #define KVM_INST_FETCH_FAILED	-1
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9bb9bb370b53..7f08f6ed73df 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -153,6 +153,8 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu)
 	return radix;
 }
 
+int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
+
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0a056c64c317..1699e8ca96b1 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -670,6 +670,7 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
 			       int level, bool line_status);
 extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
+extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
 
 static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
 {
@@ -710,6 +711,7 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval) { retur
 static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
 				      int level, bool line_status) { return -ENODEV; }
 static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
 
 static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
 { return 0; }
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index cdd119028f64..b94be8c9bad1 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -43,6 +43,9 @@
[RFC PATCH 8/9] KVM: PPC: Book3S HV: Minimise hcall handler calling convention differences
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/exceptions-64s.S | 16 +++-
 arch/powerpc/kvm/book3s_64_entry.S   | 22 +++---
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index bed4499488b3..0844558f1d7c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1923,8 +1923,22 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100)
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 TRAMP_REAL_BEGIN(kvm_hcall)
+	std	r9,PACA_EXGEN+EX_R9(r13)
+	std	r11,PACA_EXGEN+EX_R11(r13)
+	std	r12,PACA_EXGEN+EX_R12(r13)
+	mfcr	r9
 	mfctr	r10
-	SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */
+	std	r10,PACA_EXGEN+EX_R13(r13)
+	li	r10,0
+	std	r10,PACA_EXGEN+EX_CFAR(r13)
+	std	r10,PACA_EXGEN+EX_CTR(r13)
+BEGIN_FTR_SECTION
+	mfspr	r10,SPRN_PPR
+	std	r10,PACA_EXGEN+EX_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	HMT_MEDIUM
+
 #ifdef CONFIG_RELOCATABLE
 	/*
 	 * Requires __LOAD_FAR_HANDLER beause kvmppc_hcall lives
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index 0d9e1e55c24d..0e39267aef63 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -9,24 +9,9 @@
 .global	kvmppc_hcall
 .balign	IFETCH_ALIGN_BYTES
 kvmppc_hcall:
-	/*
-	 * This is a hcall, so register convention is as
-	 * Documentation/powerpc/papr_hcalls.rst, with these additions:
-	 * R13		= PACA
-	 * guest R13 saved in SPRN_SCRATCH0
-	 * R10		= free
-	 */
-BEGIN_FTR_SECTION
-	mfspr	r10,SPRN_PPR
-	std	r10,HSTATE_PPR(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-	HMT_MEDIUM
-	mfcr	r10
-	std	r12,HSTATE_SCRATCH0(r13)
-	sldi	r12,r10,32
-	ori	r12,r12,0xc00
-	ld	r10,PACA_EXGEN+EX_R10(r13)
-	b	do_kvm_interrupt
+	ld	r10,PACA_EXGEN+EX_R13(r13)
+	SET_SCRATCH0(r10)
+	li	r10,0xc00
 
 .global	kvmppc_interrupt
 .balign	IFETCH_ALIGN_BYTES
@@ -57,7 +42,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r10,EX_R10(r11)
 	ld	r11,EX_R11(r11)
 
-do_kvm_interrupt:
 	/*
 	 * Hcalls and other interrupts come here after normalising register
 	 * contents and save locations:
--
2.23.0
[RFC PATCH 7/9] KVM: PPC: Book3S HV: move bad_host_intr check to HV handler
This is not used by PR KVM. Signed-off-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_64_entry.S | 3 --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +++- arch/powerpc/kvm/book3s_segment.S | 7 +++ 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S index 5db76c8d4012..0d9e1e55c24d 100644 --- a/arch/powerpc/kvm/book3s_64_entry.S +++ b/arch/powerpc/kvm/book3s_64_entry.S @@ -73,11 +73,8 @@ do_kvm_interrupt: beq maybe_skip no_skip: #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE - cmpwi r9,KVM_GUEST_MODE_HOST_HV - beq kvmppc_bad_host_intr #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE cmpwi r9,KVM_GUEST_MODE_GUEST - ld r9,HSTATE_SCRATCH2(r13) beq kvmppc_interrupt_pr #endif b kvmppc_interrupt_hv diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index b9c4acd747f7..8144c1403203 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1251,6 +1251,7 @@ hdec_soon: kvmppc_interrupt_hv: /* * Register contents: +* R9 = HSTATE_IN_GUEST * R12 = (guest CR << 32) | interrupt vector * R13 = PACA * guest R12 saved in shadow VCPU SCRATCH0 @@ -1258,6 +1259,8 @@ kvmppc_interrupt_hv: * guest R9 saved in HSTATE_SCRATCH2 */ /* We're now back in the host but in guest MMU context */ + cmpwi r9,KVM_GUEST_MODE_HOST_HV + beq kvmppc_bad_host_intr li r9, KVM_GUEST_MODE_HOST_HV stb r9, HSTATE_IN_GUEST(r13) @@ -3245,7 +3248,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST) * cfar is saved in HSTATE_CFAR(r13) * ppr is saved in HSTATE_PPR(r13) */ -.global kvmppc_bad_host_intr kvmppc_bad_host_intr: /* * Switch to the emergency stack, but start half-way down in diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index 1f492aa4c8d6..ef1d88b869bf 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -167,8 +167,15 @@ kvmppc_interrupt_pr: * R12 = (guest CR << 32) | exit handler id * R13 = PACA * HSTATE.SCRATCH0 = guest R12 +* +* If HV is possible, additionally: +* R9 = HSTATE_IN_GUEST +* HSTATE.SCRATCH2 = guest R9 */ #ifdef CONFIG_PPC64 +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + ld r9,HSTATE_SCRATCH2(r13) +#endif /* Match 32-bit entry */ rotldi r12, r12, 32 /* Flip R12 halves for stw */ stw r12, HSTATE_SCRATCH1(r13) /* CR is now in the low half */ -- 2.23.0
[RFC PATCH 6/9] KVM: PPC: Book3S HV: Move interrupt early register setup to KVM
Like the earlier patch for hcalls, KVM interrupt entry requires a different calling convention than the Linux interrupt handlers set up. Move the code that converts from one to the other into KVM. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 126 --- arch/powerpc/kvm/book3s_64_entry.S | 37 ++-- 2 files changed, 50 insertions(+), 113 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c893801eb1d5..bed4499488b3 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -191,7 +191,6 @@ do_define_int n .endif .endm -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER /* * All interrupts which set HSRR registers, as well as SRESET and MCE and * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken, @@ -224,54 +223,25 @@ do_define_int n * to KVM to handle. */ -.macro KVMTEST name +.macro KVMTEST name handler +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER lbz r10,HSTATE_IN_GUEST(r13) cmpwi r10,0 - bne \name\()_kvm -.endm - -.macro GEN_KVM name - .balign IFETCH_ALIGN_BYTES -\name\()_kvm: - -BEGIN_FTR_SECTION - ld r10,IAREA+EX_CFAR(r13) - std r10,HSTATE_CFAR(r13) -END_FTR_SECTION_IFSET(CPU_FTR_CFAR) - - ld r10,IAREA+EX_CTR(r13) - mtctr r10 -BEGIN_FTR_SECTION - ld r10,IAREA+EX_PPR(r13) - std r10,HSTATE_PPR(r13) -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) - ld r11,IAREA+EX_R11(r13) - ld r12,IAREA+EX_R12(r13) - std r12,HSTATE_SCRATCH0(r13) - sldir12,r9,32 - ld r9,IAREA+EX_R9(r13) - ld r10,IAREA+EX_R10(r13) /* HSRR variants have the 0x2 bit added to their trap number */ .if IHSRR_IF_HVMODE BEGIN_FTR_SECTION - ori r12,r12,(IVEC + 0x2) + li r10,(IVEC + 0x2) FTR_SECTION_ELSE - ori r12,r12,(IVEC) + li r10,(IVEC) ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) .elseif IHSRR - ori r12,r12,(IVEC+ 0x2) + li r10,(IVEC + 0x2) .else - ori r12,r12,(IVEC) + li r10,(IVEC) .endif - b kvmppc_interrupt -.endm - -#else -.macro KVMTEST name -.endm -.macro GEN_KVM name -.endm + bne \handler #endif +.endm /* * This is the BOOK3S interrupt entry code macro. @@ -413,7 +383,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) DEFINE_FIXED_SYMBOL(\name\()_common_real) \name\()_common_real: .if IKVM_REAL - KVMTEST \name + KVMTEST \name kvm_interrupt .endif ld r10,PACAKMSR(r13) /* get MSR value for kernel */ @@ -436,7 +406,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) DEFINE_FIXED_SYMBOL(\name\()_common_virt) \name\()_common_virt: .if IKVM_VIRT - KVMTEST \name + KVMTEST \name kvm_interrupt 1: .endif .endif /* IVIRT */ @@ -450,7 +420,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) DEFINE_FIXED_SYMBOL(\name\()_common_real) \name\()_common_real: .if IKVM_REAL - KVMTEST \name + KVMTEST \name kvm_interrupt .endif .endm @@ -1011,8 +981,6 @@ EXC_COMMON_BEGIN(system_reset_common) EXCEPTION_RESTORE_REGS RFI_TO_USER_OR_KERNEL - GEN_KVM system_reset - /** * Interrupt 0x200 - Machine Check Interrupt (MCE). @@ -1196,7 +1164,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) /* * Check if we are coming from guest. If yes, then run the normal * exception handler which will take the -* machine_check_kvm->kvmppc_interrupt branch to deliver the MC event +* machine_check_kvm->kvm_interrupt branch to deliver the MC event * to guest. 
*/ lbz r11,HSTATE_IN_GUEST(r13) @@ -1267,8 +1235,6 @@ EXC_COMMON_BEGIN(machine_check_common) bl machine_check_exception b interrupt_return - GEN_KVM machine_check - #ifdef CONFIG_PPC_P7_NAP /* @@ -1393,8 +1359,6 @@ MMU_FTR_SECTION_ELSE b handle_page_fault ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) - GEN_KVM data_access - /** * Interrupt 0x380 - Data Segment Interrupt (DSLB). @@ -1449,8 +1413,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) bl do_bad_slb_fault b interrupt_return - GEN_KVM data_access_slb - /** * Interrupt 0x400 - Instruction Storage Interrupt (ISI). @@ -1489,8 +1451,6 @@ MMU_FTR_SECTION_ELSE b handle_page_fault ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) - GEN_KVM instruction_access - /** * Interrupt 0x480 - Instruction Segment Interrupt (ISLB). @@ -1540,8 +1500,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) bl do_bad_slb_fault b interrupt_return -
[RFC PATCH 5/9] powerpc/64s: Remove EXSLB interrupt save area
SLB faults should not be taken while the PACA save areas are live, so EXSLB is not required. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/paca.h | 3 +-- arch/powerpc/kernel/asm-offsets.c| 1 - arch/powerpc/kernel/exceptions-64s.S | 5 - 3 files changed, 1 insertion(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 9454d29ff4b4..be0b00cb9fbb 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -108,8 +108,7 @@ struct paca_struct { */ /* used for most interrupts/exceptions */ u64 exgen[EX_SIZE] __attribute__((aligned(0x80))); - u64 exslb[EX_SIZE]; /* used for SLB/segment table misses -* on the linear mapping */ + /* SLB related definitions */ u16 vmalloc_sllp; u8 slb_cache_ptr; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 489a22cf1a92..51302fb74d14 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -255,7 +255,6 @@ int main(void) #endif /* CONFIG_PPC_MM_SLICES */ OFFSET(PACA_EXGEN, paca_struct, exgen); OFFSET(PACA_EXMC, paca_struct, exmc); - OFFSET(PACA_EXSLB, paca_struct, exslb); OFFSET(PACA_EXNMI, paca_struct, exnmi); #ifdef CONFIG_PPC_PSERIES OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index e964ea5149e8..c893801eb1d5 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1412,13 +1412,9 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) * on user-handler data structures. * * KVM: Same as 0x300, DSLB must test for KVM guest. - * - * A dedicated save area EXSLB is used (XXX: but it actually need not be - * these days, we could use EXGEN). */ INT_DEFINE_BEGIN(data_access_slb) IVEC=0x380 - IAREA=PACA_EXSLB IRECONCILE=0 IDAR=1 IKVM_REAL=1 @@ -1507,7 +1503,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) */ INT_DEFINE_BEGIN(instruction_access_slb) IVEC=0x480 - IAREA=PACA_EXSLB IRECONCILE=0 IISIDE=1 IDAR=1 -- 2.23.0
[RFC PATCH 4/9] KVM: PPC: Book3S HV: Move hcall early register setup to KVM
System calls / hcalls have a different calling convention than other interrupts, so there is code in the KVMTEST to massage these into the same form as other interrupt handlers. Move this work into the KVM hcall handler. This means teaching KVM a little more about the low level interrupt handler setup, PACA save areas, etc., although that's not obviously worse than the current approach of coming up with an entirely different interrupt register / save convention. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 13 +++ arch/powerpc/kernel/exceptions-64s.S | 44 ++-- arch/powerpc/kvm/book3s_64_entry.S | 17 + 3 files changed, 32 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 1d32b174ab6a..7e5168b83155 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -35,6 +35,19 @@ /* PACA save area size in u64 units (exgen, exmc, etc) */ #define EX_SIZE10 +/* PACA save area offsets */ +#define EX_R9 0 +#define EX_R10 8 +#define EX_R11 16 +#define EX_R12 24 +#define EX_R13 32 +#define EX_DAR 40 +#define EX_DSISR 48 +#define EX_CCR 52 +#define EX_CFAR56 +#define EX_PPR 64 +#define EX_CTR 72 + /* * maximum recursive depth of MCE exceptions */ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c25395b5921a..e964ea5149e8 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -21,22 +21,6 @@ #include #include -/* PACA save area offsets (exgen, exmc, etc) */ -#define EX_R9 0 -#define EX_R10 8 -#define EX_R11 16 -#define EX_R12 24 -#define EX_R13 32 -#define EX_DAR 40 -#define EX_DSISR 48 -#define EX_CCR 52 -#define EX_CFAR56 -#define EX_PPR 64 -#define EX_CTR 72 -.if EX_SIZE != 10 - .error "EX_SIZE is wrong" -.endif - /* * Following are fixed section helper macros. * @@ -2000,45 +1984,21 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100) #ifdef CONFIG_KVM_BOOK3S_64_HANDLER TRAMP_REAL_BEGIN(system_call_kvm) - /* -* This is a hcall, so register convention is as above, with these -* differences: -* r13 = PACA -* ctr = orig r13 -* orig r10 saved in PACA -*/ -/* - * Save the PPR (on systems that support it) before changing to - * HMT_MEDIUM. That allows the KVM code to save that value into the - * guest state (it is the guest's PPR value). - */ -BEGIN_FTR_SECTION - mfspr r10,SPRN_PPR - std r10,HSTATE_PPR(r13) -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) - HMT_MEDIUM mfctr r10 - SET_SCRATCH0(r10) - mfcrr10 - std r12,HSTATE_SCRATCH0(r13) - sldir12,r10,32 - ori r12,r12,0xc00 + SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */ #ifdef CONFIG_RELOCATABLE /* -* Requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives +* Requires __LOAD_FAR_HANDLER beause kvmppc_hcall lives * outside the head section. */ __LOAD_FAR_HANDLER(r10, kvmppc_hcall) mtctr r10 - ld r10,PACA_EXGEN+EX_R10(r13) bctr #else - ld r10,PACA_EXGEN+EX_R10(r13) b kvmppc_hcall #endif #endif - /** * Interrupt 0xd00 - Trace Interrupt. 
* This is a synchronous interrupt in response to instruction step or diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S index 3b894b90862f..df074d33f6d7 100644 --- a/arch/powerpc/kvm/book3s_64_entry.S +++ b/arch/powerpc/kvm/book3s_64_entry.S @@ -12,6 +12,23 @@ .globalkvmppc_hcall .balign IFETCH_ALIGN_BYTES kvmppc_hcall: + /* +* This is a hcall, so register convention is as +* Documentation/powerpc/papr_hcalls.rst, with these additions: +* R13 = PACA +* guest R13 saved in SPRN_SCRATCH0 +* R10 = free +*/ +BEGIN_FTR_SECTION + mfspr r10,SPRN_PPR + std r10,HSTATE_PPR(r13) +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) + HMT_MEDIUM + mfcrr10 + std r12,HSTATE_SCRATCH0(r13) + sldir12,r10,32 + ori r12,r12,0xc00 + ld r10,PACA_EXGEN+EX_R10(r13) .globalkvmppc_interrupt .balign IFETCH_ALIGN_BYTES -- 2.23.0
[RFC PATCH 3/9] KVM: PPC: Book3S 64: add hcall interrupt handler
Add a separate hcall entry point. This can be used to deal with the different calling convention. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 4 ++-- arch/powerpc/kvm/book3s_64_entry.S | 4 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index e6f7fc7c61a1..c25395b5921a 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -2028,13 +2028,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) * Requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives * outside the head section. */ - __LOAD_FAR_HANDLER(r10, kvmppc_interrupt) + __LOAD_FAR_HANDLER(r10, kvmppc_hcall) mtctr r10 ld r10,PACA_EXGEN+EX_R10(r13) bctr #else ld r10,PACA_EXGEN+EX_R10(r13) - b kvmppc_interrupt + b kvmppc_hcall #endif #endif diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S index 8e7216f3c3ee..3b894b90862f 100644 --- a/arch/powerpc/kvm/book3s_64_entry.S +++ b/arch/powerpc/kvm/book3s_64_entry.S @@ -9,6 +9,10 @@ /* * We come here from the first-level interrupt handlers. */ +.globalkvmppc_hcall +.balign IFETCH_ALIGN_BYTES +kvmppc_hcall: + .globalkvmppc_interrupt .balign IFETCH_ALIGN_BYTES kvmppc_interrupt: -- 2.23.0
[RFC PATCH 2/9] KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM internal detail that has no real need to be in common handlers. (XXX: Need to confirm CBE handlers etc) Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 64 arch/powerpc/kvm/book3s_64_entry.S | 50 ++ 2 files changed, 42 insertions(+), 72 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 65659ea3cec4..e6f7fc7c61a1 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -133,7 +133,6 @@ name: #define IBRANCH_TO_COMMON .L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch to common */ #define IREALMODE_COMMON .L_IREALMODE_COMMON_\name\() /* Common runs in realmode */ #define IMASK .L_IMASK_\name\() /* IRQ soft-mask bit */ -#define IKVM_SKIP .L_IKVM_SKIP_\name\() /* Generate KVM skip handler */ #define IKVM_REAL .L_IKVM_REAL_\name\() /* Real entry tests KVM */ #define __IKVM_REAL(name) .L_IKVM_REAL_ ## name #define IKVM_VIRT .L_IKVM_VIRT_\name\() /* Virt entry tests KVM */ @@ -191,9 +190,6 @@ do_define_int n .ifndef IMASK IMASK=0 .endif - .ifndef IKVM_SKIP - IKVM_SKIP=0 - .endif .ifndef IKVM_REAL IKVM_REAL=0 .endif @@ -254,15 +250,10 @@ do_define_int n .balign IFETCH_ALIGN_BYTES \name\()_kvm: - .if IKVM_SKIP - cmpwi r10,KVM_GUEST_MODE_SKIP - beq 89f - .else BEGIN_FTR_SECTION ld r10,IAREA+EX_CFAR(r13) std r10,HSTATE_CFAR(r13) END_FTR_SECTION_IFSET(CPU_FTR_CFAR) - .endif ld r10,IAREA+EX_CTR(r13) mtctr r10 @@ -289,27 +280,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ori r12,r12,(IVEC) .endif b kvmppc_interrupt - - .if IKVM_SKIP -89:mtocrf 0x80,r9 - ld r10,IAREA+EX_CTR(r13) - mtctr r10 - ld r9,IAREA+EX_R9(r13) - ld r10,IAREA+EX_R10(r13) - ld r11,IAREA+EX_R11(r13) - ld r12,IAREA+EX_R12(r13) - .if IHSRR_IF_HVMODE - BEGIN_FTR_SECTION - b kvmppc_skip_Hinterrupt - FTR_SECTION_ELSE - b kvmppc_skip_interrupt - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif IHSRR - b kvmppc_skip_Hinterrupt - .else - b kvmppc_skip_interrupt - .endif - .endif .endm #else @@ -1128,7 +1098,6 @@ INT_DEFINE_BEGIN(machine_check) ISET_RI=0 IDAR=1 IDSISR=1 - IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(machine_check) @@ -1419,7 +1388,6 @@ INT_DEFINE_BEGIN(data_access) IVEC=0x300 IDAR=1 IDSISR=1 - IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(data_access) @@ -1469,7 +1437,6 @@ INT_DEFINE_BEGIN(data_access_slb) IAREA=PACA_EXSLB IRECONCILE=0 IDAR=1 - IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(data_access_slb) @@ -2116,7 +2083,6 @@ INT_DEFINE_BEGIN(h_data_storage) IHSRR=1 IDAR=1 IDSISR=1 - IKVM_SKIP=1 IKVM_REAL=1 IKVM_VIRT=1 INT_DEFINE_END(h_data_storage) @@ -2573,7 +2539,6 @@ EXC_VIRT_NONE(0x5100, 0x100) INT_DEFINE_BEGIN(cbe_system_error) IVEC=0x1200 IHSRR=1 - IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(cbe_system_error) @@ -2598,7 +2563,6 @@ EXC_VIRT_NONE(0x5200, 0x100) INT_DEFINE_BEGIN(instruction_breakpoint) IVEC=0x1300 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE - IKVM_SKIP=1 IKVM_REAL=1 #endif INT_DEFINE_END(instruction_breakpoint) @@ -2744,7 +2708,6 @@ EXC_COMMON_BEGIN(denorm_exception_common) INT_DEFINE_BEGIN(cbe_maintenance) IVEC=0x1600 IHSRR=1 - IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(cbe_maintenance) @@ -2797,7 +2760,6 @@ EXC_COMMON_BEGIN(altivec_assist_common) INT_DEFINE_BEGIN(cbe_thermal) IVEC=0x1800 IHSRR=1 - IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(cbe_thermal) @@ -3081,32 +3043,6 @@ EXPORT_SYMBOL(do_uaccess_flush) MASKED_INTERRUPT MASKED_INTERRUPT hsrr=1 -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER -kvmppc_skip_interrupt: - /* -* Here all GPRs are unchanged 
from when the interrupt happened -* except for r13, which is saved in SPRG_SCRATCH0. -*/ - mfspr r13, SPRN_SRR0 - addir13, r13, 4 - mtspr SPRN_SRR0, r13 - GET_SCRATCH0(r13) - RFI_TO_KERNEL - b . - -kvmppc_skip_Hinterrupt: - /* -* Here all GPRs are unchanged from when the interrupt happened -* except for r13, which is saved in SPRG_SCRATCH0. -*/ - mfspr r13, SPRN_HSRR0 - addir13, r13, 4 - mtspr SPRN_HSRR0, r13 - GET_SCRATCH0(r13) - HRFI_TO_KERNEL - b . -#endif -
[RFC PATCH 1/9] KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point
Rather than bifurcate the call depending on whether or not HV is possible, and have the HV entry test for PR, just make a single common point which does the demultiplexing. This makes it simpler to add another type of exit handler. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S| 8 +- arch/powerpc/kvm/Makefile | 3 +++ arch/powerpc/kvm/book3s_64_entry.S | 34 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++-- 4 files changed, 40 insertions(+), 16 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_64_entry.S diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index e02ad6fefa46..65659ea3cec4 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -212,7 +212,6 @@ do_define_int n .endm #ifdef CONFIG_KVM_BOOK3S_64_HANDLER -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* * All interrupts which set HSRR registers, as well as SRESET and MCE and * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken, @@ -242,13 +241,8 @@ do_define_int n /* * If an interrupt is taken while a guest is running, it is immediately routed - * to KVM to handle. If both HV and PR KVM arepossible, KVM interrupts go first - * to kvmppc_interrupt_hv, which handles the PR guest case. + * to KVM to handle. */ -#define kvmppc_interrupt kvmppc_interrupt_hv -#else -#define kvmppc_interrupt kvmppc_interrupt_pr -#endif .macro KVMTEST name lbz r10,HSTATE_IN_GUEST(r13) diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 2bfeaa13befb..cdd119028f64 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -59,6 +59,9 @@ kvm-pr-y := \ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \ tm.o +kvm-book3s_64-builtin-objs-y += \ + book3s_64_entry.o + ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \ book3s_rmhandlers.o diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S new file mode 100644 index ..22e34b95f478 --- /dev/null +++ b/arch/powerpc/kvm/book3s_64_entry.S @@ -0,0 +1,34 @@ +#include +#include +#include +#include +#include +#include + +/* + * We come here from the first-level interrupt handlers. 
+ */ +.globalkvmppc_interrupt +.balign IFETCH_ALIGN_BYTES +kvmppc_interrupt: + /* +* Register contents: +* R12 = (guest CR << 32) | interrupt vector +* R13 = PACA +* guest R12 saved in shadow VCPU SCRATCH0 +* guest R13 saved in SPRN_SCRATCH0 +*/ +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + std r9, HSTATE_SCRATCH2(r13) + lbz r9, HSTATE_IN_GUEST(r13) + cmpwi r9, KVM_GUEST_MODE_HOST_HV + beq kvmppc_bad_host_intr +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE + cmpwi r9, KVM_GUEST_MODE_GUEST + ld r9, HSTATE_SCRATCH2(r13) + beq kvmppc_interrupt_pr +#endif + b kvmppc_interrupt_hv +#else + b kvmppc_interrupt_pr +#endif diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 8cf1f69f442e..b9c4acd747f7 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1255,16 +1255,8 @@ kvmppc_interrupt_hv: * R13 = PACA * guest R12 saved in shadow VCPU SCRATCH0 * guest R13 saved in SPRN_SCRATCH0 +* guest R9 saved in HSTATE_SCRATCH2 */ - std r9, HSTATE_SCRATCH2(r13) - lbz r9, HSTATE_IN_GUEST(r13) - cmpwi r9, KVM_GUEST_MODE_HOST_HV - beq kvmppc_bad_host_intr -#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE - cmpwi r9, KVM_GUEST_MODE_GUEST - ld r9, HSTATE_SCRATCH2(r13) - beq kvmppc_interrupt_pr -#endif /* We're now back in the host but in guest MMU context */ li r9, KVM_GUEST_MODE_HOST_HV stb r9, HSTATE_IN_GUEST(r13) @@ -3253,6 +3245,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST) * cfar is saved in HSTATE_CFAR(r13) * ppr is saved in HSTATE_PPR(r13) */ +.global kvmppc_bad_host_intr kvmppc_bad_host_intr: /* * Switch to the emergency stack, but start half-way down in -- 2.23.0
[RFC PATCH 0/9] KVM: PPC: Book3S: C-ify the P9 entry/exit code
This series goes on top of the KVM patches I sent for the next merge window. It's pretty rough and incomplete at the moment but has started booting a simple guest into busybox in sim. One of the main points of this is to avoid running KVM in a special mode (aka "realmode") which is hostile to the rest of Linux and can't be instrumented well by core facilities or generally use core code. The asm itself is also tricky to follow. It's nothing compared with the old HV path when you sit down and go through it, but it's not trivial and sharing code paths makes things painful too. One issue is what to do about PR-KVM and pre-ISAv3.0 / HPT HV-KVM. The old path could also be converted similarly to C, although that's a far bigger job. At least removing the asm code sharing makes a (slight) simplification to the old path for now. This change is a pretty clean break in the low level exit/entry code, and the rest of the code already has many divergences, so I don't think this exacerbates the problem, but if those PR / old-HV are to be maintained well then we should eat the cost early to modernise that code too. If they stay walled off with these interface shims then they'll just be harder to maintain and the problem will definitely not get better. Now, the number of people who understand PR-KVM and have time to maintain it is roughly zero, and for the old HV path it's approaching zero. The radix, LPAR-per-thread programming model by comparison is so nice and simple, maybe its better to just keep the rest on life support and keep them going for as long as we can with the bare minimum. Thanks, Nick Nicholas Piggin (9): KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM KVM: PPC: Book3S 64: add hcall interrupt handler KVM: PPC: Book3S HV: Move hcall early register setup to KVM powerpc/64s: Remove EXSLB interrupt save area KVM: PPC: Book3S HV: Move interrupt early register setup to KVM KVM: PPC: Book3S HV: move bad_host_intr check to HV handler KVM: PPC: Book3S HV: Minimise hcall handler calling convention differences KVM: PPC: Book3S HV: Implement the rest of the P9 entry/exit handling in C arch/powerpc/include/asm/asm-prototypes.h | 2 +- arch/powerpc/include/asm/exception-64s.h | 13 ++ arch/powerpc/include/asm/kvm_asm.h| 3 +- arch/powerpc/include/asm/kvm_book3s_64.h | 2 + arch/powerpc/include/asm/kvm_ppc.h| 2 + arch/powerpc/include/asm/paca.h | 3 +- arch/powerpc/kernel/asm-offsets.c | 1 - arch/powerpc/kernel/exceptions-64s.S | 259 +++--- arch/powerpc/kvm/Makefile | 6 + arch/powerpc/kvm/book3s_64_entry.S| 232 +++ arch/powerpc/kvm/book3s_hv.c | 21 +- arch/powerpc/kvm/book3s_hv_interrupt.c| 164 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 136 ++-- arch/powerpc/kvm/book3s_segment.S | 7 + arch/powerpc/kvm/book3s_xive.c| 32 +++ 15 files changed, 523 insertions(+), 360 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_64_entry.S create mode 100644 arch/powerpc/kvm/book3s_hv_interrupt.c -- 2.23.0
Re: [PATCH net v2] ibmvnic: device remove has higher precedence over reset
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Thu, 28 Jan 2021 22:34:01 -0600 you wrote: > Returning -EBUSY in ibmvnic_remove() does not actually hold the > removal procedure since driver core doesn't care for the return > value (see __device_release_driver() in drivers/base/dd.c > calling dev->bus->remove()) though vio_bus_remove > (in arch/powerpc/platforms/pseries/vio.c) records the > return value and passes it on. [1] > > [...] Here is the summary with links: - [net,v2] ibmvnic: device remove has higher precedence over reset https://git.kernel.org/netdev/net/c/5e9eff5dfa46 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH] ASoC: fsl_spdif: Utilize the defined parameter to clear code
On Thu, Jan 28, 2021 at 7:28 PM Tang Bin wrote: > > Utilize the defined parameter 'dev' to make the code cleaner. > > Signed-off-by: Tang Bin Acked-by: Shengjiu Wang
Re: [PATCH] ASoC: fsl_xcvr: remove unneeded semicolon
On Mon, Feb 1, 2021 at 4:08 PM Yang Li wrote: > > Eliminate the following coccicheck warning: > ./sound/soc/fsl/fsl_xcvr.c:739:2-3: Unneeded semicolon > > Reported-by: Abaci Robot > Signed-off-by: Yang Li Acked-by: Shengjiu Wang
Re: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
On Mon Feb 1, 2021 at 10:54 AM CST, Gabriel Paubert wrote: > On Mon, Feb 01, 2021 at 09:55:44AM -0600, Christopher M. Riedl wrote: > > On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote: > > > From: Christopher M. Riedl > > > > Sent: 28 January 2021 04:04 > > > > > > > > Reuse the "safe" implementation from signal.c except for calling > > > > unsafe_copy_from_user() to copy into a local buffer. > > > > > > > > Signed-off-by: Christopher M. Riedl > > > > --- > > > > arch/powerpc/kernel/signal.h | 33 + > > > > 1 file changed, 33 insertions(+) > > > > > > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > > > > index 2559a681536e..c18402d625f1 100644 > > > > --- a/arch/powerpc/kernel/signal.h > > > > +++ b/arch/powerpc/kernel/signal.h > > > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct > > > > task_struct *task, void __user *from); > > > > [i], label);\ > > > > } while (0) > > > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { > > > > \ > > > > + struct task_struct *__t = task; > > > > \ > > > > + u64 __user *__f = (u64 __user *)from; > > > > \ > > > > + u64 buf[ELF_NFPREG]; > > > > \ > > > > > > How big is that buffer? > > > Isn't is likely to be reasonably large compared to a reasonable > > > kernel stack frame. > > > Especially since this isn't even a leaf function. > > > > > > > I think Christophe answered this - I don't really have an opinion either > > way. What would be a 'reasonable' kernel stack frame for reference? > > See include/linux/poll.h, where the limit is of the order of 800 bytes > and the number of entries in an on stack array is chosen at compile time > (different between 32 and 64 bit for example). > > The values are used in do_sys_poll, which, with almost 1000 bytes of > stack footprint, appears close to the top of "make checkstack". In > addition do_sys_poll has to call the ->poll function of every file > descriptor in its table, so it is not a tail function. > > This 264 bytes array looks reasonable, but please use 'make checkstack' > to verify that the function's total stack usage stays within reason. Neat, looks like total usage is a bit larger but still reasonable and less than half of 800B: 0xc0017e900 __unsafe_restore_sigcontext.constprop.0 [vmlinux]:352 Thanks for the tip! > > Gabriel > > > > > > > + int i; > > > > \ > > > > + > > > > \ > > > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double), > > > > \ > > > > > > That really ought to be sizeof(buf). > > > > > > > Agreed, I will fix this. Thanks! 
> > > > > David > > > > > > > > > > + label); > > > > \ > > > > + for (i = 0; i < ELF_NFPREG - 1; i++) > > > > \ > > > > + __t->thread.TS_FPR(i) = buf[i]; > > > > \ > > > > + __t->thread.fp_state.fpscr = buf[i]; > > > > \ > > > > +} while (0) > > > > + > > > > +#define unsafe_copy_vsx_from_user(task, from, label) do { > > > > \ > > > > + struct task_struct *__t = task; > > > > \ > > > > + u64 __user *__f = (u64 __user *)from; > > > > \ > > > > + u64 buf[ELF_NVSRHALFREG]; > > > > \ > > > > + int i; > > > > \ > > > > + > > > > \ > > > > + unsafe_copy_from_user(buf, __f, > > > > \ > > > > + ELF_NVSRHALFREG * sizeof(double), > > > > \ > > > > + label); > > > > \ > > > > + for (i = 0; i < ELF_NVSRHALFREG ; i++) > > > > \ > > > > + __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; > > > > \ > > > > +} while (0) > > > > + > > > > + > > > > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > > > > #define unsafe_copy_ckfpr_to_user(to, task, label) do { > > > > \ > > > > struct task_struct *__t = task; > > > > \ > > > > @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct > > > > task_struct *task, void __user *from); > > > > unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,\ > > > > ELF_NFPREG * sizeof(double), label) > > > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) > > > > \ > > > > +
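A side note on the arithmetic in the thread above: ELF_NFPREG is 33 on powerpc (32 FPRs plus FPSCR), which is where the quoted 264-byte figure comes from. A trivial standalone check, with the define copied here only for illustration:

#include <stdint.h>

#define ELF_NFPREG 33   /* 32 floating-point registers + fpscr */

int main(void)
{
        uint64_t buf[ELF_NFPREG];       /* the on-stack buffer in question */

        /* 33 * sizeof(u64) = 264 bytes of stack */
        _Static_assert(sizeof(buf) == 264, "264 bytes on stack");
        (void)buf;
        return 0;
}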
[PATCH] powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64 semantics
Tested on powerpc64 and powerpc64le, with a glibc build and running the affected glibc testcase[2], and inspected that glibc's backtrace() now gives the correct result and gdb backtrace also keeps working as before. I believe this should be backported to releases 5.9 and 5.10 as userspace is affected in these releases. 8< A change[1] in the __kernel_sigtramp_rt64 VDSO and trampoline code introduced a regression in the way glibc's backtrace()[2] detects the signal-handler stack frame. Apart from the practical implications, __kernel_sigtramp_rt64 was a VDSO symbol with the semantics that it is a function you can call from userspace to end signal handling. Now these semantics are no longer valid. I believe the aforementioned change affects all releases since 5.9. This patch tries to fix both the semantics and the practical aspect of __kernel_sigtramp_rt64 by returning it to the previous code, whilst keeping the intended behavior from [1] by adding a new symbol to serve as the jump target from the kernel to the trampoline. Now the trampoline has two parts, a new entry point and the old return point. [1] commit 0138ba5783ae0dcc799ad401a1e8ac8333790df9 ("powerpc/64/signal: Balance return predictor stack in signal trampoline") [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html Fixes: 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline") Signed-off-by: Raoni Fassina Firmino --- arch/powerpc/kernel/vdso64/sigtramp.S | 9 ++++++++- arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S index bbf68cd01088..f0fd8d2a9fc4 100644 --- a/arch/powerpc/kernel/vdso64/sigtramp.S +++ b/arch/powerpc/kernel/vdso64/sigtramp.S @@ -15,11 +15,18 @@ .text +/* __kernel_start_sigtramp_rt64 and __kernel_sigtramp_rt64 together + are one function split in two parts. The kernel jumps to the former + and the signal handler indirectly (by blr) returns to the latter. + __kernel_sigtramp_rt64 needs to point to the return address so + glibc can correctly identify the trampoline stack frame. */ .balign 8 .balign IFETCH_ALIGN_BYTES -V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) +V_FUNCTION_BEGIN(__kernel_start_sigtramp_rt64) .Lsigrt_start: bctrl /* call the handler */ +V_FUNCTION_END(__kernel_start_sigtramp_rt64) +V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) addi r1, r1, __SIGNAL_FRAMESIZE li r0,__NR_rt_sigreturn sc diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S index 6164d1a1ba11..2f3c359cacd3 100644 --- a/arch/powerpc/kernel/vdso64/vdso64.lds.S +++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S @@ -131,4 +131,4 @@ VERSION /* * Make the sigreturn code visible to the kernel. */ -VDSO_sigtramp_rt64 = __kernel_sigtramp_rt64; +VDSO_sigtramp_rt64 = __kernel_start_sigtramp_rt64; base-commit: 76c057c84d286140c6c416c3b4ba832cd1d8984e -- 2.26.2
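The failure mode is easy to exercise from userspace. Below is a minimal reproducer sketch in the spirit of the glibc testcase in [2] (this is not the actual testcase); with the regression, the frames of the interrupted code are missing from the output.

#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void handler(int sig)
{
        void *frames[32];
        int n = backtrace(frames, 32);

        /* With a correct sigtramp marker in the VDSO, the frames of
         * the interrupted code (main etc.) appear below the handler. */
        backtrace_symbols_fd(frames, n, STDOUT_FILENO);
        (void)sig;
}

int main(void)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigaction(SIGUSR1, &sa, NULL);
        raise(SIGUSR1);
        return 0;
}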
Re: [PATCH v15 09/10] arm64: Call kmalloc() to allocate DTB buffer
Joe Perches writes: > On Thu, 2021-01-28 at 00:52 -0300, Thiago Jung Bauermann wrote: >> The problem is that this patch implements only part of the suggestion, >> which isn't useful in itself. So the patch series should either drop >> this patch or consolidate the FDT allocation between the arches. >> >> I just tested on powernv and pseries platforms and powerpc can use >> vmalloc for the FDT buffer. > > Perhaps more sensible to use kvmalloc/kvfree. That's true. Converting both arm64 and powerpc to kvmalloc/kvfree is a good option. I don't think it's that much better though, because kexec_file_load() is called infrequently and doesn't need to be fast, so the vmalloc() overhead isn't important in practice. -- Thiago Jung Bauermann IBM Linux Technology Center
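For reference, the suggested kvmalloc() pattern would look roughly like this at the allocation site (a minimal sketch, not the actual kexec_file code): kvmalloc() tries kmalloc() first and falls back to vmalloc() for larger allocations, and kvfree() handles either case.

#include <linux/mm.h>
#include <linux/slab.h>

/* Sketch: allocate and free an FDT buffer with the fallback allocator. */
static void *fdt_buf_alloc(size_t size)
{
        return kvmalloc(size, GFP_KERNEL);
}

static void fdt_buf_free(void *fdt)
{
        kvfree(fdt);
}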
RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
On Mon Feb 1, 2021 at 11:37 AM CST, David Laight wrote: > From: Christopher M. Riedl > > Sent: 01 February 2021 16:55 > ... > > > > > > + int i; > > > > > > \ > > > > > > + > > > > > > \ > > > > > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double), > > > > > > \ > > > > > > + label); > > > > > > \ > > > > > > + for (i = 0; i < ELF_NFPREG - 1; i++) > > > > > > \ > > > > > > + __t->thread.TS_FPR(i) = buf[i]; > > > > > > \ > > > > > > + __t->thread.fp_state.fpscr = buf[i]; > > > > > > \ > > > > > > +} while (0) > > > > > > On further reflection, since you immediately loop through the buffer > > > why not just use user_access_begin() and unsafe_get_user() in the loop. > > > > Christophe had suggested this a few revisions ago as well. When I tried > > this approach, the signal handling performance took a pretty big hit: > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html > > > > I included some numbers on v3 as well but decided to drop the approach > > altogether for this one since it just didn't seem worth the hit. > > Was that using unsafe_get_user (which relies on user_access_begin() > having 'opened up' user accesses) or just get_user() that does > it for every access? > > The former should be ok, the latter will be horrid. It was using unsafe_get_user() whereas unsafe_copy_from_user() will just call the optimized __copy_tofrom_user() a single time - assuming that user access is open. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, > MK1 1PT, UK > Registration No: 1397386 (Wales)
RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl > Sent: 01 February 2021 16:55 ... > > > > > + int i; > > > > > \ > > > > > + > > > > > \ > > > > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double), > > > > > \ > > > > > + label); > > > > > \ > > > > > + for (i = 0; i < ELF_NFPREG - 1; i++) > > > > > \ > > > > > + __t->thread.TS_FPR(i) = buf[i]; > > > > > \ > > > > > + __t->thread.fp_state.fpscr = buf[i]; > > > > > \ > > > > > +} while (0) > > > > On further reflection, since you immediately loop through the buffer > > why not just use user_access_begin() and unsafe_get_user() in the loop. > > Christophe had suggested this a few revisions ago as well. When I tried > this approach, the signal handling performance took a pretty big hit: > https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html > > I included some numbers on v3 as well but decided to drop the approach > altogether for this one since it just didn't seem worth the hit. Was that using unsafe_get_user (which relies on user_access_begin() having 'opened up' user accesses) or just get_user() that does it for every access? The former should be ok, the latter will be horrid. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
On Mon, Feb 01, 2021 at 09:55:44AM -0600, Christopher M. Riedl wrote: > On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote: > > From: Christopher M. Riedl > > > Sent: 28 January 2021 04:04 > > > > > > Reuse the "safe" implementation from signal.c except for calling > > > unsafe_copy_from_user() to copy into a local buffer. > > > > > > Signed-off-by: Christopher M. Riedl > > > --- > > > arch/powerpc/kernel/signal.h | 33 + > > > 1 file changed, 33 insertions(+) > > > > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > > > index 2559a681536e..c18402d625f1 100644 > > > --- a/arch/powerpc/kernel/signal.h > > > +++ b/arch/powerpc/kernel/signal.h > > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct > > > *task, void __user *from); > > > [i], label);\ > > > } while (0) > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { > > > \ > > > + struct task_struct *__t = task; \ > > > + u64 __user *__f = (u64 __user *)from; \ > > > + u64 buf[ELF_NFPREG];\ > > > > How big is that buffer? > > Isn't is likely to be reasonably large compared to a reasonable > > kernel stack frame. > > Especially since this isn't even a leaf function. > > > > I think Christophe answered this - I don't really have an opinion either > way. What would be a 'reasonable' kernel stack frame for reference? See include/linux/poll.h, where the limit is of the order of 800 bytes and the number of entries in an on stack array is chosen at compile time (different between 32 and 64 bit for example). The values are used in do_sys_poll, which, with almost 1000 bytes of stack footprint, appears close to the top of "make checkstack". In addition do_sys_poll has to call the ->poll function of every file descriptor in its table, so it is not a tail function. This 264 bytes array looks reasonable, but please use 'make checkstack' to verify that the function's total stack usage stays within reason. Gabriel > > > > + int i; \ > > > + \ > > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),\ > > > > That really ought to be sizeof(buf). > > > > Agreed, I will fix this. Thanks! 
> > > David > > > > > > > + label); \ > > > + for (i = 0; i < ELF_NFPREG - 1; i++)\ > > > + __t->thread.TS_FPR(i) = buf[i]; \ > > > + __t->thread.fp_state.fpscr = buf[i];\ > > > +} while (0) > > > + > > > +#define unsafe_copy_vsx_from_user(task, from, label) do { > > > \ > > > + struct task_struct *__t = task; \ > > > + u64 __user *__f = (u64 __user *)from; \ > > > + u64 buf[ELF_NVSRHALFREG]; \ > > > + int i; \ > > > + \ > > > + unsafe_copy_from_user(buf, __f, \ > > > + ELF_NVSRHALFREG * sizeof(double), \ > > > + label); \ > > > + for (i = 0; i < ELF_NVSRHALFREG ; i++) \ > > > + __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; \ > > > +} while (0) > > > + > > > + > > > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > > > #define unsafe_copy_ckfpr_to_user(to, task, label) do { > > > \ > > > struct task_struct *__t = task; \ > > > @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct > > > *task, void __user *from); > > > unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,\ > > > ELF_NFPREG * sizeof(double), label) > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) > > > \ > > > + unsafe_copy_from_user((task)->thread.fp_state.fpr, from,\ > > > + ELF_NFPREG * sizeof(double), label) > > > + > > > static inline unsigned long > > > copy_fpr_to_user(void __user *to, struct task_struct *task) > > > { > > > @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void > > > __user *from) > > > #else > > > #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0) > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0) > > > + > > > static inline unsigned long > > > copy_fpr_to_user(void __user *to, struct task_struct *task) > > > { > > > -- > > > 2.26.1 > > > > - > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, > > MK1 1PT, UK > >
RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
On Mon Feb 1, 2021 at 10:15 AM CST, David Laight wrote: > From: Christopher M. Riedl > > Sent: 01 February 2021 15:56 > > > > On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote: > > > From: Christopher M. Riedl > > > > Sent: 28 January 2021 04:04 > > > > > > > > Reuse the "safe" implementation from signal.c except for calling > > > > unsafe_copy_from_user() to copy into a local buffer. > > > > > > > > Signed-off-by: Christopher M. Riedl > > > > --- > > > > arch/powerpc/kernel/signal.h | 33 + > > > > 1 file changed, 33 insertions(+) > > > > > > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > > > > index 2559a681536e..c18402d625f1 100644 > > > > --- a/arch/powerpc/kernel/signal.h > > > > +++ b/arch/powerpc/kernel/signal.h > > > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct > > > > task_struct *task, void __user *from); > > > > [i], label);\ > > > > } while (0) > > > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { > > > > \ > > > > + struct task_struct *__t = task; > > > > \ > > > > + u64 __user *__f = (u64 __user *)from; > > > > \ > > > > + u64 buf[ELF_NFPREG]; > > > > \ > > > > > > How big is that buffer? > > > Isn't is likely to be reasonably large compared to a reasonable > > > kernel stack frame. > > > Especially since this isn't even a leaf function. > > > > > > > I think Christophe answered this - I don't really have an opinion either > > way. What would be a 'reasonable' kernel stack frame for reference? > > Zero :-) > Hehe good point! > > > > > > + int i; > > > > \ > > > > + > > > > \ > > > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double), > > > > \ > > > > + label); > > > > \ > > > > + for (i = 0; i < ELF_NFPREG - 1; i++) > > > > \ > > > > + __t->thread.TS_FPR(i) = buf[i]; > > > > \ > > > > + __t->thread.fp_state.fpscr = buf[i]; > > > > \ > > > > +} while (0) > > On further reflection, since you immediately loop through the buffer > why not just use user_access_begin() and unsafe_get_user() in the loop. Christophe had suggested this a few revisions ago as well. When I tried this approach, the signal handling performance took a pretty big hit: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html I included some numbers on v3 as well but decided to drop the approach altogether for this one since it just didn't seem worth the hit. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, > MK1 1PT, UK > Registration No: 1397386 (Wales)
Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
On Mon, Feb 01, 2021 at 02:37:12PM +0100, Miroslav Benes wrote: > > > This change is not needed. (objname == NULL) means that we are > > > interested only in symbols in "vmlinux". > > > > > > module_kallsyms_on_each_symbol(klp_find_callback, ) > > > will always fail when objname == NULL. > > > > I just tried to keep the old behavior. I can respin it with your > > recommended change noting the change in behavior, though. > > Yes, please. It would be cleaner that way. Let me know if this works for you: --- >From 18af41e88d088cfb8680d1669fcae2bc2ede5328 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Jan 2021 16:23:16 +0100 Subject: kallsyms: refactor {,module_}kallsyms_on_each_symbol Require an explicit call to module_kallsyms_on_each_symbol to look for symbols in modules instead of the call from kallsyms_on_each_symbol, and acquire module_mutex inside of module_kallsyms_on_each_symbol instead of leaving that up to the caller. Note that this slightly changes the behavior for the livepatch code in that the symbols from vmlinux are not iterated anymore if objname is set, but that actually is the desired behavior in this case. Signed-off-by: Christoph Hellwig --- kernel/kallsyms.c | 6 +- kernel/livepatch/core.c | 2 -- kernel/module.c | 13 - 3 files changed, 9 insertions(+), 12 deletions(-) diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index fe9de067771c34..a0d3f0865916f9 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -177,6 +177,10 @@ unsigned long kallsyms_lookup_name(const char *name) return module_kallsyms_lookup_name(name); } +/* + * Iterate over all symbols in vmlinux. For symbols from modules use + * module_kallsyms_on_each_symbol instead. + */ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -192,7 +196,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, if (ret != 0) return ret; } - return module_kallsyms_on_each_symbol(fn, data); + return 0; } static unsigned long get_symbol_pos(unsigned long addr, diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 262cd9b003b9f0..335d988bd81117 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -164,12 +164,10 @@ static int klp_find_object_symbol(const char *objname, const char *name, .pos = sympos, }; - mutex_lock(_mutex); if (objname) module_kallsyms_on_each_symbol(klp_find_callback, ); else kallsyms_on_each_symbol(klp_find_callback, ); - mutex_unlock(_mutex); /* * Ensure an address was found. 
If sympos is 0, ensure symbol is unique; diff --git a/kernel/module.c b/kernel/module.c index 6772fb2680eb3e..25345792c770d1 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -255,11 +255,6 @@ static void mod_update_bounds(struct module *mod) struct list_head *kdb_modules = /* kdb needs the list of modules */ #endif /* CONFIG_KGDB_KDB */ -static void module_assert_mutex(void) -{ - lockdep_assert_held(_mutex); -} - static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP @@ -4379,8 +4374,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, unsigned int i; int ret; - module_assert_mutex(); - + mutex_lock(_mutex); list_for_each_entry(mod, , list) { /* We hold module_mutex: no need for rcu_dereference_sched */ struct mod_kallsyms *kallsyms = mod->kallsyms; @@ -4396,10 +4390,11 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, ret = fn(data, kallsyms_symbol_name(kallsyms, i), mod, kallsyms_symbol_value(sym)); if (ret != 0) - return ret; + break; } } - return 0; + mutex_unlock(_mutex); + return ret; } #endif /* CONFIG_KALLSYMS */ -- 2.29.2
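To illustrate the interface being reworked here, a hypothetical caller (count_cb() and count_module_syms() are made up for this example) now calls module_kallsyms_on_each_symbol() directly and no longer takes module_mutex itself:

#include <linux/kallsyms.h>
#include <linux/module.h>

/* Callback with the signature module_kallsyms_on_each_symbol() expects;
 * returning non-zero stops the iteration. */
static int count_cb(void *data, const char *name, struct module *mod,
                    unsigned long addr)
{
        (*(unsigned long *)data)++;
        return 0;
}

static unsigned long count_module_syms(void)
{
        unsigned long count = 0;

        /* After this patch, module_mutex is taken inside the iterator. */
        module_kallsyms_on_each_symbol(count_cb, &count);
        return count;
}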
RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl > Sent: 01 February 2021 15:56 > > On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote: > > From: Christopher M. Riedl > > > Sent: 28 January 2021 04:04 > > > > > > Reuse the "safe" implementation from signal.c except for calling > > > unsafe_copy_from_user() to copy into a local buffer. > > > > > > Signed-off-by: Christopher M. Riedl > > > --- > > > arch/powerpc/kernel/signal.h | 33 + > > > 1 file changed, 33 insertions(+) > > > > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > > > index 2559a681536e..c18402d625f1 100644 > > > --- a/arch/powerpc/kernel/signal.h > > > +++ b/arch/powerpc/kernel/signal.h > > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct > > > *task, void __user *from); > > > [i], label);\ > > > } while (0) > > > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { > > > \ > > > + struct task_struct *__t = task; \ > > > + u64 __user *__f = (u64 __user *)from; \ > > > + u64 buf[ELF_NFPREG];\ > > > > How big is that buffer? > > Isn't is likely to be reasonably large compared to a reasonable > > kernel stack frame. > > Especially since this isn't even a leaf function. > > > > I think Christophe answered this - I don't really have an opinion either > way. What would be a 'reasonable' kernel stack frame for reference? Zero :-) > > > > + int i; \ > > > + \ > > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),\ > > > + label); \ > > > + for (i = 0; i < ELF_NFPREG - 1; i++)\ > > > + __t->thread.TS_FPR(i) = buf[i]; \ > > > + __t->thread.fp_state.fpscr = buf[i];\ > > > +} while (0) On further reflection, since you immediately loop through the buffer why not just use user_access_begin() and unsafe_get_user() in the loop. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
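For readers following the thread, the suggested shape is roughly the following (a sketch against the generic uaccess API, not the patch itself; the function name is made up). The point is that user_access_begin() opens the user-access window once, so each unsafe_get_user() avoids the per-access overhead that a plain get_user() pays.

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

static int copy_fprs_sketch(u64 *dst, const u64 __user *from, int nregs)
{
        int i;

        if (!user_access_begin(from, nregs * sizeof(u64)))
                return -EFAULT;
        for (i = 0; i < nregs; i++)
                unsafe_get_user(dst[i], &from[i], Efault);
        user_access_end();
        return 0;

Efault:
        user_access_end();
        return -EFAULT;
}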
RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
On Thu Jan 28, 2021 at 4:38 AM CST, David Laight wrote: > From: Christopher M. Riedl > > Sent: 28 January 2021 04:04 > > > > Reuse the "safe" implementation from signal.c except for calling > > unsafe_copy_from_user() to copy into a local buffer. > > > > Signed-off-by: Christopher M. Riedl > > --- > > arch/powerpc/kernel/signal.h | 33 + > > 1 file changed, 33 insertions(+) > > > > diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h > > index 2559a681536e..c18402d625f1 100644 > > --- a/arch/powerpc/kernel/signal.h > > +++ b/arch/powerpc/kernel/signal.h > > @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct > > *task, void __user *from); > > [i], label);\ > > } while (0) > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { > > \ > > + struct task_struct *__t = task; \ > > + u64 __user *__f = (u64 __user *)from; \ > > + u64 buf[ELF_NFPREG];\ > > How big is that buffer? > Isn't is likely to be reasonably large compared to a reasonable > kernel stack frame. > Especially since this isn't even a leaf function. > I think Christophe answered this - I don't really have an opinion either way. What would be a 'reasonable' kernel stack frame for reference? > > + int i; \ > > + \ > > + unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),\ > > That really ought to be sizeof(buf). > Agreed, I will fix this. Thanks! > David > > > > + label); \ > > + for (i = 0; i < ELF_NFPREG - 1; i++)\ > > + __t->thread.TS_FPR(i) = buf[i]; \ > > + __t->thread.fp_state.fpscr = buf[i];\ > > +} while (0) > > + > > +#define unsafe_copy_vsx_from_user(task, from, label) do { > > \ > > + struct task_struct *__t = task; \ > > + u64 __user *__f = (u64 __user *)from; \ > > + u64 buf[ELF_NVSRHALFREG]; \ > > + int i; \ > > + \ > > + unsafe_copy_from_user(buf, __f, \ > > + ELF_NVSRHALFREG * sizeof(double), \ > > + label); \ > > + for (i = 0; i < ELF_NVSRHALFREG ; i++) \ > > + __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; \ > > +} while (0) > > + > > + > > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > > #define unsafe_copy_ckfpr_to_user(to, task, label) do {\ > > struct task_struct *__t = task; \ > > @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct > > *task, void __user *from); > > unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,\ > > ELF_NFPREG * sizeof(double), label) > > > > +#define unsafe_copy_fpr_from_user(task, from, label) > > \ > > + unsafe_copy_from_user((task)->thread.fp_state.fpr, from,\ > > + ELF_NFPREG * sizeof(double), label) > > + > > static inline unsigned long > > copy_fpr_to_user(void __user *to, struct task_struct *task) > > { > > @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void > > __user *from) > > #else > > #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0) > > > > +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0) > > + > > static inline unsigned long > > copy_fpr_to_user(void __user *to, struct task_struct *task) > > { > > -- > > 2.26.1 > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, > MK1 1PT, UK > Registration No: 1397386 (Wales)
Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
One more thing... > @@ -4379,8 +4379,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, > const char *, > unsigned int i; > int ret; > > - module_assert_mutex(); > - > + mutex_lock(_mutex); > list_for_each_entry(mod, , list) { > /* We hold module_mutex: no need for rcu_dereference_sched */ > struct mod_kallsyms *kallsyms = mod->kallsyms; This was the last user of module_assert_mutex(), which can be removed now. Miroslav
Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
On Mon, 1 Feb 2021, Christoph Hellwig wrote: > On Fri, Jan 29, 2021 at 10:43:36AM +0100, Petr Mladek wrote: > > > --- a/kernel/livepatch/core.c > > > +++ b/kernel/livepatch/core.c > > > @@ -164,12 +164,8 @@ static int klp_find_object_symbol(const char > > > *objname, const char *name, > > > .pos = sympos, > > > }; > > > > > > - mutex_lock(_mutex); > > > - if (objname) > > > + if (objname || !kallsyms_on_each_symbol(klp_find_callback, )) > > > module_kallsyms_on_each_symbol(klp_find_callback, ); > > > - else > > > - kallsyms_on_each_symbol(klp_find_callback, ); > > > - mutex_unlock(_mutex); > > > > This change is not needed. (objname == NULL) means that we are > > interested only in symbols in "vmlinux". > > > > module_kallsyms_on_each_symbol(klp_find_callback, ) > > will always fail when objname == NULL. > > I just tried to keep the old behavior. I can respin it with your > recommended change noting the change in behavior, though. Yes, please. It would be cleaner that way. Miroslav
Re: [PATCH 04/13] module: use RCU to synchronize find_module
On Mon, 1 Feb 2021, Jessica Yu wrote: > +++ Miroslav Benes [29/01/21 16:29 +0100]: > >On Thu, 28 Jan 2021, Christoph Hellwig wrote: > > > >> Allow for a RCU-sched critical section around find_module, following > >> the lower level find_module_all helper, and switch the two callers > >> outside of module.c to use such a RCU-sched critical section instead > >> of module_mutex. > > > >That's a nice idea. > > > >> @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object > >> *obj) > >> if (!klp_is_module(obj)) > >>return; > >> > >> - mutex_lock(_mutex); > >> + rcu_read_lock_sched(); > >> /* > >>* We do not want to block removal of patched modules and therefore > >>* we do not take a reference here. The patches are removed by > >> @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object > >> *obj) > >> if (mod && mod->klp_alive) > > > >RCU always baffles me a bit, so I'll ask. Don't we need > >rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I > >wonder. > > Same here :-) I had to double check the RCU documentation. For our > modules list case I believe the rcu list API should take care of that > for us. Worth noting is this snippet from Documentation/RCU/whatisRCU.txt: > >rcu_dereference() is typically used indirectly, via the _rcu >list-manipulation primitives, such as list_for_each_entry_rcu() Ok, thanks to both for checking and explanation. Ack to the patch then. Miroslav
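Summarising for reference, the caller-side pattern after this change looks roughly like this (a sketch, with mod->klp_alive as in the livepatch hunk quoted above):

#include <linux/module.h>
#include <linux/rcupdate.h>

static bool klp_module_alive(const char *name)
{
        struct module *mod;
        bool alive;

        /* find_module() walks the modules list with
         * list_for_each_entry_rcu(), which performs the rcu_dereference()
         * internally, so no explicit rcu_dereference_sched() is needed. */
        rcu_read_lock_sched();
        mod = find_module(name);
        alive = mod && mod->klp_alive;
        rcu_read_unlock_sched();

        return alive;
}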
Re: [RFC 00/20] TLB batching consolidation and enhancements
On Sun, Jan 31, 2021 at 07:57:01AM +, Nadav Amit wrote: > > On Jan 30, 2021, at 7:30 PM, Nicholas Piggin wrote: > > I'll go through the patches a bit more closely when they all come > > through. Sparc and powerpc of course need the arch lazy mode to get > > per-page/pte information for operations that are not freeing pages, > > which is what mmu gather is designed for. > > IIUC you mean any PTE change requires a TLB flush. Even setting up a new PTE > where no previous PTE was set, right? These are the HASH architectures. Their hardware doesn't walk the page-tables, but it consults a hash-table to resolve page translations. They _MUST_ flush the entries under the PTL to avoid ever seeing conflicting information, which will make them really unhappy. They can do this because they have TLBI broadcast. There's a few more details I'm sure, but those seem to have slipped from my mind.
RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
Thank you very much, I appreciate your fast responses. Thank you also for the clarification; I completely overlooked the permission settings in the segment setup and expected the fault reaction on the PP bits in the TLB. And I will re-read the chapters, to get deeper into this topic.

Greetings
Christoph

-----Original Message-----
From: Christophe Leroy
Sent: Montag, 1. Februar 2021 12:39
To: PLATTNER Christoph; Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman
Cc: linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; HAMETNER Reinhard; REITHER Robert - Contractor; KOENIG Werner
Subject: Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into
> this topic for about 3 months for analyzing that problem - so, sorry,
> if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.

> What I have learned so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide the KEY hash method,
>   so always the branch "if (!Hash)" is taken. So I assume that "key 0" and
>   "key 1" setups are not used on this CPU (not supporting MMU_FTR_HPTE_TABLE).

The hash method is not used, it is SW TLB loading that is used. But still, all the PP and Ks/Kp keys defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection.

> - The PP bits are NOT checked by the CPU in HW; even if set to 00, the CPU
>   does not react. As far as I have understood, the TLB miss routines are
>   responsible for checking permissions. The TLB miss routines check the
>   Linux-style PTE entries and generate the PP bits for the TLB entry. The
>   PowerPC PP bits are never checked elsewhere on those CPU models ...

PP bits ARE checked, hopefully. If that was not the case, then the TLB miss routines would install a TLB entry on a read, and the user could then do a write without any verification being done. Refer to the e300 Core Reference Manual, §6.1.4 Memory Protection Facilities.

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that user accesses are done with key 0 as programmed in the segment registers, and PP 00 means RW access with key 0.

> - The PTE entries in Linux are fully "void" in the sense of this CPU type,
>   as this CPU does not read any PTEs from RAM (no HW support, in contrast
>   to x86 or ARM or later ppc ...).

No, the PTEs are read by the TLB miss exception handlers and written into TLB entries.

> In summary - as far as I understand it now - we have to handle the PTE
> bits differently (Linux style) for PROT_NONE permissions - OR - we have
> to expand the permission checking like my proposed experimental patch.
> (PROT_NONE is not NUMA related only, but may not be used very often ...)

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on your proposal.

> Another related point:
> According to the e300 RM (manual), the ACCESSED bit in the PTE shall be
> set on TLB miss, as it is an indication that the page is used. In the 4.4
> kernel this write back of the _PAGE_ACCESSED bit was performed after a
> successful permission check:
> 
> 	bne-	DataAddressInvalid	/* return if access not permitted */
> 	ori	r0,r0,_PAGE_ACCESSED	/* set _PAGE_ACCESSED in pte */
> 	/*
> 	 * NOTE! We are assuming this is not an SMP system, otherwise
> 	 * we would need to update the pte atomically with lwarx/stwcx.
> 	 */
> 	stw	r0,0(r2)	/* update PTE (accessed bit) */
> 	/* Convert linux-style PTE to low word of PPC-style PTE */
> 
> The bit is set (ori ...) and written back (stw ...) to the Linux PTE.
> Maybe this is not needed, as the PTE is never seen by the PPC chip. But I
> do not understand WHY _PAGE_ACCESSED is used for the permission check in
> the late 5.4 kernel (not used in the 4.4 kernel):
> 
> 	cmplw	0,r1,r3
> 	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xf000
> 	bgt-	112f
> 
> What is the reason or relevance of checking this here? It was not checked
> in 4.4, the bit was or-ed in afterwards, as the page is accessed now. Do
> you know the reason for the change on this point?

_PAGE_ACCESSED is important for memory management, the Linux kernel needs it. But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, we prefer bailing out to the page fault logic when the accessed bit is not set. Then the page fault logic will set the bit. This also allowed simplifying the handling in the __set_pte_at() function by avoiding races in the update of PTEs.

> Another remark to the Core manual relevant for this:
> There is the
Re: [PATCH 04/13] module: use RCU to synchronize find_module
+++ Miroslav Benes [29/01/21 16:29 +0100]:
> On Thu, 28 Jan 2021, Christoph Hellwig wrote:
> 
> > Allow for a RCU-sched critical section around find_module, following
> > the lower level find_module_all helper, and switch the two callers
> > outside of module.c to use such a RCU-sched critical section instead
> > of module_mutex.
> 
> That's a nice idea.
> 
> > @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object *obj)
> >  	if (!klp_is_module(obj))
> >  		return;
> > 
> > -	mutex_lock(&module_mutex);
> > +	rcu_read_lock_sched();
> >  	/*
> >  	 * We do not want to block removal of patched modules and therefore
> >  	 * we do not take a reference here. The patches are removed by
> > @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
> >  	if (mod && mod->klp_alive)
> 
> RCU always baffles me a bit, so I'll ask. Don't we need
> rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I
> wonder.

Same here :-) I had to double check the RCU documentation. For our
modules list case I believe the rcu list API should take care of that
for us. Worth noting is this snippet from Documentation/RCU/whatisRCU.txt:

    rcu_dereference() is typically used indirectly, via the _rcu
    list-manipulation primitives, such as list_for_each_entry_rcu()
Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()
On Sat, Jan 30, 2021 at 04:11:23PM -0800, Nadav Amit wrote:

> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 427bfcc6cdec..b97136b7010b 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -334,8 +334,8 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> 
>  #ifdef CONFIG_MMU_GATHER_NO_RANGE
> 
> -#if defined(tlb_flush) || defined(tlb_start_vma) || defined(tlb_end_vma)
> -#error MMU_GATHER_NO_RANGE relies on default tlb_flush(), tlb_start_vma() and tlb_end_vma()
> +#if defined(tlb_flush)
> +#error MMU_GATHER_NO_RANGE relies on default tlb_flush()
>  #endif
> 
>  /*
> @@ -362,10 +362,6 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
> 
>  #ifndef tlb_flush
> 
> -#if defined(tlb_start_vma) || defined(tlb_end_vma)
> -#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
> -#endif

#ifdef CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING
#error
#endif

goes here...

> static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
> {
> 	if (tlb->fullmm)
> 		return;
> 
> +	if (IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING))
> +		return;

Also, can you please stick to the CONFIG_MMU_GATHER_* namespace?

I also don't think AGGRESSIVE_FLUSH_BATCHING quite captures what it does.
How about:

  CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH

?
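With Peter's proposed name, the generic helper would end up roughly as follows (a sketch assuming the rest of Nadav's patch stays as posted; tlb_flush_mmu_tlbonly() is the existing flush path in asm-generic/tlb.h):

	static inline void tlb_end_vma(struct mmu_gather *tlb,
				       struct vm_area_struct *vma)
	{
		if (tlb->fullmm)
			return;

		/*
		 * Architectures that opt out of per-VMA flushing keep
		 * gathering and flush once at tlb_finish_mmu() time.
		 */
		if (IS_ENABLED(CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH))
			return;

		tlb_flush_mmu_tlbonly(tlb);
	}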
Re: [PATCH 05/13] kallsyms: refactor {,module_}kallsyms_on_each_symbol
On Fri, Jan 29, 2021 at 10:43:36AM +0100, Petr Mladek wrote:
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -164,12 +164,8 @@ static int klp_find_object_symbol(const char *objname, const char *name,
> >  		.pos = sympos,
> >  	};
> > 
> > -	mutex_lock(&module_mutex);
> > -	if (objname)
> > +	if (objname || !kallsyms_on_each_symbol(klp_find_callback, &args))
> >  		module_kallsyms_on_each_symbol(klp_find_callback, &args);
> > -	else
> > -		kallsyms_on_each_symbol(klp_find_callback, &args);
> > -	mutex_unlock(&module_mutex);
> 
> This change is not needed. (objname == NULL) means that we are
> interested only in symbols in "vmlinux".
> 
> module_kallsyms_on_each_symbol(klp_find_callback, &args)
> will always fail when objname == NULL.

I just tried to keep the old behavior. I can respin it with your
recommended change noting the change in behavior, though.
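The respin Petr asks for would dispatch on objname up front, roughly as below (a sketch; the existing sympos/count validation is elided and the error value is illustrative):

	static int klp_find_object_symbol(const char *objname, const char *name,
					  unsigned long sympos, unsigned long *addr)
	{
		struct klp_find_arg args = {
			.objname = objname,
			.name = name,
			.addr = 0,
			.count = 0,
			.pos = sympos,
		};

		/* objname == NULL means vmlinux: never consult modules then. */
		if (objname)
			module_kallsyms_on_each_symbol(klp_find_callback, &args);
		else
			kallsyms_on_each_symbol(klp_find_callback, &args);

		/* ... existing sympos/count validation follows here ... */
		return args.addr ? 0 : -EINVAL;
	}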
Re: [PATCH 04/13] module: use RCU to synchronize find_module
On Fri, Jan 29, 2021 at 04:29:02PM +0100, Miroslav Benes wrote:
> > -	mutex_lock(&module_mutex);
> > +	rcu_read_lock_sched();
> >  	/*
> >  	 * We do not want to block removal of patched modules and therefore
> >  	 * we do not take a reference here. The patches are removed by
> > @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
> >  	if (mod && mod->klp_alive)
> 
> RCU always baffles me a bit, so I'll ask. Don't we need
> rcu_dereference_sched() here? "mod" comes from a RCU-protected list, so I
> wonder.

rcu_dereference* is only used for dereferencing pointers where the
reference itself is RCU protected, that is, the lookup of mod itself
down in find_module_all in this case.
Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into
> this topic for about 3 months for analyzing that problem - so, sorry,
> if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.

> What I have learned so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide the KEY hash method,
>   so always the branch "if (!Hash)" is taken. So I assume that "key 0" and
>   "key 1" setups are not used on this CPU (not supporting MMU_FTR_HPTE_TABLE).

The hash method is not used, it is SW TLB loading that is used. But still, all the PP and Ks/Kp keys defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection.

> - The PP bits are NOT checked by the CPU in HW; even if set to 00, the CPU
>   does not react. As far as I have understood, the TLB miss routines are
>   responsible for checking permissions. The TLB miss routines check the
>   Linux-style PTE entries and generate the PP bits for the TLB entry. The
>   PowerPC PP bits are never checked elsewhere on those CPU models ...

PP bits ARE checked, hopefully. If that was not the case, then the TLB miss routines would install a TLB entry on a read, and the user could then do a write without any verification being done. Refer to the e300 Core Reference Manual, §6.1.4 Memory Protection Facilities.

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that user accesses are done with key 0 as programmed in the segment registers, and PP 00 means RW access with key 0.

> - The PTE entries in Linux are fully "void" in the sense of this CPU type,
>   as this CPU does not read any PTEs from RAM (no HW support, in contrast
>   to x86 or ARM or later ppc ...).

No, the PTEs are read by the TLB miss exception handlers and written into TLB entries.

> In summary - as far as I understand it now - we have to handle the PTE
> bits differently (Linux style) for PROT_NONE permissions - OR - we have
> to expand the permission checking like my proposed experimental patch.
> (PROT_NONE is not NUMA related only, but may not be used very often ...)

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on your proposal.

> Another related point:
> According to the e300 RM (manual), the ACCESSED bit in the PTE shall be
> set on TLB miss, as it is an indication that the page is used. In the 4.4
> kernel this write back of the _PAGE_ACCESSED bit was performed after a
> successful permission check:
> 
> 	bne-	DataAddressInvalid	/* return if access not permitted */
> 	ori	r0,r0,_PAGE_ACCESSED	/* set _PAGE_ACCESSED in pte */
> 	/*
> 	 * NOTE! We are assuming this is not an SMP system, otherwise
> 	 * we would need to update the pte atomically with lwarx/stwcx.
> 	 */
> 	stw	r0,0(r2)	/* update PTE (accessed bit) */
> 	/* Convert linux-style PTE to low word of PPC-style PTE */
> 
> The bit is set (ori ...) and written back (stw ...) to the Linux PTE.
> Maybe this is not needed, as the PTE is never seen by the PPC chip. But I
> do not understand WHY _PAGE_ACCESSED is used for the permission check in
> the late 5.4 kernel (not used in the 4.4 kernel):
> 
> 	cmplw	0,r1,r3
> 	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xf000
> 	bgt-	112f
> 
> What is the reason or relevance of checking this here? It was not checked
> in 4.4, the bit was or-ed in afterwards, as the page is accessed now. Do
> you know the reason for the change on this point?

_PAGE_ACCESSED is important for memory management, the Linux kernel needs it. But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, we prefer bailing out to the page fault logic when the accessed bit is not set. Then the page fault logic will set the bit. This also allowed simplifying the handling in the __set_pte_at() function by avoiding races in the update of PTEs.

> Another remark to the Core manual relevant for this:
> There is the reference manual for the e300 core available (e300 RM). It
> includes many remarks in the Memory Management section that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT say
> which of the optional points are implemented or not. According to my
> analysis, the MPC5121e does not include any of the optional features.

Not sure what you mean. As far as I understand, that chapter tells you that some functionalities are optional for the powerpc architecture, and provided (or not) by the e300 core. The MPC5121 supports all the things that are defined by the e300 core.

> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe
RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE
Hello to all, and thank you very much for first and second fast response.

I do not have a long history on PowerPC MMU environment, I hacked into this topic for about 3 months for analyzing that problem - so, sorry, if I am wrong in some points ...

What I have learned so far from this MPC5121e (variant of e300c4 core):

- It uses book3s32 hash-code, but it DOES NOT provide the KEY hash method,
  so always the branch "if (!Hash)" is taken. So I assume that "key 0" and
  "key 1" setups are not used on this CPU (not supporting MMU_FTR_HPTE_TABLE).

- The PP bits are NOT checked by the CPU in HW; even if set to 00, the CPU
  does not react. As far as I have understood, the TLB miss routines are
  responsible for checking permissions. The TLB miss routines check the
  Linux-style PTE entries and generate the PP bits for the TLB entry. The
  PowerPC PP bits are never checked elsewhere on those CPU models ...

- The PTE entries in Linux are fully "void" in the sense of this CPU type,
  as this CPU does not read any PTEs from RAM (no HW support, in contrast
  to x86 or ARM or later ppc ...).

In summary - as far as I understand it now - we have to handle the PTE bits differently (Linux style) for PROT_NONE permissions - OR - we have to expand the permission checking like my proposed experimental patch. (PROT_NONE is not NUMA related only, but may not be used very often ...)

Another related point:

According to the e300 RM (manual), the ACCESSED bit in the PTE shall be set on TLB miss, as it is an indication that the page is used. In the 4.4 kernel this write back of the _PAGE_ACCESSED bit was performed after a successful permission check:

	bne-	DataAddressInvalid	/* return if access not permitted */
	ori	r0,r0,_PAGE_ACCESSED	/* set _PAGE_ACCESSED in pte */
	/*
	 * NOTE! We are assuming this is not an SMP system, otherwise
	 * we would need to update the pte atomically with lwarx/stwcx.
	 */
	stw	r0,0(r2)	/* update PTE (accessed bit) */
	/* Convert linux-style PTE to low word of PPC-style PTE */

The bit is set (ori ...) and written back (stw ...) to the Linux PTE. Maybe this is not needed, as the PTE is never seen by the PPC chip. But I do not understand WHY _PAGE_ACCESSED is used for the permission check in the late 5.4 kernel (not used in the 4.4 kernel):

	cmplw	0,r1,r3
	mfspr	r2, SPRN_SDR1
	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
	rlwinm	r2, r2, 28, 0xf000
	bgt-	112f

What is the reason or relevance of checking this here? It was not checked in 4.4, the bit was or-ed in afterwards, as the page is accessed now. Do you know the reason for the change on this point?

Another remark to the Core manual relevant for this:

There is the reference manual for the e300 core available (e300 RM). It includes many remarks in the Memory Management section that many features are optional or variable for dedicated implementations. On the other hand, the MPC5121e reference manual refers to the e300 core RM, but DOES NOT say which of the optional points are implemented or not. According to my analysis, the MPC5121e does not include any of the optional features.

Thanks a lot for first reactions

Christoph

-----Original Message-----
From: Christophe Leroy
Sent: Montag, 1. Februar 2021 07:30
To: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; PLATTNER Christoph
Cc: linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

On book3s/32, page protection is defined by the PP bits in the PTE, which provide the following protection depending on the access keys defined in the matching segment register:

- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection, PP bits have been set as follows:

- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0 and user accesses are performed with key 1. As PP00 is used for non _PAGE_USER pages, user can't access kernel pages not flagged _PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed with key 0, therefore pages not flagged _PAGE_USER are still accessible to the user.

This shouldn't be an issue, because userspace is expected to be accessible to the user. But unlike most other architectures, powerpc implements PROT_NONE protection by removing _PAGE_USER flag instead of flagging the page as not valid. This means that pages in userspace that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the TLB miss handler by checking _PAGE_USER permission on
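As an illustration of the encoding described in the commit message above, the mapping from Linux PTE flags to PP bits could be sketched as a hypothetical helper (not code from the patch):

	/* Hypothetical illustration of the PP encoding described above. */
	static unsigned int pp_bits_for(unsigned long pte_flags)
	{
		if (!(pte_flags & _PAGE_USER))
			return 0;	/* PP 00: kernel RW (key 0), user N/A (key 1) */
		if (pte_flags & _PAGE_RW)
			return 1;	/* PP 01: kernel RW (key 0), user RO (key 1) */
		return 3;		/* PP 11: read-only for both keys */
	}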
Re: FSL P5040: KVM HV doesn't work with the RC5 of kernel 5.11
Hello,

I compiled the RC6 of kernel 5.11 today and KVM HV works again. Therefore I don't need the patch below anymore.

Many thanks for solving the issue,
Christian

On 27 January 2021 at 05:07pm, Christian Zigotzky wrote:

Hello,

I compiled the RC5 of kernel 5.11 today. Unfortunately KVM HV doesn't work anymore on my FSL P5040 board [1]. I tested it with QEMU 5.0.0 today [2]. The virtual e5500 QEMU machine works with the "RC4 with KVM HV" and with the "RC5 without KVM HV". The complete system freezes if I use KVM HV with the RC5.

I have bisected and 785025820a6a565185ce9d47fdd8d23dbf91dee8 (powerpc/mm/highmem: use __set_pte_at() for kmap_local()) [3] is the first bad commit. I was able to revert this bad commit and after a new compile, KVM HV works again. I created a patch for reverting the commit. [4]

Please find attached the kernel config. I use one uImage for the virtual machine and for the P5040 board.

Please check the first bad commit.

Thanks,
Christian

[1] http://wiki.amiga.org/index.php?title=X5000
[2] qemu-system-ppc64 -M ppce500 -cpu e5500 -enable-kvm -m 1024 -kernel uImage-5.11 -drive format=raw,file=MintPPC32-X5000.img,index=0,if=virtio -netdev user,id=mynet0 -device e1000,netdev=mynet0 -append "rw root=/dev/vda" -device virtio-vga -device virtio-mouse-pci -device virtio-keyboard-pci -device pci-ohci,id=newusb -device usb-audio,bus=newusb.0 -smp 4
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.11-rc5&id=785025820a6a565185ce9d47fdd8d23dbf91dee8
[4] diff -rupN a/arch/powerpc/include/asm/highmem.h b/arch/powerpc/include/asm/highmem.h
--- a/arch/powerpc/include/asm/highmem.h	2021-01-27 16:12:40.382164118 +0100
+++ b/arch/powerpc/include/asm/highmem.h	2021-01-27 16:10:54.055249957 +0100
@@ -58,8 +58,6 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()	flush_cache_all()
 
-#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev)	\
-	__set_pte_at(mm, vaddr, ptep, ptev, 1)
 #define arch_kmap_local_post_map(vaddr, pteval)	\
 	local_flush_tlb_page(NULL, vaddr)
 #define arch_kmap_local_post_unmap(vaddr)	\
[PATCH] ASoC: fsl_xcvr: remove unneeded semicolon
Eliminate the following coccicheck warning:

./sound/soc/fsl/fsl_xcvr.c:739:2-3: Unneeded semicolon

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 sound/soc/fsl/fsl_xcvr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index 3d58c88..65b388a 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -736,7 +736,7 @@ static int fsl_xcvr_load_firmware(struct fsl_xcvr *xcvr)
 			/* clean current page, including data memory */
 			memset_io(xcvr->ram_addr, 0, size);
 		}
-	};
+	}
 
 err_firmware:
 	release_firmware(fw);
-- 
1.8.3.1