[PATCH] Documentation: Clarify better about the rwsem non-owner release issue
Reword and clarify the rwsem non-owner release issue.

Link: https://lore.kernel.org/linux-pci/20200321212144.ga6...@google.com/
Signed-off-by: Joel Fernandes (Google)
---
 Documentation/locking/locktypes.rst | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index 6f2c0f5b041e..656dce72f11f 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -292,7 +292,7 @@
 implementations to provide priority inheritance for all lock types except
 the truly spinning ones. Priority inheritance on ownerless locks is
 obviously impossible.
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In some cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+For now, a PREEMPT_RT kernel just disables code sections that perform a
+non-owner release of an rwsem. In some cases, parts of the code are disabled.
+In other cases, the complete functionality has to be disabled until a workable
+solution has been found.
-- 
2.25.1.696.g5e7596f4ac-goog
Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
On Sat, Mar 21, 2020 at 10:49:04PM +0100, Thomas Gleixner wrote:
[...]
> >> +rwsems have grown interfaces which allow non owner release for special
> >> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> >> +substitutes all locking primitives except semaphores with RT-mutex based
> >> +implementations to provide priority inheritance for all lock types except
> >> +the truly spinning ones. Priority inheritance on ownerless locks is
> >> +obviously impossible.
> >> +
> >> +For now the rwsem non-owner release excludes code which utilizes it from
> >> +being used on PREEMPT_RT enabled kernels.
> >
> > I could not parse the last sentence here, but I think you meant "For now,
> > PREEMPT_RT enabled kernels disable code that perform a non-owner release of
> > an rwsem". Correct me if I'm wrong.
>
> Right, that's what I wanted to say :)
>
> Care to send a delta patch?

Absolutely, doing that now. :-)

thanks,

 - Joel
Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
Joel Fernandes writes:
>> +rwlock_t
>> +========
>> +
>> +rwlock_t is a multiple readers and single writer lock mechanism.
>> +
>> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
>> +lock and the suffix rules of spinlock_t apply accordingly. The
>> +implementation is fair and prevents writer starvation.
>>
>
> You mentioned writer starvation, but I think it would be good to also mention
> that rwlock_t on a non-PREEMPT_RT kernel also does not have _reader_
> starvation problem, since it uses queued implementation. This fact is worth
> mentioning here, since further below you explain that an rwlock in PREEMPT_RT
> does have reader starvation problem.

It's worth mentioning. But RT really has only write starvation not reader
starvation.

>> +rwlock_t and PREEMPT_RT
>> +-----------------------
>> +
>> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
>> +implementation based on rt_mutex which changes the semantics:
>> +
>> + - Same changes as for spinlock_t
>> +
>> + - The implementation is not fair and can cause writer starvation under
>> +   certain circumstances. The reason for this is that a writer cannot grant
>> +   its priority to multiple readers. Readers which are blocked on a writer
>> +   fully support the priority inheritance protocol.
>
> Is it hard to give priority to multiple readers because the number of readers
> to give priority to could be unbounded?

Yes, and it's horribly complex and racy. We had an implementation years ago
which taught us not to try it again :)

>> +PREEMPT_RT also offers a local_lock mechanism to substitute the
>> +local_irq_disable/save() constructs in cases where a separation of the
>> +interrupt disabling and the locking is really unavoidable. This should be
>> +restricted to very rare cases.
>
> It would also be nice to mention where else local_lock() can be used, such as
> protecting per-cpu variables without disabling preemption. Could we add a
> section on protecting per-cpu data? (Happy to do that and send a patch if you
> prefer).

The local lock section will come soon when we post the local lock patches
again.

>> +rwsems have grown interfaces which allow non owner release for special
>> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
>> +substitutes all locking primitives except semaphores with RT-mutex based
>> +implementations to provide priority inheritance for all lock types except
>> +the truly spinning ones. Priority inheritance on ownerless locks is
>> +obviously impossible.
>> +
>> +For now the rwsem non-owner release excludes code which utilizes it from
>> +being used on PREEMPT_RT enabled kernels.
>
> I could not parse the last sentence here, but I think you meant "For now,
> PREEMPT_RT enabled kernels disable code that perform a non-owner release of
> an rwsem". Correct me if I'm wrong.

Right, that's what I wanted to say :)

Care to send a delta patch?

Thanks!

        tglx
Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
Hi Thomas,

Just a few comments:

[...]

> +rtmutex
> +=======
> +
> +RT-mutexes are mutexes with support for priority inheritance (PI).
> +
> +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +interrupt disabled sections.
> +
> +On a PREEMPT_RT enabled kernel most of these sections are fully
> +preemptible. This is possible because PREEMPT_RT forces most executions
> +into task context, especially interrupt handlers and soft interrupts, which
> +allows to substitute spinlock_t and rwlock_t with RT-mutex based
> +implementations.
> +
> +
> +raw_spinlock_t and spinlock_t
> +=============================
> +
> +raw_spinlock_t
> +--------------
> +
> +raw_spinlock_t is a strict spinning lock implementation regardless of the
> +kernel configuration including PREEMPT_RT enabled kernels.
> +
> +raw_spinlock_t is to be used only in real critical core code, low level
> +interrupt handling and places where protecting (hardware) state is required
> +to be safe against preemption and eventually interrupts.
> +
> +Another reason to use raw_spinlock_t is when the critical section is tiny
> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
> +contended case.
> +
> +spinlock_t
> +----------
> +
> +The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
> +
> +On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +and has exactly the same semantics.
> +
> +spinlock_t and PREEMPT_RT
> +-------------------------
> +
> +On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Preemption is not disabled
> +
> + - The hard interrupt related suffixes for spin_lock / spin_unlock
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> +   interrupt disabled state
> +
> + - The soft interrupt related suffix (_bh()) is still disabling the
> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
> +   kernel, which utilizes the preemption count, this is achieved by a per
> +   CPU bottom half locking mechanism.
> +
> +All other semantics of spinlock_t are preserved:
> +
> + - Migration of tasks which hold a spinlock_t is prevented. On a non
> +   PREEMPT_RT enabled kernel this is implicit due to preemption disable.
> +   PREEMPT_RT has a separate mechanism to achieve this. This ensures that
> +   pointers to per CPU variables stay valid even if the task is preempted.
> +
> + - Task state preservation. The task state is not affected when a lock is
> +   contended and the task has to schedule out and wait for the lock to
> +   become available. The lock wake up restores the task state unless there
> +   was a regular (not lock related) wake up on the task. This ensures that
> +   the task state rules are always correct independent of the kernel
> +   configuration.
> +
> +rwlock_t
> +========
> +
> +rwlock_t is a multiple readers and single writer lock mechanism.
> +
> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
> +lock and the suffix rules of spinlock_t apply accordingly. The
> +implementation is fair and prevents writer starvation.
> +

You mentioned writer starvation, but I think it would be good to also mention
that rwlock_t on a non-PREEMPT_RT kernel also does not have _reader_
starvation problem, since it uses queued implementation. This fact is worth
mentioning here, since further below you explain that an rwlock in PREEMPT_RT
does have reader starvation problem.

> +rwlock_t and PREEMPT_RT
> +-----------------------
> +
> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Same changes as for spinlock_t
> +
> + - The implementation is not fair and can cause writer starvation under
> +   certain circumstances. The reason for this is that a writer cannot grant
> +   its priority to multiple readers. Readers which are blocked on a writer
> +   fully support the priority inheritance protocol.

Is it hard to give priority to multiple readers because the number of readers
to give priority to could be unbounded?

> +
> +
> +PREEMPT_RT caveats
> +==================
> +
> +spinlock_t and rwlock_t
> +-----------------------
> +
> +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
> +with RT-mutex based implementations has a few implications.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::
> +
> +   local_irq_disable();
> +   spin_lock();
> +
> +and fully equivalent to::
> +
> +   spin_lock_irq();
> +
> +Same applies to rwlock_t and the _irqsave() suffix variant.
> +
> +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
> +substitution expects a fully preemptible context.
> +
> +The preferred solution is to use :c:func:`spin_lock_irq()` or
> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.
> +
> +PREEMPT_RT also offers a
Re: [patch V3 00/20] Lock ordering documentation and annotation for lockdep
Davidlohr Bueso writes:
> On Sat, 21 Mar 2020, Thomas Gleixner wrote:
>
>> This is the third and hopefully final version of this work. The second one
>> can be found here:
>
> Would you rather I send in a separate series with the kvm changes, or
> should I just send a v2 with the fixes here again?

Send a separate series please. These nested threads are hard to follow.

Thanks,

        tglx
Re: [PATCH 00/18] genirq: Remove setup_irq()
Hi Thomas,

On Thu, Feb 27, 2020 at 04:37:13PM +0530, afzal mohammed wrote:
> On Thu, Feb 27, 2020 at 11:31:15AM +0100, Thomas Gleixner wrote:
>
> > Vs. merging this series, I suggest the following approach:
> >
> >  - Resubmit the individual changes as single patches or small series
> >    to the relevant maintainers and subsystem mailing lists. They have
> >    no dependency on a core change and can be applied where they belong
> >    to.
> >
> >  - After 5.6-rc6, verify which parts have made their way into
> >    linux-next and resubmit the ignored ones as a series to me along
> >    with the removal of the core parts.
> >
> > That way we can avoid conflicting changes between subsystems and the tip
> > irq/core branch as much as possible.
>
> Okay, i will do accordingly.

i am on it, is delayed due to the reason as mentioned at,

https://lkml.kernel.org/r/20200321172626.GA6323@afzalpc

[ not repeating contents here since other mail was sent just now, cc'ing you ]

Regards
afzal
Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
On Sat, Mar 21, 2020 at 11:26:06AM +0100, Thomas Gleixner wrote:
> "Paul E. McKenney" writes:
> > On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
> >> I agree that what I tried to express is hard to parse, but it's at least
> >> halfways correct :)
> >
> > Apologies! That is what I get for not looking it up in the source. :-/
> >
> > OK, so I am stupid enough not only to get it wrong, but also to try again:
> >
> >	... Other types of wakeups would normally unconditionally set the
> >	task state to RUNNING, but that does not work here because the task
> >	must remain blocked until the lock becomes available. Therefore,
> >	when a non-lock wakeup attempts to awaken a task blocked waiting
> >	for a spinlock, it instead sets the saved state to RUNNING. Then,
> >	when the lock acquisition completes, the lock wakeup sets the task
> >	state to the saved state, in this case setting it to RUNNING.
> >
> > Is that better?
>
> Definitely!
>
> Thanks for all the editorial work!

NP, and glad you like it! But I felt even more stupid sometime in the middle
of the night. Why on earth didn't I work in your nice examples? :-/ I will
pull them in later.

Time to go hike!!!

							Thanx, Paul
Re: [patch V3 00/20] Lock ordering documentation and annotation for lockdep
On Sat, 21 Mar 2020, Thomas Gleixner wrote:

>This is the third and hopefully final version of this work. The second one
>can be found here:

Would you rather I send in a separate series with the kvm changes, or
should I just send a v2 with the fixes here again?

Thanks,
Davidlohr
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.6-5 tag
The pull request you sent on Sat, 21 Mar 2020 23:54:54 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.6-5

has been merged into torvalds/linux.git:

https://git.kernel.org/torvalds/c/c63c50fc2ec9afc4de21ef9ead2eac64b178cce1

Thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
On Sat, Mar 21, 2020 at 8:08 PM Thomas Gleixner wrote:
>
> Guo Ren writes:
> > Tested and Acked by me.
> >
> > Queued for next pull request, thx
>
> Can we please route that with the rcuwait changes to avoid breakage
> unless you ship it to Linus right away?

Ok, I won't queue it.
Re: [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
Thomas Gleixner writes:

> From: Thomas Gleixner

That's obviously bogus and wants to be:

  From: Peter Zijlstra (Intel)
[GIT PULL] Please pull powerpc/linux.git powerpc-5.6-5 tag
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull a couple more powerpc fixes for 5.6:

The following changes since commit 59bee45b9712c759ea4d3dcc4eff1752f3a66558:

  powerpc/mm: Fix missing KUAP disable in flush_coherent_icache() (2020-03-05 17:15:08 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.6-5

for you to fetch changes up to 1d0c32ec3b860a32df593a22bad0d1dbc5546a59:

  KVM: PPC: Fix kernel crash with PR KVM (2020-03-20 13:39:10 +1100)

- --
powerpc fixes for 5.6 #5

Two fixes for bugs introduced this cycle. One for a crash when shutting
down a KVM PR guest (our original style of KVM which doesn't use
hypervisor mode). And one fix for the recently added 32-bit KASAN_VMALLOC
support.

Thanks to:
  Christophe Leroy, Greg Kurz, Sean Christopherson.

- --
Christophe Leroy (1):
      powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC

Greg Kurz (1):
      KVM: PPC: Fix kernel crash with PR KVM

 arch/powerpc/kvm/book3s_pr.c          | 1 +
 arch/powerpc/kvm/powerpc.c            | 2 --
 arch/powerpc/mm/kasan/kasan_init_32.c | 9 ++---
 3 files changed, 3 insertions(+), 9 deletions(-)

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl52BiYACgkQUevqPMjh
pYDwqhAAsiBYfVZiof+3JQM1Qaf2nnrYHMCLgwT6BJ6l8IxcjPih/4tcWAhe9plX
8xAa44leV0I+HpO0gxFOcfmPTtMmgdk8elUq8NEN7NBQgCb0vDyhjRBPuotwFqvx
U5Hiztra4c87NSL3A/h9JXxtRQBTw5Eg3NuxVPHV6seMtUhBkJThfYWFY7DWvolX
v8poLVZjO2Tq+QuEwJuC1lFTCqT1u1V+IdxEbNrBWy3yNqvb9p4gAlvB+0s/xe30
SFmTyNWUdW4KCE+8Vvbpgr9+s7m6CCvysjlmPra+m7pgFe+X6qcRFLWz+L+8MSSj
+mSmFuyI6KYOxRRY3Er4Q7gnssfiNlcxr5I/jvhiXWJNwS3inv4OinlEfV4pW/eT
++Uy7JCwqel1JZANh0oqcydeuGdU8emozsi5LuRKLB6Mm5mFymzYRc+W6MEpUwM6
b+2rPAagOwY44DN517Ixkc/jdad3BCiLeuDWqzx1BVUHgEKNvu3RWhrHClM1kEey
AaH6u1KGQ81XuKqYrODFaLkOoCumPzUgK42CcjblAQcea1PpYl5myCVNILLBQjJK
kTa1CzX9JG0z9mWLjQwJSpYKB7MOQxBGy0BmxOdjjUBD9DkhGm0Pt7NGz1Xo0n9s
hCSVtCV9M1cfaPyvCDICbDtR+j8SQTfjFk9fDwWcnrssbhtac3c=
=3KDP
-----END PGP SIGNATURE-----
FSL P5020/Cyrus+ Board: Poweroff and Restart Support
Hello,

We would like to add poweroff and restart support for the Cyrus+ board [1]
[2] to the mainline vanilla kernel. There is a patch for adding poweroff
and restart support. (attached) It works but I am not sure if it is good
enough for the mainline vanilla kernel.

Please post some suggestions and comments about this patch.

Thanks,
Christian

[1] http://wiki.amiga.org/index.php?title=X5000
[2] https://www.amigaos.net/hardware/133/amigaone-x5000

diff -rupN a/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts b/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts
--- a/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts	2020-02-10 01:08:48.0 +0100
+++ b/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts	2020-02-10 08:49:47.953680947 +0100
@@ -146,6 +146,25 @@
 			      0 0x0001>;
 		};
 	};
+
+	gpio-poweroff {
+		compatible = "gpio-poweroff";
+		gpios = < 3 1>;
+	};
+
+	gpio-restart {
+		compatible = "gpio-restart";
+		gpios = < 2 1>;
+	};
+
+	leds {
+		compatible = "gpio-leds";
+		hdd {
+			label = "Disk activity";
+			gpios = < 5 0>;
+			linux,default-trigger = "disk-activity";
+		};
+	};
 };

 /include/ "p5020si-post.dtsi"

diff -rupN a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
--- a/arch/powerpc/platforms/85xx/corenet_generic.c	2020-02-10 01:08:48.0 +0100
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c	2020-02-10 08:49:47.953680947 +0100
@@ -46,6 +46,16 @@ void __init corenet_gen_pic_init(void)
 	mpic_init(mpic);
 }

+/* If someone has registered a poweroff callback, invoke it */
+static void __noreturn corenet_generic_halt(void)
+{
+	if (pm_power_off)
+		pm_power_off();
+
+	/* Should not return */
+	for(;;);
+}
+
 /*
  * Setup the architecture
  */
@@ -99,6 +109,15 @@ static const struct of_device_id of_devi
 	{
 		.name = "handles",
 	},
+	{
+		.name = "gpio-poweroff",
+	},
+	{
+		.name = "gpio-restart",
+	},
+	{
+		.name = "leds",
+	},
 	{}
 };

@@ -149,6 +168,8 @@ static int __init corenet_generic_probe(
 	extern struct smp_ops_t smp_85xx_ops;
#endif

+	ppc_md.halt = corenet_generic_halt;
+
 	if (of_device_compatible_match(of_root, boards))
 		return 1;
Re: [patch V3 05/20] acpi: Remove header dependency
On Sat, Mar 21, 2020 at 1:34 PM Thomas Gleixner wrote:
>
> From: Peter Zijlstra
>
> In order to avoid future header hell, remove the inclusion of
> proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
> struct.
>

Acked-by: Andy Shevchenko # for PDx86

> Signed-off-by: Peter Zijlstra (Intel)
> Signed-off-by: Thomas Gleixner
> Cc: Darren Hart
> Cc: Andy Shevchenko
> Cc: platform-driver-...@vger.kernel.org
> Cc: Greg Kroah-Hartman
> Cc: Zhang Rui
> Cc: "Rafael J. Wysocki"
> Cc: linux...@vger.kernel.org
> Cc: Len Brown
> Cc: linux-a...@vger.kernel.org
> ---
>  drivers/platform/x86/dell-smo8800.c                      | 1 +
>  drivers/platform/x86/wmi.c                               | 1 +
>  drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 1 +
>  include/acpi/acpi_bus.h                                  | 2 +-
>  4 files changed, 4 insertions(+), 1 deletion(-)
>
> --- a/drivers/platform/x86/dell-smo8800.c
> +++ b/drivers/platform/x86/dell-smo8800.c
> @@ -16,6 +16,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  struct smo8800_device {
>         u32 irq;              /* acpi device irq */
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -29,6 +29,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  ACPI_MODULE_NAME("wmi");
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> @@ -19,6 +19,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "acpi_thermal_rel.h"
>
>  static acpi_handle acpi_thermal_rel_handle;
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
>
>  #ifdef CONFIG_ACPI
>
> -#include
> +struct proc_dir_entry;
>
>  #define ACPI_BUS_FILE_ROOT "acpi"
>  extern struct proc_dir_entry *acpi_root_dir;
>
> --

With Best Regards,
Andy Shevchenko
Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
Guo Ren writes:
> Tested and Acked by me.
>
> Queued for next pull request, thx

Can we please route that with the rcuwait changes to avoid breakage
unless you ship it to Linus right away?

Thanks,

        tglx
Re: [PATCH 1/3] KVM: PPC: Fix kernel crash with PR KVM
On Wed, 2020-03-18 at 17:43:30 UTC, Greg Kurz wrote:
> With PR KVM, shutting down a VM causes the host kernel to crash:
>
> [  314.219284] BUG: Unable to handle kernel data access on read at 0xc0080176c638
> [  314.219299] Faulting instruction address: 0xc00800d4ddb0
> cpu 0x0: Vector: 300 (Data Access) at [c0036da077a0]
>     pc: c00800d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr]
>     lr: c00800d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr]
>     sp: c0036da07a30
>    msr: 90010280b033
>    dar: c0080176c638
>  dsisr: 4000
>   current = 0xc0036d4c
>   paca    = 0xc1a0   irqmask: 0x03   irq_happened: 0x01
>     pid   = 1992, comm = qemu-system-ppc
> Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu
> 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020
> enter ? for help
> [c0036da07ab0] c00800d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr]
> [c0036da07ae0] c008009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm]
> [c0036da07b00] c008009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm]
> [c0036da07b30] c008009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm]
> [c0036da07b60] c008009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm]
> [c0036da07ba0] c008009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm]
> [c0036da07c00] c008009da760 kvm_vm_release+0x38/0x60 [kvm]
> [c0036da07c30] c0420be0 __fput+0xe0/0x310
> [c0036da07c90] c01747a0 task_work_run+0x150/0x1c0
> [c0036da07cf0] c014896c do_exit+0x44c/0xd00
> [c0036da07dc0] c01492f4 do_group_exit+0x64/0xd0
> [c0036da07e00] c0149384 sys_exit_group+0x24/0x30
> [c0036da07e20] c000b9d0 system_call+0x5c/0x68
>
> This is caused by a use-after-free in kvmppc_mmu_pte_flush_all()
> which dereferences vcpu->arch.book3s which was previously freed by
> kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy()
> is called after kvmppc_core_vcpu_free() since commit ff030fdf5573
> ("KVM: PPC: Move kvm_vcpu_init() invocation to common code").
>
> The kvmppc_mmu_destroy() helper calls one of the following depending
> on the KVM backend:
>
> - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV)
>
> - kvmppc_mmu_destroy_pr() which undoes the effects of
>   kvmppc_mmu_init() (Book3s PR 32-bit)
>
> - kvmppc_mmu_destroy_pr() which undoes the effects of
>   kvmppc_mmu_init() (Book3s PR 64-bit)
>
> - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc)
>
> It turns out that this is only relevant to PR KVM actually. And both
> 32 and 64 backends need vcpu->arch.book3s to be valid when calling
> kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy()
> from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the
> beginning of kvmppc_core_vcpu_free_pr(). This is consistent with
> kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr().
>
> For the same reason, if kvmppc_core_vcpu_create_pr() returns an
> error then this means that kvmppc_mmu_init() was either not called
> or failed, in which case kvmppc_mmu_destroy() should not be called.
> Drop the line in the error path of kvm_arch_vcpu_create().
>
> Fixes: ff030fdf5573 ("KVM: PPC: Move kvm_vcpu_init() invocation to common code")
> Signed-off-by: Greg Kurz

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/1d0c32ec3b860a32df593a22bad0d1dbc5546a59

cheers
Re: [PATCH v3] powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC
On Fri, 2020-03-06 at 16:49:49 UTC, Christophe Leroy wrote:
> With CONFIG_KASAN_VMALLOC, new page tables are created at the time
> shadow memory for vmalloc area is unmapped. If some parts of the
> page table still has entries to the zero page shadow memory, the
> entries are wrongly marked RW.
>
> With CONFIG_KASAN_VMALLOC, almost the entire kernel address space
> is managed by KASAN. To make it simple, just create KASAN page tables
> for the entire kernel space at kasan_init(). That doesn't use much
> more space, and that's anyway already done for hash platforms.
>
> Fixes: 3d4247fcc938 ("powerpc/32: Add support of KASAN_VMALLOC")
> Signed-off-by: Christophe Leroy

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/af3d0a68698c7e5df8b72267086b23422a3954bb

cheers
[patch V3 17/20] lockdep: Introduce wait-type checks
From: Peter Zijlstra Extend lockdep to validate lock wait-type context. The current wait-types are: LD_WAIT_FREE, /* wait free, rcu etc.. */ LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */ LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */ LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */ Where lockdep validates that the current lock (the one being acquired) fits in the current wait-context (as generated by the held stack). This ensures that there is no attempt to acquire mutexes while holding spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In other words, its a more fancy might_sleep(). Obviously RCU made the entire ordeal more complex than a simple single value test because RCU can be acquired in (pretty much) any context and while it presents a context to nested locks it is not the same as it got acquired in. Therefore its necessary to split the wait_type into two values, one representing the acquire (outer) and one representing the nested context (inner). For most 'normal' locks these two are the same. [ To make static initialization easier we have the rule that: .outer == INV means .outer == .inner; because INV == 0. ] It further means that its required to find the minimal .inner of the held stack to compare against the outer of the new lock; because while 'normal' RCU presents a CONFIG type to nested locks, if it is taken while already holding a SPIN type it obviously doesn't relax the rules. 
Below is an example output generated by the trivial test code: raw_spin_lock(); spin_lock(); spin_unlock(); raw_spin_unlock(); [ BUG: Invalid wait context ] - swapper/0/1 is trying to lock: c9013f20 (){}-{3:3}, at: kernel_init+0xdb/0x187 other info that might help us debug this: 1 lock held by swapper/0/1: #0: c9013ee0 (){+.+.}-{2:2}, at: kernel_init+0xd1/0x187 The way to read it is to look at the new -{n,m} part in the lock description; -{3:3} for the attempted lock, and try and match that up to the held locks, which in this case is the one: -{2,2}. This tells that the acquiring lock requires a more relaxed environment than presented by the lock stack. Currently only the normal locks and RCU are converted, the rest of the lockdep users defaults to .inner = INV which is ignored. More conversions can be done when desired. The check for spinlock_t nesting is not enabled by default. It's a separate config option for now as there are known problems which are currently addressed. The config option allows to identify these problems and to verify that the solutions found are indeed solving them. The config switch will be removed and the checks will permanently enabled once the vast majority of issues has been addressed. 
[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP] [ tglx: Add the config option ] Requested-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner --- V2: Fix the LOCKDEP=y && LOCK_PROVE=n case --- include/linux/irqflags.h|8 ++ include/linux/lockdep.h | 71 +--- include/linux/mutex.h |7 +- include/linux/rwlock_types.h|6 + include/linux/rwsem.h |6 + include/linux/sched.h |1 include/linux/spinlock.h| 35 +++--- include/linux/spinlock_types.h | 24 +- kernel/irq/handle.c |7 ++ kernel/locking/lockdep.c| 138 ++-- kernel/locking/mutex-debug.c|2 kernel/locking/rwsem.c |2 kernel/locking/spinlock_debug.c |6 - kernel/rcu/update.c | 24 +- lib/Kconfig.debug | 17 15 files changed, 307 insertions(+), 47 deletions(-) --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -37,7 +37,12 @@ # define trace_softirqs_enabled(p) ((p)->softirqs_enabled) # define trace_hardirq_enter() \ do { \ - current->hardirq_context++; \ + if (!current->hardirq_context++)\ + current->hardirq_threaded = 0; \ +} while (0) +# define trace_hardirq_threaded() \ +do { \ + current->hardirq_threaded = 1; \ } while (0) # define trace_hardirq_exit() \ do { \ @@ -59,6 +64,7 @@ do { \ # define trace_hardirqs_enabled(p) 0 # define trace_softirqs_enabled(p) 0 # define trace_hardirq_enter() do { } while (0) +# define trace_hardirq_threaded() do { } while (0) # define trace_hardirq_exit() do { } while (0) # define
[patch V3 16/20] completion: Use simple wait queues
From: Thomas Gleixner completion uses a wait_queue_head_t to enqueue waiters. wait_queue_head_t contains a spinlock_t to protect the list of waiters which excludes it from being used in truly atomic context on a PREEMPT_RT enabled kernel. The spinlock in the wait queue head cannot be replaced by a raw_spinlock because: - wait queues can have custom wakeup callbacks, which acquire other spinlock_t locks and have potentially long execution times - wake_up() walks an unbounded number of list entries during the wake up and may wake an unbounded number of waiters. For simplicity and performance reasons complete() should be usable on PREEMPT_RT enabled kernels. completions do not use custom wakeup callbacks and are usually single waiter, except for a few corner cases. Replace the wait queue in the completion with a simple wait queue (swait), which uses a raw_spinlock_t for protecting the waiter list and therefore is safe to use inside truly atomic regions on PREEMPT_RT. There is no semantical or functional change: - completions use the exclusive wait mode which is what swait provides - complete() wakes one exclusive waiter - complete_all() wakes all waiters while holding the lock which protects the wait queue against newly incoming waiters. The conversion to swait preserves this behaviour. complete_all() might cause unbound latencies with a large number of waiters being woken at once, but most complete_all() usage sites are either in testing or initialization code or have only a really small number of concurrent waiters which for now does not cause a latency problem. Keep it simple for now. The fixup of the warning check in the USB gadget driver is just a straight forward conversion of the lockless waiter check from one waitqueue type to the other. 
Signed-off-by: Thomas Gleixner Reviewed-by: Joel Fernandes (Google) Reviewed-by: Davidlohr Bueso Reviewed-by: Greg Kroah-Hartman Acked-by: Linus Torvalds --- V2: Split out the orinoco and usb gadget parts and amended change log --- drivers/usb/gadget/function/f_fs.c |2 +- include/linux/completion.h |8 kernel/sched/completion.c | 36 +++- 3 files changed, 24 insertions(+), 22 deletions(-) --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data pr_info("%s(): freeing\n", __func__); ffs_data_clear(ffs); BUG_ON(waitqueue_active(>ev.waitq) || - waitqueue_active(>ep0req_completion.wait) || + swait_active(>ep0req_completion.wait) || waitqueue_active(>wait)); destroy_workqueue(ffs->io_completion_wq); kfree(ffs->dev_name); --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -9,7 +9,7 @@ * See kernel/sched/completion.c for details. */ -#include +#include /* * struct completion - structure used to maintain state for a "completion" @@ -25,7 +25,7 @@ */ struct completion { unsigned int done; - wait_queue_head_t wait; + struct swait_queue_head wait; }; #define init_completion_map(x, m) __init_completion(x) @@ -34,7 +34,7 @@ static inline void complete_acquire(stru static inline void complete_release(struct completion *x) {} #define COMPLETION_INITIALIZER(work) \ - { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) } + { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) } #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \ (*({ init_completion_map(&(work), &(map)); &(work); })) @@ -85,7 +85,7 @@ static inline void complete_release(stru static inline void __init_completion(struct completion *x) { x->done = 0; - init_waitqueue_head(>wait); + init_swait_queue_head(>wait); } /** --- a/kernel/sched/completion.c +++ b/kernel/sched/completion.c @@ -29,12 +29,12 @@ void complete(struct completion *x) { unsigned long flags; - spin_lock_irqsave(>wait.lock, flags); + 
raw_spin_lock_irqsave(&x->wait.lock, flags);
 	if (x->done != UINT_MAX)
 		x->done++;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete);
 
@@ -58,10 +58,12 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	WARN_ON(irqs_disabled());
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_all_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete_all);
 
@@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x,
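The done-counter semantics this patch preserves (complete() adds one token and saturates at UINT_MAX, complete_all() pins the counter at UINT_MAX so woken waiters never consume it) can be sketched as a small user-space model. The names below are hypothetical; only the counter logic mirrors kernel/sched/completion.c, and all locking is omitted.

```c
#include <limits.h>
#include <stdbool.h>

/* Minimal model of the completion counter, locking omitted */
struct completion_model { unsigned int done; };

static void model_complete(struct completion_model *x)
{
	if (x->done != UINT_MAX)	/* mirrors the check in complete() */
		x->done++;
}

static void model_complete_all(struct completion_model *x)
{
	x->done = UINT_MAX;		/* woken waiters leave this untouched */
}

/* A waiter consumes one token, unless complete_all() was called */
static bool model_try_wait(struct completion_model *x)
{
	if (!x->done)
		return false;
	if (x->done != UINT_MAX)
		x->done--;
	return true;
}
```

This makes visible why complete_all() wakes every waiter: the counter never drops back to zero, so each waiter's check succeeds.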
[patch V3 09/20] ia64: Remove mm.h from asm/uaccess.h
From: Sebastian Andrzej Siewior

The defconfig compiles without linux/mm.h. With mm.h included the include
chain leads to:

| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/ia64/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Cc: Tony Luck
Cc: Fenghua Yu
Cc: linux-i...@vger.kernel.org
---
V3: New patch
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c | 1 +
 arch/ia64/mm/ioremap.c | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 #include
 #include
-#include
 #include
 #include

diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
+EXPORT_SYMBOL(ia64_delay_loop);

diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
-- 
2.26.0.rc2
[patch V3 18/20] lockdep: Add hrtimer context tracing bits
From: Sebastian Andrzej Siewior Set current->irq_config = 1 for hrtimers which are not marked to expire in hard interrupt context during hrtimer_init(). These timers will expire in softirq context on PREEMPT_RT. Setting this allows lockdep to differentiate these timers. If a timer is marked to expire in hard interrupt context then the timer callback is not supposed to acquire a regular spinlock instead of a raw_spinlock in the expiry callback. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner --- include/linux/irqflags.h | 15 +++ include/linux/sched.h|1 + kernel/locking/lockdep.c |2 +- kernel/time/hrtimer.c|6 +- 4 files changed, 22 insertions(+), 2 deletions(-) --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -56,6 +56,19 @@ do { \ do { \ current->softirq_context--; \ } while (0) + +# define lockdep_hrtimer_enter(__hrtimer) \ + do { \ + if (!__hrtimer->is_hard) \ + current->irq_config = 1;\ + } while (0) + +# define lockdep_hrtimer_exit(__hrtimer) \ + do { \ + if (!__hrtimer->is_hard) \ + current->irq_config = 0;\ + } while (0) + #else # define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_off() do { } while (0) @@ -68,6 +81,8 @@ do { \ # define trace_hardirq_exit() do { } while (0) # define lockdep_softirq_enter() do { } while (0) # define lockdep_softirq_exit()do { } while (0) +# define lockdep_hrtimer_enter(__hrtimer) do { } while (0) +# define lockdep_hrtimer_exit(__hrtimer) do { } while (0) #endif #if defined(CONFIG_IRQSOFF_TRACER) || \ --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -983,6 +983,7 @@ struct task_struct { unsigned intsoftirq_enable_event; int softirqs_enabled; int softirq_context; + int irq_config; #endif #ifdef CONFIG_LOCKDEP --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3759,7 +3759,7 @@ static int check_wait_context(struct tas /* * Check if force_irqthreads will run us threaded. 
*/ - if (curr->hardirq_threaded) + if (curr->hardirq_threaded || curr->irq_config) curr_inner = LD_WAIT_CONFIG; else curr_inner = LD_WAIT_SPIN; --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1404,7 +1404,7 @@ static void __hrtimer_init(struct hrtime base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0; base += hrtimer_clockid_to_base(clock_id); timer->is_soft = softtimer; - timer->is_hard = !softtimer; + timer->is_hard = !!(mode & HRTIMER_MODE_HARD); timer->base = _base->clock_base[base]; timerqueue_init(>node); } @@ -1514,7 +1514,11 @@ static void __run_hrtimer(struct hrtimer */ raw_spin_unlock_irqrestore(_base->lock, flags); trace_hrtimer_expire_entry(timer, now); + lockdep_hrtimer_enter(timer); + restart = fn(timer); + + lockdep_hrtimer_exit(timer); trace_hrtimer_expire_exit(timer); raw_spin_lock_irq(_base->lock);
[patch V3 01/20] PCI/switchtec: Fix init_completion race condition with poll_wait()
From: Logan Gunthorpe

The call to init_completion() in mrpc_queue_cmd() can theoretically race
with the call to poll_wait() in switchtec_dev_poll().

  poll()                                  write()
    switchtec_dev_poll()                    switchtec_dev_write()
      poll_wait(&stuser->comp.wait);          mrpc_queue_cmd()
                                                init_completion(&stuser->comp)
                                                  init_waitqueue_head(&stuser->comp.wait)

To my knowledge, no one has hit this bug.

Fix this by using reinit_completion() instead of init_completion() in
mrpc_queue_cmd().

Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
Reported-by: Sebastian Andrzej Siewior
Signed-off-by: Logan Gunthorpe
Signed-off-by: Thomas Gleixner
Acked-by: Bjorn Helgaas
Cc: Kurt Schwemmer
Cc: linux-...@vger.kernel.org
Link: https://lkml.kernel.org/r/20200313183608.2646-1-log...@deltatee.com
---
 drivers/pci/switch/switchtec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index a823b4b8ef8a..81dc7ac01381 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	init_completion(&stuser->comp);
+	reinit_completion(&stuser->comp);
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
 	mrpc_cmd_submit(stdev);
-- 
2.20.1
[patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event()
From: Peter Zijlstra (Intel) Extend rcuwait_wait_event() with a state variable so that it is not restricted to UNINTERRUPTIBLE waits. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Linus Torvalds Cc: Davidlohr Bueso --- include/linux/rcuwait.h | 12 ++-- kernel/locking/percpu-rwsem.c |2 +- 2 files changed, 11 insertions(+), 3 deletions(-) --- a/include/linux/rcuwait.h +++ b/include/linux/rcuwait.h @@ -3,6 +3,7 @@ #define _LINUX_RCUWAIT_H_ #include +#include /* * rcuwait provides a way of blocking and waking up a single @@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa * The caller is responsible for locking around rcuwait_wait_event(), * such that writes to @task are properly serialized. */ -#define rcuwait_wait_event(w, condition) \ +#define rcuwait_wait_event(w, condition, state) \ ({ \ + int __ret = 0; \ rcu_assign_pointer((w)->task, current); \ for (;;) { \ /* \ * Implicit barrier (A) pairs with (B) in \ * rcuwait_wake_up(). \ */ \ - set_current_state(TASK_UNINTERRUPTIBLE);\ + set_current_state(state); \ if (condition) \ break; \ \ + if (signal_pending_state(state, current)) { \ + __ret = -EINTR; \ + break; \ + } \ + \ schedule(); \ } \ \ WRITE_ONCE((w)->task, NULL);\ __set_current_state(TASK_RUNNING); \ + __ret; \ }) #endif /* _LINUX_RCUWAIT_H_ */ --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_ */ /* Wait for all now active readers to complete. */ - rcuwait_wait_event(>writer, readers_active_check(sem)); + rcuwait_wait_event(>writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE); } EXPORT_SYMBOL_GPL(percpu_down_write);
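The new exit path added by this patch (abort the wait with -EINTR when a signal is pending and the state allows it) can be sketched in user space. The model below is hypothetical and simplified: the real signal_pending_state() also handles TASK_WAKEKILL and fatal signals, which are omitted here.

```c
#include <stdbool.h>

/* Simplified model of the state bits and the new -EINTR exit path */
#define MODEL_TASK_INTERRUPTIBLE	0x0001
#define MODEL_TASK_UNINTERRUPTIBLE	0x0002
#define MODEL_EINTR			4

static bool model_signal_pending_state(unsigned int state, bool sig_pending)
{
	/* UNINTERRUPTIBLE waits ignore signals entirely */
	if (!(state & MODEL_TASK_INTERRUPTIBLE))
		return false;
	return sig_pending;
}

/* One iteration of the rcuwait_wait_event() loop body:
 * 1 = condition met, -MODEL_EINTR = abort, 0 = keep waiting */
static int model_wait_step(unsigned int state, bool condition, bool sig_pending)
{
	if (condition)
		return 1;
	if (model_signal_pending_state(state, sig_pending))
		return -MODEL_EINTR;
	return 0;	/* schedule() and retry in the real macro */
}
```

With state = TASK_UNINTERRUPTIBLE the behaviour is identical to the old macro, which is why the percpu-rwsem call site in the patch passes exactly that.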
[patch V3 03/20] usb: gadget: Use completion interface instead of open coding it
From: Thomas Gleixner

ep_io() uses a completion on stack and open codes the waiting with:

  wait_event_interruptible (done.wait, done.done);
and
  wait_event (done.wait, done.done);

This waits in non-exclusive mode for complete(), but there is no reason to
do so because the completion can only be waited for by the task itself and
complete() wakes exactly one exclusive waiter.

Replace the open coded implementation with the corresponding
wait_for_completion*() functions.

No functional change.

Reported-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Reviewed-by: Greg Kroah-Hartman
Cc: Felipe Balbi
Cc: linux-...@vger.kernel.org
---
V2: New patch to avoid the conversion to swait interfaces later
---
 drivers/usb/gadget/legacy/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf
 	spin_unlock_irq (&epdata->dev->lock);
 
 	if (likely (value == 0)) {
-		value = wait_event_interruptible (done.wait, done.done);
+		value = wait_for_completion_interruptible(&done);
 		if (value != 0) {
 			spin_lock_irq (&epdata->dev->lock);
 			if (likely (epdata->ep != NULL)) {
@@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf
 				usb_ep_dequeue (epdata->ep, epdata->req);
 				spin_unlock_irq (&epdata->dev->lock);
-				wait_event (done.wait, done.done);
+				wait_for_completion(&done);
 				if (epdata->status == -ECONNRESET)
 					epdata->status = -EINTR;
 			} else {
[patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h
From: Sebastian Andrzej Siewior

The defconfig compiles without linux/mm.h. With mm.h included the include
chain leads to:

| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/microblaze/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Cc: Michal Simek
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include
-#include
 #include
 #include
-- 
2.26.0.rc2
[patch V3 19/20] lockdep: Annotate irq_work
From: Sebastian Andrzej Siewior Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be invoked in softirq context on PREEMPT_RT. Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq context so lockdep knows that these can safely acquire a spinlock_t. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner --- include/linux/irq_work.h |2 ++ include/linux/irqflags.h | 13 + kernel/irq_work.c|2 ++ kernel/rcu/tree.c|1 + kernel/time/tick-sched.c |1 + 5 files changed, 19 insertions(+) --- a/include/linux/irq_work.h +++ b/include/linux/irq_work.h @@ -18,6 +18,8 @@ /* Doesn't want IPI, wait for tick: */ #define IRQ_WORK_LAZY BIT(2) +/* Run hard IRQ context, even on RT */ +#define IRQ_WORK_HARD_IRQ BIT(3) #define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY) --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -69,6 +69,17 @@ do { \ current->irq_config = 0;\ } while (0) +# define lockdep_irq_work_enter(__work) \ + do { \ + if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\ + current->irq_config = 1;\ + } while (0) +# define lockdep_irq_work_exit(__work) \ + do { \ + if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\ + current->irq_config = 0;\ + } while (0) + #else # define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_off() do { } while (0) @@ -83,6 +94,8 @@ do { \ # define lockdep_softirq_exit()do { } while (0) # define lockdep_hrtimer_enter(__hrtimer) do { } while (0) # define lockdep_hrtimer_exit(__hrtimer) do { } while (0) +# define lockdep_irq_work_enter(__work)do { } while (0) +# define lockdep_irq_work_exit(__work) do { } while (0) #endif #if defined(CONFIG_IRQSOFF_TRACER) || \ --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -153,7 +153,9 @@ static void irq_work_run_list(struct lli */ flags = atomic_fetch_andnot(IRQ_WORK_PENDING, >flags); + lockdep_irq_work_enter(work); work->func(work); + 
lockdep_irq_work_exit(work); /* * Clear the BUSY bit and return to the free state if * no-one else claimed it meanwhile. --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1113,6 +1113,7 @@ static int rcu_implicit_dynticks_qs(stru !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq && (rnp->ffmask & rdp->grpmask)) { init_irq_work(>rcu_iw, rcu_iw_handler); + atomic_set(>rcu_iw.flags, IRQ_WORK_HARD_IRQ); rdp->rcu_iw_pending = true; rdp->rcu_iw_gp_seq = rnp->gp_seq; irq_work_queue_on(>rcu_iw, rdp->cpu); --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct i static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = { .func = nohz_full_kick_func, + .flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ), }; /*
[patch V3 05/20] acpi: Remove header dependency
From: Peter Zijlstra In order to avoid future header hell, remove the inclusion of proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a struct. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Darren Hart Cc: Andy Shevchenko Cc: platform-driver-...@vger.kernel.org Cc: Greg Kroah-Hartman Cc: Zhang Rui Cc: "Rafael J. Wysocki" Cc: linux...@vger.kernel.org Cc: Len Brown Cc: linux-a...@vger.kernel.org --- drivers/platform/x86/dell-smo8800.c |1 + drivers/platform/x86/wmi.c |1 + drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |1 + include/acpi/acpi_bus.h |2 +- 4 files changed, 4 insertions(+), 1 deletion(-) --- a/drivers/platform/x86/dell-smo8800.c +++ b/drivers/platform/x86/dell-smo8800.c @@ -16,6 +16,7 @@ #include #include #include +#include struct smo8800_device { u32 irq; /* acpi device irq */ --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -29,6 +29,7 @@ #include #include #include +#include #include ACPI_MODULE_NAME("wmi"); --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "acpi_thermal_rel.h" static acpi_handle acpi_thermal_rel_handle; --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c #ifdef CONFIG_ACPI -#include +struct proc_dir_entry; #define ACPI_BUS_FILE_ROOT "acpi" extern struct proc_dir_entry *acpi_root_dir;
[patch V3 13/20] Documentation: Add lock ordering and nesting documentation
From: Thomas Gleixner The kernel provides a variety of locking primitives. The nesting of these lock types and the implications of them on RT enabled kernels is nowhere documented. Add initial documentation. Signed-off-by: Thomas Gleixner Cc: "Paul E . McKenney" Cc: Jonathan Corbet Cc: Davidlohr Bueso Cc: Randy Dunlap --- V3: Addressed review comments from Paul, Jonathan, Davidlohr V2: Addressed review comments from Randy --- Documentation/locking/index.rst |1 Documentation/locking/locktypes.rst | 299 2 files changed, 300 insertions(+) create mode 100644 Documentation/locking/locktypes.rst --- a/Documentation/locking/index.rst +++ b/Documentation/locking/index.rst @@ -7,6 +7,7 @@ locking .. toctree:: :maxdepth: 1 +locktypes lockdep-design lockstat locktorture --- /dev/null +++ b/Documentation/locking/locktypes.rst @@ -0,0 +1,299 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _kernel_hacking_locktypes: + +== +Lock types and their rules +== + +Introduction + + +The kernel provides a variety of locking primitives which can be divided +into two categories: + + - Sleeping locks + - Spinning locks + +This document conceptually describes these lock types and provides rules +for their nesting, including the rules for use under PREEMPT_RT. + + +Lock categories +=== + +Sleeping locks +-- + +Sleeping locks can only be acquired in preemptible task context. + +Although implementations allow try_lock() from other contexts, it is +necessary to carefully evaluate the safety of unlock() as well as of +try_lock(). Furthermore, it is also necessary to evaluate the debugging +versions of these primitives. In short, don't acquire sleeping locks from +other contexts unless there is no other option. 
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
+
+
+rtmutex
+=======
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+interrupt disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+=============================
+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
+core code, low level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+ +On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t +and has exactly the same semantics. + +spinlock_t and PREEMPT_RT +- + +On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate +implementation based on rt_mutex which changes the semantics: + + - Preemption is not disabled + + - The hard interrupt related suffixes for spin_lock / spin_unlock + operations (_irq, _irqsave / _irqrestore) do not affect the CPUs + interrupt disabled state + + - The soft interrupt related suffix (_bh()) still disables softirq + handlers. + + Non-PREEMPT_RT kernels disable preemption to get this effect. + + PREEMPT_RT kernels use a per-CPU lock for serialization which keeps + preemption disabled. The lock disables softirq handlers and also + prevents reentrancy due to task preemption. + +PREEMPT_RT kernels preserve all other spinlock_t semantics: + + - Tasks holding a
[patch V3 14/20] timekeeping: Split jiffies seqlock
From: Thomas Gleixner seqlock consists of a sequence counter and a spinlock_t which is used to serialize the writers. spinlock_t is substituted by a "sleeping" spinlock on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping code as the writers are executed in hard interrupt and therefore non-preemptible context even on PREEMPT_RT. The spinlock in seqlock cannot be unconditionally replaced by a raw_spinlock_t as many seqlock users have nesting spinlock sections or other code which is not suitable to run in truly atomic context on RT. Instead of providing a raw_seqlock API for a single use case, open code the seqlock for the jiffies use case and implement it with a raw_spinlock_t and a sequence counter. Signed-off-by: Thomas Gleixner --- kernel/time/jiffies.c |7 --- kernel/time/tick-common.c | 10 ++ kernel/time/tick-sched.c | 19 --- kernel/time/timekeeping.c |6 -- kernel/time/timekeeping.h |3 ++- 5 files changed, 28 insertions(+), 17 deletions(-) --- a/kernel/time/jiffies.c +++ b/kernel/time/jiffies.c @@ -58,7 +58,8 @@ static struct clocksource clocksource_ji .max_cycles = 10, }; -__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock); +__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock); +__cacheline_aligned_in_smp seqcount_t jiffies_seq; #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void) @@ -67,9 +68,9 @@ u64 get_jiffies_64(void) u64 ret; do { - seq = read_seqbegin(_lock); + seq = read_seqcount_begin(_seq); ret = jiffies_64; - } while (read_seqretry(_lock, seq)); + } while (read_seqcount_retry(_seq, seq)); return ret; } EXPORT_SYMBOL(get_jiffies_64); --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -84,13 +84,15 @@ int tick_is_oneshot_available(void) static void tick_periodic(int cpu) { if (tick_do_timer_cpu == cpu) { - write_seqlock(_lock); + raw_spin_lock(_lock); + write_seqcount_begin(_seq); /* Keep track of the next tick event */ tick_next_period = ktime_add(tick_next_period, tick_period); do_timer(1); - 
write_sequnlock(_lock); + write_seqcount_end(_seq); + raw_spin_unlock(_lock); update_wall_time(); } @@ -162,9 +164,9 @@ void tick_setup_periodic(struct clock_ev ktime_t next; do { - seq = read_seqbegin(_lock); + seq = read_seqcount_begin(_seq); next = tick_next_period; - } while (read_seqretry(_lock, seq)); + } while (read_seqcount_retry(_seq, seq)); clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT); --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -65,7 +65,8 @@ static void tick_do_update_jiffies64(kti return; /* Reevaluate with jiffies_lock held */ - write_seqlock(_lock); + raw_spin_lock(_lock); + write_seqcount_begin(_seq); delta = ktime_sub(now, last_jiffies_update); if (delta >= tick_period) { @@ -91,10 +92,12 @@ static void tick_do_update_jiffies64(kti /* Keep the tick_next_period variable up to date */ tick_next_period = ktime_add(last_jiffies_update, tick_period); } else { - write_sequnlock(_lock); + write_seqcount_end(_seq); + raw_spin_unlock(_lock); return; } - write_sequnlock(_lock); + write_seqcount_end(_seq); + raw_spin_unlock(_lock); update_wall_time(); } @@ -105,12 +108,14 @@ static ktime_t tick_init_jiffy_update(vo { ktime_t period; - write_seqlock(_lock); + raw_spin_lock(_lock); + write_seqcount_begin(_seq); /* Did we start the jiffies update yet ? 
*/ if (last_jiffies_update == 0) last_jiffies_update = tick_next_period; period = last_jiffies_update; - write_sequnlock(_lock); + write_seqcount_end(_seq); + raw_spin_unlock(_lock); return period; } @@ -676,10 +681,10 @@ static ktime_t tick_nohz_next_event(stru /* Read jiffies and the time when jiffies were updated last */ do { - seq = read_seqbegin(_lock); + seq = read_seqcount_begin(_seq); basemono = last_jiffies_update; basejiff = jiffies; - } while (read_seqretry(_lock, seq)); + } while (read_seqcount_retry(_seq, seq)); ts->last_jiffies = basejiff; ts->timer_expires_base = basemono; --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2397,8 +2397,10 @@ EXPORT_SYMBOL(hardpps); */ void xtime_update(unsigned long ticks) { - write_seqlock(_lock); +
[patch V3 04/20] orinoco_usb: Use the regular completion interfaces
From: Thomas Gleixner The completion usage in this driver is interesting: - it uses a magic complete function which according to the comment was implemented by invoking complete() four times in a row because complete_all() was not exported at that time. - it uses an open coded wait/poll which checks completion:done. Only one wait side (device removal) uses the regular wait_for_completion() interface. The rationale behind this is to prevent that wait_for_completion() consumes completion::done which would prevent that all waiters are woken. This is not necessary with complete_all() as that sets completion::done to UINT_MAX which is left unmodified by the woken waiters. Replace the magic complete function with complete_all() and convert the open coded wait/poll to regular completion interfaces. This changes the wait to exclusive wait mode. But that does not make any difference because the wakers use complete_all() which ignores the exclusive mode. Reported-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Cc: Kalle Valo Cc: "David S. Miller" Cc: linux-wirel...@vger.kernel.org Cc: net...@vger.kernel.org Cc: linux-...@vger.kernel.org --- V2: New patch to avoid conversion to swait functions later. --- drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 21 1 file changed, 5 insertions(+), 16 deletions(-) --- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c +++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c @@ -365,17 +365,6 @@ static struct request_context *ezusb_all return ctx; } - -/* Hopefully the real complete_all will soon be exported, in the mean - * while this should work. 
*/
-static inline void ezusb_complete_all(struct completion *comp)
-{
-	complete(comp);
-	complete(comp);
-	complete(comp);
-	complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
 	struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re
 			netif_wake_queue(dev);
 		}
 
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re
 		/* This is normal, as all request contexts get flushed
 		 * when the device is disconnected */
 		err("Called, CTX not terminating, but device gone");
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 	}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez
 		 * get the chance to run themselves. So we make sure
 		 * that we don't sleep for ever */
 		int msecs = DEF_TIMEOUT * (1000 / HZ);
-		while (!ctx->done.done && msecs--)
+
+		while (!try_wait_for_completion(&ctx->done) && msecs--)
 			udelay(1000);
 	} else {
-		wait_event_interruptible(ctx->done.wait,
-					 ctx->done.done);
+		wait_for_completion(&ctx->done);
 	}
 	break;
 default:
[patch V3 15/20] sched/swait: Prepare usage in completions
From: Thomas Gleixner As a preparation to use simple wait queues for completions: - Provide swake_up_all_locked() to support complete_all() - Make __prepare_to_swait() public available This is done to enable the usage of complete() within truly atomic contexts on a PREEMPT_RT enabled kernel. Signed-off-by: Thomas Gleixner Cc: Linus Torvalds --- V2: Add comment to swake_up_all_locked() --- kernel/sched/sched.h |3 +++ kernel/sched/swait.c | 15 ++- 2 files changed, 17 insertions(+), 1 deletion(-) --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2492,3 +2492,6 @@ static inline bool is_per_cpu_kthread(st return true; } #endif + +void swake_up_all_locked(struct swait_queue_head *q); +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); --- a/kernel/sched/swait.c +++ b/kernel/sched/swait.c @@ -32,6 +32,19 @@ void swake_up_locked(struct swait_queue_ } EXPORT_SYMBOL(swake_up_locked); +/* + * Wake up all waiters. This is an interface which is solely exposed for + * completions and not for general usage. + * + * It is intentionally different from swake_up_all() to allow usage from + * hard interrupt context and interrupt disabled regions. + */ +void swake_up_all_locked(struct swait_queue_head *q) +{ + while (!list_empty(>task_list)) + swake_up_locked(q); +} + void swake_up_one(struct swait_queue_head *q) { unsigned long flags; @@ -69,7 +82,7 @@ void swake_up_all(struct swait_queue_hea } EXPORT_SYMBOL(swake_up_all); -static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) { wait->task = current; if (list_empty(>task_list))
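The wake-all-by-repeated-wake-one loop in swake_up_all_locked() can be modelled in user space with a plain singly-linked list. The names are hypothetical; only the drain structure mirrors the kernel code, and the lock that the caller is assumed to hold is omitted.

```c
#include <stddef.h>

/* A waiter parked on the queue; woken stands in for wake_up_process() */
struct model_waiter {
	struct model_waiter *next;
	int woken;
};

struct model_queue {
	struct model_waiter *head;
};

/* Wake exactly one waiter, like swake_up_locked() */
static void model_wake_one(struct model_queue *q)
{
	struct model_waiter *w = q->head;

	if (!w)
		return;
	q->head = w->next;	/* list_del_init() in the real code */
	w->woken = 1;		/* wake_up_process() in the real code */
}

/* complete_all() path: drain the list while the caller holds the lock,
 * so no new waiter can slip in between the individual wakeups */
static void model_wake_all(struct model_queue *q)
{
	while (q->head)
		model_wake_all_step: model_wake_one(q);
}
```

Because the whole drain happens under the queue lock, this differs from swake_up_all(), which drops the lock between wakeups and is therefore unusable from interrupt-disabled regions.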
[patch V3 08/20] hexagon: Remove mm.h from asm/uaccess.h
From: Sebastian Andrzej Siewior

The defconfig compiles without linux/mm.h. With mm.h included the include
chain leads to:

| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/hexagon/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Cc: Brian Cain
Cc: linux-hexa...@vger.kernel.org
---
V3: New patch
---
 arch/hexagon/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index 00cb38faad0c4..c1019a736ff13 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -10,7 +10,6 @@
 /*
  * User space memory access functions
  */
-#include
 #include
-- 
2.26.0.rc2
[patch V3 02/20] pci/switchtec: Replace completion wait queue usage for poll
From: Sebastian Andrzej Siewior

The poll callback is using the completion wait queue and sticks it into
poll_wait() to wake up pollers after a command has completed.

This works to some extent, but cannot provide EPOLLEXCLUSIVE support
because the waker side uses complete_all() which unconditionally wakes up
all waiters. complete_all() is required because completions internally use
exclusive wait and complete() only wakes up one waiter by default.

This mixes conceptually different mechanisms and relies on internal
implementation details of completions, which in turn puts constraints on
changing the internal implementation of completions.

Replace it with a regular wait queue and store the state in struct
switchtec_user.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Reviewed-by: Logan Gunthorpe
Acked-by: Peter Zijlstra (Intel)
Acked-by: Bjorn Helgaas
Cc: Kurt Schwemmer
Cc: linux-...@vger.kernel.org
---
V2: Reworded changelog.
---
 drivers/pci/switch/switchtec.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -52,10 +52,11 @@ struct switchtec_user {

	enum mrpc_state state;

-	struct completion comp;
+	wait_queue_head_t cmd_comp;
	struct kref kref;
	struct list_head list;

+	bool cmd_done;
	u32 cmd;
	u32 status;
	u32 return_code;
@@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
	stuser->stdev = stdev;
	kref_init(&stuser->kref);
	INIT_LIST_HEAD(&stuser->list);
-	init_completion(&stuser->comp);
+	init_waitqueue_head(&stuser->cmd_comp);
	stuser->event_cnt = atomic_read(&stdev->event_cnt);

	dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
@@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
	kref_get(&stuser->kref);
	stuser->read_len = sizeof(stuser->data);
	stuser_set_state(stuser, MRPC_QUEUED);
-	reinit_completion(&stuser->comp);
+	stuser->cmd_done = false;
	list_add_tail(&stuser->list, &stdev->mrpc_queue);

	mrpc_cmd_submit(stdev);
@@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
			      stuser->read_len);
out:
-	complete_all(&stuser->comp);
+	stuser->cmd_done = true;
+	wake_up_interruptible(&stuser->cmd_comp);
	list_del_init(&stuser->list);
	stuser_put(stuser);
	stdev->mrpc_busy = 0;
@@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
	mutex_unlock(&stdev->mrpc_mutex);

	if (filp->f_flags & O_NONBLOCK) {
-		if (!try_wait_for_completion(&stuser->comp))
+		if (!stuser->cmd_done)
			return -EAGAIN;
	} else {
-		rc = wait_for_completion_interruptible(&stuser->comp);
+		rc = wait_event_interruptible(stuser->cmd_comp,
+					      stuser->cmd_done);
		if (rc < 0)
			return rc;
	}
@@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
	struct switchtec_dev *stdev = stuser->stdev;
	__poll_t ret = 0;

-	poll_wait(filp, &stuser->comp.wait, wait);
+	poll_wait(filp, &stuser->cmd_comp, wait);
	poll_wait(filp, &stdev->event_wq, wait);

	if (lock_mutex_and_test_alive(stdev))
@@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc

	mutex_unlock(&stdev->mrpc_mutex);

-	if (try_wait_for_completion(&stuser->comp))
+	if (stuser->cmd_done)
		ret |= EPOLLIN | EPOLLRDNORM;

	if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
@@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_

	/* Wake up and kill any users waiting on an MRPC request */
	list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
-		complete_all(&stuser->comp);
+		stuser->cmd_done = true;
+		wake_up_interruptible(&stuser->cmd_comp);
		list_del_init(&stuser->list);
		stuser_put(stuser);
	}
[patch V3 06/20] nds32: Remove mm.h from asm/uaccess.h
From: Sebastian Andrzej Siewior

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:

| CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/nds32/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Cc: Nick Hu
Cc: Greentime Hu
Cc: Vincent Chen
---
V3: New patch
---
 arch/nds32/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 8916ad9f9f139..3a9219f53ee0d 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include
 #include
 #include
-#include <linux/mm.h>

 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"

--
2.26.0.rc2
[patch V3 07/20] csky: Remove mm.h from asm/uaccess.h
From: Sebastian Andrzej Siewior

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:

| CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/csky/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Cc: Guo Ren
Cc: linux-c...@vger.kernel.org
---
V3: New patch
---
 arch/csky/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c3403a424..abefa125b93cf 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include
 #include
 #include
-#include <linux/mm.h>
 #include
 #include
 #include

--
2.26.0.rc2
[patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
From: Thomas Gleixner

The PS3 notification interrupt and kthread use a hacked up completion to
communicate. Since we're wanting to change the completion implementation
and this is abuse anyway, replace it with a simple rcuwait since there is
only ever the one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg,
kthreads cannot receive signals by default and this one doesn't look
different. Use TASK_IDLE instead.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Michael Ellerman
Cc: Arnd Bergmann
Cc: Geoff Levand
Cc: linuxppc-dev@lists.ozlabs.org
---
V3: Folded the init fix from bigeasy
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/rcuwait.h>
 #include
 #include
@@ -670,7 +671,8 @@ struct ps3_notification_device {
	spinlock_t lock;
	u64 tag;
	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };

 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
			 __LINE__, status);
		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
	}
	spin_unlock(&dev->lock);
	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
	unsigned long flags;
	int res;

-	init_completion(&dev->done);
	spin_lock_irqsave(&dev->lock, flags);
	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
					&dev->tag)
		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
				       &dev->tag);
+	dev->done = false;
	spin_unlock_irqrestore(&dev->lock, flags);
	if (res) {
		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
	}
	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);

-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(),
+			   TASK_IDLE);
+
	if (kthread_should_stop())
		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}

	if (dev->lv1_status) {
		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
	}

	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);

	res = request_irq(irq, ps3_notification_interrupt, 0,
			  "ps3_notification", &dev);
[patch V3 00/20] Lock ordering documentation and annotation for lockdep
This is the third and hopefully final version of this work. The second one
can be found here:

  https://lore.kernel.org/r/20200318204302.693307...@linutronix.de

Changes since V2:

  - Included the arch/XXX fixups for the rcuwait changes (Sebastian)
  - Folded the init fix for the PS3 change (Sebastian)
  - Addressed feedback on documentation (Paul, Davidlohr, Jonathan)
  - Picked up acks and reviewed tags

Thanks,

	tglx
[patch V3 20/20] lockdep: Add posixtimer context tracing bits
From: Sebastian Andrzej Siewior

Splitting run_posix_cpu_timers() into two parts is work in progress which
is stuck on other entry code related problems. The heavy lifting which
involves locking of sighand lock will be moved into task context so the
necessary execution time is burdened on the task and not on interrupt
context.

Until this work completes lockdep with the spinlock nesting rules enabled
would emit warnings for this known context.

Prevent it by setting "->irq_config = 1" for the invocation of
run_posix_cpu_timers() so lockdep does not complain when sighand lock is
acquired. This will be removed once the split is completed.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
---
 include/linux/irqflags.h       | 12
 kernel/time/posix-cpu-timers.c |  6 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,16 @@
 do {						\
	current->irq_config = 0;		\
 } while (0)

+# define lockdep_posixtimer_enter()		\
+	  do {					\
+		  current->irq_config = 1;	\
+	  } while (0)
+
+# define lockdep_posixtimer_exit()		\
+	  do {					\
+		  current->irq_config = 0;	\
+	  } while (0)
+
 # define lockdep_irq_work_enter(__work)		\
	  do {					\
		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
@@ -94,6 +104,8 @@
 # define lockdep_softirq_exit()			do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)	do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)	do { } while (0)
+# define lockdep_posixtimer_enter()		do { } while (0)
+# define lockdep_posixtimer_exit()		do { } while (0)
 # define lockdep_irq_work_enter(__work)		do { } while (0)
 # define lockdep_irq_work_exit(__work)		do { } while (0)
 #endif

--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1126,8 +1126,11 @@ void run_posix_cpu_timers(void)
	if (!fastpath_timer_check(tsk))
		return;

-	if (!lock_task_sighand(tsk, &flags))
+	lockdep_posixtimer_enter();
+	if (!lock_task_sighand(tsk, &flags)) {
+		lockdep_posixtimer_exit();
		return;
+	}
	/*
	 * Here we take off tsk->signal->cpu_timers[N] and
	 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1169,6 +1172,7 @@ void run_posix_cpu_timers(void)
		cpu_timer_fire(timer);
		spin_unlock(&timer->it_lock);
	}
+	lockdep_posixtimer_exit();
 }

 /*
Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
Tested and Acked by me. Queued for next pull request, thx

On Fri, Mar 20, 2020 at 5:49 PM Sebastian Andrzej Siewior wrote:
>
> The defconfig compiles without linux/mm.h. With mm.h included the
> include chain leads to:
> | CC      kernel/locking/percpu-rwsem.o
> | In file included from include/linux/huge_mm.h:8,
> |                  from include/linux/mm.h:567,
> |                  from arch/csky/include/asm/uaccess.h:,
> |                  from include/linux/uaccess.h:11,
> |                  from include/linux/sched/task.h:11,
> |                  from include/linux/sched/signal.h:9,
> |                  from include/linux/rcuwait.h:6,
> |                  from include/linux/percpu-rwsem.h:8,
> |                  from kernel/locking/percpu-rwsem.c:6:
> | include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
> |  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
>
> once rcuwait.h includes linux/sched/signal.h.
>
> Remove the linux/mm.h include.
>
> Cc: Guo Ren
> Cc: linux-c...@vger.kernel.org
> Reported-by: kbuild test robot
> Signed-off-by: Sebastian Andrzej Siewior
> ---
>  arch/csky/include/asm/uaccess.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
> index eaa1c3403a424..abefa125b93cf 100644
> --- a/arch/csky/include/asm/uaccess.h
> +++ b/arch/csky/include/asm/uaccess.h
> @@ -11,7 +11,6 @@
>  #include
>  #include
>  #include
> -#include <linux/mm.h>
>  #include
>  #include
>  #include
> --
> 2.26.0.rc2

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
Re: [PATCH 07/10] ARM: dts: tango4: Make /serial compatible with ns16550a
Lubomir Rintel writes:
> ralink,rt2880-uart is compatible with ns16550a and all other
> instances of RT2880 UART nodes include it in the compatible property.
> Add it also here, to make the binding schema simpler.
>
> Signed-off-by: Lubomir Rintel

Acked-by: Mans Rullgard

> ---
>  arch/arm/boot/dts/tango4-common.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/tango4-common.dtsi b/arch/arm/boot/dts/tango4-common.dtsi
> index ff72a8efb73d0..54fd522badfc9 100644
> --- a/arch/arm/boot/dts/tango4-common.dtsi
> +++ b/arch/arm/boot/dts/tango4-common.dtsi
> @@ -85,7 +85,7 @@ tick-counter@10048 {
>  	};
>
>  	uart: serial@10700 {
> -		compatible = "ralink,rt2880-uart";
> +		compatible = "ralink,rt2880-uart", "ns16550a";
>  		reg = <0x10700 0x30>;
>  		interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
>  		clock-frequency = <7372800>;
> --
> 2.25.1

--
Måns Rullgård
Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
Christoph Hellwig writes:
> On Wed, Mar 18, 2020 at 09:43:09PM +0100, Thomas Gleixner wrote:
>> The PS3 notification interrupt and kthread use a hacked up completion to
>> communicate. Since we're wanting to change the completion implementation
>> and this is abuse anyway, replace it with a simple rcuwait since there is
>> only ever the one waiter.
>>
>> AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg,
>> kthreads cannot receive signals by default and this one doesn't look
>> different. Use TASK_IDLE instead.
>
> I think the right fix here is to just convert the thing to a threaded
> interrupt handler and kill off the stupid kthread.

That'd be a major surgery.

> But I wonder how alive the whole PS3 support is to start with..

There seem to be a few enthusiasts left who have an Other-OS capable PS3.

Thanks,

	tglx
[PATCH] powerpc/ptrace: Fix ptrace-hwbreak selftest failure
Pieces of commit c3f68b0478e7 ("powerpc/watchpoint: Fix ptrace code that
muck around with address/len") disappeared with commit 53387e67d003
("powerpc/ptrace: split out ADV_DEBUG_REGS related functions.").
Restore them.

Fixes: 53387e67d003 ("powerpc/ptrace: split out ADV_DEBUG_REGS related functions.")
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index cf05fadba0d5..d4170932acb4 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -153,7 +153,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf
	if ((unsigned long)bp_info->addr >= TASK_SIZE)
		return -EIO;

-	brk.address = bp_info->addr & ~7UL;
+	brk.address = bp_info->addr & ~HW_BREAKPOINT_ALIGN;
	brk.type = HW_BRK_TYPE_TRANSLATE;
	brk.len = DABR_MAX_LEN;
	if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
@@ -161,10 +161,6 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf
	if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
		brk.type |= HW_BRK_TYPE_WRITE;
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-	/*
-	 * Check if the request is for 'range' breakpoints. We can
-	 * support it if range < 8 bytes.
-	 */
	if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE)
		len = bp_info->addr2 - bp_info->addr;
	else if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_EXACT)
@@ -177,7 +173,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf

	/* Create a new breakpoint request if one doesn't exist already */
	hw_breakpoint_init(&attr);
-	attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
+	attr.bp_addr = (unsigned long)bp_info->addr;
	attr.bp_len = len;

	arch_bp_generic_fields(brk.type, &attr.bp_type);
--
2.25.0
Re: [PATCH v5 10/13] powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
On 03/20/2020 02:12 AM, Michael Ellerman wrote:
> Christophe Leroy writes:
>> Move ADV_DEBUG_REGS functions out of ptrace.c, into ptrace-adv.c and
>> ptrace-noadv.c
>>
>> Signed-off-by: Christophe Leroy
>> ---
>> v4: Leave hw_breakpoint.h for ptrace.c
>> ---
>>  arch/powerpc/kernel/ptrace/Makefile       |   4 +
>>  arch/powerpc/kernel/ptrace/ptrace-adv.c   | 468 
>>  arch/powerpc/kernel/ptrace/ptrace-decl.h  |   5 +
>>  arch/powerpc/kernel/ptrace/ptrace-noadv.c | 236 
>>  arch/powerpc/kernel/ptrace/ptrace.c       | 650 --
>>  5 files changed, 713 insertions(+), 650 deletions(-)
>>  create mode 100644 arch/powerpc/kernel/ptrace/ptrace-adv.c
>>  create mode 100644 arch/powerpc/kernel/ptrace/ptrace-noadv.c
>
> This is somehow breaking the ptrace-hwbreak selftest on Power8:
>
>   test: ptrace-hwbreak
>   tags: git_version:v5.6-rc6-892-g7a285a6067d6
>   PTRACE_SET_DEBUGREG, WO, len: 1: Ok
>   PTRACE_SET_DEBUGREG, WO, len: 2: Ok
>   PTRACE_SET_DEBUGREG, WO, len: 4: Ok
>   PTRACE_SET_DEBUGREG, WO, len: 8: Ok
>   PTRACE_SET_DEBUGREG, RO, len: 1: Ok
>   PTRACE_SET_DEBUGREG, RO, len: 2: Ok
>   PTRACE_SET_DEBUGREG, RO, len: 4: Ok
>   PTRACE_SET_DEBUGREG, RO, len: 8: Ok
>   PTRACE_SET_DEBUGREG, RW, len: 1: Ok
>   PTRACE_SET_DEBUGREG, RW, len: 2: Ok
>   PTRACE_SET_DEBUGREG, RW, len: 4: Ok
>   PTRACE_SET_DEBUGREG, RW, len: 8: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RW, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, WO, len: 6: Ok
>   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, RO, len: 6: Fail
>   failure: ptrace-hwbreak
>
> I haven't had time to work out why yet.

A (big) part of commit c3f68b0478e7 ("powerpc/watchpoint: Fix ptrace code
that muck around with address/len") was lost during rebase.

I'll send a fix, up to you to squash it in or commit it as is.

Christophe
Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
"Paul E. McKenney" writes:
> On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
>> I agree that what I tried to express is hard to parse, but it's at least
>> halfways correct :)
>
> Apologies! That is what I get for not looking it up in the source. :-/
>
> OK, so I am stupid enough not only to get it wrong, but also to try again:
>
>	... Other types of wakeups would normally unconditionally set the
>	task state to RUNNING, but that does not work here because the task
>	must remain blocked until the lock becomes available. Therefore,
>	when a non-lock wakeup attempts to awaken a task blocked waiting
>	for a spinlock, it instead sets the saved state to RUNNING. Then,
>	when the lock acquisition completes, the lock wakeup sets the task
>	state to the saved state, in this case setting it to RUNNING.
>
> Is that better?

Definitely! Thanks for all the editorial work!

	tglx
[powerpc:fixes-test] BUILD SUCCESS 1d0c32ec3b860a32df593a22bad0d1dbc5546a59
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test branch HEAD: 1d0c32ec3b860a32df593a22bad0d1dbc5546a59 KVM: PPC: Fix kernel crash with PR KVM elapsed time: 1740m configs tested: 226 configs skipped: 154 The following configs have been built successfully. More configs may be tested in the coming days. arm allmodconfig arm allnoconfig arm allyesconfig arm64allmodconfig arm64 allnoconfig arm64allyesconfig arm at91_dt_defconfig arm efm32_defconfig arm exynos_defconfig armmulti_v5_defconfig armmulti_v7_defconfig armshmobile_defconfig arm sunxi_defconfig arm64 defconfig sparcallyesconfig nds32 defconfig powerpc allnoconfig shallnoconfig riscv defconfig ia64 allyesconfig nds32 allnoconfig sh rsk7269_defconfig nios2 3c120_defconfig pariscgeneric-32bit_defconfig riscv rv32_defconfig microblazenommu_defconfig sparc defconfig m68k allmodconfig s390 allmodconfig i386 alldefconfig i386 allnoconfig i386 allyesconfig i386defconfig ia64 alldefconfig ia64 allmodconfig ia64 allnoconfig ia64defconfig c6x allyesconfig c6xevmc6678_defconfig nios2 10m50_defconfig openriscor1ksim_defconfig openrisc simple_smp_defconfig xtensa common_defconfig xtensa iss_defconfig h8300 edosk2674_defconfig h8300h8300h-sim_defconfig h8300 h8s-sim_defconfig m68k m5475evb_defconfig m68k sun3_defconfig m68k multi_defconfig arc allyesconfig arc defconfig microblaze mmu_defconfig powerpc defconfig powerpc ppc64_defconfig powerpc rhel-kconfig mips 32r2_defconfig mips 64r6el_defconfig mips allmodconfig mips allnoconfig mips allyesconfig mips fuloong2e_defconfig mips malta_kvm_defconfig pariscallnoconfig parisc allyesconfig pariscgeneric-64bit_defconfig x86_64 randconfig-a001-20200320 x86_64 randconfig-a002-20200320 x86_64 randconfig-a003-20200320 i386 randconfig-a001-20200320 i386 randconfig-a002-20200320 i386 randconfig-a003-20200320 x86_64 randconfig-a001-20200321 x86_64 randconfig-a002-20200321 x86_64 randconfig-a003-20200321 i386 randconfig-a001-20200321 i386 
randconfig-a002-20200321 i386 randconfig-a003-20200321 alpharandconfig-a001-20200320 m68k randconfig-a001-20200320 mips randconfig-a001-20200320 nds32randconfig-a001-20200320 parisc randconfig-a001-20200320 alpharandconfig-a001-20200321 m68k randconfig-a001-20200321 mips randconfig-a001-20200321 nds32randconfig-a001-20200321 parisc randconfig-a001-20200321 riscvrandconfig-a001-20200321 c6x randconfig-a001-20200321 h8300randconfig-a001-20200321 microblaze randconfig-a001-20200321 nios2randconfig-a001-20200321 sparc64 randconfig-a001-20200321 c6x randconfig-a001-20200320 h8300randconfig-a001-20200320 microblaze randconfig-a001-20200320 nios2randconfig-a001-20200320 sparc64 randconfig-a001-20200320 csky randconfig-a001-20200320 openrisc randconfig-a001-20200320
Re: [PATCH] powerpc/pseries: avoid harmless preempt warning
Christophe Leroy's on March 21, 2020 1:33 am:
>
> On 20/03/2020 at 16:24, Nicholas Piggin wrote:
>> Signed-off-by: Nicholas Piggin
>> ---
>>  arch/powerpc/platforms/pseries/lpar.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 3c3da25b445c..e4ed5317f117 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -636,8 +636,16 @@ static const struct proc_ops vcpudispatch_stats_freq_proc_ops = {
>>
>>  static int __init vcpudispatch_stats_procfs_init(void)
>>  {
>> -	if (!lppaca_shared_proc(get_lppaca()))
>> +	/*
>> +	 * Avoid smp_processor_id while preemptible. All CPUs should have
>> +	 * the same value for lppaca_shared_proc.
>> +	 */
>> +	preempt_disable();
>> +	if (!lppaca_shared_proc(get_lppaca())) {
>> +		preempt_enable();
>>  		return 0;
>> +	}
>> +	preempt_enable();
>
> Can we avoid the double preempt_enable() with something like:
>
>	preempt_disable();
>	is_shared = lppaca_shared_proc(get_lppaca());
>	preempt_enable();
>	if (!is_shared)
>		return 0;

I don't mind too much. Same number of lines.

Thanks,
Nick
Re: [PATCH] powerpc/64: ftrace don't trace real mode
Naveen N. Rao's on March 21, 2020 4:39 am:
> Hi Nick,
>
> Nicholas Piggin wrote:
>> This warns and prevents tracing attempted in a real-mode context.
>
> Is this something you're seeing often? Last time we looked at this, KVM
> was the biggest offender and we introduced paca->ftrace_enabled as a way
> to disable ftrace while in KVM code.

Not often but it has a tendency to blow up the least tested code at the
worst times :) Machine check is bad, I'm sure HMI too but I haven't tested
that yet.

I've fixed up most of it with annotations, this is obviously an extra
safety not something to rely on like ftrace_enabled. Probably even the
WARN_ON here is dangerous, but I don't want to leave these bugs in there.

Although the machine check and hmi code touch a fair bit of stuff and
annotating is a bit fragile. It might actually be better if the
paca->ftrace_enabled could be a nesting counter, then we could use it in
machine checks too and avoid a lot of annotations.

> While this is cheap when handling ftrace_regs_caller() as done in this
> patch, for simple function tracing (see below), we will have to grab the
> MSR which will slow things down slightly.

mfmsr is not too bad these days.

>> Signed-off-by: Nicholas Piggin
>> ---
>>  arch/powerpc/kernel/trace/ftrace.c            |  3 +++
>>  .../powerpc/kernel/trace/ftrace_64_mprofile.S | 19 +++
>>  2 files changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
>> index 7ea0ca044b65..ef965815fcb9 100644
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -949,6 +949,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
>>  {
>>  	unsigned long return_hooker;
>>
>> +	if (WARN_ON_ONCE((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)))
>> +		goto out;
>> +
>
> This is called on function entry to redirect function return to a
> trampoline if needed. I am not sure if we have (or will have) too many C
> functions that disable MSR_IR|MSR_DR. Unless the number of such
> functions is large, it might be preferable to mark specific functions as
> notrace.
>
>>  	if (unlikely(ftrace_graph_is_dead()))
>>  		goto out;
>>
>> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> index f9fd5f743eba..6205f15cb603 100644
>> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> @@ -51,16 +51,21 @@ _GLOBAL(ftrace_regs_caller)
>>  	SAVE_10GPRS(12, r1)
>>  	SAVE_10GPRS(22, r1)
>>
>> -	/* Save previous stack pointer (r1) */
>> -	addi	r8, r1, SWITCH_FRAME_SIZE
>> -	std	r8, GPR1(r1)
>> -
>>  	/* Load special regs for save below */
>>  	mfmsr	r8
>>  	mfctr	r9
>>  	mfxer	r10
>>  	mfcr	r11
>>
>> +	/* Shouldn't be called in real mode */
>> +	andi.	r3,r8,(MSR_IR|MSR_DR)
>> +	cmpdi	r3,(MSR_IR|MSR_DR)
>> +	bne	ftrace_bad_realmode
>> +
>> +	/* Save previous stack pointer (r1) */
>> +	addi	r8, r1, SWITCH_FRAME_SIZE
>> +	std	r8, GPR1(r1)
>> +
>
> This stomps on the MSR value in r8, which is saved into pt_regs further
> below.
>
> You'll also have to handle ftrace_caller() which is used for simple
> function tracing. We don't read the MSR there today, but that will be
> needed if we want to suppress tracing.

Oops, thanks good catch.

Thanks,
Nick