[PATCH] Documentation: Clarify better about the rwsem non-owner release issue

2020-03-21 Thread Joel Fernandes (Google)
Reword and clarify the description of the rwsem non-owner release issue.

Link: https://lore.kernel.org/linux-pci/20200321212144.ga6...@google.com/

Signed-off-by: Joel Fernandes (Google) 
---
 Documentation/locking/locktypes.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index 6f2c0f5b041e..656dce72f11f 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -292,7 +292,7 @@ implementations to provide priority inheritance for all 
lock types except
 the truly spinning ones. Priority inheritance on ownerless locks is
 obviously impossible.
 
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+For now, a PREEMPT_RT kernel disables code sections that perform a
+non-owner release of an rwsem. In some cases this can be mitigated by
+disabling parts of the code; in other cases the complete functionality
+has to be disabled until a workable solution has been found.
-- 
2.25.1.696.g5e7596f4ac-goog



Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-21 Thread Joel Fernandes
On Sat, Mar 21, 2020 at 10:49:04PM +0100, Thomas Gleixner wrote:
[...] 
> >> +rwsems have grown interfaces which allow non owner release for special
> >> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> >> +substitutes all locking primitives except semaphores with RT-mutex based
> >> +implementations to provide priority inheritance for all lock types except
> >> +the truly spinning ones. Priority inheritance on ownerless locks is
> >> +obviously impossible.
> >> +
> >> +For now the rwsem non-owner release excludes code which utilizes it from
> >> +being used on PREEMPT_RT enabled kernels.
> >
> > I could not parse the last sentence here, but I think you meant "For now,
> > PREEMPT_RT enabled kernels disable code that performs a non-owner release of
> > an rwsem". Correct me if I'm wrong.
> 
> Right, that's what I wanted to say :)
> 
> Care to send a delta patch?

Absolutely, doing that now. :-)

thanks,

 - Joel



Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-21 Thread Thomas Gleixner
Joel Fernandes  writes:
>> +rwlock_t
>> +
>> +
>> +rwlock_t is a multiple readers and single writer lock mechanism.
>> +
>> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
>> +lock and the suffix rules of spinlock_t apply accordingly. The
>> +implementation is fair and prevents writer starvation.
>>
>
> You mentioned writer starvation, but I think it would be good to also mention
> that rwlock_t on a non-PREEMPT_RT kernel also does not have a _reader_
> starvation problem, since it uses a queued implementation.  This fact is worth
> mentioning here, since further below you explain that an rwlock in PREEMPT_RT
> does have a reader starvation problem.

It's worth mentioning. But RT really has only writer starvation, not
reader starvation.

>> +rwlock_t and PREEMPT_RT
>> +---
>> +
>> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
>> +implementation based on rt_mutex which changes the semantics:
>> +
>> + - Same changes as for spinlock_t
>> +
>> + - The implementation is not fair and can cause writer starvation under
>> +   certain circumstances. The reason for this is that a writer cannot grant
>> +   its priority to multiple readers. Readers which are blocked on a writer
>> +   fully support the priority inheritance protocol.
>
> Is it hard to give priority to multiple readers because the number of readers
> to give priority to could be unbounded?

Yes, and it's horribly complex and racy. We had an implementation years
ago which taught us not to try it again :)

>> +PREEMPT_RT also offers a local_lock mechanism to substitute the
>> +local_irq_disable/save() constructs in cases where a separation of the
>> +interrupt disabling and the locking is really unavoidable. This should be
>> +restricted to very rare cases.
>
> It would also be nice to mention where else local_lock() can be used, such as
> protecting per-cpu variables without disabling preemption. Could we add a
> section on protecting per-cpu data? (Happy to do that and send a patch if you
> prefer).

The local lock section will come soon when we post the local lock
patches again.

>> +rwsems have grown interfaces which allow non owner release for special
>> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
>> +substitutes all locking primitives except semaphores with RT-mutex based
>> +implementations to provide priority inheritance for all lock types except
>> +the truly spinning ones. Priority inheritance on ownerless locks is
>> +obviously impossible.
>> +
>> +For now the rwsem non-owner release excludes code which utilizes it from
>> +being used on PREEMPT_RT enabled kernels.
>
> I could not parse the last sentence here, but I think you meant "For now,
> PREEMPT_RT enabled kernels disable code that performs a non-owner release of
> an rwsem". Correct me if I'm wrong.

Right, that's what I wanted to say :)

Care to send a delta patch?

Thanks!

tglx


Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-21 Thread Joel Fernandes
Hi Thomas,

Just a few comments:

[...]
> +rtmutex
> +===
> +
> +RT-mutexes are mutexes with support for priority inheritance (PI).
> +
> +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +interrupt disabled sections.
> +
> +On a PREEMPT_RT enabled kernel most of these sections are fully
> +preemptible. This is possible because PREEMPT_RT forces most executions
> +into task context, especially interrupt handlers and soft interrupts, which
> +allows to substitute spinlock_t and rwlock_t with RT-mutex based
> +implementations.
> +
> +
> +raw_spinlock_t and spinlock_t
> +=
> +
> +raw_spinlock_t
> +--
> +
> +raw_spinlock_t is a strict spinning lock implementation regardless of the
> +kernel configuration including PREEMPT_RT enabled kernels.
> +
> +raw_spinlock_t is to be used only in real critical core code, low level
> +interrupt handling and places where protecting (hardware) state is required
> +to be safe against preemption and eventually interrupts.
> +
> +Another reason to use raw_spinlock_t is when the critical section is tiny
> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
> +contended case.
> +
> +spinlock_t
> +--
> +
> +The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
> +
> +On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +and has exactly the same semantics.
> +
> +spinlock_t and PREEMPT_RT
> +-
> +
> +On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Preemption is not disabled
> +
> + - The hard interrupt related suffixes for spin_lock / spin_unlock
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> +   interrupt disabled state
> +
> + - The soft interrupt related suffix (_bh()) is still disabling the
> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
> +   kernel, which utilizes the preemption count, this is achieved by a per
> +   CPU bottom half locking mechanism.
> +
> +All other semantics of spinlock_t are preserved:
> +
> + - Migration of tasks which hold a spinlock_t is prevented. On a non
> +   PREEMPT_RT enabled kernel this is implicit due to preemption disable.
> +   PREEMPT_RT has a separate mechanism to achieve this. This ensures that
> +   pointers to per CPU variables stay valid even if the task is preempted.
> +
> + - Task state preservation. The task state is not affected when a lock is
> +   contended and the task has to schedule out and wait for the lock to
> +   become available. The lock wake up restores the task state unless there
> +   was a regular (not lock related) wake up on the task. This ensures that
> +   the task state rules are always correct independent of the kernel
> +   configuration.
> +
> +rwlock_t
> +
> +
> +rwlock_t is a multiple readers and single writer lock mechanism.
> +
> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
> +lock and the suffix rules of spinlock_t apply accordingly. The
> +implementation is fair and prevents writer starvation.
>

You mentioned writer starvation, but I think it would be good to also mention
that rwlock_t on a non-PREEMPT_RT kernel also does not have a _reader_
starvation problem, since it uses a queued implementation.  This fact is worth
mentioning here, since further below you explain that an rwlock in PREEMPT_RT
does have a reader starvation problem.

> +rwlock_t and PREEMPT_RT
> +---
> +
> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Same changes as for spinlock_t
> +
> + - The implementation is not fair and can cause writer starvation under
> +   certain circumstances. The reason for this is that a writer cannot grant
> +   its priority to multiple readers. Readers which are blocked on a writer
> +   fully support the priority inheritance protocol.

Is it hard to give priority to multiple readers because the number of readers
to give priority to could be unbounded?

> +
> +
> +PREEMPT_RT caveats
> +==
> +
> +spinlock_t and rwlock_t
> +---
> +
> +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
> +with RT-mutex based implementations has a few implications.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::
> +
> +   local_irq_disable();
> +   spin_lock();
> +
> +and fully equivalent to::
> +
> +   spin_lock_irq();
> +
> +Same applies to rwlock_t and the _irqsave() suffix variant.
> +
> +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
> +substitution expects a fully preemptible context.
> +
> +The preferred solution is to use :c:func:`spin_lock_irq()` or
> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.
> +
> +PREEMPT_RT also offers a 

Re: [patch V3 00/20] Lock ordering documentation and annotation for lockdep

2020-03-21 Thread Thomas Gleixner
Davidlohr Bueso  writes:

> On Sat, 21 Mar 2020, Thomas Gleixner wrote:
>
>>This is the third and hopefully final version of this work. The second one
>>can be found here:
>
> Would you rather I send in a separate series with the kvm changes, or
> should I just send a v2 with the fixes here again?

Send a separate series please. These nested threads are hard to follow.

Thanks,

tglx


Re: [PATCH 00/18] genirq: Remove setup_irq()

2020-03-21 Thread afzal mohammed
Hi Thomas,

On Thu, Feb 27, 2020 at 04:37:13PM +0530, afzal mohammed wrote:
> On Thu, Feb 27, 2020 at 11:31:15AM +0100, Thomas Gleixner wrote:

> > Vs. merging this series, I suggest the following approach:
> > 
> >- Resubmit the individual changes as single patches or small series
> >  to the relevant maintainers and subsystem mailing lists. They have
> >  no dependency on a core change and can be applied where they belong
> >  to.
> > 
> >- After 5.6-rc6, verify which parts have made their way into
> >  linux-next and resubmit the ignored ones as a series to me along
> >  with the removal of the core parts.
> > 
> > That way we can avoid conflicting changes between subsystems and the tip
> > irq/core branch as much as possible.
> 
> Okay, i will do accordingly.

i am on it; it is delayed due to the reason mentioned at,

https://lkml.kernel.org/r/20200321172626.GA6323@afzalpc

[ not repeating contents here since other mail was sent just now,
  cc'ing you ]

Regards
afzal



Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-21 Thread Paul E. McKenney
On Sat, Mar 21, 2020 at 11:26:06AM +0100, Thomas Gleixner wrote:
> "Paul E. McKenney"  writes:
> > On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
> >> I agree that what I tried to express is hard to parse, but it's at least
> >> halfways correct :)
> >
> > Apologies!  That is what I get for not looking it up in the source.  :-/
> >
> > OK, so I am stupid enough not only to get it wrong, but also to try again:
> >
> >... Other types of wakeups would normally unconditionally set the
> >task state to RUNNING, but that does not work here because the task
> >must remain blocked until the lock becomes available.  Therefore,
> >when a non-lock wakeup attempts to awaken a task blocked waiting
> >for a spinlock, it instead sets the saved state to RUNNING.  Then,
> >when the lock acquisition completes, the lock wakeup sets the task
> >state to the saved state, in this case setting it to RUNNING.
> >
> > Is that better?
> 
> Definitely!
> 
> Thanks for all the editorial work!

NP, and glad you like it!

But I felt even more stupid sometime in the middle of the night.  Why on
earth didn't I work in your nice examples?  :-/

I will pull them in later.  Time to go hike!!!

Thanx, Paul


Re: [patch V3 00/20] Lock ordering documentation and annotation for lockdep

2020-03-21 Thread Davidlohr Bueso

On Sat, 21 Mar 2020, Thomas Gleixner wrote:


This is the third and hopefully final version of this work. The second one
can be found here:


Would you rather I send in a separate series with the kvm changes, or
should I just send a v2 with the fixes here again?

Thanks,
Davidlohr


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.6-5 tag

2020-03-21 Thread pr-tracker-bot
The pull request you sent on Sat, 21 Mar 2020 23:54:54 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> tags/powerpc-5.6-5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c63c50fc2ec9afc4de21ef9ead2eac64b178cce1

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Guo Ren
On Sat, Mar 21, 2020 at 8:08 PM Thomas Gleixner  wrote:
>
> Guo Ren  writes:
>
> > Tested and Acked by me.
> >
> > Queued for next pull request, thx
>
> Can we please route that with the rcuwait changes to avoid breakage
> unless you ship it to Linus right away?

Ok, I won't queue it.


Re: [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait

2020-03-21 Thread Thomas Gleixner
Thomas Gleixner  writes:

> From: Thomas Gleixner 

That's obviously bogus and wants to be:

From: Peter Zijlstra (Intel) 



[GIT PULL] Please pull powerpc/linux.git powerpc-5.6-5 tag

2020-03-21 Thread Michael Ellerman

Hi Linus,

Please pull a couple more powerpc fixes for 5.6:

The following changes since commit 59bee45b9712c759ea4d3dcc4eff1752f3a66558:

  powerpc/mm: Fix missing KUAP disable in flush_coherent_icache() (2020-03-05 
17:15:08 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.6-5

for you to fetch changes up to 1d0c32ec3b860a32df593a22bad0d1dbc5546a59:

  KVM: PPC: Fix kernel crash with PR KVM (2020-03-20 13:39:10 +1100)

- --
powerpc fixes for 5.6 #5

Two fixes for bugs introduced this cycle.

One for a crash when shutting down a KVM PR guest (our original style of KVM
which doesn't use hypervisor mode).

And one fix for the recently added 32-bit KASAN_VMALLOC support.

Thanks to:
  Christophe Leroy, Greg Kurz, Sean Christopherson.

- --
Christophe Leroy (1):
  powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC

Greg Kurz (1):
  KVM: PPC: Fix kernel crash with PR KVM


 arch/powerpc/kvm/book3s_pr.c  | 1 +
 arch/powerpc/kvm/powerpc.c| 2 --
 arch/powerpc/mm/kasan/kasan_init_32.c | 9 ++---
 3 files changed, 3 insertions(+), 9 deletions(-)


FSL P5020/Cyrus+ Board: Poweroff and Restart Support

2020-03-21 Thread Christian Zigotzky

Hello,

We would like to add poweroff and restart support for the Cyrus+ board 
[1] [2] to the mainline vanilla kernel.

There is a patch for adding poweroff and restart support. (attached)
It works but I am not sure if it is good enough for the mainline vanilla 
kernel.

Please post some suggestions and comments about this patch.

Thanks,
Christian


[1] http://wiki.amiga.org/index.php?title=X5000
[2] https://www.amigaos.net/hardware/133/amigaone-x5000
diff -rupN a/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts b/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts
--- a/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts	2020-02-10 01:08:48.0 +0100
+++ b/arch/powerpc/boot/dts/fsl/cyrus_p5020.dts	2020-02-10 08:49:47.953680947 +0100
@@ -146,6 +146,25 @@
  0 0x0001>;
};
};
+
+   gpio-poweroff {
+   compatible = "gpio-poweroff";
+   gpios = < 3 1>;
+   };
+
+   gpio-restart {
+   compatible = "gpio-restart";
+   gpios = < 2 1>;
+   };
+
+   leds {
+   compatible = "gpio-leds";
+   hdd {
+   label = "Disk activity";
+   gpios = < 5 0>;
+   linux,default-trigger = "disk-activity";
+   };
+   };
 };
 
 /include/ "p5020si-post.dtsi"
diff -rupN a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
--- a/arch/powerpc/platforms/85xx/corenet_generic.c	2020-02-10 01:08:48.0 +0100
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c	2020-02-10 08:49:47.953680947 +0100
@@ -46,6 +46,16 @@ void __init corenet_gen_pic_init(void)
mpic_init(mpic);
 }
 
+/* If someone has registered a poweroff callback, invoke it */
+static void __noreturn corenet_generic_halt(void)
+{
+   if (pm_power_off)
+   pm_power_off();
+
+   /* Should not return */
+   for(;;);
+}
+
 /*
  * Setup the architecture
  */
@@ -99,6 +109,15 @@ static const struct of_device_id of_devi
{
.name   = "handles",
},
+   {
+   .name   = "gpio-poweroff",
+   },
+   {
+   .name   = "gpio-restart",
+   },
+   {
+   .name   = "leds",
+   },
{}
 };
 
@@ -149,6 +168,8 @@ static int __init corenet_generic_probe(
extern struct smp_ops_t smp_85xx_ops;
 #endif
 
+   ppc_md.halt = corenet_generic_halt;
+
if (of_device_compatible_match(of_root, boards))
return 1;
 


Re: [patch V3 05/20] acpi: Remove header dependency

2020-03-21 Thread Andy Shevchenko
On Sat, Mar 21, 2020 at 1:34 PM Thomas Gleixner  wrote:
>
> From: Peter Zijlstra 
>
> In order to avoid future header hell, remove the inclusion of
> proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
> struct.
>

Acked-by: Andy Shevchenko  # for PDx86

> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Thomas Gleixner 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> Cc: platform-driver-...@vger.kernel.org
> Cc: Greg Kroah-Hartman 
> Cc: Zhang Rui 
> Cc: "Rafael J. Wysocki" 
> Cc: linux...@vger.kernel.org
> Cc: Len Brown 
> Cc: linux-a...@vger.kernel.org
> ---
>  drivers/platform/x86/dell-smo8800.c  |1 +
>  drivers/platform/x86/wmi.c   |1 +
>  drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |1 +
>  include/acpi/acpi_bus.h  |2 +-
>  4 files changed, 4 insertions(+), 1 deletion(-)
>
> --- a/drivers/platform/x86/dell-smo8800.c
> +++ b/drivers/platform/x86/dell-smo8800.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct smo8800_device {
> u32 irq; /* acpi device irq */
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  ACPI_MODULE_NAME("wmi");
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "acpi_thermal_rel.h"
>
>  static acpi_handle acpi_thermal_rel_handle;
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
>
>  #ifdef CONFIG_ACPI
>
> -#include 
> +struct proc_dir_entry;
>
>  #define ACPI_BUS_FILE_ROOT "acpi"
>  extern struct proc_dir_entry *acpi_root_dir;
>
>


-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Thomas Gleixner
Guo Ren  writes:

> Tested and Acked by me.
>
> Queued for next pull request, thx

Can we please route that with the rcuwait changes to avoid breakage
unless you ship it to Linus right away?

Thanks,

tglx


Re: [PATCH 1/3] KVM: PPC: Fix kernel crash with PR KVM

2020-03-21 Thread Michael Ellerman
On Wed, 2020-03-18 at 17:43:30 UTC, Greg Kurz wrote:
> With PR KVM, shutting down a VM causes the host kernel to crash:
> 
> [  314.219284] BUG: Unable to handle kernel data access on read at 
> 0xc0080176c638
> [  314.219299] Faulting instruction address: 0xc00800d4ddb0
> cpu 0x0: Vector: 300 (Data Access) at [c0036da077a0]
> pc: c00800d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr]
> lr: c00800d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr]
> sp: c0036da07a30
>msr: 90010280b033
>dar: c0080176c638
>  dsisr: 4000
>   current = 0xc0036d4c
>   paca= 0xc1a0   irqmask: 0x03   irq_happened: 0x01
> pid   = 1992, comm = qemu-system-ppc
> Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 
> 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020
> enter ? for help
> [c0036da07ab0] c00800d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr]
> [c0036da07ae0] c008009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm]
> [c0036da07b00] c008009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm]
> [c0036da07b30] c008009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm]
> [c0036da07b60] c008009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm]
> [c0036da07ba0] c008009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm]
> [c0036da07c00] c008009da760 kvm_vm_release+0x38/0x60 [kvm]
> [c0036da07c30] c0420be0 __fput+0xe0/0x310
> [c0036da07c90] c01747a0 task_work_run+0x150/0x1c0
> [c0036da07cf0] c014896c do_exit+0x44c/0xd00
> [c0036da07dc0] c01492f4 do_group_exit+0x64/0xd0
> [c0036da07e00] c0149384 sys_exit_group+0x24/0x30
> [c0036da07e20] c000b9d0 system_call+0x5c/0x68
> 
> This is caused by a use-after-free in kvmppc_mmu_pte_flush_all()
> which dereferences vcpu->arch.book3s which was previously freed by
> kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy()
> is called after kvmppc_core_vcpu_free() since commit ff030fdf5573
> ("KVM: PPC: Move kvm_vcpu_init() invocation to common code").
> 
> The kvmppc_mmu_destroy() helper calls one of the following depending
> on the KVM backend:
> 
> - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV)
> 
> - kvmppc_mmu_destroy_pr() which undoes the effects of
>   kvmppc_mmu_init() (Book3s PR 32-bit)
> 
> - kvmppc_mmu_destroy_pr() which undoes the effects of
>   kvmppc_mmu_init() (Book3s PR 64-bit)
> 
> - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc)
> 
> It turns out that this is only relevant to PR KVM actually. And both
> 32 and 64 backends need vcpu->arch.book3s to be valid when calling
> kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy()
> from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the
> beginning of kvmppc_core_vcpu_free_pr(). This is consistent with
> kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr().
> 
> For the same reason, if kvmppc_core_vcpu_create_pr() returns an
> error then this means that kvmppc_mmu_init() was either not called
> or failed, in which case kvmppc_mmu_destroy() should not be called.
> Drop the line in the error path of kvm_arch_vcpu_create().
> 
> Fixes: ff030fdf5573 ("KVM: PPC: Move kvm_vcpu_init() invocation to common 
> code")
> Signed-off-by: Greg Kurz 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/1d0c32ec3b860a32df593a22bad0d1dbc5546a59

cheers


Re: [PATCH v3] powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC

2020-03-21 Thread Michael Ellerman
On Fri, 2020-03-06 at 16:49:49 UTC, Christophe Leroy wrote:
> With CONFIG_KASAN_VMALLOC, new page tables are created at the time
> shadow memory for vmalloc area in unmapped. If some parts of the
> page table still has entries to the zero page shadow memory, the
> entries are wrongly marked RW.
> 
> With CONFIG_KASAN_VMALLOC, almost the entire kernel address space
> is managed by KASAN. To make it simple, just create KASAN page tables
> for the entire kernel space at kasan_init(). That doesn't use much
> more space, and that's anyway already done for hash platforms.
> 
> Fixes: 3d4247fcc938 ("powerpc/32: Add support of KASAN_VMALLOC")
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/af3d0a68698c7e5df8b72267086b23422a3954bb

cheers


[patch V3 17/20] lockdep: Introduce wait-type checks

2020-03-21 Thread Thomas Gleixner
From: Peter Zijlstra 

Extend lockdep to validate lock wait-type context.

The current wait-types are:

LD_WAIT_FREE,   /* wait free, rcu etc.. */
LD_WAIT_SPIN,   /* spin loops, raw_spinlock_t etc.. */
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_SLEEP,  /* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, it's a fancier might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context and
while it presents a context to nested locks it is not the same as it
got acquired in.

Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that it's required to find the minimal .inner of the held
stack to compare against the outer of the new lock; because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.

Below is an example output generated by the trivial test code:

  raw_spin_lock();
  spin_lock();
  spin_unlock();
  raw_spin_unlock();

 [ BUG: Invalid wait context ]
 -
 swapper/0/1 is trying to lock:
 c9013f20 (){}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: c9013ee0 (){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n,m} part in the lock
description; -{3:3} for the attempted lock, and try and match that up to
the held locks, which in this case is the one: -{2,2}.

This tells that the acquiring lock requires a more relaxed environment than
presented by the lock stack.

Currently only the normal locks and RCU are converted, the rest of the
lockdep users defaults to .inner = INV which is ignored. More conversions
can be done when desired.

The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently being
addressed. The config option allows these problems to be identified and the
solutions found to be verified as actually solving them.

The config switch will be removed and the checks will be permanently enabled
once the vast majority of issues has been addressed.

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Requested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
V2: Fix the LOCKDEP=y && LOCK_PROVE=n case
---
 include/linux/irqflags.h|8 ++
 include/linux/lockdep.h |   71 +---
 include/linux/mutex.h   |7 +-
 include/linux/rwlock_types.h|6 +
 include/linux/rwsem.h   |6 +
 include/linux/sched.h   |1 
 include/linux/spinlock.h|   35 +++---
 include/linux/spinlock_types.h  |   24 +-
 kernel/irq/handle.c |7 ++
 kernel/locking/lockdep.c|  138 ++--
 kernel/locking/mutex-debug.c|2 
 kernel/locking/rwsem.c  |2 
 kernel/locking/spinlock_debug.c |6 -
 kernel/rcu/update.c |   24 +-
 lib/Kconfig.debug   |   17 
 15 files changed, 307 insertions(+), 47 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -37,7 +37,12 @@
 # define trace_softirqs_enabled(p) ((p)->softirqs_enabled)
 # define trace_hardirq_enter() \
 do {   \
-   current->hardirq_context++; \
+   if (!current->hardirq_context++)\
+   current->hardirq_threaded = 0;  \
+} while (0)
+# define trace_hardirq_threaded()  \
+do {   \
+   current->hardirq_threaded = 1;  \
 } while (0)
 # define trace_hardirq_exit()  \
 do {   \
@@ -59,6 +64,7 @@ do {  \
 # define trace_hardirqs_enabled(p) 0
 # define trace_softirqs_enabled(p) 0
 # define trace_hardirq_enter() do { } while (0)
+# define trace_hardirq_threaded()  do { } while (0)
 # define trace_hardirq_exit()  do { } while (0)
 # define 

[patch V3 16/20] completion: Use simple wait queues

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

completion uses a wait_queue_head_t to enqueue waiters.

wait_queue_head_t contains a spinlock_t to protect the list of waiters
which excludes it from being used in truly atomic context on a PREEMPT_RT
enabled kernel.

The spinlock in the wait queue head cannot be replaced by a raw_spinlock
because:

  - wait queues can have custom wakeup callbacks, which acquire other
spinlock_t locks and have potentially long execution times

  - wake_up() walks an unbounded number of list entries during the wake up
and may wake an unbounded number of waiters.

For simplicity and performance reasons complete() should be usable on
PREEMPT_RT enabled kernels.

completions do not use custom wakeup callbacks and are usually single
waiter, except for a few corner cases.

Replace the wait queue in the completion with a simple wait queue (swait),
which uses a raw_spinlock_t for protecting the waiter list and therefore is
safe to use inside truly atomic regions on PREEMPT_RT.

There is no semantical or functional change:

  - completions use the exclusive wait mode which is what swait provides

  - complete() wakes one exclusive waiter

  - complete_all() wakes all waiters while holding the lock which protects
the wait queue against newly incoming waiters. The conversion to swait
preserves this behaviour.

complete_all() might cause unbound latencies with a large number of waiters
being woken at once, but most complete_all() usage sites are either in
testing or initialization code or have only a really small number of
concurrent waiters which for now does not cause a latency problem. Keep it
simple for now.

The fixup of the warning check in the USB gadget driver is just a
straightforward conversion of the lockless waiter check from one waitqueue
type to the other.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Joel Fernandes (Google) 
Reviewed-by: Davidlohr Bueso 
Reviewed-by: Greg Kroah-Hartman 
Acked-by: Linus Torvalds 
---
V2: Split out the orinoco and usb gadget parts and amended change log
---
 drivers/usb/gadget/function/f_fs.c |2 +-
 include/linux/completion.h |8 
 kernel/sched/completion.c  |   36 +++-
 3 files changed, 24 insertions(+), 22 deletions(-)

--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data
pr_info("%s(): freeing\n", __func__);
ffs_data_clear(ffs);
	BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
-	       waitqueue_active(&ffs->ep0req_completion.wait) ||
+	       swait_active(&ffs->ep0req_completion.wait) ||
	       waitqueue_active(&ffs->wait));
destroy_workqueue(ffs->io_completion_wq);
kfree(ffs->dev_name);
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,7 +9,7 @@
  * See kernel/sched/completion.c for details.
  */
 
-#include 
+#include 
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,7 +25,7 @@
  */
 struct completion {
unsigned int done;
-   wait_queue_head_t wait;
+   struct swait_queue_head wait;
 };
 
 #define init_completion_map(x, m) __init_completion(x)
@@ -34,7 +34,7 @@ static inline void complete_acquire(stru
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-   { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+   { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -85,7 +85,7 @@ static inline void complete_release(stru
 static inline void __init_completion(struct completion *x)
 {
x->done = 0;
-	init_waitqueue_head(&x->wait);
+	init_swait_queue_head(&x->wait);
 }
 
 /**
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -29,12 +29,12 @@ void complete(struct completion *x)
 {
unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 
 	if (x->done != UINT_MAX)
 		x->done++;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete);
 
@@ -58,10 +58,12 @@ void complete_all(struct completion *x)
 {
unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	WARN_ON(irqs_disabled());
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_all_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete_all);
 
@@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x,
 

[patch V3 09/20] ia64: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/ia64/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
---
V3: New patch
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c  | 1 +
 arch/ia64/mm/ioremap.c  | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include 
 #include 
-#include <linux/mm.h>
 
 #include 
 #include 
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.26.0.rc2




[patch V3 18/20] lockdep: Add hrtimer context tracing bits

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

Set current->irq_config = 1 for hrtimers which are not marked to expire in
hard interrupt context during hrtimer_init(). These timers will expire in
softirq context on PREEMPT_RT.

Setting this allows lockdep to differentiate these timers. If a timer is
marked to expire in hard interrupt context, its callback must not acquire a
regular spinlock; only a raw_spinlock may be taken in the expiry callback.
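The decision being annotated can be sketched as a small userspace model: only timers initialized with the hard mode flag keep is_hard set, and only non-hard timers mark irq_config around the callback. Constants and struct layout below are simplified stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for HRTIMER_MODE_HARD */
#define MODEL_HRTIMER_MODE_HARD 0x10

struct model_hrtimer {
	bool is_hard;
};

/* Mirrors the __hrtimer_init() change: is_hard tracks the mode flag,
 * not merely "not soft". */
static void model_hrtimer_init(struct model_hrtimer *t, unsigned int mode)
{
	t->is_hard = !!(mode & MODEL_HRTIMER_MODE_HARD);
}

/* Returns the irq_config value lockdep would observe during expiry:
 * 1 for timers that run in softirq context on PREEMPT_RT, 0 for
 * hard-interrupt timers. */
static int model_expiry_irq_config(const struct model_hrtimer *t)
{
	return t->is_hard ? 0 : 1;
}
```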

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 include/linux/irqflags.h |   15 +++
 include/linux/sched.h|1 +
 kernel/locking/lockdep.c |2 +-
 kernel/time/hrtimer.c|6 +-
 4 files changed, 22 insertions(+), 2 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -56,6 +56,19 @@ do { \
 do {   \
current->softirq_context--; \
 } while (0)
+
+# define lockdep_hrtimer_enter(__hrtimer)  \
+ do {  \
+ if (!__hrtimer->is_hard)  \
+   current->irq_config = 1;\
+ } while (0)
+
+# define lockdep_hrtimer_exit(__hrtimer)   \
+ do {  \
+ if (!__hrtimer->is_hard)  \
+   current->irq_config = 0;\
+ } while (0)
+
 #else
 # define trace_hardirqs_on()   do { } while (0)
 # define trace_hardirqs_off()  do { } while (0)
@@ -68,6 +81,8 @@ do {  \
 # define trace_hardirq_exit()  do { } while (0)
 # define lockdep_softirq_enter()   do { } while (0)
 # define lockdep_softirq_exit()do { } while (0)
+# define lockdep_hrtimer_enter(__hrtimer)  do { } while (0)
+# define lockdep_hrtimer_exit(__hrtimer)   do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -983,6 +983,7 @@ struct task_struct {
unsigned intsoftirq_enable_event;
int softirqs_enabled;
int softirq_context;
+   int irq_config;
 #endif
 
 #ifdef CONFIG_LOCKDEP
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3759,7 +3759,7 @@ static int check_wait_context(struct tas
/*
 * Check if force_irqthreads will run us threaded.
 */
-   if (curr->hardirq_threaded)
+   if (curr->hardirq_threaded || curr->irq_config)
curr_inner = LD_WAIT_CONFIG;
else
curr_inner = LD_WAIT_SPIN;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1404,7 +1404,7 @@ static void __hrtimer_init(struct hrtime
base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0;
base += hrtimer_clockid_to_base(clock_id);
timer->is_soft = softtimer;
-   timer->is_hard = !softtimer;
+   timer->is_hard = !!(mode & HRTIMER_MODE_HARD);
timer->base = _base->clock_base[base];
timerqueue_init(>node);
 }
@@ -1514,7 +1514,11 @@ static void __run_hrtimer(struct hrtimer
 */
raw_spin_unlock_irqrestore(_base->lock, flags);
trace_hrtimer_expire_entry(timer, now);
+   lockdep_hrtimer_enter(timer);
+
restart = fn(timer);
+
+   lockdep_hrtimer_exit(timer);
trace_hrtimer_expire_exit(timer);
raw_spin_lock_irq(_base->lock);
 




[patch V3 01/20] PCI/switchtec: Fix init_completion race condition with poll_wait()

2020-03-21 Thread Thomas Gleixner
From: Logan Gunthorpe 

The call to init_completion() in mrpc_queue_cmd() can theoretically
race with the call to poll_wait() in switchtec_dev_poll().

  poll()write()
switchtec_dev_poll()  switchtec_dev_write()
  poll_wait(&stuser->comp.wait);    mrpc_queue_cmd()
                                      init_completion(&stuser->comp)
                                        init_waitqueue_head(&stuser->comp.wait)

To my knowledge, no one has hit this bug.

Fix this by using reinit_completion() instead of init_completion() in
mrpc_queue_cmd().
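The difference is that init_completion() re-creates the wait queue head and thereby forgets a waiter that poll_wait() already registered, while reinit_completion() only resets the counter. A plain-C sketch of that distinction (illustrative names, not the kernel API):

```c
#include <assert.h>

struct model_comp {
	unsigned int done;
	int nr_registered_waiters;	/* stands in for the waitqueue head */
};

/* init_completion(): resets the counter AND the wait queue head,
 * so waiters registered by poll_wait() are lost. */
static void model_init_completion(struct model_comp *c)
{
	c->done = 0;
	c->nr_registered_waiters = 0;
}

/* reinit_completion(): only resets the counter; the waitqueue and
 * any registered waiters survive. */
static void model_reinit_completion(struct model_comp *c)
{
	c->done = 0;
}

static void model_poll_wait(struct model_comp *c)
{
	c->nr_registered_waiters++;
}
```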

Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
Reported-by: Sebastian Andrzej Siewior 
Signed-off-by: Logan Gunthorpe 
Signed-off-by: Thomas Gleixner 
Acked-by: Bjorn Helgaas 
Cc: Kurt Schwemmer 
Cc: linux-...@vger.kernel.org
Link: https://lkml.kernel.org/r/20200313183608.2646-1-log...@deltatee.com

---
 drivers/pci/switch/switchtec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index a823b4b8ef8a..81dc7ac01381 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	init_completion(&stuser->comp);
+	reinit_completion(&stuser->comp);
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
mrpc_cmd_submit(stdev);
-- 
2.20.1





[patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event()

2020-03-21 Thread Thomas Gleixner
From: Peter Zijlstra (Intel) 

Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.
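The effect of the @state argument can be sketched in a userspace model: an interruptible wait bails out with -EINTR when a signal is pending, an uninterruptible one ignores signals. The constants below are illustrative stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_UNINTERRUPTIBLE	0
#define MODEL_INTERRUPTIBLE	1
#define MODEL_EINTR		4

/* Models the rcuwait_wait_event() loop: check the condition, then
 * check for a pending signal if the wait state allows interruption. */
static int model_wait_event(bool condition, int state, bool signal_pending)
{
	while (!condition) {
		if (state == MODEL_INTERRUPTIBLE && signal_pending)
			return -MODEL_EINTR;
		/* schedule() would block here; assume the condition
		 * becomes true after one pass so the sketch terminates */
		condition = true;
	}
	return 0;
}
```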

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Davidlohr Bueso 

---
 include/linux/rcuwait.h   |   12 ++--
 kernel/locking/percpu-rwsem.c |2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include 
+#include 
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)   \
+#define rcuwait_wait_event(w, condition, state)			\
 ({ \
+   int __ret = 0;  \
rcu_assign_pointer((w)->task, current); \
for (;;) {  \
/*  \
 * Implicit barrier (A) pairs with (B) in   \
 * rcuwait_wake_up().   \
 */ \
-   set_current_state(TASK_UNINTERRUPTIBLE);\
+   set_current_state(state);   \
if (condition)  \
break;  \
\
+   if (signal_pending_state(state, current)) { \
+   __ret = -EINTR; \
+   break;  \
+   }   \
+   \
schedule(); \
}   \
\
WRITE_ONCE((w)->task, NULL);\
__set_current_state(TASK_RUNNING);  \
+   __ret;  \
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_
 */
 
/* Wait for all now active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 




[patch V3 03/20] usb: gadget: Use completion interface instead of open coding it

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

ep_io() uses a completion on stack and open codes the waiting with:

  wait_event_interruptible (done.wait, done.done);
and
  wait_event (done.wait, done.done);

This waits in non-exclusive mode for complete(), but there is no reason to
do so because the completion can only be waited for by the task itself and
complete() wakes exactly one exclusive waiter.

Replace the open coded implementation with the corresponding
wait_for_completion*() functions.

No functional change.

Reported-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Cc: Felipe Balbi 
Cc: linux-...@vger.kernel.org
---
V2: New patch to avoid the conversion to swait interfaces later
---
 drivers/usb/gadget/legacy/inode.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf
 	spin_unlock_irq (&epdata->dev->lock);
 
 	if (likely (value == 0)) {
-		value = wait_event_interruptible (done.wait, done.done);
+		value = wait_for_completion_interruptible(&done);
 		if (value != 0) {
 			spin_lock_irq (&epdata->dev->lock);
 			if (likely (epdata->ep != NULL)) {
@@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf
 				usb_ep_dequeue (epdata->ep, epdata->req);
 				spin_unlock_irq (&epdata->dev->lock);
 
-				wait_event (done.wait, done.done);
+				wait_for_completion(&done);
if (epdata->status == -ECONNRESET)
epdata->status = -EINTR;
} else {




[patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/microblaze/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Cc: Michal Simek 
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h 
b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include 
-#include <linux/mm.h>
 
 #include 
 #include 
-- 
2.26.0.rc2




[patch V3 19/20] lockdep: Annotate irq_work

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in
hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be
invoked in softirq context on PREEMPT_RT.

Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
context so lockdep knows that these can safely acquire a spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 include/linux/irq_work.h |2 ++
 include/linux/irqflags.h |   13 +
 kernel/irq_work.c|2 ++
 kernel/rcu/tree.c|1 +
 kernel/time/tick-sched.c |1 +
 5 files changed, 19 insertions(+)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -18,6 +18,8 @@
 
 /* Doesn't want IPI, wait for tick: */
 #define IRQ_WORK_LAZY  BIT(2)
+/* Run hard IRQ context, even on RT */
+#define IRQ_WORK_HARD_IRQ  BIT(3)
 
 #define IRQ_WORK_CLAIMED   (IRQ_WORK_PENDING | IRQ_WORK_BUSY)
 
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,17 @@ do { \
current->irq_config = 0;\
  } while (0)
 
+# define lockdep_irq_work_enter(__work)
\
+ do {  \
+ if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+   current->irq_config = 1;\
+ } while (0)
+# define lockdep_irq_work_exit(__work) \
+ do {  \
+ if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+   current->irq_config = 0;\
+ } while (0)
+
 #else
 # define trace_hardirqs_on()   do { } while (0)
 # define trace_hardirqs_off()  do { } while (0)
@@ -83,6 +94,8 @@ do {  \
 # define lockdep_softirq_exit()do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)  do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)   do { } while (0)
+# define lockdep_irq_work_enter(__work)do { } while (0)
+# define lockdep_irq_work_exit(__work) do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -153,7 +153,9 @@ static void irq_work_run_list(struct lli
 */
	flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
 
+   lockdep_irq_work_enter(work);
work->func(work);
+   lockdep_irq_work_exit(work);
/*
 * Clear the BUSY bit and return to the free state if
 * no-one else claimed it meanwhile.
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1113,6 +1113,7 @@ static int rcu_implicit_dynticks_qs(stru
!rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
(rnp->ffmask & rdp->grpmask)) {
 		init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
+		atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ);
 		rdp->rcu_iw_pending = true;
 		rdp->rcu_iw_gp_seq = rnp->gp_seq;
 		irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct i
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
.func = nohz_full_kick_func,
+   .flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
 };
 
 /*




[patch V3 05/20] acpi: Remove header dependency

2020-03-21 Thread Thomas Gleixner
From: Peter Zijlstra 

In order to avoid future header hell, remove the inclusion of
proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
struct.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: Darren Hart 
Cc: Andy Shevchenko 
Cc: platform-driver-...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: Zhang Rui 
Cc: "Rafael J. Wysocki" 
Cc: linux...@vger.kernel.org
Cc: Len Brown 
Cc: linux-a...@vger.kernel.org
---
 drivers/platform/x86/dell-smo8800.c  |1 +
 drivers/platform/x86/wmi.c   |1 +
 drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |1 +
 include/acpi/acpi_bus.h  |2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/platform/x86/dell-smo8800.c
+++ b/drivers/platform/x86/dell-smo8800.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct smo8800_device {
u32 irq; /* acpi device irq */
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 ACPI_MODULE_NAME("wmi");
--- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
+++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "acpi_thermal_rel.h"
 
 static acpi_handle acpi_thermal_rel_handle;
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
 
 #ifdef CONFIG_ACPI
 
-#include 
+struct proc_dir_entry;
 
 #define ACPI_BUS_FILE_ROOT "acpi"
 extern struct proc_dir_entry *acpi_root_dir;




[patch V3 13/20] Documentation: Add lock ordering and nesting documentation

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

The kernel provides a variety of locking primitives. The nesting of these
lock types and the implications of them on RT enabled kernels is nowhere
documented.

Add initial documentation.

Signed-off-by: Thomas Gleixner 
Cc: "Paul E . McKenney" 
Cc: Jonathan Corbet 
Cc: Davidlohr Bueso 
Cc: Randy Dunlap 
---
V3: Addressed review comments from Paul, Jonathan, Davidlohr
V2: Addressed review comments from Randy
---
 Documentation/locking/index.rst |1 
 Documentation/locking/locktypes.rst |  299 
 2 files changed, 300 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
 :maxdepth: 1
 
+locktypes
 lockdep-design
 lockstat
 locktorture
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _kernel_hacking_locktypes:
+
+==
+Lock types and their rules
+==
+
+Introduction
+
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document conceptually describes these lock types and provides rules
+for their nesting, including the rules for use under PREEMPT_RT.
+
+
+Lock categories
+===
+
+Sleeping locks
+--
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Although implementations allow try_lock() from other contexts, it is
+necessary to carefully evaluate the safety of unlock() as well as of
+try_lock().  Furthermore, it is also necessary to evaluate the debugging
+versions of these primitives.  In short, don't acquire sleeping locks from
+other contexts unless there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
+
+
+rtmutex
+===
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+interrupt disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+=
+
+raw_spinlock_t
+--
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
+core code, low level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+--
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
+and has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables softirq
+   handlers.
+
+   Non-PREEMPT_RT kernels disable preemption to get this effect.
+
+   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
+   preemption disabled. The lock disables softirq handlers and also
+   prevents reentrancy due to task preemption.
+
+PREEMPT_RT kernels preserve all other spinlock_t semantics:
+
+ - Tasks holding a 

[patch V3 14/20] timekeeping: Split jiffies seqlock

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping
code as the writers are executed in hard interrupt and therefore
non-preemptible context even on PREEMPT_RT.

The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.

Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.
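The open-coded protocol (raw spinlock serializing writers, sequence counter guarding readers) can be sketched in plain C. This single-threaded model leaves out the lock and the memory barriers that the real code needs; names are illustrative:

```c
#include <assert.h>

static unsigned int model_seq;
static unsigned long long model_jiffies;

/* Writer side: bump the counter to odd before, back to even after. */
static void model_write_begin(void) { model_seq++; }
static void model_write_end(void)   { model_seq++; }

static unsigned int model_read_begin(void) { return model_seq; }

/* A reader must retry if the snapshot was odd (write in progress) or
 * the counter changed while it was reading. */
static int model_read_retry(unsigned int start)
{
	return (start & 1) || model_seq != start;
}

/* Mirrors the get_jiffies_64() read loop from the patch. */
static unsigned long long model_get_jiffies(void)
{
	unsigned int seq;
	unsigned long long ret;

	do {
		seq = model_read_begin();
		ret = model_jiffies;
	} while (model_read_retry(seq));
	return ret;
}
```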

Signed-off-by: Thomas Gleixner 

---
 kernel/time/jiffies.c |7 ---
 kernel/time/tick-common.c |   10 ++
 kernel/time/tick-sched.c  |   19 ---
 kernel/time/timekeeping.c |6 --
 kernel/time/timekeeping.h |3 ++-
 5 files changed, 28 insertions(+), 17 deletions(-)

--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -58,7 +58,8 @@ static struct clocksource clocksource_ji
.max_cycles = 10,
 };
 
-__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
+__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
+__cacheline_aligned_in_smp seqcount_t jiffies_seq;
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
@@ -67,9 +68,9 @@ u64 get_jiffies_64(void)
u64 ret;
 
do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		ret = jiffies_64;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
return ret;
 }
 EXPORT_SYMBOL(get_jiffies_64);
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -84,13 +84,15 @@ int tick_is_oneshot_available(void)
 static void tick_periodic(int cpu)
 {
if (tick_do_timer_cpu == cpu) {
-		write_seqlock(&jiffies_lock);
+		raw_spin_lock(&jiffies_lock);
+		write_seqcount_begin(&jiffies_seq);
 
 		/* Keep track of the next tick event */
 		tick_next_period = ktime_add(tick_next_period, tick_period);
 
 		do_timer(1);
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
update_wall_time();
}
 
@@ -162,9 +164,9 @@ void tick_setup_periodic(struct clock_ev
ktime_t next;
 
do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		next = tick_next_period;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 
clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -65,7 +65,8 @@ static void tick_do_update_jiffies64(kti
return;
 
/* Reevaluate with jiffies_lock held */
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 
delta = ktime_sub(now, last_jiffies_update);
if (delta >= tick_period) {
@@ -91,10 +92,12 @@ static void tick_do_update_jiffies64(kti
/* Keep the tick_next_period variable up to date */
tick_next_period = ktime_add(last_jiffies_update, tick_period);
} else {
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		return;
 	}
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
update_wall_time();
 }
 
@@ -105,12 +108,14 @@ static ktime_t tick_init_jiffy_update(vo
 {
ktime_t period;
 
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
 	if (last_jiffies_update == 0)
 		last_jiffies_update = tick_next_period;
 	period = last_jiffies_update;
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
return period;
 }
 
@@ -676,10 +681,10 @@ static ktime_t tick_nohz_next_event(stru
 
/* Read jiffies and the time when jiffies were updated last */
do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		basemono = last_jiffies_update;
 		basejiff = jiffies;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
ts->last_jiffies = basejiff;
ts->timer_expires_base = basemono;
 
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2397,8 +2397,10 @@ EXPORT_SYMBOL(hardpps);
  */
 void xtime_update(unsigned long ticks)
 {
-	write_seqlock(&jiffies_lock);
+   

[patch V3 04/20] orinoco_usb: Use the regular completion interfaces

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

The completion usage in this driver is interesting:

  - it uses a magic complete function which according to the comment was
implemented by invoking complete() four times in a row because
complete_all() was not exported at that time.

  - it uses an open coded wait/poll which checks completion:done. Only one wait
side (device removal) uses the regular wait_for_completion() interface.

The rationale behind this is to prevent wait_for_completion() from consuming
completion::done, which would prevent all waiters from being woken. This is not
necessary with complete_all(), as that sets completion::done to UINT_MAX, which
is left unmodified by the woken waiters.

Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.

This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.
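The converted polling path no longer peeks at the completion internals (ctx->done.done); it consumes the completion through try_wait_for_completion() inside a bounded loop. A plain-C sketch of that loop under the same semantics (illustrative names, not the kernel API):

```c
#include <assert.h>
#include <limits.h>

struct model_done {
	unsigned int done;	/* UINT_MAX after complete_all() */
};

/* Non-blocking check-and-consume, like try_wait_for_completion(). */
static int model_try_wait_for_completion(struct model_done *d)
{
	if (!d->done)
		return 0;
	if (d->done != UINT_MAX)
		d->done--;
	return 1;
}

/* Poll up to @msecs times; returns 1 if completed, 0 on timeout. */
static int model_poll_done(struct model_done *d, int msecs)
{
	int completed;

	while (!(completed = model_try_wait_for_completion(d)) && msecs--)
		;	/* stands in for udelay(1000) in the driver */
	return completed;
}
```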

Reported-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
V2: New patch to avoid conversion to swait functions later.
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 
 1 file changed, 5 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -365,17 +365,6 @@ static struct request_context *ezusb_all
return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-   complete(comp);
-   complete(comp);
-   complete(comp);
-   complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re
 
netif_wake_queue(dev);
}
-	ezusb_complete_all(&ctx->done);
+	complete_all(&ctx->done);
ezusb_request_context_put(ctx);
break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re
/* This is normal, as all request contexts get flushed
 * when the device is disconnected */
err("Called, CTX not terminating, but device gone");
-	ezusb_complete_all(&ctx->done);
+	complete_all(&ctx->done);
ezusb_request_context_put(ctx);
break;
}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez
 * get the chance to run themselves. So we make sure
 * that we don't sleep for ever */
int msecs = DEF_TIMEOUT * (1000 / HZ);
-		while (!ctx->done.done && msecs--)
+
+		while (!try_wait_for_completion(&ctx->done) && msecs--)
 			udelay(1000);
} else {
-		wait_event_interruptible(ctx->done.wait,
-					 ctx->done.done);
+		wait_for_completion(&ctx->done);
}
break;
default:




[patch V3 15/20] sched/swait: Prepare usage in completions

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

As a preparation to use simple wait queues for completions:

  - Provide swake_up_all_locked() to support complete_all()
  - Make __prepare_to_swait() public available

This is done to enable the usage of complete() within truly atomic contexts
on a PREEMPT_RT enabled kernel.
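swake_up_all_locked() wakes one waiter at a time until the list is empty, all under the queue lock the caller already holds. A userspace sketch of that loop, with a singly linked list standing in for the kernel's swait queue (illustrative names only):

```c
#include <assert.h>
#include <stddef.h>

struct model_waiter {
	struct model_waiter *next;
	int woken;
};

struct model_queue {
	struct model_waiter *head;
};

/* swake_up_locked(): wake and dequeue the first waiter, if any. */
static void model_swake_up_locked(struct model_queue *q)
{
	struct model_waiter *w = q->head;

	if (!w)
		return;
	q->head = w->next;
	w->woken = 1;
}

/* swake_up_all_locked(): repeat until no waiter is left. */
static void model_swake_up_all_locked(struct model_queue *q)
{
	while (q->head)
		model_swake_up_locked(q);
}
```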

Signed-off-by: Thomas Gleixner 
Cc: Linus Torvalds 
---
V2: Add comment to swake_up_all_locked()
---
 kernel/sched/sched.h |3 +++
 kernel/sched/swait.c |   15 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2492,3 +2492,6 @@ static inline bool is_per_cpu_kthread(st
return true;
 }
 #endif
+
+void swake_up_all_locked(struct swait_queue_head *q);
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -32,6 +32,19 @@ void swake_up_locked(struct swait_queue_
 }
 EXPORT_SYMBOL(swake_up_locked);
 
+/*
+ * Wake up all waiters. This is an interface which is solely exposed for
+ * completions and not for general usage.
+ *
+ * It is intentionally different from swake_up_all() to allow usage from
+ * hard interrupt context and interrupt disabled regions.
+ */
+void swake_up_all_locked(struct swait_queue_head *q)
+{
+   while (!list_empty(&q->task_list))
+   swake_up_locked(q);
+}
+
 void swake_up_one(struct swait_queue_head *q)
 {
unsigned long flags;
@@ -69,7 +82,7 @@ void swake_up_all(struct swait_queue_hea
 }
 EXPORT_SYMBOL(swake_up_all);
 
-static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
wait->task = current;
if (list_empty(&wait->task_list))




[patch V3 08/20] hexagon: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/hexagon/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Cc: Brian Cain 
Cc: linux-hexa...@vger.kernel.org
---
V3: New patch
---
 arch/hexagon/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index 00cb38faad0c4..c1019a736ff13 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -10,7 +10,6 @@
 /*
  * User space memory access functions
  */
-#include <linux/mm.h>
 #include 
 
 /*
-- 
2.26.0.rc2




[patch V3 02/20] pci/switchtec: Replace completion wait queue usage for poll

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

The poll callback is using the completion wait queue and sticks it into
poll_wait() to wake up pollers after a command has completed.

This works to some extent, but cannot provide EPOLLEXCLUSIVE support
because the waker side uses complete_all() which unconditionally wakes up
all waiters. complete_all() is required because completions internally use
exclusive wait and complete() only wakes up one waiter by default.

This mixes conceptually different mechanisms and relies on internal
implementation details of completions, which in turn puts constraints on
changing the internal implementation of completions.

Replace it with a regular wait queue and store the state in struct
switchtec_user.
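As a rough userspace analogue (pthread-based, with made-up names -- not the
switchtec driver itself), the pattern the patch moves to looks like this: an
explicit cmd_done flag plus an ordinary wait queue, so readers and pollers
test the flag instead of peeking into completion internals.

```c
#include <pthread.h>
#include <stdbool.h>
#include <assert.h>

/* Per-user command state: the condvar stands in for wait_queue_head_t,
 * the bool stands in for stuser->cmd_done. */
struct user_ctx {
    pthread_mutex_t lock;
    pthread_cond_t cmd_comp;
    bool cmd_done;
};

static void ctx_init(struct user_ctx *u)
{
    pthread_mutex_init(&u->lock, NULL);
    pthread_cond_init(&u->cmd_comp, NULL);
    u->cmd_done = false;
}

/* waker side: mirrors mrpc_complete_cmd() setting cmd_done and
 * calling wake_up_interruptible() */
static void ctx_complete(struct user_ctx *u)
{
    pthread_mutex_lock(&u->lock);
    u->cmd_done = true;
    pthread_cond_broadcast(&u->cmd_comp);
    pthread_mutex_unlock(&u->lock);
}

/* sleeper side: mirrors wait_event_interruptible(stuser->cmd_comp,
 * stuser->cmd_done) -- always recheck the predicate after waking */
static void ctx_wait(struct user_ctx *u)
{
    pthread_mutex_lock(&u->lock);
    while (!u->cmd_done)
        pthread_cond_wait(&u->cmd_comp, &u->lock);
    pthread_mutex_unlock(&u->lock);
}

/* non-blocking check: mirrors the O_NONBLOCK / poll path that used to
 * call try_wait_for_completion() */
static bool ctx_done(struct user_ctx *u)
{
    bool d;

    pthread_mutex_lock(&u->lock);
    d = u->cmd_done;
    pthread_mutex_unlock(&u->lock);
    return d;
}

static void *completer(void *arg)
{
    ctx_complete(arg);
    return NULL;
}
```

Because the state is explicit, the wait side no longer depends on how the
wakeup primitive stores its internal counter, which is exactly the coupling
the changelog complains about.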

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Logan Gunthorpe 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Bjorn Helgaas 
Cc: Kurt Schwemmer 
Cc: linux-...@vger.kernel.org
---
V2: Reworded changelog.
---
 drivers/pci/switch/switchtec.c |   22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -52,10 +52,11 @@ struct switchtec_user {
 
enum mrpc_state state;
 
-   struct completion comp;
+   wait_queue_head_t cmd_comp;
struct kref kref;
struct list_head list;
 
+   bool cmd_done;
u32 cmd;
u32 status;
u32 return_code;
@@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
stuser->stdev = stdev;
kref_init(&stuser->kref);
INIT_LIST_HEAD(&stuser->list);
-   init_completion(&stuser->comp);
+   init_waitqueue_head(&stuser->cmd_comp);
stuser->event_cnt = atomic_read(&stdev->event_cnt);

dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
@@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
kref_get(&stuser->kref);
stuser->read_len = sizeof(stuser->data);
stuser_set_state(stuser, MRPC_QUEUED);
-   reinit_completion(&stuser->comp);
+   stuser->cmd_done = false;
list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
mrpc_cmd_submit(stdev);
@@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
  stuser->read_len);
 out:
-   complete_all(&stuser->comp);
+   stuser->cmd_done = true;
+   wake_up_interruptible(&stuser->cmd_comp);
list_del_init(&stuser->list);
stuser_put(stuser);
stdev->mrpc_busy = 0;
@@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
mutex_unlock(&stdev->mrpc_mutex);

if (filp->f_flags & O_NONBLOCK) {
-   if (!try_wait_for_completion(&stuser->comp))
+   if (!stuser->cmd_done)
return -EAGAIN;
} else {
-   rc = wait_for_completion_interruptible(&stuser->comp);
+   rc = wait_event_interruptible(stuser->cmd_comp,
+ stuser->cmd_done);
if (rc < 0)
return rc;
}
@@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
struct switchtec_dev *stdev = stuser->stdev;
__poll_t ret = 0;
 
-   poll_wait(filp, &stuser->comp.wait, wait);
+   poll_wait(filp, &stuser->cmd_comp, wait);
poll_wait(filp, &stdev->event_wq, wait);
 
if (lock_mutex_and_test_alive(stdev))
@@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc
 
mutex_unlock(&stdev->mrpc_mutex);

-   if (try_wait_for_completion(&stuser->comp))
+   if (stuser->cmd_done)
ret |= EPOLLIN | EPOLLRDNORM;

if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
@@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_
 
/* Wake up and kill any users waiting on an MRPC request */
list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
-   complete_all(&stuser->comp);
+   stuser->cmd_done = true;
+   wake_up_interruptible(&stuser->cmd_comp);
list_del_init(&stuser->list);
stuser_put(stuser);
}




[patch V3 06/20] nds32: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/nds32/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Cc: Nick Hu 
Cc: Greentime Hu 
Cc: Vincent Chen 
---
V3: New patch
---
 arch/nds32/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 8916ad9f9f139..3a9219f53ee0d 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include <linux/mm.h>
 
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
-- 
2.26.0.rc2




[patch V3 07/20] csky: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/csky/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Cc: Guo Ren 
Cc: linux-c...@vger.kernel.org
---
V3: New patch
---
 arch/csky/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c3403a424..abefa125b93cf 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include <linux/mm.h>
 #include 
 #include 
 #include 
-- 
2.26.0.rc2




[patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait

2020-03-21 Thread Thomas Gleixner
From: Thomas Gleixner 

The PS3 notification interrupt and kthread use a hacked up completion to
communicate. Since we're wanting to change the completion implementation and
this is abuse anyway, replace it with a simple rcuwait since there is only ever
the one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg, kthreads
cannot receive signals by default and this one doesn't look different. Use
TASK_IDLE instead.
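A hedged userspace model of the resulting single-waiter handshake (invented
types; the kernel code uses rcuwait and a spinlock rather than pthreads)
shows the two details the patch is careful about: the done flag is re-armed
under the lock before sleeping, and the wake side sets the flag before
issuing the wakeup.

```c
#include <pthread.h>
#include <stdbool.h>
#include <assert.h>

/* Single-waiter notification analogue: the condvar stands in for
 * struct rcuwait, 'done' for dev->done. */
struct notify_dev {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    bool done;
};

static void notify_init(struct notify_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->wait, NULL);
    d->done = false;
}

/* request side: re-arm the flag under the lock so a completion that
 * races with the new request cannot be lost or misattributed */
static void notify_arm(struct notify_dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->done = false;
    pthread_mutex_unlock(&d->lock);
}

/* interrupt side: mirrors dev->done = true; rcuwait_wake_up(&dev->wait) */
static void notify_complete(struct notify_dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->done = true;
    pthread_cond_signal(&d->wait);   /* one waiter -> signal, not broadcast */
    pthread_mutex_unlock(&d->lock);
}

/* waiter side: mirrors rcuwait_wait_event(&dev->wait, dev->done, TASK_IDLE) */
static void notify_wait(struct notify_dev *d)
{
    pthread_mutex_lock(&d->lock);
    while (!d->done)
        pthread_cond_wait(&d->wait, &d->lock);
    pthread_mutex_unlock(&d->lock);
}
```

rcuwait is viable here precisely because there is never more than the one
kthread waiting, so no wake-all semantics are needed.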

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: Michael Ellerman 
Cc: Arnd Bergmann 
Cc: Geoff Levand 
Cc: linuxppc-dev@lists.ozlabs.org
---
V3: Folded the init fix from bigeasy
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -670,7 +671,8 @@ struct ps3_notification_device {
spinlock_t lock;
u64 tag;
u64 lv1_status;
-   struct completion done;
+   struct rcuwait wait;
+   bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 __LINE__, status);
dev->lv1_status = status;
-   complete(&dev->done);
+   dev->done = true;
+   rcuwait_wake_up(&dev->wait);
}
spin_unlock(>lock);
return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
unsigned long flags;
int res;
 
-   init_completion(&dev->done);
spin_lock_irqsave(&dev->lock, flags);
res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
&dev->tag)
: lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
   &dev->tag);
+   dev->done = false;
spin_unlock_irqrestore(&dev->lock, flags);
if (res) {
pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
}
pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-   res = wait_event_interruptible(dev->done.wait,
-  dev->done.done || kthread_should_stop());
+   rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
if (kthread_should_stop())
res = -EINTR;
-   if (res) {
-   pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-   return res;
-   }
 
if (dev->lv1_status) {
pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
}
 
spin_lock_init(&dev.lock);
+   rcuwait_init(&dev.wait);
 
res = request_irq(irq, ps3_notification_interrupt, 0,
  "ps3_notification", &dev);



[patch V3 00/20] Lock ordering documentation and annotation for lockdep

2020-03-21 Thread Thomas Gleixner
This is the third and hopefully final version of this work. The second one
can be found here:

   https://lore.kernel.org/r/20200318204302.693307...@linutronix.de

Changes since V2:

  - Included the arch/XXX fixups for the rcuwait changes (Sebastian)

  - Folded the init fix for the PS3 change (Sebastian)

  - Addressed feedback on documentation (Paul, Davidlohr, Jonathan)

  - Picked up acks and reviewed tags

Thanks,

tglx



[patch V3 20/20] lockdep: Add posixtimer context tracing bits

2020-03-21 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior 

Splitting run_posix_cpu_timers() into two parts is work in progress which
is stuck on other entry code related problems. The heavy lifting which
involves locking of sighand lock will be moved into task context so the
necessary execution time is burdened on the task and not on interrupt
context.

Until this work completes lockdep with the spinlock nesting rules enabled
would emit warnings for this known context.

Prevent it by setting "->irq_config = 1" for the invocation of
run_posix_cpu_timers() so lockdep does not complain when sighand lock is
acquired. This will be removed once the split is completed.
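The shape of the change is a paired marker that must be cleared on every
return path, including the early return when the sighand lock cannot be
taken. Modelled in plain C (illustrative only; the real flag lives in
current->irq_config):

```c
#include <stdbool.h>
#include <assert.h>

/* stands in for current->irq_config */
static int irq_config;

static void posixtimer_enter(void) { irq_config = 1; }
static void posixtimer_exit(void)  { irq_config = 0; }

/* mirrors the run_posix_cpu_timers() control flow from the patch */
static void run_timers(bool got_sighand_lock)
{
    posixtimer_enter();
    if (!got_sighand_lock) {
        /* early-return path must clear the flag too */
        posixtimer_exit();
        return;
    }
    /* ... fire expired timers under sighand lock ... */
    posixtimer_exit();
}
```

Leaving the flag set on the early-return path would make lockdep treat all
subsequent, unrelated lock acquisitions as if they were in this known
context, defeating the purpose of the annotation.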

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 include/linux/irqflags.h   |   12 
 kernel/time/posix-cpu-timers.c |6 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,16 @@ do { \
current->irq_config = 0;\
  } while (0)
 
+# define lockdep_posixtimer_enter()\
+ do {  \
+ current->irq_config = 1;  \
+ } while (0)
+
+# define lockdep_posixtimer_exit() \
+ do {  \
+ current->irq_config = 0;  \
+ } while (0)
+
# define lockdep_irq_work_enter(__work)\
  do {  \
  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
@@ -94,6 +104,8 @@ do { \
 # define lockdep_softirq_exit()do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)  do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)   do { } while (0)
+# define lockdep_posixtimer_enter()do { } while (0)
+# define lockdep_posixtimer_exit() do { } while (0)
 # define lockdep_irq_work_enter(__work)do { } while (0)
 # define lockdep_irq_work_exit(__work) do { } while (0)
 #endif
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1126,8 +1126,11 @@ void run_posix_cpu_timers(void)
if (!fastpath_timer_check(tsk))
return;
 
-   if (!lock_task_sighand(tsk, &flags))
+   lockdep_posixtimer_enter();
+   if (!lock_task_sighand(tsk, &flags)) {
+   lockdep_posixtimer_exit();
return;
+   }
/*
 * Here we take off tsk->signal->cpu_timers[N] and
 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1169,6 +1172,7 @@ void run_posix_cpu_timers(void)
cpu_timer_fire(timer);
spin_unlock(&timer->it_lock);
}
+   lockdep_posixtimer_exit();
 }
 
 /*



Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h

2020-03-21 Thread Guo Ren
Tested and Acked by me.

Queued for next pull request, thx

On Fri, Mar 20, 2020 at 5:49 PM Sebastian Andrzej Siewior
 wrote:
>
> The defconfig compiles without linux/mm.h. With mm.h included the
> include chain leads to:
> |   CC  kernel/locking/percpu-rwsem.o
> | In file included from include/linux/huge_mm.h:8,
> |  from include/linux/mm.h:567,
> |  from arch/csky/include/asm/uaccess.h:,
> |  from include/linux/uaccess.h:11,
> |  from include/linux/sched/task.h:11,
> |  from include/linux/sched/signal.h:9,
> |  from include/linux/rcuwait.h:6,
> |  from include/linux/percpu-rwsem.h:8,
> |  from kernel/locking/percpu-rwsem.c:6:
> | include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
> |  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
>
> once rcuwait.h includes linux/sched/signal.h.
>
> Remove the linux/mm.h include.
>
> Cc: Guo Ren 
> Cc: linux-c...@vger.kernel.org
> Reported-by: kbuild test robot 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  arch/csky/include/asm/uaccess.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
> index eaa1c3403a424..abefa125b93cf 100644
> --- a/arch/csky/include/asm/uaccess.h
> +++ b/arch/csky/include/asm/uaccess.h
> @@ -11,7 +11,6 @@
>  #include 
>  #include 
>  #include 
> -#include <linux/mm.h>
>  #include 
>  #include 
>  #include 
> --
> 2.26.0.rc2
>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH 07/10] ARM: dts: tango4: Make /serial compatible with ns16550a

2020-03-21 Thread Måns Rullgård
Lubomir Rintel  writes:

> ralink,rt2880-uart is compatible with ns16550a and all other
> instances of RT2880 UART nodes include it in the compatible property.
> Add it also here, to make the binding schema simpler.
>
> Signed-off-by: Lubomir Rintel 

Acked-by: Mans Rullgard 

> ---
>  arch/arm/boot/dts/tango4-common.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/tango4-common.dtsi 
> b/arch/arm/boot/dts/tango4-common.dtsi
> index ff72a8efb73d0..54fd522badfc9 100644
> --- a/arch/arm/boot/dts/tango4-common.dtsi
> +++ b/arch/arm/boot/dts/tango4-common.dtsi
> @@ -85,7 +85,7 @@ tick-counter@10048 {
>   };
>
>   uart: serial@10700 {
> - compatible = "ralink,rt2880-uart";
> + compatible = "ralink,rt2880-uart", "ns16550a";
>   reg = <0x10700 0x30>;
>   interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
>   clock-frequency = <7372800>;
> -- 
> 2.25.1
>

-- 
Måns Rullgård


Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait

2020-03-21 Thread Thomas Gleixner
Christoph Hellwig  writes:

> On Wed, Mar 18, 2020 at 09:43:09PM +0100, Thomas Gleixner wrote:
>> The PS3 notification interrupt and kthread use a hacked up completion to
>> communicate. Since we're wanting to change the completion implementation and
>> this is abuse anyway, replace it with a simple rcuwait since there is only
>> ever the one waiter.
>> 
>> AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg, kthreads
>> cannot receive signals by default and this one doesn't look different. Use
>> TASK_IDLE instead.
>
> I think the right fix here is to jut convert the thing to a threaded
> interrupt handler and kill off the stupid kthread.

That'd be a major surgery.

> But I wonder how alive the whole PS3 support is to start with..

There seem to be a few enthusiasts left who have an Other-OS capable PS3.

Thanks,

tglx



[PATCH] powerpc/ptrace: Fix ptrace-hwbreak selftest failure

2020-03-21 Thread Christophe Leroy
Pieces of commit c3f68b0478e7 ("powerpc/watchpoint: Fix ptrace
code that muck around with address/len") disappeared with
commit 53387e67d003 ("powerpc/ptrace: split out ADV_DEBUG_REGS
related functions.").

Restore them.

Fixes: 53387e67d003 ("powerpc/ptrace: split out ADV_DEBUG_REGS related functions.")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index cf05fadba0d5..d4170932acb4 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -153,7 +153,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf
if ((unsigned long)bp_info->addr >= TASK_SIZE)
return -EIO;
 
-   brk.address = bp_info->addr & ~7UL;
+   brk.address = bp_info->addr & ~HW_BREAKPOINT_ALIGN;
brk.type = HW_BRK_TYPE_TRANSLATE;
brk.len = DABR_MAX_LEN;
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ)
@@ -161,10 +161,6 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
brk.type |= HW_BRK_TYPE_WRITE;
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-   /*
-* Check if the request is for 'range' breakpoints. We can
-* support it if range < 8 bytes.
-*/
if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE)
len = bp_info->addr2 - bp_info->addr;
else if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_EXACT)
@@ -177,7 +173,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf
 
/* Create a new breakpoint request if one doesn't exist already */
hw_breakpoint_init(&attr);
-   attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
+   attr.bp_addr = (unsigned long)bp_info->addr;
attr.bp_len = len;
arch_bp_generic_fields(brk.type, &attr.bp_type);
 
-- 
2.25.0



Re: [PATCH v5 10/13] powerpc/ptrace: split out ADV_DEBUG_REGS related functions.

2020-03-21 Thread Christophe Leroy




On 03/20/2020 02:12 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

Move ADV_DEBUG_REGS functions out of ptrace.c, into
ptrace-adv.c and ptrace-noadv.c

Signed-off-by: Christophe Leroy 
---
v4: Leave hw_breakpoint.h for ptrace.c
---
  arch/powerpc/kernel/ptrace/Makefile   |   4 +
  arch/powerpc/kernel/ptrace/ptrace-adv.c   | 468 
  arch/powerpc/kernel/ptrace/ptrace-decl.h  |   5 +
  arch/powerpc/kernel/ptrace/ptrace-noadv.c | 236 
  arch/powerpc/kernel/ptrace/ptrace.c   | 650 --
  5 files changed, 713 insertions(+), 650 deletions(-)
  create mode 100644 arch/powerpc/kernel/ptrace/ptrace-adv.c
  create mode 100644 arch/powerpc/kernel/ptrace/ptrace-noadv.c


This is somehow breaking the ptrace-hwbreak selftest on Power8:

   test: ptrace-hwbreak
   tags: git_version:v5.6-rc6-892-g7a285a6067d6
   PTRACE_SET_DEBUGREG, WO, len: 1: Ok
   PTRACE_SET_DEBUGREG, WO, len: 2: Ok
   PTRACE_SET_DEBUGREG, WO, len: 4: Ok
   PTRACE_SET_DEBUGREG, WO, len: 8: Ok
   PTRACE_SET_DEBUGREG, RO, len: 1: Ok
   PTRACE_SET_DEBUGREG, RO, len: 2: Ok
   PTRACE_SET_DEBUGREG, RO, len: 4: Ok
   PTRACE_SET_DEBUGREG, RO, len: 8: Ok
   PTRACE_SET_DEBUGREG, RW, len: 1: Ok
   PTRACE_SET_DEBUGREG, RW, len: 2: Ok
   PTRACE_SET_DEBUGREG, RW, len: 4: Ok
   PTRACE_SET_DEBUGREG, RW, len: 8: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RO, len: 6: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RW, len: 6: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, WO, len: 6: Ok
   PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, RO, len: 6: Fail
   failure: ptrace-hwbreak

I haven't had time to work out why yet.



A (big) part of commit c3f68b0478e7 ("powerpc/watchpoint: Fix ptrace 
code that muck around with address/len") was lost during rebase.


I'll send a fix, up to you to squash it in or commit it as is.

Christophe


Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-21 Thread Thomas Gleixner
"Paul E. McKenney"  writes:
> On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
>> I agree that what I tried to express is hard to parse, but it's at least
>> halfways correct :)
>
> Apologies!  That is what I get for not looking it up in the source.  :-/
>
> OK, so I am stupid enough not only to get it wrong, but also to try again:
>
>... Other types of wakeups would normally unconditionally set the
>task state to RUNNING, but that does not work here because the task
>must remain blocked until the lock becomes available.  Therefore,
>when a non-lock wakeup attempts to awaken a task blocked waiting
>for a spinlock, it instead sets the saved state to RUNNING.  Then,
>when the lock acquisition completes, the lock wakeup sets the task
>state to the saved state, in this case setting it to RUNNING.
>
> Is that better?

Definitely!

Thanks for all the editorial work!

   tglx
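The saved-state handshake described in this exchange can be modelled as a
toy state machine (illustrative only, not the actual rtmutex code): a task
blocked on an RT spinlock keeps sleeping across ordinary wakeups; the wakeup
is recorded in saved_state and applied when the lock wakeup finally arrives.

```c
#include <assert.h>

enum tstate { RUNNING, INTERRUPTIBLE, BLOCKED_ON_LOCK };

struct task {
    enum tstate state;
    enum tstate saved_state;
};

/* task starts to wait for an RT spinlock: current state is stashed */
static void block_on_lock(struct task *t)
{
    t->saved_state = t->state;
    t->state = BLOCKED_ON_LOCK;
}

/* non-lock wakeup: cannot make the task runnable yet (it must stay
 * blocked until the lock is free), so it only updates the saved state */
static void regular_wakeup(struct task *t)
{
    if (t->state == BLOCKED_ON_LOCK)
        t->saved_state = RUNNING;
    else
        t->state = RUNNING;
}

/* lock wakeup: the lock became available, restore whatever was saved */
static void lock_wakeup(struct task *t)
{
    t->state = t->saved_state;
}
```

Walking the scenario from the reworded paragraph: an INTERRUPTIBLE task
blocks on the lock, a signal-style wakeup rewrites saved_state to RUNNING,
and the eventual lock wakeup restores that, leaving the task RUNNING.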


[powerpc:fixes-test] BUILD SUCCESS 1d0c32ec3b860a32df593a22bad0d1dbc5546a59

2020-03-21 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 1d0c32ec3b860a32df593a22bad0d1dbc5546a59  KVM: PPC: Fix kernel crash with PR KVM

elapsed time: 1740m

configs tested: 226
configs skipped: 154

The following configs have been built successfully.
More configs may be tested in the coming days.

arm  allmodconfig
arm   allnoconfig
arm  allyesconfig
arm64allmodconfig
arm64 allnoconfig
arm64allyesconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
arm multi_v5_defconfig
arm multi_v7_defconfig
arm shmobile_defconfig
arm   sunxi_defconfig
arm64   defconfig
sparc allyesconfig
nds32   defconfig
powerpc   allnoconfig
sh allnoconfig
riscv   defconfig
ia64 allyesconfig
nds32 allnoconfig
sh  rsk7269_defconfig
nios2 3c120_defconfig
parisc generic-32bit_defconfig
riscv  rv32_defconfig
microblaze nommu_defconfig
sparc   defconfig
m68k allmodconfig
s390 allmodconfig
i386 alldefconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
ia64 alldefconfig
ia64 allmodconfig
ia64  allnoconfig
ia64defconfig
c6x  allyesconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
openrisc or1ksim_defconfig
openrisc simple_smp_defconfig
xtensa   common_defconfig
xtensa  iss_defconfig
h8300 edosk2674_defconfig
h8300 h8300h-sim_defconfig
h8300   h8s-sim_defconfig
m68k   m5475evb_defconfig
m68k   sun3_defconfig
m68k  multi_defconfig
arc  allyesconfig
arc defconfig
microblaze  mmu_defconfig
powerpc defconfig
powerpc   ppc64_defconfig
powerpc  rhel-kconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips allyesconfig
mips  fuloong2e_defconfig
mips  malta_kvm_defconfig
parisc allnoconfig
parisc   allyesconfig
parisc generic-64bit_defconfig
x86_64   randconfig-a001-20200320
x86_64   randconfig-a002-20200320
x86_64   randconfig-a003-20200320
i386 randconfig-a001-20200320
i386 randconfig-a002-20200320
i386 randconfig-a003-20200320
x86_64   randconfig-a001-20200321
x86_64   randconfig-a002-20200321
x86_64   randconfig-a003-20200321
i386 randconfig-a001-20200321
i386 randconfig-a002-20200321
i386 randconfig-a003-20200321
alpharandconfig-a001-20200320
m68k randconfig-a001-20200320
mips randconfig-a001-20200320
nds32randconfig-a001-20200320
parisc   randconfig-a001-20200320
alpharandconfig-a001-20200321
m68k randconfig-a001-20200321
mips randconfig-a001-20200321
nds32randconfig-a001-20200321
parisc   randconfig-a001-20200321
riscvrandconfig-a001-20200321
c6x  randconfig-a001-20200321
h8300randconfig-a001-20200321
microblaze   randconfig-a001-20200321
nios2randconfig-a001-20200321
sparc64  randconfig-a001-20200321
c6x  randconfig-a001-20200320
h8300randconfig-a001-20200320
microblaze   randconfig-a001-20200320
nios2randconfig-a001-20200320
sparc64  randconfig-a001-20200320
csky randconfig-a001-20200320
openrisc randconfig-a001-20200320

Re: [PATCH] powerpc/pseries: avoid harmless preempt warning

2020-03-21 Thread Nicholas Piggin
Christophe Leroy's on March 21, 2020 1:33 am:
> 
> 
> Le 20/03/2020 à 16:24, Nicholas Piggin a écrit :
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/platforms/pseries/lpar.c | 10 +-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
>> b/arch/powerpc/platforms/pseries/lpar.c
>> index 3c3da25b445c..e4ed5317f117 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -636,8 +636,16 @@ static const struct proc_ops 
>> vcpudispatch_stats_freq_proc_ops = {
>>   
>>   static int __init vcpudispatch_stats_procfs_init(void)
>>   {
>> -if (!lppaca_shared_proc(get_lppaca()))
>> +/*
>> + * Avoid smp_processor_id while preemptible. All CPUs should have
>> + * the same value for lppaca_shared_proc.
>> + */
>> +preempt_disable();
>> +if (!lppaca_shared_proc(get_lppaca())) {
>> +preempt_enable();
>>  return 0;
>> +}
>> +preempt_enable();
> 
> Can we avoid the double preempt_enable() with something like:
> 
>   preempt_disable();
>   is_shared = lppaca_shared_proc(get_lppaca());
>   preempt_enable();
>   if (!is_shared)
>   return 0;

I don't mind too much. Same number of lines.

Thanks,
Nick


Re: [PATCH] powerpc/64: ftrace don't trace real mode

2020-03-21 Thread Nicholas Piggin
Naveen N. Rao's on March 21, 2020 4:39 am:
> Hi Nick,
> 
> Nicholas Piggin wrote:
>> This warns and prevents tracing attempted in a real-mode context.
> 
> Is this something you're seeing often? Last time we looked at this, KVM 
> was the biggest offender and we introduced paca->ftrace_enabled as a way 
> to disable ftrace while in KVM code.

Not often but it has a tendency to blow up the least tested code at the
worst times :)

Machine check is bad, I'm sure HMI too but I haven't tested that yet.

I've fixed up most of it with annotations, this is obviously an extra
safety not something to rely on like ftrace_enabled. Probably even the
WARN_ON here is dangerous here, but I don't want to leave these bugs
in there.

Although the machine check and hmi code touch a fair bit of stuff and
annotating is a bit fragile. It might actually be better if the
paca->ftrace_enabled could be a nesting counter, then we could use it
in machine checks too and avoid a lot of annotations.
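The nesting-counter idea floated here could look roughly like this
(hypothetical helper names; the real field is paca->ftrace_enabled, and this
sketch ignores per-CPU placement and interrupt safety):

```c
#include <assert.h>

/* Sketch: replace a boolean enable flag with a disable depth so nested
 * real-mode sections (e.g. a machine check arriving inside HMI
 * handling) compose without each needing its own save/restore. */
static unsigned int ftrace_disable_depth;

static void ftrace_disable_enter(void)
{
    ftrace_disable_depth++;
}

static void ftrace_disable_exit(void)
{
    ftrace_disable_depth--;
}

/* tracing is permitted only when no section has disabled it */
static int ftrace_enabled(void)
{
    return ftrace_disable_depth == 0;
}
```

With a counter, an inner handler's exit does not prematurely re-enable
tracing for the still-running outer handler, which is the failure mode a
plain boolean invites.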

> While this is cheap when handling ftrace_regs_caller() as done in this 
> patch, for simple function tracing (see below), we will have to grab the 
> MSR which will slow things down slightly.

mfmsr is not too bad these days. 


> 
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  arch/powerpc/kernel/trace/ftrace.c|  3 +++
>>  .../powerpc/kernel/trace/ftrace_64_mprofile.S | 19 +++
>>  2 files changed, 18 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
>> index 7ea0ca044b65..ef965815fcb9 100644
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -949,6 +949,9 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
>>  {
>>  unsigned long return_hooker;
>> 
>> +if (WARN_ON_ONCE((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)))
>> +goto out;
>> +
> 
> This is called on function entry to redirect function return to a 
> trampoline if needed. I am not sure if we have (or will have) too many C 
> functions that disable MSR_IR|MSR_DR. Unless the number of such 
> functions is large, it might be preferable to mark specific functions as 
> notrace.
> 
>>  if (unlikely(ftrace_graph_is_dead()))
>>  goto out;
>> 
>> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> index f9fd5f743eba..6205f15cb603 100644
>> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> @@ -51,16 +51,21 @@ _GLOBAL(ftrace_regs_caller)
>>  SAVE_10GPRS(12, r1)
>>  SAVE_10GPRS(22, r1)
>> 
>> -/* Save previous stack pointer (r1) */
>> -addir8, r1, SWITCH_FRAME_SIZE
>> -std r8, GPR1(r1)
>> -
>>  /* Load special regs for save below */
>>  mfmsr   r8
>>  mfctr   r9
>>  mfxer   r10
>>  mfcrr11
>> 
>> +/* Shouldn't be called in real mode */
>> +andi.   r3,r8,(MSR_IR|MSR_DR)
>> +cmpdi   r3,(MSR_IR|MSR_DR)
>> +bne ftrace_bad_realmode
>> +
>> +/* Save previous stack pointer (r1) */
>> +addir8, r1, SWITCH_FRAME_SIZE
>> +std r8, GPR1(r1)
>> +
> 
> This stomps on the MSR value in r8, which is saved into pt_regs further 
> below.
> 
> You'll also have to handle ftrace_caller() which is used for simple 
> function tracing. We don't read the MSR there today, but that will be 
> needed if we want to suppress tracing.

Oops, thanks good catch.

Thanks,
Nick