Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-10 Thread Will Deacon
On Tue, Oct 04, 2016 at 09:06:18AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 04, 2016 at 03:29:33PM +1100, Michael Ellerman wrote:
> > Peter Zijlstra  writes:
> > > So it would be good to also explain why PPC needs this in the first
> > > place.
> > 
> > Unfortunately I don't really know the code, and the original author is AWOL.
> > 
> > But AFAICS perf_event_disable() is only called here:
> > 
> > if (!stepped) {
> > WARN(1, "Unable to handle hardware breakpoint. Breakpoint at "
> > "0x%lx will be disabled.", info->address);
> > perf_event_disable(bp);
> > goto out;
> > }
> > 
> > Which is where we cope with the possibility that we couldn't emulate the
> > instruction that hit the breakpoint. Seems that is not an issue on x86,
> > or it's handled elsewhere?
> 
> I don't think x86 ever needs to emulate things on hw breakpoint
> (although I could be mistaken), but I would expect ARM might need
> to, and I couldn't find a disable there either.
> 
> Will?

We don't do any emulation, so no need for us to call perf_event_disable
in the hw_breakpoint "overflow" path. We do play some awful games to
fake up a single-step, but I don't think perf core needs to care about
it.

Will


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-06 Thread Peter Zijlstra
On Wed, Oct 05, 2016 at 09:53:38PM +0200, Jiri Olsa wrote:
> On Wed, Oct 05, 2016 at 10:09:21AM +0200, Jiri Olsa wrote:
> > On Tue, Oct 04, 2016 at 03:29:33PM +1100, Michael Ellerman wrote:
> > 
> > SNIP
> > 
> > > Which is where we cope with the possibility that we couldn't emulate the
> > > instruction that hit the breakpoint. Seems that is not an issue on x86,
> > > or it's handled elsewhere?
> > > 
> > > We should fix emulate_step() if it failed to emulate something it
> > > should have, but there will always be the possibility that it fails.
> > > 
> > > Instead of calling perf_event_disable() we could just add a flag to
> > > arch_hw_breakpoint that says we hit an error on the event, and block
> > > reinstalling it in arch_install_hw_breakpoint().
> > 
> > ok, might be easier.. I'll check on that
> 
> so staring at that, I think disabling is the right way here..
> 
> we need the event to be unscheduled and not scheduled back
> again; I don't see a better way at the moment

OK, can you resend the patch with an updated Changelog that explains these
things?


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-05 Thread Jiri Olsa
On Wed, Oct 05, 2016 at 10:09:21AM +0200, Jiri Olsa wrote:
> On Tue, Oct 04, 2016 at 03:29:33PM +1100, Michael Ellerman wrote:
> 
> SNIP
> 
> > Which is where we cope with the possibility that we couldn't emulate the
> > instruction that hit the breakpoint. Seems that is not an issue on x86,
> > or it's handled elsewhere?
> > 
> > We should fix emulate_step() if it failed to emulate something it
> > should have, but there will always be the possibility that it fails.
> > 
> > Instead of calling perf_event_disable() we could just add a flag to
> > arch_hw_breakpoint that says we hit an error on the event, and block
> > reinstalling it in arch_install_hw_breakpoint().
> 
> ok, might be easier.. I'll check on that

so staring at that, I think disabling is the right way here..

we need the event to be unscheduled and not scheduled back
again; I don't see a better way at the moment

jirka


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-05 Thread Jan Stancek


- Original Message -
> From: "Michael Ellerman" 
> To: "Jiri Olsa" , "Peter Zijlstra" 
> Cc: "lkml" , "Ingo Molnar" , 
> "Michael Neuling" ,
> "Paul Mackerras" , "Alexander Shishkin" 
> , "Jan Stancek"
> 
> Sent: Tuesday, 4 October, 2016 6:08:27 AM
> Subject: Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context
> 
> Jiri Olsa  writes:
> 
> > The trinity syscall fuzzer triggered following WARN on powerpc:
> >   WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
> >   ...
> >   NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
> >   LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
> >   Call Trace:
> >   [c002f7933580] [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
> >   (unreliable)
> >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
> >   [c002f79336d0] [c00f6abc]
> >   .__atomic_notifier_call_chain+0xbc/0x1c0
> >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
> >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
> >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> 
> Is that the full stack trace? It doesn't look like it.
> 
> And were you running trinity as root or regular user?

As regular user:

# adduser dummy
# su dummy /mnt/testarea/trinity --children $proc_num -m --syslog -q -T DIE

Regards,
Jan


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-05 Thread Jiri Olsa
On Tue, Oct 04, 2016 at 03:29:33PM +1100, Michael Ellerman wrote:

SNIP

> Which is where we cope with the possibility that we couldn't emulate the
> instruction that hit the breakpoint. Seems that is not an issue on x86,
> or it's handled elsewhere?
> 
> We should fix emulate_step() if it failed to emulate something it
> should have, but there will always be the possibility that it fails.
> 
> Instead of calling perf_event_disable() we could just add a flag to
> arch_hw_breakpoint that says we hit an error on the event, and block
> reinstalling it in arch_install_hw_breakpoint().

ok, might be easier.. I'll check on that

thanks,
jirka


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-05 Thread Jiri Olsa
On Tue, Oct 04, 2016 at 03:08:27PM +1100, Michael Ellerman wrote:
> Jiri Olsa  writes:
> 
> > The trinity syscall fuzzer triggered following WARN on powerpc:
> >   WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
> >   ...
> >   NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
> >   LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
> >   Call Trace:
> >   [c002f7933580] [c093aed8] .hw_breakpoint_handler+0x288/0x2b0 
> > (unreliable)
> >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
> >   [c002f79336d0] [c00f6abc] 
> > .__atomic_notifier_call_chain+0xbc/0x1c0
> >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
> >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
> >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> 
> Is that the full stack trace? It doesn't look like it.
> 
> And were you running trinity as root or regular user?

I cut it down to just the backtrace.. attached is a full
one from another instance of this error.

Jan, could you please answer the trinity question?


thanks,
jirka

---
[ 4557.360587] ===
[ 4557.360590] [ INFO: suspicious RCU usage. ]
[ 4557.360593] 3.10.0-379.el7.ppc64le.debug #1 Tainted: GW
[ 4557.360596] ---
[ 4557.360599] include/linux/rcupdate.h:488 Illegal context switch in RCU read-side critical section!
[ 4557.360602]
[ 4557.360602] other info that might help us debug this:
[ 4557.360602]
[ 4557.360606]
[ 4557.360606] rcu_scheduler_active = 1, debug_locks = 0
[ 4557.360610] 2 locks held by trinity-c0/138777:
[ 4557.360613]  #0:  (rcu_read_lock){.+.+..}, at: [] notify_die+0x8/0x1c0
[ 4557.360622]  #1:  (rcu_read_lock){.+.+..}, at: [] hw_breakpoint_handler+0x8/0x310
[ 4557.360631]
[ 4557.360631] stack backtrace:
[ 4557.360635] CPU: 0 PID: 138777 Comm: trinity-c0 Tainted: GW 3.10.0-379.el7.ppc64le.debug #1
[ 4557.360639] Call Trace:
[ 4557.360641] [c0042844b510] [c0019e18] show_stack+0x88/0x390 (unreliable)
[ 4557.360647] [c0042844b5d0] [c0a7ab04] dump_stack+0x30/0x44
[ 4557.360653] [c0042844b5f0] [c01a0140] lockdep_rcu_suspicious+0x140/0x190
[ 4557.360659] [c0042844b670] [c0140ee8] __might_sleep+0x278/0x2d0
[ 4557.360663] [c0042844b6f0] [c0a55794] mutex_lock_nested+0x74/0x5b0
[ 4557.360669] [c0042844b7f0] [c026f67c] perf_event_ctx_lock_nested+0x15c/0x370
[ 4557.360674] [c0042844b880] [c02714b8] perf_event_disable+0x28/0xe0
[ 4557.360679] [c0042844b8b0] [c0a602a4] hw_breakpoint_handler+0x204/0x310
[ 4557.360684] [c0042844b950] [c0a65124] notifier_call_chain.constprop.6+0xa4/0x1d0
[ 4557.360689] [c0042844b9e0] [c0a654b8] notify_die+0xb8/0x1c0
[ 4557.360693] [c0042844ba40] [c00181c4] do_break+0x54/0x100
[ 4557.360698] [c0042844baf0] [c00095a0] handle_dabr_fault+0x14/0x48
[ 4557.360704] --- Exception: 300 at SyS_getresgid+0xb0/0x170
[ 4557.360704] LR = SyS_getresgid+0x88/0x170
[ 4557.360709] [c0042844be30] [c000a188] system_call+0x38/0xb4
[ 4557.360713] BUG: sleeping function called from invalid context at kernel/mutex.c:576
[ 4557.360716] in_atomic(): 1, irqs_disabled(): 1, pid: 138777, name: trinity-c0
[ 4557.360720] INFO: lockdep is turned off.
[ 4557.360722] irq event stamp: 9356
[ 4557.360725] hardirqs last  enabled at (9355): [] __mutex_unlock_slowpath+0x120/0x2a0
[ 4557.360730] hardirqs last disabled at (9356): [] data_access_common+0x11c/0x180
[ 4557.360735] softirqs last  enabled at (8688): [] bdi_wakeup_thread_delayed+0x8c/0xb0
[ 4557.360742] softirqs last disabled at (8684): [] _raw_spin_lock_bh+0x2c/0xd0
[ 4557.360748] CPU: 0 PID: 138777 Comm: trinity-c0 Tainted: GW 3.10.0-379.el7.ppc64le.debug #1
[ 4557.360751] Call Trace:
[ 4557.360754] [c0042844b590] [c0019e18] show_stack+0x88/0x390 (unreliable)
[ 4557.360760] [c0042844b650] [c0a7ab04] dump_stack+0x30/0x44
[ 4557.360764] [c0042844b670] [c0140e34] __might_sleep+0x1c4/0x2d0
[ 4557.360769] [c0042844b6f0] [c0a55794] mutex_lock_nested+0x74/0x5b0
[ 4557.360774] [c0042844b7f0] [c026f67c] perf_event_ctx_lock_nested+0x15c/0x370
[ 4557.360779] [c0042844b880] [c02714b8] perf_event_disable+0x28/0xe0
[ 4557.360784] [c0042844b8b0] [c0a602a4] hw_breakpoint_handler+0x204/0x310
[ 4557.360789] [c0042844b950] [c0a65124] notifier_call_chain.constprop.6+0xa4/0x1d0
[ 4557.360793] [c0042844b9e0] [c0a654b8] notify_die+0xb8/0x1c0
[ 4557.360798] [c0042844ba40] [c00181c4] do_break+0x54/0x100
[ 4557.360803] [c0042844baf0] [c00095a0] handle_dabr_fault+0x14/0x48
[ 4557.360809] --- Exception: 300 at SyS_getresgid+0xb0/0x170
[ 4557.360809] LR = SyS_getresgid+0x88/0x17

Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-04 Thread Peter Zijlstra
On Tue, Oct 04, 2016 at 03:29:33PM +1100, Michael Ellerman wrote:
> Peter Zijlstra  writes:
> > So it would be good to also explain why PPC needs this in the first
> > place.
> 
> Unfortunately I don't really know the code, and the original author is AWOL.
> 
> But AFAICS perf_event_disable() is only called here:
> 
>   if (!stepped) {
>   WARN(1, "Unable to handle hardware breakpoint. Breakpoint at "
>   "0x%lx will be disabled.", info->address);
>   perf_event_disable(bp);
>   goto out;
>   }
> 
> Which is where we cope with the possibility that we couldn't emulate the
> instruction that hit the breakpoint. Seems that is not an issue on x86,
> or it's handled elsewhere?

I don't think x86 ever needs to emulate things on hw breakpoint
(although I could be mistaken), but I would expect ARM might need
to, and I couldn't find a disable there either.

Will?

> We should fix emulate_step() if it failed to emulate something it
> should have, but there will always be the possibility that it fails.
> 
> Instead of calling perf_event_disable() we could just add a flag to
> arch_hw_breakpoint that says we hit an error on the event, and block
> reinstalling it in arch_install_hw_breakpoint().

Possible..


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-03 Thread Michael Ellerman
Peter Zijlstra  writes:

> On Mon, Oct 03, 2016 at 03:29:32PM +0200, Jiri Olsa wrote:
>> On Fri, Sep 23, 2016 at 06:37:47PM +0200, Peter Zijlstra wrote:
>> > On Wed, Sep 21, 2016 at 03:55:34PM +0200, Jiri Olsa wrote:
>> > >   stack backtrace:
>> > >   CPU: 9 PID: 2998 Comm: ls Tainted: GW   4.8.0-rc5+ #7
>> > >   Call Trace:
>> > >   [c002f7933150] [c094b1f8] .dump_stack+0xe0/0x14c 
>> > > (unreliable)
>> > >   [c002f79331e0] [c013c468] 
>> > > .lockdep_rcu_suspicious+0x138/0x180
>> > >   [c002f7933270] [c01005d8] .___might_sleep+0x278/0x2e0
>> > >   [c002f7933300] [c0935584] .mutex_lock_nested+0x64/0x5a0
>> > >   [c002f7933410] [c023084c] 
>> > > .perf_event_ctx_lock_nested+0x16c/0x380
>> > >   [c002f7933500] [c0230a80] .perf_event_disable+0x20/0x60
>> > >   [c002f7933580] [c093aeec] 
>> > > .hw_breakpoint_handler+0x29c/0x2b0
>> > >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
>> > >   [c002f79336d0] [c00f6abc] 
>> > > .__atomic_notifier_call_chain+0xbc/0x1c0
>> > >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
>> > >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
>> > >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
>> > 
>> > Well, that lockdep warning only says you should not be taking sleeping
>> > locks while holding rcu_read_lock(), which is true. It does not say that
>> > the context you're doing this in cannot sleep.
>> > 
>> > I'm not familiar enough with the PPC stuff to tell if the DIE_DABR_MATCH
>> > trap context is atomic or not, and this Changelog doesn't tell me.
>> 
>> ping
>
> So I think all the DIE notifiers are atomic, which means this would
> indeed be the thing to do. That said, I didn't see anything similar on
> other BP implementations.

Seems everyone is being called from the same notifier, which is atomic,
but powerpc is the only arch that does perf_event_disable().

> So it would be good to also explain why PPC needs this in the first
> place.

Unfortunately I don't really know the code, and the original author is AWOL.

But AFAICS perf_event_disable() is only called here:

if (!stepped) {
WARN(1, "Unable to handle hardware breakpoint. Breakpoint at "
"0x%lx will be disabled.", info->address);
perf_event_disable(bp);
goto out;
}

Which is where we cope with the possibility that we couldn't emulate the
instruction that hit the breakpoint. Seems that is not an issue on x86,
or it's handled elsewhere?

We should fix emulate_step() if it failed to emulate something it
should have, but there will always be the possibility that it fails.

Instead of calling perf_event_disable() we could just add a flag to
arch_hw_breakpoint that says we hit an error on the event, and block
reinstalling it in arch_install_hw_breakpoint().

cheers


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-03 Thread Michael Ellerman
Jiri Olsa  writes:

> The trinity syscall fuzzer triggered following WARN on powerpc:
>   WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
>   ...
>   NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
>   LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
>   Call Trace:
>   [c002f7933580] [c093aed8] .hw_breakpoint_handler+0x288/0x2b0 
> (unreliable)
>   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
>   [c002f79336d0] [c00f6abc] 
> .__atomic_notifier_call_chain+0xbc/0x1c0
>   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
>   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
>   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48

Is that the full stack trace? It doesn't look like it.

And were you running trinity as root or regular user?

cheers


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-03 Thread Peter Zijlstra
On Mon, Oct 03, 2016 at 03:29:32PM +0200, Jiri Olsa wrote:
> On Fri, Sep 23, 2016 at 06:37:47PM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 21, 2016 at 03:55:34PM +0200, Jiri Olsa wrote:
> > > The trinity syscall fuzzer triggered following WARN on powerpc:
> > >   WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
> > >   ...
> > >   NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
> > >   LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
> > >   Call Trace:
> > >   [c002f7933580] [c093aed8] 
> > > .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
> > >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
> > >   [c002f79336d0] [c00f6abc] 
> > > .__atomic_notifier_call_chain+0xbc/0x1c0
> > >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
> > >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
> > >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> > > 
> > > Followed by lockdep warning:
> > >   ===
> > >   [ INFO: suspicious RCU usage. ]
> > >   4.8.0-rc5+ #7 Tainted: GW
> > >   ---
> > >   ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side 
> > > critical section!
> > > 
> > >   other info that might help us debug this:
> > > 
> > >   rcu_scheduler_active = 1, debug_locks = 0
> > >   2 locks held by ls/2998:
> > >#0:  (rcu_read_lock){..}, at: [] 
> > > .__atomic_notifier_call_chain+0x0/0x1c0
> > >#1:  (rcu_read_lock){..}, at: [] 
> > > .hw_breakpoint_handler+0x0/0x2b0
> > > 
> > >   stack backtrace:
> > >   CPU: 9 PID: 2998 Comm: ls Tainted: GW   4.8.0-rc5+ #7
> > >   Call Trace:
> > >   [c002f7933150] [c094b1f8] .dump_stack+0xe0/0x14c 
> > > (unreliable)
> > >   [c002f79331e0] [c013c468] 
> > > .lockdep_rcu_suspicious+0x138/0x180
> > >   [c002f7933270] [c01005d8] .___might_sleep+0x278/0x2e0
> > >   [c002f7933300] [c0935584] .mutex_lock_nested+0x64/0x5a0
> > >   [c002f7933410] [c023084c] 
> > > .perf_event_ctx_lock_nested+0x16c/0x380
> > >   [c002f7933500] [c0230a80] .perf_event_disable+0x20/0x60
> > >   [c002f7933580] [c093aeec] .hw_breakpoint_handler+0x29c/0x2b0
> > >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
> > >   [c002f79336d0] [c00f6abc] 
> > > .__atomic_notifier_call_chain+0xbc/0x1c0
> > >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
> > >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
> > >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> > > 
> > 
> > Well, that lockdep warning only says you should not be taking sleeping
> > locks while holding rcu_read_lock(), which is true. It does not say that
> > the context you're doing this in cannot sleep.
> > 
> > I'm not familiar enough with the PPC stuff to tell if the DIE_DABR_MATCH
> > trap context is atomic or not, and this Changelog doesn't tell me.
> > 
> > Anybody?
> 
> ping

So I think all the DIE notifiers are atomic, which means this would
indeed be the thing to do. That said, I didn't see anything similar on
other BP implementations.

So it would be good to also explain why PPC needs this in the first
place.


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-10-03 Thread Jiri Olsa
On Fri, Sep 23, 2016 at 06:37:47PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 21, 2016 at 03:55:34PM +0200, Jiri Olsa wrote:
> > The trinity syscall fuzzer triggered following WARN on powerpc:
> >   WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
> >   ...
> >   NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
> >   LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
> >   Call Trace:
> >   [c002f7933580] [c093aed8] .hw_breakpoint_handler+0x288/0x2b0 
> > (unreliable)
> >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
> >   [c002f79336d0] [c00f6abc] 
> > .__atomic_notifier_call_chain+0xbc/0x1c0
> >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
> >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
> >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> > 
> > Followed by lockdep warning:
> >   ===
> >   [ INFO: suspicious RCU usage. ]
> >   4.8.0-rc5+ #7 Tainted: GW
> >   ---
> >   ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side 
> > critical section!
> > 
> >   other info that might help us debug this:
> > 
> >   rcu_scheduler_active = 1, debug_locks = 0
> >   2 locks held by ls/2998:
> >#0:  (rcu_read_lock){..}, at: [] 
> > .__atomic_notifier_call_chain+0x0/0x1c0
> >#1:  (rcu_read_lock){..}, at: [] 
> > .hw_breakpoint_handler+0x0/0x2b0
> > 
> >   stack backtrace:
> >   CPU: 9 PID: 2998 Comm: ls Tainted: GW   4.8.0-rc5+ #7
> >   Call Trace:
> >   [c002f7933150] [c094b1f8] .dump_stack+0xe0/0x14c (unreliable)
> >   [c002f79331e0] [c013c468] .lockdep_rcu_suspicious+0x138/0x180
> >   [c002f7933270] [c01005d8] .___might_sleep+0x278/0x2e0
> >   [c002f7933300] [c0935584] .mutex_lock_nested+0x64/0x5a0
> >   [c002f7933410] [c023084c] 
> > .perf_event_ctx_lock_nested+0x16c/0x380
> >   [c002f7933500] [c0230a80] .perf_event_disable+0x20/0x60
> >   [c002f7933580] [c093aeec] .hw_breakpoint_handler+0x29c/0x2b0
> >   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
> >   [c002f79336d0] [c00f6abc] 
> > .__atomic_notifier_call_chain+0xbc/0x1c0
> >   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
> >   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
> >   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> > 
> 
> Well, that lockdep warning only says you should not be taking sleeping
> locks while holding rcu_read_lock(), which is true. It does not say that
> the context you're doing this in cannot sleep.
> 
> I'm not familiar enough with the PPC stuff to tell if the DIE_DABR_MATCH
> trap context is atomic or not, and this Changelog doesn't tell me.
> 
> Anybody?

ping

thanks,
jirka


Re: [PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-09-23 Thread Peter Zijlstra
On Wed, Sep 21, 2016 at 03:55:34PM +0200, Jiri Olsa wrote:
> The trinity syscall fuzzer triggered following WARN on powerpc:
>   WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
>   ...
>   NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
>   LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
>   Call Trace:
>   [c002f7933580] [c093aed8] .hw_breakpoint_handler+0x288/0x2b0 
> (unreliable)
>   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
>   [c002f79336d0] [c00f6abc] 
> .__atomic_notifier_call_chain+0xbc/0x1c0
>   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
>   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
>   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> 
> Followed by lockdep warning:
>   ===
>   [ INFO: suspicious RCU usage. ]
>   4.8.0-rc5+ #7 Tainted: GW
>   ---
>   ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side 
> critical section!
> 
>   other info that might help us debug this:
> 
>   rcu_scheduler_active = 1, debug_locks = 0
>   2 locks held by ls/2998:
>#0:  (rcu_read_lock){..}, at: [] 
> .__atomic_notifier_call_chain+0x0/0x1c0
>#1:  (rcu_read_lock){..}, at: [] 
> .hw_breakpoint_handler+0x0/0x2b0
> 
>   stack backtrace:
>   CPU: 9 PID: 2998 Comm: ls Tainted: GW   4.8.0-rc5+ #7
>   Call Trace:
>   [c002f7933150] [c094b1f8] .dump_stack+0xe0/0x14c (unreliable)
>   [c002f79331e0] [c013c468] .lockdep_rcu_suspicious+0x138/0x180
>   [c002f7933270] [c01005d8] .___might_sleep+0x278/0x2e0
>   [c002f7933300] [c0935584] .mutex_lock_nested+0x64/0x5a0
>   [c002f7933410] [c023084c] 
> .perf_event_ctx_lock_nested+0x16c/0x380
>   [c002f7933500] [c0230a80] .perf_event_disable+0x20/0x60
>   [c002f7933580] [c093aeec] .hw_breakpoint_handler+0x29c/0x2b0
>   [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
>   [c002f79336d0] [c00f6abc] 
> .__atomic_notifier_call_chain+0xbc/0x1c0
>   [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
>   [c002f7933820] [c001a74c] .do_break+0x4c/0x100
>   [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48
> 

Well, that lockdep warning only says you should not be taking sleeping
locks while holding rcu_read_lock(), which is true. It does not say that
the context you're doing this in cannot sleep.

I'm not familiar enough with the PPC stuff to tell if the DIE_DABR_MATCH
trap context is atomic or not, and this Changelog doesn't tell me.

Anybody?


[PATCH] perf powerpc: Don't call perf_event_disable from atomic context

2016-09-21 Thread Jiri Olsa
The trinity syscall fuzzer triggered the following WARN on powerpc:
  WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
  ...
  NIP [c093aedc] .hw_breakpoint_handler+0x28c/0x2b0
  LR [c093aed8] .hw_breakpoint_handler+0x288/0x2b0
  Call Trace:
  [c002f7933580] [c093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
  [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
  [c002f79336d0] [c00f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
  [c002f7933820] [c001a74c] .do_break+0x4c/0x100
  [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48

Followed by lockdep warning:
  ===
  [ INFO: suspicious RCU usage. ]
  4.8.0-rc5+ #7 Tainted: GW
  ---
  ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 0
  2 locks held by ls/2998:
   #0:  (rcu_read_lock){..}, at: [] .__atomic_notifier_call_chain+0x0/0x1c0
   #1:  (rcu_read_lock){..}, at: [] .hw_breakpoint_handler+0x0/0x2b0

  stack backtrace:
  CPU: 9 PID: 2998 Comm: ls Tainted: GW   4.8.0-rc5+ #7
  Call Trace:
  [c002f7933150] [c094b1f8] .dump_stack+0xe0/0x14c (unreliable)
  [c002f79331e0] [c013c468] .lockdep_rcu_suspicious+0x138/0x180
  [c002f7933270] [c01005d8] .___might_sleep+0x278/0x2e0
  [c002f7933300] [c0935584] .mutex_lock_nested+0x64/0x5a0
  [c002f7933410] [c023084c] .perf_event_ctx_lock_nested+0x16c/0x380
  [c002f7933500] [c0230a80] .perf_event_disable+0x20/0x60
  [c002f7933580] [c093aeec] .hw_breakpoint_handler+0x29c/0x2b0
  [c002f7933630] [c00f671c] .notifier_call_chain+0x7c/0xf0
  [c002f79336d0] [c00f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c002f7933780] [c00f6c40] .notify_die+0x70/0xd0
  [c002f7933820] [c001a74c] .do_break+0x4c/0x100
  [c002f7933920] [c00089fc] handle_dabr_fault+0x14/0x48

While the first WARN looks valid, the second one is triggered
by disabling the event via perf_event_disable() from atomic
context.

Use the event's pending_disable irq_work mechanism to disable
the event from atomic context instead.

Reported-by: Jan Stancek 
Signed-off-by: Jiri Olsa 
---
 arch/powerpc/kernel/hw_breakpoint.c |  2 +-
 include/linux/perf_event.h  |  1 +
 kernel/events/core.c| 11 ++++++++---
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index aec9a1b1d25b..4d3bcbbf626a 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -275,7 +275,7 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
if (!stepped) {
WARN(1, "Unable to handle hardware breakpoint. Breakpoint at "
"0x%lx will be disabled.", info->address);
-   perf_event_disable(bp);
+   perf_event_disable_inatomic(bp);
goto out;
}
/*
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2b6b43cc0dd5..cfc7f9f963fb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1234,6 +1234,7 @@ extern u64 perf_swevent_set_period(struct perf_event *event);
 extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
 extern void perf_event_disable_local(struct perf_event *event);
+extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3cfabdf7b942..ac08cf243dd7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1959,6 +1959,13 @@ void perf_event_disable(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_disable);
 
+void perf_event_disable_inatomic(struct perf_event *event)
+{
+   event->pending_kill = POLL_HUP;
+   event->pending_disable = 1;
+   irq_work_queue(&event->pending);
+}
+
 static void perf_set_shadow_time(struct perf_event *event,
 struct perf_event_context *ctx,
 u64 tstamp)
@@ -7017,9 +7024,7 @@ static int __perf_event_overflow(struct perf_event *event,
event->pending_kill = POLL_IN;
if (events && atomic_dec_and_test(&event->event_limit)) {
ret = 1;
-   event->pending_kill = POLL_HUP;
-   event->pending_disable = 1;
-   irq_work_queue(&event->pending);
+   perf_event_disable_inatomic(event);
}
 
event->overflow_handler(event, data, regs);
-- 
2.7.4