Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Sat, 2015-03-07 at 13:54 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 12:31 PM, Jason Low wrote: > > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > Cc: Ming Lei > > Cc: Davidlohr Bueso > > Signed-off-by: Jason Low > > Reported-and-tested-by: Ming Lei Thanks! > > static

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 20:44 -0800, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 20:31 -0800, Jason Low wrote: > > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > > > Just in case, here's the updated patch which addresses Linus's comments > > and with a changelog. > > > > Note: The

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Sat, Mar 7, 2015 at 12:31 PM, Jason Low wrote: > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > Just in case, here's the updated patch which addresses Linus's comments > and with a changelog. > > Note: The changelog says that it fixes (locking/rwsem: Avoid deceiving > lock spinners),

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 20:31 -0800, Jason Low wrote: > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > Just in case, here's the updated patch which addresses Linus's comments > and with a changelog. > > Note: The changelog says that it fixes (locking/rwsem: Avoid deceiving > lock

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: Just in case, here's the updated patch which addresses Linus's comments and with a changelog. Note: The changelog says that it fixes (locking/rwsem: Avoid deceiving lock spinners), though I still haven't seen full confirmation that it addresses

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Sat, 2015-03-07 at 11:39 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 11:17 AM, Jason Low wrote: > > On Sat, 2015-03-07 at 11:08 +0800, Ming Lei wrote: > >> On Sat, Mar 7, 2015 at 10:56 AM, Jason Low wrote: > >> > On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: > >> >> On Sat, Mar 7,

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Sat, 2015-03-07 at 11:19 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 11:10 AM, Davidlohr Bueso wrote: > > On Sat, 2015-03-07 at 10:55 +0800, Ming Lei wrote: > >> On Sat, Mar 7, 2015 at 10:29 AM, Davidlohr Bueso wrote: > >> > On Fri, 2015-03-06 at 18:26 -0800, Davidlohr Bueso wrote: > >>

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Sat, Mar 7, 2015 at 11:17 AM, Jason Low wrote: > On Sat, 2015-03-07 at 11:08 +0800, Ming Lei wrote: >> On Sat, Mar 7, 2015 at 10:56 AM, Jason Low wrote: >> > On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: >> >> On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso >> >> wrote: >> >> > On

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Sat, Mar 7, 2015 at 11:10 AM, Davidlohr Bueso wrote: > On Sat, 2015-03-07 at 10:55 +0800, Ming Lei wrote: >> On Sat, Mar 7, 2015 at 10:29 AM, Davidlohr Bueso wrote: >> > On Fri, 2015-03-06 at 18:26 -0800, Davidlohr Bueso wrote: >> >> That's not what this is about. New lock _owners_ need to

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Sat, 2015-03-07 at 11:08 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 10:56 AM, Jason Low wrote: > > On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: > >> On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso wrote: > >> > On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: > >> >> On Fri, 06

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Sat, 2015-03-07 at 11:08 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 10:56 AM, Jason Low wrote: > > On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: > >> On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso wrote: > >> > On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: > >> >> On Fri, 06

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Sat, 2015-03-07 at 10:55 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 10:29 AM, Davidlohr Bueso wrote: > > On Fri, 2015-03-06 at 18:26 -0800, Davidlohr Bueso wrote: > >> That's not what this is about. New lock _owners_ need to worry about > >

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Sat, Mar 7, 2015 at 10:56 AM, Jason Low wrote: > On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: >> On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso wrote: >> > On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: >> >> On Fri, 06 Mar 2015 14:15:37 -0800 >> >> Davidlohr Bueso wrote: >> >>

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso wrote: > > On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: > >> On Fri, 06 Mar 2015 14:15:37 -0800 > >> Davidlohr Bueso wrote: > >> > >> > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote:

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Sat, Mar 7, 2015 at 10:29 AM, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 18:26 -0800, Davidlohr Bueso wrote: >> That's not what this is about. New lock _owners_ need to worry about > ^^^ make that "need not" Sorry, could you explain a

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 18:26 -0800, Davidlohr Bueso wrote: > That's not what this is about. New lock _owners_ need to worry about ^^^ make that "need not"

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Sat, 2015-03-07 at 10:10 +0800, Ming Lei wrote: > On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso wrote: > > On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: > >> On Fri, 06 Mar 2015 14:15:37 -0800 > >> Davidlohr Bueso wrote: > >> > >> > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote:

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Sat, Mar 7, 2015 at 10:07 AM, Davidlohr Bueso wrote: > On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: >> On Fri, 06 Mar 2015 14:15:37 -0800 >> Davidlohr Bueso wrote: >> >> > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: >> > > In owner_running() there are 2 conditions that would

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Sat, 2015-03-07 at 09:55 +0800, Ming Lei wrote: > On Fri, 06 Mar 2015 14:15:37 -0800 > Davidlohr Bueso wrote: > > > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > > In owner_running() there are 2 conditions that would make it return > > > false: if the owner changed or if the owner

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 14:15 -0800, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > In owner_running() there are 2 conditions that would make it return > > false: if the owner changed or if the owner is not running. However, > > that patch continues spinning if

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Ming Lei
On Fri, 06 Mar 2015 14:15:37 -0800 Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > > In owner_running() there are 2 conditions that would make it return > > false: if the owner changed or if the owner is not running. However, > > that patch continues spinning if

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 13:24 -0800, Linus Torvalds wrote: > On Fri, Mar 6, 2015 at 1:12 PM, Jason Low wrote: > > > > + while (true) { > > + if (sem->owner != owner) > > + break; > > That looks *really* odd. > > Why is this not > > while

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote: > In owner_running() there are 2 conditions that would make it return > false: if the owner changed or if the owner is not running. However, > that patch continues spinning if there is a "new owner" but it does not > take into account that we may

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 01:02 PM, Sasha Levin wrote: > I can go redo that again if you suspect that that commit is not the cause. I took a closer look at the logs, and I'm seeing hangs that begin this way as well: [ 2298.020237] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [trinity-c19:839] [

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 11:55 AM, Davidlohr Bueso wrote: >> >> - look up the vma in the vma lookup cache > > But you'd still need mmap_sem there to at least get the VMA's first > value. So my theory was that the vma cache is such a trivial data structure that we could trivially make it be

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 1:12 PM, Jason Low wrote: > > + while (true) { > + if (sem->owner != owner) > + break; That looks *really* odd. Why is this not while (sem->owner == owner) { Also, this "barrier()" now lost the comment: > +
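The loop shape suggested in the message above can be sketched in plain userspace C. This is only an illustrative model, not the patch under discussion: `struct task`, `struct sem`, `enum spin_result`, and the `budget` parameter are hypothetical stand-ins, and real kernel code would additionally need `barrier()`, RCU protection, and a `need_resched()` check. The point is that putting the owner comparison in the loop condition leaves two distinguishable exits: the owner changed, or the owner stopped running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task_struct and rw_semaphore. */
struct task { bool on_cpu; };
struct sem  { struct task *owner; };

enum spin_result {
    SPIN_OWNER_CHANGED,      /* loop condition failed: lock released or new owner */
    SPIN_OWNER_NOT_RUNNING,  /* owner was scheduled out: stop spinning */
    SPIN_BUDGET_EXHAUSTED    /* model-only exit so the single-threaded demo terminates */
};

/*
 * Spin while 'owner' still holds 'sem'. The comparison lives in the loop
 * condition, i.e. "while (sem->owner == owner)" rather than
 * "while (true) { if (sem->owner != owner) break; ... }".
 */
static enum spin_result spin_on_owner(const struct sem *sem,
                                      const struct task *owner,
                                      unsigned long budget)
{
    while (sem->owner == owner) {
        /* A real implementation needs a compiler barrier here so that
         * sem->owner and owner->on_cpu are re-read on every iteration. */
        if (!owner->on_cpu)
            return SPIN_OWNER_NOT_RUNNING;
        if (budget-- == 0)
            return SPIN_BUDGET_EXHAUSTED;
    }
    return SPIN_OWNER_CHANGED;
}
```

A caller can then treat `SPIN_OWNER_CHANGED` as "keep trying to take the lock" and `SPIN_OWNER_NOT_RUNNING` as "give up spinning and block", which are the two cases the thread is trying to tell apart.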

Re: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 11:29 -0800, Jason Low wrote: > Hi Linus, > > Agreed, this is an issue we need to address, though we're just trying to > figure out if the change to rwsem_can_spin_on_owner() in "commit: > 37e9562453b" is really the one that's causing the issue. > > For example, it looks

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:55 -0800, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote: > > > IOW, I wonder if we could special-case the common non-IO > > fault-handling path something along the lines of: > > > > - look up the vma in the vma lookup cache > > But

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote: > IOW, I wonder if we could special-case the common non-IO > fault-handling path something along the lines of: > > - look up the vma in the vma lookup cache But you'd still need mmap_sem there to at least get the VMA's first value. > -

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote: > Basically, to me, the whole "if a lock is so contended that we need to > play locking games, then we should look at why we *use* the lock, > rather than at the lock itself" is a religion. Oh absolutely, I'm only mentioning the locking

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 11:20 AM, Davidlohr Bueso wrote: > > I obviously agree with all those points, however fyi most of the testing > on rwsems I do includes scaling address space ops stressing the > mmap_sem, which is a real world concern. So while it does include > microbenchmarks, it is not

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote: > On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote: > > > > Right, the can_spin_on_owner() was originally added to the mutex > > spinning code for optimization purposes, particularly so that we can > > avoid adding the spinner to the OSQ

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote: > On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote: > > > > Right, the can_spin_on_owner() was originally added to the mutex > > spinning code for optimization purposes, particularly so that we can > > avoid adding the spinner to the OSQ

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote: > > Right, the can_spin_on_owner() was originally added to the mutex > spinning code for optimization purposes, particularly so that we can > avoid adding the spinner to the OSQ only to find that it doesn't need to > spin. This function needing to

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 09:19 -0800, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote: > > * Sasha Levin wrote: > > > > > I've bisected this to "locking/rwsem: Check for active lock before > > > bailing on spinning". Relevant parties Cc'ed. > > > > That would be: > >

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 12:19 PM, Davidlohr Bueso wrote: >> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c >> > index 1c0d11e8ce34..e4ad019e23f5 100644 >> > --- a/kernel/locking/rwsem-xadd.c >> > +++ b/kernel/locking/rwsem-xadd.c >> > @@ -298,23 +298,30 @@ static inline bool >> >

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote: > * Sasha Levin wrote: > > > I've bisected this to "locking/rwsem: Check for active lock before bailing > > on spinning". Relevant parties Cc'ed. > > That would be: > > 1a99367023f6 ("locking/rwsem: Check for active lock before bailing

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 09:45 AM, Sasha Levin wrote: > On 03/06/2015 09:34 AM, Rafael David Tinoco wrote: >> Are you sure about this ? I have a core dump locked on the same place >> (state machine for powering cpu down for the task swap) from a 3.13 (+ >> upstream patches) and this commit wasn't backported

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 09:34 AM, Rafael David Tinoco wrote: > Are you sure about this ? I have a core dump locked on the same place > (state machine for powering cpu down for the task swap) from a 3.13 (+ > upstream patches) and this commit wasn't backported yet. bisect took me to that same commit twice,

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Rafael David Tinoco
Are you sure about this ? I have a core dump locked on the same place (state machine for powering cpu down for the task swap) from a 3.13 (+ upstream patches) and this commit wasn't backported yet. -> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT); In my case, curstate is WAY

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Ingo Molnar
* Sasha Levin wrote: > I've bisected this to "locking/rwsem: Check for active lock before bailing on > spinning". Relevant parties Cc'ed. That would be: 1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning") attached below. Thanks, Ingo

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
I've bisected this to "locking/rwsem: Check for active lock before bailing on spinning". Relevant parties Cc'ed. Thanks, Sasha On 03/02/2015 02:45 AM, Sasha Levin wrote: > Hi all, > > I'm seeing the following lockup pretty often while fuzzing with trinity: > > [ 880.960250] NMI watchdog:

Re: sched: softlockups in multi_cpu_stop

2015-03-03 Thread Rafael David Tinoco
Some more info: multi_cpu_stop seems to be spinning inside do { ... } while (curstate != MULTI_STOP_EXIT); So, multi_cpu_stop is an offload ([migration]) for: migrate_swap -> stop_two_cpus -> wait_for_completion() sequence... for cross-migrating 2 tasks. Based on task structs from callers
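The polling loop described in the message above can be modelled with a small single-threaded simulation. This is a toy sketch under stated assumptions, not the kernel's code: the state names follow the `MULTI_STOP_*` naming quoted in the thread, but `struct cpu`, its `stalled` flag, `poll_once()`, and `run_rounds()` are hypothetical, and the real implementation uses atomics and one stopper thread per CPU. It shows why a participant that never gets to acknowledge a state leaves every other participant spinning short of `MULTI_STOP_EXIT`.

```c
#include <stdbool.h>

enum multi_stop_state {
    MULTI_STOP_NONE, MULTI_STOP_PREPARE, MULTI_STOP_DISABLE_IRQ,
    MULTI_STOP_RUN, MULTI_STOP_EXIT
};

struct multi_stop_data {
    enum multi_stop_state state; /* shared state that every participant polls */
    int num_threads;             /* number of participants */
    int thread_ack;              /* participants that still have to ack 'state' */
};

/* Hypothetical per-CPU view; 'stalled' models a CPU that never runs its loop. */
struct cpu { enum multi_stop_state curstate; bool stalled; };

static void set_state(struct multi_stop_data *d, enum multi_stop_state s)
{
    d->thread_ack = d->num_threads;
    d->state = s;
}

/* The last participant to ack moves everyone on to the next state. */
static void ack_state(struct multi_stop_data *d)
{
    if (--d->thread_ack == 0 && d->state != MULTI_STOP_EXIT)
        set_state(d, (enum multi_stop_state)(d->state + 1));
}

/* One iteration of: do { ... } while (curstate != MULTI_STOP_EXIT); */
static void poll_once(struct cpu *c, struct multi_stop_data *d)
{
    if (c->stalled || c->curstate == MULTI_STOP_EXIT)
        return;
    if (d->state != c->curstate) {
        c->curstate = d->state;
        ack_state(d);
    }
}

/* Round-robin all CPUs; returns true once every CPU has reached EXIT. */
static bool run_rounds(struct cpu *cpus, int n, int max_rounds)
{
    struct multi_stop_data d = { MULTI_STOP_NONE, n, 0 };

    set_state(&d, MULTI_STOP_PREPARE);
    for (int r = 0; r < max_rounds; r++) {
        bool all_done = true;
        for (int i = 0; i < n; i++) {
            poll_once(&cpus[i], &d);
            if (cpus[i].curstate != MULTI_STOP_EXIT)
                all_done = false;
        }
        if (all_done)
            return true;
    }
    return false;
}
```

With two healthy CPUs the model reaches `MULTI_STOP_EXIT` in a few rounds. With one stalled CPU the other never gets past its first acknowledged state, however many rounds it polls, which is the shape of the spin the watchdog reports.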

sched: softlockups in multi_cpu_stop

2015-03-01 Thread Sasha Levin
Hi all, I'm seeing the following lockup pretty often while fuzzing with trinity: [ 880.960250] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 447s! [migration/1:14] [ 880.960700] Modules linked in: [ 880.960700] irq event stamp: 380954 [ 880.960700] hardirqs last enabled at (380953):
