On 03/06/2015 01:02 PM, Sasha Levin wrote:
> I can go redo that again if you suspect that that commit is not the cause.
I took a closer look at the logs, and I'm seeing hangs that begin this way
as well:
[ 2298.020237] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [trinity-c19:839]
On Fri, Mar 6, 2015 at 11:55 AM, Davidlohr Bueso wrote:
>>
>> - look up the vma in the vma lookup cache
>
> But you'd still need mmap_sem there to at least get the VMA's first
> value.
So my theory was that the vma cache is such a trivial data structure
that we could trivially make it be
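For context, a minimal sketch of the sort of lockless vmacache lookup being
floated here, assuming a per-mm sequence counter (mm_seq) and a lookup helper
(vma_cache_find) — both hypothetical names, not the kernel's actual vmacache
API — plus RCU-deferred VMA freeing for the dereference to be safe:

/*
 * Hypothetical sketch only: look up a VMA in the lookup cache without
 * taking mmap_sem, validating the snapshot against a per-mm seqcount.
 * mm->mm_seq and vma_cache_find() are made-up names for illustration.
 */
static struct vm_area_struct *speculative_vma_lookup(struct mm_struct *mm,
						     unsigned long addr)
{
	struct vm_area_struct *vma;
	unsigned int seq;

	do {
		seq = read_seqcount_begin(&mm->mm_seq);	/* hypothetical field */
		vma = vma_cache_find(mm, addr);		/* hypothetical helper */
		if (!vma || addr < vma->vm_start || addr >= vma->vm_end)
			return NULL;	/* miss: caller falls back to the mmap_sem path */
	} while (read_seqcount_retry(&mm->mm_seq, seq));

	return vma;	/* snapshot was consistent across the lookup */
}

A miss (or a torn read) simply falls back to the existing locked fault path,
which is what makes the fast path cheap to attempt.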
On Fri, 2015-03-06 at 11:55 -0800, Davidlohr Bueso wrote:
> On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote:
>
> > IOW, I wonder if we could special-case the common non-IO
> > fault-handling path something along the lines of:
> >
> > - look up the vma in the vma lookup cache
>
> But
On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote:
> IOW, I wonder if we could special-case the common non-IO
> fault-handling path something along the lines of:
>
> - look up the vma in the vma lookup cache
But you'd still need mmap_sem there to at least get the VMA's first
value.
> -
On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote:
> Basically, to me, the whole "if a lock is so contended that we need to
> play locking games, then we should look at why we *use* the lock,
> rather than at the lock itself" is a religion.
Oh absolutely, I'm only mentioning the locking
On Fri, Mar 6, 2015 at 11:20 AM, Davidlohr Bueso wrote:
>
> I obviously agree with all those points, however fyi most of the testing
> on rwsems I do includes scaling address space ops stressing the
> mmap_sem, which is a real world concern. So while it does include
> microbenchmarks, it is not
On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote:
> On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote:
> >
> > Right, the can_spin_on_owner() was originally added to the mutex
> > spinning code for optimization purposes, particularly so that we can
> > avoid adding the spinner to the OSQ
On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote:
>
> Right, the can_spin_on_owner() was originally added to the mutex
> spinning code for optimization purposes, particularly so that we can
> avoid adding the spinner to the OSQ only to find that it doesn't need to
> spin. This function needing to
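For readers following the thread, the check under discussion looks roughly
like the sketch below — simplified from memory of the mutex version, not
exact source. The idea is to peek at the owner's on_cpu flag under RCU
before paying the cost of joining the OSQ (optimistic spin queue):

/*
 * Simplified sketch of the mutex-style can_spin_on_owner() idea: only
 * join the OSQ if the current owner is actually running on a CPU,
 * since spinning on a sleeping owner is wasted work.
 */
static inline bool can_spin_on_owner(struct mutex *lock)
{
	struct task_struct *owner;
	bool ret = true;

	if (need_resched())
		return false;	/* we should yield, not spin */

	rcu_read_lock();
	owner = READ_ONCE(lock->owner);
	if (owner)
		ret = owner->on_cpu;	/* worth spinning only if owner runs */
	rcu_read_unlock();

	/*
	 * If lock->owner is NULL the lock may be about to be released,
	 * so queueing into the OSQ is likely to pay off.
	 */
	return ret;
}

The pre-check matters because queueing into the OSQ is itself a shared-memory
handshake, so a cheap racy read first can avoid that cost entirely.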
On Fri, 2015-03-06 at 09:19 -0800, Davidlohr Bueso wrote:
> On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote:
> > * Sasha Levin wrote:
> >
> > > I've bisected this to "locking/rwsem: Check for active lock before
> > > bailing on spinning". Relevant parties Cc'ed.
> >
> > That would be:
On 03/06/2015 12:19 PM, Davidlohr Bueso wrote:
>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> > index 1c0d11e8ce34..e4ad019e23f5 100644
>> > --- a/kernel/locking/rwsem-xadd.c
>> > +++ b/kernel/locking/rwsem-xadd.c
>> > @@ -298,23 +298,30 @@ static inline bool
On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote:
> * Sasha Levin wrote:
>
> > I've bisected this to "locking/rwsem: Check for active lock before bailing
> > on spinning". Relevant parties Cc'ed.
>
> That would be:
>
> 1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning")
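For reference, what that commit did, as a paraphrased sketch from memory
rather than the verbatim diff: rwsem_can_spin_on_owner() was changed to
distinguish an owner-less rwsem in a transient state from one actively held
by readers, bailing on spinning only in the latter case. Roughly:

static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
{
	struct task_struct *owner;
	bool ret = true;

	if (need_resched())
		return false;

	rcu_read_lock();
	owner = READ_ONCE(sem->owner);
	if (!owner) {
		long count = READ_ONCE(sem->count);
		/*
		 * No owner, but the lock is active: reader(s) probably
		 * hold it, and there is no owner to spin on, so bail.
		 */
		if (count & RWSEM_ACTIVE_MASK)
			ret = false;
		goto done;
	}

	ret = owner->on_cpu;	/* spin only while the owner is running */
done:
	rcu_read_unlock();
	return ret;
}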
On 03/06/2015 09:45 AM, Sasha Levin wrote:
> On 03/06/2015 09:34 AM, Rafael David Tinoco wrote:
>> Are you sure about this? I have a core dump locked in the same place
>> (state machine for powering cpu down for the task swap) from a 3.13 (+
>> upstream patches) and this commit wasn't backported yet.
On 03/06/2015 09:34 AM, Rafael David Tinoco wrote:
> Are you sure about this? I have a core dump locked in the same place
> (state machine for powering cpu down for the task swap) from a 3.13 (+
> upstream patches) and this commit wasn't backported yet.
bisect took me to that same commit twice, and
Are you sure about this? I have a core dump locked in the same place
(state machine for powering cpu down for the task swap) from a 3.13 (+
upstream patches) and this commit wasn't backported yet.
-> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT);
In my case, curstate is WAY
* Sasha Levin wrote:
> I've bisected this to "locking/rwsem: Check for active lock before bailing on
> spinning". Relevant parties Cc'ed.
That would be:
1a99367023f6 ("locking/rwsem: Check for active lock before bailing on
spinning")
attached below.
Thanks,
Ingo
I've bisected this to "locking/rwsem: Check for active lock before bailing on
spinning". Relevant parties Cc'ed.
Thanks,
Sasha
On 03/02/2015 02:45 AM, Sasha Levin wrote:
> Hi all,
>
> I'm seeing the following lockup pretty often while fuzzing with trinity:
>
> [ 880.960250] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 447s! [migration/1:14]
Some more info:
multi_cpu_stop seems to be spinning inside do { ... } while (curstate
!= MULTI_STOP_EXIT);
So, multi_cpu_stop is an offload ([migration]) for: migrate_swap ->
stop_two_cpus -> wait_for_completion() sequence... for cross-migrating
2 tasks.
Based on task structs from the callers' stacks:
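For reference, the loop being described is multi_cpu_stop() in
kernel/stop_machine.c. An abridged sketch from memory (the real function
also handles MULTI_STOP_PREPARE and the per-CPU active mask):

/*
 * Abridged sketch of multi_cpu_stop(): each participating CPU steps
 * through a shared state machine and spins until every CPU has acked
 * the final MULTI_STOP_EXIT state. A CPU that never sees the state
 * advance spins here forever, matching the soft lockup signature above.
 */
static int multi_cpu_stop(void *data)
{
	struct multi_stop_data *msdata = data;
	enum multi_stop_state curstate = MULTI_STOP_NONE;
	int err = 0;

	do {
		cpu_relax();	/* spin, re-reading the shared state */
		if (msdata->state != curstate) {
			curstate = msdata->state;
			switch (curstate) {
			case MULTI_STOP_DISABLE_IRQ:
				local_irq_disable();
				break;
			case MULTI_STOP_RUN:
				err = msdata->fn(msdata->data);	/* e.g. the task swap */
				break;
			default:
				break;
			}
			ack_state(msdata);	/* last CPU to ack advances the state */
		}
	} while (curstate != MULTI_STOP_EXIT);

	return err;
}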
Hi all,
I'm seeing the following lockup pretty often while fuzzing with trinity:
[ 880.960250] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 447s!
[migration/1:14]
[ 880.960700] Modules linked in:
[ 880.960700] irq event stamp: 380954
[ 880.960700] hardirqs last enabled at (380953):