Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-21 Thread Davidlohr Bueso
On Sat, 2014-03-22 at 07:57 +0530, Srikar Dronamraju wrote: > > > So reverting and applying v3 3/4 and 4/4 patches works for me. > > > > Ok, I verified that the above ends up resulting in the same tree as > > the minimal patch I sent out, modulo (a) some comments and (b) an > > #ifdef CONFIG_SMP

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-21 Thread Srikar Dronamraju
> > So reverting and applying v3 3/4 and 4/4 patches works for me. > > Ok, I verified that the above ends up resulting in the same tree as > the minimal patch I sent out, modulo (a) some comments and (b) an > #ifdef CONFIG_SMP in futex_get_mm() that doesn't really matter. > > So I committed

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 9:55 PM, Srikar Dronamraju wrote: > > I reverted commits 99b60ce6 and b0c29f79. Then applied the patches in > the above url. The last one had a reject but it was pretty > straightforward to resolve it. After this, specjbb completes. > > So reverting and applying v3 3/4 and

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Srikar Dronamraju
> > Ok, so a big reason why this patch doesn't apply cleanly after reverting > is because *most* of the changes were done at the top of the file with > regards to documenting the ordering guarantees, the actual code changes > are quite minimal. > > I reverted commits 99b60ce6 (documentation) and

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 1:20 PM, Davidlohr Bueso wrote: > > I reverted commits 99b60ce6 (documentation) and b0c29f79 (the offending > commit), and then I cleanly applied the equivalent ones from v3 of the > series (which was already *tested* and ready for upstream until you > suggested looking

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Benjamin Herrenschmidt
On Thu, 2014-03-20 at 09:31 -0700, Davidlohr Bueso wrote: > hmmm looking at ppc spinlock code, it seems that it doesn't have ticket > spinlocks -- in fact Torsten Duwe has been trying to get them upstream > very recently. Since we rely on the counter for detecting waiters, this > might explain the

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 12:25 -0700, Linus Torvalds wrote: > On Thu, Mar 20, 2014 at 12:08 PM, Davidlohr Bueso wrote: > > > > Oh, it does. This atomics technique was tested at a customer's site and > > ready for upstream. > > I'm not worried about the *original* patch. I'm worried about the >

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 12:08 PM, Davidlohr Bueso wrote: > > Oh, it does. This atomics technique was tested at a customer's site and > ready for upstream. I'm not worried about the *original* patch. I'm worried about the incremental one. Your original patch never applied to my tree - I think it

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 11:36 -0700, Linus Torvalds wrote: > On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote: > > > > Comparing with the patch I sent earlier this morning, looks equivalent, > > and fwiw, passes my initial qemu bootup, which is the first way of > > detecting anything stupid

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote: > > Comparing with the patch I sent earlier this morning, looks equivalent, > and fwiw, passes my initial qemu bootup, which is the first way of > detecting anything stupid going on. > > So, Srikar, please try this patch out, as opposed to

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 11:03 AM, Davidlohr Bueso wrote: > > I still wonder about ppc and spinlocks (no ticketing!!) ... sure the > "waiters" patch might fix the problem just because we explicitly count > the members of the plist. And I guess if we cannot rely on all archs > having an equivalent

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 10:42 -0700, Linus Torvalds wrote: > On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote: > >> It strikes me that the "spin_is_locked()" test has no barriers wrt the > >> writing of the new futex value on the wake path. And the read barrier > >> obviously does nothing

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote: >> It strikes me that the "spin_is_locked()" test has no barriers wrt the >> writing of the new futex value on the wake path. And the read barrier >> obviously does nothing wrt the write either. Or am I missing >> something? So the write

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 09:41 -0700, Linus Torvalds wrote: > On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso wrote: > > > > This problem suggests that we missed a wakeup for a task that was adding > > itself to the queue in a wait path. And the only place that can happen > > is with the hb

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso wrote: > > This problem suggests that we missed a wakeup for a task that was adding > itself to the queue in a wait path. And the only place that can happen > is with the hb spinlock check for any pending waiters. Ok, so thinking about

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Wed, 2014-03-19 at 22:56 -0700, Davidlohr Bueso wrote: > On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote: > > > > Joy,.. let me look at that with ppc in mind. > > > > > > OK; so while pretty much all the comments from that patch are utter > > > nonsense (what was I thinking), I

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 15:38 +0530, Srikar Dronamraju wrote: > > This problem suggests that we missed a wakeup for a task that was adding > > itself to the queue in a wait path. And the only place that can happen > > is with the hb spinlock check for any pending waiters. Just in case we > > missed

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Srikar Dronamraju
> This problem suggests that we missed a wakeup for a task that was adding > itself to the queue in a wait path. And the only place that can happen > is with the hb spinlock check for any pending waiters. Just in case we > missed some assumption about checking the hash bucket spinlock as a way >

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Peter Zijlstra
On Thu, Mar 20, 2014 at 11:03:50AM +0530, Srikar Dronamraju wrote: > > > Joy,.. let me look at that with ppc in mind. > > > > OK; so while pretty much all the comments from that patch are utter > > nonsense (what was I thinking), I cannot actually find a real bug. > > > > But could you try the

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote: > > > Joy,.. let me look at that with ppc in mind. > > > > OK; so while pretty much all the comments from that patch are utter > > nonsense (what was I thinking), I cannot actually find a real bug. > > > > But could you try the below

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Srikar Dronamraju
> > Joy,.. let me look at that with ppc in mind. > > OK; so while pretty much all the comments from that patch are utter > nonsense (what was I thinking), I cannot actually find a real bug. > > But could you try the below which replaces a control dependency with a > full barrier. The control

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Davidlohr Bueso
On Wed, 2014-03-19 at 18:08 +0100, Peter Zijlstra wrote: > On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote: > > > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 > > > and confirmed that > > > reverting the commit solved the problem. > > > > Joy,.. let me

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Peter Zijlstra
On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote: > > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and > > confirmed that > > reverting the commit solved the problem. > > Joy,.. let me look at that with ppc in mind. OK; so while pretty much all the

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Linus Torvalds
On Wed, Mar 19, 2014 at 8:26 AM, Srikar Dronamraju wrote: > > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and > confirmed that > reverting the commit solved the problem. Ok. I'll give Peter and Davidlohr a few days to perhaps find something obvious, but I guess we'll

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Srikar Dronamraju
> > > > In fact I can reproduce this if the java_constraint is either node, socket, > > system. > > However I am not able to reproduce if java_constraint is set to core. > > What's any of that mean? > Using the constraint, one can specify how many jvm instances should participate in the

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Peter Zijlstra
On Wed, Mar 19, 2014 at 08:56:19PM +0530, Srikar Dronamraju wrote: > There are 332 tasks all stuck in futex_wait_queue_me(). > I am able to reproduce this consistently. > > In fact I can reproduce this if the java_constraint is either node, socket, > system. > However I am not able to reproduce

Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Srikar Dronamraju
Hi, When running specjbb on a power7 numa box, I am seeing java threads getting stuck in futex # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex 14808 pts/0 root java - 0 futex_wait_queue_me 14925 pts/0 root java - 0 futex_wait_queue_me # stack traces, I
