Re: cpu stopper threads and load balancing leads to deadlock

2018-05-22 Thread Paul E. McKenney
On Thu, May 17, 2018 at 07:56:14AM -0700, Paul E. McKenney wrote:
> On Thu, May 17, 2018 at 04:23:22PM +0200, Peter Zijlstra wrote:
> > On Thu, May 17, 2018 at 07:03:45AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
> > > I have not queued

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-17 Thread Paul E. McKenney
On Thu, May 17, 2018 at 04:23:22PM +0200, Peter Zijlstra wrote:
> On Thu, May 17, 2018 at 07:03:45AM -0700, Paul E. McKenney wrote:
> > On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
> > I have not queued it, but given Peter's Signed-off-by and your Tested-by
> > I would be happy

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-17 Thread Peter Zijlstra
On Thu, May 17, 2018 at 07:03:45AM -0700, Paul E. McKenney wrote:
> On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
> I have not queued it, but given Peter's Signed-off-by and your Tested-by
> I would be happy to do so.
And a Changelog of course :-)
---
From: Peter Zijlstra

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-17 Thread Mike Galbraith
On Thu, 2018-05-17 at 07:03 -0700, Paul E. McKenney wrote:
> On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
>
> > > Something like so perhaps? Mike, can you play around with that? Could
> > > burn your granny and eat your cookies.
> >
> > Did this get queued anywhere?
>
> I

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-17 Thread Paul E. McKenney
On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
> On Thu, 2018-05-03 at 18:45 +0200, Peter Zijlstra wrote:
> > On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote:
> > > On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> > > > On Thu, May 03, 2018 at

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-14 Thread Mike Galbraith
On Thu, 2018-05-03 at 18:45 +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote:
> > On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> > > > On Thu, 2018-05-03 at 15:56

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 18:45 +0200, Peter Zijlstra wrote:
>
> Something like so perhaps? Mike, can you play around with that? Could
> burn your granny and eat your cookies.
That worked, and nothing entertaining has happened.. yet. Hm, I could use this kernel to update my backup drive, if there's

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Paul E. McKenney
On Thu, May 03, 2018 at 07:54:56PM +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 10:18:50AM -0700, Paul E. McKenney wrote:
> > > + if (per_cpu(rcu_cpu_started, cpu))
> >
> > I would log a non-splat dmesg the first time this happened, just for my
> > future sanity, but otherwise looks

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 10:18:50AM -0700, Paul E. McKenney wrote:
> > + if (per_cpu(rcu_cpu_started, cpu))
>
> I would log a non-splat dmesg the first time this happened, just for my
> future sanity, but otherwise looks fine. I am a bit concerned about
> calls to rcu_cpu_starting() getting

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Paul E. McKenney
On Thu, May 03, 2018 at 06:45:08PM +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote:
> > On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> > > > On Thu, 2018-05-03

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote:
> On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> > On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> > > On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote:
> > > > On Thu, May 03, 2018 at

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Paul E. McKenney
On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> > On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote:
> > > On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
> > >
> > > > Dang. With $subject

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 07:39:41AM -0700, Paul E. McKenney wrote:
> Huh.
>
> No, RCU_NONIDLE() only works for idle, not for offline.
Oh bummer..
> Maybe... Let me take a look. There must be some way to mark a
> specific lock acquisition and release as being lockdep-invisible...
But I suspect

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote:
> > On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
> >
> > > Dang. With $subject fix applied as well..
> >
> > That's a NO then... :-(
>
> Could say who

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Paul E. McKenney
On Thu, May 03, 2018 at 03:56:17PM +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
>
> > Dang. With $subject fix applied as well..
>
> That's a NO then... :-(
>
> > [ 151.103732] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > [ 151.104908]

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
>
> > Dang. With $subject fix applied as well..
>
> That's a NO then... :-(
Could say who cares about oddball offline wakeup stat.

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
> Dang. With $subject fix applied as well..
That's a NO then... :-(
> [ 151.103732] smpboot: Booting Node 0 Processor 2 APIC 0x4
> [ 151.104908] =
> [ 151.104909] WARNING: suspicious RCU usage
> [

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 14:49 +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 02:40:21PM +0200, Mike Galbraith wrote:
> > On Thu, 2018-05-03 at 14:28 +0200, Peter Zijlstra wrote:
> > >
> > > Hurm.. I don't see how this is 'new'. We moved the wakeup out from under
> > > stopper lock, but that

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 02:40:21PM +0200, Mike Galbraith wrote:
> On Thu, 2018-05-03 at 14:28 +0200, Peter Zijlstra wrote:
> >
> > Hurm.. I don't see how this is 'new'. We moved the wakeup out from under
> > stopper lock, but that should not affect the RCU state.
>
> No, not new, just an

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Thu, 2018-05-03 at 14:28 +0200, Peter Zijlstra wrote:
>
> Hurm.. I don't see how this is 'new'. We moved the wakeup out from under
> stopper lock, but that should not affect the RCU state.
No, not new, just an additional woes from same spot.
-Mike

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Peter Zijlstra
On Thu, May 03, 2018 at 02:12:22PM +0200, Mike Galbraith wrote:
> [ 124.216939] =
> [ 124.216939] WARNING: suspicious RCU usage
> [ 124.216941] 4.17.0.g66d489e-tip-default #82 Tainted: GE
> [ 124.216941] -
> [ 124.216943]

Re: cpu stopper threads and load balancing leads to deadlock

2018-05-03 Thread Mike Galbraith
On Tue, 2018-04-24 at 14:33 +0100, Matt Fleming wrote:
> On Fri, 20 Apr, at 11:50:05AM, Peter Zijlstra wrote:
> > On Tue, Apr 17, 2018 at 03:21:19PM +0100, Matt Fleming wrote:
> > > Hi guys,
> > >
> > > We've seen a bug in one of our SLE kernels where the cpu stopper
> > > thread ("migration/15")

Re: cpu stopper threads and load balancing leads to deadlock

2018-04-24 Thread Matt Fleming
On Fri, 20 Apr, at 11:50:05AM, Peter Zijlstra wrote:
> On Tue, Apr 17, 2018 at 03:21:19PM +0100, Matt Fleming wrote:
> > Hi guys,
> >
> > We've seen a bug in one of our SLE kernels where the cpu stopper
> > thread ("migration/15") is entering idle balance. This then triggers
> > active load

Re: cpu stopper threads and load balancing leads to deadlock

2018-04-20 Thread Peter Zijlstra
On Tue, Apr 17, 2018 at 03:21:19PM +0100, Matt Fleming wrote:
> Hi guys,
>
> We've seen a bug in one of our SLE kernels where the cpu stopper
> thread ("migration/15") is entering idle balance. This then triggers
> active load balance.
>
> At the same time, a task on another CPU triggers a page

Re: cpu stopper threads and load balancing leads to deadlock

2018-04-18 Thread Mike Galbraith
On Wed, 2018-04-18 at 07:47 +0200, Mike Galbraith wrote:
> On Tue, 2018-04-17 at 15:21 +0100, Matt Fleming wrote:
> > Hi guys,
> >
> > We've seen a bug in one of our SLE kernels where the cpu stopper
> > thread ("migration/15") is entering idle balance. This then triggers
> > active load balance.

Re: cpu stopper threads and load balancing leads to deadlock

2018-04-17 Thread Mike Galbraith
On Tue, 2018-04-17 at 15:21 +0100, Matt Fleming wrote:
> Hi guys,
>
> We've seen a bug in one of our SLE kernels where the cpu stopper
> thread ("migration/15") is entering idle balance. This then triggers
> active load balance.
>
> At the same time, a task on another CPU triggers a page fault

cpu stopper threads and load balancing leads to deadlock

2018-04-17 Thread Matt Fleming
Hi guys, We've seen a bug in one of our SLE kernels where the cpu stopper thread ("migration/15") is entering idle balance. This then triggers active load balance. At the same time, a task on another CPU triggers a page fault and NUMA balancing kicks in to try and migrate the task closer to the