Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-07-01 Thread Paul E. McKenney
On Wed, Jul 01, 2015 at 06:16:40PM +0200, Peter Zijlstra wrote: > On Wed, Jul 01, 2015 at 08:56:55AM -0700, Paul E. McKenney wrote: > > On Wed, Jul 01, 2015 at 01:56:42PM +0200, Peter Zijlstra wrote: > > Odd that you have four of eight of the rcuos CPUs with higher consumption > > than the others.

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-07-01 Thread Peter Zijlstra
On Wed, Jul 01, 2015 at 08:56:55AM -0700, Paul E. McKenney wrote: > On Wed, Jul 01, 2015 at 01:56:42PM +0200, Peter Zijlstra wrote: > Odd that you have four of eight of the rcuos CPUs with higher consumption > than the others. I would expect three of eight. Are you by chance running > an

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-07-01 Thread Paul E. McKenney
On Wed, Jul 01, 2015 at 01:56:42PM +0200, Peter Zijlstra wrote: > On Tue, Jun 30, 2015 at 02:32:58PM -0700, Paul E. McKenney wrote: > > > > I had indeed forgotten that got farmed out to the kthread; on which, my > > > poor desktop seems to have spend ~140 minutes of its (most recent) > > >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-07-01 Thread Peter Zijlstra
On Tue, Jun 30, 2015 at 02:32:58PM -0700, Paul E. McKenney wrote: > > I had indeed forgotten that got farmed out to the kthread; on which, my > > poor desktop seems to have spend ~140 minutes of its (most recent) > > existence poking RCU things. > > > > 7 root 20 0 0 0

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-30 Thread Paul E. McKenney
On Mon, Jun 29, 2015 at 09:56:46AM +0200, Peter Zijlstra wrote: > On Fri, Jun 26, 2015 at 09:14:28AM -0700, Paul E. McKenney wrote: > > > To me it just makes more sense to have a single RCU state machine. With > > > expedited we'll push it as fast as we can, but no faster. > > > > Suppose that

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-29 Thread Peter Zijlstra
On Fri, Jun 26, 2015 at 09:14:28AM -0700, Paul E. McKenney wrote: > > To me it just makes more sense to have a single RCU state machine. With > > expedited we'll push it as fast as we can, but no faster. > > Suppose that someone invokes synchronize_sched_expedited(), but there > is no normal

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-26 Thread Paul E. McKenney
On Fri, Jun 26, 2015 at 02:32:07PM +0200, Peter Zijlstra wrote: > On Thu, Jun 25, 2015 at 07:51:46AM -0700, Paul E. McKenney wrote: > > > So please humour me and explain how all this is far more complicated ;-) > > > > Yeah, I do need to get RCU design/implementation documentation put together. >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-26 Thread Peter Zijlstra
On Thu, Jun 25, 2015 at 07:51:46AM -0700, Paul E. McKenney wrote: > > So please humour me and explain how all this is far more complicated ;-) > > Yeah, I do need to get RCU design/implementation documentation put together. > > In the meantime, RCU's normal grace-period machinery is designed to

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-25 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 07:24:16PM +0200, Oleg Nesterov wrote: > IOW. Suppose we add ->work_mutex into struct cpu_stopper. Btw, > I think we should move all per-cpu variables there... > > Now, > > lock_stop_cpus_works(cpumask) > { > for_each_cpu(cpu, cpumask) >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-25 Thread Paul E. McKenney
On Thu, Jun 25, 2015 at 04:20:11PM +0200, Peter Zijlstra wrote: > On Thu, Jun 25, 2015 at 06:47:55AM -0700, Paul E. McKenney wrote: > > On Thu, Jun 25, 2015 at 01:07:34PM +0200, Peter Zijlstra wrote: > > > I'm still somewhat confused by the whole strict order sequence vs this > > > non ordered

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-25 Thread Peter Zijlstra
On Thu, Jun 25, 2015 at 06:47:55AM -0700, Paul E. McKenney wrote: > On Thu, Jun 25, 2015 at 01:07:34PM +0200, Peter Zijlstra wrote: > > I'm still somewhat confused by the whole strict order sequence vs this > > non ordered 'polling' of global state. > > > > This funnel thing basically waits

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-25 Thread Paul E. McKenney
On Thu, Jun 25, 2015 at 01:07:34PM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 08:23:17PM -0700, Paul E. McKenney wrote: > > Here is what I had in mind, where you don't have any global trashing > > except when the ->expedited_sequence gets updated. Passes mild rcutorture > > testing. >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-25 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 08:23:17PM -0700, Paul E. McKenney wrote: > Here is what I had in mind, where you don't have any global trashing > except when the ->expedited_sequence gets updated. Passes mild rcutorture > testing. > /* > + * Each pass through the following loop works its way

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 07:58:30PM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 10:10:17AM -0700, Paul E. McKenney wrote: > > > The thing is, once you start bailing on this condition your 'queue' > > > drains very fast and this is around the same time sync_rcu() would've > > > released

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 07:28:18PM +0200, Peter Zijlstra wrote: > How about something like this, it replaced mutex and start/done ticket > thing with an MCS style lockless FIFO queue. > > I further uses the gpnum/completed thing to short circuit things if > we've waited long enough. Prettier

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 10:10:17AM -0700, Paul E. McKenney wrote: > > The thing is, once you start bailing on this condition your 'queue' > > drains very fast and this is around the same time sync_rcu() would've > > released the waiters too. > > In my experience, this sort of thing simply melts

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 07:28:18PM +0200, Peter Zijlstra wrote: > +unlock: > + /* MCS style queue 'unlock' */ > + next = READ_ONCE(entry.next); > + if (!next) { > + if (cmpxchg(&rsp->expedited_queue, &entry, NULL) == &entry) > + goto done; > + while (!(next =

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 10:20:18AM -0700, Paul E. McKenney wrote: > Except that I promised Ingo I would check for CPUs failing to schedule > quickly enough, which means that I must track them individually rather > than via a single counter... You can track individual CPUs timestamps by extending

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 10:10:17AM -0700, Paul E. McKenney wrote: > OK, I will give this a try. Of course, the counter needs to be > initialized to 1 rather than zero, and it needs to be atomically > decremented after all stop_one_cpu_nowait() invocations, otherwise you > can get an early wakeup

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 10:10:04AM -0700, Paul E. McKenney wrote: > On Wed, Jun 24, 2015 at 06:42:00PM +0200, Peter Zijlstra wrote: > > On Wed, Jun 24, 2015 at 09:09:04AM -0700, Paul E. McKenney wrote: [ . . . ] > > > It looks like I do need to use smp_call_function_single() and your > > >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 06:42:00PM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 09:09:04AM -0700, Paul E. McKenney wrote: > > Yes, good point, that would be a way of speeding the existing polling > > loop up in the case where the polling loop took longer than a normal > > grace period.

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 09:09:04AM -0700, Paul E. McKenney wrote: > Yes, good point, that would be a way of speeding the existing polling > loop up in the case where the polling loop took longer than a normal > grace period. Might also be a way to speed up the new "polling" regime, > but I am

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 05:40:10PM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 08:27:19AM -0700, Paul E. McKenney wrote: > > > The thing is, if we're stalled on a stop_one_cpu() call, the sync_rcu() > > > is equally stalled. The sync_rcu() cannot wait more efficient than we're > > >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 08:27:19AM -0700, Paul E. McKenney wrote: > > The thing is, if we're stalled on a stop_one_cpu() call, the sync_rcu() > > is equally stalled. The sync_rcu() cannot wait more efficient than we're > > already waiting either. > > Ah, but synchronize_rcu() doesn't force

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 08:01:29AM -0700, Paul E. McKenney wrote: > On Wed, Jun 24, 2015 at 10:32:57AM +0200, Peter Zijlstra wrote: > > On Tue, Jun 23, 2015 at 07:23:44PM -0700, Paul E. McKenney wrote: > > > And here is an untested patch that applies the gist of your approach, > > > the series of

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 05:01:51PM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 07:50:42AM -0700, Paul E. McKenney wrote: > > On Wed, Jun 24, 2015 at 09:35:03AM +0200, Peter Zijlstra wrote: > > > > I still don't see a problem here though; the stop_one_cpu() invocation > > > for the CPU

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 10:32:57AM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 07:23:44PM -0700, Paul E. McKenney wrote: > > And here is an untested patch that applies the gist of your approach, > > the series of stop_one_cpu() calls, but without undoing the rest. > > I forged your

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 07:50:42AM -0700, Paul E. McKenney wrote: > On Wed, Jun 24, 2015 at 09:35:03AM +0200, Peter Zijlstra wrote: > > I still don't see a problem here though; the stop_one_cpu() invocation > > for the CPU that's suffering its preemption latency will take longer, > > but so what?

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 09:35:03AM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > > I really think you're making that expedited nonsense far too accessible. > > > > This has nothing to do with accessibility and everything to do with > >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 03:43:37PM +0200, Ingo Molnar wrote: > > * Paul E. McKenney wrote: > > > On Wed, Jun 24, 2015 at 10:42:48AM +0200, Ingo Molnar wrote: > > > > > > * Peter Zijlstra wrote: > > > > > > > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > > > > > > > >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 11:31:02AM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 10:32:57AM +0200, Peter Zijlstra wrote: > > + s = atomic_long_read(&rsp->expedited_done); > > + if (ULONG_CMP_GE((ulong)s, (ulong)snap)) { > > + /* ensure test happens before caller kfree */ > > +

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Ingo Molnar
* Paul E. McKenney wrote: > On Wed, Jun 24, 2015 at 10:42:48AM +0200, Ingo Molnar wrote: > > > > * Peter Zijlstra wrote: > > > > > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > > > > > > > > > I really think you're making that expedited nonsense far too > > > > >

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Paul E. McKenney
On Wed, Jun 24, 2015 at 10:42:48AM +0200, Ingo Molnar wrote: > > * Peter Zijlstra wrote: > > > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > > > > > > > I really think you're making that expedited nonsense far too accessible. > > > > > > This has nothing to do with

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Wed, Jun 24, 2015 at 10:32:57AM +0200, Peter Zijlstra wrote: > + s = atomic_long_read(&rsp->expedited_done); > + if (ULONG_CMP_GE((ulong)s, (ulong)snap)) { > + /* ensure test happens before caller kfree */ > + smp_mb__before_atomic(); /* ^^^ */ FWIW isn't that

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > > > > > I really think you're making that expedited nonsense far too accessible. > > > > This has nothing to do with accessibility and everything to do with > > robustness. And with me not

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 07:23:44PM -0700, Paul E. McKenney wrote: > And here is an untested patch that applies the gist of your approach, > the series of stop_one_cpu() calls, but without undoing the rest. > I forged your Signed-off-by, please let me know if that doesn't work > for you. There are

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-24 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > I really think you're making that expedited nonsense far too accessible. > > This has nothing to do with accessibility and everything to do with > robustness. And with me not becoming the triage center for too many > non-RCU

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 12:05:06PM -0700, Paul E. McKenney wrote: > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > > On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote: > > > On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: > > > > Good, you don't

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: > On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote: > > On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: > > > Good, you don't need this because you can check for dynticks later. > > > You will need to

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: > > Good, you don't need this because you can check for dynticks later. > > You will need to check for offline CPUs. > > get_online_cpus() > for_each_online_cpus() {

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: > Good, you don't need this because you can check for dynticks later. > You will need to check for offline CPUs. get_online_cpus() for_each_online_cpus() { ... } is what the new code does. > > - /* > > -* Each pass through

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 03:08:26PM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 01:20:41PM +0200, Peter Zijlstra wrote: > > Paul, why does this use stop_machine anyway? I seemed to remember you > > sending resched IPIs around. It used to, but someone submitted a patch long ago that

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Oleg Nesterov
On 06/23, Oleg Nesterov wrote: > > > It would be nice to remove stop_cpus_mutex, it actually protects > stop_cpus_work... Then probably stop_two_cpus() can just use > stop_cpus(). We could simply make stop_cpus_mutex per-cpu too, > but this doesn't look nice. IOW. Suppose we add ->work_mutex into

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Oleg Nesterov
On 06/23, Peter Zijlstra wrote: > > void synchronize_sched_expedited(void) > { ... > - while (try_stop_cpus(cma ? cm : cpu_online_mask, > - synchronize_sched_expedited_cpu_stop, > - NULL) == -EAGAIN) { > - put_online_cpus(); > -

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Oleg Nesterov
On 06/23, Peter Zijlstra wrote: > > On Tue, Jun 23, 2015 at 12:21:52AM +0200, Oleg Nesterov wrote: > > > Suppose that stop_two_cpus(cpu1 => 0, cpu2 => 1) races with stop_machine(). > > > > - stop_machine takes the lock on CPU 0, adds the work > > and drops the lock > > > > -

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 12:55:48PM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 12:09:32PM +0200, Peter Zijlstra wrote: > > We can of course slap a percpu-rwsem in, but I wonder if there's > > anything smarter we can do here. > > Urgh, we cannot use percpu-rwsem here, because that would

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 01:20:41PM +0200, Peter Zijlstra wrote: > Paul, why does this use stop_machine anyway? I seemed to remember you > sending resched IPIs around. > > The rcu_sched_qs() thing would set passed_quiesce, which you can then > collect to gauge progress. > > Shooting IPIs around

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 12:55:48PM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 12:09:32PM +0200, Peter Zijlstra wrote: > > We can of course slap a percpu-rwsem in, but I wonder if there's > > anything smarter we can do here. > > Urgh, we cannot use percpu-rwsem here, because that would

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 12:09:32PM +0200, Peter Zijlstra wrote: > We can of course slap a percpu-rwsem in, but I wonder if there's > anything smarter we can do here. Urgh, we cannot use percpu-rwsem here, because that would require percpu_down_write_trylock(), and I'm not sure we can get around

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 12:21:52AM +0200, Oleg Nesterov wrote: > Suppose that stop_two_cpus(cpu1 => 0, cpu2 => 1) races with stop_machine(). > > - stop_machine takes the lock on CPU 0, adds the work > and drops the lock > > - cpu_stop_queue_work() queues both works

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Oleg Nesterov
On 06/23, Oleg Nesterov wrote: It would be nice to remove stop_cpus_mutex, it actually protects stop_cpus_work... Then probably stop_two_cpus() can just use stop_cpus(). We could simply make stop_cpus_mutex per-cpu too, but this doesn't look nice. IOW. Suppose we add ->work_mutex into struct

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 03:08:26PM +0200, Peter Zijlstra wrote: > On Tue, Jun 23, 2015 at 01:20:41PM +0200, Peter Zijlstra wrote: > > Paul, why does this use stop_machine anyway? I seemed to remember you > > sending resched IPIs around. It used to, but someone submitted a patch long ago that switched

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Peter Zijlstra
On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: Good, you don't need this because you can check for dynticks later. You will need to check for offline CPUs. get_online_cpus() for_each_online_cpus() { ... } is what the new code does. - /* -* Each pass through the

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote: On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: Good, you don't need this because you can check for dynticks later. You will need to check for offline CPUs. get_online_cpus() for_each_online_cpus() { ...

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote: On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: Good, you don't need this because you can check for dynticks later. You will need to check

Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

2015-06-23 Thread Paul E. McKenney
On Tue, Jun 23, 2015 at 12:05:06PM -0700, Paul E. McKenney wrote: On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote: On Tue, Jun 23, 2015 at 08:04:11PM +0200, Peter Zijlstra wrote: On Tue, Jun 23, 2015 at 10:30:38AM -0700, Paul E. McKenney wrote: Good, you don't need this
