Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > > the original worry we had about loads happening before the SC in lock.
> >
> > However I think isync fixes *that* :-) The problem with isync is, as
> > you said, that it's not a -memory- barrier per se; it's an execution
> > barrier / context-synchronizing instruction. The combination
> > stwcx. + bne + isync, however, prevents the execution of anything past
> > the isync until the stwcx. has completed and the bne has been
> > "decided", which prevents loads from leaking into the LL/SC loop. It
> > will also prevent a store in the lock from being issued before the
> > stwcx. has completed. It does *not* prevent, as far as I can tell,
> > another unrelated store before the lock from leaking into the lock,
> > including the one used to unlock a different lock.
>
> Except that the architecture says:
>
> <<
> Because a Store Conditional instruction may complete before its store
> has been performed, a conditional Branch instruction that depends on
> the CR0 value set by a Store Conditional instruction does not order
> the Store Conditional's store with respect to storage accesses caused
> by instructions that follow the Branch
> >>
>
> So isync in lock is architecturally incorrect, despite being what the
> architecture recommends using, yay!

That paragraph doesn't say anything about isync. Doing stcx., bne, isync
to acquire a lock will ensure that the instructions inside the locked
region can't execute until the lock is globally known to be acquired. It
is true that another CPU that doesn't take the lock could see the effect
of some load or store inside the locked region and then see the lock
variable itself being clear; but any CPU that tried to acquire the lock
would have its stcx. fail (at least until the first CPU releases the
lock). So isync in the lock sequence is architecturally correct.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-08-25 at 17:27 -0700, Paul E. McKenney wrote:
> On Thu, Aug 20, 2015 at 04:56:04PM +0100, Will Deacon wrote:
> > On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> > > On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > > >
> > > > Thanks, that sounds great. FWIW, there are multiple ways of
> > > > implementing the patch (i.e. whether you strengthen lock or
> > > > unlock). I had a crack at something here, but it's not tested:
> > > >
> > > > http://marc.info/?l=linux-arch&m=143758379023849&w=2
> > >
> > > I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
> > > unlock code. But from my reading of the docs we need to make sure any
> > > UNLOCK+LOCK is a full barrier, not just spin unlock/lock?
> > >
> > > So don't we need to worry about some of the other locks as well? At
> > > least rwlock, and mutex fast path?
> >
> > Hmm, that's a good question. I notice that you don't do any of the
> > SYNC_IO stuff for any locks other than spinlocks but I don't know
> > whether smp_mb__after_unlock_lock is similarly limited in scope.
> >
> > Paul?
>
> I would expect the various locks to have similar ordering
> characteristics.
>
> Or am I missing something subtle here?

I don't think so. The docs just talk about ACQUIRE/RELEASE, so I think it
needs to apply to all lock types. Or at least the list mentioned in the
docs, which is:

 (*) spin locks
 (*) R/W spin locks
 (*) mutexes
 (*) semaphores
 (*) R/W semaphores
 (*) RCU

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Aug 20, 2015 at 04:56:04PM +0100, Will Deacon wrote:
> On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> > On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > > On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > > > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > > > I thought the end result of this thread was that we didn't
> > > > > > > *need* to change the powerpc lock semantics? Or did I read it
> > > > > > > wrong?
> > > > > > >
> > > > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full
> > > > > > > barrier, which is consistent with our current implementation.
> > > > > >
> > > > > > That change happened about 1.5 years ago, and I thought that the
> > > > > > current discussion was about reversing it, based in part on the
> > > > > > recent powerpc benchmarks of locking primitives with and without
> > > > > > the sync instruction. But regardless, I clearly cannot remove
> > > > > > either the smp_mb__after_unlock_lock() or the powerpc definition
> > > > > > of it to be smp_mb() if powerpc unlock/lock is not strengthened.
> > > > >
> > > > > Yup. Peter and I would really like to get rid of
> > > > > smp_mb__after_unlock_lock entirely, which would mean strengthening
> > > > > the ppc spinlocks. Moving the barrier primitive into RCU is a good
> > > > > step to prevent more widespread usage of the barrier, but we'd
> > > > > really like to go further if the performance impact is deemed
> > > > > acceptable (which is what this thread is about).
> > > >
> > > > OK, sorry for completely missing the point, too many balls in the
> > > > air here.
> > >
> > > No problem!
> > >
> > > > I'll do some benchmarks and see what we come up with.
> > >
> > > Thanks, that sounds great. FWIW, there are multiple ways of
> > > implementing the patch (i.e. whether you strengthen lock or unlock).
> > > I had a crack at something here, but it's not tested:
> > >
> > > http://marc.info/?l=linux-arch&m=143758379023849&w=2
> >
> > Thanks.
> >
> > I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
> > unlock code. But from my reading of the docs we need to make sure any
> > UNLOCK+LOCK is a full barrier, not just spin unlock/lock?
> >
> > So don't we need to worry about some of the other locks as well? At
> > least rwlock, and mutex fast path?
>
> Hmm, that's a good question. I notice that you don't do any of the
> SYNC_IO stuff for any locks other than spinlocks but I don't know whether
> smp_mb__after_unlock_lock is similarly limited in scope.
>
> Paul?

I would expect the various locks to have similar ordering characteristics.

Or am I missing something subtle here?

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > > I thought the end result of this thread was that we didn't
> > > > > > *need* to change the powerpc lock semantics? Or did I read it
> > > > > > wrong?
> > > > > >
> > > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > > which is consistent with our current implementation.
> > > > >
> > > > > That change happened about 1.5 years ago, and I thought that the
> > > > > current discussion was about reversing it, based in part on the
> > > > > recent powerpc benchmarks of locking primitives with and without
> > > > > the sync instruction. But regardless, I clearly cannot remove
> > > > > either the smp_mb__after_unlock_lock() or the powerpc definition
> > > > > of it to be smp_mb() if powerpc unlock/lock is not strengthened.
> > > >
> > > > Yup. Peter and I would really like to get rid of
> > > > smp_mb__after_unlock_lock entirely, which would mean strengthening
> > > > the ppc spinlocks. Moving the barrier primitive into RCU is a good
> > > > step to prevent more widespread usage of the barrier, but we'd
> > > > really like to go further if the performance impact is deemed
> > > > acceptable (which is what this thread is about).
> > >
> > > OK, sorry for completely missing the point, too many balls in the air
> > > here.
> >
> > No problem!
> >
> > > I'll do some benchmarks and see what we come up with.
> >
> > Thanks, that sounds great. FWIW, there are multiple ways of
> > implementing the patch (i.e. whether you strengthen lock or unlock).
> > I had a crack at something here, but it's not tested:
> >
> > http://marc.info/?l=linux-arch&m=143758379023849&w=2
>
> Thanks.
>
> I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
> unlock code. But from my reading of the docs we need to make sure any
> UNLOCK+LOCK is a full barrier, not just spin unlock/lock?
>
> So don't we need to worry about some of the other locks as well? At
> least rwlock, and mutex fast path?

Hmm, that's a good question. I notice that you don't do any of the SYNC_IO
stuff for any locks other than spinlocks but I don't know whether
smp_mb__after_unlock_lock is similarly limited in scope.

Paul?

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > I thought the end result of this thread was that we didn't *need*
> > > > > to change the powerpc lock semantics? Or did I read it wrong?
> > > > >
> > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > which is consistent with our current implementation.
> > > >
> > > > That change happened about 1.5 years ago, and I thought that the
> > > > current discussion was about reversing it, based in part on the
> > > > recent powerpc benchmarks of locking primitives with and without the
> > > > sync instruction. But regardless, I clearly cannot remove either the
> > > > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > > > smp_mb() if powerpc unlock/lock is not strengthened.
> > >
> > > Yup. Peter and I would really like to get rid of
> > > smp_mb__after_unlock_lock entirely, which would mean strengthening the
> > > ppc spinlocks. Moving the barrier primitive into RCU is a good step to
> > > prevent more widespread usage of the barrier, but we'd really like to
> > > go further if the performance impact is deemed acceptable (which is
> > > what this thread is about).
> >
> > OK, sorry for completely missing the point, too many balls in the air
> > here.
>
> No problem!
>
> > I'll do some benchmarks and see what we come up with.
>
> Thanks, that sounds great. FWIW, there are multiple ways of implementing
> the patch (i.e. whether you strengthen lock or unlock). I had a crack at
> something here, but it's not tested:
>
> http://marc.info/?l=linux-arch&m=143758379023849&w=2

Thanks.

I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
unlock code. But from my reading of the docs we need to make sure any
UNLOCK+LOCK is a full barrier, not just spin unlock/lock?

So don't we need to worry about some of the other locks as well? At least
rwlock, and mutex fast path?

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > I thought the end result of this thread was that we didn't *need*
> > > > to change the powerpc lock semantics? Or did I read it wrong?
> > > >
> > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > which is consistent with our current implementation.
> > >
> > > That change happened about 1.5 years ago, and I thought that the
> > > current discussion was about reversing it, based in part on the
> > > recent powerpc benchmarks of locking primitives with and without the
> > > sync instruction. But regardless, I clearly cannot remove either the
> > > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > > smp_mb() if powerpc unlock/lock is not strengthened.
> >
> > Yup. Peter and I would really like to get rid of
> > smp_mb__after_unlock_lock entirely, which would mean strengthening the
> > ppc spinlocks. Moving the barrier primitive into RCU is a good step to
> > prevent more widespread usage of the barrier, but we'd really like to
> > go further if the performance impact is deemed acceptable (which is
> > what this thread is about).
>
> OK, sorry for completely missing the point, too many balls in the air
> here.

No problem!

> I'll do some benchmarks and see what we come up with.

Thanks, that sounds great. FWIW, there are multiple ways of implementing
the patch (i.e. whether you strengthen lock or unlock). I had a crack at
something here, but it's not tested:

http://marc.info/?l=linux-arch&m=143758379023849&w=2

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > I thought the end result of this thread was that we didn't *need* to
> > > change the powerpc lock semantics? Or did I read it wrong?
> > >
> > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > which is consistent with our current implementation.
> >
> > That change happened about 1.5 years ago, and I thought that the
> > current discussion was about reversing it, based in part on the
> > recent powerpc benchmarks of locking primitives with and without the
> > sync instruction. But regardless, I clearly cannot remove either the
> > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > smp_mb() if powerpc unlock/lock is not strengthened.
>
> Yup. Peter and I would really like to get rid of
> smp_mb__after_unlock_lock entirely, which would mean strengthening the
> ppc spinlocks. Moving the barrier primitive into RCU is a good step to
> prevent more widespread usage of the barrier, but we'd really like to go
> further if the performance impact is deemed acceptable (which is what
> this thread is about).

OK, sorry for completely missing the point, too many balls in the air
here.

I'll do some benchmarks and see what we come up with.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > > > Author: Paul E. McKenney
> > > > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > > > >
> > > > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > > >
> > > > > > > > >     RCU is the only thing that uses
> > > > > > > > >     smp_mb__after_unlock_lock(), and is likely the only
> > > > > > > > >     thing that ever will use it, so this commit makes this
> > > > > > > > >     macro private to RCU.
> > > > > > > > >
> > > > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > > > >     Cc: Will Deacon
> > > > > > > > >     Cc: Peter Zijlstra
> > > > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > > > >
> > > > > > Are you planning to queue this somewhere? I think it makes sense
> > > > > > regardless of whether we change PowerPC or not and ideally it
> > > > > > would be merged around the same time as my relaxed atomics
> > > > > > series.
> > > > >
> > > > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > > > window. Please let me know if you need it sooner.
> > > >
> > > > The generic relaxed atomics are now queued in -tip, so it would be
> > > > really good to see this Documentation update land in 4.3 if at all
> > > > possible. I appreciate it's late in the cycle, but it's always worth
> > > > asking.
> > >
> > > Can't hurt to give it a try. I have set -rcu's rcu/next branch to
> > > this commit, and if it passes a few days' worth of testing, I will
> > > see what Ingo has to say about a pull request.
> > >
> > > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > > updating documentation. Looks like we need to strengthen powerpc's
> > > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > > entirely. Or did that already happen and I just missed it?
> >
> > No it didn't.
> >
> > I thought the end result of this thread was that we didn't *need* to
> > change the powerpc lock semantics? Or did I read it wrong?
> >
> > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which
> > is consistent with our current implementation.
>
> That change happened about 1.5 years ago, and I thought that the
> current discussion was about reversing it, based in part on the
> recent powerpc benchmarks of locking primitives with and without the
> sync instruction. But regardless, I clearly cannot remove either the
> smp_mb__after_unlock_lock() or the powerpc definition of it to be
> smp_mb() if powerpc unlock/lock is not strengthened.

Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
entirely, which would mean strengthening the ppc spinlocks. Moving the
barrier primitive into RCU is a good step to prevent more widespread usage
of the barrier, but we'd really like to go further if the performance
impact is deemed acceptable (which is what this thread is about).

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > Hello Paul,
> > >
> > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > > Author: Paul E. McKenney
> > > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > > >
> > > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > >
> > > > > > > >     RCU is the only thing that uses
> > > > > > > >     smp_mb__after_unlock_lock(), and is likely the only
> > > > > > > >     thing that ever will use it, so this commit makes this
> > > > > > > >     macro private to RCU.
> > > > > > > >
> > > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > > >     Cc: Will Deacon
> > > > > > > >     Cc: Peter Zijlstra
> > > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > > >
> > > > > Are you planning to queue this somewhere? I think it makes sense
> > > > > regardless of whether we change PowerPC or not and ideally it
> > > > > would be merged around the same time as my relaxed atomics series.
> > > >
> > > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > > window. Please let me know if you need it sooner.
> > >
> > > The generic relaxed atomics are now queued in -tip, so it would be
> > > really good to see this Documentation update land in 4.3 if at all
> > > possible. I appreciate it's late in the cycle, but it's always worth
> > > asking.
> >
> > Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> > commit, and if it passes a few days' worth of testing, I will see what
> > Ingo has to say about a pull request.
> >
> > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > updating documentation. Looks like we need to strengthen powerpc's
> > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > entirely. Or did that already happen and I just missed it?
>
> No it didn't.
>
> I thought the end result of this thread was that we didn't *need* to
> change the powerpc lock semantics? Or did I read it wrong?
>
> ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which
> is consistent with our current implementation.

That change happened about 1.5 years ago, and I thought that the
current discussion was about reversing it, based in part on the
recent powerpc benchmarks of locking primitives with and without the
sync instruction. But regardless, I clearly cannot remove either the
smp_mb__after_unlock_lock() or the powerpc definition of it to be smp_mb()
if powerpc unlock/lock is not strengthened.

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > Hello Paul,
> >
> > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > Author: Paul E. McKenney
> > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > >
> > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > >
> > > > > > >     RCU is the only thing that uses
> > > > > > >     smp_mb__after_unlock_lock(), and is likely the only thing
> > > > > > >     that ever will use it, so this commit makes this macro
> > > > > > >     private to RCU.
> > > > > > >
> > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > >     Cc: Will Deacon
> > > > > > >     Cc: Peter Zijlstra
> > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > >
> > > > Are you planning to queue this somewhere? I think it makes sense
> > > > regardless of whether we change PowerPC or not and ideally it would
> > > > be merged around the same time as my relaxed atomics series.
> > >
> > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > window. Please let me know if you need it sooner.
> >
> > The generic relaxed atomics are now queued in -tip, so it would be
> > really good to see this Documentation update land in 4.3 if at all
> > possible. I appreciate it's late in the cycle, but it's always worth
> > asking.
>
> Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> commit, and if it passes a few days' worth of testing, I will see what
> Ingo has to say about a pull request.
>
> This commit also privatizes smp_mb__after_unlock_lock() as well as
> updating documentation.

No it didn't.

I thought the end result of this thread was that we didn't *need* to
change the powerpc lock semantics? Or did I read it wrong?

ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which is
consistent with our current implementation.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Aug 13, 2015 at 11:49:28AM +0100, Will Deacon wrote:
> On Wed, Aug 12, 2015 at 06:59:38PM +0100, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > > The generic relaxed atomics are now queued in -tip, so it would be
> > > > really good to see this Documentation update land in 4.3 if at all
> > > > possible. I appreciate it's late in the cycle, but it's always
> > > > worth asking.
> > >
> > > Can't hurt to give it a try. I have set -rcu's rcu/next branch to
> > > this commit, and if it passes a few days' worth of testing, I will
> > > see what Ingo has to say about a pull request.
> > >
> > > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > > updating documentation. Looks like we need to strengthen powerpc's
> > > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > > entirely. Or did that already happen and I just missed it?
> >
> > And just for completeness, here is the current version of that commit.
> >
> > Thanx, Paul
> >
> > b/Documentation/memory-barriers.txt   | 71 +-
> > b/arch/powerpc/include/asm/spinlock.h |  2
> > b/include/linux/spinlock.h            | 10
> > b/kernel/rcu/tree.h                   | 12 +
> > 4 files changed, 16 insertions(+), 79 deletions(-)
> >
> > commit 12d560f4ea87030667438a169912380be00cea4b
> > Author: Paul E. McKenney
> > Date: Tue Jul 14 18:35:23 2015 -0700
> >
> >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> >
> >     RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> >     likely the only thing that ever will use it, so this commit makes
> >     this macro private to RCU.
> >
> >     Signed-off-by: Paul E. McKenney
> >     Cc: Will Deacon
> >     Cc: Peter Zijlstra
> >     Cc: Benjamin Herrenschmidt
> >     Cc: "linux-a...@vger.kernel.org"
>
> Acked-by: Will Deacon
>
> I don't think the PowerPC spinlock change is queued anywhere (I sent it
> out as a diff for discussion, but that was it). This patch doesn't rely
> on that though, right?

No, this patch just moves the smp_mb__after_unlock_lock() definition; it
does not change the code generated.

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Aug 12, 2015 at 06:59:38PM +0100, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > The generic relaxed atomics are now queued in -tip, so it would be
> > > really good to see this Documentation update land in 4.3 if at all
> > > possible. I appreciate it's late in the cycle, but it's always worth
> > > asking.
> >
> > Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> > commit, and if it passes a few days' worth of testing, I will see what
> > Ingo has to say about a pull request.
> >
> > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > updating documentation. Looks like we need to strengthen powerpc's
> > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > entirely. Or did that already happen and I just missed it?
>
> And just for completeness, here is the current version of that commit.
>
> Thanx, Paul
>
> b/Documentation/memory-barriers.txt   | 71 +-
> b/arch/powerpc/include/asm/spinlock.h |  2
> b/include/linux/spinlock.h            | 10
> b/kernel/rcu/tree.h                   | 12 +
> 4 files changed, 16 insertions(+), 79 deletions(-)
>
> commit 12d560f4ea87030667438a169912380be00cea4b
> Author: Paul E. McKenney
> Date: Tue Jul 14 18:35:23 2015 -0700
>
>     rcu,locking: Privatize smp_mb__after_unlock_lock()
>
>     RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
>     likely the only thing that ever will use it, so this commit makes
>     this macro private to RCU.
>
>     Signed-off-by: Paul E. McKenney
>     Cc: Will Deacon
>     Cc: Peter Zijlstra
>     Cc: Benjamin Herrenschmidt
>     Cc: "linux-a...@vger.kernel.org"

Acked-by: Will Deacon

I don't think the PowerPC spinlock change is queued anywhere (I sent it
out as a diff for discussion, but that was it). This patch doesn't rely
on that though, right?

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > Hello Paul,
> >
> > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > Author: Paul E. McKenney
> > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > >
> > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > >
> > > > > > >     RCU is the only thing that uses
> > > > > > >     smp_mb__after_unlock_lock(), and is likely the only thing
> > > > > > >     that ever will use it, so this commit makes this macro
> > > > > > >     private to RCU.
> > > > > > >
> > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > >     Cc: Will Deacon
> > > > > > >     Cc: Peter Zijlstra
> > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > >
> > > > Are you planning to queue this somewhere? I think it makes sense
> > > > regardless of whether we change PowerPC or not and ideally it would
> > > > be merged around the same time as my relaxed atomics series.
> > >
> > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > window. Please let me know if you need it sooner.
> >
> > The generic relaxed atomics are now queued in -tip, so it would be
> > really good to see this Documentation update land in 4.3 if at all
> > possible. I appreciate it's late in the cycle, but it's always worth
> > asking.
>
> Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> commit, and if it passes a few days' worth of testing, I will see what
> Ingo has to say about a pull request.
>
> This commit also privatizes smp_mb__after_unlock_lock() as well as
> updating documentation. Looks like we need to strengthen powerpc's
> locking primitives, then get rid of smp_mb__after_unlock_lock()
> entirely. Or did that already happen and I just missed it?

And just for completeness, here is the current version of that commit.

Thanx, Paul

 b/Documentation/memory-barriers.txt   | 71 +-
 b/arch/powerpc/include/asm/spinlock.h |  2
 b/include/linux/spinlock.h            | 10
 b/kernel/rcu/tree.h                   | 12 +
 4 files changed, 16 insertions(+), 79 deletions(-)

commit 12d560f4ea87030667438a169912380be00cea4b
Author: Paul E. McKenney
Date: Tue Jul 14 18:35:23 2015 -0700

    rcu,locking: Privatize smp_mb__after_unlock_lock()

    RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
    likely the only thing that ever will use it, so this commit makes this
    macro private to RCU.

    Signed-off-by: Paul E. McKenney
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: "linux-a...@vger.kernel.org"

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 318523872db5..eafa6a53f72c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock. In short, a ACQUIRE followed by an
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
-imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE
-pair to produce a full barrier, the ACQUIRE can be followed by an
-smp_mb__after_unlock_lock() invocation. This will produce a full barrier
-(including transitivity) if either (a) the RELEASE and the ACQUIRE are
-executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
-the same variable. The smp_mb__after_unlock_lock() primitive is free
-on many architectures. Without smp_mb__after_unlock_lock(), the CPU's
-execution of the critical sections corresponding to the RELEASE and the
-ACQUIRE can cross, so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
+not imply a full memory barrier. Therefore, the CPU's execution of the
+critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:
 
 	*A = a;
 	RELEASE M
@@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
 	a sleep-unlock race, but the locking primitive needs to resolve
 	such races properly in any case.
 
-With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
-For example, with the following code, the store to *A will always be
-seen by other CPUs before the store to *B:
-
-	*A = a;
-	RELEASE M
-	ACQUIRE
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote: > Hello Paul, > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote: > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote: > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > > > > Author: Paul E. McKenney > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), > > > > > > and is > > > > > > likely the only thing that ever will use it, so this commit > > > > > > makes this > > > > > > macro private to RCU. > > > > > > > > > > > > Signed-off-by: Paul E. McKenney > > > > > > Cc: Will Deacon > > > > > > Cc: Peter Zijlstra > > > > > > Cc: Benjamin Herrenschmidt > > > > > > Cc: "linux-a...@vger.kernel.org" > > > > > > Are you planning to queue this somewhere? I think it makes sense > > > regardless > > > of whether we change PowerPc or not and ideally it would be merged around > > > the same time as my relaxed atomics series. > > > > I have is in -rcu. By default, I will push it to the 4.4 merge window. > > Please let me know if you need it sooner. > > The generic relaxed atomics are now queued in -tip, so it would be really > good to see this Documentation update land in 4.3 if at all possible. I > appreciate it's late in the cycle, but it's always worth asking. Can't hurt to give it a try. I have set -rcu's rcu/next branch to this commit, and if it passes a few day's worth of testing, I will see what Ingo has to say about a pull request. This commit also privatizes smp_mb__after_unlock_lock() as well as updating documentation. Looks like we need to strengthen powerpc's locking primitives, then get rid of smp_mb__after_unlock_lock() entirely. Or did that already happen and I just missed it? 
Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hello Paul, On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote: > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote: > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > > > Author: Paul E. McKenney > > > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and > > > > > is > > > > > likely the only thing that ever will use it, so this commit makes > > > > > this > > > > > macro private to RCU. > > > > > > > > > > Signed-off-by: Paul E. McKenney > > > > > Cc: Will Deacon > > > > > Cc: Peter Zijlstra > > > > > Cc: Benjamin Herrenschmidt > > > > > Cc: "linux-a...@vger.kernel.org" > > > > Are you planning to queue this somewhere? I think it makes sense regardless > > of whether we change PowerPc or not and ideally it would be merged around > > the same time as my relaxed atomics series. > > I have is in -rcu. By default, I will push it to the 4.4 merge window. > Please let me know if you need it sooner. The generic relaxed atomics are now queued in -tip, so it would be really good to see this Documentation update land in 4.3 if at all possible. I appreciate it's late in the cycle, but it's always worth asking. Thanks, Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote: > Hi Paul, > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > > On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote: > > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney > > > > > > > > wrote: > > > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > > > > Given that RCU is currently the only user of this barrier, > > > > > > > > > > how would you > > > > > > > > > > feel about making the barrier local to RCU and not part of > > > > > > > > > > the general > > > > > > > > > > memory-barrier API? > > > > > > > > > > > > > > > > > > In theory, no objection. Your thought is to leave the > > > > > > > > > definitions where > > > > > > > > > they are, mark them as being used only by RCU, and removing > > > > > > > > > mention from > > > > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file > > > > > > > > with an > > > > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could > > > > > > > > have a big > > > > > > > > comment describing the semantics, or put that in an RCU > > > > > > > > Documentation file > > > > > > > > instead of memory-barriers.txt. > > > > > > > > > > > > > > > > That *should* then mean we notice anybody else trying to use > > > > > > > > the barrier, > > > > > > > > because they'd need to send patches to either add something > > > > > > > > equivalent > > > > > > > > or move the definition out again. 
> > > > > > > > > > > > > > My concern with this approach is that someone putting together a > > > > > > > new > > > > > > > architecture might miss this. That said, this approach certainly > > > > > > > would > > > > > > > work for the current architectures. > > > > > > > > > > > > I don't think they're any more likely to miss it than with the > > > > > > current > > > > > > situation where the generic code defines the macro as a NOP unless > > > > > > you > > > > > > explicitly override it. > > > > > > > > > > Fair enough... > > > > > > > > Like this? > > > > > > Precisely! Thanks for cooking the patch -- this lays all my worries to > > > rest, so: > > > > > > Acked-by: Will Deacon > > > > Thank you! > > [...] > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > > Author: Paul E. McKenney > > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > > > > likely the only thing that ever will use it, so this commit makes > > > > this > > > > macro private to RCU. > > > > > > > > Signed-off-by: Paul E. McKenney > > > > Cc: Will Deacon > > > > Cc: Peter Zijlstra > > > > Cc: Benjamin Herrenschmidt > > > > Cc: "linux-a...@vger.kernel.org" > > Are you planning to queue this somewhere? I think it makes sense regardless > of whether we change PowerPc or not and ideally it would be merged around > the same time as my relaxed atomics series. I have it in -rcu. By default, I will push it to the 4.4 merge window. Please let me know if you need it sooner. Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Paul, On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote: > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > > > Given that RCU is currently the only user of this barrier, > > > > > > > > > how would you > > > > > > > > > feel about making the barrier local to RCU and not part of > > > > > > > > > the general > > > > > > > > > memory-barrier API? > > > > > > > > > > > > > > > > In theory, no objection. Your thought is to leave the > > > > > > > > definitions where > > > > > > > > they are, mark them as being used only by RCU, and removing > > > > > > > > mention from > > > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file > > > > > > > with an > > > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could > > > > > > > have a big > > > > > > > comment describing the semantics, or put that in an RCU > > > > > > > Documentation file > > > > > > > instead of memory-barriers.txt. > > > > > > > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > > > > barrier, > > > > > > > because they'd need to send patches to either add something > > > > > > > equivalent > > > > > > > or move the definition out again. > > > > > > > > > > > > My concern with this approach is that someone putting together a new > > > > > > architecture might miss this. 
That said, this approach certainly > > > > > > would > > > > > > work for the current architectures. > > > > > I don't think they're any more likely to miss it than with the current > > > > > situation where the generic code defines the macro as a NOP unless you > > > > > explicitly override it. > > > > Fair enough... > > > Like this? > > Precisely! Thanks for cooking the patch -- this lays all my worries to > > rest, so: > > > > Acked-by: Will Deacon > > Thank you! [...] > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > Author: Paul E. McKenney > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > > > likely the only thing that ever will use it, so this commit makes this > > > macro private to RCU. > > > > > > Signed-off-by: Paul E. McKenney > > > Cc: Will Deacon > > > Cc: Peter Zijlstra > > > Cc: Benjamin Herrenschmidt > > > Cc: "linux-a...@vger.kernel.org" Are you planning to queue this somewhere? I think it makes sense regardless of whether we change PowerPC or not and ideally it would be merged around the same time as my relaxed atomics series. Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 20, 2015 at 10:18:14PM +0100, Benjamin Herrenschmidt wrote: > On Mon, 2015-07-20 at 14:39 +0100, Will Deacon wrote: > > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > > { > > > > - SYNC_IO; > > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > > + smp_mb(); > > > > lock->slock = 0; > > > > } > > > > > > That probably needs to be mb() in case somebody has the expectation that > > > it does a barrier vs. DMA on UP. > > > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > > in the core code? > > include/linux/spinlock_up.h:# define arch_spin_unlock(lock) do { > barrier(); (void)(lock); } while (0) > > Indeed... So, leaving mmiowb() alone, we end up with the patch below. Will --->8 From 9373a4226986dd69b6aaf896590ea43abeb1cc8e Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 22 Jul 2015 17:37:58 +0100 Subject: [PATCH] powerpc: kill smp_mb__after_unlock_lock PowerPC is the only architecture defining smp_mb__after_unlock_lock, but we can remove it by adding an unconditional sync (smp_mb()) to the spin_unlock code.
Signed-off-by: Will Deacon --- arch/powerpc/include/asm/io.h | 13 + arch/powerpc/include/asm/spinlock.h | 22 +- 2 files changed, 2 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index a8d2ef30d473..a3ad51046b04 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -105,12 +105,6 @@ extern bool isa_io_special; * */ -#ifdef CONFIG_PPC64 -#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0) -#else -#define IO_SET_SYNC_FLAG() -#endif - /* gcc 4.0 and older doesn't have 'Z' constraint */ #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0) #define DEF_MMIO_IN_X(name, size, insn)\ @@ -127,7 +121,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,0,%2" \ : "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #else /* newer gcc */ #define DEF_MMIO_IN_X(name, size, insn)\ @@ -144,7 +137,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ : "=Z" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #endif @@ -162,7 +154,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \ : "=m" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } DEF_MMIO_IN_D(in_8, 8, lbz); @@ -630,9 +621,7 @@ static inline void name at \ #define mmiowb() #else /* - * Enforce synchronisation of stores vs. spin_unlock - * (this does it explicitly, though our implementation of spin_unlock - * does it implicitely too) + * Explicitly enforce synchronisation of stores vs. 
spin_unlock */ static inline void mmiowb(void) { diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 4dbe072eecbe..9da89ea4ff31 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -28,8 +28,6 @@ #include #include -#define smp_mb__after_unlock_lock()smp_mb() /* Full ordering for lock. */ - #ifdef CONFIG_PPC64 /* use 0x80yy when locked, where yy == CPU number */ #ifdef __BIG_ENDIAN__ @@ -41,19 +39,6 @@ #define LOCK_TOKEN 1 #endif -#if defined(CONFIG_PPC64) && defined(CONFIG_SMP) -#define CLEAR_IO_SYNC (get_paca()->io_sync = 0) -#define SYNC_IOdo { \ - if (unlikely(get_paca()->io_sync)) {\ - mb(); \ - get_paca()->io_sync = 0;\ - } \ - } while (0) -#else -#define CLEAR_IO_SYNC -#define SYNC_IO -#endif - static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return lo
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-07-20 at 14:39 +0100, Will Deacon wrote: > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > { > > > - SYNC_IO; > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > + smp_mb(); > > > lock->slock = 0; > > > } > > > > That probably needs to be mb() in case somebody has the expectation that > > it does a barrier vs. DMA on UP. > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > in the core code? include/linux/spinlock_up.h:# define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0) Indeed... Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 20, 2015 at 02:48:49PM +0100, Paul E. McKenney wrote: > On Mon, Jul 20, 2015 at 02:39:06PM +0100, Will Deacon wrote: > > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > > { > > > > - SYNC_IO; > > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > > + smp_mb(); > > > > lock->slock = 0; > > > > } > > > > > > That probably needs to be mb() in case somebody has the expectation that > > > it does a barrier vs. DMA on UP. > > > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > > in the core code? > > Yes, to barrier(), but that doesn't generate any code. In contrast, the > mb() that Ben is asking for puts out a sync instruction. Without that > sync instruction, MMIO accesses can be reordered with the spin_unlock(), > even on single-CPU systems. So the bm() is really needed if unlock is > to order against MMIO (and thus DMA) on UP. Understood, but my point was that this needs to be done in the core code rather than here. Perhaps it's easier to leave mmiowb() alone for PowerPC for now and instead predicate that on !SMP? Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 20, 2015 at 02:39:06PM +0100, Will Deacon wrote: > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > { > > > - SYNC_IO; > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > + smp_mb(); > > > lock->slock = 0; > > > } > > > > That probably needs to be mb() in case somebody has the expectation that > > it does a barrier vs. DMA on UP. > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > in the core code? Yes, to barrier(), but that doesn't generate any code. In contrast, the mb() that Ben is asking for puts out a sync instruction. Without that sync instruction, MMIO accesses can be reordered with the spin_unlock(), even on single-CPU systems. So the mb() is really needed if unlock is to order against MMIO (and thus DMA) on UP. Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > { > > - SYNC_IO; > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > - PPC_RELEASE_BARRIER: : :"memory"); > > + smp_mb(); > > lock->slock = 0; > > } > > That probably needs to be mb() in case somebody has the expectation that > it does a barrier vs. DMA on UP. Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() in the core code? Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > static inline void arch_spin_unlock(arch_spinlock_t *lock) > { > - SYNC_IO; > - __asm__ __volatile__("# arch_spin_unlock\n\t" > - PPC_RELEASE_BARRIER: : :"memory"); > + smp_mb(); > lock->slock = 0; > } That probably needs to be mb() in case somebody has the expectation that it does a barrier vs. DMA on UP. Cheers, Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 17, 2015 at 12:15:35PM +0200, Peter Zijlstra wrote: > On Fri, Jul 17, 2015 at 10:32:21AM +0100, Will Deacon wrote: > > @@ -158,9 +140,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, > > unsigned long flags) > > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > { > > - SYNC_IO; > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > - PPC_RELEASE_BARRIER: : :"memory"); > > + smp_mb(); > > lock->slock = 0; > > } > > Should we then also make smp_store_release() use sync instead of lwsync > to keep it consistent? Unless smp_store_release() needs to interact with MMIO accesses, it should still be able to be lwsync. This means that unlock-lock is a full barrier, but release-acquire is not necessarily, which should be just fine. Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 17, 2015 at 10:32:21AM +0100, Will Deacon wrote: > @@ -158,9 +140,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned > long flags) > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > { > - SYNC_IO; > - __asm__ __volatile__("# arch_spin_unlock\n\t" > - PPC_RELEASE_BARRIER: : :"memory"); > + smp_mb(); > lock->slock = 0; > } Should we then also make smp_store_release() use sync instead of lwsync to keep it consistent?
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Jul 16, 2015 at 11:54:25PM +0100, Benjamin Herrenschmidt wrote: > On Thu, 2015-07-16 at 08:11 -0700, Paul E. McKenney wrote: > > > So isync in lock in architecturally incorrect, despite being what > > the > > > architecture recommends using, yay ! > > > > Well, the architecture isn't expecting that crazies like myself would > > want to have an unlock-lock provide ordering to some CPU not holding > > the lock. :-/ > > Yes, isync in lock effectively allows any load or store before the lock > to leak into the lock and get re-ordered with things in there. > > lwsync leaves us exposed to the re-order inside the LL/SC of a > subsequent load. > > So to get the full barrier semantic, the only option is a full sync, > either in lock or unlock. Instinctively I prefer in lock but there's > an argument to have it in unlock so we can get rid of the iosync > business. Yeah, removing the iosync logic kills mmiowb() as well (totally untested diff below and I was too scared to touch that paca thing!). It also sticks the full barrier in one place, since there's no try_unlock to worry about. 
Will --->8 diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index a8d2ef30d473..bb34f3bb66dc 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -105,12 +105,6 @@ extern bool isa_io_special; * */ -#ifdef CONFIG_PPC64 -#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0) -#else -#define IO_SET_SYNC_FLAG() -#endif - /* gcc 4.0 and older doesn't have 'Z' constraint */ #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0) #define DEF_MMIO_IN_X(name, size, insn)\ @@ -127,7 +121,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,0,%2" \ : "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #else /* newer gcc */ #define DEF_MMIO_IN_X(name, size, insn)\ @@ -144,7 +137,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ : "=Z" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #endif @@ -162,7 +154,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \ : "=m" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } DEF_MMIO_IN_D(in_8, 8, lbz); @@ -626,23 +617,7 @@ static inline void name at \ #define writel_relaxed(v, addr)writel(v, addr) #define writeq_relaxed(v, addr)writeq(v, addr) -#ifdef CONFIG_PPC32 #define mmiowb() -#else -/* - * Enforce synchronisation of stores vs. 
spin_unlock - * (this does it explicitly, though our implementation of spin_unlock - * does it implicitely too) - */ -static inline void mmiowb(void) -{ - unsigned long tmp; - - __asm__ __volatile__("sync; li %0,0; stb %0,%1(13)" - : "=&r" (tmp) : "i" (offsetof(struct paca_struct, io_sync)) - : "memory"); -} -#endif /* !CONFIG_PPC32 */ static inline void iosync(void) { diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 4dbe072eecbe..9da89ea4ff31 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -28,8 +28,6 @@ #include #include -#define smp_mb__after_unlock_lock()smp_mb() /* Full ordering for lock. */ - #ifdef CONFIG_PPC64 /* use 0x80yy when locked, where yy == CPU number */ #ifdef __BIG_ENDIAN__ @@ -41,19 +39,6 @@ #define LOCK_TOKEN 1 #endif -#if defined(CONFIG_PPC64) && defined(CONFIG_SMP) -#define CLEAR_IO_SYNC (get_paca()->io_sync = 0) -#define SYNC_IOdo { \ - if (unlikely(get_paca()->io_sync)) {\ - mb(); \ - get_paca()->io_sync = 0;\ - } \ - } while (0) -#else -#define CLEAR_IO_SYNC -#define SYNC_IO -#endif - static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return lock.slock == 0; @@ -91,7 +76,6 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock) static inline
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, 2015-07-16 at 08:11 -0700, Paul E. McKenney wrote: > > So isync in lock in architecturally incorrect, despite being what > the > > architecture recommends using, yay ! > > Well, the architecture isn't expecting that crazies like myself would > want to have an unlock-lock provide ordering to some CPU not holding > the lock. :-/ Yes, isync in lock effectively allows any load or store before the lock to leak into the lock and get re-ordered with things in there. lwsync leaves us exposed to the re-order inside the LL/SC of a subsequent load. So to get the full barrier semantic, the only option is a full sync, either in lock or unlock. Instinctively I prefer in lock but there's an argument to have it in unlock so we can get rid of the iosync business. Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote: > On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote: > > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote: > > > That would fix the problem with smp_mb__after_unlock_lock(), but not > > > the original worry we had about loads happening before the SC in lock. > > > > However I think isync fixes *that* :-) The problem with isync is as you > > said, it's not a -memory- barrier per-se, it's an execution barrier / > > context synchronizing instruction. The combination stwcx. + bne + isync > > however prevents the execution of anything past the isync until the > > stwcx has completed and the bne has been "decided", which prevents loads > > from leaking into the LL/SC loop. It will also prevent a store in the > > lock from being issued before the stwcx. has completed. It does *not* > > prevent as far as I can tell another unrelated store before the lock > > from leaking into the lock, including the one used to unlock a different > > lock. > > Except that the architecture says: > > << > Because a Store Conditional instruction may com- > plete before its store has been performed, a condi- > tional Branch instruction that depends on the CR0 > value set by a Store Conditional instruction does > not order the Store Conditional's store with respect > to storage accesses caused by instructions that > follow the Branch > >> > > So isync in lock in architecturally incorrect, despite being what the > architecture recommends using, yay ! Well, the architecture isn't expecting that crazies like myself would want to have an unlock-lock provide ordering to some CPU not holding the lock. :-/ Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > the original worry we had about loads happening before the SC in lock.
>
> However I think isync fixes *that* :-) The problem with isync is, as you
> said, that it's not a -memory- barrier per se, it's an execution barrier /
> context-synchronizing instruction. The combination stwcx. + bne + isync
> however prevents the execution of anything past the isync until the
> stwcx. has completed and the bne has been "decided", which prevents loads
> from leaking into the LL/SC loop. It will also prevent a store in the
> lock from being issued before the stwcx. has completed. It does *not*
> prevent, as far as I can tell, another unrelated store before the lock
> from leaking into the lock, including the one used to unlock a different
> lock.

Except that the architecture says:

<<
Because a Store Conditional instruction may complete before its store
has been performed, a conditional Branch instruction that depends on
the CR0 value set by a Store Conditional instruction does not order
the Store Conditional's store with respect to storage accesses caused
by instructions that follow the Branch
>>

So isync in lock is architecturally incorrect, despite being what the
architecture recommends using, yay!

Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> That would fix the problem with smp_mb__after_unlock_lock(), but not
> the original worry we had about loads happening before the SC in lock.

However I think isync fixes *that* :-) The problem with isync is, as you
said, that it's not a -memory- barrier per se, it's an execution barrier /
context-synchronizing instruction. The combination stwcx. + bne + isync
however prevents the execution of anything past the isync until the
stwcx. has completed and the bne has been "decided", which prevents loads
from leaking into the LL/SC loop. It will also prevent a store in the
lock from being issued before the stwcx. has completed. It does *not*
prevent, as far as I can tell, another unrelated store before the lock
from leaking into the lock, including the one used to unlock a different
lock.

Cheers, Ben.
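The stwcx. + bne + isync sequence Ben describes is, roughly, the powerpc spinlock acquire fast path. A simplified sketch for readers following along (register choices and the lock token are illustrative, not the exact kernel asm):

```asm
        li      r10,1           # lock "token" (the kernel uses a per-cpu
                                # token; 1 is enough for this sketch)
1:      lwarx   r9,0,r3         # LL: load the lock word, set reservation
        cmpwi   r9,0            # already held?
        bne-    1b              # yes -> keep spinning
        stwcx.  r10,0,r3        # SC: try to claim the lock
        bne-    1b              # reservation lost -> retry
        isync                   # PPC_ACQUIRE_BARRIER in the source, but
                                # runtime-patched to lwsync on most modern
                                # CPUs, as Michael explains below
```

The thread's worry sits at the last two instructions: the bne + isync pair stops later instructions from executing before the stwcx. is decided, but the architecture text Ben quotes says it does not order the stwcx.'s *store* against subsequent storage accesses.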
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, 2015-07-15 at 11:44 +0100, Will Deacon wrote:
> Hi Michael,
>
> On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
> > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > > This didn't go anywhere last time I posted it, but here it is again.
> > > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > > to whether this change requires them to add an additional barrier in
> > > > arch_spin_unlock and what the cost of that would be.
> > >
> > > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > > barrier. As it is, we *almost* have a full barrier semantic, but not
> > > quite, as in things can get mixed up inside spin_lock between the LL and
> > > the SC (things leaking in past LL and things leaking "out" up before SC
> > > and then getting mixed up in there).
> > >
> > > Michael, at some point you were experimenting a bit with that and tried
> > > to get some perf numbers of the impact that would have, did that
> > > solidify ? Otherwise, I'll have a look when I'm back next week.
> >
> > I was mainly experimenting with replacing the lwsync in lock with an isync.
> >
> > But I think you're talking about making it a full sync in lock.
> >
> > That was about +7% on p8, +25% on p7 and +88% on p6.
>
> Ok, so that's overhead incurred by moving from isync -> lwsync? The numbers
> look quite large...

Sorry no. Currently we use lwsync in lock. You'll see isync in the code
(PPC_ACQUIRE_BARRIER), but on most modern CPUs that is patched at runtime
to lwsync.

I benchmarked lwsync (current), isync (proposed at the time) and sync
(just for comparison). The numbers above are going from lwsync -> sync,
as I thought that was what Ben was talking about.

Going from lwsync to isync was actually a small speedup on power8, which
was surprising.

> > We got stuck deciding whether isync was safe to use as a memory barrier,
> > because the wording in the arch is a bit vague.
> >
> > But if we're talking about a full sync then I think there is no question
> > that's OK and we should just do it.
>
> Is this because there's a small overhead from lwsync -> sync? Just want to
> make sure I follow your reasoning.

No I mean that sync is clearly a memory barrier. The issue with switching
to isync in lock was that it's not a memory barrier per se, so we were
not 100% confident in it.

> If you went the way of sync in unlock, could you remove the conditional
> SYNC_IO stuff?

Yeah we could, it's just a conditional sync in unlock when mmio has been
done.

That would fix the problem with smp_mb__after_unlock_lock(), but not the
original worry we had about loads happening before the SC in lock.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, 2015-07-15 at 07:18 -0700, Paul E. McKenney wrote:
> On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
> > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > > Michael, at some point you were experimenting a bit with that and tried
> > > to get some perf numbers of the impact that would have, did that
> > > solidify ? Otherwise, I'll have a look when I'm back next week.
> >
> > I was mainly experimenting with replacing the lwsync in lock with an isync.
> >
> > But I think you're talking about making it a full sync in lock.
> >
> > That was about +7% on p8, +25% on p7 and +88% on p6.
>
> Just for completeness, what were you running as benchmark? ;-)

Heh sorry :) That was my lockcomparison benchmark, based on Anton's
original. It just does 100,000,000 lock/unlocks for each type in
userspace:

https://github.com/mpe/lockcomparison/blob/master/lock_comparison.c

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote: > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote: > > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote: > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > into a full memory barrier. > > > > > > However: > > > > > > - This ordering guarantee is already provided without the barrier on > > > all architectures apart from PowerPC > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > RELEASE + ACQUIRE operations > > > > > > - Locks are generally assumed to offer SC ordering semantics, so > > > having this additional barrier is error-prone and complicates the > > > callers of LOCK/UNLOCK primitives > > > > > > - The barrier is not well used outside of RCU and, because it was > > > retrofitted into the kernel, it's not clear whether other areas of > > > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > > > barrier > > > > > > This patch removes the barrier and instead requires architectures to > > > provide full barrier semantics for an UNLOCK + LOCK sequence. > > > > > > Cc: Benjamin Herrenschmidt > > > Cc: Paul McKenney > > > Cc: Peter Zijlstra > > > Signed-off-by: Will Deacon > > > --- > > > > > > This didn't go anywhere last time I posted it, but here it is again. > > > I'd really appreciate some feedback from the PowerPC guys, especially as > > > to whether this change requires them to add an additional barrier in > > > arch_spin_unlock and what the cost of that would be. > > > > We'd have to turn the lwsync in unlock or the isync in lock into a full > > barrier. As it is, we *almost* have a full barrier semantic, but not > > quite, as in things can get mixed up inside spin_lock between the LL and > > the SC (things leaking in past LL and things leaking "out" up before SC > > and then getting mixed up in there). 
> > > > Michael, at some point you were experimenting a bit with that and tried > > to get some perf numbers of the impact that would have, did that > > solidify ? Otherwise, I'll have a look when I'm back next week. > > I was mainly experimenting with replacing the lwsync in lock with an isync. > > But I think you're talking about making it a full sync in lock. > > That was about +7% on p8, +25% on p7 and +88% on p6. Just for completeness, what were you running as benchmark? ;-) Thanx, Paul > We got stuck deciding whether isync was safe to use as a memory barrier, > because the wording in the arch is a bit vague. > > But if we're talking about a full sync then I think there is no question > that's > OK and we should just do it. > > cheers > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote: > Hi Paul, > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > > Given that RCU is currently the only user of this barrier, how > > > > > > > > would you > > > > > > > > feel about making the barrier local to RCU and not part of the > > > > > > > > general > > > > > > > > memory-barrier API? > > > > > > > > > > > > > > In theory, no objection. Your thought is to leave the > > > > > > > definitions where > > > > > > > they are, mark them as being used only by RCU, and removing > > > > > > > mention from > > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file > > > > > > with an > > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have > > > > > > a big > > > > > > comment describing the semantics, or put that in an RCU > > > > > > Documentation file > > > > > > instead of memory-barriers.txt. > > > > > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > > > barrier, > > > > > > because they'd need to send patches to either add something > > > > > > equivalent > > > > > > or move the definition out again. > > > > > > > > > > My concern with this approach is that someone putting together a new > > > > > architecture might miss this. That said, this approach certainly > > > > > would > > > > > work for the current architectures. 
> > > > > > > > I don't think they're any more likely to miss it than with the current > > > > situation where the generic code defines the macro as a NOP unless you > > > > explicitly override it. > > > > > > Fair enough... > > > > Like this? > > Precisely! Thanks for cooking the patch -- this lays all my worries to > rest, so: > > Acked-by: Will Deacon Thank you! > We should continue the discussion with Ben and Michael about whether or > not the PowerPC locking code can be strengthened, though (making this > barrier a NOP on all currently supported archs). Indeed -- if it becomes a NOP on all supported architectures, we might want to consider just removing it completely. Thanx, Paul > Will > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > Author: Paul E. McKenney > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > > likely the only thing that ever will use it, so this commit makes this > > macro private to RCU. > > > > Signed-off-by: Paul E. McKenney > > Cc: Will Deacon > > Cc: Peter Zijlstra > > Cc: Benjamin Herrenschmidt > > Cc: "linux-a...@vger.kernel.org" > > > > diff --git a/Documentation/memory-barriers.txt > > b/Documentation/memory-barriers.txt > > index 318523872db5..eafa6a53f72c 100644 > > --- a/Documentation/memory-barriers.txt > > +++ b/Documentation/memory-barriers.txt > > @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only > > from the perspective of > > another CPU not holding that lock. In short, a ACQUIRE followed by an > > RELEASE may -not- be assumed to be a full memory barrier. > > > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > > -pair to produce a full barrier, the ACQUIRE can be followed by an > > -smp_mb__after_unlock_lock() invocation. 
This will produce a full barrier > > -(including transitivity) if either (a) the RELEASE and the ACQUIRE are > > -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on > > -the same variable. The smp_mb__after_unlock_lock() primitive is free > > -on many architectures. Without smp_mb__after_unlock_lock(), the CPU's > > -execution of the critical sections corresponding to the RELEASE and the > > -ACQUIRE can cross, so that: > > +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does > > +not imply a full memory barrier. Therefore, the CPU's execution of the > > +critical sections corresponding to the RELEASE and the ACQUIRE can cross, > > +so that: > > > > *A = a; > > RELEASE M > > @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding > > the deadlock. > > a sleep-unlock race, but the locking primitive needs to resolve > > such races properly in any case. > > > > -With
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Paul, On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > Given that RCU is currently the only user of this barrier, how > > > > > > > would you > > > > > > > feel about making the barrier local to RCU and not part of the > > > > > > > general > > > > > > > memory-barrier API? > > > > > > > > > > > > In theory, no objection. Your thought is to leave the definitions > > > > > > where > > > > > > they are, mark them as being used only by RCU, and removing mention > > > > > > from > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file with > > > > > an > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a > > > > > big > > > > > comment describing the semantics, or put that in an RCU Documentation > > > > > file > > > > > instead of memory-barriers.txt. > > > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > > barrier, > > > > > because they'd need to send patches to either add something equivalent > > > > > or move the definition out again. > > > > > > > > My concern with this approach is that someone putting together a new > > > > architecture might miss this. That said, this approach certainly would > > > > work for the current architectures. > > > > > > I don't think they're any more likely to miss it than with the current > > > situation where the generic code defines the macro as a NOP unless you > > > explicitly override it. > > > > Fair enough... 
> > Like this? Precisely! Thanks for cooking the patch -- this lays all my worries to rest, so: Acked-by: Will Deacon We should continue the discussion with Ben and Michael about whether or not the PowerPC locking code can be strengthened, though (making this barrier a NOP on all currently supported archs). Will > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > Author: Paul E. McKenney > Date: Tue Jul 14 18:35:23 2015 -0700 > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > likely the only thing that ever will use it, so this commit makes this > macro private to RCU. > > Signed-off-by: Paul E. McKenney > Cc: Will Deacon > Cc: Peter Zijlstra > Cc: Benjamin Herrenschmidt > Cc: "linux-a...@vger.kernel.org" > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index 318523872db5..eafa6a53f72c 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from > the perspective of > another CPU not holding that lock. In short, a ACQUIRE followed by an > RELEASE may -not- be assumed to be a full memory barrier. > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > -pair to produce a full barrier, the ACQUIRE can be followed by an > -smp_mb__after_unlock_lock() invocation. This will produce a full barrier > -(including transitivity) if either (a) the RELEASE and the ACQUIRE are > -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on > -the same variable. The smp_mb__after_unlock_lock() primitive is free > -on many architectures. 
Without smp_mb__after_unlock_lock(), the CPU's > -execution of the critical sections corresponding to the RELEASE and the > -ACQUIRE can cross, so that: > +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does > +not imply a full memory barrier. Therefore, the CPU's execution of the > +critical sections corresponding to the RELEASE and the ACQUIRE can cross, > +so that: > > *A = a; > RELEASE M > @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding > the deadlock. > a sleep-unlock race, but the locking primitive needs to resolve > such races properly in any case. > > -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. > -For example, with the following code, the store to *A will always be > -seen by other CPUs before the store to *B: > - > - *A = a; > - RELEASE M > - ACQUIRE N > - smp_mb__after_unlock_lock(); > - *B = b; > - > -The operations will always occur in one of the following orders: > - > - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B > - STORE *A, ACQUIRE, RELEASE,
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Michael,

On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
> On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > This didn't go anywhere last time I posted it, but here it is again.
> > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > to whether this change requires them to add an additional barrier in
> > > arch_spin_unlock and what the cost of that would be.
> >
> > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > barrier. As it is, we *almost* have a full barrier semantic, but not
> > quite, as in things can get mixed up inside spin_lock between the LL and
> > the SC (things leaking in past LL and things leaking "out" up before SC
> > and then getting mixed up in there).
> >
> > Michael, at some point you were experimenting a bit with that and tried
> > to get some perf numbers of the impact that would have, did that
> > solidify ? Otherwise, I'll have a look when I'm back next week.
>
> I was mainly experimenting with replacing the lwsync in lock with an isync.
>
> But I think you're talking about making it a full sync in lock.
>
> That was about +7% on p8, +25% on p7 and +88% on p6.

Ok, so that's overhead incurred by moving from isync -> lwsync? The
numbers look quite large...

> We got stuck deciding whether isync was safe to use as a memory barrier,
> because the wording in the arch is a bit vague.
>
> But if we're talking about a full sync then I think there is no question
> that's OK and we should just do it.

Is this because there's a small overhead from lwsync -> sync? Just want
to make sure I follow your reasoning.

If you went the way of sync in unlock, could you remove the conditional
SYNC_IO stuff?

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote: > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > into a full memory barrier. > > > > However: > > > > - This ordering guarantee is already provided without the barrier on > > all architectures apart from PowerPC > > > > - The barrier only applies to UNLOCK + LOCK, not general > > RELEASE + ACQUIRE operations > > > > - Locks are generally assumed to offer SC ordering semantics, so > > having this additional barrier is error-prone and complicates the > > callers of LOCK/UNLOCK primitives > > > > - The barrier is not well used outside of RCU and, because it was > > retrofitted into the kernel, it's not clear whether other areas of > > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > > barrier > > > > This patch removes the barrier and instead requires architectures to > > provide full barrier semantics for an UNLOCK + LOCK sequence. > > > > Cc: Benjamin Herrenschmidt > > Cc: Paul McKenney > > Cc: Peter Zijlstra > > Signed-off-by: Will Deacon > > --- > > > > This didn't go anywhere last time I posted it, but here it is again. > > I'd really appreciate some feedback from the PowerPC guys, especially as > > to whether this change requires them to add an additional barrier in > > arch_spin_unlock and what the cost of that would be. > > We'd have to turn the lwsync in unlock or the isync in lock into a full > barrier. As it is, we *almost* have a full barrier semantic, but not > quite, as in things can get mixed up inside spin_lock between the LL and > the SC (things leaking in past LL and things leaking "out" up before SC > and then getting mixed up in there). > > Michael, at some point you were experimenting a bit with that and tried > to get some perf numbers of the impact that would have, did that > solidify ? Otherwise, I'll have a look when I'm back next week. 
I was mainly experimenting with replacing the lwsync in lock with an isync.

But I think you're talking about making it a full sync in lock.

That was about +7% on p8, +25% on p7 and +88% on p6.

We got stuck deciding whether isync was safe to use as a memory barrier,
because the wording in the arch is a bit vague.

But if we're talking about a full sync then I think there is no question
that's OK and we should just do it.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > Given that RCU is currently the only user of this barrier, how > > > > > > would you > > > > > > feel about making the barrier local to RCU and not part of the > > > > > > general > > > > > > memory-barrier API? > > > > > > > > > > In theory, no objection. Your thought is to leave the definitions > > > > > where > > > > > they are, mark them as being used only by RCU, and removing mention > > > > > from > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > Actually, I was thinking of defining them in an RCU header file with an > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a > > > > big > > > > comment describing the semantics, or put that in an RCU Documentation > > > > file > > > > instead of memory-barriers.txt. > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > barrier, > > > > because they'd need to send patches to either add something equivalent > > > > or move the definition out again. > > > > > > My concern with this approach is that someone putting together a new > > > architecture might miss this. That said, this approach certainly would > > > work for the current architectures. > > > > I don't think they're any more likely to miss it than with the current > > situation where the generic code defines the macro as a NOP unless you > > explicitly override it. > > Fair enough... Like this? Thanx, Paul commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 Author: Paul E. 
McKenney Date: Tue Jul 14 18:35:23 2015 -0700 rcu,locking: Privatize smp_mb__after_unlock_lock() RCU is the only thing that uses smp_mb__after_unlock_lock(), and is likely the only thing that ever will use it, so this commit makes this macro private to RCU. Signed-off-by: Paul E. McKenney Cc: Will Deacon Cc: Peter Zijlstra Cc: Benjamin Herrenschmidt Cc: "linux-a...@vger.kernel.org" diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 318523872db5..eafa6a53f72c 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of another CPU not holding that lock. In short, a ACQUIRE followed by an RELEASE may -not- be assumed to be a full memory barrier. -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE -pair to produce a full barrier, the ACQUIRE can be followed by an -smp_mb__after_unlock_lock() invocation. This will produce a full barrier -(including transitivity) if either (a) the RELEASE and the ACQUIRE are -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on -the same variable. The smp_mb__after_unlock_lock() primitive is free -on many architectures. Without smp_mb__after_unlock_lock(), the CPU's -execution of the critical sections corresponding to the RELEASE and the -ACQUIRE can cross, so that: +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does +not imply a full memory barrier. Therefore, the CPU's execution of the +critical sections corresponding to the RELEASE and the ACQUIRE can cross, +so that: *A = a; RELEASE M @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock. a sleep-unlock race, but the locking primitive needs to resolve such races properly in any case. -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. 
-For example, with the following code, the store to *A will always be -seen by other CPUs before the store to *B: - - *A = a; - RELEASE M - ACQUIRE N - smp_mb__after_unlock_lock(); - *B = b; - -The operations will always occur in one of the following orders: - - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B - STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B - ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B - -If the RELEASE and ACQUIRE were instead both operating on the same lock -variable, only the first of these alternatives can occur. In addition, -the more strongly ordered systems may rule out some of the above orders. -But in any case, as noted earlier, the smp_mb__after_unlock_lock() -ensures that the store to *A w
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > Given that RCU is currently the only user of this barrier, how would > > > > > you > > > > > feel about making the barrier local to RCU and not part of the general > > > > > memory-barrier API? > > > > > > > > In theory, no objection. Your thought is to leave the definitions where > > > > they are, mark them as being used only by RCU, and removing mention from > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > Actually, I was thinking of defining them in an RCU header file with an > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a big > > > comment describing the semantics, or put that in an RCU Documentation file > > > instead of memory-barriers.txt. > > > > > > That *should* then mean we notice anybody else trying to use the barrier, > > > because they'd need to send patches to either add something equivalent > > > or move the definition out again. > > > > My concern with this approach is that someone putting together a new > > architecture might miss this. That said, this approach certainly would > > work for the current architectures. > > I don't think they're any more likely to miss it than with the current > situation where the generic code defines the macro as a NOP unless you > explicitly override it. Fair enough... Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > Given that RCU is currently the only user of this barrier, how would
> > > > you feel about making the barrier local to RCU and not part of the
> > > > general memory-barrier API?
> > >
> > > In theory, no objection. Your thought is to leave the definitions where
> > > they are, mark them as being used only by RCU, and removing mention from
> > > memory-barriers.txt? Or did you have something else in mind?
> >
> > Actually, I was thinking of defining them in an RCU header file with an
> > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > comment describing the semantics, or put that in an RCU Documentation file
> > instead of memory-barriers.txt.
> >
> > That *should* then mean we notice anybody else trying to use the barrier,
> > because they'd need to send patches to either add something equivalent
> > or move the definition out again.
>
> My concern with this approach is that someone putting together a new
> architecture might miss this. That said, this approach certainly would
> work for the current architectures.

I don't think they're any more likely to miss it than with the current
situation where the generic code defines the macro as a NOP unless you
explicitly override it.

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > Hi Paul, > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > Given that RCU is currently the only user of this barrier, how would you > > > feel about making the barrier local to RCU and not part of the general > > > memory-barrier API? > > > > In theory, no objection. Your thought is to leave the definitions where > > they are, mark them as being used only by RCU, and removing mention from > > memory-barriers.txt? Or did you have something else in mind? > > Actually, I was thinking of defining them in an RCU header file with an > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a big > comment describing the semantics, or put that in an RCU Documentation file > instead of memory-barriers.txt. > > That *should* then mean we notice anybody else trying to use the barrier, > because they'd need to send patches to either add something equivalent > or move the definition out again. My concern with this approach is that someone putting together a new architecture might miss this. That said, this approach certainly would work for the current architectures. > > > My main reason for proposing its removal is because I don't want to see > > > it being used (incorrectly) all over the place to order the new RELEASE > > > and ACQUIRE operations I posted separately, at which point we have to try > > > fixing up all the callers or retrofitting some semantics. It doesn't help > > > that memory-barriers.txt lumps things like LOCK and ACQUIRE together, > > > whereas this barrier is currently only intended to be used in conjunction > > > with the former. > > > > Heh! That lumping was considered to be a feature at the time. ;-) > > Oh, I'm sure it was added with good intentions! And we all know which road is paved with good intentions! 
;-) Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Paul,

On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > Given that RCU is currently the only user of this barrier, how would you
> > feel about making the barrier local to RCU and not part of the general
> > memory-barrier API?
>
> In theory, no objection. Your thought is to leave the definitions where
> they are, mark them as being used only by RCU, and removing mention from
> memory-barriers.txt? Or did you have something else in mind?

Actually, I was thinking of defining them in an RCU header file with an
#ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
comment describing the semantics, or put that in an RCU Documentation file
instead of memory-barriers.txt.

That *should* then mean we notice anybody else trying to use the barrier,
because they'd need to send patches to either add something equivalent
or move the definition out again.

> > My main reason for proposing its removal is because I don't want to see
> > it being used (incorrectly) all over the place to order the new RELEASE
> > and ACQUIRE operations I posted separately, at which point we have to try
> > fixing up all the callers or retrofitting some semantics. It doesn't help
> > that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> > whereas this barrier is currently only intended to be used in conjunction
> > with the former.
>
> Heh! That lumping was considered to be a feature at the time. ;-)

Oh, I'm sure it was added with good intentions!

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > On Tue, Jul 14, 2015 at 12:04:06AM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote: > > > If we look at the inside of the critical section again -- similar > > > argument as before: > > > > > > *A = a > > > smp_mb() > > > store M > > > load N > > > smp_mb() > > > *B = b > > > > > > A and B are fully ordered, and in this case even transitivity is > > > provided. > > > > > > I'm stating that the order of M and N don't matter, only the > > > load/stores that are inside the acquire/release are constrained. > > > > No argument here. > > > > > IOW, I think smp_mb__after_unlock_lock() already works as advertised > > > with all our acquire/release thingies -- as is stated by the > > > documentation. > > > > > > That said, I'm not aware of anybody but RCU actually using this, so its > > > not used in that capacity. > > > > OK, I might actually understand what you are getting at. And, yes, if > > someone actually comes up with a need to combine smp_mb__after_unlock_lock() > > with something other than locking, we should worry about it at that point. > > And probably rename smp_mb__after_unlock_lock() at that point, as well. > > Until then, why lock ourselves into semantics that no one needs, and > > that it is quite possible that no one will ever need? > > Given that RCU is currently the only user of this barrier, how would you > feel about making the barrier local to RCU and not part of the general > memory-barrier API? In theory, no objection. Your thought is to leave the definitions where they are, mark them as being used only by RCU, and removing mention from memory-barriers.txt? Or did you have something else in mind? 
> My main reason for proposing its removal is because I don't want to see
> it being used (incorrectly) all over the place to order the new RELEASE
> and ACQUIRE operations I posted separately, at which point we have to try
> fixing up all the callers or retrofitting some semantics. It doesn't help
> that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> whereas this barrier is currently only intended to be used in conjunction
> with the former.

Heh! That lumping was considered to be a feature at the time. ;-)

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 11:31:29PM +0100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > This didn't go anywhere last time I posted it, but here it is again.
> > I'd really appreciate some feedback from the PowerPC guys, especially as
> > to whether this change requires them to add an additional barrier in
> > arch_spin_unlock and what the cost of that would be.
>
> We'd have to turn the lwsync in unlock or the isync in lock into a full
> barrier. As it is, we *almost* have a full barrier semantic, but not
> quite, as in things can get mixed up inside spin_lock between the LL and
> the SC (things leaking in past LL and things leaking "out" up before SC
> and then getting mixed up in there).

Thanks, Ben.

> Michael, at some point you were experimenting a bit with that and tried
> to get some perf numbers of the impact that would have, did that
> solidify ? Otherwise, I'll have a look when I'm back next week.

These numbers would be really interesting...

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:04:06AM +0100, Paul E. McKenney wrote: > On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote: > > If we look at the inside of the critical section again -- similar > > argument as before: > > > > *A = a > > smp_mb() > > store M > > load N > > smp_mb() > > *B = b > > > > A and B are fully ordered, and in this case even transitivity is > > provided. > > > > I'm stating that the order of M and N don't matter, only the > > load/stores that are inside the acquire/release are constrained. > > No argument here. > > > IOW, I think smp_mb__after_unlock_lock() already works as advertised > > with all our acquire/release thingies -- as is stated by the > > documentation. > > > > That said, I'm not aware of anybody but RCU actually using this, so its > > not used in that capacity. > > OK, I might actually understand what you are getting at. And, yes, if > someone actually comes up with a need to combine smp_mb__after_unlock_lock() > with something other than locking, we should worry about it at that point. > And probably rename smp_mb__after_unlock_lock() at that point, as well. > Until then, why lock ourselves into semantics that no one needs, and > that it is quite possible that no one will ever need? Given that RCU is currently the only user of this barrier, how would you feel about making the barrier local to RCU and not part of the general memory-barrier API? My main reason for proposing its removal is because I don't want to see it being used (incorrectly) all over the place to order the new RELEASE and ACQUIRE operations I posted separately, at which point we have to try fixing up all the callers or retrofitting some semantics. It doesn't help that memory-barriers.txt lumps things like LOCK and ACQUIRE together, whereas this barrier is currently only intended to be used in conjunction with the former. 
Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 08:43:44AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-07-14 at 00:15 +0200, Peter Zijlstra wrote:
> > > This is instead the sequence that is of concern:
> > >
> > > store a
> > > unlock M
> > > lock N
> > > load b
> >
> > So its late and that table didn't parse, but that should be ordered too.
> > The load of b should not be able to escape the lock N.
> >
> > If only because LWSYNC is a valid RMB and any LOCK implementation must
> > load the lock state to observe it unlocked.
>
> What happens is that the load passes the store conditional, though it
> doesn't pass the load with reserve. However, both store A and unlock M
> being just stores with an lwsync, can pass a load, so they can pass the
> load with reserve. And thus inside the LL/SC loop, our store A has
> passed our load B.

Ah cute.. Thanks, clearly I wasn't awake enough anymore :-)
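[Editor's note] Ben's scenario is easier to follow against a sketch of the PowerPC sequences involved. The pseudo-assembly below is illustrative only; register names are invented and the busy-wait check of the real kernel lock is elided:

```
# spin_unlock(M), sketch:
        lwsync                  # orders "store a" before the store to M,
        stw     r0, 0(rM)       # but lwsync never orders store -> load

# spin_lock(N), sketch:
1:      lwarx   r9, 0, rN       # load-reserve the lock word
        stwcx.  r1, 0, rN       # conditional store to take the lock
        bne-    1b              # retry if the reservation was lost
        isync                   # execution barrier, not a memory barrier
```

Because lwsync does not order stores against later loads, the earlier stores ("store a" and the store unlocking M) can drift down past the lwarx while "load b" has already been satisfied; that is exactly the mixing inside the LL/SC loop that Ben describes.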
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:20:32PM -0700, Paul E. McKenney wrote: > > On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote: > > > > So if I'm following along, smp_mb__after_unlock_lock *does* provide > > > transitivity when used with UNLOCK + LOCK, which is stronger than your > > > example here. > > > > Yes, that is indeed the intent. > > Maybe good to state this explicitly somewhere. Fair enough! Please see patch below. > > > I don't think we want to make the same guarantee for general RELEASE + > > > ACQUIRE, because we'd end up forcing most architectures to implement the > > > expensive macro for a case that currently has no users. > > > > Agreed, smp_mb__after_unlock_lock() makes a limited guarantee. > > I'm still not seeing how the archs that implement load_acquire and > store_release with smp_mb() are a problem. I don't know that I ever claimed such architectures to be a problem. Color me confused. > If we look at the inside of the critical section again -- similar > argument as before: > > *A = a > smp_mb() > store M > load N > smp_mb() > *B = b > > A and B are fully ordered, and in this case even transitivity is > provided. > > I'm stating that the order of M and N don't matter, only the > load/stores that are inside the acquire/release are constrained. No argument here. > IOW, I think smp_mb__after_unlock_lock() already works as advertised > with all our acquire/release thingies -- as is stated by the > documentation. > > That said, I'm not aware of anybody but RCU actually using this, so its > not used in that capacity. OK, I might actually understand what you are getting at. And, yes, if someone actually comes up with a need to combine smp_mb__after_unlock_lock() with something other than locking, we should worry about it at that point. And probably rename smp_mb__after_unlock_lock() at that point, as well. 
Until then, why lock ourselves into semantics that no one needs, and that it is quite possible that no one will ever need? > > > In which case, it boils down to the question of how expensive it would > > > be to implement an SC UNLOCK operation on PowerPC and whether that > > > justifies > > > the existence of a complicated barrier macro that isn't used outside of > > > RCU. > > > > Given that it is either smp_mb() or nothing, I am not seeing the > > "complicated" part... > > The 'complicated' part is that we need think about it; that is we need > to realized and remember that UNLOCK+LOCK is a load-store barrier but > fails to provide transitivity. ... unless you are holding the lock. So in the common case, you do get transitivity. Thanx, Paul commit bae5cf1e2973bb1e8f852abb54f8b1948ffd82e4 Author: Paul E. McKenney Date: Mon Jul 13 15:55:52 2015 -0700 doc: Call out smp_mb__after_unlock_lock() transitivity Although "full barrier" should be interpreted as providing transitivity, it is worth eliminating any possible confusion. This commit therefore adds "(including transitivity)" to eliminate any possible confusion. Reported-by: Peter Zijlstra Signed-off-by: Paul E. McKenney diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 470c07c868e4..318523872db5 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1858,11 +1858,12 @@ Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This will produce a full barrier -if either (a) the RELEASE and the ACQUIRE are executed by the same -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. -The smp_mb__after_unlock_lock() primitive is free on many architectures. 
-Without smp_mb__after_unlock_lock(), the CPU's execution of the critical -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: +(including transitivity) if either (a) the RELEASE and the ACQUIRE are +executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on +the same variable. The smp_mb__after_unlock_lock() primitive is free +on many architectures. Without smp_mb__after_unlock_lock(), the CPU's +execution of the critical sections corresponding to the RELEASE and the +ACQUIRE can cross, so that: *A = a; RELEASE M -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:15:03AM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:16:42PM -0700, Paul E. McKenney wrote: > > On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote: > > > > Does that answer the question, or am I missing the point? > > > > > > Yes, it shows that smp_mb__after_unlock_lock() has no purpose, since it > > > is defined only for PowerPC and your test above just showed that for > > > the sequence > > The only purpose is to provide transitivity, but the documentation fails > to explicitly call that out. It does say that it is a full barrier, but I added explicit mention of transitivity. > > > > > > store a > > > UNLOCK M > > > LOCK N > > > store b > > > > > > a and b is always observed as an ordered pair {a,b}. > > > > Not quite. > > > > This is instead the sequence that is of concern: > > > > store a > > unlock M > > lock N > > load b > > So its late and that table didn't parse, but that should be ordered too. > The load of b should not be able to escape the lock N. > > If only because LWSYNC is a valid RMB and any LOCK implementation must > load the lock state to observe it unlocked. If you actually hold a given lock, then yes, you will observe anything previously done while holding that same lock, even if you don't use smp_mb__after_unlock_lock(). The smp_mb__after_unlock_lock() comes into play when code not holding a lock needs to see the ordering. RCU needs this because of the strong ordering that grace periods must provide: regardless of who started or ended the grace period, anything on any CPU preceding a given grace period is fully ordered before anything on any CPU following that same grace period. It is not clear to me that anything else would need such strong ordering. > > > Additionally, the assertion in Documentation/memory_barriers.txt that > > > the sequence above can be reordered as > > > > > > LOCK N > > > store b > > > store a > > > UNLOCK M > > > > > > is not true on any existing arch in Linux. 
> > It was at one time and might be again.
>
> What would be required to make this true? I'm having a hard time seeing
> how things can get reordered like that.

You are right, I failed to merge current and past knowledge. At one time,
Itanium was said to allow things to bleed into lock-based critical
sections. However, we now know that ld,acq and st,rel really do full
ordering.

Compilers might one day do this sort of reordering, but I would guess
that Linux kernel builds would disable this sort of thing. Something
about wanting critical sections to remain small.

Thanx, Paul
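[Editor's note] The strong grace-period ordering Paul describes is a transitivity guarantee. A hypothetical three-CPU litmus sketch (variable and lock names invented for illustration, in the style of the sequences quoted elsewhere in this thread) shows its shape:

```
CPU 0           CPU 1                   CPU 2
-----           -----                   -----
x = 1;          lock(M);                lock(M);
                r1 = x;  /* reads 1 */  smp_mb__after_unlock_lock();
                unlock(M);              r2 = x;
                                        unlock(M);

With the barrier: r1 == 1 && r2 == 0 is forbidden.
```

CPU 2 holds the lock CPU 1 released, so it sees everything done inside CPU 1's critical section in any case; what the full barrier adds is that CPU 2 must also inherit CPU 1's observation of CPU 0's unrelated store, which is the ordering a grace period must provide to CPUs that never touch the lock chain themselves.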
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-07-14 at 00:15 +0200, Peter Zijlstra wrote:
> > This is instead the sequence that is of concern:
> >
> > store a
> > unlock M
> > lock N
> > load b
>
> So its late and that table didn't parse, but that should be ordered too.
> The load of b should not be able to escape the lock N.
>
> If only because LWSYNC is a valid RMB and any LOCK implementation must
> load the lock state to observe it unlocked.

What happens is that the load passes the store conditional, though it
doesn't pass the load with reserve. However, both store A and unlock M
being just stores with an lwsync, can pass a load, so they can pass the
load with reserve. And thus inside the LL/SC loop, our store A has
passed our load B.

> > > Additionally, the assertion in Documentation/memory_barriers.txt that
> > > the sequence above can be reordered as
> > >
> > > LOCK N
> > > store b
> > > store a
> > > UNLOCK M
> > >
> > > is not true on any existing arch in Linux.
> >
> > It was at one time and might be again.
>
> What would be required to make this true? I'm having a hard time seeing
> how things can get reordered like that.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-07-13 at 17:54 +0200, Peter Zijlstra wrote:
> That said, I don't think this could even happen on PPC because we have
> load_acquire and store_release, this means that:
>
> *A = a
> lwsync
> store_release M
> load_acquire N
> lwsync
> *B = b
>
> And since the store to M is wrapped inside two lwsync there must be
> strong store order, and because the load from N is equally wrapped in
> two lwsyncs there must also be strong load order.
>
> In fact, no store/load can cross from before the first lwsync to after
> the latter and the other way around.
>
> So in that respect it does provide full load-store ordering. What it
> does not provide is order for M and N, nor does it provide transitivity,
> but looking at our documentation I'm not at all sure we guarantee that
> in any case.

The problem is if you have a load after the second lwsync, that load can
go up past the store release. This has caused issues when N or M is what
you are trying to order against. That's why we had to add a sync to
spin_is_locked or similar.

Ben.
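[Editor's note] Ben's hazard can be annotated directly on Peter's sequence. This is an illustrative pseudocode sketch, restating that lwsync orders store->store, load->load and load->store, but never store->load:

```
*A = a
lwsync
store_release M     /* e.g. the lwsync + store of spin_unlock(M) */
load_acquire N
lwsync
r = load X          /* X stands for "N or M is what you are trying
                       to order against", e.g. another lock word */

/* The second lwsync does not order the store to M against the later
 * load of X (store -> load is the one pairing lwsync omits), so the
 * load of X can be satisfied before the store to M becomes visible.
 * Making that ordering visible requires a full sync, as was done for
 * spin_is_locked(). */
```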
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote: > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > into a full memory barrier. > > However: > > - This ordering guarantee is already provided without the barrier on > all architectures apart from PowerPC > > - The barrier only applies to UNLOCK + LOCK, not general > RELEASE + ACQUIRE operations > > - Locks are generally assumed to offer SC ordering semantics, so > having this additional barrier is error-prone and complicates the > callers of LOCK/UNLOCK primitives > > - The barrier is not well used outside of RCU and, because it was > retrofitted into the kernel, it's not clear whether other areas of > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > barrier > > This patch removes the barrier and instead requires architectures to > provide full barrier semantics for an UNLOCK + LOCK sequence. > > Cc: Benjamin Herrenschmidt > Cc: Paul McKenney > Cc: Peter Zijlstra > Signed-off-by: Will Deacon > --- > > This didn't go anywhere last time I posted it, but here it is again. > I'd really appreciate some feedback from the PowerPC guys, especially as > to whether this change requires them to add an additional barrier in > arch_spin_unlock and what the cost of that would be. We'd have to turn the lwsync in unlock or the isync in lock into a full barrier. As it is, we *almost* have a full barrier semantic, but not quite, as in things can get mixed up inside spin_lock between the LL and the SC (things leaking in past LL and things leaking "out" up before SC and then getting mixed up in there). Michael, at some point you were experimenting a bit with that and tried to get some perf numbers of the impact that would have, did that solidify ? Otherwise, I'll have a look when I'm back next week. Ben. 
> Documentation/memory-barriers.txt | 77 > ++--- > arch/powerpc/include/asm/spinlock.h | 2 - > include/linux/spinlock.h| 10 - > kernel/locking/mcs_spinlock.h | 9 - > kernel/rcu/tree.c | 21 +- > kernel/rcu/tree_plugin.h| 11 -- > 6 files changed, 4 insertions(+), 126 deletions(-) > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index 13feb697271f..fff21b632893 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from > the perspective of > another CPU not holding that lock. In short, a ACQUIRE followed by an > RELEASE may -not- be assumed to be a full memory barrier. > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > -pair to produce a full barrier, the ACQUIRE can be followed by an > -smp_mb__after_unlock_lock() invocation. This will produce a full barrier > -if either (a) the RELEASE and the ACQUIRE are executed by the same > -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. > -The smp_mb__after_unlock_lock() primitive is free on many architectures. > -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical > -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: > - > - *A = a; > - RELEASE M > - ACQUIRE N > - *B = b; > - > -could occur as: > - > - ACQUIRE N, STORE *B, STORE *A, RELEASE M > - > -It might appear that this reordering could introduce a deadlock. > -However, this cannot happen because if such a deadlock threatened, > -the RELEASE would simply complete, thereby avoiding the deadlock. > - > - Why does this work? > - > - One key point is that we are only talking about the CPU doing > - the reordering, not the compiler. If the compiler (or, for > - that matter, the developer) switched the operations, deadlock > - -could- occur. 
> - > - But suppose the CPU reordered the operations. In this case, > - the unlock precedes the lock in the assembly code. The CPU > - simply elected to try executing the later lock operation first. > - If there is a deadlock, this lock operation will simply spin (or > - try to sleep, but more on that later). The CPU will eventually > - execute the unlock operation (which preceded the lock operation > - in the assembly code), which will unravel the potential deadlock, > - allowing the lock operation to succeed. > - > - But what if the lock is a sleeplock? In that case, the code will > - try to enter the scheduler, where it will eventually encounter > - a memory barrier, which will force the earlier unlock operation > - to complete, again unraveling the deadlock. There might be > - a sleep-unlock race, but the locking primitive needs to resolve > - such races properly in any case. > - > -With smp_mb__after_unlock_lock()
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 01:20:32PM -0700, Paul E. McKenney wrote: > On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote: > > So if I'm following along, smp_mb__after_unlock_lock *does* provide > > transitivity when used with UNLOCK + LOCK, which is stronger than your > > example here. > > Yes, that is indeed the intent. Maybe good to state this explicitly somewhere. > > I don't think we want to make the same guarantee for general RELEASE + > > ACQUIRE, because we'd end up forcing most architectures to implement the > > expensive macro for a case that currently has no users. > > Agreed, smp_mb__after_unlock_lock() makes a limited guarantee. I'm still not seeing how the archs that implement load_acquire and store_release with smp_mb() are a problem. If we look at the inside of the critical section again -- similar argument as before: *A = a smp_mb() store M load N smp_mb() *B = b A and B are fully ordered, and in this case even transitivity is provided. I'm stating that the order of M and N don't matter, only the load/stores that are inside the acquire/release are constrained. IOW, I think smp_mb__after_unlock_lock() already works as advertised with all our acquire/release thingies -- as is stated by the documentation. That said, I'm not aware of anybody but RCU actually using this, so its not used in that capacity. > > In which case, it boils down to the question of how expensive it would > > be to implement an SC UNLOCK operation on PowerPC and whether that justifies > > the existence of a complicated barrier macro that isn't used outside of > > RCU. > > Given that it is either smp_mb() or nothing, I am not seeing the > "complicated" part... The 'complicated' part is that we need think about it; that is we need to realized and remember that UNLOCK+LOCK is a load-store barrier but fails to provide transitivity. 
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 01:16:42PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote:
> > > Does that answer the question, or am I missing the point?
> >
> > Yes, it shows that smp_mb__after_unlock_lock() has no purpose, since it
> > is defined only for PowerPC and your test above just showed that for
> > the sequence

The only purpose is to provide transitivity, but the documentation fails
to explicitly call that out.

> > store a
> > UNLOCK M
> > LOCK N
> > store b
> >
> > a and b is always observed as an ordered pair {a,b}.
>
> Not quite.
>
> This is instead the sequence that is of concern:
>
> store a
> unlock M
> lock N
> load b

So it's late and that table didn't parse, but that should be ordered too.
The load of b should not be able to escape the lock N.

If only because LWSYNC is a valid RMB and any LOCK implementation must
load the lock state to observe it unlocked.

> > Additionally, the assertion in Documentation/memory-barriers.txt that
> > the sequence above can be reordered as
> >
> > LOCK N
> > store b
> > store a
> > UNLOCK M
> >
> > is not true on any existing arch in Linux.
>
> It was at one time and might be again.

What would be required to make this true? I'm having a hard time seeing
how things can get reordered like that.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 04:54:47PM +0100, Peter Zijlstra wrote: > > However I think we should look at the insides of the critical sections; > > for example (from Documentation/memory-barriers.txt): > > > > " *A = a; > > RELEASE M > > ACQUIRE N > > *B = b; > > > > could occur as: > > > > ACQUIRE N, STORE *B, STORE *A, RELEASE M" > > > > This could not in fact happen, even though we could flip M and N, A and > > B will remain strongly ordered. > > > > That said, I don't think this could even happen on PPC because we have > > load_acquire and store_release, this means that: > > > > *A = a > > lwsync > > store_release M > > load_acquire N > > lwsync > > *B = b > > > > And since the store to M is wrapped inside two lwsync there must be > > strong store order, and because the load from N is equally wrapped in > > two lwsyncs there must also be strong load order. > > > > In fact, no store/load can cross from before the first lwsync to after > > the latter and the other way around. > > > > So in that respect it does provide full load-store ordering. What it > > does not provide is order for M and N, nor does it provide transitivity, > > but looking at our documentation I'm not at all sure we guarantee that > > in any case. > > So if I'm following along, smp_mb__after_unlock_lock *does* provide > transitivity when used with UNLOCK + LOCK, which is stronger than your > example here. Yes, that is indeed the intent. > I don't think we want to make the same guarantee for general RELEASE + > ACQUIRE, because we'd end up forcing most architectures to implement the > expensive macro for a case that currently has no users. Agreed, smp_mb__after_unlock_lock() makes a limited guarantee. > In which case, it boils down to the question of how expensive it would > be to implement an SC UNLOCK operation on PowerPC and whether that justifies > the existence of a complicated barrier macro that isn't used outside of > RCU. 
Given that it is either smp_mb() or nothing, I am not seeing the "complicated" part... Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
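For context, Paul's "either smp_mb() or nothing" refers to how the macro was wired up at the time. Roughly (a sketch paraphrased from the powerpc and generic headers of that era, not verbatim kernel source):

```c
/*
 * Sketch, paraphrased: on powerpc, whose lwsync-based lock/unlock is
 * too weak to make UNLOCK + LOCK a full barrier, the macro is a real
 * barrier (arch/powerpc/include/asm/spinlock.h):
 */
#define smp_mb__after_unlock_lock()	smp_mb()

/*
 * Every other architecture already orders UNLOCK + LOCK fully, so the
 * generic fallback (include/linux/spinlock.h) is a no-op:
 */
#ifndef smp_mb__after_unlock_lock
#define smp_mb__after_unlock_lock()	do { } while (0)
#endif
```

That binary choice is what Paul means: removing the macro costs PowerPC a heavier barrier somewhere, and costs everyone else nothing.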
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote: > On 07/13/2015 02:23 PM, Paul E. McKenney wrote: > > On Mon, Jul 13, 2015 at 05:54:47PM +0200, Peter Zijlstra wrote: > >> On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: > >>> On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > >> smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > >> into a full memory barrier. > >> > >> However: > > > >> - The barrier only applies to UNLOCK + LOCK, not general > >> RELEASE + ACQUIRE operations > > > > No it does too; note that on ppc both acquire and release use lwsync and > > two lwsyncs do not make a sync. > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > barrier on all architectures implementing smp_store_release as smp_mb() + > STORE, otherwise the following isn't ordered: > > RELEASE X > smp_mb__after_unlock_lock() > ACQUIRE Y > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. > >>> > >>> I knew we'd had this conversation before ;) > >>> > >>> > >>> http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net > >> > >> Ha! yes. And I had indeed forgotten about this argument. > >> > >> However I think we should look at the insides of the critical sections; > >> for example (from Documentation/memory-barriers.txt): > >> > >> " *A = a; > >> RELEASE M > >> ACQUIRE N > >> *B = b; > >> > >> could occur as: > >> > >> ACQUIRE N, STORE *B, STORE *A, RELEASE M" > >> > >> This could not in fact happen, even though we could flip M and N, A and > >> B will remain strongly ordered. 
> >> > >> That said, I don't think this could even happen on PPC because we have > >> load_acquire and store_release, this means that: > >> > >>*A = a > >>lwsync > >>store_release M > >>load_acquire N > >>lwsync > > > > Presumably the lwsync instructions are part of the store_release and > > load_acquire? > > > >>*B = b > >> > >> And since the store to M is wrapped inside two lwsync there must be > >> strong store order, and because the load from N is equally wrapped in > >> two lwsyncs there must also be strong load order. > >> > >> In fact, no store/load can cross from before the first lwsync to after > >> the latter and the other way around. > >> > >> So in that respect it does provide full load-store ordering. What it > >> does not provide is order for M and N, nor does it provide transitivity, > >> but looking at our documentation I'm not at all sure we guarantee that > >> in any case. > > > > I have no idea what the other thread is doing, so I put together the > > following litmus test, guessing reverse order, inverse operations, > > and full ordering: > > > > PPC peterz.2015.07.13a > > "" > > { > > 0:r1=1; 0:r2=a; 0:r3=b; 0:r4=m; 0:r5=n; > > 1:r1=1; 1:r2=a; 1:r3=b; 1:r4=m; 1:r5=n; > > } > > P0| P1; > > stw r1,0(r2) | lwz r10,0(r3) ; > > lwsync| sync ; > > stw r1,0(r4) | stw r1,0(r5) ; > > lwz r10,0(r5) | sync ; > > lwsync| lwz r11,0(r4) ; > > stw r1,0(r3) | sync ; > >| lwz r12,0(r2) ; > > exists > > (0:r10=0 /\ 1:r10=1 /\ 1:r11=1 /\ 1:r12=1) > > > > See http://lwn.net/Articles/608550/ and http://lwn.net/Articles/470681/ > > for information on tools that operate on these litmus tests. (Both > > the herd and ppcmem tools agree, as is usually the case.) 
> > > > Of the 16 possible combinations of values loaded, the following seven > > can happen: > > > > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=0; > > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=1; > > 0:r10=0; 1:r10=0; 1:r11=1; 1:r12=1; > > 0:r10=0; 1:r10=1; 1:r11=1; 1:r12=1; > > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=0; > > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=1; > > 0:r10=1; 1:r10=0; 1:r11=1; 1:r12=1; > > > > P0's store to "m" and load from "n" can clearly be misordered, as there > > is nothing to order them. And all four possible outcomes for 0:r10 and > > 1:r11 are seen, as expected. > > > > Given that smp_store_release() is only guaranteed to order against prior > > operations and smp_load_acquire() is only guaranteed to order against > > subsequent operations, P0's load from "n" can be misordered with its > > store to "a", and as expected, all four possible outcomes for 0:r10 and > > 1:r12 are observed. > > > > P0's pairs of stores should all be ordered: > > > > o "a" and "m" -> 1:r11=1 and 1:r12=0 cannot happen, as expected. > > > > o "a" and "b" -> 1:r10=1 and 1:r12=0 cannot happen, as expected. > > > > o "m" and "b" -> 1:r10=1 and 1:r11=0 cannot happen, as expected. > > > > So smp_load_a
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On 07/13/2015 02:23 PM, Paul E. McKenney wrote: > On Mon, Jul 13, 2015 at 05:54:47PM +0200, Peter Zijlstra wrote: >> On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: >>> On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: >> smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence >> into a full memory barrier. >> >> However: > >> - The barrier only applies to UNLOCK + LOCK, not general >> RELEASE + ACQUIRE operations > > No it does too; note that on ppc both acquire and release use lwsync and > two lwsyncs do not make a sync. Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full barrier on all architectures implementing smp_store_release as smp_mb() + STORE, otherwise the following isn't ordered: RELEASE X smp_mb__after_unlock_lock() ACQUIRE Y On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. >>> >>> I knew we'd had this conversation before ;) >>> >>> >>> http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net >> >> Ha! yes. And I had indeed forgotten about this argument. >> >> However I think we should look at the insides of the critical sections; >> for example (from Documentation/memory-barriers.txt): >> >> " *A = a; >> RELEASE M >> ACQUIRE N >> *B = b; >> >> could occur as: >> >> ACQUIRE N, STORE *B, STORE *A, RELEASE M" >> >> This could not in fact happen, even though we could flip M and N, A and >> B will remain strongly ordered. >> >> That said, I don't think this could even happen on PPC because we have >> load_acquire and store_release, this means that: >> >> *A = a >> lwsync >> store_release M >> load_acquire N >> lwsync > > Presumably the lwsync instructions are part of the store_release and > load_acquire? 
> >> *B = b >> >> And since the store to M is wrapped inside two lwsync there must be >> strong store order, and because the load from N is equally wrapped in >> two lwsyncs there must also be strong load order. >> >> In fact, no store/load can cross from before the first lwsync to after >> the latter and the other way around. >> >> So in that respect it does provide full load-store ordering. What it >> does not provide is order for M and N, nor does it provide transitivity, >> but looking at our documentation I'm not at all sure we guarantee that >> in any case. > > I have no idea what the other thread is doing, so I put together the > following litmus test, guessing reverse order, inverse operations, > and full ordering: > > PPC peterz.2015.07.13a > "" > { > 0:r1=1; 0:r2=a; 0:r3=b; 0:r4=m; 0:r5=n; > 1:r1=1; 1:r2=a; 1:r3=b; 1:r4=m; 1:r5=n; > } >P0| P1; >stw r1,0(r2) | lwz r10,0(r3) ; >lwsync| sync ; >stw r1,0(r4) | stw r1,0(r5) ; >lwz r10,0(r5) | sync ; >lwsync| lwz r11,0(r4) ; >stw r1,0(r3) | sync ; > | lwz r12,0(r2) ; > exists > (0:r10=0 /\ 1:r10=1 /\ 1:r11=1 /\ 1:r12=1) > > See http://lwn.net/Articles/608550/ and http://lwn.net/Articles/470681/ > for information on tools that operate on these litmus tests. (Both > the herd and ppcmem tools agree, as is usually the case.) > > Of the 16 possible combinations of values loaded, the following seven > can happen: > > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=0; > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=1; > 0:r10=0; 1:r10=0; 1:r11=1; 1:r12=1; > 0:r10=0; 1:r10=1; 1:r11=1; 1:r12=1; > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=0; > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=1; > 0:r10=1; 1:r10=0; 1:r11=1; 1:r12=1; > > P0's store to "m" and load from "n" can clearly be misordered, as there > is nothing to order them. And all four possible outcomes for 0:r10 and > 1:r11 are seen, as expected. 
> Given that smp_store_release() is only guaranteed to order against prior
> operations and smp_load_acquire() is only guaranteed to order against
> subsequent operations, P0's load from "n" can be misordered with its
> store to "a", and as expected, all four possible outcomes for 0:r10 and
> 1:r12 are observed.
>
> P0's pairs of stores should all be ordered:
>
> o "a" and "m" -> 1:r11=1 and 1:r12=0 cannot happen, as expected.
>
> o "a" and "b" -> 1:r10=1 and 1:r12=0 cannot happen, as expected.
>
> o "m" and "b" -> 1:r10=1 and 1:r11=0 cannot happen, as expected.
>
> So smp_load_acquire() orders against all subsequent operations, but not
> necessarily against any prior ones, and smp_store_release() orders against
> all prior operations but not necessarily against any subsequent ones.
> But additional stray orderings are permitted, as is the case here.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 05:54:47PM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: > > On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > > > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > > > into a full memory barrier. > > > > > > > > > > However: > > > > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > > > RELEASE + ACQUIRE operations > > > > > > > > No it does too; note that on ppc both acquire and release use lwsync and > > > > two lwsyncs do not make a sync. > > > > > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > > > barrier on all architectures implementing smp_store_release as smp_mb() + > > > STORE, otherwise the following isn't ordered: > > > > > > RELEASE X > > > smp_mb__after_unlock_lock() > > > ACQUIRE Y > > > > > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. > > > > I knew we'd had this conversation before ;) > > > > > > http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net > > Ha! yes. And I had indeed forgotten about this argument. > > However I think we should look at the insides of the critical sections; > for example (from Documentation/memory-barriers.txt): > > " *A = a; > RELEASE M > ACQUIRE N > *B = b; > > could occur as: > > ACQUIRE N, STORE *B, STORE *A, RELEASE M" > > This could not in fact happen, even though we could flip M and N, A and > B will remain strongly ordered. > > That said, I don't think this could even happen on PPC because we have > load_acquire and store_release, this means that: > > *A = a > lwsync > store_release M > load_acquire N > lwsync Presumably the lwsync instructions are part of the store_release and load_acquire? 
> *B = b
>
> And since the store to M is wrapped inside two lwsync there must be
> strong store order, and because the load from N is equally wrapped in
> two lwsyncs there must also be strong load order.
>
> In fact, no store/load can cross from before the first lwsync to after
> the latter and the other way around.
>
> So in that respect it does provide full load-store ordering. What it
> does not provide is order for M and N, nor does it provide transitivity,
> but looking at our documentation I'm not at all sure we guarantee that
> in any case.

I have no idea what the other thread is doing, so I put together the following litmus test, guessing reverse order, inverse operations, and full ordering:

PPC peterz.2015.07.13a
""
{
0:r1=1; 0:r2=a; 0:r3=b; 0:r4=m; 0:r5=n;
1:r1=1; 1:r2=a; 1:r3=b; 1:r4=m; 1:r5=n;
}
 P0            | P1            ;
 stw r1,0(r2)  | lwz r10,0(r3) ;
 lwsync        | sync          ;
 stw r1,0(r4)  | stw r1,0(r5)  ;
 lwz r10,0(r5) | sync          ;
 lwsync        | lwz r11,0(r4) ;
 stw r1,0(r3)  | sync          ;
               | lwz r12,0(r2) ;
exists
(0:r10=0 /\ 1:r10=1 /\ 1:r11=1 /\ 1:r12=1)

See http://lwn.net/Articles/608550/ and http://lwn.net/Articles/470681/ for information on tools that operate on these litmus tests. (Both the herd and ppcmem tools agree, as is usually the case.)

Of the 16 possible combinations of values loaded, the following seven can happen:

0:r10=0; 1:r10=0; 1:r11=0; 1:r12=0;
0:r10=0; 1:r10=0; 1:r11=0; 1:r12=1;
0:r10=0; 1:r10=0; 1:r11=1; 1:r12=1;
0:r10=0; 1:r10=1; 1:r11=1; 1:r12=1;
0:r10=1; 1:r10=0; 1:r11=0; 1:r12=0;
0:r10=1; 1:r10=0; 1:r11=0; 1:r12=1;
0:r10=1; 1:r10=0; 1:r11=1; 1:r12=1;

P0's store to "m" and load from "n" can clearly be misordered, as there is nothing to order them. And all four possible outcomes for 0:r10 and 1:r11 are seen, as expected.
Given that smp_store_release() is only guaranteed to order against prior operations and smp_load_acquire() is only guaranteed to order against subsequent operations, P0's load from "n" can be misordered with its store to "a", and as expected, all four possible outcomes for 0:r10 and 1:r12 are observed.

P0's pairs of stores should all be ordered:

o "a" and "m" -> 1:r11=1 and 1:r12=0 cannot happen, as expected.

o "a" and "b" -> 1:r10=1 and 1:r12=0 cannot happen, as expected.

o "m" and "b" -> 1:r10=1 and 1:r11=0 cannot happen, as expected.

So smp_load_acquire() orders against all subsequent operations, but not necessarily against any prior ones, and smp_store_release() orders against all prior operations but not necessarily against any subsequent ones. But additional stray orderings are permitted, as is the case here. Which is in fact what these operations are defined to do. Does that answer the question?
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 04:54:47PM +0100, Peter Zijlstra wrote: > However I think we should look at the insides of the critical sections; > for example (from Documentation/memory-barriers.txt): > > " *A = a; > RELEASE M > ACQUIRE N > *B = b; > > could occur as: > > ACQUIRE N, STORE *B, STORE *A, RELEASE M" > > This could not in fact happen, even though we could flip M and N, A and > B will remain strongly ordered. > > That said, I don't think this could even happen on PPC because we have > load_acquire and store_release, this means that: > > *A = a > lwsync > store_release M > load_acquire N > lwsync > *B = b > > And since the store to M is wrapped inside two lwsync there must be > strong store order, and because the load from N is equally wrapped in > two lwsyncs there must also be strong load order. > > In fact, no store/load can cross from before the first lwsync to after > the latter and the other way around. > > So in that respect it does provide full load-store ordering. What it > does not provide is order for M and N, nor does it provide transitivity, > but looking at our documentation I'm not at all sure we guarantee that > in any case. So if I'm following along, smp_mb__after_unlock_lock *does* provide transitivity when used with UNLOCK + LOCK, which is stronger than your example here. I don't think we want to make the same guarantee for general RELEASE + ACQUIRE, because we'd end up forcing most architectures to implement the expensive macro for a case that currently has no users. In which case, it boils down to the question of how expensive it would be to implement an SC UNLOCK operation on PowerPC and whether that justifies the existence of a complicated barrier macro that isn't used outside of RCU. Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:24:18PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 02:09:50PM +0100, Peter Hurley wrote: > > On 07/13/2015 08:15 AM, Will Deacon wrote: > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > into a full memory barrier. > > > > > > However: > > > > > > - This ordering guarantee is already provided without the barrier on > > > all architectures apart from PowerPC > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > RELEASE + ACQUIRE operations > > > > I'm unclear what you mean here: do you mean > > A) a memory barrier is not required between RELEASE M + ACQUIRE N when you > > want to maintain distinct order between those operations on all arches > > (with the possible exception of PowerPC), or, > > B) no one is using smp_mb__after_unlock_lock() in that way right now. > > My understanding is (B), but Peter and I don't seem to agree yet! > I'll tighten up the text once we reach a conclusion. I'm fairly sure (but I've not looked) that nobody does in fact rely on this. So I'm in agreement with B, and I'm quibbling on what exactly A means ;-)
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > > into a full memory barrier. > > > > > > > > However: > > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > > RELEASE + ACQUIRE operations > > > > > > No it does too; note that on ppc both acquire and release use lwsync and > > > two lwsyncs do not make a sync. > > > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > > barrier on all architectures implementing smp_store_release as smp_mb() + > > STORE, otherwise the following isn't ordered: > > > > RELEASE X > > smp_mb__after_unlock_lock() > > ACQUIRE Y > > > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. > > I knew we'd had this conversation before ;) > > > http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net Ha! yes. And I had indeed forgotten about this argument. However I think we should look at the insides of the critical sections; for example (from Documentation/memory-barriers.txt): " *A = a; RELEASE M ACQUIRE N *B = b; could occur as: ACQUIRE N, STORE *B, STORE *A, RELEASE M" This could not in fact happen, even though we could flip M and N, A and B will remain strongly ordered. That said, I don't think this could even happen on PPC because we have load_acquire and store_release, this means that: *A = a lwsync store_release M load_acquire N lwsync *B = b And since the store to M is wrapped inside two lwsync there must be strong store order, and because the load from N is equally wrapped in two lwsyncs there must also be strong load order. In fact, no store/load can cross from before the first lwsync to after the latter and the other way around. 
So in that respect it does provide full load-store ordering. What it does not provide is order for M and N, nor does it provide transitivity, but looking at our documentation I'm not at all sure we guarantee that in any case.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 02:09:50PM +0100, Peter Hurley wrote: > On 07/13/2015 08:15 AM, Will Deacon wrote: > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > into a full memory barrier. > > > > However: > > > > - This ordering guarantee is already provided without the barrier on > > all architectures apart from PowerPC > > > > - The barrier only applies to UNLOCK + LOCK, not general > > RELEASE + ACQUIRE operations > > I'm unclear what you mean here: do you mean > A) a memory barrier is not required between RELEASE M + ACQUIRE N when you > want to maintain distinct order between those operations on all arches > (with the possible exception of PowerPC), or, > B) no one is using smp_mb__after_unlock_lock() in that way right now. My understanding is (B), but Peter and I don't seem to agree yet! I'll tighten up the text once we reach a conclusion. Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > into a full memory barrier. > > > > > > However: > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > RELEASE + ACQUIRE operations > > > > No it does too; note that on ppc both acquire and release use lwsync and > > two lwsyncs do not make a sync. > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > barrier on all architectures implementing smp_store_release as smp_mb() + > STORE, otherwise the following isn't ordered: > > RELEASE X > smp_mb__after_unlock_lock() > ACQUIRE Y > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. I knew we'd had this conversation before ;) http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > into a full memory barrier. > > > > However: > > > - The barrier only applies to UNLOCK + LOCK, not general > > RELEASE + ACQUIRE operations > > No it does too; note that on ppc both acquire and release use lwsync and > two lwsyncs do not make a sync. Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full barrier on all architectures implementing smp_store_release as smp_mb() + STORE, otherwise the following isn't ordered: RELEASE X smp_mb__after_unlock_lock() ACQUIRE Y On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. Will
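A sketch of the smp_mb() + STORE flavour of smp_store_release() that Will refers to (paraphrased from the asm-generic fallback of that era, not verbatim; WRITE_ONCE stands in for the exact store used):

```c
/* Paraphrased generic fallback: the full barrier comes *before* the
 * release store. */
#define smp_store_release(p, v)		\
do {					\
	smp_mb();			\
	WRITE_ONCE(*(p), (v));		\
} while (0)

/*
 * Because the barrier sits before the store, nothing orders the
 * RELEASE store itself against a later ACQUIRE load:
 *
 *	smp_mb();
 *	STORE X		<- RELEASE X
 *	...		<- smp_mb__after_unlock_lock() is a no-op here
 *	LOAD Y		<- ACQUIRE Y, may be satisfied before STORE X
 *
 * which is why Will argues the macro would have to be a full barrier
 * on such architectures too, not just on PowerPC.
 */
```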
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > into a full memory barrier. > > However: > - The barrier only applies to UNLOCK + LOCK, not general > RELEASE + ACQUIRE operations No it does too; note that on ppc both acquire and release use lwsync and two lwsyncs do not make a sync.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On 07/13/2015 08:15 AM, Will Deacon wrote: > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > into a full memory barrier. > > However: > > - This ordering guarantee is already provided without the barrier on > all architectures apart from PowerPC > > - The barrier only applies to UNLOCK + LOCK, not general > RELEASE + ACQUIRE operations I'm unclear what you mean here: do you mean A) a memory barrier is not required between RELEASE M + ACQUIRE N when you want to maintain distinct order between those operations on all arches (with the possible exception of PowerPC), or, B) no one is using smp_mb__after_unlock_lock() in that way right now. Regards, Peter Hurley > - Locks are generally assumed to offer SC ordering semantics, so > having this additional barrier is error-prone and complicates the > callers of LOCK/UNLOCK primitives > > - The barrier is not well used outside of RCU and, because it was > retrofitted into the kernel, it's not clear whether other areas of > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > barrier > > This patch removes the barrier and instead requires architectures to > provide full barrier semantics for an UNLOCK + LOCK sequence. > > Cc: Benjamin Herrenschmidt > Cc: Paul McKenney > Cc: Peter Zijlstra > Signed-off-by: Will Deacon > --- > > This didn't go anywhere last time I posted it, but here it is again. > I'd really appreciate some feedback from the PowerPC guys, especially as > to whether this change requires them to add an additional barrier in > arch_spin_unlock and what the cost of that would be. 
> > Documentation/memory-barriers.txt | 77 > ++--- > arch/powerpc/include/asm/spinlock.h | 2 - > include/linux/spinlock.h| 10 - > kernel/locking/mcs_spinlock.h | 9 - > kernel/rcu/tree.c | 21 +- > kernel/rcu/tree_plugin.h| 11 -- > 6 files changed, 4 insertions(+), 126 deletions(-) > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index 13feb697271f..fff21b632893 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from > the perspective of > another CPU not holding that lock. In short, a ACQUIRE followed by an > RELEASE may -not- be assumed to be a full memory barrier. > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > -pair to produce a full barrier, the ACQUIRE can be followed by an > -smp_mb__after_unlock_lock() invocation. This will produce a full barrier > -if either (a) the RELEASE and the ACQUIRE are executed by the same > -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. > -The smp_mb__after_unlock_lock() primitive is free on many architectures. > -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical > -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: > - > - *A = a; > - RELEASE M > - ACQUIRE N > - *B = b; > - > -could occur as: > - > - ACQUIRE N, STORE *B, STORE *A, RELEASE M > - > -It might appear that this reordering could introduce a deadlock. > -However, this cannot happen because if such a deadlock threatened, > -the RELEASE would simply complete, thereby avoiding the deadlock. > - > - Why does this work? > - > - One key point is that we are only talking about the CPU doing > - the reordering, not the compiler. If the compiler (or, for > - that matter, the developer) switched the operations, deadlock > - -could- occur. 
> - > - But suppose the CPU reordered the operations. In this case, > - the unlock precedes the lock in the assembly code. The CPU > - simply elected to try executing the later lock operation first. > - If there is a deadlock, this lock operation will simply spin (or > - try to sleep, but more on that later). The CPU will eventually > - execute the unlock operation (which preceded the lock operation > - in the assembly code), which will unravel the potential deadlock, > - allowing the lock operation to succeed. > - > - But what if the lock is a sleeplock? In that case, the code will > - try to enter the scheduler, where it will eventually encounter > - a memory barrier, which will force the earlier unlock operation > - to complete, again unraveling the deadlock. There might be > - a sleep-unlock race, but the locking primitive needs to resolve > - such races properly in any case. > - > -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. > -For example, with the following code, the store to *A will always be > -seen by other CPUs before the store to *B: > - > - *A = a; > - RELEASE M > -
[RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence into a full memory barrier. However: - This ordering guarantee is already provided without the barrier on all architectures apart from PowerPC - The barrier only applies to UNLOCK + LOCK, not general RELEASE + ACQUIRE operations - Locks are generally assumed to offer SC ordering semantics, so having this additional barrier is error-prone and complicates the callers of LOCK/UNLOCK primitives - The barrier is not well used outside of RCU and, because it was retrofitted into the kernel, it's not clear whether other areas of the kernel are incorrectly relying on UNLOCK + LOCK implying a full barrier This patch removes the barrier and instead requires architectures to provide full barrier semantics for an UNLOCK + LOCK sequence. Cc: Benjamin Herrenschmidt Cc: Paul McKenney Cc: Peter Zijlstra Signed-off-by: Will Deacon --- This didn't go anywhere last time I posted it, but here it is again. I'd really appreciate some feedback from the PowerPC guys, especially as to whether this change requires them to add an additional barrier in arch_spin_unlock and what the cost of that would be. Documentation/memory-barriers.txt | 77 ++--- arch/powerpc/include/asm/spinlock.h | 2 - include/linux/spinlock.h| 10 - kernel/locking/mcs_spinlock.h | 9 - kernel/rcu/tree.c | 21 +- kernel/rcu/tree_plugin.h| 11 -- 6 files changed, 4 insertions(+), 126 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 13feb697271f..fff21b632893 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from the perspective of another CPU not holding that lock. In short, a ACQUIRE followed by an RELEASE may -not- be assumed to be a full memory barrier. -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not -imply a full memory barrier. 
If it is necessary for a RELEASE-ACQUIRE -pair to produce a full barrier, the ACQUIRE can be followed by an -smp_mb__after_unlock_lock() invocation. This will produce a full barrier -if either (a) the RELEASE and the ACQUIRE are executed by the same -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. -The smp_mb__after_unlock_lock() primitive is free on many architectures. -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: - - *A = a; - RELEASE M - ACQUIRE N - *B = b; - -could occur as: - - ACQUIRE N, STORE *B, STORE *A, RELEASE M - -It might appear that this reordering could introduce a deadlock. -However, this cannot happen because if such a deadlock threatened, -the RELEASE would simply complete, thereby avoiding the deadlock. - - Why does this work? - - One key point is that we are only talking about the CPU doing - the reordering, not the compiler. If the compiler (or, for - that matter, the developer) switched the operations, deadlock - -could- occur. - - But suppose the CPU reordered the operations. In this case, - the unlock precedes the lock in the assembly code. The CPU - simply elected to try executing the later lock operation first. - If there is a deadlock, this lock operation will simply spin (or - try to sleep, but more on that later). The CPU will eventually - execute the unlock operation (which preceded the lock operation - in the assembly code), which will unravel the potential deadlock, - allowing the lock operation to succeed. - - But what if the lock is a sleeplock? In that case, the code will - try to enter the scheduler, where it will eventually encounter - a memory barrier, which will force the earlier unlock operation - to complete, again unraveling the deadlock. There might be - a sleep-unlock race, but the locking primitive needs to resolve - such races properly in any case. 
- -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. -For example, with the following code, the store to *A will always be -seen by other CPUs before the store to *B: - - *A = a; - RELEASE M - ACQUIRE N - smp_mb__after_unlock_lock(); - *B = b; - -The operations will always occur in one of the following orders: - - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B - STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B - ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B - -If the RELEASE and ACQUIRE were instead both operating on the same lock -variable, only the first of these alternatives can occur. In addition, -the more strongly ordered syste