Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > > the original worry we had about loads happening before the SC in lock.
> >
> > However I think isync fixes *that* :-) The problem with isync is, as
> > you said, that it's not a -memory- barrier per se; it's an execution
> > barrier / context-synchronizing instruction. The combination
> > stwcx. + bne + isync, however, prevents the execution of anything past
> > the isync until the stwcx. has completed and the bne has been
> > "decided", which prevents loads from leaking into the LL/SC loop. It
> > will also prevent a store in the lock from being issued before the
> > stwcx. has completed. It does *not* prevent, as far as I can tell,
> > another unrelated store before the lock from leaking into the lock,
> > including the one used to unlock a different lock.
>
> Except that the architecture says:
>
> <<
> Because a Store Conditional instruction may complete before its store
> has been performed, a conditional Branch instruction that depends on
> the CR0 value set by a Store Conditional instruction does not order
> the Store Conditional's store with respect to storage accesses caused
> by instructions that follow the Branch
> >>
>
> So isync in lock is architecturally incorrect, despite being what the
> architecture recommends using, yay!

That paragraph doesn't say anything about isync. Doing stcx., bne, isync
to acquire a lock will ensure that the instructions inside the locked
region can't execute until the lock is globally known to be acquired. It
is true that another CPU that doesn't take the lock could see the effect
of some load or store inside the locked region and then see the lock
variable itself being clear; but any CPU that tried to acquire the lock
would have its stcx. fail (at least until the first CPU releases the
lock). So isync in the lock sequence is architecturally correct.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-08-25 at 17:27 -0700, Paul E. McKenney wrote:
> On Thu, Aug 20, 2015 at 04:56:04PM +0100, Will Deacon wrote:
> > On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> > > On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > > >
> > > > Thanks, that sounds great. FWIW, there are multiple ways of
> > > > implementing the patch (i.e. whether you strengthen lock or
> > > > unlock). I had a crack at something here, but it's not tested:
> > > >
> > > > http://marc.info/?l=linux-arch&m=143758379023849&w=2
> > >
> > > I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
> > > unlock code. But from my reading of the docs we need to make sure any
> > > UNLOCK+LOCK is a full barrier, not just spin unlock/lock?
> > >
> > > So don't we need to worry about some of the other locks as well? At
> > > least rwlock, and mutex fast path?
> >
> > Hmm, that's a good question. I notice that you don't do any of the
> > SYNC_IO stuff for any locks other than spinlocks but I don't know
> > whether smp_mb__after_unlock_lock is similarly limited in scope.
> >
> > Paul?
>
> I would expect the various locks to have similar ordering
> characteristics.
>
> Or am I missing something subtle here?

I don't think so. The docs just talk about ACQUIRE/RELEASE, so I think it
needs to apply to all lock types. Or at least the list mentioned in the
docs, which is:

 (*) spin locks
 (*) R/W spin locks
 (*) mutexes
 (*) semaphores
 (*) R/W semaphores
 (*) RCU

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Aug 20, 2015 at 04:56:04PM +0100, Will Deacon wrote:
> On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> > On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > > On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > > > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > > > I thought the end result of this thread was that we didn't
> > > > > > > *need* to change the powerpc lock semantics? Or did I read it
> > > > > > > wrong?
> > > > > > >
> > > > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full
> > > > > > > barrier, which is consistent with our current implementation.
> > > > > >
> > > > > > That change happened about 1.5 years ago, and I thought that the
> > > > > > current discussion was about reversing it, based in part on the
> > > > > > recent powerpc benchmarks of locking primitives with and without
> > > > > > the sync instruction. But regardless, I clearly cannot remove
> > > > > > either the smp_mb__after_unlock_lock() or the powerpc definition
> > > > > > of it to be smp_mb() if powerpc unlock/lock is not strengthened.
> > > > >
> > > > > Yup. Peter and I would really like to get rid of
> > > > > smp_mb__after_unlock_lock entirely, which would mean strengthening
> > > > > the ppc spinlocks. Moving the barrier primitive into RCU is a good
> > > > > step to prevent more widespread usage of the barrier, but we'd
> > > > > really like to go further if the performance impact is deemed
> > > > > acceptable (which is what this thread is about).
> > > >
> > > > OK, sorry for completely missing the point, too many balls in the
> > > > air here.
> > >
> > > No problem!
> > >
> > > > I'll do some benchmarks and see what we come up with.
> > >
> > > Thanks, that sounds great. FWIW, there are multiple ways of
> > > implementing the patch (i.e. whether you strengthen lock or unlock).
> > > I had a crack at something here, but it's not tested:
> > >
> > > http://marc.info/?l=linux-arch&m=143758379023849&w=2
> >
> > Thanks.
> >
> > I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
> > unlock code. But from my reading of the docs we need to make sure any
> > UNLOCK+LOCK is a full barrier, not just spin unlock/lock?
> >
> > So don't we need to worry about some of the other locks as well? At
> > least rwlock, and mutex fast path?
>
> Hmm, that's a good question. I notice that you don't do any of the
> SYNC_IO stuff for any locks other than spinlocks but I don't know whether
> smp_mb__after_unlock_lock is similarly limited in scope.
>
> Paul?

I would expect the various locks to have similar ordering characteristics.

Or am I missing something subtle here?

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > > I thought the end result of this thread was that we didn't
> > > > > > *need* to change the powerpc lock semantics? Or did I read it
> > > > > > wrong?
> > > > > >
> > > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > > which is consistent with our current implementation.
> > > > >
> > > > > That change happened about 1.5 years ago, and I thought that the
> > > > > current discussion was about reversing it, based in part on the
> > > > > recent powerpc benchmarks of locking primitives with and without
> > > > > the sync instruction. But regardless, I clearly cannot remove
> > > > > either the smp_mb__after_unlock_lock() or the powerpc definition
> > > > > of it to be smp_mb() if powerpc unlock/lock is not strengthened.
> > > >
> > > > Yup. Peter and I would really like to get rid of
> > > > smp_mb__after_unlock_lock entirely, which would mean strengthening
> > > > the ppc spinlocks. Moving the barrier primitive into RCU is a good
> > > > step to prevent more widespread usage of the barrier, but we'd
> > > > really like to go further if the performance impact is deemed
> > > > acceptable (which is what this thread is about).
> > >
> > > OK, sorry for completely missing the point, too many balls in the air
> > > here.
> >
> > No problem!
> >
> > > I'll do some benchmarks and see what we come up with.
> >
> > Thanks, that sounds great. FWIW, there are multiple ways of
> > implementing the patch (i.e. whether you strengthen lock or unlock).
> > I had a crack at something here, but it's not tested:
> >
> > http://marc.info/?l=linux-arch&m=143758379023849&w=2
>
> Thanks.
>
> I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
> unlock code. But from my reading of the docs we need to make sure any
> UNLOCK+LOCK is a full barrier, not just spin unlock/lock?
>
> So don't we need to worry about some of the other locks as well? At
> least rwlock, and mutex fast path?

Hmm, that's a good question. I notice that you don't do any of the SYNC_IO
stuff for any locks other than spinlocks but I don't know whether
smp_mb__after_unlock_lock is similarly limited in scope.

Paul?

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > I thought the end result of this thread was that we didn't *need*
> > > > > to change the powerpc lock semantics? Or did I read it wrong?
> > > > >
> > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > which is consistent with our current implementation.
> > > >
> > > > That change happened about 1.5 years ago, and I thought that the
> > > > current discussion was about reversing it, based in part on the
> > > > recent powerpc benchmarks of locking primitives with and without the
> > > > sync instruction. But regardless, I clearly cannot remove either the
> > > > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > > > smp_mb() if powerpc unlock/lock is not strengthened.
> > >
> > > Yup. Peter and I would really like to get rid of
> > > smp_mb__after_unlock_lock entirely, which would mean strengthening the
> > > ppc spinlocks. Moving the barrier primitive into RCU is a good step to
> > > prevent more widespread usage of the barrier, but we'd really like to
> > > go further if the performance impact is deemed acceptable (which is
> > > what this thread is about).
> >
> > OK, sorry for completely missing the point, too many balls in the air
> > here.
>
> No problem!
>
> > I'll do some benchmarks and see what we come up with.
>
> Thanks, that sounds great. FWIW, there are multiple ways of implementing
> the patch (i.e. whether you strengthen lock or unlock). I had a crack at
> something here, but it's not tested:
>
> http://marc.info/?l=linux-arch&m=143758379023849&w=2

Thanks.

I notice you are not changing PPC_RELEASE_BARRIER, but only the spin
unlock code. But from my reading of the docs we need to make sure any
UNLOCK+LOCK is a full barrier, not just spin unlock/lock?

So don't we need to worry about some of the other locks as well? At least
rwlock, and mutex fast path?

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > I thought the end result of this thread was that we didn't *need*
> > > > to change the powerpc lock semantics? Or did I read it wrong?
> > > >
> > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > which is consistent with our current implementation.
> > >
> > > That change happened about 1.5 years ago, and I thought that the
> > > current discussion was about reversing it, based in part on the
> > > recent powerpc benchmarks of locking primitives with and without the
> > > sync instruction. But regardless, I clearly cannot remove either the
> > > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > > smp_mb() if powerpc unlock/lock is not strengthened.
> >
> > Yup. Peter and I would really like to get rid of
> > smp_mb__after_unlock_lock entirely, which would mean strengthening the
> > ppc spinlocks. Moving the barrier primitive into RCU is a good step to
> > prevent more widespread usage of the barrier, but we'd really like to
> > go further if the performance impact is deemed acceptable (which is
> > what this thread is about).
>
> OK, sorry for completely missing the point, too many balls in the air
> here.

No problem!

> I'll do some benchmarks and see what we come up with.

Thanks, that sounds great. FWIW, there are multiple ways of implementing
the patch (i.e. whether you strengthen lock or unlock). I had a crack at
something here, but it's not tested:

http://marc.info/?l=linux-arch&m=143758379023849&w=2

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > I thought the end result of this thread was that we didn't *need* to
> > > change the powerpc lock semantics? Or did I read it wrong?
> > >
> > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > which is consistent with our current implementation.
> >
> > That change happened about 1.5 years ago, and I thought that the
> > current discussion was about reversing it, based in part on the
> > recent powerpc benchmarks of locking primitives with and without the
> > sync instruction. But regardless, I clearly cannot remove either the
> > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > smp_mb() if powerpc unlock/lock is not strengthened.
>
> Yup. Peter and I would really like to get rid of
> smp_mb__after_unlock_lock entirely, which would mean strengthening the
> ppc spinlocks. Moving the barrier primitive into RCU is a good step to
> prevent more widespread usage of the barrier, but we'd really like to go
> further if the performance impact is deemed acceptable (which is what
> this thread is about).

OK, sorry for completely missing the point, too many balls in the air
here.

I'll do some benchmarks and see what we come up with.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > > > Author: Paul E. McKenney
> > > > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > > > >
> > > > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > > >
> > > > > > > > >     RCU is the only thing that uses
> > > > > > > > >     smp_mb__after_unlock_lock(), and is likely the only
> > > > > > > > >     thing that ever will use it, so this commit makes this
> > > > > > > > >     macro private to RCU.
> > > > > > > > >
> > > > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > > > >     Cc: Will Deacon
> > > > > > > > >     Cc: Peter Zijlstra
> > > > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > > > >
> > > > > > Are you planning to queue this somewhere? I think it makes sense
> > > > > > regardless of whether we change PowerPC or not and ideally it
> > > > > > would be merged around the same time as my relaxed atomics
> > > > > > series.
> > > > >
> > > > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > > > window. Please let me know if you need it sooner.
> > > >
> > > > The generic relaxed atomics are now queued in -tip, so it would be
> > > > really good to see this Documentation update land in 4.3 if at all
> > > > possible. I appreciate it's late in the cycle, but it's always worth
> > > > asking.
> > >
> > > Can't hurt to give it a try. I have set -rcu's rcu/next branch to
> > > this commit, and if it passes a few days' worth of testing, I will
> > > see what Ingo has to say about a pull request.
> > >
> > > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > > updating documentation. Looks like we need to strengthen powerpc's
> > > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > > entirely. Or did that already happen and I just missed it?
> >
> > No it didn't.
> >
> > I thought the end result of this thread was that we didn't *need* to
> > change the powerpc lock semantics? Or did I read it wrong?
> >
> > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which
> > is consistent with our current implementation.
>
> That change happened about 1.5 years ago, and I thought that the
> current discussion was about reversing it, based in part on the
> recent powerpc benchmarks of locking primitives with and without the
> sync instruction. But regardless, I clearly cannot remove either the
> smp_mb__after_unlock_lock() or the powerpc definition of it to be
> smp_mb() if powerpc unlock/lock is not strengthened.

Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
entirely, which would mean strengthening the ppc spinlocks. Moving the
barrier primitive into RCU is a good step to prevent more widespread usage
of the barrier, but we'd really like to go further if the performance
impact is deemed acceptable (which is what this thread is about).

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > Hello Paul,
> > >
> > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > > Author: Paul E. McKenney
> > > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > > >
> > > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > >
> > > > > > > >     RCU is the only thing that uses
> > > > > > > >     smp_mb__after_unlock_lock(), and is likely the only
> > > > > > > >     thing that ever will use it, so this commit makes this
> > > > > > > >     macro private to RCU.
> > > > > > > >
> > > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > > >     Cc: Will Deacon
> > > > > > > >     Cc: Peter Zijlstra
> > > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > > >
> > > > > Are you planning to queue this somewhere? I think it makes sense
> > > > > regardless of whether we change PowerPC or not and ideally it
> > > > > would be merged around the same time as my relaxed atomics series.
> > > >
> > > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > > window. Please let me know if you need it sooner.
> > >
> > > The generic relaxed atomics are now queued in -tip, so it would be
> > > really good to see this Documentation update land in 4.3 if at all
> > > possible. I appreciate it's late in the cycle, but it's always worth
> > > asking.
> >
> > Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> > commit, and if it passes a few days' worth of testing, I will see what
> > Ingo has to say about a pull request.
> >
> > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > updating documentation. Looks like we need to strengthen powerpc's
> > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > entirely. Or did that already happen and I just missed it?
>
> No it didn't.
>
> I thought the end result of this thread was that we didn't *need* to
> change the powerpc lock semantics? Or did I read it wrong?
>
> ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which
> is consistent with our current implementation.

That change happened about 1.5 years ago, and I thought that the
current discussion was about reversing it, based in part on the
recent powerpc benchmarks of locking primitives with and without the
sync instruction. But regardless, I clearly cannot remove either the
smp_mb__after_unlock_lock() or the powerpc definition of it to be smp_mb()
if powerpc unlock/lock is not strengthened.

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > Hello Paul,
> >
> > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > Author: Paul E. McKenney
> > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > >
> > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > >
> > > > > > >     RCU is the only thing that uses
> > > > > > >     smp_mb__after_unlock_lock(), and is likely the only thing
> > > > > > >     that ever will use it, so this commit makes this macro
> > > > > > >     private to RCU.
> > > > > > >
> > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > >     Cc: Will Deacon
> > > > > > >     Cc: Peter Zijlstra
> > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > >
> > > > Are you planning to queue this somewhere? I think it makes sense
> > > > regardless of whether we change PowerPC or not and ideally it would
> > > > be merged around the same time as my relaxed atomics series.
> > >
> > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > window. Please let me know if you need it sooner.
> >
> > The generic relaxed atomics are now queued in -tip, so it would be
> > really good to see this Documentation update land in 4.3 if at all
> > possible. I appreciate it's late in the cycle, but it's always worth
> > asking.
>
> Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> commit, and if it passes a few days' worth of testing, I will see what
> Ingo has to say about a pull request.
>
> This commit also privatizes smp_mb__after_unlock_lock() as well as
> updating documentation.

No it didn't.

I thought the end result of this thread was that we didn't *need* to
change the powerpc lock semantics? Or did I read it wrong?

ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which is
consistent with our current implementation.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Aug 13, 2015 at 11:49:28AM +0100, Will Deacon wrote:
> On Wed, Aug 12, 2015 at 06:59:38PM +0100, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > > The generic relaxed atomics are now queued in -tip, so it would be
> > > > really good to see this Documentation update land in 4.3 if at all
> > > > possible. I appreciate it's late in the cycle, but it's always
> > > > worth asking.
> > >
> > > Can't hurt to give it a try. I have set -rcu's rcu/next branch to
> > > this commit, and if it passes a few days' worth of testing, I will
> > > see what Ingo has to say about a pull request.
> > >
> > > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > > updating documentation. Looks like we need to strengthen powerpc's
> > > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > > entirely. Or did that already happen and I just missed it?
> >
> > And just for completeness, here is the current version of that commit.
> >
> > Thanx, Paul
> >
> > b/Documentation/memory-barriers.txt   | 71 +-
> > b/arch/powerpc/include/asm/spinlock.h |  2
> > b/include/linux/spinlock.h            | 10
> > b/kernel/rcu/tree.h                   | 12 +
> > 4 files changed, 16 insertions(+), 79 deletions(-)
> >
> > commit 12d560f4ea87030667438a169912380be00cea4b
> > Author: Paul E. McKenney
> > Date: Tue Jul 14 18:35:23 2015 -0700
> >
> >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> >
> >     RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> >     likely the only thing that ever will use it, so this commit makes
> >     this macro private to RCU.
> >
> >     Signed-off-by: Paul E. McKenney
> >     Cc: Will Deacon
> >     Cc: Peter Zijlstra
> >     Cc: Benjamin Herrenschmidt
> >     Cc: "linux-a...@vger.kernel.org"
>
> Acked-by: Will Deacon
>
> I don't think the PowerPC spinlock change is queued anywhere (I sent it
> out as a diff for discussion, but that was it). This patch doesn't rely
> on that though, right?

No, this patch just moves the smp_mb__after_unlock_lock() definition; it
does not change the code generated.

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Aug 12, 2015 at 06:59:38PM +0100, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > The generic relaxed atomics are now queued in -tip, so it would be
> > > really good to see this Documentation update land in 4.3 if at all
> > > possible. I appreciate it's late in the cycle, but it's always worth
> > > asking.
> >
> > Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> > commit, and if it passes a few days' worth of testing, I will see what
> > Ingo has to say about a pull request.
> >
> > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > updating documentation. Looks like we need to strengthen powerpc's
> > locking primitives, then get rid of smp_mb__after_unlock_lock()
> > entirely. Or did that already happen and I just missed it?
>
> And just for completeness, here is the current version of that commit.
>
> Thanx, Paul
>
> b/Documentation/memory-barriers.txt   | 71 +-
> b/arch/powerpc/include/asm/spinlock.h |  2
> b/include/linux/spinlock.h            | 10
> b/kernel/rcu/tree.h                   | 12 +
> 4 files changed, 16 insertions(+), 79 deletions(-)
>
> commit 12d560f4ea87030667438a169912380be00cea4b
> Author: Paul E. McKenney
> Date: Tue Jul 14 18:35:23 2015 -0700
>
>     rcu,locking: Privatize smp_mb__after_unlock_lock()
>
>     RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
>     likely the only thing that ever will use it, so this commit makes
>     this macro private to RCU.
>
>     Signed-off-by: Paul E. McKenney
>     Cc: Will Deacon
>     Cc: Peter Zijlstra
>     Cc: Benjamin Herrenschmidt
>     Cc: "linux-a...@vger.kernel.org"

Acked-by: Will Deacon

I don't think the PowerPC spinlock change is queued anywhere (I sent it
out as a diff for discussion, but that was it). This patch doesn't rely
on that though, right?

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > Hello Paul,
> >
> > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > Author: Paul E. McKenney
> > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700
> > > > > > >
> > > > > > >     rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > >
> > > > > > >     RCU is the only thing that uses
> > > > > > >     smp_mb__after_unlock_lock(), and is likely the only thing
> > > > > > >     that ever will use it, so this commit makes this macro
> > > > > > >     private to RCU.
> > > > > > >
> > > > > > >     Signed-off-by: Paul E. McKenney
> > > > > > >     Cc: Will Deacon
> > > > > > >     Cc: Peter Zijlstra
> > > > > > >     Cc: Benjamin Herrenschmidt
> > > > > > >     Cc: "linux-a...@vger.kernel.org"
> > > >
> > > > Are you planning to queue this somewhere? I think it makes sense
> > > > regardless of whether we change PowerPC or not and ideally it would
> > > > be merged around the same time as my relaxed atomics series.
> > >
> > > I have it in -rcu. By default, I will push it to the 4.4 merge
> > > window. Please let me know if you need it sooner.
> >
> > The generic relaxed atomics are now queued in -tip, so it would be
> > really good to see this Documentation update land in 4.3 if at all
> > possible. I appreciate it's late in the cycle, but it's always worth
> > asking.
>
> Can't hurt to give it a try. I have set -rcu's rcu/next branch to this
> commit, and if it passes a few days' worth of testing, I will see what
> Ingo has to say about a pull request.
>
> This commit also privatizes smp_mb__after_unlock_lock() as well as
> updating documentation. Looks like we need to strengthen powerpc's
> locking primitives, then get rid of smp_mb__after_unlock_lock()
> entirely. Or did that already happen and I just missed it?

And just for completeness, here is the current version of that commit.

Thanx, Paul

 b/Documentation/memory-barriers.txt   | 71 +-
 b/arch/powerpc/include/asm/spinlock.h |  2
 b/include/linux/spinlock.h            | 10
 b/kernel/rcu/tree.h                   | 12 +
 4 files changed, 16 insertions(+), 79 deletions(-)

commit 12d560f4ea87030667438a169912380be00cea4b
Author: Paul E. McKenney
Date: Tue Jul 14 18:35:23 2015 -0700

    rcu,locking: Privatize smp_mb__after_unlock_lock()

    RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
    likely the only thing that ever will use it, so this commit makes this
    macro private to RCU.

    Signed-off-by: Paul E. McKenney
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: "linux-a...@vger.kernel.org"

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 318523872db5..eafa6a53f72c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock. In short, a ACQUIRE followed by an
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
-imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE
-pair to produce a full barrier, the ACQUIRE can be followed by an
-smp_mb__after_unlock_lock() invocation. This will produce a full barrier
-(including transitivity) if either (a) the RELEASE and the ACQUIRE are
-executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
-the same variable. The smp_mb__after_unlock_lock() primitive is free
-on many architectures. Without smp_mb__after_unlock_lock(), the CPU's
-execution of the critical sections corresponding to the RELEASE and the
-ACQUIRE can cross, so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
+not imply a full memory barrier. Therefore, the CPU's execution of the
+critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:
 
 	*A = a;
 	RELEASE M
@@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
 	a sleep-unlock race, but the locking primitive needs to resolve
 	such races properly in any case.
 
-With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
-For example, with the following code, the store to *A will always be
-seen by other CPUs before the store to *B:
-
-	*A = a;
-	RELEASE M
-	ACQUIRE
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote: > Hello Paul, > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote: > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote: > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > > > > Author: Paul E. McKenney > > > > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), > > > > > > and is > > > > > > likely the only thing that ever will use it, so this commit > > > > > > makes this > > > > > > macro private to RCU. > > > > > > > > > > > > Signed-off-by: Paul E. McKenney > > > > > > Cc: Will Deacon > > > > > > Cc: Peter Zijlstra > > > > > > Cc: Benjamin Herrenschmidt > > > > > > Cc: "linux-a...@vger.kernel.org" > > > > > > Are you planning to queue this somewhere? I think it makes sense > > > regardless > > > of whether we change PowerPc or not and ideally it would be merged around > > > the same time as my relaxed atomics series. > > > > I have is in -rcu. By default, I will push it to the 4.4 merge window. > > Please let me know if you need it sooner. > > The generic relaxed atomics are now queued in -tip, so it would be really > good to see this Documentation update land in 4.3 if at all possible. I > appreciate it's late in the cycle, but it's always worth asking. Can't hurt to give it a try. I have set -rcu's rcu/next branch to this commit, and if it passes a few day's worth of testing, I will see what Ingo has to say about a pull request. This commit also privatizes smp_mb__after_unlock_lock() as well as updating documentation. Looks like we need to strengthen powerpc's locking primitives, then get rid of smp_mb__after_unlock_lock() entirely. Or did that already happen and I just missed it? 
Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hello Paul, On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote: > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote: > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > > > Author: Paul E. McKenney > > > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and > > > > > is > > > > > likely the only thing that ever will use it, so this commit makes > > > > > this > > > > > macro private to RCU. > > > > > > > > > > Signed-off-by: Paul E. McKenney > > > > > Cc: Will Deacon > > > > > Cc: Peter Zijlstra > > > > > Cc: Benjamin Herrenschmidt > > > > > Cc: "linux-a...@vger.kernel.org" > > > > Are you planning to queue this somewhere? I think it makes sense regardless > > of whether we change PowerPc or not and ideally it would be merged around > > the same time as my relaxed atomics series. > > I have is in -rcu. By default, I will push it to the 4.4 merge window. > Please let me know if you need it sooner. The generic relaxed atomics are now queued in -tip, so it would be really good to see this Documentation update land in 4.3 if at all possible. I appreciate it's late in the cycle, but it's always worth asking. Thanks, Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote: > Hi Paul, > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > > On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote: > > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney > > > > > > > > wrote: > > > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > > > > Given that RCU is currently the only user of this barrier, > > > > > > > > > > how would you > > > > > > > > > > feel about making the barrier local to RCU and not part of > > > > > > > > > > the general > > > > > > > > > > memory-barrier API? > > > > > > > > > > > > > > > > > > In theory, no objection. Your thought is to leave the > > > > > > > > > definitions where > > > > > > > > > they are, mark them as being used only by RCU, and removing > > > > > > > > > mention from > > > > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file > > > > > > > > with an > > > > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could > > > > > > > > have a big > > > > > > > > comment describing the semantics, or put that in an RCU > > > > > > > > Documentation file > > > > > > > > instead of memory-barriers.txt. > > > > > > > > > > > > > > > > That *should* then mean we notice anybody else trying to use > > > > > > > > the barrier, > > > > > > > > because they'd need to send patches to either add something > > > > > > > > equivalent > > > > > > > > or move the definition out again. 
> > > > > > > > > > > > > > My concern with this approach is that someone putting together a > > > > > > > new > > > > > > > architecture might miss this. That said, this approach certainly > > > > > > > would > > > > > > > work for the current architectures. > > > > > > > > > > > > I don't think they're any more likely to miss it than with the > > > > > > current > > > > > > situation where the generic code defines the macro as a NOP unless > > > > > > you > > > > > > explicitly override it. > > > > > > > > > > Fair enough... > > > > > > > > Like this? > > > > > > Precisely! Thanks for cooking the patch -- this lays all my worries to > > > rest, so: > > > > > > Acked-by: Will Deacon > > > > Thank you! > > [...] > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > > Author: Paul E. McKenney > > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > > > > likely the only thing that ever will use it, so this commit makes > > > > this > > > > macro private to RCU. > > > > > > > > Signed-off-by: Paul E. McKenney > > > > Cc: Will Deacon > > > > Cc: Peter Zijlstra > > > > Cc: Benjamin Herrenschmidt > > > > Cc: "linux-a...@vger.kernel.org" > > Are you planning to queue this somewhere? I think it makes sense regardless > of whether we change PowerPc or not and ideally it would be merged around > the same time as my relaxed atomics series. I have it in -rcu. By default, I will push it to the 4.4 merge window. Please let me know if you need it sooner. Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Paul, On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote: > On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote: > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > > > Given that RCU is currently the only user of this barrier, > > > > > > > > > how would you > > > > > > > > > feel about making the barrier local to RCU and not part of > > > > > > > > > the general > > > > > > > > > memory-barrier API? > > > > > > > > > > > > > > > > In theory, no objection. Your thought is to leave the > > > > > > > > definitions where > > > > > > > > they are, mark them as being used only by RCU, and removing > > > > > > > > mention from > > > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file > > > > > > > with an > > > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could > > > > > > > have a big > > > > > > > comment describing the semantics, or put that in an RCU > > > > > > > Documentation file > > > > > > > instead of memory-barriers.txt. > > > > > > > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > > > > barrier, > > > > > > > because they'd need to send patches to either add something > > > > > > > equivalent > > > > > > > or move the definition out again. > > > > > > > > > > > > My concern with this approach is that someone putting together a new > > > > > > architecture might miss this. 
That said, this approach certainly > > > > > > would > > > > > > work for the current architectures. > > > > > I don't think they're any more likely to miss it than with the current > > > > > situation where the generic code defines the macro as a NOP unless you > > > > > explicitly override it. > > > > Fair enough... > > > Like this? > > Precisely! Thanks for cooking the patch -- this lays all my worries to > > rest, so: > > > > Acked-by: Will Deacon > > Thank you! [...] > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > > Author: Paul E. McKenney > > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > > > likely the only thing that ever will use it, so this commit makes this > > > macro private to RCU. > > > > > > Signed-off-by: Paul E. McKenney > > > Cc: Will Deacon > > > Cc: Peter Zijlstra > > > Cc: Benjamin Herrenschmidt > > > Cc: "linux-a...@vger.kernel.org" Are you planning to queue this somewhere? I think it makes sense regardless of whether we change PowerPC or not and ideally it would be merged around the same time as my relaxed atomics series. Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 20, 2015 at 10:18:14PM +0100, Benjamin Herrenschmidt wrote: > On Mon, 2015-07-20 at 14:39 +0100, Will Deacon wrote: > > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > > { > > > > - SYNC_IO; > > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > > + smp_mb(); > > > > lock->slock = 0; > > > > } > > > > > > That probably needs to be mb() in case somebody has the expectation that > > > it does a barrier vs. DMA on UP. > > > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > > in the core code? > > include/linux/spinlock_up.h:# define arch_spin_unlock(lock) do { > barrier(); (void)(lock); } while (0) > > Indeed... So, leaving mmiowb() alone, we end up with the patch below. Will --->8 From 9373a4226986dd69b6aaf896590ea43abeb1cc8e Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 22 Jul 2015 17:37:58 +0100 Subject: [PATCH] powerpc: kill smp_mb__after_unlock_lock PowerPC is the only architecture defining smp_mb__after_unlock_lock, but we can remove it by adding an unconditional sync (smp_mb()) to the spin_unlock code.
Signed-off-by: Will Deacon --- arch/powerpc/include/asm/io.h | 13 + arch/powerpc/include/asm/spinlock.h | 22 +- 2 files changed, 2 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index a8d2ef30d473..a3ad51046b04 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -105,12 +105,6 @@ extern bool isa_io_special; * */ -#ifdef CONFIG_PPC64 -#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0) -#else -#define IO_SET_SYNC_FLAG() -#endif - /* gcc 4.0 and older doesn't have 'Z' constraint */ #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0) #define DEF_MMIO_IN_X(name, size, insn)\ @@ -127,7 +121,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,0,%2" \ : "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #else /* newer gcc */ #define DEF_MMIO_IN_X(name, size, insn)\ @@ -144,7 +137,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ : "=Z" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #endif @@ -162,7 +154,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \ : "=m" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } DEF_MMIO_IN_D(in_8, 8, lbz); @@ -630,9 +621,7 @@ static inline void name at \ #define mmiowb() #else /* - * Enforce synchronisation of stores vs. spin_unlock - * (this does it explicitly, though our implementation of spin_unlock - * does it implicitely too) + * Explicitly enforce synchronisation of stores vs. 
spin_unlock */ static inline void mmiowb(void) { diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 4dbe072eecbe..9da89ea4ff31 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -28,8 +28,6 @@ #include #include -#define smp_mb__after_unlock_lock()smp_mb() /* Full ordering for lock. */ - #ifdef CONFIG_PPC64 /* use 0x80yy when locked, where yy == CPU number */ #ifdef __BIG_ENDIAN__ @@ -41,19 +39,6 @@ #define LOCK_TOKEN 1 #endif -#if defined(CONFIG_PPC64) && defined(CONFIG_SMP) -#define CLEAR_IO_SYNC (get_paca()->io_sync = 0) -#define SYNC_IOdo { \ - if (unlikely(get_paca()->io_sync)) {\ - mb(); \ - get_paca()->io_sync = 0;\ - } \ - } while (0) -#else -#define CLEAR_IO_SYNC -#define SYNC_IO -#endif - static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return lo
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-07-20 at 14:39 +0100, Will Deacon wrote: > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > { > > > - SYNC_IO; > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > + smp_mb(); > > > lock->slock = 0; > > > } > > > > That probably needs to be mb() in case somebody has the expectation that > > it does a barrier vs. DMA on UP. > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > in the core code? include/linux/spinlock_up.h:# define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0) Indeed... Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 20, 2015 at 02:48:49PM +0100, Paul E. McKenney wrote: > On Mon, Jul 20, 2015 at 02:39:06PM +0100, Will Deacon wrote: > > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > > { > > > > - SYNC_IO; > > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > > + smp_mb(); > > > > lock->slock = 0; > > > > } > > > > > > That probably needs to be mb() in case somebody has the expectation that > > > it does a barrier vs. DMA on UP. > > > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > > in the core code? > > Yes, to barrier(), but that doesn't generate any code. In contrast, the > mb() that Ben is asking for puts out a sync instruction. Without that > sync instruction, MMIO accesses can be reordered with the spin_unlock(), > even on single-CPU systems. So the bm() is really needed if unlock is > to order against MMIO (and thus DMA) on UP. Understood, but my point was that this needs to be done in the core code rather than here. Perhaps it's easier to leave mmiowb() alone for PowerPC for now and instead predicate that on !SMP? Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 20, 2015 at 02:39:06PM +0100, Will Deacon wrote: > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > { > > > - SYNC_IO; > > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > > - PPC_RELEASE_BARRIER: : :"memory"); > > > + smp_mb(); > > > lock->slock = 0; > > > } > > > > That probably needs to be mb() in case somebody has the expectation that > > it does a barrier vs. DMA on UP. > > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() > in the core code? Yes, to barrier(), but that doesn't generate any code. In contrast, the mb() that Ben is asking for puts out a sync instruction. Without that sync instruction, MMIO accesses can be reordered with the spin_unlock(), even on single-CPU systems. So the mb() is really needed if unlock is to order against MMIO (and thus DMA) on UP. Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote: > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > { > > - SYNC_IO; > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > - PPC_RELEASE_BARRIER: : :"memory"); > > + smp_mb(); > > lock->slock = 0; > > } > > That probably needs to be mb() in case somebody has the expectation that > it does a barrier vs. DMA on UP. Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier() in the core code? Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote: > static inline void arch_spin_unlock(arch_spinlock_t *lock) > { > - SYNC_IO; > - __asm__ __volatile__("# arch_spin_unlock\n\t" > - PPC_RELEASE_BARRIER: : :"memory"); > + smp_mb(); > lock->slock = 0; > } That probably needs to be mb() in case somebody has the expectation that it does a barrier vs. DMA on UP. Cheers, Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 17, 2015 at 12:15:35PM +0200, Peter Zijlstra wrote: > On Fri, Jul 17, 2015 at 10:32:21AM +0100, Will Deacon wrote: > > @@ -158,9 +140,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, > > unsigned long flags) > > > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > > { > > - SYNC_IO; > > - __asm__ __volatile__("# arch_spin_unlock\n\t" > > - PPC_RELEASE_BARRIER: : :"memory"); > > + smp_mb(); > > lock->slock = 0; > > } > > Should we then also make smp_store_release() use sync instead of lwsync > to keep it consistent? Unless smp_store_release() needs to interact with MMIO accesses, it should still be able to be lwsync. This means that unlock-lock is a full barrier, but release-acquire is not necessarily, which should be just fine. Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Fri, Jul 17, 2015 at 10:32:21AM +0100, Will Deacon wrote: > @@ -158,9 +140,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned > long flags) > > static inline void arch_spin_unlock(arch_spinlock_t *lock) > { > - SYNC_IO; > - __asm__ __volatile__("# arch_spin_unlock\n\t" > - PPC_RELEASE_BARRIER: : :"memory"); > + smp_mb(); > lock->slock = 0; > } Should we then also make smp_store_release() use sync instead of lwsync to keep it consistent?
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Jul 16, 2015 at 11:54:25PM +0100, Benjamin Herrenschmidt wrote: > On Thu, 2015-07-16 at 08:11 -0700, Paul E. McKenney wrote: > > > So isync in lock in architecturally incorrect, despite being what > > the > > > architecture recommends using, yay ! > > > > Well, the architecture isn't expecting that crazies like myself would > > want to have an unlock-lock provide ordering to some CPU not holding > > the lock. :-/ > > Yes, isync in lock effectively allows any load or store before the lock > to leak into the lock and get re-ordered with things in there. > > lwsync leaves us exposed to the re-order inside the LL/SC of a > subsequent load. > > So to get the full barrier semantic, the only option is a full sync, > either in lock or unlock. Instinctively I prefer in lock but there's > an argument to have it in unlock so we can get rid of the iosync > business. Yeah, removing the iosync logic kills mmiowb() as well (totally untested diff below and I was too scared to touch that paca thing!). It also sticks the full barrier in one place, since there's no try_unlock to worry about. 
Will --->8 diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index a8d2ef30d473..bb34f3bb66dc 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -105,12 +105,6 @@ extern bool isa_io_special; * */ -#ifdef CONFIG_PPC64 -#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0) -#else -#define IO_SET_SYNC_FLAG() -#endif - /* gcc 4.0 and older doesn't have 'Z' constraint */ #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0) #define DEF_MMIO_IN_X(name, size, insn)\ @@ -127,7 +121,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,0,%2" \ : "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #else /* newer gcc */ #define DEF_MMIO_IN_X(name, size, insn)\ @@ -144,7 +137,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ : "=Z" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } #endif @@ -162,7 +154,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \ : "=m" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ } DEF_MMIO_IN_D(in_8, 8, lbz); @@ -626,23 +617,7 @@ static inline void name at \ #define writel_relaxed(v, addr)writel(v, addr) #define writeq_relaxed(v, addr)writeq(v, addr) -#ifdef CONFIG_PPC32 #define mmiowb() -#else -/* - * Enforce synchronisation of stores vs. 
spin_unlock - * (this does it explicitly, though our implementation of spin_unlock - * does it implicitely too) - */ -static inline void mmiowb(void) -{ - unsigned long tmp; - - __asm__ __volatile__("sync; li %0,0; stb %0,%1(13)" - : "=&r" (tmp) : "i" (offsetof(struct paca_struct, io_sync)) - : "memory"); -} -#endif /* !CONFIG_PPC32 */ static inline void iosync(void) { diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 4dbe072eecbe..9da89ea4ff31 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -28,8 +28,6 @@ #include #include -#define smp_mb__after_unlock_lock()smp_mb() /* Full ordering for lock. */ - #ifdef CONFIG_PPC64 /* use 0x80yy when locked, where yy == CPU number */ #ifdef __BIG_ENDIAN__ @@ -41,19 +39,6 @@ #define LOCK_TOKEN 1 #endif -#if defined(CONFIG_PPC64) && defined(CONFIG_SMP) -#define CLEAR_IO_SYNC (get_paca()->io_sync = 0) -#define SYNC_IOdo { \ - if (unlikely(get_paca()->io_sync)) {\ - mb(); \ - get_paca()->io_sync = 0;\ - } \ - } while (0) -#else -#define CLEAR_IO_SYNC -#define SYNC_IO -#endif - static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return lock.slock == 0; @@ -91,7 +76,6 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock) static inline
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, 2015-07-16 at 08:11 -0700, Paul E. McKenney wrote: > > So isync in lock in architecturally incorrect, despite being what > the > > architecture recommends using, yay ! > > Well, the architecture isn't expecting that crazies like myself would > want to have an unlock-lock provide ordering to some CPU not holding > the lock. :-/ Yes, isync in lock effectively allows any load or store before the lock to leak into the lock and get re-ordered with things in there. lwsync leaves us exposed to the re-order inside the LL/SC of a subsequent load. So to get the full barrier semantic, the only option is a full sync, either in lock or unlock. Instinctively I prefer in lock but there's an argument to have it in unlock so we can get rid of the iosync business. Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote: > On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote: > > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote: > > > That would fix the problem with smp_mb__after_unlock_lock(), but not > > > the original worry we had about loads happening before the SC in lock. > > > > However I think isync fixes *that* :-) The problem with isync is as you > > said, it's not a -memory- barrier per-se, it's an execution barrier / > > context synchronizing instruction. The combination stwcx. + bne + isync > > however prevents the execution of anything past the isync until the > > stwcx has completed and the bne has been "decided", which prevents loads > > from leaking into the LL/SC loop. It will also prevent a store in the > > lock from being issued before the stwcx. has completed. It does *not* > > prevent as far as I can tell another unrelated store before the lock > > from leaking into the lock, including the one used to unlock a different > > lock. > > Except that the architecture says: > > << > Because a Store Conditional instruction may com- > plete before its store has been performed, a condi- > tional Branch instruction that depends on the CR0 > value set by a Store Conditional instruction does > not order the Store Conditional's store with respect > to storage accesses caused by instructions that > follow the Branch > >> > > So isync in lock in architecturally incorrect, despite being what the > architecture recommends using, yay ! Well, the architecture isn't expecting that crazies like myself would want to have an unlock-lock provide ordering to some CPU not holding the lock. :-/ Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > the original worry we had about loads happening before the SC in lock.
>
> However I think isync fixes *that* :-) The problem with isync is, as you
> said, that it's not a -memory- barrier per se, it's an execution barrier /
> context-synchronizing instruction. The combination stwcx. + bne + isync
> however prevents the execution of anything past the isync until the
> stwcx. has completed and the bne has been "decided", which prevents loads
> from leaking into the LL/SC loop. It will also prevent a store in the
> lock from being issued before the stwcx. has completed. It does *not*
> prevent, as far as I can tell, another unrelated store before the lock
> from leaking into the lock, including the one used to unlock a different
> lock.

Except that the architecture says:

<<
Because a Store Conditional instruction may complete before its store
has been performed, a conditional Branch instruction that depends on
the CR0 value set by a Store Conditional instruction does not order
the Store Conditional's store with respect to storage accesses caused
by instructions that follow the Branch
>>

So isync in lock is architecturally incorrect, despite being what the
architecture recommends using, yay!

Ben.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> That would fix the problem with smp_mb__after_unlock_lock(), but not
> the original worry we had about loads happening before the SC in lock.

However I think isync fixes *that* :-) The problem with isync is, as you
said, that it's not a -memory- barrier per se, it's an execution barrier /
context-synchronizing instruction. The combination stwcx. + bne + isync
however prevents the execution of anything past the isync until the
stwcx. has completed and the bne has been "decided", which prevents loads
from leaking into the LL/SC loop. It will also prevent a store in the
lock from being issued before the stwcx. has completed. It does *not*
prevent, as far as I can tell, another unrelated store before the lock
from leaking into the lock, including the one used to unlock a different
lock.

Cheers, Ben.
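The stwcx. + bne + isync sequence Ben describes is, roughly, the powerpc spinlock acquire fast path. A simplified sketch for readers following along (register choices and the lock token are illustrative, not the exact kernel asm):

```asm
        li      r10,1           # lock "token" (the kernel uses a per-cpu
                                # token; 1 is enough for this sketch)
1:      lwarx   r9,0,r3         # LL: load the lock word, set reservation
        cmpwi   r9,0            # already held?
        bne-    1b              # yes -> keep spinning
        stwcx.  r10,0,r3        # SC: try to claim the lock
        bne-    1b              # reservation lost -> retry
        isync                   # PPC_ACQUIRE_BARRIER in the source, but
                                # runtime-patched to lwsync on most modern
                                # CPUs, as Michael explains below
```

The thread's worry sits at the last two instructions: the bne + isync pair stops later instructions from executing before the stwcx. is decided, but the architecture text Ben quotes says it does not order the stwcx.'s *store* against subsequent storage accesses.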
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, 2015-07-15 at 11:44 +0100, Will Deacon wrote:
> Hi Michael,
>
> On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
> > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > > This didn't go anywhere last time I posted it, but here it is again.
> > > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > > to whether this change requires them to add an additional barrier in
> > > > arch_spin_unlock and what the cost of that would be.
> > >
> > > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > > barrier. As it is, we *almost* have a full barrier semantic, but not
> > > quite, as in things can get mixed up inside spin_lock between the LL and
> > > the SC (things leaking in past LL and things leaking "out" up before SC
> > > and then getting mixed up in there).
> > >
> > > Michael, at some point you were experimenting a bit with that and tried
> > > to get some perf numbers of the impact that would have, did that
> > > solidify ? Otherwise, I'll have a look when I'm back next week.
> >
> > I was mainly experimenting with replacing the lwsync in lock with an isync.
> >
> > But I think you're talking about making it a full sync in lock.
> >
> > That was about +7% on p8, +25% on p7 and +88% on p6.
>
> Ok, so that's overhead incurred by moving from isync -> lwsync? The numbers
> look quite large...

Sorry no. Currently we use lwsync in lock. You'll see isync in the code
(PPC_ACQUIRE_BARRIER), but on most modern CPUs that is patched at runtime
to lwsync.

I benchmarked lwsync (current), isync (proposed at the time) and sync
(just for comparison). The numbers above are going from lwsync -> sync,
as I thought that was what Ben was talking about.

Going from lwsync to isync was actually a small speedup on power8, which
was surprising.

> > We got stuck deciding whether isync was safe to use as a memory barrier,
> > because the wording in the arch is a bit vague.
> >
> > But if we're talking about a full sync then I think there is no question
> > that's OK and we should just do it.
>
> Is this because there's a small overhead from lwsync -> sync? Just want to
> make sure I follow your reasoning.

No I mean that sync is clearly a memory barrier. The issue with switching
to isync in lock was that it's not a memory barrier per se, so we were
not 100% confident in it.

> If you went the way of sync in unlock, could you remove the conditional
> SYNC_IO stuff?

Yeah we could, it's just a conditional sync in unlock when mmio has been
done.

That would fix the problem with smp_mb__after_unlock_lock(), but not the
original worry we had about loads happening before the SC in lock.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, 2015-07-15 at 07:18 -0700, Paul E. McKenney wrote:
> On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
> > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > > Michael, at some point you were experimenting a bit with that and tried
> > > to get some perf numbers of the impact that would have, did that
> > > solidify ? Otherwise, I'll have a look when I'm back next week.
> >
> > I was mainly experimenting with replacing the lwsync in lock with an isync.
> >
> > But I think you're talking about making it a full sync in lock.
> >
> > That was about +7% on p8, +25% on p7 and +88% on p6.
>
> Just for completeness, what were you running as benchmark? ;-)

Heh sorry :) That was my lockcomparison benchmark, based on Anton's
original. It just does 100,000,000 lock/unlocks for each type in
userspace:

https://github.com/mpe/lockcomparison/blob/master/lock_comparison.c

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote: > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote: > > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote: > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > into a full memory barrier. > > > > > > However: > > > > > > - This ordering guarantee is already provided without the barrier on > > > all architectures apart from PowerPC > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > RELEASE + ACQUIRE operations > > > > > > - Locks are generally assumed to offer SC ordering semantics, so > > > having this additional barrier is error-prone and complicates the > > > callers of LOCK/UNLOCK primitives > > > > > > - The barrier is not well used outside of RCU and, because it was > > > retrofitted into the kernel, it's not clear whether other areas of > > > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > > > barrier > > > > > > This patch removes the barrier and instead requires architectures to > > > provide full barrier semantics for an UNLOCK + LOCK sequence. > > > > > > Cc: Benjamin Herrenschmidt > > > Cc: Paul McKenney > > > Cc: Peter Zijlstra > > > Signed-off-by: Will Deacon > > > --- > > > > > > This didn't go anywhere last time I posted it, but here it is again. > > > I'd really appreciate some feedback from the PowerPC guys, especially as > > > to whether this change requires them to add an additional barrier in > > > arch_spin_unlock and what the cost of that would be. > > > > We'd have to turn the lwsync in unlock or the isync in lock into a full > > barrier. As it is, we *almost* have a full barrier semantic, but not > > quite, as in things can get mixed up inside spin_lock between the LL and > > the SC (things leaking in past LL and things leaking "out" up before SC > > and then getting mixed up in there). 
> > > > Michael, at some point you were experimenting a bit with that and tried > > to get some perf numbers of the impact that would have, did that > > solidify ? Otherwise, I'll have a look when I'm back next week. > > I was mainly experimenting with replacing the lwsync in lock with an isync. > > But I think you're talking about making it a full sync in lock. > > That was about +7% on p8, +25% on p7 and +88% on p6. Just for completeness, what were you running as benchmark? ;-) Thanx, Paul > We got stuck deciding whether isync was safe to use as a memory barrier, > because the wording in the arch is a bit vague. > > But if we're talking about a full sync then I think there is no question > that's > OK and we should just do it. > > cheers > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote: > Hi Paul, > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > > Given that RCU is currently the only user of this barrier, how > > > > > > > > would you > > > > > > > > feel about making the barrier local to RCU and not part of the > > > > > > > > general > > > > > > > > memory-barrier API? > > > > > > > > > > > > > > In theory, no objection. Your thought is to leave the > > > > > > > definitions where > > > > > > > they are, mark them as being used only by RCU, and removing > > > > > > > mention from > > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file > > > > > > with an > > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have > > > > > > a big > > > > > > comment describing the semantics, or put that in an RCU > > > > > > Documentation file > > > > > > instead of memory-barriers.txt. > > > > > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > > > barrier, > > > > > > because they'd need to send patches to either add something > > > > > > equivalent > > > > > > or move the definition out again. > > > > > > > > > > My concern with this approach is that someone putting together a new > > > > > architecture might miss this. That said, this approach certainly > > > > > would > > > > > work for the current architectures. 
> > > > > > > > I don't think they're any more likely to miss it than with the current > > > > situation where the generic code defines the macro as a NOP unless you > > > > explicitly override it. > > > > > > Fair enough... > > > > Like this? > > Precisely! Thanks for cooking the patch -- this lays all my worries to > rest, so: > > Acked-by: Will Deacon Thank you! > We should continue the discussion with Ben and Michael about whether or > not the PowerPC locking code can be strengthened, though (making this > barrier a NOP on all currently supported archs). Indeed -- if it becomes a NOP on all supported architectures, we might want to consider just removing it completely. Thanx, Paul > Will > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > > Author: Paul E. McKenney > > Date: Tue Jul 14 18:35:23 2015 -0700 > > > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > > likely the only thing that ever will use it, so this commit makes this > > macro private to RCU. > > > > Signed-off-by: Paul E. McKenney > > Cc: Will Deacon > > Cc: Peter Zijlstra > > Cc: Benjamin Herrenschmidt > > Cc: "linux-a...@vger.kernel.org" > > > > diff --git a/Documentation/memory-barriers.txt > > b/Documentation/memory-barriers.txt > > index 318523872db5..eafa6a53f72c 100644 > > --- a/Documentation/memory-barriers.txt > > +++ b/Documentation/memory-barriers.txt > > @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only > > from the perspective of > > another CPU not holding that lock. In short, a ACQUIRE followed by an > > RELEASE may -not- be assumed to be a full memory barrier. > > > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > > -pair to produce a full barrier, the ACQUIRE can be followed by an > > -smp_mb__after_unlock_lock() invocation. 
This will produce a full barrier > > -(including transitivity) if either (a) the RELEASE and the ACQUIRE are > > -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on > > -the same variable. The smp_mb__after_unlock_lock() primitive is free > > -on many architectures. Without smp_mb__after_unlock_lock(), the CPU's > > -execution of the critical sections corresponding to the RELEASE and the > > -ACQUIRE can cross, so that: > > +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does > > +not imply a full memory barrier. Therefore, the CPU's execution of the > > +critical sections corresponding to the RELEASE and the ACQUIRE can cross, > > +so that: > > > > *A = a; > > RELEASE M > > @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding > > the deadlock. > > a sleep-unlock race, but the locking primitive needs to resolve > > such races properly in any case. > > > > -With
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Paul, On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote: > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > > Given that RCU is currently the only user of this barrier, how > > > > > > > would you > > > > > > > feel about making the barrier local to RCU and not part of the > > > > > > > general > > > > > > > memory-barrier API? > > > > > > > > > > > > In theory, no objection. Your thought is to leave the definitions > > > > > > where > > > > > > they are, mark them as being used only by RCU, and removing mention > > > > > > from > > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > > > Actually, I was thinking of defining them in an RCU header file with > > > > > an > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a > > > > > big > > > > > comment describing the semantics, or put that in an RCU Documentation > > > > > file > > > > > instead of memory-barriers.txt. > > > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > > barrier, > > > > > because they'd need to send patches to either add something equivalent > > > > > or move the definition out again. > > > > > > > > My concern with this approach is that someone putting together a new > > > > architecture might miss this. That said, this approach certainly would > > > > work for the current architectures. > > > > > > I don't think they're any more likely to miss it than with the current > > > situation where the generic code defines the macro as a NOP unless you > > > explicitly override it. > > > > Fair enough... 
> > Like this? Precisely! Thanks for cooking the patch -- this lays all my worries to rest, so: Acked-by: Will Deacon We should continue the discussion with Ben and Michael about whether or not the PowerPC locking code can be strengthened, though (making this barrier a NOP on all currently supported archs). Will > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 > Author: Paul E. McKenney > Date: Tue Jul 14 18:35:23 2015 -0700 > > rcu,locking: Privatize smp_mb__after_unlock_lock() > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is > likely the only thing that ever will use it, so this commit makes this > macro private to RCU. > > Signed-off-by: Paul E. McKenney > Cc: Will Deacon > Cc: Peter Zijlstra > Cc: Benjamin Herrenschmidt > Cc: "linux-a...@vger.kernel.org" > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index 318523872db5..eafa6a53f72c 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from > the perspective of > another CPU not holding that lock. In short, a ACQUIRE followed by an > RELEASE may -not- be assumed to be a full memory barrier. > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > -pair to produce a full barrier, the ACQUIRE can be followed by an > -smp_mb__after_unlock_lock() invocation. This will produce a full barrier > -(including transitivity) if either (a) the RELEASE and the ACQUIRE are > -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on > -the same variable. The smp_mb__after_unlock_lock() primitive is free > -on many architectures. 
Without smp_mb__after_unlock_lock(), the CPU's > -execution of the critical sections corresponding to the RELEASE and the > -ACQUIRE can cross, so that: > +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does > +not imply a full memory barrier. Therefore, the CPU's execution of the > +critical sections corresponding to the RELEASE and the ACQUIRE can cross, > +so that: > > *A = a; > RELEASE M > @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding > the deadlock. > a sleep-unlock race, but the locking primitive needs to resolve > such races properly in any case. > > -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. > -For example, with the following code, the store to *A will always be > -seen by other CPUs before the store to *B: > - > - *A = a; > - RELEASE M > - ACQUIRE N > - smp_mb__after_unlock_lock(); > - *B = b; > - > -The operations will always occur in one of the following orders: > - > - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B > - STORE *A, ACQUIRE, RELEASE,
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Michael,

On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
> On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > This didn't go anywhere last time I posted it, but here it is again.
> > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > to whether this change requires them to add an additional barrier in
> > > arch_spin_unlock and what the cost of that would be.
> >
> > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > barrier. As it is, we *almost* have a full barrier semantic, but not
> > quite, as in things can get mixed up inside spin_lock between the LL and
> > the SC (things leaking in past LL and things leaking "out" up before SC
> > and then getting mixed up in there).
> >
> > Michael, at some point you were experimenting a bit with that and tried
> > to get some perf numbers of the impact that would have, did that
> > solidify ? Otherwise, I'll have a look when I'm back next week.
>
> I was mainly experimenting with replacing the lwsync in lock with an isync.
>
> But I think you're talking about making it a full sync in lock.
>
> That was about +7% on p8, +25% on p7 and +88% on p6.

Ok, so that's overhead incurred by moving from isync -> lwsync? The
numbers look quite large...

> We got stuck deciding whether isync was safe to use as a memory barrier,
> because the wording in the arch is a bit vague.
>
> But if we're talking about a full sync then I think there is no question
> that's OK and we should just do it.

Is this because there's a small overhead from lwsync -> sync? Just want
to make sure I follow your reasoning.

If you went the way of sync in unlock, could you remove the conditional
SYNC_IO stuff?

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote: > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > into a full memory barrier. > > > > However: > > > > - This ordering guarantee is already provided without the barrier on > > all architectures apart from PowerPC > > > > - The barrier only applies to UNLOCK + LOCK, not general > > RELEASE + ACQUIRE operations > > > > - Locks are generally assumed to offer SC ordering semantics, so > > having this additional barrier is error-prone and complicates the > > callers of LOCK/UNLOCK primitives > > > > - The barrier is not well used outside of RCU and, because it was > > retrofitted into the kernel, it's not clear whether other areas of > > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > > barrier > > > > This patch removes the barrier and instead requires architectures to > > provide full barrier semantics for an UNLOCK + LOCK sequence. > > > > Cc: Benjamin Herrenschmidt > > Cc: Paul McKenney > > Cc: Peter Zijlstra > > Signed-off-by: Will Deacon > > --- > > > > This didn't go anywhere last time I posted it, but here it is again. > > I'd really appreciate some feedback from the PowerPC guys, especially as > > to whether this change requires them to add an additional barrier in > > arch_spin_unlock and what the cost of that would be. > > We'd have to turn the lwsync in unlock or the isync in lock into a full > barrier. As it is, we *almost* have a full barrier semantic, but not > quite, as in things can get mixed up inside spin_lock between the LL and > the SC (things leaking in past LL and things leaking "out" up before SC > and then getting mixed up in there). > > Michael, at some point you were experimenting a bit with that and tried > to get some perf numbers of the impact that would have, did that > solidify ? Otherwise, I'll have a look when I'm back next week. 
I was mainly experimenting with replacing the lwsync in lock with an isync.

But I think you're talking about making it a full sync in lock.

That was about +7% on p8, +25% on p7 and +88% on p6.

We got stuck deciding whether isync was safe to use as a memory barrier,
because the wording in the arch is a bit vague.

But if we're talking about a full sync then I think there is no question
that's OK and we should just do it.

cheers
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote: > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > > Given that RCU is currently the only user of this barrier, how > > > > > > would you > > > > > > feel about making the barrier local to RCU and not part of the > > > > > > general > > > > > > memory-barrier API? > > > > > > > > > > In theory, no objection. Your thought is to leave the definitions > > > > > where > > > > > they are, mark them as being used only by RCU, and removing mention > > > > > from > > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > > > Actually, I was thinking of defining them in an RCU header file with an > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a > > > > big > > > > comment describing the semantics, or put that in an RCU Documentation > > > > file > > > > instead of memory-barriers.txt. > > > > > > > > That *should* then mean we notice anybody else trying to use the > > > > barrier, > > > > because they'd need to send patches to either add something equivalent > > > > or move the definition out again. > > > > > > My concern with this approach is that someone putting together a new > > > architecture might miss this. That said, this approach certainly would > > > work for the current architectures. > > > > I don't think they're any more likely to miss it than with the current > > situation where the generic code defines the macro as a NOP unless you > > explicitly override it. > > Fair enough... Like this? Thanx, Paul commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771 Author: Paul E. 
McKenney Date: Tue Jul 14 18:35:23 2015 -0700 rcu,locking: Privatize smp_mb__after_unlock_lock() RCU is the only thing that uses smp_mb__after_unlock_lock(), and is likely the only thing that ever will use it, so this commit makes this macro private to RCU. Signed-off-by: Paul E. McKenney Cc: Will Deacon Cc: Peter Zijlstra Cc: Benjamin Herrenschmidt Cc: "linux-a...@vger.kernel.org" diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 318523872db5..eafa6a53f72c 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of another CPU not holding that lock. In short, a ACQUIRE followed by an RELEASE may -not- be assumed to be a full memory barrier. -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE -pair to produce a full barrier, the ACQUIRE can be followed by an -smp_mb__after_unlock_lock() invocation. This will produce a full barrier -(including transitivity) if either (a) the RELEASE and the ACQUIRE are -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on -the same variable. The smp_mb__after_unlock_lock() primitive is free -on many architectures. Without smp_mb__after_unlock_lock(), the CPU's -execution of the critical sections corresponding to the RELEASE and the -ACQUIRE can cross, so that: +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does +not imply a full memory barrier. Therefore, the CPU's execution of the +critical sections corresponding to the RELEASE and the ACQUIRE can cross, +so that: *A = a; RELEASE M @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock. a sleep-unlock race, but the locking primitive needs to resolve such races properly in any case. -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. 
-For example, with the following code, the store to *A will always be -seen by other CPUs before the store to *B: - - *A = a; - RELEASE M - ACQUIRE N - smp_mb__after_unlock_lock(); - *B = b; - -The operations will always occur in one of the following orders: - - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B - STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B - ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B - -If the RELEASE and ACQUIRE were instead both operating on the same lock -variable, only the first of these alternatives can occur. In addition, -the more strongly ordered systems may rule out some of the above orders. -But in any case, as noted earlier, the smp_mb__after_unlock_lock() -ensures that the store to *A w
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote: > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > > > Given that RCU is currently the only user of this barrier, how would > > > > > you > > > > > feel about making the barrier local to RCU and not part of the general > > > > > memory-barrier API? > > > > > > > > In theory, no objection. Your thought is to leave the definitions where > > > > they are, mark them as being used only by RCU, and removing mention from > > > > memory-barriers.txt? Or did you have something else in mind? > > > > > > Actually, I was thinking of defining them in an RCU header file with an > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a big > > > comment describing the semantics, or put that in an RCU Documentation file > > > instead of memory-barriers.txt. > > > > > > That *should* then mean we notice anybody else trying to use the barrier, > > > because they'd need to send patches to either add something equivalent > > > or move the definition out again. > > > > My concern with this approach is that someone putting together a new > > architecture might miss this. That said, this approach certainly would > > work for the current architectures. > > I don't think they're any more likely to miss it than with the current > situation where the generic code defines the macro as a NOP unless you > explicitly override it. Fair enough... Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > Given that RCU is currently the only user of this barrier, how would
> > > > you feel about making the barrier local to RCU and not part of the
> > > > general memory-barrier API?
> > >
> > > In theory, no objection. Your thought is to leave the definitions where
> > > they are, mark them as being used only by RCU, and removing mention from
> > > memory-barriers.txt? Or did you have something else in mind?
> >
> > Actually, I was thinking of defining them in an RCU header file with an
> > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > comment describing the semantics, or put that in an RCU Documentation file
> > instead of memory-barriers.txt.
> >
> > That *should* then mean we notice anybody else trying to use the barrier,
> > because they'd need to send patches to either add something equivalent
> > or move the definition out again.
>
> My concern with this approach is that someone putting together a new
> architecture might miss this. That said, this approach certainly would
> work for the current architectures.

I don't think they're any more likely to miss it than with the current
situation where the generic code defines the macro as a NOP unless you
explicitly override it.

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote: > Hi Paul, > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > > > Given that RCU is currently the only user of this barrier, how would you > > > feel about making the barrier local to RCU and not part of the general > > > memory-barrier API? > > > > In theory, no objection. Your thought is to leave the definitions where > > they are, mark them as being used only by RCU, and removing mention from > > memory-barriers.txt? Or did you have something else in mind? > > Actually, I was thinking of defining them in an RCU header file with an > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a big > comment describing the semantics, or put that in an RCU Documentation file > instead of memory-barriers.txt. > > That *should* then mean we notice anybody else trying to use the barrier, > because they'd need to send patches to either add something equivalent > or move the definition out again. My concern with this approach is that someone putting together a new architecture might miss this. That said, this approach certainly would work for the current architectures. > > > My main reason for proposing its removal is because I don't want to see > > > it being used (incorrectly) all over the place to order the new RELEASE > > > and ACQUIRE operations I posted separately, at which point we have to try > > > fixing up all the callers or retrofitting some semantics. It doesn't help > > > that memory-barriers.txt lumps things like LOCK and ACQUIRE together, > > > whereas this barrier is currently only intended to be used in conjunction > > > with the former. > > > > Heh! That lumping was considered to be a feature at the time. ;-) > > Oh, I'm sure it was added with good intentions! And we all know which road is paved with good intentions! 
;-) Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
Hi Paul,

On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > Given that RCU is currently the only user of this barrier, how would you
> > feel about making the barrier local to RCU and not part of the general
> > memory-barrier API?
>
> In theory, no objection. Your thought is to leave the definitions where
> they are, mark them as being used only by RCU, and removing mention from
> memory-barriers.txt? Or did you have something else in mind?

Actually, I was thinking of defining them in an RCU header file with an
#ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
comment describing the semantics, or put that in an RCU Documentation file
instead of memory-barriers.txt.

That *should* then mean we notice anybody else trying to use the barrier,
because they'd need to send patches to either add something equivalent
or move the definition out again.

> > My main reason for proposing its removal is because I don't want to see
> > it being used (incorrectly) all over the place to order the new RELEASE
> > and ACQUIRE operations I posted separately, at which point we have to try
> > fixing up all the callers or retrofitting some semantics. It doesn't help
> > that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> > whereas this barrier is currently only intended to be used in conjunction
> > with the former.
>
> Heh! That lumping was considered to be a feature at the time. ;-)

Oh, I'm sure it was added with good intentions!

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote: > On Tue, Jul 14, 2015 at 12:04:06AM +0100, Paul E. McKenney wrote: > > On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote: > > > If we look at the inside of the critical section again -- similar > > > argument as before: > > > > > > *A = a > > > smp_mb() > > > store M > > > load N > > > smp_mb() > > > *B = b > > > > > > A and B are fully ordered, and in this case even transitivity is > > > provided. > > > > > > I'm stating that the order of M and N don't matter, only the > > > load/stores that are inside the acquire/release are constrained. > > > > No argument here. > > > > > IOW, I think smp_mb__after_unlock_lock() already works as advertised > > > with all our acquire/release thingies -- as is stated by the > > > documentation. > > > > > > That said, I'm not aware of anybody but RCU actually using this, so its > > > not used in that capacity. > > > > OK, I might actually understand what you are getting at. And, yes, if > > someone actually comes up with a need to combine smp_mb__after_unlock_lock() > > with something other than locking, we should worry about it at that point. > > And probably rename smp_mb__after_unlock_lock() at that point, as well. > > Until then, why lock ourselves into semantics that no one needs, and > > that it is quite possible that no one will ever need? > > Given that RCU is currently the only user of this barrier, how would you > feel about making the barrier local to RCU and not part of the general > memory-barrier API? In theory, no objection. Your thought is to leave the definitions where they are, mark them as being used only by RCU, and removing mention from memory-barriers.txt? Or did you have something else in mind? 
> My main reason for proposing its removal is because I don't want to see
> it being used (incorrectly) all over the place to order the new RELEASE
> and ACQUIRE operations I posted separately, at which point we have to try
> fixing up all the callers or retrofitting some semantics. It doesn't help
> that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> whereas this barrier is currently only intended to be used in conjunction
> with the former.

Heh! That lumping was considered to be a feature at the time. ;-)

Thanx, Paul
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 11:31:29PM +0100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > This didn't go anywhere last time I posted it, but here it is again.
> > I'd really appreciate some feedback from the PowerPC guys, especially as
> > to whether this change requires them to add an additional barrier in
> > arch_spin_unlock and what the cost of that would be.
>
> We'd have to turn the lwsync in unlock or the isync in lock into a full
> barrier. As it is, we *almost* have a full barrier semantic, but not
> quite, as in things can get mixed up inside spin_lock between the LL and
> the SC (things leaking in past LL and things leaking "out" up before SC
> and then getting mixed up in there).

Thanks, Ben.

> Michael, at some point you were experimenting a bit with that and tried
> to get some perf numbers of the impact that would have, did that
> solidify ? Otherwise, I'll have a look when I'm back next week.

These numbers would be really interesting...

Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:04:06AM +0100, Paul E. McKenney wrote: > On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote: > > If we look at the inside of the critical section again -- similar > > argument as before: > > > > *A = a > > smp_mb() > > store M > > load N > > smp_mb() > > *B = b > > > > A and B are fully ordered, and in this case even transitivity is > > provided. > > > > I'm stating that the order of M and N don't matter, only the > > load/stores that are inside the acquire/release are constrained. > > No argument here. > > > IOW, I think smp_mb__after_unlock_lock() already works as advertised > > with all our acquire/release thingies -- as is stated by the > > documentation. > > > > That said, I'm not aware of anybody but RCU actually using this, so its > > not used in that capacity. > > OK, I might actually understand what you are getting at. And, yes, if > someone actually comes up with a need to combine smp_mb__after_unlock_lock() > with something other than locking, we should worry about it at that point. > And probably rename smp_mb__after_unlock_lock() at that point, as well. > Until then, why lock ourselves into semantics that no one needs, and > that it is quite possible that no one will ever need? Given that RCU is currently the only user of this barrier, how would you feel about making the barrier local to RCU and not part of the general memory-barrier API? My main reason for proposing its removal is because I don't want to see it being used (incorrectly) all over the place to order the new RELEASE and ACQUIRE operations I posted separately, at which point we have to try fixing up all the callers or retrofitting some semantics. It doesn't help that memory-barriers.txt lumps things like LOCK and ACQUIRE together, whereas this barrier is currently only intended to be used in conjunction with the former. 
Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 08:43:44AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-07-14 at 00:15 +0200, Peter Zijlstra wrote:
> > > This is instead the sequence that is of concern:
> > >
> > > store a
> > > unlock M
> > > lock N
> > > load b
> >
> > So its late and that table didn't parse, but that should be ordered too.
> > The load of b should not be able to escape the lock N.
> >
> > If only because LWSYNC is a valid RMB and any LOCK implementation must
> > load the lock state to observe it unlocked.
>
> What happens is that the load passes the store conditional, though it
> doesn't pass the load with reserve. However, both store A and unlock M
> being just stores with an lwsync, can pass a load, so they can pass the
> load with reserve. And thus inside the LL/SC loop, our store A has
> passed our load B.

Ah cute.. Thanks, clearly I wasn't awake enough anymore :-)
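[Editor's note] Ben's scenario is easier to follow against a sketch of the PowerPC sequences involved. The pseudo-assembly below is illustrative only; register names are invented and the busy-wait check of the real kernel lock is elided:

```
# spin_unlock(M), sketch:
        lwsync                  # orders "store a" before the store to M,
        stw     r0, 0(rM)       # but lwsync never orders store -> load

# spin_lock(N), sketch:
1:      lwarx   r9, 0, rN       # load-reserve the lock word
        stwcx.  r1, 0, rN       # conditional store to take the lock
        bne-    1b              # retry if the reservation was lost
        isync                   # execution barrier, not a memory barrier
```

Because lwsync does not order stores against later loads, the earlier stores ("store a" and the store unlocking M) can drift down past the lwarx while "load b" has already been satisfied; that is exactly the mixing inside the LL/SC loop that Ben describes.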
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:20:32PM -0700, Paul E. McKenney wrote: > > On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote: > > > > So if I'm following along, smp_mb__after_unlock_lock *does* provide > > > transitivity when used with UNLOCK + LOCK, which is stronger than your > > > example here. > > > > Yes, that is indeed the intent. > > Maybe good to state this explicitly somewhere. Fair enough! Please see patch below. > > > I don't think we want to make the same guarantee for general RELEASE + > > > ACQUIRE, because we'd end up forcing most architectures to implement the > > > expensive macro for a case that currently has no users. > > > > Agreed, smp_mb__after_unlock_lock() makes a limited guarantee. > > I'm still not seeing how the archs that implement load_acquire and > store_release with smp_mb() are a problem. I don't know that I ever claimed such architectures to be a problem. Color me confused. > If we look at the inside of the critical section again -- similar > argument as before: > > *A = a > smp_mb() > store M > load N > smp_mb() > *B = b > > A and B are fully ordered, and in this case even transitivity is > provided. > > I'm stating that the order of M and N don't matter, only the > load/stores that are inside the acquire/release are constrained. No argument here. > IOW, I think smp_mb__after_unlock_lock() already works as advertised > with all our acquire/release thingies -- as is stated by the > documentation. > > That said, I'm not aware of anybody but RCU actually using this, so its > not used in that capacity. OK, I might actually understand what you are getting at. And, yes, if someone actually comes up with a need to combine smp_mb__after_unlock_lock() with something other than locking, we should worry about it at that point. And probably rename smp_mb__after_unlock_lock() at that point, as well. 
Until then, why lock ourselves into semantics that no one needs, and that it is quite possible that no one will ever need? > > > In which case, it boils down to the question of how expensive it would > > > be to implement an SC UNLOCK operation on PowerPC and whether that > > > justifies > > > the existence of a complicated barrier macro that isn't used outside of > > > RCU. > > > > Given that it is either smp_mb() or nothing, I am not seeing the > > "complicated" part... > > The 'complicated' part is that we need think about it; that is we need > to realized and remember that UNLOCK+LOCK is a load-store barrier but > fails to provide transitivity. ... unless you are holding the lock. So in the common case, you do get transitivity. Thanx, Paul commit bae5cf1e2973bb1e8f852abb54f8b1948ffd82e4 Author: Paul E. McKenney Date: Mon Jul 13 15:55:52 2015 -0700 doc: Call out smp_mb__after_unlock_lock() transitivity Although "full barrier" should be interpreted as providing transitivity, it is worth eliminating any possible confusion. This commit therefore adds "(including transitivity)" to eliminate any possible confusion. Reported-by: Peter Zijlstra Signed-off-by: Paul E. McKenney diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 470c07c868e4..318523872db5 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1858,11 +1858,12 @@ Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This will produce a full barrier -if either (a) the RELEASE and the ACQUIRE are executed by the same -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. -The smp_mb__after_unlock_lock() primitive is free on many architectures. 
-Without smp_mb__after_unlock_lock(), the CPU's execution of the critical -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: +(including transitivity) if either (a) the RELEASE and the ACQUIRE are +executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on +the same variable. The smp_mb__after_unlock_lock() primitive is free +on many architectures. Without smp_mb__after_unlock_lock(), the CPU's +execution of the critical sections corresponding to the RELEASE and the +ACQUIRE can cross, so that: *A = a; RELEASE M -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, Jul 14, 2015 at 12:15:03AM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:16:42PM -0700, Paul E. McKenney wrote: > > On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote: > > > > Does that answer the question, or am I missing the point? > > > > > > Yes, it shows that smp_mb__after_unlock_lock() has no purpose, since it > > > is defined only for PowerPC and your test above just showed that for > > > the sequence > > The only purpose is to provide transitivity, but the documentation fails > to explicitly call that out. It does say that it is a full barrier, but I added explicit mention of transitivity. > > > > > > store a > > > UNLOCK M > > > LOCK N > > > store b > > > > > > a and b is always observed as an ordered pair {a,b}. > > > > Not quite. > > > > This is instead the sequence that is of concern: > > > > store a > > unlock M > > lock N > > load b > > So its late and that table didn't parse, but that should be ordered too. > The load of b should not be able to escape the lock N. > > If only because LWSYNC is a valid RMB and any LOCK implementation must > load the lock state to observe it unlocked. If you actually hold a given lock, then yes, you will observe anything previously done while holding that same lock, even if you don't use smp_mb__after_unlock_lock(). The smp_mb__after_unlock_lock() comes into play when code not holding a lock needs to see the ordering. RCU needs this because of the strong ordering that grace periods must provide: regardless of who started or ended the grace period, anything on any CPU preceding a given grace period is fully ordered before anything on any CPU following that same grace period. It is not clear to me that anything else would need such strong ordering. > > > Additionally, the assertion in Documentation/memory_barriers.txt that > > > the sequence above can be reordered as > > > > > > LOCK N > > > store b > > > store a > > > UNLOCK M > > > > > > is not true on any existing arch in Linux. 
> > It was at one time and might be again.
>
> What would be required to make this true? I'm having a hard time seeing
> how things can get reordered like that.

You are right, I failed to merge current and past knowledge. At one time,
Itanium was said to allow things to bleed into lock-based critical
sections. However, we now know that ld,acq and st,rel really do full
ordering.

Compilers might one day do this sort of reordering, but I would guess
that Linux kernel builds would disable this sort of thing. Something
about wanting critical sections to remain small.

Thanx, Paul
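[Editor's note] The strong grace-period ordering Paul describes is a transitivity guarantee. A hypothetical three-CPU litmus sketch (variable and lock names invented for illustration, in the style of the sequences quoted elsewhere in this thread) shows its shape:

```
CPU 0           CPU 1                   CPU 2
-----           -----                   -----
x = 1;          lock(M);                lock(M);
                r1 = x;  /* reads 1 */  smp_mb__after_unlock_lock();
                unlock(M);              r2 = x;
                                        unlock(M);

With the barrier: r1 == 1 && r2 == 0 is forbidden.
```

CPU 2 holds the lock CPU 1 released, so it sees everything done inside CPU 1's critical section in any case; what the full barrier adds is that CPU 2 must also inherit CPU 1's observation of CPU 0's unrelated store, which is the ordering a grace period must provide to CPUs that never touch the lock chain themselves.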
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Tue, 2015-07-14 at 00:15 +0200, Peter Zijlstra wrote:
> > This is instead the sequence that is of concern:
> >
> > store a
> > unlock M
> > lock N
> > load b
>
> So its late and that table didn't parse, but that should be ordered too.
> The load of b should not be able to escape the lock N.
>
> If only because LWSYNC is a valid RMB and any LOCK implementation must
> load the lock state to observe it unlocked.

What happens is that the load passes the store conditional, though it
doesn't pass the load with reserve. However, both store A and unlock M
being just stores with an lwsync, can pass a load, so they can pass the
load with reserve. And thus inside the LL/SC loop, our store A has
passed our load B.

> > > Additionally, the assertion in Documentation/memory_barriers.txt that
> > > the sequence above can be reordered as
> > >
> > > LOCK N
> > > store b
> > > store a
> > > UNLOCK M
> > >
> > > is not true on any existing arch in Linux.
> >
> > It was at one time and might be again.
>
> What would be required to make this true? I'm having a hard time seeing
> how things can get reordered like that.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-07-13 at 17:54 +0200, Peter Zijlstra wrote:
> That said, I don't think this could even happen on PPC because we have
> load_acquire and store_release, this means that:
>
> *A = a
> lwsync
> store_release M
> load_acquire N
> lwsync
> *B = b
>
> And since the store to M is wrapped inside two lwsync there must be
> strong store order, and because the load from N is equally wrapped in
> two lwsyncs there must also be strong load order.
>
> In fact, no store/load can cross from before the first lwsync to after
> the latter and the other way around.
>
> So in that respect it does provide full load-store ordering. What it
> does not provide is order for M and N, nor does it provide transitivity,
> but looking at our documentation I'm not at all sure we guarantee that
> in any case.

The problem is if you have a load after the second lwsync, that load can
go up past the store release. This has caused issues when N or M is what
you are trying to order against. That's why we had to add a sync to
spin_is_locked or similar.

Ben.
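[Editor's note] Ben's hazard can be annotated directly on Peter's sequence. This is an illustrative pseudocode sketch, restating that lwsync orders store->store, load->load and load->store, but never store->load:

```
*A = a
lwsync
store_release M     /* e.g. the lwsync + store of spin_unlock(M) */
load_acquire N
lwsync
r = load X          /* X stands for "N or M is what you are trying
                       to order against", e.g. another lock word */

/* The second lwsync does not order the store to M against the later
 * load of X (store -> load is the one pairing lwsync omits), so the
 * load of X can be satisfied before the store to M becomes visible.
 * Making that ordering visible requires a full sync, as was done for
 * spin_is_locked(). */
```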
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote: > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > into a full memory barrier. > > However: > > - This ordering guarantee is already provided without the barrier on > all architectures apart from PowerPC > > - The barrier only applies to UNLOCK + LOCK, not general > RELEASE + ACQUIRE operations > > - Locks are generally assumed to offer SC ordering semantics, so > having this additional barrier is error-prone and complicates the > callers of LOCK/UNLOCK primitives > > - The barrier is not well used outside of RCU and, because it was > retrofitted into the kernel, it's not clear whether other areas of > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > barrier > > This patch removes the barrier and instead requires architectures to > provide full barrier semantics for an UNLOCK + LOCK sequence. > > Cc: Benjamin Herrenschmidt > Cc: Paul McKenney > Cc: Peter Zijlstra > Signed-off-by: Will Deacon > --- > > This didn't go anywhere last time I posted it, but here it is again. > I'd really appreciate some feedback from the PowerPC guys, especially as > to whether this change requires them to add an additional barrier in > arch_spin_unlock and what the cost of that would be. We'd have to turn the lwsync in unlock or the isync in lock into a full barrier. As it is, we *almost* have a full barrier semantic, but not quite, as in things can get mixed up inside spin_lock between the LL and the SC (things leaking in past LL and things leaking "out" up before SC and then getting mixed up in there). Michael, at some point you were experimenting a bit with that and tried to get some perf numbers of the impact that would have, did that solidify ? Otherwise, I'll have a look when I'm back next week. Ben. 
> Documentation/memory-barriers.txt | 77 > ++--- > arch/powerpc/include/asm/spinlock.h | 2 - > include/linux/spinlock.h| 10 - > kernel/locking/mcs_spinlock.h | 9 - > kernel/rcu/tree.c | 21 +- > kernel/rcu/tree_plugin.h| 11 -- > 6 files changed, 4 insertions(+), 126 deletions(-) > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index 13feb697271f..fff21b632893 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from > the perspective of > another CPU not holding that lock. In short, a ACQUIRE followed by an > RELEASE may -not- be assumed to be a full memory barrier. > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > -pair to produce a full barrier, the ACQUIRE can be followed by an > -smp_mb__after_unlock_lock() invocation. This will produce a full barrier > -if either (a) the RELEASE and the ACQUIRE are executed by the same > -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. > -The smp_mb__after_unlock_lock() primitive is free on many architectures. > -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical > -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: > - > - *A = a; > - RELEASE M > - ACQUIRE N > - *B = b; > - > -could occur as: > - > - ACQUIRE N, STORE *B, STORE *A, RELEASE M > - > -It might appear that this reordering could introduce a deadlock. > -However, this cannot happen because if such a deadlock threatened, > -the RELEASE would simply complete, thereby avoiding the deadlock. > - > - Why does this work? > - > - One key point is that we are only talking about the CPU doing > - the reordering, not the compiler. If the compiler (or, for > - that matter, the developer) switched the operations, deadlock > - -could- occur. 
> - > - But suppose the CPU reordered the operations. In this case, > - the unlock precedes the lock in the assembly code. The CPU > - simply elected to try executing the later lock operation first. > - If there is a deadlock, this lock operation will simply spin (or > - try to sleep, but more on that later). The CPU will eventually > - execute the unlock operation (which preceded the lock operation > - in the assembly code), which will unravel the potential deadlock, > - allowing the lock operation to succeed. > - > - But what if the lock is a sleeplock? In that case, the code will > - try to enter the scheduler, where it will eventually encounter > - a memory barrier, which will force the earlier unlock operation > - to complete, again unraveling the deadlock. There might be > - a sleep-unlock race, but the locking primitive needs to resolve > - such races properly in any case. > - > -With smp_mb__after_unlock_lock()
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 01:20:32PM -0700, Paul E. McKenney wrote: > On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote: > > So if I'm following along, smp_mb__after_unlock_lock *does* provide > > transitivity when used with UNLOCK + LOCK, which is stronger than your > > example here. > > Yes, that is indeed the intent. Maybe good to state this explicitly somewhere. > > I don't think we want to make the same guarantee for general RELEASE + > > ACQUIRE, because we'd end up forcing most architectures to implement the > > expensive macro for a case that currently has no users. > > Agreed, smp_mb__after_unlock_lock() makes a limited guarantee. I'm still not seeing how the archs that implement load_acquire and store_release with smp_mb() are a problem. If we look at the inside of the critical section again -- similar argument as before: *A = a smp_mb() store M load N smp_mb() *B = b A and B are fully ordered, and in this case even transitivity is provided. I'm stating that the order of M and N don't matter, only the load/stores that are inside the acquire/release are constrained. IOW, I think smp_mb__after_unlock_lock() already works as advertised with all our acquire/release thingies -- as is stated by the documentation. That said, I'm not aware of anybody but RCU actually using this, so its not used in that capacity. > > In which case, it boils down to the question of how expensive it would > > be to implement an SC UNLOCK operation on PowerPC and whether that justifies > > the existence of a complicated barrier macro that isn't used outside of > > RCU. > > Given that it is either smp_mb() or nothing, I am not seeing the > "complicated" part... The 'complicated' part is that we need think about it; that is we need to realized and remember that UNLOCK+LOCK is a load-store barrier but fails to provide transitivity. 
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 01:16:42PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote:
> > > Does that answer the question, or am I missing the point?
> >
> > Yes, it shows that smp_mb__after_unlock_lock() has no purpose, since it
> > is defined only for PowerPC and your test above just showed that for
> > the sequence

The only purpose is to provide transitivity, but the documentation fails
to explicitly call that out.

> > store a
> > UNLOCK M
> > LOCK N
> > store b
> >
> > a and b is always observed as an ordered pair {a,b}.
>
> Not quite.
>
> This is instead the sequence that is of concern:
>
> store a
> unlock M
> lock N
> load b

So it's late and that table didn't parse, but that should be ordered too.
The load of b should not be able to escape the lock N.

If only because LWSYNC is a valid RMB and any LOCK implementation must
load the lock state to observe it unlocked.

> > Additionally, the assertion in Documentation/memory-barriers.txt that
> > the sequence above can be reordered as
> >
> > LOCK N
> > store b
> > store a
> > UNLOCK M
> >
> > is not true on any existing arch in Linux.
>
> It was at one time and might be again.

What would be required to make this true? I'm having a hard time seeing
how things can get reordered like that.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 04:54:47PM +0100, Peter Zijlstra wrote: > > However I think we should look at the insides of the critical sections; > > for example (from Documentation/memory-barriers.txt): > > > > " *A = a; > > RELEASE M > > ACQUIRE N > > *B = b; > > > > could occur as: > > > > ACQUIRE N, STORE *B, STORE *A, RELEASE M" > > > > This could not in fact happen, even though we could flip M and N, A and > > B will remain strongly ordered. > > > > That said, I don't think this could even happen on PPC because we have > > load_acquire and store_release, this means that: > > > > *A = a > > lwsync > > store_release M > > load_acquire N > > lwsync > > *B = b > > > > And since the store to M is wrapped inside two lwsync there must be > > strong store order, and because the load from N is equally wrapped in > > two lwsyncs there must also be strong load order. > > > > In fact, no store/load can cross from before the first lwsync to after > > the latter and the other way around. > > > > So in that respect it does provide full load-store ordering. What it > > does not provide is order for M and N, nor does it provide transitivity, > > but looking at our documentation I'm not at all sure we guarantee that > > in any case. > > So if I'm following along, smp_mb__after_unlock_lock *does* provide > transitivity when used with UNLOCK + LOCK, which is stronger than your > example here. Yes, that is indeed the intent. > I don't think we want to make the same guarantee for general RELEASE + > ACQUIRE, because we'd end up forcing most architectures to implement the > expensive macro for a case that currently has no users. Agreed, smp_mb__after_unlock_lock() makes a limited guarantee. > In which case, it boils down to the question of how expensive it would > be to implement an SC UNLOCK operation on PowerPC and whether that justifies > the existence of a complicated barrier macro that isn't used outside of > RCU. 
Given that it is either smp_mb() or nothing, I am not seeing the "complicated" part... Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
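For context, Paul's "either smp_mb() or nothing" refers to how the macro was wired up at the time. Roughly (a sketch paraphrased from the powerpc and generic headers of that era, not verbatim kernel source):

```c
/*
 * Sketch, paraphrased: on powerpc, whose lwsync-based lock/unlock is
 * too weak to make UNLOCK + LOCK a full barrier, the macro is a real
 * barrier (arch/powerpc/include/asm/spinlock.h):
 */
#define smp_mb__after_unlock_lock()	smp_mb()

/*
 * Every other architecture already orders UNLOCK + LOCK fully, so the
 * generic fallback (include/linux/spinlock.h) is a no-op:
 */
#ifndef smp_mb__after_unlock_lock
#define smp_mb__after_unlock_lock()	do { } while (0)
#endif
```

That binary choice is what Paul means: removing the macro costs PowerPC a heavier barrier somewhere, and costs everyone else nothing.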
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote: > On 07/13/2015 02:23 PM, Paul E. McKenney wrote: > > On Mon, Jul 13, 2015 at 05:54:47PM +0200, Peter Zijlstra wrote: > >> On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: > >>> On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > >> smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > >> into a full memory barrier. > >> > >> However: > > > >> - The barrier only applies to UNLOCK + LOCK, not general > >> RELEASE + ACQUIRE operations > > > > No it does too; note that on ppc both acquire and release use lwsync and > > two lwsyncs do not make a sync. > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > barrier on all architectures implementing smp_store_release as smp_mb() + > STORE, otherwise the following isn't ordered: > > RELEASE X > smp_mb__after_unlock_lock() > ACQUIRE Y > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. > >>> > >>> I knew we'd had this conversation before ;) > >>> > >>> > >>> http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net > >> > >> Ha! yes. And I had indeed forgotten about this argument. > >> > >> However I think we should look at the insides of the critical sections; > >> for example (from Documentation/memory-barriers.txt): > >> > >> " *A = a; > >> RELEASE M > >> ACQUIRE N > >> *B = b; > >> > >> could occur as: > >> > >> ACQUIRE N, STORE *B, STORE *A, RELEASE M" > >> > >> This could not in fact happen, even though we could flip M and N, A and > >> B will remain strongly ordered. 
> >> > >> That said, I don't think this could even happen on PPC because we have > >> load_acquire and store_release, this means that: > >> > >>*A = a > >>lwsync > >>store_release M > >>load_acquire N > >>lwsync > > > > Presumably the lwsync instructions are part of the store_release and > > load_acquire? > > > >>*B = b > >> > >> And since the store to M is wrapped inside two lwsync there must be > >> strong store order, and because the load from N is equally wrapped in > >> two lwsyncs there must also be strong load order. > >> > >> In fact, no store/load can cross from before the first lwsync to after > >> the latter and the other way around. > >> > >> So in that respect it does provide full load-store ordering. What it > >> does not provide is order for M and N, nor does it provide transitivity, > >> but looking at our documentation I'm not at all sure we guarantee that > >> in any case. > > > > I have no idea what the other thread is doing, so I put together the > > following litmus test, guessing reverse order, inverse operations, > > and full ordering: > > > > PPC peterz.2015.07.13a > > "" > > { > > 0:r1=1; 0:r2=a; 0:r3=b; 0:r4=m; 0:r5=n; > > 1:r1=1; 1:r2=a; 1:r3=b; 1:r4=m; 1:r5=n; > > } > > P0| P1; > > stw r1,0(r2) | lwz r10,0(r3) ; > > lwsync| sync ; > > stw r1,0(r4) | stw r1,0(r5) ; > > lwz r10,0(r5) | sync ; > > lwsync| lwz r11,0(r4) ; > > stw r1,0(r3) | sync ; > >| lwz r12,0(r2) ; > > exists > > (0:r10=0 /\ 1:r10=1 /\ 1:r11=1 /\ 1:r12=1) > > > > See http://lwn.net/Articles/608550/ and http://lwn.net/Articles/470681/ > > for information on tools that operate on these litmus tests. (Both > > the herd and ppcmem tools agree, as is usually the case.) 
> > > > Of the 16 possible combinations of values loaded, the following seven > > can happen: > > > > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=0; > > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=1; > > 0:r10=0; 1:r10=0; 1:r11=1; 1:r12=1; > > 0:r10=0; 1:r10=1; 1:r11=1; 1:r12=1; > > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=0; > > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=1; > > 0:r10=1; 1:r10=0; 1:r11=1; 1:r12=1; > > > > P0's store to "m" and load from "n" can clearly be misordered, as there > > is nothing to order them. And all four possible outcomes for 0:r10 and > > 1:r11 are seen, as expected. > > > > Given that smp_store_release() is only guaranteed to order against prior > > operations and smp_load_acquire() is only guaranteed to order against > > subsequent operations, P0's load from "n" can be misordered with its > > store to "a", and as expected, all four possible outcomes for 0:r10 and > > 1:r12 are observed. > > > > P0's pairs of stores should all be ordered: > > > > o "a" and "m" -> 1:r11=1 and 1:r12=0 cannot happen, as expected. > > > > o "a" and "b" -> 1:r10=1 and 1:r12=0 cannot happen, as expected. > > > > o "m" and "b" -> 1:r10=1 and 1:r11=0 cannot happen, as expected. > > > > So smp_load_a
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On 07/13/2015 02:23 PM, Paul E. McKenney wrote: > On Mon, Jul 13, 2015 at 05:54:47PM +0200, Peter Zijlstra wrote: >> On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: >>> On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: >> smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence >> into a full memory barrier. >> >> However: > >> - The barrier only applies to UNLOCK + LOCK, not general >> RELEASE + ACQUIRE operations > > No it does too; note that on ppc both acquire and release use lwsync and > two lwsyncs do not make a sync. Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full barrier on all architectures implementing smp_store_release as smp_mb() + STORE, otherwise the following isn't ordered: RELEASE X smp_mb__after_unlock_lock() ACQUIRE Y On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. >>> >>> I knew we'd had this conversation before ;) >>> >>> >>> http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net >> >> Ha! yes. And I had indeed forgotten about this argument. >> >> However I think we should look at the insides of the critical sections; >> for example (from Documentation/memory-barriers.txt): >> >> " *A = a; >> RELEASE M >> ACQUIRE N >> *B = b; >> >> could occur as: >> >> ACQUIRE N, STORE *B, STORE *A, RELEASE M" >> >> This could not in fact happen, even though we could flip M and N, A and >> B will remain strongly ordered. >> >> That said, I don't think this could even happen on PPC because we have >> load_acquire and store_release, this means that: >> >> *A = a >> lwsync >> store_release M >> load_acquire N >> lwsync > > Presumably the lwsync instructions are part of the store_release and > load_acquire? 
> >> *B = b >> >> And since the store to M is wrapped inside two lwsync there must be >> strong store order, and because the load from N is equally wrapped in >> two lwsyncs there must also be strong load order. >> >> In fact, no store/load can cross from before the first lwsync to after >> the latter and the other way around. >> >> So in that respect it does provide full load-store ordering. What it >> does not provide is order for M and N, nor does it provide transitivity, >> but looking at our documentation I'm not at all sure we guarantee that >> in any case. > > I have no idea what the other thread is doing, so I put together the > following litmus test, guessing reverse order, inverse operations, > and full ordering: > > PPC peterz.2015.07.13a > "" > { > 0:r1=1; 0:r2=a; 0:r3=b; 0:r4=m; 0:r5=n; > 1:r1=1; 1:r2=a; 1:r3=b; 1:r4=m; 1:r5=n; > } >P0| P1; >stw r1,0(r2) | lwz r10,0(r3) ; >lwsync| sync ; >stw r1,0(r4) | stw r1,0(r5) ; >lwz r10,0(r5) | sync ; >lwsync| lwz r11,0(r4) ; >stw r1,0(r3) | sync ; > | lwz r12,0(r2) ; > exists > (0:r10=0 /\ 1:r10=1 /\ 1:r11=1 /\ 1:r12=1) > > See http://lwn.net/Articles/608550/ and http://lwn.net/Articles/470681/ > for information on tools that operate on these litmus tests. (Both > the herd and ppcmem tools agree, as is usually the case.) > > Of the 16 possible combinations of values loaded, the following seven > can happen: > > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=0; > 0:r10=0; 1:r10=0; 1:r11=0; 1:r12=1; > 0:r10=0; 1:r10=0; 1:r11=1; 1:r12=1; > 0:r10=0; 1:r10=1; 1:r11=1; 1:r12=1; > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=0; > 0:r10=1; 1:r10=0; 1:r11=0; 1:r12=1; > 0:r10=1; 1:r10=0; 1:r11=1; 1:r12=1; > > P0's store to "m" and load from "n" can clearly be misordered, as there > is nothing to order them. And all four possible outcomes for 0:r10 and > 1:r11 are seen, as expected. 
> Given that smp_store_release() is only guaranteed to order against prior
> operations and smp_load_acquire() is only guaranteed to order against
> subsequent operations, P0's load from "n" can be misordered with its
> store to "a", and as expected, all four possible outcomes for 0:r10 and
> 1:r12 are observed.
>
> P0's pairs of stores should all be ordered:
>
> o "a" and "m" -> 1:r11=1 and 1:r12=0 cannot happen, as expected.
>
> o "a" and "b" -> 1:r10=1 and 1:r12=0 cannot happen, as expected.
>
> o "m" and "b" -> 1:r10=1 and 1:r11=0 cannot happen, as expected.
>
> So smp_load_acquire() orders against all subsequent operations, but not
> necessarily against any prior ones, and smp_store_release() orders against
> all prior operations but not necessarily against any subsequent ones.
> But additional stray orderings are permitted, as is the case here.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 05:54:47PM +0200, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: > > On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > > > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > > > into a full memory barrier. > > > > > > > > > > However: > > > > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > > > RELEASE + ACQUIRE operations > > > > > > > > No it does too; note that on ppc both acquire and release use lwsync and > > > > two lwsyncs do not make a sync. > > > > > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > > > barrier on all architectures implementing smp_store_release as smp_mb() + > > > STORE, otherwise the following isn't ordered: > > > > > > RELEASE X > > > smp_mb__after_unlock_lock() > > > ACQUIRE Y > > > > > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. > > > > I knew we'd had this conversation before ;) > > > > > > http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net > > Ha! yes. And I had indeed forgotten about this argument. > > However I think we should look at the insides of the critical sections; > for example (from Documentation/memory-barriers.txt): > > " *A = a; > RELEASE M > ACQUIRE N > *B = b; > > could occur as: > > ACQUIRE N, STORE *B, STORE *A, RELEASE M" > > This could not in fact happen, even though we could flip M and N, A and > B will remain strongly ordered. > > That said, I don't think this could even happen on PPC because we have > load_acquire and store_release, this means that: > > *A = a > lwsync > store_release M > load_acquire N > lwsync Presumably the lwsync instructions are part of the store_release and load_acquire? 
> *B = b
>
> And since the store to M is wrapped inside two lwsync there must be
> strong store order, and because the load from N is equally wrapped in
> two lwsyncs there must also be strong load order.
>
> In fact, no store/load can cross from before the first lwsync to after
> the latter and the other way around.
>
> So in that respect it does provide full load-store ordering. What it
> does not provide is order for M and N, nor does it provide transitivity,
> but looking at our documentation I'm not at all sure we guarantee that
> in any case.

I have no idea what the other thread is doing, so I put together the following litmus test, guessing reverse order, inverse operations, and full ordering:

PPC peterz.2015.07.13a
""
{
0:r1=1; 0:r2=a; 0:r3=b; 0:r4=m; 0:r5=n;
1:r1=1; 1:r2=a; 1:r3=b; 1:r4=m; 1:r5=n;
}
 P0            | P1            ;
 stw r1,0(r2)  | lwz r10,0(r3) ;
 lwsync        | sync          ;
 stw r1,0(r4)  | stw r1,0(r5)  ;
 lwz r10,0(r5) | sync          ;
 lwsync        | lwz r11,0(r4) ;
 stw r1,0(r3)  | sync          ;
               | lwz r12,0(r2) ;
exists
(0:r10=0 /\ 1:r10=1 /\ 1:r11=1 /\ 1:r12=1)

See http://lwn.net/Articles/608550/ and http://lwn.net/Articles/470681/ for information on tools that operate on these litmus tests. (Both the herd and ppcmem tools agree, as is usually the case.)

Of the 16 possible combinations of values loaded, the following seven can happen:

0:r10=0; 1:r10=0; 1:r11=0; 1:r12=0;
0:r10=0; 1:r10=0; 1:r11=0; 1:r12=1;
0:r10=0; 1:r10=0; 1:r11=1; 1:r12=1;
0:r10=0; 1:r10=1; 1:r11=1; 1:r12=1;
0:r10=1; 1:r10=0; 1:r11=0; 1:r12=0;
0:r10=1; 1:r10=0; 1:r11=0; 1:r12=1;
0:r10=1; 1:r10=0; 1:r11=1; 1:r12=1;

P0's store to "m" and load from "n" can clearly be misordered, as there is nothing to order them. And all four possible outcomes for 0:r10 and 1:r11 are seen, as expected.
Given that smp_store_release() is only guaranteed to order against prior operations and smp_load_acquire() is only guaranteed to order against subsequent operations, P0's load from "n" can be misordered with its store to "a", and as expected, all four possible outcomes for 0:r10 and 1:r12 are observed.

P0's pairs of stores should all be ordered:

o "a" and "m" -> 1:r11=1 and 1:r12=0 cannot happen, as expected.

o "a" and "b" -> 1:r10=1 and 1:r12=0 cannot happen, as expected.

o "m" and "b" -> 1:r10=1 and 1:r11=0 cannot happen, as expected.

So smp_load_acquire() orders against all subsequent operations, but not necessarily against any prior ones, and smp_store_release() orders against all prior operations but not necessarily against any subsequent ones. But additional stray orderings are permitted, as is the case here. Which is in fact what these operations are defined to do. Does that answer the question?
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 04:54:47PM +0100, Peter Zijlstra wrote: > However I think we should look at the insides of the critical sections; > for example (from Documentation/memory-barriers.txt): > > " *A = a; > RELEASE M > ACQUIRE N > *B = b; > > could occur as: > > ACQUIRE N, STORE *B, STORE *A, RELEASE M" > > This could not in fact happen, even though we could flip M and N, A and > B will remain strongly ordered. > > That said, I don't think this could even happen on PPC because we have > load_acquire and store_release, this means that: > > *A = a > lwsync > store_release M > load_acquire N > lwsync > *B = b > > And since the store to M is wrapped inside two lwsync there must be > strong store order, and because the load from N is equally wrapped in > two lwsyncs there must also be strong load order. > > In fact, no store/load can cross from before the first lwsync to after > the latter and the other way around. > > So in that respect it does provide full load-store ordering. What it > does not provide is order for M and N, nor does it provide transitivity, > but looking at our documentation I'm not at all sure we guarantee that > in any case. So if I'm following along, smp_mb__after_unlock_lock *does* provide transitivity when used with UNLOCK + LOCK, which is stronger than your example here. I don't think we want to make the same guarantee for general RELEASE + ACQUIRE, because we'd end up forcing most architectures to implement the expensive macro for a case that currently has no users. In which case, it boils down to the question of how expensive it would be to implement an SC UNLOCK operation on PowerPC and whether that justifies the existence of a complicated barrier macro that isn't used outside of RCU. Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:24:18PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 02:09:50PM +0100, Peter Hurley wrote: > > On 07/13/2015 08:15 AM, Will Deacon wrote: > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > into a full memory barrier. > > > > > > However: > > > > > > - This ordering guarantee is already provided without the barrier on > > > all architectures apart from PowerPC > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > RELEASE + ACQUIRE operations > > > > I'm unclear what you mean here: do you mean > > A) a memory barrier is not required between RELEASE M + ACQUIRE N when you > > want to maintain distinct order between those operations on all arches > > (with the possible exception of PowerPC), or, > > B) no one is using smp_mb__after_unlock_lock() in that way right now. > > My understanding is (B), but Peter and I don't seem to agree yet! > I'll tighten up the text once we reach a conclusion. I'm fairly sure (but I've not looked) that nobody does in fact rely on this. So I'm in agreement with B, and I'm quibbling on what exactly A means ;-)
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:21:10PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > > into a full memory barrier. > > > > > > > > However: > > > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > > RELEASE + ACQUIRE operations > > > > > > No it does too; note that on ppc both acquire and release use lwsync and > > > two lwsyncs do not make a sync. > > > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > > barrier on all architectures implementing smp_store_release as smp_mb() + > > STORE, otherwise the following isn't ordered: > > > > RELEASE X > > smp_mb__after_unlock_lock() > > ACQUIRE Y > > > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. > > I knew we'd had this conversation before ;) > > > http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net Ha! yes. And I had indeed forgotten about this argument. However I think we should look at the insides of the critical sections; for example (from Documentation/memory-barriers.txt): " *A = a; RELEASE M ACQUIRE N *B = b; could occur as: ACQUIRE N, STORE *B, STORE *A, RELEASE M" This could not in fact happen, even though we could flip M and N, A and B will remain strongly ordered. That said, I don't think this could even happen on PPC because we have load_acquire and store_release, this means that: *A = a lwsync store_release M load_acquire N lwsync *B = b And since the store to M is wrapped inside two lwsync there must be strong store order, and because the load from N is equally wrapped in two lwsyncs there must also be strong load order. In fact, no store/load can cross from before the first lwsync to after the latter and the other way around. 
So in that respect it does provide full load-store ordering. What it does not provide is order for M and N, nor does it provide transitivity, but looking at our documentation I'm not at all sure we guarantee that in any case.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 02:09:50PM +0100, Peter Hurley wrote: > On 07/13/2015 08:15 AM, Will Deacon wrote: > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > into a full memory barrier. > > > > However: > > > > - This ordering guarantee is already provided without the barrier on > > all architectures apart from PowerPC > > > > - The barrier only applies to UNLOCK + LOCK, not general > > RELEASE + ACQUIRE operations > > I'm unclear what you mean here: do you mean > A) a memory barrier is not required between RELEASE M + ACQUIRE N when you > want to maintain distinct order between those operations on all arches > (with the possible exception of PowerPC), or, > B) no one is using smp_mb__after_unlock_lock() in that way right now. My understanding is (B), but Peter and I don't seem to agree yet! I'll tighten up the text once we reach a conclusion. Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 03:09:15PM +0100, Will Deacon wrote: > On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > > into a full memory barrier. > > > > > > However: > > > > > - The barrier only applies to UNLOCK + LOCK, not general > > > RELEASE + ACQUIRE operations > > > > No it does too; note that on ppc both acquire and release use lwsync and > > two lwsyncs do not make a sync. > > Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full > barrier on all architectures implementing smp_store_release as smp_mb() + > STORE, otherwise the following isn't ordered: > > RELEASE X > smp_mb__after_unlock_lock() > ACQUIRE Y > > On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. I knew we'd had this conversation before ;) http://lkml.kernel.org/r/20150120093443.ga11...@twins.programming.kicks-ass.net Will
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 02:11:43PM +0100, Peter Zijlstra wrote: > On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > > into a full memory barrier. > > > > However: > > > - The barrier only applies to UNLOCK + LOCK, not general > > RELEASE + ACQUIRE operations > > No it does too; note that on ppc both acquire and release use lwsync and > two lwsyncs do not make a sync. Really? IIUC, that means smp_mb__after_unlock_lock needs to be a full barrier on all architectures implementing smp_store_release as smp_mb() + STORE, otherwise the following isn't ordered: RELEASE X smp_mb__after_unlock_lock() ACQUIRE Y On 32-bit ARM (at least), the ACQUIRE can be observed before the RELEASE. Will
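A sketch of the smp_mb() + STORE flavour of smp_store_release() that Will refers to (paraphrased from the asm-generic fallback of that era, not verbatim; WRITE_ONCE stands in for the exact store used):

```c
/* Paraphrased generic fallback: the full barrier comes *before* the
 * release store. */
#define smp_store_release(p, v)		\
do {					\
	smp_mb();			\
	WRITE_ONCE(*(p), (v));		\
} while (0)

/*
 * Because the barrier sits before the store, nothing orders the
 * RELEASE store itself against a later ACQUIRE load:
 *
 *	smp_mb();
 *	STORE X		<- RELEASE X
 *	...		<- smp_mb__after_unlock_lock() is a no-op here
 *	LOAD Y		<- ACQUIRE Y, may be satisfied before STORE X
 *
 * which is why Will argues the macro would have to be a full barrier
 * on such architectures too, not just on PowerPC.
 */
```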
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On Mon, Jul 13, 2015 at 01:15:04PM +0100, Will Deacon wrote: > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > into a full memory barrier. > > However: > - The barrier only applies to UNLOCK + LOCK, not general > RELEASE + ACQUIRE operations No it does too; note that on ppc both acquire and release use lwsync and two lwsyncs do not make a sync.
Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
On 07/13/2015 08:15 AM, Will Deacon wrote: > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence > into a full memory barrier. > > However: > > - This ordering guarantee is already provided without the barrier on > all architectures apart from PowerPC > > - The barrier only applies to UNLOCK + LOCK, not general > RELEASE + ACQUIRE operations I'm unclear what you mean here: do you mean A) a memory barrier is not required between RELEASE M + ACQUIRE N when you want to maintain distinct order between those operations on all arches (with the possible exception of PowerPC), or, B) no one is using smp_mb__after_unlock_lock() in that way right now. Regards, Peter Hurley > - Locks are generally assumed to offer SC ordering semantics, so > having this additional barrier is error-prone and complicates the > callers of LOCK/UNLOCK primitives > > - The barrier is not well used outside of RCU and, because it was > retrofitted into the kernel, it's not clear whether other areas of > the kernel are incorrectly relying on UNLOCK + LOCK implying a full > barrier > > This patch removes the barrier and instead requires architectures to > provide full barrier semantics for an UNLOCK + LOCK sequence. > > Cc: Benjamin Herrenschmidt > Cc: Paul McKenney > Cc: Peter Zijlstra > Signed-off-by: Will Deacon > --- > > This didn't go anywhere last time I posted it, but here it is again. > I'd really appreciate some feedback from the PowerPC guys, especially as > to whether this change requires them to add an additional barrier in > arch_spin_unlock and what the cost of that would be. 
> > Documentation/memory-barriers.txt | 77 > ++--- > arch/powerpc/include/asm/spinlock.h | 2 - > include/linux/spinlock.h| 10 - > kernel/locking/mcs_spinlock.h | 9 - > kernel/rcu/tree.c | 21 +- > kernel/rcu/tree_plugin.h| 11 -- > 6 files changed, 4 insertions(+), 126 deletions(-) > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index 13feb697271f..fff21b632893 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from > the perspective of > another CPU not holding that lock. In short, a ACQUIRE followed by an > RELEASE may -not- be assumed to be a full memory barrier. > > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not > -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE > -pair to produce a full barrier, the ACQUIRE can be followed by an > -smp_mb__after_unlock_lock() invocation. This will produce a full barrier > -if either (a) the RELEASE and the ACQUIRE are executed by the same > -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. > -The smp_mb__after_unlock_lock() primitive is free on many architectures. > -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical > -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: > - > - *A = a; > - RELEASE M > - ACQUIRE N > - *B = b; > - > -could occur as: > - > - ACQUIRE N, STORE *B, STORE *A, RELEASE M > - > -It might appear that this reordering could introduce a deadlock. > -However, this cannot happen because if such a deadlock threatened, > -the RELEASE would simply complete, thereby avoiding the deadlock. > - > - Why does this work? > - > - One key point is that we are only talking about the CPU doing > - the reordering, not the compiler. If the compiler (or, for > - that matter, the developer) switched the operations, deadlock > - -could- occur. 
> - > - But suppose the CPU reordered the operations. In this case, > - the unlock precedes the lock in the assembly code. The CPU > - simply elected to try executing the later lock operation first. > - If there is a deadlock, this lock operation will simply spin (or > - try to sleep, but more on that later). The CPU will eventually > - execute the unlock operation (which preceded the lock operation > - in the assembly code), which will unravel the potential deadlock, > - allowing the lock operation to succeed. > - > - But what if the lock is a sleeplock? In that case, the code will > - try to enter the scheduler, where it will eventually encounter > - a memory barrier, which will force the earlier unlock operation > - to complete, again unraveling the deadlock. There might be > - a sleep-unlock race, but the locking primitive needs to resolve > - such races properly in any case. > - > -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. > -For example, with the following code, the store to *A will always be > -seen by other CPUs before the store to *B: > - > - *A = a; > - RELEASE M > -
[RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()
smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence into a full memory barrier. However: - This ordering guarantee is already provided without the barrier on all architectures apart from PowerPC - The barrier only applies to UNLOCK + LOCK, not general RELEASE + ACQUIRE operations - Locks are generally assumed to offer SC ordering semantics, so having this additional barrier is error-prone and complicates the callers of LOCK/UNLOCK primitives - The barrier is not well used outside of RCU and, because it was retrofitted into the kernel, it's not clear whether other areas of the kernel are incorrectly relying on UNLOCK + LOCK implying a full barrier This patch removes the barrier and instead requires architectures to provide full barrier semantics for an UNLOCK + LOCK sequence. Cc: Benjamin Herrenschmidt Cc: Paul McKenney Cc: Peter Zijlstra Signed-off-by: Will Deacon --- This didn't go anywhere last time I posted it, but here it is again. I'd really appreciate some feedback from the PowerPC guys, especially as to whether this change requires them to add an additional barrier in arch_spin_unlock and what the cost of that would be. Documentation/memory-barriers.txt | 77 ++--- arch/powerpc/include/asm/spinlock.h | 2 - include/linux/spinlock.h| 10 - kernel/locking/mcs_spinlock.h | 9 - kernel/rcu/tree.c | 21 +- kernel/rcu/tree_plugin.h| 11 -- 6 files changed, 4 insertions(+), 126 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 13feb697271f..fff21b632893 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from the perspective of another CPU not holding that lock. In short, a ACQUIRE followed by an RELEASE may -not- be assumed to be a full memory barrier. -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not -imply a full memory barrier. 
If it is necessary for a RELEASE-ACQUIRE -pair to produce a full barrier, the ACQUIRE can be followed by an -smp_mb__after_unlock_lock() invocation. This will produce a full barrier -if either (a) the RELEASE and the ACQUIRE are executed by the same -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. -The smp_mb__after_unlock_lock() primitive is free on many architectures. -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical -sections corresponding to the RELEASE and the ACQUIRE can cross, so that: - - *A = a; - RELEASE M - ACQUIRE N - *B = b; - -could occur as: - - ACQUIRE N, STORE *B, STORE *A, RELEASE M - -It might appear that this reordering could introduce a deadlock. -However, this cannot happen because if such a deadlock threatened, -the RELEASE would simply complete, thereby avoiding the deadlock. - - Why does this work? - - One key point is that we are only talking about the CPU doing - the reordering, not the compiler. If the compiler (or, for - that matter, the developer) switched the operations, deadlock - -could- occur. - - But suppose the CPU reordered the operations. In this case, - the unlock precedes the lock in the assembly code. The CPU - simply elected to try executing the later lock operation first. - If there is a deadlock, this lock operation will simply spin (or - try to sleep, but more on that later). The CPU will eventually - execute the unlock operation (which preceded the lock operation - in the assembly code), which will unravel the potential deadlock, - allowing the lock operation to succeed. - - But what if the lock is a sleeplock? In that case, the code will - try to enter the scheduler, where it will eventually encounter - a memory barrier, which will force the earlier unlock operation - to complete, again unraveling the deadlock. There might be - a sleep-unlock race, but the locking primitive needs to resolve - such races properly in any case. 
- -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. -For example, with the following code, the store to *A will always be -seen by other CPUs before the store to *B: - - *A = a; - RELEASE M - ACQUIRE N - smp_mb__after_unlock_lock(); - *B = b; - -The operations will always occur in one of the following orders: - - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B - STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B - ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B - -If the RELEASE and ACQUIRE were instead both operating on the same lock -variable, only the first of these alternatives can occur. In addition, -the more strongly ordered syste