Re: Bug in herd7 [Was: Re: Litmus test for question from Al Viro]

2020-10-05 Thread Luc Maranget
> On Sun, Oct 04, 2020 at 12:16:31AM +0900, Akira Yokosawa wrote:
> > Hi Alan,
> > 
> > Just a minor nit in the litmus test.
> > 
> > On Sat, 3 Oct 2020 09:22:12 -0400, Alan Stern wrote:
> > > To expand on my statement about the LKMM's weakness regarding control 
> > > constructs, here is a litmus test to illustrate the issue.  You might 
> > > want to add this to one of the archives.
> > > 
> > > Alan
> > > 
> > > C crypto-control-data
> > > (*
> > >  * LB plus crypto-control-data plus data
> > >  *
> > >  * Expected result: allowed
> > >  *
> > >  * This is an example of OOTA and we would like it to be forbidden.
> > >  * The WRITE_ONCE in P0 is both data-dependent and (at the hardware level)
> > >  * control-dependent on the preceding READ_ONCE.  But the dependencies are
> > >  * hidden by the form of the conditional control construct, hence the 
> > >  * name "crypto-control-data".  The memory model doesn't recognize them.
> > >  *)
> > > 
> > > {}
> > > 
> > > P0(int *x, int *y)
> > > {
> > >   int r1;
> > > 
> > >   r1 = 1;
> > >   if (READ_ONCE(*x) == 0)
> > >           r1 = 0;
> > >   WRITE_ONCE(*y, r1);
> > > }
> > > 
> > > P1(int *x, int *y)
> > > {
> > >   WRITE_ONCE(*x, READ_ONCE(*y));
> > 
> > Looks like this one-liner doesn't provide data-dependency of y -> x on 
> > herd7.
> 
> You're right.  This is definitely a bug in herd7.
> 
> Luc, were you aware of this?

Hi Alan,

No, I was not aware of it. Now I am, and the bug should now be fixed in the
master branch of the herd git repository.
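
In the meantime, a workaround for older herd7 installations is to make the
dependency explicit through a local variable, which herd7 does track; a
minimal sketch of P1 rewritten that way (the variable name r2 is only
illustrative):

P1(int *x, int *y)
{
        int r2;

        r2 = READ_ONCE(*y);
        WRITE_ONCE(*x, r2);
}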


Thanks for the report.

--Luc


[tip:locking/core] tools/memory-model: Add model support for spin_is_locked()

2018-05-15 Thread tip-bot for Luc Maranget
Commit-ID:  15553dcbca0638de57047e79b9fb4ea77aa04db3
Gitweb: https://git.kernel.org/tip/15553dcbca0638de57047e79b9fb4ea77aa04db3
Author: Luc Maranget <luc.maran...@inria.fr>
AuthorDate: Mon, 14 May 2018 16:33:48 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 15 May 2018 08:11:17 +0200

tools/memory-model: Add model support for spin_is_locked()

This commit first adds a trivial macro for spin_is_locked() to
linux-kernel.def.

It also adds cat code for enumerating all possible matches of lock
write events (set LKW) with islocked events returning true (set RL,
for Read from Lock), and unlock write events (set UL) with islocked
events returning false (set RU, for Read from Unlock).  Note that this
intentionally does not model uniprocessor kernels (CONFIG_SMP=n) built
with CONFIG_DEBUG_SPINLOCK=n, in which spin_is_locked() unconditionally
returns zero.

It also adds a pair of litmus tests demonstrating the minimal ordering
provided by spin_is_locked() in conjunction with spin_lock().  Will Deacon
noted that this minimal ordering happens on ARMv8:
https://lkml.kernel.org/r/20180226162426.gb17...@arm.com

Notice that herd7 installations strictly older than version 7.49
do not handle the new constructs.
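
For reference, once a recent enough herd7 is installed, the new tests can be
run from the tools/memory-model/ directory along the lines documented in its
README, for example:

  herd7 -conf linux-kernel.cfg litmus-tests/MP+polockmbonce+poacquiresilsil.litmus

The reported verdict is expected to match the "Result:" line in each test's
header ("Never" for the test above, "Sometimes" for the -polockonce- variant).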

Signed-off-by: Luc Maranget <luc.maran...@inria.fr>
Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Reviewed-by: Alan Stern <st...@rowland.harvard.edu>
Cc: Akira Yokosawa <aki...@gmail.com>
Cc: Andrea Parri <parri.and...@gmail.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Boqun Feng <boqun.f...@gmail.com>
Cc: David Howells <dhowe...@redhat.com>
Cc: Jade Alglave <j.algl...@ucl.ac.uk>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Luc Maranget <luc.maran...@inria.fr>
Cc: Nicholas Piggin <npig...@gmail.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Will Deacon <will.dea...@arm.com>
Cc: linux-a...@vger.kernel.org
Link: http://lkml.kernel.org/r/1526340837-1-10-git-send-email-paul...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 tools/memory-model/linux-kernel.def|  1 +
 .../MP+polockmbonce+poacquiresilsil.litmus | 35 ++
 .../MP+polockonce+poacquiresilsil.litmus   | 34 ++
 tools/memory-model/litmus-tests/README | 10 
 tools/memory-model/lock.cat| 53 --
 5 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/tools/memory-model/linux-kernel.def b/tools/memory-model/linux-kernel.def
index 6bd3bc431b3d..f0553bd37c08 100644
--- a/tools/memory-model/linux-kernel.def
+++ b/tools/memory-model/linux-kernel.def
@@ -38,6 +38,7 @@ cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
 spin_lock(X) { __lock(X); }
 spin_unlock(X) { __unlock(X); }
 spin_trylock(X) __trylock(X)
+spin_is_locked(X) __islocked(X)
 
 // RCU
 rcu_read_lock() { __fence{rcu-lock}; }
diff --git a/tools/memory-model/litmus-tests/MP+polockmbonce+poacquiresilsil.litmus b/tools/memory-model/litmus-tests/MP+polockmbonce+poacquiresilsil.litmus
new file mode 100644
index ..50f4d62bbf0e
--- /dev/null
+++ b/tools/memory-model/litmus-tests/MP+polockmbonce+poacquiresilsil.litmus
@@ -0,0 +1,35 @@
+C MP+polockmbonce+poacquiresilsil
+
+(*
+ * Result: Never
+ *
+ * Do spinlocks combined with smp_mb__after_spinlock() provide order
+ * to outside observers using spin_is_locked() to sense the lock-held
+ * state, ordered by acquire?  Note that when the first spin_is_locked()
+ * returns false and the second true, we know that the smp_load_acquire()
+ * executed before the lock was acquired (loosely speaking).
+ *)
+
+{
+}
+
+P0(spinlock_t *lo, int *x)
+{
+   spin_lock(lo);
+   smp_mb__after_spinlock();
+   WRITE_ONCE(*x, 1);
+   spin_unlock(lo);
+}
+
+P1(spinlock_t *lo, int *x)
+{
+   int r1;
+   int r2;
+   int r3;
+
+   r1 = smp_load_acquire(x);
+   r2 = spin_is_locked(lo);
+   r3 = spin_is_locked(lo);
+}
+
+exists (1:r1=1 /\ 1:r2=0 /\ 1:r3=1)
diff --git a/tools/memory-model/litmus-tests/MP+polockonce+poacquiresilsil.litmus b/tools/memory-model/litmus-tests/MP+polockonce+poacquiresilsil.litmus
new file mode 100644
index ..abf81e7a0895
--- /dev/null
+++ b/tools/memory-model/litmus-tests/MP+polockonce+poacquiresilsil.litmus
@@ -0,0 +1,34 @@
+C MP+polockonce+poacquiresilsil
+
+(*
+ * Result: Sometimes
+ *
+ * Do spinlocks provide order to outside observers using spin_is_locked()
+ * to sense the lock-held state, ordered by acquire?  Note that when the
+ * first spin_is_locked() returns false and the second true, we know that
+ * the smp_load_acquire() executed before the lock was acquired (loosely
+ * speaking).
+ *)
+
+{
+}
+
+P0(spinlock_t *lo, int *x)
+{
+   spin_lock(lo);
+   WRITE_ONCE(*x, 1);
+   spin_unlock(lo);
+}
+
+P1(spinlock_t *lo, int *x)
+{
+   int r1;
+   int r2;
+   int r3;
+
+   r1 = smp_load_acquire(x);
+   r2 = spin_is_locked(lo);
+   r3 = spin_is_locked(lo);
+}
+
+exists (1:r1=1 /\ 1:r2=0 /\ 1:r3=1)
diff --git a/tools/memory-model/litmus-tests/README b/tools/memory-model/litmus-tests/README
index 04096fb8b8d9..6919909bbd0f 100644
--- a/tools/memory-model/litmus-tests/README
+++ b/tools/memory-model/litmus-tests/README
@@ -63,6 +63,16 @@ LB+poonceonces.litmus
 MP+onceassign+derefonce.litmus

Re: [PATCH v2 2/2] riscv/atomic: Strengthen implementations with fences

2018-03-13 Thread Luc Maranget
> On Fri, Mar 09, 2018 at 04:21:37PM -0800, Daniel Lustig wrote:
> > On 3/9/2018 2:57 PM, Palmer Dabbelt wrote:
> > > On Fri, 09 Mar 2018 13:30:08 PST (-0800), parri.and...@gmail.com wrote:
> > >> On Fri, Mar 09, 2018 at 10:54:27AM -0800, Palmer Dabbelt wrote:
> > >>> On Fri, 09 Mar 2018 10:36:44 PST (-0800), parri.and...@gmail.com wrote:
> > >>
> > >> [...]
> > >>
> > >>> >This proposal relies on the generic definition,
> > >>> >
> > >>> >   include/linux/atomic.h ,
> > >>> >
> > >>> >and on the
> > >>> >
> > >>> >   __atomic_op_acquire()
> > >>> >   __atomic_op_release()
> > >>> >
> > >>> >above to build the acquire/release atomics (except for the 
> > >>> >xchg,cmpxchg,
> > >>> >where the ACQUIRE_BARRIER is inserted conditionally/on success).
> > >>>
> > >>> I thought we wanted to use the AQ and RL bits for AMOs, just not for 
> > >>> LR/SC
> > >>> sequences.  IIRC the AMOs are safe with the current memory model, but I 
> > >>> might
> > >>> just have some version mismatches in my head.
> > >>
> > >> AMO.aqrl are "safe" w.r.t. the LKMM (as they provide "full-ordering"); 
> > >> OTOH,
> > >> AMO.aq and AMO.rl present weaknesses that LKMM (and some kernel 
> > >> developers)
> > >> do not "expect".  I was probing this issue in:
> > >>
> > >>   https://marc.info/?l=linux-kernel=151930201102853=2
> > >>
> > >> (c.f., e.g., test "RISCV-unlock-lock-read-ordering" from that post).
> > >>
> > >> Quoting from the commit message of my patch 1/2:
> > >>
> > >>   "Referring to the "unlock-lock-read-ordering" test reported below,
> > >>    Daniel wrote:
> > >>
> > >>  I think an RCpc interpretation of .aq and .rl would in fact
> > >>  allow the two normal loads in P1 to be reordered [...]
> > >>
> > >>  [...]
> > >>
> > >>  Likewise even if the unlock()/lock() is between two stores.
> > >>  A control dependency might originate from the load part of
> > >>  the amoswap.w.aq, but there still would have to be something
> > >>  to ensure that this load part in fact performs after the store
> > >>  part of the amoswap.w.rl performs globally, and that's not
> > >>  automatic under RCpc.
> > >>
> > >>    Simulation of the RISC-V memory consistency model confirmed this
> > >>    expectation."
> > >>
> > >> I have just (re)checked these observations against the latest 
> > >> specification,
> > >> and my results _confirmed_ these verdicts.
> > > 
> > > Thanks, I must have just gotten confused about a draft spec or something.
> > > I'm pulling these on top of your other memory model related patch.  I've
> > > renamed the branch "next-mm" to be a bit more descriptive.
> > 
> > (Sorry for being out of the loop this week, I was out to deal with
> > a family matter.)
> > 
> > I assume you're using the herd model?  Luc's doing a great job with
> > that, but even so, nothing is officially confirmed until we ratify
> > the model.  In other words, the herd model may end up changing too.
> > If something is broken on our end, there's still time to fix it.
> > 
> > Regarding AMOs, let me copy from something I wrote in a previous
> > offline conversation:
> > 
> > > it seems to us that pairing a store-release of "amoswap.rl" with
> > > a "ld; fence r,rw" doesn't actually give us the RC semantics we've
> > > been discussing for LKMM.  For example:
> > > 
> > > (a) sd t0,0(s0)
> > > (b) amoswap.d.rl x0,t1,0(s1)
> > > ...
> > > (c) ld a0,0(s1)
> > > (d) fence r,rw
> > > (e) sd t2,0(s2)
> > > 
> > > There, we won't get (a) ordered before (e) regardless of whether
> > > (b) is RCpc or RCsc.  Do you agree?
> > 
> > At the moment, only the load part of (b) is in the predecessor
> > set of (d), but the store part of (b) is not.  Likewise, the
> > .rl annotation applies only to the store part of (b), not the
> > load part.
> > 
> > This gets back to a question Linus asked last week about
> > whether the AMO is a single unit or whether it can be
> 
> You mean AMO or RmW atomic operations?
> 
> > considered to split into a load and a store part (which still
> > perform atomically).  For RISC-V, for right now at least, the
> > answer is the latter.  Is it still the latter for Linux too?
> > 
> 
> I think for RmW atomics it's still the latter, the acquire or release is
> for the load part or the store part of an RmW. For example, ppc uses
> lwsync as acquire/release barriers, and lwsync could not order
> write->read. 

You are correct: the LKMM represents read-modify-write constructs with two events,
one read and one write, connected by the special "rmw" relation.
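
For illustration, here is a minimal sketch (the test name and shape are mine,
not from this thread) that makes the two-event representation visible: in
herd7's event output for the LKMM, the xchg_release() below shows up as a read
event plus a Release-tagged write event connected by an rmw edge.  Since the
release annotation sits on the write part, the earlier WRITE_ONCE() is ordered
before it, so the exists clause below should not be observable:

C MP+xchgrelease+poacquireonce

{}

P0(int *x, int *y)
{
        int r0;

        WRITE_ONCE(*y, 1);
        r0 = xchg_release(x, 1);
}

P1(int *x, int *y)
{
        int r1;
        int r2;

        r1 = smp_load_acquire(x);
        r2 = READ_ONCE(*y);
}

exists (1:r1=1 /\ 1:r2=0)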



> 
> Regards,
> Boqun

Regards,

--Luc


> 
> > https://lkml.org/lkml/2018/2/26/606
> > 
> > > So I think we'll need to make sure we pair .rl with .aq, or that
> > > we pair fence-based mappings with fence-based mappings, in order
> > > to make the acquire/release operations work.
> > 
> > This assumes we'll say that .aq and .rl are RCsc, not RCpc.
> > But in this case, I think .aq and .rl could still be safe to use,
> > as long 

Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()

2018-02-26 Thread Luc Maranget
> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> >> So we have something that is not all that rare in the Linux kernel
> >> community, namely two conflicting more-or-less concurrent changes.
> >> This clearly needs to be resolved, either by us not strengthening the
> >> Linux-kernel memory model in the way we were planning to or by you
> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> >> externally viewed release-acquire situations.
> >>
> >> Other thoughts?
> > 
> > Like said in the other email, I would _much_ prefer to not go weaker
> > than PPC, I find that PPC is already painfully weak at times.
> 
> Sure, and RISC-V could make this work too by using RCsc instructions
> and/or by using lightweight fences instead.  It just wasn't clear at
> first whether smp_load_acquire() and smp_store_release() were RCpc,
> RCsc, or something else, 

As far as I now understand, they are something else. Something in-between.

Basically, considering a multi-copy-atomic model, and following the RISC-V
axiomatic herd notation:

RCpc
   [Acq];po in ppo (i.e. the Acquire "half barrier")
   po;[Rel] in ppo (i.e. the Release "half barrier")

RCsc adds the rule
   [Rel];po;[Acq] in ppo (i.e. a store release followed by a load acquire
   is kept in order)

while LKMM has a weaker additional rule
   [Rel];po-loc;[Acq] (i.e. a store release followed by a load acquire from
   the SAME location is kept in order)

(For specialists: LKMM is formulated differently, but the overall
effect is the same.)
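
Restated as cat-style definitions (purely a transcription of the three rules
above; the names are mine):

let rcpc-ppo   = ([Acq] ; po) | (po ; [Rel])
let rcsc-extra = [Rel] ; po ; [Acq]
let lkmm-extra = [Rel] ; po-loc ; [Acq]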

> and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.
One may observe here that the LKMM rule exists precisely to
forbid Andrea's test. In that respect, the suggested RISC-V implementation
of lock and unlock on spinlock_t is too weak, as it allows the test.
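
For concreteness, the test under discussion has roughly the following shape
(a reconstruction, not Andrea's verbatim posting; the two reads in P1 are the
"two normal loads" Daniel refers to above). Under the LKMM rule just
mentioned, the unlock()/lock() pair on s keeps the two reads in order, so the
exists clause should be forbidden, whereas a plain RCpc amoswap.rl/amoswap.aq
mapping would allow it:

C riscv-unlock-lock-read-ordering-sketch

{}

P0(int *x, int *y)
{
        WRITE_ONCE(*x, 1);
        smp_wmb();
        WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, spinlock_t *s)
{
        int r0;
        int r1;

        spin_lock(s);
        r0 = READ_ONCE(*y);
        spin_unlock(s);
        spin_lock(s);
        r1 = READ_ONCE(*x);
        spin_unlock(s);
}

exists (1:r0=1 /\ 1:r1=0)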

However, I cannot conclude on smp_load_acquire/smp_store_release (see below).

> 
> As Paul's email in the other thread observed, RCpc seems to be
> OK for smp_load_acquire()/smp_store_release() at least according
> to the current LKMM herd spec.  Unlock/lock are stronger already
> I guess.  But if there's an active proposal to strengthen them all
> to something stricter than pure RCpc, then that's good to know.

As illustrated above, RCpc is not enough to implement LKMM
smp_load_acquire()/smp_store_release(). However, I do not know
for sure what the current implementation of smp_load_acquire()
and smp_store_release() (i.e. the LKMM primitives) is on RISC-V.
As far as I could find, the current implementation is the generic default,
which uses strong fences and is safe.

On a related note, it may not be clear yet to everyone
that LKMM has "platonic spinlocks".

That is, locks are not implemented from more basic primitives but are specified
directly. The specification can be described as behaving this way:
  - A lock behaves as a read-modify-write, the read behaving as a read-acquire.
  - An unlock behaves as a store release.
  - Lock and unlock operations (on a given lock variable)
alternate in some total order. This order induces the usual relations
between accesses to a given location.

This of course strongly suggests implementing lock with an RMW (R being acquire,
i.e. amoswap.aq in RISC-V) and unlock as a store release
(i.e. amoswap.rl in RISC-V). And again, this is too weak given LKMM's
additional rule for RC. However, nothing prevents a different
implementation provided it is safe.
As regards LKMM spinlocks, I am not sure that having strong
implementations of lock/unlock on Power (lwsync) and
ARMv8 (RCsc?) calls for a specification stronger than the current one.


> 
> My understanding from earlier discussions is that ARM has no plans
> to use their own RCpc instruction for smp_load_acquire() instead
> of their RCsc instructions.  Is that still true?  If they were to
> use the RCpc load there, that would cause them to have the same
> problem we're discussing here, right?  Just checking.
> 
> Dan

