Re: [PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Will Deacon
On Mon, Aug 20, 2018 at 09:26:39AM -0700, Matthew Wilcox wrote:
> On Mon, Aug 20, 2018 at 05:56:50PM +0200, Peter Zijlstra wrote:
> > Yeah, _acquire should be retained; sorry about losing that. I'm neck
> > deep into tlb invalidate stuff and wrote this without much thinking
> > involved.
> 
> NP.  Here's the current version I've got, with some updated likely()
> hints.
> 
> From 337298a88266f7b21492f893c2bf05409a5392c8 Mon Sep 17 00:00:00 2001
> From: Matthew Wilcox 
> Date: Mon, 20 Aug 2018 10:19:14 -0400
> Subject: [PATCH] locking: Remove an insn from spin and write locks
> 
> Both spin locks and write locks currently do:
> 
>  f0 0f b1 17 lock cmpxchg %edx,(%rdi)
>  85 c0   test   %eax,%eax
>  75 05   jne[slowpath]
> 
> This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
> appropriately.  Peter pointed out that using atomic_try_cmpxchg_acquire()
> will let the compiler know this is true.  Comparing before/after
> disassemblies shows the only effect is to remove this insn.
> 
> Take this opportunity to make the spin & write lock code resemble each
> other more closely and have similar likely() hints.
> 
> Suggested-by: Peter Zijlstra 
> Signed-off-by: Matthew Wilcox 
> ---
>  include/asm-generic/qrwlock.h   |  7 ++++---
>  include/asm-generic/qspinlock.h | 17 ++++++++++-------
>  2 files changed, 14 insertions(+), 10 deletions(-)

Shouldn't make any difference on arm64, so:

Acked-by: Will Deacon 

Will


Re: [PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Matthew Wilcox
On Mon, Aug 20, 2018 at 05:56:50PM +0200, Peter Zijlstra wrote:
> > Yeah, _acquire should be retained; sorry about losing that. I'm neck
> deep into tlb invalidate stuff and wrote this without much thinking
> involved.

NP.  Here's the current version I've got, with some updated likely()
hints.

From 337298a88266f7b21492f893c2bf05409a5392c8 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox 
Date: Mon, 20 Aug 2018 10:19:14 -0400
Subject: [PATCH] locking: Remove an insn from spin and write locks

Both spin locks and write locks currently do:

 f0 0f b1 17 lock cmpxchg %edx,(%rdi)
 85 c0   test   %eax,%eax
 75 05   jne[slowpath]

This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
appropriately.  Peter pointed out that using atomic_try_cmpxchg_acquire()
will let the compiler know this is true.  Comparing before/after
disassemblies shows the only effect is to remove this insn.
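
As an aside (not part of the patch): atomic_try_cmpxchg_acquire() takes a
pointer to the expected old value, returns a boolean, and on failure writes
the value it actually observed back through that pointer.  A rough sketch of
that behaviour, assuming only the plain atomic_cmpxchg_acquire() primitive
(simplified; the real generic fallback is a macro in the atomic headers):

/*
 * Illustrative sketch only.  Because the boolean result comes straight
 * out of the compare-and-exchange, the compiler can branch on the Z flag
 * set by 'lock cmpxchg' instead of emitting a separate 'test'.
 */
static __always_inline bool
try_cmpxchg_acquire_sketch(atomic_t *v, int *old, int new)
{
	int prev = atomic_cmpxchg_acquire(v, *old, new);

	if (likely(prev == *old))
		return true;
	*old = prev;		/* report the value we actually saw */
	return false;
}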

Take this opportunity to make the spin & write lock code resemble each
other more closely and have similar likely() hints.

Suggested-by: Peter Zijlstra 
Signed-off-by: Matthew Wilcox 
---
 include/asm-generic/qrwlock.h   |  7 ++++---
 include/asm-generic/qspinlock.h | 17 ++++++++++-------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 0f7062bd55e5..36254d2da8e0 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -71,8 +71,8 @@ static inline int queued_write_trylock(struct qrwlock *lock)
if (unlikely(cnts))
return 0;
 
-   return likely(atomic_cmpxchg_acquire(&lock->cnts,
-            cnts, cnts | _QW_LOCKED) == cnts);
+   return likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts,
+            _QW_LOCKED));
 }
 /**
  * queued_read_lock - acquire read lock of a queue rwlock
@@ -96,8 +96,9 @@ static inline void queued_read_lock(struct qrwlock *lock)
  */
 static inline void queued_write_lock(struct qrwlock *lock)
 {
+   u32 cnts = 0;
/* Optimize for the unfair lock case where the fair flag is 0. */
-   if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
+   if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
return;
 
queued_write_lock_slowpath(lock);
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 95263e943fcc..24e7915eee56 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -68,10 +68,14 @@ int queued_spin_is_contended(const struct qspinlock *lock)
  */
 static __always_inline int queued_spin_trylock(struct qspinlock *lock)
 {
-   if (!atomic_read(&lock->val) &&
-      (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
-   return 1;
-   return 0;
+   u32 val;
+
+   val = atomic_read(&lock->val);
+   if (unlikely(val))
+   return 0;
+
+   return likely(atomic_try_cmpxchg_acquire(&lock->val, &val,
+            _Q_LOCKED_VAL));
 }
 
 extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
@@ -82,10 +86,9 @@ extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
  */
 static __always_inline void queued_spin_lock(struct qspinlock *lock)
 {
-   u32 val;
+   u32 val = 0;
 
-   val = atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL);
-   if (likely(val == 0))
+   if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
return;
queued_spin_lock_slowpath(lock, val);
 }
-- 
2.18.0



Re: [PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Peter Zijlstra
On Mon, Aug 20, 2018 at 08:50:02AM -0700, Matthew Wilcox wrote:
> On Mon, Aug 20, 2018 at 11:14:04AM -0400, Waiman Long wrote:
> > On 08/20/2018 11:06 AM, Matthew Wilcox wrote:
> > > Both spin locks and write locks currently do:
> > >
> > >  f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> > >  85 c0   test   %eax,%eax
> > >  75 05   jne[slowpath]
> > >
> > > This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
> > > appropriately.  Peter pointed out that using atomic_try_cmpxchg()
> > > will let the compiler know this is true.  Comparing before/after
> > > disassemblies shows the only effect is to remove this insn.
> ...
> > >  static __always_inline int queued_spin_trylock(struct qspinlock *lock)
> > >  {
> > > + u32 val = 0;
> > > +
> > >   if (!atomic_read(&lock->val) &&
> > > -    (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
> > > + (atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))
> > 
> > Should you keep the _acquire suffix?
> 
> I don't know ;-)  Probably.  Peter didn't include it as part of his
> suggested fix, but on reviewing the documentation, it seems likely that
> it should be retained.  I put them back in and (as expected) it changes
> nothing on x86-64.

Yeah, _acquire should be retained; sorry about losing that. I'm neck
deep into tlb invalidate stuff and wrote this without much thinking
involved.




Re: [PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Waiman Long
On 08/20/2018 11:50 AM, Matthew Wilcox wrote:
> On Mon, Aug 20, 2018 at 11:14:04AM -0400, Waiman Long wrote:
>> On 08/20/2018 11:06 AM, Matthew Wilcox wrote:
>>> Both spin locks and write locks currently do:
>>>
>>>  f0 0f b1 17 lock cmpxchg %edx,(%rdi)
>>>  85 c0   test   %eax,%eax
>>>  75 05   jne[slowpath]
>>>
>>> This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
>>> appropriately.  Peter pointed out that using atomic_try_cmpxchg()
>>> will let the compiler know this is true.  Comparing before/after
>>> disassemblies shows the only effect is to remove this insn.
> ...
>>>  static __always_inline int queued_spin_trylock(struct qspinlock *lock)
>>>  {
>>> +   u32 val = 0;
>>> +
>>> if (!atomic_read(&lock->val) &&
>>> -  (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
>>> +   (atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))
>> Should you keep the _acquire suffix?
> I don't know ;-)  Probably.  Peter didn't include it as part of his
> suggested fix, but on reviewing the documentation, it seems likely that
> it should be retained.  I put them back in and (as expected) it changes
> nothing on x86-64.

We will certainly need to keep the _acquire suffix or it will likely
regress performance for arm64.
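
For context, the _acquire form is exactly what the lock fast path needs: a
successful cmpxchg must order the critical section after the lock is observed
as taken.  The suffix-less atomic_try_cmpxchg() implies full ordering, which
on arm64 is more expensive than the acquire-only form, while on x86-64 both
compile to the same lock-prefixed cmpxchg.  A minimal userspace C11 analogue
of an acquire-ordered trylock (illustration only, not kernel code):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint lock_word;		/* 0 = unlocked, 1 = locked */

static bool trylock_acquire(void)
{
	unsigned int expected = 0;

	/* Acquire on success: the critical section cannot be reordered
	 * before the lock acquisition.  Relaxed on failure: no lock was
	 * taken, so no ordering is needed. */
	return atomic_compare_exchange_strong_explicit(&lock_word,
			&expected, 1,
			memory_order_acquire,
			memory_order_relaxed);
}

static void unlock_release(void)
{
	/* Pairs with the acquire above. */
	atomic_store_explicit(&lock_word, 0, memory_order_release);
}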

>> BTW, qspinlock and qrwlock are now also used by AArch64, mips and sparc.
>> Have you tried to see what the effect will be for those architectures?
> Nope!  That's why I cc'd linux-arch, because I don't know who (other
> than arm64 and x86) is using q-locks these days.

I think both Sparc and mips are using qlocks now, though these
architectures are not the ones that I am interested in. I would like to
make sure that there will be no regression for arm64. Will should be
able to answer that.

Cheers,
Longman



Re: [PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Matthew Wilcox
On Mon, Aug 20, 2018 at 11:14:04AM -0400, Waiman Long wrote:
> On 08/20/2018 11:06 AM, Matthew Wilcox wrote:
> > Both spin locks and write locks currently do:
> >
> >  f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> >  85 c0   test   %eax,%eax
> >  75 05   jne[slowpath]
> >
> > This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
> > appropriately.  Peter pointed out that using atomic_try_cmpxchg()
> > will let the compiler know this is true.  Comparing before/after
> > disassemblies shows the only effect is to remove this insn.
...
> >  static __always_inline int queued_spin_trylock(struct qspinlock *lock)
> >  {
> > +   u32 val = 0;
> > +
> > if (!atomic_read(&lock->val) &&
> > -  (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
> > +   (atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))
> 
> Should you keep the _acquire suffix?

I don't know ;-)  Probably.  Peter didn't include it as part of his
suggested fix, but on reviewing the documentation, it seems likely that
it should be retained.  I put them back in and (as expected) it changes
nothing on x86-64.

> BTW, qspinlock and qrwlock are now also used by AArch64, mips and sparc.
> Have you tried to see what the effect will be for those architectures?

Nope!  That's why I cc'd linux-arch, because I don't know who (other
than arm64 and x86) is using q-locks these days.


Re: [PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Waiman Long
On 08/20/2018 11:06 AM, Matthew Wilcox wrote:
> Both spin locks and write locks currently do:
>
>  f0 0f b1 17 lock cmpxchg %edx,(%rdi)
>  85 c0   test   %eax,%eax
>  75 05   jne[slowpath]
>
> This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
> appropriately.  Peter pointed out that using atomic_try_cmpxchg()
> will let the compiler know this is true.  Comparing before/after
> disassemblies shows the only effect is to remove this insn.
>
> Suggested-by: Peter Zijlstra 
> Signed-off-by: Matthew Wilcox 
> ---
>  include/asm-generic/qrwlock.h   | 6 +++---
>  include/asm-generic/qspinlock.h | 9 +++++----
>  2 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index 0f7062bd55e5..9a1beb7ad0de 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -71,8 +71,7 @@ static inline int queued_write_trylock(struct qrwlock *lock)
>   if (unlikely(cnts))
>   return 0;
>  
> - return likely(atomic_cmpxchg_acquire(&lock->cnts,
> -          cnts, cnts | _QW_LOCKED) == cnts);
> + return likely(atomic_try_cmpxchg(&lock->cnts, &cnts, _QW_LOCKED));
>  }
>  /**
>   * queued_read_lock - acquire read lock of a queue rwlock
> @@ -96,8 +95,9 @@ static inline void queued_read_lock(struct qrwlock *lock)
>   */
>  static inline void queued_write_lock(struct qrwlock *lock)
>  {
> + u32 cnts = 0;
>   /* Optimize for the unfair lock case where the fair flag is 0. */
> - if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
> + if (atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED))
>   return;
>  
>   queued_write_lock_slowpath(lock);
> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> index 95263e943fcc..d125f0c0b3d9 100644
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -68,8 +68,10 @@ int queued_spin_is_contended(const struct qspinlock *lock)
>   */
>  static __always_inline int queued_spin_trylock(struct qspinlock *lock)
>  {
> + u32 val = 0;
> +
>   if (!atomic_read(&lock->val) &&
> -    (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
> + (atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))

Should you keep the _acquire suffix?

>   return 1;
>   return 0;
>  }
> @@ -82,10 +84,9 @@ extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
>   */
>  static __always_inline void queued_spin_lock(struct qspinlock *lock)
>  {
> - u32 val;
> + u32 val = 0;
>  
> - val = atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL);
> - if (likely(val == 0))
> + if (likely(atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))
>   return;
>   queued_spin_lock_slowpath(lock, val);
>  }

Ditto.

BTW, qspinlock and qrwlock are now also used by AArch64, mips and sparc.
Have you tried to see what the effect will be for those architectures?

-Longman



[PATCH] locking: Remove an insn from spin and write locks

2018-08-20 Thread Matthew Wilcox
Both spin locks and write locks currently do:

 f0 0f b1 17 lock cmpxchg %edx,(%rdi)
 85 c0   test   %eax,%eax
 75 05   jne[slowpath]

This 'test' insn is superfluous; the cmpxchg insn sets the Z flag
appropriately.  Peter pointed out that using atomic_try_cmpxchg()
will let the compiler know this is true.  Comparing before/after
disassemblies shows the only effect is to remove this insn.

Suggested-by: Peter Zijlstra 
Signed-off-by: Matthew Wilcox 
---
 include/asm-generic/qrwlock.h   | 6 +++---
 include/asm-generic/qspinlock.h | 9 +++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 0f7062bd55e5..9a1beb7ad0de 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -71,8 +71,7 @@ static inline int queued_write_trylock(struct qrwlock *lock)
if (unlikely(cnts))
return 0;
 
-   return likely(atomic_cmpxchg_acquire(&lock->cnts,
-            cnts, cnts | _QW_LOCKED) == cnts);
+   return likely(atomic_try_cmpxchg(&lock->cnts, &cnts, _QW_LOCKED));
 }
 /**
  * queued_read_lock - acquire read lock of a queue rwlock
@@ -96,8 +95,9 @@ static inline void queued_read_lock(struct qrwlock *lock)
  */
 static inline void queued_write_lock(struct qrwlock *lock)
 {
+   u32 cnts = 0;
/* Optimize for the unfair lock case where the fair flag is 0. */
-   if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
+   if (atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED))
return;
 
queued_write_lock_slowpath(lock);
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 95263e943fcc..d125f0c0b3d9 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -68,8 +68,10 @@ int queued_spin_is_contended(const struct qspinlock *lock)
  */
 static __always_inline int queued_spin_trylock(struct qspinlock *lock)
 {
+   u32 val = 0;
+
if (!atomic_read(&lock->val) &&
-  (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
+   (atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))
return 1;
return 0;
 }
@@ -82,10 +84,9 @@ extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
  */
 static __always_inline void queued_spin_lock(struct qspinlock *lock)
 {
-   u32 val;
+   u32 val = 0;
 
-   val = atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL);
-   if (likely(val == 0))
+   if (likely(atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)))
return;
queued_spin_lock_slowpath(lock, val);
 }
-- 
2.18.0


