Re: [PATCH] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write

2015-04-23 Thread Waiman Long

On 04/18/2015 11:40 AM, Peter Zijlstra wrote:

On Fri, Apr 17, 2015 at 10:03:18PM -0400, Waiman Long wrote:

@@ -478,7 +515,28 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
  {
unsigned long flags;

-	raw_spin_lock_irqsave(&sem->wait_lock, flags);
+	/*
+	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * Try to do wakeup only if the trylock succeeds to minimize
+	 * spinlock contention which may introduce too much delay in the
+	 * unlock operation.
+	 *
+	 * In case the spinning writer is just going to break out of the
+	 * waiting loop, it will still do a trylock in
+	 * rwsem_down_write_failed() before sleeping.
+	 * IOW, if rwsem_has_spinner() is true, it will guarantee at least
+	 * one trylock attempt on the rwsem.

successful trylock? I think we're having 'issues' over whether failed
trylocks (and cmpxchg) imply full barriers.


+	 *
+	 *    spinning writer
+	 *    ---------------
+	 * [S]   osq_unlock()
+	 *       MB
+	 * [RmW] rwsem_try_write_lock()
+	 */

Ordering comes in pairs, this is incomplete.


I am sorry that I was a bit sloppy here. I have just sent out an updated
patch to remedy this. I have added a smp_mb__after_atomic() to ensure
proper memory ordering. However, I am not sure whether this primitive or
just a simple smp_rmb() will be more expensive on non-x86
architectures.
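
A minimal sketch of the placement under discussion, assuming the
rwsem_has_spinner() helper from the patch later in this thread
(illustrative only; whether the barrier should be smp_mb__after_atomic()
or smp_rmb() is exactly the open question here):

	struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
	{
		unsigned long flags;

		if (rwsem_has_spinner(sem)) {
			/*
			 * Order the osq load in rwsem_has_spinner() before
			 * the wait_lock access below, so the waker does not
			 * act on stale state. smp_rmb() orders the two
			 * reads; smp_mb__after_atomic() would instead
			 * upgrade a preceding atomic operation to a full
			 * barrier.
			 */
			smp_rmb();
			if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
				return sem;	/* a spinner will do the wakeup */
			goto locked;
		}
		raw_spin_lock_irqsave(&sem->wait_lock, flags);
	locked:
		/* ... wake waiters on sem->wait_list ... */
		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
		return sem;
	}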


Cheers,
Longman


Re: [PATCH] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write

2015-04-22 Thread Waiman Long

On 04/20/2015 04:23 PM, Jason Low wrote:

On Fri, 2015-04-17 at 22:03 -0400, Waiman Long wrote:


diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 3a6490e..703ea5c 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -32,4 +32,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
  extern bool osq_lock(struct optimistic_spin_queue *lock);
  extern void osq_unlock(struct optimistic_spin_queue *lock);

+static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
+{
+	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
+}

Would it be better to separate the addition of osq_is_locked() into its
own patch, since this can be useful for other situations and isn't just
specific to the rwsem optimization?



I think the osq_lock.h change is too simple and straightforward to
warrant a separate patch.
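
That said, the helper is trivially reusable elsewhere. A hypothetical
example of the kind of reuse Jason mentions (illustrative only; mutexes
embed the same optimistic_spin_queue when CONFIG_MUTEX_SPIN_ON_OWNER is
set, but no such caller exists in this patch):

	static inline bool mutex_has_spinner(struct mutex *lock)
	{
		/* Lockless check: a non-zero tail means a CPU is queued to spin. */
		return osq_is_locked(&lock->osq);
	}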


Cheers,
Longman


Re: [PATCH] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write

2015-04-20 Thread Jason Low
On Fri, 2015-04-17 at 22:03 -0400, Waiman Long wrote:

> diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
> index 3a6490e..703ea5c 100644
> --- a/include/linux/osq_lock.h
> +++ b/include/linux/osq_lock.h
> @@ -32,4 +32,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
>  extern bool osq_lock(struct optimistic_spin_queue *lock);
>  extern void osq_unlock(struct optimistic_spin_queue *lock);
>  
> +static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
> +{
> +	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
> +}

Would it be better to separate the addition of osq_is_locked() into its
own patch, since this can be useful for other situations and isn't just
specific to the rwsem optimization?



Re: [PATCH] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write

2015-04-18 Thread Peter Zijlstra
On Fri, Apr 17, 2015 at 10:03:18PM -0400, Waiman Long wrote:
> @@ -478,7 +515,28 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
>  {
>   unsigned long flags;
>  
> -	raw_spin_lock_irqsave(&sem->wait_lock, flags);
> + /*
> +  * If a spinner is present, it is not necessary to do the wakeup.
> +  * Try to do wakeup only if the trylock succeeds to minimize
> +  * spinlock contention which may introduce too much delay in the
> +  * unlock operation.
> +  *
> +  * In case the spinning writer is just going to break out of the
> +  * waiting loop, it will still do a trylock in
> +  * rwsem_down_write_failed() before sleeping.
> +  * IOW, if rwsem_has_spinner() is true, it will guarantee at least
> +  * one trylock attempt on the rwsem.

successful trylock? I think we're having 'issues' over whether failed
trylocks (and cmpxchg) imply full barriers.

> +  *
> +  *    spinning writer
> +  *    ---------------
> +  * [S]   osq_unlock()
> +  *       MB
> +  * [RmW] rwsem_try_write_lock()
> +  */

Ordering comes in pairs, this is incomplete.

> +	if (!rwsem_has_spinner(sem))
> +		raw_spin_lock_irqsave(&sem->wait_lock, flags);
> +	else if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
> +		return sem;
>  
>   /* do nothing if list empty */
>   if (!list_empty(&sem->wait_list))
> -- 
> 1.7.1
> 
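
Here "ordering comes in pairs" means that a barrier on the spinner's side
only guarantees anything against a matching load/barrier on the waker's
side. An illustrative completion of the diagram above (a sketch of one
consistent pairing, not the wording of any posted patch):

	/*
	 *    spinning writer                  rwsem_wake() caller
	 *    ---------------                  -------------------
	 * [S]   osq_unlock()                  [L]   osq (rwsem_has_spinner)
	 *       MB                                  RMB
	 * [RmW] rwsem_try_write_lock()        [RmW] raw_spin_trylock_irqsave()
	 */

The spinner's store in osq_unlock() plus the full barrier before its rwsem
trylock would pair with the waker's load of the osq state plus a read
barrier before its wait_lock access.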


[PATCH] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write

2015-04-17 Thread Waiman Long
In up_read()/up_write(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do
the real wakeup. This can be a problem especially for up_read(),
where many readers can call rwsem_wake() at more or less the same
time even though a single call should be enough. This causes
contention on the wait_lock cacheline, delaying the
up_read()/up_write() operations.

This patch makes taking the wait_lock and calling __rwsem_do_wake()
optional when at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() itself
later, from up_write(). When a spinning writer is present,
rwsem_wake() now tries to take the wait_lock with a trylock and
simply quits if that fails.

This patch also checks one more time whether the rwsem was stolen
just before doing the expensive wakeup operation, which would be
unnecessary if the lock was indeed stolen.
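
In code terms, the entry to rwsem_wake() changes roughly as follows (a
condensed sketch of the change just described; the same logic appears in
the hunk quoted in Peter Zijlstra's review earlier in this thread):

	/*
	 * If a writer is spinning, let it perform the wakeup from its own
	 * up_write() later; only a successful trylock of the wait_lock
	 * proceeds to the wakeup.
	 */
	if (!rwsem_has_spinner(sem))
		raw_spin_lock_irqsave(&sem->wait_lock, flags);
	else if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
		return sem;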

On an 8-socket Westmere-EX server (80 cores, HT off), running AIM7's
high_systime workload (1000 users) on a vanilla 4.0 kernel produced
the following perf profile for spinlock contention:

  9.23%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
  |--97.39%-- rwsem_wake
  |--0.69%-- try_to_wake_up
  |--0.52%-- release_pages
   --1.40%-- [...]

  1.70%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irq
  |--96.61%-- rwsem_down_write_failed
  |--2.03%-- __schedule
  |--0.50%-- run_timer_softirq
   --0.86%-- [...]

With a patched 4.0 kernel, the perf profile became:

  1.87%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
  |--87.64%-- rwsem_wake
  |--2.80%-- release_pages
  |--2.56%-- try_to_wake_up
  |--1.10%-- __wake_up
  |--1.06%-- pagevec_lru_move_fn
  |--0.93%-- prepare_to_wait_exclusive
  |--0.71%-- free_pid
  |--0.58%-- get_page_from_freelist
  |--0.57%-- add_device_randomness
   --2.04%-- [...]

  0.80%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irq
  |--92.49%-- rwsem_down_write_failed
  |--4.24%-- __schedule
  |--1.37%-- run_timer_softirq
   --1.91%-- [...]

The table below shows the % improvement in throughput (1100-2000 users)
across the various AIM7 workloads:

Workload        % increase in throughput
------------    ------------------------
custom                   3.8%
five-sec                 3.5%
fserver                  4.1%
high_systime            22.2%
shared                   2.1%
short                   10.1%
Signed-off-by: Waiman Long 
---
 include/linux/osq_lock.h|5 +++
 kernel/locking/rwsem-xadd.c |   60 ++-
 2 files changed, 64 insertions(+), 1 deletions(-)

diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 3a6490e..703ea5c 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -32,4 +32,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
 extern bool osq_lock(struct optimistic_spin_queue *lock);
 extern void osq_unlock(struct optimistic_spin_queue *lock);
 
+static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
+{
+	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
+}
+
 #endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2f7cc40..d088045 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -107,6 +107,35 @@ enum rwsem_wake_type {
RWSEM_WAKE_READ_OWNED   /* Waker thread holds the read lock */
 };
 
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * return true if there is an active writer by checking the owner field which
+ * should be set if there is one.
+ */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+   return READ_ONCE(sem->owner) != NULL;
+}
+
+/*
+ * Return true if the rwsem has active spinner
+ */
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return osq_is_locked(&sem->osq);
+}
+#else /* CONFIG_RWSEM_SPIN_ON_OWNER */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+   return false;   /* Assume it has no active writer */
+}
+
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+   return false;
+}
+#endif /* CONFIG_RWSEM_SPIN_ON_OWNER */
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then:
@@ -125,6 +154,14 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
struct list_head *next;
long oldcount, woken, loop, adjustment;
 
+   /*
+* up_write() cleared the owner field before calling this function.
+
