Re: [PATCH 3/4] locking/rwsem: Enable lockless waiter wakeup(s)

2016-05-09 Thread Peter Zijlstra
On Sun, May 08, 2016 at 09:56:09PM -0700, Davidlohr Bueso wrote:
> @@ -129,12 +133,14 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
>   waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
>   if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
>   if (wake_type == RWSEM_WAKE_ANY)
> - /* Wake writer at the front of the queue, but do not
> -  * grant it the lock yet as we want other writers
> -  * to be able to steal it.  Readers, on the other hand,
> -  * will block as they will notice the queued writer.
> + /*
> +  * Mark writer at the front of the queue for wakeup.
> +  * Until the task is actually awoken later by the
> +  * caller, other writers are able to steal the lock.
> +  * Readers, on the other hand, will block as they
> +  * will notice the queued writer.
>*/
> - wake_up_process(waiter->task);
> + wake_q_add(wake_q, waiter->task);

Thanks for fixing that comment; that bugged the hell out of me ;-)

>   goto out;
>   }
>  
> @@ -196,12 +202,11 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
>*/
>   smp_mb();
>   waiter->task = NULL;
> - wake_up_process(tsk);
> + wake_q_add(wake_q, tsk);


However, note that, per the race in the previous email, this is too late
to acquire the tsk refcount. I think it'll work if you do wake_q_add()
_before_ the waiter->task = NULL.
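
(Sketch of the window, assuming the stack-allocated waiter from
rwsem_down_read_failed() in this patch; the interleaving is illustrative:)

	CPU0 (wakeup path)              CPU1 (blocked waiter)

	smp_mb();
	waiter->task = NULL;
	                                /* observes waiter.task == NULL, */
	                                /* returns; the stack frame is   */
	                                /* gone and the task is free to  */
	                                /* exit and be freed             */
	wake_q_add(wake_q, tsk);        /* task reference taken too late */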

On that same note: I think you can replace:

smp_mb();
waiter->task = NULL;

with:

smp_store_release(&waiter->task, NULL);
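
i.e., ordering the wakeup loop as follows (a sketch against this patch,
not tested):

	tsk = waiter->task;

	/*
	 * wake_q_add() takes a reference on tsk, so queue the wakeup
	 * *before* publishing NULL; once NULL is visible the waiter
	 * may return and the task may exit.
	 */
	wake_q_add(wake_q, tsk);

	/*
	 * The release store keeps the above ordered before the NULL
	 * publish, replacing the smp_mb() + plain store pair.
	 */
	smp_store_release(&waiter->task, NULL);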

>   } while (--loop);
>  
>   sem->wait_list.next = next;
>   next->prev = &sem->wait_list;



[PATCH 3/4] locking/rwsem: Enable lockless waiter wakeup(s)

2016-05-08 Thread Davidlohr Bueso
As wake_qs gain users, we can teach rwsems about them such that
waiters can be awoken without holding the wait_lock. This applies to
both readers and writers, the former being the better candidate as we
can batch the wakeups, shortening the critical region that much more
-- e.g. a writer task blocking a bunch of tasks waiting to service
page-faults (mmap_sem readers).

In general, applying wake_qs to rwsem (xadd) is not difficult as the
wait_lock is intended to be released soon _anyway_. The exception is
when the writer slowpath proactively wakes up any queued readers upon
seeing that the lock is owned by a reader, in which case we simply do
the wakeups with the lock held (see the comment in
__rwsem_down_write_failed_common()).
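
The wake_q pattern relied upon is, in sketch form (using the existing
wake_q_add()/wake_up_q() API; fastpath and bookkeeping details omitted):

	WAKE_Q(wake_q);

	raw_spin_lock_irq(&sem->wait_lock);
	/* ... select the waiter(s) to wake ... */
	wake_q_add(&wake_q, waiter->task);  /* marks task, takes a reference */
	raw_spin_unlock_irq(&sem->wait_lock);

	wake_up_q(&wake_q);  /* the actual wakeups, outside the wait_lock */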

As with other locking primitives, delaying the wakeup does allow, at
least in theory, the lock to be stolen in the case of writers; however,
no harm was observed from this (in fact lock stealing tends to be a
_good_ thing in most workloads), and the window is tiny anyway.

Some page-fault (pft) and mmap_sem intensive benchmarks show a fairly
consistent reduction in systime (by up to ~8% and ~10%, respectively)
on a 2-socket, 12-core AMD box.

Signed-off-by: Davidlohr Bueso 
---
 kernel/locking/rwsem-xadd.c | 53 +++--
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index b592bb48d880..1b8c1285a2aa 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -114,12 +114,16 @@ enum rwsem_wake_type {
 *   - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
 *   - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
  * - there must be someone on the queue
- * - the spinlock must be held by the caller
+ * - the wait_lock must be held by the caller
+ * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
+ *   to actually wakeup the blocked task(s), preferably when the wait_lock
+ *   is released
  * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only woken if downgrading is false
+ * - writers are only marked woken if downgrading is false
  */
 static struct rw_semaphore *
-__rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
+__rwsem_mark_wake(struct rw_semaphore *sem,
+ enum rwsem_wake_type wake_type, struct wake_q_head *wake_q)
 {
struct rwsem_waiter *waiter;
struct task_struct *tsk;
@@ -129,12 +133,14 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
if (wake_type == RWSEM_WAKE_ANY)
-   /* Wake writer at the front of the queue, but do not
-    * grant it the lock yet as we want other writers
-    * to be able to steal it.  Readers, on the other hand,
-    * will block as they will notice the queued writer.
+   /*
+    * Mark writer at the front of the queue for wakeup.
+    * Until the task is actually awoken later by the
+    * caller, other writers are able to steal the lock.
+    * Readers, on the other hand, will block as they
+    * will notice the queued writer.
 */
-   wake_up_process(waiter->task);
+   wake_q_add(wake_q, waiter->task);
goto out;
}
 
@@ -196,12 +202,11 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 */
smp_mb();
waiter->task = NULL;
-   wake_up_process(tsk);
+   wake_q_add(wake_q, tsk);
} while (--loop);
 
sem->wait_list.next = next;
next->prev = &sem->wait_list;
-
  out:
return sem;
 }
@@ -215,6 +220,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
struct rwsem_waiter waiter;
struct task_struct *tsk = current;
+   WAKE_Q(wake_q);
 
/* set up my own style of waitqueue */
waiter.task = tsk;
@@ -236,9 +242,10 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
if (count == RWSEM_WAITING_BIAS ||
(count > RWSEM_WAITING_BIAS &&
 adjustment != -RWSEM_ACTIVE_READ_BIAS))
-   sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+   sem = __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
raw_spin_unlock_irq(&sem->wait_lock);
+   wake_up_q(&wake_q);
 
/* wait to be given the lock */
while (true) {
@@ -470,9 +477,19 @@ __rwsem_down_write_failed_common(struct
