Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)

2013-04-07 Thread Mathieu Desnoyers
* Eric Wong (normalper...@yhbt.net) wrote:
> Mathieu Desnoyers  wrote:
> > Implement enqueue-to-head. It can run concurrently with enqueue, splice
> > to queue, and iteration, but requires a mutex against dequeue and splice
> > from queue operations.
> > 
> > Useful for special-cases where a queue needs to have nodes enqueued into
> > its head.
> > 
> > This patch is only compile-tested.
> > 
> > Changes since v1:
> > * Don't require mutual exclusion between traversals and
> >   __wfcq_enqueue_head().
> > 
> > Signed-off-by: Mathieu Desnoyers 
> 
> Thanks!  The first hunk (sync table comment) conflicted with
> my __wfcq_enqueue patch, but other than that I could not benchmark any
> regression with my 4-core machine with v4 of my
> "epoll: avoid spinlock contention with wfcqueue" patch.
> 
> All I needed was "s/__wfcq_prepend/__wfcq_enqueue_head/g" to my original
> patch to use the updated API.
> 
> I was worried about the cmpxchg at first, but it does not seem to hurt
> performance on my 4-core system.  In fact, it was slightly better
> (but within margin of error)
> 
> time ./eponeshotmt -c 100 -w 4 -t 4 -f 10
> real0m 5.78s
> user0m 1.20s
> sys 0m 21.90s
> 
> Tested-by: Eric Wong 
> 
> Hopefully somebody can test my epoll patches with more cores/threads :)

Thanks for testing. Taking care of your comments, and of memory
barriers, brings me to send a v3 of this patch shortly. Testing is
welcome!

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)

2013-04-07 Thread Mathieu Desnoyers
* Eric Wong (normalper...@yhbt.net) wrote:
 Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
  Implement enqueue-to-head. It can run concurrently with enqueue, splice
  to queue, and iteration, but requires a mutex against dequeue and splice
  from queue operations.
  
  Useful for special-cases where a queue needs to have nodes enqueued into
  its head.
  
  This patch is only compile-tested.
  
  Changes since v1:
  * Don't require mutual exclusion between traversals and
__wfcq_enqueue_head().
  
  Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com
 
 Thanks!  The first hunk (sync table comment) conflicted with
 my __wfcq_enqueue patch, but other than that I could not benchmark any
 regression with my 4-core machine with v4 of my
 epoll: avoid spinlock contention with wfcqueue patch.
 
 All I needed was s/__wfcq_prepend/__wfcq_enqueue_head/g to my original
 patch to use the updated API.
 
 I was worried about the cmpxchg at first, but it does not seem to hurt
 performance on my 4-core system.  In fact, it was slightly better
 (but within margin of error)
 
 time ./eponeshotmt -c 100 -w 4 -t 4 -f 10
 real0m 5.78s
 user0m 1.20s
 sys 0m 21.90s
 
 Tested-by: Eric Wong normalper...@yhbt.net
 
 Hopefully somebody can test my epoll patches with more cores/threads :)

Thanks for testing. Taking care of your comments, and of memory
barriers, brings me to send a v3 of this patch shortly. Testing is
welcome!

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)

2013-04-06 Thread Eric Wong
Mathieu Desnoyers  wrote:
> Implement enqueue-to-head. It can run concurrently with enqueue, splice
> to queue, and iteration, but requires a mutex against dequeue and splice
> from queue operations.
> 
> Useful for special-cases where a queue needs to have nodes enqueued into
> its head.
> 
> This patch is only compile-tested.
> 
> Changes since v1:
> * Don't require mutual exclusion between traversals and
>   __wfcq_enqueue_head().
> 
> Signed-off-by: Mathieu Desnoyers 

Thanks!  The first hunk (sync table comment) conflicted with
my __wfcq_enqueue patch, but other than that I could not benchmark any
regression with my 4-core machine with v4 of my
"epoll: avoid spinlock contention with wfcqueue" patch.

All I needed was "s/__wfcq_prepend/__wfcq_enqueue_head/g" to my original
patch to use the updated API.

I was worried about the cmpxchg at first, but it does not seem to hurt
performance on my 4-core system.  In fact, it was slightly better
(but within margin of error)

time ./eponeshotmt -c 100 -w 4 -t 4 -f 10
real0m 5.78s
user0m 1.20s
sys 0m 21.90s

Tested-by: Eric Wong 

Hopefully somebody can test my epoll patches with more cores/threads :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)

2013-04-06 Thread Mathieu Desnoyers
Implement enqueue-to-head. It can run concurrently with enqueue, splice
to queue, and iteration, but requires a mutex against dequeue and splice
from queue operations.

Useful for special-cases where a queue needs to have nodes enqueued into
its head.

This patch is only compile-tested.

Changes since v1:
* Don't require mutual exclusion between traversals and
  __wfcq_enqueue_head().

Signed-off-by: Mathieu Desnoyers 
---
 include/linux/wfcqueue.h |   67 ++-
 1 file changed, 60 insertions(+), 7 deletions(-)

Index: linux/include/linux/wfcqueue.h
===
--- linux.orig/include/linux/wfcqueue.h
+++ linux/include/linux/wfcqueue.h
@@ -55,14 +55,16 @@
  * [4] __wfcq_splice (source queue)
  * [5] __wfcq_first
  * [6] __wfcq_next
+ * [7] __wfcq_enqueue_head
  *
- * [1] [2] [3] [4] [5] [6]
- * [1]  -   -   -   -   -   -
- * [2]  -   -   -   -   -   -
- * [3]  -   -   X   X   X   X
- * [4]  -   -   X   -   X   X
- * [5]  -   -   X   X   -   -
- * [6]  -   -   X   X   -   -
+ * [1] [2] [3] [4] [5] [6] [7]
+ * [1]  -   -   -   -   -   -   -
+ * [2]  -   -   -   -   -   -   X
+ * [3]  -   -   X   X   X   X   X
+ * [4]  -   -   X   -   X   X   X
+ * [5]  -   -   X   X   -   -   -
+ * [6]  -   -   X   X   -   -   -
+ * [7]  -   X   X   X   -   -   X
  *
  * Besides locking, mutual exclusion of dequeue, splice and iteration
  * can be ensured by performing all of those operations from a single
@@ -230,6 +232,57 @@ ___wfcq_node_sync_next(struct wfcq_node
 }
 
 /*
+ * __wfcq_enqueue_head: prepend a node into a queue.
+ *
+ * No memory barriers are issued. Mutual exclusion is the responsibility
+ * of the caller.
+ *
+ * Returns false if the queue was empty prior to adding the node.
+ * Returns true otherwise.
+ */
+static inline bool __wfcq_enqueue_head(struct wfcq_head *head,
+   struct wfcq_tail *tail,
+   struct wfcq_node *node)
+{
+   bool not_empty = 0;
+
+/*
+* Move tail if queue was empty. Tail pointer is the
+* linearization point of enqueuers.
+*/
+   if (cmpxchg(>p, >node, node) != >node) {
+   not_empty = 1;
+
+   /*
+* Queue was non-empty. We need to wait for
+* head->node.next to become non-NULL, because a
+* concurrent wfcq_append may be updating it.
+*/
+   CMM_STORE_SHARED(node->next,
+   ___wfcq_node_sync_next(>node));
+   }
+
+   /*
+* If cmpxchg succeeds (queue was empty), tail now points to
+* node, but head->node.next is still NULL. Concurrent
+* traversals seeing this state will busy-wait until we set
+* head->node.next.
+*
+* Else, if cmpxchg fails (queue was not empty), traversals will
+* only see node after we set head->node.next.
+*/
+
+   /*
+* From this point, we know that wfcq_append cannot touch
+* head->node.next, either because we successfully moved tail->p
+* to node, or because we waited for head->node.next to become
+* non-NULL. It is therefore safe to update it.
+*/
+   CMM_STORE_SHARED(head->node.next, node);
+   return not_empty;
+}
+
+/*
  * __wfcq_first: get first node of a queue, without dequeuing.
  *
  * Content written into the node before enqueue is guaranteed to be
-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)

2013-04-06 Thread Mathieu Desnoyers
Implement enqueue-to-head. It can run concurrently with enqueue, splice
to queue, and iteration, but requires a mutex against dequeue and splice
from queue operations.

Useful for special-cases where a queue needs to have nodes enqueued into
its head.

This patch is only compile-tested.

Changes since v1:
* Don't require mutual exclusion between traversals and
  __wfcq_enqueue_head().

Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com
---
 include/linux/wfcqueue.h |   67 ++-
 1 file changed, 60 insertions(+), 7 deletions(-)

Index: linux/include/linux/wfcqueue.h
===
--- linux.orig/include/linux/wfcqueue.h
+++ linux/include/linux/wfcqueue.h
@@ -55,14 +55,16 @@
  * [4] __wfcq_splice (source queue)
  * [5] __wfcq_first
  * [6] __wfcq_next
+ * [7] __wfcq_enqueue_head
  *
- * [1] [2] [3] [4] [5] [6]
- * [1]  -   -   -   -   -   -
- * [2]  -   -   -   -   -   -
- * [3]  -   -   X   X   X   X
- * [4]  -   -   X   -   X   X
- * [5]  -   -   X   X   -   -
- * [6]  -   -   X   X   -   -
+ * [1] [2] [3] [4] [5] [6] [7]
+ * [1]  -   -   -   -   -   -   -
+ * [2]  -   -   -   -   -   -   X
+ * [3]  -   -   X   X   X   X   X
+ * [4]  -   -   X   -   X   X   X
+ * [5]  -   -   X   X   -   -   -
+ * [6]  -   -   X   X   -   -   -
+ * [7]  -   X   X   X   -   -   X
  *
  * Besides locking, mutual exclusion of dequeue, splice and iteration
  * can be ensured by performing all of those operations from a single
@@ -230,6 +232,57 @@ ___wfcq_node_sync_next(struct wfcq_node
 }
 
 /*
+ * __wfcq_enqueue_head: prepend a node into a queue.
+ *
+ * No memory barriers are issued. Mutual exclusion is the responsibility
+ * of the caller.
+ *
+ * Returns false if the queue was empty prior to adding the node.
+ * Returns true otherwise.
+ */
+static inline bool __wfcq_enqueue_head(struct wfcq_head *head,
+   struct wfcq_tail *tail,
+   struct wfcq_node *node)
+{
+   bool not_empty = 0;
+
+/*
+* Move tail if queue was empty. Tail pointer is the
+* linearization point of enqueuers.
+*/
+   if (cmpxchg(tail-p, head-node, node) != head-node) {
+   not_empty = 1;
+
+   /*
+* Queue was non-empty. We need to wait for
+* head-node.next to become non-NULL, because a
+* concurrent wfcq_append may be updating it.
+*/
+   CMM_STORE_SHARED(node-next,
+   ___wfcq_node_sync_next(head-node));
+   }
+
+   /*
+* If cmpxchg succeeds (queue was empty), tail now points to
+* node, but head-node.next is still NULL. Concurrent
+* traversals seeing this state will busy-wait until we set
+* head-node.next.
+*
+* Else, if cmpxchg fails (queue was not empty), traversals will
+* only see node after we set head-node.next.
+*/
+
+   /*
+* From this point, we know that wfcq_append cannot touch
+* head-node.next, either because we successfully moved tail-p
+* to node, or because we waited for head-node.next to become
+* non-NULL. It is therefore safe to update it.
+*/
+   CMM_STORE_SHARED(head-node.next, node);
+   return not_empty;
+}
+
+/*
  * __wfcq_first: get first node of a queue, without dequeuing.
  *
  * Content written into the node before enqueue is guaranteed to be
-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)

2013-04-06 Thread Eric Wong
Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
 Implement enqueue-to-head. It can run concurrently with enqueue, splice
 to queue, and iteration, but requires a mutex against dequeue and splice
 from queue operations.
 
 Useful for special-cases where a queue needs to have nodes enqueued into
 its head.
 
 This patch is only compile-tested.
 
 Changes since v1:
 * Don't require mutual exclusion between traversals and
   __wfcq_enqueue_head().
 
 Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com

Thanks!  The first hunk (sync table comment) conflicted with
my __wfcq_enqueue patch, but other than that I could not benchmark any
regression with my 4-core machine with v4 of my
epoll: avoid spinlock contention with wfcqueue patch.

All I needed was s/__wfcq_prepend/__wfcq_enqueue_head/g to my original
patch to use the updated API.

I was worried about the cmpxchg at first, but it does not seem to hurt
performance on my 4-core system.  In fact, it was slightly better
(but within margin of error)

time ./eponeshotmt -c 100 -w 4 -t 4 -f 10
real0m 5.78s
user0m 1.20s
sys 0m 21.90s

Tested-by: Eric Wong normalper...@yhbt.net

Hopefully somebody can test my epoll patches with more cores/threads :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/