Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)
* Eric Wong (normalper...@yhbt.net) wrote: > Mathieu Desnoyers wrote: > > Implement enqueue-to-head. It can run concurrently with enqueue, splice > > to queue, and iteration, but requires a mutex against dequeue and splice > > from queue operations. > > > > Useful for special-cases where a queue needs to have nodes enqueued into > > its head. > > > > This patch is only compile-tested. > > > > Changes since v1: > > * Don't require mutual exclusion between traversals and > > __wfcq_enqueue_head(). > > > > Signed-off-by: Mathieu Desnoyers > > Thanks! The first hunk (sync table comment) conflicted with > my __wfcq_enqueue patch, but other than that I could not benchmark any > regression with my 4-core machine with v4 of my > "epoll: avoid spinlock contention with wfcqueue" patch. > > All I needed was "s/__wfcq_prepend/__wfcq_enqueue_head/g" to my original > patch to use the updated API. > > I was worried about the cmpxchg at first, but it does not seem to hurt > performance on my 4-core system. In fact, it was slightly better > (but within margin of error) > > time ./eponeshotmt -c 100 -w 4 -t 4 -f 10 > real0m 5.78s > user0m 1.20s > sys 0m 21.90s > > Tested-by: Eric Wong > > Hopefully somebody can test my epoll patches with more cores/threads :) Thanks for testing. Taking care of your comments, and of memory barriers, brings me to send a v3 of this patch shortly. Testing is welcome! Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)
* Eric Wong (normalper...@yhbt.net) wrote: Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: Implement enqueue-to-head. It can run concurrently with enqueue, splice to queue, and iteration, but requires a mutex against dequeue and splice from queue operations. Useful for special-cases where a queue needs to have nodes enqueued into its head. This patch is only compile-tested. Changes since v1: * Don't require mutual exclusion between traversals and __wfcq_enqueue_head(). Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com Thanks! The first hunk (sync table comment) conflicted with my __wfcq_enqueue patch, but other than that I could not benchmark any regression with my 4-core machine with v4 of my epoll: avoid spinlock contention with wfcqueue patch. All I needed was s/__wfcq_prepend/__wfcq_enqueue_head/g to my original patch to use the updated API. I was worried about the cmpxchg at first, but it does not seem to hurt performance on my 4-core system. In fact, it was slightly better (but within margin of error) time ./eponeshotmt -c 100 -w 4 -t 4 -f 10 real0m 5.78s user0m 1.20s sys 0m 21.90s Tested-by: Eric Wong normalper...@yhbt.net Hopefully somebody can test my epoll patches with more cores/threads :) Thanks for testing. Taking care of your comments, and of memory barriers, brings me to send a v3 of this patch shortly. Testing is welcome! Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)
Mathieu Desnoyers wrote: > Implement enqueue-to-head. It can run concurrently with enqueue, splice > to queue, and iteration, but requires a mutex against dequeue and splice > from queue operations. > > Useful for special-cases where a queue needs to have nodes enqueued into > its head. > > This patch is only compile-tested. > > Changes since v1: > * Don't require mutual exclusion between traversals and > __wfcq_enqueue_head(). > > Signed-off-by: Mathieu Desnoyers Thanks! The first hunk (sync table comment) conflicted with my __wfcq_enqueue patch, but other than that I could not benchmark any regression with my 4-core machine with v4 of my "epoll: avoid spinlock contention with wfcqueue" patch. All I needed was "s/__wfcq_prepend/__wfcq_enqueue_head/g" to my original patch to use the updated API. I was worried about the cmpxchg at first, but it does not seem to hurt performance on my 4-core system. In fact, it was slightly better (but within margin of error) time ./eponeshotmt -c 100 -w 4 -t 4 -f 10 real0m 5.78s user0m 1.20s sys 0m 21.90s Tested-by: Eric Wong Hopefully somebody can test my epoll patches with more cores/threads :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)
Implement enqueue-to-head. It can run concurrently with enqueue, splice to queue, and iteration, but requires a mutex against dequeue and splice from queue operations. Useful for special-cases where a queue needs to have nodes enqueued into its head. This patch is only compile-tested. Changes since v1: * Don't require mutual exclusion between traversals and __wfcq_enqueue_head(). Signed-off-by: Mathieu Desnoyers --- include/linux/wfcqueue.h | 67 ++- 1 file changed, 60 insertions(+), 7 deletions(-) Index: linux/include/linux/wfcqueue.h === --- linux.orig/include/linux/wfcqueue.h +++ linux/include/linux/wfcqueue.h @@ -55,14 +55,16 @@ * [4] __wfcq_splice (source queue) * [5] __wfcq_first * [6] __wfcq_next + * [7] __wfcq_enqueue_head * - * [1] [2] [3] [4] [5] [6] - * [1] - - - - - - - * [2] - - - - - - - * [3] - - X X X X - * [4] - - X - X X - * [5] - - X X - - - * [6] - - X X - - + * [1] [2] [3] [4] [5] [6] [7] + * [1] - - - - - - - + * [2] - - - - - - X + * [3] - - X X X X X + * [4] - - X - X X X + * [5] - - X X - - - + * [6] - - X X - - - + * [7] - X X X - - X * * Besides locking, mutual exclusion of dequeue, splice and iteration * can be ensured by performing all of those operations from a single @@ -230,6 +232,57 @@ ___wfcq_node_sync_next(struct wfcq_node } /* + * __wfcq_enqueue_head: prepend a node into a queue. + * + * No memory barriers are issued. Mutual exclusion is the responsibility + * of the caller. + * + * Returns false if the queue was empty prior to adding the node. + * Returns true otherwise. + */ +static inline bool __wfcq_enqueue_head(struct wfcq_head *head, + struct wfcq_tail *tail, + struct wfcq_node *node) +{ + bool not_empty = 0; + +/* +* Move tail if queue was empty. Tail pointer is the +* linearization point of enqueuers. +*/ + if (cmpxchg(>p, >node, node) != >node) { + not_empty = 1; + + /* +* Queue was non-empty. We need to wait for +* head->node.next to become non-NULL, because a +* concurrent wfcq_append may be updating it. +*/ + CMM_STORE_SHARED(node->next, + ___wfcq_node_sync_next(>node)); + } + + /* +* If cmpxchg succeeds (queue was empty), tail now points to +* node, but head->node.next is still NULL. Concurrent +* traversals seeing this state will busy-wait until we set +* head->node.next. +* +* Else, if cmpxchg fails (queue was not empty), traversals will +* only see node after we set head->node.next. +*/ + + /* +* From this point, we know that wfcq_append cannot touch +* head->node.next, either because we successfully moved tail->p +* to node, or because we waited for head->node.next to become +* non-NULL. It is therefore safe to update it. +*/ + CMM_STORE_SHARED(head->node.next, node); + return not_empty; +} + +/* * __wfcq_first: get first node of a queue, without dequeuing. * * Content written into the node before enqueue is guaranteed to be -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)
Implement enqueue-to-head. It can run concurrently with enqueue, splice to queue, and iteration, but requires a mutex against dequeue and splice from queue operations. Useful for special-cases where a queue needs to have nodes enqueued into its head. This patch is only compile-tested. Changes since v1: * Don't require mutual exclusion between traversals and __wfcq_enqueue_head(). Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com --- include/linux/wfcqueue.h | 67 ++- 1 file changed, 60 insertions(+), 7 deletions(-) Index: linux/include/linux/wfcqueue.h === --- linux.orig/include/linux/wfcqueue.h +++ linux/include/linux/wfcqueue.h @@ -55,14 +55,16 @@ * [4] __wfcq_splice (source queue) * [5] __wfcq_first * [6] __wfcq_next + * [7] __wfcq_enqueue_head * - * [1] [2] [3] [4] [5] [6] - * [1] - - - - - - - * [2] - - - - - - - * [3] - - X X X X - * [4] - - X - X X - * [5] - - X X - - - * [6] - - X X - - + * [1] [2] [3] [4] [5] [6] [7] + * [1] - - - - - - - + * [2] - - - - - - X + * [3] - - X X X X X + * [4] - - X - X X X + * [5] - - X X - - - + * [6] - - X X - - - + * [7] - X X X - - X * * Besides locking, mutual exclusion of dequeue, splice and iteration * can be ensured by performing all of those operations from a single @@ -230,6 +232,57 @@ ___wfcq_node_sync_next(struct wfcq_node } /* + * __wfcq_enqueue_head: prepend a node into a queue. + * + * No memory barriers are issued. Mutual exclusion is the responsibility + * of the caller. + * + * Returns false if the queue was empty prior to adding the node. + * Returns true otherwise. + */ +static inline bool __wfcq_enqueue_head(struct wfcq_head *head, + struct wfcq_tail *tail, + struct wfcq_node *node) +{ + bool not_empty = 0; + +/* +* Move tail if queue was empty. Tail pointer is the +* linearization point of enqueuers. +*/ + if (cmpxchg(tail-p, head-node, node) != head-node) { + not_empty = 1; + + /* +* Queue was non-empty. We need to wait for +* head-node.next to become non-NULL, because a +* concurrent wfcq_append may be updating it. +*/ + CMM_STORE_SHARED(node-next, + ___wfcq_node_sync_next(head-node)); + } + + /* +* If cmpxchg succeeds (queue was empty), tail now points to +* node, but head-node.next is still NULL. Concurrent +* traversals seeing this state will busy-wait until we set +* head-node.next. +* +* Else, if cmpxchg fails (queue was not empty), traversals will +* only see node after we set head-node.next. +*/ + + /* +* From this point, we know that wfcq_append cannot touch +* head-node.next, either because we successfully moved tail-p +* to node, or because we waited for head-node.next to become +* non-NULL. It is therefore safe to update it. +*/ + CMM_STORE_SHARED(head-node.next, node); + return not_empty; +} + +/* * __wfcq_first: get first node of a queue, without dequeuing. * * Content written into the node before enqueue is guaranteed to be -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] wfcqueue: implement __wfcq_enqueue_head() (v2)
Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: Implement enqueue-to-head. It can run concurrently with enqueue, splice to queue, and iteration, but requires a mutex against dequeue and splice from queue operations. Useful for special-cases where a queue needs to have nodes enqueued into its head. This patch is only compile-tested. Changes since v1: * Don't require mutual exclusion between traversals and __wfcq_enqueue_head(). Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com Thanks! The first hunk (sync table comment) conflicted with my __wfcq_enqueue patch, but other than that I could not benchmark any regression with my 4-core machine with v4 of my epoll: avoid spinlock contention with wfcqueue patch. All I needed was s/__wfcq_prepend/__wfcq_enqueue_head/g to my original patch to use the updated API. I was worried about the cmpxchg at first, but it does not seem to hurt performance on my 4-core system. In fact, it was slightly better (but within margin of error) time ./eponeshotmt -c 100 -w 4 -t 4 -f 10 real0m 5.78s user0m 1.20s sys 0m 21.90s Tested-by: Eric Wong normalper...@yhbt.net Hopefully somebody can test my epoll patches with more cores/threads :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/