Re: buf_ring(9) API precisions
You're right. A write memory barrier is needed there. Thanks On Thu, Sep 22, 2011 at 12:43 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Mon, Sep 19, 2011 at 8:46 AM, K. Macy km...@freebsd.org wrote: If the value lags next by one then it is ours. This rule applies to all callers so the rule holds consistently. I think you do not understand what I mean, which is that the following: while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); at runtime, can be seen, memory-wise as: while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_tail = prod_next; br-br_prod_bufs++; br-br_prod_bytes += nbytes; critical_exit(); That is, there is no memory barrier to enforce completion of the load/increment/store/load/load/addition/store operations before updating what other thread spin on. Yes, `br_prod_tail' is marked `volatile', but there is no guarantee that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes'). - Arnaud On Mon, Sep 19, 2011 at 5:53 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote: On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; shouldn't those 3 fields be updated atomically, especially on 32bits platforms ? That might pose a problem as, AFAIK, FreeBSD do not have MI 64bits atomics operations... Between the point at which br_prod_tail == prod_head and when we update br_prod_tail to point to prod_next we are the exclusive owners of the fields in buf_ring. That is why we wait for any other enqueueing threads to update br_prod_tail to point to prod_head before continuing. How do you enforce ordering ? I do not see anything particular forbidding the `br-br_prod_tail' to be committed first, leading other thread to believe they have access to the statistics, while the other thread has not yet committed its change. Thanks, - Arnaud Cheers /* * If there are other enqueues in progress * that preceeded us, we need to wait for them * to complete */ while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
Hi, On Mon, Sep 19, 2011 at 8:46 AM, K. Macy km...@freebsd.org wrote: If the value lags next by one then it is ours. This rule applies to all callers so the rule holds consistently. I think you do not understand what I mean, which is that the following: while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); at runtime, can be seen, memory-wise as: while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_tail = prod_next; br-br_prod_bufs++; br-br_prod_bytes += nbytes; critical_exit(); That is, there is no memory barrier to enforce completion of the load/increment/store/load/load/addition/store operations before updating what other thread spin on. Yes, `br_prod_tail' is marked `volatile', but there is no guarantee that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes'). - Arnaud On Mon, Sep 19, 2011 at 5:53 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote: On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; shouldn't those 3 fields be updated atomically, especially on 32bits platforms ? That might pose a problem as, AFAIK, FreeBSD do not have MI 64bits atomics operations... Between the point at which br_prod_tail == prod_head and when we update br_prod_tail to point to prod_next we are the exclusive owners of the fields in buf_ring. That is why we wait for any other enqueueing threads to update br_prod_tail to point to prod_head before continuing. How do you enforce ordering ? I do not see anything particular forbidding the `br-br_prod_tail' to be committed first, leading other thread to believe they have access to the statistics, while the other thread has not yet committed its change. Thanks, - Arnaud Cheers /* * If there are other enqueues in progress * that preceeded us, we need to wait for them * to complete */ while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
If the value lags next by one then it is ours. This rule applies to all callers so the rule holds consistently. On Mon, Sep 19, 2011 at 5:53 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote: On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; shouldn't those 3 fields be updated atomically, especially on 32bits platforms ? That might pose a problem as, AFAIK, FreeBSD do not have MI 64bits atomics operations... Between the point at which br_prod_tail == prod_head and when we update br_prod_tail to point to prod_next we are the exclusive owners of the fields in buf_ring. That is why we wait for any other enqueueing threads to update br_prod_tail to point to prod_head before continuing. How do you enforce ordering ? I do not see anything particular forbidding the `br-br_prod_tail' to be committed first, leading other thread to believe they have access to the statistics, while the other thread has not yet committed its change. Thanks, - Arnaud Cheers /* * If there are other enqueues in progress * that preceeded us, we need to wait for them * to complete */ while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
Hi, On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote: On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; shouldn't those 3 fields be updated atomically, especially on 32bits platforms ? That might pose a problem as, AFAIK, FreeBSD do not have MI 64bits atomics operations... Between the point at which br_prod_tail == prod_head and when we update br_prod_tail to point to prod_next we are the exclusive owners of the fields in buf_ring. That is why we wait for any other enqueueing threads to update br_prod_tail to point to prod_head before continuing. How do you enforce ordering ? I do not see anything particular forbidding the `br-br_prod_tail' to be committed first, leading other thread to believe they have access to the statistics, while the other thread has not yet committed its change. Thanks, - Arnaud Cheers /* * If there are other enqueues in progress * that preceeded us, we need to wait for them * to complete */ while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; shouldn't those 3 fields be updated atomically, especially on 32bits platforms ? That might pose a problem as, AFAIK, FreeBSD do not have MI 64bits atomics operations... Between the point at which br_prod_tail == prod_head and when we update br_prod_tail to point to prod_next we are the exclusive owners of the fields in buf_ring. That is why we wait for any other enqueueing threads to update br_prod_tail to point to prod_head before continuing. Cheers /* * If there are other enqueues in progress * that preceeded us, we need to wait for them * to complete */ while (br-br_prod_tail != prod_head) cpu_spinwait(); br-br_prod_bufs++; br-br_prod_bytes += nbytes; br-br_prod_tail = prod_next; critical_exit(); ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
On Thu, Sep 15, 2011 at 4:53 AM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? DRiver BufRing 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; /* * Pad out to next L2 cache line */ uint64_t _pad0[11]; volatile uint32_t br_cons_head; volatile uint32_t br_cons_tail; int br_cons_size; int br_cons_mask; /* * Pad out to next L2 cache line */ uint64_t _pad1[14]; #ifdef DEBUG_BUFRING struct mtx *br_lock; #endif void *br_ring[0]; }; Why are you making an MD guess, the amount of padding to fit the size of a cache line, in MI API ? Strangely enough, you did not make this assumption in, say r205488 (picked randomly). It has been several years, and I haven't done any work in svn in over a year, I don't remember. I probably meant to refine it in a later iteration. If you would like to send me a patch addressing this I'd be more than happy to apply it if appropriate. Otherwise, I will deal with it some time after 9 settles. Thanks for pointing this out. Cheers ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
On Thu, 15 Sep 2011, K. Macy wrote: Why are you making an MD guess, the amount of padding to fit the size of a cache line, in MI API ? Strangely enough, you did not make this assumption in, say r205488 (picked randomly). It has been several years, and I haven't done any work in svn in over a year, I don't remember. I probably meant to refine it in a later iteration. If you would like to send me a patch addressing this I'd be more than happy to apply it if appropriate. Otherwise, I will deal with it some time after 9 settles. Thanks for pointing this out. I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be used on individual elements of a struct (causing appropriate padding to be added), but that may be one option here. Of course, that introduces a further alignment requirement on the struct itself, so a moderate amount of care would need to be used. Robert ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
Hi, On Thu, Sep 15, 2011 at 5:21 PM, Robert Watson rwat...@freebsd.org wrote: On Thu, 15 Sep 2011, K. Macy wrote: Why are you making an MD guess, the amount of padding to fit the size of a cache line, in MI API ? Strangely enough, you did not make this assumption in, say r205488 (picked randomly). It has been several years, and I haven't done any work in svn in over a year, I don't remember. I probably meant to refine it in a later iteration. If you would like to send me a patch addressing this I'd be more than happy to apply it if appropriate. Otherwise, I will deal with it some time after 9 settles. Thanks for pointing this out. I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be used on individual elements of a struct (causing appropriate padding to be added), but that may be one option here. It definitively does, it is used in several structure in the tree. Attached a patch to convert buf_ring(9), however, I'm not a huge fan of dedicating a full cache line to the lock pointer. - Arnaud Of course, that introduces a further alignment requirement on the struct itself, so a moderate amount of care would need to be used. Robert diff --git a/sys/sys/buf_ring.h b/sys/sys/buf_ring.h index 57e42e5..01f399e 100644 --- a/sys/sys/buf_ring.h +++ b/sys/sys/buf_ring.h @@ -49,25 +49,17 @@ struct buf_ring { uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; - /* - * Pad out to next L2 cache line - */ - uint64_t _pad0[11]; - volatile uint32_t br_cons_head; + volatile uint32_t br_cons_head __aligned(CACHE_LINE_SIZE); volatile uint32_t br_cons_tail; int br_cons_size; int br_cons_mask; - /* - * Pad out to next L2 cache line - */ - uint64_t _pad1[14]; #ifdef DEBUG_BUFRING - struct mtx *br_lock; + struct mtx *br_lock __aligned(CACHE_LINE_SIZE); #endif - void *br_ring[0]; -}; + void *br_ring[0] __aligned(CACHE_LINE_SIZE); +} __aligned(CACHE_LINE_SIZE); /* * multi-producer safe lock-free ring buffer enqueue ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: buf_ring(9) API precisions
Hi, On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi Kip, I've got a few question about the buf_ring(9) API. 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b' and 'r', for Buffer Ring, but what about 'd' and 'r' ? 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as: struct buf_ring { volatile uint32_t br_prod_head; volatile uint32_t br_prod_tail; int br_prod_size; int br_prod_mask; uint64_t br_drops; uint64_t br_prod_bufs; uint64_t br_prod_bytes; shouldn't those 3 fields be updated atomically, especially on 32bits platforms ? That might pose a problem as, AFAIK, FreeBSD do not have MI 64bits atomics operations... - Arnaud /* * Pad out to next L2 cache line */ uint64_t _pad0[11]; volatile uint32_t br_cons_head; volatile uint32_t br_cons_tail; int br_cons_size; int br_cons_mask; /* * Pad out to next L2 cache line */ uint64_t _pad1[14]; #ifdef DEBUG_BUFRING struct mtx *br_lock; #endif void *br_ring[0]; }; Why are you making an MD guess, the amount of padding to fit the size of a cache line, in MI API ? Strangely enough, you did not make this assumption in, say r205488 (picked randomly). Thanks in advance, - Arnaud ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org