Re: buf_ring(9) API precisions

2011-09-24 Thread K. Macy
You're right. A write memory barrier is needed there.

Thanks

On Thu, Sep 22, 2011 at 12:43 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Mon, Sep 19, 2011 at 8:46 AM, K. Macy km...@freebsd.org wrote:
 If the value lags next by one then it is ours. This rule applies to
 all callers so the rule holds consistently.

 I think you do not understand what I mean, which is that the following:

       while (br-br_prod_tail != prod_head)
               cpu_spinwait();
       br-br_prod_bufs++;
       br-br_prod_bytes += nbytes;
       br-br_prod_tail = prod_next;
       critical_exit();

 at runtime, can be seen, memory-wise as:

       while (br-br_prod_tail != prod_head)
               cpu_spinwait();
       br-br_prod_tail = prod_next;
       br-br_prod_bufs++;
       br-br_prod_bytes += nbytes;
       critical_exit();

 That is, there is no memory barrier to enforce completion of the
 load/increment/store/load/load/addition/store operations before
 updating what other thread spin on. Yes, `br_prod_tail' is marked
 `volatile', but there is no guarantee that it will not be re-ordered
 wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes').

  - Arnaud

 On Mon, Sep 19, 2011 at 5:53 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote:
 On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com 
 wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
 shouldn't those 3 fields be updated atomically, especially on 32bits
 platforms ? That might pose a problem as, AFAIK, FreeBSD do not have
 MI 64bits atomics operations...

 Between the point at which br_prod_tail == prod_head and when we
 update br_prod_tail to point to prod_next we are the exclusive owners
 of the fields in buf_ring. That is why we wait for any other
 enqueueing threads to update br_prod_tail to point to prod_head before
 continuing.

 How do you enforce ordering ? I do not see anything particular
 forbidding the `br-br_prod_tail' to be committed first, leading other
 thread to believe they have access to the statistics, while the other
 thread has not yet committed its change.

 Thanks,
  - Arnaud

 Cheers

        /*
         * If there are other enqueues in progress
         * that preceeded us, we need to wait for them
         * to complete
         */
        while (br-br_prod_tail != prod_head)
                cpu_spinwait();
        br-br_prod_bufs++;
        br-br_prod_bytes += nbytes;
        br-br_prod_tail = prod_next;
        critical_exit();




___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-21 Thread Arnaud Lacombe
Hi,

On Mon, Sep 19, 2011 at 8:46 AM, K. Macy km...@freebsd.org wrote:
 If the value lags next by one then it is ours. This rule applies to
 all callers so the rule holds consistently.

I think you do not understand what I mean, which is that the following:

   while (br-br_prod_tail != prod_head)
   cpu_spinwait();
   br-br_prod_bufs++;
   br-br_prod_bytes += nbytes;
   br-br_prod_tail = prod_next;
   critical_exit();

at runtime, can be seen, memory-wise as:

   while (br-br_prod_tail != prod_head)
   cpu_spinwait();
   br-br_prod_tail = prod_next;
   br-br_prod_bufs++;
   br-br_prod_bytes += nbytes;
   critical_exit();

That is, there is no memory barrier to enforce completion of the
load/increment/store/load/load/addition/store operations before
updating what other thread spin on. Yes, `br_prod_tail' is marked
`volatile', but there is no guarantee that it will not be re-ordered
wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes').

 - Arnaud

 On Mon, Sep 19, 2011 at 5:53 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote:
 On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com 
 wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
 shouldn't those 3 fields be updated atomically, especially on 32bits
 platforms ? That might pose a problem as, AFAIK, FreeBSD do not have
 MI 64bits atomics operations...

 Between the point at which br_prod_tail == prod_head and when we
 update br_prod_tail to point to prod_next we are the exclusive owners
 of the fields in buf_ring. That is why we wait for any other
 enqueueing threads to update br_prod_tail to point to prod_head before
 continuing.

 How do you enforce ordering ? I do not see anything particular
 forbidding the `br-br_prod_tail' to be committed first, leading other
 thread to believe they have access to the statistics, while the other
 thread has not yet committed its change.

 Thanks,
  - Arnaud

 Cheers

        /*
         * If there are other enqueues in progress
         * that preceeded us, we need to wait for them
         * to complete
         */
        while (br-br_prod_tail != prod_head)
                cpu_spinwait();
        br-br_prod_bufs++;
        br-br_prod_bytes += nbytes;
        br-br_prod_tail = prod_next;
        critical_exit();



___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-19 Thread K. Macy
If the value lags next by one then it is ours. This rule applies to
all callers so the rule holds consistently.

On Mon, Sep 19, 2011 at 5:53 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote:
 On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
 shouldn't those 3 fields be updated atomically, especially on 32bits
 platforms ? That might pose a problem as, AFAIK, FreeBSD do not have
 MI 64bits atomics operations...

 Between the point at which br_prod_tail == prod_head and when we
 update br_prod_tail to point to prod_next we are the exclusive owners
 of the fields in buf_ring. That is why we wait for any other
 enqueueing threads to update br_prod_tail to point to prod_head before
 continuing.

 How do you enforce ordering ? I do not see anything particular
 forbidding the `br-br_prod_tail' to be committed first, leading other
 thread to believe they have access to the statistics, while the other
 thread has not yet committed its change.

 Thanks,
  - Arnaud

 Cheers

        /*
         * If there are other enqueues in progress
         * that preceeded us, we need to wait for them
         * to complete
         */
        while (br-br_prod_tail != prod_head)
                cpu_spinwait();
        br-br_prod_bufs++;
        br-br_prod_bytes += nbytes;
        br-br_prod_tail = prod_next;
        critical_exit();


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-18 Thread Arnaud Lacombe
Hi,

On Fri, Sep 16, 2011 at 10:41 AM, K. Macy km...@freebsd.org wrote:
 On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
 shouldn't those 3 fields be updated atomically, especially on 32bits
 platforms ? That might pose a problem as, AFAIK, FreeBSD do not have
 MI 64bits atomics operations...

 Between the point at which br_prod_tail == prod_head and when we
 update br_prod_tail to point to prod_next we are the exclusive owners
 of the fields in buf_ring. That is why we wait for any other
 enqueueing threads to update br_prod_tail to point to prod_head before
 continuing.

How do you enforce ordering ? I do not see anything particular
forbidding the `br-br_prod_tail' to be committed first, leading other
thread to believe they have access to the statistics, while the other
thread has not yet committed its change.

Thanks,
 - Arnaud

 Cheers

        /*
         * If there are other enqueues in progress
         * that preceeded us, we need to wait for them
         * to complete
         */
        while (br-br_prod_tail != prod_head)
                cpu_spinwait();
        br-br_prod_bufs++;
        br-br_prod_bytes += nbytes;
        br-br_prod_tail = prod_next;
        critical_exit();

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-16 Thread K. Macy
On Fri, Sep 16, 2011 at 3:02 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
 shouldn't those 3 fields be updated atomically, especially on 32bits
 platforms ? That might pose a problem as, AFAIK, FreeBSD do not have
 MI 64bits atomics operations...

Between the point at which br_prod_tail == prod_head and when we
update br_prod_tail to point to prod_next we are the exclusive owners
of the fields in buf_ring. That is why we wait for any other
enqueueing threads to update br_prod_tail to point to prod_head before
continuing.

Cheers

/*
 * If there are other enqueues in progress
 * that preceeded us, we need to wait for them
 * to complete
 */
while (br-br_prod_tail != prod_head)
cpu_spinwait();
br-br_prod_bufs++;
br-br_prod_bytes += nbytes;
br-br_prod_tail = prod_next;
critical_exit();
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-15 Thread K. Macy
On Thu, Sep 15, 2011 at 4:53 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

DRiver BufRing

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
        /*
         * Pad out to next L2 cache line
         */
        uint64_t                _pad0[11];

        volatile uint32_t       br_cons_head;
        volatile uint32_t       br_cons_tail;
        int                     br_cons_size;
        int                     br_cons_mask;

        /*
         * Pad out to next L2 cache line
         */
        uint64_t                _pad1[14];
 #ifdef DEBUG_BUFRING
        struct mtx              *br_lock;
 #endif
        void                    *br_ring[0];
 };

 Why are you making an MD guess, the amount of padding to fit the size
 of a cache line, in MI API ? Strangely enough, you did not make this
 assumption in, say r205488 (picked randomly).

It has been several years, and I haven't done any work in svn in over
a year, I don't remember. I probably meant to refine it in a later
iteration.

If you would like to send me a patch addressing this I'd be more than
happy to apply it if appropriate. Otherwise, I will deal with it some
time after 9 settles.

Thanks for pointing this out.

Cheers
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-15 Thread Robert Watson

On Thu, 15 Sep 2011, K. Macy wrote:

Why are you making an MD guess, the amount of padding to fit the size of a 
cache line, in MI API ? Strangely enough, you did not make this assumption 
in, say r205488 (picked randomly).


It has been several years, and I haven't done any work in svn in over a 
year, I don't remember. I probably meant to refine it in a later iteration.


If you would like to send me a patch addressing this I'd be more than happy 
to apply it if appropriate. Otherwise, I will deal with it some time after 9 
settles.


Thanks for pointing this out.


I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be used 
on individual elements of a struct (causing appropriate padding to be added), 
but that may be one option here.  Of course, that introduces a further 
alignment requirement on the struct itself, so a moderate amount of care would 
need to be used.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: buf_ring(9) API precisions

2011-09-15 Thread Arnaud Lacombe
Hi,

On Thu, Sep 15, 2011 at 5:21 PM, Robert Watson rwat...@freebsd.org wrote:
 On Thu, 15 Sep 2011, K. Macy wrote:

 Why are you making an MD guess, the amount of padding to fit the size of
 a cache line, in MI API ? Strangely enough, you did not make this assumption
 in, say r205488 (picked randomly).

 It has been several years, and I haven't done any work in svn in over a
 year, I don't remember. I probably meant to refine it in a later iteration.

 If you would like to send me a patch addressing this I'd be more than
 happy to apply it if appropriate. Otherwise, I will deal with it some time
 after 9 settles.

 Thanks for pointing this out.

 I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be
 used on individual elements of a struct (causing appropriate padding to be
 added), but that may be one option here.

It definitively does, it is used in several structure in the tree.

Attached a patch to convert buf_ring(9), however, I'm not a huge fan
of dedicating a full cache line to the lock pointer.

 - Arnaud

 Of course, that introduces a
 further alignment requirement on the struct itself, so a moderate amount of
 care would need to be used.

 Robert

diff --git a/sys/sys/buf_ring.h b/sys/sys/buf_ring.h
index 57e42e5..01f399e 100644
--- a/sys/sys/buf_ring.h
+++ b/sys/sys/buf_ring.h
@@ -49,25 +49,17 @@ struct buf_ring {
 	uint64_t		br_drops;
 	uint64_t		br_prod_bufs;
 	uint64_t		br_prod_bytes;
-	/*
-	 * Pad out to next L2 cache line
-	 */
-	uint64_t	  	_pad0[11];
 
-	volatile uint32_t	br_cons_head;
+	volatile uint32_t	br_cons_head __aligned(CACHE_LINE_SIZE);
 	volatile uint32_t	br_cons_tail;
 	int		 	br_cons_size;
 	int  	br_cons_mask;
 	
-	/*
-	 * Pad out to next L2 cache line
-	 */
-	uint64_t	  	_pad1[14];
 #ifdef DEBUG_BUFRING
-	struct mtx		*br_lock;
+	struct mtx		*br_lock __aligned(CACHE_LINE_SIZE);
 #endif	
-	void			*br_ring[0];
-};
+	void			*br_ring[0] __aligned(CACHE_LINE_SIZE);
+} __aligned(CACHE_LINE_SIZE);
 
 /*
  * multi-producer safe lock-free ring buffer enqueue
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: buf_ring(9) API precisions

2011-09-15 Thread Arnaud Lacombe
Hi,

On Wed, Sep 14, 2011 at 10:53 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi Kip,

 I've got a few question about the buf_ring(9) API.

 1) what means the 'drbr_' prefix. I can guess the two last letter, 'b'
 and 'r', for Buffer Ring, but what about 'd' and 'r' ?

 2) in `sys/sys/buf_ring.h', you defined 'struct buf_ring' as:

 struct buf_ring {
        volatile uint32_t       br_prod_head;
        volatile uint32_t       br_prod_tail;
        int                     br_prod_size;
        int                     br_prod_mask;
        uint64_t                br_drops;
        uint64_t                br_prod_bufs;
        uint64_t                br_prod_bytes;
shouldn't those 3 fields be updated atomically, especially on 32bits
platforms ? That might pose a problem as, AFAIK, FreeBSD do not have
MI 64bits atomics operations...

 - Arnaud

        /*
         * Pad out to next L2 cache line
         */
        uint64_t                _pad0[11];

        volatile uint32_t       br_cons_head;
        volatile uint32_t       br_cons_tail;
        int                     br_cons_size;
        int                     br_cons_mask;

        /*
         * Pad out to next L2 cache line
         */
        uint64_t                _pad1[14];
 #ifdef DEBUG_BUFRING
        struct mtx              *br_lock;
 #endif
        void                    *br_ring[0];
 };

 Why are you making an MD guess, the amount of padding to fit the size
 of a cache line, in MI API ? Strangely enough, you did not make this
 assumption in, say r205488 (picked randomly).

 Thanks in advance,
  - Arnaud

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org