Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-26 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;


Comment:
OK

> Petri Savolainen(psavol) wrote:
> ```
> if (ring_st_is_empty(>s.ring_st))
> 
> if (ring_st_is_empty(>s.ring_st))
> 
> if (ring_st_is_empty(>s.ring_st) == 0)
> 
> if (ring_st_is_empty(>s.ring_st))
> 
> if (ring_st_is_empty(>s.ring_st))
> 
> if (!ring_st_is_empty(>s.ring_st))
> 
> if (ring_st_is_empty(>s.ring_st))
> 
> if (ring_st_is_empty(>s.ring_st))
> ```
> 
> Find if clause for "ring is not empty" from picture above.


>> Petri Savolainen(psavol) wrote:
>> It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail 
>> is always ahead of head, but no more than queue size (e.g. 4k), So, e.g. 
>> when tail is 0, and head is 0x the number of items in ring is 0 - 
>> 0x = 1
>> 
>> ```
>> headtail
>> hex[index], hex[index], num, num_free
>> 0xfffe [6], 0x [7], 1,   7
>> 0x [7], 0x [0], 1,   7
>> 0x [0], 0x0001 [1], 1,   7
>> ```


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Please see above discussion. The comment stands.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Not true. See the comment below where it's clear that `head` and `tail` 
 are free running and thus will eventually wrap as described above. If 
 these were `uint64_t` variables you could argue that the wraps would take 
 centuries and hence are nothing to worry about, but being `uint32_t` 
 variables this should be expected to happen regularly on heavily used 
 queues. 
 
 `abs()` adds no overhead since compilers treat it as an intrinsic.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Note that neither `head` nor `tail` get masked when doing enqueues or 
> dequeues, so they will traverse the entire 32-bit range of the variable 
> over time. While the ring itself will never hold more than `size` 
> entries, `head` and `tail` will wrap around. When you index into the ring 
> you do so via a mask (`idx = head & mask;`), but calculating the number 
> of elements in the ring doesn't do this, which is why `abs()` is needed.


>> Petri Savolainen(psavol) wrote:
>> Plus this is done only once - in pool create phase.


>>> Petri Savolainen(psavol) wrote:
>>> No since ring size will be much smaller than 4 billion.


 Petri Savolainen(psavol) wrote:
 The point is that ring size will never be close to 4 billion entries. 
 E.g. currently tail is always max 4096 larger than head. Your example 
 above is based assumption of 4 billion entry ring. Overflow is avoided 
 when ring size if <2 billion, as 32 bit indexes can be still used to 
 calculate number of items correctly.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed, this is good for now. Later we may wish to honor the 
>> user-requested queue `size` parameter.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Correct, as noted earlier. I withdraw that comment.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Presumably the compiler will see the overlap and optimize away 
> the redundancy, so I assume the performance impact will be nil 
> here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're allowing `head` and `tail` to run over the entire 
>> 32-bit range, you're correct that you can completely fill the 
>> ring. I did miss that 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-26 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;


Comment:
It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail is 
always ahead of head, but no more than queue size (e.g. 4k), So, e.g. when tail 
is 0, and head is 0x the number of items in ring is 0 - 0x = 1

```
headtail
hex[index], hex[index], num, num_free
0xfffe [6], 0x [7], 1,   7
0x [7], 0x [0], 1,   7
0x [0], 0x0001 [1], 1,   7
```

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Please see above discussion. The comment stands.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Not true. See the comment below where it's clear that `head` and `tail` are 
>> free running and thus will eventually wrap as described above. If these were 
>> `uint64_t` variables you could argue that the wraps would take centuries and 
>> hence are nothing to worry about, but being `uint32_t` variables this should 
>> be expected to happen regularly on heavily used queues. 
>> 
>> `abs()` adds no overhead since compilers treat it as an intrinsic.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Note that neither `head` nor `tail` get masked when doing enqueues or 
>>> dequeues, so they will traverse the entire 32-bit range of the variable 
>>> over time. While the ring itself will never hold more than `size` entries, 
>>> `head` and `tail` will wrap around. When you index into the ring you do so 
>>> via a mask (`idx = head & mask;`), but calculating the number of elements 
>>> in the ring doesn't do this, which is why `abs()` is needed.


 Petri Savolainen(psavol) wrote:
 Plus this is done only once - in pool create phase.


> Petri Savolainen(psavol) wrote:
> No since ring size will be much smaller than 4 billion.


>> Petri Savolainen(psavol) wrote:
>> The point is that ring size will never be close to 4 billion entries. 
>> E.g. currently tail is always max 4096 larger than head. Your example 
>> above is based assumption of 4 billion entry ring. Overflow is avoided 
>> when ring size if <2 billion, as 32 bit indexes can be still used to 
>> calculate number of items correctly.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Agreed, this is good for now. Later we may wish to honor the 
 user-requested queue `size` parameter.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Correct, as noted earlier. I withdraw that comment.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Presumably the compiler will see the overlap and optimize away the 
>>> redundancy, so I assume the performance impact will be nil here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're allowing `head` and `tail` to run over the entire 
 32-bit range, you're correct that you can completely fill the 
 ring. I did miss that point. However, as shown above you still 
 need this to be:
 
 ```
 num = size - abs(tail - head);
 ```
 To avoid problems at the 32-bit wrap boundary.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> You're computing in 32 bits not 8 bits, and your ring size is 
> less than 2^32 elements. Consider the following test program:
> ```
> #include 
> #include 
> #include 
> #include 
> 
> void main() {
>   uint32_t head[4] 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-23 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 78
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);


Comment:
Please see above discussion. The comment stands.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Not true. See the comment below where it's clear that `head` and `tail` are 
> free running and thus will eventually wrap as described above. If these were 
> `uint64_t` variables you could argue that the wraps would take centuries and 
> hence are nothing to worry about, but being `uint32_t` variables this should 
> be expected to happen regularly on heavily used queues. 
> 
> `abs()` adds no overhead since compilers treat it as an intrinsic.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Note that neither `head` nor `tail` get masked when doing enqueues or 
>> dequeues, so they will traverse the entire 32-bit range of the variable over 
>> time. While the ring itself will never hold more than `size` entries, `head` 
>> and `tail` will wrap around. When you index into the ring you do so via a 
>> mask (`idx = head & mask;`), but calculating the number of elements in the 
>> ring doesn't do this, which is why `abs()` is needed.


>>> Petri Savolainen(psavol) wrote:
>>> Plus this is done only once - in pool create phase.


 Petri Savolainen(psavol) wrote:
 No since ring size will be much smaller than 4 billion.


> Petri Savolainen(psavol) wrote:
> The point is that ring size will never be close to 4 billion entries. 
> E.g. currently tail is always max 4096 larger than head. Your example 
> above is based assumption of 4 billion entry ring. Overflow is avoided 
> when ring size if <2 billion, as 32 bit indexes can be still used to 
> calculate number of items correctly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed, this is good for now. Later we may wish to honor the 
>>> user-requested queue `size` parameter.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Agreed.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Correct, as noted earlier. I withdraw that comment.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Presumably the compiler will see the overlap and optimize away the 
>> redundancy, so I assume the performance impact will be nil here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're allowing `head` and `tail` to run over the entire 
>>> 32-bit range, you're correct that you can completely fill the ring. 
>>> I did miss that point. However, as shown above you still need this 
>>> to be:
>>> 
>>> ```
>>> num = size - abs(tail - head);
>>> ```
>>> To avoid problems at the 32-bit wrap boundary.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 You're computing in 32 bits not 8 bits, and your ring 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-23 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;


Comment:
Not true. See the comment below where it's clear that `head` and `tail` are 
free running and thus will eventually wrap as described above. If these were 
`uint64_t` variables you could argue that the wraps would take centuries and 
hence are nothing to worry about, but being `uint32_t` variables this should be 
expected to happen regularly on heavily used queues. 

`abs()` adds no overhead since compilers treat it as an intrinsic.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Note that neither `head` nor `tail` get masked when doing enqueues or 
> dequeues, so they will traverse the entire 32-bit range of the variable over 
> time. While the ring itself will never hold more than `size` entries, `head` 
> and `tail` will wrap around. When you index into the ring you do so via a 
> mask (`idx = head & mask;`), but calculating the number of elements in the 
> ring doesn't do this, which is why `abs()` is needed.


>> Petri Savolainen(psavol) wrote:
>> Plus this is done only once - in pool create phase.


>>> Petri Savolainen(psavol) wrote:
>>> No since ring size will be much smaller than 4 billion.


 Petri Savolainen(psavol) wrote:
 The point is that ring size will never be close to 4 billion entries. E.g. 
 currently tail is always max 4096 larger than head. Your example above is 
 based assumption of 4 billion entry ring. Overflow is avoided when ring 
 size if <2 billion, as 32 bit indexes can be still used to calculate 
 number of items correctly.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed, this is good for now. Later we may wish to honor the 
>> user-requested queue `size` parameter.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Correct, as noted earlier. I withdraw that comment.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Presumably the compiler will see the overlap and optimize away the 
> redundancy, so I assume the performance impact will be nil here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're allowing `head` and `tail` to run over the entire 
>> 32-bit range, you're correct that you can completely fill the ring. 
>> I did miss that point. However, as shown above you still need this 
>> to be:
>> 
>> ```
>> num = size - abs(tail - head);
>> ```
>> To avoid problems at the 32-bit wrap boundary.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> You're computing in 32 bits not 8 bits, and your ring size is less 
>>> than 2^32 elements. Consider the following test program:
>>> ```
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> 
>>> void main() {
>>> uint32_t head[4] = {0, 1, 2, 3};
>>> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>>> uint32_t mask = 4095;
>>> uint32_t result;
>>> int i;
>>> 
>>> for (i = 0; i < 4; i++) {
>>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>head[i], tail[i]);
>>> 
>>> result = tail[i] - head[i];
>>> printf("tail - head = %" PRIu32 "\n", result);
>>> 
>>> result = (tail[i] - head[i]) & mask;
>>> printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>> 
>>>

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-23 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_pool.c
line 28
@@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool)
memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr);
 
/* Initialize buffer metadata */
-   buf_hdr->index = i;
+   buf_hdr->index.u32= 0;
+   buf_hdr->index.pool   = pool->pool_idx;
+   buf_hdr->index.buffer = i;


Comment:
Plus this is done only once - in pool create phase.

> Petri Savolainen(psavol) wrote:
> No since ring size will be much smaller than 4 billion.


>> Petri Savolainen(psavol) wrote:
>> The point is that ring size will never be close to 4 billion entries. E.g. 
>> currently tail is always max 4096 larger than head. Your example above is 
>> based assumption of 4 billion entry ring. Overflow is avoided when ring size 
>> if <2 billion, as 32 bit indexes can be still used to calculate number of 
>> items correctly.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Agreed, this is good for now. Later we may wish to honor the 
 user-requested queue `size` parameter.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Correct, as noted earlier. I withdraw that comment.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Presumably the compiler will see the overlap and optimize away the 
>>> redundancy, so I assume the performance impact will be nil here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're allowing `head` and `tail` to run over the entire 32-bit 
 range, you're correct that you can completely fill the ring. I did 
 miss that point. However, as shown above you still need this to be:
 
 ```
 num = size - abs(tail - head);
 ```
 To avoid problems at the 32-bit wrap boundary.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> You're computing in 32 bits not 8 bits, and your ring size is less 
> than 2^32 elements. Consider the following test program:
> ```
> #include 
> #include 
> #include 
> #include 
> 
> void main() {
>   uint32_t head[4] = {0, 1, 2, 3};
>   uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>   uint32_t mask = 4095;
>   uint32_t result;
>   int i;
> 
>   for (i = 0; i < 4; i++) {
>   printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>  head[i], tail[i]);
> 
>   result = tail[i] - head[i];
>   printf("tail - head = %" PRIu32 "\n", result);
> 
>   result = (tail[i] - head[i]) & mask;
>   printf("(tail - head) & mask = %" PRIu32 "\n", result);
> 
>   result = abs(tail[i] - head[i]);
>   printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>   }
> }
> ```
> in theory `tail - head` should be the number of elements in the ring, 
> in this case 0, 2, 4, and 6. But running this test program gives the 
> following output:
> ```
> head = 0 tail = 0:
> tail - head = 0
> (tail - head) & mask = 0
> abs(tail - head) = 0
> 
> head = 1 tail = 4294967295:
> tail - head = 4294967294
> (tail - head) & mask = 4094
> abs(tail - head) = 2
> 
> head = 2 tail = 4294967294:
> tail - head = 4294967292
> (tail - head) & mask = 4092
> abs(tail - head) = 4
> 
> head = 3 tail = 4294967293:
> tail - head = 4294967290
> (tail - head) & mask = 4090
> abs(tail - head) = 6
> ```
> Since you're allowing head to run free over the 32-bit range of the 
> variable, when the 32-bits rolls over you'll get a large positive 
> number, not the small one you need to stay within the ring bounds. 
> The alternative is to mask `head` and `tail` as you increment them, 
> but then you run into the effective range issue.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> OK


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 That functionality is not obvious from the name. It either implies 
 that one of the input arguments is written (not true here) or the 
 reader might assume that it is an expression without side-effect 
 and should be deleted (what I originally thought when reading it). 
 You 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-23 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 78
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);


Comment:
No since ring size will be much smaller than 4 billion.

> Petri Savolainen(psavol) wrote:
> The point is that ring size will never be close to 4 billion entries. E.g. 
> currently tail is always max 4096 larger than head. Your example above is 
> based assumption of 4 billion entry ring. Overflow is avoided when ring size 
> if <2 billion, as 32 bit indexes can be still used to calculate number of 
> items correctly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed, this is good for now. Later we may wish to honor the user-requested 
>>> queue `size` parameter.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Agreed.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Correct, as noted earlier. I withdraw that comment.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Presumably the compiler will see the overlap and optimize away the 
>> redundancy, so I assume the performance impact will be nil here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're allowing `head` and `tail` to run over the entire 32-bit 
>>> range, you're correct that you can completely fill the ring. I did miss 
>>> that point. However, as shown above you still need this to be:
>>> 
>>> ```
>>> num = size - abs(tail - head);
>>> ```
>>> To avoid problems at the 32-bit wrap boundary.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 You're computing in 32 bits not 8 bits, and your ring size is less 
 than 2^32 elements. Consider the following test program:
 ```
 #include 
 #include 
 #include 
 #include 
 
 void main() {
uint32_t head[4] = {0, 1, 2, 3};
uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
uint32_t mask = 4095;
uint32_t result;
int i;
 
for (i = 0; i < 4; i++) {
printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
   head[i], tail[i]);
 
result = tail[i] - head[i];
printf("tail - head = %" PRIu32 "\n", result);
 
result = (tail[i] - head[i]) & mask;
printf("(tail - head) & mask = %" PRIu32 "\n", result);
 
result = abs(tail[i] - head[i]);
printf("abs(tail - head) = %" PRIu32 "\n\n", result);
}
 }
 ```
 in theory `tail - head` should be the number of elements in the ring, 
 in this case 0, 2, 4, and 6. But running this test program gives the 
 following output:
 ```
 head = 0 tail = 0:
 tail - head = 0

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-23 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;


Comment:
The point is that ring size will never be close to 4 billion entries. E.g. 
currently tail is always max 4096 larger than head. Your example above is based 
assumption of 4 billion entry ring. Overflow is avoided when ring size if <2 
billion, as 32 bit indexes can be still used to calculate number of items 
correctly.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed, this is good for now. Later we may wish to honor the user-requested 
>> queue `size` parameter.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Agreed.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Correct, as noted earlier. I withdraw that comment.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Presumably the compiler will see the overlap and optimize away the 
> redundancy, so I assume the performance impact will be nil here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're allowing `head` and `tail` to run over the entire 32-bit 
>> range, you're correct that you can completely fill the ring. I did miss 
>> that point. However, as shown above you still need this to be:
>> 
>> ```
>> num = size - abs(tail - head);
>> ```
>> To avoid problems at the 32-bit wrap boundary.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> You're computing in 32 bits not 8 bits, and your ring size is less than 
>>> 2^32 elements. Consider the following test program:
>>> ```
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> 
>>> void main() {
>>> uint32_t head[4] = {0, 1, 2, 3};
>>> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>>> uint32_t mask = 4095;
>>> uint32_t result;
>>> int i;
>>> 
>>> for (i = 0; i < 4; i++) {
>>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>head[i], tail[i]);
>>> 
>>> result = tail[i] - head[i];
>>> printf("tail - head = %" PRIu32 "\n", result);
>>> 
>>> result = (tail[i] - head[i]) & mask;
>>> printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>> 
>>> result = abs(tail[i] - head[i]);
>>> printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>> }
>>> }
>>> ```
>>> in theory `tail - head` should be the number of elements in the ring, 
>>> in this case 0, 2, 4, and 6. But running this test program gives the 
>>> following output:
>>> ```
>>> head = 0 tail = 0:
>>> tail - head = 0
>>> (tail - head) & mask = 0
>>> abs(tail - head) = 0
>>> 
>>> head = 1 tail = 4294967295:
>>> tail - head = 4294967294
>>> (tail - head) & mask = 4094
>>> abs(tail - head) = 2
>>> 
>>> head = 2 tail = 4294967294:
>>> tail - head = 4294967292
>>> (tail - head) & mask = 4092
>>> abs(tail - head) = 4
>>> 
>>> head = 3 tail = 4294967293:
>>> tail - head = 4294967290
>>> (tail - head) & mask = 4090
>>> abs(tail - head) = 6
>>> ```
>>> Since you're allowing head to run free over the 32-bit range of the 
>>> variable, when the 32-bits rolls over you'll get a large positive 
>>> number, not the small one you need to stay within the ring bounds. The 
>>> alternative is to mask `head` and `tail` as you increment them, but 
>>> then you run into the effective range issue.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 OK


> 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 103
@@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name,
param = _param;
}
 
+   if (param->size > CONFIG_QUEUE_SIZE)


Comment:
Agreed, this is good for now. Later we may wish to honor the user-requested 
queue `size` parameter.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Correct, as noted earlier. I withdraw that comment.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Presumably the compiler will see the overlap and optimize away the 
>>> redundancy, so I assume the performance impact will be nil here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're allowing `head` and `tail` to run over the entire 32-bit 
 range, you're correct that you can completely fill the ring. I did miss 
 that point. However, as shown above you still need this to be:
 
 ```
 num = size - abs(tail - head);
 ```
 To avoid problems at the 32-bit wrap boundary.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> You're computing in 32 bits not 8 bits, and your ring size is less than 
> 2^32 elements. Consider the following test program:
> ```
> #include 
> #include 
> #include 
> #include 
> 
> void main() {
>   uint32_t head[4] = {0, 1, 2, 3};
>   uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>   uint32_t mask = 4095;
>   uint32_t result;
>   int i;
> 
>   for (i = 0; i < 4; i++) {
>   printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>  head[i], tail[i]);
> 
>   result = tail[i] - head[i];
>   printf("tail - head = %" PRIu32 "\n", result);
> 
>   result = (tail[i] - head[i]) & mask;
>   printf("(tail - head) & mask = %" PRIu32 "\n", result);
> 
>   result = abs(tail[i] - head[i]);
>   printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>   }
> }
> ```
> in theory `tail - head` should be the number of elements in the ring, in 
> this case 0, 2, 4, and 6. But running this test program gives the 
> following output:
> ```
> head = 0 tail = 0:
> tail - head = 0
> (tail - head) & mask = 0
> abs(tail - head) = 0
> 
> head = 1 tail = 4294967295:
> tail - head = 4294967294
> (tail - head) & mask = 4094
> abs(tail - head) = 2
> 
> head = 2 tail = 4294967294:
> tail - head = 4294967292
> (tail - head) & mask = 4092
> abs(tail - head) = 4
> 
> head = 3 tail = 4294967293:
> tail - head = 4294967290
> (tail - head) & mask = 4090
> abs(tail - head) = 6
> ```
> Since you're allowing head to run free over the 32-bit range of the 
> variable, when the 32-bits rolls over you'll get a large positive number, 
> not the small one you need to stay within the ring bounds. The 
> alternative is to mask `head` and `tail` as you increment them, but then 
> you run into the effective range issue.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> OK


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 That functionality is not obvious from the name. It either implies 
 that one of the input arguments is written (not true here) or the 
 reader might assume that it is an expression without side-effect and 
 should be deleted (what I originally thought when reading it). You 
 should pick a routine name that makes it clear it's actually doing 
 something real, in this case performing prefetch processing.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>> Petri Savolainen(psavol) wrote:
>> Didn't try larger than 32. 32 is already quite large from QoS point 
>> of view. 
>> 
>> I'm planning to use config file for run time tunning, so this hard 
>> coding may change in that phase.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 One entry is not lost. User provided size if not (currently) used. 
 Queue size is always 4k.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> OK, added checks in v2.


 Petri Savolainen(psavol) wrote:
 OK. Compiler probably did that already, but changed 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 95
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
capa->max_sched_groups  = sched_fn->num_grps();
capa->sched_prios   = odp_schedule_num_prio();
capa->plain.max_num = capa->max_queues;
+   capa->plain.max_size= CONFIG_QUEUE_SIZE;
capa->plain.nonblocking = ODP_BLOCKING;
capa->sched.max_num = capa->max_queues;
+   capa->sched.max_size= CONFIG_QUEUE_SIZE;


Comment:
Agreed.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Correct, as noted earlier. I withdraw that comment.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Presumably the compiler will see the overlap and optimize away the 
>> redundancy, so I assume the performance impact will be nil here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're allowing `head` and `tail` to run over the entire 32-bit 
>>> range, you're correct that you can completely fill the ring. I did miss 
>>> that point. However, as shown above you still need this to be:
>>> 
>>> ```
>>> num = size - abs(tail - head);
>>> ```
>>> To avoid problems at the 32-bit wrap boundary.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 You're computing in 32 bits not 8 bits, and your ring size is less than 
 2^32 elements. Consider the following test program:
 ```
 #include 
 #include 
 #include 
 #include 
 
 void main() {
uint32_t head[4] = {0, 1, 2, 3};
uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
uint32_t mask = 4095;
uint32_t result;
int i;
 
for (i = 0; i < 4; i++) {
printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
   head[i], tail[i]);
 
result = tail[i] - head[i];
printf("tail - head = %" PRIu32 "\n", result);
 
result = (tail[i] - head[i]) & mask;
printf("(tail - head) & mask = %" PRIu32 "\n", result);
 
result = abs(tail[i] - head[i]);
printf("abs(tail - head) = %" PRIu32 "\n\n", result);
}
 }
 ```
 in theory `tail - head` should be the number of elements in the ring, in 
 this case 0, 2, 4, and 6. But running this test program gives the 
 following output:
 ```
 head = 0 tail = 0:
 tail - head = 0
 (tail - head) & mask = 0
 abs(tail - head) = 0
 
 head = 1 tail = 4294967295:
 tail - head = 4294967294
 (tail - head) & mask = 4094
 abs(tail - head) = 2
 
 head = 2 tail = 4294967294:
 tail - head = 4294967292
 (tail - head) & mask = 4092
 abs(tail - head) = 4
 
 head = 3 tail = 4294967293:
 tail - head = 4294967290
 (tail - head) & mask = 4090
 abs(tail - head) = 6
 ```
 Since you're allowing head to run free over the 32-bit range of the 
 variable, when the 32-bits rolls over you'll get a large positive number, 
 not the small one you need to stay within the ring bounds. The alternative 
 is to mask `head` and `tail` as you increment them, but then you run into 
 the effective range issue.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> OK


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> That functionality is not obvious from the name. It either implies that 
>>> one of the input arguments is written (not true here) or the reader 
>>> might assume that it is an expression without side-effect and should be 
>>> deleted (what I originally thought when reading it). You should pick a 
>>> routine name that makes it clear it's actually doing something real, in 
>>> this case performing prefetch processing.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
 (ring_st_is_empty(...) == 1)` so this is inconsistent. 


> Petri Savolainen(psavol) wrote:
> Didn't try larger than 32. 32 is already quite large from QoS point 
> of view. 
> 
> I'm planning to use config file for run time tunning, so this hard 
> coding may change in that phase.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost. User provided size if not (currently) used. 
>>> Queue size is always 4k.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> OK, added checks in v2.


>>> Petri Savolainen(psavol) wrote:
>>> OK. 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 92
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
capa->max_sched_groups  = sched_fn->num_grps();
capa->sched_prios   = odp_schedule_num_prio();
capa->plain.max_num = capa->max_queues;
+   capa->plain.max_size= CONFIG_QUEUE_SIZE;


Comment:
Correct, as noted earlier. I withdraw that comment.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Presumably the compiler will see the overlap and optimize away the 
> redundancy, so I assume the performance impact will be nil here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're allowing `head` and `tail` to run over the entire 32-bit range, 
>> you're correct that you can completely fill the ring. I did miss that point. 
>> However, as shown above you still need this to be:
>> 
>> ```
>> num = size - abs(tail - head);
>> ```
>> To avoid problems at the 32-bit wrap boundary.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> You're computing in 32 bits not 8 bits, and your ring size is less than 
>>> 2^32 elements. Consider the following test program:
>>> ```
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> 
>>> void main() {
>>> uint32_t head[4] = {0, 1, 2, 3};
>>> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>>> uint32_t mask = 4095;
>>> uint32_t result;
>>> int i;
>>> 
>>> for (i = 0; i < 4; i++) {
>>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>head[i], tail[i]);
>>> 
>>> result = tail[i] - head[i];
>>> printf("tail - head = %" PRIu32 "\n", result);
>>> 
>>> result = (tail[i] - head[i]) & mask;
>>> printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>> 
>>> result = abs(tail[i] - head[i]);
>>> printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>> }
>>> }
>>> ```
>>> in theory `tail - head` should be the number of elements in the ring, in 
>>> this case 0, 2, 4, and 6. But running this test program gives the following 
>>> output:
>>> ```
>>> head = 0 tail = 0:
>>> tail - head = 0
>>> (tail - head) & mask = 0
>>> abs(tail - head) = 0
>>> 
>>> head = 1 tail = 4294967295:
>>> tail - head = 4294967294
>>> (tail - head) & mask = 4094
>>> abs(tail - head) = 2
>>> 
>>> head = 2 tail = 4294967294:
>>> tail - head = 4294967292
>>> (tail - head) & mask = 4092
>>> abs(tail - head) = 4
>>> 
>>> head = 3 tail = 4294967293:
>>> tail - head = 4294967290
>>> (tail - head) & mask = 4090
>>> abs(tail - head) = 6
>>> ```
>>> Since you're allowing head to run free over the 32-bit range of the 
>>> variable, when the 32-bits rolls over you'll get a large positive number, 
>>> not the small one you need to stay within the ring bounds. The alternative 
>>> is to mask `head` and `tail` as you increment them, but then you run into 
>>> the effective range issue.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 OK


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> OK, the `ODP_ASSERT()` would still be useful for debugging.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> That functionality is not obvious from the name. It either implies that 
>> one of the input arguments is written (not true here) or the reader 
>> might assume that it is an expression without side-effect and should be 
>> deleted (what I originally thought when reading it). You should pick a 
>> routine name that makes it clear it's actually doing something real, in 
>> this case performing prefetch processing.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


 Petri Savolainen(psavol) wrote:
 Didn't try larger than 32. 32 is already quite large from QoS point of 
 view. 
 
 I'm planning to use config file for run time tunning, so this hard 
 coding may change in that phase.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost. User provided size if not (currently) used. 
>> Queue size is always 4k.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> OK, added checks in v2.


>> Petri Savolainen(psavol) wrote:
>> OK. Compiler probably did that already, but changed in v2.


>>> Petri Savolainen(psavol) wrote:
>>> Tail and head indexes are (masked from) uint32_t and do not 
>>> wrap around when the ring is full. I think you assume that the 
>>> store index is 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_pool.c
line 28
@@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool)
memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr);
 
/* Initialize buffer metadata */
-   buf_hdr->index = i;
+   buf_hdr->index.u32= 0;
+   buf_hdr->index.pool   = pool->pool_idx;
+   buf_hdr->index.buffer = i;


Comment:
Presumably the compiler will see the overlap and optimize away the redundancy, 
so I assume the performance impact will be nil here.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Since you're allowing `head` and `tail` to run over the entire 32-bit range, 
> you're correct that you can completely fill the ring. I did miss that point. 
> However, as shown above you still need this to be:
> 
> ```
> num = size - abs(tail - head);
> ```
> To avoid problems at the 32-bit wrap boundary.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 
>> elements. Consider the following test program:
>> ```
>> #include 
>> #include 
>> #include 
>> #include 
>> 
>> void main() {
>>  uint32_t head[4] = {0, 1, 2, 3};
>>  uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>>  uint32_t mask = 4095;
>>  uint32_t result;
>>  int i;
>> 
>>  for (i = 0; i < 4; i++) {
>>  printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>> head[i], tail[i]);
>> 
>>  result = tail[i] - head[i];
>>  printf("tail - head = %" PRIu32 "\n", result);
>> 
>>  result = (tail[i] - head[i]) & mask;
>>  printf("(tail - head) & mask = %" PRIu32 "\n", result);
>> 
>>  result = abs(tail[i] - head[i]);
>>  printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>  }
>> }
>> ```
>> in theory `tail - head` should be the number of elements in the ring, in 
>> this case 0, 2, 4, and 6. But running this test program gives the following 
>> output:
>> ```
>> head = 0 tail = 0:
>> tail - head = 0
>> (tail - head) & mask = 0
>> abs(tail - head) = 0
>> 
>> head = 1 tail = 4294967295:
>> tail - head = 4294967294
>> (tail - head) & mask = 4094
>> abs(tail - head) = 2
>> 
>> head = 2 tail = 4294967294:
>> tail - head = 4294967292
>> (tail - head) & mask = 4092
>> abs(tail - head) = 4
>> 
>> head = 3 tail = 4294967293:
>> tail - head = 4294967290
>> (tail - head) & mask = 4090
>> abs(tail - head) = 6
>> ```
>> Since you're allowing head to run free over the 32-bit range of the 
>> variable, when the 32-bits rolls over you'll get a large positive number, 
>> not the small one you need to stay within the ring bounds. The alternative 
>> is to mask `head` and `tail` as you increment them, but then you run into 
>> the effective range issue.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> OK


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 OK, the `ODP_ASSERT()` would still be useful for debugging.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> That functionality is not obvious from the name. It either implies that 
> one of the input arguments is written (not true here) or the reader might 
> assume that it is an expression without side-effect and should be deleted 
> (what I originally thought when reading it). You should pick a routine 
> name that makes it clear it's actually doing something real, in this case 
> performing prefetch processing.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>> Petri Savolainen(psavol) wrote:
>>> Didn't try larger than 32. 32 is already quite large from QoS point of 
>>> view. 
>>> 
>>> I'm planning to use config file for run time tunning, so this hard 
>>> coding may change in that phase.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> One entry is not lost. User provided size if not (currently) used. 
> Queue size is always 4k.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 OK, added checks in v2.


> Petri Savolainen(psavol) wrote:
> OK. Compiler probably did that already, but changed in v2.


>> Petri Savolainen(psavol) wrote:
>> Tail and head indexes are (masked from) uint32_t and do not wrap 
>> around when the ring is full. I think you assume that the store 
>> index is 0...size-1, while it's full uint32_t which is then 
>> masked to get the actual index.
>> 
>> For example:

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 420
@@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char 
*name,
queue->s.pktin = PKTIN_INVALID;
queue->s.pktout = PKTOUT_INVALID;
 
-   queue->s.head = NULL;
-   queue->s.tail = NULL;
+   ring_st_init(>s.ring_st,
+queue_tbl->ring_data[queue->s.index].data,
+CONFIG_QUEUE_SIZE);


Comment:
Agreed.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Agreed, this is good for now. Later we may wish to honor the user-requested 
> queue `size` parameter.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Agreed.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Correct, as noted earlier. I withdraw that comment.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Presumably the compiler will see the overlap and optimize away the 
 redundancy, so I assume the performance impact will be nil here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Since you're allowing `head` and `tail` to run over the entire 32-bit 
> range, you're correct that you can completely fill the ring. I did miss 
> that point. However, as shown above you still need this to be:
> 
> ```
> num = size - abs(tail - head);
> ```
> To avoid problems at the 32-bit wrap boundary.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> You're computing in 32 bits not 8 bits, and your ring size is less than 
>> 2^32 elements. Consider the following test program:
>> ```
>> #include 
>> #include 
>> #include 
>> #include 
>> 
>> void main() {
>>  uint32_t head[4] = {0, 1, 2, 3};
>>  uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>>  uint32_t mask = 4095;
>>  uint32_t result;
>>  int i;
>> 
>>  for (i = 0; i < 4; i++) {
>>  printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>> head[i], tail[i]);
>> 
>>  result = tail[i] - head[i];
>>  printf("tail - head = %" PRIu32 "\n", result);
>> 
>>  result = (tail[i] - head[i]) & mask;
>>  printf("(tail - head) & mask = %" PRIu32 "\n", result);
>> 
>>  result = abs(tail[i] - head[i]);
>>  printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>  }
>> }
>> ```
>> in theory `tail - head` should be the number of elements in the ring, in 
>> this case 0, 2, 4, and 6. But running this test program gives the 
>> following output:
>> ```
>> head = 0 tail = 0:
>> tail - head = 0
>> (tail - head) & mask = 0
>> abs(tail - head) = 0
>> 
>> head = 1 tail = 4294967295:
>> tail - head = 4294967294
>> (tail - head) & mask = 4094
>> abs(tail - head) = 2
>> 
>> head = 2 tail = 4294967294:
>> tail - head = 4294967292
>> (tail - head) & mask = 4092
>> abs(tail - head) = 4
>> 
>> head = 3 tail = 4294967293:
>> tail - head = 4294967290
>> (tail - head) & mask = 4090
>> abs(tail - head) = 6
>> ```
>> Since you're allowing head to run free over the 32-bit range of the 
>> variable, when the 32-bits rolls over you'll get a large positive 
>> number, not the small one you need to stay within the ring bounds. The 
>> alternative is to mask `head` and `tail` as you increment them, but then 
>> you run into the effective range issue.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> OK


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 OK, the `ODP_ASSERT()` would still be useful for debugging.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> That functionality is not obvious from the name. It either implies 
> that one of the input arguments is written (not true here) or the 
> reader might assume that it is an expression without side-effect and 
> should be deleted (what I originally thought when reading it). You 
> should pick a routine name that makes it clear it's actually doing 
> something real, in this case performing prefetch processing.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>> Petri Savolainen(psavol) wrote:
>>> Didn't try larger than 32. 32 is already quite large from QoS point 
>>> of view. 
>>> 
>>> I'm planning to use config file for run time tunning, so this hard 
>>> coding may change in that phase.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> One entry is not lost. User provided size if not (currently) 
> used. Queue size is always 4k.


>> Petri 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 78
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);


Comment:
Since you're allowing `head` and `tail` to run over the entire 32-bit range, 
you're correct that you can completely fill the ring. I did miss that point. 
However, as shown above you still need this to be:

```
num = size - abs(tail - head);
```
To avoid problems at the 32-bit wrap boundary.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 
> elements. Consider the following test program:
> ```
> #include 
> #include 
> #include 
> #include 
> 
> void main() {
>   uint32_t head[4] = {0, 1, 2, 3};
>   uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
>   uint32_t mask = 4095;
>   uint32_t result;
>   int i;
> 
>   for (i = 0; i < 4; i++) {
>   printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>  head[i], tail[i]);
> 
>   result = tail[i] - head[i];
>   printf("tail - head = %" PRIu32 "\n", result);
> 
>   result = (tail[i] - head[i]) & mask;
>   printf("(tail - head) & mask = %" PRIu32 "\n", result);
> 
>   result = abs(tail[i] - head[i]);
>   printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>   }
> }
> ```
> in theory `tail - head` should be the number of elements in the ring, in this 
> case 0, 2, 4, and 6. But running this test program gives the following output:
> ```
> head = 0 tail = 0:
> tail - head = 0
> (tail - head) & mask = 0
> abs(tail - head) = 0
> 
> head = 1 tail = 4294967295:
> tail - head = 4294967294
> (tail - head) & mask = 4094
> abs(tail - head) = 2
> 
> head = 2 tail = 4294967294:
> tail - head = 4294967292
> (tail - head) & mask = 4092
> abs(tail - head) = 4
> 
> head = 3 tail = 4294967293:
> tail - head = 4294967290
> (tail - head) & mask = 4090
> abs(tail - head) = 6
> ```
> Since you're allowing head to run free over the 32-bit range of the variable, 
> when the 32-bits rolls over you'll get a large positive number, not the small 
> one you need to stay within the ring bounds. The alternative is to mask 
> `head` and `tail` as you increment them, but then you run into the effective 
> range issue.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> OK


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 That functionality is not obvious from the name. It either implies that 
 one of the input arguments is written (not true here) or the reader might 
 assume that it is an expression without side-effect and should be deleted 
 (what I originally thought when reading it). You should pick a routine 
 name that makes it clear it's actually doing something real, in this case 
 performing prefetch processing.


> Bill 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;


Comment:
You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 
elements. Consider the following test program:
```
#include 
#include 
#include 
#include 

void main() {
uint32_t head[4] = {0, 1, 2, 3};
uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd};
uint32_t mask = 4095;
uint32_t result;
int i;

for (i = 0; i < 4; i++) {
printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
   head[i], tail[i]);

result = tail[i] - head[i];
printf("tail - head = %" PRIu32 "\n", result);

result = (tail[i] - head[i]) & mask;
printf("(tail - head) & mask = %" PRIu32 "\n", result);

result = abs(tail[i] - head[i]);
printf("abs(tail - head) = %" PRIu32 "\n\n", result);
}
}
```
in theory `tail - head` should be the number of elements in the ring, in this 
case 0, 2, 4, and 6. But running this test program gives the following output:
```
head = 0 tail = 0:
tail - head = 0
(tail - head) & mask = 0
abs(tail - head) = 0

head = 1 tail = 4294967295:
tail - head = 4294967294
(tail - head) & mask = 4094
abs(tail - head) = 2

head = 2 tail = 4294967294:
tail - head = 4294967292
(tail - head) & mask = 4092
abs(tail - head) = 4

head = 3 tail = 4294967293:
tail - head = 4294967290
(tail - head) & mask = 4090
abs(tail - head) = 6
```
Since you're allowing head to run free over the 32-bit range of the variable, 
when the 32-bits rolls over you'll get a large positive number, not the small 
one you need to stay within the ring bounds. The alternative is to mask `head` 
and `tail` as you increment them, but then you run into the effective range 
issue.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> OK


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> That functionality is not obvious from the name. It either implies that one 
>>> of the input arguments is written (not true here) or the reader might 
>>> assume that it is an expression without side-effect and should be deleted 
>>> (what I originally thought when reading it). You should pick a routine name 
>>> that makes it clear it's actually doing something real, in this case 
>>> performing prefetch processing.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
 (ring_st_is_empty(...) == 1)` so this is inconsistent. 


> Petri Savolainen(psavol) wrote:
> Didn't try larger than 32. 32 is already quite large from QoS point of 
> view. 
> 
> I'm planning to use config file for run time tunning, so this hard coding 
> may change in that phase.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost. User provided size if not (currently) used. 
>>> Queue size is always 4k.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> OK, added checks in v2.


>>> Petri Savolainen(psavol) wrote:
>>> OK. Compiler probably did that already, but changed in v2.


 Petri Savolainen(psavol) wrote:
 Tail and head indexes are (masked from) uint32_t and do not wrap 
 around when the ring is full. I think you assume that the store 
 index is 0...size-1, while it's full uint32_t 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 24
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+


Comment:
OK

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> OK, the `ODP_ASSERT()` would still be useful for debugging.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> That functionality is not obvious from the name. It either implies that one 
>> of the input arguments is written (not true here) or the reader might assume 
>> that it is an expression without side-effect and should be deleted (what I 
>> originally thought when reading it). You should pick a routine name that 
>> makes it clear it's actually doing something real, in this case performing 
>> prefetch processing.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


 Petri Savolainen(psavol) wrote:
 Didn't try larger than 32. 32 is already quite large from QoS point of 
 view. 
 
 I'm planning to use config file for run time tunning, so this hard coding 
 may change in that phase.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost. User provided size if not (currently) used. Queue 
>> size is always 4k.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> OK, added checks in v2.


>> Petri Savolainen(psavol) wrote:
>> OK. Compiler probably did that already, but changed in v2.


>>> Petri Savolainen(psavol) wrote:
>>> Tail and head indexes are (masked from) uint32_t and do not wrap 
>>> around when the ring is full. I think you assume that the store 
>>> index is 0...size-1, while it's full uint32_t which is then masked 
>>> to get the actual index.
>>> 
>>> For example:
>>> size = 100;
>>> 
>>> Empty:
>>> head = 100
>>> tail = 100
>>> num = 100 - 100 = 0
>>> 
>>> Full:
>>> head = 100
>>> tail = 200
>>> num = 200 - 100 = 100
>>> 
>>> Wrap uint32_t + full:
>>> head = 0xFF9C
>>> tail = 0
>>> num = 0 - 0xFF9C = 0x64 = 100
>>> 
>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


 Petri Savolainen(psavol) wrote:
 It's already documented 5 lines above:
 
 /* Initialize ring. Ring size must be a power of two. */
 static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
 uint32_t size)
 {


> Petri Savolainen(psavol) wrote:
> This function converts 32 bit buffer indexes to buffer header 
> pointers. The counter operation is buffer_index_from_buf(). The 
> prefetch is a side effect of the function, which may be 
> changed/moved any time if it's found out that there's a place for 
> prefetching. I actually plan to test if number of prefetches 
> should be limited as e.g. 32 consecutive prefetches may be too 
> much for some CPU architectures.


>> Petri Savolainen(psavol) wrote:
>> I prefer style where '== 0' is used instead of '!'. Especially, 
>> when the if clause is as complex as this and there's danger for 
>> reader to miss the '!' sign. 


>>> Petri Savolainen(psavol) wrote:
>>> It's there to ensure that all bits are zero also when someone 
>>> would modify the bitfield from two to three fields later on. 
>>> Similarly to memset() zero is used for struct inits.


 Petri Savolainen(psavol) wrote:
 There's no need for abs(). Since it's all uint32_t variables, 
 wrap a round is handled already.
 An example in 8bits:
 0xff - 0xfd = 0x02
 0x00 - 0xfe = 0x02
 0x01 - 0xff = 0x02
 0x02 - 0x00 = 0x02
 
 This passes both gcc and clang, and is used already in the 
 other ring implementation see ring_deq_multi().



Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 32
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;


Comment:
OK, the `ODP_ASSERT()` would still be useful for debugging.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> That functionality is not obvious from the name. It either implies that one 
> of the input arguments is written (not true here) or the reader might assume 
> that it is an expression without side-effect and should be deleted (what I 
> originally thought when reading it). You should pick a routine name that 
> makes it clear it's actually doing something real, in this case performing 
> prefetch processing.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>> Petri Savolainen(psavol) wrote:
>>> Didn't try larger than 32. 32 is already quite large from QoS point of 
>>> view. 
>>> 
>>> I'm planning to use config file for run time tunning, so this hard coding 
>>> may change in that phase.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> One entry is not lost. User provided size if not (currently) used. Queue 
> size is always 4k.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 OK, added checks in v2.


> Petri Savolainen(psavol) wrote:
> OK. Compiler probably did that already, but changed in v2.


>> Petri Savolainen(psavol) wrote:
>> Tail and head indexes are (masked from) uint32_t and do not wrap 
>> around when the ring is full. I think you assume that the store 
>> index is 0...size-1, while it's full uint32_t which is then masked 
>> to get the actual index.
>> 
>> For example:
>> size = 100;
>> 
>> Empty:
>> head = 100
>> tail = 100
>> num = 100 - 100 = 0
>> 
>> Full:
>> head = 100
>> tail = 200
>> num = 200 - 100 = 100
>> 
>> Wrap uint32_t + full:
>> head = 0xFF9C
>> tail = 0
>> num = 0 - 0xFF9C = 0x64 = 100
>> 
>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>> Petri Savolainen(psavol) wrote:
>>> It's already documented 5 lines above:
>>> 
>>> /* Initialize ring. Ring size must be a power of two. */
>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>>> uint32_t size)
>>> {


 Petri Savolainen(psavol) wrote:
 This function converts 32 bit buffer indexes to buffer header 
 pointers. The counter operation is buffer_index_from_buf(). The 
 prefetch is a side effect of the function, which may be 
 changed/moved any time if it's found out that there's a place for 
 prefetching. I actually plan to test if number of prefetches 
 should be limited as e.g. 32 consecutive prefetches may be too 
 much for some CPU architectures.


> Petri Savolainen(psavol) wrote:
> I prefer style where '== 0' is used instead of '!'. Especially, 
> when the if clause is as complex as this and there's danger for 
> reader to miss the '!' sign. 


>> Petri Savolainen(psavol) wrote:
>> It's there to ensure that all bits are zero also when someone 
>> would modify the bitfield from two to three fields later on. 
>> Similarly to memset() zero is used for struct inits.


>>> Petri Savolainen(psavol) wrote:
>>> There's no need for abs(). Since it's all uint32_t variables, 
>>> wrap a round is handled already.
>>> An example in 8bits:
>>> 0xff - 0xfd = 0x02
>>> 0x00 - 0xfe = 0x02
>>> 0x01 - 0xff = 0x02
>>> 0x02 - 0x00 = 0x02
>>> 
>>> This passes both gcc and clang, and is 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, 
odp_buffer_hdr_t *buf_hdr[],
}
 
UNLOCK(>s.lock);
-   return 0;
-   }
-
-   for (i = 0; i < num && hdr; ) {
-   int burst_num = hdr->burst_num;
-   int first = hdr->burst_first;
 
-   /* First, get bursted buffers */
-   for (j = 0; j < burst_num && i < num; j++, i++) {
-   buf_hdr[i] = hdr->burst[first + j];
-   odp_prefetch(buf_hdr[i]);
-   }
-
-   if (burst_num) {
-   hdr->burst_num   = burst_num - j;
-   hdr->burst_first = first + j;
-   }
-
-   if (i == num)
-   break;
-
-   /* When burst is empty, consume the current buffer header and
-* move to the next header */
-   buf_hdr[i] = hdr;
-   next   = hdr->next;
-   hdr->next  = NULL;
-   hdr= next;
-   updated++;
-   i++;
+   return 0;
}
 
-   /* Write head only if updated */
-   if (updated)
-   queue->s.head = hdr;
-
-   /* Queue is empty */
-   if (hdr == NULL)
-   queue->s.tail = NULL;
-
if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED)
sched_fn->save_context(queue->s.index);
 
UNLOCK(>s.lock);
 
-   return i;
+   buffer_index_to_buf(buf_hdr, buf_idx, num_deq);


Comment:
That functionality is not obvious from the name. It either implies that one of 
the input arguments is written (not true here) or the reader might assume that 
it is an expression without side-effect and should be deleted (what I 
originally thought when reading it). You should pick a routine name that makes 
it clear it's actually doing something real, in this case performing prefetch 
processing.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>> Petri Savolainen(psavol) wrote:
>> Didn't try larger than 32. 32 is already quite large from QoS point of view. 
>> 
>> I'm planning to use config file for run time tunning, so this hard coding 
>> may change in that phase.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 One entry is not lost. User provided size if not (currently) used. Queue 
 size is always 4k.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> OK, added checks in v2.


 Petri Savolainen(psavol) wrote:
 OK. Compiler probably did that already, but changed in v2.


> Petri Savolainen(psavol) wrote:
> Tail and head indexes are (masked from) uint32_t and do not wrap 
> around when the ring is full. I think you assume that the store index 
> is 0...size-1, while it's full uint32_t which is then masked to get 
> the actual index.
> 
> For example:
> size = 100;
> 
> Empty:
> head = 100
> tail = 100
> num = 100 - 100 = 0
> 
> Full:
> head = 100
> tail = 200
> num = 200 - 100 = 100
> 
> Wrap uint32_t + full:
> head = 0xFF9C
> tail = 0
> num = 0 - 0xFF9C = 0x64 = 100
> 
> So, no abs() needed. Ring size can be 4096, instead of 4095.


>> Petri Savolainen(psavol) wrote:
>> It's already documented 5 lines above:
>> 
>> /* Initialize ring. Ring size must be a power of two. */
>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>> uint32_t size)
>> {


>>> Petri Savolainen(psavol) wrote:
>>> This function converts 32 bit buffer indexes to buffer header 
>>> pointers. The counter operation is buffer_index_from_buf(). The 
>>> prefetch is a side effect of the function, which may be 
>>> changed/moved any time if it's found out that there's a place for 
>>> prefetching. I actually plan to test if number of prefetches should 
>>> be limited as e.g. 32 consecutive prefetches may be too much for 
>>> some CPU architectures.


 Petri Savolainen(psavol) wrote:
 I prefer style where '== 0' is used instead of '!'. Especially, 
 when the if clause is as complex as this and there's danger for 
 reader to miss the '!' sign. 


> Petri Savolainen(psavol) wrote:
> It's there to ensure that all bits are zero also when 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle)
ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name);
return -1;
}
-   if (queue->s.head != NULL) {
+   if (ring_st_is_empty(>s.ring_st) == 0) {


Comment:
Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
(ring_st_is_empty(...) == 1)` so this is inconsistent. 

> Petri Savolainen(psavol) wrote:
> Didn't try larger than 32. 32 is already quite large from QoS point of view. 
> 
> I'm planning to use config file for run time tunning, so this hard coding may 
> change in that phase.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost. User provided size if not (currently) used. Queue 
>>> size is always 4k.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> OK, added checks in v2.


>>> Petri Savolainen(psavol) wrote:
>>> OK. Compiler probably did that already, but changed in v2.


 Petri Savolainen(psavol) wrote:
 Tail and head indexes are (masked from) uint32_t and do not wrap 
 around when the ring is full. I think you assume that the store index 
 is 0...size-1, while it's full uint32_t which is then masked to get 
 the actual index.
 
 For example:
 size = 100;
 
 Empty:
 head = 100
 tail = 100
 num = 100 - 100 = 0
 
 Full:
 head = 100
 tail = 200
 num = 200 - 100 = 100
 
 Wrap uint32_t + full:
 head = 0xFF9C
 tail = 0
 num = 0 - 0xFF9C = 0x64 = 100
 
 So, no abs() needed. Ring size can be 4096, instead of 4095.


> Petri Savolainen(psavol) wrote:
> It's already documented 5 lines above:
> 
> /* Initialize ring. Ring size must be a power of two. */
> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
> uint32_t size)
> {


>> Petri Savolainen(psavol) wrote:
>> This function converts 32 bit buffer indexes to buffer header 
>> pointers. The counter operation is buffer_index_from_buf(). The 
>> prefetch is a side effect of the function, which may be 
>> changed/moved any time if it's found out that there's a place for 
>> prefetching. I actually plan to test if number of prefetches should 
>> be limited as e.g. 32 consecutive prefetches may be too much for 
>> some CPU architectures.


>>> Petri Savolainen(psavol) wrote:
>>> I prefer style where '== 0' is used instead of '!'. Especially, 
>>> when the if clause is as complex as this and there's danger for 
>>> reader to miss the '!' sign. 


 Petri Savolainen(psavol) wrote:
 It's there to ensure that all bits are zero also when someone 
 would modify the bitfield from two to three fields later on. 
 Similarly to memset() zero is used for struct inits.


> Petri Savolainen(psavol) wrote:
> There's no need for abs(). Since it's all uint32_t variables, 
> wrap a round is handled already.
> An example in 8bits:
> 0xff - 0xfd = 0x02
> 0x00 - 0xfe = 0x02
> 0x01 - 0xff = 0x02
> 0x02 - 0x00 = 0x02
> 
> This passes both gcc and clang, and is used already in the other 
> ring implementation see ring_deq_multi().


>> Petri Savolainen(psavol) wrote:
>> I prefer style with blank line in the end of a typedef, since 
>> it's easier to spot the type name (as it's not mixed into struct 
>> field names). Checkpatch passes so this style should be OK.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Does this mean that sizes larger than 32 have no added 
>>> performance benefit?


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if 
 we're not going to use the user-supplied queue size.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Given its name, this looks like an extraneous statement that 
> should be deleted. Renaming this to something like 
> `prefetch_dequeued_bufs()` would make the intent clearer here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> `if (!ring_st_is_empty(>s.ring_st))` seems more 
>> natural here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
line 420
@@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char 
*name,
queue->s.pktin = PKTIN_INVALID;
queue->s.pktout = PKTOUT_INVALID;
 
-   queue->s.head = NULL;
-   queue->s.tail = NULL;
+   ring_st_init(>s.ring_st,
+queue_tbl->ring_data[queue->s.index].data,
+CONFIG_QUEUE_SIZE);


Comment:
One entry is not lost.

> Petri Savolainen(psavol) wrote:
> One entry is not lost. User provided size if not (currently) used. Queue size 
> is always 4k.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 OK, added checks in v2.


> Petri Savolainen(psavol) wrote:
> OK. Compiler probably did that already, but changed in v2.


>> Petri Savolainen(psavol) wrote:
>> Tail and head indexes are (masked from) uint32_t and do not wrap around 
>> when the ring is full. I think you assume that the store index is 
>> 0...size-1, while it's full uint32_t which is then masked to get the 
>> actual index.
>> 
>> For example:
>> size = 100;
>> 
>> Empty:
>> head = 100
>> tail = 100
>> num = 100 - 100 = 0
>> 
>> Full:
>> head = 100
>> tail = 200
>> num = 200 - 100 = 100
>> 
>> Wrap uint32_t + full:
>> head = 0xFF9C
>> tail = 0
>> num = 0 - 0xFF9C = 0x64 = 100
>> 
>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>> Petri Savolainen(psavol) wrote:
>>> It's already documented 5 lines above:
>>> 
>>> /* Initialize ring. Ring size must be a power of two. */
>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>>> uint32_t size)
>>> {


 Petri Savolainen(psavol) wrote:
 This function converts 32 bit buffer indexes to buffer header 
 pointers. The counter operation is buffer_index_from_buf(). The 
 prefetch is a side effect of the function, which may be changed/moved 
 any time if it's found out that there's a place for prefetching. I 
 actually plan to test if number of prefetches should be limited as 
 e.g. 32 consecutive prefetches may be too much for some CPU 
 architectures.


> Petri Savolainen(psavol) wrote:
> I prefer style where '== 0' is used instead of '!'. Especially, when 
> the if clause is as complex as this and there's danger for reader to 
> miss the '!' sign. 


>> Petri Savolainen(psavol) wrote:
>> It's there to ensure that all bits are zero also when someone would 
>> modify the bitfield from two to three fields later on. Similarly to 
>> memset() zero is used for struct inits.


>>> Petri Savolainen(psavol) wrote:
>>> There's no need for abs(). Since it's all uint32_t variables, wrap 
>>> a round is handled already.
>>> An example in 8bits:
>>> 0xff - 0xfd = 0x02
>>> 0x00 - 0xfe = 0x02
>>> 0x01 - 0xff = 0x02
>>> 0x02 - 0x00 = 0x02
>>> 
>>> This passes both gcc and clang, and is used already in the other 
>>> ring implementation see ring_deq_multi().


 Petri Savolainen(psavol) wrote:
 I prefer style with blank line in the end of a typedef, since it's 
 easier to spot the type name (as it's not mixed into struct field 
 names). Checkpatch passes so this style should be OK.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Does this mean that sizes larger than 32 have no added 
> performance benefit?


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if 
>> we're not going to use the user-supplied queue size.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Given its name, this looks like an extraneous statement that 
>>> should be deleted. Renaming this to something like 
>>> `prefetch_dequeued_bufs()` would make the intent clearer here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 `if (!ring_st_is_empty(>s.ring_st))` seems more natural 
 here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle 
> the effective queue capacity. The user-supplied `size` should 
> then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking 
> to work properly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Same comment here as for plain queues.


>>> Bill 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_config_internal.h
line 5
@@ -144,7 +144,7 @@ extern "C" {
  * This controls the burst size on various enqueue, dequeue, etc calls. Large
  * burst size improves throughput, but may degrade QoS (increase latency).
  */
-#define CONFIG_BURST_SIZE 16
+#define CONFIG_BURST_SIZE 32


Comment:
Didn't try larger than 32. 32 is already quite large from QoS point of view. 

I'm planning to use config file for run time tunning, so this hard coding may 
change in that phase.


> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost. User provided size if not (currently) used. Queue 
>> size is always 4k.


>>> Petri Savolainen(psavol) wrote:
>>> One entry is not lost.


 Petri Savolainen(psavol) wrote:
 One entry is not lost.


> Petri Savolainen(psavol) wrote:
> OK, added checks in v2.


>> Petri Savolainen(psavol) wrote:
>> OK. Compiler probably did that already, but changed in v2.


>>> Petri Savolainen(psavol) wrote:
>>> Tail and head indexes are (masked from) uint32_t and do not wrap around 
>>> when the ring is full. I think you assume that the store index is 
>>> 0...size-1, while it's full uint32_t which is then masked to get the 
>>> actual index.
>>> 
>>> For example:
>>> size = 100;
>>> 
>>> Empty:
>>> head = 100
>>> tail = 100
>>> num = 100 - 100 = 0
>>> 
>>> Full:
>>> head = 100
>>> tail = 200
>>> num = 200 - 100 = 100
>>> 
>>> Wrap uint32_t + full:
>>> head = 0xFF9C
>>> tail = 0
>>> num = 0 - 0xFF9C = 0x64 = 100
>>> 
>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


 Petri Savolainen(psavol) wrote:
 It's already documented 5 lines above:
 
 /* Initialize ring. Ring size must be a power of two. */
 static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
 uint32_t size)
 {


> Petri Savolainen(psavol) wrote:
> This function converts 32 bit buffer indexes to buffer header 
> pointers. The counter operation is buffer_index_from_buf(). The 
> prefetch is a side effect of the function, which may be changed/moved 
> any time if it's found out that there's a place for prefetching. I 
> actually plan to test if number of prefetches should be limited as 
> e.g. 32 consecutive prefetches may be too much for some CPU 
> architectures.


>> Petri Savolainen(psavol) wrote:
>> I prefer style where '== 0' is used instead of '!'. Especially, when 
>> the if clause is as complex as this and there's danger for reader to 
>> miss the '!' sign. 


>>> Petri Savolainen(psavol) wrote:
>>> It's there to ensure that all bits are zero also when someone would 
>>> modify the bitfield from two to three fields later on. Similarly to 
>>> memset() zero is used for struct inits.


 Petri Savolainen(psavol) wrote:
 There's no need for abs(). Since it's all uint32_t variables, wrap 
 a round is handled already.
 An example in 8bits:
 0xff - 0xfd = 0x02
 0x00 - 0xfe = 0x02
 0x01 - 0xff = 0x02
 0x02 - 0x00 = 0x02
 
 This passes both gcc and clang, and is used already in the other 
 ring implementation see ring_deq_multi().


> Petri Savolainen(psavol) wrote:
> I prefer style with blank line in the end of a typedef, since 
> it's easier to spot the type name (as it's not mixed into struct 
> field names). Checkpatch passes so this style should be OK.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Does this mean that sizes larger than 32 have no added 
>> performance benefit?


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if 
>>> we're not going to use the user-supplied queue size.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Given its name, this looks like an extraneous statement that 
 should be deleted. Renaming this to something like 
 `prefetch_dequeued_bufs()` would make the intent clearer here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> `if (!ring_st_is_empty(>s.ring_st))` seems more 
> natural here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle 
>> the effective queue capacity. The user-supplied `size` 
>> should then be set to `ROUNDUP_POWER2_U32(size) - 1` for the 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, 
odp_buffer_hdr_t *buf_hdr[],
}
 
UNLOCK(>s.lock);
-   return 0;
-   }
-
-   for (i = 0; i < num && hdr; ) {
-   int burst_num = hdr->burst_num;
-   int first = hdr->burst_first;
 
-   /* First, get bursted buffers */
-   for (j = 0; j < burst_num && i < num; j++, i++) {
-   buf_hdr[i] = hdr->burst[first + j];
-   odp_prefetch(buf_hdr[i]);
-   }
-
-   if (burst_num) {
-   hdr->burst_num   = burst_num - j;
-   hdr->burst_first = first + j;
-   }
-
-   if (i == num)
-   break;
-
-   /* When burst is empty, consume the current buffer header and
-* move to the next header */
-   buf_hdr[i] = hdr;
-   next   = hdr->next;
-   hdr->next  = NULL;
-   hdr= next;
-   updated++;
-   i++;
+   return 0;
}
 
-   /* Write head only if updated */
-   if (updated)
-   queue->s.head = hdr;
-
-   /* Queue is empty */
-   if (hdr == NULL)
-   queue->s.tail = NULL;
-
if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED)
sched_fn->save_context(queue->s.index);
 
UNLOCK(>s.lock);
 
-   return i;
+   buffer_index_to_buf(buf_hdr, buf_idx, num_deq);


Comment:
This function converts 32 bit buffer indexes to buffer header pointers. The 
counter operation is buffer_index_from_buf(). The prefetch is a side effect of 
the function, which may be changed/moved any time if it's found out that 
there's a place for prefetching. I actually plan to test if number of 
prefetches should be limited as e.g. 32 consecutive prefetches may be too much 
for some CPU architectures.

> Petri Savolainen(psavol) wrote:
> I prefer style where '== 0' is used instead of '!'. Especially, when the if 
> clause is as complex as this and there's danger for reader to miss the '!' 
> sign. 


>> Petri Savolainen(psavol) wrote:
>> It's there to ensure that all bits are zero also when someone would modify 
>> the bitfield from two to three fields later on. Similarly to memset() zero 
>> is used for struct inits.


>>> Petri Savolainen(psavol) wrote:
>>> There's no need for abs(). Since it's all uint32_t variables, wrap a round 
>>> is handled already.
>>> An example in 8bits:
>>> 0xff - 0xfd = 0x02
>>> 0x00 - 0xfe = 0x02
>>> 0x01 - 0xff = 0x02
>>> 0x02 - 0x00 = 0x02
>>> 
>>> This passes both gcc and clang, and is used already in the other ring 
>>> implementation see ring_deq_multi().


 Petri Savolainen(psavol) wrote:
 I prefer style with blank line in the end of a typedef, since it's easier 
 to spot the type name (as it's not mixed into struct field names). 
 Checkpatch passes so this style should be OK.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Does this mean that sizes larger than 32 have no added performance 
> benefit?


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not 
>> going to use the user-supplied queue size.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Given its name, this looks like an extraneous statement that should be 
>>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` 
>>> would make the intent clearer here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
> effective queue capacity. The user-supplied `size` should then be set 
> to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Same comment here as for plain queues.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> As noted earlier, due to "losing" one entry to distinguish queue 
>>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and 
>>> we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power 
>>> of 2.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're initializing `index.pool` and `index.buffer` there's 
 no need to set `index.u32` here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> We originally had this index partitioning based on 
> `ODP_CONFIG_POOLS`. Do we want to return to that here?
> 
> If not, we at least need an 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 32
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;


Comment:
This only works if `size` is a power of 2. Should be documented as such, since 
this is an internal routine. In this case an `ODP_ASSERT(size == 
ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging aid.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may 
> be misinterpreted as overly large since it's `uint32_t`. Note that GCC and 
> clang recognize `abs()` and treat it as a builtin, so there's no actual 
> `stdlib.h` call here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169793906
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 95
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
capa->max_sched_groups  = sched_fn->num_grps();
capa->sched_prios   = odp_schedule_num_prio();
capa->plain.max_num = capa->max_queues;
+   capa->plain.max_size= CONFIG_QUEUE_SIZE;
capa->plain.nonblocking = ODP_BLOCKING;
capa->sched.max_num = capa->max_queues;
+   capa->sched.max_size= CONFIG_QUEUE_SIZE;


Comment:
Same comment here as for plain queues.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> As noted earlier, due to "losing" one entry to distinguish queue empty/full, 
> this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to 
> ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're initializing `index.pool` and `index.buffer` there's no need to 
>> set `index.u32` here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do 
>>> we want to return to that here?
>>> 
>>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 This routine can be optimized to:
 ```
 return ring->head == ring->tail;
 ```


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Your invariant is the queue is empty when `head == tail` therefore the 
> queue is full when `abs(tail - head) == mask`, so the correct calculation 
> here is:
> 
> `num = mask - abs(tail - head);`
> 
> The effect is that a queue can only hold `size - 1` elements, otherwise 
> you cannot distinguish between a full and an empty queue without another 
> bit of metadata, which is a cost you're trying to avoid.
> 
> This is somewhat problematic if the caller is trying to be "optimal" by 
> specifying a power of two in the `size` parameter of the 
> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we 
> may wish to return a `max_size` of a power of 2 - 1 in 
> `odp_queue_capability()` as part of this patch series.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This only works if `size` is a power of 2. Should be documented as such, 
>> since this is an internal routine. In this case an `ODP_ASSERT(size == 
>> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful 
>> debugging aid.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> should be `num = abs(tail - head);` to deal with wrap arounds, 
>>> otherwise may be misinterpreted as overly large since it's `uint32_t`. 
>>> Note that GCC and clang recognize `abs()` and treat it as a builtin, so 
>>> there's no actual `stdlib.h` call here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169821967
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
line 103
@@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name,
param = _param;
}
 
+   if (param->size > CONFIG_QUEUE_SIZE)


Comment:
One entry is not lost. User provided size if not (currently) used. Queue size 
is always 4k.

> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> One entry is not lost.


>>> Petri Savolainen(psavol) wrote:
>>> OK, added checks in v2.


 Petri Savolainen(psavol) wrote:
 OK. Compiler probably did that already, but changed in v2.


> Petri Savolainen(psavol) wrote:
> Tail and head indexes are (masked from) uint32_t and do not wrap around 
> when the ring is full. I think you assume that the store index is 
> 0...size-1, while it's full uint32_t which is then masked to get the 
> actual index.
> 
> For example:
> size = 100;
> 
> Empty:
> head = 100
> tail = 100
> num = 100 - 100 = 0
> 
> Full:
> head = 100
> tail = 200
> num = 200 - 100 = 100
> 
> Wrap uint32_t + full:
> head = 0xFF9C
> tail = 0
> num = 0 - 0xFF9C = 0x64 = 100
> 
> So, no abs() needed. Ring size can be 4096, instead of 4095.


>> Petri Savolainen(psavol) wrote:
>> It's already documented 5 lines above:
>> 
>> /* Initialize ring. Ring size must be a power of two. */
>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>> uint32_t size)
>> {


>>> Petri Savolainen(psavol) wrote:
>>> This function converts 32 bit buffer indexes to buffer header pointers. 
>>> The counter operation is buffer_index_from_buf(). The prefetch is a 
>>> side effect of the function, which may be changed/moved any time if 
>>> it's found out that there's a place for prefetching. I actually plan to 
>>> test if number of prefetches should be limited as e.g. 32 consecutive 
>>> prefetches may be too much for some CPU architectures.


 Petri Savolainen(psavol) wrote:
 I prefer style where '== 0' is used instead of '!'. Especially, when 
 the if clause is as complex as this and there's danger for reader to 
 miss the '!' sign. 


> Petri Savolainen(psavol) wrote:
> It's there to ensure that all bits are zero also when someone would 
> modify the bitfield from two to three fields later on. Similarly to 
> memset() zero is used for struct inits.


>> Petri Savolainen(psavol) wrote:
>> There's no need for abs(). Since it's all uint32_t variables, wrap a 
>> round is handled already.
>> An example in 8bits:
>> 0xff - 0xfd = 0x02
>> 0x00 - 0xfe = 0x02
>> 0x01 - 0xff = 0x02
>> 0x02 - 0x00 = 0x02
>> 
>> This passes both gcc and clang, and is used already in the other 
>> ring implementation see ring_deq_multi().


>>> Petri Savolainen(psavol) wrote:
>>> I prefer style with blank line in the end of a typedef, since it's 
>>> easier to spot the type name (as it's not mixed into struct field 
>>> names). Checkpatch passes so this style should be OK.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Does this mean that sizes larger than 32 have no added performance 
 benefit?


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're 
> not going to use the user-supplied queue size.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Given its name, this looks like an extraneous statement that 
>> should be deleted. Renaming this to something like 
>> `prefetch_dequeued_bufs()` would make the intent clearer here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural 
>>> here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle 
 the effective queue capacity. The user-supplied `size` should 
 then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking 
 to work properly.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Same comment here as for plain queues.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> As noted earlier, due to "losing" one entry to distinguish 
>> queue empty/full, this should be returned as 
>> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that 
>> `CONFIG_QUEUE_SIZE` is itself a power of 2.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
line 95
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
capa->max_sched_groups  = sched_fn->num_grps();
capa->sched_prios   = odp_schedule_num_prio();
capa->plain.max_num = capa->max_queues;
+   capa->plain.max_size= CONFIG_QUEUE_SIZE;
capa->plain.nonblocking = ODP_BLOCKING;
capa->sched.max_num = capa->max_queues;
+   capa->sched.max_size= CONFIG_QUEUE_SIZE;


Comment:
One entry is not lost.

> Petri Savolainen(psavol) wrote:
> One entry is not lost.


>> Petri Savolainen(psavol) wrote:
>> OK, added checks in v2.


>>> Petri Savolainen(psavol) wrote:
>>> OK. Compiler probably did that already, but changed in v2.


 Petri Savolainen(psavol) wrote:
 Tail and head indexes are (masked from) uint32_t and do not wrap around 
 when the ring is full. I think you assume that the store index is 
 0...size-1, while it's full uint32_t which is then masked to get the 
 actual index.
 
 For example:
 size = 100;
 
 Empty:
 head = 100
 tail = 100
 num = 100 - 100 = 0
 
 Full:
 head = 100
 tail = 200
 num = 200 - 100 = 100
 
 Wrap uint32_t + full:
 head = 0xFF9C
 tail = 0
 num = 0 - 0xFF9C = 0x64 = 100
 
 So, no abs() needed. Ring size can be 4096, instead of 4095.


> Petri Savolainen(psavol) wrote:
> It's already documented 5 lines above:
> 
> /* Initialize ring. Ring size must be a power of two. */
> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t 
> size)
> {


>> Petri Savolainen(psavol) wrote:
>> This function converts 32 bit buffer indexes to buffer header pointers. 
>> The counter operation is buffer_index_from_buf(). The prefetch is a side 
>> effect of the function, which may be changed/moved any time if it's 
>> found out that there's a place for prefetching. I actually plan to test 
>> if number of prefetches should be limited as e.g. 32 consecutive 
>> prefetches may be too much for some CPU architectures.


>>> Petri Savolainen(psavol) wrote:
>>> I prefer style where '== 0' is used instead of '!'. Especially, when 
>>> the if clause is as complex as this and there's danger for reader to 
>>> miss the '!' sign. 


 Petri Savolainen(psavol) wrote:
 It's there to ensure that all bits are zero also when someone would 
 modify the bitfield from two to three fields later on. Similarly to 
 memset() zero is used for struct inits.


> Petri Savolainen(psavol) wrote:
> There's no need for abs(). Since it's all uint32_t variables, wrap a 
> round is handled already.
> An example in 8bits:
> 0xff - 0xfd = 0x02
> 0x00 - 0xfe = 0x02
> 0x01 - 0xff = 0x02
> 0x02 - 0x00 = 0x02
> 
> This passes both gcc and clang, and is used already in the other ring 
> implementation see ring_deq_multi().


>> Petri Savolainen(psavol) wrote:
>> I prefer style with blank line in the end of a typedef, since it's 
>> easier to spot the type name (as it's not mixed into struct field 
>> names). Checkpatch passes so this style should be OK.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Does this mean that sizes larger than 32 have no added performance 
>>> benefit?


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're 
 not going to use the user-supplied queue size.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Given its name, this looks like an extraneous statement that 
> should be deleted. Renaming this to something like 
> `prefetch_dequeued_bufs()` would make the intent clearer here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural 
>> here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
>>> effective queue capacity. The user-supplied `size` should then 
>>> be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to 
>>> work properly.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Same comment here as for plain queues.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> As noted earlier, due to "losing" one entry to distinguish 
> queue empty/full, this should be returned as 
> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that 
> `CONFIG_QUEUE_SIZE` is itself a power of 2.



Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
line 92
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
capa->max_sched_groups  = sched_fn->num_grps();
capa->sched_prios   = odp_schedule_num_prio();
capa->plain.max_num = capa->max_queues;
+   capa->plain.max_size= CONFIG_QUEUE_SIZE;


Comment:
One entry is not lost.

> Petri Savolainen(psavol) wrote:
> OK, added checks in v2.


>> Petri Savolainen(psavol) wrote:
>> OK. Compiler probably did that already, but changed in v2.


>>> Petri Savolainen(psavol) wrote:
>>> Tail and head indexes are (masked from) uint32_t and do not wrap around 
>>> when the ring is full. I think you assume that the store index is 
>>> 0...size-1, while it's full uint32_t which is then masked to get the actual 
>>> index.
>>> 
>>> For example:
>>> size = 100;
>>> 
>>> Empty:
>>> head = 100
>>> tail = 100
>>> num = 100 - 100 = 0
>>> 
>>> Full:
>>> head = 100
>>> tail = 200
>>> num = 200 - 100 = 100
>>> 
>>> Wrap uint32_t + full:
>>> head = 0xFF9C
>>> tail = 0
>>> num = 0 - 0xFF9C = 0x64 = 100
>>> 
>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


 Petri Savolainen(psavol) wrote:
 It's already documented 5 lines above:
 
 /* Initialize ring. Ring size must be a power of two. */
 static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t 
 size)
 {


> Petri Savolainen(psavol) wrote:
> This function converts 32 bit buffer indexes to buffer header pointers. 
> The counter operation is buffer_index_from_buf(). The prefetch is a side 
> effect of the function, which may be changed/moved any time if it's found 
> out that there's a place for prefetching. I actually plan to test if 
> number of prefetches should be limited as e.g. 32 consecutive prefetches 
> may be too much for some CPU architectures.


>> Petri Savolainen(psavol) wrote:
>> I prefer style where '== 0' is used instead of '!'. Especially, when the 
>> if clause is as complex as this and there's danger for reader to miss 
>> the '!' sign. 


>>> Petri Savolainen(psavol) wrote:
>>> It's there to ensure that all bits are zero also when someone would 
>>> modify the bitfield from two to three fields later on. Similarly to 
>>> memset() zero is used for struct inits.


 Petri Savolainen(psavol) wrote:
 There's no need for abs(). Since it's all uint32_t variables, wrap a 
 round is handled already.
 An example in 8bits:
 0xff - 0xfd = 0x02
 0x00 - 0xfe = 0x02
 0x01 - 0xff = 0x02
 0x02 - 0x00 = 0x02
 
 This passes both gcc and clang, and is used already in the other ring 
 implementation see ring_deq_multi().


> Petri Savolainen(psavol) wrote:
> I prefer style with blank line in the end of a typedef, since it's 
> easier to spot the type name (as it's not mixed into struct field 
> names). Checkpatch passes so this style should be OK.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Does this mean that sizes larger than 32 have no added performance 
>> benefit?


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're 
>>> not going to use the user-supplied queue size.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Given its name, this looks like an extraneous statement that 
 should be deleted. Renaming this to something like 
 `prefetch_dequeued_bufs()` would make the intent clearer here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> `if (!ring_st_is_empty(>s.ring_st))` seems more natural 
> here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
>> effective queue capacity. The user-supplied `size` should then 
>> be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work 
>> properly.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Same comment here as for plain queues.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 As noted earlier, due to "losing" one entry to distinguish 
 queue empty/full, this should be returned as 
 `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that 
 `CONFIG_QUEUE_SIZE` is itself a power of 2.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Since you're initializing `index.pool` and `index.buffer` 
> there's no need to set `index.u32` here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> We originally had this 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_buffer_internal.h
line 17
@@ -41,11 +41,19 @@ typedef struct seg_entry_t {
uint32_t  len;
 } seg_entry_t;
 
+typedef union buffer_index_t {
+   uint32_t u32;
+
+   struct {
+   uint32_t pool   :8;
+   uint32_t buffer :24;


Comment:
OK, added checks in v2.

> Petri Savolainen(psavol) wrote:
> OK. Compiler probably did that already, but changed in v2.


>> Petri Savolainen(psavol) wrote:
>> Tail and head indexes are (masked from) uint32_t and do not wrap around when 
>> the ring is full. I think you assume that the store index is 0...size-1, 
>> while it's full uint32_t which is then masked to get the actual index.
>> 
>> For example:
>> size = 100;
>> 
>> Empty:
>> head = 100
>> tail = 100
>> num = 100 - 100 = 0
>> 
>> Full:
>> head = 100
>> tail = 200
>> num = 200 - 100 = 100
>> 
>> Wrap uint32_t + full:
>> head = 0xFF9C
>> tail = 0
>> num = 0 - 0xFF9C = 0x64 = 100
>> 
>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>> Petri Savolainen(psavol) wrote:
>>> It's already documented 5 lines above:
>>> 
>>> /* Initialize ring. Ring size must be a power of two. */
>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t 
>>> size)
>>> {


 Petri Savolainen(psavol) wrote:
 This function converts 32 bit buffer indexes to buffer header pointers. 
 The counter operation is buffer_index_from_buf(). The prefetch is a side 
 effect of the function, which may be changed/moved any time if it's found 
 out that there's a place for prefetching. I actually plan to test if 
 number of prefetches should be limited as e.g. 32 consecutive prefetches 
 may be too much for some CPU architectures.


> Petri Savolainen(psavol) wrote:
> I prefer style where '== 0' is used instead of '!'. Especially, when the 
> if clause is as complex as this and there's danger for reader to miss the 
> '!' sign. 


>> Petri Savolainen(psavol) wrote:
>> It's there to ensure that all bits are zero also when someone would 
>> modify the bitfield from two to three fields later on. Similarly to 
>> memset() zero is used for struct inits.


>>> Petri Savolainen(psavol) wrote:
>>> There's no need for abs(). Since it's all uint32_t variables, wrap a 
>>> round is handled already.
>>> An example in 8bits:
>>> 0xff - 0xfd = 0x02
>>> 0x00 - 0xfe = 0x02
>>> 0x01 - 0xff = 0x02
>>> 0x02 - 0x00 = 0x02
>>> 
>>> This passes both gcc and clang, and is used already in the other ring 
>>> implementation see ring_deq_multi().


 Petri Savolainen(psavol) wrote:
 I prefer style with blank line in the end of a typedef, since it's 
 easier to spot the type name (as it's not mixed into struct field 
 names). Checkpatch passes so this style should be OK.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Does this mean that sizes larger than 32 have no added performance 
> benefit?


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're 
>> not going to use the user-supplied queue size.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Given its name, this looks like an extraneous statement that should 
>>> be deleted. Renaming this to something like 
>>> `prefetch_dequeued_bufs()` would make the intent clearer here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 `if (!ring_st_is_empty(>s.ring_st))` seems more natural 
 here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
> effective queue capacity. The user-supplied `size` should then be 
> set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work 
> properly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Same comment here as for plain queues.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> As noted earlier, due to "losing" one entry to distinguish 
>>> queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE 
>>> - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is 
>>> itself a power of 2.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're initializing `index.pool` and `index.buffer` 
 there's no need to set `index.u32` here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> We originally had this index partitioning based on 
> `ODP_CONFIG_POOLS`. Do we want to return to that here?
> 
> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);
+
+   /* Full */
+   if (num == 0)
+   return 0;
+
+   if (num > num_data)
+   num = num_data;
+
+   idx = tail & mask;
+
+   for (i = 0; i < num; i++) {
+   ring->data[idx] = data[i];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->tail = tail + num;
+
+   return num;
+}
+
+/* Check if ring is empty */
+static inline int ring_st_is_empty(ring_st_t *ring)
+{
+   uint32_t head, tail, num;
+
+   head = ring->head;
+   tail = ring->tail;
+   num  = tail - head;
+
+   if (num == 0)
+   return 1;
+
+   return 0;


Comment:
OK. Compiler probably did that already, but changed in v2.

> Petri Savolainen(psavol) wrote:
> Tail and head indexes are (masked from) uint32_t and do not wrap around when 
> the ring is full. I think you assume that the store index is 0...size-1, 
> while it's full uint32_t which is then masked to get the actual index.
> 
> For example:
> size = 100;
> 
> Empty:
> head = 100
> tail = 100
> num = 100 - 100 = 0
> 
> Full:
> head = 100
> tail = 200
> num = 200 - 100 = 100
> 
> Wrap uint32_t + full:
> head = 0xFF9C
> tail = 0
> num = 0 - 0xFF9C = 0x64 = 100
> 
> So, no abs() needed. Ring size can be 4096, instead of 4095.


>> Petri Savolainen(psavol) wrote:
>> It's already documented 5 lines above:
>> 
>> /* Initialize ring. Ring size must be a power of two. */
>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t 
>> size)
>> {


>>> Petri Savolainen(psavol) wrote:
>>> This function converts 32 bit buffer indexes to buffer header pointers. The 
>>> counter operation is buffer_index_from_buf(). The prefetch is a side effect 
>>> of the function, which may be changed/moved any time if it's found out that 
>>> there's a place for prefetching. I actually plan to test if number of 
>>> prefetches should be limited as e.g. 32 consecutive prefetches may be too 
>>> much for some CPU architectures.


 Petri Savolainen(psavol) wrote:
 I prefer style where '== 0' is used instead of '!'. Especially, when the 
 if clause is as complex as this and there's danger for reader to miss the 
 '!' sign. 


> Petri Savolainen(psavol) wrote:
> It's there to ensure that all bits are zero also when someone would 
> modify the bitfield from two to three fields later on. Similarly to 
> memset() zero is used for struct inits.


>> Petri Savolainen(psavol) wrote:
>> There's no need for abs(). Since it's all uint32_t variables, wrap a 
>> round is handled already.
>> An example in 8bits:
>> 0xff - 0xfd = 0x02
>> 0x00 - 0xfe = 0x02
>> 0x01 - 0xff = 0x02
>> 0x02 - 0x00 = 0x02
>> 
>> This passes both gcc and clang, and is used already in the other ring 
>> implementation see ring_deq_multi().


>>> Petri Savolainen(psavol) wrote:
>>> 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 103
@@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name,
param = _param;
}
 
+   if (param->size > CONFIG_QUEUE_SIZE)


Comment:
Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective queue 
capacity. The user-supplied `size` should then be set to 
`ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Same comment here as for plain queues.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> As noted earlier, due to "losing" one entry to distinguish queue empty/full, 
>> this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to 
>> ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're initializing `index.pool` and `index.buffer` there's no need 
>>> to set `index.u32` here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do 
 we want to return to that here?
 
 If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
 `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This routine can be optimized to:
> ```
> return ring->head == ring->tail;
> ```


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Your invariant is the queue is empty when `head == tail` therefore the 
>> queue is full when `abs(tail - head) == mask`, so the correct 
>> calculation here is:
>> 
>> `num = mask - abs(tail - head);`
>> 
>> The effect is that a queue can only hold `size - 1` elements, otherwise 
>> you cannot distinguish between a full and an empty queue without another 
>> bit of metadata, which is a cost you're trying to avoid.
>> 
>> This is somewhat problematic if the caller is trying to be "optimal" by 
>> specifying a power of two in the `size` parameter of the 
>> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we 
>> may wish to return a `max_size` of a power of 2 - 1 in 
>> `odp_queue_capability()` as part of this patch series.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This only works if `size` is a power of 2. Should be documented as 
>>> such, since this is an internal routine. In this case an 
>>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement 
>>> would be a useful debugging aid.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 should be `num = abs(tail - head);` to deal with wrap arounds, 
 otherwise may be misinterpreted as overly large since it's `uint32_t`. 
 Note that GCC and clang recognize `abs()` and treat it as a builtin, 
 so there's no actual `stdlib.h` call here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169822191
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 78
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);


Comment:
Tail and head indexes are (masked from) uint32_t and do not wrap around when 
the ring is full. I think you assume that the store index is 0...size-1, while 
it's full uint32_t which is then masked to get the actual index.

For example:
size = 100;

Empty:
head = 100
tail = 100
num = 100 - 100 = 0

Full:
head = 100
tail = 200
num = 200 - 100 = 100

Wrap uint32_t + full:
head = 0xFF9C
tail = 0
num = 0 - 0xFF9C = 0x64 = 100

So, no abs() needed. Ring size can be 4096, instead of 4095.

> Petri Savolainen(psavol) wrote:
> It's already documented 5 lines above:
> 
> /* Initialize ring. Ring size must be a power of two. */
> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t 
> size)
> {


>> Petri Savolainen(psavol) wrote:
>> This function converts 32 bit buffer indexes to buffer header pointers. The 
>> counter operation is buffer_index_from_buf(). The prefetch is a side effect 
>> of the function, which may be changed/moved any time if it's found out that 
>> there's a place for prefetching. I actually plan to test if number of 
>> prefetches should be limited as e.g. 32 consecutive prefetches may be too 
>> much for some CPU architectures.


>>> Petri Savolainen(psavol) wrote:
>>> I prefer style where '== 0' is used instead of '!'. Especially, when the if 
>>> clause is as complex as this and there's danger for reader to miss the '!' 
>>> sign. 


 Petri Savolainen(psavol) wrote:
 It's there to ensure that all bits are zero also when someone would modify 
 the bitfield from two to three fields later on. Similarly to memset() zero 
 is used for struct inits.


> Petri Savolainen(psavol) wrote:
> There's no need for abs(). Since it's all uint32_t variables, wrap a 
> round is handled already.
> An example in 8bits:
> 0xff - 0xfd = 0x02
> 0x00 - 0xfe = 0x02
> 0x01 - 0xff = 0x02
> 0x02 - 0x00 = 0x02
> 
> This passes both gcc and clang, and is used already in the other ring 
> implementation see ring_deq_multi().


>> Petri Savolainen(psavol) wrote:
>> I prefer style with blank line in the end of a typedef, since it's 
>> easier to spot the type name (as it's not mixed into struct field 
>> names). Checkpatch passes so this style should be OK.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Does this mean that sizes larger than 32 have no added performance 
>>> benefit?


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not 
 going to use the user-supplied queue size.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Given its name, this looks like an extraneous statement that should 
> be deleted. Renaming this to something like 
> `prefetch_dequeued_bufs()` would make the intent clearer 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 32
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;


Comment:
It's already documented 5 lines above:

/* Initialize ring. Ring size must be a power of two. */
static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
{

> Petri Savolainen(psavol) wrote:
> This function converts 32 bit buffer indexes to buffer header pointers. The 
> counter operation is buffer_index_from_buf(). The prefetch is a side effect 
> of the function, which may be changed/moved any time if it's found out that 
> there's a place for prefetching. I actually plan to test if number of 
> prefetches should be limited as e.g. 32 consecutive prefetches may be too 
> much for some CPU architectures.


>> Petri Savolainen(psavol) wrote:
>> I prefer style where '== 0' is used instead of '!'. Especially, when the if 
>> clause is as complex as this and there's danger for reader to miss the '!' 
>> sign. 


>>> Petri Savolainen(psavol) wrote:
>>> It's there to ensure that all bits are zero also when someone would modify 
>>> the bitfield from two to three fields later on. Similarly to memset() zero 
>>> is used for struct inits.


 Petri Savolainen(psavol) wrote:
 There's no need for abs(). Since it's all uint32_t variables, wrap a round 
 is handled already.
 An example in 8bits:
 0xff - 0xfd = 0x02
 0x00 - 0xfe = 0x02
 0x01 - 0xff = 0x02
 0x02 - 0x00 = 0x02
 
 This passes both gcc and clang, and is used already in the other ring 
 implementation see ring_deq_multi().


> Petri Savolainen(psavol) wrote:
> I prefer style with blank line in the end of a typedef, since it's easier 
> to spot the type name (as it's not mixed into struct field names). 
> Checkpatch passes so this style should be OK.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Does this mean that sizes larger than 32 have no added performance 
>> benefit?


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not 
>>> going to use the user-supplied queue size.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Given its name, this looks like an extraneous statement that should be 
 deleted. Renaming this to something like `prefetch_dequeued_bufs()` 
 would make the intent clearer here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
>> effective queue capacity. The user-supplied `size` should then be 
>> set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work 
>> properly.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Same comment here as for plain queues.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 As noted earlier, due to "losing" one entry to distinguish queue 
 empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, 
 and we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a 
 power of 2.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Since you're initializing `index.pool` and `index.buffer` there's 
> no need to set `index.u32` here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> We originally had this index partitioning based on 
>> `ODP_CONFIG_POOLS`. Do we want to return to that here?
>> 
>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This routine can be optimized to:
>>> ```
>>> return ring->head == ring->tail;
>>> ```


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Your invariant is the queue is empty when `head == tail` 
 therefore 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle)
ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name);
return -1;
}
-   if (queue->s.head != NULL) {
+   if (ring_st_is_empty(>s.ring_st) == 0) {


Comment:
I prefer style where '== 0' is used instead of '!'. Especially, when the if 
clause is as complex as this and there's danger for reader to miss the '!' 
sign. 

> Petri Savolainen(psavol) wrote:
> It's there to ensure that all bits are zero also when someone would modify 
> the bitfield from two to three fields later on. Similarly to memset() zero is 
> used for struct inits.


>> Petri Savolainen(psavol) wrote:
>> There's no need for abs(). Since it's all uint32_t variables, wrap a round 
>> is handled already.
>> An example in 8bits:
>> 0xff - 0xfd = 0x02
>> 0x00 - 0xfe = 0x02
>> 0x01 - 0xff = 0x02
>> 0x02 - 0x00 = 0x02
>> 
>> This passes both gcc and clang, and is used already in the other ring 
>> implementation see ring_deq_multi().


>>> Petri Savolainen(psavol) wrote:
>>> I prefer style with blank line in the end of a typedef, since it's easier 
>>> to spot the type name (as it's not mixed into struct field names). 
>>> Checkpatch passes so this style should be OK.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Does this mean that sizes larger than 32 have no added performance benefit?


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not 
> going to use the user-supplied queue size.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Given its name, this looks like an extraneous statement that should be 
>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` 
>> would make the intent clearer here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
 effective queue capacity. The user-supplied `size` should then be set 
 to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Same comment here as for plain queues.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> As noted earlier, due to "losing" one entry to distinguish queue 
>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and 
>> we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 
>> 2.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're initializing `index.pool` and `index.buffer` there's 
>>> no need to set `index.u32` here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 We originally had this index partitioning based on 
 `ODP_CONFIG_POOLS`. Do we want to return to that here?
 
 If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
 `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This routine can be optimized to:
> ```
> return ring->head == ring->tail;
> ```


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Your invariant is the queue is empty when `head == tail` 
>> therefore the queue is full when `abs(tail - head) == mask`, so 
>> the correct calculation here is:
>> 
>> `num = mask - abs(tail - head);`
>> 
>> The effect is that a queue can only hold `size - 1` elements, 
>> otherwise you cannot distinguish between a full and an empty 
>> queue without another bit of metadata, which is a cost you're 
>> trying to avoid.
>> 
>> This is somewhat problematic if the caller is trying to be 
>> "optimal" by specifying a power of two in the `size` parameter 
>> of the `odp_queue_param_t` passed to `odp_queue_create()`. For 
>> this reason we may wish to return a `max_size` of a power of 2 - 
>> 1 in `odp_queue_capability()` as part of this patch series.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This only works if `size` is a power of 2. Should be documented 
>>> as such, since this is an internal routine. In this case an 
>>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this 
>>> requirement would be a useful debugging aid.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 should be `num = abs(tail - head);` to deal with wrap arounds, 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 24
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+


Comment:
I prefer style with blank line in the end of a typedef, since it's easier to 
spot the type name (as it's not mixed into struct field names). Checkpatch 
passes so this style should be OK.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Does this mean that sizes larger than 32 have no added performance benefit?


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going 
>> to use the user-supplied queue size.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Given its name, this looks like an extraneous statement that should be 
>>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` would 
>>> make the intent clearer here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective 
> queue capacity. The user-supplied `size` should then be set to 
> `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Same comment here as for plain queues.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> As noted earlier, due to "losing" one entry to distinguish queue 
>>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we 
>>> also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're initializing `index.pool` and `index.buffer` there's no 
 need to set `index.u32` here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> We originally had this index partitioning based on 
> `ODP_CONFIG_POOLS`. Do we want to return to that here?
> 
> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This routine can be optimized to:
>> ```
>> return ring->head == ring->tail;
>> ```


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Your invariant is the queue is empty when `head == tail` therefore 
>>> the queue is full when `abs(tail - head) == mask`, so the correct 
>>> calculation here is:
>>> 
>>> `num = mask - abs(tail - head);`
>>> 
>>> The effect is that a queue can only hold `size - 1` elements, 
>>> otherwise you cannot distinguish between a full and an empty queue 
>>> without another bit of metadata, which is a cost you're trying to 
>>> avoid.
>>> 
>>> This is somewhat problematic if the caller is trying to be 
>>> "optimal" by specifying a power of two in the `size` parameter of 
>>> the `odp_queue_param_t` passed to `odp_queue_create()`. For this 
>>> reason we may wish to return a `max_size` of a power of 2 - 1 in 
>>> `odp_queue_capability()` as part of this patch series.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 This only works if `size` is a power of 2. Should be documented as 
 such, since this is an internal routine. In this case an 
 `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this 
 requirement would be a useful debugging aid.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> should be `num = abs(tail - head);` to deal with wrap arounds, 
> otherwise may be misinterpreted as overly large since it's 
> `uint32_t`. Note that GCC and clang recognize `abs()` and treat 
> it as a builtin, so there's no actual `stdlib.h` call here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169871326
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_pool.c
line 28
@@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool)
memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr);
 
/* Initialize buffer metadata */
-   buf_hdr->index = i;
+   buf_hdr->index.u32= 0;
+   buf_hdr->index.pool   = pool->pool_idx;
+   buf_hdr->index.buffer = i;


Comment:
It's there to ensure that all bits are zero also when someone would modify the 
bitfield from two to three fields later on. Similarly to memset() zero is used 
for struct inits.

> Petri Savolainen(psavol) wrote:
> There's no need for abs(). Since it's all uint32_t variables, wrap a round is 
> handled already.
> An example in 8bits:
> 0xff - 0xfd = 0x02
> 0x00 - 0xfe = 0x02
> 0x01 - 0xff = 0x02
> 0x02 - 0x00 = 0x02
> 
> This passes both gcc and clang, and is used already in the other ring 
> implementation see ring_deq_multi().


>> Petri Savolainen(psavol) wrote:
>> I prefer style with blank line in the end of a typedef, since it's easier to 
>> spot the type name (as it's not mixed into struct field names). Checkpatch 
>> passes so this style should be OK.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Does this mean that sizes larger than 32 have no added performance benefit?


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not 
 going to use the user-supplied queue size.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Given its name, this looks like an extraneous statement that should be 
> deleted. Renaming this to something like `prefetch_dequeued_bufs()` would 
> make the intent clearer here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the 
>>> effective queue capacity. The user-supplied `size` should then be set 
>>> to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Same comment here as for plain queues.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> As noted earlier, due to "losing" one entry to distinguish queue 
> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and 
> we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 
> 2.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're initializing `index.pool` and `index.buffer` there's no 
>> need to set `index.u32` here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> We originally had this index partitioning based on 
>>> `ODP_CONFIG_POOLS`. Do we want to return to that here?
>>> 
>>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 This routine can be optimized to:
 ```
 return ring->head == ring->tail;
 ```


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Your invariant is the queue is empty when `head == tail` 
> therefore the queue is full when `abs(tail - head) == mask`, so 
> the correct calculation here is:
> 
> `num = mask - abs(tail - head);`
> 
> The effect is that a queue can only hold `size - 1` elements, 
> otherwise you cannot distinguish between a full and an empty 
> queue without another bit of metadata, which is a cost you're 
> trying to avoid.
> 
> This is somewhat problematic if the caller is trying to be 
> "optimal" by specifying a power of two in the `size` parameter of 
> the `odp_queue_param_t` passed to `odp_queue_create()`. For this 
> reason we may wish to return a `max_size` of a power of 2 - 1 in 
> `odp_queue_capability()` as part of this patch series.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This only works if `size` is a power of 2. Should be documented 
>> as such, since this is an internal routine. In this case an 
>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this 
>> requirement would be a useful debugging aid.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> should be `num = abs(tail - head);` to deal with wrap arounds, 
>>> otherwise may be misinterpreted as overly large since it's 
>>> `uint32_t`. Note that GCC and clang recognize `abs()` and treat 
>>> it as 

Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_config_internal.h
line 5
@@ -144,7 +144,7 @@ extern "C" {
  * This controls the burst size on various enqueue, dequeue, etc calls. Large
  * burst size improves throughput, but may degrade QoS (increase latency).
  */
-#define CONFIG_BURST_SIZE 16
+#define CONFIG_BURST_SIZE 32


Comment:
Does this mean that sizes larger than 32 have no added performance benefit?

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going 
> to use the user-supplied queue size.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Given its name, this looks like an extraneous statement that should be 
>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` would 
>> make the intent clearer here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective 
 queue capacity. The user-supplied `size` should then be set to 
 `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Same comment here as for plain queues.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> As noted earlier, due to "losing" one entry to distinguish queue 
>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we 
>> also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Since you're initializing `index.pool` and `index.buffer` there's no 
>>> need to set `index.u32` here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 We originally had this index partitioning based on `ODP_CONFIG_POOLS`. 
 Do we want to return to that here?
 
 If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
 `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This routine can be optimized to:
> ```
> return ring->head == ring->tail;
> ```


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Your invariant is the queue is empty when `head == tail` therefore 
>> the queue is full when `abs(tail - head) == mask`, so the correct 
>> calculation here is:
>> 
>> `num = mask - abs(tail - head);`
>> 
>> The effect is that a queue can only hold `size - 1` elements, 
>> otherwise you cannot distinguish between a full and an empty queue 
>> without another bit of metadata, which is a cost you're trying to 
>> avoid.
>> 
>> This is somewhat problematic if the caller is trying to be "optimal" 
>> by specifying a power of two in the `size` parameter of the 
>> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason 
>> we may wish to return a `max_size` of a power of 2 - 1 in 
>> `odp_queue_capability()` as part of this patch series.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This only works if `size` is a power of 2. Should be documented as 
>>> such, since this is an internal routine. In this case an 
>>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement 
>>> would be a useful debugging aid.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 should be `num = abs(tail - head);` to deal with wrap arounds, 
 otherwise may be misinterpreted as overly large since it's 
 `uint32_t`. Note that GCC and clang recognize `abs()` and treat it 
 as a builtin, so there's no actual `stdlib.h` call here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169825596
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 420
@@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char 
*name,
queue->s.pktin = PKTIN_INVALID;
queue->s.pktout = PKTOUT_INVALID;
 
-   queue->s.head = NULL;
-   queue->s.tail = NULL;
+   ring_st_init(>s.ring_st,
+queue_tbl->ring_data[queue->s.index].data,
+CONFIG_QUEUE_SIZE);


Comment:
Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going to 
use the user-supplied queue size.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Given its name, this looks like an extraneous statement that should be 
> deleted. Renaming this to something like `prefetch_dequeued_bufs()` would 
> make the intent clearer here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective 
>>> queue capacity. The user-supplied `size` should then be set to 
>>> `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Same comment here as for plain queues.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> As noted earlier, due to "losing" one entry to distinguish queue 
> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we 
> also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Since you're initializing `index.pool` and `index.buffer` there's no 
>> need to set `index.u32` here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. 
>>> Do we want to return to that here?
>>> 
>>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 This routine can be optimized to:
 ```
 return ring->head == ring->tail;
 ```


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Your invariant is the queue is empty when `head == tail` therefore 
> the queue is full when `abs(tail - head) == mask`, so the correct 
> calculation here is:
> 
> `num = mask - abs(tail - head);`
> 
> The effect is that a queue can only hold `size - 1` elements, 
> otherwise you cannot distinguish between a full and an empty queue 
> without another bit of metadata, which is a cost you're trying to 
> avoid.
> 
> This is somewhat problematic if the caller is trying to be "optimal" 
> by specifying a power of two in the `size` parameter of the 
> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason 
> we may wish to return a `max_size` of a power of 2 - 1 in 
> `odp_queue_capability()` as part of this patch series.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This only works if `size` is a power of 2. Should be documented as 
>> such, since this is an internal routine. In this case an 
>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement 
>> would be a useful debugging aid.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> should be `num = abs(tail - head);` to deal with wrap arounds, 
>>> otherwise may be misinterpreted as overly large since it's 
>>> `uint32_t`. Note that GCC and clang recognize `abs()` and treat it 
>>> as a builtin, so there's no actual `stdlib.h` call here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169825091
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, 
odp_buffer_hdr_t *buf_hdr[],
}
 
UNLOCK(>s.lock);
-   return 0;
-   }
-
-   for (i = 0; i < num && hdr; ) {
-   int burst_num = hdr->burst_num;
-   int first = hdr->burst_first;
 
-   /* First, get bursted buffers */
-   for (j = 0; j < burst_num && i < num; j++, i++) {
-   buf_hdr[i] = hdr->burst[first + j];
-   odp_prefetch(buf_hdr[i]);
-   }
-
-   if (burst_num) {
-   hdr->burst_num   = burst_num - j;
-   hdr->burst_first = first + j;
-   }
-
-   if (i == num)
-   break;
-
-   /* When burst is empty, consume the current buffer header and
-* move to the next header */
-   buf_hdr[i] = hdr;
-   next   = hdr->next;
-   hdr->next  = NULL;
-   hdr= next;
-   updated++;
-   i++;
+   return 0;
}
 
-   /* Write head only if updated */
-   if (updated)
-   queue->s.head = hdr;
-
-   /* Queue is empty */
-   if (hdr == NULL)
-   queue->s.tail = NULL;
-
if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED)
sched_fn->save_context(queue->s.index);
 
UNLOCK(>s.lock);
 
-   return i;
+   buffer_index_to_buf(buf_hdr, buf_idx, num_deq);


Comment:
Given its name, this looks like an extraneous statement that should be deleted. 
Renaming this to something like `prefetch_dequeued_bufs()` would make the 
intent clearer here.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective 
>> queue capacity. The user-supplied `size` should then be set to 
>> `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Same comment here as for plain queues.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 As noted earlier, due to "losing" one entry to distinguish queue 
 empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we 
 also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Since you're initializing `index.pool` and `index.buffer` there's no need 
> to set `index.u32` here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. 
>> Do we want to return to that here?
>> 
>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This routine can be optimized to:
>>> ```
>>> return ring->head == ring->tail;
>>> ```


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Your invariant is the queue is empty when `head == tail` therefore the 
 queue is full when `abs(tail - head) == mask`, so the correct 
 calculation here is:
 
 `num = mask - abs(tail - head);`
 
 The effect is that a queue can only hold `size - 1` elements, 
 otherwise you cannot distinguish between a full and an empty queue 
 without another bit of metadata, which is a cost you're trying to 
 avoid.
 
 This is somewhat problematic if the caller is trying to be "optimal" 
 by specifying a power of two in the `size` parameter of the 
 `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we 
 may wish to return a `max_size` of a power of 2 - 1 in 
 `odp_queue_capability()` as part of this patch series.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This only works if `size` is a power of 2. Should be documented as 
> such, since this is an internal routine. In this case an 
> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement 
> would be a useful debugging aid.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> should be `num = abs(tail - head);` to deal with wrap arounds, 
>> otherwise may be misinterpreted as overly large since it's 
>> `uint32_t`. Note that GCC and clang recognize `abs()` and treat it 
>> as a builtin, so there's no actual `stdlib.h` call here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Extra blank line should be removed (nit).



Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle)
ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name);
return -1;
}
-   if (queue->s.head != NULL) {
+   if (ring_st_is_empty(>s.ring_st) == 0) {


Comment:
`if (!ring_st_is_empty(>s.ring_st))` seems more natural here.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective 
> queue capacity. The user-supplied `size` should then be set to 
> `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Same comment here as for plain queues.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> As noted earlier, due to "losing" one entry to distinguish queue 
>>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also 
>>> need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Since you're initializing `index.pool` and `index.buffer` there's no need 
 to set `index.u32` here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do 
> we want to return to that here?
> 
> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This routine can be optimized to:
>> ```
>> return ring->head == ring->tail;
>> ```


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Your invariant is the queue is empty when `head == tail` therefore the 
>>> queue is full when `abs(tail - head) == mask`, so the correct 
>>> calculation here is:
>>> 
>>> `num = mask - abs(tail - head);`
>>> 
>>> The effect is that a queue can only hold `size - 1` elements, otherwise 
>>> you cannot distinguish between a full and an empty queue without 
>>> another bit of metadata, which is a cost you're trying to avoid.
>>> 
>>> This is somewhat problematic if the caller is trying to be "optimal" by 
>>> specifying a power of two in the `size` parameter of the 
>>> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we 
>>> may wish to return a `max_size` of a power of 2 - 1 in 
>>> `odp_queue_capability()` as part of this patch series.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 This only works if `size` is a power of 2. Should be documented as 
 such, since this is an internal routine. In this case an 
 `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement 
 would be a useful debugging aid.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> should be `num = abs(tail - head);` to deal with wrap arounds, 
> otherwise may be misinterpreted as overly large since it's 
> `uint32_t`. Note that GCC and clang recognize `abs()` and treat it as 
> a builtin, so there's no actual `stdlib.h` call here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169823060
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);
+
+   /* Full */
+   if (num == 0)
+   return 0;
+
+   if (num > num_data)
+   num = num_data;
+
+   idx = tail & mask;
+
+   for (i = 0; i < num; i++) {
+   ring->data[idx] = data[i];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->tail = tail + num;
+
+   return num;
+}
+
+/* Check if ring is empty */
+static inline int ring_st_is_empty(ring_st_t *ring)
+{
+   uint32_t head, tail, num;
+
+   head = ring->head;
+   tail = ring->tail;
+   num  = tail - head;
+
+   if (num == 0)
+   return 1;
+
+   return 0;


Comment:
This routine can be optimized to:
```
return ring->head == ring->tail;
```

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Your invariant is the queue is empty when `head == tail` therefore the queue 
> is full when `abs(tail - head) == mask`, so the correct calculation here is:
> 
> `num = mask - abs(tail - head);`
> 
> The effect is that a queue can only hold `size - 1` elements, otherwise you 
> cannot distinguish between a full and an empty queue without another bit of 
> metadata, which is a cost you're trying to avoid.
> 
> This is somewhat problematic if the caller is trying to be "optimal" by 
> specifying a power of two in the `size` parameter of the `odp_queue_param_t` 
> passed to `odp_queue_create()`. For this reason we may wish to return a 
> `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this 
> patch series.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This only works if `size` is a power of 2. Should be documented as such, 
>> since this is an internal routine. In this case an `ODP_ASSERT(size == 
>> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging 
>> aid.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise 
>>> may be misinterpreted as overly large since it's `uint32_t`. Note that GCC 
>>> and clang recognize `abs()` and treat it as a builtin, so there's no actual 
>>> `stdlib.h` call here.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169804664
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_queue.c
line 92
@@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa)
capa->max_sched_groups  = sched_fn->num_grps();
capa->sched_prios   = odp_schedule_num_prio();
capa->plain.max_num = capa->max_queues;
+   capa->plain.max_size= CONFIG_QUEUE_SIZE;


Comment:
As noted earlier, due to "losing" one entry to distinguish queue empty/full, 
this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure 
that `CONFIG_QUEUE_SIZE` is itself a power of 2.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Since you're initializing `index.pool` and `index.buffer` there's no need to 
> set `index.u32` here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we 
>> want to return to that here?
>> 
>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This routine can be optimized to:
>>> ```
>>> return ring->head == ring->tail;
>>> ```


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 Your invariant is the queue is empty when `head == tail` therefore the 
 queue is full when `abs(tail - head) == mask`, so the correct calculation 
 here is:
 
 `num = mask - abs(tail - head);`
 
 The effect is that a queue can only hold `size - 1` elements, otherwise 
 you cannot distinguish between a full and an empty queue without another 
 bit of metadata, which is a cost you're trying to avoid.
 
 This is somewhat problematic if the caller is trying to be "optimal" by 
 specifying a power of two in the `size` parameter of the 
 `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we may 
 wish to return a `max_size` of a power of 2 - 1 in 
 `odp_queue_capability()` as part of this patch series.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This only works if `size` is a power of 2. Should be documented as such, 
> since this is an internal routine. In this case an `ODP_ASSERT(size == 
> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful 
> debugging aid.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise 
>> may be misinterpreted as overly large since it's `uint32_t`. Note that 
>> GCC and clang recognize `abs()` and treat it as a builtin, so there's no 
>> actual `stdlib.h` call here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169821543
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_buffer_internal.h
line 17
@@ -41,11 +41,19 @@ typedef struct seg_entry_t {
uint32_t  len;
 } seg_entry_t;
 
+typedef union buffer_index_t {
+   uint32_t u32;
+
+   struct {
+   uint32_t pool   :8;
+   uint32_t buffer :24;


Comment:
We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we 
want to return to that here?

If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
`ODP_CONFIG_POOLS < 256` or else bad things will happen here.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This routine can be optimized to:
> ```
> return ring->head == ring->tail;
> ```


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Your invariant is the queue is empty when `head == tail` therefore the queue 
>> is full when `abs(tail - head) == mask`, so the correct calculation here is:
>> 
>> `num = mask - abs(tail - head);`
>> 
>> The effect is that a queue can only hold `size - 1` elements, otherwise you 
>> cannot distinguish between a full and an empty queue without another bit of 
>> metadata, which is a cost you're trying to avoid.
>> 
>> This is somewhat problematic if the caller is trying to be "optimal" by 
>> specifying a power of two in the `size` parameter of the `odp_queue_param_t` 
>> passed to `odp_queue_create()`. For this reason we may wish to return a 
>> `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this 
>> patch series.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> This only works if `size` is a power of 2. Should be documented as such, 
>>> since this is an internal routine. In this case an `ODP_ASSERT(size == 
>>> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging 
>>> aid.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 should be `num = abs(tail - head);` to deal with wrap arounds, otherwise 
 may be misinterpreted as overly large since it's `uint32_t`. Note that GCC 
 and clang recognize `abs()` and treat it as a builtin, so there's no 
 actual `stdlib.h` call here.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169819392
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/odp_pool.c
line 28
@@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool)
memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr);
 
/* Initialize buffer metadata */
-   buf_hdr->index = i;
+   buf_hdr->index.u32= 0;
+   buf_hdr->index.pool   = pool->pool_idx;
+   buf_hdr->index.buffer = i;


Comment:
Since you're initializing `index.pool` and `index.buffer` there's no need to 
set `index.u32` here.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we 
> want to return to that here?
> 
> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that 
> `ODP_CONFIG_POOLS < 256` or else bad things will happen here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> This routine can be optimized to:
>> ```
>> return ring->head == ring->tail;
>> ```


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Your invariant is the queue is empty when `head == tail` therefore the 
>>> queue is full when `abs(tail - head) == mask`, so the correct calculation 
>>> here is:
>>> 
>>> `num = mask - abs(tail - head);`
>>> 
>>> The effect is that a queue can only hold `size - 1` elements, otherwise you 
>>> cannot distinguish between a full and an empty queue without another bit of 
>>> metadata, which is a cost you're trying to avoid.
>>> 
>>> This is somewhat problematic if the caller is trying to be "optimal" by 
>>> specifying a power of two in the `size` parameter of the 
>>> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we may 
>>> wish to return a `max_size` of a power of 2 - 1 in `odp_queue_capability()` 
>>> as part of this patch series.


 Bill Fischofer(Bill-Fischofer-Linaro) wrote:
 This only works if `size` is a power of 2. Should be documented as such, 
 since this is an internal routine. In this case an `ODP_ASSERT(size == 
 ROUNDUP_POWER2_U32(size))` for this requirement would be a useful 
 debugging aid.


> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise 
> may be misinterpreted as overly large since it's `uint32_t`. Note that 
> GCC and clang recognize `abs()` and treat it as a builtin, so there's no 
> actual `stdlib.h` call here.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169821150
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 78
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;
+
+   /* Empty */
+   if (num == 0)
+   return 0;
+
+   if (num > max_num)
+   num = max_num;
+
+   idx = head & mask;
+
+   for (i = 0; i < num; i++) {
+   data[i] = ring->data[idx];
+   idx = (idx + 1) & mask;
+   }
+
+   ring->head = head + num;
+
+   return num;
+}
+
+/* Enqueue data into the ring tail. Num_data is smaller than ring size. */
+static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t 
data[],
+uint32_t num_data)
+{
+   uint32_t head, tail, mask, size, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   size = mask + 1;
+   num  = size - (tail - head);


Comment:
Your invariant is the queue is empty when `head == tail` therefore the queue is 
full when `abs(tail - head) == mask`, so the correct calculation here is:

`num = mask - abs(tail - head);`

The effect is that a queue can only hold `size - 1` elements, otherwise you 
cannot distinguish between a full and an empty queue without another bit of 
metadata, which is a cost you're trying to avoid.

This is somewhat problematic if the caller is trying to be "optimal" by 
specifying a power of two in the `size` parameter of the `odp_queue_param_t` 
passed to `odp_queue_create()`. For this reason we may wish to return a 
`max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this 
patch series.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> This only works if `size` is a power of 2. Should be documented as such, 
> since this is an internal routine. In this case an `ODP_ASSERT(size == 
> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging 
> aid.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may 
>> be misinterpreted as overly large since it's `uint32_t`. Note that GCC and 
>> clang recognize `abs()` and treat it as a builtin, so there's no actual 
>> `stdlib.h` call here.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169794712
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 24
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+


Comment:
Extra blank line should be removed (nit).

https://github.com/Linaro/odp/pull/492#discussion_r169736572
updated_at 2018-02-22 09:32:40


Re: [lng-odp] [PATCH v2] Ring implementation of queues

2018-02-22 Thread Github ODP bot
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+#include 
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+   uint32_t head;
+   uint32_t tail;
+   uint32_t mask;
+   uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+   ring->head = 0;
+   ring->tail = 0;
+   ring->mask = size - 1;
+   ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+uint32_t max_num)
+{
+   uint32_t head, tail, mask, idx;
+   uint32_t num, i;
+
+   head = ring->head;
+   tail = ring->tail;
+   mask = ring->mask;
+   num  = tail - head;


Comment:
should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may be 
misinterpreted as overly large since it's `uint32_t`. Note that GCC and clang 
recognize `abs()` and treat it as a builtin, so there's no actual `stdlib.h` 
call here.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169793454
updated_at 2018-02-22 09:32:40