Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; Comment: OK > Petri Savolainen(psavol) wrote: > ``` > if (ring_st_is_empty(>s.ring_st)) > > if (ring_st_is_empty(>s.ring_st)) > > if (ring_st_is_empty(>s.ring_st) == 0) > > if (ring_st_is_empty(>s.ring_st)) > > if (ring_st_is_empty(>s.ring_st)) > > if (!ring_st_is_empty(>s.ring_st)) > > if (ring_st_is_empty(>s.ring_st)) > > if (ring_st_is_empty(>s.ring_st)) > ``` > > Find if clause for "ring is not empty" from picture above. >> Petri Savolainen(psavol) wrote: >> It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail >> is always ahead of head, but no more than queue size (e.g. 4k), So, e.g. >> when tail is 0, and head is 0x the number of items in ring is 0 - >> 0x = 1 >> >> ``` >> headtail >> hex[index], hex[index], num, num_free >> 0xfffe [6], 0x [7], 1, 7 >> 0x [7], 0x [0], 1, 7 >> 0x [0], 0x0001 [1], 1, 7 >> ``` >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Please see above discussion. The comment stands. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Not true. See the comment below where it's clear that `head` and `tail` are free running and thus will eventually wrap as described above. If these were `uint64_t` variables you could argue that the wraps would take centuries and hence are nothing to worry about, but being `uint32_t` variables this should be expected to happen regularly on heavily used queues. `abs()` adds no overhead since compilers treat it as an intrinsic. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Note that neither `head` nor `tail` get masked when doing enqueues or > dequeues, so they will traverse the entire 32-bit range of the variable > over time. While the ring itself will never hold more than `size` > entries, `head` and `tail` will wrap around. When you index into the ring > you do so via a mask (`idx = head & mask;`), but calculating the number > of elements in the ring doesn't do this, which is why `abs()` is needed. >> Petri Savolainen(psavol) wrote: >> Plus this is done only once - in pool create phase. >>> Petri Savolainen(psavol) wrote: >>> No since ring size will be much smaller than 4 billion. Petri Savolainen(psavol) wrote: The point is that ring size will never be close to 4 billion entries. E.g. currently tail is always max 4096 larger than head. Your example above is based assumption of 4 billion entry ring. Overflow is avoided when ring size if <2 billion, as 32 bit indexes can be still used to calculate number of items correctly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Agreed, this is good for now. Later we may wish to honor the >> user-requested queue `size` parameter. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Correct, as noted earlier. I withdraw that comment. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Presumably the compiler will see the overlap and optimize away > the redundancy, so I assume the performance impact will be nil > here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're allowing `head` and `tail` to run over the entire >> 32-bit range, you're correct that you can completely fill the >> ring. I did miss that
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; Comment: It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail is always ahead of head, but no more than queue size (e.g. 4k), So, e.g. when tail is 0, and head is 0x the number of items in ring is 0 - 0x = 1 ``` headtail hex[index], hex[index], num, num_free 0xfffe [6], 0x [7], 1, 7 0x [7], 0x [0], 1, 7 0x [0], 0x0001 [1], 1, 7 ``` > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Please see above discussion. The comment stands. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Not true. See the comment below where it's clear that `head` and `tail` are >> free running and thus will eventually wrap as described above. If these were >> `uint64_t` variables you could argue that the wraps would take centuries and >> hence are nothing to worry about, but being `uint32_t` variables this should >> be expected to happen regularly on heavily used queues. >> >> `abs()` adds no overhead since compilers treat it as an intrinsic. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Note that neither `head` nor `tail` get masked when doing enqueues or >>> dequeues, so they will traverse the entire 32-bit range of the variable >>> over time. While the ring itself will never hold more than `size` entries, >>> `head` and `tail` will wrap around. When you index into the ring you do so >>> via a mask (`idx = head & mask;`), but calculating the number of elements >>> in the ring doesn't do this, which is why `abs()` is needed. Petri Savolainen(psavol) wrote: Plus this is done only once - in pool create phase. > Petri Savolainen(psavol) wrote: > No since ring size will be much smaller than 4 billion. >> Petri Savolainen(psavol) wrote: >> The point is that ring size will never be close to 4 billion entries. >> E.g. currently tail is always max 4096 larger than head. Your example >> above is based assumption of 4 billion entry ring. Overflow is avoided >> when ring size if <2 billion, as 32 bit indexes can be still used to >> calculate number of items correctly. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Agreed, this is good for now. Later we may wish to honor the user-requested queue `size` parameter. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Correct, as noted earlier. I withdraw that comment. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Presumably the compiler will see the overlap and optimize away the >>> redundancy, so I assume the performance impact will be nil here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're allowing `head` and `tail` to run over the entire 32-bit range, you're correct that you can completely fill the ring. I did miss that point. However, as shown above you still need this to be: ``` num = size - abs(tail - head); ``` To avoid problems at the 32-bit wrap boundary. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > You're computing in 32 bits not 8 bits, and your ring size is > less than 2^32 elements. Consider the following test program: > ``` > #include > #include > #include > #include > > void main() { > uint32_t head[4]
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 78 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); Comment: Please see above discussion. The comment stands. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Not true. See the comment below where it's clear that `head` and `tail` are > free running and thus will eventually wrap as described above. If these were > `uint64_t` variables you could argue that the wraps would take centuries and > hence are nothing to worry about, but being `uint32_t` variables this should > be expected to happen regularly on heavily used queues. > > `abs()` adds no overhead since compilers treat it as an intrinsic. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Note that neither `head` nor `tail` get masked when doing enqueues or >> dequeues, so they will traverse the entire 32-bit range of the variable over >> time. While the ring itself will never hold more than `size` entries, `head` >> and `tail` will wrap around. When you index into the ring you do so via a >> mask (`idx = head & mask;`), but calculating the number of elements in the >> ring doesn't do this, which is why `abs()` is needed. >>> Petri Savolainen(psavol) wrote: >>> Plus this is done only once - in pool create phase. Petri Savolainen(psavol) wrote: No since ring size will be much smaller than 4 billion. > Petri Savolainen(psavol) wrote: > The point is that ring size will never be close to 4 billion entries. > E.g. currently tail is always max 4096 larger than head. Your example > above is based assumption of 4 billion entry ring. Overflow is avoided > when ring size if <2 billion, as 32 bit indexes can be still used to > calculate number of items correctly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Agreed. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed, this is good for now. Later we may wish to honor the >>> user-requested queue `size` parameter. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Agreed. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Correct, as noted earlier. I withdraw that comment. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Presumably the compiler will see the overlap and optimize away the >> redundancy, so I assume the performance impact will be nil here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Since you're allowing `head` and `tail` to run over the entire >>> 32-bit range, you're correct that you can completely fill the ring. >>> I did miss that point. However, as shown above you still need this >>> to be: >>> >>> ``` >>> num = size - abs(tail - head); >>> ``` >>> To avoid problems at the 32-bit wrap boundary. Bill Fischofer(Bill-Fischofer-Linaro) wrote: You're computing in 32 bits not 8 bits, and your ring
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; Comment: Not true. See the comment below where it's clear that `head` and `tail` are free running and thus will eventually wrap as described above. If these were `uint64_t` variables you could argue that the wraps would take centuries and hence are nothing to worry about, but being `uint32_t` variables this should be expected to happen regularly on heavily used queues. `abs()` adds no overhead since compilers treat it as an intrinsic. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Note that neither `head` nor `tail` get masked when doing enqueues or > dequeues, so they will traverse the entire 32-bit range of the variable over > time. While the ring itself will never hold more than `size` entries, `head` > and `tail` will wrap around. When you index into the ring you do so via a > mask (`idx = head & mask;`), but calculating the number of elements in the > ring doesn't do this, which is why `abs()` is needed. >> Petri Savolainen(psavol) wrote: >> Plus this is done only once - in pool create phase. >>> Petri Savolainen(psavol) wrote: >>> No since ring size will be much smaller than 4 billion. Petri Savolainen(psavol) wrote: The point is that ring size will never be close to 4 billion entries. E.g. currently tail is always max 4096 larger than head. Your example above is based assumption of 4 billion entry ring. Overflow is avoided when ring size if <2 billion, as 32 bit indexes can be still used to calculate number of items correctly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Agreed, this is good for now. Later we may wish to honor the >> user-requested queue `size` parameter. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Correct, as noted earlier. I withdraw that comment. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Presumably the compiler will see the overlap and optimize away the > redundancy, so I assume the performance impact will be nil here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're allowing `head` and `tail` to run over the entire >> 32-bit range, you're correct that you can completely fill the ring. >> I did miss that point. However, as shown above you still need this >> to be: >> >> ``` >> num = size - abs(tail - head); >> ``` >> To avoid problems at the 32-bit wrap boundary. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> You're computing in 32 bits not 8 bits, and your ring size is less >>> than 2^32 elements. Consider the following test program: >>> ``` >>> #include >>> #include >>> #include >>> #include >>> >>> void main() { >>> uint32_t head[4] = {0, 1, 2, 3}; >>> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; >>> uint32_t mask = 4095; >>> uint32_t result; >>> int i; >>> >>> for (i = 0; i < 4; i++) { >>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >>>head[i], tail[i]); >>> >>> result = tail[i] - head[i]; >>> printf("tail - head = %" PRIu32 "\n", result); >>> >>> result = (tail[i] - head[i]) & mask; >>> printf("(tail - head) & mask = %" PRIu32 "\n", result); >>> >>>
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_pool.c line 28 @@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool) memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr); /* Initialize buffer metadata */ - buf_hdr->index = i; + buf_hdr->index.u32= 0; + buf_hdr->index.pool = pool->pool_idx; + buf_hdr->index.buffer = i; Comment: Plus this is done only once - in pool create phase. > Petri Savolainen(psavol) wrote: > No since ring size will be much smaller than 4 billion. >> Petri Savolainen(psavol) wrote: >> The point is that ring size will never be close to 4 billion entries. E.g. >> currently tail is always max 4096 larger than head. Your example above is >> based assumption of 4 billion entry ring. Overflow is avoided when ring size >> if <2 billion, as 32 bit indexes can be still used to calculate number of >> items correctly. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Agreed, this is good for now. Later we may wish to honor the user-requested queue `size` parameter. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Correct, as noted earlier. I withdraw that comment. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Presumably the compiler will see the overlap and optimize away the >>> redundancy, so I assume the performance impact will be nil here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're allowing `head` and `tail` to run over the entire 32-bit range, you're correct that you can completely fill the ring. I did miss that point. However, as shown above you still need this to be: ``` num = size - abs(tail - head); ``` To avoid problems at the 32-bit wrap boundary. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > You're computing in 32 bits not 8 bits, and your ring size is less > than 2^32 elements. Consider the following test program: > ``` > #include > #include > #include > #include > > void main() { > uint32_t head[4] = {0, 1, 2, 3}; > uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; > uint32_t mask = 4095; > uint32_t result; > int i; > > for (i = 0; i < 4; i++) { > printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", > head[i], tail[i]); > > result = tail[i] - head[i]; > printf("tail - head = %" PRIu32 "\n", result); > > result = (tail[i] - head[i]) & mask; > printf("(tail - head) & mask = %" PRIu32 "\n", result); > > result = abs(tail[i] - head[i]); > printf("abs(tail - head) = %" PRIu32 "\n\n", result); > } > } > ``` > in theory `tail - head` should be the number of elements in the ring, > in this case 0, 2, 4, and 6. But running this test program gives the > following output: > ``` > head = 0 tail = 0: > tail - head = 0 > (tail - head) & mask = 0 > abs(tail - head) = 0 > > head = 1 tail = 4294967295: > tail - head = 4294967294 > (tail - head) & mask = 4094 > abs(tail - head) = 2 > > head = 2 tail = 4294967294: > tail - head = 4294967292 > (tail - head) & mask = 4092 > abs(tail - head) = 4 > > head = 3 tail = 4294967293: > tail - head = 4294967290 > (tail - head) & mask = 4090 > abs(tail - head) = 6 > ``` > Since you're allowing head to run free over the 32-bit range of the > variable, when the 32-bits rolls over you'll get a large positive > number, not the small one you need to stay within the ring bounds. > The alternative is to mask `head` and `tail` as you increment them, > but then you run into the effective range issue. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> OK >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> OK, the `ODP_ASSERT()` would still be useful for debugging. Bill Fischofer(Bill-Fischofer-Linaro) wrote: That functionality is not obvious from the name. It either implies that one of the input arguments is written (not true here) or the reader might assume that it is an expression without side-effect and should be deleted (what I originally thought when reading it). You
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 78 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); Comment: No since ring size will be much smaller than 4 billion. > Petri Savolainen(psavol) wrote: > The point is that ring size will never be close to 4 billion entries. E.g. > currently tail is always max 4096 larger than head. Your example above is > based assumption of 4 billion entry ring. Overflow is avoided when ring size > if <2 billion, as 32 bit indexes can be still used to calculate number of > items correctly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Agreed. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed, this is good for now. Later we may wish to honor the user-requested >>> queue `size` parameter. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Agreed. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Correct, as noted earlier. I withdraw that comment. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Presumably the compiler will see the overlap and optimize away the >> redundancy, so I assume the performance impact will be nil here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Since you're allowing `head` and `tail` to run over the entire 32-bit >>> range, you're correct that you can completely fill the ring. I did miss >>> that point. However, as shown above you still need this to be: >>> >>> ``` >>> num = size - abs(tail - head); >>> ``` >>> To avoid problems at the 32-bit wrap boundary. Bill Fischofer(Bill-Fischofer-Linaro) wrote: You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 elements. Consider the following test program: ``` #include #include #include #include void main() { uint32_t head[4] = {0, 1, 2, 3}; uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; uint32_t mask = 4095; uint32_t result; int i; for (i = 0; i < 4; i++) { printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", head[i], tail[i]); result = tail[i] - head[i]; printf("tail - head = %" PRIu32 "\n", result); result = (tail[i] - head[i]) & mask; printf("(tail - head) & mask = %" PRIu32 "\n", result); result = abs(tail[i] - head[i]); printf("abs(tail - head) = %" PRIu32 "\n\n", result); } } ``` in theory `tail - head` should be the number of elements in the ring, in this case 0, 2, 4, and 6. But running this test program gives the following output: ``` head = 0 tail = 0: tail - head = 0
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; Comment: The point is that ring size will never be close to 4 billion entries. E.g. currently tail is always max 4096 larger than head. Your example above is based assumption of 4 billion entry ring. Overflow is avoided when ring size if <2 billion, as 32 bit indexes can be still used to calculate number of items correctly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Agreed, this is good for now. Later we may wish to honor the user-requested >> queue `size` parameter. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Agreed. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Correct, as noted earlier. I withdraw that comment. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Presumably the compiler will see the overlap and optimize away the > redundancy, so I assume the performance impact will be nil here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're allowing `head` and `tail` to run over the entire 32-bit >> range, you're correct that you can completely fill the ring. I did miss >> that point. However, as shown above you still need this to be: >> >> ``` >> num = size - abs(tail - head); >> ``` >> To avoid problems at the 32-bit wrap boundary. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> You're computing in 32 bits not 8 bits, and your ring size is less than >>> 2^32 elements. Consider the following test program: >>> ``` >>> #include >>> #include >>> #include >>> #include >>> >>> void main() { >>> uint32_t head[4] = {0, 1, 2, 3}; >>> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; >>> uint32_t mask = 4095; >>> uint32_t result; >>> int i; >>> >>> for (i = 0; i < 4; i++) { >>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >>>head[i], tail[i]); >>> >>> result = tail[i] - head[i]; >>> printf("tail - head = %" PRIu32 "\n", result); >>> >>> result = (tail[i] - head[i]) & mask; >>> printf("(tail - head) & mask = %" PRIu32 "\n", result); >>> >>> result = abs(tail[i] - head[i]); >>> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >>> } >>> } >>> ``` >>> in theory `tail - head` should be the number of elements in the ring, >>> in this case 0, 2, 4, and 6. But running this test program gives the >>> following output: >>> ``` >>> head = 0 tail = 0: >>> tail - head = 0 >>> (tail - head) & mask = 0 >>> abs(tail - head) = 0 >>> >>> head = 1 tail = 4294967295: >>> tail - head = 4294967294 >>> (tail - head) & mask = 4094 >>> abs(tail - head) = 2 >>> >>> head = 2 tail = 4294967294: >>> tail - head = 4294967292 >>> (tail - head) & mask = 4092 >>> abs(tail - head) = 4 >>> >>> head = 3 tail = 4294967293: >>> tail - head = 4294967290 >>> (tail - head) & mask = 4090 >>> abs(tail - head) = 6 >>> ``` >>> Since you're allowing head to run free over the 32-bit range of the >>> variable, when the 32-bits rolls over you'll get a large positive >>> number, not the small one you need to stay within the ring bounds. The >>> alternative is to mask `head` and `tail` as you increment them, but >>> then you run into the effective range issue. Bill Fischofer(Bill-Fischofer-Linaro) wrote: OK >
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 103 @@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name, param = _param; } + if (param->size > CONFIG_QUEUE_SIZE) Comment: Agreed, this is good for now. Later we may wish to honor the user-requested queue `size` parameter. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Correct, as noted earlier. I withdraw that comment. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Presumably the compiler will see the overlap and optimize away the >>> redundancy, so I assume the performance impact will be nil here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're allowing `head` and `tail` to run over the entire 32-bit range, you're correct that you can completely fill the ring. I did miss that point. However, as shown above you still need this to be: ``` num = size - abs(tail - head); ``` To avoid problems at the 32-bit wrap boundary. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > You're computing in 32 bits not 8 bits, and your ring size is less than > 2^32 elements. Consider the following test program: > ``` > #include > #include > #include > #include > > void main() { > uint32_t head[4] = {0, 1, 2, 3}; > uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; > uint32_t mask = 4095; > uint32_t result; > int i; > > for (i = 0; i < 4; i++) { > printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", > head[i], tail[i]); > > result = tail[i] - head[i]; > printf("tail - head = %" PRIu32 "\n", result); > > result = (tail[i] - head[i]) & mask; > printf("(tail - head) & mask = %" PRIu32 "\n", result); > > result = abs(tail[i] - head[i]); > printf("abs(tail - head) = %" PRIu32 "\n\n", result); > } > } > ``` > in theory `tail - head` should be the number of elements in the ring, in > this case 0, 2, 4, and 6. But running this test program gives the > following output: > ``` > head = 0 tail = 0: > tail - head = 0 > (tail - head) & mask = 0 > abs(tail - head) = 0 > > head = 1 tail = 4294967295: > tail - head = 4294967294 > (tail - head) & mask = 4094 > abs(tail - head) = 2 > > head = 2 tail = 4294967294: > tail - head = 4294967292 > (tail - head) & mask = 4092 > abs(tail - head) = 4 > > head = 3 tail = 4294967293: > tail - head = 4294967290 > (tail - head) & mask = 4090 > abs(tail - head) = 6 > ``` > Since you're allowing head to run free over the 32-bit range of the > variable, when the 32-bits rolls over you'll get a large positive number, > not the small one you need to stay within the ring bounds. The > alternative is to mask `head` and `tail` as you increment them, but then > you run into the effective range issue. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> OK >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> OK, the `ODP_ASSERT()` would still be useful for debugging. Bill Fischofer(Bill-Fischofer-Linaro) wrote: That functionality is not obvious from the name. It either implies that one of the input arguments is written (not true here) or the reader might assume that it is an expression without side-effect and should be deleted (what I originally thought when reading it). You should pick a routine name that makes it clear it's actually doing something real, in this case performing prefetch processing. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Elsewhere you write `if (ring_st_is_empty(...))`, not `if > (ring_st_is_empty(...) == 1)` so this is inconsistent. >> Petri Savolainen(psavol) wrote: >> Didn't try larger than 32. 32 is already quite large from QoS point >> of view. >> >> I'm planning to use config file for run time tunning, so this hard >> coding may change in that phase. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: One entry is not lost. User provided size if not (currently) used. Queue size is always 4k. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> OK, added checks in v2. Petri Savolainen(psavol) wrote: OK. Compiler probably did that already, but changed
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 95 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size= CONFIG_QUEUE_SIZE; capa->plain.nonblocking = ODP_BLOCKING; capa->sched.max_num = capa->max_queues; + capa->sched.max_size= CONFIG_QUEUE_SIZE; Comment: Agreed. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Correct, as noted earlier. I withdraw that comment. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Presumably the compiler will see the overlap and optimize away the >> redundancy, so I assume the performance impact will be nil here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Since you're allowing `head` and `tail` to run over the entire 32-bit >>> range, you're correct that you can completely fill the ring. I did miss >>> that point. However, as shown above you still need this to be: >>> >>> ``` >>> num = size - abs(tail - head); >>> ``` >>> To avoid problems at the 32-bit wrap boundary. Bill Fischofer(Bill-Fischofer-Linaro) wrote: You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 elements. Consider the following test program: ``` #include #include #include #include void main() { uint32_t head[4] = {0, 1, 2, 3}; uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; uint32_t mask = 4095; uint32_t result; int i; for (i = 0; i < 4; i++) { printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", head[i], tail[i]); result = tail[i] - head[i]; printf("tail - head = %" PRIu32 "\n", result); result = (tail[i] - head[i]) & mask; printf("(tail - head) & mask = %" PRIu32 "\n", result); result = abs(tail[i] - head[i]); printf("abs(tail - head) = %" PRIu32 "\n\n", result); } } ``` in theory `tail - head` should be the number of elements in the ring, in this case 0, 2, 4, and 6. But running this test program gives the following output: ``` head = 0 tail = 0: tail - head = 0 (tail - head) & mask = 0 abs(tail - head) = 0 head = 1 tail = 4294967295: tail - head = 4294967294 (tail - head) & mask = 4094 abs(tail - head) = 2 head = 2 tail = 4294967294: tail - head = 4294967292 (tail - head) & mask = 4092 abs(tail - head) = 4 head = 3 tail = 4294967293: tail - head = 4294967290 (tail - head) & mask = 4090 abs(tail - head) = 6 ``` Since you're allowing head to run free over the 32-bit range of the variable, when the 32-bits rolls over you'll get a large positive number, not the small one you need to stay within the ring bounds. The alternative is to mask `head` and `tail` as you increment them, but then you run into the effective range issue. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > OK >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> OK, the `ODP_ASSERT()` would still be useful for debugging. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> That functionality is not obvious from the name. It either implies that >>> one of the input arguments is written (not true here) or the reader >>> might assume that it is an expression without side-effect and should be >>> deleted (what I originally thought when reading it). You should pick a >>> routine name that makes it clear it's actually doing something real, in >>> this case performing prefetch processing. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Elsewhere you write `if (ring_st_is_empty(...))`, not `if (ring_st_is_empty(...) == 1)` so this is inconsistent. > Petri Savolainen(psavol) wrote: > Didn't try larger than 32. 32 is already quite large from QoS point > of view. > > I'm planning to use config file for run time tunning, so this hard > coding may change in that phase. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. User provided size if not (currently) used. >>> Queue size is always 4k. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> OK, added checks in v2. >>> Petri Savolainen(psavol) wrote: >>> OK.
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 92 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size= CONFIG_QUEUE_SIZE; Comment: Correct, as noted earlier. I withdraw that comment. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Presumably the compiler will see the overlap and optimize away the > redundancy, so I assume the performance impact will be nil here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're allowing `head` and `tail` to run over the entire 32-bit range, >> you're correct that you can completely fill the ring. I did miss that point. >> However, as shown above you still need this to be: >> >> ``` >> num = size - abs(tail - head); >> ``` >> To avoid problems at the 32-bit wrap boundary. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> You're computing in 32 bits not 8 bits, and your ring size is less than >>> 2^32 elements. Consider the following test program: >>> ``` >>> #include >>> #include >>> #include >>> #include >>> >>> void main() { >>> uint32_t head[4] = {0, 1, 2, 3}; >>> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; >>> uint32_t mask = 4095; >>> uint32_t result; >>> int i; >>> >>> for (i = 0; i < 4; i++) { >>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >>>head[i], tail[i]); >>> >>> result = tail[i] - head[i]; >>> printf("tail - head = %" PRIu32 "\n", result); >>> >>> result = (tail[i] - head[i]) & mask; >>> printf("(tail - head) & mask = %" PRIu32 "\n", result); >>> >>> result = abs(tail[i] - head[i]); >>> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >>> } >>> } >>> ``` >>> in theory `tail - head` should be the number of elements in the ring, in >>> this case 0, 2, 4, and 6. But running this test program gives the following >>> output: >>> ``` >>> head = 0 tail = 0: >>> tail - head = 0 >>> (tail - head) & mask = 0 >>> abs(tail - head) = 0 >>> >>> head = 1 tail = 4294967295: >>> tail - head = 4294967294 >>> (tail - head) & mask = 4094 >>> abs(tail - head) = 2 >>> >>> head = 2 tail = 4294967294: >>> tail - head = 4294967292 >>> (tail - head) & mask = 4092 >>> abs(tail - head) = 4 >>> >>> head = 3 tail = 4294967293: >>> tail - head = 4294967290 >>> (tail - head) & mask = 4090 >>> abs(tail - head) = 6 >>> ``` >>> Since you're allowing head to run free over the 32-bit range of the >>> variable, when the 32-bits rolls over you'll get a large positive number, >>> not the small one you need to stay within the ring bounds. The alternative >>> is to mask `head` and `tail` as you increment them, but then you run into >>> the effective range issue. Bill Fischofer(Bill-Fischofer-Linaro) wrote: OK > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > OK, the `ODP_ASSERT()` would still be useful for debugging. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> That functionality is not obvious from the name. It either implies that >> one of the input arguments is written (not true here) or the reader >> might assume that it is an expression without side-effect and should be >> deleted (what I originally thought when reading it). You should pick a >> routine name that makes it clear it's actually doing something real, in >> this case performing prefetch processing. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>> (ring_st_is_empty(...) == 1)` so this is inconsistent. Petri Savolainen(psavol) wrote: Didn't try larger than 32. 32 is already quite large from QoS point of view. I'm planning to use config file for run time tunning, so this hard coding may change in that phase. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. User provided size if not (currently) used. >> Queue size is always 4k. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > OK, added checks in v2. >> Petri Savolainen(psavol) wrote: >> OK. Compiler probably did that already, but changed in v2. >>> Petri Savolainen(psavol) wrote: >>> Tail and head indexes are (masked from) uint32_t and do not >>> wrap around when the ring is full. I think you assume that the >>> store index is
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_pool.c line 28 @@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool) memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr); /* Initialize buffer metadata */ - buf_hdr->index = i; + buf_hdr->index.u32= 0; + buf_hdr->index.pool = pool->pool_idx; + buf_hdr->index.buffer = i; Comment: Presumably the compiler will see the overlap and optimize away the redundancy, so I assume the performance impact will be nil here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Since you're allowing `head` and `tail` to run over the entire 32-bit range, > you're correct that you can completely fill the ring. I did miss that point. > However, as shown above you still need this to be: > > ``` > num = size - abs(tail - head); > ``` > To avoid problems at the 32-bit wrap boundary. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 >> elements. Consider the following test program: >> ``` >> #include >> #include >> #include >> #include >> >> void main() { >> uint32_t head[4] = {0, 1, 2, 3}; >> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; >> uint32_t mask = 4095; >> uint32_t result; >> int i; >> >> for (i = 0; i < 4; i++) { >> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >> head[i], tail[i]); >> >> result = tail[i] - head[i]; >> printf("tail - head = %" PRIu32 "\n", result); >> >> result = (tail[i] - head[i]) & mask; >> printf("(tail - head) & mask = %" PRIu32 "\n", result); >> >> result = abs(tail[i] - head[i]); >> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >> } >> } >> ``` >> in theory `tail - head` should be the number of elements in the ring, in >> this case 0, 2, 4, and 6. But running this test program gives the following >> output: >> ``` >> head = 0 tail = 0: >> tail - head = 0 >> (tail - head) & mask = 0 >> abs(tail - head) = 0 >> >> head = 1 tail = 4294967295: >> tail - head = 4294967294 >> (tail - head) & mask = 4094 >> abs(tail - head) = 2 >> >> head = 2 tail = 4294967294: >> tail - head = 4294967292 >> (tail - head) & mask = 4092 >> abs(tail - head) = 4 >> >> head = 3 tail = 4294967293: >> tail - head = 4294967290 >> (tail - head) & mask = 4090 >> abs(tail - head) = 6 >> ``` >> Since you're allowing head to run free over the 32-bit range of the >> variable, when the 32-bits rolls over you'll get a large positive number, >> not the small one you need to stay within the ring bounds. The alternative >> is to mask `head` and `tail` as you increment them, but then you run into >> the effective range issue. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> OK Bill Fischofer(Bill-Fischofer-Linaro) wrote: OK, the `ODP_ASSERT()` would still be useful for debugging. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > That functionality is not obvious from the name. It either implies that > one of the input arguments is written (not true here) or the reader might > assume that it is an expression without side-effect and should be deleted > (what I originally thought when reading it). You should pick a routine > name that makes it clear it's actually doing something real, in this case > performing prefetch processing. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>> Petri Savolainen(psavol) wrote: >>> Didn't try larger than 32. 32 is already quite large from QoS point of >>> view. >>> >>> I'm planning to use config file for run time tunning, so this hard >>> coding may change in that phase. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. User provided size if not (currently) used. > Queue size is always 4k. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: OK, added checks in v2. > Petri Savolainen(psavol) wrote: > OK. Compiler probably did that already, but changed in v2. >> Petri Savolainen(psavol) wrote: >> Tail and head indexes are (masked from) uint32_t and do not wrap >> around when the ring is full. I think you assume that the store >> index is 0...size-1, while it's full uint32_t which is then >> masked to get the actual index. >> >> For example:
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 420 @@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char *name, queue->s.pktin = PKTIN_INVALID; queue->s.pktout = PKTOUT_INVALID; - queue->s.head = NULL; - queue->s.tail = NULL; + ring_st_init(>s.ring_st, +queue_tbl->ring_data[queue->s.index].data, +CONFIG_QUEUE_SIZE); Comment: Agreed. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Agreed, this is good for now. Later we may wish to honor the user-requested > queue `size` parameter. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Agreed. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Correct, as noted earlier. I withdraw that comment. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Presumably the compiler will see the overlap and optimize away the redundancy, so I assume the performance impact will be nil here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Since you're allowing `head` and `tail` to run over the entire 32-bit > range, you're correct that you can completely fill the ring. I did miss > that point. However, as shown above you still need this to be: > > ``` > num = size - abs(tail - head); > ``` > To avoid problems at the 32-bit wrap boundary. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> You're computing in 32 bits not 8 bits, and your ring size is less than >> 2^32 elements. Consider the following test program: >> ``` >> #include >> #include >> #include >> #include >> >> void main() { >> uint32_t head[4] = {0, 1, 2, 3}; >> uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; >> uint32_t mask = 4095; >> uint32_t result; >> int i; >> >> for (i = 0; i < 4; i++) { >> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >> head[i], tail[i]); >> >> result = tail[i] - head[i]; >> printf("tail - head = %" PRIu32 "\n", result); >> >> result = (tail[i] - head[i]) & mask; >> printf("(tail - head) & mask = %" PRIu32 "\n", result); >> >> result = abs(tail[i] - head[i]); >> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >> } >> } >> ``` >> in theory `tail - head` should be the number of elements in the ring, in >> this case 0, 2, 4, and 6. But running this test program gives the >> following output: >> ``` >> head = 0 tail = 0: >> tail - head = 0 >> (tail - head) & mask = 0 >> abs(tail - head) = 0 >> >> head = 1 tail = 4294967295: >> tail - head = 4294967294 >> (tail - head) & mask = 4094 >> abs(tail - head) = 2 >> >> head = 2 tail = 4294967294: >> tail - head = 4294967292 >> (tail - head) & mask = 4092 >> abs(tail - head) = 4 >> >> head = 3 tail = 4294967293: >> tail - head = 4294967290 >> (tail - head) & mask = 4090 >> abs(tail - head) = 6 >> ``` >> Since you're allowing head to run free over the 32-bit range of the >> variable, when the 32-bits rolls over you'll get a large positive >> number, not the small one you need to stay within the ring bounds. The >> alternative is to mask `head` and `tail` as you increment them, but then >> you run into the effective range issue. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> OK Bill Fischofer(Bill-Fischofer-Linaro) wrote: OK, the `ODP_ASSERT()` would still be useful for debugging. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > That functionality is not obvious from the name. It either implies > that one of the input arguments is written (not true here) or the > reader might assume that it is an expression without side-effect and > should be deleted (what I originally thought when reading it). You > should pick a routine name that makes it clear it's actually doing > something real, in this case performing prefetch processing. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>> Petri Savolainen(psavol) wrote: >>> Didn't try larger than 32. 32 is already quite large from QoS point >>> of view. >>> >>> I'm planning to use config file for run time tunning, so this hard >>> coding may change in that phase. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. User provided size if not (currently) > used. Queue size is always 4k. >> Petri
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 78 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); Comment: Since you're allowing `head` and `tail` to run over the entire 32-bit range, you're correct that you can completely fill the ring. I did miss that point. However, as shown above you still need this to be: ``` num = size - abs(tail - head); ``` To avoid problems at the 32-bit wrap boundary. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 > elements. Consider the following test program: > ``` > #include > #include > #include > #include > > void main() { > uint32_t head[4] = {0, 1, 2, 3}; > uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; > uint32_t mask = 4095; > uint32_t result; > int i; > > for (i = 0; i < 4; i++) { > printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", > head[i], tail[i]); > > result = tail[i] - head[i]; > printf("tail - head = %" PRIu32 "\n", result); > > result = (tail[i] - head[i]) & mask; > printf("(tail - head) & mask = %" PRIu32 "\n", result); > > result = abs(tail[i] - head[i]); > printf("abs(tail - head) = %" PRIu32 "\n\n", result); > } > } > ``` > in theory `tail - head` should be the number of elements in the ring, in this > case 0, 2, 4, and 6. But running this test program gives the following output: > ``` > head = 0 tail = 0: > tail - head = 0 > (tail - head) & mask = 0 > abs(tail - head) = 0 > > head = 1 tail = 4294967295: > tail - head = 4294967294 > (tail - head) & mask = 4094 > abs(tail - head) = 2 > > head = 2 tail = 4294967294: > tail - head = 4294967292 > (tail - head) & mask = 4092 > abs(tail - head) = 4 > > head = 3 tail = 4294967293: > tail - head = 4294967290 > (tail - head) & mask = 4090 > abs(tail - head) = 6 > ``` > Since you're allowing head to run free over the 32-bit range of the variable, > when the 32-bits rolls over you'll get a large positive number, not the small > one you need to stay within the ring bounds. The alternative is to mask > `head` and `tail` as you increment them, but then you run into the effective > range issue. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> OK >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> OK, the `ODP_ASSERT()` would still be useful for debugging. Bill Fischofer(Bill-Fischofer-Linaro) wrote: That functionality is not obvious from the name. It either implies that one of the input arguments is written (not true here) or the reader might assume that it is an expression without side-effect and should be deleted (what I originally thought when reading it). You should pick a routine name that makes it clear it's actually doing something real, in this case performing prefetch processing. > Bill
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; Comment: You're computing in 32 bits not 8 bits, and your ring size is less than 2^32 elements. Consider the following test program: ``` #include #include #include #include void main() { uint32_t head[4] = {0, 1, 2, 3}; uint32_t tail[4] = {0, 0x, 0xfffe, 0xfffd}; uint32_t mask = 4095; uint32_t result; int i; for (i = 0; i < 4; i++) { printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", head[i], tail[i]); result = tail[i] - head[i]; printf("tail - head = %" PRIu32 "\n", result); result = (tail[i] - head[i]) & mask; printf("(tail - head) & mask = %" PRIu32 "\n", result); result = abs(tail[i] - head[i]); printf("abs(tail - head) = %" PRIu32 "\n\n", result); } } ``` in theory `tail - head` should be the number of elements in the ring, in this case 0, 2, 4, and 6. But running this test program gives the following output: ``` head = 0 tail = 0: tail - head = 0 (tail - head) & mask = 0 abs(tail - head) = 0 head = 1 tail = 4294967295: tail - head = 4294967294 (tail - head) & mask = 4094 abs(tail - head) = 2 head = 2 tail = 4294967294: tail - head = 4294967292 (tail - head) & mask = 4092 abs(tail - head) = 4 head = 3 tail = 4294967293: tail - head = 4294967290 (tail - head) & mask = 4090 abs(tail - head) = 6 ``` Since you're allowing head to run free over the 32-bit range of the variable, when the 32-bits rolls over you'll get a large positive number, not the small one you need to stay within the ring bounds. The alternative is to mask `head` and `tail` as you increment them, but then you run into the effective range issue. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > OK >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> OK, the `ODP_ASSERT()` would still be useful for debugging. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> That functionality is not obvious from the name. It either implies that one >>> of the input arguments is written (not true here) or the reader might >>> assume that it is an expression without side-effect and should be deleted >>> (what I originally thought when reading it). You should pick a routine name >>> that makes it clear it's actually doing something real, in this case >>> performing prefetch processing. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Elsewhere you write `if (ring_st_is_empty(...))`, not `if (ring_st_is_empty(...) == 1)` so this is inconsistent. > Petri Savolainen(psavol) wrote: > Didn't try larger than 32. 32 is already quite large from QoS point of > view. > > I'm planning to use config file for run time tunning, so this hard coding > may change in that phase. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. User provided size if not (currently) used. >>> Queue size is always 4k. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> OK, added checks in v2. >>> Petri Savolainen(psavol) wrote: >>> OK. Compiler probably did that already, but changed in v2. Petri Savolainen(psavol) wrote: Tail and head indexes are (masked from) uint32_t and do not wrap around when the ring is full. I think you assume that the store index is 0...size-1, while it's full uint32_t
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 24 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + Comment: OK > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > OK, the `ODP_ASSERT()` would still be useful for debugging. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> That functionality is not obvious from the name. It either implies that one >> of the input arguments is written (not true here) or the reader might assume >> that it is an expression without side-effect and should be deleted (what I >> originally thought when reading it). You should pick a routine name that >> makes it clear it's actually doing something real, in this case performing >> prefetch processing. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>> (ring_st_is_empty(...) == 1)` so this is inconsistent. Petri Savolainen(psavol) wrote: Didn't try larger than 32. 32 is already quite large from QoS point of view. I'm planning to use config file for run time tunning, so this hard coding may change in that phase. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. User provided size if not (currently) used. Queue >> size is always 4k. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > OK, added checks in v2. >> Petri Savolainen(psavol) wrote: >> OK. Compiler probably did that already, but changed in v2. >>> Petri Savolainen(psavol) wrote: >>> Tail and head indexes are (masked from) uint32_t and do not wrap >>> around when the ring is full. I think you assume that the store >>> index is 0...size-1, while it's full uint32_t which is then masked >>> to get the actual index. >>> >>> For example: >>> size = 100; >>> >>> Empty: >>> head = 100 >>> tail = 100 >>> num = 100 - 100 = 0 >>> >>> Full: >>> head = 100 >>> tail = 200 >>> num = 200 - 100 = 100 >>> >>> Wrap uint32_t + full: >>> head = 0xFF9C >>> tail = 0 >>> num = 0 - 0xFF9C = 0x64 = 100 >>> >>> So, no abs() needed. Ring size can be 4096, instead of 4095. Petri Savolainen(psavol) wrote: It's already documented 5 lines above: /* Initialize ring. Ring size must be a power of two. */ static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) { > Petri Savolainen(psavol) wrote: > This function converts 32 bit buffer indexes to buffer header > pointers. The counter operation is buffer_index_from_buf(). The > prefetch is a side effect of the function, which may be > changed/moved any time if it's found out that there's a place for > prefetching. I actually plan to test if number of prefetches > should be limited as e.g. 32 consecutive prefetches may be too > much for some CPU architectures. >> Petri Savolainen(psavol) wrote: >> I prefer style where '== 0' is used instead of '!'. Especially, >> when the if clause is as complex as this and there's danger for >> reader to miss the '!' sign. >>> Petri Savolainen(psavol) wrote: >>> It's there to ensure that all bits are zero also when someone >>> would modify the bitfield from two to three fields later on. >>> Similarly to memset() zero is used for struct inits. Petri Savolainen(psavol) wrote: There's no need for abs(). Since it's all uint32_t variables, wrap a round is handled already. An example in 8bits: 0xff - 0xfd = 0x02 0x00 - 0xfe = 0x02 0x01 - 0xff = 0x02 0x02 - 0x00 = 0x02 This passes both gcc and clang, and is used already in the other ring implementation see ring_deq_multi().
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 32 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; Comment: OK, the `ODP_ASSERT()` would still be useful for debugging. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > That functionality is not obvious from the name. It either implies that one > of the input arguments is written (not true here) or the reader might assume > that it is an expression without side-effect and should be deleted (what I > originally thought when reading it). You should pick a routine name that > makes it clear it's actually doing something real, in this case performing > prefetch processing. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>> Petri Savolainen(psavol) wrote: >>> Didn't try larger than 32. 32 is already quite large from QoS point of >>> view. >>> >>> I'm planning to use config file for run time tunning, so this hard coding >>> may change in that phase. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. User provided size if not (currently) used. Queue > size is always 4k. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: OK, added checks in v2. > Petri Savolainen(psavol) wrote: > OK. Compiler probably did that already, but changed in v2. >> Petri Savolainen(psavol) wrote: >> Tail and head indexes are (masked from) uint32_t and do not wrap >> around when the ring is full. I think you assume that the store >> index is 0...size-1, while it's full uint32_t which is then masked >> to get the actual index. >> >> For example: >> size = 100; >> >> Empty: >> head = 100 >> tail = 100 >> num = 100 - 100 = 0 >> >> Full: >> head = 100 >> tail = 200 >> num = 200 - 100 = 100 >> >> Wrap uint32_t + full: >> head = 0xFF9C >> tail = 0 >> num = 0 - 0xFF9C = 0x64 = 100 >> >> So, no abs() needed. Ring size can be 4096, instead of 4095. >>> Petri Savolainen(psavol) wrote: >>> It's already documented 5 lines above: >>> >>> /* Initialize ring. Ring size must be a power of two. */ >>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, >>> uint32_t size) >>> { Petri Savolainen(psavol) wrote: This function converts 32 bit buffer indexes to buffer header pointers. The counter operation is buffer_index_from_buf(). The prefetch is a side effect of the function, which may be changed/moved any time if it's found out that there's a place for prefetching. I actually plan to test if number of prefetches should be limited as e.g. 32 consecutive prefetches may be too much for some CPU architectures. > Petri Savolainen(psavol) wrote: > I prefer style where '== 0' is used instead of '!'. Especially, > when the if clause is as complex as this and there's danger for > reader to miss the '!' sign. >> Petri Savolainen(psavol) wrote: >> It's there to ensure that all bits are zero also when someone >> would modify the bitfield from two to three fields later on. >> Similarly to memset() zero is used for struct inits. >>> Petri Savolainen(psavol) wrote: >>> There's no need for abs(). Since it's all uint32_t variables, >>> wrap a round is handled already. >>> An example in 8bits: >>> 0xff - 0xfd = 0x02 >>> 0x00 - 0xfe = 0x02 >>> 0x01 - 0xff = 0x02 >>> 0x02 - 0x00 = 0x02 >>> >>> This passes both gcc and clang, and is
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c @@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, odp_buffer_hdr_t *buf_hdr[], } UNLOCK(>s.lock); - return 0; - } - - for (i = 0; i < num && hdr; ) { - int burst_num = hdr->burst_num; - int first = hdr->burst_first; - /* First, get bursted buffers */ - for (j = 0; j < burst_num && i < num; j++, i++) { - buf_hdr[i] = hdr->burst[first + j]; - odp_prefetch(buf_hdr[i]); - } - - if (burst_num) { - hdr->burst_num = burst_num - j; - hdr->burst_first = first + j; - } - - if (i == num) - break; - - /* When burst is empty, consume the current buffer header and -* move to the next header */ - buf_hdr[i] = hdr; - next = hdr->next; - hdr->next = NULL; - hdr= next; - updated++; - i++; + return 0; } - /* Write head only if updated */ - if (updated) - queue->s.head = hdr; - - /* Queue is empty */ - if (hdr == NULL) - queue->s.tail = NULL; - if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED) sched_fn->save_context(queue->s.index); UNLOCK(>s.lock); - return i; + buffer_index_to_buf(buf_hdr, buf_idx, num_deq); Comment: That functionality is not obvious from the name. It either implies that one of the input arguments is written (not true here) or the reader might assume that it is an expression without side-effect and should be deleted (what I originally thought when reading it). You should pick a routine name that makes it clear it's actually doing something real, in this case performing prefetch processing. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Elsewhere you write `if (ring_st_is_empty(...))`, not `if > (ring_st_is_empty(...) == 1)` so this is inconsistent. >> Petri Savolainen(psavol) wrote: >> Didn't try larger than 32. 32 is already quite large from QoS point of view. >> >> I'm planning to use config file for run time tunning, so this hard coding >> may change in that phase. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: One entry is not lost. User provided size if not (currently) used. Queue size is always 4k. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> OK, added checks in v2. Petri Savolainen(psavol) wrote: OK. Compiler probably did that already, but changed in v2. > Petri Savolainen(psavol) wrote: > Tail and head indexes are (masked from) uint32_t and do not wrap > around when the ring is full. I think you assume that the store index > is 0...size-1, while it's full uint32_t which is then masked to get > the actual index. > > For example: > size = 100; > > Empty: > head = 100 > tail = 100 > num = 100 - 100 = 0 > > Full: > head = 100 > tail = 200 > num = 200 - 100 = 100 > > Wrap uint32_t + full: > head = 0xFF9C > tail = 0 > num = 0 - 0xFF9C = 0x64 = 100 > > So, no abs() needed. Ring size can be 4096, instead of 4095. >> Petri Savolainen(psavol) wrote: >> It's already documented 5 lines above: >> >> /* Initialize ring. Ring size must be a power of two. */ >> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, >> uint32_t size) >> { >>> Petri Savolainen(psavol) wrote: >>> This function converts 32 bit buffer indexes to buffer header >>> pointers. The counter operation is buffer_index_from_buf(). The >>> prefetch is a side effect of the function, which may be >>> changed/moved any time if it's found out that there's a place for >>> prefetching. I actually plan to test if number of prefetches should >>> be limited as e.g. 32 consecutive prefetches may be too much for >>> some CPU architectures. Petri Savolainen(psavol) wrote: I prefer style where '== 0' is used instead of '!'. Especially, when the if clause is as complex as this and there's danger for reader to miss the '!' sign. > Petri Savolainen(psavol) wrote: > It's there to ensure that all bits are zero also when
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c @@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle) ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name); return -1; } - if (queue->s.head != NULL) { + if (ring_st_is_empty(>s.ring_st) == 0) { Comment: Elsewhere you write `if (ring_st_is_empty(...))`, not `if (ring_st_is_empty(...) == 1)` so this is inconsistent. > Petri Savolainen(psavol) wrote: > Didn't try larger than 32. 32 is already quite large from QoS point of view. > > I'm planning to use config file for run time tunning, so this hard coding may > change in that phase. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. User provided size if not (currently) used. Queue >>> size is always 4k. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> OK, added checks in v2. >>> Petri Savolainen(psavol) wrote: >>> OK. Compiler probably did that already, but changed in v2. Petri Savolainen(psavol) wrote: Tail and head indexes are (masked from) uint32_t and do not wrap around when the ring is full. I think you assume that the store index is 0...size-1, while it's full uint32_t which is then masked to get the actual index. For example: size = 100; Empty: head = 100 tail = 100 num = 100 - 100 = 0 Full: head = 100 tail = 200 num = 200 - 100 = 100 Wrap uint32_t + full: head = 0xFF9C tail = 0 num = 0 - 0xFF9C = 0x64 = 100 So, no abs() needed. Ring size can be 4096, instead of 4095. > Petri Savolainen(psavol) wrote: > It's already documented 5 lines above: > > /* Initialize ring. Ring size must be a power of two. */ > static inline void ring_st_init(ring_st_t *ring, uint32_t *data, > uint32_t size) > { >> Petri Savolainen(psavol) wrote: >> This function converts 32 bit buffer indexes to buffer header >> pointers. The counter operation is buffer_index_from_buf(). The >> prefetch is a side effect of the function, which may be >> changed/moved any time if it's found out that there's a place for >> prefetching. I actually plan to test if number of prefetches should >> be limited as e.g. 32 consecutive prefetches may be too much for >> some CPU architectures. >>> Petri Savolainen(psavol) wrote: >>> I prefer style where '== 0' is used instead of '!'. Especially, >>> when the if clause is as complex as this and there's danger for >>> reader to miss the '!' sign. Petri Savolainen(psavol) wrote: It's there to ensure that all bits are zero also when someone would modify the bitfield from two to three fields later on. Similarly to memset() zero is used for struct inits. > Petri Savolainen(psavol) wrote: > There's no need for abs(). Since it's all uint32_t variables, > wrap a round is handled already. > An example in 8bits: > 0xff - 0xfd = 0x02 > 0x00 - 0xfe = 0x02 > 0x01 - 0xff = 0x02 > 0x02 - 0x00 = 0x02 > > This passes both gcc and clang, and is used already in the other > ring implementation see ring_deq_multi(). >> Petri Savolainen(psavol) wrote: >> I prefer style with blank line in the end of a typedef, since >> it's easier to spot the type name (as it's not mixed into struct >> field names). Checkpatch passes so this style should be OK. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Does this mean that sizes larger than 32 have no added >>> performance benefit? Bill Fischofer(Bill-Fischofer-Linaro) wrote: Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going to use the user-supplied queue size. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Given its name, this looks like an extraneous statement that > should be deleted. Renaming this to something like > `prefetch_dequeued_bufs()` would make the intent clearer here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> `if (!ring_st_is_empty(>s.ring_st))` seems more >> natural here. >>> Bill Fischofer(Bill-Fischofer-Linaro)
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c line 420 @@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char *name, queue->s.pktin = PKTIN_INVALID; queue->s.pktout = PKTOUT_INVALID; - queue->s.head = NULL; - queue->s.tail = NULL; + ring_st_init(>s.ring_st, +queue_tbl->ring_data[queue->s.index].data, +CONFIG_QUEUE_SIZE); Comment: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. User provided size if not (currently) used. Queue size > is always 4k. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: OK, added checks in v2. > Petri Savolainen(psavol) wrote: > OK. Compiler probably did that already, but changed in v2. >> Petri Savolainen(psavol) wrote: >> Tail and head indexes are (masked from) uint32_t and do not wrap around >> when the ring is full. I think you assume that the store index is >> 0...size-1, while it's full uint32_t which is then masked to get the >> actual index. >> >> For example: >> size = 100; >> >> Empty: >> head = 100 >> tail = 100 >> num = 100 - 100 = 0 >> >> Full: >> head = 100 >> tail = 200 >> num = 200 - 100 = 100 >> >> Wrap uint32_t + full: >> head = 0xFF9C >> tail = 0 >> num = 0 - 0xFF9C = 0x64 = 100 >> >> So, no abs() needed. Ring size can be 4096, instead of 4095. >>> Petri Savolainen(psavol) wrote: >>> It's already documented 5 lines above: >>> >>> /* Initialize ring. Ring size must be a power of two. */ >>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, >>> uint32_t size) >>> { Petri Savolainen(psavol) wrote: This function converts 32 bit buffer indexes to buffer header pointers. The counter operation is buffer_index_from_buf(). The prefetch is a side effect of the function, which may be changed/moved any time if it's found out that there's a place for prefetching. I actually plan to test if number of prefetches should be limited as e.g. 32 consecutive prefetches may be too much for some CPU architectures. > Petri Savolainen(psavol) wrote: > I prefer style where '== 0' is used instead of '!'. Especially, when > the if clause is as complex as this and there's danger for reader to > miss the '!' sign. >> Petri Savolainen(psavol) wrote: >> It's there to ensure that all bits are zero also when someone would >> modify the bitfield from two to three fields later on. Similarly to >> memset() zero is used for struct inits. >>> Petri Savolainen(psavol) wrote: >>> There's no need for abs(). Since it's all uint32_t variables, wrap >>> a round is handled already. >>> An example in 8bits: >>> 0xff - 0xfd = 0x02 >>> 0x00 - 0xfe = 0x02 >>> 0x01 - 0xff = 0x02 >>> 0x02 - 0x00 = 0x02 >>> >>> This passes both gcc and clang, and is used already in the other >>> ring implementation see ring_deq_multi(). Petri Savolainen(psavol) wrote: I prefer style with blank line in the end of a typedef, since it's easier to spot the type name (as it's not mixed into struct field names). Checkpatch passes so this style should be OK. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Does this mean that sizes larger than 32 have no added > performance benefit? >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if >> we're not going to use the user-supplied queue size. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Given its name, this looks like an extraneous statement that >>> should be deleted. Renaming this to something like >>> `prefetch_dequeued_bufs()` would make the intent clearer here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle > the effective queue capacity. The user-supplied `size` should > then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking > to work properly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Same comment here as for plain queues. >>> Bill
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_config_internal.h line 5 @@ -144,7 +144,7 @@ extern "C" { * This controls the burst size on various enqueue, dequeue, etc calls. Large * burst size improves throughput, but may degrade QoS (increase latency). */ -#define CONFIG_BURST_SIZE 16 +#define CONFIG_BURST_SIZE 32 Comment: Didn't try larger than 32. 32 is already quite large from QoS point of view. I'm planning to use config file for run time tunning, so this hard coding may change in that phase. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. User provided size if not (currently) used. Queue >> size is always 4k. >>> Petri Savolainen(psavol) wrote: >>> One entry is not lost. Petri Savolainen(psavol) wrote: One entry is not lost. > Petri Savolainen(psavol) wrote: > OK, added checks in v2. >> Petri Savolainen(psavol) wrote: >> OK. Compiler probably did that already, but changed in v2. >>> Petri Savolainen(psavol) wrote: >>> Tail and head indexes are (masked from) uint32_t and do not wrap around >>> when the ring is full. I think you assume that the store index is >>> 0...size-1, while it's full uint32_t which is then masked to get the >>> actual index. >>> >>> For example: >>> size = 100; >>> >>> Empty: >>> head = 100 >>> tail = 100 >>> num = 100 - 100 = 0 >>> >>> Full: >>> head = 100 >>> tail = 200 >>> num = 200 - 100 = 100 >>> >>> Wrap uint32_t + full: >>> head = 0xFF9C >>> tail = 0 >>> num = 0 - 0xFF9C = 0x64 = 100 >>> >>> So, no abs() needed. Ring size can be 4096, instead of 4095. Petri Savolainen(psavol) wrote: It's already documented 5 lines above: /* Initialize ring. Ring size must be a power of two. */ static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) { > Petri Savolainen(psavol) wrote: > This function converts 32 bit buffer indexes to buffer header > pointers. The counter operation is buffer_index_from_buf(). The > prefetch is a side effect of the function, which may be changed/moved > any time if it's found out that there's a place for prefetching. I > actually plan to test if number of prefetches should be limited as > e.g. 32 consecutive prefetches may be too much for some CPU > architectures. >> Petri Savolainen(psavol) wrote: >> I prefer style where '== 0' is used instead of '!'. Especially, when >> the if clause is as complex as this and there's danger for reader to >> miss the '!' sign. >>> Petri Savolainen(psavol) wrote: >>> It's there to ensure that all bits are zero also when someone would >>> modify the bitfield from two to three fields later on. Similarly to >>> memset() zero is used for struct inits. Petri Savolainen(psavol) wrote: There's no need for abs(). Since it's all uint32_t variables, wrap a round is handled already. An example in 8bits: 0xff - 0xfd = 0x02 0x00 - 0xfe = 0x02 0x01 - 0xff = 0x02 0x02 - 0x00 = 0x02 This passes both gcc and clang, and is used already in the other ring implementation see ring_deq_multi(). > Petri Savolainen(psavol) wrote: > I prefer style with blank line in the end of a typedef, since > it's easier to spot the type name (as it's not mixed into struct > field names). Checkpatch passes so this style should be OK. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Does this mean that sizes larger than 32 have no added >> performance benefit? >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if >>> we're not going to use the user-supplied queue size. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Given its name, this looks like an extraneous statement that should be deleted. Renaming this to something like `prefetch_dequeued_bufs()` would make the intent clearer here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > `if (!ring_st_is_empty(>s.ring_st))` seems more > natural here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle >> the effective queue capacity. The user-supplied `size` >> should then be set to `ROUNDUP_POWER2_U32(size) - 1` for the
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c @@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, odp_buffer_hdr_t *buf_hdr[], } UNLOCK(>s.lock); - return 0; - } - - for (i = 0; i < num && hdr; ) { - int burst_num = hdr->burst_num; - int first = hdr->burst_first; - /* First, get bursted buffers */ - for (j = 0; j < burst_num && i < num; j++, i++) { - buf_hdr[i] = hdr->burst[first + j]; - odp_prefetch(buf_hdr[i]); - } - - if (burst_num) { - hdr->burst_num = burst_num - j; - hdr->burst_first = first + j; - } - - if (i == num) - break; - - /* When burst is empty, consume the current buffer header and -* move to the next header */ - buf_hdr[i] = hdr; - next = hdr->next; - hdr->next = NULL; - hdr= next; - updated++; - i++; + return 0; } - /* Write head only if updated */ - if (updated) - queue->s.head = hdr; - - /* Queue is empty */ - if (hdr == NULL) - queue->s.tail = NULL; - if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED) sched_fn->save_context(queue->s.index); UNLOCK(>s.lock); - return i; + buffer_index_to_buf(buf_hdr, buf_idx, num_deq); Comment: This function converts 32 bit buffer indexes to buffer header pointers. The counter operation is buffer_index_from_buf(). The prefetch is a side effect of the function, which may be changed/moved any time if it's found out that there's a place for prefetching. I actually plan to test if number of prefetches should be limited as e.g. 32 consecutive prefetches may be too much for some CPU architectures. > Petri Savolainen(psavol) wrote: > I prefer style where '== 0' is used instead of '!'. Especially, when the if > clause is as complex as this and there's danger for reader to miss the '!' > sign. >> Petri Savolainen(psavol) wrote: >> It's there to ensure that all bits are zero also when someone would modify >> the bitfield from two to three fields later on. Similarly to memset() zero >> is used for struct inits. >>> Petri Savolainen(psavol) wrote: >>> There's no need for abs(). Since it's all uint32_t variables, wrap a round >>> is handled already. >>> An example in 8bits: >>> 0xff - 0xfd = 0x02 >>> 0x00 - 0xfe = 0x02 >>> 0x01 - 0xff = 0x02 >>> 0x02 - 0x00 = 0x02 >>> >>> This passes both gcc and clang, and is used already in the other ring >>> implementation see ring_deq_multi(). Petri Savolainen(psavol) wrote: I prefer style with blank line in the end of a typedef, since it's easier to spot the type name (as it's not mixed into struct field names). Checkpatch passes so this style should be OK. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Does this mean that sizes larger than 32 have no added performance > benefit? >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not >> going to use the user-supplied queue size. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Given its name, this looks like an extraneous statement that should be >>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` >>> would make the intent clearer here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the > effective queue capacity. The user-supplied `size` should then be set > to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Same comment here as for plain queues. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> As noted earlier, due to "losing" one entry to distinguish queue >>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and >>> we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power >>> of 2. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're initializing `index.pool` and `index.buffer` there's no need to set `index.u32` here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > We originally had this index partitioning based on > `ODP_CONFIG_POOLS`. Do we want to return to that here? > > If not, we at least need an
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 32 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; Comment: This only works if `size` is a power of 2. Should be documented as such, since this is an internal routine. In this case an `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging aid. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may > be misinterpreted as overly large since it's `uint32_t`. Note that GCC and > clang recognize `abs()` and treat it as a builtin, so there's no actual > `stdlib.h` call here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169793906 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 95 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size= CONFIG_QUEUE_SIZE; capa->plain.nonblocking = ODP_BLOCKING; capa->sched.max_num = capa->max_queues; + capa->sched.max_size= CONFIG_QUEUE_SIZE; Comment: Same comment here as for plain queues. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > As noted earlier, due to "losing" one entry to distinguish queue empty/full, > this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to > ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're initializing `index.pool` and `index.buffer` there's no need to >> set `index.u32` here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do >>> we want to return to that here? >>> >>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that >>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: This routine can be optimized to: ``` return ring->head == ring->tail; ``` > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Your invariant is the queue is empty when `head == tail` therefore the > queue is full when `abs(tail - head) == mask`, so the correct calculation > here is: > > `num = mask - abs(tail - head);` > > The effect is that a queue can only hold `size - 1` elements, otherwise > you cannot distinguish between a full and an empty queue without another > bit of metadata, which is a cost you're trying to avoid. > > This is somewhat problematic if the caller is trying to be "optimal" by > specifying a power of two in the `size` parameter of the > `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we > may wish to return a `max_size` of a power of 2 - 1 in > `odp_queue_capability()` as part of this patch series. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This only works if `size` is a power of 2. Should be documented as such, >> since this is an internal routine. In this case an `ODP_ASSERT(size == >> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful >> debugging aid. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> should be `num = abs(tail - head);` to deal with wrap arounds, >>> otherwise may be misinterpreted as overly large since it's `uint32_t`. >>> Note that GCC and clang recognize `abs()` and treat it as a builtin, so >>> there's no actual `stdlib.h` call here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169821967 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c line 103 @@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name, param = _param; } + if (param->size > CONFIG_QUEUE_SIZE) Comment: One entry is not lost. User provided size if not (currently) used. Queue size is always 4k. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> One entry is not lost. >>> Petri Savolainen(psavol) wrote: >>> OK, added checks in v2. Petri Savolainen(psavol) wrote: OK. Compiler probably did that already, but changed in v2. > Petri Savolainen(psavol) wrote: > Tail and head indexes are (masked from) uint32_t and do not wrap around > when the ring is full. I think you assume that the store index is > 0...size-1, while it's full uint32_t which is then masked to get the > actual index. > > For example: > size = 100; > > Empty: > head = 100 > tail = 100 > num = 100 - 100 = 0 > > Full: > head = 100 > tail = 200 > num = 200 - 100 = 100 > > Wrap uint32_t + full: > head = 0xFF9C > tail = 0 > num = 0 - 0xFF9C = 0x64 = 100 > > So, no abs() needed. Ring size can be 4096, instead of 4095. >> Petri Savolainen(psavol) wrote: >> It's already documented 5 lines above: >> >> /* Initialize ring. Ring size must be a power of two. */ >> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, >> uint32_t size) >> { >>> Petri Savolainen(psavol) wrote: >>> This function converts 32 bit buffer indexes to buffer header pointers. >>> The counter operation is buffer_index_from_buf(). The prefetch is a >>> side effect of the function, which may be changed/moved any time if >>> it's found out that there's a place for prefetching. I actually plan to >>> test if number of prefetches should be limited as e.g. 32 consecutive >>> prefetches may be too much for some CPU architectures. Petri Savolainen(psavol) wrote: I prefer style where '== 0' is used instead of '!'. Especially, when the if clause is as complex as this and there's danger for reader to miss the '!' sign. > Petri Savolainen(psavol) wrote: > It's there to ensure that all bits are zero also when someone would > modify the bitfield from two to three fields later on. Similarly to > memset() zero is used for struct inits. >> Petri Savolainen(psavol) wrote: >> There's no need for abs(). Since it's all uint32_t variables, wrap a >> round is handled already. >> An example in 8bits: >> 0xff - 0xfd = 0x02 >> 0x00 - 0xfe = 0x02 >> 0x01 - 0xff = 0x02 >> 0x02 - 0x00 = 0x02 >> >> This passes both gcc and clang, and is used already in the other >> ring implementation see ring_deq_multi(). >>> Petri Savolainen(psavol) wrote: >>> I prefer style with blank line in the end of a typedef, since it's >>> easier to spot the type name (as it's not mixed into struct field >>> names). Checkpatch passes so this style should be OK. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Does this mean that sizes larger than 32 have no added performance benefit? > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're > not going to use the user-supplied queue size. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Given its name, this looks like an extraneous statement that >> should be deleted. Renaming this to something like >> `prefetch_dequeued_bufs()` would make the intent clearer here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural >>> here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective queue capacity. The user-supplied `size` should then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Same comment here as for plain queues. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> As noted earlier, due to "losing" one entry to distinguish >> queue empty/full, this should be returned as >> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that >> `CONFIG_QUEUE_SIZE` is itself a power of 2. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c line 95 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size= CONFIG_QUEUE_SIZE; capa->plain.nonblocking = ODP_BLOCKING; capa->sched.max_num = capa->max_queues; + capa->sched.max_size= CONFIG_QUEUE_SIZE; Comment: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> OK, added checks in v2. >>> Petri Savolainen(psavol) wrote: >>> OK. Compiler probably did that already, but changed in v2. Petri Savolainen(psavol) wrote: Tail and head indexes are (masked from) uint32_t and do not wrap around when the ring is full. I think you assume that the store index is 0...size-1, while it's full uint32_t which is then masked to get the actual index. For example: size = 100; Empty: head = 100 tail = 100 num = 100 - 100 = 0 Full: head = 100 tail = 200 num = 200 - 100 = 100 Wrap uint32_t + full: head = 0xFF9C tail = 0 num = 0 - 0xFF9C = 0x64 = 100 So, no abs() needed. Ring size can be 4096, instead of 4095. > Petri Savolainen(psavol) wrote: > It's already documented 5 lines above: > > /* Initialize ring. Ring size must be a power of two. */ > static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t > size) > { >> Petri Savolainen(psavol) wrote: >> This function converts 32 bit buffer indexes to buffer header pointers. >> The counter operation is buffer_index_from_buf(). The prefetch is a side >> effect of the function, which may be changed/moved any time if it's >> found out that there's a place for prefetching. I actually plan to test >> if number of prefetches should be limited as e.g. 32 consecutive >> prefetches may be too much for some CPU architectures. >>> Petri Savolainen(psavol) wrote: >>> I prefer style where '== 0' is used instead of '!'. Especially, when >>> the if clause is as complex as this and there's danger for reader to >>> miss the '!' sign. Petri Savolainen(psavol) wrote: It's there to ensure that all bits are zero also when someone would modify the bitfield from two to three fields later on. Similarly to memset() zero is used for struct inits. > Petri Savolainen(psavol) wrote: > There's no need for abs(). Since it's all uint32_t variables, wrap a > round is handled already. > An example in 8bits: > 0xff - 0xfd = 0x02 > 0x00 - 0xfe = 0x02 > 0x01 - 0xff = 0x02 > 0x02 - 0x00 = 0x02 > > This passes both gcc and clang, and is used already in the other ring > implementation see ring_deq_multi(). >> Petri Savolainen(psavol) wrote: >> I prefer style with blank line in the end of a typedef, since it's >> easier to spot the type name (as it's not mixed into struct field >> names). Checkpatch passes so this style should be OK. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Does this mean that sizes larger than 32 have no added performance >>> benefit? Bill Fischofer(Bill-Fischofer-Linaro) wrote: Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going to use the user-supplied queue size. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Given its name, this looks like an extraneous statement that > should be deleted. Renaming this to something like > `prefetch_dequeued_bufs()` would make the intent clearer here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> `if (!ring_st_is_empty(>s.ring_st))` seems more natural >> here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the >>> effective queue capacity. The user-supplied `size` should then >>> be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to >>> work properly. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Same comment here as for plain queues. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > As noted earlier, due to "losing" one entry to distinguish > queue empty/full, this should be returned as > `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that > `CONFIG_QUEUE_SIZE` is itself a power of 2.
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c line 92 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size= CONFIG_QUEUE_SIZE; Comment: One entry is not lost. > Petri Savolainen(psavol) wrote: > OK, added checks in v2. >> Petri Savolainen(psavol) wrote: >> OK. Compiler probably did that already, but changed in v2. >>> Petri Savolainen(psavol) wrote: >>> Tail and head indexes are (masked from) uint32_t and do not wrap around >>> when the ring is full. I think you assume that the store index is >>> 0...size-1, while it's full uint32_t which is then masked to get the actual >>> index. >>> >>> For example: >>> size = 100; >>> >>> Empty: >>> head = 100 >>> tail = 100 >>> num = 100 - 100 = 0 >>> >>> Full: >>> head = 100 >>> tail = 200 >>> num = 200 - 100 = 100 >>> >>> Wrap uint32_t + full: >>> head = 0xFF9C >>> tail = 0 >>> num = 0 - 0xFF9C = 0x64 = 100 >>> >>> So, no abs() needed. Ring size can be 4096, instead of 4095. Petri Savolainen(psavol) wrote: It's already documented 5 lines above: /* Initialize ring. Ring size must be a power of two. */ static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) { > Petri Savolainen(psavol) wrote: > This function converts 32 bit buffer indexes to buffer header pointers. > The counter operation is buffer_index_from_buf(). The prefetch is a side > effect of the function, which may be changed/moved any time if it's found > out that there's a place for prefetching. I actually plan to test if > number of prefetches should be limited as e.g. 32 consecutive prefetches > may be too much for some CPU architectures. >> Petri Savolainen(psavol) wrote: >> I prefer style where '== 0' is used instead of '!'. Especially, when the >> if clause is as complex as this and there's danger for reader to miss >> the '!' sign. >>> Petri Savolainen(psavol) wrote: >>> It's there to ensure that all bits are zero also when someone would >>> modify the bitfield from two to three fields later on. Similarly to >>> memset() zero is used for struct inits. Petri Savolainen(psavol) wrote: There's no need for abs(). Since it's all uint32_t variables, wrap a round is handled already. An example in 8bits: 0xff - 0xfd = 0x02 0x00 - 0xfe = 0x02 0x01 - 0xff = 0x02 0x02 - 0x00 = 0x02 This passes both gcc and clang, and is used already in the other ring implementation see ring_deq_multi(). > Petri Savolainen(psavol) wrote: > I prefer style with blank line in the end of a typedef, since it's > easier to spot the type name (as it's not mixed into struct field > names). Checkpatch passes so this style should be OK. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Does this mean that sizes larger than 32 have no added performance >> benefit? >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're >>> not going to use the user-supplied queue size. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Given its name, this looks like an extraneous statement that should be deleted. Renaming this to something like `prefetch_dequeued_bufs()` would make the intent clearer here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > `if (!ring_st_is_empty(>s.ring_st))` seems more natural > here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the >> effective queue capacity. The user-supplied `size` should then >> be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work >> properly. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Same comment here as for plain queues. Bill Fischofer(Bill-Fischofer-Linaro) wrote: As noted earlier, due to "losing" one entry to distinguish queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Since you're initializing `index.pool` and `index.buffer` > there's no need to set `index.u32` here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> We originally had this
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_buffer_internal.h line 17 @@ -41,11 +41,19 @@ typedef struct seg_entry_t { uint32_t len; } seg_entry_t; +typedef union buffer_index_t { + uint32_t u32; + + struct { + uint32_t pool :8; + uint32_t buffer :24; Comment: OK, added checks in v2. > Petri Savolainen(psavol) wrote: > OK. Compiler probably did that already, but changed in v2. >> Petri Savolainen(psavol) wrote: >> Tail and head indexes are (masked from) uint32_t and do not wrap around when >> the ring is full. I think you assume that the store index is 0...size-1, >> while it's full uint32_t which is then masked to get the actual index. >> >> For example: >> size = 100; >> >> Empty: >> head = 100 >> tail = 100 >> num = 100 - 100 = 0 >> >> Full: >> head = 100 >> tail = 200 >> num = 200 - 100 = 100 >> >> Wrap uint32_t + full: >> head = 0xFF9C >> tail = 0 >> num = 0 - 0xFF9C = 0x64 = 100 >> >> So, no abs() needed. Ring size can be 4096, instead of 4095. >>> Petri Savolainen(psavol) wrote: >>> It's already documented 5 lines above: >>> >>> /* Initialize ring. Ring size must be a power of two. */ >>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t >>> size) >>> { Petri Savolainen(psavol) wrote: This function converts 32 bit buffer indexes to buffer header pointers. The counter operation is buffer_index_from_buf(). The prefetch is a side effect of the function, which may be changed/moved any time if it's found out that there's a place for prefetching. I actually plan to test if number of prefetches should be limited as e.g. 32 consecutive prefetches may be too much for some CPU architectures. > Petri Savolainen(psavol) wrote: > I prefer style where '== 0' is used instead of '!'. Especially, when the > if clause is as complex as this and there's danger for reader to miss the > '!' sign. >> Petri Savolainen(psavol) wrote: >> It's there to ensure that all bits are zero also when someone would >> modify the bitfield from two to three fields later on. Similarly to >> memset() zero is used for struct inits. >>> Petri Savolainen(psavol) wrote: >>> There's no need for abs(). Since it's all uint32_t variables, wrap a >>> round is handled already. >>> An example in 8bits: >>> 0xff - 0xfd = 0x02 >>> 0x00 - 0xfe = 0x02 >>> 0x01 - 0xff = 0x02 >>> 0x02 - 0x00 = 0x02 >>> >>> This passes both gcc and clang, and is used already in the other ring >>> implementation see ring_deq_multi(). Petri Savolainen(psavol) wrote: I prefer style with blank line in the end of a typedef, since it's easier to spot the type name (as it's not mixed into struct field names). Checkpatch passes so this style should be OK. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Does this mean that sizes larger than 32 have no added performance > benefit? >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're >> not going to use the user-supplied queue size. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Given its name, this looks like an extraneous statement that should >>> be deleted. Renaming this to something like >>> `prefetch_dequeued_bufs()` would make the intent clearer here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the > effective queue capacity. The user-supplied `size` should then be > set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work > properly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Same comment here as for plain queues. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> As noted earlier, due to "losing" one entry to distinguish >>> queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE >>> - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is >>> itself a power of 2. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're initializing `index.pool` and `index.buffer` there's no need to set `index.u32` here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > We originally had this index partitioning based on > `ODP_CONFIG_POOLS`. Do we want to return to that here? > > If not, we at least need an `ODP_STATIC_ASSERT()` to ensure
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); + + /* Full */ + if (num == 0) + return 0; + + if (num > num_data) + num = num_data; + + idx = tail & mask; + + for (i = 0; i < num; i++) { + ring->data[idx] = data[i]; + idx = (idx + 1) & mask; + } + + ring->tail = tail + num; + + return num; +} + +/* Check if ring is empty */ +static inline int ring_st_is_empty(ring_st_t *ring) +{ + uint32_t head, tail, num; + + head = ring->head; + tail = ring->tail; + num = tail - head; + + if (num == 0) + return 1; + + return 0; Comment: OK. Compiler probably did that already, but changed in v2. > Petri Savolainen(psavol) wrote: > Tail and head indexes are (masked from) uint32_t and do not wrap around when > the ring is full. I think you assume that the store index is 0...size-1, > while it's full uint32_t which is then masked to get the actual index. > > For example: > size = 100; > > Empty: > head = 100 > tail = 100 > num = 100 - 100 = 0 > > Full: > head = 100 > tail = 200 > num = 200 - 100 = 100 > > Wrap uint32_t + full: > head = 0xFF9C > tail = 0 > num = 0 - 0xFF9C = 0x64 = 100 > > So, no abs() needed. Ring size can be 4096, instead of 4095. >> Petri Savolainen(psavol) wrote: >> It's already documented 5 lines above: >> >> /* Initialize ring. Ring size must be a power of two. */ >> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t >> size) >> { >>> Petri Savolainen(psavol) wrote: >>> This function converts 32 bit buffer indexes to buffer header pointers. The >>> counter operation is buffer_index_from_buf(). The prefetch is a side effect >>> of the function, which may be changed/moved any time if it's found out that >>> there's a place for prefetching. I actually plan to test if number of >>> prefetches should be limited as e.g. 32 consecutive prefetches may be too >>> much for some CPU architectures. Petri Savolainen(psavol) wrote: I prefer style where '== 0' is used instead of '!'. Especially, when the if clause is as complex as this and there's danger for reader to miss the '!' sign. > Petri Savolainen(psavol) wrote: > It's there to ensure that all bits are zero also when someone would > modify the bitfield from two to three fields later on. Similarly to > memset() zero is used for struct inits. >> Petri Savolainen(psavol) wrote: >> There's no need for abs(). Since it's all uint32_t variables, wrap a >> round is handled already. >> An example in 8bits: >> 0xff - 0xfd = 0x02 >> 0x00 - 0xfe = 0x02 >> 0x01 - 0xff = 0x02 >> 0x02 - 0x00 = 0x02 >> >> This passes both gcc and clang, and is used already in the other ring >> implementation see ring_deq_multi(). >>> Petri Savolainen(psavol) wrote: >>>
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 103 @@ -192,6 +201,9 @@ static odp_queue_t queue_create(const char *name, param = _param; } + if (param->size > CONFIG_QUEUE_SIZE) Comment: Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective queue capacity. The user-supplied `size` should then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Same comment here as for plain queues. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> As noted earlier, due to "losing" one entry to distinguish queue empty/full, >> this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to >> ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Since you're initializing `index.pool` and `index.buffer` there's no need >>> to set `index.u32` here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we want to return to that here? If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that `ODP_CONFIG_POOLS < 256` or else bad things will happen here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This routine can be optimized to: > ``` > return ring->head == ring->tail; > ``` >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Your invariant is the queue is empty when `head == tail` therefore the >> queue is full when `abs(tail - head) == mask`, so the correct >> calculation here is: >> >> `num = mask - abs(tail - head);` >> >> The effect is that a queue can only hold `size - 1` elements, otherwise >> you cannot distinguish between a full and an empty queue without another >> bit of metadata, which is a cost you're trying to avoid. >> >> This is somewhat problematic if the caller is trying to be "optimal" by >> specifying a power of two in the `size` parameter of the >> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we >> may wish to return a `max_size` of a power of 2 - 1 in >> `odp_queue_capability()` as part of this patch series. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This only works if `size` is a power of 2. Should be documented as >>> such, since this is an internal routine. In this case an >>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement >>> would be a useful debugging aid. Bill Fischofer(Bill-Fischofer-Linaro) wrote: should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may be misinterpreted as overly large since it's `uint32_t`. Note that GCC and clang recognize `abs()` and treat it as a builtin, so there's no actual `stdlib.h` call here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169822191 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 78 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); Comment: Tail and head indexes are (masked from) uint32_t and do not wrap around when the ring is full. I think you assume that the store index is 0...size-1, while it's full uint32_t which is then masked to get the actual index. For example: size = 100; Empty: head = 100 tail = 100 num = 100 - 100 = 0 Full: head = 100 tail = 200 num = 200 - 100 = 100 Wrap uint32_t + full: head = 0xFF9C tail = 0 num = 0 - 0xFF9C = 0x64 = 100 So, no abs() needed. Ring size can be 4096, instead of 4095. > Petri Savolainen(psavol) wrote: > It's already documented 5 lines above: > > /* Initialize ring. Ring size must be a power of two. */ > static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t > size) > { >> Petri Savolainen(psavol) wrote: >> This function converts 32 bit buffer indexes to buffer header pointers. The >> counter operation is buffer_index_from_buf(). The prefetch is a side effect >> of the function, which may be changed/moved any time if it's found out that >> there's a place for prefetching. I actually plan to test if number of >> prefetches should be limited as e.g. 32 consecutive prefetches may be too >> much for some CPU architectures. >>> Petri Savolainen(psavol) wrote: >>> I prefer style where '== 0' is used instead of '!'. Especially, when the if >>> clause is as complex as this and there's danger for reader to miss the '!' >>> sign. Petri Savolainen(psavol) wrote: It's there to ensure that all bits are zero also when someone would modify the bitfield from two to three fields later on. Similarly to memset() zero is used for struct inits. > Petri Savolainen(psavol) wrote: > There's no need for abs(). Since it's all uint32_t variables, wrap a > round is handled already. > An example in 8bits: > 0xff - 0xfd = 0x02 > 0x00 - 0xfe = 0x02 > 0x01 - 0xff = 0x02 > 0x02 - 0x00 = 0x02 > > This passes both gcc and clang, and is used already in the other ring > implementation see ring_deq_multi(). >> Petri Savolainen(psavol) wrote: >> I prefer style with blank line in the end of a typedef, since it's >> easier to spot the type name (as it's not mixed into struct field >> names). Checkpatch passes so this style should be OK. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Does this mean that sizes larger than 32 have no added performance >>> benefit? Bill Fischofer(Bill-Fischofer-Linaro) wrote: Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going to use the user-supplied queue size. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Given its name, this looks like an extraneous statement that should > be deleted. Renaming this to something like > `prefetch_dequeued_bufs()` would make the intent clearer
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 32 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; Comment: It's already documented 5 lines above: /* Initialize ring. Ring size must be a power of two. */ static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) { > Petri Savolainen(psavol) wrote: > This function converts 32 bit buffer indexes to buffer header pointers. The > counter operation is buffer_index_from_buf(). The prefetch is a side effect > of the function, which may be changed/moved any time if it's found out that > there's a place for prefetching. I actually plan to test if number of > prefetches should be limited as e.g. 32 consecutive prefetches may be too > much for some CPU architectures. >> Petri Savolainen(psavol) wrote: >> I prefer style where '== 0' is used instead of '!'. Especially, when the if >> clause is as complex as this and there's danger for reader to miss the '!' >> sign. >>> Petri Savolainen(psavol) wrote: >>> It's there to ensure that all bits are zero also when someone would modify >>> the bitfield from two to three fields later on. Similarly to memset() zero >>> is used for struct inits. Petri Savolainen(psavol) wrote: There's no need for abs(). Since it's all uint32_t variables, wrap a round is handled already. An example in 8bits: 0xff - 0xfd = 0x02 0x00 - 0xfe = 0x02 0x01 - 0xff = 0x02 0x02 - 0x00 = 0x02 This passes both gcc and clang, and is used already in the other ring implementation see ring_deq_multi(). > Petri Savolainen(psavol) wrote: > I prefer style with blank line in the end of a typedef, since it's easier > to spot the type name (as it's not mixed into struct field names). > Checkpatch passes so this style should be OK. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Does this mean that sizes larger than 32 have no added performance >> benefit? >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not >>> going to use the user-supplied queue size. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Given its name, this looks like an extraneous statement that should be deleted. Renaming this to something like `prefetch_dequeued_bufs()` would make the intent clearer here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the >> effective queue capacity. The user-supplied `size` should then be >> set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work >> properly. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Same comment here as for plain queues. Bill Fischofer(Bill-Fischofer-Linaro) wrote: As noted earlier, due to "losing" one entry to distinguish queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Since you're initializing `index.pool` and `index.buffer` there's > no need to set `index.u32` here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> We originally had this index partitioning based on >> `ODP_CONFIG_POOLS`. Do we want to return to that here? >> >> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that >> `ODP_CONFIG_POOLS < 256` or else bad things will happen here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This routine can be optimized to: >>> ``` >>> return ring->head == ring->tail; >>> ``` Bill Fischofer(Bill-Fischofer-Linaro) wrote: Your invariant is the queue is empty when `head == tail` therefore
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c @@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle) ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name); return -1; } - if (queue->s.head != NULL) { + if (ring_st_is_empty(>s.ring_st) == 0) { Comment: I prefer style where '== 0' is used instead of '!'. Especially, when the if clause is as complex as this and there's danger for reader to miss the '!' sign. > Petri Savolainen(psavol) wrote: > It's there to ensure that all bits are zero also when someone would modify > the bitfield from two to three fields later on. Similarly to memset() zero is > used for struct inits. >> Petri Savolainen(psavol) wrote: >> There's no need for abs(). Since it's all uint32_t variables, wrap a round >> is handled already. >> An example in 8bits: >> 0xff - 0xfd = 0x02 >> 0x00 - 0xfe = 0x02 >> 0x01 - 0xff = 0x02 >> 0x02 - 0x00 = 0x02 >> >> This passes both gcc and clang, and is used already in the other ring >> implementation see ring_deq_multi(). >>> Petri Savolainen(psavol) wrote: >>> I prefer style with blank line in the end of a typedef, since it's easier >>> to spot the type name (as it's not mixed into struct field names). >>> Checkpatch passes so this style should be OK. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Does this mean that sizes larger than 32 have no added performance benefit? > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not > going to use the user-supplied queue size. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Given its name, this looks like an extraneous statement that should be >> deleted. Renaming this to something like `prefetch_dequeued_bufs()` >> would make the intent clearer here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective queue capacity. The user-supplied `size` should then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Same comment here as for plain queues. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> As noted earlier, due to "losing" one entry to distinguish queue >> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and >> we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of >> 2. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Since you're initializing `index.pool` and `index.buffer` there's >>> no need to set `index.u32` here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we want to return to that here? If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that `ODP_CONFIG_POOLS < 256` or else bad things will happen here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This routine can be optimized to: > ``` > return ring->head == ring->tail; > ``` >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Your invariant is the queue is empty when `head == tail` >> therefore the queue is full when `abs(tail - head) == mask`, so >> the correct calculation here is: >> >> `num = mask - abs(tail - head);` >> >> The effect is that a queue can only hold `size - 1` elements, >> otherwise you cannot distinguish between a full and an empty >> queue without another bit of metadata, which is a cost you're >> trying to avoid. >> >> This is somewhat problematic if the caller is trying to be >> "optimal" by specifying a power of two in the `size` parameter >> of the `odp_queue_param_t` passed to `odp_queue_create()`. For >> this reason we may wish to return a `max_size` of a power of 2 - >> 1 in `odp_queue_capability()` as part of this patch series. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This only works if `size` is a power of 2. Should be documented >>> as such, since this is an internal routine. In this case an >>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this >>> requirement would be a useful debugging aid. Bill Fischofer(Bill-Fischofer-Linaro) wrote: should be `num = abs(tail - head);` to deal with wrap arounds,
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 24 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + Comment: I prefer style with blank line in the end of a typedef, since it's easier to spot the type name (as it's not mixed into struct field names). Checkpatch passes so this style should be OK. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Does this mean that sizes larger than 32 have no added performance benefit? >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going >> to use the user-supplied queue size. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Given its name, this looks like an extraneous statement that should be >>> deleted. Renaming this to something like `prefetch_dequeued_bufs()` would >>> make the intent clearer here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective > queue capacity. The user-supplied `size` should then be set to > `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Same comment here as for plain queues. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> As noted earlier, due to "losing" one entry to distinguish queue >>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we >>> also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're initializing `index.pool` and `index.buffer` there's no need to set `index.u32` here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > We originally had this index partitioning based on > `ODP_CONFIG_POOLS`. Do we want to return to that here? > > If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that > `ODP_CONFIG_POOLS < 256` or else bad things will happen here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This routine can be optimized to: >> ``` >> return ring->head == ring->tail; >> ``` >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Your invariant is the queue is empty when `head == tail` therefore >>> the queue is full when `abs(tail - head) == mask`, so the correct >>> calculation here is: >>> >>> `num = mask - abs(tail - head);` >>> >>> The effect is that a queue can only hold `size - 1` elements, >>> otherwise you cannot distinguish between a full and an empty queue >>> without another bit of metadata, which is a cost you're trying to >>> avoid. >>> >>> This is somewhat problematic if the caller is trying to be >>> "optimal" by specifying a power of two in the `size` parameter of >>> the `odp_queue_param_t` passed to `odp_queue_create()`. For this >>> reason we may wish to return a `max_size` of a power of 2 - 1 in >>> `odp_queue_capability()` as part of this patch series. Bill Fischofer(Bill-Fischofer-Linaro) wrote: This only works if `size` is a power of 2. Should be documented as such, since this is an internal routine. In this case an `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging aid. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > should be `num = abs(tail - head);` to deal with wrap arounds, > otherwise may be misinterpreted as overly large since it's > `uint32_t`. Note that GCC and clang recognize `abs()` and treat > it as a builtin, so there's no actual `stdlib.h` call here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169871326 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_pool.c line 28 @@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool) memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr); /* Initialize buffer metadata */ - buf_hdr->index = i; + buf_hdr->index.u32= 0; + buf_hdr->index.pool = pool->pool_idx; + buf_hdr->index.buffer = i; Comment: It's there to ensure that all bits are zero also when someone would modify the bitfield from two to three fields later on. Similarly to memset() zero is used for struct inits. > Petri Savolainen(psavol) wrote: > There's no need for abs(). Since it's all uint32_t variables, wrap a round is > handled already. > An example in 8bits: > 0xff - 0xfd = 0x02 > 0x00 - 0xfe = 0x02 > 0x01 - 0xff = 0x02 > 0x02 - 0x00 = 0x02 > > This passes both gcc and clang, and is used already in the other ring > implementation see ring_deq_multi(). >> Petri Savolainen(psavol) wrote: >> I prefer style with blank line in the end of a typedef, since it's easier to >> spot the type name (as it's not mixed into struct field names). Checkpatch >> passes so this style should be OK. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Does this mean that sizes larger than 32 have no added performance benefit? Bill Fischofer(Bill-Fischofer-Linaro) wrote: Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going to use the user-supplied queue size. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Given its name, this looks like an extraneous statement that should be > deleted. Renaming this to something like `prefetch_dequeued_bufs()` would > make the intent clearer here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the >>> effective queue capacity. The user-supplied `size` should then be set >>> to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Same comment here as for plain queues. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > As noted earlier, due to "losing" one entry to distinguish queue > empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and > we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of > 2. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're initializing `index.pool` and `index.buffer` there's no >> need to set `index.u32` here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> We originally had this index partitioning based on >>> `ODP_CONFIG_POOLS`. Do we want to return to that here? >>> >>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that >>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: This routine can be optimized to: ``` return ring->head == ring->tail; ``` > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Your invariant is the queue is empty when `head == tail` > therefore the queue is full when `abs(tail - head) == mask`, so > the correct calculation here is: > > `num = mask - abs(tail - head);` > > The effect is that a queue can only hold `size - 1` elements, > otherwise you cannot distinguish between a full and an empty > queue without another bit of metadata, which is a cost you're > trying to avoid. > > This is somewhat problematic if the caller is trying to be > "optimal" by specifying a power of two in the `size` parameter of > the `odp_queue_param_t` passed to `odp_queue_create()`. For this > reason we may wish to return a `max_size` of a power of 2 - 1 in > `odp_queue_capability()` as part of this patch series. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This only works if `size` is a power of 2. Should be documented >> as such, since this is an internal routine. In this case an >> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this >> requirement would be a useful debugging aid. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> should be `num = abs(tail - head);` to deal with wrap arounds, >>> otherwise may be misinterpreted as overly large since it's >>> `uint32_t`. Note that GCC and clang recognize `abs()` and treat >>> it as
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_config_internal.h line 5 @@ -144,7 +144,7 @@ extern "C" { * This controls the burst size on various enqueue, dequeue, etc calls. Large * burst size improves throughput, but may degrade QoS (increase latency). */ -#define CONFIG_BURST_SIZE 16 +#define CONFIG_BURST_SIZE 32 Comment: Does this mean that sizes larger than 32 have no added performance benefit? > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going > to use the user-supplied queue size. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Given its name, this looks like an extraneous statement that should be >> deleted. Renaming this to something like `prefetch_dequeued_bufs()` would >> make the intent clearer here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective queue capacity. The user-supplied `size` should then be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Same comment here as for plain queues. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> As noted earlier, due to "losing" one entry to distinguish queue >> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we >> also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Since you're initializing `index.pool` and `index.buffer` there's no >>> need to set `index.u32` here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we want to return to that here? If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that `ODP_CONFIG_POOLS < 256` or else bad things will happen here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This routine can be optimized to: > ``` > return ring->head == ring->tail; > ``` >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Your invariant is the queue is empty when `head == tail` therefore >> the queue is full when `abs(tail - head) == mask`, so the correct >> calculation here is: >> >> `num = mask - abs(tail - head);` >> >> The effect is that a queue can only hold `size - 1` elements, >> otherwise you cannot distinguish between a full and an empty queue >> without another bit of metadata, which is a cost you're trying to >> avoid. >> >> This is somewhat problematic if the caller is trying to be "optimal" >> by specifying a power of two in the `size` parameter of the >> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason >> we may wish to return a `max_size` of a power of 2 - 1 in >> `odp_queue_capability()` as part of this patch series. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This only works if `size` is a power of 2. Should be documented as >>> such, since this is an internal routine. In this case an >>> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement >>> would be a useful debugging aid. Bill Fischofer(Bill-Fischofer-Linaro) wrote: should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may be misinterpreted as overly large since it's `uint32_t`. Note that GCC and clang recognize `abs()` and treat it as a builtin, so there's no actual `stdlib.h` call here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169825596 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 420 @@ -584,8 +556,9 @@ static int queue_init(queue_entry_t *queue, const char *name, queue->s.pktin = PKTIN_INVALID; queue->s.pktout = PKTOUT_INVALID; - queue->s.head = NULL; - queue->s.tail = NULL; + ring_st_init(>s.ring_st, +queue_tbl->ring_data[queue->s.index].data, +CONFIG_QUEUE_SIZE); Comment: Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're not going to use the user-supplied queue size. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Given its name, this looks like an extraneous statement that should be > deleted. Renaming this to something like `prefetch_dequeued_bufs()` would > make the intent clearer here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective >>> queue capacity. The user-supplied `size` should then be set to >>> `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Same comment here as for plain queues. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > As noted earlier, due to "losing" one entry to distinguish queue > empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we > also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Since you're initializing `index.pool` and `index.buffer` there's no >> need to set `index.u32` here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. >>> Do we want to return to that here? >>> >>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that >>> `ODP_CONFIG_POOLS < 256` or else bad things will happen here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: This routine can be optimized to: ``` return ring->head == ring->tail; ``` > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Your invariant is the queue is empty when `head == tail` therefore > the queue is full when `abs(tail - head) == mask`, so the correct > calculation here is: > > `num = mask - abs(tail - head);` > > The effect is that a queue can only hold `size - 1` elements, > otherwise you cannot distinguish between a full and an empty queue > without another bit of metadata, which is a cost you're trying to > avoid. > > This is somewhat problematic if the caller is trying to be "optimal" > by specifying a power of two in the `size` parameter of the > `odp_queue_param_t` passed to `odp_queue_create()`. For this reason > we may wish to return a `max_size` of a power of 2 - 1 in > `odp_queue_capability()` as part of this patch series. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This only works if `size` is a power of 2. Should be documented as >> such, since this is an internal routine. In this case an >> `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement >> would be a useful debugging aid. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> should be `num = abs(tail - head);` to deal with wrap arounds, >>> otherwise may be misinterpreted as overly large since it's >>> `uint32_t`. Note that GCC and clang recognize `abs()` and treat it >>> as a builtin, so there's no actual `stdlib.h` call here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169825091 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c @@ -471,51 +476,18 @@ static inline int deq_multi(queue_entry_t *queue, odp_buffer_hdr_t *buf_hdr[], } UNLOCK(>s.lock); - return 0; - } - - for (i = 0; i < num && hdr; ) { - int burst_num = hdr->burst_num; - int first = hdr->burst_first; - /* First, get bursted buffers */ - for (j = 0; j < burst_num && i < num; j++, i++) { - buf_hdr[i] = hdr->burst[first + j]; - odp_prefetch(buf_hdr[i]); - } - - if (burst_num) { - hdr->burst_num = burst_num - j; - hdr->burst_first = first + j; - } - - if (i == num) - break; - - /* When burst is empty, consume the current buffer header and -* move to the next header */ - buf_hdr[i] = hdr; - next = hdr->next; - hdr->next = NULL; - hdr= next; - updated++; - i++; + return 0; } - /* Write head only if updated */ - if (updated) - queue->s.head = hdr; - - /* Queue is empty */ - if (hdr == NULL) - queue->s.tail = NULL; - if (status_sync && queue->s.type == ODP_QUEUE_TYPE_SCHED) sched_fn->save_context(queue->s.index); UNLOCK(>s.lock); - return i; + buffer_index_to_buf(buf_hdr, buf_idx, num_deq); Comment: Given its name, this looks like an extraneous statement that should be deleted. Renaming this to something like `prefetch_dequeued_bufs()` would make the intent clearer here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective >> queue capacity. The user-supplied `size` should then be set to >> `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Same comment here as for plain queues. Bill Fischofer(Bill-Fischofer-Linaro) wrote: As noted earlier, due to "losing" one entry to distinguish queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Since you're initializing `index.pool` and `index.buffer` there's no need > to set `index.u32` here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. >> Do we want to return to that here? >> >> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that >> `ODP_CONFIG_POOLS < 256` or else bad things will happen here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This routine can be optimized to: >>> ``` >>> return ring->head == ring->tail; >>> ``` Bill Fischofer(Bill-Fischofer-Linaro) wrote: Your invariant is the queue is empty when `head == tail` therefore the queue is full when `abs(tail - head) == mask`, so the correct calculation here is: `num = mask - abs(tail - head);` The effect is that a queue can only hold `size - 1` elements, otherwise you cannot distinguish between a full and an empty queue without another bit of metadata, which is a cost you're trying to avoid. This is somewhat problematic if the caller is trying to be "optimal" by specifying a power of two in the `size` parameter of the `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we may wish to return a `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this patch series. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This only works if `size` is a power of 2. Should be documented as > such, since this is an internal routine. In this case an > `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement > would be a useful debugging aid. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> should be `num = abs(tail - head);` to deal with wrap arounds, >> otherwise may be misinterpreted as overly large since it's >> `uint32_t`. Note that GCC and clang recognize `abs()` and treat it >> as a builtin, so there's no actual `stdlib.h` call here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Extra blank line should be removed (nit).
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c @@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle) ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name); return -1; } - if (queue->s.head != NULL) { + if (ring_st_is_empty(>s.ring_st) == 0) { Comment: `if (!ring_st_is_empty(>s.ring_st))` seems more natural here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the effective > queue capacity. The user-supplied `size` should then be set to > `ROUNDUP_POWER2_U32(size) - 1` for the masking to work properly. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Same comment here as for plain queues. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> As noted earlier, due to "losing" one entry to distinguish queue >>> empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also >>> need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Since you're initializing `index.pool` and `index.buffer` there's no need to set `index.u32` here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do > we want to return to that here? > > If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that > `ODP_CONFIG_POOLS < 256` or else bad things will happen here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This routine can be optimized to: >> ``` >> return ring->head == ring->tail; >> ``` >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Your invariant is the queue is empty when `head == tail` therefore the >>> queue is full when `abs(tail - head) == mask`, so the correct >>> calculation here is: >>> >>> `num = mask - abs(tail - head);` >>> >>> The effect is that a queue can only hold `size - 1` elements, otherwise >>> you cannot distinguish between a full and an empty queue without >>> another bit of metadata, which is a cost you're trying to avoid. >>> >>> This is somewhat problematic if the caller is trying to be "optimal" by >>> specifying a power of two in the `size` parameter of the >>> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we >>> may wish to return a `max_size` of a power of 2 - 1 in >>> `odp_queue_capability()` as part of this patch series. Bill Fischofer(Bill-Fischofer-Linaro) wrote: This only works if `size` is a power of 2. Should be documented as such, since this is an internal routine. In this case an `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging aid. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > should be `num = abs(tail - head);` to deal with wrap arounds, > otherwise may be misinterpreted as overly large since it's > `uint32_t`. Note that GCC and clang recognize `abs()` and treat it as > a builtin, so there's no actual `stdlib.h` call here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169823060 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); + + /* Full */ + if (num == 0) + return 0; + + if (num > num_data) + num = num_data; + + idx = tail & mask; + + for (i = 0; i < num; i++) { + ring->data[idx] = data[i]; + idx = (idx + 1) & mask; + } + + ring->tail = tail + num; + + return num; +} + +/* Check if ring is empty */ +static inline int ring_st_is_empty(ring_st_t *ring) +{ + uint32_t head, tail, num; + + head = ring->head; + tail = ring->tail; + num = tail - head; + + if (num == 0) + return 1; + + return 0; Comment: This routine can be optimized to: ``` return ring->head == ring->tail; ``` > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Your invariant is the queue is empty when `head == tail` therefore the queue > is full when `abs(tail - head) == mask`, so the correct calculation here is: > > `num = mask - abs(tail - head);` > > The effect is that a queue can only hold `size - 1` elements, otherwise you > cannot distinguish between a full and an empty queue without another bit of > metadata, which is a cost you're trying to avoid. > > This is somewhat problematic if the caller is trying to be "optimal" by > specifying a power of two in the `size` parameter of the `odp_queue_param_t` > passed to `odp_queue_create()`. For this reason we may wish to return a > `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this > patch series. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This only works if `size` is a power of 2. Should be documented as such, >> since this is an internal routine. In this case an `ODP_ASSERT(size == >> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging >> aid. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise >>> may be misinterpreted as overly large since it's `uint32_t`. Note that GCC >>> and clang recognize `abs()` and treat it as a builtin, so there's no actual >>> `stdlib.h` call here. Bill Fischofer(Bill-Fischofer-Linaro) wrote: Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169804664 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_queue.c line 92 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size= CONFIG_QUEUE_SIZE; Comment: As noted earlier, due to "losing" one entry to distinguish queue empty/full, this should be returned as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that `CONFIG_QUEUE_SIZE` is itself a power of 2. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Since you're initializing `index.pool` and `index.buffer` there's no need to > set `index.u32` here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we >> want to return to that here? >> >> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that >> `ODP_CONFIG_POOLS < 256` or else bad things will happen here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This routine can be optimized to: >>> ``` >>> return ring->head == ring->tail; >>> ``` Bill Fischofer(Bill-Fischofer-Linaro) wrote: Your invariant is the queue is empty when `head == tail` therefore the queue is full when `abs(tail - head) == mask`, so the correct calculation here is: `num = mask - abs(tail - head);` The effect is that a queue can only hold `size - 1` elements, otherwise you cannot distinguish between a full and an empty queue without another bit of metadata, which is a cost you're trying to avoid. This is somewhat problematic if the caller is trying to be "optimal" by specifying a power of two in the `size` parameter of the `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we may wish to return a `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this patch series. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This only works if `size` is a power of 2. Should be documented as such, > since this is an internal routine. In this case an `ODP_ASSERT(size == > ROUNDUP_POWER2_U32(size))` for this requirement would be a useful > debugging aid. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise >> may be misinterpreted as overly large since it's `uint32_t`. Note that >> GCC and clang recognize `abs()` and treat it as a builtin, so there's no >> actual `stdlib.h` call here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169821543 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_buffer_internal.h line 17 @@ -41,11 +41,19 @@ typedef struct seg_entry_t { uint32_t len; } seg_entry_t; +typedef union buffer_index_t { + uint32_t u32; + + struct { + uint32_t pool :8; + uint32_t buffer :24; Comment: We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we want to return to that here? If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that `ODP_CONFIG_POOLS < 256` or else bad things will happen here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This routine can be optimized to: > ``` > return ring->head == ring->tail; > ``` >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Your invariant is the queue is empty when `head == tail` therefore the queue >> is full when `abs(tail - head) == mask`, so the correct calculation here is: >> >> `num = mask - abs(tail - head);` >> >> The effect is that a queue can only hold `size - 1` elements, otherwise you >> cannot distinguish between a full and an empty queue without another bit of >> metadata, which is a cost you're trying to avoid. >> >> This is somewhat problematic if the caller is trying to be "optimal" by >> specifying a power of two in the `size` parameter of the `odp_queue_param_t` >> passed to `odp_queue_create()`. For this reason we may wish to return a >> `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this >> patch series. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> This only works if `size` is a power of 2. Should be documented as such, >>> since this is an internal routine. In this case an `ODP_ASSERT(size == >>> ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging >>> aid. Bill Fischofer(Bill-Fischofer-Linaro) wrote: should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may be misinterpreted as overly large since it's `uint32_t`. Note that GCC and clang recognize `abs()` and treat it as a builtin, so there's no actual `stdlib.h` call here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169819392 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/odp_pool.c line 28 @@ -296,7 +282,9 @@ static void init_buffers(pool_t *pool) memset(buf_hdr, 0, (uintptr_t)data - (uintptr_t)buf_hdr); /* Initialize buffer metadata */ - buf_hdr->index = i; + buf_hdr->index.u32= 0; + buf_hdr->index.pool = pool->pool_idx; + buf_hdr->index.buffer = i; Comment: Since you're initializing `index.pool` and `index.buffer` there's no need to set `index.u32` here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > We originally had this index partitioning based on `ODP_CONFIG_POOLS`. Do we > want to return to that here? > > If not, we at least need an `ODP_STATIC_ASSERT()` to ensure that > `ODP_CONFIG_POOLS < 256` or else bad things will happen here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> This routine can be optimized to: >> ``` >> return ring->head == ring->tail; >> ``` >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Your invariant is the queue is empty when `head == tail` therefore the >>> queue is full when `abs(tail - head) == mask`, so the correct calculation >>> here is: >>> >>> `num = mask - abs(tail - head);` >>> >>> The effect is that a queue can only hold `size - 1` elements, otherwise you >>> cannot distinguish between a full and an empty queue without another bit of >>> metadata, which is a cost you're trying to avoid. >>> >>> This is somewhat problematic if the caller is trying to be "optimal" by >>> specifying a power of two in the `size` parameter of the >>> `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we may >>> wish to return a `max_size` of a power of 2 - 1 in `odp_queue_capability()` >>> as part of this patch series. Bill Fischofer(Bill-Fischofer-Linaro) wrote: This only works if `size` is a power of 2. Should be documented as such, since this is an internal routine. In this case an `ODP_ASSERT(size == ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging aid. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > should be `num = abs(tail - head);` to deal with wrap arounds, otherwise > may be misinterpreted as overly large since it's `uint32_t`. Note that > GCC and clang recognize `abs()` and treat it as a builtin, so there's no > actual `stdlib.h` call here. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169821150 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 78 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; + + /* Empty */ + if (num == 0) + return 0; + + if (num > max_num) + num = max_num; + + idx = head & mask; + + for (i = 0; i < num; i++) { + data[i] = ring->data[idx]; + idx = (idx + 1) & mask; + } + + ring->head = head + num; + + return num; +} + +/* Enqueue data into the ring tail. Num_data is smaller than ring size. */ +static inline uint32_t ring_st_enq_multi(ring_st_t *ring, const uint32_t data[], +uint32_t num_data) +{ + uint32_t head, tail, mask, size, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + size = mask + 1; + num = size - (tail - head); Comment: Your invariant is the queue is empty when `head == tail` therefore the queue is full when `abs(tail - head) == mask`, so the correct calculation here is: `num = mask - abs(tail - head);` The effect is that a queue can only hold `size - 1` elements, otherwise you cannot distinguish between a full and an empty queue without another bit of metadata, which is a cost you're trying to avoid. This is somewhat problematic if the caller is trying to be "optimal" by specifying a power of two in the `size` parameter of the `odp_queue_param_t` passed to `odp_queue_create()`. For this reason we may wish to return a `max_size` of a power of 2 - 1 in `odp_queue_capability()` as part of this patch series. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > This only works if `size` is a power of 2. Should be documented as such, > since this is an internal routine. In this case an `ODP_ASSERT(size == > ROUNDUP_POWER2_U32(size))` for this requirement would be a useful debugging > aid. >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may >> be misinterpreted as overly large since it's `uint32_t`. Note that GCC and >> clang recognize `abs()` and treat it as a builtin, so there's no actual >> `stdlib.h` call here. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169794712 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 24 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + Comment: Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169736572 updated_at 2018-02-22 09:32:40
Re: [lng-odp] [PATCH v2] Ring implementation of queues
Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], +uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head; Comment: should be `num = abs(tail - head);` to deal with wrap arounds, otherwise may be misinterpreted as overly large since it's `uint32_t`. Note that GCC and clang recognize `abs()` and treat it as a builtin, so there's no actual `stdlib.h` call here. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169793454 updated_at 2018-02-22 09:32:40