In the vPMD, when an Rx descriptor is loaded with _mm_loadu_si128,
the volatile pointer is cast to a non-volatile pointer, so GCC is
free to reorder the load instructions. However, the correctness of
the Rx read path relies on these loads being executed in strict
backward order (descs[3] first, descs[0] last), so add compiler
barriers to prevent the compiler from reordering them.

Fixes: c95584dc2b18 ("ixgbe: new vectorized functions for Rx/Tx")

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---

v2:
- fix check-git-log.sh warning.
- add a more detailed commit message.
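
For illustration, here is a minimal standalone sketch (not part of the
patch) of the problem and the fix. The barrier macro below mirrors what
DPDK's rte_compiler_barrier() expands to (an empty asm with a "memory"
clobber), and load_two_descs() is a hypothetical stand-in for the
descriptor loads in _recv_raw_pkts_vec():

    #include <emmintrin.h>

    /* Same effect as DPDK's rte_compiler_barrier(): the empty asm with
     * a "memory" clobber forbids the compiler from moving memory
     * accesses across it; no CPU fence instruction is emitted. */
    #define compiler_barrier() asm volatile ("" : : : "memory")

    /* Hypothetical stand-in for the descriptor loads in the vPMD. */
    static inline void
    load_two_descs(volatile __m128i *rxdp, __m128i descs[2])
    {
            /* The cast drops the volatile qualifier, so without the
             * barrier GCC may legally load rxdp[0] before rxdp[1]. */
            descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
            compiler_barrier(); /* keep the backward load order */
            descs[0] = _mm_loadu_si128((__m128i *)(rxdp + 0));
    }

A compiler barrier is sufficient here because x86 does not reorder
loads against other loads at the hardware level; only the compiler's
freedom to reschedule the instructions has to be constrained.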

 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index ad8a9ff..abbf284 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -343,6 +343,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
                /* Read desc statuses backwards to avoid race condition */
                /* A.1 load 4 pkts desc */
                descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+               rte_compiler_barrier();

                /* B.2 copy 2 mbuf point into rx_pkts  */
                _mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
@@ -351,8 +352,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
                mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos+2]);

                descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+               rte_compiler_barrier();
                /* B.1 load 2 mbuf point */
                descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+               rte_compiler_barrier();
                descs[0] = _mm_loadu_si128((__m128i *)(rxdp));

                /* B.2 copy 2 mbuf point into rx_pkts  */
-- 
2.7.4
