** Description changed:

+ [Impact]
+ 
+  * Crashes on certain Skylake chips
+ 
+  * Follow upstream by disabling one of the gcc options
+ 
+ [Test Case]
+ 
+  * Part of MRE bug 1817675, following the MRE verification process as
+    defined there.
+ 
+ [Regression Potential]
+ 
+  * Rebuilds of code using the DPDK headers will be slightly slower
+    (the feature is no longer used) but avoid the crash. The slowdown
+    should be negligible in most cases, and avoiding the crash outweighs it.
+ 
+ [Other Info]
+  
+  * n/a
+ 
+ ---
+ 
  Hi, Christian
  
  We've recently encountered a weird issue with Ubuntu 18.04 on the Skylake
  server. I can always reproduce this crash and I have narrowed it down. I guess
  it could be a GCC issue.
- 
  
  [1] How to reproduce
  - ConnectX-4Lx/ConnectX-5 with mlx5 PMD in DPDK 18.02.1
  - Ubuntu 18.04 on Intel Skylake server
  - gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
  - Testpmd crashes when it starts to forward traffic. Easy to reproduce.
  - Only happens on the Skylake server.
  - DPDK 18.05 and later don't have such issue. git-bisect gives no clue.
  
  This is because I enabled MEMPOOL_DEBUG and MLX5_DEBUG. As mempool/rte_memcpy is
  an inlined function, it should be affected. Now I can see the crash regardless
  of the version: 18.02, 18.05 and 18.08.
  
  [2] Failure point
  
  The attached patch gives some insight into why it crashes. The following is the
  result of the patch and the GDB commands.
  
  In summary, rte_memcpy() doesn't work as expected. In __mempool_generic_put(),
  there's rte_memcpy() to move the array of objects to the lcore cache. If I run
  memcmp() right after rte_memcpy(dst, src, n), data in dst differs from data in
  src. It looks like some of the data got shifted by a few bytes, as you can see
  below.
  
-       [GDB command]
-       $dst = 0x7ffff4e09ea8
-       $src = 0x7fffce3fb970
-       $n = 256
-       x/32gx 0x7ffff4e09ea8
-       x/32gx 0x7fffce3fb970
-       testpmd: /home/mlnxtest/dpdk/build/include/rte_mempool.h:1140: __mempool_generic_put: Assertion `0' failed.
+  [GDB command]
+  $dst = 0x7ffff4e09ea8
+  $src = 0x7fffce3fb970
+  $n = 256
+  x/32gx 0x7ffff4e09ea8
+  x/32gx 0x7fffce3fb970
+  testpmd: /home/mlnxtest/dpdk/build/include/rte_mempool.h:1140: __mempool_generic_put: Assertion `0' failed.
  
-       Thread 4 "lcore-slave-1" received signal SIGABRT, Aborted.
-       [Switching to Thread 0x7fffce3ff700 (LWP 69913)]
-       (gdb) x/32gx 0x7ffff4e09ea8
-       0x7ffff4e09ea8: 0x00007fffaac38ec0      0x00007fffaac38500
-       0x7ffff4e09eb8: 0x00007fffaac37b40      0x00007fffaac37180
-       0x7ffff4e09ec8: 0x850000007fffaac3      0x7b4000007fffaac3
-       0x7ffff4e09ed8: 0x00007fffaac35440      0x00007fffaac34a80
-       0x7ffff4e09ee8: 0xaac3850000007fff      0xaac37b4000007fff
-       0x7ffff4e09ef8: 0x00007fffaac32d40      0x00007fffaac32380
-       0x7ffff4e09f08: 0x7fffaac385000000      0x7fffaac37b400000
-       0x7ffff4e09f18: 0x00007fffaac30640      0x00007fffaac2fc80
-       0x7ffff4e09f28: 0x00007fffaac2f2c0      0x00007fffaac2e900
-       0x7ffff4e09f38: 0x00007fffaac2df40      0x00007fffaac2d580
-       0x7ffff4e09f48: 0x00007fffaac2cbc0      0x00007fffaac2c200
-       0x7ffff4e09f58: 0x00007fffaac2b840      0x00007fffaac2ae80
-       0x7ffff4e09f68: 0x00007fffaac2a4c0      0x00007fffaac29b00
-       0x7ffff4e09f78: 0x00007fffaac29140      0x00007fffaac28780
-       0x7ffff4e09f88: 0x00007fffaac27dc0      0x00007fffaac27400
-       0x7ffff4e09f98: 0x00007fffaac26a40      0x00007fffaac26080
-       (gdb) x/32gx 0x7fffce3fb970
-       0x7fffce3fb970: 0x00007fffaac38ec0      0x00007fffaac38500
-       0x7fffce3fb980: 0x00007fffaac37b40      0x00007fffaac37180
-       0x7fffce3fb990: 0x00007fffaac367c0      0x00007fffaac35e00
-       0x7fffce3fb9a0: 0x00007fffaac35440      0x00007fffaac34a80
-       0x7fffce3fb9b0: 0x00007fffaac340c0      0x00007fffaac33700
-       0x7fffce3fb9c0: 0x00007fffaac32d40      0x00007fffaac32380
-       0x7fffce3fb9d0: 0x00007fffaac319c0      0x00007fffaac31000
-       0x7fffce3fb9e0: 0x00007fffaac30640      0x00007fffaac2fc80
-       0x7fffce3fb9f0: 0x00007fffaac2f2c0      0x00007fffaac2e900
-       0x7fffce3fba00: 0x00007fffaac2df40      0x00007fffaac2d580
-       0x7fffce3fba10: 0x00007fffaac2cbc0      0x00007fffaac2c200
-       0x7fffce3fba20: 0x00007fffaac2b840      0x00007fffaac2ae80
-       0x7fffce3fba30: 0x00007fffaac2a4c0      0x00007fffaac29b00
-       0x7fffce3fba40: 0x00007fffaac29140      0x00007fffaac28780
-       0x7fffce3fba50: 0x00007fffaac27dc0      0x00007fffaac27400
-       0x7fffce3fba60: 0x00007fffaac26a40      0x00007fffaac26080
- 
+  Thread 4 "lcore-slave-1" received signal SIGABRT, Aborted.
+  [Switching to Thread 0x7fffce3ff700 (LWP 69913)]
+  (gdb) x/32gx 0x7ffff4e09ea8
+  0x7ffff4e09ea8: 0x00007fffaac38ec0      0x00007fffaac38500
+  0x7ffff4e09eb8: 0x00007fffaac37b40      0x00007fffaac37180
+  0x7ffff4e09ec8: 0x850000007fffaac3      0x7b4000007fffaac3
+  0x7ffff4e09ed8: 0x00007fffaac35440      0x00007fffaac34a80
+  0x7ffff4e09ee8: 0xaac3850000007fff      0xaac37b4000007fff
+  0x7ffff4e09ef8: 0x00007fffaac32d40      0x00007fffaac32380
+  0x7ffff4e09f08: 0x7fffaac385000000      0x7fffaac37b400000
+  0x7ffff4e09f18: 0x00007fffaac30640      0x00007fffaac2fc80
+  0x7ffff4e09f28: 0x00007fffaac2f2c0      0x00007fffaac2e900
+  0x7ffff4e09f38: 0x00007fffaac2df40      0x00007fffaac2d580
+  0x7ffff4e09f48: 0x00007fffaac2cbc0      0x00007fffaac2c200
+  0x7ffff4e09f58: 0x00007fffaac2b840      0x00007fffaac2ae80
+  0x7ffff4e09f68: 0x00007fffaac2a4c0      0x00007fffaac29b00
+  0x7ffff4e09f78: 0x00007fffaac29140      0x00007fffaac28780
+  0x7ffff4e09f88: 0x00007fffaac27dc0      0x00007fffaac27400
+  0x7ffff4e09f98: 0x00007fffaac26a40      0x00007fffaac26080
+  (gdb) x/32gx 0x7fffce3fb970
+  0x7fffce3fb970: 0x00007fffaac38ec0      0x00007fffaac38500
+  0x7fffce3fb980: 0x00007fffaac37b40      0x00007fffaac37180
+  0x7fffce3fb990: 0x00007fffaac367c0      0x00007fffaac35e00
+  0x7fffce3fb9a0: 0x00007fffaac35440      0x00007fffaac34a80
+  0x7fffce3fb9b0: 0x00007fffaac340c0      0x00007fffaac33700
+  0x7fffce3fb9c0: 0x00007fffaac32d40      0x00007fffaac32380
+  0x7fffce3fb9d0: 0x00007fffaac319c0      0x00007fffaac31000
+  0x7fffce3fb9e0: 0x00007fffaac30640      0x00007fffaac2fc80
+  0x7fffce3fb9f0: 0x00007fffaac2f2c0      0x00007fffaac2e900
+  0x7fffce3fba00: 0x00007fffaac2df40      0x00007fffaac2d580
+  0x7fffce3fba10: 0x00007fffaac2cbc0      0x00007fffaac2c200
+  0x7fffce3fba20: 0x00007fffaac2b840      0x00007fffaac2ae80
+  0x7fffce3fba30: 0x00007fffaac2a4c0      0x00007fffaac29b00
+  0x7fffce3fba40: 0x00007fffaac29140      0x00007fffaac28780
+  0x7fffce3fba50: 0x00007fffaac27dc0      0x00007fffaac27400
+  0x7fffce3fba60: 0x00007fffaac26a40      0x00007fffaac26080
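
The check added by the debugging patch in this report can be reproduced outside DPDK. The following is a minimal sketch, not the reporter's code: it uses plain memcpy() in place of rte_memcpy() so that it builds without DPDK, and the helper names first_mismatch() and copy_and_verify() are hypothetical. It reports the first byte offset at which the copy differs from the source; in the dump above, the first differing quadword sits at offset 0x20 from $dst.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK): return the offset of the first
 * byte at which dst and src differ, or n if the buffers are identical.
 */
static size_t first_mismatch(const void *dst, const void *src, size_t n)
{
	const unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < n; i++)
		if (d[i] != s[i])
			return i;
	return n;
}

/*
 * Copy n bytes, then verify the copy. memcpy() is an assumption standing in
 * for rte_memcpy(). Returns 0 if the copy matches, -1 otherwise.
 */
static int copy_and_verify(void *dst, const void *src, size_t n)
{
	size_t off;

	memcpy(dst, src, n);
	off = first_mismatch(dst, src, n);
	if (off != n) {
		fprintf(stderr, "mismatch at byte offset %zu of %zu\n", off, n);
		return -1;
	}
	return 0;
}
```

To exercise the code path from this report, one would swap memcpy() for rte_memcpy() and build against the DPDK headers with the same compiler flags as the failing build.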
  
  AFAIK, AVX512F support is disabled by default in DPDK as it is still
  experimental (CONFIG_RTE_ENABLE_AVX512=n). But with gcc optimization, the AVX2
  version of rte_memcpy() seems to be compiled to 512-bit instructions. If I
  disable this by adding EXTRA_CFLAGS="-mno-avx512f", then it works fine and
  doesn't crash.
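
Whether the compiler is allowed to emit AVX-512F instructions for a given build can be checked from the predefined macro __AVX512F__, which GCC defines when -mavx512f is in effect (directly or through a -march setting that implies it). A minimal sketch, with hypothetical function names, that could be compiled with the same flags as the DPDK build:

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if this translation unit was compiled with
 * AVX-512F code generation enabled (__AVX512F__ predefined), else 0.
 */
static int built_with_avx512f(void)
{
#ifdef __AVX512F__
	return 1;
#else
	return 0;
#endif
}

/* Print the compile-time AVX-512F setting for this translation unit. */
static void report_avx512f(void)
{
	printf("AVX-512F code generation: %s\n",
	       built_with_avx512f() ? "enabled" : "disabled");
}
```

Built with -mno-avx512f, the function should return 0. That the macro state matches what the optimizer actually emitted for rte_memcpy() is an assumption that a disassembly would need to confirm.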
  
  Do you have any idea regarding this issue or are you already aware of
  it?
  
- 
  Thanks,
  Yongseok
- 
  
  $ git diff
  diff --git a/config/common_base b/config/common_base
  index ad03cf433..f512b5a88 100644
  --- a/config/common_base
  +++ b/config/common_base
  @@ -275,8 +275,8 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
-  #
-  # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
-  #
+  #
+  # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
+  #
  -CONFIG_RTE_LIBRTE_MLX5_PMD=n
  -CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
  +CONFIG_RTE_LIBRTE_MLX5_PMD=y
  +CONFIG_RTE_LIBRTE_MLX5_DEBUG=y
-  CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=n
-  CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
+  CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=n
+  CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
  
  @@ -597,7 +597,7 @@ CONFIG_RTE_RING_USE_C11_MEM_MODEL=n
-  #
-  CONFIG_RTE_LIBRTE_MEMPOOL=y
-  CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
+  #
+  CONFIG_RTE_LIBRTE_MEMPOOL=y
+  CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
  -CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
  +CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y
  
-  #
-  # Compile Mempool drivers
+  #
+  # Compile Mempool drivers
  diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
  index 8b1b7f7ed..9f48028d9 100644
  --- a/lib/librte_mempool/rte_mempool.h
  +++ b/lib/librte_mempool/rte_mempool.h
  @@ -39,6 +39,7 @@
-  #include <errno.h>
-  #include <inttypes.h>
-  #include <sys/queue.h>
+  #include <errno.h>
+  #include <inttypes.h>
+  #include <sys/queue.h>
  +#include <assert.h>
  
-  #include <rte_config.h>
-  #include <rte_spinlock.h>
+  #include <rte_config.h>
+  #include <rte_spinlock.h>
  @@ -1123,6 +1124,22 @@ __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
-         /* Add elements back into the cache */
-         rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
+         /* Add elements back into the cache */
+         rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
  
  +       if(memcmp(&cache_objs[0], obj_table, sizeof(void *) * n)) {
  +               printf("[GDB command] \n"
  +                      "$dst = %p\n"
  +                      "$src = %p\n"
  +                      "$n = %ld\n"
  +                      "x/%ldgx %p\n"
  +                      "x/%ldgx %p\n",
  +                      (void *)&cache_objs[0],
  +                      (const void *)obj_table,
  +                      sizeof(void *) * n,
  +                      sizeof(void *) * n / 8, (void *)&cache_objs[0],
  +                      sizeof(void *) * n / 8, (const void *)obj_table
  +                      );
  +               assert(0);
  +       }
  +
-         cache->len += n;
+         cache->len += n;
  
-         if (cache->len >= cache->flushthresh) {
+         if (cache->len >= cache->flushthresh) {

-- 
https://bugs.launchpad.net/bugs/1799397

Title:
  [dpdk]rte_memcpy() moves data incorrectly on Ubuntu 18.04 on    Intel
  Skylake.
