Hello VPP Developers,

We are writing to report a recurring VPP crash.

The issue occurs when we send traffic from the Linux host through an LCP interface into a GRE tunnel terminated on VPP, for example:

    ip netns exec vppDataplane ping 10.88.0.65

Pinging the tunnel directly from VPP's ping plugin works correctly and does not cause a crash.

Here is some additional context about our environment and the steps we have already taken.

System Details:
- VPP is running on a bare-metal server.
- We were unable to reproduce the issue on servers with a different CPU, specifically Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz. There, LCP worked as expected and ping from Linux was successful.

Troubleshooting Steps Taken:
- We applied the recommended BIOS settings from the performance optimization guide on the fd.io wiki (https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)), but the issue persists.
- We tried running VPP in single-threaded mode, reducing the allocated memory, and adjusting various LCP settings. None of these actions resolved the problem.

This leads us to believe the issue may be related to the interaction between the LCP interface and the GRE encapsulation process, possibly specific to certain hardware.
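
Since the crash seems to depend on the CPU model, one way to narrow it down could be to compare the CPU feature flags of the affected and unaffected hosts. The faulting function in the error logs, adj_l2_midchain_node_fn_skx, carries an _skx suffix; we assume this marks VPP's Skylake-X (AVX-512) build variant of the node, but we have not confirmed that. The sketch below is hypothetical helper code (the function names are ours, not part of VPP): it reads the "Flags:" line from saved lscpu output and lists the flags that only the affected host reports. The unaffected host's flags in the example are placeholders, not captured from the real server.

```python
def parse_lscpu_flags(lscpu_text):
    """Return the set of CPU flags from the 'Flags:' line of lscpu output."""
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()


def flags_only_on(affected_text, unaffected_text):
    """Sorted flags present on the affected host but missing on the other."""
    affected = parse_lscpu_flags(affected_text)
    unaffected = parse_lscpu_flags(unaffected_text)
    return sorted(affected - unaffected)


if __name__ == "__main__":
    # Shortened input: a few flags taken from the affected host's lscpu output.
    affected = (
        "Model name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz\n"
        "Flags: fpu sse4_2 avx avx2 avx512f avx512bw avx512vl\n"
    )
    # Placeholder flags (assumption): replace with the real lscpu output
    # from the server where the crash does not reproduce.
    unaffected = (
        "Model name: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz\n"
        "Flags: fpu sse4_2 avx\n"
    )
    for flag in flags_only_on(affected, unaffected):
        print(flag)
```

If the difference turns out to be dominated by the avx512* flags, that would be consistent with the crash being tied to the AVX-512 code path, though it would not prove it.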

Error logs (newest entries first):

    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: from /lib/x86_64-linux-gnu/libc.so.6
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #5 0x000070f966729c3c __clone + 0x24c
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: from /lib/x86_64-linux-gnu/libc.so.6
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #4 0x000070f96669caa4 pthread_condattr_setpshared + 0x684
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: from /lib/x86_64-linux-gnu/libvlib.so.25.06
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #3 0x000070f966a7f77e vlib_worker_thread_bootstrap_fn + 0x4e
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: from /lib/x86_64-linux-gnu/libvlib.so.25.06
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #2 0x000070f966a3c53e vlib_exit_with_status + 0x375e
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: from /lib/x86_64-linux-gnu/libvlib.so.25.06
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #1 0x000070f966a395ef vlib_exit_with_status + 0x80f
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: from /lib/x86_64-linux-gnu/libvnet.so.25.06
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #0 0x000070f9681e0347 adj_l2_midchain_node_fn_skx + 0x737
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: Code: 41 0f b7 4c 1c 46 48 83 f9 14 0f 85 ce 00 00 00 c4 c1 7a 6f
    Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: received signal SIGSEGV, PC 0x70f9681e0347, faulting address 0x71f47e8a37c6
    Aug 18 11:21:32 net-chgr-vpp03 vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
    Aug 18 11:21:32 net-chgr-vpp03 vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
    Aug 18 11:21:30 net-chgr-vpp03 vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
    Aug 18 11:21:30 net-chgr-vpp03 vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
    Aug 18 11:21:28 net-chgr-vpp03 vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
    Aug 18 11:21:28 net-chgr-vpp03 vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)

The commands that I used to configure the GRE tunnel:

    create gre tunnel src 10.10.25.5 dst 10.10.35.5 instance 0
    set interface state gre0 up
    lcp create gre0 host-if gre0@vpp tun
    set interface ip address gre0 10.88.0.64/31

Linux distro is Ubuntu 24.04.2 LTS.

Exit interface info for the GRE tunnel:

    driver: mlx5_core
    version: 6.14.0-27-generic
    firmware-version: 16.31.1014 (MT_0000000013)
    expansion-rom-version:
    bus-info: 0000:d8:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: yes

Affected host CPU info:

    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Address sizes: 46 bits physical, 48 bits virtual
    Byte Order: Little Endian
    CPU(s): 40
    On-line CPU(s) list: 0-39
    Vendor ID: GenuineIntel
    BIOS Vendor ID: Intel(R) Corporation
    Model name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
    BIOS Model name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz CPU @ 2.1GHz
    BIOS CPU family: 179
    CPU family: 6
    Model: 85
    Thread(s) per core: 1
    Core(s) per socket: 20
    Socket(s): 2
    Stepping: 7
    CPU(s) scaling MHz: 71%
    CPU max MHz: 3900.0000
    CPU min MHz: 800.0000
    BogoMIPS: 4200.00
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req vnmi pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
    Virtualization features:
      Virtualization: VT-x
    Caches (sum of all):
      L1d: 1.3 MiB (40 instances)
      L1i: 1.3 MiB (40 instances)
      L2: 40 MiB (40 instances)
      L3: 55 MiB (2 instances)
    NUMA:
      NUMA node(s): 2
      NUMA node0 CPU(s): 0-19
      NUMA node1 CPU(s): 20-39
    Vulnerabilities:
      Gather data sampling: Vulnerable
      Ghostwrite: Not affected
      Itlb multihit: KVM: Mitigation: Split huge pages
      L1tf: Not affected
      Mds: Not affected
      Meltdown: Not affected
      Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
      Reg file data sampling: Not affected
      Retbleed: Mitigation; Enhanced IBRS
      Spec rstack overflow: Not affected
      Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
      Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
      Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
      Srbds: Not affected
      Tsx async abort: Mitigation; TSX disabled

VPP version:

    root@localhost:~# vppctl show version verbose cmdline
    Version: v25.06-release
    Compiled by: root
    Compile host: e29c327af67c
    Compile date: 2025-06-25T13:23:10
    Compile location: /w/workspace/vpp-merge-2506-ubuntu2404-x86_64
    Compiler: Clang/LLVM 18.1.3 (1ubuntu1)
    Current PID: 26500
    Command line arguments:

The extensive output of the vppctl show version verbose command is in the attached file.

P.S. I have not created a Jira ticket because jira.fd.io fails DNS resolution.

Thank you for your time and consideration.

--
Kind regards,
Andrey Zelentsov
Network Engineer

    root@localhost:~# vppctl show version verbose cmdline
    Version: v25.06-release
    Compiled by: root
    Compile host: e29c327af67c
    Compile date: 2025-06-25T13:23:10
    Compile location: /w/workspace/vpp-merge-2506-ubuntu2404-x86_64
    Compiler: Clang/LLVM 18.1.3 (1ubuntu1)
    Current PID: 26500
    Command line arguments:
      /usr/bin/vpp
      unix { nodaemon log /var/log/vpp/vpp.log cli-listen /run/vpp/cli.sock full-coredump gid vpp startup-config /etc/vpp/startup.commands poll-sleep-usec 1000 }
      api-trace { on }
      api-segment { gid vpp }
      plugins {
        path /usr/lib/x86_64-linux-gnu/vpp_plugins/
        plugin default { disable }
        plugin linux_cp_plugin.so { enable }
        plugin linux_nl_plugin.so { enable }
        plugin acl_plugin.so { enable }
        plugin lldp_plugin.so { enable }
        plugin dpdk_plugin.so { enable }
        plugin flowprobe_plugin.so { enable }
        plugin ping_plugin.so { enable }
        plugin wireguard_plugin.so { enable }
        plugin lacp_plugin.so { enable }
        gre_plugin.so { enable }
      }
      memory { main-heap-size 20G main-heap-page-size 2M }
      buffers { buffers-per-numa 2000000 }
      statseg { size 30G per-node-counters on }
      linux-cp { default netns vppDataplane lcp-sync }
      cpu { main-core 0 corelist-workers 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39 }
      dpdk {
        dev 0000:5e:00.0 { name mlx5c0 num-rx-queues 40 num-rx-desc 4096 }
        dev 0000:5e:00.1 { name mlx5c1 num-rx-queues 40 num-rx-desc 4096 }
        dev 0000:d8:00.0 { name mlx5c2 num-rx-queues 40 num-rx-desc 4096 }
        dev 0000:d8:00.1 { name mlx5c3 num-rx-queues 40 num-rx-desc 4096 }
      }
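
To compare crashes across runs, the journal lines quoted in the error logs of this message could be reduced to a short summary. The following is a minimal sketch, assuming the log format matches the lines quoted here ("received signal ..., PC ..., faulting address ..." and "#N 0x... symbol + 0x..."); the function name summarize_crash is hypothetical and not part of VPP. Frames that the journal recorded twice are collapsed by frame number.

```python
import re

# Patterns written against the journal lines quoted in this message
# (assumption: other VPP builds may format these lines differently).
SIGNAL_RE = re.compile(
    r"received signal (?P<signal>\w+), PC (?P<pc>0x[0-9a-f]+), "
    r"faulting address (?P<addr>0x[0-9a-f]+)"
)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x[0-9a-f]+\s+(?P<symbol>\S+) \+ (?P<offset>0x[0-9a-f]+)"
)


def summarize_crash(log_text):
    """Return (signal_info, frames) extracted from journal text.

    signal_info is a dict with keys signal/pc/addr, or None if no
    signal line is found. frames is a list of (index, symbol, offset)
    tuples sorted by frame index, with repeated frames collapsed.
    """
    match = SIGNAL_RE.search(log_text)
    signal_info = match.groupdict() if match else None

    frames = {}
    for m in FRAME_RE.finditer(log_text):
        frames[int(m.group("index"))] = (m.group("symbol"), m.group("offset"))

    ordered = [(index,) + frames[index] for index in sorted(frames)]
    return signal_info, ordered


if __name__ == "__main__":
    sample = (
        "Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: "
        "#1 0x000070f966a395ef vlib_exit_with_status + 0x80f\n"
        "Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: "
        "#0 0x000070f9681e0347 adj_l2_midchain_node_fn_skx + 0x737\n"
        "Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: received signal SIGSEGV, "
        "PC 0x70f9681e0347, faulting address 0x71f47e8a37c6\n"
    )
    signal_info, frames = summarize_crash(sample)
    print(signal_info)
    for index, symbol, offset in frames:
        print(index, symbol, offset)
```

Searching the whole text rather than line by line keeps the sketch usable even when the log has been flattened onto a single line by a mail client.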