Hello VPP Developers,

We are writing to report a recurring VPP crash.

The crash occurs when we send traffic from the Linux host through an LCP
interface into a GRE tunnel terminated on VPP, for example:

ip netns exec vppDataplane ping 10.88.0.65

We've observed that pinging the tunnel directly from VPP's ping plugin
works correctly without causing a crash.

Here is some additional context about our environment and the steps we've
already taken:

System Details:

- VPP is running on a bare-metal server.

- We were unable to reproduce the issue on servers with a different CPU,
specifically Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz. There, LCP worked as
expected, and ping from Linux was successful.

Troubleshooting Steps Taken:

- We applied the recommended BIOS settings as per the performance
optimization guide on the fd.io wiki (
https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)),
but the issue persists.

- We have tried running VPP in single-threaded mode, reducing the allocated
memory, and adjusting various LCP settings. None of these actions resolved
the problem.

This leads us to believe the issue may be related to the interaction
between the LCP interface and the GRE encapsulation process, possibly
specific to certain hardware.

Error logs (newest entries first):

Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]:      from /lib/x86_64-linux-gnu/libc.so.6
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #5  0x000070f966729c3c __clone + 0x24c
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]:      from /lib/x86_64-linux-gnu/libc.so.6
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: #5  0x000070f966729c3c __clone + 0x24c
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]:      from /lib/x86_64-linux-gnu/libc.so.6
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #4  0x000070f96669caa4 pthread_condattr_setpshared + 0x684
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]:      from /lib/x86_64-linux-gnu/libc.so.6
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: #4  0x000070f96669caa4 pthread_condattr_setpshared + 0x684
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]:      from /lib/x86_64-linux-gnu/libvlib.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #3  0x000070f966a7f77e vlib_worker_thread_bootstrap_fn + 0x4e
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]:      from /lib/x86_64-linux-gnu/libvlib.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: #3  0x000070f966a7f77e vlib_worker_thread_bootstrap_fn + 0x4e
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]:      from /lib/x86_64-linux-gnu/libvlib.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #2  0x000070f966a3c53e vlib_exit_with_status + 0x375e
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]:      from /lib/x86_64-linux-gnu/libvlib.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: #2  0x000070f966a3c53e vlib_exit_with_status + 0x375e
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]:      from /lib/x86_64-linux-gnu/libvlib.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #1  0x000070f966a395ef vlib_exit_with_status + 0x80f
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]:      from /lib/x86_64-linux-gnu/libvlib.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: #1  0x000070f966a395ef vlib_exit_with_status + 0x80f
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]:      from /lib/x86_64-linux-gnu/libvnet.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: #0  0x000070f9681e0347 adj_l2_midchain_node_fn_skx + 0x737
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]:      from /lib/x86_64-linux-gnu/libvnet.so.25.06
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: #0  0x000070f9681e0347 adj_l2_midchain_node_fn_skx + 0x737
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: Code:  41 0f b7 4c 1c 46 48 83 f9 14 0f 85 ce 00 00 00 c4 c1 7a 6f
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: Code:  41 0f b7 4c 1c 46 48 83 f9 14 0f 85 ce 00 00 00 c4 c1 7a 6f
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: received signal SIGSEGV, PC 0x70f9681e0347, faulting address 0x71f47e8a37c6
Aug 18 11:23:12 net-chgr-vpp03 vpp[14019]: vpp[14019]: received signal SIGSEGV, PC 0x70f9681e0347, faulting address 0x71f47e8a37c6
Aug 18 11:21:32 net-chgr-vpp03 vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
Aug 18 11:21:32 net-chgr-vpp03 vpp[14019]: vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
Aug 18 11:21:32 net-chgr-vpp03 vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
Aug 18 11:21:32 net-chgr-vpp03 vpp[14019]: vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
Aug 18 11:21:30 net-chgr-vpp03 vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
Aug 18 11:21:30 net-chgr-vpp03 vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
Aug 18 11:21:30 net-chgr-vpp03 vpp[14019]: vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
Aug 18 11:21:30 net-chgr-vpp03 vpp[14019]: vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
Aug 18 11:21:28 net-chgr-vpp03 vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
Aug 18 11:21:28 net-chgr-vpp03 vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
Aug 18 11:21:28 net-chgr-vpp03 vpp[14019]: vpp[14019]: vlib/file: file error: nl_route_error_cb: Error polling netlink socket 1698
Aug 18 11:21:28 net-chgr-vpp03 vpp[14019]: vpp[14019]: nl/nl: Error polling netlink socket (fd 1698)
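
Every journal entry above appears twice, once with a single vpp[PID]: tag and
once with the tag doubled. To make the backtrace easier to read, here is a
minimal Python sketch that keeps only the single-tag copy and lists the frames
innermost first. It assumes each journal entry sits on one line (as journalctl
prints it, before any mail wrapping) and that every doubled-tag line has a
single-tag twin, which holds for the excerpt above but is not guaranteed in
general. The function names are our own and are not part of VPP.

```python
import re

# One journal entry, e.g.
#   "Aug 18 11:23:12 host vpp[14019]: vpp[14019]: #0  0x... sym + 0x737"
# The optional second "vpp[PID]: " tag marks the duplicated copy.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"vpp\[(?P<pid>\d+)\]: (?P<dup>vpp\[\d+\]: )?(?P<msg>.*)$"
)

# One backtrace frame, e.g. "#0  0x000070f9681e0347 some_symbol + 0x737"
FRAME_RE = re.compile(
    r"^#(?P<idx>\d+)\s+(?P<addr>0x[0-9a-f]+)\s+(?P<sym>\S+) \+ (?P<off>0x[0-9a-f]+)$"
)


def single_copy(lines):
    """Return (timestamp, message) pairs, dropping the doubled-tag copies."""
    out = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m["dup"] is None:
            out.append((m["ts"], m["msg"].strip()))
    return out


def backtrace(lines):
    """Return (index, symbol, offset) tuples sorted from frame #0 upwards."""
    frames = {}
    for _, msg in single_copy(lines):
        m = FRAME_RE.match(msg)
        if m:
            frames[int(m["idx"])] = (m["sym"], m["off"])
    return [(i, *frames[i]) for i in sorted(frames)]


if __name__ == "__main__":
    import sys

    # Usage: journalctl -u vpp --no-pager | python3 this_script.py
    for idx, sym, off in backtrace(sys.stdin):
        print(f"#{idx} {sym} + {off}")
```

Applied to the log above, this should list adj_l2_midchain_node_fn_skx as
frame #0, followed by the vlib and libc frames.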

The commands I used to configure the GRE tunnel:

create gre tunnel src 10.10.25.5 dst 10.10.35.5 instance 0

set interface state gre0 up

lcp create gre0 host-if gre0@vpp tun

set interface ip address gre0 10.88.0.64/31
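
For anyone who wants to replay the sequence, the steps can be sketched as a
small Python script. It only wraps the four CLI commands above and the ping
from the start of this mail. The -c 3 packet count and the dry-run switch are
our own additions for illustration. By default the script only prints the
commands, because on an affected host the final ping is expected to crash VPP.

```python
import subprocess

# VPP CLI commands, copied verbatim from this report.
VPP_COMMANDS = [
    "create gre tunnel src 10.10.25.5 dst 10.10.35.5 instance 0",
    "set interface state gre0 up",
    "lcp create gre0 host-if gre0@vpp tun",
    "set interface ip address gre0 10.88.0.64/31",
]

# Ping from the Linux namespace; "-c 3" is an assumption added here so the
# command terminates on its own.
PING = ["ip", "netns", "exec", "vppDataplane", "ping", "-c", "3", "10.88.0.65"]


def build_steps():
    """Return the argv list for every step, VPP configuration first."""
    steps = [["vppctl", *cmd.split()] for cmd in VPP_COMMANDS]
    steps.append(PING)
    return steps


def run(dry_run=True):
    """Print each step; execute it as well when dry_run is False."""
    for argv in build_steps():
        print(" ".join(argv))
        if not dry_run:
            result = subprocess.run(argv, check=False)
            print(f"  exit status: {result.returncode}")


if __name__ == "__main__":
    run(dry_run=True)
```

Calling run(dry_run=False) as root on a test host would execute the steps for
real.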


Linux distro is Ubuntu 24.04.2 LTS

Driver information for the GRE tunnel's exit interface:
driver: mlx5_core
version: 6.14.0-27-generic
firmware-version: 16.31.1014 (MT_0000000013)
expansion-rom-version:
bus-info: 0000:d8:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Affected host CPU info:

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   40
  On-line CPU(s) list:    0-39
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
    BIOS Model name:      Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz  CPU @ 2.1GHz
    BIOS CPU family:      179
    CPU family:           6
    Model:                85
    Thread(s) per core:   1
    Core(s) per socket:   20
    Socket(s):            2
    Stepping:             7
    CPU(s) scaling MHz:   71%
    CPU max MHz:          3900.0000
    CPU min MHz:          800.0000
    BogoMIPS:             4200.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
                          mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2
                          ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
                          art arch_perfmon pebs bts rep_good nopl xtopology
                          nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64
                          monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16
                          xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
                          tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
                          abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
                          intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced
                          tpr_shadow flexpriority ept vpid ept_ad fsgsbase
                          tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx
                          rdt_a avx512f avx512dq rdseed adx smap clflushopt
                          clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
                          xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
                          cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
                          hwp hwp_act_window hwp_epp hwp_pkg_req vnmi pku ospke
                          avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization:         VT-x
Caches (sum of all):
  L1d:                    1.3 MiB (40 instances)
  L1i:                    1.3 MiB (40 instances)
  L2:                     40 MiB (40 instances)
  L3:                     55 MiB (2 instances)
NUMA:
  NUMA node(s):           2
  NUMA node0 CPU(s):      0-19
  NUMA node1 CPU(s):      20-39
Vulnerabilities:
  Gather data sampling:   Vulnerable
  Ghostwrite:             Not affected
  Itlb multihit:          KVM: Mitigation: Split huge pages
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT disabled
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; Enhanced IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
  Srbds:                  Not affected
  Tsx async abort:        Mitigation; TSX disabled
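
One hedged observation on the hardware angle: the faulting frame is
adj_l2_midchain_node_fn_skx. As far as we understand VPP's multi-arch naming,
the _skx suffix marks the node variant built for Skylake-X class CPUs with
AVX-512, but we have not confirmed this against the VPP source, so please
treat it as an assumption. The Gold 6230 above advertises avx512f, while the
E5-2620 v2 predates AVX-512, so the two hosts may be running different
variants of the same node. The sketch below only reports which of the relevant
SIMD flags a host advertises, so the two machines can be compared. It does not
reproduce VPP's actual variant selection, and the function names are our own.

```python
import os


def parse_flags(cpuinfo_text):
    """Return the CPU flag set from /proc/cpuinfo or lscpu style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # /proc/cpuinfo uses "flags", lscpu uses "Flags"; accept both.
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def simd_summary(flags):
    """Report presence of the SIMD flags relevant to this comparison."""
    wanted = ("avx2", "avx512f", "avx512bw", "avx512vl")
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as fh:
            print(simd_summary(parse_flags(fh.read())))
```

Running it on both the affected and the unaffected server shows whether they
differ in AVX-512 support.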

root@localhost:~# vppctl show version verbose cmdline
Version: v25.06-release
Compiled by: root
Compile host: e29c327af67c
Compile date: 2025-06-25T13:23:10
Compile location: /w/workspace/vpp-merge-2506-ubuntu2404-x86_64
Compiler: Clang/LLVM 18.1.3 (1ubuntu1)
Current PID: 26500
Command line arguments: the full output of "vppctl show version verbose
cmdline" is in the attached file.

P.S.: I have not created a Jira ticket because jira.fd.io fails DNS
resolution.

Thank you for your time and consideration.

-- 

Kind regards,

Andrey Zelentsov

Network Engineer
root@localhost:~# vppctl show version verbose cmdline
Version:                  v25.06-release
Compiled by:              root
Compile host:             e29c327af67c
Compile date:             2025-06-25T13:23:10
Compile location:         /w/workspace/vpp-merge-2506-ubuntu2404-x86_64
Compiler:                 Clang/LLVM 18.1.3 (1ubuntu1)
Current PID:              26500
Command line arguments:
  /usr/bin/vpp
  unix
    {
    nodaemon
    log
    /var/log/vpp/vpp.log
    cli-listen
    /run/vpp/cli.sock
    full-coredump
    gid
    vpp
    startup-config
    /etc/vpp/startup.commands
    poll-sleep-usec
    1000
    }
  api-trace
    {
    on
    }
  api-segment
    {
    gid
    vpp
    }
  plugins
    {
    path
    /usr/lib/x86_64-linux-gnu/vpp_plugins/
    plugin
    default
      {
      disable
      }
    plugin
    linux_cp_plugin.so
      {
      enable
      }
    plugin
    linux_nl_plugin.so
      {
      enable
      }
    plugin
    acl_plugin.so
      {
      enable
      }
    plugin
    lldp_plugin.so
      {
      enable
      }
    plugin
    dpdk_plugin.so
      {
      enable
      }
    plugin
    flowprobe_plugin.so
      {
      enable
      }
    plugin
    ping_plugin.so
      {
      enable
      }
    plugin
    wireguard_plugin.so
      {
      enable
      }
    plugin
    lacp_plugin.so
      {
      enable
      }
    gre_plugin.so
      {
      enable
      }
    }
  memory
    {
    main-heap-size
    20G
    main-heap-page-size
    2M
    }
  buffers
    {
    buffers-per-numa
    2000000
    }
  statseg
    {
    size
    30G
    per-node-counters
    on
    }
  linux-cp
    {
    default
    netns
    vppDataplane
    lcp-sync
    }
  cpu
    {
    main-core
    0
    corelist-workers
    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
    }
  dpdk
    {
    dev
    0000:5e:00.0
      {
      name
      mlx5c0
      num-rx-queues
      40
      num-rx-desc
      4096
      }
    dev
    0000:5e:00.1
      {
      name
      mlx5c1
      num-rx-queues
      40
      num-rx-desc
      4096
      }
    dev
    0000:d8:00.0
      {
      name
      mlx5c2
      num-rx-queues
      40
      num-rx-desc
      4096
      }
    dev
    0000:d8:00.1
      {
      name
      mlx5c3
      num-rx-queues
      40
      num-rx-desc
      4096
      }}