Hi Andre,

When you say you took down the linux router, what does that mean? Did you stop the BGP daemon? Reboot it? Or take down the interface it uses to communicate with the VPP router?

The symptoms you describe are reminiscent of an issue mentioned in a comment here:
https://github.com/FDio/vpp/blob/master/src/plugins/linux-cp/lcp_router.c#L516

The gist is that in some cases, when the link on an interface goes down, the kernel automatically removes routes resolving through that interface without sending an RTM_DELROUTE, so linux-cp receives no indication that the route should be deleted. If your VPP router is directly attached to your linux router, and the interface on the linux router side went down and took the VPP router interface's link down, this may be your problem.

At the time that comment was added, I believe this affected FRR but did not affect bird. But it's possible that the behavior of bird or the linux kernel has changed since then.

An option was added to startup.conf for linux-cp which automatically deletes routes that resolve through an interface when the link state of that interface goes down. You could try adding del-dynamic-on-link-down to the linux-cp section of your startup.conf and see if that helps.

If that option doesn't help, you could get more information on exactly which routes are being added/replaced/deleted via one of the following:

- vppctl set logging class linux-cp log-level debug syslog-level debug
- ip link add dev nlmon0 type nlmon; ip link set dev nlmon0 up; tcpdump -w nlmon.pcap -i nlmon0

> I see many log messages like this with "journalctl -xeu vpp", but I'm not
> sure if they are relevant.
>
> Aug 07 12:48:45 router1 vpp[3332459]: linux-cp/router: Failed to delete
> neighbor: <some-ip-address> BondEthernet0

Those messages are noise. When the kernel announces that it has deleted its neighbor entry for some IP address and VPP has no neighbor cache entry for that address, the delete operation fails. The failure is inconsequential; there are several legitimate reasons why VPP would not have a neighbor entry (e.g. linux's neighbor entry was not fully resolved yet and was in some intermediate state). The log level for the message written when that happens should probably be changed to debug instead of notice.

-Matt

On Thu, Aug 7, 2025 at 8:00 AM Andre Nathan via lists.fd.io <andre= digirati.com...@lists.fd.io> wrote:

> Hi Stanislav
>
> On 8/7/25 7:27 AM, Stanislav Zaikin via lists.fd.io wrote:
> > Hello Andre,
> >
> > I'd suggest upgrading to the latest stable release.
>
> I'm going to try the upgrade as soon as possible.
>
> > For wrong ECMP: are you sure you have different metrics in Linux? can
> > you show the output of ip route?
>
> Those routes are not in Linux, though they remain in the vpp fib, so I
> can't see what they look like:
>
> # ip route | grep 'via a.b.c.d' | wc -l
> # 0
> # vppctl show ip fib | grep 'via a.b.c.d' | wc -l
> 788
>
> > For stale entries: did you adjust sysctl to support big buffers for
> > netlink socket? Do you see in lcp log messages about re-synchronization?
>
> I have net.core.rmem_default, net.core.wmem_default, net.core.rmem_max
> and net.core.wmem_max all set to 67108864. Do you think increasing them
> further can be helpful? Are there other sysctls I should increase?
>
> I see many log messages like this with "journalctl -xeu vpp", but I'm
> not sure if they are relevant.
>
> Aug 07 12:48:45 router1 vpp[3332459]: linux-cp/router: Failed to delete
> neighbor: <some-ip-address> BondEthernet0
>
> Thanks,
> Andre
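
For reference, here is a minimal sketch of the startup.conf change suggested above. It assumes you have no other linux-cp options configured; if you already have a linux-cp section, add the option to that section rather than creating a second one.

```
linux-cp {
  # Delete dynamically learned routes that resolve through an
  # interface when that interface's link state goes down.
  del-dynamic-on-link-down
}
```

VPP reads startup.conf only at startup, so it needs a restart for the option to take effect.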