Hi Pim and Andrew,

Thanks for the help! It turns out the stats memory was what I had left out. After increasing it to 128M, I was able to import full v4 and v6 tables without problems. As an aside, is the netlink plugin scheduled for an upcoming release, or is the interface still experimental?

Many thanks,
Nate


On Thu, May 27, 2021 at 11:36 am, Pim van Pelt <p...@ipng.nl> wrote:
Hoi Nate,

Further to what Andrew suggested, there are a few more hints I can offer:

1) Make sure there is enough netlink socket buffer space by adding this to your sysctl settings:
cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf
# Increase netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
sysctl -p /etc/sysctl.d/81-VPP-netlink.conf

2) Ensure there is enough memory by adding this to VPP's startup config:
memory {
  main-heap-size 2G
  main-heap-page-size default-hugepage
}

3) Many prefixes (like a full BGP routing table) will need more stats memory, so increase that too in VPP's startup config:
statseg {
  size 128M
}

And in case you missed it, make sure to create the linux-cp devices in a separate namespace by adding this to the startup config:
linux-cp {
  default netns dataplane
}

Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I tested extensively with FRR and Bird2, and so far had good success.
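
As a quick sanity check, the socket buffer settings from hint 1 can be read back from /proc/sys. This is only a minimal sketch: `check_sysctls` and its `root` parameter are hypothetical names chosen for illustration, not part of VPP or any existing tool, and the expected values are simply the 64M figures (64 * 1024 * 1024 = 67108864) from the sysctl snippet.

```python
from pathlib import Path

# Expected values, taken from the sysctl snippet in hint 1 (64 MiB each).
EXPECTED = {
    "net.core.rmem_default": 67108864,
    "net.core.wmem_default": 67108864,
    "net.core.rmem_max": 67108864,
    "net.core.wmem_max": 67108864,
}


def check_sysctls(expected=EXPECTED, root="/proc/sys"):
    """Return {key: actual value or None} for every key that differs from `expected`.

    `root` is a hypothetical parameter so the check can be pointed at a
    test directory instead of the live /proc/sys tree.
    """
    mismatches = {}
    for key, want in expected.items():
        # "net.core.rmem_max" maps to <root>/net/core/rmem_max
        path = Path(root, *key.split("."))
        try:
            have = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            have = None  # missing, unreadable, or not a single integer
        if have != want:
            mismatches[key] = have
    return mismatches
```

Calling `check_sysctls()` on the router should return an empty dict once the settings have been applied; any key it does return shows the value the kernel currently has.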

groet,
Pim

On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko <ayour...@gmail.com> wrote:
I would guess from your traceback you are running out of memory, so increasing the main heap size to something like 4x could help…

--a

On 27 May 2021, at 08:29, Nate Sales <n...@natesales.net> wrote:


Hello,

I'm having some trouble with the linux-cp netlink plugin. After building it from the patch set (<https://gerrit.fd.io/r/c/vpp/+/31122>), it correctly receives netlink messages and inserts routes from the Linux kernel table into the VPP FIB. However, when loading a large number of routes (a full IPv4 table), VPP crashes after loading about 400k routes.

It appears to be receiving a SIGABRT that terminates the VPP process:

May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0 0x00007fe9b9de1a7b 0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1 0x00007fe9b9d13140 0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2 0x00007fe9b99bdce1 gsignal + 0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3 0x00007fe9b99a7537 abort + 0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4 0x000055d43480a1f3 0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5 0x00007fe9b9c9c8d5 vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6 0x00007fe9b9d71feb vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7 0x00007fe9ba4f1e55 load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8 0x00007fe9ba4c639d fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9 0x00007fe9ba4c64a4 fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x00007fe9ba4c681b fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x00007fe9ba4c3780 fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x00007fe9ba4b9afc fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x00007fe935fcedce 0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x00007fe935fd2ab5 0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing engine.
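
To make the call path easier to see, the frames that carry a symbol name can be pulled out of a journal excerpt like the one above. This is a rough sketch only: `symbolized_frames` and `FRAME_RE` are hypothetical helper names, and the regular expression assumes the `#N 0xADDR symbol + 0xOFFSET` layout seen in this log.

```python
import re

# Matches frames that carry a symbol, e.g. "#3 0x00007fe9b99a7537 abort + 0x123".
# Frames that only repeat the address (e.g. "#4 0x... 0x...") are skipped,
# because the symbol group must start with a letter or underscore.
FRAME_RE = re.compile(r"#(\d+) 0x[0-9a-f]+ ([A-Za-z_]\w*) \+ 0x[0-9a-f]+")


def symbolized_frames(log_text):
    """Return (frame_number, symbol) pairs for frames that have a symbol name."""
    return [(int(number), symbol) for number, symbol in FRAME_RE.findall(log_text)]


# A few frames copied verbatim from the log above.
SAMPLE = (
    "May 27 06:10:33 pdx1rtr1 vnet[2232]: #4 0x000055d43480a1f3 0x55d43480a1f3 "
    "May 27 06:10:33 pdx1rtr1 vnet[2232]: #5 0x00007fe9b9c9c8d5 vec_resize_allocate_memory + 0x285 "
    "May 27 06:10:33 pdx1rtr1 vnet[2232]: #6 0x00007fe9b9d71feb vlib_validate_combined_counter + 0xdb "
    "May 27 06:10:33 pdx1rtr1 vnet[2232]: #7 0x00007fe9ba4f1e55 load_balance_create + 0x205"
)

for number, symbol in symbolized_frames(SAMPLE):
    print(f"#{number} {symbol}")
```

Read from the bottom up, the symbolized frames show the abort being raised from vec_resize_allocate_memory, called by vlib_validate_combined_counter while load_balance_create handles a new FIB entry. In other words, a memory allocation for counters appears to fail as routes are added.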

Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64 GNU/Linux
root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58
root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second and restarted BIRD to observe when the routes start processing, and to get the last known fib state before the crash:

Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
    Prefix length         Count
                   0               1
                   4               2
                   8               3
                   9               5
                  10              29
                  11              62
                  12             169
                  13             357
                  14             702
                  15            1140
                  16            7110
                  17            4710
                  18            7763
                  19           13814
                  20           22146
                  21           26557
                  22           51780
                  23           43914
                  24          227173
                  27               1
                  32               6
Thu May 27 06:10:21 UTC 2021
clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection refused
Thu May 27 06:10:22 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[default-route:1, ]
    Prefix length         Count
                   0               1
                   4               2
                  32               2
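
For reference, summing the Count column of the last snapshot before the crash (06:10:20) gives the number of IPv4 routes installed at that point. The snippet below is a small sketch of that arithmetic; `total_routes` is a hypothetical helper name, not a VPP command, and it assumes each data row is just a prefix length followed by a count.

```python
import re


def total_routes(summary):
    """Sum the Count column of one FIB summary table.

    Only lines consisting of exactly two integers (prefix length, count)
    are counted; timestamps and header lines are ignored.
    """
    total = 0
    for line in summary.splitlines():
        match = re.fullmatch(r"\s*(\d+)\s+(\d+)\s*", line)
        if match:
            total += int(match.group(2))
    return total


# The 06:10:20 snapshot from above, i.e. the last one before VPP restarted.
LAST_SNAPSHOT = """\
    Prefix length         Count
                   0               1
                   4               2
                   8               3
                   9               5
                  10              29
                  11              62
                  12             169
                  13             357
                  14             702
                  15            1140
                  16            7110
                  17            4710
                  18            7763
                  19           13814
                  20           22146
                  21           26557
                  22           51780
                  23           43914
                  24          227173
                  27               1
                  32               6
"""

print(total_routes(LAST_SNAPSHOT))  # prints 407444
```

That total, 407444, matches the "about 400k routes" figure mentioned at the top of this message.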


I'm new to VPP so let me know if there are other logs that would be useful too.

Cheers,
Nate


--
Pim van Pelt <p...@ipng.nl>
PBVP1-RIPE - <http://www.ipng.nl/>
