Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Nate Sales

Hi Pim and Andrew,

Thanks for the help! Turns out it was the stats memory that I had left 
out. After increasing that to 128M I was able to import a full v4 and 
v6 table no problem. As an aside, is the netlink plugin scheduled for 
an upcoming release or is the interface still experimental?


Many thanks,
Nate


On Thu, May 27, 2021 at 11:36 am, Pim van Pelt wrote:

Hi Nate,

Further to what Andrew suggested, here are a few more hints:
1) Make sure the netlink socket buffers are large enough by adding this to 
your sysctl settings:

cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf
# Raise the default and maximum socket buffer sizes (used by netlink) to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
sysctl -p /etc/sysctl.d/81-VPP-netlink.conf

2) Ensure there is enough memory by adding this to VPP's startup 
config:

memory {
  main-heap-size 2G
  main-heap-page-size default-hugepage
}

3) Many prefixes (like a full BGP routing table) will need more stats 
memory, so increase that too in VPP's startup config:

statseg {
  size 128M
}
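
For a rough sense of why a full table needs more stats memory, here is a back-of-the-envelope sketch in Python. It assumes (this is an assumption, not something stated in this thread) that each route carries two combined counters of 16 bytes each (packets plus bytes) per thread. The function and constant names are hypothetical. Vector growth and allocator overhead are ignored, so treat the result as a lower bound only.

```python
# Rough lower bound on stats-segment memory used by per-route counters.
# ASSUMPTIONS (not from the thread): 2 combined counters per route,
# 16 bytes per combined counter (u64 packets + u64 bytes), one copy per thread.
COUNTERS_PER_ROUTE = 2
BYTES_PER_COUNTER = 16


def counter_bytes(routes: int, threads: int = 1) -> int:
    """Return the estimated bytes of counter storage for `routes` routes."""
    return routes * COUNTERS_PER_ROUTE * BYTES_PER_COUNTER * threads


def fits(routes: int, statseg_bytes: int, threads: int = 1) -> bool:
    """True if the estimate fits in a stats segment of `statseg_bytes`."""
    return counter_bytes(routes, threads) <= statseg_bytes


if __name__ == "__main__":
    statseg = 128 * 1024 * 1024  # "size 128M" from the statseg stanza
    for routes in (400_000, 1_000_000):
        used = counter_bytes(routes, threads=4)
        ok = fits(routes, statseg, threads=4)
        print(f"{routes} routes, 4 threads: {used / 2**20:.1f} MiB, fits: {ok}")
```

Under these assumptions the counters alone grow linearly with both the route count and the thread count, which is why the stats segment has to scale with the table size.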

And in case you missed it, make sure to create the linux-cp devices 
in a separate namespace by adding this to the startup config:

linux-cp {
  default netns dataplane
}

Then you should be able to consume the IPv4 and IPv6 DFZ in your 
router. I tested extensively with FRR and Bird2, and so far had good 
success.
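
To confirm that the sysctl settings from step 1 actually took effect, a small check along these lines may help. This is a minimal Python sketch: `read_sysctl` and `too_small` are hypothetical helper names, and it assumes the usual Linux layout where a sysctl key such as `net.core.rmem_max` is readable at `/proc/sys/net/core/rmem_max`.

```python
# Check that the socket buffer sysctls are at least the size set in step 1.
from pathlib import Path

REQUIRED = 67108864  # 64M, the value written to 81-VPP-netlink.conf
KEYS = (
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.core.rmem_max",
    "net.core.wmem_max",
)


def read_sysctl(key: str, root: Path = Path("/proc/sys")) -> int:
    """Read an integer sysctl by mapping the dots in the key to path separators."""
    return int((root / key.replace(".", "/")).read_text().split()[0])


def too_small(values: dict[str, int], required: int = REQUIRED) -> list[str]:
    """Return the keys whose value is missing or below `required`."""
    return [key for key in KEYS if values.get(key, 0) < required]
```

On the router itself, `too_small({k: read_sysctl(k) for k in KEYS})` returns an empty list when all four buffers are at least 64M, and otherwise lists the keys that still need raising.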


Regards,
Pim

On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko 
<ayour...@gmail.com> wrote:
I would guess from your traceback that you are running out of memory, so 
increasing the main heap size to something like 4x its current value could help…


--a

On 27 May 2021, at 08:29, Nate Sales <n...@natesales.net> wrote:



Hello,

I'm having some trouble with the linux-cp netlink plugin. After 
building it from the patch set 
(<https://gerrit.fd.io/r/c/vpp/+/31122>), it does correctly receive 
netlink messages and insert routes from the Linux kernel table into 
the VPP FIB. When loading a large number of routes, however (full 
IPv4 table), VPP crashes after loading about 400k routes.



[vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Nate Sales

Hello,

I'm having some trouble with the linux-cp netlink plugin. After 
building it from the patch set 
(<https://gerrit.fd.io/r/c/vpp/+/31122>), it does correctly receive 
netlink messages and insert routes from the Linux kernel table into the 
VPP FIB. When loading a large number of routes, however (full IPv4 
table), VPP crashes after loading about 400k routes.


It appears to be receiving a SIGABRT that terminates the VPP process:

May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing engine.


Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64 GNU/Linux

root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58

root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second and 
restarted BIRD to observe when the routes start processing, and to get 
the last known fib state before the crash:


Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]

Prefix length   Count
            0       1
            4       2
            8       3
            9       5
           10      29
           11      62
           12     169
           13     357
           14     702
           15    1140
           16    7110
           17    4710
           18    7763
           19   13814
           20   22146
           21   26557
           22   51780
           23   43914
           24  227173
           27       1
           32       6
Thu May 27 06:10:21 UTC 2021
clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection refused

Thu May 27 06:10:22 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[default-route:1, ]

Prefix length   Count
            0       1
            4       2
           32       2
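
As a cross-check on the "about 400k routes" figure, the per-prefix-length counts from the 06:10:20 sample can be summed. A minimal Python sketch follows; `parse_counts` is a hypothetical helper, and the sample text is copied from the output above with one "prefix length, count" pair per line.

```python
# Sum the per-prefix-length route counts from the last sample before the crash.
SAMPLE = """\
0 1
4 2
8 3
9 5
10 29
11 62
12 169
13 357
14 702
15 1140
16 7110
17 4710
18 7763
19 13814
20 22146
21 26557
22 51780
23 43914
24 227173
27 1
32 6
"""


def parse_counts(text: str) -> dict[int, int]:
    """Map prefix length to route count, skipping lines that are not two integers."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            counts[int(fields[0])] = int(fields[1])
    return counts


counts = parse_counts(SAMPLE)
total = sum(counts.values())
print(total)  # prints 407444
```

The total of 407444 routes one second before the CLI socket refused connections lines up with the crash happening at roughly 400k routes.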


I'm new to VPP so let me know if there are other logs that would be 
useful too.


Cheers,
Nate




View/Reply Online (#19483): https://lists.fd.io/g/vpp-dev/message/19483