Re: linux_nl_plugin routing issues [Was: Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table]

2021-06-02 Thread Mike Beattie
On Fri, May 28, 2021 at 10:32:06AM -0500, Matthew Smith via lists.fd.io wrote:
> Hi Mike,
> 
> The first problem you mentioned (packets matching a route are not sent when
> the next hop has not been resolved at the time the route is added) is
> likely fixed by this patch:
> 
> e2353a7f6 linux-cp: Add delegate to adjacencies
> 
> It was merged after fd77f8c00, so you would either need to cherry-pick it
> or rebase onto that commit or some more recent one on master.

Hi Matt, 

I've rebased onto master as of now (2f64790c), and after fixing a merge
conflict for Neale's V2 pair create commit (6bb77dec7), it now works as
expected. So I can confirm that that patch does indeed appear to fix the
problem.
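
For anyone else doing the same, a rough sketch of that kind of workflow (the
gerrit refs/patchset numbers below are illustrative, not taken from this
thread):

  # fetch the linux-cp patch set from gerrit and check it out
  git fetch https://gerrit.fd.io/r/vpp refs/changes/22/31122/<patchset>
  git checkout -b linux-cp FETCH_HEAD
  # rebase onto current master (which now contains e2353a7f6)
  git fetch origin
  git rebase origin/master   # resolve any conflicts, e.g. around 6bb77dec7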

Thanks!

On Fri, May 28, 2021 at 10:32:06AM -0500, Matthew Smith via lists.fd.io wrote:
> I have heard another report of the second issue you mentioned (changing
> default netns does not work correctly) but I haven't gotten time to look at
> it yet.

On Fri, May 28, 2021 at 07:28:12PM +0200, Pim van Pelt wrote:
> I can confirm that changing the default netns after startup of VPP does not
> work, and your diagnosis is correct; the listener is created at start time,
> and is not reconfigured after changing the default.

Yeah, I was initially testing by setting the default netns from within vppctl.
When that didn't appear to work, I went looking at the code and found the
listener creation in lcp_nl_init(), not in lcp_set_default_ns().

However, until this patch is merged, there's no reason for lcp_set_default_ns
to know anything about netlink listeners, so that makes sense.

Thanks again to you both for your help,

Mike.
-- 
Mike Beattie 




Re: linux_nl_plugin routing issues [Was: Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table]

2021-05-28 Thread Pim van Pelt
On Fri, May 28, 2021 at 1:59 AM Mike Beattie  wrote:

>
> # vppctl lcp default netns dataplane
>
> As the netlink listener doesn't appear to be re-created in that netns
> dynamically.
>
I can confirm that changing the default netns after startup of VPP does not
work, and your diagnosis is correct; the listener is created at start time,
and is not reconfigured after changing the default.
For me, adding the default netns in the startup.conf is sufficient, and
perhaps we ought to simply remove the API call. Changing it several times
would mean keeping a pool of listeners, one per netns used, and routing
becomes difficult unless VRFs are used per namespace -- to me at least, it
sounds more problematic to keep the API call and the ability to switch/plumb
into differing namespaces than simply to remove the feature :)

But then, maybe there was a design consideration/reason to add TAPs into
multiple namespaces; Neale or Matt may know more.

groet,
Pim

-- 
Pim van Pelt 
PBVP1-RIPE - http://www.ipng.nl/




Re: linux_nl_plugin routing issues [Was: Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table]

2021-05-28 Thread Matthew Smith via lists.fd.io
Hi Mike,

The first problem you mentioned (packets matching a route are not sent when
the next hop has not been resolved at the time the route is added) is
likely fixed by this patch:

e2353a7f6 linux-cp: Add delegate to adjacencies

It was merged after fd77f8c00, so you would either need to cherry-pick it
or rebase onto that commit or some more recent one on master.

I have heard another report of the second issue you mentioned (changing
default netns does not work correctly) but I haven't gotten time to look at
it yet.

Thanks,
-Matt


On Thu, May 27, 2021 at 6:59 PM Mike Beattie  wrote:

> On Thu, May 27, 2021 at 11:36:02AM +0200, Pim van Pelt wrote:
> > Hoi Nate,
> >
> > further to what Andrew suggested, there are a few more hints I can offer:
> > 
> > Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I
> > tested extensively with FRR and Bird2, and so far had good success.
>
> Pim, thank you for those hints - I plan to implement a new core
> routing infrastructure using VPP & FRR w/ linux-cp & linux-nl that will be
> consuming full tables in the near future. Your hints will be invaluable, I
> suspect.
>
> However, in my testing I discovered an interesting behaviour with regard
> to routing. I had previously tried to reply with my findings to the list,
> but I wasn't subscribed at the time of Neale's posts, and I wanted to
> continue on his thread ... I composed a detailed report on the web
> interface of the list, then managed to completely miss the "CC list"
> checkbox, so I think only Neale received it. (Sorry Neale.)
>
> I digress... what I discovered was that if a route entry is created before
> a neighbor entry with the next hop is established, no traffic flows:
>
>
> root@vpp-test:~# ip netns exec dataplane bash
> root@vpp-test:~# systemctl restart vpp.service
> root@vpp-test:~# vppctl set interface mtu 1500 GigabitEthernet0/13/0
> root@vpp-test:~# vppctl lcp create GigabitEthernet0/13/0 host-if vpp1
> netns dataplane
> root@vpp-test:~# ip l
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode
> DEFAULT group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> root@vpp-test:~# ip a
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
> default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
>valid_lft forever preferred_lft forever
> 15: vpp1:  mtu 1500 qdisc mq state DOWN group default
> qlen 1000
> link/ether 32:dc:fa:93:9e:fe brd ff:ff:ff:ff:ff:ff
> root@vpp-test:~# cat init50.sh
> #!/bin/sh
>
> ip link set up dev vpp1
>
> ip link add link vpp1 vpp1.50 type vlan id 50
> ip link set up dev vpp1.50
> ip addr add 10.xxx.yyy.202/24 dev vpp1.50
>
> root@vpp-test:~# ./init50.sh
> root@vpp-test:~# ping 1.1.1.1
> ping: connect: Network is unreachable
> root@vpp-test:~# ip route add default via 10.xxx.yyy.254
> root@vpp-test:~# ping 1.1.1.1
> PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
> ^C
> --- 1.1.1.1 ping statistics ---
> 5 packets transmitted, 0 received, 100% packet loss, time 4077ms
>
> root@vpp-test:~# ping 10.xxx.yyy.254
> PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
> ^C
> --- 10.xxx.yyy.254 ping statistics ---
> 4 packets transmitted, 0 received, 100% packet loss, time 3070ms
>
> root@vpp-test:~# ip route delete default
> root@vpp-test:~# ping 10.xxx.yyy.254
> PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
> ^C
> --- 10.xxx.yyy.254 ping statistics ---
> 4 packets transmitted, 0 received, 100% packet loss, time 3062ms
>
>
>
> No traffic passed. Now the same test, but pinging the router before adding the route:
>
>
>
> root@vpp-test:~# systemctl restart vpp.service
> root@vpp-test:~# ./init50.sh
> root@vpp-test:~# ping 10.xxx.yyy.254
> PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
> 64 bytes from 10.xxx.yyy.254: icmp_seq=1 ttl=64 time=0.780 ms
> 64 bytes from 10.xxx.yyy.254: icmp_seq=2 ttl=64 time=0.306 ms
> 64 bytes from 10.xxx.yyy.254: icmp_seq=3 ttl=64 time=0.310 ms
> ^C
> --- 10.xxx.yyy.254 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2038ms
> rtt min/avg/max/mdev = 0.306/0.465/0.780/0.222 ms
> root@vpp-test:~# ip route add default via 10.xxx.yyy.254
> root@vpp-test:~# ping 1.1.1.1
> PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
> 64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=23.5 ms
> 64 bytes from 1.1.1.1: icmp_seq=2 ttl=60 time=23.9 ms
> ^C
> --- 1.1.1.1 ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms
> rtt min/avg/max/mdev = 23.541/23.710/23.879/0.169 ms
> root@vpp-test:~#
>
>
> Traffic passes fine.
>
> This is a basic VPP installation built with
> https://gerrit.fd.io/r/c/vpp/+/31122 rebased onto master of a couple weeks
> ago (fd77f8c00). Ping plugin disabled, linux-cp and linux-nl enabled, with
> linux-cp config of:
>
> linux-cp {
> default netns dataplane
> interface-auto-create
> }

Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Andrew Yourtchenko
Hi Nate,

Cool that it works, and thanks to Pim for a much more detailed reply than
mine :)!

Since https://gerrit.fd.io/r/c/vpp/+/31122 isn't merged into the tree yet, it
won't be part of 21.06 at least… I would suggest pinging Neale as to what the
plans are :)

--a

> On 28 May 2021, at 06:29, Nate Sales  wrote:
> 
> 
> Hi Pim and Andrew,
> 
> Thanks for the help! Turns out it was the stats memory that I had left out. 
> After increasing that to 128M I was able to import a full v4 and v6 table no 
> problem. As an aside, is the netlink plugin scheduled for an upcoming release 
> or is the interface still experimental?
> 
> Many thanks,
> Nate
> 
> 
>> On Thu, May 27, 2021 at 11:36 am, Pim van Pelt  wrote:
>> Hoi Nate,
>> 
>> further to what Andrew suggested, there are a few more hints I can offer:
>> 1) Make sure there is enough netlink socket buffer by adding this to your 
>> sysctl set:
>> cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf 
>> # Increase netlink to 64M
>> net.core.rmem_default=67108864
>> net.core.wmem_default=67108864
>> net.core.rmem_max=67108864
>> net.core.wmem_max=67108864
>> EOF
>> sysctl -p /etc/sysctl.d/81-VPP-netlink.conf
>> 
>> 2) Ensure there is enough memory by adding this to VPP's startup config:
>> memory {
>>   main-heap-size 2G
>>   main-heap-page-size default-hugepage
>> }
>> 
>> 3) Many prefixes (like a full BGP routing table) will need more stats 
>> memory, so increase that too in VPP's startup config:
>> statseg {
>>   size 128M
>> }
>> 
>> And in case you missed it, make sure to create the linux-cp devices in a 
>> separate namespace by adding this to the startup config:
>> linux-cp {
>>   default netns dataplane
>> }
>> 
>> Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I 
>> tested extensively with FRR and Bird2, and so far had good success.
>> 
>> groet,
>> Pim
>> 
>>> On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko  
>>> wrote:
>>> I would guess from your traceback you are running out of memory, so 
>>> increasing the main heap size to something like 4x could help…
>>> 
>>> --a
>>> 
> On 27 May 2021, at 08:29, Nate Sales  wrote:
> 
 
 Hello,
 
 I'm having some trouble with the linux-cp netlink plugin. After building 
 it from the patch set (https://gerrit.fd.io/r/c/vpp/+/31122), it does 
 correctly receive netlink messages and insert routes from the linux kernel 
 table into the VPP FIB. When loading a large amount of routes however 
 (full IPv4 table), VPP crashes after loading about 400k routes.
 
 It appears to be receiving a SIGABRT that terminates the VPP process:
 
 May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
 0x7fe9b99bdce1
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 0x7fe9b9de1a7b
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 0x7fe9b9d13140
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 0x141
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 0x123
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 0x55d43480a1f3
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
 vec_resize_allocate_memory + 0x285
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
 vlib_validate_combined_counter + 0xdb
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
 load_balance_create + 0x205
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
 fib_entry_src_mk_lb + 0x38d
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
 fib_entry_src_action_install + 0x44
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
 fib_entry_src_action_activate + 0x17b
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 
 fib_entry_create + 0x70
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
 fib_table_entry_update + 0x29c
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 0x7fe935fcedce
 May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 0x7fe935fd2ab5
 May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, 
 code=killed, status=6/ABRT
 May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 
 'signal'.
 May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU 
 time.
 May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job, 
 restart counter is at 2.
 May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing 
 engine.
 May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU 
 time.
 May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing 
 engine...
 May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing 
 engine.
 
 Here's what I'm working with:
 
 root@pdx1rtr1:~# uname -a
 Linux pdx1rtr1 5.10.0-7-amd64 #1 S

Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Nate Sales

Hi Pim and Andrew,

Thanks for the help! Turns out it was the stats memory that I had left 
out. After increasing that to 128M I was able to import a full v4 and 
v6 table no problem. As an aside, is the netlink plugin scheduled for 
an upcoming release or is the interface still experimental?


Many thanks,
Nate


On Thu, May 27, 2021 at 11:36 am, Pim van Pelt  wrote:

Hoi Nate,

further to what Andrew suggested, there are a few more hints I can 
offer:
1) Make sure there is enough netlink socket buffer by adding this to 
your sysctl set:

cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf
# Increase netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
sysctl -p /etc/sysctl.d/81-VPP-netlink.conf

2) Ensure there is enough memory by adding this to VPP's startup 
config:

memory {
  main-heap-size 2G
  main-heap-page-size default-hugepage
}

3) Many prefixes (like a full BGP routing table) will need more stats 
memory, so increase that too in VPP's startup config:

statseg {
  size 128M
}

And in case you missed it, make sure to create the linux-cp devices 
in a separate namespace by adding this to the startup config:

linux-cp {
  default netns dataplane
}

Then you should be able to consume the IPv4 and IPv6 DFZ in your 
router. I tested extensively with FRR and Bird2, and so far had good 
success.


groet,
Pim

On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko wrote:
I would guess from your traceback you are running out of memory, so 
increasing the main heap size to something like 4x could help…


--a

On 27 May 2021, at 08:29, Nate Sales wrote:



Hello,

I'm having some trouble with the linux-cp netlink plugin. After 
building it from the patch set 
(https://gerrit.fd.io/r/c/vpp/+/31122), it does correctly receive 
netlink messages and insert routes from the linux kernel table into 
the VPP FIB. When loading a large amount of routes however (full 
IPv4 table), VPP crashes after loading about 400k routes.


It appears to be receiving a SIGABRT that terminates the VPP 
process:


May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 
0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 
0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal 
+ 0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 
0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 
0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 
fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 
0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 
0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process 
exited, code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with 
result 'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s 
CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart 
job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet 
processing engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s 
CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet 
processing engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet 
processing engine.


Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) 
x86_64 GNU/Linux

root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 
2021-05-27T01:21:58

root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second 
and restarted BIRD to observe when the routes start processing, and 
to get the last known fib state before the crash:


Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, fib_in

linux_nl_plugin routing issues [Was: Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table]

2021-05-27 Thread Mike Beattie
On Thu, May 27, 2021 at 11:36:02AM +0200, Pim van Pelt wrote:
> Hoi Nate,
> 
> further to what Andrew suggested, there are a few more hints I can offer:
> 
> Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I
> tested extensively with FRR and Bird2, and so far had good success.

Pim, thank you for those hints - I plan to implement a new core
routing infrastructure using VPP & FRR w/ linux-cp & linux-nl that will be
consuming full tables in the near future. Your hints will be invaluable, I
suspect.

However, in my testing I discovered an interesting behaviour with regard to
routing. I had previously tried to reply with my findings to the list, but I
wasn't subscribed at the time of Neale's posts, and I wanted to continue on
his thread ... I composed a detailed report on the web interface of the list,
then managed to completely miss the "CC list" checkbox, so I think only Neale
received it. (Sorry Neale.)

I digress... what I discovered was that if a route entry is created before a
neighbor entry with the next hop is established, no traffic flows:


root@vpp-test:~# ip netns exec dataplane bash
root@vpp-test:~# systemctl restart vpp.service
root@vpp-test:~# vppctl set interface mtu 1500 GigabitEthernet0/13/0
root@vpp-test:~# vppctl lcp create GigabitEthernet0/13/0 host-if vpp1 netns 
dataplane
root@vpp-test:~# ip l
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
root@vpp-test:~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
15: vpp1:  mtu 1500 qdisc mq state DOWN group default qlen 
1000
link/ether 32:dc:fa:93:9e:fe brd ff:ff:ff:ff:ff:ff
root@vpp-test:~# cat init50.sh
#!/bin/sh

ip link set up dev vpp1

ip link add link vpp1 vpp1.50 type vlan id 50
ip link set up dev vpp1.50
ip addr add 10.xxx.yyy.202/24 dev vpp1.50

root@vpp-test:~# ./init50.sh
root@vpp-test:~# ping 1.1.1.1
ping: connect: Network is unreachable
root@vpp-test:~# ip route add default via 10.xxx.yyy.254
root@vpp-test:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
^C
--- 1.1.1.1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4077ms

root@vpp-test:~# ping 10.xxx.yyy.254
PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
^C
--- 10.xxx.yyy.254 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3070ms

root@vpp-test:~# ip route delete default
root@vpp-test:~# ping 10.xxx.yyy.254
PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
^C
--- 10.xxx.yyy.254 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms



No traffic passed. Now the same test, but pinging the router before adding the route:



root@vpp-test:~# systemctl restart vpp.service
root@vpp-test:~# ./init50.sh
root@vpp-test:~# ping 10.xxx.yyy.254
PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
64 bytes from 10.xxx.yyy.254: icmp_seq=1 ttl=64 time=0.780 ms
64 bytes from 10.xxx.yyy.254: icmp_seq=2 ttl=64 time=0.306 ms
64 bytes from 10.xxx.yyy.254: icmp_seq=3 ttl=64 time=0.310 ms
^C
--- 10.xxx.yyy.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2038ms
rtt min/avg/max/mdev = 0.306/0.465/0.780/0.222 ms
root@vpp-test:~# ip route add default via 10.xxx.yyy.254
root@vpp-test:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=23.5 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=60 time=23.9 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 23.541/23.710/23.879/0.169 ms
root@vpp-test:~#


Traffic passes fine.

This is a basic VPP installation built with
https://gerrit.fd.io/r/c/vpp/+/31122 rebased onto master of a couple weeks
ago (fd77f8c00). Ping plugin disabled, linux-cp and linux-nl enabled, with
linux-cp config of:

linux-cp {
default netns dataplane
interface-auto-create
}

Normally this behaviour wouldn't be an issue, as a neighbor relationship
with the nexthop will be created by the BGP conversation that causes routes
using that nexthop to be created - however, that's not the case with the
Route Reflectors I plan on implementing. OSPF will be used in the
implementation, which might mitigate the problem - I hadn't gotten that far
in testing. However, I figured that if this is a real bug, then it's worth
fixing.
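
As a stopgap, the ordering that made things work in my testing is simply to
resolve the neighbor before installing the route that uses it:

ping -c1 10.xxx.yyy.254          # resolve the nexthop / populate the neighbor entry first
ip route add default via 10.xxx.yyy.254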


There were a couple of other feedback items for the linux-nl plugin that I'd
written to Neale in the web form for the list, but I can only recall one of
them: the default netns has to be specified in the config file; you can't
use the command:

# vppctl lcp default netns dataplane

As the netlink listener doesn't appear to be re-created in that netns
dynamically.

Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Pim van Pelt
Hoi Nate,

further to what Andrew suggested, there are a few more hints I can offer:
1) Make sure there is enough netlink socket buffer by adding this to your
sysctl set:
cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf
# Increase netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
sysctl -p /etc/sysctl.d/81-VPP-netlink.conf
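
(To double-check the values are active afterwards:
sysctl net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max
)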

2) Ensure there is enough memory by adding this to VPP's startup config:
memory {
  main-heap-size 2G
  main-heap-page-size default-hugepage
}

3) Many prefixes (like a full BGP routing table) will need more stats
memory, so increase that too in VPP's startup config:
statseg {
  size 128M
}

And in case you missed it, make sure to create the linux-cp devices in a
separate namespace by adding this to the startup config:
linux-cp {
  default netns dataplane
}
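
(Note: the 'dataplane' namespace itself may need to exist before VPP starts,
depending on how you provision it; with iproute2 that's simply:
ip netns add dataplane
)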

Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I
tested extensively with FRR and Bird2, and so far had good success.

groet,
Pim

On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko 
wrote:

> I would guess from your traceback you are running out of memory, so
> increasing the main heap size to something like 4x could help…
>
> --a
>
> On 27 May 2021, at 08:29, Nate Sales  wrote:
>
> 
> Hello,
>
> I'm having some trouble with the linux-cp netlink plugin. After building
> it from the patch set (https://gerrit.fd.io/r/c/vpp/+/31122), it does
> correctly receive netlink messages and insert routes from the linux kernel
> table into the VPP FIB. When loading a large amount of routes however (full
> IPv4 table), VPP crashes after loading about 400k routes.
>
> It appears to be receiving a SIGABRT that terminates the VPP process:
>
> May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC
> 0x7fe9b99bdce1
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 0x7fe9b9de1a7b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 0x7fe9b9d13140
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 0x141
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 0x123
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 0x55d43480a1f3
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5
> vec_resize_allocate_memory + 0x285
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb
> vlib_validate_combined_counter + 0xdb
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55
> load_balance_create + 0x205
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d
> fib_entry_src_mk_lb + 0x38d
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4
> fib_entry_src_action_install + 0x44
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b
> fib_entry_src_action_activate + 0x17b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780
> fib_entry_create + 0x70
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc
> fib_table_entry_update + 0x29c
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 0x7fe935fcedce
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 0x7fe935fd2ab5
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited,
> code=killed, status=6/ABRT
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result
> 'signal'.
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU
> time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job,
> restart counter is at 2.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing
> engine.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU
> time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing
> engine...
> May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing
> engine.
>
> Here's what I'm working with:
>
> root@pdx1rtr1:~# uname -a
> Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64
> GNU/Linux
> root@pdx1rtr1:~# vppctl show ver
> vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58
> root@pdx1rtr1:~# bird --version
> BIRD version 2.0.7
>
> And some adjusted sysctl params:
>
> net.core.rmem_default = 67108864
> net.core.wmem_default = 67108864
> net.core.rmem_max = 67108864
> net.core.wmem_max = 67108864
> vm.nr_hugepages = 1024
> vm.max_map_count = 3096
> vm.hugetlb_shm_group = 0
> kernel.shmmax = 2147483648
>
> In case it's at all helpful, I ran a "sh ip fib sum" every second and
> restarted BIRD to observe when the routes start processing, and to get the
> last known fib state before the crash:
>
> Thu May 27 06:10:20 UTC 2021
> ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ]
> epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
> Prefix length  Count
>             0      1
>             4      2
>             8      3
>             9      5

Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Andrew Yourtchenko
I would guess from your traceback you are running out of memory, so increasing 
the main heap size to something like 4x could help…
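
(Assuming the default main heap of 1G, "4x" would mean something like this
in VPP's startup config:

memory {
  main-heap-size 4G
}
)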

--a

> On 27 May 2021, at 08:29, Nate Sales  wrote:
> 
> 
> Hello,
> 
> I'm having some trouble with the linux-cp netlink plugin. After building it 
> from the patch set (https://gerrit.fd.io/r/c/vpp/+/31122), it does correctly 
> receive netlink messages and insert routes from the linux kernel table into 
> the VPP FIB. When loading a large amount of routes however (full IPv4 table), 
> VPP crashes after loading about 400k routes.
> 
> It appears to be receiving a SIGABRT that terminates the VPP process:
> 
> May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
> 0x7fe9b99bdce1
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 0x7fe9b9de1a7b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 0x7fe9b9d13140
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 0x141
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 0x123
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 0x55d43480a1f3
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
> vec_resize_allocate_memory + 0x285
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
> vlib_validate_combined_counter + 0xdb
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
> load_balance_create + 0x205
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
> fib_entry_src_mk_lb + 0x38d
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
> fib_entry_src_action_install + 0x44
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
> fib_entry_src_action_activate + 0x17b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 fib_entry_create 
> + 0x70
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
> fib_table_entry_update + 0x29c
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 0x7fe935fcedce
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 0x7fe935fd2ab5
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, 
> code=killed, status=6/ABRT
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 'signal'.
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job, 
> restart counter is at 2.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing engine.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing 
> engine...
> May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing engine.
> 
> Here's what I'm working with:
> 
> root@pdx1rtr1:~# uname -a
> Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64 
> GNU/Linux
> root@pdx1rtr1:~# vppctl show ver
> vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58
> root@pdx1rtr1:~# bird --version
> BIRD version 2.0.7
> 
> And some adjusted sysctl params:
> 
> net.core.rmem_default = 67108864
> net.core.wmem_default = 67108864
> net.core.rmem_max = 67108864
> net.core.wmem_max = 67108864
> vm.nr_hugepages = 1024
> vm.max_map_count = 3096
> vm.hugetlb_shm_group = 0
> kernel.shmmax = 2147483648
> 
> In case it's at all helpful, I ran a "sh ip fib sum" every second and 
> restarted BIRD to observe when the routes start processing, and to get the 
> last known fib state before the crash:
> 
> Thu May 27 06:10:20 UTC 2021
> ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] 
> epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
> Prefix length  Count
>             0       1
>             4       2
>             8       3
>             9       5
>            10      29
>            11      62
>            12     169
>            13     357
>            14     702
>            15    1140
>            16    7110
>            17    4710
>            18    7763
>            19   13814
>            20   22146
>            21   26557
>            22   51780
>            23   43914
>            24  227173
>            27       1
>            32       6
> Thu May 27 06:10:21 UTC 2021
> clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection refused
> Thu May 27 06:10:22 UTC 2021
> ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ]
> epoch:0 flags:none locks:[default-route:1, ]
> Prefix length  Count
>             0       1
>             4       2
>            32       2

[vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-26 Thread Nate Sales

Hello,

I'm having some trouble with the linux-cp netlink plugin. After 
building it from the patch set 
(https://gerrit.fd.io/r/c/vpp/+/31122), it does correctly receive 
netlink messages and insert routes from the linux kernel table into the 
VPP FIB. When loading a large amount of routes however (full IPv4 
table), VPP crashes after loading about 400k routes.


It appears to be receiving a SIGABRT that terminates the VPP process:

May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 
0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 
0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 
0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 
0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 
0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 
fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 
0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 
0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, 
code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 
'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU 
time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart 
job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing 
engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU 
time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing 
engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing 
engine.


Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) 
x86_64 GNU/Linux

root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 
2021-05-27T01:21:58

root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second and 
restarted BIRD to observe when the routes start processing, and to get 
the last known fib state before the crash:


Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel 
] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]

   Prefix length  Count
               0       1
               4       2
               8       3
               9       5
              10      29
              11      62
              12     169
              13     357
              14     702
              15    1140
              16    7110
              17    4710
              18    7763
              19   13814
              20   22146
              21   26557
              22   51780
              23   43914
              24  227173
              27       1
              32       6
Thu May 27 06:10:21 UTC 2021
clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection 
refused

Thu May 27 06:10:22 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel 
] epoch:0 flags:none locks:[default-route:1, ]

   Prefix length  Count
               0       1
               4       2
              32       2
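
(For reference, the polling was just a simple loop along these lines:
while true; do date; vppctl sh ip fib sum; sleep 1; done
)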


I'm new to VPP so let me know if there are other logs that would be 
useful too.


Cheers,
Nate



