[vpp-dev] subinterface's ip did not deleted after subinterface deleted

2021-05-27 Thread jiangxiaoming
Hi Dave Barach:
A sub-interface's IP address is not deleted after the sub-interface is deleted. Here is the test
command:

> 
> set interface state eth0 up
> create sub-interface eth0 1
> set interface ip addr eth0.1 192.168.1.1/24
> show int addr
> delete sub-interface eth0.1
> set interface ip addr eth0 192.168.1.1/24
> 

In the CLI I get the following error:

> 
> DBGvpp# set interface state eth0 up
> DBGvpp# create sub-interface eth0 1
> eth0.1
> DBGvpp# set interface ip addr eth0.1 192.168.1.1/24
> DBGvpp# show int addr
> eth0 (up):
> eth0.1 (dn):
> L3 192.168.1.1/24
> local0 (dn):
> DBGvpp# delete sub-interface eth0.1
> DBGvpp# set interface ip addr eth0 192.168.1.1/24
> set interface ip address: Prefix 192.168.1.1/24 already found on interface
> DELETED
>
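
If it helps as an interim workaround (only a sketch, assuming the standard "del" form of the
address command), removing the address explicitly before deleting the sub-interface should
avoid the error, although the address should arguably be cleaned up automatically:

> set interface ip addr del eth0.1 192.168.1.1/24
> delete sub-interface eth0.1
> set interface ip addr eth0 192.168.1.1/24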




Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Nate Sales

Hi Pim and Andrew,

Thanks for the help! It turns out it was the stats memory that I had left 
out. After increasing that to 128M I was able to import a full v4 and 
v6 table with no problem. As an aside, is the netlink plugin scheduled for 
an upcoming release, or is the interface still experimental?


Many thanks,
Nate


On Thu, May 27, 2021 at 11:36 am, Pim van Pelt  wrote:

Hoi Nate,

further to what Andrew suggested, there are a few more hints I can 
offer:
1) Make sure there is enough netlink socket buffer by adding this to 
your sysctl set:

cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf
# Increase netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
sysctl -p /etc/sysctl.d/81-VPP-netlink.conf

2) Ensure there is enough memory by adding this to VPP's startup 
config:

memory {
  main-heap-size 2G
  main-heap-page-size default-hugepage
}

3) Many prefixes (like a full BGP routing table) will need more stats 
memory, so increase that too in VPP's startup config:

statseg {
  size 128M
}

And in case you missed it, make sure to create the linux-cp devices 
in a separate namespace by adding this to the startup config:

linux-cp {
  default netns dataplane
}

Then you should be able to consume the IPv4 and IPv6 DFZ in your 
router. I tested extensively with FRR and Bird2, and so far had good 
success.


groet,
Pim

On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko <ayour...@gmail.com> wrote:
I would guess from your traceback you are running out of memory, so 
increasing the main heap size to something like 4x could help…


--a

On 27 May 2021, at 08:29, Nate Sales wrote:



Hello,

I'm having some trouble with the linux-cp netlink plugin. After 
building it from the patch set 
(https://gerrit.fd.io/r/c/vpp/+/31122), it does correctly receive 
netlink messages and insert routes from the linux kernel table into 
the VPP FIB. When loading a large amount of routes however (full 
IPv4 table), VPP crashes after loading about 400k routes.


It appears to be receiving a SIGABRT that terminates the VPP 
process:


May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 
0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 
0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal 
+ 0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 
0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 
0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 
fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 
0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 
0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process 
exited, code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with 
result 'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s 
CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart 
job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet 
processing engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s 
CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet 
processing engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet 
processing engine.


Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) 
x86_64 GNU/Linux

root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 
2021-05-27T01:21:58

root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second 
and restarted BIRD to observe when the routes start processing, and 
to get the last known fib state before the crash:


Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, 

linux_nl_plugin routing issues [Was: Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table]

2021-05-27 Thread Mike Beattie
On Thu, May 27, 2021 at 11:36:02AM +0200, Pim van Pelt wrote:
> Hoi Nate,
> 
> further to what Andrew suggested, there are a few more hints I can offer:
> 
> Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I
> tested extensively with FRR and Bird2, and so far had good success.

Pim, thank you for those hints - I plan to implement a new core
routing infrastructure using VPP & FRR w/ linux-cp & linux-nl that will be
consuming full tables in the near future. Your hints will be invaluable, I
suspect.

However, in my testing, I discovered an interesting behaviour with regard
to routing. I have previously tried to reply with my findings to the list,
but I wasn't subscribed at the time of Neale's posts, and I wanted to
continue on his thread ... I composed a detailed report on the web interface
of the list, then managed to completely miss the "CC list" checkbox. So I
think Neale got it himself only. (Sorry Neale).

I digress... what I discovered was that if a route entry is created before a
neighbor entry for the next hop is established, no traffic flows:


root@vpp-test:~# ip netns exec dataplane bash
root@vpp-test:~# systemctl restart vpp.service
root@vpp-test:~# vppctl set interface mtu 1500 GigabitEthernet0/13/0
root@vpp-test:~# vppctl lcp create GigabitEthernet0/13/0 host-if vpp1 netns 
dataplane
root@vpp-test:~# ip l
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
root@vpp-test:~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
15: vpp1:  mtu 1500 qdisc mq state DOWN group default qlen 
1000
link/ether 32:dc:fa:93:9e:fe brd ff:ff:ff:ff:ff:ff
root@vpp-test:~# cat init50.sh
#!/bin/sh

ip link set up dev vpp1

ip link add link vpp1 vpp1.50 type vlan id 50
ip link set up dev vpp1.50
ip addr add 10.xxx.yyy.202/24 dev vpp1.50

root@vpp-test:~# ./init50.sh
root@vpp-test:~# ping 1.1.1.1
ping: connect: Network is unreachable
root@vpp-test:~# ip route add default via 10.xxx.yyy.254
root@vpp-test:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
^C
--- 1.1.1.1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4077ms

root@vpp-test:~# ping 10.xxx.yyy.254
PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
^C
--- 10.xxx.yyy.254 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3070ms

root@vpp-test:~# ip route delete default
root@vpp-test:~# ping 10.xxx.yyy.254
PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
^C
--- 10.xxx.yyy.254 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms



No traffic passed... ping router before adding route:



root@vpp-test:~# systemctl restart vpp.service
root@vpp-test:~# ./init50.sh
root@vpp-test:~# ping 10.xxx.yyy.254
PING 10.xxx.yyy.254 (10.xxx.yyy.254) 56(84) bytes of data.
64 bytes from 10.xxx.yyy.254: icmp_seq=1 ttl=64 time=0.780 ms
64 bytes from 10.xxx.yyy.254: icmp_seq=2 ttl=64 time=0.306 ms
64 bytes from 10.xxx.yyy.254: icmp_seq=3 ttl=64 time=0.310 ms
^C
--- 10.xxx.yyy.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2038ms
rtt min/avg/max/mdev = 0.306/0.465/0.780/0.222 ms
root@vpp-test:~# ip route add default via 10.xxx.yyy.254
root@vpp-test:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=23.5 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=60 time=23.9 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 23.541/23.710/23.879/0.169 ms
root@vpp-test:~#


Traffic passes fine.

This is a basic VPP installation built with
https://gerrit.fd.io/r/c/vpp/+/31122 rebased onto master of a couple weeks
ago (fd77f8c00). Ping plugin disabled, linux-cp and linux-nl enabled, with
linux-cp config of:

linux-cp {
default netns dataplane
interface-auto-create
}

Normally, this behaviour wouldn't be an issue, as a neighbor relationship
with the nexthop will be created by the BGP conversation that causes the
routes using that nexthop to be created - however, that's not the case with
the Route Reflectors that I plan on implementing. OSPF will be used in the
implementation, which might mitigate the problem - I hadn't gotten that far
in testing. However, I figured that if this is a real bug, then it's worth
fixing.
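
For anyone trying to reproduce this, the relevant VPP-side state can be inspected while the
pings are failing - these are standard VPP CLI show commands (output will of course differ
per setup):

# vppctl show ip fib 1.1.1.1
# vppctl show adj
# vppctl show ip neighbors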


There were a couple of other feedback items for the linux-nl plugin that I'd
written to Neale in the web form for the list, but I can only recall one of
them - the default netns has to be specified in the config file, you can't
use the command:

# vppctl lcp default netns dataplane

As the netlink listener doesn't appear to be 

Re: [vpp-dev] IPsec crash with async crypto

2021-05-27 Thread Florin Coras
Hi Matt, 

No worries. I asked because, as luck would have it, quic does use the crypto 
infra :-)

Cheers, 
Florin

> On May 27, 2021, at 6:02 AM, Matthew Smith  wrote:
> 
> Hi Florin!
> 
> It appears that the quic plugin is disabled in my build:
> 
> 2021/05/27 07:44:49:044 notice plugin/loadPlugin disabled (default): 
> quic_plugin.so
> 
> I didn't mean to give the impression that I thought this issue was caused by 
> quic. My mention of the quic commit was just intended to indicate how up to 
> date my build is with the gerrit master branch in case there were 
> recent/pending patches that people know of that might be relevant. That quic 
> commit is from about 2 weeks ago, which is the last time I merged upstream 
> changes.
> 
> Thanks,
> -Matt
> 
> 
> On Wed, May 26, 2021 at 5:58 PM Florin Coras wrote:
> Hi Matt, 
> 
> Did you try checking if quic plugin is loaded, just to see if there’s a 
> connection there. 
> 
> Regards,
> Florin
> 
> > On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io wrote:
> > 
> > Hi,
> > 
> > I saw VPP crash several times during some tests that were running to 
> > evaluate IPsec performance. The last upstream commit on my build of VPP is 
> > 'fd77f8c00 quic: remove cmake --target'. The tests ran on a C3000 with an 
> > onboard QAT. The tests were repeated with the QAT removed from the device 
> > whitelist in startup.conf (using async crypto with sw_scheduler) and the 
> > same thing happened.
> > 
> > The relevant part of the stack trace looks like this:
> > 
> > #8  0x7fdbb4006459 in os_out_of_memory () at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> > #9  0x7fdbb400d1fb in clib_mem_alloc_aligned_at_offset 
> > (size=2305843009213692256, align=8, align_offset=8, 
> > os_out_of_memory_on_failure=1) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> > #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0, 
> > length_increment=288230376151711515, data_bytes=2305843009213692256, 
> > header_bytes=8, data_align=8, numa_id=255) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> > #11 0x7fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0, 
> > length_increment=288230376151711515, data_bytes=2305843009213692248, 
> > header_bytes=0, data_align=8, numa_id=255) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> > #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) 
> > at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, 
> > ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 , n_cache=1, 
> > n_total=0x7fdb145053dc) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> > #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, 
> > frame=0x0) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> > #16 0x7fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80, 
> > node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT, 
> > dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
> > last_time_stamp=207016971809128) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> > #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
> > 
> > In vnet_crypto_async_free_frame() it appears that a call to pool_put() is 
> > trying to return a pointer to a pool that it is not a member of:
> > 
> > (gdb) frame 13
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) 
> > at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > 585  pool_put (ct->frame_pool, frame);
> > (gdb) p frame - ct->frame_pool
> > $1 = -13689
> > 
> > It seems like maybe a pointer to a vnet_crypto_async_frame_t was stored by 
> > the crypto engine and before it could be dequeued the pool filled and had 
> > to be reallocated. The per-thread frame_pool's are allocated with room for 
> > 1024 entries initially and ct->frame_pool had a vector length of 1025 when 
> > the crash occurred.
> > 
> > Can anyone with knowledge of the async crypto code confirm or refute that 
> > theory? Anyone have suggestions on the best way to fix this?
> > 
> > Thanks,
> > -Matt
> > 
> > 
> > 
> > 
> 
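
The failure mode described in the quoted theory above - a consumer keeping a raw pointer to a
pool element across a pool reallocation - can be illustrated with a small stand-alone sketch.
This is plain C using malloc/realloc, not VPP's pool implementation, and the names and sizes
are purely illustrative:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int id; } frame_t;

int main (void)
{
  size_t cap = 1024;
  frame_t *pool = malloc (cap * sizeof (frame_t));
  if (!pool)
    return 1;

  /* a consumer stashes a raw pointer to one element ... */
  uintptr_t saved = (uintptr_t) &pool[17];
  uintptr_t old_base = (uintptr_t) pool;

  /* ... the pool fills up and is grown, which may relocate it */
  frame_t *grown = realloc (pool, 2 * cap * sizeof (frame_t));
  if (!grown)
    return 1;

  if ((uintptr_t) grown != old_base)
    {
      /* the stashed pointer no longer lies inside the pool, so the index
         computed when handing the element back is garbage (compare the
         "$1 = -13689" seen in gdb above) */
      ptrdiff_t bogus =
        ((ptrdiff_t) (saved - (uintptr_t) grown)) / (ptrdiff_t) sizeof (frame_t);
      printf ("pool moved; stale element maps to index %td\n", bogus);
    }

  free (grown);
  return 0;
}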



Re: [vpp-dev] IPsec crash with async crypto

2021-05-27 Thread Matthew Smith via lists.fd.io
Hi Florin!

It appears that the quic plugin is disabled in my build:

2021/05/27 07:44:49:044 notice plugin/loadPlugin disabled
(default): quic_plugin.so

I didn't mean to give the impression that I thought this issue was caused
by quic. My mention of the quic commit was just intended to indicate how up
to date my build is with the gerrit master branch in case there were
recent/pending patches that people know of that might be relevant. That
quic commit is from about 2 weeks ago, which is the last time I merged
upstream changes.

Thanks,
-Matt


On Wed, May 26, 2021 at 5:58 PM Florin Coras  wrote:

> Hi Matt,
>
> Did you try checking if quic plugin is loaded, just to see if there’s a
> connection there.
>
> Regards,
> Florin
>
> > On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io wrote:
> >
> > Hi,
> >
> > I saw VPP crash several times during some tests that were running to
> evaluate IPsec performance. The last upstream commit on my build of VPP is
> 'fd77f8c00 quic: remove cmake --target'. The tests ran on a C3000 with an
> onboard QAT. The tests were repeated with the QAT removed from the device
> whitelist in startup.conf (using async crypto with sw_scheduler) and the
> same thing happened.
> >
> > The relevant part of the stack trace looks like this:
> >
> > #8  0x7fdbb4006459 in os_out_of_memory () at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> > #9  0x7fdbb400d1fb in clib_mem_alloc_aligned_at_offset
> (size=2305843009213692256, align=8, align_offset=8,
> os_out_of_memory_on_failure=1) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> > #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0,
> length_increment=288230376151711515, data_bytes=2305843009213692256,
> header_bytes=8, data_align=8, numa_id=255) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> > #11 0x7fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0,
> length_increment=288230376151711515, data_bytes=2305843009213692248,
> header_bytes=0, data_align=8, numa_id=255) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> > #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927)
> at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80,
> frame=0x7fdb3461c280) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280,
> ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 , n_cache=1,
> n_total=0x7fdb145053dc) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> > #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280,
> frame=0x0) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> > #16 0x7fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80,
> node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
> last_time_stamp=207016971809128) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> > #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
> >
> > In vnet_crypto_async_free_frame() it appears that a call to pool_put()
> is trying to return a pointer to a pool that it is not a member of:
> >
> > (gdb) frame 13
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80,
> frame=0x7fdb3461c280) at
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > 585  pool_put (ct->frame_pool, frame);
> > (gdb) p frame - ct->frame_pool
> > $1 = -13689
> >
> > It seems like maybe a pointer to a vnet_crypto_async_frame_t was stored
> by the crypto engine and before it could be dequeued the pool filled and
> had to be reallocated. The per-thread frame_pool's are allocated with room
> for 1024 entries initially and ct->frame_pool had a vector length of 1025
> when the crash occurred.
> >
> > Can anyone with knowledge of the async crypto code confirm or refute
> that theory? Anyone have suggestions on the best way to fix this?
> >
> > Thanks,
> > -Matt
> >
> >
> > 
> >
>
>




Re: [vpp-dev] unformat_vnet_uri not implemented following RFC 3986

2021-05-27 Thread Damjan Marion via lists.fd.io

Same RFC defines that for IPv6, square brackets should be used to distinguish 
between addr and port:

 A host identified by an Internet Protocol literal address, version 6
   [RFC3513] or later, is distinguished by enclosing the IP literal
   within square brackets ("[" and "]").  This is the only place where
   square bracket characters are allowed in the URI syntax. 
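
To make the two accepted forms concrete, here is a rough sketch of bracket-aware host/port
splitting - plain C for illustration only, not VPP's actual unformat code:

#include <stdio.h>
#include <string.h>

/* split "host:port" or "[v6-host]:port" into host and port */
static int
split_host_port (const char *a, char *host, size_t hlen, char *port, size_t plen)
{
  const char *colon;

  if (a[0] == '[')                       /* IPv6 literal: [addr]:port */
    {
      const char *rb = strchr (a, ']');
      if (!rb)
        return -1;
      snprintf (host, hlen, "%.*s", (int) (rb - a - 1), a + 1);
      colon = (rb[1] == ':') ? rb + 1 : NULL;
    }
  else                                   /* IPv4 or hostname: addr:port */
    {
      colon = strrchr (a, ':');
      snprintf (host, hlen, "%.*s",
                colon ? (int) (colon - a) : (int) strlen (a), a);
    }
  snprintf (port, plen, "%s", colon ? colon + 1 : "");
  return 0;
}

int main (void)
{
  char h[64], p[16];
  split_host_port ("[2001:db8::1]:500", h, sizeof (h), p, sizeof (p));
  printf ("host=%s port=%s\n", h, p);    /* host=2001:db8::1 port=500 */
  split_host_port ("10.0.0.1:500", h, sizeof (h), p, sizeof (p));
  printf ("host=%s port=%s\n", h, p);    /* host=10.0.0.1 port=500 */
  return 0;
}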
— 
Damjan



> On 27.05.2021., at 13:00, Dave Barach  wrote:
> 
> IIRC it's exactly because ipv6 addresses use ':' (and "::") as chunk 
> separators. If you decide to change unformat_vnet_uri please test ipv6 cases 
> carefully.
> 
> D.
> 
> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Florin Coras
> Sent: Thursday, May 27, 2021 1:05 AM
> To: 江 晓明 
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] unformat_vnet_uri not implemented following RFC 3986
> 
> Hi, 
> 
> That unformat function and the associated session layer apis (e.g., 
> vnet_connect_uri) are mainly used for testing and their production use is 
> discouraged. Provided that functionality is not lost, if anybody wants to do 
> the work, I don’t see why we wouldn’t want to make the unformat function rfc 
> compliant. At this point I can’t remember why we settled on the use of “/“ 
> but I suspect it may have to do with easier parsing of ipv6 ips. 
> 
> Regards,
> Florin
> 
>> On May 26, 2021, at 8:04 PM, jiangxiaom...@outlook.com wrote:
>> 
>> Hi Florin:
>> Currently unformat_vnet_uri is not implemented following RFC 3986. The 
>> syntax `tcp://10.0.0.1/500` should be `tcp://10.0.0.1:500` per RFC 3986.
>> I noticed there is a comment for `unformat_vnet_uri` in 
>> `src/vnet/session/application_interface.c`,
>> ```
>> /**
>> * unformat a vnet URI
>> *
>> * transport-proto://[hostname]ip46-addr:port
>> * eg.  tcp://ip46-addr:port
>> *  tls://[testtsl.fd.io]ip46-addr:port
>> *
>> ...
>> ```
>> Does it mean `unformat_vnet_uri` will be refactored to follow the RFC in the future?
>> 
>> 
>> 
> 
> 
> 
> 
> 





Re: [vpp-dev] unformat_vnet_uri not implemented following RFC 3986

2021-05-27 Thread Dave Barach
IIRC it's exactly because ipv6 addresses use ':' (and "::") as chunk 
separators. If you decide to change unformat_vnet_uri please test ipv6 cases 
carefully.

D.  

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Florin Coras
Sent: Thursday, May 27, 2021 1:05 AM
To: 江 晓明 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] unformat_vnet_uri not implemented following RFC 3986

Hi, 

That unformat function and the associated session layer apis (e.g., 
vnet_connect_uri) are mainly used for testing and their production use is 
discouraged. Provided that functionality is not lost, if anybody wants to do 
the work, I don’t see why we wouldn’t want to make the unformat function rfc 
compliant. At this point I can’t remember why we settled on the use of “/“ but 
I suspect it may have to do with easier parsing of ipv6 ips. 

Regards,
Florin

> On May 26, 2021, at 8:04 PM, jiangxiaom...@outlook.com wrote:
> 
> Hi Florin:
> Currently unformat_vnet_uri is not implemented following RFC 3986. The 
> syntax `tcp://10.0.0.1/500` should be `tcp://10.0.0.1:500` per RFC 3986.
> I noticed there is a comment for `unformat_vnet_uri` in 
> `src/vnet/session/application_interface.c`,
> ```
> /**
>  * unformat a vnet URI
>  *
>  * transport-proto://[hostname]ip46-addr:port
>  * eg.  tcp://ip46-addr:port
>  *  tls://[testtsl.fd.io]ip46-addr:port
>  *
>  ...
> ```
> Does it mean `unformat_vnet_uri` will be refactored to follow the RFC in the future?
> 
> 
> 






Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Pim van Pelt
Hoi Nate,

further to what Andrew suggested, there are a few more hints I can offer:
1) Make sure there is enough netlink socket buffer by adding this to your
sysctl set:
cat << EOF > /etc/sysctl.d/81-VPP-netlink.conf
# Increase netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
sysctl -p /etc/sysctl.d/81-VPP-netlink.conf

2) Ensure there is enough memory by adding this to VPP's startup config:
memory {
  main-heap-size 2G
  main-heap-page-size default-hugepage
}

3) Many prefixes (like a full BGP routing table) will need more stats
memory, so increase that too in VPP's startup config:
statseg {
  size 128M
}

And in case you missed it, make sure to create the linux-cp devices in a
separate namespace by adding this to the startup config:
linux-cp {
  default netns dataplane
}

Then you should be able to consume the IPv4 and IPv6 DFZ in your router. I
tested extensively with FRR and Bird2, and so far had good success.

groet,
Pim

On Thu, May 27, 2021 at 10:02 AM Andrew Yourtchenko 
wrote:

> I would guess from your traceback you are running out of memory, so
> increasing the main heap size to something like 4x could help…
>
> --a
>
> On 27 May 2021, at 08:29, Nate Sales  wrote:
>
> 
> Hello,
>
> I'm having some trouble with the linux-cp netlink plugin. After building
> it from the patch set (https://gerrit.fd.io/r/c/vpp/+/31122), it does
> correctly receive netlink messages and insert routes from the linux kernel
> table into the VPP FIB. When loading a large amount of routes however (full
> IPv4 table), VPP crashes after loading about 400k routes.
>
> It appears to be receiving a SIGABRT that terminates the VPP process:
>
> May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC
> 0x7fe9b99bdce1
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 0x7fe9b9de1a7b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 0x7fe9b9d13140
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 0x141
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 0x123
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 0x55d43480a1f3
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5
> vec_resize_allocate_memory + 0x285
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb
> vlib_validate_combined_counter + 0xdb
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55
> load_balance_create + 0x205
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d
> fib_entry_src_mk_lb + 0x38d
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4
> fib_entry_src_action_install + 0x44
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b
> fib_entry_src_action_activate + 0x17b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780
> fib_entry_create + 0x70
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc
> fib_table_entry_update + 0x29c
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 0x7fe935fcedce
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 0x7fe935fd2ab5
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited,
> code=killed, status=6/ABRT
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result
> 'signal'.
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU
> time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job,
> restart counter is at 2.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing
> engine.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU
> time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing
> engine...
> May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing
> engine.
>
> Here's what I'm working with:
>
> root@pdx1rtr1:~# uname -a
> Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64
> GNU/Linux
> root@pdx1rtr1:~# vppctl show ver
> vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58
> root@pdx1rtr1:~# bird --version
> BIRD version 2.0.7
>
> And some adjusted sysctl params:
>
> net.core.rmem_default = 67108864
> net.core.wmem_default = 67108864
> net.core.rmem_max = 67108864
> net.core.wmem_max = 67108864
> vm.nr_hugepages = 1024
> vm.max_map_count = 3096
> vm.hugetlb_shm_group = 0
> kernel.shmmax = 2147483648
>
> In case it's at all helpful, I ran a "sh ip fib sum" every second and
> restarted BIRD to observe when the routes start processing, and to get the
> last known fib state before the crash:
>
> Thu May 27 06:10:20 UTC 2021
> ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ]
> epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
> Prefix length Count
>0   1
>4   2
>8   3
>9   5
>  

Re: [vpp-dev]: Unable to run VPP with ASAN enabled

2021-05-27 Thread Benoit Ganne (bganne) via lists.fd.io
Hi Rajith,

> The problem seems to be due to external libraries that we have linked with
> VPP. These external libraries have not been compiled with ASAN.
> I could see that when those external libraries were suppressed through the
> MyASAN.supp file, VPP started running with ASAN enabled.

This is surprising because the default policy of ASan is to allow access to any 
memory - IOW ASan will not detect memory access errors *unless* the 
memory address was *explicitly* marked as inaccessible. ASan interposes malloc() 
etc. symbols, so libc allocations are automatically marked as 
accessible/inaccessible by the ASan runtime, and VPP does the same for its own 
memory allocator.
Of course there might still be false positives, but in my experience they are 
usually rare. 
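
As a minimal sketch of that mechanism (generic use of the public sanitizer interface, not
VPP's actual allocator code; build with -fsanitize=address):

#include <sanitizer/asan_interface.h>
#include <stddef.h>
#include <stdio.h>

static char arena[4096];
static size_t off;

static void *
my_alloc (size_t n)
{
  void *p = &arena[off];
  off += n;
  ASAN_UNPOISON_MEMORY_REGION (p, n);   /* mark the chunk accessible */
  return p;
}

static void
my_free (void *p, size_t n)
{
  ASAN_POISON_MEMORY_REGION (p, n);     /* further access is reported */
}

int
main (void)
{
  /* nothing in the arena is accessible until the allocator hands it out */
  ASAN_POISON_MEMORY_REGION (arena, sizeof (arena));

  char *p = my_alloc (64);
  p[0] = 1;                             /* fine: explicitly unpoisoned */
  my_free (p, 64);
  /* p[0] = 2;  -- would now be reported as a use-after-poison error */
  printf ("done\n");
  return 0;
}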

Best
ben




Re: [vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Andrew Yourtchenko
I would guess from your traceback you are running out of memory, so increasing 
the main heap size to something like 4x could help…
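
For example (the size here is purely illustrative), in startup.conf:

memory {
  main-heap-size 4G
}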

--a

> On 27 May 2021, at 08:29, Nate Sales  wrote:
> 
> 
> Hello,
> 
> I'm having some trouble with the linux-cp netlink plugin. After building it 
> from the patch set (https://gerrit.fd.io/r/c/vpp/+/31122), it does correctly 
> receive netlink messages and insert routes from the linux kernel table into 
> the VPP FIB. When loading a large amount of routes however (full IPv4 table), 
> VPP crashes after loading about 400k routes.
> 
> It appears to be receiving a SIGABRT that terminates the VPP process:
> 
> May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
> 0x7fe9b99bdce1
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 0x7fe9b9de1a7b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 0x7fe9b9d13140
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 0x141
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 0x123
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 0x55d43480a1f3
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
> vec_resize_allocate_memory + 0x285
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
> vlib_validate_combined_counter + 0xdb
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
> load_balance_create + 0x205
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
> fib_entry_src_mk_lb + 0x38d
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
> fib_entry_src_action_install + 0x44
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
> fib_entry_src_action_activate + 0x17b
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 fib_entry_create 
> + 0x70
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
> fib_table_entry_update + 0x29c
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 0x7fe935fcedce
> May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 0x7fe935fd2ab5
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, 
> code=killed, status=6/ABRT
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 'signal'.
> May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job, 
> restart counter is at 2.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing engine.
> May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
> May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing 
> engine...
> May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing engine.
> 
> Here's what I'm working with:
> 
> root@pdx1rtr1:~# uname -a
> Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64 
> GNU/Linux
> root@pdx1rtr1:~# vppctl show ver
> vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58
> root@pdx1rtr1:~# bird --version
> BIRD version 2.0.7
> 
> And some adjusted sysctl params:
> 
> net.core.rmem_default = 67108864
> net.core.wmem_default = 67108864
> net.core.rmem_max = 67108864
> net.core.wmem_max = 67108864
> vm.nr_hugepages = 1024
> vm.max_map_count = 3096
> vm.hugetlb_shm_group = 0
> kernel.shmmax = 2147483648
> 
> In case it's at all helpful, I ran a "sh ip fib sum" every second and 
> restarted BIRD to observe when the routes start processing, and to get the 
> last known fib state before the crash:
> 
> Thu May 27 06:10:20 UTC 2021
> ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] 
> epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
> Prefix length Count 
>0   1
>4   2
>8   3
>9   5
>   10  29
>   11  62
>   12 169
>   13 357
>   14 702
>   151140
>   167110
>   174710
>   187763
>   19   13814
>   20   22146
>   21   26557
>   22   51780
>   23   43914
>   24  227173
>   27   1
>   32   6
> Thu May 27 06:10:21 UTC 2021
> clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection refused
> Thu May 27 06:10:22 UTC 2021
> ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] 
> epoch:0 flags:none locks:[default-route:1, ]
> Prefix length Count 
>0   1
>4   2
>   32

Re: [vpp-dev]: Unable to run VPP with ASAN enabled

2021-05-27 Thread Rajith PR via lists.fd.io
Hi Ben,

The problem seems to be due to external libraries that we have linked with
VPP. These external libraries have not been compiled with ASAN.
I could see that when those external libraries were suppressed through the
MyASAN.supp file, VPP started running with ASAN enabled.
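
For reference, a minimal sketch of that setup (library name and paths are illustrative): the
suppression file lists the uninstrumented external libraries and is passed to the ASan runtime
via ASAN_OPTIONS:

# MyASAN.supp - ignore reports that originate in uninstrumented external libraries
interceptor_via_lib:libexternal_foo.so

ASAN_OPTIONS=suppressions=/path/to/MyASAN.supp ./vpp -c /etc/vpp/startup.conf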

Thanks,
Rajith

On Wed, May 26, 2021 at 2:25 PM Benoit Ganne (bganne) 
wrote:

> Hi Rajith,
>
> > I was able to proceed further after setting LD_PRELOAD to the asan
> > library. After this I get a SIGSEGV crash in ASan. These don't seem to be
> > related to our code, as without ASAN they have been working perfectly.
>
> I suspect the opposite  - ASan detects errors we do not detect in
> release or debug mode, esp. out-of-bound access and use-after-free. Look
> carefully at /home/supervisor/libvpp/src/vpp/rtbrick/rtb_vpp_ifp.c:287
>
> Best
> ben
>




[vpp-dev] linux_nl_plugin causes VPP crash when importing a full IPv4 table

2021-05-27 Thread Nate Sales

Hello,

I'm having some trouble with the linux-cp netlink plugin. After 
building it from the patch set 
(https://gerrit.fd.io/r/c/vpp/+/31122), it does correctly receive 
netlink messages and insert routes from the linux kernel table into the 
VPP FIB. When loading a large amount of routes however (full IPv4 
table), VPP crashes after loading about 400k routes.


It appears to be receiving a SIGABRT that terminates the VPP process:

May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 
0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0  0x7fe9b9de1a7b 
0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1  0x7fe9b9d13140 
0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2  0x7fe9b99bdce1 gsignal + 
0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3  0x7fe9b99a7537 abort + 
0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4  0x55d43480a1f3 
0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5  0x7fe9b9c9c8d5 
vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6  0x7fe9b9d71feb 
vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7  0x7fe9ba4f1e55 
load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8  0x7fe9ba4c639d 
fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9  0x7fe9ba4c64a4 
fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x7fe9ba4c681b 
fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x7fe9ba4c3780 
fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x7fe9ba4b9afc 
fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x7fe935fcedce 
0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x7fe935fd2ab5 
0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, 
code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 
'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU 
time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart 
job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing 
engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU 
time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing 
engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing 
engine.


Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) 
x86_64 GNU/Linux

root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 
2021-05-27T01:21:58

root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second and 
restarted BIRD to observe when the routes start processing, and to get 
the last known fib state before the crash:


Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel 
] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]

   Prefix length Count
  0   1
  4   2
  8   3
  9   5
 10  29
 11  62
 12 169
 13 357
 14 702
 151140
 167110
 174710
 187763
 19   13814
 20   22146
 21   26557
 22   51780
 23   43914
 24  227173
 27   1
 32   6
Thu May 27 06:10:21 UTC 2021
clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection 
refused

Thu May 27 06:10:22 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel 
] epoch:0 flags:none locks:[default-route:1, ]

   Prefix length Count
  0   1
  4   2
 32   2


I'm new to VPP so let me know if there are other logs that would be 
useful too.


Cheers,
Nate



