Hi Brian,

If you're adding lots of routes, you'll need to increase the heap size for the IP FIBs as well as the main heap:
https://fdio-vpp.readthedocs.io/en/latest/gettingstarted/users/configuring/startup.html#ip
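
As a rough illustration of the two settings, here is a minimal startup.conf sketch. It assumes the top-level heapsize option and the heap-size option of the ip section as described on the page linked above. The 4G values are placeholders only, not recommendations, and option names can differ between VPP releases, so check the documentation for the version you are running.

```
# Main heap (placeholder size)
heapsize 4G

ip {
  # Heap used for the IPv4 FIB (placeholder size)
  heap-size 4G
}
```
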
To run in gdb:

  sudo service vpp stop   (or your OS equivalent)
  make build
  sudo gdb --args ./build-root/install-vpp_debug-native/vpp/bin/vpp -c <YOUR_CONF_FILE> plugin_path <PATH/TO/ALL/PLUGINS>

Hope that helps,
/neale

From: <vpp-dev@lists.fd.io> on behalf of Brian Dickson <brian.peter.dick...@gmail.com>
Date: Wednesday, 5 December 2018 at 19:31
To: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: [vpp-dev] vnet crashes, and problems building debug version (was Re: netlink & router (vppsb or patch->vpp) - help building/running)

Greetings again,

Here is more context on the problem I'm seeing.

The problem occurs if a large-ish number of IPv4 prefixes are added to the FIB (by way of the netlink and router plugin). If the prefix count is below some threshold (e.g. 50,000 prefixes), things work fine. At some prefix count (I haven't narrowed it down to a specific number, but I don't think the actual number is relevant), vnet crashes with a failure within ip4_mtrie.c.

I have been trying to run in debug mode, but am having a lot of difficulty building everything with debug. Basically, the only way I can successfully build everything is to use the script vagrant/build.sh (which does a make pkg-rpm that generates a bunch of rpm files that I then install with yum). Then, I have to rebuild things using the instructions from vppsb/router/README.md (doing 4 symlinks and various make iterations, and THEN having to run some of those with a bunch of CFLAGS values just to get it to compile).

I don't see any good/easy way to build debug images from this environment without a LOT of work/investigation on how all the various build components work.

Is the problem easy enough to diagnose from a non-symbolic stack dump, or can someone provide details on how to build and run vpp with everything needed to use gdb, including the plugins for netlink/router, so the problem can be further isolated?

I think there's basically some kind of bug related to the fib stuff in vnet that really needs to be fixed.

The box has an unreasonably large amount of memory (128GB, doing nothing but VPP), and I get the same error even if I up the initial heap size by a factor of 2^12 (changing 32<<20 to 32ULL<<32).

Please help.

Brian

(In the following, the buffer space message is likely a consequence of the thread handling netlink messages dying, rather than a cause.)

Here's the log messages:

Dec 4 17:08:14 sj2tldnslab09 vnet[19785]: dpdk_pool_create:535: ioctl(VFIO_IOMMU_MAP_DMA) pool 'dpdk_mbuf_pool_socket0': Inappropriate ioctl for device (errno 25)
Dec 4 17:08:14 sj2tldnslab09 vnet[19785]: dpdk_ipsec_process:1026: not enough DPDK crypto resources, default to OpenSSL
Dec 4 17:08:16 sj2tldnslab09 vnet[19785]: rtnl_ns_recv:403: Received notification while in sync. Restart synchronization.
Dec 4 17:08:16 sj2tldnslab09 vnet[19785]: rtnl_process_read:467: rtnetlink recv error (31) []: Bad file descriptor
Dec 4 17:08:58 sj2tldnslab09 vnet[19785]: rtnl_process_read:467: rtnetlink recv error (27) []: No buffer space available
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: rtnl_process_read:467: rtnetlink recv error (27) []: No buffer space available
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: received signal SIGABRT, PC 0x7f043c3c7277
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #0 0x00007f043e5c18c5 0x7f043e5c18c5
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #1 0x00007f043c9716d0 0x7f043c9716d0
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #2 0x00007f043c3c7277 gsignal + 0x37
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #3 0x00007f043c3c8968 abort + 0x148
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #4 0x00005569eb7900d3 0x5569eb7900d3
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #5 0x00007f043d0e8512 vec_resize_allocate_memory + 0x2f2
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #6 0x00007f043dd9809f 0x7f043dd9809f
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #7 0x00007f043dd985cd ip4_fib_mtrie_route_add + 0x17d
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #8 0x00007f043e129b08 fib_entry_src_action_install + 0xb8
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #9 0x00007f043e1274a0 fib_entry_create + 0x70
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #10 0x00007f043e11e890 fib_table_entry_path_add2 + 0x190
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #11 0x00007f03f86833fd add_del_route + 0x34c
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #12 0x00007f03f8683594 netns_notify_cb + 0x8c
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #13 0x00007f03f8466e71 netns_notify + 0x1f3
Dec 4 17:09:07 sj2tldnslab09 vnet[19785]: #14 0x00007f03f84684ed ns_rcv_route + 0x825

On Tue, Nov 27, 2018 at 6:17 PM Brian Dickson <brian.peter.dick...@gmail.com> wrote:

I have been working with the netlink and router plugins, which I was able to build from the 18.07 tree via the instructions in vppsb/router. (NB: trying to build from anything more recent, e.g. 18.10 or 19.01 breaks, with no obvious easy resolution).

When running with these plugins, connected with an open source router (bird version 1.6.4 or 2.02) and with a very small routing table, it works really really well. (I was able to run roughly line-rate 10g even with small packets, and when using a second host with vpp and the span->pg->pcap to /tmp, didn't lose any data.)

However, when trying to load up the routing table, things went sideways, and it seems to be something netlink-related. (This was using BGP to feed in 3 copies of the full routing table, each copy of which is about 750K routes.)

I was hoping someone could provide good instructions (good == tested and works) on building from a more recent release of VPP to see if it's an issue that has been fixed.

If the issue persists and/or looks to be netlink-specific, would anyone be able to look into it? I'm happy to provide logs etc.

System is bare metal centos7.5, tons of cores, memory, etc.

The first few messages in syslog look like:

Nov 27 17:57:30 sj2tldnslab09 bird: Kernel dropped some netlink messages, will resync on next scan.
Nov 27 17:57:30 sj2tldnslab09 vnet[127960]: rtnl_process_read:467: rtnetlink recv error (27) []: No buffer space available
Nov 27 17:57:30 sj2tldnslab09 vnet[127960]: rtnl_process_read:467: rtnetlink recv error (27) []: No buffer space available

After a bunch of similar groups of messages, VPP appears to crash.

If this is a known problem or if there's something that needs to be tweaked on the host, any assistance would be greatly appreciated.

Brian