OVS Version 2.9.90
DPDK Version 17.11
Ubuntu 17.10
Kernel 4.13.0-21-generic
Docker 18.06.1-ce
I figured out that no hugepage was being consumed by any of the switches. When ovs-vswitchd starts, the number of free hugepages drops all the way to 0 (in /dev/hugepages, I see files rtemap_0 to rtemap_7 created), then all of them get deleted and the free hugepages count goes back to what it was before (I was expecting it to be decremented by one).

I was running all containers with a one-to-one mapping (docker argument -v /dev/hugepages:/dev/hugepages) when this problem was occurring. I went ahead and created a subfolder in /dev/hugepages on the host for each OVS container and mapped it as "-v /dev/hugepages/br-n:/dev/hugepages", where n is the bridge number. This resolved the segfault problem with dpdkvhost port attachment.

However, I still observe the behavior where the free hugepages count hits 0 (files rtemap_0 to rtemap_7 being created in /dev/hugepages/br-n for each bridge) while ovs-vswitchd is starting in a container. Then all rtemap files but one are deleted, and the free hugepages count ends up one lower than it was before starting ovs-vswitchd, which indicates that one hugepage was allocated successfully.

Packets seem to be forwarded just fine and everything looks OK to me, but I am curious whether this is the normal behavior for allocating hugepages to OVS.

On Fri, Nov 2, 2018 at 7:01 PM Alan Kayahan <[email protected]> wrote:
>
> Thanks for the response all.
>
> @Ian
> 1GB pagesize, 8 total pages. OVS is launched without the
> dpdk-socket-mem options so it should take 1GB. When one switch is
> started, free hugepage count drops to 7. When I launch another, I'd
> expect it to drop to 6 but it crashes instead.
>
> My DPDK apps create the hugepage file in /dev/hugepages with the name
> I specify. I am assuming ovs-vswitchd is responsible for the naming of
> ovs hugepage files in /dev/hugepages.
> I believe it wouldn't be a
> problem (I will test this and respond) if both dpdk bridges were
> managed by the same ovs-vswitchd service. But in the containerized
> scenario, two ovs-vswitchd services are accessing the same
> /dev/hugepages path. Don't you think this would be a problem? Or is it
> the openvswitch kernel module that is in charge of hugepage
> coordination?
>
> @Ben
> Will retrieve the info you requested as soon as I eliminate a couple of
> other possible causes.
>
> Alan
> On Wed, Oct 31, 2018 at 11:00 AM Stokes, Ian <[email protected]> wrote:
> >
> > > On Thu, Oct 25, 2018 at 09:51:38PM +0200, Alan Kayahan wrote:
> > > > Hello,
> > > >
> > > > I have 3 OVS bridges on the same host, connected to each other as
> > > > br1<->br2<->br3. br1 and br3 are connected to the docker container cA
> > > > via dpdkvhostuser port type (I know it is deprecated, the app works
> > > > this way only). The DPDK app running in cA generates packets, which
> > > > traverse bridges br1->br2->br3, then end up back at the DPDK app.
> > > > This setup works fine.
> > > >
> > > > Now I am trying to put each OVS bridge into its respective docker
> > > > container. I connect the containers with veth pairs, then add the veth
> > > > ports to the bridges. Next, I add a dpdkvhostuser port named SRC to
> > > > br1, so far so good. The moment I add a dpdkvhostuser port named SNK
> > > > to br3, ovs-vswitchd services in br1's and br3's containers segfault.
> > > > Following are the backtraces from each,
> >
> > What version of OVS and DPDK are you using?
> >
> > > >
> > > > ------------------br1's container---------------
> > > >
> > > > [Thread debugging using libthread_db enabled] Using host libthread_db
> > > > library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > > > Core was generated by `ovs-vswitchd
> > > > unix:/usr/local/var/run/openvswitch/db.sock -vconsole:emer -vsyslo'.
> > > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > > #0 0x00005608fa0f321b in netdev_rxq_recv (rx=0x7ff13c34ee80,
> > > > batch=batch@entry=0x7ff1bbb4d890) at lib/netdev.c:702
> > > > 702 retval = rx->netdev->netdev_class->rxq_recv(rx, batch);
> > > > [Current thread is 1 (Thread 0x7ff1bbb4e700 (LWP 376))]
> > > > (gdb) bt
> > > > #0 0x00005608fa0f321b in netdev_rxq_recv (rx=0x7ff13c34ee80,
> > > > batch=batch@entry=0x7ff1bbb4d890) at lib/netdev.c:702
> > > > #1 0x00005608fa0cce65 in dp_netdev_process_rxq_port (
> > > > pmd=pmd@entry=0x7ff1bbb4f010, rxq=0x5608fb651be0, port_no=1)
> > > > at lib/dpif-netdev.c:3279
> > > > #2 0x00005608fa0cd296 in pmd_thread_main (f_=<optimized out>)
> > > > at lib/dpif-netdev.c:4145
> > > > #3 0x00005608fa14a836 in ovsthread_wrapper (aux_=<optimized out>)
> > > > at lib/ovs-thread.c:348
> > > > #4 0x00007ff1c52517fc in start_thread (arg=0x7ff1bbb4e700)
> > > > at pthread_create.c:465
> > > > #5 0x00007ff1c4815b5f in clone ()
> > > > at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > >
> > > > ------------------br3's container---------------
> > > >
> > > > [Thread debugging using libthread_db enabled] Using host libthread_db
> > > > library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > > > Core was generated by `ovs-vswitchd
> > > > unix:/usr/local/var/run/openvswitch/db.sock -vconsole:emer -vsyslo'.
> > > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > > #0 0x000055c517e3abcb in rte_mempool_free_memchunks () [Current
> > > > thread is 1 (Thread 0x7f202351f300 (LWP 647))]
> > > > (gdb) bt
> > > > #0 0x000055c517e3abcb in rte_mempool_free_memchunks ()
> > > > #1 0x000055c517e3ad46 in rte_mempool_free.part ()
> > > > #2 0x000055c518218b78 in dpdk_mp_free (mp=0x7f603fe66a00)
> > > > at lib/netdev-dpdk.c:599
> > > > #3 0x000055c518218ff0 in dpdk_mp_free (mp=<optimized out>)
> > > > at lib/netdev-dpdk.c:593
> > > > #4 netdev_dpdk_mempool_configure (dev=0x7f1f7ffeac00) at
> > > > lib/netdev-dpdk.c:629
> > > > #5 0x000055c51821a98d in dpdk_vhost_reconfigure_helper
> > > (dev=0x7f1f7ffeac00)
> > > > at lib/netdev-dpdk.c:3599
> > > > #6 0x000055c51821ac8b in netdev_dpdk_vhost_reconfigure
> > > (netdev=0x7f1f7ffebcc0)
> > > > at lib/netdev-dpdk.c:3624
> > > > #7 0x000055c51813fe6b in port_reconfigure (port=0x55c51a4522a0)
> > > > at lib/dpif-netdev.c:3341
> > > > #8 reconfigure_datapath (dp=dp@entry=0x55c51a46efc0) at
> > > > lib/dpif-netdev.c:3822
> > > > #9 0x000055c5181403e8 in do_add_port (dp=dp@entry=0x55c51a46efc0,
> > > > devname=devname@entry=0x55c51a456520 "SNK",
> > > > type=0x55c51834f7bd "dpdkvhostuser", port_no=port_no@entry=1)
> > > > at lib/dpif-netdev.c:1584
> > > > #10 0x000055c51814059b in dpif_netdev_port_add (dpif=<optimized out>,
> > > > netdev=0x7f1f7ffebcc0, port_nop=0x7fffb4eef68c) at
> > > > lib/dpif-netdev.c:1610
> > > > #11 0x000055c5181469be in dpif_port_add (dpif=0x55c51a469350,
> > > > netdev=netdev@entry=0x7f1f7ffebcc0,
> > > port_nop=port_nop@entry=0x7fffb4eef6ec)
> > > > at lib/dpif.c:579
> > > > ---Type <return> to continue, or q <return> to quit---
> > > > #12 0x000055c5180f9f28 in port_add (ofproto_=0x55c51a464ee0,
> > > > netdev=0x7f1f7ffebcc0) at ofproto/ofproto-dpif.c:3645
> > > > #13 0x000055c5180ecafe in ofproto_port_add (ofproto=0x55c51a464ee0,
> > > > netdev=0x7f1f7ffebcc0, ofp_portp=ofp_portp@entry=0x7fffb4eef7e8) at
> > > > ofproto/ofproto.c:1999
> > > > #14 0x000055c5180d97e6 in iface_do_create (errp=0x7fffb4eef7f8,
> > > > netdevp=0x7fffb4eef7f0, ofp_portp=0x7fffb4eef7e8,
> > > > iface_cfg=0x55c51a46d590, br=0x55c51a4415b0)
> > > > at vswitchd/bridge.c:1799
> > > > #15 iface_create (port_cfg=0x55c51a46e210, iface_cfg=0x55c51a46d590,
> > > > br=0x55c51a4415b0) at vswitchd/bridge.c:1837
> > > > #16 bridge_add_ports__ (br=br@entry=0x55c51a4415b0,
> > > > wanted_ports=wanted_ports@entry=0x55c51a441690,
> > > > with_requested_port=with_requested_port@entry=true) at
> > > > vswitchd/bridge.c:931
> > > > #17 0x000055c5180db87a in bridge_add_ports
> > > > (wanted_ports=0x55c51a441690, br=0x55c51a4415b0) at
> > > > vswitchd/bridge.c:942
> > > > #18 bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x55c51a46ea80) at
> > > > vswitchd/bridge.c:661
> > > > #19 0x000055c5180df989 in bridge_run () at vswitchd/bridge.c:3016
> > > > #20 0x000055c517dbc535 in main (argc=<optimized out>, argv=<optimized
> > > > out>) at vswitchd/ovs-vswitchd.c:120
> > > >
> > > > Note that /dev/hugepages of the host is shared with all containers. I
> > > > have a feeling that br3 is overwriting the hugepage file of br1. Any
> > > > ideas?
> >
> > How many/much huge page memory are you allocating to the system and how
> > much do you allocate when launching OVS DPDK in each container?
> >
> > Can you confirm with "cat /proc/meminfo"?
> >
> > Ian
> >
> > >
> > > It does look like some kind of bad pointer, since
> > > rx->netdev->netdev_class->rxq_recv shouldn't segfault. Is there a way
> > > you can rerun with Valgrind or Address Sanitizer?
> > > _______________________________________________
> > > discuss mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
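
P.S. In case it helps anyone reproduce the observation at the top of this message, here is a minimal sketch of how the two checks could be scripted, assuming Python 3 on the host. The helper names hugepage_counters and bridge_volume_arg are made up for illustration and are not part of OVS, DPDK or Docker. The only things taken from this thread are the HugePages_* fields of /proc/meminfo and the "-v /dev/hugepages/br-n:/dev/hugepages" mapping.

```python
import re


def hugepage_counters(meminfo_text):
    """Return the HugePages_* counters found in /proc/meminfo-style text.

    Only lines such as "HugePages_Free:   7" are picked up; other fields
    (for example "Hugepagesize:") are ignored.
    """
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"(HugePages_\w+):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def bridge_volume_arg(bridge_number, host_root="/dev/hugepages"):
    """Build the docker -v argument that gives one bridge its own subfolder.

    Assumption: the host subfolder is named br-<n>, as in the message above,
    and has already been created on the host.
    """
    return f"-v {host_root}/br-{bridge_number}:/dev/hugepages"


# Example use on the host (not executed here):
#   with open("/proc/meminfo") as f:
#       before = hugepage_counters(f.read())
#   ... start ovs-vswitchd in the container for bridge n ...
#   with open("/proc/meminfo") as f:
#       after = hugepage_counters(f.read())
#   print(before["HugePages_Free"] - after["HugePages_Free"])
```

With the per-bridge mapping in place, the difference printed by the commented example would be expected to be 1 once ovs-vswitchd has settled, matching what I described above. I have not checked how it behaves while the rtemap files are still being created.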
