[vpp-dev] TCP bandwidth loss
[Edited Message Follows]

Hi, we found that NSH causes a loss of TCP bandwidth in our setup. The scenario is as follows. There are two hosts: vm1 (192.168.128.2) and vm2 (192.168.128.3).

1: the hosts (vm1/vm2) send traffic to vpp_server0
2: vpp_server0 is configured with a classify table and sessions that steer the traffic to vpp_server1
3: vpp_server1 is configured with a classify table and sessions that send the traffic back to vpp_server0
4: vpp_server0 sends the traffic on to the hosts

vpp_server0/1 NSH configuration (hit-next 27 means nsh-classifier):

vpp_server0:
create vxlan-gpe tunnel local 100.64.1.8 remote 100.64.1.9 vni 1000 next-nsh (# if_name: vxlan_gpe_tunnel0, if_index: 19)
create nsh entry nsp 1000 nsi 255 md-type 1 c1 0 c2 0 c3 0 c4 0 next-ethernet
create nsh map nsp 1000 nsi 255 mapped-nsp 1000 mapped-nsi 255 nsh_action push encap-vxlan-gpe-intf 19
classify table mask l3 ip4 src (table-index 0)
classify session hit-next 27 table-index 0 match l3 ip4 src 192.168.128.2 opaque-index 256255
classify session hit-next 27 table-index 0 match l3 ip4 src 192.168.128.3 opaque-index 256255
set interface l2 input classify intfc pipe1000.0 ip4-table 0
create nsh map nsp 1001 nsi 255 mapped-nsp 1001 mapped-nsi 255 nsh_action pop encap-none 3 0
create nsh map nsp 1002 nsi 255 mapped-nsp 1002 mapped-nsi 255 nsh_action pop encap-none 3 0

vpp_server1:
create vxlan-gpe tunnel local 100.64.1.9 remote 100.64.1.8 vni 1000 next-nsh (# if_name: vxlan_gpe_tunnel0, if_index: 7)
create nsh map nsp 1000 nsi 255 mapped-nsp 1000 mapped-nsi 255 nsh_action pop encap-none 1 0 (# nsh_tunnel0)
set interface feature nsh_tunnel0 ip4-not-enabled arc ip4-unicast disable
create nsh entry nsp 1001 nsi 255 md-type 1 c1 0 c2 0 c3 0 c4 0 next-ethernet
create nsh map nsp 1001 nsi 255 mapped-nsp 1001 mapped-nsi 255 nsh_action push encap-vxlan-gpe-intf 7
create nsh entry nsp 1002 nsi 255 md-type 1 c1 0 c2 0 c3 0 c4 0 next-ethernet
create nsh map nsp 1002 nsi 255 mapped-nsp 1002 mapped-nsi 255 nsh_action push encap-vxlan-gpe-intf 7
classify table mask l3 ip4 dst (table-index 0)
classify session hit-next 27 table-index 0 match l3 ip4 dst 192.168.128.2 opaque-index 256511
classify session hit-next 27 table-index 0 match l3 ip4 dst 192.168.128.3 opaque-index 256767
set interface l2 input classify intfc loop1000 ip4-table 0

We use iperf to test the TCP bandwidth between vm1 and vm2; it is only 1 Gbps. If we modify just the configuration of vpp_server1 as follows, so that both destinations use the same NSH service path (nsp 1001) and service index (nsi 255), the TCP bandwidth increases to the 6 Gbps we expected:

classify session hit-next 27 table-index 0 match l3 ip4 dst 192.168.128.3 del
classify session hit-next 27 table-index 0 match l3 ip4 dst 192.168.128.3 opaque-index 256511

The VPP version we use is 19.08. We don't know why NSH causes this issue.

B.R.
joseph
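A note on the opaque-index values above: they appear to encode the NSH service path in the upper bits and the service index in the low 8 bits, i.e. (nsp << 8) | nsi, which is how 1000/255 becomes 256255 and 1001/255 becomes 256511. A minimal sketch of that packing, assuming this layout (the helper name is illustrative, not VPP's API):

    /* Sketch: pack an NSH service path (nsp) and service index (nsi)
     * into a classify opaque-index, assuming the (nsp << 8) | nsi
     * layout implied by the values in the mail above. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t
    nsh_opaque_index (uint32_t nsp, uint8_t nsi)
    {
      return (nsp << 8) | nsi;
    }

    int
    main (void)
    {
      /* 1000/255 -> 256255, 1001/255 -> 256511, 1002/255 -> 256767 */
      printf ("%u %u %u\n",
              nsh_opaque_index (1000, 255),
              nsh_opaque_index (1001, 255),
              nsh_opaque_index (1002, 255));
      return 0;
    }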
Re: [vpp-dev] MPLS DROP DPO
Hi Neale,

The problem was solved. Thanks a lot.

Mohsen

On Wed, Jun 23, 2021 at 3:49 PM Neale Ranns wrote:

> Hi Mohsen,
>
> You programmed the non-EOS entry, but the packet was EOS. MPLS lookup is really a 21-bit lookup: label & EOS-bit.
>
> /neale
>
> From: vpp-dev@lists.fd.io on behalf of Mohsen Meamarian via lists.fd.io
> Date: Wednesday, 23 June 2021 at 09:09
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] MPLS DROP DPO
>
> Hi friends,
>
> I set up an MPLS config between 3 hosts, but VPP on the middle host dropped the MPLS packets. I used a trace to see the drop error and saw MPLS DROP DPO. What could be the reason?
>
> I attached two photos from "sh trace" and "sh mpls fib".
>
> thanks.
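To illustrate Neale's point: an MPLS label-stack entry carries a 20-bit label plus a bottom-of-stack (S / EOS) bit, and the MPLS FIB is keyed on both, so an entry programmed as non-EOS never matches an EOS packet with the same label. A minimal sketch of that 21-bit key, assuming the standard RFC 3032 label-stack-entry layout (helper names are illustrative, not VPP's actual API):

    /* Sketch: derive the 21-bit MPLS FIB key (label + EOS bit) from a
     * label stack entry, per the RFC 3032 layout:
     *   label:20 | TC:3 | S:1 | TTL:8
     * Helper names are illustrative, not VPP's actual API. */
    #include <stdint.h>

    static uint32_t
    mpls_label (uint32_t lse)       /* lse in host byte order */
    {
      return lse >> 12;             /* top 20 bits */
    }

    static uint32_t
    mpls_eos (uint32_t lse)
    {
      return (lse >> 8) & 0x1;      /* S (bottom-of-stack) bit */
    }

    static uint32_t
    mpls_fib_key (uint32_t lse)
    {
      /* label and EOS bit together: 21 significant bits */
      return (mpls_label (lse) << 1) | mpls_eos (lse);
    }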
Re: [vpp-dev] #vpp #vnet os_panic for failed barrier timeout
Given the reported MTBF of 9 months and nearly 2-year-old software, switching to 21.01 [and then to 21.06 when released] seems like the only sensible next step.

From the gdb info provided, it looks like there is one worker thread. Is that correct? If so, the "workers_at_barrier" count seems correct, so why wouldn't the main thread have moved on instead of spinning, waiting for something which already happened?

D.

From: vpp-dev@lists.fd.io On Behalf Of Bly, Mike via lists.fd.io
Sent: Wednesday, June 23, 2021 10:59 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] #vpp #vnet os_panic for failed barrier timeout

We are looking for advice on whether anyone is looking at this os_panic() for a barrier timeout. We see many instances of this type of main-thread backtrace in the forum. For this incident: referencing the sw_interface_dump API, we created a lighter oper-get call to simply fetch link state, instead of all the extensive information the dump command fetches for each interface. When we added our new oper-get function, we overlooked the "is_mp_safe" enablement used for dump and as such did NOT set it for our new oper-get. The end result is a fairly light API that requires barrier support. When this issue occurred, the configuration was using a single separate worker thread, so the API waits for a barrier count of 1. Interestingly, the backtrace analysis shows the count value was met, which implies some deeper issue. Why did a single worker, with at most tens of packets per second of workload at the time, fail to stall at a barrier within the allotted one-second timeout? And, even more fun to answer, why did we even reach the os_panic call, given the backtrace shows the worker was stalled at the barrier? Please refer to the GDB analysis at the bottom of this email.

This code is based on 19.08. We are in the process of upgrading to 21.01, but in reviewing the forum posts, this type of backtrace is seen across many versions. This is an extremely rare event. We had one occurrence in September of last year that we could not reproduce, and we have just had a second occurrence this week. As such, we are not able to reproduce this on demand, let alone in stock VPP code, given this is a new API. While we could simply enable is_mp_safe, as is done for sw_interface_dump, to avoid the issue, we are troubled at not being able to explain why the os_panic occurred in the first place. We are hoping someone might be able to provide guidance here on next steps. What additional details from the core file can we provide?
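For readers following along, the panic comes from the barrier-sync path in src/vlib/threads.c: the main thread raises the barrier flag, then spins until the count of workers at the barrier matches the worker count, and calls os_panic() if that takes longer than about one second. A much-simplified sketch of that loop (field and constant names follow 19.08-era VPP, but this is an illustration, not the actual source):

    /* Simplified sketch of vlib_worker_thread_barrier_sync_int()
     * (src/vlib/threads.c, 19.08-era). Not the actual source. */
    #include <vlib/vlib.h>

    #define BARRIER_SYNC_TIMEOUT 1.0   /* seconds */

    void
    barrier_sync_sketch (vlib_main_t *vm)
    {
      u32 count = vec_len (vlib_mains) - 1;   /* number of workers */
      f64 deadline = vlib_time_now (vm) + BARRIER_SYNC_TIMEOUT;

      *vlib_worker_threads->wait_at_barrier = 1; /* ask workers to stop */

      /* spin until every worker has checked in at the barrier */
      while (*vlib_worker_threads->workers_at_barrier != count)
        {
          if (vlib_time_now (vm) > deadline)
            {
              /* this is the path the backtrace below shows */
              os_panic ();
            }
        }
    }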
Thread 1 backtrace

#0  __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.30-r0/git/sysdeps/unix/sysv/linux/raise.c:50
#1  0x003cb8425548 in __GI_abort () at /usr/src/debug/glibc/2.30-r0/git/stdlib/abort.c:79
#2  0x004075da in os_exit () at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vpp/vnet/main.c:379
#3  0x7ff1f5740794 in unix_signal_handler (signum=, si=, uc=) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/unix/main.c:183
#4  <signal handler called>
#5  __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.30-r0/git/sysdeps/unix/sysv/linux/raise.c:50
#6  0x003cb8425548 in __GI_abort () at /usr/src/debug/glibc/2.30-r0/git/stdlib/abort.c:79
#7  0x00407583 in os_panic () at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vpp/vnet/main.c:355
#8  0x7ff1f5728643 in vlib_worker_thread_barrier_sync_int (vm=0x7ff1f575ba40 , func_name=) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/threads.c:1476
#9  0x7ff1f62c6d56 in vl_msg_api_handler_with_vm_node (am=am@entry=0x7ff1f62d8d40 , the_msg=0x1300ba738, vm=vm@entry=0x7ff1f575ba40 , node=node@entry=0x7ff1b588c000) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibapi/api_shared.c:583
#10 0x7ff1f62b1237 in void_mem_api_handle_msg_i (am=, q=, node=0x7ff1b588c000, vm=0x7ff1f575ba40 ) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/memory_api.c:712
#11 vl_mem_api_handle_msg_main (vm=vm@entry=0x7ff1f575ba40 , node=node@entry=0x7ff1b588c000) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/memory_api.c:722
#12 0x7ff1f62be713 in vl_api_clnt_process (f=, node=, vm=) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/vlib_api.c:326
#13 vl_api_clnt_process (vm=, node=, f=) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/vlib_api.c:252
#14 0x7ff1f56f90b7 in vlib_process_bootstrap (_a=) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/main.c:1468
#15 0x7ff1f561f220 in clib_calljmp () at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vppinfra/longjmp.S:123
#16 0x7ff1b5e39db0 in ?? ()
#17 0x7ff1f56fc669 in vlib_process_startup (f=0x0, p=0x7ff1b588c000, vm=0x7ff1f575ba40 ) at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vppinfra/types.h:133

Thread 3 backtrace

(gdb) thr 3
[Switching to thread 3 (LWP 440)]
#0
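As a starting point on the core-file question above, the counters the main thread spins on in the barrier-sync loop can be read directly from the core; something along these lines, assuming vlib debug symbols are loaded (output omitted here):

    (gdb) # the flags the barrier-sync loop in src/vlib/threads.c reads
    (gdb) print *vlib_worker_threads->wait_at_barrier
    (gdb) print *vlib_worker_threads->workers_at_barrier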
Re: [vpp-dev] VRRP issue when using interface in a table
Hi Mechthild,

You'll need to include: https://gerrit.fd.io/r/c/vpp/+/32298

/neale

From: vpp-dev@lists.fd.io on behalf of Mechthild Buescher via lists.fd.io
Date: Thursday, 24 June 2021 at 10:49
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VRRP issue when using interface in a table
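For anyone applying Neale's suggestion to a local build, a Gerrit change like 32298 can be pulled in with a cherry-pick along these lines (the patch-set number at the end of the ref, /1 below, is a placeholder; check the change page for the latest):

    # Fetch and cherry-pick https://gerrit.fd.io/r/c/vpp/+/32298 onto a
    # local vpp tree. Gerrit refs are refs/changes/<NN>/<change>/<patchset>,
    # where <NN> is the last two digits of the change number; the
    # patch-set number (1 below) is a placeholder.
    git fetch https://gerrit.fd.io/r/vpp refs/changes/98/32298/1
    git cherry-pick FETCH_HEAD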
[vpp-dev] VRRP issue when using interface in a table
Hi all,

we are using VPP on two nodes where we would like to run VRRP. This works fine if the VRRP VR interface is in FIB 0, but if we put the interface into FIB table 1 instead, VRRP no longer works correctly. Can you please help?

Our setup:
* 2 nodes, with VPP on each node and one DPDK interface connected to each VPP (we reduced the config to isolate the issue)
* a switch between the nodes which just forwards the traffic, so that it's like a peer-to-peer connection

The VPP version is (both nodes):

vpp# show version
vpp v21.01.0-6~gf70123b2c built by suse on SUSE at 2021-05-06T12:18:31
vpp# show version verbose
Version: v21.01.0-6~gf70123b2c
Compiled by: suse
Compile host: SUSE
Compile date: 2021-05-06T12:18:31
Compile location: /root/vpp-sp/vpp
Compiler: GCC 7.5.0
Current PID: 6677

The VPP config uses the DPDK interface (both nodes):

vpp# show hardware-interfaces
Name     Idx   Link   Hardware
Ext-0    1     up     Ext-0
  Link speed: 10 Gbps
  Ethernet address e4:43:4b:ed:59:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd maybe-multiseg tx-offload intel-phdr-cksum rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 192), desc 1024 (min 64 max 4096 align 32)
    tx: queues 3 (max 192), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1572 subsystem 1028:1f9c address :17:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail: vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame scatter keep-crc rss-hash
    rx offload active: ipv4-cksum jumbo-frame scatter
    tx offload avail: vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs mbuf-fast-free
    tx offload active: udp-cksum tcp-cksum multi-segs
    rss avail: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active: none
    tx burst mode: Scalar
    rx burst mode: Vector AVX2 Scattered

The VRRP configs are (MASTER):

set interface state Ext-0 up
set interface ip address Ext-0 192.168.61.52/25
vrrp vr add Ext-0 vr_id 61 priority 200 no_preempt accept_mode 192.168.61.50

and on the system under test (SUT):

ip table add 1
set interface ip table Ext-0 1
set interface state Ext-0 up
set interface ip address Ext-0 192.168.61.51/25
vrrp vr add Ext-0 vr_id 61 priority 100 no_preempt accept_mode 192.168.61.50

On the MASTER, we started VRRP with:

vrrp proto start Ext-0 vr_id 61

so that it has:

vpp# show vrrp vr
[0] sw_if_index 1 VR ID 61 IPv4
   state Master flags: preempt no accept yes unicast no
   priority: configured 200 adjusted 200
   timers: adv interval 100 master adv 100 skew 21 master down 321
   virtual MAC 00:00:5e:00:01:3d
   addresses 192.168.61.50
   peer addresses
   tracked interfaces

On the SUT, we did not yet start VRRP, so we see:

vpp# show vrrp vr
[0] sw_if_index 1 VR ID 61 IPv4
   state Initialize flags: preempt no accept yes unicast no
   priority: configured 100 adjusted 100
   timers: adv interval 100 master adv 0 skew 0 master down 0
   virtual MAC 00:00:5e:00:01:3d
   addresses 192.168.61.50
   peer addresses
   tracked interfaces

Here I see already that something is going wrong, as the VRRP packets are not reaching vrrp4-input:

vpp# show errors
  Count    Node         Reason                     Severity
      5    dpdk-input   no error                   error
    138    ip4-local    ip4 source lookup miss     error

(If we configure the SUT similarly to the MASTER, i.e. the interface in FIB 0, I can see vrrp4-input at this point.)

The trace of dpdk-input gives:

Packet 1

00:00:57:644818: dpdk-input
  Ext-0 rx queue 0
  buffer 0x9b7ec: current data 0, length 60, buffer-pool 0, ref-count 1, totlen-nifb 0, trace handle 0x100
                  ext-hdr-valid l4-cksum-computed l4-cksum-correct
  PKT MBUF: port 0, nb_segs 1, pkt_len 60
    buf_len 2176, data_len 60, ol_flags 0x180, data_off 128, phys_addr 0x26dfb80
    packet_type 0x691 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    rss 0x0 fdir.hi 0x0 fdir.lo 0x0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
    Packet Types
      RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
      RTE_PTYPE_L3_IPV4_EXT_UNKNOWN (0x0090) IPv4 packet with or without extension headers
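For reference, a trace like the one above can be captured on the SUT with the standard trace CLI while the MASTER is sending advertisements, e.g.:

    vpp# clear trace
    vpp# trace add dpdk-input 10
    ... wait for one or two VRRP advertisement intervals ...
    vpp# show trace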