Re: [vpp-dev] VPP / tcp_echo performance
Hi Florin,

From my logs it seems that TSO is not on even when using the native driver; logs attached below. I'm going to do a deeper dive into the various networking layers involved in this setup and will post any interesting findings back on this thread. Thank you for all the help so far!

Regards,
Dom

vpp# sh int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
vpp# create int virtio :00:03.0
vpp# sh int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
virtio-0/0/3/0                    1     down         9000/0/0/0     rx packets        999
                                                                    rx bytes        60252
                                                                    drops             999
                                                                    ip4                 6
vpp# set interface ip address virtio-0/0/3/0 10.0.0.152/24
vpp# set interface state virtio-0/0/3/0 up
vpp# session enable
vpp# sh int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
virtio-0/0/3/0                    1      up          9000/0/0/0     rx packets       1017
                                                                    rx bytes        61332
                                                                    drops            1017
                                                                    ip4                 6
vpp# sh sess verbose 2
Thread 0: no sessions
Thread 1: no sessions
Thread 2: no sessions
Thread 3: no sessions
vpp# sh sess verbose 2
Thread 0: no sessions
[1:0][T] 10.0.0.152:27761->10.0.0.156:5201      ESTABLISHED
 index: 0 cfg: flags: timers:
 snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
 snd_wnd 29056 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
 flight size 0 out space 4473 rcv_wnd_av 7999488 tsval_recent 244488841
 tsecr 434592381 tsecr_last_ack 434592381 tsval_recent_age 2995 snd_mss 1448
 rto 259 rto_boff 0 srtt 67 us 3.127 rttvar 48 rtt_ts 0. rtt_seq 124
 next_node 0 opaque 0x0
 cong: none algo cubic cwnd 4473 ssthresh 2147483647 bytes_acked 0
 cc space 4473 prev_cwnd 0 prev_ssthresh 0 snd_cong 3516215180 dupack 0
 limited_tx 3516215180
 rxt_bytes 0 rxt_delivered 0 rxt_head 3516215180 rxt_ts 434595747
 prr_start 3516215180 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
 last_delivered 0 high_sacked 3516215180 is_reneging 0
 cur_rxt_hole 4294967295 high_rxt 3516215180 rescue_rxt 3516215180
 stats: in segs 6 dsegs 4 bytes 4 dupacks 0
 out segs 7 dsegs 2 bytes 123 dupacks 0
 fr 0 tr 0 rxt segs 0 bytes 0 duration 3.381
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 1430540 bucket 0 t/p 1.431 last_update 3.365 s idle 156
 Rx fifo: cursize 0 nitems 799 has_event 0
          head 4 tail 4 segment manager 2
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 Tx fifo: cursize 0 nitems 799 has_event 0
          head 123 tail 123 segment manager 2
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: ready opaque: 0x0 flags:
[1:1][T] 10.0.0.152:31516->10.0.0.156:5201      ESTABLISHED
 index: 1 cfg: flags: PSH pending timers: RETRANSMIT
 snd_una 455633510 snd_nxt 456069358 snd_una_max 456069358 rcv_nxt 1 rcv_las 1
 snd_wnd 1575424 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 1 snd_wl2 455633510
 flight size 435848 out space 486 rcv_wnd_av 7999488 tsval_recent 244492205
 tsecr 434595744 tsecr_last_ack 434595744 tsval_recent_age 4294966926 snd_mss 1448
 rto 200 rto_boff 0 srtt 1 us 2.612 rttvar 1 rtt_ts 95.8681 rtt_seq 455697222
 next_node 0 opaque 0x0
 cong: none algo cubic cwnd 436334 ssthresh 333526 bytes_acked 2896
 cc space 486 prev_cwnd 476467 prev_ssthresh 341803 snd_cong 419169974 dupack 0
 limited_tx 2714252909
 rxt_bytes 0 rxt_delivered 0 rxt_head 2714252909 rxt_ts 434595747
 prr_start 418787702 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
 last_delivered 0 high_sacked 419169974 is_reneging 0
 cur_rxt_hole 4294967295 high_rxt 418797838 rescue_rxt 418787701
 stats: in segs 144873 dsegs 0 bytes 0 dupacks 2553
 out segs 315278 dsegs 315277 bytes 456519685 dupacks 0
 fr 12 tr 0 rxt segs 311 bytes 450328 duration 3.371
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 436334000 bucket 282 t/p 436.334 last_update 264 us idle 100
 Rx fifo: cursize 0 nitems 799 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 799 nitems 799 has_event 1
          head 7633509 tail 7633508 segment manager 2
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: ready opaque: 0x0 flags:
Thread 1: active sessions 2
Thread 2: no sessions
Thread 3: no sessions
vpp# sh hardware-interfaces
              Name                Idx   Link  Hardware
local0                             0    down  local0
  Link speed: unknown
  local
virtio-0/0/3/0                     1     up   virtio-0/0/3/0
  Link speed: unknown
  Ethernet
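A quick way to sanity-check the numbers in the dump above: assuming the pacer `rate` field is reported in bytes per second (an assumption about VPP's internals, not confirmed in this thread), the second session's pacer rate of 436334000 corresponds to roughly 3.5 Gbit/s, which lines up with the throughput being discussed:

```python
# Convert the pacer rate from session [1:1] above to Gbit/s.
# Assumption: "pacer: rate 436334000" is in bytes per second.
pacer_rate_bytes_per_s = 436_334_000

gbit_per_s = pacer_rate_bytes_per_s * 8 / 1e9  # bytes/s -> decimal Gbit/s
print(f"{gbit_per_s:.2f} Gbit/s")  # -> 3.49 Gbit/s
```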
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom,

From the logs it looks like TSO is not on. I wonder if the vhost nic actually honors the "tso on" flag. Have you also tried with the native vhost driver, instead of the dpdk one? I've never tried it with tcp, so I don't know if it properly advertises the fact that it supports TSO.

Below you can see how it looks on my side, between two Broadwell boxes with XL710s. The tcp connection's TSO flag needs to be on, otherwise tcp will do the segmentation by itself.

Regards,
Florin

$ ~/vpp/vcl_iperf_client 6.0.1.2 -t 10
[snip]
[ ID] Interval           Transfer     Bandwidth       Retr
[ 33]   0.00-10.00  sec  42.2 GBytes  36.2 Gbits/sec    0    sender
[ 33]   0.00-10.00  sec  42.2 GBytes  36.2 Gbits/sec         receiver

vpp# show session verbose 2
[snip]
[1:1][T] 6.0.1.1:27240->6.0.1.2:5201      ESTABLISHED
 index: 1 cfg: TSO flags: PSH pending timers: RETRANSMIT
 snd_una 2731494347 snd_nxt 2731992143 snd_una_max 2731992143 rcv_nxt 1 rcv_las 1
 snd_wnd 1999872 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 2731494347
 flight size 497796 out space 716 rcv_wnd_av 3999744 tsval_recent 1787061797
 tsecr 3347210414 tsecr_last_ack 3347210414 tsval_recent_age 4294966829 snd_mss 1448
 rto 200 rto_boff 0 srtt 1 us .101 rttvar 1 rtt_ts 8.6696 rtt_seq 2731733367
 next_node 0 opaque 0x0
 cong: none algo cubic cwnd 498512 ssthresh 407288 bytes_acked 17376
 cc space 716 prev_cwnd 581841 prev_ssthresh 403737 snd_cong 2702482407 dupack 0
 limited_tx 1608697445
 rxt_bytes 0 rxt_delivered 0 rxt_head 13367060 rxt_ts 3347210414
 prr_start 2701996195 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
 last_delivered 0 high_sacked 2702540327 is_reneging 0
 cur_rxt_hole 4294967295 high_rxt 2702048323 rescue_rxt 2701996194
 stats: in segs 293052 dsegs 0 bytes 0 dupacks 5568
 out segs 381811 dsegs 381810 bytes 15628627726 dupacks 0
 fr 229 tr 0 rxt segs 8207 bytes 11733696 duration 3.468
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 4941713080 bucket 2328382 t/p 4941.713 last_update 0 us idle 100
 Rx fifo: cursize 0 nitems 399 has_event 0
          head 0 tail 0 segment manager 1
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 199 nitems 199 has_event 1
          head 396234 tail 396233 segment manager 1
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: ready opaque: 0x0 flags:

vpp# sh run
[snip]
Thread 1 vpp_wk_0 (lcore 24)
Time 774.3, 10 sec internal node vector rate 0.00
  vector rates in 2.5159e3, out 1.4186e3, drop 1.2915e-3, punt 0.e0
             Name                 State         Calls    Vectors  Suspends   Clocks  Vectors/Call
FortyGigabitEthernet84/0/0-out    active       977678    1099456         0   2.47e2          1.12
FortyGigabitEthernet84/0/0-tx     active       977678    1098446         0   2.17e3          1.12
ethernet-input                    active       442524     848618         0   2.69e2          1.92
ip4-input-no-checksum             active       442523     848617         0   2.86e2          1.92
ip4-local                         active       442523     848617         0   3.24e2          1.92
ip4-lookup                        active      1291425    1948073         0   2.09e2          1.51
ip4-rewrite                       active       977678    1099456         0   2.23e2          1.12
session-queue                     polling  7614793106    1099452         0   7.45e5          0.00
tcp4-established                  active       442520     848614         0   1.26e3          1.92
tcp4-input                        active       442523     848617         0   3.04e2          1.92
tcp4-output                       active       977678    1099456         0   3.77e2          1.12
tcp4-rcv-process                  active            1          1         0   5.82e3          1.00
tcp4-syn-sent                     active            2          2         0   6.84e4          1.00

> On Dec 13, 2019, at 12:58 PM, dch...@akouto.com wrote:
>
> Hi,
> I rebuilt VPP on master and updated startup.conf to enable tso as follows:
> dpdk {
>   dev :00:03.0 {
>     num-rx-desc 2048
>     num-tx-desc 2048
>     tso on
>   }
>   uio-driver vfio-pci
>   enable-tcp-udp-checksum
> }
>
> I'm not sure whether it is working or not, there is nothing in show session
> verbose 2 to indicate whether it is on or off (output at the end of this
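As a side note on reading the iperf output above: iperf reports transfer in binary GBytes (GiB) but bandwidth in decimal Gbits/sec, so the two figures are consistent with each other. A quick check using the numbers from the run above:

```python
# Check that iperf's reported transfer and bandwidth agree:
# transfer is in GiB (2**30 bytes), bandwidth in decimal Gbit/s.
transfer_gib = 42.2      # "42.2 GBytes" from the iperf output above
duration_s = 10.0        # "-t 10"

bandwidth_gbit = transfer_gib * 2**30 * 8 / duration_s / 1e9
print(f"{bandwidth_gbit:.1f} Gbit/s")  # -> 36.2 Gbit/s, matching the report
```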
Re: [vpp-dev] VPP / tcp_echo performance
Hi,

I rebuilt VPP on master and updated startup.conf to enable tso as follows:

dpdk {
  dev :00:03.0 {
    num-rx-desc 2048
    num-tx-desc 2048
    tso on
  }
  uio-driver vfio-pci
  enable-tcp-udp-checksum
}

I'm not sure whether it is working or not; there is nothing in show session verbose 2 to indicate whether it is on or off (output at the end of this update). Unfortunately there was no improvement from a performance perspective. Then I figured I would try using a tap interface on the VPP side so I could run iperf3 "natively" on the VPP client side as well, but got the same result again. I find this so perplexing, two test runs back to back with reboots in between to rule out any configuration issues:

*Test 1 using native linux networking on both sides:*
[iperf3 client --> linux networking eth0] --> [Openstack/Linuxbridge] --> [linux networking eth0 --> iperf3 server]
Result: 10+ Gbps

*Reboot both instances and assign the NIC on the client side to VPP:*
vpp# set int l2 bridge GigabitEthernet0/3/0 1
vpp# set int state GigabitEthernet0/3/0 up
vpp# create tap tap0
vpp# set int l2 bridge tap0 1
vpp# set int state tap0 up
[root]# ip addr add 10.0.0.152/24 dev tap0
[iperf3 client --> tap0 --> VPP GigabitEthernet0/3/0] --> [Openstack/Linuxbridge] --> [linux networking eth0 --> iperf3 server]
Result: 1 Gbps

I had started to suspect the host OS or OpenStack Neutron, linuxbridge etc., but based on this it just *has* to be something in the guest running VPP. Any and all ideas or suggestions are welcome!

Regards,
Dom

Note: this output is from a run using iperf3+VCL with the TSO settings in startup.conf, not the tap interface test described above:

vpp# set interface ip address GigabitEthernet0/3/0 10.0.0.152/24
vpp# set interface state GigabitEthernet0/3/0 up
vpp# session enable
vpp# sh session verbose 2
Thread 0: no sessions
[1:0][T] 10.0.0.152:6445->10.0.0.156:5201      ESTABLISHED
 index: 0 cfg: flags: timers:
 snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
 snd_wnd 29056 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
 flight size 0 out space 4473 rcv_wnd_av 7999488 tsval_recent 3428491
 tsecr 193532193 tsecr_last_ack 193532193 tsval_recent_age 13996 snd_mss 1448
 rto 259 rto_boff 0 srtt 67 us 3.891 rttvar 48 rtt_ts 0. rtt_seq 124
 next_node 0 opaque 0x0
 cong: none algo cubic cwnd 4473 ssthresh 2147483647 bytes_acked 0
 cc space 4473 prev_cwnd 0 prev_ssthresh 0 snd_cong 1281277517 dupack 0
 limited_tx 1281277517
 rxt_bytes 0 rxt_delivered 0 rxt_head 1281277517 rxt_ts 193546719
 prr_start 1281277517 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
 last_delivered 0 high_sacked 1281277517 is_reneging 0
 cur_rxt_hole 4294967295 high_rxt 1281277517 rescue_rxt 1281277517
 stats: in segs 6 dsegs 4 bytes 4 dupacks 0
 out segs 7 dsegs 2 bytes 123 dupacks 0
 fr 0 tr 0 rxt segs 0 bytes 0 duration 14.539
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 1149550 bucket 0 t/p 1.149 last_update 14.526 s idle 194
 Rx fifo: cursize 0 nitems 799 has_event 0
          head 4 tail 4 segment manager 2
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 Tx fifo: cursize 0 nitems 799 has_event 0
          head 123 tail 123 segment manager 2
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: ready opaque: 0x0 flags:
[1:1][T] 10.0.0.152:10408->10.0.0.156:5201      ESTABLISHED
 index: 1 cfg: flags: timers: RETRANSMIT
 snd_una 2195902174 snd_nxt 2196262726 snd_una_max 2196262726 rcv_nxt 1 rcv_las 1
 snd_wnd 1574016 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 1 snd_wl2 2195902174
 flight size 360552 out space 832 rcv_wnd_av 7999488 tsval_recent 3443014
 tsecr 193546715 tsecr_last_ack 193546715 tsval_recent_age 4294966768 snd_mss 1448
 rto 200 rto_boff 0 srtt 1 us 2.606 rttvar 1 rtt_ts 45.0534 rtt_seq 2195903622
 next_node 0 opaque 0x0
 cong: none algo cubic cwnd 361384 ssthresh 329528 bytes_acked 2896
 cc space 832 prev_cwnd 470755 prev_ssthresh 340435 snd_cong 2188350854 dupack 0
 limited_tx 2709798285
 rxt_bytes 0 rxt_delivered 0 rxt_head 2143051622 rxt_ts 193546719
 prr_start 2187975822 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
 last_delivered 0 high_sacked 2188350854 is_reneging 0
 cur_rxt_hole 4294967295 high_rxt 2187977270 rescue_rxt 2187975821
 stats: in segs 720132 dsegs 0 bytes 0 dupacks 127869
 out segs 1549120 dsegs 1549119 bytes 2243122901 dupacks 0
 fr 43 tr 0 rxt segs 32362 bytes 46860176 duration 14.529
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 361384000 bucket 1996 t/p 361.384 last_update 619 us idle 100
 Rx fifo: cursize 0 nitems 799 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 799 nitems 799 has_event 1
          head 3902173 tail 3902172 segment manager 2
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts
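One detail worth pulling out of the second session's stats above: a noticeable fraction of the transmitted bytes were retransmissions, which (together with the very high dupack count) may point at loss on the path between the VMs rather than a pure sender-side ceiling. Roughly:

```python
# Retransmission ratio from the "stats:" line of session [1:1] above.
total_bytes = 2_243_122_901  # "out ... bytes 2243122901"
rxt_bytes = 46_860_176       # "rxt segs 32362 bytes 46860176"

rxt_pct = rxt_bytes / total_bytes * 100
print(f"{rxt_pct:.1f}% of transmitted bytes were retransmits")  # -> 2.1%
```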
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom,

> On Dec 12, 2019, at 12:29 PM, dch...@akouto.com wrote:
>
> Hi Florin,
>
> The saga continues, a little progress and more questions. In order to reduce
> the variables, I am now only using VPP on one of the VMs: iperf3 server is
> running on a VM with native Linux networking, and iperf3+VCL client running
> on the second VM.

FC: Okay!

> I've pasted the output from a few commands during this test run below and
> have a few questions if you don't mind.
> The "show errors" command indicates "Tx packet drops (dpdk tx failure)". I
> have done quite a bit of searching, found other mentions of this in other
> threads but no tips as to where to look or hints on how it was / can be
> solved. Any thoughts?

FC: The number of drops is not that large, so we can ignore it for now.

> I'm not really sure how to interpret the results of "show run" but nothing
> jumps out at me, do you see anything useful in there?

FC: Nothing apart from the fact that one of vpp's workers is moderately loaded (you're still running 3 workers).

> Some of the startup.conf options were not working for me, so I switched to
> building from source (I chose to use tag v20.01-rc0 for some stability).
> Still no luck with some of the options:
> When I try to use tcp { tso } I get this: 0: tcp_config_fn: unknown input `tso'

FC: You need to get "closer" to master HEAD. That tag was laid when 19.08 was released, but tso support was merged afterwards. Typically our CI infra is good enough to keep things running, so you might want to try latest master.

> When I try to use num-mbufs in the dpdk section, I get 0: dpdk_config:
> unknown input `num-mbufs 65535'

FC: This was deprecated at one point. The new stanza is "buffers { buffers-per-numa }".

> Do you know if these options are supported?
> I can't figure out a way to increase mbufs since the above option does not
> work, and when I try to use socket-mem (which according to the documentation
> is needed if there is a need for a larger number of mbufs) I get this:
> dpdk_config:1408: socket-mem argument is deprecated

FC: Yes, this was also deprecated.

> To answer some of your questions from your previous reply:
> I have indeed been using taskset and watching CPU load with top to make sure
> things are going where I expect them to go
> I am not trying to use jumbo buffers, increasing "default data-size" was just
> an attempt to see if there would be a difference
> Thanks for the cubic congestion algo suggestion, made the change but no
> improvement

FC: Understood! I guess that means we should try tso. I just tested it and it seems the dpdk stanza needs an extra "dpdk { enable-tcp-udp-checksum }" apart from "dpdk { dev { tso on } }". Let me know if you hit any other issues with it. You'll know that it's running if you do "show session verbose 2" and you see "TSO" in the cfg flags, instead of "TSO off".

Regards,
Florin

> Thank you for all the help, it is very much appreciated.
>
> Regards,
> Dom
>
> vpp# sh int
>               Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
> GigabitEthernet0/3/0              1      up          9000/0/0/0     rx packets       1642537
>                                                                     rx bytes       108676814
>                                                                     tx packets       5216493
>                                                                     tx bytes      7793319472
>                                                                     drops                392
>                                                                     ip4              1642178
>                                                                     tx-error             475
> local0                            0     down          0/0/0/0       drops                  1
>
> vpp# sh err
>    Count                 Node                       Reason
>          1             ip4-glean                ARP requests sent
>          7             dpdk-input               no error
>    5216424          session-queue               Packets transmitted
>          1         tcp4-rcv-process             Pure ACKs received
>          2          tcp4-syn-sent               SYN-ACKs received
>          7         tcp4-established             Packets pushed into rx fifo
>    1619850         tcp4-established             Pure ACKs received
>      22219         tcp4-established             Duplicate ACK
>          1         tcp4-established             Resets received
>         62         tcp4-established             Connection closed
>          1         tcp4-established             FINs received
>         62           tcp4-output                Resets sent
>          2            arp-reply                 ARP replies sent
>         33            ip4-input
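Putting Florin's corrections together, a startup.conf sketch with the renamed options might look like the following. This is an untested consolidation of the advice in this thread, not a verified configuration; exact option spellings should be checked against the startup.conf docs for the build in use:

```
tcp {
  tso                        ## requires a build close to master HEAD
}
dpdk {
  enable-tcp-udp-checksum    ## needed in addition to per-device tso
  dev :00:03.0 {
    tso on                   ## per-nic TSO, if the nic supports it
  }
}
buffers {
  buffers-per-numa 16384     ## replaces the deprecated dpdk num-mbufs
}
```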
Re: [vpp-dev] VPP / tcp_echo performance
Hi Florin,

The saga continues, a little progress and more questions. In order to reduce the variables, I am now only using VPP on one of the VMs: iperf3 server is running on a VM with native Linux networking, and the iperf3+VCL client is running on the second VM. I've pasted the output from a few commands during this test run below and have a few questions if you don't mind.

* The "show errors" command indicates "*Tx packet drops (dpdk tx failure)*". I have done quite a bit of searching, found other mentions of this in other threads but no tips as to where to look or hints on how it was / can be solved. Any thoughts?
* I'm not really sure how to interpret the results of "show run" but nothing jumps out at me, do you see anything useful in there?
* Some of the startup.conf options were not working for me, so I switched to building from source (I chose to use tag v20.01-rc0 for some stability). Still no luck with some of the options:
  * When I try to use tcp { tso } I get this: *0: tcp_config_fn: unknown input `tso'*
  * When I try to use num-mbufs in the dpdk section, I get *0: dpdk_config: unknown input `num-mbufs 65535'*

Do you know if these options are supported? I can't figure out a way to increase mbufs since the above option does not work, and when I try to use socket-mem (which according to the documentation is needed if there is a need for a larger number of mbufs) I get this: *dpdk_config:1408: socket-mem argument is deprecated*

To answer some of your questions from your previous reply:
* I have indeed been using taskset and watching CPU load with top to make sure things are going where I expect them to go
* I am not trying to use jumbo buffers, increasing "default data-size" was just an attempt to see if there would be a difference
* Thanks for the cubic congestion algo suggestion, made the change but no improvement

Thank you for all the help, it is very much appreciated.

Regards,
Dom

*vpp# sh int*
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet0/3/0              1      up          9000/0/0/0     rx packets       1642537
                                                                    rx bytes       108676814
                                                                    tx packets       5216493
                                                                    tx bytes      7793319472
                                                                    drops                392
                                                                    ip4              1642178
                                                                    tx-error             475
local0                            0     down          0/0/0/0       drops                  1

*vpp# sh err*
   Count                 Node                       Reason
         1             ip4-glean                ARP requests sent
         7             dpdk-input               no error
   5216424          session-queue               Packets transmitted
         1         tcp4-rcv-process             Pure ACKs received
         2          tcp4-syn-sent               SYN-ACKs received
         7         tcp4-established             Packets pushed into rx fifo
   1619850         tcp4-established             Pure ACKs received
     22219         tcp4-established             Duplicate ACK
         1         tcp4-established             Resets received
        62         tcp4-established             Connection closed
         1         tcp4-established             FINs received
        62           tcp4-output                Resets sent
         2            arp-reply                 ARP replies sent
        33            ip4-input                 unknown ip protocol
         1            ip4-input                 Multicast RPF check failed
         1            ip4-glean                 ARP requests sent
       351            llc-input                 unknown llc ssap/dsap
       475    GigabitEthernet0/3/0-tx           Tx packet drops (dpdk tx failure)

*vpp# sh run*
Thread 0 vpp_main (lcore 7)
Time 94.7, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.e0, out 3.1669e-2, drop 1.0556e-2, punt 0.e0
             Name                    State        Calls   Vectors  Suspends   Clocks  Vectors/Call
GigabitEthernet0/3/0-output         active            3         3         0   3.29e4          1.00
GigabitEthernet0/3/0-tx             active            3         3         0   3.73e4          1.00
acl-plugin-fa-cleaner-process   event wait            0         0         1   2.78e4          0.00
admin-up-down-process           event wait            0         0         1   2.24e3          0.00
api-rx-from-ring                  any wait            0         0        24   1.01e6          0.00
avf-process                     event wait            0         0         1   2.15e4          0.00
bfd-process                     event wait            0         0         1   1.49e4          0.00
bond-process                    event wait            0         0         1   1.43e4          0.00
dhcp-client-process               any wait            0         0
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom,

Great to see progress! More inline.

> On Dec 6, 2019, at 10:21 AM, dch...@akouto.com wrote:
>
> Hi Florin,
>
> Some progress, at least with the built-in echo app, thank you for all the
> suggestions so far! By adjusting the fifo-size and testing in half-duplex I
> was able to get close to 5 Gbps between the two openstack instances using the
> built-in test echo app:
>
> vpp# test echo clients gbytes 1 no-return fifo-size 100 uri tcp://10.0.0.156/

FC: The cli for the echo apps is a bit confusing. Whatever you pass above is left shifted by 10 (multiplied by 1024), so that's why I suggested to use 4096 (~4MB). You can also use larger values, but above you are asking for ~1GB :-)

> 1 three-way handshakes in .26 seconds 3.86/s
> Test started at 745.163085
> Test finished at 746.937343
> 1073741824 bytes (1024 mbytes, 1 gbytes) in 1.77 seconds
> 605177784.33 bytes/second half-duplex
> 4.8414 gbit/second half-duplex
>
> I need to get closer to 10 Gbps but at least there is good proof that the
> issue is related to configuration / tuning. So, I switched back to iperf
> testing with VCL, and I'm back to 600 Mbps, even though I can confirm that
> the fifo sizes match what is configured in vcl.conf (note that in this test
> run I changed that to 8 Mb each for rx and tx from the previous 16, but
> results are the same when I use 16 Mb). I'm obviously missing something in
> the configuration but I can't imagine what that might be. Below is my exact
> startup.conf, vcl.conf and output from show session from this iperf run to
> give the full picture, hopefully something jumps out as missing in my
> configuration. Thank you for your patience and support with this, much
> appreciated!

FC: Not entirely sure what the issue is, but some things can be improved. More below.

> [root@vpp-test-1 centos]# cat vcl.conf
> vcl {
>   rx-fifo-size 800
>   tx-fifo-size 800
>   app-scope-local
>   app-scope-global
>   api-socket-name /tmp/vpp-api.sock
> }

FC: This looks okay.

> [root@vpp-test-1 centos]# cat /etc/vpp/startup.conf
> unix {
>   nodaemon
>   log /var/log/vpp/vpp.log
>   full-coredump
>   cli-listen /run/vpp/cli.sock
>   gid vpp
>   interactive
> }
> dpdk {
>   dev :00:03.0 {
>     num-rx-desc 65535
>     num-tx-desc 65535

FC: Not sure about this. I don't have any experience with vhost interfaces, but for XL710s I typically use 256 descriptors. It might be too low if you start noticing lots of rx/tx drops with "show int".

>   }
> }
> session { evt_qs_memfd_seg }
> socksvr { socket-name /tmp/vpp-api.sock }
> api-trace {
>   on
> }
> api-segment {
>   gid vpp
> }
> cpu {
>   main-core 7
>   corelist-workers 4-6
>   workers 3

FC: For starters, could you try this out with only 1 worker, since you're testing with 1 connection? Also, did you try pinning iperf with taskset to a worker on the same numa as your vpp workers, in case you have multiple numas? Check your cpu to numa distribution with lscpu. You may want to pin iperf even if you have only one numa, just to be sure it won't be scheduled by mistake on the cores vpp is using.

> }
> buffers {
>   ## Increase number of buffers allocated, needed only in scenarios with
>   ## large number of interfaces and worker threads. Value is per numa node.
>   ## Default is 16384 (8192 if running unprivileged)
>   buffers-per-numa 128000

FC: For simple testing I only use 16k, but this value actually depends on the number of rx/tx descriptors you have configured.

>   ## Size of buffer data area
>   ## Default is 2048
>   default data-size 8192

FC: Are you trying to use jumbo buffers? You need to add to the tcp stanza, i.e., tcp { mtu }. But for starters don't modify the buffer size, just to get an idea of where performance is without it. Afterwards, as Jerome suggested, you may want to try tso by enabling it for tcp, i.e., tcp { tso } in startup.conf, and enabling tso for the nic by adding "tso on" to the nic's dpdk stanza (if the nic actually supports it). You don't need to change the buffer size for that.

> }
>
> vpp# sh session verbose 2
> Thread 0: no sessions
> [1:0][T] 10.0.0.152:41737->10.0.0.156:5201      ESTABLISHED
>  index: 0 flags: timers:
>  snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
>  snd_wnd 7999488 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
>  flight size 0 out space 4413 rcv_wnd_av 7999488 tsval_recent 12893009
>  tsecr 10757431 tsecr_last_ack 10757431 tsval_recent_age 1995 snd_mss 1428
>  rto 200 rto_boff 0 srtt 3 us 3.887 rttvar 2 rtt_ts 0. rtt_seq 124
>  cong: none algo newreno cwnd 4413 ssthresh 4194304 bytes_acked 0
>  cc space 4413 prev_cwnd 0 prev_ssthresh 0 rtx_bytes 0
>  snd_congestion 1736877166 dupack 0 limited_transmit 1736877166
>  sboard: sacked_bytes 0 last_sacked_bytes 0 lost_bytes 0
>  last_bytes_delivered 0 high_sacked
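Florin's note that the echo apps' fifo-size argument is left-shifted by 10 is easy to verify numerically; this is why fifo-size 4096 yields ~4MB fifos:

```python
# The echo apps multiply the fifo-size argument by 1024 (left shift by 10).
def echo_fifo_bytes(arg: int) -> int:
    return arg << 10

print(echo_fifo_bytes(4096))  # -> 4194304 bytes, i.e. 4 MiB
```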
Re: [vpp-dev] VPP / tcp_echo performance
Hi Florin,

Some progress, at least with the built-in echo app, thank you for all the suggestions so far! By adjusting the fifo-size and testing in half-duplex I was able to get close to 5 Gbps between the two openstack instances using the built-in test echo app:

vpp# test echo clients gbytes 1 no-return fifo-size 100 uri tcp://10.0.0.156/
1 three-way handshakes in .26 seconds 3.86/s
Test started at 745.163085
Test finished at 746.937343
1073741824 bytes (1024 mbytes, 1 gbytes) in 1.77 seconds
605177784.33 bytes/second half-duplex
4.8414 gbit/second half-duplex

I need to get closer to 10 Gbps but at least there is good proof that the issue is related to configuration / tuning. So, I switched back to iperf testing with VCL, and I'm back to 600 Mbps, even though I can confirm that the fifo sizes match what is configured in vcl.conf (note that in this test run I changed that to 8 Mb each for rx and tx from the previous 16, but results are the same when I use 16 Mb). I'm obviously missing something in the configuration but I can't imagine what that might be. Below is my exact startup.conf, vcl.conf and output from show session from this iperf run to give the full picture; hopefully something jumps out as missing in my configuration. Thank you for your patience and support with this, much appreciated!

*[root@vpp-test-1 centos]# cat vcl.conf*
vcl {
  rx-fifo-size 800
  tx-fifo-size 800
  app-scope-local
  app-scope-global
  api-socket-name /tmp/vpp-api.sock
}

*[root@vpp-test-1 centos]# cat /etc/vpp/startup.conf*
unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid vpp
  interactive
}
dpdk {
  dev :00:03.0 {
    num-rx-desc 65535
    num-tx-desc 65535
  }
}
session { evt_qs_memfd_seg }
socksvr { socket-name /tmp/vpp-api.sock }
api-trace {
  on
}
api-segment {
  gid vpp
}
cpu {
  main-core 7
  corelist-workers 4-6
  workers 3
}
buffers {
  ## Increase number of buffers allocated, needed only in scenarios with
  ## large number of interfaces and worker threads. Value is per numa node.
  ## Default is 16384 (8192 if running unprivileged)
  buffers-per-numa 128000
  ## Size of buffer data area
  ## Default is 2048
  default data-size 8192
}

*vpp# sh session verbose 2*
Thread 0: no sessions
[1:0][T] 10.0.0.152:41737->10.0.0.156:5201      ESTABLISHED
 index: 0 flags: timers:
 snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
 snd_wnd 7999488 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
 flight size 0 out space 4413 rcv_wnd_av 7999488 tsval_recent 12893009
 tsecr 10757431 tsecr_last_ack 10757431 tsval_recent_age 1995 snd_mss 1428
 rto 200 rto_boff 0 srtt 3 us 3.887 rttvar 2 rtt_ts 0. rtt_seq 124
 cong: none algo newreno cwnd 4413 ssthresh 4194304 bytes_acked 0
 cc space 4413 prev_cwnd 0 prev_ssthresh 0 rtx_bytes 0
 snd_congestion 1736877166 dupack 0 limited_transmit 1736877166
 sboard: sacked_bytes 0 last_sacked_bytes 0 lost_bytes 0
 last_bytes_delivered 0 high_sacked 1736877166 snd_una_adv 0
 cur_rxt_hole 4294967295 high_rxt 1736877166 rescue_rxt 1736877166
 stats: in segs 7 dsegs 4 bytes 4 dupacks 0
 out segs 7 dsegs 2 bytes 123 dupacks 0
 fr 0 tr 0 rxt segs 0 bytes 0 duration 2.484
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: bucket 42459 tokens/period .685 last_update 61908201
 Rx fifo: cursize 0 nitems 799 has_event 0
          head 4 tail 4 segment manager 3
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 Tx fifo: cursize 0 nitems 799 has_event 0
          head 123 tail 123 segment manager 3
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
[1:1][T] 10.0.0.152:53460->10.0.0.156:5201      ESTABLISHED
 index: 1 flags: PSH pending timers: RETRANSMIT
 snd_una 160482962 snd_nxt 160735718 snd_una_max 160735718 rcv_nxt 1 rcv_las 1
 snd_wnd 7999488 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 1 snd_wl2 160482962
 flight size 252756 out space 714 rcv_wnd_av 7999488 tsval_recent 12895476
 tsecr 10759907 tsecr_last_ack 10759907 tsval_recent_age 4294966825 snd_mss 1428
 rto 200 rto_boff 0 srtt 1 us 3.418 rttvar 2 rtt_ts 42.0588 rtt_seq 160485818
 cong: none algo newreno cwnd 253470 ssthresh 187782 bytes_acked 2856
 cc space 714 prev_cwnd 382704 prev_ssthresh 187068 rtx_bytes 0
 snd_congestion 150237062 dupack 0 limited_transmit 817908495
 sboard: sacked_bytes 0 last_sacked_bytes 0 lost_bytes 0
 last_bytes_delivered 0 high_sacked 150242774 snd_una_adv 0
 cur_rxt_hole 4294967295 high_rxt 150235634 rescue_rxt 149855785
 stats: in segs 84958 dsegs 0 bytes 0 dupacks 1237
 out segs 112747 dsegs 112746 bytes 160999897 dupacks 0
 fr 5 tr 0 rxt segs 185 bytes 264180 duration 2.473
 err wnd data below 0 above 0 ack below 0 above 0
 pacer: bucket 22180207 tokens/period 117.979 last_update 61e173e5
 Rx fifo: cursize 0 nitems 799 has_event 0
          head 0 tail 0 segment manager 3
          vpp session 1 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 799 nitems 799 has_event 1
          head 482961 tail 482960 segment manager 3
          vpp
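The echo-app numbers above are internally consistent; the reported rate follows directly from the start/finish timestamps:

```python
# Recompute the echo app's reported throughput from its own timestamps.
test_started = 745.163085
test_finished = 746.937343
total_bytes = 1_073_741_824  # 1 GiB

duration = test_finished - test_started  # ~1.77 s
rate = total_bytes / duration            # ~6.05e8 bytes/s
gbit = rate * 8 / 1e9
print(f"{gbit:.4f} gbit/second")         # ~4.8414, matching the output above
```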
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom, I would actually recommend testing with iperf because it should not be slower than the builtin echo server/client apps. Remember to add fifo-size to your echo apps cli commands (something like fifo-size 4096 for 4MB) to increase the fifo sizes. Also note that you’re trying full-duplex testing. To check half-duplex, add no-echo to the server and no-return to client (or the other way around - in an airport and can’t remember the exact cli). We should probably make half-duplex default. I’m surprised that iperf reports throughput as small as the echo apps. Did you check that fifo sizes are 16MB as configured and that snd_wnd/rcv_wnd/cwnd reported by “show session verbose 2” are the right size? As for the checksum issues you’re hitting, I agree. It might be that tcp checksum offloading does not work properly with your interfaces. Regards, Florin > On Dec 4, 2019, at 2:18 PM, dch...@akouto.com wrote: > > It turns out I was using DPDK virtio, with help from Moshin I changed the > configuration and tried to repeat the tests using VPP native virtio, results > are similar but there are some interesting new observations, sharing them > here in case they are useful to others or trigger any ideas. > > After configuring both instances to use VPP native virtio, I used the > built-in echo test to see what throughput I would get, and I got the same > results as the modified external tcp_echo, i.e. 
about 600 Mbps: > Added dpdk { no-pci } to startup.conf and configured the interface using > create int virtio as per instructions from Moshin, confirmed > settings with show virtio pci command > Ran the built-in test echo application to transfer 1 GB of data and got the > following results: > vpp# test echo clients gbytes 1 uri tcp://10.0.0.153/5556 > 1 three-way handshakes in 0.00 seconds 2288.06/s > Test started at 1255.753237 > Test finished at 1272.863244 > 1073741824 bytes (1024 mbytes, 1 gbytes) in 17.11 seconds > 62755195.55 bytes/second full-duplex > .5020 gbit/second full-duplex > I then used iperf3 with VCL on both sides and got roughly the same results > (620 Mbps) > Then I rebooted the client VM and use native Linux networking on the client > side with VPP on the server side, and try to repeat the iperf test > When I use VPP-native virtio on the server side, the iperf test fails, > packets are dropped on the server (VPP) side, doing a trace shows packets are > dropped because of "bad tcp checksum" > I then switch the server side to use DPDK virtio, the iperf test works and I > get 3 Gbps throughput > So, the big performance problem is on the client (sender) side, with VPP only > able to get around 600 Mbps out for some reason, even when using the built-in > test echo application. I'm continuing my investigation to see where the > bottleneck is, any other ideas on where to look would be greatly appreciated. > > Also, there may be a checksum bug in the VPP-native virtio driver since the > packets are not dropped on the server side when using the DPDK virtio driver. > I'd be happy to help gather more details on this, create a JIRA ticket and > even contribute a fix but wanted to check before going down that road, any > thoughts or comments? > > Thanks again for all the help so far! > > Regards, > Dom > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. 
> > View/Reply Online (#14801): https://lists.fd.io/g/vpp-dev/message/14801 > Mute This Topic: https://lists.fd.io/mt/65863639/675152 > Group Owner: vpp-dev+ow...@lists.fd.io > Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [fcoras.li...@gmail.com] > -=-=-=-=-=-=-=-=-=-=-=-
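Florin's suggestions above can be combined into one concrete invocation. A sketch only — the exact keyword spelling and placement may differ across VPP versions, and Florin himself notes he is unsure of the exact cli:

```
vpp# test echo server uri tcp://10.0.0.152/5556 fifo-size 4096 no-echo
vpp# test echo clients gbytes 1 uri tcp://10.0.0.153/5556 fifo-size 4096 no-return
```

With fifo-size taken in kB, 4096 gives the 4MB fifos mentioned above; no-echo/no-return turn the full-duplex echo test into a half-duplex one.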
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom, I suspect your client/server are really bursty in sending/receiving and your fifos are relatively small. So probably the delay in issuing the cli in the two vms is enough for the receiver to drain its rx fifo. Also, whenever the rx fifo on the receiver fills, the sender will most probably stop sending for ~200ms (the persist timeout after a zero window). The vcl.conf parameters are only used by vcl applications. The builtin echo apps do not use vcl, instead they use the native C app-interface api. Both the server and client echo apps take the fifo size as a parameter (something like fifo-size 4096 for 4MB fifos). Regards, Florin > On Dec 4, 2019, at 3:58 PM, dch...@akouto.com wrote: > > Hi Florin, > > Those are tcp echo results. Note that the "show session verbose 2" command > was issued while there was still traffic being sent. Interesting that on the > client (sender) side the tx fifo is full (cursize 65534 nitems 65534) and on > the server (receiver) side the rx fifo is empty (cursize 0 nitems 65534). > > Where is the rx and tx fifo size configured? Here's my exact vcl.conf file: > vcl { > rx-fifo-size 1600 > tx-fifo-size 1600 > app-scope-local > app-scope-global > api-socket-name /tmp/vpp-api.sock > } > > Is this what those values should match? > > Thanks, > Dom
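Florin's point about small fifos translates directly into a throughput ceiling: with 64 kB fifos the advertised receive window is about 64 kB, and a window-limited TCP flow cannot exceed window/RTT regardless of link speed. A rough sketch (the ~1 ms RTT is an assumption for a virtio VM pair, not a measured value):

```python
# Window-limited TCP: throughput is capped at roughly window / RTT,
# independent of link speed.
def window_limited_mbps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s / 1e6

# ~64 kB fifo/window (nitems 65534) at an assumed ~1 ms RTT:
print(round(window_limited_mbps(65534, 0.001)))  # 524
```

That is the same ballpark as the ~600 Mbps reported elsewhere in the thread, which is why growing the fifos (and hence the window) is the first knob to turn.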
Re: [vpp-dev] VPP / tcp_echo performance
Hi Florin, Those are tcp echo results. Note that the "show session verbose 2" command was issued while there was still traffic being sent. Interesting that on the client (sender) side the tx fifo is full (cursize 65534 nitems 65534) and on the server (receiver) side the rx fifo is empty (cursize 0 nitems 65534). Where is the rx and tx fifo size configured? Here's my exact vcl.conf file:
vcl {
  rx-fifo-size 1600
  tx-fifo-size 1600
  app-scope-local
  app-scope-global
  api-socket-name /tmp/vpp-api.sock
}
Is this what those values should match? Thanks, Dom
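For reference, a vcl.conf sketch with 16MB fifos spelled out explicitly (assuming these parameters are interpreted in bytes, per the LDP/iperf wiki page Florin linked earlier in the thread; adjust if your VPP version interprets them differently):

```
vcl {
  rx-fifo-size 16777216
  tx-fifo-size 16777216
  app-scope-local
  app-scope-global
  api-socket-name /tmp/vpp-api.sock
}
```

As Florin notes above, this only affects vcl applications (e.g. iperf3 over VCL); the builtin echo apps take their fifo size on the cli instead.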
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom, [traveling so a quick reply] For some reason, your rx/tx fifos (see nitems), and implicitly the snd and rcv wnd, are 64kB in the logs below. Is this the tcp echo or iperf result? Regards, Florin > On Dec 4, 2019, at 7:29 AM, dch...@akouto.com wrote: > > Hi, > > Thank you Florin and Jerome for your time, very much appreciated. > > For VCL configuration, FIFO sizes are 16 MB > "show session verbose 2" does not indicate any retransmissions. Here are the > numbers during a test run where approx. 9 GB were transferred (the difference > in values between client and server is just because it took me a few seconds > to issue the command on the client side as you can see from the duration): > SERVER SIDE: > stats: in segs 5989307 dsegs 5989306 bytes 8544661342 dupacks 0 > out segs 3942513 dsegs 0 bytes 0 dupacks 0 > fr 0 tr 0 rxt segs 0 bytes 0 duration 106.489 > err wnd data below 0 above 0 ack below 0 above 0 > CLIENT SIDE: > stats: in segs 4207793 dsegs 0 bytes 0 dupacks 0 > out segs 6407444 dsegs 6407443 bytes 9141373892 dupacks 0 > fr 0 tr 0 rxt segs 0 bytes 0 duration 114.113 > err wnd data below 0 above 0 ack below 0 above 0 > sh int does not seem to indicate any issue. There are occasional drops but I > enabled tracing and checked those out, they are LLC BPDU's, I'm not sure > where those are coming from but I suspect they are from linuxbridge in the > compute host where the VMs are running. > @Jerome: Before I use the dpdk-devbind command to make the interfaces > available to VPP, they use virtio drivers. When assigned to VPP they use > uio_pci_generic. > > I'm not sure if any other stats might be useful so I'm just pasting a bunch > of stats & information from the client & server instances below, I know it's > a lot, just putting it here in case there is something useful in there. > Thanks again for taking the time to follow-up with me and for the > suggestions, I really do appreciate it very much!
> > Regards, > Dom > > # > # Interface uses virtio-pci when the iperf3 test is run using regular Linux > # networking. > # > [root@vpp-test-1 centos]# dpdk-devbind --status > > Network devices using kernel driver > === > :00:03.0 'Virtio network device 1000' if=eth0 drv=virtio-pci > unused=virtio_pci *Active* > :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci > unused=virtio_pci *Active* > > # > # Interface uses uio_pci_generic when set up for VPP > # > > [root@vpp-test-1 centos]# dpdk-devbind --status > > Network devices using DPDK-compatible driver > > :00:03.0 'Virtio network device 1000' drv=uio_pci_generic > unused=virtio_pci > > Network devices using kernel driver > === > :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci > unused=virtio_pci,uio_pci_generic *Active* > > > vpp# sh hardware-interfaces > NameIdx Link Hardware > GigabitEthernet0/3/0 1 up GigabitEthernet0/3/0 > Link speed: 10 Gbps > Ethernet address fa:16:3e:10:5e:4b > Red Hat Virtio > carrier up full duplex mtu 9206 > flags: admin-up pmd maybe-multiseg > rx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1) > tx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1) > pci: device 1af4:1000 subsystem 1af4:0001 address :00:03.00 numa 0 > max rx packet len: 9728 > promiscuous: unicast off all-multicast on > vlan offload: strip off filter off qinq off > rx offload avail: vlan-strip udp-cksum tcp-cksum tcp-lro vlan-filter >jumbo-frame > rx offload active: jumbo-frame > tx offload avail: vlan-insert udp-cksum tcp-cksum tcp-tso multi-segs > tx offload active: multi-segs > rss avail: none > rss active:none > tx burst function: virtio_xmit_pkts > rx burst function: virtio_recv_mergeable_pkts > > rx frames ok 467 > rx bytes ok27992 > extended stats: > rx good packets467 > rx good bytes27992 > rx q0packets 467 > rx q0bytes 27992 > rx q0 good packets 467 > rx q0 good bytes 27992 > rx q0 multicast packets465 > rx q0 broadcast packets 2 >
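The client-side stats quoted above also let you cross-check the effective send rate: bytes out over the reported duration (a quick back-of-envelope):

```python
# Cross-check the client-side "show session verbose 2" stats quoted above.
out_bytes = 9141373892   # out dsegs bytes
duration_s = 114.113
mbps = out_bytes * 8 / duration_s / 1e6
print(round(mbps))  # 641
```

About 641 Mbps, in line with the ~600 Mbps tcp_echo results and with no retransmits reported, which again points at a window/fifo limit rather than loss.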
Re: [vpp-dev] VPP / tcp_echo performance
It turns out I was using DPDK virtio; with help from Moshin I changed the configuration and tried to repeat the tests using VPP native virtio. Results are similar, but there are some interesting new observations, sharing them here in case they are useful to others or trigger any ideas. After configuring both instances to use VPP native virtio, I used the built-in echo test to see what throughput I would get, and I got the same results as the modified external tcp_echo, i.e. about 600 Mbps:
* Added *dpdk { no-pci }* to startup.conf and configured the interface using *create int virtio * as per instructions from Moshin, confirmed settings with the *show virtio pci* command
* Ran the built-in test echo application to transfer 1 GB of data and got the following results:
*vpp# test echo clients gbytes 1 uri tcp://10.0.0.153/5556*
1 three-way handshakes in 0.00 seconds 2288.06/s
Test started at 1255.753237
Test finished at 1272.863244
1073741824 bytes (1024 mbytes, 1 gbytes) in 17.11 seconds
62755195.55 bytes/second full-duplex
*.5020 gbit/second full-duplex*
* I then used iperf3 with VCL on both sides and got roughly the same results (620 Mbps)
* Then I rebooted the client VM, used native Linux networking on the client side with VPP on the server side, and tried to repeat the iperf test
* When I use VPP-native virtio on the server side, the iperf test fails: packets are dropped on the server (VPP) side, and a trace shows they are dropped because of "bad tcp checksum"
* When I switch the server side to use DPDK virtio, the iperf test works and I get 3 Gbps throughput
So the big performance problem is on the client (sender) side, with VPP only able to get around 600 Mbps out for some reason, even when using the built-in test echo application. I'm continuing my investigation to see where the bottleneck is; any other ideas on where to look would be greatly appreciated.
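As a sanity check, the full-duplex figure in the echo output above follows directly from the byte count and duration:

```python
# Reproduce the builtin echo report from its raw numbers
# (duration taken as the printed 17.11 s, so the last digits differ slightly).
nbytes = 1073741824          # 1 GB transferred
duration_s = 17.11
bytes_per_s = nbytes / duration_s
gbit_per_s = bytes_per_s * 8 / 1e9
print(round(gbit_per_s, 3))  # 0.502
```

So the report is internally consistent; the question is purely why the transfer takes 17 s instead of ~1 s.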
Also, there may be a checksum bug in the VPP-native virtio driver since the packets are not dropped on the server side when using the DPDK virtio driver. I'd be happy to help gather more details on this, create a JIRA ticket and even contribute a fix but wanted to check before going down that road, any thoughts or comments? Thanks again for all the help so far! Regards, Dom
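For anyone reproducing the native-virtio setup described above, the sequence is roughly the following sketch. The PCI address 0000:00:03.0 is illustrative only — use your own device's address, and the device must first be unbound from its kernel driver:

```
# startup.conf: keep the dpdk plugin away from the NIC
dpdk { no-pci }

# then, from the VPP CLI:
vpp# create int virtio 0000:00:03.0
vpp# show virtio pci
vpp# set interface ip address virtio-0/0/3/0 10.0.0.152/24
vpp# set interface state virtio-0/0/3/0 up
```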
Re: [vpp-dev] VPP / tcp_echo performance
Are you using VPP native virtio or DPDK virtio? Jerome From: on behalf of "dch...@akouto.com" Date: Wednesday, December 4, 2019 at 16:29 To: "vpp-dev@lists.fd.io" Subject: Re: [vpp-dev] VPP / tcp_echo performance Hi, Thank you Florin and Jerome for your time, very much appreciated. · For VCL configuration, FIFO sizes are 16 MB · "show session verbose 2" does not indicate any retransmissions. Here are the numbers during a test run where approx. 9 GB were transferred (the difference in values between client and server is just because it took me a few seconds to issue the command on the client side as you can see from the duration): SERVER SIDE: stats: in segs 5989307 dsegs 5989306 bytes 8544661342 dupacks 0 out segs 3942513 dsegs 0 bytes 0 dupacks 0 fr 0 tr 0 rxt segs 0 bytes 0 duration 106.489 err wnd data below 0 above 0 ack below 0 above 0 CLIENT SIDE: stats: in segs 4207793 dsegs 0 bytes 0 dupacks 0 out segs 6407444 dsegs 6407443 bytes 9141373892 dupacks 0 fr 0 tr 0 rxt segs 0 bytes 0 duration 114.113 err wnd data below 0 above 0 ack below 0 above 0 · sh int does not seem to indicate any issue. There are occasional drops but I enabled tracing and checked those out, they are LLC BPDU's, I'm not sure where those are coming from but I suspect they are from linuxbridge in the compute host where the VMs are running. · @Jerome: Before I use the dpdk-devbind command to make the interfaces available to VPP, they use virtio drivers. When assigned to VPP they use uio_pci_generic. I'm not sure if any other stats might be useful so I'm just pasting a bunch of stats & information from the client & server instances below, I know it's a lot, just putting it here in case there is something useful in there. Thanks again for taking the time to follow-up with me and for the suggestions, I really do appreciate it very much! Regards, Dom # # Interface uses virtio-pci when the iperf3 test is run using regular Linux # networking. 
# [root@vpp-test-1 centos]# dpdk-devbind --status Network devices using kernel driver === :00:03.0 'Virtio network device 1000' if=eth0 drv=virtio-pci unused=virtio_pci *Active* :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci unused=virtio_pci *Active* # # Interface uses uio_pci_generic when set up for VPP # [root@vpp-test-1 centos]# dpdk-devbind --status Network devices using DPDK-compatible driver :00:03.0 'Virtio network device 1000' drv=uio_pci_generic unused=virtio_pci Network devices using kernel driver === :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci unused=virtio_pci,uio_pci_generic *Active* vpp# sh hardware-interfaces NameIdx Link Hardware GigabitEthernet0/3/0 1 up GigabitEthernet0/3/0 Link speed: 10 Gbps Ethernet address fa:16:3e:10:5e:4b Red Hat Virtio carrier up full duplex mtu 9206 flags: admin-up pmd maybe-multiseg rx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1) tx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1) pci: device 1af4:1000 subsystem 1af4:0001 address :00:03.00 numa 0 max rx packet len: 9728 promiscuous: unicast off all-multicast on vlan offload: strip off filter off qinq off rx offload avail: vlan-strip udp-cksum tcp-cksum tcp-lro vlan-filter jumbo-frame rx offload active: jumbo-frame tx offload avail: vlan-insert udp-cksum tcp-cksum tcp-tso multi-segs tx offload active: multi-segs rss avail: none rss active:none tx burst function: virtio_xmit_pkts rx burst function: virtio_recv_mergeable_pkts rx frames ok 467 rx bytes ok27992 extended stats: rx good packets467 rx good bytes27992 rx q0packets 467 rx q0bytes 27992 rx q0 good packets 467 rx q0 good bytes 27992 rx q0 multicast packets465 rx q0 broadcast packets 2 rx q0 undersize packets467 # # Dropped packets are LLC BPDUs, not
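Note the offload lines in the dump above: tcp-cksum and tcp-tso are listed under "tx offload avail" but only multi-segs is active. With the DPDK virtio driver, checksum offload can be requested in startup.conf — a sketch; whether the option is honored depends on the VPP build and the virtio backend:

```
dpdk {
  enable-tcp-udp-checksum
}
```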
Re: [vpp-dev] VPP / tcp_echo performance
Hi, Thank you Florin and Jerome for your time, very much appreciated.
* For VCL configuration, FIFO sizes are 16 MB
* "show session verbose 2" does not indicate any retransmissions. Here are the numbers during a test run where approx. 9 GB were transferred (the difference in values between client and server is just because it took me a few seconds to issue the command on the client side as you can see from the duration):
*SERVER SIDE:*
stats: in segs 5989307 dsegs 5989306 bytes 8544661342 dupacks 0
out segs 3942513 dsegs 0 bytes 0 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 106.489
err wnd data below 0 above 0 ack below 0 above 0
*CLIENT SIDE:*
stats: in segs 4207793 dsegs 0 bytes 0 dupacks 0
out segs 6407444 dsegs 6407443 bytes 9141373892 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 114.113
err wnd data below 0 above 0 ack below 0 above 0
* sh int does not seem to indicate any issue. There are occasional drops but I enabled tracing and checked those out, they are LLC BPDU's, I'm not sure where those are coming from but I suspect they are from linuxbridge in the compute host where the VMs are running.
* *@Jerome* : Before I use the dpdk-devbind command to make the interfaces available to VPP, they use virtio drivers. When assigned to VPP they use uio_pci_generic.
I'm not sure if any other stats might be useful so I'm just pasting a bunch of stats & information from the client & server instances below, I know it's a lot, just putting it here in case there is something useful in there. Thanks again for taking the time to follow-up with me and for the suggestions, I really do appreciate it very much! 
Regards, Dom *#* *# Interface uses virtio-pci when the iperf3 test is run using regular Linux* *# networking.* *#* *[root@vpp-test-1 centos]# dpdk-devbind --status* Network devices using kernel driver === :00:03.0 'Virtio network device 1000' if=eth0 drv=virtio-pci unused=virtio_pci *Active* :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci unused=virtio_pci *Active* *#* *# Interface uses uio_pci_generic when set up for VPP* *#* *[root@vpp-test-1 centos]# dpdk-devbind --status* Network devices using DPDK-compatible driver :00:03.0 'Virtio network device 1000' drv=uio_pci_generic unused=virtio_pci Network devices using kernel driver === :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci unused=virtio_pci,uio_pci_generic *Active* *vpp# sh hardware-interfaces* Name Idx Link Hardware GigabitEthernet0/3/0 1 up GigabitEthernet0/3/0 Link speed: 10 Gbps Ethernet address fa:16:3e:10:5e:4b Red Hat Virtio carrier up full duplex mtu 9206 flags: admin-up pmd maybe-multiseg rx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1) tx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1) pci: device 1af4:1000 subsystem 1af4:0001 address :00:03.00 numa 0 max rx packet len: 9728 promiscuous: unicast off all-multicast on vlan offload: strip off filter off qinq off rx offload avail: vlan-strip udp-cksum tcp-cksum tcp-lro vlan-filter jumbo-frame rx offload active: jumbo-frame tx offload avail: vlan-insert udp-cksum tcp-cksum tcp-tso multi-segs tx offload active: multi-segs rss avail: none rss active: none tx burst function: virtio_xmit_pkts rx burst function: virtio_recv_mergeable_pkts rx frames ok 467 rx bytes ok 27992 extended stats: rx good packets 467 rx good bytes 27992 rx q0packets 467 rx q0bytes 27992 rx q0 good packets 467 rx q0 good bytes 27992 rx q0 multicast packets 465 rx q0 broadcast packets 2 rx q0 undersize packets 467 *#* *# Dropped packets are LLC BPDUs, not sure but probably a linuxbridge thing* *#* *vpp# show trace* --- Start of thread 0 vpp_main 
--- No packets in trace buffer --- Start of thread 1 vpp_wk_0 --- Packet 1 00:08:35:202159: dpdk-input GigabitEthernet0/3/0 rx queue 0 buffer 0xfee2f4: current data 0, length 60, buffer-pool 0, ref-count 1, totlen-nifb 0, trace handle 0x100 ext-hdr-valid
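The trace excerpt above was produced with VPP's packet tracer; for reference, the usual sequence to capture and inspect drops like the LLC BPDUs is:

```
vpp# trace add dpdk-input 50
vpp# show trace
vpp# clear trace
```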
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom, In addition to Florin's questions, can you clarify what you mean by "…interfaces are assigned to DPDK/VPP"? What driver are you using? Regards, Jerome From: on behalf of Florin Coras Date: Wednesday, December 4, 2019 at 02:31 To: "dch...@akouto.com" Cc: "vpp-dev@lists.fd.io" Subject: Re: [vpp-dev] VPP / tcp_echo performance Hi Dom, I’ve never tried to run the stack in a VM, so not sure about the expected performance, but here are a couple of comments: - What fifo sizes are you using? Are they at least 4MB (see [1] for VCL configuration)? - I don’t think you need to configure more than 16k buffers/numa. Additionally, to get more information on the issue: - What does “show session verbose 2” report? Check the stats section for retransmit counts (tr - timer retransmit, fr - fast retransmit) which, if non-zero, indicate that packets are lost. - Check interface rx/tx error counts with “show int”. - Typically, for improved performance, you should write more than 1.4kB per call. But the fact that your average is less than 1.4kB suggests that you often find the fifo full or close to full. So probably the issue is not your sender app. Regards, Florin [1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf On Dec 3, 2019, at 11:40 AM, dch...@akouto.com wrote: Hi all, I've been running some performance tests and not quite getting the results I was hoping for, and have a couple of related questions I was hoping someone could provide some tips with. 
For context, here's a summary of the results of TCP tests I've run on two VMs (CentOS 7 OpenStack instances, host-1 is the client and host-2 is the server): · Running iperf3 natively before the interfaces are assigned to DPDK/VPP: 10 Gbps TCP throughput · Running iperf3 with VCL/HostStack: 3.5 Gbps TCP throughput · Running a modified version of the tcp_echo application (similar results with socket and svm api): 610 Mbps throughput Things I've tried to improve performance: · Anything I could apply from https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning) · Added tcp { cc-algo cubic } to VPP startup config · Using isolcpu and VPP startup config options, allocated first 2, then 4 and finally 6 of the 8 available cores to VPP main & worker threads · In VPP startup config set "buffers-per-numa 65536" and "default data-size 4096" · Updated grub boot options to include hugepagesz=1GB hugepages=64 default_hugepagesz=1GB My goal is to achieve at least the same throughput using VPP as I get when I run iperf3 natively on the same network interfaces (in this case 10 Gbps). A couple of related questions: · Given the items above, do any VPP or kernel configuration items jump out that I may have missed that could justify the difference in native vs VPP performance or help get the two a bit closer? · In the modified tcp_echo application, n_sent = app_send_stream(...) is called in a loop always using the same length (1400 bytes) in my test version. The return value n_sent indicates that the average bytes sent is only around 130 bytes per call after some run time. Are there any parameters or options that might improve this? Any tips or pointers to documentation that might shed some light would be hugely appreciated! Regards, Dom
Re: [vpp-dev] VPP / tcp_echo performance
Hi Dom, I’ve never tried to run the stack in a VM, so not sure about the expected performance, but here are a couple of comments: - What fifo sizes are you using? Are they at least 4MB (see [1] for VCL configuration). - I don’t think you need to configure more than 16k buffers/numa. Additionally, to get more information on the issue: - What does “show session verbose 2” report? Check the stats section for retransmit counts (tr - timer retransmit, fr - fast retransmit) which if non-zero indicate that packets are lost. - Check interface rx/tx error counts with “show int”. - Typically, for improved performance, you should write more than 1.4kB per call. But the fact that your average is less than 1.4kB suggests that you often find the fifo full or close to full. So probably the issue is not your sender app. Regards, Florin [1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf > On Dec 3, 2019, at 11:40 AM, dch...@akouto.com wrote: > > Hi all, > > I've been running some performance tests and not quite getting the results I > was hoping for, and have a couple of related questions I was hoping someone > could provide some tips with. 
For context, here's a summary of the results of > TCP tests I've run on two VMs (CentOS 7 OpenStack instances, host-1 is the > client and host-2 is the server): > Running iperf3 natively before the interfaces are assigned to DPDK/VPP: 10 > Gbps TCP throughput > Running iperf3 with VCL/HostStack: 3.5 Gbps TCP throughput > Running a modified version of the tcp_echo application (similar results with > socket and svm api): 610 Mbps throughput > Things I've tried to improve performance: > Anything I could apply from > https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning) > Added tcp { cc-algo cubic } to VPP startup config > Using isolcpu and VPP startup config options, allocated first 2, then 4 and > finally 6 of the 8 available cores to VPP main & worker threads > In VPP startup config set "buffers-per-numa 65536" and "default data-size > 4096" > Updated grub boot options to include hugepagesz=1GB hugepages=64 > default_hugepagesz=1GB > My goal is to achieve at least the same throughput using VPP as I get when I > run iperf3 natively on the same network interfaces (in this case 10 Gbps). > > A couple of related questions: > Given the items above, do any VPP or kernel configuration items jump out that > I may have missed that could justify the difference in native vs VPP > performance or help get the two a bit closer? > In the modified tcp_echo application, n_sent = app_send_stream(...) is called > in a loop always using the same length (1400 bytes) in my test version. The > return value n_sent indicates that the average bytes sent is only around 130 > bytes per call after some run time. Are there any parameters or options that > might improve this? > Any tips or pointers to documentation that might shed some light would be > hugely appreciated! > > Regards, > Dom
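Florin's checklist above boils down to a few CLI commands worth running on both sides while the transfer is in flight (sketch; `show error`, which aggregates per-node error counters, is an extra beyond what Florin lists):

```
vpp# show session verbose 2
vpp# show int
vpp# show error
```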
[vpp-dev] VPP / tcp_echo performance
Hi all, I've been running some performance tests and not quite getting the results I was hoping for, and have a couple of related questions I was hoping someone could provide some tips with. For context, here's a summary of the results of TCP tests I've run on two VMs (CentOS 7 OpenStack instances, host-1 is the client and host-2 is the server):
* Running iperf3 natively before the interfaces are assigned to DPDK/VPP: *10 Gbps TCP throughput*
* Running iperf3 with VCL/HostStack: *3.5 Gbps TCP throughput*
* Running a modified version of the *tcp_echo* application (similar results with socket and svm api): *610 Mbps throughput*
Things I've tried to improve performance:
* Anything I could apply from https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)
* Added tcp { cc-algo cubic } to VPP startup config
* Using isolcpu and VPP startup config options, allocated first 2, then 4 and finally 6 of the 8 available cores to VPP main & worker threads
* In VPP startup config set "buffers-per-numa 65536" and "default data-size 4096"
* Updated grub boot options to include hugepagesz=1GB hugepages=64 default_hugepagesz=1GB
My goal is to achieve at least the same throughput using VPP as I get when I run iperf3 natively on the same network interfaces (in this case 10 Gbps). A couple of related questions:
* Given the items above, do any VPP or kernel configuration items jump out that I may have missed that could justify the difference in native vs VPP performance or help get the two a bit closer?
* In the modified tcp_echo application, *n_sent = app_send_stream(...)* is called in a loop always using the same length (1400 bytes) in my test version. The return value *n_sent* indicates that the average bytes sent is only around 130 bytes per call after some run time. Are there any parameters or options that might improve this?
Any tips or pointers to documentation that might shed some light would be hugely appreciated! 
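The ~130 bytes-per-call average is what a full tx fifo looks like from the sender's side: once the fifo fills, each send can only enqueue whatever the receiver drained since the last call, no matter how much is offered. A toy model (plain Python, not VPP code; the drain rate is illustrative):

```python
# Toy model of a bounded tx fifo: each call offers 1400 bytes but can only
# enqueue what fits; the receiver drains a fixed amount per call.
def avg_bytes_per_call(fifo_size, offer, drain_per_call, calls):
    used, accepted = 0, 0
    for _ in range(calls):
        used = max(0, used - drain_per_call)   # receiver side drains
        take = min(offer, fifo_size - used)    # sender enqueues what fits
        used += take
        accepted += take
    return accepted / calls

# 64 kB fifo, 1400-byte writes, receiver draining ~130 bytes per call:
print(round(avg_bytes_per_call(65534, 1400, 130, 100_000)))  # ~131
```

In steady state the average accepted per call converges to the drain rate, not the 1400 bytes offered — i.e. the small per-call average points at the fifo/receiver path being the bottleneck rather than the sender loop itself.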
Regards, Dom