Re: [vpp-dev] VPP / tcp_echo performance

2019-12-16 Thread dchons
Hi Florin,

From my logs it seems that TSO is not on even when using the native driver; 
logs attached below. I'm going to do a deeper dive into the various networking 
layers involved in this setup and will post any interesting findings back on 
this thread.

Thank you for all the help so far!

Regards,
Dom

vpp# sh int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
vpp# create int virtio :00:03.0
vpp# sh int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
virtio-0/0/3/0                    1     down         9000/0/0/0     rx packets                   999
                                                                    rx bytes                   60252
                                                                    drops                        999
                                                                    ip4                            6
vpp# set interface ip address virtio-0/0/3/0 10.0.0.152/24
vpp# set interface state virtio-0/0/3/0 up
vpp# session enable
vpp# sh int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
virtio-0/0/3/0                    1      up          9000/0/0/0     rx packets                  1017
                                                                    rx bytes                   61332
                                                                    drops                       1017
                                                                    ip4                            6
vpp# sh sess verbose 2
Thread 0: no sessions
Thread 1: no sessions
Thread 2: no sessions
Thread 3: no sessions
vpp# sh sess verbose 2
Thread 0: no sessions
[1:0][T] 10.0.0.152:27761->10.0.0.156:5201        ESTABLISHED
index: 0 cfg:  flags:  timers:
snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
snd_wnd 29056 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
flight size 0 out space 4473 rcv_wnd_av 7999488 tsval_recent 244488841
tsecr 434592381 tsecr_last_ack 434592381 tsval_recent_age 2995 snd_mss 1448
rto 259 rto_boff 0 srtt 67 us 3.127 rttvar 48 rtt_ts 0. rtt_seq 124
next_node 0 opaque 0x0
cong:   none algo cubic cwnd 4473 ssthresh 2147483647 bytes_acked 0
cc space 4473 prev_cwnd 0 prev_ssthresh 0
snd_cong 3516215180 dupack 0 limited_tx 3516215180
rxt_bytes 0 rxt_delivered 0 rxt_head 3516215180 rxt_ts 434595747
prr_start 3516215180 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 3516215180 is_reneging 0
cur_rxt_hole 4294967295 high_rxt 3516215180 rescue_rxt 3516215180
stats: in segs 6 dsegs 4 bytes 4 dupacks 0
out segs 7 dsegs 2 bytes 123 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 3.381
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 1430540 bucket 0 t/p 1.431 last_update 3.365 s idle 156
Rx fifo: cursize 0 nitems 799 has_event 0
head 4 tail 4 segment manager 2
vpp session 0 thread 1 app session 0 thread 0
ooo pool 0 active elts newest 4294967295
Tx fifo: cursize 0 nitems 799 has_event 0
head 123 tail 123 segment manager 2
vpp session 0 thread 1 app session 0 thread 0
ooo pool 0 active elts newest 4294967295
session: state: ready opaque: 0x0 flags:
[1:1][T] 10.0.0.152:31516->10.0.0.156:5201        ESTABLISHED
index: 1 cfg:  flags: PSH pending timers: RETRANSMIT
snd_una 455633510 snd_nxt 456069358 snd_una_max 456069358 rcv_nxt 1 rcv_las 1
snd_wnd 1575424 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 1 snd_wl2 455633510
flight size 435848 out space 486 rcv_wnd_av 7999488 tsval_recent 244492205
tsecr 434595744 tsecr_last_ack 434595744 tsval_recent_age 4294966926 snd_mss 1448
rto 200 rto_boff 0 srtt 1 us 2.612 rttvar 1 rtt_ts 95.8681 rtt_seq 455697222
next_node 0 opaque 0x0
cong:   none algo cubic cwnd 436334 ssthresh 333526 bytes_acked 2896
cc space 486 prev_cwnd 476467 prev_ssthresh 341803
snd_cong 419169974 dupack 0 limited_tx 2714252909
rxt_bytes 0 rxt_delivered 0 rxt_head 2714252909 rxt_ts 434595747
prr_start 418787702 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 419169974 is_reneging 0
cur_rxt_hole 4294967295 high_rxt 418797838 rescue_rxt 418787701
stats: in segs 144873 dsegs 0 bytes 0 dupacks 2553
out segs 315278 dsegs 315277 bytes 456519685 dupacks 0
fr 12 tr 0 rxt segs 311 bytes 450328 duration 3.371
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 436334000 bucket 282 t/p 436.334 last_update 264 us idle 100
Rx fifo: cursize 0 nitems 799 has_event 0
head 0 tail 0 segment manager 2
vpp session 1 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
Tx fifo: cursize 799 nitems 799 has_event 1
head 7633509 tail 7633508 segment manager 2
vpp session 1 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 4294967295
session: state: ready opaque: 0x0 flags:
Thread 1: active sessions 2
Thread 2: no sessions
Thread 3: no sessions
vpp# sh hardware-interfaces
Name                Idx   Link  Hardware
local0                             0    down  local0
Link speed: unknown
local
virtio-0/0/3/0                     1     up   virtio-0/0/3/0
Link speed: unknown
Ethernet 

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-13 Thread Florin Coras
Hi Dom, 

From the logs it looks like TSO is not on. I wonder if the vhost nic actually 
honors the “tso on” flag. Have you also tried the native vhost driver instead 
of the dpdk one? I’ve never tried it with tcp, so I don’t know if it properly 
advertises the fact that it supports TSO. 

Below you can see how it looks on my side, between two Broadwell boxes with 
XL710s. The tcp connection's TSO flag needs to be on, otherwise tcp will do the 
segmentation by itself. 

Regards, 
Florin

$ ~/vpp/vcl_iperf_client 6.0.1.2 -t 10
[snip]
[ ID] Interval           Transfer     Bandwidth       Retr
[ 33]   0.00-10.00  sec  42.2 GBytes  36.2 Gbits/sec    0            sender
[ 33]   0.00-10.00  sec  42.2 GBytes  36.2 Gbits/sec                 receiver

vpp# show session verbose 2
[snip]
[1:1][T] 6.0.1.1:27240->6.0.1.2:5201  ESTABLISHED
 index: 1 cfg: TSO flags: PSH pending timers: RETRANSMIT
 snd_una 2731494347 snd_nxt 2731992143 snd_una_max 2731992143 rcv_nxt 1 rcv_las 1
 snd_wnd 1999872 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 2731494347
 flight size 497796 out space 716 rcv_wnd_av 3999744 tsval_recent 1787061797
 tsecr 3347210414 tsecr_last_ack 3347210414 tsval_recent_age 4294966829 snd_mss 1448
 rto 200 rto_boff 0 srtt 1 us .101 rttvar 1 rtt_ts 8.6696 rtt_seq 2731733367
 next_node 0 opaque 0x0
 cong:   none algo cubic cwnd 498512 ssthresh 407288 bytes_acked 17376
 cc space 716 prev_cwnd 581841 prev_ssthresh 403737
 snd_cong 2702482407 dupack 0 limited_tx 1608697445
 rxt_bytes 0 rxt_delivered 0 rxt_head 13367060 rxt_ts 3347210414
 prr_start 2701996195 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
 last_delivered 0 high_sacked 2702540327 is_reneging 0
 cur_rxt_hole 4294967295 high_rxt 2702048323 rescue_rxt 2701996194
 stats: in segs 293052 dsegs 0 bytes 0 dupacks 5568
out segs 381811 dsegs 381810 bytes 15628627726 dupacks 0
fr 229 tr 0 rxt segs 8207 bytes 11733696 duration 3.468
err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 4941713080 bucket 2328382 t/p 4941.713 last_update 0 us idle 100
 Rx fifo: cursize 0 nitems 399 has_event 0
  head 0 tail 0 segment manager 1
  vpp session 1 thread 1 app session 1 thread 0
  ooo pool 0 active elts newest 0
 Tx fifo: cursize 199 nitems 199 has_event 1
  head 396234 tail 396233 segment manager 1
  vpp session 1 thread 1 app session 1 thread 0
  ooo pool 0 active elts newest 4294967295
 session: state: ready opaque: 0x0 flags:

vpp# sh run 
[snip]
Thread 1 vpp_wk_0 (lcore 24)
Time 774.3, 10 sec internal node vector rate 0.00
  vector rates in 2.5159e3, out 1.4186e3, drop 1.2915e-3, punt 0.e0
Name                              State      Calls        Vectors    Suspends   Clocks    Vectors/Call
FortyGigabitEthernet84/0/0-out    active     977678       1099456    0          2.47e2    1.12
FortyGigabitEthernet84/0/0-tx     active     977678       1098446    0          2.17e3    1.12
ethernet-input                    active     442524       848618     0          2.69e2    1.92
ip4-input-no-checksum             active     442523       848617     0          2.86e2    1.92
ip4-local                         active     442523       848617     0          3.24e2    1.92
ip4-lookup                        active     1291425      1948073    0          2.09e2    1.51
ip4-rewrite                       active     977678       1099456    0          2.23e2    1.12
session-queue                     polling    7614793106   1099452    0          7.45e5    0.00
tcp4-established                  active     442520       848614     0          1.26e3    1.92
tcp4-input                        active     442523       848617     0          3.04e2    1.92
tcp4-output                       active     977678       1099456    0          3.77e2    1.12
tcp4-rcv-process                  active     1            1          0          5.82e3    1.00
tcp4-syn-sent                     active     2            2          0          6.84e4    1.00


> On Dec 13, 2019, at 12:58 PM, dch...@akouto.com wrote:
> 
> Hi,
> I rebuilt VPP on master and updated startup.conf to enable tso as follows:
> dpdk {
>   dev :00:03.0{
>   num-rx-desc 2048
>   num-tx-desc 2048
>   tso on
>   }
>   uio-driver vfio-pci
>   enable-tcp-udp-checksum
> }
> 
> I'm not sure whether it is working or not, there is nothing in show session 
> verbose 2 to indicate whether it is on or off (output at the end of this 
> 

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-13 Thread dchons
Hi,
I rebuilt VPP on master and updated startup.conf to enable tso as follows:
dpdk {
dev :00:03.0{
num-rx-desc 2048
num-tx-desc 2048
tso on
}
uio-driver vfio-pci
enable-tcp-udp-checksum
}

I'm not sure whether it is working or not; there is nothing in show session 
verbose 2 to indicate whether it is on or off (output at the end of this 
update). Unfortunately there was no performance improvement.

Then I figured I would try using a tap interface on the VPP side so I could run 
iperf3 "natively" on the VPP client side as well, but got the same result 
again. I find this perplexing: two test runs back to back, with reboots in 
between to rule out any configuration issues:

*Test 1 using native linux networking on both sides:*
[iperf3 client --> linux networking eth0] --> [Openstack/Linuxbridge] --> 
[linux networking eth0 --> iperf3 server]
Result: 10+ Gbps

*Reboot both instances and assign the NIC on the client side to VPP:*

vpp# set int l2 bridge GigabitEthernet0/3/0 1

vpp# set int state GigabitEthernet0/3/0 up

vpp# create tap

tap0

vpp# set int l2 bridge tap0 1

vpp# set int state tap0 up

[root]# ip addr add 10.0.0.152/24 dev tap0

[iperf3 client --> tap0 --> VPP GigabitEthernet0/3/0 ] --> 
[Openstack/Linuxbridge] --> [ linux networking eth0 --> iperf3 server]
Result: 1 Gbps

I had started to suspect the host OS or OpenStack Neutron, linuxbridge, etc., 
but based on this it just *has* to be something in the guest running VPP. Any 
and all ideas or suggestions are welcome!

Regards,
Dom

Note: this output is from a run using iperf3+VCL with the TSO settings in 
startup.conf, not the tap interface test described above:

vpp# set interface ip address GigabitEthernet0/3/0 10.0.0.152/24
vpp# set interface state GigabitEthernet0/3/0 up
vpp# session enable
vpp# sh session verbose 2
Thread 0: no sessions
[1:0][T] 10.0.0.152:6445->10.0.0.156:5201         ESTABLISHED
index: 0 cfg:  flags:  timers:
snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
snd_wnd 29056 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
flight size 0 out space 4473 rcv_wnd_av 7999488 tsval_recent 3428491
tsecr 193532193 tsecr_last_ack 193532193 tsval_recent_age 13996 snd_mss 1448
rto 259 rto_boff 0 srtt 67 us 3.891 rttvar 48 rtt_ts 0. rtt_seq 124
next_node 0 opaque 0x0
cong:   none algo cubic cwnd 4473 ssthresh 2147483647 bytes_acked 0
cc space 4473 prev_cwnd 0 prev_ssthresh 0
snd_cong 1281277517 dupack 0 limited_tx 1281277517
rxt_bytes 0 rxt_delivered 0 rxt_head 1281277517 rxt_ts 193546719
prr_start 1281277517 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 1281277517 is_reneging 0
cur_rxt_hole 4294967295 high_rxt 1281277517 rescue_rxt 1281277517
stats: in segs 6 dsegs 4 bytes 4 dupacks 0
out segs 7 dsegs 2 bytes 123 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 14.539
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 1149550 bucket 0 t/p 1.149 last_update 14.526 s idle 194
Rx fifo: cursize 0 nitems 799 has_event 0
head 4 tail 4 segment manager 2
vpp session 0 thread 1 app session 0 thread 0
ooo pool 0 active elts newest 4294967295
Tx fifo: cursize 0 nitems 799 has_event 0
head 123 tail 123 segment manager 2
vpp session 0 thread 1 app session 0 thread 0
ooo pool 0 active elts newest 4294967295
session: state: ready opaque: 0x0 flags:
[1:1][T] 10.0.0.152:10408->10.0.0.156:5201        ESTABLISHED
index: 1 cfg:  flags:  timers: RETRANSMIT
snd_una 2195902174 snd_nxt 2196262726 snd_una_max 2196262726 rcv_nxt 1 rcv_las 1
snd_wnd 1574016 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 1 snd_wl2 2195902174
flight size 360552 out space 832 rcv_wnd_av 7999488 tsval_recent 3443014
tsecr 193546715 tsecr_last_ack 193546715 tsval_recent_age 4294966768 snd_mss 1448
rto 200 rto_boff 0 srtt 1 us 2.606 rttvar 1 rtt_ts 45.0534 rtt_seq 2195903622
next_node 0 opaque 0x0
cong:   none algo cubic cwnd 361384 ssthresh 329528 bytes_acked 2896
cc space 832 prev_cwnd 470755 prev_ssthresh 340435
snd_cong 2188350854 dupack 0 limited_tx 2709798285
rxt_bytes 0 rxt_delivered 0 rxt_head 2143051622 rxt_ts 193546719
prr_start 2187975822 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 2188350854 is_reneging 0
cur_rxt_hole 4294967295 high_rxt 2187977270 rescue_rxt 2187975821
stats: in segs 720132 dsegs 0 bytes 0 dupacks 127869
out segs 1549120 dsegs 1549119 bytes 2243122901 dupacks 0
fr 43 tr 0 rxt segs 32362 bytes 46860176 duration 14.529
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 361384000 bucket 1996 t/p 361.384 last_update 619 us idle 100
Rx fifo: cursize 0 nitems 799 has_event 0
head 0 tail 0 segment manager 2
vpp session 1 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
Tx fifo: cursize 799 nitems 799 has_event 1
head 3902173 tail 3902172 segment manager 2
vpp session 1 thread 1 app session 1 thread 0
ooo pool 0 active elts 

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-12 Thread Florin Coras
Hi Dom, 


> On Dec 12, 2019, at 12:29 PM, dch...@akouto.com wrote:
> 
> Hi Florin,
> 
> The saga continues, a little progress and more questions. In order to reduce 
> the variables, I am now only using VPP on one of the VMs: iperf3 server is 
> running on a VM with native Linux networking, and iperf3+VCL client running 
> on the second VM.

FC: Okay!

> 
> I've pasted the output from a few commands during this test run below and 
> have a few questions if you don't mind.
> The "show errors" command indicates "Tx packet drops (dpdk tx failure)". I 
> have done quite a bit of searching, found other mentions of this in other 
> threads but no tips as to where to look or hints on how it was / can be 
> solved. Any thoughts?
FC: The number of drops is not that large, so we can ignore for now. 
> I'm not really sure how to interpret the results of "show run" but nothing 
> jumps out at me, do you see anything useful in there?
FC: Nothing apart from the fact that one of vpp’s workers is moderately loaded 
(you’re still running 3 workers). 
> Some of the startup.conf options were not working for me, so I switched to 
> building from source (I chose to use tag v20.01-rc0 for some stability). 
> Still no luck with some of the options:
> When I try to use tcp { tso } I get this: 0: tcp_config_fn: unknown input ` 
> tso'
FC: You need to get “closer” to master HEAD. That tag was created when 19.08 was 
released, but tso support was merged afterwards. Typically our CI infra is good 
enough to keep things running, so you might want to try the latest master. 
> When I try to use num-mbufs in the dpdk section, I get 0: dpdk_config: 
> unknown input `num-mbufs 65535’
FC: This was deprecated at one point. The new stanza is "buffers { 
buffers-per-numa  }"
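
For reference, a buffers stanza along those lines might look like the following 
(the count is only an illustrative placeholder, not a recommendation):

buffers {
    ## number of buffers allocated per NUMA node
    buffers-per-numa 16384
}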
> 
> Do you know if these options are supported? I can't figure out a way to 
> increase mbufs since the above option does not work, and when I try to use 
> socket-mem (which according to the documentation is needed if there is a need 
> for a larger number of mbufs) I get this: dpdk_config:1408: socket-mem 
> argument is deprecated

FC: Yes, this was also deprecated. 

> 
> To answer some of your questions from your previous reply:
> I have indeed been using taskset and watching CPU load with top to make sure 
> things are going where I expect them to go
> I am not trying to use jumbo buffers, increasing "default data-size" was just 
> an attempt to see if there would be a difference
> Thanks for the cubic congestion algo suggestion, made the change but no 
> improvement

FC: Understood! I guess that means we should try tso. I just tested it and it 
seems the dpdk stanza needs an extra "dpdk { enable-tcp-udp-checksum }” apart from 
“dpdk { dev  { tso on } }”. Let me know if you hit any other issues with 
it. You’ll know that it’s running if you do “show session verbose 2” and you 
see “TSO" in the cfg flags, instead of “TSO off”. 

Regards, 
Florin
> Thank you for all the help, it is very much appreciated.
> 
> Regards,
> Dom
> 
> vpp# sh int
>               Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
> GigabitEthernet0/3/0              1      up          9000/0/0/0     rx packets               1642537
>                                                                     rx bytes               108676814
>                                                                     tx packets               5216493
>                                                                     tx bytes              7793319472
>                                                                     drops                        392
>                                                                     ip4                      1642178
>                                                                     tx-error                     475
> local0                            0     down          0/0/0/0       drops                          1
> 
> vpp# sh err
>    Count                    Node                  Reason
>        1                 ip4-glean               ARP requests sent
>        7                dpdk-input               no error
>  5216424             session-queue              Packets transmitted
>        1          tcp4-rcv-process              Pure ACKs received
>        2             tcp4-syn-sent              SYN-ACKs received
>        7          tcp4-established              Packets pushed into rx fifo
>  1619850          tcp4-established              Pure ACKs received
>    22219          tcp4-established              Duplicate ACK
>        1          tcp4-established              Resets received
>       62          tcp4-established              Connection closed
>        1          tcp4-established              FINs received
>       62               tcp4-output              Resets sent
>        2                 arp-reply              ARP replies sent
>       33                 ip4-input

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-12 Thread dchons
Hi Florin,

The saga continues, a little progress and more questions. In order to reduce 
the variables, I am now only using VPP on one of the VMs: iperf3 server is 
running on a VM with native Linux networking, and iperf3+VCL client running on 
the second VM.

I've pasted the output from a few commands during this test run below and have 
a few questions if you don't mind.

* The "show errors" command indicates " *Tx packet drops (dpdk tx failure)* ". 
I have done quite a bit of searching, found other mentions of this in other 
threads but no tips as to where to look or hints on how it was / can be solved. 
Any thoughts?
* I'm not really sure how to interpret the results of "show run" but nothing 
jumps out at me, do you see anything useful in there?
* Some of the startup.conf options were not working for me, so I switched to 
building from source (I chose to use tag v20.01-rc0 for some stability). Still 
no luck with some of the options:

* When I try to use tcp { tso } I get this: *0:* *tcp_config_fn: unknown input 
` tso'*
* When I try to use num-mbufs in the dpdk section, I get *0: dpdk_config: 
unknown input `num-mbufs 65535'*

Do you know if these options are supported? I can't figure out a way to 
increase mbufs since the above option does not work, and when I try to use 
socket-mem (which according to the documentation is needed if there is a need 
for a larger number of mbufs) I get this: *dpdk_config:1408: socket-mem 
argument is deprecated*

To answer some of your questions from your previous reply:

* I have indeed been using taskset and watching CPU load with top to make sure 
things are going where I expect them to go
* I am not trying to use jumbo buffers, increasing "default data-size" was just 
an attempt to see if there would be a difference
* Thanks for the cubic congestion algo suggestion, made the change but no 
improvement

Thank you for all the help, it is very much appreciated.

Regards,
Dom

*vpp# sh int*
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet0/3/0              1      up          9000/0/0/0     rx packets               1642537
                                                                    rx bytes               108676814
                                                                    tx packets               5216493
                                                                    tx bytes              7793319472
                                                                    drops                        392
                                                                    ip4                      1642178
                                                                    tx-error                     475
local0                            0     down          0/0/0/0       drops                          1

*vpp# sh err*
Count                    Node                  Reason
1                ip4-glean               ARP requests sent
7               dpdk-input               no error
5216424              session-queue             Packets transmitted
1            tcp4-rcv-process            Pure ACKs received
2              tcp4-syn-sent             SYN-ACKs received
7            tcp4-established            Packets pushed into rx fifo
1619850            tcp4-established            Pure ACKs received
22219            tcp4-established            Duplicate ACK
1            tcp4-established            Resets received
62            tcp4-established            Connection closed
1            tcp4-established            FINs received
62               tcp4-output              Resets sent
2                arp-reply               ARP replies sent
33                ip4-input               unknown ip protocol
1                ip4-input               Multicast RPF check failed
1                ip4-glean               ARP requests sent
351                llc-input               unknown llc ssap/dsap
475         GigabitEthernet0/3/0-tx        Tx packet drops (dpdk tx failure)

*vpp# sh run*
Thread 0 vpp_main (lcore 7)
Time 94.7, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
vector rates in 0.e0, out 3.1669e-2, drop 1.0556e-2, punt 0.e0
Name                             State         Calls      Vectors    Suspends    Clocks    Vectors/Call
GigabitEthernet0/3/0-output      active            3            3           0    3.29e4            1.00
GigabitEthernet0/3/0-tx          active            3            3           0    3.73e4            1.00
acl-plugin-fa-cleaner-process    event wait        0            0           1    2.78e4            0.00
admin-up-down-process            event wait        0            0           1    2.24e3            0.00
api-rx-from-ring                 any wait          0            0          24    1.01e6            0.00
avf-process                      event wait        0            0           1    2.15e4            0.00
bfd-process                      event wait        0            0           1    1.49e4            0.00
bond-process                     event wait        0            0           1    1.43e4            0.00
dhcp-client-process              any wait          0            0

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-06 Thread Florin Coras
Hi Dom, 

Great to see progress! More inline. 

> On Dec 6, 2019, at 10:21 AM, dch...@akouto.com wrote:
> 
> Hi Florin,
> 
> Some progress, at least with the built-in echo app, thank you for all the 
> suggestions so far! By adjusting the fifo-size and testing in half-duplex I 
> was able to get close to 5 Gbps between the two openstack instances using the 
> built-in test echo app:
> 
> vpp# test echo clients gbytes 1 no-return fifo-size 100 uri 
> tcp://10.0.0.156/

FC: The cli for the echo apps is a bit confusing. Whatever you pass above is 
left shifted by 10 (multiplied by 1024) so that’s why I suggested to use 4096 
(~4MB). You can also use larger values, but above you are asking for ~1GB :-)
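
In other words, with that shift-by-10 scaling (values here purely illustrative):

    fifo-size 4096     ->  4096 * 1024     =  4,194,304 bytes  (~4 MB)
    fifo-size 1000000  ->  1000000 * 1024  ~= 1 GB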

> 1 three-way handshakes in .26 seconds 3.86/s
> Test started at 745.163085
> Test finished at 746.937343
> 1073741824 bytes (1024 mbytes, 1 gbytes) in 1.77 seconds
> 605177784.33 bytes/second half-duplex
> 4.8414 gbit/second half-duplex
> 
> I need to get closer to 10 Gbps but at least there is good proof that the 
> issue is related to configuration / tuning. So, I switched back to iperf 
> testing with VCL, and I'm back to 600 Mbps, even though I can confirm that 
> the fifo sizes match what is configured in vcl.conf (note that in this test 
> run I changed that to 8 MB each for rx and tx from the previous 16, but 
> results are the same when I use 16 MB). I'm obviously missing something in 
> the configuration but I can't imagine what that might be. Below is my exact 
> startup.conf, vcl.conf and output from show session from this iperf run to 
> give the full picture, hopefully something jumps out as missing in my 
> configuration. Thank you for your patience and support with this, much 
> appreciated!

FC: Not entirely sure what the issue is, but some things can be improved. More 
below. 

> 
> [root@vpp-test-1 centos]# cat vcl.conf
> vcl {
>   rx-fifo-size 800
>   tx-fifo-size 800
>   app-scope-local
>   app-scope-global
>   api-socket-name /tmp/vpp-api.sock
> }

FC: This looks okay.

> 
> [root@vpp-test-1 centos]# cat /etc/vpp/startup.conf
> unix {
>   nodaemon
>   log /var/log/vpp/vpp.log
>   full-coredump
>   cli-listen /run/vpp/cli.sock
>   gid vpp
>   interactive
> }
> dpdk {
>   dev :00:03.0{
>   num-rx-desc 65535
>   num-tx-desc 65535

FC: Not sure about this. I don’t have any experience with vhost interfaces, but 
for XL710s I typically use 256 descriptors. It might be too low if you start 
noticing lots of rx/tx drops with “show int”. 
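
For comparison, a dev stanza with a more typical descriptor count might be 
something like this (the PCI address is a placeholder):

dev <pci-address> {
    num-rx-desc 256
    num-tx-desc 256
}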

>   }
> }
> session { evt_qs_memfd_seg }
> socksvr { socket-name /tmp/vpp-api.sock }
> api-trace {
>   on
> }
> api-segment {
>   gid vpp
> }
> cpu {
> main-core 7
> corelist-workers 4-6
> workers 3

FC: For starters, could you try this out with only 1 worker, since you’re 
testing with 1 connection. 

Also, did you try pinning iperf with taskset to a core on the same numa node as 
your vpp workers, in case you have multiple numas? Check your cpu-to-numa 
distribution with lscpu.  

You may want to pin iperf even if you have only one numa, just to be sure it 
won’t be scheduled by mistake on the cores vpp is using. 
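
For instance, something along these lines (the core number and iperf arguments 
are just placeholders):

$ lscpu | grep -i numa                  # see which cores belong to which numa node
$ taskset -c 2 iperf3 -c 10.0.0.156     # run the iperf3 client pinned to core 2, away from vpp's cores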

> }
> buffers {
> ## Increase number of buffers allocated, needed only in scenarios with
> ## large number of interfaces and worker threads. Value is per numa 
> node.
> ## Default is 16384 (8192 if running unpriviledged)
> buffers-per-numa 128000

FC: For simple testing I only use 16k, but this value actually depends on the 
number of rx/tx descriptors you have configured. 

>  
> ## Size of buffer data area
> ## Default is 2048
> default data-size 8192

FC: Are you trying to use jumbo buffers? You need to add to the tcp stanza, 
i.e., tcp { mtu  }. But for starters don’t modify the 
buffer size, just to get an idea of where performance is without this. 

Afterwards, as Jerome suggested, you may want to try tso by enabling it for 
tcp, i.e., tcp { tso } in startup.conf and enabling tso for the nic by adding 
“tso on” to the nic’s dpdk stanza (if the nic actually supports it). You don’t 
need to change the buffer size for that. 

> }
> 
> vpp# sh session verbose 2
> Thread 0: no sessions
> [1:0][T] 10.0.0.152:41737->10.0.0.156:5201ESTABLISHED
>  index: 0 flags:  timers:
>  snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
>  snd_wnd 7999488 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
>  flight size 0 out space 4413 rcv_wnd_av 7999488 tsval_recent 12893009
>  tsecr 10757431 tsecr_last_ack 10757431 tsval_recent_age 1995 snd_mss 1428
>  rto 200 rto_boff 0 srtt 3 us 3.887 rttvar 2 rtt_ts 0. rtt_seq 124
>  cong:   none algo newreno cwnd 4413 ssthresh 4194304 bytes_acked 0
>  cc space 4413 prev_cwnd 0 prev_ssthresh 0 rtx_bytes 0
>  snd_congestion 1736877166 dupack 0 limited_transmit 1736877166
>  sboard: sacked_bytes 0 last_sacked_bytes 0 lost_bytes 0
>  last_bytes_delivered 0 high_sacked 

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-06 Thread dchons
Hi Florin,

Some progress, at least with the built-in echo app, thank you for all the 
suggestions so far! By adjusting the fifo-size and testing in half-duplex I was 
able to get close to 5 Gbps between the two openstack instances using the 
built-in test echo app:

vpp# test echo clients gbytes 1 no-return fifo-size 100 uri 
tcp://10.0.0.156/
1 three-way handshakes in .26 seconds 3.86/s
Test started at 745.163085
Test finished at 746.937343
1073741824 bytes (1024 mbytes, 1 gbytes) in 1.77 seconds
605177784.33 bytes/second half-duplex
4.8414 gbit/second half-duplex

I need to get closer to 10 Gbps but at least there is good proof that the issue 
is related to configuration / tuning. So, I switched back to iperf testing with 
VCL, and I'm back to 600 Mbps, even though I can confirm that the fifo sizes 
match what is configured in vcl.conf (note that in this test run I changed that 
to 8 MB each for rx and tx from the previous 16, but results are the same when 
I use 16 MB). I'm obviously missing something in the configuration but I can't 
imagine what that might be. Below is my exact startup.conf, vcl.conf and output 
from show session from this iperf run to give the full picture, hopefully 
something jumps out as missing in my configuration. Thank you for your patience 
and support with this, much appreciated!

*[root@vpp-test-1 centos]# cat vcl.conf*
vcl {
rx-fifo-size 800
tx-fifo-size 800
app-scope-local
app-scope-global
api-socket-name /tmp/vpp-api.sock
}

*[root@vpp-test-1 centos]# cat /etc/vpp/startup.conf*
unix {
nodaemon
log /var/log/vpp/vpp.log
full-coredump
cli-listen /run/vpp/cli.sock
gid vpp
interactive
}
dpdk {
dev :00:03.0{
num-rx-desc 65535
num-tx-desc 65535
}
}
session { evt_qs_memfd_seg }
socksvr { socket-name /tmp/vpp-api.sock }
api-trace {
on
}
api-segment {
gid vpp
}
cpu {
main-core 7
corelist-workers 4-6
workers 3
}
buffers {
## Increase number of buffers allocated, needed only in scenarios with
## large number of interfaces and worker threads. Value is per numa node.
## Default is 16384 (8192 if running unpriviledged)
buffers-per-numa 128000

## Size of buffer data area
## Default is 2048
default data-size 8192
}

*vpp# sh session verbose 2*
Thread 0: no sessions
[1:0][T] 10.0.0.152:41737->10.0.0.156:5201        ESTABLISHED
index: 0 flags:  timers:
snd_una 124 snd_nxt 124 snd_una_max 124 rcv_nxt 5 rcv_las 5
snd_wnd 7999488 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 4 snd_wl2 124
flight size 0 out space 4413 rcv_wnd_av 7999488 tsval_recent 12893009
tsecr 10757431 tsecr_last_ack 10757431 tsval_recent_age 1995 snd_mss 1428
rto 200 rto_boff 0 srtt 3 us 3.887 rttvar 2 rtt_ts 0. rtt_seq 124
cong:   none algo newreno cwnd 4413 ssthresh 4194304 bytes_acked 0
cc space 4413 prev_cwnd 0 prev_ssthresh 0 rtx_bytes 0
snd_congestion 1736877166 dupack 0 limited_transmit 1736877166
sboard: sacked_bytes 0 last_sacked_bytes 0 lost_bytes 0
last_bytes_delivered 0 high_sacked 1736877166 snd_una_adv 0
cur_rxt_hole 4294967295 high_rxt 1736877166 rescue_rxt 1736877166
stats: in segs 7 dsegs 4 bytes 4 dupacks 0
out segs 7 dsegs 2 bytes 123 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 2.484
err wnd data below 0 above 0 ack below 0 above 0
pacer: bucket 42459 tokens/period .685 last_update 61908201
Rx fifo: cursize 0 nitems 799 has_event 0
head 4 tail 4 segment manager 3
vpp session 0 thread 1 app session 0 thread 0
ooo pool 0 active elts newest 4294967295
Tx fifo: cursize 0 nitems 799 has_event 0
head 123 tail 123 segment manager 3
vpp session 0 thread 1 app session 0 thread 0
ooo pool 0 active elts newest 4294967295
[1:1][T] 10.0.0.152:53460->10.0.0.156:5201        ESTABLISHED
index: 1 flags: PSH pending timers: RETRANSMIT
snd_una 160482962 snd_nxt 160735718 snd_una_max 160735718 rcv_nxt 1 rcv_las 1
snd_wnd 7999488 rcv_wnd 7999488 rcv_wscale 10 snd_wl1 1 snd_wl2 160482962
flight size 252756 out space 714 rcv_wnd_av 7999488 tsval_recent 12895476
tsecr 10759907 tsecr_last_ack 10759907 tsval_recent_age 4294966825 snd_mss 1428
rto 200 rto_boff 0 srtt 1 us 3.418 rttvar 2 rtt_ts 42.0588 rtt_seq 160485818
cong:   none algo newreno cwnd 253470 ssthresh 187782 bytes_acked 2856
cc space 714 prev_cwnd 382704 prev_ssthresh 187068 rtx_bytes 0
snd_congestion 150237062 dupack 0 limited_transmit 817908495
sboard: sacked_bytes 0 last_sacked_bytes 0 lost_bytes 0
last_bytes_delivered 0 high_sacked 150242774 snd_una_adv 0
cur_rxt_hole 4294967295 high_rxt 150235634 rescue_rxt 149855785
stats: in segs 84958 dsegs 0 bytes 0 dupacks 1237
out segs 112747 dsegs 112746 bytes 160999897 dupacks 0
fr 5 tr 0 rxt segs 185 bytes 264180 duration 2.473
err wnd data below 0 above 0 ack below 0 above 0
pacer: bucket 22180207 tokens/period 117.979 last_update 61e173e5
Rx fifo: cursize 0 nitems 799 has_event 0
head 0 tail 0 segment manager 3
vpp session 1 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
Tx fifo: cursize 799 nitems 799 has_event 1
head 482961 tail 482960 segment manager 3
vpp 

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread Florin Coras
Hi Dom,

I would actually recommend testing with iperf because it should not be slower 
than the builtin echo server/client apps. Remember to add fifo-size to your 
echo apps cli commands (something like fifo-size 4096 for 4MB) to increase the 
fifo sizes. 

Also note that you’re trying full-duplex testing. To check half-duplex, add 
no-echo to the server and no-return to client (or the other way around - in an 
airport and can’t remember the exact cli). We should probably make half-duplex 
default. 
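
Something along these lines, assuming the usual builtin echo CLI (the URI and 
sizes are placeholders):

server side:   vpp# test echo server uri tcp://10.0.0.156/5556 fifo-size 4096 no-echo
client side:   vpp# test echo clients uri tcp://10.0.0.156/5556 fifo-size 4096 gbytes 1 no-return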

I’m surprised that iperf reports throughput as small as the echo apps. Did you 
check that fifo sizes are 16MB as configured and that snd_wnd/rcv_wnd/cwnd 
reported by “show session verbose 2” are the right size?

As for the checksum issues you’re hitting, I agree. It might be that tcp 
checksum offloading does not work properly with your interfaces. 

Regards,
Florin

> On Dec 4, 2019, at 2:18 PM, dch...@akouto.com wrote:
> 
> It turns out I was using DPDK virtio, with help from Moshin I changed the 
> configuration and tried to repeat the tests using VPP native virtio, results 
> are similar but there are some interesting new observations, sharing them 
> here in case they are useful to others or trigger any ideas. 
> 
> After configuring both instances to use VPP native virtio, I used the 
> built-in echo test to see what throughput I would get, and I got the same 
> results as the modified external tcp_echo, i.e. about 600 Mbps:
> Added dpdk { no-pci } to startup.conf and configured the interface using 
> create int virtio  as per instructions from Moshin, confirmed 
> settings with show virtio pci command
> Ran the built-in test echo application to transfer 1 GB of data and got the 
> following results:
> vpp# test echo clients gbytes 1 uri tcp://10.0.0.153/5556
> 1 three-way handshakes in 0.00 seconds 2288.06/s
> Test started at 1255.753237
> Test finished at 1272.863244
> 1073741824 bytes (1024 mbytes, 1 gbytes) in 17.11 seconds
> 62755195.55 bytes/second full-duplex
> .5020 gbit/second full-duplex
> I then used iperf3 with VCL on both sides and got roughly the same results 
> (620 Mbps)
> Then I rebooted the client VM and use native Linux networking on the client 
> side with VPP on the server side, and try to repeat the iperf test
> When I use VPP-native virtio on the server side, the iperf test fails, 
> packets are dropped on the server (VPP) side, doing a trace shows packets are 
> dropped because of "bad tcp checksum"
> I then switch the server side to use DPDK virtio, the iperf test works and I 
> get 3 Gbps throughput
> So, the big performance problem is on the client (sender) side, with VPP only 
> able to get around 600 Mbps out for some reason, even when using the built-in 
> test echo application. I'm continuing my investigation to see where the 
> bottleneck is, any other ideas on where to look would be greatly appreciated.
> 
> Also, there may be a checksum bug in the VPP-native virtio driver since the 
> packets are not dropped on the server side when using the DPDK virtio driver. 
> I'd be happy to help gather more details on this, create a JIRA ticket and 
> even contribute a fix but wanted to check before going down that road, any 
> thoughts or comments?
> 
> Thanks again for all the help so far!
> 
> Regards,
> Dom
> 
> 


Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread Florin Coras
Hi Dom, 

I suspect your client/server are really bursty in sending/receiving and your 
fifos are relatively small. So probably the delay in issuing the cli in the two 
vms is enough for the receiver to drain its rx fifo. Also, whenever the rx fifo 
on the receiver fills, the sender will most probably stop sending for ~200ms 
(the persist timeout after a zero window). 

The vcl.conf parameters are only used by vcl applications. The builtin echo 
apps do not use vcl, instead they use the native C app-interface api. Both the 
server and client echo apps take the fifo size as a parameter (something like 
fifo-size 4096 for 4MB fifos). 

Regards, 
Florin

> On Dec 4, 2019, at 3:58 PM, dch...@akouto.com wrote:
> 
> Hi Florin,
> 
> Those are tcp echo results. Note that the "show session verbose 2" command 
> was issued while there was still traffic being sent. Interesting that on the 
> client (sender) side the tx fifo is full (cursize 65534 nitems 65534) and on 
> the server (receiver) side the rx fifo is empty (cursize 0 nitems 65534).
> 
> Where is the rx and tx fifo size configured? Here's my exact vcl.conf file:
> vcl {
>   rx-fifo-size 1600
>   tx-fifo-size 1600
>   app-scope-local
>   app-scope-global
>   api-socket-name /tmp/vpp-api.sock
> }
> 
> Is this what those values should match?
> 
> Thanks,
> Dom
> 


Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread dchons
Hi Florin,

Those are tcp echo results. Note that the "show session verbose 2" command was 
issued while there was still traffic being sent. Interesting that on the client 
(sender) side the tx fifo is full (cursize 65534 nitems 65534) and on the 
server (receiver) side the rx fifo is empty (cursize 0 nitems 65534).

Where is the rx and tx fifo size configured? Here's my exact vcl.conf file:
vcl {
rx-fifo-size 1600
tx-fifo-size 1600
app-scope-local
app-scope-global
api-socket-name /tmp/vpp-api.sock
}

Is this what those values should match?

Thanks,
Dom


Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread Florin Coras
Hi Dom, 

[traveling so a quick reply]

For some reason, your rx/tx fifos (see nitems), and implicitly the snd and rcv 
wnd, are 64kB in the logs below. Is this the tcp echo or the iperf result?

Regards,
Florin

> On Dec 4, 2019, at 7:29 AM, dch...@akouto.com wrote:
> 
> Hi,
> 
> Thank you Florin and Jerome for your time, very much appreciated.
> 
> For VCL configuration, FIFO sizes are 16 MB
> "show session verbose 2" does not indicate any retransmissions. Here are the 
> numbers during a test run where approx. 9 GB were transferred (the difference 
> in values between client and server is just because it took me a few seconds 
> to issue the command on the client side as you can see from the duration):
> SERVER SIDE:
>  stats: in segs 5989307 dsegs 5989306 bytes 8544661342 dupacks 0
> out segs 3942513 dsegs 0 bytes 0 dupacks 0
> fr 0 tr 0 rxt segs 0 bytes 0 duration 106.489
> err wnd data below 0 above 0 ack below 0 above 0
> CLIENT SIDE:
>  stats: in segs 4207793 dsegs 0 bytes 0 dupacks 0
> out segs 6407444 dsegs 6407443 bytes 9141373892 dupacks 0
> fr 0 tr 0 rxt segs 0 bytes 0 duration 114.113
> err wnd data below 0 above 0 ack below 0 above 0
> sh int does not seem to indicate any issue. There are occasional drops but I 
> enabled tracing and checked those out, they are LLC BPDU's, I'm not sure 
> where those are coming from but I suspect they are from linuxbridge in the 
> compute host where the VMs are running.
> @Jerome: Before I use the dpdk-devbind command to make the interfaces 
> available to VPP, they use virtio drivers. When assigned to VPP they use 
> uio_pci_generic.
> 
> I'm not sure if any other stats might be useful so I'm just pasting a bunch 
> of stats & information from the client & server instances below, I know it's 
> a lot, just putting it here in case there is something useful in there. 
> Thanks again for taking the time to follow-up with me and for the 
> suggestions, I really do appreciate it very much!
> 
> Regards,
> Dom
> 
> #
> # Interface uses virtio-pci when the iperf3 test is run using regular Linux
> # networking. 
> #
> [root@vpp-test-1 centos]# dpdk-devbind --status
>  
> Network devices using kernel driver
> ===
> :00:03.0 'Virtio network device 1000' if=eth0 drv=virtio-pci 
> unused=virtio_pci *Active*
> :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci 
> unused=virtio_pci *Active*
>  
> #
> # Interface uses uio_pci_generic when set up for VPP
> #
>  
> [root@vpp-test-1 centos]# dpdk-devbind --status
>  
> Network devices using DPDK-compatible driver
> 
> :00:03.0 'Virtio network device 1000' drv=uio_pci_generic 
> unused=virtio_pci
>  
> Network devices using kernel driver
> ===
> :00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci 
> unused=virtio_pci,uio_pci_generic *Active*
>  
>  
> vpp# sh hardware-interfaces
>   Name                Idx   Link  Hardware
> GigabitEthernet0/3/0   1 up   GigabitEthernet0/3/0
>   Link speed: 10 Gbps
>   Ethernet address fa:16:3e:10:5e:4b
>   Red Hat Virtio
> carrier up full duplex mtu 9206
> flags: admin-up pmd maybe-multiseg
> rx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1)
> tx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1)
> pci: device 1af4:1000 subsystem 1af4:0001 address :00:03.00 numa 0
> max rx packet len: 9728
> promiscuous: unicast off all-multicast on
> vlan offload: strip off filter off qinq off
> rx offload avail:  vlan-strip udp-cksum tcp-cksum tcp-lro vlan-filter
>jumbo-frame
> rx offload active: jumbo-frame
> tx offload avail:  vlan-insert udp-cksum tcp-cksum tcp-tso multi-segs
> tx offload active: multi-segs
> rss avail: none
> rss active:none
> tx burst function: virtio_xmit_pkts
> rx burst function: virtio_recv_mergeable_pkts
>  
> rx frames ok 467
> rx bytes ok27992
> extended stats:
>   rx good packets467
>   rx good bytes27992
>   rx q0packets   467
>   rx q0bytes   27992
>   rx q0 good packets 467
>   rx q0 good bytes 27992
>   rx q0 multicast packets465
>   rx q0 broadcast packets  2
> 

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread dchons
It turns out I was using DPDK virtio, with help from Moshin I changed the 
configuration and tried to repeat the tests using VPP native virtio, results 
are similar but there are some interesting new observations, sharing them here 
in case they are useful to others or trigger any ideas.

After configuring both instances to use VPP native virtio, I used the built-in 
echo test to see what throughput I would get, and I got the same results as the 
modified external tcp_echo, i.e. about 600 Mbps:

* Added *dpdk { no-pci }* to startup.conf and configured the interface using 
*create int virtio * as per instructions from Moshin, confirmed 
settings with *show virtio pci* command
* Ran the built-in test echo application to transfer 1 GB of data and got the 
following results:

*vpp# test echo clients gbytes 1 uri tcp://10.0.0.153/5556*
1 three-way handshakes in 0.00 seconds 2288.06/s
Test started at 1255.753237
Test finished at 1272.863244
1073741824 bytes (1024 mbytes, 1 gbytes) in 17.11 seconds
62755195.55 bytes/second full-duplex
*.5020 gbit/second full-duplex*

* I then used iperf3 with VCL on both sides and got roughly the same results 
(620 Mbps)
* Then I rebooted the client VM and use native Linux networking on the client 
side with VPP on the server side, and try to repeat the iperf test

* When I use VPP-native virtio on the server side, the iperf test fails, 
packets are dropped on the server (VPP) side, doing a trace shows packets are 
dropped because of "bad tcp checksum"
* I then switch the server side to use DPDK virtio, the iperf test works and I 
get 3 Gbps throughput

So, the big performance problem is on the client (sender) side, with VPP only 
able to get around 600 Mbps out for some reason, even when using the built-in 
test echo application. I'm continuing my investigation to see where the 
bottleneck is, any other ideas on where to look would be greatly appreciated.

Also, there may be a checksum bug in the VPP-native virtio driver since the 
packets are not dropped on the server side when using the DPDK virtio driver. 
I'd be happy to help gather more details on this, create a JIRA ticket and even 
contribute a fix but wanted to check before going down that road, any thoughts 
or comments?

Thanks again for all the help so far!

Regards,
Dom


Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread Jerome Tollet via Lists.Fd.Io
Are you using VPP native virtio or DPDK virtio?
Jerome

From:  on behalf of "dch...@akouto.com" 
Date: Wednesday, December 4, 2019 at 16:29
To: "vpp-dev@lists.fd.io" 
Subject: Re: [vpp-dev] VPP / tcp_echo performance

Hi,

Thank you Florin and Jerome for your time, very much appreciated.
· For VCL configuration, FIFO sizes are 16 MB
· "show session verbose 2" does not indicate any retransmissions. Here 
are the numbers during a test run where approx. 9 GB were transferred (the 
difference in values between client and server is just because it took me a few 
seconds to issue the command on the client side as you can see from the 
duration):
SERVER SIDE:
 stats: in segs 5989307 dsegs 5989306 bytes 8544661342 dupacks 0
out segs 3942513 dsegs 0 bytes 0 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 106.489
err wnd data below 0 above 0 ack below 0 above 0
CLIENT SIDE:
 stats: in segs 4207793 dsegs 0 bytes 0 dupacks 0
out segs 6407444 dsegs 6407443 bytes 9141373892 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 114.113
err wnd data below 0 above 0 ack below 0 above 0
· sh int does not seem to indicate any issue. There are occasional 
drops but I enabled tracing and checked those out, they are LLC BPDU's, I'm not 
sure where those are coming from but I suspect they are from linuxbridge in the 
compute host where the VMs are running.
· @Jerome: Before I use the dpdk-devbind command to make the interfaces 
available to VPP, they use virtio drivers. When assigned to VPP they use 
uio_pci_generic.

I'm not sure if any other stats might be useful so I'm just pasting a bunch of 
stats & information from the client & server instances below, I know it's a 
lot, just putting it here in case there is something useful in there. Thanks 
again for taking the time to follow-up with me and for the suggestions, I 
really do appreciate it very much!

Regards,
Dom
#
# Interface uses virtio-pci when the iperf3 test is run using regular Linux
# networking.
#
[root@vpp-test-1 centos]# dpdk-devbind --status

Network devices using kernel driver
===
:00:03.0 'Virtio network device 1000' if=eth0 drv=virtio-pci 
unused=virtio_pci *Active*
:00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci 
unused=virtio_pci *Active*

#
# Interface uses uio_pci_generic when set up for VPP
#

[root@vpp-test-1 centos]# dpdk-devbind --status

Network devices using DPDK-compatible driver

:00:03.0 'Virtio network device 1000' drv=uio_pci_generic unused=virtio_pci

Network devices using kernel driver
===
:00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci 
unused=virtio_pci,uio_pci_generic *Active*


vpp# sh hardware-interfaces
  Name                Idx   Link  Hardware
GigabitEthernet0/3/0   1 up   GigabitEthernet0/3/0
  Link speed: 10 Gbps
  Ethernet address fa:16:3e:10:5e:4b
  Red Hat Virtio
carrier up full duplex mtu 9206
flags: admin-up pmd maybe-multiseg
rx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1)
tx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1)
pci: device 1af4:1000 subsystem 1af4:0001 address :00:03.00 numa 0
max rx packet len: 9728
promiscuous: unicast off all-multicast on
vlan offload: strip off filter off qinq off
rx offload avail:  vlan-strip udp-cksum tcp-cksum tcp-lro vlan-filter
   jumbo-frame
rx offload active: jumbo-frame
tx offload avail:  vlan-insert udp-cksum tcp-cksum tcp-tso multi-segs
tx offload active: multi-segs
rss avail: none
rss active:none
tx burst function: virtio_xmit_pkts
rx burst function: virtio_recv_mergeable_pkts

rx frames ok 467
rx bytes ok27992
extended stats:
  rx good packets467
  rx good bytes27992
  rx q0packets   467
  rx q0bytes   27992
  rx q0 good packets 467
  rx q0 good bytes 27992
  rx q0 multicast packets465
  rx q0 broadcast packets  2
  rx q0 undersize packets467


#
# Dropped packets are LLC BPDUs, not

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread dchons
Hi,

Thank you Florin and Jerome for your time, very much appreciated.

* For VCL configuration, FIFO sizes are 16 MB
* "show session verbose 2" does not indicate any retransmissions. Here are the 
numbers during a test run where approx. 9 GB were transferred (the difference 
in values between client and server is just because it took me a few seconds to 
issue the command on the client side as you can see from the duration):

*SERVER SIDE:*
stats: in segs 5989307 dsegs 5989306 bytes 8544661342 dupacks 0
out segs 3942513 dsegs 0 bytes 0 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 106.489
err wnd data below 0 above 0 ack below 0 above 0
*CLIENT SIDE:*
stats: in segs 4207793 dsegs 0 bytes 0 dupacks 0
out segs 6407444 dsegs 6407443 bytes 9141373892 dupacks 0
fr 0 tr 0 rxt segs 0 bytes 0 duration 114.113
err wnd data below 0 above 0 ack below 0 above 0

* sh int does not seem to indicate any issue. There are occasional drops but I 
enabled tracing and checked those out, they are LLC BPDU's, I'm not sure where 
those are coming from but I suspect they are from linuxbridge in the compute 
host where the VMs are running.
* *@Jerome* : Before I use the dpdk-devbind command to make the interfaces 
available to VPP, they use virtio drivers. When assigned to VPP they use 
uio_pci_generic.

I'm not sure if any other stats might be useful so I'm just pasting a bunch of 
stats & information from the client & server instances below, I know it's a 
lot, just putting it here in case there is something useful in there. Thanks 
again for taking the time to follow-up with me and for the suggestions, I 
really do appreciate it very much!

Regards,
Dom

*#*
*# Interface uses virtio-pci when the iperf3 test is run using regular Linux*
*# networking.*
*#*
*[root@vpp-test-1 centos]# dpdk-devbind --status*

Network devices using kernel driver
===
:00:03.0 'Virtio network device 1000' if=eth0 drv=virtio-pci 
unused=virtio_pci *Active*
:00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci 
unused=virtio_pci *Active*

*#*
*# Interface uses uio_pci_generic when set up for VPP*
*#*

*[root@vpp-test-1 centos]# dpdk-devbind --status*

Network devices using DPDK-compatible driver

:00:03.0 'Virtio network device 1000' drv=uio_pci_generic unused=virtio_pci

Network devices using kernel driver
===
:00:04.0 'Virtio network device 1000' if=eth1 drv=virtio-pci 
unused=virtio_pci,uio_pci_generic *Active*

*vpp# sh hardware-interfaces*
Name                Idx   Link  Hardware
GigabitEthernet0/3/0               1     up   GigabitEthernet0/3/0
Link speed: 10 Gbps
Ethernet address fa:16:3e:10:5e:4b
Red Hat Virtio
carrier up full duplex mtu 9206
flags: admin-up pmd maybe-multiseg
rx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1)
tx: queues 1 (max 1), desc 256 (min 0 max 65535 align 1)
pci: device 1af4:1000 subsystem 1af4:0001 address :00:03.00 numa 0
max rx packet len: 9728
promiscuous: unicast off all-multicast on
vlan offload: strip off filter off qinq off
rx offload avail:  vlan-strip udp-cksum tcp-cksum tcp-lro vlan-filter
jumbo-frame
rx offload active: jumbo-frame
tx offload avail:  vlan-insert udp-cksum tcp-cksum tcp-tso multi-segs
tx offload active: multi-segs
rss avail:         none
rss active:        none
tx burst function: virtio_xmit_pkts
rx burst function: virtio_recv_mergeable_pkts

rx frames ok                                         467
rx bytes ok                                        27992
extended stats:
rx good packets                                    467
rx good bytes                                    27992
rx q0packets                                       467
rx q0bytes                                       27992
rx q0 good packets                                 467
rx q0 good bytes                                 27992
rx q0 multicast packets                            465
rx q0 broadcast packets                              2
rx q0 undersize packets                            467

*#*
*# Dropped packets are LLC BPDUs, not sure but probably a linuxbridge thing*
*#*
*vpp# show trace*
--- Start of thread 0 vpp_main ---
No packets in trace buffer
--- Start of thread 1 vpp_wk_0 ---
Packet 1

00:08:35:202159: dpdk-input
GigabitEthernet0/3/0 rx queue 0
buffer 0xfee2f4: current data 0, length 60, buffer-pool 0, ref-count 1, 
totlen-nifb 0, trace handle 0x100
ext-hdr-valid

Re: [vpp-dev] VPP / tcp_echo performance

2019-12-04 Thread Jerome Tollet via Lists.Fd.Io
Hi Dom,
In addition to Florin’s questions, can you clarify what you mean by 
“…interfaces are assigned to DPDK/VPP”? What driver are you using?
Regards,
Jerome


From:  on behalf of Florin Coras 
Date: Wednesday, December 4, 2019 at 02:31
To: "dch...@akouto.com" 
Cc: "vpp-dev@lists.fd.io" 
Subject: Re: [vpp-dev] VPP / tcp_echo performance

Hi Dom,

I’ve never tried to run the stack in a VM, so not sure about the expected 
performance, but here are a couple of comments:
- What fifo sizes are you using? Are they at least 4MB (see [1] for VCL 
configuration).
- I don’t think you need to configure more than 16k buffers/numa.

Additionally, to get more information on the issue:
- What does “show session verbose 2” report? Check the stats section for 
retransmit counts (tr - timer retransmit, fr - fast retransmit) which if 
non-zero indicate that packets are lost.
- Check interface rx/tx error counts with “show int”.
- Typically, for improved performance, you should write more than 1.4kB per 
call. But the fact that your average is less than 1.4kB suggests that you often 
find the fifo full or close to full. So probably the issue is not your sender 
app.

Regards,
Florin

[1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf


On Dec 3, 2019, at 11:40 AM, dch...@akouto.com wrote:

Hi all,
I've been running some performance tests and not quite getting the results I 
was hoping for, and have a couple of related questions I was hoping someone 
could provide some tips with. For context, here's a summary of the results of 
TCP tests I've run on two VMs (CentOS 7 OpenStack instances, host-1 is the 
client and host-2 is the server):
· Running iperf3 natively before the interfaces are assigned to 
DPDK/VPP: 10 Gbps TCP throughput
· Running iperf3 with VCL/HostStack: 3.5 Gbps TCP throughput
· Running a modified version of the tcp_echo application (similar 
results with socket and svm api): 610 Mbps throughput
Things I've tried to improve performance:
· Anything I could apply from 
https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)
· Added tcp { cc-algo cubic } to VPP startup config
· Using isolcpus and VPP startup config options, allocated first 2, then 
4 and finally 6 of the 8 available cores to VPP main & worker threads
· In VPP startup config set "buffers-per-numa 65536" and "default 
data-size 4096"
· Updated grub boot options to include hugepagesz=1GB hugepages=64 
default_hugepagesz=1GB
My goal is to achieve at least the same throughput using VPP as I get when I 
run iperf3 natively on the same network interfaces (in this case 10 Gbps).

A couple of related questions:
· Given the items above, do any VPP or kernel configuration items jump 
out that I may have missed that could justify the difference in native vs VPP 
performance or help get the two a bit closer?
· In the modified tcp_echo application, n_sent = app_send_stream(...) 
is called in a loop always using the same length (1400 bytes) in my test 
version. The return value n_sent indicates that the average bytes sent is only 
around 130 bytes per call after some run time. Are there any parameters or 
options that might improve this?
Any tips or pointers to documentation that might shed some light would be 
hugely appreciated!

Regards,
Dom



Re: [vpp-dev] VPP / tcp_echo performance

2019-12-03 Thread Florin Coras
Hi Dom, 

I’ve never tried to run the stack in a VM, so not sure about the expected 
performance, but here are a couple of comments:
- What fifo sizes are you using? Are they at least 4MB (see [1] for VCL 
configuration)? 
- I don’t think you need to configure more than 16k buffers/numa. 

Additionally, to get more information on the issue:
- What does “show session verbose 2” report? Check the stats section for 
retransmit counts (tr - timer retransmit, fr - fast retransmit), which, if 
non-zero, indicate that packets are lost. 
- Check interface rx/tx error counts with “show int”. 
- Typically, for improved performance, you should write more than 1.4kB per 
call. But the fact that your average is less than 1.4kB suggests that you often 
find the fifo full or close to full. So probably the issue is not your sender 
app. 
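
To make that concrete, here is a small self-contained C sketch (not VPP code) 
of the usual pattern: ask for a large chunk per call, advance only by the 
number of bytes the fifo actually accepted, and retry the remainder once space 
frees up. fake_enqueue() and fake_drain() are hypothetical stand-ins for the 
application's real enqueue call (app_send_stream in the modified tcp_echo) and 
for the stack draining the fifo; all sizes are illustrative.

#include <stdio.h>
#include <string.h>

#define FIFO_SIZE (64 * 1024)   /* illustrative; real fifos should be MBs */
#define CHUNK     (16 * 1024)   /* request large chunks, not ~1.4 kB */

static char fifo[FIFO_SIZE];
static size_t fifo_used;

/* Stand-in for the real enqueue: accepts at most the free space left. */
static size_t fake_enqueue (const char *data, size_t len)
{
  size_t space = FIFO_SIZE - fifo_used;
  size_t n = len < space ? len : space;
  memcpy (fifo + fifo_used, data, n);
  fifo_used += n;
  return n;
}

/* Pretend the stack drained some bytes, e.g. after a tx event. */
static void fake_drain (size_t n)
{
  fifo_used = n < fifo_used ? fifo_used - n : 0;
}

int main (void)
{
  static char payload[256 * 1024];
  memset (payload, 'x', sizeof (payload));

  size_t off = 0;
  while (off < sizeof (payload))
    {
      size_t want = sizeof (payload) - off;
      if (want > CHUNK)
        want = CHUNK;
      size_t sent = fake_enqueue (payload + off, want);
      off += sent;               /* advance only by what was accepted */
      if (sent < want)
        fake_drain (32 * 1024);  /* fifo full: wait for space, then retry */
    }
  printf ("queued %zu bytes\n", off);
  return 0;
}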

Regards,
Florin

[1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf

> On Dec 3, 2019, at 11:40 AM, dch...@akouto.com wrote:
> 
> Hi all,
> 
> I've been running some performance tests and not quite getting the results I 
> was hoping for, and have a couple of related questions I was hoping someone 
> could provide some tips with. For context, here's a summary of the results of 
> TCP tests I've run on two VMs (CentOS 7 OpenStack instances, host-1 is the 
> client and host-2 is the server):
> Running iperf3 natively before the interfaces are assigned to DPDK/VPP: 10 
> Gbps TCP throughput
> Running iperf3 with VCL/HostStack: 3.5 Gbps TCP throughput
> Running a modified version of the tcp_echo application (similar results with 
> socket and svm api): 610 Mbps throughput
> Things I've tried to improve performance:
> Anything I could apply from 
> https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)
> Added tcp { cc-algo cubic } to VPP startup config
> Using isolcpus and VPP startup config options, allocated first 2, then 4 and 
> finally 6 of the 8 available cores to VPP main & worker threads
> In VPP startup config set "buffers-per-numa 65536" and "default data-size 
> 4096"
> Updated grub boot options to include hugepagesz=1GB hugepages=64 
> default_hugepagesz=1GB
> My goal is to achieve at least the same throughput using VPP as I get when I 
> run iperf3 natively on the same network interfaces (in this case 10 Gbps).
>  
> A couple of related questions:
> Given the items above, do any VPP or kernel configuration items jump out that 
> I may have missed that could justify the difference in native vs VPP 
> performance or help get the two a bit closer?
> In the modified tcp_echo application, n_sent = app_send_stream(...) is called 
> in a loop always using the same length (1400 bytes) in my test version. The 
> return value n_sent indicates that the average bytes sent is only around 130 
> bytes per call after some run time. Are there any parameters or options that 
> might improve this?
> Any tips or pointers to documentation that might shed some light would be 
> hugely appreciated!
>  
> Regards,
> Dom
>  


[vpp-dev] VPP / tcp_echo performance

2019-12-03 Thread dchons
Hi all,

I've been running some performance tests and not quite getting the results I 
was hoping for, and have a couple of related questions I was hoping someone 
could provide some tips with. For context, here's a summary of the results of 
TCP tests I've run on two VMs (CentOS 7 OpenStack instances, host-1 is the 
client and host-2 is the server):

* Running iperf3 natively before the interfaces are assigned to DPDK/VPP: 10 
Gbps TCP throughput
* Running iperf3 with VCL/HostStack: 3.5 Gbps TCP throughput
* Running a modified version of the tcp_echo application (similar results 
with socket and svm api): 610 Mbps throughput

Things I've tried to improve performance:

* Anything I could apply from 
https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)
* Added tcp { cc-algo cubic } to VPP startup config
* Using isolcpus and VPP startup config options, allocated first 2, then 4 and 
finally 6 of the 8 available cores to VPP main & worker threads
* In VPP startup config set "buffers-per-numa 65536" and "default data-size 
4096" (a combined startup.conf sketch follows this list)
* Updated grub boot options to include hugepagesz=1GB hugepages=64 
default_hugepagesz=1GB
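
Pulling the startup-config items above together, a sketch of the relevant 
startup.conf sections might look like the following. The core numbers are 
illustrative rather than what was actually used, the cpu { } stanza names are 
assumed to be the usual main-core/corelist-workers options, and the hugepage 
settings stay on the kernel command line (the grub options above) rather than 
in startup.conf:

cpu {
  main-core 1
  corelist-workers 2-3
}

buffers {
  buffers-per-numa 65536
  default data-size 4096
}

tcp {
  cc-algo cubic
}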

My goal is to achieve at least the same throughput using VPP as I get when I 
run iperf3 natively on the same network interfaces (in this case 10 Gbps).

A couple of related questions:

* Given the items above, do any VPP or kernel configuration items jump out that 
I may have missed that could justify the difference in native vs VPP 
performance or help get the two a bit closer?
* In the modified tcp_echo application, n_sent = app_send_stream(...) is 
called in a loop always using the same length (1400 bytes) in my test version. 
The return value n_sent indicates that the average bytes sent is only around 
130 bytes per call after some run time. Are there any parameters or options 
that might improve this?

Any tips or pointers to documentation that might shed some light would be 
hugely appreciated!

Regards,
Dom