Thanks, Damjan, for the hints. The second point you mentioned looks
interesting. I did see drops on the rx queue, especially with the “show
hardware” command.
I was able to see a significant performance improvement with multiple worker
threads, each pinned to a dedicated logical core. There are 4 worker threads,
each polling one port: two for the 10G ports and two for the 1G ports. There is
one Rx queue configured per port. The 1G ports are admin down. Below are the
rx-placement and thread placement. With this config, I am able to send/receive
95% of 10G traffic (1500-byte frames) in both directions without any drops.
However, when I admin-enable the 1G ports, “rx miss” and “tx-error” drops start
to appear on the 10G ports. I had to bring the traffic down to 85% of 10G to
see no drops. Any thoughts on why this could be happening? There is a separate
worker thread polling each of the 1G port queues, and there is no traffic on
the 1G ports, so I am looking to understand what the relation is.
vpp# show interface rx-placement
Thread 1 (vpp_wk_0):
  node dpdk-input:
    TenGigabitEthernet3/0/0 queue 0 (polling)
Thread 2 (vpp_wk_1):
  node dpdk-input:
    TenGigabitEthernet3/0/1 queue 0 (polling)
Thread 3 (vpp_wk_2):
  node dpdk-input:
    GigabitEthernet5/0/0 queue 0 (polling)
Thread 4 (vpp_wk_3):
  node dpdk-input:
    GigabitEthernet5/0/1 queue 0 (polling)
vpp# show threads
ID  Name      Type     LWP   Sched Policy (Priority)  lcore  Core  Socket  State
0   vpp_main           1454  other (0)                2      2     0
1   vpp_wk_0  workers  1458  other (0)                10     2     0
2   vpp_wk_1  workers  1459  other (0)                11     3     0
3   vpp_wk_2  workers  1460  other (0)                12     4     0
4   vpp_wk_3  workers  1461  other (0)                13     5     0
5   stats              1462  other (0)                0      0     0
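One thing the thread table allows checking is which threads land on the same physical core. The sketch below groups the rows pasted above by (Socket, Core); the helper name is illustrative, and it assumes the "Core" column is the physical core id. It only reports topology from this output and does not by itself explain the drops.

```python
from collections import defaultdict

# Rows copied from the "show threads" output above: (name, lcore, core, socket).
THREADS = [
    ("vpp_main", 2, 2, 0),
    ("vpp_wk_0", 10, 2, 0),
    ("vpp_wk_1", 11, 3, 0),
    ("vpp_wk_2", 12, 4, 0),
    ("vpp_wk_3", 13, 5, 0),
    ("stats", 0, 0, 0),
]


def threads_sharing_a_core(threads):
    """Group thread names by (socket, core) and keep groups with more than one thread."""
    by_core = defaultdict(list)
    for name, _lcore, core, socket in threads:
        by_core[(socket, core)].append(name)
    return {key: names for key, names in by_core.items() if len(names) > 1}


shared = threads_sharing_a_core(THREADS)
for (socket, core), names in sorted(shared.items()):
    print(f"socket {socket} core {core}: {', '.join(names)}")
```

With the rows above, the only shared entry is core 2 on socket 0, which hosts both vpp_main (lcore 2) and vpp_wk_0 (lcore 10).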
Thanks,
Vijay
From: Damjan Marion <[email protected]>
Date: Tuesday, March 26, 2019 at 2:30 AM
To: "Chandra Mohan, Vijay Mohan" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: [**EXTERNAL**] Re: [vpp-dev] VPP Performance question
Few hints:
1. When you observe “show run” statistics, always do the following:
   1. start traffic
   2. clear run
   3. wait a bit
   4. show run
   Otherwise the statistics will show you an average that includes the period
   without traffic.
2. Debug CLI commands typically cause a barrier sync (unless the handler is
   explicitly marked as thread safe), and that can stop worker threads for more
   than 500 usec. In such situations it is normal and expected that you will
   observe a small amount of rx tail drops, simply because the worker is not
   servicing a specific NIC queue for a significant amount of time.
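To put the second hint into numbers, here is a rough sketch of how many packets arrive while a worker is stalled and how long a stall one rx ring can absorb. It assumes 1500-byte frames including FCS, 20 bytes of preamble and inter-frame gap per frame, the 95% load from the reply above, and the queue depth of 1024 from the original message; the helper names are illustrative.

```python
PREAMBLE_AND_IFG_BYTES = 20  # assumed: 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps, frame_bytes, load):
    """Packet rate on one port for a frame size (including FCS) and a load fraction."""
    wire_bits = (frame_bytes + PREAMBLE_AND_IFG_BYTES) * 8
    return link_bps * load / wire_bits


def packets_during_stall(pps, stall_seconds):
    """Packets that arrive on one queue while nobody is polling it."""
    return pps * stall_seconds


def stall_to_fill_ring(pps, ring_size):
    """Stall length (seconds) after which an empty ring of ring_size descriptors is full."""
    return ring_size / pps


pps = packets_per_second(10e9, 1500, 0.95)
arrived = packets_during_stall(pps, 500e-6)
fill_time = stall_to_fill_ring(pps, 1024)

print(f"rate per port: {pps:.0f} pps")
print(f"packets arriving in 500 usec: {arrived:.0f}")
print(f"stall needed to fill 1024 descriptors: {fill_time * 1e6:.0f} usec")
```

Under these assumptions a stall of exactly 500 usec brings in roughly 390 packets per queue, so the ring would overflow only if the stall runs past about 1.3 ms or the ring is already partly full when the barrier starts.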
On 26 Mar 2019, at 04:04, Chandra Mohan, Vijay Mohan <[email protected]> wrote:
Hi Everyone,
I am working on measuring performance with an xconnect of two
sub-interfaces. I saw quite a few performance-related questions and answers
in the community, which were very helpful in getting to this point. However,
I’m still facing rx and tx queue drops (“rx misses” and “tx-error”).
Here is the config:
l2 xconnect TenGigabitEthernet3/0/0.1 TenGigabitEthernet3/0/1.1
l2 xconnect TenGigabitEthernet3/0/1.1 TenGigabitEthernet3/0/0.1
I’m passing traffic at 70% of line rate (10G) in both directions and I do not
see any drops. I ran the traffic for 30 minutes with no drops. Below are the
runtime stats. CPU affinity is in place and “vpp_wk_0” is on dedicated logical
core 9. However, I see that “vectors/node” is 26.03. I was expecting to see
255.99. Is that something that can be seen only with high bursts of traffic?
I may be missing something here and am looking to understand what that may be.
Thread 1 vpp_wk_0 (lcore 9)
Time 531.8, average vectors/node 26.03, last 128 main loops 1.31 per node 21.00
vector rates in 1.1539e6, out 1.1539e6, drop 0.0000e0, punt 0.0000e0
Name                              State       Calls     Vectors  Suspends   Clocks  Vectors/Call
TenGigabitEthernet3/0/0-output    active   18857661   306865019         0   1.54e2         16.27
TenGigabitEthernet3/0/0-tx        active   18857661   306865019         0   2.62e2         16.27
TenGigabitEthernet3/0/1-output    active   18857661   306865172         0   1.63e2         16.27
TenGigabitEthernet3/0/1-tx        active   18857661   306865172         0   2.67e2         16.27
dpdk-input                        polling  18857661   613730191         0   4.48e2         32.55
ethernet-input                    active   18864470   613730191         0   6.92e2         32.53
l2-input                          active   18864470   613730191         0   1.15e2         32.53
l2-output                         active   18864470   613730191         0   1.31e2         32.53
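As a sanity check on this output, the reported vector rate can be compared with the offered load, and Vectors/Call can be recomputed from the Calls and Vectors columns. A minimal sketch, assuming 1500-byte frames that include the FCS and 20 bytes of preamble plus inter-frame gap per frame (these overheads are assumptions, not stated in the thread):

```python
def offered_pps(link_bps, frame_bytes, load, directions=2):
    """Aggregate packet rate seen by one worker that handles both directions."""
    wire_bits = (frame_bytes + 20) * 8  # assumed 20 bytes of preamble + inter-frame gap
    return directions * link_bps * load / wire_bits


def vectors_per_call(vectors, calls):
    """Average number of packets handed to a node per invocation."""
    return vectors / calls


expected = offered_pps(10e9, 1500, 0.70)
reported = 1.1539e6  # "vector rates in" from the output above
dpdk_input_vpc = vectors_per_call(613730191, 18857661)

print(f"expected {expected:.4e} pps, reported {reported:.4e} pps")
print(f"dpdk-input vectors/call: {dpdk_input_vpc:.2f}")
```

The two rates agree to within about half a percent, so the worker appears to be seeing the full 70% load in both directions, and the recomputed dpdk-input figure matches the 32.55 shown in the table.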
There are two rx-queues and two tx-queues assigned to each of the 10 Gig ports.
Queue depth is 1024. Following is the queue placement:
Thread 1 (vpp_wk_0):
  node dpdk-input:
    TenGigabitEthernet3/0/0 queue 0 (polling)
    TenGigabitEthernet3/0/0 queue 1 (polling)
    TenGigabitEthernet3/0/1 queue 0 (polling)
    TenGigabitEthernet3/0/1 queue 1 (polling)
Now, when I increase the rate to 75% of 10G, I am seeing drops due to “rx-miss”:
DBGvpp# sho int
Name                        Idx  State  MTU (L3/IP4/IP6/MPLS)  Counter           Count
TenGigabitEthernet3/0/0     1    up     9000/0/0/0             rx packets     26235935
                                                               rx bytes    39248958760
                                                               tx packets     26236104
                                                               tx bytes    39249211584
                                                               rx-miss             697
TenGigabitEthernet3/0/0.1   3    up     0/0/0/0                rx packets     26235935
                                                               rx bytes    39248958760
                                                               tx packets     26236104
                                                               tx bytes    39249211584
TenGigabitEthernet3/0/1     2    up     9000/0/0/0             rx packets     26236104
                                                               rx bytes    39249211584
                                                               tx packets     26235935
                                                               tx bytes    39248958760
                                                               rx-miss             711
TenGigabitEthernet3/0/1.1   4    up     0/0/0/0                rx packets     26236104
                                                               rx bytes    39249211584
                                                               tx packets     26235935
                                                               tx bytes    39248958760
local0                      0    down   0/0/0/0
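The counters above are enough to work out the average frame size and how large the rx-miss loss is relative to the traffic. A small sketch using the TenGigabitEthernet3/0/0 numbers; treating "rx packets + rx-miss" as the total offered to the port is an assumption about how the counters relate.

```python
# Counters copied from the "sho int" output for TenGigabitEthernet3/0/0.
rx_packets = 26235935
rx_bytes = 39248958760
rx_miss = 697

# Average bytes per received packet as counted by VPP.
avg_bytes = rx_bytes / rx_packets

# Assumption: missed packets are not included in "rx packets", so the
# offered total is the sum of both counters.
miss_ppm = rx_miss / (rx_packets + rx_miss) * 1e6

print(f"average packet size: {avg_bytes:.1f} bytes")
print(f"rx-miss: {miss_ppm:.1f} packets per million offered")
```

The average comes out at 1496 bytes, which would be consistent with 1500-byte frames if the byte counter excludes the 4-byte FCS, and the loss is on the order of tens of packets per million, which looks more like brief bursts of overflow than sustained overload.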
Here is the runtime stats when that happens:
Thread 1 vpp_wk_0 (lcore 9)
Time 59.0, average vectors/node 34.58, last 128 main loops 1.69 per node 27.00
vector rates in 1.2365e6, out 1.2365e6, drop 0.0000e0, punt 0.0000e0
Name                              State       Calls     Vectors  Suspends   Clocks  Vectors/Call
TenGigabitEthernet3/0/0-output    active    1682608    36482575         0   1.33e2         21.68
TenGigabitEthernet3/0/0-tx        active    1682608    36482575         0   2.48e2         21.68
TenGigabitEthernet3/0/1-output    active    1682608    36482560         0   1.42e2         21.68
TenGigabitEthernet3/0/1-tx        active    1682608    36482560         0   2.53e2         21.68
dpdk-input                        polling   1682608    72965135         0   4.11e2         43.36
ethernet-input                    active    1691495    72965135         0   6.77e2         43.14
l2-input                          active    1691495    72965135         0   1.08e2         43.14
l2-output                         active    1691495    72965135         0   1.07e2         43.14
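The Clocks column in the two runtime outputs can be combined into a rough per-packet cost and multiplied by the vector rate. The sketch below assumes every packet crosses dpdk-input, ethernet-input, l2-input and l2-output once plus the output/tx pair of one port, and that traffic is split evenly between the two ports; the helper name is illustrative.

```python
def clocks_per_packet(shared_nodes, per_port_nodes):
    """Clocks for the shared nodes plus the average output/tx pair across ports."""
    shared = sum(shared_nodes)
    per_port = sum(sum(pair) for pair in per_port_nodes) / len(per_port_nodes)
    return shared + per_port


# Clocks from the 70% run: dpdk-input, ethernet-input, l2-input, l2-output,
# then (output, tx) for each 10G port.
run_70 = clocks_per_packet([448, 692, 115, 131], [(154, 262), (163, 267)])
# Clocks from the 75% run, same node order.
run_75 = clocks_per_packet([411, 677, 108, 107], [(133, 248), (142, 253)])

rate_70 = run_70 * 1.1539e6  # clocks per second at the reported vector rate
rate_75 = run_75 * 1.2365e6

print(f"70% run: {run_70:.0f} clocks/packet, {rate_70:.3e} clocks/s")
print(f"75% run: {run_75:.0f} clocks/packet, {rate_75:.3e} clocks/s")
```

Both products land at about 2.09e9 clocks per second even though the load differs. That suggests the per-packet figures in polling mode absorb the idle polling time as well, so they probably cannot be read directly as spare capacity; the falling per-packet cost as vectors/call grows is the more useful signal.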
Would increasing the number of cores/threads be of any help? Or, given that
vectors/node is 34.58, does it mean there is still room to process more frames?
Also, there are two Rx queues configured. Is there a command to check whether
they are equally serviced? I am looking to understand whether the load is
equally distributed over the two rx queues and two tx queues.
Any help in determining why this drop might be happening would be great.
Thanks,
Vijay
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#12635): https://lists.fd.io/g/vpp-dev/message/12635
Mute This Topic: https://lists.fd.io/mt/30778968/675642
Group Owner: [email protected]
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#12654): https://lists.fd.io/g/vpp-dev/message/12654
-=-=-=-=-=-=-=-=-=-=-=-