Hi VPP dev,
I was doing some benchmarking on VPP and found that the no-multi-seg option
in startup.conf affects both the performance and the 'show runtime' output.
The VPP version is v21.01-rc0~547-gf0419a0c8, and the DPDK version is 20.11.0.
With the no-multi-seg option set in startup.conf, the runtime output looks like
this:
Thread 1 vpp_wk_0 (lcore 2)
Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
  vector rates in 1.2537e7, out 1.2537e7, drop 0.0000e0, punt 0.0000e0
Name                    State    Calls    Vectors   Suspends  Clocks   Vectors/Call
dpdk-input              polling  112527   14403456  0         9.34e-1  128.00
eth0-output             active   112527   7201728   0         2.04e-1  64.00
eth0-tx                 active   112527   7201728   0         7.86e-1  64.00
eth1-output             active   112527   7201728   0         1.91e-1  64.00
eth1-tx                 active   112527   7201728   0         7.93e-1  64.00
ethernet-input          active   225054   14403456  0         5.65e-1  64.00
ip4-input-no-checksum   active   112527   14403456  0         3.83e-1  128.00
ip4-lookup              active   112527   14403456  0         5.34e-1  128.00
ip4-rewrite             active   112527   14403456  0         5.73e-1  128.00
unix-epoll-input        polling  110      0         0         2.84e1   0.00
Output for command 'show hardware-interfaces':
vpp# sh hardware-interfaces
Name    Idx   Link  Hardware
eth0    1     up    eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
                       outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
                       scatter keep-crc rss-hash
    rx offload active: ipv4-cksum
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
                       gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
                       mbuf-fast-free
    tx offload active: none
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other
                       ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other
                       l2-payload
    rss active:        none
    tx burst mode: Vector Neon
    rx burst mode: Vector Neon
Without the no-multi-seg option in startup.conf, the runtime output looks like
this:
Thread 1 vpp_wk_0 (lcore 2)
Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
  vector rates in 1.0186e7, out 1.0186e7, drop 0.0000e0, punt 0.0000e0
Name                    State    Calls    Vectors   Suspends  Clocks   Vectors/Call
dpdk-input              polling  34157    17488384  0         9.51e-1  512.00
eth0-output             active   34157    8744192   0         1.66e-1  256.00
eth0-tx                 active   34157    8744192   0         1.84e0   256.00
eth1-output             active   34157    8744192   0         1.71e-1  256.00
eth1-tx                 active   34157    8744192   0         1.88e0   256.00
ethernet-input          active   68314    17488384  0         4.60e-1  256.00
ip4-input-no-checksum   active   68314    17488384  0         3.58e-1  256.00
ip4-lookup              active   68314    17488384  0         5.29e-1  256.00
ip4-rewrite             active   68314    17488384  0         5.78e-1  256.00
unix-epoll-input        polling  33       0         0         3.39e1   0.00
Output for command 'show hardware-interfaces':
vpp# sh hardware-interfaces
Name    Idx   Link  Hardware
eth0    1     up    eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd maybe-multiseg rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
                       outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
                       scatter keep-crc rss-hash
    rx offload active: ipv4-cksum jumbo-frame scatter
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
                       gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
                       mbuf-fast-free
    tx offload active: multi-segs
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other
                       ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other
                       l2-payload
    rss active:        none
    tx burst mode: Scalar
    rx burst mode: Vector Neon Scattered
So I am wondering why the no-multi-seg option changes the vector rates shown
above. Is this behavior expected?
I also saw a performance drop when the no-multi-seg option was not set in
startup.conf and I sent small packets (e.g. 64 bytes) as traffic input (a
simple IPv4 routing test case with 1 flow). How does VPP behave differently
when the no-multi-seg option is set?
I look forward to your feedback.
Thanks,
Jieqiang Wang
View/Reply Online (#18488): https://lists.fd.io/g/vpp-dev/message/18488