On 6/24/20 4:03 AM, Vipul Ujawane wrote:
Dear all,
I am observing very low performance when running OVS-DPDK compared to OVS
running with the kernel datapath.
I have OvS version 2.13.90 compiled from source with the latest stable DPDK
v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
version: 4.19.118).
I have also tried the latest released OvS (2.12) with the same LTS DPDK. As
a last resort, I have tried an older kernel (4.19.0-8-amd64, real
version: 4.19.98) to see whether the kernel itself was the problem.
I have not been able to troubleshoot the problem and kindly request your
help.
HW configuration
================
We have two identical servers (Debian stable, Intel(R) Xeon(R) Gold 6230
CPU, 96 GB RAM), each running a KVM virtual machine. On the hypervisor
layer, we have OvS for traffic routing. The servers are connected directly
via a Mellanox ConnectX-5 (1x100G).
The OVS forwarding tables are configured for simple port forwarding only,
to avoid any packet-processing-related issues.
Problem
=======
When both servers are running OVS-Kernel at the hypervisor layer and the VMs
are connected to it via libvirt and virtio interfaces, the
VM->Server1->Server2->VM throughput is around 16-18 Gbps.
However, when using OVS-DPDK with the same setup, the throughput drops
to 4-6 Gbps.
You don't mention the traffic profile. I assume 64-byte frames, but it is
best to be explicit.
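For example, if you are testing with iperf3 inside the VMs (an assumption
on my part), the two extremes of the profile would look like:

# MTU-sized TCP stream, 60 second run
iperf3 -c <server-ip> -t 60
# small UDP datagrams: 64-byte payload, roughly 106-byte frames on the wire
iperf3 -c <server-ip> -u -b 0 -l 64 -t 60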
SW/driver configurations
========================
DPDK
----
In config/common_base, besides the defaults, I have enabled the following
extra drivers/features:
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_VHOST=y
CONFIG_RTE_LIBRTE_VHOST_NUMA=y
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_VIRTIO_USER=n
CONFIG_RTE_EAL_VFIO=y
OVS
---
$ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.13.90
$sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
true
$sudo ovs-vsctl get Open_vSwitch . dpdk_version
"DPDK 19.11.3"
OS settings
-----------
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet
Why don't you reserve any CPUs for OVS/DPDK or VM usage? Published
performance white papers all recommend CPU isolation settings, e.g. this
Mellanox DPDK performance report:
https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
For their test system:
isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
intel_pstate=disable nohz_full=24-47
rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G
hugepages=64 audit=0
nosoftlockup
Using the tuned service (CPU partitioning profile) makes this process easier:
https://tuned-project.org/
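A minimal sketch, assuming you isolate cores 24-47 as in the Mellanox
report and that Debian's tuned package ships the cpu-partitioning profile:

apt install tuned
echo "isolated_cores=24-47" >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning
reboot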
./usertools/dpdk-devbind.py --status
Network devices using kernel driver
===================================
0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
unused=igb_uio,vfio-pci
Due to the way Mellanox cards and their driver work, I have not bound
igb_uio to the interface; however, the uio, igb_uio, and vfio-pci kernel
modules are loaded.
Relevant part of the VM-config for Qemu/KVM
-------------------------------------------
<cputune>
<shares>4096</shares>
<vcpupin vcpu='0' cpuset='4'/>
<vcpupin vcpu='1' cpuset='5'/>
Where did you get these CPU mapping values? x86 systems typically map
even-numbered CPUs to one NUMA node and odd-numbered CPUs to a different
NUMA node. You generally want to select CPUs from the same NUMA node as
the mlx5 NIC you're using for DPDK.
You should have at least 4 CPUs in the VM, selected according to the
NUMA topology of the system.
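You can check both directly; for example:

# NUMA node the ConnectX-5 is attached to (-1 means unknown)
cat /sys/bus/pci/devices/0000:b3:00.0/numa_node
# which CPUs belong to which node
lscpu | grep -i numa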
Take a look at this bash script written for Red Hat:
https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
It gives you a good starting reference for which CPUs to select for the
OVS/DPDK and VM configurations on your particular system. Also review
the Ansible script pvp_ovsdpdk.yml; it provides many other useful
steps you might be able to apply to your Debian OS.
<emulatorpin cpuset='4-5'/>
</cputune>
<cpu mode='host-model' check='partial'>
<model fallback='allow'/>
<topology sockets='2' cores='1' threads='1'/>
<numa>
<cell id='0' cpus='0-1' memory='4194304' unit='KiB'
memAccess='shared'/>
</numa>
</cpu>
<interface type='vhostuser'>
<mac address='00:00:00:00:00:aa'/>
<source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
mo$
<model type='virtio'/>
<driver queues='2'>
<host mrg_rxbuf='on'/>
Is there a requirement for mergeable RX buffers? Some PMDs like mlx5
can take advantage of SSE instructions when this is disabled, yielding
better performance.
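If you don't need them, turning them off is a one-line change in the
libvirt XML:
<host mrg_rxbuf='off'/>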
</driver>
<address type='pci' domain='0x0000' bus='0x07' slot='0x00'
function='0x0'$
</interface>
I don't see hugepage usage in the libvirt XML. Something similar to:
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
-----------------------------------
OVS Start Config
-----------------------------------
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
These two masks shouldn't overlap:
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
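As a sketch, assuming you dedicate CPUs 0-1 to the lcore threads and CPUs
2-3 to the PMDs (substitute cores from the NIC's NUMA node on your system):

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x3
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0xc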
ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
options:dpdk-devargs=0000:b3:00.0
ovs-vsctl set interface dpdk0 options:n_rxq=2
ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
type=dpdkvhostuser
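Once the ports are up, it is also worth checking how the rx queues were
distributed over the PMD threads and whether any PMD is saturated:

ovs-appctl dpif-netdev/pmd-rxq-show
ovs-appctl dpif-netdev/pmd-stats-show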
Is there anything I should be aware of regarding the versions and settings I
am using? Did I compile DPDK and/or OvS incorrectly?
Thank you for your kind help ;)