On 6/24/20 4:03 AM, Vipul Ujawane wrote:
Dear all,

I am observing very low performance when running OVS-DPDK compared to
OVS running with the kernel datapath.
I have OvS version 2.13.90 compiled from source with the latest stable DPDK
v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
version: 4.19.118).

I have also tried the latest released OvS (2.12) with the same LTS DPDK.
As a last resort, I have tried an older kernel (4.19.0-8-amd64, real
version 4.19.98) to see whether the kernel was the problem.

I have not been able to troubleshoot the problem myself, and I kindly
request your help.

HW configuration
================
We have two identical servers (Debian stable, Intel(R) Xeon(R)
Gold 6230 CPU, 96 GB RAM), each running a KVM virtual machine. On the
hypervisor layer, we have OvS for traffic routing. The servers are connected
directly via a Mellanox ConnectX-5 (1x100G).
The OVS forwarding tables are configured for simple port-forwarding only,
to rule out any packet-processing-related issues.
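
For reference, "simple port-forwarding" here means flows along the lines of
the following (bridge name and port numbers illustrative):

ovs-ofctl add-flow ovsbr in_port=1,actions=output:2
ovs-ofctl add-flow ovsbr in_port=2,actions=output:1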

Problem
=======
When both servers run the OVS kernel datapath at the hypervisor layer and
the VMs are connected to it via libvirt and virtio interfaces, the
VM->Server1->Server2->VM throughput is around 16-18 Gbps.
However, when using OVS-DPDK with the same setup, the throughput drops
to 4-6 Gbps.

You don't mention the traffic profile. I assume 64-byte frames, but it's best to be explicit.
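
For example, to make each frame 64 bytes on the wire (14 B Ethernet + 20 B
IPv4 + 8 B UDP + 18 B payload + 4 B FCS), an iperf3 run would look something
like this (hostname and rate are placeholders):

$ iperf3 -c server2 -u -b 10G -l 18

With TCP and default segment sizes you are measuring a very different profile
than the 64-byte tests in the vendor performance reports.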


SW/driver configurations
========================
DPDK
----
In config/common_base, besides the defaults, I have enabled the following
extra drivers/features:
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_VHOST=y
CONFIG_RTE_LIBRTE_VHOST_NUMA=y
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_VIRTIO_USER=n
CONFIG_RTE_EAL_VFIO=y
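
As a sanity check that the mlx5 PMD actually made it into your build, the
static library should show up in the build tree (path assumes the default
make-based build directory; the mlx5 PMD also requires the
rdma-core/libibverbs development packages at build time):

$ ls build/lib | grep -i mlx5
librte_pmd_mlx5.a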


OVS
---
$ ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.13.90

$ sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
true

$ sudo ovs-vsctl get Open_vSwitch . dpdk_version
"DPDK 19.11.3"

OS settings
-----------
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster


$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet

Why don't you reserve any CPUs for OVS/DPDK or VM usage? All published performance white papers recommend CPU-isolation settings, like this Mellanox DPDK performance report:

https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf

For their test system:

isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=24-47 rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G hugepages=64 audit=0 nosoftlockup

Using the tuned service (CPU partitioning profile) makes this process easier:

https://tuned-project.org/
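
For example (assuming the cpu-partitioning profile is packaged for Debian
buster; the isolated core list is a placeholder to be replaced with cores
from the NIC's NUMA node):

$ apt install tuned
$ echo 'isolated_cores=4-23' >> /etc/tuned/cpu-partitioning-variables.conf
$ tuned-adm profile cpu-partitioning
$ reboot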


$ ./usertools/dpdk-devbind.py --status
Network devices using kernel driver
===================================
0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
unused=igb_uio,vfio-pci

Due to the way Mellanox cards and their driver work (the mlx5 PMD runs on
top of the bifurcated mlx5_core kernel driver), I have not bound igb_uio to
the interface; however, the uio, igb_uio and vfio-pci kernel modules are
loaded.


Relevant part of the VM-config for Qemu/KVM
-------------------------------------------
   <cputune>
     <shares>4096</shares>
     <vcpupin vcpu='0' cpuset='4'/>
     <vcpupin vcpu='1' cpuset='5'/>

Where did you get these CPU mapping values? x86 systems typically map even-numbered CPUs to one NUMA node and odd-numbered CPUs to a different NUMA node. You generally want to select CPUs from the same NUMA node as the mlx5 NIC you're using for DPDK.

You should have at least 4 CPUs in the VM, selected according to the NUMA topology of the system.

Take a look at this bash script written for Red Hat:

https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh

It gives you a good starting reference for which CPUs to select for the OVS/DPDK and VM configurations on your particular system. Also review the Ansible script pvp_ovsdpdk.yml; it provides a lot of other useful steps you may be able to apply to your Debian OS.
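
To see which NUMA node the ConnectX-5 sits on and which CPUs belong to each
node (PCI address taken from your dpdk-devbind output above):

$ cat /sys/bus/pci/devices/0000:b3:00.0/numa_node
$ lscpu | grep 'NUMA node'

The vcpupin/emulatorpin values here, and the pmd-cpu-mask further down,
should then come from that node's CPU list.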

     <emulatorpin cpuset='4-5'/>
   </cputune>
   <cpu mode='host-model' check='partial'>
     <model fallback='allow'/>
     <topology sockets='2' cores='1' threads='1'/>
     <numa>
       <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
memAccess='shared'/>
     </numa>
   </cpu>
     <interface type='vhostuser'>
       <mac address='00:00:00:00:00:aa'/>
       <source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
mo$
       <model type='virtio'/>
       <driver queues='2'>
         <host mrg_rxbuf='on'/>

Is there a requirement for mergeable RX buffers? Some PMDs like mlx5 can take advantage of SSE instructions when this is disabled, yielding better performance; see the snippet just after this interface definition.

       </driver>
       <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
     </interface>
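
For reference, disabling them would just be a one-attribute change in the
driver element:

       <driver queues='2'>
         <host mrg_rxbuf='off'/>
       </driver>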


I don't see hugepage backing in the libvirt XML. vhost-user with OVS-DPDK needs the guest memory to be backed by shared hugepages; you want something similar to:

  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
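
After adding this and restarting the guest, you can confirm on the host that
the VM memory actually came out of the 1G pool:

$ grep HugePages /proc/meminfo
$ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages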


-----------------------------------
OVS Start Config
-----------------------------------
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e

These two masks shouldn't overlap: 0xff covers cores 0-7 while 0x0e covers cores 1-3, so your PMD threads share cores with the OVS lcore threads (and cores 4-5 of the lcore mask also collide with your vcpupin/emulatorpin settings). See:
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
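
One consistent layout (hypothetical; adjust to your actual NUMA topology)
would be core 1 for the lcore threads, cores 2-3 for two PMDs, and the VM
staying on cores 4-5:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x2
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0xc

Also note that dpdk-socket-mem="4096,0" only reserves memory on NUMA node 0;
if the NIC turns out to be on node 1, that would need to become "0,4096"
(or "4096,4096").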

ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
options:dpdk-devargs=0000:b3:00.0
ovs-vsctl set interface dpdk0 options:n_rxq=2
ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
type=dpdkvhostuser
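
Once everything is up, check how the rx queues were distributed over the PMD
threads and whether any PMD is saturated:

$ ovs-appctl dpif-netdev/pmd-rxq-show
$ ovs-appctl dpif-netdev/pmd-stats-show

If both dpdk0 queues land on the same PMD core, or the PMDs end up on the
wrong NUMA node, that alone can account for a large share of the drop.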





Is there anything I should be aware of regarding the versions and settings
I am using? Did I compile DPDK and/or OvS the wrong way?

Thank you for your kind help ;)
