On Sat, Dec 1, 2018 at 1:17 AM LIU Yulong <[email protected]> wrote:

>
>
> On Fri, Nov 30, 2018 at 5:36 PM Lam, Tiago <[email protected]> wrote:
>
>> On 30/11/2018 02:07, LIU Yulong wrote:
>> > Hi,
>> >
>> > Thanks for the reply, please see my inline comments below.
>> >
>> >
>> > On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago <[email protected]> wrote:
>> >
>> >     On 29/11/2018 08:24, LIU Yulong wrote:
>> >     > Hi,
>> >     >
>> >     > We recently tested ovs-dpdk, but we hit a bandwidth issue. The
>> >     > bandwidth from VM to VM was not close to the physical NIC: about
>> >     > 4.3Gbps on a 10Gbps NIC. For non-DPDK (virtio-net) VMs, an iperf3
>> >     > test easily reaches 9.3Gbps. We enabled virtio multiqueue for all
>> >     > guest VMs. In the dpdk vhostuser guest, we noticed that the
>> >     > interrupts are concentrated on only one queue, while for the
>> >     > non-DPDK VMs the interrupts are spread across all queues. For the
>> >     > dpdk vhostuser VMs, we also noticed that the PMD usage was
>> >     > concentrated on one core, on both the server (tx) and the client
>> >     > (rx) side, and this behavior persists with one PMD or with
>> >     > multiple PMDs.
>> >     >
>> >     > Furthermore, my colleague added some systemtap hooks on the
>> >     > openvswitch functions and found something interesting: the function
>> >     > __netdev_dpdk_vhost_send sends all the packets to one
>> >     > virtio-net queue. It seems that the hashing algorithm/logic does
>> >     > not spread the packets across the queues very well.
>> >     >
>> >
>> >     Hi,
>> >
>> >     When you say "no dpdk VMs", you mean that within your VM you're
>> >     relying on the Kernel to get the packets, using virtio-net. And when
>> >     you say "dpdk vhostuser guest", you mean you're using DPDK inside
>> >     the VM to get the packets. Is this correct?
>> >
>> >
>> > Sorry for the inaccurate description. I'm really new to DPDK.
>> > There is no DPDK inside the VM; all these settings are for the host only.
>> > (`host` means the hypervisor physical machine in the virtualization
>> > sense; `guest` means the virtual machine.)
>> > "no dpdk VMs" means the host does not set up DPDK (OVS works in the
>> > traditional kernel way) and the VMs are booted on that host. Maybe a
>> > better name is `VMs-on-no-DPDK-host`?
>>
>> Got it. Your "no dpdk VMs" setup is what is usually referred to as
>> OvS-Kernel, while your "dpdk vhostuser guest" setup is OvS-DPDK.
>>
>> >
>> >     If so, could you also tell us which DPDK app you're using inside of
>> >     those VMs? Is it testpmd? If so, how are you setting the `--rxq` and
>> >     `--txq` args? Otherwise, how are you setting those in your app when
>> >     initializing DPDK?
>> >
>> >
>> > Inside the VM there is no DPDK app, and the VM kernel does not set any
>> > DPDK-related config. `iperf3` is the tool used for the bandwidth
>> > testing.
>> >
>> >     The information below is useful in telling us how you're setting
>> >     your configurations in OvS, but we are still missing the
>> >     configurations inside the VM.
>> >
>> >     This should help us in getting more information,
>> >
>> >
>> > Maybe you have noticed that we only set up one PMD in the pasted
>> > configurations, but the VM has 8 queues. Should the PMD quantity match
>> > the number of queues?
>>
>> It shouldn't have to match the queues inside the VM per se. But in this
>> case, since you have configured 8 rx queues on your physical NICs as
>> well, and since you're looking for higher throughput, you should
>> increase the number of PMDs and pin those rxqs - take a look at [1] on
>> how to do that. Later on, increasing the size of your queues could also
>> help.
>>
>>
> I'll test it.
> Yes, as you noticed, the vhostuserclient port has n_rxq="8":
> options:
> {n_rxq="8",vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}.
> And the physical NICs have both n_rxq="8" and n_txq="8":
> options: {dpdk-devargs="0000:01:00.0", n_rxq="8", n_txq="8"}
> options: {dpdk-devargs="0000:05:00.1", n_rxq="8", n_txq="8"}
> But, furthermore, when that configuration is removed from the
> vhostuserclient port and the physical NICs, the bandwidth stays at
> 4.3Gbps, no matter whether one PMD or multiple PMDs are used.
>

Bad news: the bandwidth does not increase much, it's about 4.9Gbps -
5.3Gbps.
The following are the new configurations. The VM still has 8 queues,
but now I have 4 PMDs.
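(For reference, a 4-PMD setup on cores 2, 4, 8 and 20 - the core_ids that
appear in the pmd-rxq-show output below - is normally selected with
pmd-cpu-mask. A minimal sketch, assuming exactly those cores: bits 2, 4, 8
and 20 give 0x4 + 0x10 + 0x100 + 0x100000 = 0x100114, so

# ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x100114

Each set bit in the mask enables a PMD thread on that CPU core.)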

# ovs-vsctl get interface nic-10G-1 other_config
{pmd-rxq-affinity="0:2,1:4,3:20"}
# ovs-vsctl get interface nic-10G-2 other_config
{pmd-rxq-affinity="0:2,1:4,3:20"}
# ovs-vsctl get interface vhuc8febeff-56 other_config
{pmd-rxq-affinity="0:2,1:4,3:20"}

# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 2:
        isolated : true
        port: nic-10G-1         queue-id:  0    pmd usage:  0 %
        port: nic-10G-2         queue-id:  0    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  0    pmd usage:  0 %
pmd thread numa_id 0 core_id 4:
        isolated : true
        port: nic-10G-1         queue-id:  1    pmd usage:  0 %
        port: nic-10G-2         queue-id:  1    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  1    pmd usage:  0 %
pmd thread numa_id 0 core_id 8:
        isolated : false
        port: nic-10G-1         queue-id:  2    pmd usage:  0 %
        port: nic-10G-2         queue-id:  2    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  2    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  4    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  5    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  6    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  7    pmd usage:  0 %
pmd thread numa_id 0 core_id 20:
        isolated : true
        port: nic-10G-1         queue-id:  3    pmd usage:  0 %
        port: nic-10G-2         queue-id:  3    pmd usage:  0 %
        port: vhuc8febeff-56    queue-id:  3    pmd usage:  0 %
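Note that the pmd-rxq-affinity value above only pins queues 0, 1 and 3, so
queue 2 of each port (and the vhost queues 4-7) fall back to the
non-isolated PMD on core 8, as the output shows. If the goal is to pin all
four NIC queues explicitly, a sketch of the extra pinning (core 8 is only
an example choice) would be:

# ovs-vsctl set Interface nic-10G-1 other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"
# ovs-vsctl set Interface nic-10G-2 other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"
# ovs-vsctl set Interface vhuc8febeff-56 other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"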


# ovs-vsctl show
...
        Port dpdkbond
            Interface "nic-10G-2"
                type: dpdk
                options: {dpdk-devargs="0000:05:00.1", mtu_request="9000",
n_rxq="4", n_txq="4"}
            Interface "nic-10G-1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0", mtu_request="9000",
n_rxq="4", n_txq="4"}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port br-int
            Interface br-int
                type: internal
        Port "vhuc8febeff-56"
            tag: 1
            Interface "vhuc8febeff-56"
                type: dpdkvhostuserclient
                options: {n_rxq="4", n_txq="4",
vhost-server-path="/var/lib/vhost_sockets/vhuc8febeff-56"}
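As for the earlier suggestion of enlarging the queues, on the dpdk ports
that is done with the n_rxq_desc/n_txq_desc options (a sketch only; the
ports here already use the 2048 default, and 4096 is the largest
power-of-two value allowed):

# ovs-vsctl set Interface nic-10G-1 options:n_rxq_desc=4096 options:n_txq_desc=4096
# ovs-vsctl set Interface nic-10G-2 options:n_rxq_desc=4096 options:n_txq_desc=4096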



>
>
>> Just as a curiosity, I see you have a configured MTU of 1500B on the
>> physical interfaces. Is that the same MTU you're using inside the VM?
>> And are you using the same configurations (including that 1500B MTU)
>> when running your OvS-Kernel setup?
>>
>>
> The MTU inside the VM is 1450. Is that OK for high throughput?
>
>
Inside the VM the MTU is 1500, and the DPDK physical NICs are set to 9000
now (mtu_request="9000").
Bandwidth is ~5.1Gbps now.
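One thing that may also be worth double-checking, since the original
observation was that interrupts land on a single queue inside the guest:
with virtio-net multiqueue the guest usually has to enable the extra
channels itself, and a single iperf3 TCP stream is one flow, so it will
largely stay on one queue anyway. A rough sketch of the guest-side checks
(eth0 and <server> are placeholders):

# ethtool -l eth0             (show how many combined channels are enabled)
# ethtool -L eth0 combined 8  (enable all 8 queues)
# iperf3 -c <server> -P 8     (several parallel streams, so traffic can
                               spread across queues and cores)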


>
>
>> Hope this helps,
>>
>>
>
>
>
>> Tiago.
>>
>> [1]
>>
>> http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-assigment-to-pmd-threads
>>
>> >
>> >     Tiago.
>> >
>> >     > So I'd like to find some help from the community. Maybe I'm
>> >     > missing some configurations.
>> >     >
>> >     > Thanks.
>> >     >
>> >     >
>> >     > Here is a list of the environment and some configurations:
>> >     > # uname -r
>> >     > 3.10.0-862.11.6.el7.x86_64
>> >     > # rpm -qa|grep dpdk
>> >     > dpdk-17.11-11.el7.x86_64
>> >     > # rpm -qa|grep openvswitch
>> >     > openvswitch-2.9.0-3.el7.x86_64
>> >     > # ovs-vsctl list open_vswitch
>> >     > _uuid               : a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
>> >     > bridges             : [531e4bea-ce12-402a-8a07-7074c31b978e,
>> >     > 5c1675e2-5408-4c1f-88bc-6d9c9b932d47]
>> >     > cur_cfg             : 1305
>> >     > datapath_types      : [netdev, system]
>> >     > db_version          : "7.15.1"
>> >     > external_ids        : {hostname="cq01-compute-10e112e5e140",
>> >     > rundir="/var/run/openvswitch",
>> >     > system-id="e2cc84fe-a3c8-455f-8c64-260741c141ee"}
>> >     > iface_types         : [dpdk, dpdkr, dpdkvhostuser,
>> >     dpdkvhostuserclient,
>> >     > geneve, gre, internal, lisp, patch, stt, system, tap, vxlan]
>> >     > manager_options     : [43803994-272b-49cb-accc-ab672d1eefc8]
>> >     > next_cfg            : 1305
>> >     > other_config        : {dpdk-init="true", dpdk-lcore-mask="0x1",
>> >     > dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x100000",
>> >     > vhost-iommu-support="true"}
>> >     > ovs_version         : "2.9.0"
>> >     > ssl                 : []
>> >     > statistics          : {}
>> >     > system_type         : centos
>> >     > system_version      : "7"
>> >     > # lsmod |grep vfio
>> >     > vfio_pci               41312  2
>> >     > vfio_iommu_type1       22300  1
>> >     > vfio                   32695  7 vfio_iommu_type1,vfio_pci
>> >     > irqbypass              13503  23 kvm,vfio_pci
>> >     >
>> >     > # ovs-appctl dpif/show
>> >     > netdev@ovs-netdev: hit:759366335 missed:754283
>> >     > br-ex:
>> >     > bond1108 4/6: (tap)
>> >     > br-ex 65534/3: (tap)
>> >     > nic-10G-1 5/4: (dpdk: configured_rx_queues=8,
>> >     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>> >     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>> >     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>> >     > requested_txq_descriptors=2048, rx_csum_offload=true)
>> >     > nic-10G-2 6/5: (dpdk: configured_rx_queues=8,
>> >     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>> >     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>> >     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>> >     > requested_txq_descriptors=2048, rx_csum_offload=true)
>> >     > phy-br-ex 3/none: (patch: peer=int-br-ex)
>> >     > br-int:
>> >     > br-int 65534/2: (tap)
>> >     > int-br-ex 1/none: (patch: peer=phy-br-ex)
>> >     > vhu76f9a623-9f 2/1: (dpdkvhostuserclient: configured_rx_queues=8,
>> >     > configured_tx_queues=8, mtu=1500, requested_rx_queues=8,
>> >     > requested_tx_queues=8)
>> >     >
>> >     > # ovs-appctl dpctl/show -s
>> >     > netdev@ovs-netdev:
>> >     > lookups: hit:759366335 missed:754283 lost:72
>> >     > flows: 186
>> >     > port 0: ovs-netdev (tap)
>> >     > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> >     > TX packets:0 errors:0 dropped:0 aborted:0 carrier:0
>> >     > collisions:0
>> >     > RX bytes:0  TX bytes:0
>> >     > port 1: vhu76f9a623-9f (dpdkvhostuserclient:
>> configured_rx_queues=8,
>> >     > configured_tx_queues=8, mtu=1500, requested_rx_queues=8,
>> >     > requested_tx_queues=8)
>> >     > RX packets:718391758 errors:0 dropped:0 overruns:? frame:?
>> >     > TX packets:30372410 errors:? dropped:719200 aborted:? carrier:?
>> >     > collisions:?
>> >     > RX bytes:1086995317051 (1012.3 GiB)  TX bytes:2024893540 (1.9 GiB)
>> >     > port 2: br-int (tap)
>> >     > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> >     > TX packets:1393992 errors:0 dropped:4 aborted:0 carrier:0
>> >     > collisions:0
>> >     > RX bytes:0  TX bytes:2113616736 (2.0 GiB)
>> >     > port 3: br-ex (tap)
>> >     > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> >     > TX packets:6660091 errors:0 dropped:967 aborted:0 carrier:0
>> >     > collisions:0
>> >     > RX bytes:0  TX bytes:2451440870 (2.3 GiB)
>> >     > port 4: nic-10G-1 (dpdk: configured_rx_queues=8,
>> >     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>> >     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>> >     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>> >     > requested_txq_descriptors=2048, rx_csum_offload=true)
>> >     > RX packets:36409466 errors:0 dropped:0 overruns:? frame:?
>> >     > TX packets:718371472 errors:0 dropped:20276 aborted:? carrier:?
>> >     > collisions:?
>> >     > RX bytes:2541593983 (2.4 GiB)  TX bytes:1089838136919 (1015.0 GiB)
>> >     > port 5: nic-10G-2 (dpdk: configured_rx_queues=8,
>> >     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>> >     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>> >     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>> >     > requested_txq_descriptors=2048, rx_csum_offload=true)
>> >     > RX packets:5319466 errors:0 dropped:0 overruns:? frame:?
>> >     > TX packets:0 errors:0 dropped:0 aborted:? carrier:?
>> >     > collisions:?
>> >     > RX bytes:344903551 (328.9 MiB)  TX bytes:0
>> >     > port 6: bond1108 (tap)
>> >     > RX packets:228 errors:0 dropped:0 overruns:0 frame:0
>> >     > TX packets:5460 errors:0 dropped:18 aborted:0 carrier:0
>> >     > collisions:0
>> >     > RX bytes:21459 (21.0 KiB)  TX bytes:341087 (333.1 KiB)
>> >     >
>> >     > # ovs-appctl dpif-netdev/pmd-stats-show
>> >     > pmd thread numa_id 0 core_id 20:
>> >     > packets received: 760120690
>> >     > packet recirculations: 0
>> >     > avg. datapath passes per packet: 1.00
>> >     > emc hits: 750787577
>> >     > megaflow hits: 8578758
>> >     > avg. subtable lookups per megaflow hit: 1.05
>> >     > miss with success upcall: 754283
>> >     > miss with failed upcall: 72
>> >     > avg. packets per output batch: 2.21
>> >     > idle cycles: 210648140144730 (99.13%)
>> >     > processing cycles: 1846745927216 (0.87%)
>> >     > avg cycles per packet: 279554.14 (212494886071946/760120690)
>> >     > avg processing cycles per packet: 2429.54
>> (1846745927216/760120690)
>> >     > main thread:
>> >     > packets received: 0
>> >     > packet recirculations: 0
>> >     > avg. datapath passes per packet: 0.00
>> >     > emc hits: 0
>> >     > megaflow hits: 0
>> >     > avg. subtable lookups per megaflow hit: 0.00
>> >     > miss with success upcall: 0
>> >     > miss with failed upcall: 0
>> >     > avg. packets per output batch: 0.00
>> >     >
>> >     > # ovs-appctl dpif-netdev/pmd-rxq-show
>> >     > pmd thread numa_id 0 core_id 20:
>> >     > isolated : false
>> >     > port: nic-10G-1       queue-id:  0    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  1    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  2    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  3    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  4    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  5    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  6    pmd usage:  0 %
>> >     > port: nic-10G-1       queue-id:  7    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  0    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  1    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  2    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  3    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  4    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  5    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  6    pmd usage:  0 %
>> >     > port: nic-10G-2       queue-id:  7    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  0    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  1    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  2    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  3    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  4    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  5    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  6    pmd usage:  0 %
>> >     > port: vhu76f9a623-9f  queue-id:  7    pmd usage:  0 %
>> >     >
>> >     >
>> >     > # virsh dumpxml instance-5c5191ff-c1a2-4429-9a8b-93ddd939583d
>> >     > ...
>> >     >     <interface type='vhostuser'>
>> >     >       <mac address='fa:16:3e:77:ab:fb'/>
>> >     >       <source type='unix'
>> path='/var/lib/vhost_sockets/vhu76f9a623-9f'
>> >     > mode='server'/>
>> >     >       <target dev='vhu76f9a623-9f'/>
>> >     >       <model type='virtio'/>
>> >     >       <driver name='vhost' queues='8'/>
>> >     >       <alias name='net0'/>
>> >     >       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
>> >     > function='0x0'/>
>> >     >     </interface>
>> >     > ...
>> >     >
>> >     > # ovs-vsctl show
>> >     > a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
>> >     >     Manager "ptcp:6640:127.0.0.1"
>> >     >         is_connected: true
>> >     >     Bridge br-int
>> >     >         Controller "tcp:127.0.0.1:6633"
>> >     >             is_connected: true
>> >     >         fail_mode: secure
>> >     >         Port int-br-ex
>> >     >             Interface int-br-ex
>> >     >                 type: patch
>> >     >                 options: {peer=phy-br-ex}
>> >     >         Port br-int
>> >     >             Interface br-int
>> >     >                 type: internal
>> >     >         Port "vhu76f9a623-9f"
>> >     >             tag: 1
>> >     >             Interface "vhu76f9a623-9f"
>> >     >                 type: dpdkvhostuserclient
>> >     >                 options: {n_rxq="8",
>> >     > vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}
>> >     >     Bridge br-ex
>> >     >         Controller "tcp:127.0.0.1:6633"
>> >     >             is_connected: true
>> >     >         fail_mode: secure
>> >     >         Port dpdkbond
>> >     >             Interface "nic-10G-1"
>> >     >                 type: dpdk
>> >     >                 options: {dpdk-devargs="0000:01:00.0", n_rxq="8",
>> >     n_txq="8"}
>> >     >             Interface "nic-10G-2"
>> >     >                 type: dpdk
>> >     >                 options: {dpdk-devargs="0000:05:00.1", n_rxq="8",
>> >     n_txq="8"}
>> >     >         Port phy-br-ex
>> >     >             Interface phy-br-ex
>> >     >                 type: patch
>> >     >                 options: {peer=int-br-ex}
>> >     >         Port br-ex
>> >     >             Interface br-ex
>> >     >                 type: internal
>> >     >
>> >     > # numactl --hardware
>> >     > available: 2 nodes (0-1)
>> >     > node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36
>> 38
>> >     > node 0 size: 130978 MB
>> >     > node 0 free: 7539 MB
>> >     > node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37
>> 39
>> >     > node 1 size: 131072 MB
>> >     > node 1 free: 6886 MB
>> >     > node distances:
>> >     > node   0   1
>> >     >   0:  10  21
>> >     >   1:  21  10
>> >     >
>> >     > # grep HugePages_ /proc/meminfo
>> >     > HugePages_Total:     232
>> >     > HugePages_Free:       10
>> >     > HugePages_Rsvd:        0
>> >     > HugePages_Surp:        0
>> >     >
>> >     >
>> >     > # cat /proc/cmdline
>> >     > BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.11.6.el7.x86_64
>> >     > root=UUID=220ee106-5e00-4809-91a0-641e045a4c21 ro
>> >     > intel_idle.max_cstate=0 crashkernel=auto rhgb quiet
>> >     > default_hugepagesz=1G hugepagesz=1G hugepages=232 iommu=pt
>> >     intel_iommu=on
>> >     >
>> >     >
>> >     > Best regards,
>> >     > LIU Yulong
>> >     >
>> >     > _______________________________________________
>> >     > discuss mailing list
>> >     > [email protected] <mailto:[email protected]>
>> >     > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> >     >
>> >
>> >
>> > _______________________________________________
>> > discuss mailing list
>> > [email protected]
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> >
>>
>
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
