On Sat, Dec 1, 2018 at 1:17 AM LIU Yulong <[email protected]> wrote:
>
>
> On Fri, Nov 30, 2018 at 5:36 PM Lam, Tiago <[email protected]> wrote:
>
>> On 30/11/2018 02:07, LIU Yulong wrote:
>> > Hi,
>> >
>> > Thanks for the reply, please see my inline comments below.
>> >
>> >
>> > On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago <[email protected]> wrote:
>> >
>> >     On 29/11/2018 08:24, LIU Yulong wrote:
>> > > Hi,
>> > >
>> > > We recently tested ovs-dpdk, but we ran into a bandwidth issue. The
>> > > bandwidth from VM to VM was not close to that of the physical NIC; it's
>> > > about 4.3Gbps on a 10Gbps NIC. For non-DPDK (virtio-net) VMs, the iperf3
>> > > test can easily reach 9.3Gbps. We enabled virtio multiqueue for all guest
>> > > VMs. In the dpdk vhostuser guest, we noticed that the interrupts are
>> > > concentrated on only one queue, whereas for the non-DPDK VM the interrupts
>> > > are hashed across all queues. For those dpdk vhostuser VMs, we also
>> > > noticed that the PMD usage was concentrated on one core, no matter
>> > > whether it acted as server (tx) or client (rx). And no matter whether one
>> > > PMD or multiple PMDs are used, this behavior always exists.
>> > >
>> > > Furthermore, my colleague added some systemtap hooks on the openvswitch
>> > > functions, and he found something interesting. The function
>> > > __netdev_dpdk_vhost_send sends all the packets to one virtionet-queue.
>> > > It seems that some algorithm/hash table/logic does not do the hashing
>> > > very well.
>> > >
>> >
>> >     Hi,
>> >
>> >     When you say "no dpdk VMs", you mean that within your VM you're relying
>> >     on the Kernel to get the packets, using virtio-net. And when you say
>> >     "dpdk vhostuser guest", you mean you're using DPDK inside the VM to get
>> >     the packets. Is this correct?
>> >
>> >
>> > Sorry for the inaccurate description. I'm really new to DPDK.
>> > There is no DPDK inside the VM; all these settings are for the host only.
>> > (`host` means the hypervisor physical machine in the virtualization sense;
>> > `guest` means the virtual machine.)
>> > "no dpdk VMs" means the host does not set up DPDK (ovs works in the
>> > traditional way), and the VMs were booted on top of that. Maybe a better
>> > name is `VMs-on-NO-DPDK-host`?
>>
>> Got it. Your "no dpdk VMs" setup is what is usually referred to as OvS-Kernel,
>> while your "dpdk vhostuser guest" setup is referred to as OvS-DPDK.
>>
>> >
>> >     If so, could you also tell us which DPDK app you're using inside of
>> >     those VMs? Is it testpmd? If so, how are you setting the `--rxq` and
>> >     `--txq` args? Otherwise, how are you setting those in your app when
>> >     initializing DPDK?
>> >
>> >
>> > Inside the VM there is no DPDK app, and the VM kernel does not have any
>> > DPDK-related configuration either. `iperf3` is the tool used for the
>> > bandwidth testing.
>> >
>> >     The information below is useful in telling us how you're setting your
>> >     configurations in OvS, but we are still missing the configurations
>> >     inside the VM.
>> >
>> >     This should help us in getting more information,
>> >
>> >
>> > Maybe you have noticed that we only set up one PMD in the pasted
>> > configurations, but the VM has 8 queues. Should the number of PMDs match
>> > the number of queues?
>>
>> It doesn't have to match the number of queues inside the VM per se. But in
>> this case, since you have configured 8 rx queues on your physical NICs as
>> well, and since you're looking for higher throughput, you should increase
>> the number of PMDs and pin those rxqs - take a look at [1] on how to do
>> that. Later on, increasing the size of your queues could also help.
>>
>> > I'll test it.
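(For reference, a minimal sketch of the PMD scaling, rxq pinning and queue sizing
suggested above, assuming four PMD threads on NUMA-0 cores 2, 4, 8 and 20 and the
port names used in this thread; the mask, core IDs and ring sizes are illustrative
only, not a recommendation for this particular host:)

# Spawn PMD threads on cores 2, 4, 8 and 20 (mask bits 2, 4, 8 and 20 set).
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x100114
# Pin each rx queue of the physical and vhost ports to one of those cores.
ovs-vsctl set Interface nic-10G-1 other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"
ovs-vsctl set Interface nic-10G-2 other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"
ovs-vsctl set Interface vhu76f9a623-9f other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"
# Optionally enlarge the rx/tx descriptor rings on the physical ports.
ovs-vsctl set Interface nic-10G-1 options:n_rxq_desc=4096 options:n_txq_desc=4096
ovs-vsctl set Interface nic-10G-2 options:n_rxq_desc=4096 options:n_txq_desc=4096
# Check the resulting queue-to-PMD assignment.
ovs-appctl dpif-netdev/pmd-rxq-show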
> Yes, as you noticed, the vhostuserclient port has n_rxq="8":
>   options: {n_rxq="8", vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}
> And the physical NICs both have n_rxq="8", n_txq="8":
>   options: {dpdk-devargs="0000:01:00.0", n_rxq="8", n_txq="8"}
>   options: {dpdk-devargs="0000:05:00.1", n_rxq="8", n_txq="8"}
> But, furthermore, when that configuration is removed from the vhostuserclient
> port and the physical NICs, the bandwidth stays at the same 4.3Gbps, no matter
> whether one PMD or multiple PMDs are used.
>

Bad news: the bandwidth does not increase much; it's now about ~4.9Gbps - 5.3Gbps.
The following are the new configurations. The VM still has 8 queues, but now I
have 4 PMDs.

# ovs-vsctl get interface nic-10G-1 other_config
{pmd-rxq-affinity="0:2,1:4,3:20"}
# ovs-vsctl get interface nic-10G-2 other_config
{pmd-rxq-affinity="0:2,1:4,3:20"}
# ovs-vsctl get interface vhuc8febeff-56 other_config
{pmd-rxq-affinity="0:2,1:4,3:20"}

# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 2:
  isolated : true
  port: nic-10G-1         queue-id:  0  pmd usage:  0 %
  port: nic-10G-2         queue-id:  0  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  0  pmd usage:  0 %
pmd thread numa_id 0 core_id 4:
  isolated : true
  port: nic-10G-1         queue-id:  1  pmd usage:  0 %
  port: nic-10G-2         queue-id:  1  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  1  pmd usage:  0 %
pmd thread numa_id 0 core_id 8:
  isolated : false
  port: nic-10G-1         queue-id:  2  pmd usage:  0 %
  port: nic-10G-2         queue-id:  2  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  2  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  4  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  5  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  6  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  7  pmd usage:  0 %
pmd thread numa_id 0 core_id 20:
  isolated : true
  port: nic-10G-1         queue-id:  3  pmd usage:  0 %
  port: nic-10G-2         queue-id:  3  pmd usage:  0 %
  port: vhuc8febeff-56    queue-id:  3  pmd usage:  0 %

# ovs-vsctl show
...
        Port dpdkbond
            Interface "nic-10G-2"
                type: dpdk
                options: {dpdk-devargs="0000:05:00.1", mtu_request="9000", n_rxq="4", n_txq="4"}
            Interface "nic-10G-1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0", mtu_request="9000", n_rxq="4", n_txq="4"}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port br-int
            Interface br-int
                type: internal
        Port "vhuc8febeff-56"
            tag: 1
            Interface "vhuc8febeff-56"
                type: dpdkvhostuserclient
                options: {n_rxq="4", n_txq="4", vhost-server-path="/var/lib/vhost_sockets/vhuc8febeff-56"}

>
>
>> Just as a curiosity, I see you have a configured MTU of 1500B on the
>> physical interfaces. Is that the same MTU you're using inside the VM?
>> And are you using the same configurations (including that 1500B MTU)
>> when running your OvS-Kernel setup?
>>
>
> The MTU inside the VM is 1450. Is that OK for high throughput?
>

Inside the VM the MTU is now 1500; the dpdk physical NIC (OvS-Kernel) MTU is 9000
now. Bandwidth is ~5.1Gbps now.

>
>> Hope this helps,
>>
>> Tiago.
>>
>> [1]
>> http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-assigment-to-pmd-threads
>>
>> >     Tiago.
>> >
>> > > So I'd like to find some help from the community. Maybe I'm missing some
>> > > configurations.
>> > >
>> > > Thanks.
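(A note that may help with the single-queue observation: a single iperf3 TCP
stream is hashed by the NIC's RSS to one rx queue, and in OVS-DPDK the vhost tx
queue is, as far as I know, selected per PMD thread, so one flow will tend to
stay on one virtio queue regardless of how many queues are configured. A quick
sketch of how to confirm multiqueue is actually being exercised, assuming the
guest interface is named eth0 and <server-ip> is the iperf3 server, both of
which are placeholders:)

# Inside the guest: see how many combined queues the virtio-net driver enabled.
ethtool -l eth0
# Enable all 8 queues if fewer are currently in use.
ethtool -L eth0 combined 8
# Run several parallel streams so the load can spread across queues and PMDs.
iperf3 -c <server-ip> -P 8 -t 60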
>> > >
>> > >
>> > > Here is the list of the environment and some configurations:
>> > > # uname -r
>> > > 3.10.0-862.11.6.el7.x86_64
>> > > # rpm -qa|grep dpdk
>> > > dpdk-17.11-11.el7.x86_64
>> > > # rpm -qa|grep openvswitch
>> > > openvswitch-2.9.0-3.el7.x86_64
>> > > # ovs-vsctl list open_vswitch
>> > > _uuid               : a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
>> > > bridges             : [531e4bea-ce12-402a-8a07-7074c31b978e, 5c1675e2-5408-4c1f-88bc-6d9c9b932d47]
>> > > cur_cfg             : 1305
>> > > datapath_types      : [netdev, system]
>> > > db_version          : "7.15.1"
>> > > external_ids        : {hostname="cq01-compute-10e112e5e140", rundir="/var/run/openvswitch", system-id="e2cc84fe-a3c8-455f-8c64-260741c141ee"}
>> > > iface_types         : [dpdk, dpdkr, dpdkvhostuser, dpdkvhostuserclient, geneve, gre, internal, lisp, patch, stt, system, tap, vxlan]
>> > > manager_options     : [43803994-272b-49cb-accc-ab672d1eefc8]
>> > > next_cfg            : 1305
>> > > other_config        : {dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x100000", vhost-iommu-support="true"}
>> > > ovs_version         : "2.9.0"
>> > > ssl                 : []
>> > > statistics          : {}
>> > > system_type         : centos
>> > > system_version      : "7"
>> > > # lsmod |grep vfio
>> > > vfio_pci               41312  2
>> > > vfio_iommu_type1       22300  1
>> > > vfio                   32695  7 vfio_iommu_type1,vfio_pci
>> > > irqbypass              13503  23 kvm,vfio_pci
>> > >
>> > > # ovs-appctl dpif/show
>> > > netdev@ovs-netdev: hit:759366335 missed:754283
>> > >   br-ex:
>> > >     bond1108 4/6: (tap)
>> > >     br-ex 65534/3: (tap)
>> > >     nic-10G-1 5/4: (dpdk: configured_rx_queues=8, configured_rxq_descriptors=2048, configured_tx_queues=2, configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8, requested_rxq_descriptors=2048, requested_tx_queues=2, requested_txq_descriptors=2048, rx_csum_offload=true)
>> > >     nic-10G-2 6/5: (dpdk: configured_rx_queues=8, configured_rxq_descriptors=2048, configured_tx_queues=2, configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8, requested_rxq_descriptors=2048, requested_tx_queues=2, requested_txq_descriptors=2048, rx_csum_offload=true)
>> > >     phy-br-ex 3/none: (patch: peer=int-br-ex)
>> > >   br-int:
>> > >     br-int 65534/2: (tap)
>> > >     int-br-ex 1/none: (patch: peer=phy-br-ex)
>> > >     vhu76f9a623-9f 2/1: (dpdkvhostuserclient: configured_rx_queues=8, configured_tx_queues=8, mtu=1500, requested_rx_queues=8, requested_tx_queues=8)
>> > >
>> > > # ovs-appctl dpctl/show -s
>> > > netdev@ovs-netdev:
>> > >   lookups: hit:759366335 missed:754283 lost:72
>> > >   flows: 186
>> > >   port 0: ovs-netdev (tap)
>> > >     RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> > >     TX packets:0 errors:0 dropped:0 aborted:0 carrier:0
>> > >     collisions:0
>> > >     RX bytes:0  TX bytes:0
>> > >   port 1: vhu76f9a623-9f (dpdkvhostuserclient: configured_rx_queues=8, configured_tx_queues=8, mtu=1500, requested_rx_queues=8, requested_tx_queues=8)
>> > >     RX packets:718391758 errors:0 dropped:0 overruns:? frame:?
>> > >     TX packets:30372410 errors:? dropped:719200 aborted:? carrier:?
>> > >     collisions:?
>> > >     RX bytes:1086995317051 (1012.3 GiB)  TX bytes:2024893540 (1.9 GiB)
>> > >   port 2: br-int (tap)
>> > >     RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> > >     TX packets:1393992 errors:0 dropped:4 aborted:0 carrier:0
>> > >     collisions:0
>> > >     RX bytes:0  TX bytes:2113616736 (2.0 GiB)
>> > >   port 3: br-ex (tap)
>> > >     RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> > >     TX packets:6660091 errors:0 dropped:967 aborted:0 carrier:0
>> > >     collisions:0
>> > >     RX bytes:0  TX bytes:2451440870 (2.3 GiB)
>> > >   port 4: nic-10G-1 (dpdk: configured_rx_queues=8, configured_rxq_descriptors=2048, configured_tx_queues=2, configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8, requested_rxq_descriptors=2048, requested_tx_queues=2, requested_txq_descriptors=2048, rx_csum_offload=true)
>> > >     RX packets:36409466 errors:0 dropped:0 overruns:? frame:?
>> > >     TX packets:718371472 errors:0 dropped:20276 aborted:? carrier:?
>> > >     collisions:?
>> > >     RX bytes:2541593983 (2.4 GiB)  TX bytes:1089838136919 (1015.0 GiB)
>> > >   port 5: nic-10G-2 (dpdk: configured_rx_queues=8, configured_rxq_descriptors=2048, configured_tx_queues=2, configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8, requested_rxq_descriptors=2048, requested_tx_queues=2, requested_txq_descriptors=2048, rx_csum_offload=true)
>> > >     RX packets:5319466 errors:0 dropped:0 overruns:? frame:?
>> > >     TX packets:0 errors:0 dropped:0 aborted:? carrier:?
>> > >     collisions:?
>> > >     RX bytes:344903551 (328.9 MiB)  TX bytes:0
>> > >   port 6: bond1108 (tap)
>> > >     RX packets:228 errors:0 dropped:0 overruns:0 frame:0
>> > >     TX packets:5460 errors:0 dropped:18 aborted:0 carrier:0
>> > >     collisions:0
>> > >     RX bytes:21459 (21.0 KiB)  TX bytes:341087 (333.1 KiB)
>> > >
>> > > # ovs-appctl dpif-netdev/pmd-stats-show
>> > > pmd thread numa_id 0 core_id 20:
>> > >   packets received: 760120690
>> > >   packet recirculations: 0
>> > >   avg. datapath passes per packet: 1.00
>> > >   emc hits: 750787577
>> > >   megaflow hits: 8578758
>> > >   avg. subtable lookups per megaflow hit: 1.05
>> > >   miss with success upcall: 754283
>> > >   miss with failed upcall: 72
>> > >   avg. packets per output batch: 2.21
>> > >   idle cycles: 210648140144730 (99.13%)
>> > >   processing cycles: 1846745927216 (0.87%)
>> > >   avg cycles per packet: 279554.14 (212494886071946/760120690)
>> > >   avg processing cycles per packet: 2429.54 (1846745927216/760120690)
>> > > main thread:
>> > >   packets received: 0
>> > >   packet recirculations: 0
>> > >   avg. datapath passes per packet: 0.00
>> > >   emc hits: 0
>> > >   megaflow hits: 0
>> > >   avg. subtable lookups per megaflow hit: 0.00
>> > >   miss with success upcall: 0
>> > >   miss with failed upcall: 0
>> > >   avg. packets per output batch: 0.00
>> > >
>> > > # ovs-appctl dpif-netdev/pmd-rxq-show
>> > > pmd thread numa_id 0 core_id 20:
>> > >   isolated : false
>> > >   port: nic-10G-1         queue-id:  0  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  1  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  2  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  3  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  4  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  5  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  6  pmd usage:  0 %
>> > >   port: nic-10G-1         queue-id:  7  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  0  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  1  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  2  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  3  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  4  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  5  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  6  pmd usage:  0 %
>> > >   port: nic-10G-2         queue-id:  7  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  0  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  1  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  2  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  3  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  4  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  5  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  6  pmd usage:  0 %
>> > >   port: vhu76f9a623-9f    queue-id:  7  pmd usage:  0 %
>> > >
>> > >
>> > > # virsh dumpxml instance-5c5191ff-c1a2-4429-9a8b-93ddd939583d
>> > > ...
>> > >     <interface type='vhostuser'>
>> > >       <mac address='fa:16:3e:77:ab:fb'/>
>> > >       <source type='unix' path='/var/lib/vhost_sockets/vhu76f9a623-9f' mode='server'/>
>> > >       <target dev='vhu76f9a623-9f'/>
>> > >       <model type='virtio'/>
>> > >       <driver name='vhost' queues='8'/>
>> > >       <alias name='net0'/>
>> > >       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>> > >     </interface>
>> > > ...
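(One observation on the output above: the "pmd usage" figures all read 0 %, which
suggests the counters were sampled while the link was idle. A small sketch of how
to capture the per-PMD and per-rxq load during a run, using only standard
ovs-appctl commands: clear the counters first, start the iperf3 test, then sample
while traffic is flowing:)

# Reset the PMD counters before starting the traffic.
ovs-appctl dpif-netdev/pmd-stats-clear
# While iperf3 is running, check how the load is spread across PMDs and rx queues.
ovs-appctl dpif-netdev/pmd-rxq-show
ovs-appctl dpif-netdev/pmd-stats-show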
>> > >
>> > > # ovs-vsctl show
>> > > a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
>> > >     Manager "ptcp:6640:127.0.0.1"
>> > >         is_connected: true
>> > >     Bridge br-int
>> > >         Controller "tcp:127.0.0.1:6633"
>> > >             is_connected: true
>> > >         fail_mode: secure
>> > >         Port int-br-ex
>> > >             Interface int-br-ex
>> > >                 type: patch
>> > >                 options: {peer=phy-br-ex}
>> > >         Port br-int
>> > >             Interface br-int
>> > >                 type: internal
>> > >         Port "vhu76f9a623-9f"
>> > >             tag: 1
>> > >             Interface "vhu76f9a623-9f"
>> > >                 type: dpdkvhostuserclient
>> > >                 options: {n_rxq="8", vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}
>> > >     Bridge br-ex
>> > >         Controller "tcp:127.0.0.1:6633"
>> > >             is_connected: true
>> > >         fail_mode: secure
>> > >         Port dpdkbond
>> > >             Interface "nic-10G-1"
>> > >                 type: dpdk
>> > >                 options: {dpdk-devargs="0000:01:00.0", n_rxq="8", n_txq="8"}
>> > >             Interface "nic-10G-2"
>> > >                 type: dpdk
>> > >                 options: {dpdk-devargs="0000:05:00.1", n_rxq="8", n_txq="8"}
>> > >         Port phy-br-ex
>> > >             Interface phy-br-ex
>> > >                 type: patch
>> > >                 options: {peer=int-br-ex}
>> > >         Port br-ex
>> > >             Interface br-ex
>> > >                 type: internal
>> > >
>> > > # numactl --hardware
>> > > available: 2 nodes (0-1)
>> > > node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
>> > > node 0 size: 130978 MB
>> > > node 0 free: 7539 MB
>> > > node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
>> > > node 1 size: 131072 MB
>> > > node 1 free: 6886 MB
>> > > node distances:
>> > > node   0   1
>> > >   0:  10  21
>> > >   1:  21  10
>> > >
>> > > # grep HugePages_ /proc/meminfo
>> > > HugePages_Total:     232
>> > > HugePages_Free:       10
>> > > HugePages_Rsvd:        0
>> > > HugePages_Surp:        0
>> > >
>> > > # cat /proc/cmdline
>> > > BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=220ee106-5e00-4809-91a0-641e045a4c21 ro intel_idle.max_cstate=0 crashkernel=auto rhgb quiet default_hugepagesz=1G hugepagesz=1G hugepages=232 iommu=pt intel_iommu=on
>> > >
>> > >
>> > > Best regards,
>> > > LIU Yulong
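(Given the NUMA layout above, with the even-numbered cores on node 0 and the odd
ones on node 1, it is also worth a sanity check that the PMD cores, the dpdk
lcore and the NICs all sit on the same node. A minimal sketch of that check, with
the example mask value purely illustrative:)

# Check which NUMA node each 10G NIC is attached to.
cat /sys/bus/pci/devices/0000:01:00.0/numa_node
cat /sys/bus/pci/devices/0000:05:00.1/numa_node
# If both report node 0, keep the PMD cores on node 0 as well, e.g. cores 2 and 4.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14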
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
