Hi Eelco,

Please find my comments inline.
>Hi Bhanuprakash,
>
>I was doing some Physical to Virtual tests, and whenever the number of
>flows reached the rx batch size, performance dropped a lot. I created an
>experimental patch where I added an intermediate queue and flushed it at
>the end of the rx batch.
>
>When I found your patch I decided to give it a try to see how it behaves.
>I also modified your patch in such a way that it flushes the queue after
>every call to dp_netdev_process_rxq_port().

I presume you were doing something like the below in the pmd_thread_main()
receive loop?

    for (i = 0; i < poll_cnt; i++) {
        dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
                                   poll_list[i].port_no);
        dp_netdev_drain_txq_ports(pmd);
    }

>Here are some packet forwarding stats for the Physical to Physical
>scenario, for two 82599ES 10G ports with 64 byte packets being sent at
>wire speed:
>
>Number    plain                 patch +
>of flows  git clone  patch      flush
>========  =========  =========  =========
>      10   10727283   13527752   13393844
>      32    7042253   11285572   11228799
>      50    7515491    9642650    9607791
>     100    5838699    9461239    9430730
>     500    5285066    7859123    7845807
>    1000    5226477    7146404    7135601

Thanks for sharing the numbers. I agree with your findings; I saw very
similar results with our v3 patch. In any case, we see a significant
throughput improvement with the patch.

>I do not have an IXIA to do the latency tests you performed; however, I
>do have a XENA tester which has a basic latency measurement feature.
>I used the following script to get the latency numbers:
>
>https://github.com/chaudron/XenaPythonLib/blob/latency/examples/latency.py

Thanks for pointing this out; it could be useful for users without an
IXIA setup.

>As you can see in the numbers below, the default queue introduces quite
>some latency; however, doing the flush every rx batch brings the latency
>down to almost the original values. The results mimic your test case 2,
>sending 10G traffic @ wire speed:
>
>===== GIT CLONE
>Pkt size  min(ns)  avg(ns)  max(ns)
>     512    4,631    5,022  309,914
>    1024    5,545    5,749  104,294
>    1280    5,978    6,159   45,306
>    1518    6,419    6,774  946,850
>
>===== PATCH
>Pkt size  min(ns)  avg(ns)    max(ns)
>     512    4,928  492,228  1,995,026
>    1024    5,761  499,206  2,006,628
>    1280    6,186  497,975  1,986,175
>    1518    6,579  494,434  2,005,947
>
>===== PATCH + FLUSH
>Pkt size  min(ns)  avg(ns)  max(ns)
>     512    4,711    5,064  182,477
>    1024    5,601    5,888  701,654
>    1280    6,018    6,491  533,037
>    1518    6,467    6,734  312,471

The latency numbers above are very encouraging indeed. However, with
RFC2544 tests, especially on IXIA, there are a lot of parameters to tune.
I see that the latency stats fluctuate a lot with changes in the
acceptable 'Frame Loss'. I am not an IXIA expert myself, but I am trying
to figure out acceptable settings for measuring latency/throughput.

>Maybe it would be good to re-run your latency tests with the flush for
>every rx batch. This might get rid of the huge latency while still
>increasing the performance in the case where the rx batch shares the
>same egress port.
>
>The overall patchset looks fine to me, see some comments inline.

Thanks for reviewing the patch.

>> +#define MAX_LOOP_TO_DRAIN 128
>Is defining this inline ok?

I see that this convention is used elsewhere in OVS.
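For readers without the full patch in front of them, the constant just
bounds the number of tx_burst retries when draining the intermediate
queue. A minimal sketch of that shape, assuming the queued packets sit in
a plain mbuf array (drain_txq() and its 'pkts'/'count' arguments are
illustrative names, not the actual code from the patch):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define MAX_LOOP_TO_DRAIN 128

    /* Flush up to 'count' queued mbufs on (port_id, qid), retrying at
     * most MAX_LOOP_TO_DRAIN times so a stalled NIC queue cannot block
     * the PMD thread forever. */
    static void
    drain_txq(uint16_t port_id, uint16_t qid,
              struct rte_mbuf **pkts, uint16_t count)
    {
        uint16_t sent = 0;
        int loops = 0;

        while (sent < count && loops++ < MAX_LOOP_TO_DRAIN) {
            sent += rte_eth_tx_burst(port_id, qid, pkts + sent,
                                     count - sent);
        }

        /* Whatever is still unsent after the retry budget is dropped. */
        while (sent < count) {
            rte_pktmbuf_free(pkts[sent++]);
        }
    }

Bounding the retries trades a possible packet drop for the guarantee that
one congested egress queue cannot stall the whole PMD.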
>>         NULL,
>>         NULL,
>>         netdev_dpdk_vhost_reconfigure,
>> -       netdev_dpdk_vhost_rxq_recv);
>> +       netdev_dpdk_vhost_rxq_recv,
>> +       NULL);
>We need this patch even more in the vhost case, as there is an even
>bigger drop in performance when we exceed the rx batch size. I measured
>around 40% when reducing the rx batch size to 4 and using 1 vs 5 flows
>(single PMD).

Completely agree. In fact, we did a quick patch doing batching for vhost
ports as well and found a significant performance improvement (though it
is not thoroughly tested for all corner cases). We have that in our
backlog, and we will try posting it as an RFC, at least to get feedback
from the community.

-Bhanuprakash.