Hi Eelco,

Please find my comments inline.
>Hi Bhanuprakash,
>
>I was doing some Physical to Virtual tests, and whenever the number of
>flows reached the rx batch size, performance dropped a lot. I created an
>experimental patch where I added an intermediate queue and flushed it at
>the end of the rx batch.
>
>When I found your patch I decided to give it a try to see how it behaves.
>I also modified your patch in such a way that it flushes the queue after
>every call to dp_netdev_process_rxq_port().

I presume you were doing something like the below in the pmd_thread_main()
receive loop?

    for (i = 0; i < poll_cnt; i++) {
        dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
                                   poll_list[i].port_no);
        dp_netdev_drain_txq_ports(pmd);
    }

>Here are some packet forwarding stats for the Physical to Physical
>scenario, for two 82599ES 10G ports with 64 byte packets being sent at
>wire speed:
>
>Number    plain                 patch +
>of flows  git clone  patch      flush
>========  =========  =========  =========
>      10   10727283   13527752   13393844
>      32    7042253   11285572   11228799
>      50    7515491    9642650    9607791
>     100    5838699    9461239    9430730
>     500    5285066    7859123    7845807
>    1000    5226477    7146404    7135601

Thanks for sharing the numbers. I agree with your findings; I saw very
similar results with our v3 patch. In any case, we see a significant
throughput improvement with the patch.

>I do not have an IXIA to do the latency tests you performed; however, I
>do have a XENA tester which has a basic latency measurement feature.
>I used the following script to get the latency numbers:
>
>https://github.com/chaudron/XenaPythonLib/blob/latency/examples/latency.py

Thanks for pointing this out; it could be useful for users without an
IXIA setup.

>As you can see in the numbers below, the default queue introduces quite
>some latency; however, doing the flush every rx batch brings the latency
>down to almost the original values. The results mimic your test case 2,
>sending 10G traffic @ wire speed:
>
>===== GIT CLONE
>Pkt size  min(ns)  avg(ns)  max(ns)
>     512    4,631    5,022  309,914
>    1024    5,545    5,749  104,294
>    1280    5,978    6,159   45,306
>    1518    6,419    6,774  946,850
>
>===== PATCH
>Pkt size  min(ns)  avg(ns)    max(ns)
>     512    4,928  492,228  1,995,026
>    1024    5,761  499,206  2,006,628
>    1280    6,186  497,975  1,986,175
>    1518    6,579  494,434  2,005,947
>
>===== PATCH + FLUSH
>Pkt size  min(ns)  avg(ns)  max(ns)
>     512    4,711    5,064  182,477
>    1024    5,601    5,888  701,654
>    1280    6,018    6,491  533,037
>    1518    6,467    6,734  312,471

The latency numbers above are very encouraging indeed. However, with
RFC2544 tests, especially on IXIA, there are a lot of parameters to tune.
I see that the latency stats fluctuate a lot with changes in the
acceptable 'Frame Loss'. I am not an IXIA expert myself, but I am trying
to figure out acceptable settings for measuring latency/throughput.

>Maybe it would be good to re-run your latency tests with the flush for
>every rx batch. This might get rid of the huge latency while still
>increasing the performance in the case where the rx batch shares the
>same egress port.
>
>The overall patchset looks fine to me, see some comments inline.

Thanks for reviewing the patch.

>> +#define MAX_LOOP_TO_DRAIN 128
>Is defining this inline ok?

I see that this convention is used elsewhere in OVS.
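For readers without the full patch in front of them, the constant just
bounds the number of tx_burst retries when draining the intermediate
queue. A minimal sketch of that shape, assuming the queued packets sit in
a plain mbuf array (drain_txq() and its 'pkts'/'count' arguments are
illustrative names, not the actual code from the patch):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define MAX_LOOP_TO_DRAIN 128

    /* Flush up to 'count' queued mbufs on (port_id, qid), retrying at
     * most MAX_LOOP_TO_DRAIN times so a stalled NIC queue cannot block
     * the PMD thread forever. */
    static void
    drain_txq(uint16_t port_id, uint16_t qid,
              struct rte_mbuf **pkts, uint16_t count)
    {
        uint16_t sent = 0;
        int loops = 0;

        while (sent < count && loops++ < MAX_LOOP_TO_DRAIN) {
            sent += rte_eth_tx_burst(port_id, qid, pkts + sent,
                                     count - sent);
        }

        /* Whatever is still unsent after the retry budget is dropped. */
        while (sent < count) {
            rte_pktmbuf_free(pkts[sent++]);
        }
    }

Bounding the retries trades a possible packet drop for the guarantee that
one congested egress queue cannot stall the whole PMD.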
>>         NULL,
>>         NULL,
>>         netdev_dpdk_vhost_reconfigure,
>> -       netdev_dpdk_vhost_rxq_recv);
>> +       netdev_dpdk_vhost_rxq_recv,
>> +       NULL);
>We need this patch even more in the vhost case, as there is an even
>bigger drop in performance when we exceed the rx batch size. I measured
>around 40% when reducing the rx batch size to 4 and using 1 vs 5 flows
>(single PMD).

Completely agree. In fact, we did a quick patch doing batching for vhost
ports as well and found a significant performance improvement (though it
is not thoroughly tested for all corner cases). We have that in our
backlog, and we will try posting it as an RFC, at least to get feedback
from the community.

-Bhanuprakash.