On 20.12.2016 15:19, Bodireddy, Bhanuprakash wrote:
>> -----Original Message-----
>> From: Ilya Maximets [mailto:[email protected]]
>> Sent: Tuesday, December 20, 2016 8:09 AM
>> To: Bodireddy, Bhanuprakash <[email protected]>; Aaron
>> Conole <[email protected]>
>> Cc: [email protected]; Daniele Di Proietto <[email protected]>;
>> Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio
>> <[email protected]>; Markus Magnusson
>> <[email protected]>
>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during
>> packet transmission.
>>
>> On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
>>> Thanks Ilya and Aaron for reviewing this patch and providing your
>>> comments; my replies are inline.
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets [mailto:[email protected]]
>>>> Sent: Monday, December 19, 2016 8:41 AM
>>>> To: Aaron Conole <[email protected]>; Bodireddy, Bhanuprakash
>>>> <[email protected]>
>>>> Cc: [email protected]; Daniele Di Proietto
>>>> <[email protected]>; Thadeu Lima de Souza Cascardo
>>>> <[email protected]>; Fischetti, Antonio
>>>> <[email protected]>; Markus Magnusson
>>>> <[email protected]>
>>>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue
>>>> during packet transmission.
>>>>
>>>> Hi,
>>>>
>>>> I didn't test this patch yet. Bhanu, could you please describe your
>>>> test scenario and performance results in more detail?
>>>
>>> During the recent performance analysis for the classifier improvements, we
>>> found that a bottleneck was also observed at flow batching.
>>> This was due to fewer packets per batch. To reproduce this, a simple P2P test
>>> case can be used with 30 IXIA streams and matching IP flow rules.
>>> E.g.: For an IXIA stream with src IP 2.2.2.1, dst IP 5.5.5.1:
>>> ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
>>>
>>> For an IXIA stream with src IP 4.4.4.1, dst IP 15.15.15.1:
>>> ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
>>>
>>> This leaves fewer packets in each batch, and packet_batch_per_flow_execute()
>>> is invoked for every batch.
>>> With 30 flows, I see throughput drop to ~7.0 Mpps in the PHY2PHY case for
>>> 64-byte UDP packets.
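>>>
>>> To make the effect concrete (rough numbers, assuming the packets of a
>>> 32-packet rx burst are spread evenly over the 30 streams): each flow
>>> batch then holds only one or two packets, so the tx path ends up issuing
>>> ~30 separate rte_eth_tx_burst() calls, each with its own MMIO doorbell
>>> write, instead of a single 32-packet burst.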
>>>
>>>> It'll be nice if you provide throughput and latency measurement
>>>> results for different scenarios and packet sizes. Latency is important
>>>> here.
>>> We are yet to do latency measurements for this case. With 30 IXIA
>>> streams of 64-byte UDP packets there was a throughput improvement of
>>> 30% in the P2P case and 13-15% in the PVP case (single queue). We
>>> will try to get the latency stats with and without this patch.
>>>
>>>>
>>>> About the patch itself:
>>>>
>>>> 1. 'drain' called only for PMD threads. This will lead to
>>>> broken connections with non-PMD ports.
>>> I see that non-PMD ports are handled by the vswitchd thread.
>>> I tested a PVP loopback case with tap ports and it works as
>>> expected. Can you let me know the specific case you are referring to here
>>> so that I can verify whether the patch breaks it?
>>
>> I meant something like this:
>>
>>
>> *-----------HOST-1(br-int)-----*       *---------HOST-2(br-int)------*
>> |                              |       |                             |
>> | internal_port <--> dpdk0 <-------------------> dpdk0 <--> internal_port
>> | 192.168.0.1/24               |       |              192.168.0.2/24 |
>> *------------------------------*       *-----------------------------*
>>
>> (HOST-1)# ping 192.168.0.2
>>
>> In this case I'm expecting that the first (NETDEV_MAX_BURST - 1) ICMP packets
>> will get stuck in the TX queue. The next packet will trigger sending the whole
>> batch of NETDEV_MAX_BURST ICMP packets. That is not good behaviour.
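>>
>> As a standalone illustration (a toy model of a size-only flush threshold,
>> not the patch code), a lone packet simply sits in the queue until
>> NETDEV_MAX_BURST - 1 more arrive:
>>
>>     #include <stdio.h>
>>
>>     #define NETDEV_MAX_BURST 32
>>
>>     struct txq {
>>         int count;
>>         int pkts[NETDEV_MAX_BURST];
>>     };
>>
>>     static void
>>     txq_enqueue(struct txq *q, int pkt)
>>     {
>>         q->pkts[q->count++] = pkt;
>>         if (q->count >= NETDEV_MAX_BURST) {   /* the only flush trigger */
>>             printf("flushed %d packets\n", q->count);
>>             q->count = 0;
>>         }
>>     }
>>
>>     int
>>     main(void)
>>     {
>>         struct txq q = { .count = 0 };
>>
>>         txq_enqueue(&q, 1);                   /* the ICMP echo request */
>>         printf("still queued: %d packet(s)\n", q.count);   /* prints 1 */
>>         return 0;
>>     }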
>
> Thanks for the test case. I tested this and it works with the patch.
> The reason is that PMD threads use the intermediate queue implementation by
> invoking 'netdev_dpdk_eth_tx_queue()',
> whereas the non-PMD thread (vswitchd) continues to call
> 'netdev_dpdk_eth_tx_burst()' in the patch, so there is no change from the
> previous behaviour here.
>
> I set up a case similar to the one in the above diagram, using 2 hosts
> and assigning IP addresses to the respective bridges.
> When I ping Host-2 (192.168.0.2) from Host-1 (192.168.0.1), below is the
> call path.
>
> (HOST-1)# ping 192.168.0.2 -c 1 (send only one packet)
>
> When sending the ICMP echo request packet:
> vswitchd thread: dp_execute_cb()
>                    netdev_send()              [pkt sent on dpdk0, qid: 0]
>                      netdev_dpdk_send__()
>                        dpdk_do_tx_copy()
>                          netdev_dpdk_eth_tx_burst()
>
> On receiving the ICMP echo reply packet:
> PMD thread: dp_execute_cb()
>               netdev_send()                   [pkt sent on br0, qid: 1]
>                 netdev_linux_send()
>
OK. Maybe in this exact case it works, but it isn't guaranteed that non-PMD
threads will always call send with 'may_steal=false'.

Also, I see a couple of other issues here:

8. netdev_send(.., may_steal=true, ...) uses instant send while
   netdev_send(.., may_steal=false, ...) uses queueing of packets.
   This behaviour will lead to packet reordering issues in some
   cases.

9. You're decreasing 'txq->count' inside 'netdev_dpdk_eth_tx_burst()'
   while it can be called with an argument different from 'txq->burst_pkts'.
   Some of the queued packets will be lost, plus a memory leak.
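
   To show what I mean with a small standalone model (not the patch code;
   the names only mimic it): a single 'count' tracks the queued buffer,
   but the burst function may also be handed a caller-local array:

       #include <stdio.h>

       #define NETDEV_MAX_BURST 32

       struct txq {
           int count;                            /* packets in burst_pkts */
           int burst_pkts[NETDEV_MAX_BURST];
       };

       /* Pretend to transmit 'cnt' packets from 'pkts'. */
       static int
       tx_burst(struct txq *q, const int *pkts, int cnt)
       {
           (void) pkts;
           q->count -= cnt;         /* wrong whenever pkts != q->burst_pkts */
           return cnt;
       }

       int
       main(void)
       {
           struct txq q = { .count = 2 };        /* two packets already queued */
           int local_pkts[4] = { 1, 2, 3, 4 };   /* e.g. from dpdk_do_tx_copy() */

           tx_burst(&q, local_pkts, 4);
           printf("txq count = %d\n", q.count);  /* -2: queued packets forgotten */
           return 0;
       }
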
Best regards, Ilya Maximets.
>
>>
>>>> 2. 'xps_get_tx_qid()' is called twice. The first time on send and
>>>> the second time on drain. This may lead to different returned
>>>> 'tx_qid's, and packets may get stuck forever in the tx buffer.
>>>
>>> You are right, and xps_get_tx_qid() can return different tx_qids.
>>> I was always testing with XPS disabled and completely overlooked this
>>> case.
>>> This could be a real problem for us, and any suggestions would be very
>>> helpful.
>>>
>>>>
>>>> 3. 'txq_drain()' must take the 'tx_lock' for the queue in the case
>>>> of dynamic tx queues.
>>>
>>> Agree, will handle this in next version.
>>>
>>>>
>>>> 4. Waiting for 1024 polling cycles of the PMD thread may cause
>>>> huge latency if we have few packets per second on one
>>>> port and intensive traffic on others.
>>>
>>> I agree with you. We discussed this here and thought that invoking the
>>> drain logic once every 1024 iterations was a reasonable trade-off. But if
>>> the community thinks otherwise, we can move this into the main 'for' loop
>>> so that it is invoked more often, provided that 'DRAIN_TSC' cycles have
>>> elapsed, as in the sketch below.
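>>>
>>> Roughly along these lines (just a sketch of the idea; 'DRAIN_TSC',
>>> 'last_drain_tsc' and 'dp_netdev_drain_txq_ports()' are illustrative
>>> names, not the actual patch code):
>>>
>>>     #define DRAIN_TSC 200000ULL            /* illustrative threshold */
>>>
>>>     /* In the pmd main loop, once per iteration: */
>>>     uint64_t now = rte_get_tsc_cycles();
>>>
>>>     if (now - last_drain_tsc > DRAIN_TSC) {
>>>         /* Flush tx queues that still hold buffered packets. */
>>>         dp_netdev_drain_txq_ports(pmd, now);
>>>         last_drain_tsc = now;
>>>     }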
>>>
>>>>
>>>> 5. This patch breaks the counting of 'tx_dropped' packets
>>>> in netdev-dpdk.
>>>
>>> I presume you are referring to the change below in the code.
>>>
>>> - dropped += netdev_dpdk_eth_tx_burst()
>>> + netdev_dpdk_eth_tx_queue()
>>>
>>> Will handle this in v2 of this patch.
>>>
>>>>
>>>> 6. Comments in netdev-provider.h should be fixed to reflect
>>>> all the changes.
>>>
>>> Will make necessary changes in netdev-provider.h
>>>
>>>>
>>>> 7. Lastly, I agree with Aaron that explicitly allowing only
>>>> 'dpdk' ports is not good style. Also, mentioning the name of an
>>>> exact netdev inside the common code is bad style too.
>>>
>>> Will handle this appropriately.
>>>
>>> Regards,
>>> Bhanu Prakash.
>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> On 16.12.2016 22:24, Aaron Conole wrote:
>>>>> Hi Bhanu,
>>>>>
>>>>> Bhanuprakash Bodireddy <[email protected]> writes:
>>>>>
>>>>>> In exact match cache processing, on an EMC hit packets are queued
>>>>>> into batches matching the flow. Thereafter, packets are processed
>>>>>> in batches for faster packet processing. This is particularly
>>>>>> inefficient if there are few packets in a batch, as
>>>>>> rte_eth_tx_burst() incurs an expensive MMIO write.
>>>>>>
>>>>>> This commit adds back the intermediate queue implementation. Packets
>>>>>> are queued and burst when the packet count >= NETDEV_MAX_BURST.
>>>>>> Also, the drain logic is refactored to handle fewer packets in the
>>>>>> tx queues. Testing shows significant performance gains with queueing.
>>>>>>
>>>>>> Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
>>>>>> queueing of packets.")
>>>>>> Signed-off-by: Bhanuprakash Bodireddy <[email protected]>
>>>>>> Signed-off-by: Antonio Fischetti <[email protected]>
>>>>>> Co-authored-by: Antonio Fischetti <[email protected]>
>>>>>> Signed-off-by: Markus Magnusson <[email protected]>
>>>>>> Co-authored-by: Markus Magnusson <[email protected]>
>>>>>> ---
>>>>>
>>>>> I've Cc'd Ilya just in hopes that the patch gets a better review
>>>>> than I could give. As a general comment, I like the direction -
>>>>> batched operations are usually a better way to go.
>>>>>
>>>>> Just a minor comment below.
>>>>>
>>>>> ... snip ...
>>>>>>
>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>> index 3509493..65dff83 100644
>>>>>> --- a/lib/dpif-netdev.c
>>>>>> +++ b/lib/dpif-netdev.c
>>>>>> @@ -622,6 +622,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
>>>>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
>>>>>>  static void emc_clear_entry(struct emc_entry *ce);
>>>>>>
>>>>>> +static struct tx_port *pmd_send_port_cache_lookup
>>>>>> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>>>>>> +
>>>>>>  static void
>>>>>>  emc_cache_init(struct emc_cache *flow_cache)
>>>>>>  {
>>>>>> @@ -2877,6 +2880,31 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>>>>>  }
>>>>>>
>>>>>>  static void
>>>>>> +dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
>>>>>> +                         struct dp_netdev_port *port,
>>>>>> +                         uint64_t now)
>>>>>> +{
>>>>>> +    int tx_qid;
>>>>>> +
>>>>>> +    if (!strcmp(port->type, "dpdk")) {
>>>>>
>>>>> Any reason to restrict this only to dpdk ports? It looks like
>>>>> you've added a new netdev operation, so why not just call
>>>>> netdev_txq_drain unconditionally?
>>>>>
>>>>> Also, bit of a nit, but tx_qid can be reduced in scope down to the
>>>>> if block below.
>>>>>
>>>>>> +        struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
>>>>>> +                                 u32_to_odp(port->port_no));
>>>>>> +
>>>>>> +        if (OVS_LIKELY(tx)) {
>>>>>> +            bool dynamic_txqs = tx->port->dynamic_txqs;
>>>>>> +
>>>>>> +            if (dynamic_txqs) {
>>>>>> +                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
>>>>>> +            } else {
>>>>>> +                tx_qid = pmd->static_tx_qid;
>>>>>> +            }
>>>>>> +
>>>>>> +            netdev_txq_drain(port->netdev, tx_qid);
>>>>>> +        }
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev