Will respin a v2 of the patch addressing your comments, so please let's continue our discussion on v2.
Thanks,
Antonio

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Fischetti, Antonio
> Sent: Thursday, January 12, 2017 5:40 PM
> To: Ilya Maximets <[email protected]>; Bodireddy, Bhanuprakash <[email protected]>; Aaron Conole <[email protected]>
> Cc: [email protected]; Daniele Di Proietto <[email protected]>
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
>
> Hi Ilya,
> thanks for your detailed feedback. I've added a couple of replies
> inline about when the instant send happens and the may_steal param.
>
> Regards,
> Antonio
>
> > -----Original Message-----
> > From: Ilya Maximets [mailto:[email protected]]
> > Sent: Tuesday, December 20, 2016 12:47 PM
> > To: Bodireddy, Bhanuprakash <[email protected]>; Aaron Conole <[email protected]>
> > Cc: [email protected]; Daniele Di Proietto <[email protected]>; Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio <[email protected]>; Markus Magnusson <[email protected]>
> > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
> >
> > On 20.12.2016 15:19, Bodireddy, Bhanuprakash wrote:
> > >> -----Original Message-----
> > >> From: Ilya Maximets [mailto:[email protected]]
> > >> Sent: Tuesday, December 20, 2016 8:09 AM
> > >> To: Bodireddy, Bhanuprakash <[email protected]>; Aaron Conole <[email protected]>
> > >> Cc: [email protected]; Daniele Di Proietto <[email protected]>; Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio <[email protected]>; Markus Magnusson <[email protected]>
> > >> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
> > >>
> > >> On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
> > >>> Thanks Ilya and Aaron for reviewing this patch and providing your
> > >>> comments, my reply inline.
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ilya Maximets [mailto:[email protected]]
> > >>>> Sent: Monday, December 19, 2016 8:41 AM
> > >>>> To: Aaron Conole <[email protected]>; Bodireddy, Bhanuprakash <[email protected]>
> > >>>> Cc: [email protected]; Daniele Di Proietto <[email protected]>; Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio <[email protected]>; Markus Magnusson <[email protected]>
> > >>>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I didn't test this patch yet. Bhanu, could you please describe your
> > >>>> test scenario and performance results in more detail.
> > >>>
> > >>> During the recent performance analysis for the classifier improvements, we
> > >>> found that a bottleneck was also observed at flow batching.
> > >>> This was due to fewer packets per batch. To reproduce this, a simple P2P
> > >>> test case can be used with 30 IXIA streams and matching IP flow rules.
> > >>> E.g., for an IXIA stream with src IP 2.2.2.1 and dst IP 5.5.5.1:
> > >>>   ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
> > >>>
> > >>> For an IXIA stream with src IP 4.4.4.1 and dst IP 15.15.15.1:
> > >>>   ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
> > >>>
> > >>> This leaves fewer packets in each batch, and packet_batch_per_flow_execute()
> > >>> is invoked for every batch.
> > >>> With 30 flows, I see the throughput drop to ~7.0 Mpps in the PHY2PHY case
> > >>> for 64-byte UDP packets.
> > >>>
> > >>>> It'll be nice if you provide throughput and latency measurement
> > >>>> results for different scenarios and packet sizes. Latency is important here.
> > >>> We are yet to do latency measurements in this case.
> > >>> With 30 IXIA
> > >>> streams comprising 64-byte UDP packets there was a throughput
> > >>> improvement of 30% in the P2P case and 13-15% in the PVP case (single queue).
> > >>> We will try to get the latency stats with and without this patch.
> > >>>
> > >>>> About the patch itself:
> > >>>>
> > >>>> 1. 'drain' called only for PMD threads. This will lead to
> > >>>>    broken connections with non-PMD ports.
> > >>> I see that non-PMD ports are handled by the vswitchd thread.
> > >>> Tested the PVP loopback case with tap ports and found it to be working as
> > >>> expected. Can you let me know the specific case you are referring to here
> > >>> so that I can verify whether the patch breaks it.
> > >>
> > >> I meant something like this:
> > >>
> > >> *---------HOST-1(br-int)----------*     *---------HOST-2(br-int)----------*
> > >> |                                 |     |                                 |
> > >> | internal_port <--> dpdk0 <-----------------> dpdk0 <--> internal_port  |
> > >> | 192.168.0.1/24                  |     |                 192.168.0.2/24 |
> > >> *---------------------------------*     *---------------------------------*
> > >>
> > >> (HOST-1)# ping 192.168.0.2
> > >>
> > >> In this case I'm expecting that the first (NETDEV_MAX_BURST - 1) ICMP
> > >> packets will be stuck in the TX queue. The next packet will cause sending of
> > >> the whole batch of NETDEV_MAX_BURST ICMP packets. That is not good behaviour.
> > >
> > > Thanks for the test case. I tested this and found it to be working with the
> > > patch. The reason being, PMD threads use the intermediate queue implementation
> > > by invoking 'netdev_dpdk_eth_tx_queue()', whereas the non-PMD thread (vswitchd)
> > > continues to call netdev_dpdk_eth_tx_burst() in the patch, so there is no
> > > change from the previous behavior here.
> > >
> > > I had set up a case similar to the one in the above diagram using 2
> > > hosts and assigning IP addresses to the respective bridges.
> > > When I ping Host 2 (192.168.0.2) from Host 1 (192.168.0.1), below is the call path.
> > >
> > > (HOST-1)# ping 192.168.0.2 -c 1     (send only one packet)
> > >
> > > When sending the ICMP echo request packet:
> > > Vswitchd thread:  dp_execute_cb()
> > >                     netdev_send()               [pkt sent on dpdk0, qid: 0]
> > >                       netdev_dpdk_send__()
> > >                         dpdk_do_tx_copy()
> > >                           netdev_dpdk_eth_tx_burst()
> > >
> > > On receiving the ICMP echo reply packet:
> > > PMD thread:  dp_execute_cb()
> > >                netdev_send()                    [pkt sent on br0, qid: 1]
> > >                  netdev_linux_send()
> >
> > OK. Maybe in this exact case it works, but it isn't guaranteed that non-PMD
> > threads will always call send with 'may_steal=false'.
>
> [Antonio F] In my test case - details below - the non-PMD thread
> is calling send with 'may_steal=true' and the packet is sent instantly.
>
> Regardless of the may_steal value, the netdev_dpdk_send__() function
> calls dpdk_do_tx_copy() for non-PMD threads because the condition
>     (batch->packets[0]->source != DPBUF_DPDK)
> is true, as the buffer data is not from DPDK-allocated memory.
>
> In my test case I have one host with
>  - 2 dpdk ports, dpdk0 and dpdk1,
>  - an IP address assigned to the internal port 'br0', and
>  - 1 single flow set as
>      ovs-ofctl add-flow br0 actions=NORMAL
>
> When an ARP request is sent from an IXIA generator to dpdk0:
>  1. the PMD thread forwards it to the internal br0 with may_steal=0,
>     via netdev_linux_send().
>  2. the PMD thread also forwards the ARP request to dpdk1 with may_steal=1.
>     As the buffer data is from DPDK-allocated memory,
>         if (OVS_UNLIKELY(!may_steal ||
>                          batch->packets[0]->source != DPBUF_DPDK))
>     is false => the ARP request is put into the queue.
>
> When an ARP reply is received from br0:
>  1. the non-PMD thread forwards it to dpdk0 with may_steal=1.
>     As the buffer data is NOT from DPDK-allocated memory, the following
>         if (OVS_UNLIKELY(!may_steal ||
>                          batch->packets[0]->source != DPBUF_DPDK))
>     is true => dpdk_do_tx_copy() is called and the packet is sent out
>     immediately.
> >
> > Also, I see another issue here:
> >
> > 8. netdev_send(.., may_steal=true, ...) uses instant send while
> >    netdev_send(.., may_steal=false, ...) uses queueing of packets.
> >    This behaviour will lead to packet reordering issues in some
> >    cases.
>
> [Antonio F] The instant send will happen when, in netdev_dpdk_send__(), the
> condition
>
>     if (OVS_UNLIKELY(!may_steal ||
>                      batch->packets[0]->source != DPBUF_DPDK))
>
> is true => dpdk_do_tx_copy() => netdev_dpdk_eth_tx_burst() is called.
> Can you please provide more details about the packet reordering issue you
> mentioned?
>
> > 9. You're decreasing 'txq->count' inside 'netdev_dpdk_eth_tx_burst()'
> >    while it can be called with an argument different from 'txq->burst_pkts'.
> >    Some of the queued packets will be lost + memory leak.
> >
> > Best regards, Ilya Maximets.
> >
> > >>
> > >>>> 2. 'xps_get_tx_qid()' called twice. First time on send and
> > >>>>    the second time on drain. This may lead to different
> > >>>>    returned 'tx_qid's and packets will be stuck forever in
> > >>>>    the tx buffer.
> > >>>
> > >>> You are right, and xps_get_tx_qid() can return different tx_qids.
> > >>> I was always testing XPS-disabled cases and completely overlooked this
> > >>> case. This could be a potential problem for us, and any suggestions
> > >>> would be very helpful.
> > >>>
> > >>>> 3. 'txq_drain()' must take the 'tx_lock' for the queue in case
> > >>>>    of dynamic tx queues.
> > >>>
> > >>> Agree, will handle this in the next version.
> > >>>
> > >>>> 4. Waiting for 1024 polling cycles of the PMD thread may cause
> > >>>>    a huge latency if we have few packets per second on one
> > >>>>    port and intensive traffic on others.
> > >>>
> > >>> I agree with you. We discussed this here and thought invoking the
> > >>> drain logic once every 1024 cycles was an optimal solution.
> > >>> But if the
> > >>> community thinks otherwise, we can move this into the main 'for' loop
> > >>> so that it can be invoked more often, provided that 'DRAIN_TSC' cycles
> > >>> have elapsed.
> > >>>
> > >>>> 5. This patch breaks the counting of 'tx_dropped' packets
> > >>>>    in netdev-dpdk.
> > >>>
> > >>> I presume you are referring to the below line in the code.
> > >>>
> > >>> -    dropped += netdev_dpdk_eth_tx_burst()
> > >>> +    netdev_dpdk_eth_tx_queue()
> > >>>
> > >>> Will handle this in v2 of this patch.
> > >>>
> > >>>> 6. Comments in netdev-provider.h should be fixed to reflect
> > >>>>    all the changes.
> > >>>
> > >>> Will make the necessary changes in netdev-provider.h.
> > >>>
> > >>>> 7. At last, I agree with Aaron that explicitly allowing only
> > >>>>    'dpdk' ports is not good style. Also, mentioning the name of
> > >>>>    an exact netdev inside the common code is bad style too.
> > >>>
> > >>> Will handle this appropriately.
> > >>>
> > >>> Regards,
> > >>> Bhanu Prakash.
> > >>>
> > >>>> Best regards, Ilya Maximets.
> > >>>>
> > >>>> On 16.12.2016 22:24, Aaron Conole wrote:
> > >>>>> Hi Bhanu,
> > >>>>>
> > >>>>> Bhanuprakash Bodireddy <[email protected]> writes:
> > >>>>>
> > >>>>>> In exact match cache processing, on an EMC hit, packets are queued
> > >>>>>> into batches matching the flow. Thereafter, packets are processed
> > >>>>>> in batches for faster packet processing. This is particularly
> > >>>>>> inefficient if there are fewer packets in a batch, as
> > >>>>>> rte_eth_tx_burst() incurs an expensive MMIO write.
> > >>>>>>
> > >>>>>> This commit adds back the intermediate queue implementation. Packets
> > >>>>>> are queued and burst when the packet count >= NETDEV_MAX_BURST.
> > >>>>>> Also, the drain logic is refactored to handle fewer packets in the tx
> > >>>>>> queues. Testing shows significant performance gains with queueing.
> > >>>>>>
> > >>>>>> Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
> > >>>>>> queueing of packets.")
> > >>>>>> Signed-off-by: Bhanuprakash Bodireddy <[email protected]>
> > >>>>>> Signed-off-by: Antonio Fischetti <[email protected]>
> > >>>>>> Co-authored-by: Antonio Fischetti <[email protected]>
> > >>>>>> Signed-off-by: Markus Magnusson <[email protected]>
> > >>>>>> Co-authored-by: Markus Magnusson <[email protected]>
> > >>>>>> ---
> > >>>>>
> > >>>>> I've Cc'd Ilya just in hopes that the patch gets a better review
> > >>>>> than I could give. As a general comment, I like the direction -
> > >>>>> batched operations are usually a better way of going.
> > >>>>>
> > >>>>> Just a minor note below.
> > >>>>>
> > >>>>> ... snip ...
> > >>>>>>
> > >>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > >>>>>> index 3509493..65dff83 100644
> > >>>>>> --- a/lib/dpif-netdev.c
> > >>>>>> +++ b/lib/dpif-netdev.c
> > >>>>>> @@ -622,6 +622,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
> > >>>>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
> > >>>>>>  static void emc_clear_entry(struct emc_entry *ce);
> > >>>>>>
> > >>>>>> +static struct tx_port *pmd_send_port_cache_lookup
> > >>>>>> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> > >>>>>> +
> > >>>>>>  static void
> > >>>>>>  emc_cache_init(struct emc_cache *flow_cache)
> > >>>>>>  {
> > >>>>>> @@ -2877,6 +2880,31 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> > >>>>>>  }
> > >>>>>>
> > >>>>>>  static void
> > >>>>>> +dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
> > >>>>>> +                         struct dp_netdev_port *port,
> > >>>>>> +                         uint64_t now)
> > >>>>>> +{
> > >>>>>> +    int tx_qid;
> > >>>>>> +
> > >>>>>> +    if (!strcmp(port->type, "dpdk")) {
> > >>>>>
> > >>>>> Any reason to restrict this only to dpdk ports?
> > >>>>> It looks like
> > >>>>> you've added a new netdev operation, so why not just call
> > >>>>> netdev_txq_drain unconditionally?
> > >>>>>
> > >>>>> Also, bit of a nit, but tx_qid can be reduced in scope down to the
> > >>>>> if block below.
> > >>>>>
> > >>>>>> +        struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
> > >>>>>> +                                 u32_to_odp(port->port_no));
> > >>>>>> +
> > >>>>>> +        if (OVS_LIKELY(tx)) {
> > >>>>>> +            bool dynamic_txqs = tx->port->dynamic_txqs;
> > >>>>>> +
> > >>>>>> +            if (dynamic_txqs) {
> > >>>>>> +                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
> > >>>>>> +            } else {
> > >>>>>> +                tx_qid = pmd->static_tx_qid;
> > >>>>>> +            }
> > >>>>>> +
> > >>>>>> +            netdev_txq_drain(port->netdev, tx_qid);
> > >>>>>> +        }
> > >>>>>> +    }
> > >>>>>> +}
> > >>>>>> +
> > >>>>>
> > >>>
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
