On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
> Thanks Ilya and Aaron for reviewing this patch and providing your comments;
> my replies are inline.
>
>> -----Original Message-----
>> From: Ilya Maximets [mailto:[email protected]]
>> Sent: Monday, December 19, 2016 8:41 AM
>> To: Aaron Conole <[email protected]>; Bodireddy, Bhanuprakash
>> <[email protected]>
>> Cc: [email protected]; Daniele Di Proietto <[email protected]>;
>> Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio
>> <[email protected]>; Markus Magnusson
>> <[email protected]>
>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during
>> packet transmission.
>>
>> Hi,
>>
>> I didn't test this patch yet. Bhanu, could you please describe your test
>> scenario and performance results in more detail?
>
> During the recent performance analysis improvements for the classifier, we
> found that a bottleneck was also observed at flow batching. This was due to
> fewer packets per batch. To reproduce this, a simple P2P test case can be
> used with 30 IXIA streams and matching IP flow rules.
>
> E.g., for an IXIA stream with src IP 2.2.2.1 and dst IP 5.5.5.1:
>   ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
>
> For an IXIA stream with src IP 4.4.4.1 and dst IP 15.15.15.1:
>   ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
>
> This leaves fewer packets in batches, and packet_batch_per_flow_execute()
> is invoked for every batch. With 30 flows, I see the throughput drop to
> ~7.0 Mpps in the PHY2PHY case for 64-byte UDP packets.
>
>> It would be nice if you provided throughput and latency measurement results
>> for different scenarios and packet sizes. Latency is important here.
>
> We are yet to do latency measurements for this case. With 30 IXIA streams of
> 64-byte UDP packets, there was a throughput improvement of 30% in the P2P
> case and 13-15% in the PVP case (single queue).
> We will try to get the latency stats with and without this patch.
>
>> About the patch itself:
>>
>> 1. 'drain' is called only for PMD threads. This will lead to
>>    broken connections with non-PMD ports.
>
> I see that non-PMD ports are handled by the vswitchd thread. I tested the
> PVP loopback case with tap ports and found it to be working as expected.
> Can you let me know the specific case you are referring to here so that I
> can verify whether the patch breaks it?
I meant something like this:

*-----------HOST-1(br-int)-----*            *---------HOST-2(br-int)------*
|                              |            |                             |
| internal_port <--> dpdk0 <-------------------> dpdk0 <--> internal_port |
| 192.168.0.1/24               |            | 192.168.0.2/24              |
*------------------------------*            *-----------------------------*

    (HOST-1)# ping 192.168.0.2

In this case I'm expecting that the first (NETDEV_MAX_BURST - 1) ICMP packets
will be stuck in the TX queue. The next packet will cause sending of the
whole batch of NETDEV_MAX_BURST ICMP packets. That is not good behaviour.

>> 2. 'xps_get_tx_qid()' is called twice: first on send and
>>    a second time on drain. This may lead to different
>>    returned 'tx_qid's, and packets will be stuck forever in
>>    the tx buffer.
>
> You are right, and xps_get_tx_qid() can return different tx_qids. I was
> always testing XPS-disabled cases and completely overlooked this case.
> This could be a potential problem for us, and any suggestions would be
> very helpful.
>
>> 3. 'txq_drain()' must take the 'tx_lock' for the queue in the case
>>    of dynamic tx queues.
>
> Agree, will handle this in the next version.
>
>> 4. Waiting for 1024 polling cycles of the PMD thread may cause
>>    huge latency if we have few packets per second on one
>>    port and intensive traffic on others.
>
> I agree with you. We discussed this here and thought that invoking the
> drain logic once every 1024 cycles was an optimal solution. But if the
> community thinks otherwise, we can move this into the main 'for' loop so
> that it is invoked more often, provided that 'DRAIN_TSC' cycles have
> elapsed.
>
>> 5. This patch breaks the counting of 'tx_dropped' packets
>>    in netdev-dpdk.
>
> I presume you are referring to the line below in the code:
>
>     - dropped += netdev_dpdk_eth_tx_burst()
>     + netdev_dpdk_eth_tx_queue()
>
> Will handle this in v2 of this patch.
>
>> 6. Comments in netdev-provider.h should be fixed to reflect
>>    all the changes.
>
> Will make the necessary changes in netdev-provider.h.
>
>> 7.
>> At last, I agree with Aaron that explicitly allowing only
>> 'dpdk' ports is not good style. Also, mentioning the name of
>> an exact netdev inside the common code is bad style too.
>
> Will handle this appropriately.
>
> Regards,
> Bhanu Prakash.
>
>> Best regards, Ilya Maximets.
>>
>> On 16.12.2016 22:24, Aaron Conole wrote:
>>> Hi Bhanu,
>>>
>>> Bhanuprakash Bodireddy <[email protected]> writes:
>>>
>>>> In exact match cache processing, on an EMC hit, packets are queued
>>>> into batches matching the flow. Thereafter, packets are processed in
>>>> batches for faster packet processing. This is particularly
>>>> inefficient if there are fewer packets in a batch, as
>>>> rte_eth_tx_burst() incurs an expensive MMIO write.
>>>>
>>>> This commit adds back the intermediate queue implementation. Packets
>>>> are queued and burst when the packet count >= NETDEV_MAX_BURST. Also,
>>>> the drain logic is refactored to handle fewer packets in the tx
>>>> queues. Testing shows significant performance gains with queueing.
>>>>
>>>> Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
>>>> queueing of packets.")
>>>> Signed-off-by: Bhanuprakash Bodireddy <[email protected]>
>>>> Signed-off-by: Antonio Fischetti <[email protected]>
>>>> Co-authored-by: Antonio Fischetti <[email protected]>
>>>> Signed-off-by: Markus Magnusson <[email protected]>
>>>> Co-authored-by: Markus Magnusson <[email protected]>
>>>> ---
>>>
>>> I've Cc'd Ilya just in hopes that the patch gets a better review than
>>> I could give. As a general comment, I like the direction - batched
>>> operations are usually a better way of going.
>>>
>>> Just a minor note below.
>>>
>>> ... snip ...
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index 3509493..65dff83 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -622,6 +622,9 @@ static int dpif_netdev_xps_get_tx_qid(const
>>>> struct dp_netdev_pmd_thread *pmd,
>>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
>>>>  static void emc_clear_entry(struct emc_entry *ce);
>>>>
>>>> +static struct tx_port *pmd_send_port_cache_lookup
>>>> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>>>> +
>>>>  static void
>>>>  emc_cache_init(struct emc_cache *flow_cache)
>>>>  {
>>>> @@ -2877,6 +2880,31 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>>>  }
>>>>
>>>>  static void
>>>> +dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
>>>> +                         struct dp_netdev_port *port,
>>>> +                         uint64_t now)
>>>> +{
>>>> +    int tx_qid;
>>>> +
>>>> +    if (!strcmp(port->type, "dpdk")) {
>>>
>>> Any reason to restrict this only to dpdk ports? It looks like you've
>>> added a new netdev operation, so why not just call
>>> netdev_txq_drain unconditionally?
>>>
>>> Also, bit of a nit, but tx_qid can be reduced in scope down to the if
>>> block below.
>>>
>>>> +        struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
>>>> +                                 u32_to_odp(port->port_no));
>>>> +
>>>> +        if (OVS_LIKELY(tx)) {
>>>> +            bool dynamic_txqs = tx->port->dynamic_txqs;
>>>> +
>>>> +            if (dynamic_txqs) {
>>>> +                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
>>>> +            } else {
>>>> +                tx_qid = pmd->static_tx_qid;
>>>> +            }
>>>> +
>>>> +            netdev_txq_drain(port->netdev, tx_qid);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
