Will respin a v2 of the patch addressing your comments, so please let's continue our discussion on v2.
Thanks,
Antonio

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Fischetti, Antonio
> Sent: Thursday, January 12, 2017 5:40 PM
> To: Ilya Maximets <[email protected]>; Bodireddy, Bhanuprakash <[email protected]>; Aaron Conole <[email protected]>
> Cc: [email protected]; Daniele Di Proietto <[email protected]>
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
>
> Hi Ilya,
> thanks for your detailed feedback. I've added a couple of replies
> inline about when the instant send happens and the may_steal param.
>
> Regards,
> Antonio
>
> > -----Original Message-----
> > From: Ilya Maximets [mailto:[email protected]]
> > Sent: Tuesday, December 20, 2016 12:47 PM
> > To: Bodireddy, Bhanuprakash <[email protected]>; Aaron Conole <[email protected]>
> > Cc: [email protected]; Daniele Di Proietto <[email protected]>; Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio <[email protected]>; Markus Magnusson <[email protected]>
> > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
> >
> > On 20.12.2016 15:19, Bodireddy, Bhanuprakash wrote:
> > >> -----Original Message-----
> > >> From: Ilya Maximets [mailto:[email protected]]
> > >> Sent: Tuesday, December 20, 2016 8:09 AM
> > >> To: Bodireddy, Bhanuprakash <[email protected]>; Aaron Conole <[email protected]>
> > >> Cc: [email protected]; Daniele Di Proietto <[email protected]>; Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio <[email protected]>; Markus Magnusson <[email protected]>
> > >> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
> > >>
> > >> On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
> > >>> Thanks Ilya and Aaron for reviewing this patch and providing your
> > >>> comments, my reply inline.
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ilya Maximets [mailto:[email protected]]
> > >>>> Sent: Monday, December 19, 2016 8:41 AM
> > >>>> To: Aaron Conole <[email protected]>; Bodireddy, Bhanuprakash <[email protected]>
> > >>>> Cc: [email protected]; Daniele Di Proietto <[email protected]>; Thadeu Lima de Souza Cascardo <[email protected]>; Fischetti, Antonio <[email protected]>; Markus Magnusson <[email protected]>
> > >>>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I didn't test this patch yet. Bhanu, could you please describe your
> > >>>> test scenario and performance results in more detail.
> > >>>
> > >>> During the recent performance analysis for the classifier improvements, we
> > >>> found that a bottleneck was also observed at flow batching.
> > >>> This was due to fewer packets per batch. To reproduce this, a simple P2P
> > >>> test case can be used with 30 IXIA streams and matching IP flow rules.
> > >>> E.g., for an IXIA stream with src IP 2.2.2.1 and dst IP 5.5.5.1:
> > >>>   ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
> > >>>
> > >>> For an IXIA stream with src IP 4.4.4.1 and dst IP 15.15.15.1:
> > >>>   ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
> > >>>
> > >>> This leaves fewer packets in each batch, and packet_batch_per_flow_execute()
> > >>> is invoked for every batch.
> > >>> With 30 flows, I see the throughput drop to ~7.0 Mpps in the PHY2PHY case
> > >>> for 64-byte UDP packets.
> > >>>
> > >>>> It'll be nice if you provide throughput and latency measurement
> > >>>> results for different scenarios and packet sizes. Latency is important here.
> > >>> We are yet to do latency measurements in this case.
> > >>> With 30 IXIA
> > >>> streams comprising 64-byte UDP packets there was a throughput
> > >>> improvement of 30% in the P2P case and 13-15% in the PVP case (single queue).
> > >>> We will try to get the latency stats with and without this patch.
> > >>>
> > >>>> About the patch itself:
> > >>>>
> > >>>> 1. 'drain' called only for PMD threads. This will lead to
> > >>>>    broken connections with non-PMD ports.
> > >>> I see that non-PMD ports are handled by the vswitchd thread.
> > >>> Tested the PVP loopback case with tap ports and found it to be working as
> > >>> expected. Can you let me know the specific case you are referring to here
> > >>> so that I can verify whether the patch breaks it.
> > >>
> > >> I meant something like this:
> > >>
> > >> *---------HOST-1(br-int)----------*     *---------HOST-2(br-int)----------*
> > >> |                                 |     |                                 |
> > >> | internal_port <--> dpdk0 <-----------------> dpdk0 <--> internal_port  |
> > >> | 192.168.0.1/24                  |     |                 192.168.0.2/24 |
> > >> *---------------------------------*     *---------------------------------*
> > >>
> > >> (HOST-1)# ping 192.168.0.2
> > >>
> > >> In this case I'm expecting that the first (NETDEV_MAX_BURST - 1) ICMP
> > >> packets will be stuck in the TX queue. The next packet will cause sending of
> > >> the whole batch of NETDEV_MAX_BURST ICMP packets. That is not good behaviour.
> > >
> > > Thanks for the test case. I tested this and found it to be working with the
> > > patch. The reason being, PMD threads use the intermediate queue implementation
> > > by invoking 'netdev_dpdk_eth_tx_queue()', whereas the non-PMD thread (vswitchd)
> > > continues to call netdev_dpdk_eth_tx_burst() in the patch, so there is no
> > > change from the previous behavior here.
> > >
> > > I had set up a case similar to the one in the above diagram using 2
> > > hosts and assigning IP addresses to the respective bridges.
> > > When I ping Host 2 (192.168.0.2) from Host 1 (192.168.0.1), below is the call path.
> > >
> > > (HOST-1)# ping 192.168.0.2 -c 1     (send only one packet)
> > >
> > > When sending the ICMP echo request packet:
> > > Vswitchd thread:  dp_execute_cb()
> > >                     netdev_send()               [pkt sent on dpdk0, qid: 0]
> > >                       netdev_dpdk_send__()
> > >                         dpdk_do_tx_copy()
> > >                           netdev_dpdk_eth_tx_burst()
> > >
> > > On receiving the ICMP echo reply packet:
> > > PMD thread:  dp_execute_cb()
> > >                netdev_send()                    [pkt sent on br0, qid: 1]
> > >                  netdev_linux_send()
> >
> > OK. Maybe in this exact case it works, but it isn't guaranteed that non-PMD
> > threads will always call send with 'may_steal=false'.
>
> [Antonio F] In my test case - details below - the non-PMD thread
> is calling send with 'may_steal=true' and the packet is sent instantly.
>
> Regardless of the may_steal value, the netdev_dpdk_send__() function
> calls dpdk_do_tx_copy() for non-PMD threads because the condition
>     (batch->packets[0]->source != DPBUF_DPDK)
> is true, as the buffer data is not from DPDK-allocated memory.
>
> In my test case I have one host with
>  - 2 dpdk ports, dpdk0 and dpdk1,
>  - an IP address assigned to the internal port 'br0', and
>  - 1 single flow set as
>      ovs-ofctl add-flow br0 actions=NORMAL
>
> When an ARP request is sent from an IXIA generator to dpdk0:
>  1. the PMD thread forwards it to the internal br0 with may_steal=0,
>     via netdev_linux_send().
>  2. the PMD thread also forwards the ARP request to dpdk1 with may_steal=1.
>     As the buffer data is from DPDK-allocated memory,
>         if (OVS_UNLIKELY(!may_steal ||
>                          batch->packets[0]->source != DPBUF_DPDK))
>     is false => the ARP request is put into the queue.
>
> When an ARP reply is received from br0:
>  1. the non-PMD thread forwards it to dpdk0 with may_steal=1.
>     As the buffer data is NOT from DPDK-allocated memory, the following
>         if (OVS_UNLIKELY(!may_steal ||
>                          batch->packets[0]->source != DPBUF_DPDK))
>     is true => dpdk_do_tx_copy() is called and the packet is sent out
>     immediately.
> >
> > Also, I see another issue here:
> >
> > 8. netdev_send(.., may_steal=true, ...) uses instant send while
> >    netdev_send(.., may_steal=false, ...) uses queueing of packets.
> >    This behaviour will lead to packet reordering issues in some
> >    cases.
>
> [Antonio F] The instant send will happen when, in netdev_dpdk_send__(), the
> condition
>
>     if (OVS_UNLIKELY(!may_steal ||
>                      batch->packets[0]->source != DPBUF_DPDK))
>
> is true => dpdk_do_tx_copy() => netdev_dpdk_eth_tx_burst() is called.
> Can you please provide more details about the packet reordering issue you
> mentioned?
>
> > 9. You're decreasing 'txq->count' inside 'netdev_dpdk_eth_tx_burst()'
> >    while it can be called with an argument different from 'txq->burst_pkts'.
> >    Some of the queued packets will be lost + memory leak.
> >
> > Best regards, Ilya Maximets.
> >
> > >>
> > >>>> 2. 'xps_get_tx_qid()' called twice. First time on send and
> > >>>>    the second time on drain. This may lead to different
> > >>>>    returned 'tx_qid's and packets will be stuck forever in
> > >>>>    the tx buffer.
> > >>>
> > >>> You are right, and xps_get_tx_qid() can return different tx_qids.
> > >>> I was always testing XPS-disabled cases and completely overlooked this
> > >>> case. This could be a potential problem for us, and any suggestions
> > >>> would be very helpful.
> > >>>
> > >>>> 3. 'txq_drain()' must take the 'tx_lock' for the queue in case
> > >>>>    of dynamic tx queues.
> > >>>
> > >>> Agree, will handle this in the next version.
> > >>>
> > >>>> 4. Waiting for 1024 polling cycles of the PMD thread may cause
> > >>>>    a huge latency if we have few packets per second on one
> > >>>>    port and intensive traffic on others.
> > >>>
> > >>> I agree with you. We discussed this here and thought invoking the
> > >>> drain logic once every 1024 cycles was an optimal solution.
> > >>> But if the
> > >>> community thinks otherwise, we can move this into the main 'for' loop
> > >>> so that it can be invoked more often, provided that 'DRAIN_TSC' cycles
> > >>> have elapsed.
> > >>>
> > >>>> 5. This patch breaks the counting of 'tx_dropped' packets
> > >>>>    in netdev-dpdk.
> > >>>
> > >>> I presume you are referring to the below line in the code.
> > >>>
> > >>> -    dropped += netdev_dpdk_eth_tx_burst()
> > >>> +    netdev_dpdk_eth_tx_queue()
> > >>>
> > >>> Will handle this in v2 of this patch.
> > >>>
> > >>>> 6. Comments in netdev-provider.h should be fixed to reflect
> > >>>>    all the changes.
> > >>>
> > >>> Will make the necessary changes in netdev-provider.h.
> > >>>
> > >>>> 7. At last, I agree with Aaron that explicitly allowing only
> > >>>>    'dpdk' ports is not good style. Also, mentioning the name of
> > >>>>    an exact netdev inside the common code is bad style too.
> > >>>
> > >>> Will handle this appropriately.
> > >>>
> > >>> Regards,
> > >>> Bhanu Prakash.
> > >>>
> > >>>> Best regards, Ilya Maximets.
> > >>>>
> > >>>> On 16.12.2016 22:24, Aaron Conole wrote:
> > >>>>> Hi Bhanu,
> > >>>>>
> > >>>>> Bhanuprakash Bodireddy <[email protected]> writes:
> > >>>>>
> > >>>>>> In exact match cache processing, on an EMC hit, packets are queued
> > >>>>>> into batches matching the flow. Thereafter, packets are processed
> > >>>>>> in batches for faster packet processing. This is particularly
> > >>>>>> inefficient if there are fewer packets in a batch, as
> > >>>>>> rte_eth_tx_burst() incurs an expensive MMIO write.
> > >>>>>>
> > >>>>>> This commit adds back the intermediate queue implementation. Packets
> > >>>>>> are queued and burst when the packet count >= NETDEV_MAX_BURST.
> > >>>>>> Also, the drain logic is refactored to handle fewer packets in the tx
> > >>>>>> queues. Testing shows significant performance gains with queueing.
> > >>>>>>
> > >>>>>> Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
> > >>>>>> queueing of packets.")
> > >>>>>> Signed-off-by: Bhanuprakash Bodireddy <[email protected]>
> > >>>>>> Signed-off-by: Antonio Fischetti <[email protected]>
> > >>>>>> Co-authored-by: Antonio Fischetti <[email protected]>
> > >>>>>> Signed-off-by: Markus Magnusson <[email protected]>
> > >>>>>> Co-authored-by: Markus Magnusson <[email protected]>
> > >>>>>> ---
> > >>>>>
> > >>>>> I've Cc'd Ilya just in hopes that the patch gets a better review
> > >>>>> than I could give. As a general comment, I like the direction -
> > >>>>> batched operations are usually a better way of going.
> > >>>>>
> > >>>>> Just a minor note below.
> > >>>>>
> > >>>>> ... snip ...
> > >>>>>>
> > >>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > >>>>>> index 3509493..65dff83 100644
> > >>>>>> --- a/lib/dpif-netdev.c
> > >>>>>> +++ b/lib/dpif-netdev.c
> > >>>>>> @@ -622,6 +622,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
> > >>>>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
> > >>>>>>  static void emc_clear_entry(struct emc_entry *ce);
> > >>>>>>
> > >>>>>> +static struct tx_port *pmd_send_port_cache_lookup
> > >>>>>> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> > >>>>>> +
> > >>>>>>  static void
> > >>>>>>  emc_cache_init(struct emc_cache *flow_cache)
> > >>>>>>  {
> > >>>>>> @@ -2877,6 +2880,31 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> > >>>>>>  }
> > >>>>>>
> > >>>>>>  static void
> > >>>>>> +dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
> > >>>>>> +                         struct dp_netdev_port *port,
> > >>>>>> +                         uint64_t now)
> > >>>>>> +{
> > >>>>>> +    int tx_qid;
> > >>>>>> +
> > >>>>>> +    if (!strcmp(port->type, "dpdk")) {
> > >>>>>
> > >>>>> Any reason to restrict this only to dpdk ports?
> > >>>>> It looks like
> > >>>>> you've added a new netdev operation, so why not just call
> > >>>>> netdev_txq_drain unconditionally?
> > >>>>>
> > >>>>> Also, bit of a nit, but tx_qid can be reduced in scope down to the
> > >>>>> if block below.
> > >>>>>
> > >>>>>> +        struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
> > >>>>>> +                                 u32_to_odp(port->port_no));
> > >>>>>> +
> > >>>>>> +        if (OVS_LIKELY(tx)) {
> > >>>>>> +            bool dynamic_txqs = tx->port->dynamic_txqs;
> > >>>>>> +
> > >>>>>> +            if (dynamic_txqs) {
> > >>>>>> +                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
> > >>>>>> +            } else {
> > >>>>>> +                tx_qid = pmd->static_tx_qid;
> > >>>>>> +            }
> > >>>>>> +
> > >>>>>> +            netdev_txq_drain(port->netdev, tx_qid);
> > >>>>>> +        }
> > >>>>>> +    }
> > >>>>>> +}
> > >>>>>> +
> > >>>>>
> > >>>
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
