Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-02 Thread Yong Wang
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> Sent: Wednesday, November 30, 2016 10:27 AM
> To: Ananyev, Konstantin <konstantin.anan...@intel.com>
> Cc: Harish Patil <harish.pa...@qlogic.com>; dev@dpdk.org; Rahul Lakkireddy
> <rahul.lakkire...@chelsio.com>; Stephen Hurd
> <stephen.h...@broadcom.com>; Jan Medala <j...@semihalf.com>; Jakub
> Palider <j...@semihalf.com>; John Daley <johnd...@cisco.com>; Adrien
> Mazarguil <adrien.mazarg...@6wind.com>; Alejandro Lucero
> <alejandro.luc...@netronome.com>; Rasesh Mody
> <rasesh.m...@qlogic.com>; Jacob, Jerin <jerin.ja...@cavium.com>;
> Yuanhan Liu <yuanhan@linux.intel.com>; Yong Wang
> <yongw...@vmware.com>; Kulasek, TomaszX
> <tomaszx.kula...@intel.com>; olivier.m...@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> 2016-11-30 17:42, Ananyev, Konstantin:
> > > >Please, we need a comment for each driver saying
> > > >"it is OK, we do not need any checksum preparation for TSO"
> > > >or
> > > >"yes we have to implement tx_prepare or TSO will not work in this
> mode"
> > > >
> > >
> > > qede PMD doesn’t currently support TSO yet, it only supports Tx
> TCP/UDP/IP
> > > csum offloads.
> > > So Tx preparation isn’t applicable. So as of now -
> > > "it is OK, we do not need any checksum preparation for TSO"
> >
> > Thanks for the answer.
> > Though please note that it not only for TSO.
> 
> Oh yes, sorry, my wording was incorrect.
> We need to know if any checksum preparation is needed prior
> offloading its final computation to the hardware or driver.
> So the question applies to TSO and simple checksum offload.
> 
> We are still waiting answers for
>   bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.

The case for a virtual device is a little bit more complicated as packets 
offloaded from a virtual device can eventually be delivered to another virtual 
NIC or different physical NICs that have different offload requirements.  In 
ESX, the hypervisor will enforce that the packets offloaded will be something 
that the hardware expects.  The contract for vmxnet3 is that the guest needs to 
fill in pseudo header checksum for both l4 checksum only and TSO + l4 checksum 
offload cases.

> > This is for any TX offload for which the upper layer SW would have
> > to modify the contents of the packet.
> > Though as I can see for qede neither PKT_TX_IP_CKSUM or
> PKT_TX_TCP_CKSUM
> > exhibits any extra requirements for the user.
> > Is that correct?



Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-01 Thread Jerin Jacob
On Thu, Dec 01, 2016 at 09:58:31AM +0100, Thomas Monjalon wrote:
> 2016-12-01 08:15, Adrien Mazarguil:
> > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > remain optional for long. Sure, PMDs that do not implement it do not care,
> > I'm focusing on applications, for which the performance impact of calling
> > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > performing all the necessary preparation at once.
> 
> I agree that tx_prepare() should become mandatory shortly.

I agree. The tx_prepare has to be mandatory. Application will have no
idea on how PMD drivers use this hook to fix up PMD tx side limitations.
On other side, if it turns out to be mandatory, what real benefit it is
going to have compared to existing scheme of just tx_burst.

> 
> > [...]
> > > > Following the same logic, why can't such a thing be made part of the TX
> > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > whenever necessary). From an application standpoint, what are the 
> > > > advantages
> > > > of having to:
> > > > 
> > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > >  tx_burst(); // iterate and send
> > > > 
> > > > Compared to:
> > > > 
> > > >  tx_burst(); // iterate, update as needed and send
> > > 
> > > I think that was discussed extensively quite a lot previously here:
> > > As Thomas already replied - main motivation is to allow user
> > > to execute them on different stages of packet TX pipeline,
> > > and probably on different cores.
> > > I think that provides better flexibility to the user to when/where
> > > do these preparations and hopefully would lead to better performance.
> > 
> > And I agree, I think this use case is valid but does not warrant such a high
> > penalty when your application does not need that much flexibility. Simple
> > (yet conscious) applications need the highest performance. Complex ones as
> > you described already suffer quite a bit from IPCs and won't mind a couple
> > of extra CPU cycles right?
> > 
> > Yes they will, therefore we need a method that satisfies both cases.
> > 
> > As a possible solution, a special mbuf flag could be added to each mbuf
> > having gone through tx_prepare(). That way, tx_burst() could skip some
> > checks and things it would otherwise have done.
> 
> I like this idea!
>