Re: [Intel-wired-lan] [RFC PATCH next 2/2] i40e: add support for macvlan hardware offload

2017-10-17 Thread Shannon Nelson

On 10/17/2017 2:32 PM, Alexander Duyck wrote:


So the select_queue function being needed is the deal breaker on all
of this as far as I am concerned. We aren't allowed to use it under
other cases so why should macvlan be an exception to the rule?


I realize that the stack is pretty good at chosing the "right" queue, 
which is my understanding as to why we shouldn't use select_queue(), but 
it doesn't know how to use the accel_priv context associated with the 
macvlan offload.


I saw DaveM's guidance to the HiNIC folks when they tried to add 
select_queue(): "do not implement this function unless you absolutely 
need to do something custom in your driver".  I can see where this might 
be the exception.


When originally thinking about how to do this, I wanted to use the 
accel_priv as a pointer to the VSI to be used for the offload, then we 
could have multiple queues and use all the VSI specific tuning 
operations that XL710 has available.  It can work when selecting the 
queue, but by the time you get to start_xmit(), you no longer have that 
context and only have the queue number.  You can't do any fancy encoding 
in the queue number because the value has to be within 
dev->num_tx_queues.  Maybe we can add accel_priv to the start_xmit 
interface?  (I can hear the groans already...)


However... for our case, you might be right anyway.  If the stack is 
doing its job at keeping the conversation on the one queue/irq/cpu 
combination, any Tx following the offloaded Rx might already be headed 
for the right Tx queue.  I'll check on that.

I think we should probably look at a different approach for this. For
example why is it we need to use a different transmit path for a
macvlan packet vs any other packet? On the Rx side we get the
advantage of avoiding the software hashing and demux. What do we get
for reserving queues for transmit?


There are a couple of reasons I can think of to keep the Tx on the 
specific queue pair:


- Keep the Tx traffic on the same CPU and irq as the Rx traffic

- Don't let the flow get interrupted, slowed, or otherwise perturbed by 
other traffic flows.


- Allow for adding hardware assisted bandwidth constraints to the 
offloaded flow without bothering the rest of the NIC's traffic


Are these enough to want to guarantee the Tx queue?


My plan for this is to go back and "fix" ixgbe so we can get it away
from having to use the select_queue call for the macvlan offload and
then maybe look at proving a few select NDO operations for allowing
macvlans that are being offloaded to make specific calls into the
hardware to perform tasks as needed.


The ixgbe implementation can certainly be improved.  I think its biggest 
failing is that the rest of the general traffic gets constrained to a 
single queue - no more RSS for load balancing.


sln



Re: [Intel-wired-lan] [RFC PATCH next 2/2] i40e: add support for macvlan hardware offload

2017-10-17 Thread Alexander Duyck
On Tue, Oct 17, 2017 at 2:18 PM, Shannon Nelson
 wrote:
> This patch adds support for macvlan hardware offload (l2-fwd-offload)
> feature using the XL710's macvlan-to-queue filtering machanism.  These
> are most useful for supporting separate mac addresses for Container
> virtualization using Docker and similar configurations.
>
> The basic design is to partition off some of the PF's general LAN queues
> outside of the standard RSS pool and use them as the offload queues.
> This especially makes sense on machines with more than 64 CPUs: since
> the RSS pool is limited to a maximum of 64, the queues assigned to the
> remaining CPUs essentially go unused.  When on a machine with fewer than
> 64 CPUs, we shrink the RSS pool and use the upper queues for the offload.
>
> If the user has added Flow Director filters, enabling of macvlan offload
> is disallowed.
>
> To use this feature, use ethtool to enable l2-fwd-offload
> ethtool -K ethX l2-fwd-offload on
> When the next macvlan devices are created on ethX, the macvlan driver
> will automatically attempt to setup the hardweare offload.
>
> Signed-off-by: Shannon Nelson 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h |   10 +
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   15 ++
>  drivers/net/ethernet/intel/i40e/i40e_main.c|  239 
> +++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h|1 +
>  4 files changed, 264 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index a187f53..4868ae2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -365,6 +365,10 @@ struct i40e_pf {
> u8 atr_sample_rate;
> bool wol_en;
>
> +   u16 macvlan_hint;
> +   u16 macvlan_used;
> +   u16 macvlan_num;
> +
> struct hlist_head fdir_filter_list;
> u16 fdir_pf_active_filters;
> unsigned long fd_flush_timestamp;
> @@ -712,6 +716,12 @@ struct i40e_netdev_priv {
> struct i40e_vsi *vsi;
>  };
>
> +struct i40e_fwd {
> +   struct net_device *vdev;
> +   u16 tx_base_queue;
> +   /* future expansion here might include number of queues */
> +};
> +
>  /* struct that defines an interrupt vector */
>  struct i40e_q_vector {
> struct i40e_vsi *vsi;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index afd3ca8..e1628c1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -3817,6 +3817,13 @@ static int i40e_set_rxnfc(struct net_device *netdev, 
> struct ethtool_rxnfc *cmd)
> struct i40e_pf *pf = vsi->back;
> int ret = -EOPNOTSUPP;
>
> +   if (pf->macvlan_num) {
> +   dev_warn(>pdev->dev,
> +"Remove %d remaining macvlan offloads to change 
> filter options\n",
> +pf->macvlan_used);
> +   return -EBUSY;
> +   }
> +
> switch (cmd->cmd) {
> case ETHTOOL_SRXFH:
> ret = i40e_set_rss_hash_opt(pf, cmd);
> @@ -3909,6 +3916,14 @@ static int i40e_set_channels(struct net_device *dev,
> if (count > i40e_max_channels(vsi))
> return -EINVAL;
>
> +   /* verify that macvlan offloads are not in use */
> +   if (pf->macvlan_num) {
> +   dev_warn(>pdev->dev,
> +"Remove %d remaining macvlan offloads to change 
> channel count\n",
> +pf->macvlan_used);
> +   return -EBUSY;
> +   }
> +
> /* verify that the number of channels does not invalidate any current
>  * flow director rules
>  */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index e4b8a4b..7b26c6f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -9221,6 +9221,66 @@ static void i40e_clear_rss_lut(struct i40e_vsi *vsi)
>  }
>
>  /**
> + * i40e_fix_features - fix the proposed netdev feature flags
> + * @netdev: ptr to the netdev being adjusted
> + * @features: the feature set that the stack is suggesting
> + * Note: expects to be called while under rtnl_lock()
> + **/
> +static netdev_features_t i40e_fix_features(struct net_device *netdev,
> +  netdev_features_t features)
> +{
> +   struct i40e_netdev_priv *np = netdev_priv(netdev);
> +   struct i40e_pf *pf = np->vsi->back;
> +   struct i40e_vsi *vsi = np->vsi;
> +
> +   /* make sure there are queues to be used for macvlan offload */
> +   if (features & NETIF_F_HW_L2FW_DOFFLOAD &&
> +   !(netdev->features & NETIF_F_HW_L2FW_DOFFLOAD)) {
> +   const u8 drop = I40E_FILTER_PROGRAM_DESC_DEST_DROP_PACKET;
> +