Re: [PATCH net-next V3 0/3] Introduce adaptive TX interrupt moderation to net DIM

2018-04-24 Thread Andy Gospodarek
On Tue, Apr 24, 2018 at 10:18:09AM -0400, David Miller wrote:
> From: Tal Gilboa 
> Date: Tue, 24 Apr 2018 13:36:00 +0300
> 
> > Net DIM is a library designed for dynamic interrupt moderation. It was
> > implemented and optimized with receive side interrupts in mind, since these
> > are usually the CPU expensive ones. This patch-set introduces adaptive transmit
> > interrupt moderation to net DIM, complete with a usage in the mlx5e driver.
> > Using adaptive TX behavior would reduce interrupt rate for multiple scenarios.
> > Furthermore, it is essential for increasing bandwidth on cases where payload
> > aggregation is required.
> > 
> > v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
> > adding "enabled" field from struct net_dim and applied mlx5e structural
> > suggestions (suggested by SaeedM).
> > 
> > v2: Rebase over proper tree.
> > 
> > v1: Fix compilation issues due to missed function renaming.
> 
> I have no problem with this, series applied, thanks.
> 
> Although I have to say that I've always been suspicious of adaptive moderation
> schemes, especially if implemented in software.
> 
> My thinking was that at these kinds of link speeds, the conditions of the link
> change so fast that whatever state you've measured changes by the time you
> commit new settings to the chip.
> 
> It obviously helps, so I must be missing some piece of the puzzle in my mental
> analysis :-)

You are definitely correct that there are many cases where sessions are
so short that by the time a measurement is made and the settings are
modified, conditions can change.

What I found when adding this to the bnxt_en driver was that for
longer-running sessions/transfers (flows lasting secs not msecs) the
adjustment can happen pretty quickly and you get a nice reduction in CPU
utilization for the duration of that transfer.

There is also an advantage: since this is done on a per-queue basis, one
queue that may be handling a bulk transfer can have its coalescing
parameters adjusted while others stay at a setting that keeps traffic
flowing at low latency.  This is helpful when a system is receiving a
large amount of traffic on one queue but also sending data on another
queue and quick processing of acks keeps data flowing at high rate with
low CPU utilization in both directions.
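
To make the per-queue point concrete, here is a minimal userspace sketch.
It is not the net DIM algorithm and not bnxt_en code; the profile table,
the BULK_PPS cut-off and every name in it are assumptions made up for
illustration.  The only thing it tries to show is that each queue carries
its own tuning state, so a busy queue can drift toward heavier coalescing
while a quiet one stays at a low-latency setting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coalescing profiles, ordered from least to most moderation. */
struct coal_profile {
	int usecs;	/* minimum time between interrupts */
	int max_pkts;	/* maximum packets per interrupt */
};

static const struct coal_profile profiles[] = {
	{ 1, 8 }, { 8, 32 }, { 32, 64 }, { 64, 128 },
};
#define NUM_PROFILES ((int)(sizeof(profiles) / sizeof(profiles[0])))

/* Illustrative cut-off between "bulk" and "light" traffic (packets/sec). */
#define BULK_PPS 100000L

/* Each queue carries its own tuning state; nothing is shared. */
struct queue_state {
	int profile_ix;
};

/* One tuning step for one queue, driven only by that queue's own sample. */
static void queue_tune(struct queue_state *q, long pps)
{
	assert(q != NULL);

	if (pps >= BULK_PPS && q->profile_ix < NUM_PROFILES - 1)
		q->profile_ix++;	/* busy queue: coalesce more */
	else if (pps < BULK_PPS && q->profile_ix > 0)
		q->profile_ix--;	/* quiet queue: favour latency */
}

/* Settings the (hypothetical) driver would program for this queue. */
static const struct coal_profile *queue_profile(const struct queue_state *q)
{
	assert(q != NULL);
	return &profiles[q->profile_ix];
}
```

Feeding one queue a steady bulk rate and another a trickle of acks for a
few iterations leaves the first at the most aggressive profile and the
second untouched at the lowest one.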


Re: SRIOV switchdev mode BoF minutes

2018-04-18 Thread Andy Gospodarek
On Wed, Apr 18, 2018 at 09:26:34AM -0700, Jakub Kicinski wrote:
> On Wed, 18 Apr 2018 11:15:29 -0400, Andy Gospodarek wrote:
> > > A similar issue exists on multi-host for PFs, right?  If one of the
> > > hosts is down do we still show their PF repr?  IMHO yes.  
> > 
> > I would agree with that as well.  With today's model the VF reps are
> > created once a PF is put into switchdev mode, but I'm still working out
> > how we want to consider whether or not a PF rep for the other domains is
> > created locally or not and also how one can determine which domain is in
> > control.
> > 
> > Permanent config options (like NVRAM settings) could easily handle which
> > domain is in control, but that still does not mean that PF reps must be
> > created automatically, does it?
> 
> The control domain is tricky.  I'm not sure I understand how you could
> not have a PF rep for remote domains, though.  How do you configure
> switching to the PF netdev if there is no rep?

Yes, for complete control of all traffic using standard Linux APIs a PF
rep is a requirement.


Re: SRIOV switchdev mode BoF minutes

2018-04-18 Thread Andy Gospodarek
On Tue, Apr 17, 2018 at 04:19:15PM -0700, Jakub Kicinski wrote:
> On Tue, 17 Apr 2018 10:47:00 -0400, Andy Gospodarek wrote:
> > There is also a school of thought that the VF reps could be
> > pre-allocated on the SmartNIC so that any application processing that
> > traffic would sit idle when no traffic arrives on the rep, but could
> > process frames that do arrive when the VFs were created on the host.
> > This implementation will depend on how resources are allocated on a
> > given bit of hardware, but can really work well.
> 
> +1 if there is no FW resource allocation issues IMHO it's okay to
> just show all reprs for "remote PCIes (PFs and VFs)" on the SmartNIC/
> controller.  The reprs should just show link down as if PCIe cable
> was unplugged until host actually enables them.  

Yes we are on the same page on this.

> A similar issue exists on multi-host for PFs, right?  If one of the
> hosts is down do we still show their PF repr?  IMHO yes.

I would agree with that as well.  With today's model the VF reps are
created once a PF is put into switchdev mode, but I'm still working out
how we want to consider whether or not a PF rep for the other domains is
created locally or not and also how one can determine which domain is in
control.

Permanent config options (like NVRAM settings) could easily handle which
domain is in control, but that still does not mean that PF reps must be
created automatically, does it?

> That makes the thing looks more like a switch with cables being plugged
> in and out.

Yes, that's exactly how I view it as well.


Re: SRIOV switchdev mode BoF minutes

2018-04-17 Thread Andy Gospodarek
On Tue, Apr 17, 2018 at 09:46:38AM -0700, Samudrala, Sridhar wrote:
> On 4/17/2018 7:47 AM, Andy Gospodarek wrote:
> > On Tue, Apr 17, 2018 at 04:58:05PM +0300, Or Gerlitz wrote:
> > > On Tue, Apr 17, 2018 at 4:30 PM, Andy Gospodarek
> > > <andrew.gospoda...@broadcom.com> wrote:
> > > > On Mon, Apr 16, 2018 at 07:08:39PM -0700, Samudrala, Sridhar wrote:
> > > > > On 4/16/2018 5:39 AM, Andy Gospodarek wrote:
> > > > > > On Sun, Apr 15, 2018 at 09:01:16AM +0300, Or Gerlitz wrote:
> > > > > > > On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
> > > > > > > <sridhar.samudr...@intel.com> wrote:
> > > > > > > 
> > > > > > > > I meant between PFs on 2 compute nodes.
> > > > > > > If the PF serves as uplink rep, it functions as  a switch port -- 
> > > > > > > applications
> > > > > > > don't run on switch ports. One way to get apps to run on the host 
> > > > > > > in switchdev
> > > > > > > mode is probe one of the VFs there.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > So once a pci device is configured in 'switchdev' mode,  only port 
> > > > > representor netdevs are
> > > > > seen on the host, no more PF netdev.
> > > > That is not the functionality I would propose.  The PF netdev will 
> > > > still be there.
> > > Andy,
> > > 
> > > Basically LGTM, so even in smartnic configs, the PF @ the host is
> > > still privileged to
> > > create/destroy VFs or provision MACs for them even if it is not the
> > > e-switch manager
> > > anymore?
> > Yes, in a SmartNIC world one config we aim to have is that a host can create
> > and destroy VFs as needed.  One of the challenges is how the VF reps are
> > managed by applications in the SmartNIC when the host could make them
> > disappear.
> 
> OK. So are we saying that in 'switchdev' mode with 2 VFs and 1 uplink, the host will
> see PF netdev, 2 vf-rep netdev's corresponding to 2 VFs and 1 uplink-rep netdev.
> 
> Is PF netdev used only for the control/configure of the VFs? If it is used as a datapath,
> i think we need a pf-rep netdev too.
> 

Yes, that is correct.  PF reps could be used for datapath configuration to
redirect traffic to a PF.



Re: SRIOV switchdev mode BoF minutes

2018-04-17 Thread Andy Gospodarek
On Tue, Apr 17, 2018 at 04:58:05PM +0300, Or Gerlitz wrote:
> On Tue, Apr 17, 2018 at 4:30 PM, Andy Gospodarek
> <andrew.gospoda...@broadcom.com> wrote:
> > On Mon, Apr 16, 2018 at 07:08:39PM -0700, Samudrala, Sridhar wrote:
> >>
> >> On 4/16/2018 5:39 AM, Andy Gospodarek wrote:
> >> > On Sun, Apr 15, 2018 at 09:01:16AM +0300, Or Gerlitz wrote:
> >> > > On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
> >> > > <sridhar.samudr...@intel.com> wrote:
> >> > >
> >> > > > I meant between PFs on 2 compute nodes.
> >> > > If the PF serves as uplink rep, it functions as  a switch port -- 
> >> > > applications
> >> > > don't run on switch ports. One way to get apps to run on the host in 
> >> > > switchdev
> >> > > mode is probe one of the VFs there.
> >> > >
> >> > >
> >> > >
> >> So once a pci device is configured in 'switchdev' mode,  only port 
> >> representor netdevs are
> >> seen on the host, no more PF netdev.
> >
> > That is not the functionality I would propose.  The PF netdev will still be 
> > there.
> 
> Andy,
> 
> Basically LGTM, so even in smartnic configs, the PF @ the host is
> still privileged to
> create/destroy VFs or provision MACs for them even if it is not the
> e-switch manager
> anymore?

Yes, in a SmartNIC world one config we aim to have is that a host can create
and destroy VFs as needed.  One of the challenges is how the VF reps are
managed by applications in the SmartNIC when the host could make them
disappear.  

> Actually AFAIK this  can also work somehow otherwise, e.g a smartnic FW
> "pushes" the VFs into the host w.o them being under a host admin directive.

The model to 'push' VFs to a host is also another option, but I do not
like it as much.  My general preference is to allow the host to use a
SmartNIC as if it was any other standard NIC (we have been using the
word 'Performance NIC' to describe what we would call a standard NIC, but
the name is not terribly important).

There is also a school of thought that the VF reps could be
pre-allocated on the SmartNIC so that any application processing that
traffic would sit idle when no traffic arrives on the rep, but could
process frames that do arrive when the VFs were created on the host.
This implementation will depend on how resources are allocated on a
given bit of hardware, but can really work well.





Re: SRIOV switchdev mode BoF minutes

2018-04-17 Thread Andy Gospodarek
On Mon, Apr 16, 2018 at 07:08:39PM -0700, Samudrala, Sridhar wrote:
> 
> On 4/16/2018 5:39 AM, Andy Gospodarek wrote:
> > On Sun, Apr 15, 2018 at 09:01:16AM +0300, Or Gerlitz wrote:
> > > On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
> > > <sridhar.samudr...@intel.com> wrote:
> > > 
> > > > I meant between PFs on 2 compute nodes.
> > > If the PF serves as uplink rep, it functions as  a switch port -- 
> > > applications
> > > don't run on switch ports. One way to get apps to run on the host in 
> > > switchdev
> > > mode is probe one of the VFs there.
> > > 
> > > 
> > > 
> So once a pci device is configured in 'switchdev' mode,  only port 
> representor netdevs are
> seen on the host, no more PF netdev.

That is not the functionality I would propose.  The PF netdev will still
be there.

> Are you going to expose another way to change sriov_num_vfs when the device 
> is in
> 'switchdev' mode OR do we need to switch to 'legacy' mode to 
> increase/decrease the number of
> VFs?

Since the PF netdev will not disappear, the standard ways to configure the
number of VFs, etc. are still available.

> Even in switchdev mode, i guess it will be possible for host apps to use the 
> IP configured
> on the uplink rep to talk externally.
> 
> In case of multiple uplinks, are you exposing one uplink-rep netdev per 
> uplink?


Re: SRIOV switchdev mode BoF minutes

2018-04-16 Thread Andy Gospodarek
On Sun, Apr 15, 2018 at 09:01:16AM +0300, Or Gerlitz wrote:
> On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
>  wrote:
> 
> > I meant between PFs on 2 compute nodes.
> 
> If the PF serves as uplink rep, it functions as  a switch port -- applications
> don't run on switch ports. One way to get apps to run on the host in switchdev
> mode is probe one of the VFs there.
> 
> 
> [...]
> 
> > By smartnic env, i guess you are referring to OVS control plane also running
> > on the NIC.
> 
> correct
> 

Not just OvS; other applications running on the SmartNIC that use tc for
programming hardware can benefit from a design like this, too.

> > I will look forward to your patches.
> 
> FWIW, note that my patches don't bring any newz for you.. I am aligning
> mlx5 with what was agreed on netdev, e.g nfp does it (uplink rep and
> such) already.

Probably not major news from us either since this was discussed at the last
NetConf, but we are planning to have this option for SmartNICs or PCI-multihost
NICs, too.


Re: [PATCH net v2 2/6] bnxt_en: do not allow wildcard matches for L2 flows

2018-04-11 Thread Andy Gospodarek
On Wed, Apr 11, 2018 at 01:41:31PM -0700, Michael Chan wrote:
> On Wed, Apr 11, 2018 at 1:31 PM, Andy Gospodarek
> <andrew.gospoda...@broadcom.com> wrote:
> > On Wed, Apr 11, 2018 at 11:43:14AM -0700, Jakub Kicinski wrote:
> >> On Wed, 11 Apr 2018 11:50:14 -0400, Michael Chan wrote:
> >> > @@ -764,6 +788,41 @@ static bool bnxt_tc_can_offload(struct bnxt *bp, 
> >> > struct bnxt_tc_flow *flow)
> >> > return false;
> >> > }
> >> >
> >> > +   /* Currently source/dest MAC cannot be partial wildcard  */
> >> > +   if (bits_set(&flow->l2_key.smac, sizeof(flow->l2_key.smac)) &&
> >> > +   !is_exactmatch(flow->l2_mask.smac, sizeof(flow->l2_mask.smac))) {
> >> > +   netdev_info(bp->dev, "Wildcard match unsupported for Source MAC\n");
> >>
> >> This wouldn't be something to do in net, but how do you feel about
> >> using extack for messages like this?
> >>
> >
> > I agree 'net' would not have been the place for a change like that, but
> > I do think that would be a good idea.  It looks like we could easily
> > change the ndo_setup_tc to something like this:
> >
> > int (*ndo_setup_tc)(struct net_device *dev,
> > enum tc_setup_type type,
> > void *type_data,
> > struct netlink_ext_ack *extack);
> 
> I think the extack pointer is already in the tc_cls_common_offload
> struct inside tc_cls_flower_offload struct.

True, but I'm not sure that tc_cls_common_offload is used in all cases.
Take red_offload() as one of those.


Re: [PATCH net v2 2/6] bnxt_en: do not allow wildcard matches for L2 flows

2018-04-11 Thread Andy Gospodarek
On Wed, Apr 11, 2018 at 11:43:14AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Apr 2018 11:50:14 -0400, Michael Chan wrote:
> > @@ -764,6 +788,41 @@ static bool bnxt_tc_can_offload(struct bnxt *bp, 
> > struct bnxt_tc_flow *flow)
> > return false;
> > }
> >  
> > +   /* Currently source/dest MAC cannot be partial wildcard  */
> > +   if (bits_set(&flow->l2_key.smac, sizeof(flow->l2_key.smac)) &&
> > +   !is_exactmatch(flow->l2_mask.smac, sizeof(flow->l2_mask.smac))) {
> > +   netdev_info(bp->dev, "Wildcard match unsupported for Source MAC\n");
> 
> This wouldn't be something to do in net, but how do you feel about
> using extack for messages like this?
> 

I agree 'net' would not have been the place for a change like that, but
I do think that would be a good idea.  It looks like we could easily
change the ndo_setup_tc to something like this:

int (*ndo_setup_tc)(struct net_device *dev,
                    enum tc_setup_type type,
                    void *type_data,
                    struct netlink_ext_ack *extack);

It also looks like most of the callers of ndo_setup_tc have infra in
place to pass extack easily when the call is sourced from a netlink
message.   The others can just pass in NULL or define a local
netlink_ext_ack variable for short-term use.
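
As a rough illustration of what that would buy the driver, here is a
userspace sketch of the source-MAC check reporting through an extack-style
object instead of netdev_info().  Everything in it is a stand-in:
struct ext_ack_mock, set_err_msg(), mac_bits_set(), mac_is_exactmatch() and
can_offload_smac() are hypothetical names that only mimic the shape of the
kernel pieces, and the NULL handling reflects the "others can just pass in
NULL" idea above rather than any confirmed kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct netlink_ext_ack; only the message is modelled. */
struct ext_ack_mock {
	const char *msg;
};

/* Record an error string; a NULL extack is tolerated and simply ignored. */
static void set_err_msg(struct ext_ack_mock *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* True if any bit of the key is set, i.e. the field takes part in the match. */
static bool mac_bits_set(const unsigned char *key, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (key[i])
			return true;
	return false;
}

/* True if the mask is all ones, i.e. an exact match on every bit. */
static bool mac_is_exactmatch(const unsigned char *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (mask[i] != 0xff)
			return false;
	return true;
}

/* Reject partial wildcards on the source MAC and say why via extack. */
static bool can_offload_smac(const unsigned char key[6],
			     const unsigned char mask[6],
			     struct ext_ack_mock *extack)
{
	assert(key != NULL && mask != NULL);

	if (mac_bits_set(key, 6) && !mac_is_exactmatch(mask, 6)) {
		set_err_msg(extack, "Wildcard match unsupported for Source MAC");
		return false;
	}
	return true;
}
```

With this shape the reason for the rejection travels back to whoever issued
the request rather than landing only in the kernel log, and a caller that
has no netlink context loses nothing by passing NULL.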



Re: [PATCH net] net/dim: Fix int overflow

2018-03-29 Thread Andy Gospodarek
On Thu, Mar 29, 2018 at 01:53:52PM +0300, Tal Gilboa wrote:
> When calculating difference between samples, the values
> are multiplied by 100. Large values may cause int overflow
> when multiplied (usually on first iteration).
> Fixed by forcing 100 to be of type unsigned long.
> 
> Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to 
> include/linux")
> Signed-off-by: Tal Gilboa <ta...@mellanox.com>

Reviewed-by: Andy Gospodarek <go...@broadcom.com>

> ---
>  include/linux/net_dim.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
> index bebeaad..29ed8fd 100644
> --- a/include/linux/net_dim.h
> +++ b/include/linux/net_dim.h
> @@ -231,7 +231,7 @@ static inline void net_dim_exit_parking(struct net_dim 
> *dim)
>  }
>  
>  #define IS_SIGNIFICANT_DIFF(val, ref) \
> - (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
> + (((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
>  
>  static inline int net_dim_stats_compare(struct net_dim_stats *curr,
>   struct net_dim_stats *prev)
> -- 
> 1.8.3.1
> 
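
For anyone who wants to see the arithmetic, a small userspace sketch of the
fixed expression follows.  The helper names are hypothetical and the sample
values are invented; the only part taken from the patch is the shape of the
macro with the 100UL constant.  The second helper does its check in
long long so that the demonstration itself never overflows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Same shape as the fixed macro: the 100UL constant makes the product
 * unsigned long instead of int.
 */
static bool is_significant_diff(int val, int ref)
{
	assert(ref > 0);
	return ((100UL * abs(val - ref)) / ref) > 10;
}

/*
 * Would the pre-fix product, 100 * abs(val - ref) evaluated as int,
 * exceed INT_MAX?  Computed in long long so this check cannot overflow.
 */
static bool int_product_overflows(int val, int ref)
{
	long long product = 100LL * llabs((long long)val - ref);

	return product > INT_MAX;
}
```

With val = 130000000 and ref = 100000000 the difference is 30000000, so the
product is 3000000000: too large for a 32-bit int, but within range of
unsigned long, and the 30% change is correctly reported as significant.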


Re: [PATCH net 3/3] bonding: process the err returned by dev_set_allmulti properly in bond_enslave

2018-03-26 Thread Andy Gospodarek
On Mon, Mar 26, 2018 at 01:16:47AM +0800, Xin Long wrote:
> When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails,
> dev_set_promiscuity(-1) should be done before going to the err path.
> Otherwise, dev->promiscuity will leak.
> 
> Fixes: 7e1a1ac1fbaa ("bonding: Check return of dev_set_promiscuity/allmulti")
> Signed-off-by: Xin Long <lucien@gmail.com>

Acked-by: Andy Gospodarek <a...@greyhouse.net>

> ---
>  drivers/net/bonding/bond_main.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 55e1985..b7b1130 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1706,8 +1706,11 @@ int bond_enslave(struct net_device *bond_dev, struct 
> net_device *slave_dev,
>   /* set allmulti level to new slave */
>   if (bond_dev->flags & IFF_ALLMULTI) {
>   res = dev_set_allmulti(slave_dev, 1);
> - if (res)
> + if (res) {
> + if (bond_dev->flags & IFF_PROMISC)
> + dev_set_promiscuity(slave_dev, -1);
>   goto err_sysfs_del;
> + }
>   }
>  
>   netif_addr_lock_bh(bond_dev);


Re: [PATCH net 2/3] bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave

2018-03-26 Thread Andy Gospodarek
On Mon, Mar 26, 2018 at 01:16:46AM +0800, Xin Long wrote:
> Beniamino found a crash when adding vlan as slave of bond which is also
> the parent link:
> 
>   ip link add bond1 type bond
>   ip link set bond1 up
>   ip link add link bond1 vlan1 type vlan id 80
>   ip link set vlan1 master bond1
> 
> The call trace is as below:
> 
>   [] queued_spin_lock_slowpath+0xb/0xf
>   [] _raw_spin_lock+0x20/0x30
>   [] dev_mc_sync+0x37/0x80
>   [] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
>   [] __dev_set_rx_mode+0x5a/0xa0
>   [] dev_mc_sync_multiple+0x78/0x80
>   [] bond_enslave+0x67c/0x1190 [bonding]
>   [] do_setlink+0x9c9/0xe50
>   [] rtnl_newlink+0x522/0x880
>   [] rtnetlink_rcv_msg+0xa7/0x260
>   [] netlink_rcv_skb+0xab/0xc0
>   [] rtnetlink_rcv+0x28/0x30
>   [] netlink_unicast+0x170/0x210
>   [] netlink_sendmsg+0x308/0x420
>   [] sock_sendmsg+0xb6/0xf0
> 
> This is actually a dead lock caused by sync slave hwaddr from master when
> the master is the slave's 'slave'. This dead loop check is actually done
> by netdev_master_upper_dev_link. However, Commit 1f718f0f4f97 ("bonding:
> populate neighbour's private on enslave") moved it after dev_mc_sync.
> 
> This patch is to fix it by moving dev_mc_sync after master_upper_dev_link,
> so that this loop check would be earlier than dev_mc_sync. It also moves
> if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an
> improvement.

Nice optimization.  :-)

> 
> Note team driver also has this issue, I will fix it in another patch.
> 
> Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave")
> Reported-by: Beniamino Galvani <bgalv...@redhat.com>
> Signed-off-by: Xin Long <lucien@gmail.com>

Acked-by: Andy Gospodarek <a...@greyhouse.net>

> ---
>  drivers/net/bonding/bond_main.c | 73 
> -
>  1 file changed, 35 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 0c299de..55e1985 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1528,44 +1528,11 @@ int bond_enslave(struct net_device *bond_dev, struct 
> net_device *slave_dev,
>   goto err_close;
>   }
>  
> - /* If the mode uses primary, then the following is handled by
> -  * bond_change_active_slave().
> -  */
> - if (!bond_uses_primary(bond)) {
> - /* set promiscuity level to new slave */
> - if (bond_dev->flags & IFF_PROMISC) {
> - res = dev_set_promiscuity(slave_dev, 1);
> - if (res)
> - goto err_close;
> - }
> -
> - /* set allmulti level to new slave */
> - if (bond_dev->flags & IFF_ALLMULTI) {
> - res = dev_set_allmulti(slave_dev, 1);
> - if (res)
> - goto err_close;
> - }
> -
> - netif_addr_lock_bh(bond_dev);
> -
> - dev_mc_sync_multiple(slave_dev, bond_dev);
> - dev_uc_sync_multiple(slave_dev, bond_dev);
> -
> - netif_addr_unlock_bh(bond_dev);
> - }
> -
> - if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> - /* add lacpdu mc addr to mc list */
> - u8 lacpdu_multicast[ETH_ALEN] = MULTICAST_LACPDU_ADDR;
> -
> - dev_mc_add(slave_dev, lacpdu_multicast);
> - }
> -
>   res = vlan_vids_add_by_dev(slave_dev, bond_dev);
>   if (res) {
>   netdev_err(bond_dev, "Couldn't add bond vlan ids to %s\n",
>  slave_dev->name);
> - goto err_hwaddr_unsync;
> + goto err_close;
>   }
>  
>   prev_slave = bond_last_slave(bond);
> @@ -1725,6 +1692,37 @@ int bond_enslave(struct net_device *bond_dev, struct 
> net_device *slave_dev,
>   goto err_upper_unlink;
>   }
>  
> + /* If the mode uses primary, then the following is handled by
> +  * bond_change_active_slave().
> +  */
> + if (!bond_uses_primary(bond)) {
> + /* set promiscuity level to new slave */
> + if (bond_dev->flags & IFF_PROMISC) {
> + res = dev_set_promiscuity(slave_dev, 1);
> + if (res)
> + goto err_sysfs_del;
> + }
> +
> + /* set allmulti level to new slave */
> + if (bond_dev->flags & IFF_ALLMULTI) {
> + res = dev_set_allmulti(slave_dev, 1);
> + if (res)
> 

Re: [PATCH net 1/3] bonding: fix the err path for dev hwaddr sync in bond_enslave

2018-03-26 Thread Andy Gospodarek
On Mon, Mar 26, 2018 at 01:16:45AM +0800, Xin Long wrote:
> vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
> the err path it should unsync dev hwaddr. Otherwise, the slave
> dev's hwaddr will never be unsync when this err happens.
> 
> Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with 
> the standard ones")
> Signed-off-by: Xin Long <lucien@gmail.com>

Acked-by: Andy Gospodarek <a...@greyhouse.net>

> ---
>  drivers/net/bonding/bond_main.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index c669554..0c299de 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1565,7 +1565,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
> net_device *slave_dev,
>   if (res) {
>   netdev_err(bond_dev, "Couldn't add bond vlan ids to %s\n",
>  slave_dev->name);
> - goto err_close;
> + goto err_hwaddr_unsync;
>   }
>  
>   prev_slave = bond_last_slave(bond);
> @@ -1755,9 +1755,6 @@ int bond_enslave(struct net_device *bond_dev, struct 
> net_device *slave_dev,
>   netdev_rx_handler_unregister(slave_dev);
>  
>  err_detach:
> - if (!bond_uses_primary(bond))
> - bond_hw_addr_flush(bond_dev, slave_dev);
> -
>   vlan_vids_del_by_dev(slave_dev, bond_dev);
>   if (rcu_access_pointer(bond->primary_slave) == new_slave)
>   RCU_INIT_POINTER(bond->primary_slave, NULL);
> @@ -1771,6 +1768,10 @@ int bond_enslave(struct net_device *bond_dev, struct 
> net_device *slave_dev,
>   synchronize_rcu();
>   slave_disable_netpoll(new_slave);
>  
> +err_hwaddr_unsync:
> + if (!bond_uses_primary(bond))
> + bond_hw_addr_flush(bond_dev, slave_dev);
> +
>  err_close:
>   slave_dev->priv_flags &= ~IFF_BONDING;
>   dev_close(slave_dev);


Re: [patch net-next RFC 00/12] devlink: introduce port flavours and common phys_port_name generation

2018-03-22 Thread Andy Gospodarek
On Thu, Mar 22, 2018 at 01:10:38PM -0600, David Ahern wrote:
> On 3/22/18 11:49 AM, Jiri Pirko wrote:
> > Thu, Mar 22, 2018 at 04:34:07PM CET, dsah...@gmail.com wrote:
> >> On 3/22/18 4:55 AM, Jiri Pirko wrote:
> >>> From: Jiri Pirko 
> >>>
> >>> This patchset resolves 2 issues we have right now:
> >>> 1) There are many netdevices / ports in the system, for port, pf, vf
> >>>    representation but the user has no way to see which is which
> >>> 2) The ndo_get_phys_port_name is implemented in each driver separately,
> >>>    which may lead to inconsistent names between drivers.
> >>
> >> Similar to ndo_get_phys_port_{name,id}, devlink requires drivers to opt
> >> in with an implementation right, so you can't really force a solution to
> >> the consistent naming.
> > 
> > Yeah, drivers would still have free choice to implement the ndo
> > themselves. But most of them, like all sriov switch drivers should use
> > the devlink helper to have consistent naming. In other words, devlink
> > helper should be the standard way, in weird cases (like rocker), driver
> > implements it itself.
> 
> That's an assumption that somehow the devlink API will be better
> supported than ndo_get_phys_port_{name,id}. Don't get me wrong -- an API
> to show the kind of device is needed, but I do not think this enforces
> any kind of consistency in naming.
> 
> > 
> > 
> >>
> >>>
> >>> This patchset introduces port flavours which should address the first
> >>> problem. I'm testing this with Netronome nfp hardware. When the user
> >>> has 2 physical ports, 1 pf, and 4 vfs, he should see something like this:
> >>> # devlink port
> >>> pci/:05:00.0/0: type eth netdev enp5s0np0 flavour physical number 0
> >>> pci/:05:00.0/268435456: type eth netdev eth0 flavour physical number 0
> >>> pci/:05:00.0/268435460: type eth netdev enp5s0np1 flavour physical number 1
> >>> pci/:05:00.0/536875008: type eth netdev eth2 flavour pf_rep number 536875008
> >>> pci/:05:00.0/536870912: type eth netdev eth1 flavour vf_rep number 0
> >>> pci/:05:00.0/536870976: type eth netdev eth3 flavour vf_rep number 1
> >>> pci/:05:00.0/536871040: type eth netdev eth4 flavour vf_rep number 2
> >>> pci/:05:00.0/536871104: type eth netdev eth5 flavour vf_rep number 3
> >>
> >> How about 'kind' instead of flavo{u}r?
> > 
> > Yeah, kind is often used in kernel already with different meaning
> > git grep kind net/core
> > I wanted to avoid confusions
> 
> Roopa's amendment works as well; I just think flavor / flavour is the
> wrong word. Makes me think of food ... ice cream vs netdevices.

Naming it a 'subtype' could also work.  It's a bit longer than 'kind'
(no longer than 'flavour') and accurately describes the characteristic
of this port.  It also avoids the namespace collision of 'kind' that
Jiri points out.

It also fits with the names used in the PCI world with vendor:device and
subsystem vendor:subsystem device naming used there for further
granularity.



Re: [PATCH net-next] Documentation/networking: Add net DIM documentation

2018-03-21 Thread Andy Gospodarek
On Wed, Mar 21, 2018 at 11:30:29AM +0200, Tal Gilboa wrote:
> Net DIM is a generic algorithm, purposed for dynamically
> optimizing network devices interrupt moderation. This
> document describes how it works and how to use it.
> 
> Signed-off-by: Tal Gilboa <ta...@mellanox.com>

Looks like a nice summary of how to integrate it with a driver.  Thanks
for documenting DIM.

Acked-by: Andy Gospodarek <go...@broadcom.com>

> ---
>  Documentation/networking/net_dim.txt | 174 
> +++
>  1 file changed, 174 insertions(+)
>  create mode 100644 Documentation/networking/net_dim.txt
> 
> diff --git a/Documentation/networking/net_dim.txt 
> b/Documentation/networking/net_dim.txt
> new file mode 100644
> index 000..ef622c8
> --- /dev/null
> +++ b/Documentation/networking/net_dim.txt
> @@ -0,0 +1,174 @@
> +Net DIM - Generic Network Dynamic Interrupt Moderation
> +==
> +
> +Author:
> + Tal Gilboa <ta...@mellanox.com>
> +
> +
> +Contents
> +=
> +
> +- Assumptions
> +- Introduction
> +- The Net DIM Algorithm
> +- Registering a Network Device to DIM
> +- Example
> +
> +Part 0: Assumptions
> +==
> +
> +This document assumes the reader has basic knowledge in network drivers
> +and in general interrupt moderation.
> +
> +
> +Part I: Introduction
> +==
> +
> +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the
> +interrupt moderation configuration of a channel in order to optimize packet
> +processing. The mechanism includes an algorithm which decides if and how to
> +change moderation parameters for a channel, usually by performing an analysis
> +on runtime data sampled from the system. Net DIM is such a mechanism. In each
> +iteration of the algorithm, it analyses a given sample of the data, compares
> +it to the previous sample and, if required, it can decide to change some of
> +the interrupt moderation configuration fields. The data sample is composed of
> +data bandwidth, the number of packets and the number of events. The time
> +between samples is also measured. Net DIM compares the current and the
> +previous data and returns an adjusted interrupt moderation configuration
> +object. In some cases, the algorithm might decide not to change anything. The
> +configuration fields are the minimum duration (microseconds) allowed between
> +events and the maximum number of wanted packets per event. The Net DIM
> +algorithm ascribes importance to increasing bandwidth over reducing interrupt
> +rate.
> +
> +
> +Part II: The Net DIM Algorithm
> +===
> +
> +Each iteration of the Net DIM algorithm follows these steps:
> +1. Calculates new data sample.
> +2. Compares it to previous sample.
> +3. Makes a decision - suggests interrupt moderation configuration fields.
> +4. Applies a schedule work function, which applies suggested configuration.
> +
> +The first two steps are straightforward: both the new and the previous data
> +are supplied by the driver registered to Net DIM. The previous data is the new
> +data supplied to the previous iteration. The comparison step checks the
> +difference between the new and previous data and decides on the result of the
> +last step. A step would result as "better" if bandwidth increases and as
> +"worse" if bandwidth reduces. If there is no change in bandwidth, the packet
> +rate is compared in a similar fashion - increase == "better" and decrease ==
> +"worse". In case there is no change in the packet rate as well, the interrupt
> +rate is compared. Here the algorithm tries to optimize for lower interrupt
> +rate so an increase in the interrupt rate is considered "worse" and a decrease
> +is considered "better". Step #2 has an optimization for avoiding false
> +results: it only considers a difference between samples as valid if it is
> +greater than a certain percentage. Also, since Net DIM does not measure
> +anything by itself, it assumes the data provided by the driver is valid.
> +
> +Step #3 decides on the suggested configuration based on the result from step 
> #2
> +and the internal state of the algorithm. The states reflect the "direction" 
> of
> +the algorithm, is it going left (reducing moderation), right (increasing
> +moderation) or standing still. Another optimization is that if a decision
> +to stay still is made multiple times, the interval between iterations of the
> +algorithm would increase in order to reduce calculation overhead. Also, after
> +&qu

[PATCH net-next] bnxt_en: cleanup DIM work on device shutdown

2018-01-26 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Make sure to cancel any pending work that might update driver coalesce
settings when taking down an interface.

Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Cc: Michael Chan <michael.c...@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4b001d2..1500243 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6082,8 +6082,14 @@ static void bnxt_disable_napi(struct bnxt *bp)
if (!bp->bnapi)
return;
 
-   for (i = 0; i < bp->cp_nr_rings; i++)
+   for (i = 0; i < bp->cp_nr_rings; i++) {
+   struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
+
+   if (bp->bnapi[i]->rx_ring)
+   cancel_work_sync(&cpr->dim.work);
+
napi_disable(&bp->bnapi[i]->napi);
+   }
 }
 
 static void bnxt_enable_napi(struct bnxt *bp)
-- 
2.7.4



Re: [PATCH][next] bnxt_en: ensure len is initialized to zero

2018-01-12 Thread Andy Gospodarek
On Fri, Jan 12, 2018 at 10:11:17AM -0800, Michael Chan wrote:
> On Fri, Jan 12, 2018 at 9:46 AM, Colin King  wrote:
> > From: Colin Ian King 
> >
> > In the case where cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP the
> > exit return path is via label next_rx_no_prod and cpr->rx_bytes
> > is being updated by an uninitialized value from len. Fix this by
> > initializing len to zero.
> >
> > Detected by CoverityScan, CID#1463807 ("Uninitialized scalar variable")
> >
> > > Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
> > Signed-off-by: Colin Ian King 
> > ---
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index cf6ebf1e324b..5b5c4f266f1b 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -1482,7 +1482,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct 
> > bnxt_napi *bnapi, u32 *raw_cons,
> > u32 tmp_raw_cons = *raw_cons;
> > u16 cfa_code, cons, prod, cp_cons = RING_CMP(tmp_raw_cons);
> > struct bnxt_sw_rx_bd *rx_buf;
> > -   unsigned int len;
> > +   unsigned int len = 0;
> 
> It might be better to add a new label next_rx_no_prod_no_len and have
> the TPA code paths jump there instead.
> 
> Andy, what do you think?
> 

Yes, I think that would be a better fix.  Making the TPA and non-TPA paths as
explicit and intentional as possible seems like a good idea.

> > u8 *data_ptr, agg_bufs, cmp_type;
> > dma_addr_t dma_addr;
> > struct sk_buff *skb;
> > --
> > 2.15.1
> >


Re: [PATCH net-next v4 00/10] net: create dynamic software irq moderation library

2018-01-09 Thread Andy Gospodarek
On Wed, Jan 10, 2018 at 12:49:53AM +0200, Tal Gilboa wrote:
> On 1/10/2018 12:46 AM, Florian Fainelli wrote:
> > Hey Andy,
> > 
> > On 01/09/2018 01:06 PM, Andy Gospodarek wrote:
> > > From: Andy Gospodarek <go...@broadcom.com>
> > > 
> > > This converts the dynamic interrupt moderation library from the mlx5e
> > > driver into a library so it can be used by any driver.  The penultimate
> > > patch in this set adds support for this new dynamic interrupt moderation
> > > library in the bnxt_en driver and the last patch creates an entry in the
> > > MAINTAINERS file for this library.
> > > 
> > > The main purpose of this code is to allow an administrator to make sure
> > > that default coalesce settings are optimized for low latency, but
> > > quickly adapt to handle high throughput/bulk traffic by altering how
> > > much time passes before popping an interrupt.
> > > 
> > > For any new driver the following changes would be needed to use this
> > > library:
> > > 
> > > - add elements in ring struct to track items needed by this library
> > > - create function that can be called to actually set coalesce settings
> > > >   for the driver
> > > 
> > > Credit to Rob Rice and Lee Reed for doing some of the initial proof of
> > > concept and testing for this patch and Tal Gilboa and Or Gerlitz for
> > > their comments, etc on this set.
> > > 
> > > v4: Fix build breakage for VF representers noticed by kbuild test robot.
> > > Thanks for being so courteous, kbuild test robot!
> > > 
> > > v3: bnxt_en fix from Michael Chan, comment suggestion from Vasundhara
> > > Volam, and small mlx5e header file fix from Tal Gilboa.
> > > 
> > > v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
> > > Michael Chan, spelling and formatting fixes from Or Gerlitz, and
> > > spelling and mlx5e changes suggested by Tal Gilboa.
> > 
> > Certainly not a blocking item for this patch series, but can you
> > consider a follow up patch adding a small bit of documentation entry
> > covering how the implementation works as well as possible
> > limitations/considerations depending on what the networking HW supports
> > in terms of interrupt moderation capabilities? (e.g: is it necessary to
> > support generating an interrupt on ring empty, a micro-second resolution
> > RX/TX timeout etc. etc.).
> > 
> > Thanks for doing this!
> > 
> 
> Hi Florian, I plan to do so right after these patches would be accepted.

Thanks, Tal!



[PATCH net-next v4 01/10] net/mlx5e: Move interrupt moderation structs to new file

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 33 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 69 
 2 files changed, 70 insertions(+), 32 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5299310..df9cbb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -50,6 +50,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -227,12 +228,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-   u8 cq_period_mode;
-};
-
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -473,32 +468,6 @@ struct mlx5e_mpw_info {
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_stats prev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
new file mode 100644
index 000..9eeaa11
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017-2018, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+   u8 cq_period_mode;
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_stats prev_stats;
+   struct mlx5e_rx_am_sample   start_sample;
+   struct work_struct  wo

[PATCH net-next v4 00/10] net: create dynamic software irq moderation library

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This converts the dynamic interrupt moderation code from the mlx5e
driver into a library so it can be used by any driver.  The penultimate
patch in this set adds support for this new dynamic interrupt moderation
library in the bnxt_en driver and the last patch creates an entry in the
MAINTAINERS file for this library.

The main purpose of this code is to allow an administrator to make sure
that default coalesce settings are optimized for low latency, but
quickly adapt to handle high throughput/bulk traffic by altering how
much time passes before popping an interrupt.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
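
A rough sketch of those two driver-side changes, written as a standalone userspace mock rather than real driver code. All names (toy_ring, toy_profile, toy_set_coalesce) are hypothetical, the profile values are purely illustrative, and the hw_usec/hw_pkts fields stand in for whatever registers or firmware commands a real device uses:

```c
#include <stdint.h>

/* Hypothetical moderation profile: timer and packet-count thresholds. */
struct toy_profile {
	uint16_t usec;
	uint16_t pkts;
};

/* Illustrative values only, ordered from low latency to high throughput. */
static const struct toy_profile toy_profiles[] = {
	{ 2, 32 }, { 16, 32 }, { 50, 32 }, { 100, 32 },
};
#define TOY_NUM_PROFILES (sizeof(toy_profiles) / sizeof(toy_profiles[0]))

/* Change 1: per-ring elements the moderation logic needs to track. */
struct toy_ring {
	uint16_t event_ctr;  /* interrupts seen on this ring */
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint8_t profile_ix;  /* index chosen by the moderation logic */
	uint16_t hw_usec;    /* stand-ins for the hardware coalesce settings */
	uint16_t hw_pkts;
};

/* Change 2: a function that actually applies coalesce settings to a ring. */
static int toy_set_coalesce(struct toy_ring *ring)
{
	if (ring->profile_ix >= TOY_NUM_PROFILES)
		return -1; /* reject an out-of-range profile, leave settings alone */
	ring->hw_usec = toy_profiles[ring->profile_ix].usec;
	ring->hw_pkts = toy_profiles[ring->profile_ix].pkts;
	return 0;
}
```

In a real driver the second function would typically run from a work item, since programming coalesce settings may sleep.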

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for
their comments, etc on this set.

v4: Fix build breakage for VF representers noticed by kbuild test robot.
Thanks for being so courteous, kbuild test robot!

v3: bnxt_en fix from Michael Chan, comment suggestion from Vasundhara
Volam, and small mlx5e header file fix from Tal Gilboa.

v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
Michael Chan, spelling and formatting fixes from Or Gerlitz, and
spelling and mlx5e changes suggested by Tal Gilboa.

Andy Gospodarek (10):
  net/mlx5e: Move interrupt moderation structs to new file
  net/mlx5e: Move interrupt moderation forward declarations
  net/mlx5e: Remove rq references in mlx5e_rx_am
  net/mlx5e: Move AM logic enums
  net/mlx5e: Move generic functions to new file
  net/mlx5e: Change Mellanox references in DIM code
  net/mlx5e: Move dynamic interrupt coalescing code to include/linux
  net/dim: use struct net_dim_sample as arg to net_dim
  bnxt_en: add support for software dynamic interrupt moderation
  MAINTAINERS: add entry for Dynamic Interrupt Moderation

 MAINTAINERS|   5 +
 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  50 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  33 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  46 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  40 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
 include/linux/net_dim.h| 373 +
 15 files changed, 594 insertions(+), 411 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 include/linux/net_dim.h

-- 
2.7.4



[PATCH net-next v4 04/10] net/mlx5e: Move AM logic enums

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 26 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 -
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 7d5499a..a1497bab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -66,6 +66,32 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
+/* Adaptive moderation logic */
+enum {
+   MLX5E_AM_START_MEASURE,
+   MLX5E_AM_MEASURE_IN_PROGRESS,
+   MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+   MLX5E_AM_PARKING_ON_TOP,
+   MLX5E_AM_PARKING_TIRED,
+   MLX5E_AM_GOING_RIGHT,
+   MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+   MLX5E_AM_STATS_WORSE,
+   MLX5E_AM_STATS_SAME,
+   MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+   MLX5E_AM_STEPPED,
+   MLX5E_AM_TOO_TIRED,
+   MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u16 event_ctr,
 u64 packets,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1630076..337dd60 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -82,31 +82,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
 }
 
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
-- 
2.7.4
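
The first enum moved in the patch above names the three measurement states. How they cycle can be sketched in standalone C as below. This mock only mirrors the shape of the switch visible in the 03/10 diff; the names toy_am, toy_am_step and toy_am_work_done are hypothetical, and profile_changed stands in for the real decision step.

```c
#include <stdint.h>

enum toy_state {
	TOY_START_MEASURE,
	TOY_MEASURE_IN_PROGRESS,
	TOY_APPLY_NEW_PROFILE,
};

#define TOY_NEVENTS 64 /* events per measurement window, like MLX5E_AM_NEVENTS */

struct toy_am {
	enum toy_state state;
	uint16_t start_event_ctr;
};

/*
 * Called once per poll. Returns 1 when the caller should schedule the work
 * item that applies a new profile, 0 otherwise.
 */
static int toy_am_step(struct toy_am *am, uint16_t event_ctr,
		       int profile_changed)
{
	switch (am->state) {
	case TOY_MEASURE_IN_PROGRESS:
		/* u16 subtraction handles counter wrap-around */
		if ((uint16_t)(event_ctr - am->start_event_ctr) < TOY_NEVENTS)
			break;
		if (profile_changed) {
			am->state = TOY_APPLY_NEW_PROFILE;
			return 1;
		}
		/* fall through */
	case TOY_START_MEASURE:
		am->start_event_ctr = event_ctr;
		am->state = TOY_MEASURE_IN_PROGRESS;
		break;
	case TOY_APPLY_NEW_PROFILE:
		break; /* wait for the work handler to finish */
	}
	return 0;
}

/* Models the work handler: once the profile is applied, measure again. */
static void toy_am_work_done(struct toy_am *am)
{
	am->state = TOY_START_MEASURE;
}
```

No new measurement starts while a profile change is pending, so a sample window never straddles two different coalesce settings.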



[PATCH net-next v4 03/10] net/mlx5e: Remove rq references in mlx5e_rx_am

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not store these values in the same data
structure.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  6 --
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 -
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 2031a21..7d5499a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -66,8 +66,10 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
-struct mlx5e_rq;
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index e401d9d..1630076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -264,13 +264,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+   u64 packets,
+   u64 bytes,
struct mlx5e_rx_am_sample *s)
 {
s->time  = ktime_get();
-   s->pkt_ctr   = rq->stats.packets;
-   s->byte_ctr  = rq->stats.bytes;
-   s->event_ctr = rq->cq.event_ctr;
+   s->pkt_ctr   = packets;
+   s->byte_ctr  = bytes;
+   s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -309,20 +311,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes)
 {
-   struct mlx5e_rx_am *am = &rq->am;
struct mlx5e_rx_am_sample end_sample;
struct mlx5e_rx_am_stats curr_stats;
u16 nevents;
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
  am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
-   mlx5e_am_sample(rq, &end_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
mlx5e_am_calc_stats(&am->start_sample, &end_sample,
&curr_stats);
if (mlx5e_am_decision(&curr_stats, am)) {
@@ -332,7 +336,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
}
/* fall through */
case MLX5E_AM_START_MEASURE:
-   mlx5e_am_sample(rq, &am->start_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
break;
case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index ab92298..1849169 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -79,7 +79,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
 
if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   mlx5e_rx_am(&c->rq);
+   mlx5e_rx_am(&c->rq.am,
+   c->rq.cq.event_ctr,
+   c->rq.stats.packets,
+   c->rq.stats.bytes);
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4
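
The event counter passed into mlx5e_rx_am() above is a u16, so it wraps; the BIT_GAP() call in the diff measures how many events elapsed since the start sample despite that. A minimal sketch of such a wrap-safe difference follows. The helper name toy_bit_gap is hypothetical, and the formula is an assumption about what the macro computes (a difference modulo 2^bits), not a copy of it.

```c
#include <stdint.h>

/* Distance from start to end on a counter that wraps at 2^bits (bits < 64). */
static uint64_t toy_bit_gap(unsigned int bits, uint64_t end, uint64_t start)
{
	uint64_t modulus = 1ULL << bits;

	/* Adding the modulus keeps the sum meaningful when end < start. */
	return (end - start + modulus) & (modulus - 1);
}
```

For a 16-bit counter, a start value of 65530 and an end value of 5 yields a gap of 11 events rather than a huge or negative number.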



[PATCH net-next v4 02/10] net/mlx5e: Move interrupt moderation forward declarations

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 5 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index df9cbb3..e2e35ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -833,10 +833,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 9eeaa11..2031a21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -66,4 +66,9 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
+struct mlx5e_rq;
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4



[PATCH net-next v4 05/10] net/mlx5e: Move generic functions to new file

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  48 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 102 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 320 -
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 307 
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 103 +++
 7 files changed, 461 insertions(+), 425 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..b46b6de2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
fpga/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
-   en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o
+   en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
+   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e2e35ed..4ee06e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -50,7 +50,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_dim.h"
+#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
new file mode 100644
index 000..b9b434b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+void mlx5e_rx_am_work(struct work_struct *work)
+{
+   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
+ work);
+   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
+   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
+   am->profile_ix);
+
+   mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
+  cur_profile.usec, cur_profile.pkts);
+
+   am->state = MLX5E_AM_START_MEASURE;
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
del
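
The work handler added in en_dim.c above receives only a pointer to the embedded work item and uses container_of() twice to get back to the owning receive queue. The same idea in standalone userspace C, as a sketch: toy_container_of and the toy_* structs are hypothetical stand-ins for the kernel macro and the mlx5e structures.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_work {
	int pending;
};

struct toy_am {
	unsigned char profile_ix;
	struct toy_work work; /* embedded, not a pointer */
};

struct toy_rq {
	int id;
	struct toy_am am; /* embedded, not a pointer */
};

/* Given only the embedded work item, walk outward to the owning queue. */
static struct toy_rq *toy_rq_from_work(struct toy_work *work)
{
	struct toy_am *am = toy_container_of(work, struct toy_am, work);

	return toy_container_of(am, struct toy_rq, am);
}
```

This only works because each structure is embedded by value in its parent; if the moderation state were reached through a pointer, the handler would need a back-pointer instead.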

[PATCH net-next v4 07/10] net/mlx5e: Move dynamic interrupt coalescing code to include/linux

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c | 307 --
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h | 108 ---
 include/linux/net_dim.h   | 377 ++
 6 files changed, 380 insertions(+), 417 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
 create mode 100644 include/linux/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index b46b6de2..c805769 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 4d1d298..29b9675 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -47,10 +47,10 @@
 #include 
 #include 
 #include 
+#include <linux/net_dim.h>
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index f620325..2b89951 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/net_dim.h>
 #include "en.h"
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
deleted file mode 100644
index decb370..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017-2018, Broadcom Limited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_DIM_DEF_PROFILE_CQE 1
-#define NET_DIM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
-#define NET_DIM_EQE_PROFILES { \
-   {1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_P

[PATCH net-next v4 08/10] net/dim: use struct net_dim_sample as arg to net_dim

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Simplify the arguments to net_dim() by packing them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Suggested-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 13 -
 include/linux/net_dim.h   | 10 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index a1c94fd..f292bb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -78,11 +78,14 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
for (i = 0; i < c->num_tc; i++)
mlx5e_cq_arm(&c->sq[i].cq);
 
-   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   net_dim(&c->rq.dim,
-   c->rq.cq.event_ctr,
-   c->rq.stats.packets,
-   c->rq.stats.bytes);
+   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM)) {
+   struct net_dim_sample dim_sample;
+   net_dim_sample(c->rq.cq.event_ctr,
+  c->rq.stats.packets,
+  c->rq.stats.bytes,
+  &dim_sample);
+   net_dim(&c->rq.dim, dim_sample);
+   }
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 741510f..1c7e450 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -342,21 +342,18 @@ static inline void net_dim_calc_stats(struct net_dim_sample *start,
 }
 
 static inline void net_dim(struct net_dim *dim,
-  u16 event_ctr,
-  u64 packets,
-  u64 bytes)
+  struct net_dim_sample end_sample)
 {
-   struct net_dim_sample end_sample;
struct net_dim_stats curr_stats;
u16 nevents;
 
switch (dim->state) {
case NET_DIM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16),
+ end_sample.event_ctr,
  dim->start_sample.event_ctr);
if (nevents < NET_DIM_NEVENTS)
break;
-   net_dim_sample(event_ctr, packets, bytes, &end_sample);
net_dim_calc_stats(&dim->start_sample, &end_sample,
   &curr_stats);
if (net_dim_decision(&curr_stats, dim)) {
@@ -366,7 +363,6 @@ static inline void net_dim(struct net_dim *dim,
}
/* fall through */
case NET_DIM_START_MEASURE:
-   net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
dim->state = NET_DIM_MEASURE_IN_PROGRESS;
break;
case NET_DIM_APPLY_NEW_PROFILE:
-- 
2.7.4
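
The calling convention this patch introduces can be tried outside the
kernel with a small userspace sketch. Everything below is a hypothetical
stand-in (dim_step, dim_take_sample, event_gap and the struct names are
not the kernel's), and the window size of 64 is borrowed from
MLX5E_AM_NEVENTS elsewhere in this series. The sketch re-arms the window
by copying the caller's sample into start_sample; that step is an
assumption of the sketch and is not shown in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
#define DIM_NEVENTS 64 /* events per measurement window (assumed) */

enum dim_state {
	DIM_START_MEASURE,
	DIM_MEASURE_IN_PROGRESS,
	DIM_APPLY_NEW_PROFILE,
};

struct dim_sample {
	uint64_t pkt_ctr;
	uint64_t byte_ctr;
	uint16_t event_ctr;
};

struct dim {
	enum dim_state state;
	struct dim_sample start_sample;
	unsigned int windows_completed; /* illustration only */
};

/* Distance from 'start' to 'end' on a counter that wraps at 2^16. */
static uint16_t event_gap(uint16_t end, uint16_t start)
{
	return (uint16_t)(end - start);
}

/* The caller fills the sample once; the state machine only consumes it. */
static struct dim_sample dim_take_sample(uint16_t events, uint64_t pkts,
					 uint64_t bytes)
{
	struct dim_sample s = {
		.pkt_ctr = pkts,
		.byte_ctr = bytes,
		.event_ctr = events,
	};
	return s;
}

static void dim_step(struct dim *dim, struct dim_sample end_sample)
{
	switch (dim->state) {
	case DIM_MEASURE_IN_PROGRESS:
		if (event_gap(end_sample.event_ctr,
			      dim->start_sample.event_ctr) < DIM_NEVENTS)
			break;
		/* A real implementation computes stats and decides here. */
		dim->windows_completed++;
		/* fall through */
	case DIM_START_MEASURE:
		dim->start_sample = end_sample; /* assumption of this sketch */
		dim->state = DIM_MEASURE_IN_PROGRESS;
		break;
	case DIM_APPLY_NEW_PROFILE:
		break;
	}
}
```

A poll routine would call dim_step(&d, dim_take_sample(events, pkts,
bytes)) once per poll. Because the gap is computed modulo 2^16, a window
that straddles a counter wrap is still measured correctly.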



[PATCH net-next v4 09/10] bnxt_en: add support for software dynamic interrupt moderation

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Michael Chan <michael.c...@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 50 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c | 33 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++
 5 files changed, 119 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile 
b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..7c560d5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 89c3c87..cf6ebf1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1645,6 +1645,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+   cpr->rx_packets += 1;
+   cpr->rx_bytes += len;
*raw_cons = tmp_raw_cons;
 
return rc;
@@ -1802,6 +1804,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+   cpr->event_ctr++;
prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
napi_schedule(&bnapi->napi);
return IRQ_HANDLED;
@@ -2025,6 +2028,15 @@ static int bnxt_poll(struct napi_struct *napi, int 
budget)
break;
}
}
+   if (bp->flags & BNXT_FLAG_DIM) {
+   struct net_dim_sample dim_sample;
+
+   net_dim_sample(cpr->event_ctr,
+  cpr->rx_packets,
+  cpr->rx_bytes,
+  &dim_sample);
+   net_dim(&cpr->dim, dim_sample);
+   }
mmiowb();
return work_done;
 }
@@ -2617,6 +2629,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
ring->fw_ring_id = INVALID_HW_RING_ID;
+   cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+   cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
}
 }
 
@@ -4593,6 +4607,36 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal 
*hw_coal,
req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_coal coal;
+   unsigned int grp_idx;
+
+   /* Tick values in micro seconds.
+* 1 coal_buf x bufs_per_record = 1 completion record.
+*/
+   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+   if (!bnapi->rx_ring)
+   return -ENODEV;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+   bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+   grp_idx = bnapi->index;
+   req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
+
+   return hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+HWRM_CMD_TIMEOUT);
+}
+
 int bnxt_hwrm_set_coal(struct bnxt *bp)
 {
int i, rc = 0;
@@ -5715,7 +5759,13 @@ static void bnxt_enable_napi(struct bnxt *bp)
int i;
 
for (i = 0; i < bp->cp_nr_rings; i++) {
+   struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
bp->bnapi[i]->in_reset = false;
+
+   if (bp->bnapi[i]->rx_ring) {
+   INIT_WORK(&cpr->dim.work, bnxt_dim_work);
+   cpr->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+   }
napi_enab
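
The INIT_WORK() call above registers a work handler on a work item that
is embedded in the per-ring DIM state. A handler of that kind usually
recovers the enclosing ring with container_of(), the same pattern
mlx5e_rx_dim_work uses elsewhere in this series. The sketch below is a
userspace illustration only: struct work, struct dim_ctx, struct ring,
ring_dim_work and the profile table values are all hypothetical, and
bnxt_dim.c itself is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins: a work item, a DIM context, a completion ring. */
struct work {
	void (*fn)(struct work *w);
};

struct dim_ctx {
	struct work work;
	unsigned int profile_ix;
};

struct ring {
	int id;
	unsigned int coal_ticks;
	struct dim_ctx dim;
};

/* Illustrative profile table; the values are made up for this sketch. */
static const unsigned int profile_ticks[] = { 1, 8, 64, 128, 256 };

static void ring_dim_work(struct work *w)
{
	/* Walk outwards: work item -> DIM context -> owning ring. */
	struct dim_ctx *dim = container_of(w, struct dim_ctx, work);
	struct ring *r = container_of(dim, struct ring, dim);

	/* A real driver would send the new setting to firmware here. */
	r->coal_ticks = profile_ticks[dim->profile_ix];
}
```

Embedding the work item this way means the handler needs no lookup
table: the pointer it is handed is enough to find the ring whose
coalescing settings should change.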

[PATCH net-next v4 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Signed-off-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e81d91f..1791e7b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4942,6 +4942,11 @@ S:   Maintained
 F: lib/dynamic_debug.c
 F: include/linux/dynamic_debug.h
 
+DYNAMIC INTERRUPT MODERATION
+M: Tal Gilboa <ta...@mellanox.com>
+S: Maintained
+F: include/linux/net_dim.h
+
 DZ DECSTATION DZ11 SERIAL DRIVER
 M: "Maciej W. Rozycki" <ma...@linux-mips.org>
 S: Maintained
-- 
2.7.4



[PATCH net-next v4 06/10] net/mlx5e: Change Mellanox references in DIM code

2018-01-09 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  40 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 286 ++---
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 ++---
 8 files changed, 226 insertions(+), 202 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 4ee06e7..4d1d298 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -238,8 +238,8 @@ struct mlx5e_params {
u16 num_channels;
u8  num_tc;
bool rx_cqe_compress_def;
-   struct mlx5e_cq_moder rx_cq_moderation;
-   struct mlx5e_cq_moder tx_cq_moderation;
+   struct net_dim_cq_moder rx_cq_moderation;
+   struct net_dim_cq_moder tx_cq_moderation;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -249,7 +249,7 @@ struct mlx5e_params {
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
bool vlan_strip_disable;
bool scatter_fcs_en;
-   bool rx_am_enabled;
+   bool rx_dim_enabled;
u32 lro_timeout;
u32 pflags;
struct bpf_prog *xdp_prog;
@@ -528,7 +528,7 @@ struct mlx5e_rq {
unsigned long  state;
int ix;
 
-   struct mlx5e_rx_am am; /* Adaptive Moderation */
+   struct net_dim dim; /* Dynamic Interrupt Moderation */
 
/* XDP */
struct bpf_prog   *xdp_prog;
@@ -1079,4 +1079,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
 u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
+void mlx5e_rx_dim_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index b9b434b..f620325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -32,17 +32,17 @@
 
 #include "en.h"
 
-void mlx5e_rx_am_work(struct work_struct *work)
+void mlx5e_rx_dim_work(struct work_struct *work)
 {
-   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
- work);
-   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
-   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
-
am->profile_ix);
+   struct net_dim *dim = container_of(work, struct net_dim,
+  work);
+   struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
+   struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
+ 
dim->profile_ix);
 
mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
   cur_profile.usec, cur_profile.pkts);
 
-   am->state = MLX5E_AM_START_MEASURE;
+   dim->state = NET_DIM_START_MEASURE;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 1554780..bd5af7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -465,7 +465,7 @@ int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
coal->rx_max_coalesced_frames = 
priv->channels.params.rx_cq_moderation.pkts;
coal->tx_coalesce_usecs   = 
priv->channels.params.tx_cq_moderation.usec;
coal->tx_max_coalesced_frames = 
priv->channels.params.tx_cq_moderation.pkts;
-   coal->use_adaptive_rx_coalesce = priv->channels.params.rx_am_enabled;
+   coal->use_adaptive_rx_coalesce = priv->channels.params.rx_dim_enabled;
 
return 0;
 }
@@ -519,7 +519,7 @@ int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
new_channels.params.tx_cq_moderation.pkts = 
coal->tx_max_coalesced_frames;
new_channels.params.rx_cq_moderatio

Re: [PATCH net-next v3 06/10] net/mlx5e: Change Mellanox references in DIM code

2018-01-09 Thread Andy Gospodarek
On Tue, Jan 09, 2018 at 08:22:15PM +0200, Tal Gilboa wrote:
> On 1/9/2018 6:06 PM, Andy Gospodarek wrote:
> > On Mon, Jan 08, 2018 at 11:06:28PM -0800, Saeed Mahameed wrote:
> > > 
> > > 
> > > On 01/08/2018 10:13 PM, Andy Gospodarek wrote:
> > > > From: Andy Gospodarek <go...@broadcom.com>
> > > > 
> > > > Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
> > > > NET_DIM, respectively, in code that handles dynamic interrupt
> > > > moderation.  Also change all references from 'am' to 'dim' when used as
> > > > local variables and add generic profile references.
> > > > 
> > > > Signed-off-by: Andy Gospodarek <go...@broadcom.com>
> > > > Acked-by: Tal Gilboa <ta...@mellanox.com>
> > > > Acked-by: Saeed Mahameed <sae...@mellanox.com>
> > > > ---
> > > >drivers/net/ethernet/mellanox/mlx5/core/en.h   |   9 +-
> > > >drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
> > > >.../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
> > > >drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  40 ++-
> > > >drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   8 +-
> > > >drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 286 
> > > > ++---
> > > >drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 ++---
> > > >7 files changed, 225 insertions(+), 201 deletions(-)
> > > > 
> > > 
> > > [...]
> > > 
> > > >#define IS_SIGNIFICANT_DIFF(val, ref) \
> > > > (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% 
> > > > difference */
> > > > -static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
> > > > - struct mlx5e_rx_am_stats *prev)
> > > > +static int net_dim_stats_compare(struct net_dim_stats *curr,
> > > > +struct net_dim_stats *prev)
> > > >{
> > > > if (!prev->bpms)
> > > > -   return curr->bpms ? MLX5E_AM_STATS_BETTER :
> > > > -   MLX5E_AM_STATS_SAME;
> > > > +   return curr->bpms ? NET_DIM_STATS_BETTER :
> > > > +   NET_DIM_STATS_SAME;
> > > > if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
> > > > -   return (curr->bpms > prev->bpms) ? 
> > > > MLX5E_AM_STATS_BETTER :
> > > > -  MLX5E_AM_STATS_WORSE;
> > > > +   return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER 
> > > > :
> > > > +  NET_DIM_STATS_WORSE;
> > > 
> > > Hey Andy,
> > > 
> > > > I am currently reviewing a patch internally that fixes a bug in this area:
> > > > prev->ppms can be 0, which makes IS_SIGNIFICANT_DIFF divide by zero. Ouch!
> > > > The same goes for prev->epms. For some reason we had a broken assumption
> > > > that if ppms is 0 then bpms is also 0 and the above condition would
> > > > cover us.
> > > 
> > > Anyway the patch will go to net, which means when this series gets 
> > > accepted
> > > then net-next will fail to merge with net and we need to manually push the
> > > fix to the new DIM library.
> > > 
> > > But for now I don't think anything is required for this series other than
> > > bringing this division by 0 issue and the future merge conflict to your
> > > attention.
> > > 
> > 
> > Thanks for bringing that to everyone's attention.  I agree there is
> > probably not much that should be done at this point -- hopefully the
> > merge should go pretty smoothly, since net_dim.h is seen as a rename
> > from en_rx_am.c.
> 
> I talked with Talat, who is submitting the fix. He will apply it over these
> patches after they are accepted.
> 

Awesome!  Thanks for doing that.  v4 coming in a few mins -- hopefully the
last.  :-)

> > 
> > 
> > > > if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
> > > > -   return (curr->ppms > prev->ppms) ? 
> > > > MLX5E_AM_STATS_BETTER :
> > > > -  MLX5E_AM_STATS_WORSE;
> > > > +   return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER 
> > > > :
> > > > +  NET_DIM_STATS_WORSE;
> > > > if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
> > > > -   return (curr->epms < prev->epms) ? 
> > > > MLX5E_AM_STATS_BETTER :
> > > > -  MLX5E_AM_STATS_WORSE;
> > > > +   return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER 
> > > > :
> > > > +  NET_DIM_STATS_WORSE;
> > > > -   return MLX5E_AM_STATS_SAME;
> > > > +   return NET_DIM_STATS_SAME;
> > > >}


Re: [PATCH net-next v3 06/10] net/mlx5e: Change Mellanox references in DIM code

2018-01-09 Thread Andy Gospodarek
On Mon, Jan 08, 2018 at 11:06:28PM -0800, Saeed Mahameed wrote:
> 
> 
> On 01/08/2018 10:13 PM, Andy Gospodarek wrote:
> > From: Andy Gospodarek <go...@broadcom.com>
> > 
> > Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
> > NET_DIM, respectively, in code that handles dynamic interrupt
> > moderation.  Also change all references from 'am' to 'dim' when used as
> > local variables and add generic profile references.
> > 
> > Signed-off-by: Andy Gospodarek <go...@broadcom.com>
> > Acked-by: Tal Gilboa <ta...@mellanox.com>
> > Acked-by: Saeed Mahameed <sae...@mellanox.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/en.h   |   9 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
> >   .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  40 ++-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   8 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 286 
> > ++---
> >   drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 ++---
> >   7 files changed, 225 insertions(+), 201 deletions(-)
> > 
> 
> [...]
> 
> >   #define IS_SIGNIFICANT_DIFF(val, ref) \
> > (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference 
> > */
> > -static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
> > - struct mlx5e_rx_am_stats *prev)
> > +static int net_dim_stats_compare(struct net_dim_stats *curr,
> > +struct net_dim_stats *prev)
> >   {
> > if (!prev->bpms)
> > -   return curr->bpms ? MLX5E_AM_STATS_BETTER :
> > -   MLX5E_AM_STATS_SAME;
> > +   return curr->bpms ? NET_DIM_STATS_BETTER :
> > +   NET_DIM_STATS_SAME;
> > if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
> > -   return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
> > -  MLX5E_AM_STATS_WORSE;
> > +   return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
> > +  NET_DIM_STATS_WORSE;
> 
> Hey Andy,
> 
> I am currently reviewing a patch internally that fixes a bug in this area:
> prev->ppms can be 0, which makes IS_SIGNIFICANT_DIFF divide by zero. Ouch!
> The same goes for prev->epms. For some reason we had a broken assumption
> that if ppms is 0 then bpms is also 0 and the above condition would
> cover us.
> 
> Anyway the patch will go to net, which means when this series gets accepted
> then net-next will fail to merge with net and we need to manually push the
> fix to the new DIM library.
> 
> But for now I don't think anything is required for this series other than
> bringing this division by 0 issue and the future merge conflict to your
> attention.
> 

Thanks for bringing that to everyone's attention.  I agree there is
probably not much that should be done at this point -- hopefully the
merge should go pretty smoothly, since net_dim.h is seen as a rename
from en_rx_am.c.


> > if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
> > -   return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
> > -  MLX5E_AM_STATS_WORSE;
> > +   return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
> > +  NET_DIM_STATS_WORSE;
> > if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
> > -   return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
> > -  MLX5E_AM_STATS_WORSE;
> > +   return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
> > +  NET_DIM_STATS_WORSE;
> > -   return MLX5E_AM_STATS_SAME;
> > +   return NET_DIM_STATS_SAME;
> >   }


[PATCH net-next v3 06/10] net/mlx5e: Change Mellanox references in DIM code

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  40 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 286 ++---
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 ++---
 7 files changed, 225 insertions(+), 201 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 4ee06e7..4d1d298 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -238,8 +238,8 @@ struct mlx5e_params {
u16 num_channels;
u8  num_tc;
bool rx_cqe_compress_def;
-   struct mlx5e_cq_moder rx_cq_moderation;
-   struct mlx5e_cq_moder tx_cq_moderation;
+   struct net_dim_cq_moder rx_cq_moderation;
+   struct net_dim_cq_moder tx_cq_moderation;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -249,7 +249,7 @@ struct mlx5e_params {
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
bool vlan_strip_disable;
bool scatter_fcs_en;
-   bool rx_am_enabled;
+   bool rx_dim_enabled;
u32 lro_timeout;
u32 pflags;
struct bpf_prog *xdp_prog;
@@ -528,7 +528,7 @@ struct mlx5e_rq {
unsigned long  state;
int ix;
 
-   struct mlx5e_rx_am am; /* Adaptive Moderation */
+   struct net_dim dim; /* Dynamic Interrupt Moderation */
 
/* XDP */
struct bpf_prog   *xdp_prog;
@@ -1079,4 +1079,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
 u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
+void mlx5e_rx_dim_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index b9b434b..f620325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -32,17 +32,17 @@
 
 #include "en.h"
 
-void mlx5e_rx_am_work(struct work_struct *work)
+void mlx5e_rx_dim_work(struct work_struct *work)
 {
-   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
- work);
-   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
-   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
-
am->profile_ix);
+   struct net_dim *dim = container_of(work, struct net_dim,
+  work);
+   struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
+   struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
+ 
dim->profile_ix);
 
mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
   cur_profile.usec, cur_profile.pkts);
 
-   am->state = MLX5E_AM_START_MEASURE;
+   dim->state = NET_DIM_START_MEASURE;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 8f05efa..51ae6df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -480,7 +480,7 @@ int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
coal->rx_max_coalesced_frames = 
priv->channels.params.rx_cq_moderation.pkts;
coal->tx_coalesce_usecs   = 
priv->channels.params.tx_cq_moderation.usec;
coal->tx_max_coalesced_frames = 
priv->channels.params.tx_cq_moderation.pkts;
-   coal->use_adaptive_rx_coalesce = priv->channels.params.rx_am_enabled;
+   coal->use_adaptive_rx_coalesce = priv->channels.params.rx_dim_enabled;
 
return 0;
 }
@@ -534,7 +534,7 @@ int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
new_channels.params.tx_cq_moderation.pkts = 
coal->tx_max_coalesced_frames;
new_channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
new_channels.params.rx_

[PATCH net-next v3 05/10] net/mlx5e: Move generic functions to new file

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  48 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 102 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 320 -
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 307 
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 103 +++
 7 files changed, 461 insertions(+), 425 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..b46b6de2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
fpga/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
-   en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o
+   en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
+   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e2e35ed..4ee06e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -50,7 +50,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_dim.h"
+#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
new file mode 100644
index 0000000..b9b434b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+void mlx5e_rx_am_work(struct work_struct *work)
+{
+   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
+ work);
+   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
+   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
+
am->profile_ix);
+
+   mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
+  cur_profile.usec, cur_profile.pkts);
+
+   am->state = MLX5E_AM_START_MEASURE;
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
del

[PATCH net-next v3 03/10] net/mlx5e: Remove rq references in mlx5e_rx_am

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not keep these values together in the same data
structure.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  6 --
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 -
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 2031a21..7d5499a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -66,8 +66,10 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
-struct mlx5e_rq;
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index e401d9d..1630076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -264,13 +264,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats 
*curr_stats,
return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+   u64 packets,
+   u64 bytes,
struct mlx5e_rx_am_sample *s)
 {
s->time  = ktime_get();
-   s->pkt_ctr   = rq->stats.packets;
-   s->byte_ctr  = rq->stats.bytes;
-   s->event_ctr = rq->cq.event_ctr;
+   s->pkt_ctr   = packets;
+   s->byte_ctr  = bytes;
+   s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -309,20 +311,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes)
 {
-   struct mlx5e_rx_am *am = &rq->am;
struct mlx5e_rx_am_sample end_sample;
struct mlx5e_rx_am_stats curr_stats;
u16 nevents;
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
  am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
-   mlx5e_am_sample(rq, &end_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
mlx5e_am_calc_stats(&am->start_sample, &end_sample,
&curr_stats);
if (mlx5e_am_decision(&curr_stats, am)) {
@@ -332,7 +336,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
}
/* fall through */
case MLX5E_AM_START_MEASURE:
-   mlx5e_am_sample(rq, &am->start_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
break;
case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index ab92298..1849169 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -79,7 +79,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
 
if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   mlx5e_rx_am(&c->rq);
+   mlx5e_rx_am(&c->rq.am,
+   c->rq.cq.event_ctr,
+   c->rq.stats.packets,
+   c->rq.stats.bytes);
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4
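
Once the sample is built from plain counter values rather than from
struct mlx5e_rq, the rate calculation needs nothing driver-specific.
The userspace sketch below shows the idea under stated assumptions:
struct sample, struct rates, take_sample and calc_rates are
hypothetical names, the timestamp is a plain microsecond count standing
in for ktime_get(), and rounding up is a choice made for this sketch.
The body of mlx5e_am_calc_stats is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

struct sample {
	uint64_t time_us; /* timestamp in microseconds */
	uint64_t pkt_ctr;
	uint64_t byte_ctr;
	uint16_t event_ctr;
};

struct rates {
	uint64_t ppms; /* packets per msec */
	uint64_t bpms; /* bytes per msec */
	uint64_t epms; /* events per msec */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Any driver can build a sample from three counters and a clock. */
static struct sample take_sample(uint64_t now_us, uint16_t events,
				 uint64_t pkts, uint64_t bytes)
{
	struct sample s = {
		.time_us = now_us,
		.pkt_ctr = pkts,
		.byte_ctr = bytes,
		.event_ctr = events,
	};
	return s;
}

/* Returns 0 on success, -1 if no time elapsed between the samples. */
static int calc_rates(const struct sample *start, const struct sample *end,
		      struct rates *out)
{
	uint64_t delta_us = end->time_us - start->time_us;
	uint64_t npkts = end->pkt_ctr - start->pkt_ctr;
	uint64_t nbytes = end->byte_ctr - start->byte_ctr;
	/* The event counter is 16 bits wide, so take the gap modulo 2^16. */
	uint16_t nevents = (uint16_t)(end->event_ctr - start->event_ctr);

	if (!delta_us)
		return -1;

	out->ppms = div_round_up(npkts * 1000, delta_us);
	out->bpms = div_round_up(nbytes * 1000, delta_us);
	out->epms = div_round_up((uint64_t)nevents * 1000, delta_us);
	return 0;
}
```

With two samples taken 2 ms apart, 2000 packets and 300000 bytes work
out to 1000 packets/ms and 150000 bytes/ms, regardless of which driver
supplied the counters.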



[PATCH net-next v3 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Signed-off-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 753799d..178239dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4944,6 +4944,11 @@ S:   Maintained
 F: lib/dynamic_debug.c
 F: include/linux/dynamic_debug.h
 
+DYNAMIC INTERRUPT MODERATION
+M: Tal Gilboa <ta...@mellanox.com>
+S: Maintained
+F: include/linux/net_dim.h
+
 DZ DECSTATION DZ11 SERIAL DRIVER
 M: "Maciej W. Rozycki" <ma...@linux-mips.org>
 S: Maintained
-- 
2.7.4



[PATCH net-next v3 02/10] net/mlx5e: Move interrupt moderation forward declarations

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Move these to a newly created file in preparation for moving these
functions to a library.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 5 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index df9cbb3..e2e35ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -833,10 +833,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 9eeaa11..2031a21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -66,4 +66,9 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
+struct mlx5e_rq;
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4



[PATCH net-next v3 07/10] net/mlx5e: Move dynamic interrupt coalescing code to include/linux

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.
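
As a rough userspace illustration of the per-ring bookkeeping described
above, the sketch below turns two counter snapshots into per-msec rates
(the ppms/bpms/epms fields used by the moderation structs).  All sketch_*
names are hypothetical and the rounding is plain integer division; this
is not the library code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of a per-ring sample (names are assumptions). */
struct sketch_sample {
	uint64_t time_us;   /* timestamp, microseconds */
	uint32_t pkt_ctr;   /* free-running packet counter */
	uint32_t byte_ctr;  /* free-running byte counter */
	uint16_t event_ctr; /* free-running interrupt event counter */
};

struct sketch_stats {
	int ppms; /* packets per msec */
	int bpms; /* bytes per msec */
	int epms; /* events per msec */
};

/* Distance between two readings of a 16-bit counter, tolerating one wrap. */
static uint16_t sketch_gap16(uint16_t end, uint16_t start)
{
	return (uint16_t)(end - start);
}

/* Returns 0 on success, -1 if no time elapsed between the two samples. */
static int sketch_calc_stats(const struct sketch_sample *start,
			     const struct sketch_sample *end,
			     struct sketch_stats *out)
{
	uint64_t delta_us;
	uint32_t npkts, nbytes;
	uint16_t nevents;

	assert(start && end && out);

	delta_us = end->time_us - start->time_us;
	if (delta_us == 0)
		return -1;

	/* Unsigned subtraction wraps modulo 2^32, so one wrap is tolerated. */
	npkts = end->pkt_ctr - start->pkt_ctr;
	nbytes = end->byte_ctr - start->byte_ctr;
	nevents = sketch_gap16(end->event_ctr, start->event_ctr);

	/* count per usec * 1000 = count per msec */
	out->ppms = (int)((uint64_t)npkts * 1000 / delta_us);
	out->bpms = (int)((uint64_t)nbytes * 1000 / delta_us);
	out->epms = (int)((uint64_t)nevents * 1000 / delta_us);
	return 0;
}
```

A driver-side handler would compare such rates between windows before
deciding whether to change the coalescing parameters for the ring.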

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c | 307 --
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h | 108 ---
 include/linux/net_dim.h   | 377 ++
 6 files changed, 380 insertions(+), 417 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
 create mode 100644 include/linux/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index b46b6de2..c805769 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 4d1d298..29b9675 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -47,10 +47,10 @@
 #include 
 #include 
 #include 
+#include <linux/net_dim.h>
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index f620325..2b89951 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/net_dim.h>
 #include "en.h"
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
deleted file mode 100644
index decb370..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017-2018, Broadcom Limited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_DIM_DEF_PROFILE_CQE 1
-#define NET_DIM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
-#define NET_DIM_EQE_PROFILES { \
-   {1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_P

[PATCH net-next v3 01/10] net/mlx5e: Move interrupt moderation structs to new file

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 33 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 69 
 2 files changed, 70 insertions(+), 32 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5299310..df9cbb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -50,6 +50,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -227,12 +228,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-   u8 cq_period_mode;
-};
-
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -473,32 +468,6 @@ struct mlx5e_mpw_info {
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_statsprev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
new file mode 100644
index 000..9eeaa11
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017-2018, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+   u8 cq_period_mode;
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_statsprev_stats;
+   struct mlx5e_rx_am_sample   start_sample;
+   struct work_struct  wo

[PATCH net-next v3 08/10] net/dim: use struct net_dim_sample as arg to net_dim

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Simplify the arguments to net_dim() by formatting them into a struct
net_dim_sample before calling the function.
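
For readers following along, here is a small userspace sketch of the
calling convention this change moves to: the caller fills one sample
struct and hands it over by value, and the measuring side only closes a
window after enough interrupt events.  The sketch_* names, the threshold
of 64 events, and recording the start sample on entry are assumptions of
this sketch, not statements about the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NEVENTS 64 /* assumed window size, in interrupt events */

enum sketch_state {
	SKETCH_START_MEASURE,
	SKETCH_MEASURE_IN_PROGRESS,
};

struct sketch_sample {
	uint16_t event_ctr;
	uint64_t pkt_ctr;
	uint64_t byte_ctr;
};

struct sketch_dim {
	enum sketch_state state;
	struct sketch_sample start_sample;
	unsigned int windows_done; /* closed measurement windows */
};

/* Pack the three counters into one sample, as the caller now does. */
static void sketch_fill_sample(uint16_t event_ctr, uint64_t packets,
			       uint64_t bytes, struct sketch_sample *s)
{
	assert(s);
	s->event_ctr = event_ctr;
	s->pkt_ctr = packets;
	s->byte_ctr = bytes;
}

/* Returns true when this call closed a measurement window. */
static bool sketch_dim_step(struct sketch_dim *dim,
			    struct sketch_sample end_sample)
{
	uint16_t nevents;

	switch (dim->state) {
	case SKETCH_MEASURE_IN_PROGRESS:
		/* 16-bit subtraction tolerates one counter wrap. */
		nevents = (uint16_t)(end_sample.event_ctr -
				     dim->start_sample.event_ctr);
		if (nevents < SKETCH_NEVENTS)
			return false;
		dim->windows_done++;
		dim->start_sample = end_sample; /* next window starts here */
		return true;
	case SKETCH_START_MEASURE:
		dim->start_sample = end_sample;
		dim->state = SKETCH_MEASURE_IN_PROGRESS;
		return false;
	}
	return false;
}
```

Passing the sample by value keeps the measuring function free of any
knowledge about where a driver stores its counters.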

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Suggested-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 13 -
 include/linux/net_dim.h   | 10 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index a1c94fd..f292bb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -78,11 +78,14 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
for (i = 0; i < c->num_tc; i++)
mlx5e_cq_arm(&c->sq[i].cq);
 
-   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   net_dim(&c->rq.dim,
-   c->rq.cq.event_ctr,
-   c->rq.stats.packets,
-   c->rq.stats.bytes);
+   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM)) {
+   struct net_dim_sample dim_sample;
+   net_dim_sample(c->rq.cq.event_ctr,
+  c->rq.stats.packets,
+  c->rq.stats.bytes,
+  &dim_sample);
+   net_dim(&c->rq.dim, dim_sample);
+   }
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 741510f..1c7e450 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -342,21 +342,18 @@ static inline void net_dim_calc_stats(struct net_dim_sample *start,
 }
 
 static inline void net_dim(struct net_dim *dim,
-  u16 event_ctr,
-  u64 packets,
-  u64 bytes)
+  struct net_dim_sample end_sample)
 {
-   struct net_dim_sample end_sample;
struct net_dim_stats curr_stats;
u16 nevents;
 
switch (dim->state) {
case NET_DIM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16),
+ end_sample.event_ctr,
  dim->start_sample.event_ctr);
if (nevents < NET_DIM_NEVENTS)
break;
-   net_dim_sample(event_ctr, packets, bytes, &end_sample);
net_dim_calc_stats(&dim->start_sample, &end_sample,
   &curr_stats);
if (net_dim_decision(&curr_stats, dim)) {
@@ -366,7 +363,6 @@ static inline void net_dim(struct net_dim *dim,
}
/* fall through */
case NET_DIM_START_MEASURE:
-   net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
dim->state = NET_DIM_MEASURE_IN_PROGRESS;
break;
case NET_DIM_APPLY_NEW_PROFILE:
-- 
2.7.4



[PATCH net-next v3 09/10] bnxt_en: add support for software dynamic interrupt moderation

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Cc: Michael Chan <mc...@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 50 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c | 33 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++
 5 files changed, 119 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..7c560d5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 89c3c87..cf6ebf1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1645,6 +1645,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons,
rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+   cpr->rx_packets += 1;
+   cpr->rx_bytes += len;
*raw_cons = tmp_raw_cons;
 
return rc;
@@ -1802,6 +1804,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+   cpr->event_ctr++;
prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
napi_schedule(&bnapi->napi);
return IRQ_HANDLED;
@@ -2025,6 +2028,15 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
break;
}
}
+   if (bp->flags & BNXT_FLAG_DIM) {
+   struct net_dim_sample dim_sample;
+
+   net_dim_sample(cpr->event_ctr,
+  cpr->rx_packets,
+  cpr->rx_bytes,
+  &dim_sample);
+   net_dim(&cpr->dim, dim_sample);
+   }
mmiowb();
return work_done;
 }
@@ -2617,6 +2629,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
ring->fw_ring_id = INVALID_HW_RING_ID;
+   cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+   cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
}
 }
 
@@ -4593,6 +4607,36 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal *hw_coal,
req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_coal coal;
+   unsigned int grp_idx;
+
+   /* Tick values in micro seconds.
+* 1 coal_buf x bufs_per_record = 1 completion record.
+*/
+   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+   if (!bnapi->rx_ring)
+   return -ENODEV;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+   bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+   grp_idx = bnapi->index;
+   req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
+
+   return hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+HWRM_CMD_TIMEOUT);
+}
+
 int bnxt_hwrm_set_coal(struct bnxt *bp)
 {
int i, rc = 0;
@@ -5715,7 +5759,13 @@ static void bnxt_enable_napi(struct bnxt *bp)
int i;
 
for (i = 0; i < bp->cp_nr_rings; i++) {
+   struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
bp->bnapi[i]->in_reset = false;
+
+   if (bp->bnapi[i]->rx_ring) {
+   INIT_WORK(&cpr->dim.work, bnxt_dim_work);
+   cpr->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+   }
napi_enable(&bp->bnapi[i

[PATCH net-next v3 04/10] net/mlx5e: Move AM logic enums

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 26 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 -
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 7d5499a..a1497bab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -66,6 +66,32 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
+/* Adaptive moderation logic */
+enum {
+   MLX5E_AM_START_MEASURE,
+   MLX5E_AM_MEASURE_IN_PROGRESS,
+   MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+   MLX5E_AM_PARKING_ON_TOP,
+   MLX5E_AM_PARKING_TIRED,
+   MLX5E_AM_GOING_RIGHT,
+   MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+   MLX5E_AM_STATS_WORSE,
+   MLX5E_AM_STATS_SAME,
+   MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+   MLX5E_AM_STEPPED,
+   MLX5E_AM_TOO_TIRED,
+   MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u16 event_ctr,
 u64 packets,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1630076..337dd60 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -82,31 +82,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
 }
 
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
-- 
2.7.4



[PATCH net-next v3 00/10] net: create dynamic software irq moderation library

2018-01-08 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This converts the dynamic interrupt moderation code in the mlx5e
driver into a library so it can be used by any driver.  The penultimate
patch in this set adds support for this new dynamic interrupt moderation
library in the bnxt_en driver and the last patch creates an entry in the
MAINTAINERS file for this library.

The main purpose of this code is to allow an administrator to make sure
that default coalesce settings are optimized for low latency, but
quickly adapt to handle high throughput/bulk traffic by altering how
much time passes before popping an interrupt.
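
To make the idea of adapting between latency and throughput concrete,
the sketch below walks an index through a small table of moderation
profiles, stepping right for more coalescing and left for less, and
reporting when it hits an edge.  The table values and all sketch_* names
are illustrative assumptions and are not taken from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_moder {
	uint16_t usec; /* time to wait before raising an interrupt */
	uint16_t pkts; /* packets to wait for before raising an interrupt */
};

/* Illustrative values only: index 0 favors latency, the last favors bulk. */
static const struct sketch_moder sketch_profiles[] = {
	{   2,  32 },
	{  16,  64 },
	{  64, 128 },
	{ 128, 256 },
};

#define SKETCH_NUM_PROFILES \
	(sizeof(sketch_profiles) / sizeof(sketch_profiles[0]))

enum sketch_step_result {
	SKETCH_STEPPED,
	SKETCH_ON_EDGE,
};

/* Move toward heavier moderation (fewer interrupts). */
static enum sketch_step_result sketch_step_right(size_t *ix)
{
	assert(ix && *ix < SKETCH_NUM_PROFILES);
	if (*ix == SKETCH_NUM_PROFILES - 1)
		return SKETCH_ON_EDGE;
	(*ix)++;
	return SKETCH_STEPPED;
}

/* Move toward lighter moderation (lower latency). */
static enum sketch_step_result sketch_step_left(size_t *ix)
{
	assert(ix && *ix < SKETCH_NUM_PROFILES);
	if (*ix == 0)
		return SKETCH_ON_EDGE;
	(*ix)--;
	return SKETCH_STEPPED;
}
```

Which direction to step is decided by comparing measured rates between
windows; that comparison is left out of this sketch.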

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
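
The two items above might look roughly like the following in a driver.
This is a userspace sketch with hypothetical demo_* names: the ring
struct carries the counters the library samples, and one function is the
single place where new coalesce settings are pushed (the hardware write
is only counted here, not performed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_moder {
	uint16_t usec;
	uint16_t pkts;
};

/* Hypothetical per-ring state a driver would add. */
struct demo_ring {
	uint16_t event_ctr;     /* bumped once per interrupt */
	uint64_t rx_packets;    /* bumped once per received packet */
	uint64_t rx_bytes;      /* bumped by the packet length */
	struct demo_moder coal; /* settings currently programmed */
	unsigned int hw_writes; /* stand-in for real register/firmware writes */
};

/* Called from the interrupt handler. */
static void demo_irq(struct demo_ring *ring)
{
	assert(ring);
	ring->event_ctr++;
}

/* Called from the receive path for each packet. */
static void demo_rx_pkt(struct demo_ring *ring, uint32_t len)
{
	assert(ring);
	ring->rx_packets++;
	ring->rx_bytes += len;
}

/*
 * Apply new coalesce settings to one ring.  Returns true if the settings
 * changed and a (simulated) hardware write was issued.
 */
static bool demo_set_ring_coal(struct demo_ring *ring, struct demo_moder want)
{
	assert(ring);
	if (want.usec == ring->coal.usec && want.pkts == ring->coal.pkts)
		return false;
	ring->coal = want;
	ring->hw_writes++;
	return true;
}
```

In a real driver the setter would typically run from a work handler,
since programming the device may sleep.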

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for
their comments, etc on this set.

v3: bnxt_en fix from Michael Chan, comment suggestion from Vasundhara
Volam, and small mlx5e header file fix from Tal Gilboa.

v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
Michael Chan, spelling and formatting fixes from Or Gerlitz, and
spelling and mlx5e changes suggested by Tal Gilboa.

Andy Gospodarek (10):
  net/mlx5e: Move interrupt moderation structs to new file
  net/mlx5e: Move interrupt moderation forward declarations
  net/mlx5e: Remove rq references in mlx5e_rx_am
  net/mlx5e: Move AM logic enums
  net/mlx5e: Move generic functions to new file
  net/mlx5e: Change Mellanox references in DIM code
  net/mlx5e: Move dynamic interrupt coalescing code to include/linux
  net/dim: use struct net_dim_sample as arg to net_dim
  bnxt_en: add support for software dynamic interrupt moderation
  MAINTAINERS: add entry for Dynamic Interrupt Moderation

 MAINTAINERS|   5 +
 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  50 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  33 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  46 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  40 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
 include/linux/net_dim.h| 373 +
 14 files changed, 593 insertions(+), 410 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 include/linux/net_dim.h

-- 
2.7.4



Re: [PATCH net-next v2 06/10] net/mlx5e: Change Mellanox references in DIM code

2018-01-08 Thread Andy Gospodarek
On Sun, Jan 07, 2018 at 11:44:48AM +0200, Tal Gilboa wrote:
> 
> 
> On 1/6/2018 12:58 AM, Andy Gospodarek wrote:
> > From: Andy Gospodarek <go...@broadcom.com>
> > 
> > Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
> > NET_DIM, respectively, in code that handles dynamic interrupt
> > moderation.  Also change all references from 'am' to 'dim' when used as
> > local variables.
> > 
> > Signed-off-by: Andy Gospodarek <go...@broadcom.com>
> > Acked-by: Tal Gilboa <ta...@mellanox.com>
> > Acked-by: Saeed Mahameed <sae...@mellanox.com>
> > 
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/en.h   |  12 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
> >   .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  52 ++--
> >   drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 284 
> > ++---
> >   drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 +++--
> >   8 files changed, 232 insertions(+), 211 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
> > b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > index 121f280..732f275 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > @@ -115,6 +115,9 @@
> >   #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES0x80
> >   #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES_MPW0x2
> > +#define MLX5E_CQ_PERIOD_MODE_START_FROM_EQE0x0
> > +#define MLX5E_CQ_PERIOD_MODE_START_FROM_CQE0x1
> > +
> 
> This enum should be defined under include/linux/mlx5/mlx5_ifc.h:
> enum {
> MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
> MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
> MLX5_CQ_PERIOD_NUM_MODES
> };
> 
> We already include this in the relevant files. This was changed in patch
> 01/10, please revert.
> 

Ah yes.  I'd forgotten I moved that out of there to start and I'll move it
back.  Thanks!



Re: [PATCH net-next v2 09/10] bnxt_en: add support for software dynamic interrupt moderation

2018-01-08 Thread Andy Gospodarek
On Mon, Jan 08, 2018 at 04:20:04PM +0530, Vasundhara Volam wrote:
> Hi Andy,
> 
> If you are re-doing the patch, could you modify a minor comment below?
> 
> On Sat, Jan 6, 2018 at 4:28 AM, Andy Gospodarek <a...@greyhouse.net> wrote:
> 
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c 
> > b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
> > new file mode 100644
> > index 000..156e025
> > --- /dev/null
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
> > @@ -0,0 +1,32 @@
> > +/*
> > + * Copyright (c) 2017 Broadcom Limited
> You have to modify year here.
> Also, First line driver description is missing "Broadcom NetXtreme-C/E
> network driver."
> Could you modify it?

Sure I could make those changes since it looks like I need a v3 anyway.
Thanks for the review.




[PATCH net-next v2 02/10] net/mlx5e: Move interrupt moderation forward declarations

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 5 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ddb5429..2ccedf6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -829,10 +829,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 84b8524..f5f6535 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,4 +72,9 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+struct mlx5e_rq;
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4



[PATCH net-next v2 07/10] net/mlx5e: Move dynamic interrupt coalescing code to include/linux

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c | 307 --
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h | 108 ---
 include/linux/net_dim.h   | 376 ++
 6 files changed, 379 insertions(+), 417 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
 create mode 100644 include/linux/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index b46b6de2..c805769 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 732f275..cb9abc9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -46,10 +46,10 @@
 #include 
 #include 
 #include 
+#include <linux/net_dim.h>
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index f620325..2b89951 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/net_dim.h>
 #include "en.h"
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
deleted file mode 100644
index 00b9ae3..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_DIM_DEF_PROFILE_CQE 1
-#define NET_DIM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
-#define NET_DIM_EQE_PROFILES { \
-   {1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_P

[PATCH net-next v2 06/10] net/mlx5e: Change Mellanox references in DIM code

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  52 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 284 ++---
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 +++--
 8 files changed, 232 insertions(+), 211 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 121f280..732f275 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -115,6 +115,9 @@
 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES                0x80
 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES_MPW            0x2
 
+#define MLX5E_CQ_PERIOD_MODE_START_FROM_EQE             0x0
+#define MLX5E_CQ_PERIOD_MODE_START_FROM_CQE             0x1
+
 #define MLX5E_LOG_INDIR_RQT_SIZE   0x7
 #define MLX5E_INDIR_RQT_SIZE   BIT(MLX5E_LOG_INDIR_RQT_SIZE)
 #define MLX5E_MIN_NUM_CHANNELS 0x1
@@ -237,8 +240,8 @@ struct mlx5e_params {
u16 num_channels;
u8  num_tc;
bool rx_cqe_compress_def;
-   struct mlx5e_cq_moder rx_cq_moderation;
-   struct mlx5e_cq_moder tx_cq_moderation;
+   struct net_dim_cq_moder rx_cq_moderation;
+   struct net_dim_cq_moder tx_cq_moderation;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -248,7 +251,7 @@ struct mlx5e_params {
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
bool vlan_strip_disable;
bool scatter_fcs_en;
-   bool rx_am_enabled;
+   bool rx_dim_enabled;
u32 lro_timeout;
u32 pflags;
struct bpf_prog *xdp_prog;
@@ -527,7 +530,7 @@ struct mlx5e_rq {
unsigned long  state;
int ix;
 
-   struct mlx5e_rx_am am; /* Adaptive Moderation */
+   struct net_dim dim; /* Dynamic Interrupt Moderation */
 
/* XDP */
struct bpf_prog   *xdp_prog;
@@ -1075,4 +1078,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
 u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
+void mlx5e_rx_dim_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index b9b434b..f620325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -32,17 +32,17 @@
 
 #include "en.h"
 
-void mlx5e_rx_am_work(struct work_struct *work)
+void mlx5e_rx_dim_work(struct work_struct *work)
 {
-   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
- work);
-   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
-   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
-                                                            am->profile_ix);
+   struct net_dim *dim = container_of(work, struct net_dim,
+  work);
+   struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
+   struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
+                                                             dim->profile_ix);
 
mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
   cur_profile.usec, cur_profile.pkts);
 
-   am->state = MLX5E_AM_START_MEASURE;
+   dim->state = NET_DIM_START_MEASURE;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 8f05efa..62ac4c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -480,7 +480,7 @@ int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
coal->rx_max_coalesced_frames = priv->channels.params.rx_cq_moderation.pkts;
coal->tx_coalesce_usecs   = priv->channels.params.tx_cq_moderation.usec;
coal->tx_max_coalesced_frames = priv->channels.params.tx_cq_moderation.pkts;
-   coal-&

[PATCH net-next v2 08/10] net/dim: use struct net_dim_sample as arg to net_dim

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Simplify the arguments to net_dim() by packing them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Suggested-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 13 -
 include/linux/net_dim.h   | 10 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index dae77a9..f292bb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -78,11 +78,14 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
for (i = 0; i < c->num_tc; i++)
mlx5e_cq_arm(&c->sq[i].cq);
 
-   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   net_dim(&c->rq.dim,
-   c->rq.cq.event_ctr,
-   c->rq.stats.packets,
-   c->rq.stats.bytes);
+   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM)) {
+   struct net_dim_sample dim_sample;
+   net_dim_sample(c->rq.cq.event_ctr,
+  c->rq.stats.packets,
+  c->rq.stats.bytes,
+  &dim_sample);
+   net_dim(&c->rq.dim, dim_sample);
+   }
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index bb99073..2cceefa 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -341,21 +341,18 @@ static inline void net_dim_calc_stats(struct net_dim_sample *start,
 }
 
 static inline void net_dim(struct net_dim *dim,
-  u16 event_ctr,
-  u64 packets,
-  u64 bytes)
+  struct net_dim_sample end_sample)
 {
-   struct net_dim_sample end_sample;
struct net_dim_stats curr_stats;
u16 nevents;
 
switch (dim->state) {
case NET_DIM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16),
+ end_sample.event_ctr,
  dim->start_sample.event_ctr);
if (nevents < NET_DIM_NEVENTS)
break;
-   net_dim_sample(event_ctr, packets, bytes, &end_sample);
net_dim_calc_stats(&dim->start_sample, &end_sample,
   &curr_stats);
if (net_dim_decision(&curr_stats, dim)) {
@@ -365,7 +362,6 @@ static inline void net_dim(struct net_dim *dim,
}
/* fall through */
case NET_DIM_START_MEASURE:
-   net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
dim->state = NET_DIM_MEASURE_IN_PROGRESS;
break;
case NET_DIM_APPLY_NEW_PROFILE:
-- 
2.7.4
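
The calling convention this patch introduces can be sketched in plain userspace C. This is only an illustration: `demo_sample`, `demo_sample_fill()` and `demo_counter_gap()` are hypothetical names, the timestamp field is left out, and the masking helper is an assumption about how a wrap-safe counter difference such as BIT_GAP() can be computed, not a copy of the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct net_dim_sample (time omitted). */
struct demo_sample {
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

/* Pack raw counters into a sample, the job net_dim_sample() does in the patch. */
static void demo_sample_fill(uint16_t event_ctr, uint64_t packets,
			     uint64_t bytes, struct demo_sample *s)
{
	s->pkt_ctr = (uint32_t)packets;	/* keeps the low 32 bits only */
	s->byte_ctr = (uint32_t)bytes;
	s->event_ctr = event_ctr;
}

/* Wrap-safe distance between two counters that are 'bits' wide. */
static uint64_t demo_counter_gap(unsigned int bits, uint64_t end, uint64_t start)
{
	uint64_t mask;

	assert(bits > 0 && bits < 64);
	mask = (UINT64_C(1) << bits) - 1;
	return (end - start) & mask;
}
```

With this shape the caller fills one sample per poll and hands it over by value, so the library no longer needs to know where the driver keeps its counters.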



[PATCH net-next v2 09/10] bnxt_en: add support for software dynamic interrupt moderation

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Cc: Michael Chan <mc...@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 49 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 34 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c | 32 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++
 5 files changed, 117 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..7c560d5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9efbdc6..b9d4c61 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1645,6 +1645,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons,
rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+   cpr->rx_packets += 1;
+   cpr->rx_bytes += len;
*raw_cons = tmp_raw_cons;
 
return rc;
@@ -1802,6 +1804,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+   cpr->event_ctr++;
prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
napi_schedule(&bnapi->napi);
return IRQ_HANDLED;
@@ -2025,6 +2028,14 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
break;
}
}
+   if (bp->flags & BNXT_FLAG_DIM) {
+   struct net_dim_sample dim_sample;
+   net_dim_sample(cpr->event_ctr,
+  cpr->rx_packets,
+  cpr->rx_bytes,
+  &dim_sample);
+   net_dim(&cpr->dim, dim_sample);
+   }
mmiowb();
return work_done;
 }
@@ -2610,6 +2621,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
ring->fw_ring_id = INVALID_HW_RING_ID;
+   cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+   cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
}
 }
 
@@ -4583,6 +4596,36 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal *hw_coal,
req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_coal coal;
+   unsigned int grp_idx;
+
+/* Tick values in micro seconds.
+ * 1 coal_buf x bufs_per_record = 1 completion record.
+ */
+   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+   if (!bnapi->rx_ring)
+   return -ENODEV;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+   bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+   grp_idx = bnapi->index;
+   req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
+
+   return hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+HWRM_CMD_TIMEOUT);
+}
+
 int bnxt_hwrm_set_coal(struct bnxt *bp)
 {
int i, rc = 0;
@@ -5705,7 +5748,13 @@ static void bnxt_enable_napi(struct bnxt *bp)
int i;
 
for (i = 0; i < bp->cp_nr_rings; i++) {
+   struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
bp->bnapi[i]->in_reset = false;
+
+   if (!(bp->bnapi[i]->flags & BNXT_NAPI_FLAG_XDP)) {
+   INIT_WORK(&cpr->dim.work, bnxt_dim_work);
+   cpr->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+   }
 

[PATCH net-next v2 00/10] net: create dynamic software irq moderation library

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This converts the dynamic interrupt moderation code from the mlx5e
driver into a library so it can be used by any driver.  The penultimate
patch in this set adds support for this new dynamic interrupt moderation
library in the bnxt_en driver and the last patch creates an entry in the
MAINTAINERS file for this library.

The main purpose of this code is to allow an administrator to make sure
that default coalesce settings are optimized for low latency, but
quickly adapt to handle high throughput/bulk traffic by altering how
much time passes before popping an interrupt.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
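
The two driver-side changes listed above can be sketched roughly as follows. This is a userspace illustration only, not code from any driver: `demo_ring`, `demo_moder`, `demo_irq()`, `demo_rx()` and `demo_set_ring_coal()` are hypothetical names, and a real driver would program hardware where this sketch just stores the values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-ring state a driver would add for the library. */
struct demo_ring {
	uint16_t event_ctr;	/* bumped once per interrupt */
	uint64_t rx_packets;	/* bumped once per received packet */
	uint64_t rx_bytes;	/* running byte total */
	uint16_t coal_usecs;	/* coalesce setting currently applied */
	uint16_t coal_pkts;
};

/* Hypothetical moderation setting chosen by the library. */
struct demo_moder {
	uint16_t usec;
	uint16_t pkts;
};

/* Interrupt path: count one event. */
static void demo_irq(struct demo_ring *r)
{
	assert(r);
	r->event_ctr++;
}

/* Receive path: count one packet of 'len' bytes. */
static void demo_rx(struct demo_ring *r, uint32_t len)
{
	assert(r);
	r->rx_packets++;
	r->rx_bytes += len;
}

/* The function that actually applies coalesce settings for one ring. */
static int demo_set_ring_coal(struct demo_ring *r, struct demo_moder m)
{
	if (!r)
		return -1;
	r->coal_usecs = m.usec;
	r->coal_pkts = m.pkts;
	return 0;
}
```

The counters are what the poll routine would sample, and the setter is what a deferred work handler would call once the library picks a new profile.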

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for
their comments, etc on this set.

v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
Michael Chan, spelling and formatting fixes from Or Gerlitz, and
spelling and mlx5e changes suggested by Tal Gilboa.

Andy Gospodarek (10):
  net/mlx5e: Move interrupt moderation structs to new file
  net/mlx5e: Move interrupt moderation forward declarations
  net/mlx5e: Remove rq references in mlx5e_rx_am
  net/mlx5e: Move AM logic enums
  net/mlx5e: Move generic functions to new file
  net/mlx5e: Change Mellanox references in DIM code
  net/mlx5e: Move dynamic interrupt coalescing code to include/linux
  net/dim: use struct net_dim_sample as arg to net_dim
  bnxt_en: add support for software dynamic interrupt moderation
  MAINTAINERS: add entry for Dynamic Interrupt Moderation

 MAINTAINERS|   5 +
 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  49 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  32 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  49 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  52 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
 include/linux/mlx5/mlx5_ifc.h  |   6 -
 include/linux/net_dim.h| 372 +
 16 files changed, 604 insertions(+), 427 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 include/linux/net_dim.h

-- 
2.7.4



[PATCH net-next v2 01/10] net/mlx5e: Move interrupt moderation structs to new file

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 33 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 75 
 include/linux/mlx5/mlx5_ifc.h|  6 --
 3 files changed, 76 insertions(+), 38 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 543060c..ddb5429 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,6 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -226,12 +227,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-   u8 cq_period_mode;
-};
-
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -472,32 +467,6 @@ struct mlx5e_mpw_info {
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_stats    prev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
new file mode 100644
index 000..84b8524
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+   u8 cq_period_mode;
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_stats    prev_stats;
+   struct mlx5e_rx_am_sample   star

[PATCH net-next v2 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Signed-off-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 753799d..178239dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4944,6 +4944,11 @@ S:   Maintained
 F: lib/dynamic_debug.c
 F: include/linux/dynamic_debug.h
 
+DYNAMIC INTERRUPT MODERATION
+M: Tal Gilboa <ta...@mellanox.com>
+S: Maintained
+F: include/linux/net_dim.h
+
 DZ DECSTATION DZ11 SERIAL DRIVER
 M: "Maciej W. Rozycki" <ma...@linux-mips.org>
 S: Maintained
-- 
2.7.4



[PATCH net-next v2 03/10] net/mlx5e: Remove rq references in mlx5e_rx_am

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  6 --
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 -
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index f5f6535..b676a057 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,8 +72,10 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
-struct mlx5e_rq;
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index e401d9d..1630076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -264,13 +264,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+   u64 packets,
+   u64 bytes,
struct mlx5e_rx_am_sample *s)
 {
s->time  = ktime_get();
-   s->pkt_ctr   = rq->stats.packets;
-   s->byte_ctr  = rq->stats.bytes;
-   s->event_ctr = rq->cq.event_ctr;
+   s->pkt_ctr   = packets;
+   s->byte_ctr  = bytes;
+   s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -309,20 +311,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes)
 {
-   struct mlx5e_rx_am *am = &rq->am;
struct mlx5e_rx_am_sample end_sample;
struct mlx5e_rx_am_stats curr_stats;
u16 nevents;
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
  am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
-   mlx5e_am_sample(rq, &end_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
mlx5e_am_calc_stats(&am->start_sample, &end_sample,
&curr_stats);
if (mlx5e_am_decision(&curr_stats, am)) {
@@ -332,7 +336,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
}
/* fall through */
case MLX5E_AM_START_MEASURE:
-   mlx5e_am_sample(rq, &am->start_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
break;
case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index ab92298..1849169 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -79,7 +79,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
 
if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   mlx5e_rx_am(&c->rq);
+   mlx5e_rx_am(&c->rq.am,
+   c->rq.cq.event_ctr,
+   c->rq.stats.packets,
+   c->rq.stats.bytes);
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4
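
Once the sampling helper takes raw counters, turning two samples into the per-millisecond rates named in the stats struct (ppms, bpms, epms) is straightforward. The sketch below is a userspace approximation under stated assumptions: `rate_sample`, `rate_stats` and `rate_calc()` are hypothetical names, time is a plain microsecond count rather than ktime_t, and rounding up is a choice made here rather than something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sample struct, with time in microseconds. */
struct rate_sample {
	uint64_t time_us;
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

/* Same three rates the stats struct in the patch carries. */
struct rate_stats {
	int ppms;	/* packets per msec */
	int bpms;	/* bytes per msec */
	int epms;	/* events per msec */
};

static uint64_t rate_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Returns 0 on success, -1 if no time elapsed between the samples. */
static int rate_calc(const struct rate_sample *start,
		     const struct rate_sample *end,
		     struct rate_stats *out)
{
	uint64_t delta_us;
	uint32_t npkts, nbytes;
	uint16_t nevents;

	assert(start && end && out);
	delta_us = end->time_us - start->time_us;
	/* Unsigned subtraction stays correct when a counter wraps. */
	npkts = end->pkt_ctr - start->pkt_ctr;
	nbytes = end->byte_ctr - start->byte_ctr;
	nevents = (uint16_t)(end->event_ctr - start->event_ctr);

	if (delta_us == 0)
		return -1;

	out->ppms = (int)rate_div_round_up((uint64_t)npkts * 1000, delta_us);
	out->bpms = (int)rate_div_round_up((uint64_t)nbytes * 1000, delta_us);
	out->epms = (int)rate_div_round_up((uint64_t)nevents * 1000, delta_us);
	return 0;
}
```

Nothing in this calculation depends on the ring layout of a particular driver, which is the point of removing the rq references.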



[PATCH net-next v2 05/10] net/mlx5e: Move generic functions to new file

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  48 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 108 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 320 -
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 307 
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 109 +++
 7 files changed, 467 insertions(+), 431 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..b46b6de2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
fpga/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
-   en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o
+   en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
+   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2ccedf6..121f280 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_dim.h"
+#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
new file mode 100644
index 000..b9b434b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+void mlx5e_rx_am_work(struct work_struct *work)
+{
+   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
+ work);
+   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
+   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
+                                                            am->profile_ix);
+
+   mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
+  cur_profile.usec, cur_profile.pkts);
+
+   am->state = MLX5E_AM_START_MEASURE;
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
de

[PATCH net-next v2 04/10] net/mlx5e: Move AM logic enums

2018-01-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Saeed Mahameed <sae...@mellanox.com>

---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 26 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 -
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index b676a057..c9f0d05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,6 +72,32 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+/* Adaptive moderation logic */
+enum {
+   MLX5E_AM_START_MEASURE,
+   MLX5E_AM_MEASURE_IN_PROGRESS,
+   MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+   MLX5E_AM_PARKING_ON_TOP,
+   MLX5E_AM_PARKING_TIRED,
+   MLX5E_AM_GOING_RIGHT,
+   MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+   MLX5E_AM_STATS_WORSE,
+   MLX5E_AM_STATS_SAME,
+   MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+   MLX5E_AM_STEPPED,
+   MLX5E_AM_TOO_TIRED,
+   MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u16 event_ctr,
 u64 packets,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1630076..337dd60 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -82,31 +82,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
 }
 
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
-- 
2.7.4
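
The first enum moved here drives a small measurement state machine. A minimal userspace sketch of that three-state loop follows, modelled loosely on the mlx5e_rx_am() switch shown earlier in this series. The `demo_*` names are hypothetical, and the profile decision is reduced to a boolean argument so the sketch stays self-contained; the real code derives it from measured statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_state {
	DEMO_START_MEASURE,
	DEMO_MEASURE_IN_PROGRESS,
	DEMO_APPLY_NEW_PROFILE,
};

#define DEMO_NEVENTS 64	/* events to collect before deciding */

struct demo_dim {
	enum demo_state state;
	uint16_t start_event_ctr;
};

/* One step per poll; 'profile_changed' stands in for the decision logic. */
static void demo_dim_step(struct demo_dim *d, uint16_t event_ctr,
			  bool profile_changed)
{
	assert(d);
	switch (d->state) {
	case DEMO_MEASURE_IN_PROGRESS:
		/* The u16 cast keeps the difference correct across wrap. */
		if ((uint16_t)(event_ctr - d->start_event_ctr) < DEMO_NEVENTS)
			break;
		if (profile_changed) {
			d->state = DEMO_APPLY_NEW_PROFILE;
			break;
		}
		/* fall through */
	case DEMO_START_MEASURE:
		d->start_event_ctr = event_ctr;
		d->state = DEMO_MEASURE_IN_PROGRESS;
		break;
	case DEMO_APPLY_NEW_PROFILE:
		/* Wait for the deferred work to apply the new settings. */
		break;
	}
}

/* Called by the deferred work once the new profile has been applied. */
static void demo_dim_work_done(struct demo_dim *d)
{
	assert(d);
	d->state = DEMO_START_MEASURE;
}
```

Keeping the states in a shared header is what lets the driver's work handler reset the machine without reaching into the file that implements the decision logic.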



Re: [net-next 06/10] net/mlx5e: change Mellanox references in DIM code

2018-01-05 Thread Andy Gospodarek
On Fri, Jan 05, 2018 at 10:04:50AM +0200, Tal Gilboa wrote:
> On 1/4/2018 10:21 PM, Andy Gospodarek wrote:
> > From: Andy Gospodarek <go...@broadcom.com>
> > 
> > Change all mlx5_am* and MLX_AM* references to net_dim and NET_DIM,
> MLX_AM->MLX5_AM
> 
> > cq_period_mode = enable ?
> > -   MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
> > -   MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> > +   NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
> > +   NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> I'm not sure about this part. CQE/EQE based moderation is a feature in
> Mellanox's chips, which isn't necessarily coupled with adaptive moderation.
> net_dim lib should know which values to choose according to the selected
> mode, but I don't think mlx5 driver should use an enum from net_dim for
> enabling/disabling HW features. Another issue is that we use the enum value
> as an argument for the command to HW (0=EQE, 1=CQE). If someone would change
> the values it would break the HW feature. I think it would be safer to use
> the NET_DIM_XXX enum only when using functions from net_dim lib.

[Please ignore my earlier response, I'm not sure I fully read/parsed what you
were saying.  Sorry about that.]

I like your suggestion, so I'm going to refactor this a bit based on that.  I
made all the other suggested changes, so this should be the last one.
> 
> > current_cq_period_mode = is_rx_cq ?
> > priv->channels.params.rx_cq_moderation.cq_period_mode :
> > priv->channels.params.tx_cq_moderation.cq_period_mode;
> > mode_changed = cq_period_mode != current_cq_period_mode;
> > -   if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE &&
> > +   if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE &&
> > !MLX5_CAP_GEN(mdev, cq_period_start_from_cqe))
> > return -EOPNOTSUPP;
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 3aa1c90..edd4077 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -674,8 +674,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> > wqe->data.lkey = rq->mkey_be;
> > }
> > > -   INIT_WORK(&rq->am.work, mlx5e_rx_am_work);
> > > -   rq->am.mode = params->rx_cq_moderation.cq_period_mode;
> > > +   INIT_WORK(&rq->dim.work, mlx5e_rx_dim_work);
> > > +   rq->dim.mode = params->rx_cq_moderation.cq_period_mode;
> > rq->page_cache.head = 0;
> > rq->page_cache.tail = 0;
> > @@ -919,7 +919,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
> > if (err)
> > goto err_destroy_rq;
> > -   if (params->rx_am_enabled)
> > +   if (params->rx_dim_enabled)
> > c->rq.state |= BIT(MLX5E_RQ_STATE_AM);
> > return 0;
> > @@ -952,7 +952,7 @@ static void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
> >   static void mlx5e_close_rq(struct mlx5e_rq *rq)
> >   {
> > > -   cancel_work_sync(&rq->am.work);
> > > +   cancel_work_sync(&rq->dim.work);
> > mlx5e_destroy_rq(rq);
> > mlx5e_free_rx_descs(rq);
> > mlx5e_free_rq(rq);
> > @@ -1565,7 +1565,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
> >   }
> >   static int mlx5e_open_cq(struct mlx5e_channel *c,
> > -struct mlx5e_cq_moder moder,
> > +struct net_dim_cq_moder moder,
> >  struct mlx5e_cq_param *param,
> >  struct mlx5e_cq *cq)
> >   {
> > @@ -1747,7 +1747,7 @@ static int mlx5e_open_channel(struct mlx5e_priv 
> > *priv, int ix,
> >   struct mlx5e_channel_param *cparam,
> >   struct mlx5e_channel **cp)
> >   {
> > -   struct mlx5e_cq_moder icocq_moder = {0, 0};
> > +   struct net_dim_cq_moder icocq_moder = {0, 0};
> > struct net_device *netdev = priv->netdev;
> > int cpu = mlx5e_get_cpu(priv, ix);
> > struct mlx5e_channel *c;
> > @@ -1999,7 +1999,7 @@ static void mlx5e_build_ico_cq_param(struct 
> > mlx5e_priv *priv,
> > mlx5e_build_common_cq_param(priv, param);
> > -   param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> > +   param->cq_period_mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> >   }
> >   static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
> > @@ -4016,13 +4016,13 @@ void mlx5e_set_tx_cq_mode_params(struct 
> > mlx5e_params *params, u8 cq_period_mode)
>

Re: [net-next 06/10] net/mlx5e: change Mellanox references in DIM code

2018-01-05 Thread Andy Gospodarek
On Fri, Jan 05, 2018 at 10:04:50AM +0200, Tal Gilboa wrote:
> On 1/4/2018 10:21 PM, Andy Gospodarek wrote:
> > From: Andy Gospodarek <go...@broadcom.com>
> > 
> > Change all mlx5_am* and MLX_AM* references to net_dim and NET_DIM,
> MLX_AM->MLX5_AM
> 
> > cq_period_mode = enable ?
> > -   MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
> > -   MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> > +   NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
> > +   NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> I'm not sure about this part. CQE/EQE based moderation is a feature in
> Mellanox's chips, which isn't necessarily coupled with adaptive moderation.
> net_dim lib should know which values to choose according to the selected
> mode, but I don't think mlx5 driver should use an enum from net_dim for
> enabling/disabling HW features. Another issue is that we use the enum value
> as an argument for the command to HW (0=EQE, 1=CQE). If someone would change
> the values it would break the HW feature. I think it would be safer to use
> the NET_DIM_XXX enum only when using functions from net_dim lib.

I've gone back and forth about when to do this (now vs later).

One of the future improvements I'd planned was actually to allow profiles
to live in en_dim.c and bnxt_dim.c or have some additional profiles
in net_dim.h.  This is specifically to address different hardware limits
that might exist, as hardware with a maximum that is smaller than
256 usecs (for example) might not find much benefit from how quickly
these existing profiles scale up.

I'll play with this today for a bit and see what falls out.  My
preference is not to change this for v2, so as not to hold up this
set too long.

> 
> > current_cq_period_mode = is_rx_cq ?
> > priv->channels.params.rx_cq_moderation.cq_period_mode :
> > priv->channels.params.tx_cq_moderation.cq_period_mode;
> > mode_changed = cq_period_mode != current_cq_period_mode;
> > -   if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE &&
> > +   if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE &&
> > !MLX5_CAP_GEN(mdev, cq_period_start_from_cqe))
> > return -EOPNOTSUPP;
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 3aa1c90..edd4077 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -674,8 +674,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> > wqe->data.lkey = rq->mkey_be;
> > }
> > > -   INIT_WORK(&rq->am.work, mlx5e_rx_am_work);
> > > -   rq->am.mode = params->rx_cq_moderation.cq_period_mode;
> > > +   INIT_WORK(&rq->dim.work, mlx5e_rx_dim_work);
> > > +   rq->dim.mode = params->rx_cq_moderation.cq_period_mode;
> > rq->page_cache.head = 0;
> > rq->page_cache.tail = 0;
> > @@ -919,7 +919,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
> > if (err)
> > goto err_destroy_rq;
> > -   if (params->rx_am_enabled)
> > +   if (params->rx_dim_enabled)
> > c->rq.state |= BIT(MLX5E_RQ_STATE_AM);
> > return 0;
> > @@ -952,7 +952,7 @@ static void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
> >   static void mlx5e_close_rq(struct mlx5e_rq *rq)
> >   {
> > > -   cancel_work_sync(&rq->am.work);
> > > +   cancel_work_sync(&rq->dim.work);
> > mlx5e_destroy_rq(rq);
> > mlx5e_free_rx_descs(rq);
> > mlx5e_free_rq(rq);
> > @@ -1565,7 +1565,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
> >   }
> >   static int mlx5e_open_cq(struct mlx5e_channel *c,
> > -struct mlx5e_cq_moder moder,
> > +struct net_dim_cq_moder moder,
> >  struct mlx5e_cq_param *param,
> >  struct mlx5e_cq *cq)
> >   {
> > @@ -1747,7 +1747,7 @@ static int mlx5e_open_channel(struct mlx5e_priv 
> > *priv, int ix,
> >   struct mlx5e_channel_param *cparam,
> >   struct mlx5e_channel **cp)
> >   {
> > -   struct mlx5e_cq_moder icocq_moder = {0, 0};
> > +   struct net_dim_cq_moder icocq_moder = {0, 0};
> > struct net_device *netdev = priv->netdev;
> > int cpu = mlx5e_get_cpu(priv, ix);
> > struct mlx5e_channel *c;
> > @@ -1999,7 +1999,7 @@ static void mlx5e_build_ico_cq_param(struct 
> > mlx5e_priv *priv,
> > mlx5e_build_common_cq_param(priv, param);
> > -   param->cq_period_mode = MLX5_CQ_P

Re: [net-next 00/10] net: create dynamic software irq moderation library

2018-01-05 Thread Andy Gospodarek
On Fri, Jan 05, 2018 at 10:14:43AM +0200, Tal Gilboa wrote:
> Thanks Andy for your hard work. Looks great overall!
> 
> On 1/4/2018 10:21 PM, Andy Gospodarek wrote:
> > From: Andy Gospodarek <go...@broadcom.com>
> > 
> > This converts the dynamic interrupt moderation library from the mlx5_en 
> > driver
> > into a library so it can be used by any driver.  The penultimatepatch in 
> > this
> Had to look up "penultimatepatch " :), but aren't these two words?
> 
> > set adds support for interrupt moderation in the bnxt_en driver and the last
> > patch creates an entry in the MAINTAINERS file.
> > 
> > The main purpose of this code in the mlx5_en driver is to allow an
> > administrator to make sure that default coalesce settings are optimized
> > for low latency, but quickly adapt to handle high throughput traffic and
> > optimize how many packets are received during each napi poll.
> > 
> > For any new driver the following changes would be needed to use this
> > library:
> > 
> > - add elements in ring struct to track items needed by this library
> > - create function that can be called to actually set coalesce settings
> >for the driver
> > 
> > Credit to Rob Rice and Lee Reed for doing some of the initial proof of
> > concept and testing for this patch and Tal Gilboa and Or Gerlitz for their
> > comments, etc on this set.
> > 
> > Andy Gospodarek (10):
> >net/mlx5e: move interrupt moderation structs to new file
> >net/mlx5e: move interrupt moderation forward declarations
> >net/mlx5e: remove rq references in mlx5e_rx_am
> >net/mlx5e: move AM logic enums
> >net/mlx5e: move generic functions to new file
> >net/mlx5e: change Mellanox references in DIM code
> >net: move dynamic interrpt coalescing code to include/linux
> interrpt -> interrupt. The topic of the actual patch was fixed, only left in
> the cover.

I'm just going to run ispell on everything again.  :-)

> 
> >net/dim: use struct net_dim_sample as arg to net_dim
> >bnxt_en: add support for software dynamic interrupt moderation
> >MAINTAINERS: add entry for Dynamic Interrupt Moderation
> > 
> >   MAINTAINERS|   5 +
> >   drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
> >   drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  52 +++
> >   drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 +-
> >   drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  32 ++
> >   drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
> >   drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en.h   |  46 +--
> >   drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
> >   .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
> >   drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 108 ++
> >   include/linux/mlx5/mlx5_ifc.h  |   6 -
> >   include/linux/net_dim.h| 372 
> > +
> >   17 files changed, 693 insertions(+), 426 deletions(-)
> >   create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
> >   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
> >   delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
> >   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
> mlx5/core/net_dim.h was removed from code. Please fix the cover.

Yes, it's odd that this is there.  I'll see if I can figure out why as this
file should have been deleted with patch "net: move dynamic interrpt coalescing
code to include/linux" but it looks like maybe it was not.

> >   create mode 100644 include/linux/net_dim.h
> > 
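
The two integration steps listed in the cover letter (add tracking elements to the ring struct, and provide a function that applies coalesce settings) can be sketched roughly as follows. This is a userspace illustration under stated assumptions: every name here (dim_state, cq_moder, my_ring, my_ring_set_coalesce) is hypothetical, the profile values are illustrative only, and the "hardware write" is simulated by storing the chosen setting in the ring.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's per-ring moderation state. */
struct dim_state {
	uint8_t profile_ix;	/* index into the profile table below */
	uint8_t mode;
};

/* Hypothetical moderation setting: usecs and packets per interrupt. */
struct cq_moder {
	uint16_t usec;
	uint16_t pkts;
};

/* Step 1: the driver's ring struct carries the state and the counters. */
struct my_ring {
	struct dim_state dim;
	uint16_t event_ctr;	/* interrupts seen on this ring */
	uint64_t packets;
	uint64_t bytes;
	struct cq_moder applied;	/* last setting "written to hardware" */
};

/* Illustrative profile table; the values are assumptions for this sketch. */
static const struct cq_moder my_profiles[] = {
	{ 1, 64 }, { 8, 64 }, { 64, 64 }, { 128, 256 }, { 256, 256 },
};
#define MY_NUM_PROFILES (sizeof(my_profiles) / sizeof(my_profiles[0]))

/* Step 2: driver function that applies the currently selected profile. */
static int my_ring_set_coalesce(struct my_ring *ring)
{
	if (ring->dim.profile_ix >= MY_NUM_PROFILES)
		return -1;
	/* A real driver would issue a firmware or register write here. */
	ring->applied = my_profiles[ring->dim.profile_ix];
	return 0;
}
```

In a real driver the second step would typically run from deferred work rather than from the poll loop, as the mlx5e and bnxt_en patches in this series do with a work_struct.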


Re: [net-next 09/10] bnxt_en: add support for software dynamic interrupt moderation

2018-01-05 Thread Andy Gospodarek
On Thu, Jan 04, 2018 at 02:16:26PM -0800, Michael Chan wrote:
> On Thu, Jan 4, 2018 at 12:21 PM, Andy Gospodarek <a...@greyhouse.net> wrote:
> > From: Andy Gospodarek <go...@broadcom.com>
> >
> > This implements the changes needed for the bnxt_en driver to add support
> > for dynamic interrupt moderation per ring.
> >
> > This does add additional counters in the receive path, but testing shows
> > that any additional instructions are offset by throughput gain when the
> > default configuration is for low latency.
> >
> > Signed-off-by: Andy Gospodarek <go...@broadcom.com>
> > Cc: Michael Chan <mc...@broadcom.com>
> 
> Andy, looks good in general. I just have a few comments below.  These
> minor issues can be cleaned up after merge if you want.

Thanks for the review -- not the first time you've seen it :-) -- and for
agreeing that we can clean up after the merge.  I'll need a v2, so I might as
well fix anything we want to fix now.

> 
> 
> > +int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
> > +{
> > +   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
> > > +   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
> > +   struct bnxt_coal coal;
> > +   unsigned int grp_idx;
> > +   int rc = 0;
> > +
> > +/* Tick values in micro seconds.
> > + * 1 coal_buf x bufs_per_record = 1 completion record.
> > + */
> > > +   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
> > +
> > +   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
> > +   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
> > +
> > +   if (!bnapi->rx_ring)
> > +   return -ENODEV;
> > +
> > > +   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
> > +  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, 
> > -1);
> > +
> > > +   bnxt_hwrm_set_coal_params(&coal, &req_rx);
> > > +
> > > +   mutex_lock(&bp->hwrm_cmd_lock);
> > +   grp_idx = bnapi->index;
> > +
> > +   req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
> > +
> > > +   rc = _hwrm_send_message(bp, &req_rx, sizeof(req_rx),
> > > +   HWRM_CMD_TIMEOUT);
> > > +   mutex_unlock(&bp->hwrm_cmd_lock);
> 
> You can use the hwrm_send_message() variant that does not require you
> to take the mutex.  You only need this variant and take the mutex if
> you need to check the firmware reply.
> 

OK, good to know.  I'll consider whether or not it is important to check
the reply.  I think I'd want to know if it failed, but I'm not sure what
I'd do were that error condition encountered.

> > +   return rc;
> > +}
> > +
> >  int bnxt_hwrm_set_coal(struct bnxt *bp)
> >  {
> > int i, rc = 0;
> > @@ -5705,7 +5753,11 @@ static void bnxt_enable_napi(struct bnxt *bp)
> > int i;
> >
> > for (i = 0; i < bp->cp_nr_rings; i++) {
> 
> We only need to enable this for every completion ring that has an RX
> ring.  In some cases, for example when XDP is enabled, there will be a
> set of completion rings with only TX rings.  So I think we can
> optimize this for completion rings with RX only.

Good call.

> > > +   struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
> > > bp->bnapi[i]->in_reset = false;
> > > +
> > > +   INIT_WORK(&cpr->am.work, bnxt_dim_work);
> > > +   cpr->am.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> > > napi_enable(&bp->bnapi[i]->napi);
> > }
> >  }


Re: [net-next 00/10] net: create dynamic software irq moderation library

2018-01-04 Thread Andy Gospodarek
On Thu, Jan 04, 2018 at 10:37:37PM +0200, Or Gerlitz wrote:
> >   net/mlx5e: move interrupt moderation structs to new file
> >   net/mlx5e: move interrupt moderation forward declarations
> >   net/mlx5e: remove rq references in mlx5e_rx_am
> >   net/mlx5e: move AM logic enums
> >   net/mlx5e: move generic functions to new file
> >   net/mlx5e: change Mellanox references in DIM code
> 
> Hi, Andy && happy new 2018 --  this is indeed a nit, but I have
> provided it to you twice (...),
> please get the commit titles to align with what we do which is capital
> letter after the net/mlx5e: prefix
> 
> from: net/mlx5e: move interrupt moderation structs to new file
> to: net/mlx5e: Move interrupt moderation structs to new file
> 
> If you get other comments, just apply this for the next version, if everyone
> is happy, that would be a very small effort to just fix and get that in..

Or, you did mention this part and I'm _really_ sorry I forgot to add
the capitalization.  I will do that if there is a v2 (which it looks
like there might be, since I cannot spell 'Maintained' correctly).





Re: [net-next 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation

2018-01-04 Thread Andy Gospodarek
On Thu, Jan 04, 2018 at 02:36:54PM -0800, Stephen Hemminger wrote:
> On Thu,  4 Jan 2018 15:21:30 -0500
> Andy Gospodarek <a...@greyhouse.net> wrote:
> 
> > +DYNAMIC INTERRUPT MODERATION
> > +M: Tal Gilboa <ta...@mellanox.com>
> > +S: Mainained
> 
> s/Mainained/Maintained/

Ugh.  Thanks for noticing that, Stephen!



[net-next 01/10] net/mlx5e: move interrupt moderation structs to new file

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 33 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 75 
 include/linux/mlx5/mlx5_ifc.h|  6 --
 3 files changed, 76 insertions(+), 38 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 543060c..ddb5429 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,6 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -226,12 +227,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-   u8 cq_period_mode;
-};
-
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -472,32 +467,6 @@ struct mlx5e_mpw_info {
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_statsprev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
new file mode 100644
index 000..84b8524
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+   u8 cq_period_mode;
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_statsprev_stats;
+   struct mlx5e_rx_am_sample   start_sample;
+   struct work_struct  work;
+ 

[net-next 03/10] net/mlx5e: remove rq references in mlx5e_rx_am

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  6 --
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 -
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index f5f6535..b676a057 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,8 +72,10 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
-struct mlx5e_rq;
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index e401d9d..1630076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -264,13 +264,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats 
*curr_stats,
return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+   u64 packets,
+   u64 bytes,
struct mlx5e_rx_am_sample *s)
 {
s->time  = ktime_get();
-   s->pkt_ctr   = rq->stats.packets;
-   s->byte_ctr  = rq->stats.bytes;
-   s->event_ctr = rq->cq.event_ctr;
+   s->pkt_ctr   = packets;
+   s->byte_ctr  = bytes;
+   s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -309,20 +311,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes)
 {
-   struct mlx5e_rx_am *am = &rq->am;
struct mlx5e_rx_am_sample end_sample;
struct mlx5e_rx_am_stats curr_stats;
u16 nevents;
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
  am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
-   mlx5e_am_sample(rq, &end_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
mlx5e_am_calc_stats(&am->start_sample, &end_sample,
&curr_stats);
if (mlx5e_am_decision(&curr_stats, am)) {
@@ -332,7 +336,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
}
/* fall through */
case MLX5E_AM_START_MEASURE:
-   mlx5e_am_sample(rq, &am->start_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
break;
case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index ab92298..1849169 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -79,7 +79,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
 
if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   mlx5e_rx_am(&c->rq);
+   mlx5e_rx_am(&c->rq.am,
+   c->rq.cq.event_ctr,
+   c->rq.stats.packets,
+   c->rq.stats.bytes);
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4



[net-next 06/10] net/mlx5e: change Mellanox references in DIM code

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Change all mlx5_am* and MLX_AM* references to net_dim and NET_DIM,
respectively, in code that handles dynamic interrupt moderation.  Also
change all references from 'am' to 'dim' when used as local variables.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  20 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 284 ++---
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 +++--
 9 files changed, 219 insertions(+), 222 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2ccedf6..da2d5e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -50,6 +50,7 @@
 #include "mlx5_core.h"
 #include "en_stats.h"
 #include "en_dim.h"
+#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -237,8 +238,8 @@ struct mlx5e_params {
u16 num_channels;
u8  num_tc;
bool rx_cqe_compress_def;
-   struct mlx5e_cq_moder rx_cq_moderation;
-   struct mlx5e_cq_moder tx_cq_moderation;
+   struct net_dim_cq_moder rx_cq_moderation;
+   struct net_dim_cq_moder tx_cq_moderation;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -248,7 +249,7 @@ struct mlx5e_params {
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
bool vlan_strip_disable;
bool scatter_fcs_en;
-   bool rx_am_enabled;
+   bool rx_dim_enabled;
u32 lro_timeout;
u32 pflags;
struct bpf_prog *xdp_prog;
@@ -527,7 +528,7 @@ struct mlx5e_rq {
unsigned long  state;
intix;
 
-   struct mlx5e_rx_am am; /* Adaptive Moderation */
+   struct net_dim dim; /* Dynamic Interrupt Moderation */
 
/* XDP */
struct bpf_prog   *xdp_prog;
@@ -1075,4 +1076,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
 u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
+void mlx5e_rx_dim_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index b9b434b..f620325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -32,17 +32,17 @@
 
 #include "en.h"
 
-void mlx5e_rx_am_work(struct work_struct *work)
+void mlx5e_rx_dim_work(struct work_struct *work)
 {
-   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
- work);
-   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
-   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
-
am->profile_ix);
+   struct net_dim *dim = container_of(work, struct net_dim,
+  work);
+   struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
+   struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
+ 
dim->profile_ix);
 
mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
   cur_profile.usec, cur_profile.pkts);
 
-   am->state = MLX5E_AM_START_MEASURE;
+   dim->state = NET_DIM_START_MEASURE;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 5ce8e54..21219de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -40,23 +40,23 @@ struct mlx5e_cq_moder {
u8 cq_period_mode;
 };
 
-struct mlx5e_rx_am_sample {
+struct mlx5e_rx_dim_sample {
ktime_t time;
u32 pkt_ctr;
u32 byte_ctr;
u16 event_ctr;
 };
 
-struct mlx5e_rx_am_stats {
+struct mlx5e_rx_dim_stats {
int ppms; /* packets per msec */
int bpms; /* bytes per msec */
int epms; /* events per msec */
 };
 
-struct mlx5e_rx_am { /* Adaptive Moderation */
+struct mlx5e_rx_dim { /* Adaptive Moderation */
u8  state;
-   struct mlx5e_rx_am_s

[net-next 08/10] net/dim: use struct net_dim_sample as arg to net_dim

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Simplify the arguments to net_dim() by formatting them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Suggested-by: Tal Gilboa <ta...@mellanox.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 13 -
 include/linux/net_dim.h   | 10 +++---
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index dae77a9..f292bb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -78,11 +78,14 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
for (i = 0; i < c->num_tc; i++)
mlx5e_cq_arm(&c->sq[i].cq);
 
-   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   net_dim(&c->rq.dim,
-   c->rq.cq.event_ctr,
-   c->rq.stats.packets,
-   c->rq.stats.bytes);
+   if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM)) {
+   struct net_dim_sample dim_sample;
+   net_dim_sample(c->rq.cq.event_ctr,
+  c->rq.stats.packets,
+  c->rq.stats.bytes,
+  &dim_sample);
+   net_dim(&c->rq.dim, dim_sample);
+   }
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index bb99073..2cceefa 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -341,21 +341,18 @@ static inline void net_dim_calc_stats(struct 
net_dim_sample *start,
 }
 
 static inline void net_dim(struct net_dim *dim,
-  u16 event_ctr,
-  u64 packets,
-  u64 bytes)
+  struct net_dim_sample end_sample)
 {
-   struct net_dim_sample end_sample;
struct net_dim_stats curr_stats;
u16 nevents;
 
switch (dim->state) {
case NET_DIM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16),
+ end_sample.event_ctr,
  dim->start_sample.event_ctr);
if (nevents < NET_DIM_NEVENTS)
break;
-   net_dim_sample(event_ctr, packets, bytes, &end_sample);
net_dim_calc_stats(&dim->start_sample, &end_sample,
   &curr_stats);
if (net_dim_decision(&curr_stats, dim)) {
@@ -365,7 +362,6 @@ static inline void net_dim(struct net_dim *dim,
}
/* fall through */
case NET_DIM_START_MEASURE:
-   net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
dim->state = NET_DIM_MEASURE_IN_PROGRESS;
break;
case NET_DIM_APPLY_NEW_PROFILE:
-- 
2.7.4



[net-next 05/10] net/mlx5e: move generic functions to new file

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  48 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 320 -
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 307 
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 109 +++
 6 files changed, 467 insertions(+), 322 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..b46b6de2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
fpga/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
-   en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o
+   en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
+   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
new file mode 100644
index 000..b9b434b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+void mlx5e_rx_am_work(struct work_struct *work)
+{
+   struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
+ work);
+   struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
+   struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
+
am->profile_ix);
+
+   mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
+  cur_profile.usec, cur_profile.pkts);
+
+   am->state = MLX5E_AM_START_MEASURE;
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index c9f0d05..5ce8e54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -104,5 +104,6 @@ void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
 
 #endif /* MLX5_AM_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
deleted file mode 100644
index 337dd60..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ /

[net-next 02/10] net/mlx5e: move interrupt moderation forward declarations

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Move these to the newly created file in preparation for moving these
functions to a library.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 5 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ddb5429..2ccedf6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -829,10 +829,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 84b8524..f5f6535 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,4 +72,9 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+struct mlx5e_rq;
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4



[net-next 09/10] bnxt_en: add support for software dynamic interrupt moderation

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.
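
To make the cost of those extra counters concrete, here is a minimal
user-space sketch of the bookkeeping described above.  Every name in it
(demo_ring, demo_irq, demo_rx_pkt, demo_poll_done, DEMO_NEVENTS) is
hypothetical and is not part of bnxt_en or the net_dim library; the window
size of 64 events is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the net_dim or bnxt_en API. */
#define DEMO_NEVENTS 64 /* interrupts per measurement window (assumed) */

struct demo_ring {
	uint16_t event_ctr;    /* incremented in the interrupt handler */
	uint64_t rx_packets;   /* incremented per received packet */
	uint64_t rx_bytes;     /* incremented by each packet's length */
	bool     measuring;    /* false until the first window is opened */
	uint16_t start_events; /* event_ctr when the window was opened */
};

static void demo_irq(struct demo_ring *r)
{
	r->event_ctr++;
}

static void demo_rx_pkt(struct demo_ring *r, uint32_t len)
{
	r->rx_packets += 1;
	r->rx_bytes += len;
}

/* Call once at the end of each poll; returns true when a window closed. */
static bool demo_poll_done(struct demo_ring *r)
{
	bool closed = false;

	if (r->measuring) {
		uint16_t nevents = (uint16_t)(r->event_ctr - r->start_events);

		if (nevents < DEMO_NEVENTS)
			return false;
		closed = true;
	}
	/* Open (or reopen) a measurement window at the current count. */
	r->start_events = r->event_ctr;
	r->measuring = true;
	return closed;
}
```

The per-packet work is two additions and the per-interrupt work is one
increment; anything heavier would only run when demo_poll_done() reports a
closed window.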

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Cc: Michael Chan <mc...@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 52 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c | 32 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++
 5 files changed, 120 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile 
b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..7c560d5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9efbdc6..3c5d2fa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1645,6 +1645,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+   cpr->rx_packets += 1;
+   cpr->rx_bytes += len;
*raw_cons = tmp_raw_cons;
 
return rc;
@@ -1802,6 +1804,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+   cpr->event_ctr++;
prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
napi_schedule(&bnapi->napi);
return IRQ_HANDLED;
@@ -2025,6 +2028,14 @@ static int bnxt_poll(struct napi_struct *napi, int 
budget)
break;
}
}
+   if (bp->flags & BNXT_FLAG_DIM) {
+   struct net_dim_sample dim_sample;
+   net_dim_sample(cpr->event_ctr,
+  cpr->rx_packets,
+  cpr->rx_bytes,
+  &dim_sample);
+   net_dim(&cpr->am, dim_sample);
+   }
mmiowb();
return work_done;
 }
@@ -2610,6 +2621,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
ring->fw_ring_id = INVALID_HW_RING_ID;
+   cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+   cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
}
 }
 
@@ -4583,6 +4596,41 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal 
*hw_coal,
req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_coal coal;
+   unsigned int grp_idx;
+   int rc = 0;
+
+/* Tick values in micro seconds.
+ * 1 coal_buf x bufs_per_record = 1 completion record.
+ */
+   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+   if (!bnapi->rx_ring)
+   return -ENODEV;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+   bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+   mutex_lock(&bp->hwrm_cmd_lock);
+   grp_idx = bnapi->index;
+
+   req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
+
+   rc = _hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+   HWRM_CMD_TIMEOUT);
+   mutex_unlock(&bp->hwrm_cmd_lock);
+   return rc;
+}
+
 int bnxt_hwrm_set_coal(struct bnxt *bp)
 {
int i, rc = 0;
@@ -5705,7 +5753,11 @@ static void bnxt_enable_napi(struct bnxt *bp)
int i;
 
for (i = 0; i < bp->cp_nr_rings; i++) {
+   struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
bp->bnapi[i]->in_reset = false;
+
+   INIT_WORK(&cpr->am.work, bnxt_dim_work);
+   cpr->am.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;

[net-next 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Signed-off-by: Tal Gilboa <ta...@mellanox.com>
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 753799d..769857b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4944,6 +4944,11 @@ S:   Maintained
 F: lib/dynamic_debug.c
 F: include/linux/dynamic_debug.h
 
+DYNAMIC INTERRUPT MODERATION
+M: Tal Gilboa <ta...@mellanox.com>
+S: Maintained
+F: include/linux/net_dim.h
+
 DZ DECSTATION DZ11 SERIAL DRIVER
 M: "Maciej W. Rozycki" <ma...@linux-mips.org>
 S: Maintained
-- 
2.7.4



[net-next 04/10] net/mlx5e: move AM logic enums

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 26 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 -
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index b676a057..c9f0d05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,6 +72,32 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+/* Adaptive moderation logic */
+enum {
+   MLX5E_AM_START_MEASURE,
+   MLX5E_AM_MEASURE_IN_PROGRESS,
+   MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+   MLX5E_AM_PARKING_ON_TOP,
+   MLX5E_AM_PARKING_TIRED,
+   MLX5E_AM_GOING_RIGHT,
+   MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+   MLX5E_AM_STATS_WORSE,
+   MLX5E_AM_STATS_SAME,
+   MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+   MLX5E_AM_STEPPED,
+   MLX5E_AM_TOO_TIRED,
+   MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u16 event_ctr,
 u64 packets,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1630076..337dd60 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -82,31 +82,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 
rx_cq_period_mode)
return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
 }
 
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
-- 
2.7.4



[net-next 00/10] net: create dynamic software irq moderation library

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This converts the dynamic interrupt moderation code in the mlx5_en driver
into a library so it can be used by any driver.  The penultimate patch in this
set adds support for interrupt moderation in the bnxt_en driver and the last
patch creates an entry in the MAINTAINERS file.

The main purpose of this code in the mlx5_en driver is to allow an
administrator to make sure that default coalesce settings are optimized
for low latency, but quickly adapt to handle high throughput traffic and
optimize how many packets are received during each napi poll.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
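
As a rough illustration of those two changes, the sketch below shows a ring
structure carrying the tracked counters plus a driver-specific function that
applies a chosen moderation profile.  All names (demo_ring, demo_moder,
demo_set_ring_coal) are hypothetical and are not the library's API; the
profile values are borrowed from the EQE table in the existing mlx5 code, and
the "hardware" is just two struct fields standing in for a device command.

```c
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not the net_dim interface. */
struct demo_moder {
	uint16_t usec; /* coalescing timer */
	uint16_t pkts; /* coalescing packet count */
};

#define DEMO_NUM_PROFILES 5

static const struct demo_moder demo_profiles[DEMO_NUM_PROFILES] = {
	{1, 256}, {8, 256}, {64, 256}, {128, 256}, {256, 256},
};

struct demo_ring {
	/* 1. elements tracked for the moderation logic */
	uint16_t event_ctr;
	uint64_t packets;
	uint64_t bytes;
	int      profile_ix; /* index picked by the moderation logic */

	/* stand-in for the device's current coalesce settings */
	uint16_t hw_usec;
	uint16_t hw_pkts;
};

/* 2. driver-specific function that actually sets the coalesce settings */
static int demo_set_ring_coal(struct demo_ring *r)
{
	if (r->profile_ix < 0 || r->profile_ix >= DEMO_NUM_PROFILES)
		return -1;

	/* A real driver would issue a firmware or register command here. */
	r->hw_usec = demo_profiles[r->profile_ix].usec;
	r->hw_pkts = demo_profiles[r->profile_ix].pkts;
	return 0;
}
```

In a real driver the second function would typically be called from a work
item, since programming the device may sleep.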

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for their
comments, etc on this set.

Andy Gospodarek (10):
  net/mlx5e: move interrupt moderation structs to new file
  net/mlx5e: move interrupt moderation forward declarations
  net/mlx5e: remove rq references in mlx5e_rx_am
  net/mlx5e: move AM logic enums
  net/mlx5e: move generic functions to new file
  net/mlx5e: change Mellanox references in DIM code
  net: move dynamic interrupt coalescing code to include/linux
  net/dim: use struct net_dim_sample as arg to net_dim
  bnxt_en: add support for software dynamic interrupt moderation
  MAINTAINERS: add entry for Dynamic Interrupt Moderation

 MAINTAINERS|   5 +
 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  52 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  32 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  46 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 108 ++
 include/linux/mlx5/mlx5_ifc.h  |   6 -
 include/linux/net_dim.h| 372 +
 17 files changed, 693 insertions(+), 426 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
 create mode 100644 include/linux/net_dim.h

-- 
2.7.4



[net-next 07/10] net: move dynamic interrupt coalescing code to include/linux

2018-01-04 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.
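
For readers unfamiliar with how the collected data is used, the following is
a minimal sketch of turning two samples of those counters into per-millisecond
rates (the ppms/bpms/epms fields seen in the stats structure).  The names
demo_sample, demo_stats and demo_calc_stats are hypothetical, and the
timestamp is passed in as plain microseconds rather than a ktime_t so the
example stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; not the kernel structures. */
struct demo_sample {
	uint64_t time_us;   /* timestamp in microseconds */
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

struct demo_stats {
	int ppms; /* packets per msec */
	int bpms; /* bytes per msec */
	int epms; /* events per msec */
};

/* Round-up division, so short bursts do not collapse to a rate of zero. */
static uint64_t demo_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Returns false if no time elapsed, in which case *out is left untouched. */
static bool demo_calc_stats(const struct demo_sample *start,
			    const struct demo_sample *end,
			    struct demo_stats *out)
{
	uint64_t delta_us = end->time_us - start->time_us;
	/* Unsigned subtraction keeps the deltas correct across wraparound. */
	uint32_t npkts = end->pkt_ctr - start->pkt_ctr;
	uint32_t nbytes = end->byte_ctr - start->byte_ctr;
	uint16_t nevents = (uint16_t)(end->event_ctr - start->event_ctr);

	if (delta_us == 0)
		return false;

	out->ppms = (int)demo_div_round_up((uint64_t)npkts * 1000, delta_us);
	out->bpms = (int)demo_div_round_up((uint64_t)nbytes * 1000, delta_us);
	out->epms = (int)demo_div_round_up((uint64_t)nevents * 1000, delta_us);
	return true;
}
```

Comparing the rates from consecutive windows is what lets the moderation
logic decide whether the last profile change made things better or worse.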

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
Acked-by: Tal Gilboa <ta...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h  | 105 --
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c | 307 --
 include/linux/net_dim.h   | 376 ++
 6 files changed, 379 insertions(+), 415 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 include/linux/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index b46b6de2..c805769 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index da2d5e7..41e6783 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -46,11 +46,10 @@
 #include 
 #include 
 #include 
+#include <linux/net_dim.h>
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_dim.h"
-#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index f620325..2b89951 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/net_dim.h>
 #include "en.h"
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
deleted file mode 100644
index 21219de..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ /dev/null
@@ -1,105 +0,0 @@
-/*
- * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
- * Copyright (c) 2017, Broadcom Limited
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
-*/
-
-#ifndef MLX5_AM_H
-#define MLX5_AM_H
-
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-   u8 cq_period_mode;
-};
-
-struct mlx5e_rx_dim_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_dim_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_dim { /* Adaptive Moderation */
-   u8  state;
- 

Re: [PATCH v1] net: bonding: Replace mac address parsing

2017-12-20 Thread Andy Gospodarek
On Tue, Dec 19, 2017 at 08:20:44PM +0200, Andy Shevchenko wrote:
> Replace sscanf() with mac_pton().
> 
> Signed-off-by: Andy Shevchenko <andriy.shevche...@linux.intel.com>

Nice cleanup.  Thanks!

Acked-by: Andy Gospodarek <a...@greyhouse.net>
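
For anyone reading along, a dedicated parser is stricter than an open-coded
sscanf() because it can insist on two hex digits per byte and on the ':'
separators.  The sketch below is a user-space illustration only; it is not the
kernel's mac_pton() implementation, and demo_mac_parse, demo_hex_val and
DEMO_ETH_ALEN are hypothetical names.  It assumes the input must be exactly 17
characters long.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6 /* bytes in an Ethernet address */

/* Returns 0..15 for a hex digit, -1 for anything else. */
static int demo_hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parses "xx:xx:xx:xx:xx:xx"; mac is written only on success. */
static bool demo_mac_parse(const char *s, uint8_t mac[DEMO_ETH_ALEN])
{
	uint8_t tmp[DEMO_ETH_ALEN];
	size_t i;

	if (strlen(s) != 3 * DEMO_ETH_ALEN - 1)
		return false;

	for (i = 0; i < DEMO_ETH_ALEN; i++) {
		int hi = demo_hex_val(s[i * 3]);
		int lo = demo_hex_val(s[i * 3 + 1]);

		if (hi < 0 || lo < 0)
			return false;
		if (i != DEMO_ETH_ALEN - 1 && s[i * 3 + 2] != ':')
			return false;
		tmp[i] = (uint8_t)((hi << 4) | lo);
	}

	memcpy(mac, tmp, sizeof(tmp));
	return true;
}
```

The caller then only needs a single boolean check, which is the shape of the
replacement in the patch below.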

> ---
>  drivers/net/bonding/bond_options.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_options.c 
> b/drivers/net/bonding/bond_options.c
> index 8a9b085c2a98..58c705f24f96 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -1431,13 +1431,9 @@ static int bond_option_ad_actor_system_set(struct 
> bonding *bond,
>  {
>   u8 macaddr[ETH_ALEN];
>   u8 *mac;
> - int i;
>  
>   if (newval->string) {
> - i = sscanf(newval->string, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
> -&macaddr[0], &macaddr[1], &macaddr[2],
> -&macaddr[3], &macaddr[4], &macaddr[5]);
> - if (i != ETH_ALEN)
> + if (!mac_pton(newval->string, macaddr))
>   goto err;
>   mac = macaddr;
>   } else {
> -- 
> 2.15.1
> 


Re: [PATCH net] ipv6: Do not consider linkdown nexthops during multipath

2017-11-21 Thread Andy Gospodarek
On Tue, Nov 21, 2017 at 09:50:12AM +0200, Ido Schimmel wrote:
> When the 'ignore_routes_with_linkdown' sysctl is set, we should not
> consider linkdown nexthops during route lookup.
> 
> While the code correctly verifies that the initially selected route
> ('match') has a carrier, it does not perform the same check in the
> subsequent multipath selection, resulting in a potential packet loss.
> 
> In case the chosen route does not have a carrier and the sysctl is set,
> choose the initially selected route.
> 
> Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop 
> link is down")
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>

Nice find.  Looks good to me, as well.

Acked-by: Andy Gospodarek <a...@greyhouse.net>
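
To spell out the behaviour being fixed, here is a small user-space sketch of
the selection rule: if the sibling picked by the multipath hash has no carrier
and the sysctl is set, fall back to the initially selected route.  The types
and names (demo_route, demo_multipath_select) are hypothetical and greatly
simplified compared with the real rt6_info handling, and the score check in
the patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a route and its nexthop device. */
struct demo_route {
	int  id;
	bool carrier_ok;      /* link state of the nexthop device */
	bool ignore_linkdown; /* stand-in for the per-device sysctl */
};

/* choice == 0 keeps match; choice == k (k >= 1) asks for siblings[k - 1]. */
static const struct demo_route *
demo_multipath_select(const struct demo_route *match,
		      const struct demo_route *siblings, size_t n,
		      size_t choice)
{
	const struct demo_route *sibling;

	if (choice == 0 || choice > n)
		return match;

	sibling = &siblings[choice - 1];
	/* Do not hand out a linkdown nexthop when the sysctl says to skip it. */
	if (!sibling->carrier_ok && sibling->ignore_linkdown)
		return match;

	return sibling;
}
```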

> ---
>  net/ipv6/route.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 05eb7bc36156..0363db914c7a 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -472,6 +472,11 @@ static struct rt6_info *rt6_multipath_select(struct 
> rt6_info *match,
>   &match->rt6i_siblings, rt6i_siblings) {
>   route_choosen--;
>   if (route_choosen == 0) {
> + struct inet6_dev *idev = sibling->rt6i_idev;
> +
> + if (!netif_carrier_ok(sibling->dst.dev) &&
> + idev->cnf.ignore_routes_with_linkdown)
> + break;
>   if (rt6_score_route(sibling, oif, strict) < 0)
>   break;
>   match = sibling;
> -- 
> 2.14.3
> 


[RFC 8/9] net: move adaptive interrupt coalescing code to lib/

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This moves the code that is now generically named to lib/.

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   1 +
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 303 
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h| 107 ---
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 8 files changed, 419 insertions(+), 413 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d5d6d3d..19b21b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_rx_am.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 203dc7b..04b36fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_rx_am.h"
+#include <linux/net_rx_am.h>
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1f8fda1..391f1ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -31,6 +31,7 @@
  */
 
 #include "en.h"
+#include <linux/net_rx_am.h>
 
 void mlx5e_rx_am_work(struct work_struct *work)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
deleted file mode 100644
index 37ea6d1..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
+++ /dev/null
@@ -1,303 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_PARAMS_AM_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_RX_AM_DEF_PROFILE_CQE 1
-#define NET_RX_AM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_AM_NUM_PROFILES */
-#define NET_AM_EQE_PROFILES { \
-   {1,   NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_E

[RFC 3/9] mlx5_en: remove rq references in mlx5e_rx_am

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not store these values in the same data
structure.
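
A minimal sketch of the idea, assuming nothing about the caller's structures:
the sample function takes the three counters as plain arguments, and the
number of events in a window is computed with a wraparound-safe gap.  The
names demo_sample, demo_take_sample and DEMO_BIT_GAP are hypothetical; the
macro is modelled on the BIT_GAP() usage visible in the diff below and is only
meant for widths below 64 bits.  The timestamp is omitted to keep the example
deterministic.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sample structure (timestamp omitted). */
struct demo_sample {
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

/* Distance from start to end for a counter that is 'bits' wide (bits < 64). */
#define DEMO_BIT_GAP(bits, end, start) \
	((((uint64_t)(end) - (uint64_t)(start)) + (1ULL << (bits))) & \
	 ((1ULL << (bits)) - 1))

/* Takes plain counters, so any driver can call it with its own fields. */
static void demo_take_sample(uint16_t event_ctr, uint64_t packets,
			     uint64_t bytes, struct demo_sample *s)
{
	s->pkt_ctr   = (uint32_t)packets; /* truncation is fine: only deltas matter */
	s->byte_ctr  = (uint32_t)bytes;
	s->event_ctr = event_ctr;
}
```

A driver would call demo_take_sample() with whatever fields hold its own
counters, for example ring->irq_count or ring->stats.packets in some other
(hypothetical) layout.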

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h |  5 -
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 -
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index acf32fe..845dbb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -260,13 +260,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats 
*curr_stats,
return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+   u64 packets,
+   u64 bytes,
struct mlx5e_rx_am_sample *s)
 {
s->time  = ktime_get();
-   s->pkt_ctr   = rq->stats.packets;
-   s->byte_ctr  = rq->stats.bytes;
-   s->event_ctr = rq->cq.event_ctr;
+   s->pkt_ctr   = packets;
+   s->byte_ctr  = bytes;
+   s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -305,20 +307,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes)
 {
-   struct mlx5e_rx_am *am = &rq->am;
struct mlx5e_rx_am_sample end_sample;
struct mlx5e_rx_am_stats curr_stats;
u16 nevents;
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
  am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
-   mlx5e_am_sample(rq, &end_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
mlx5e_am_calc_stats(&am->start_sample, &end_sample,
&curr_stats);
if (mlx5e_am_decision(&curr_stats, am)) {
@@ -328,7 +332,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
}
/* fall through */
case MLX5E_AM_START_MEASURE:
-   mlx5e_am_sample(rq, &am->start_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
break;
case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
index 869e4e7..90e4913 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -71,7 +71,10 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index e906b75..8fed6c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -77,7 +77,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
 
if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   mlx5e_rx_am(&c->rq);
+   mlx5e_rx_am(&c->rq.am,
+   c->rq.cq.event_ctr,
+   c->rq.stats.packets,
+   c->rq.stats.bytes);
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4



[RFC 5/9] mlx5_en: move generic functions to new file

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_rx_am.c.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 272 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h |   1 +
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 303 +
 4 files changed, 307 insertions(+), 271 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..d5d6d3d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o net_rx_am.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 02d4f80..b9b434b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -32,249 +32,13 @@
 
 #include "en.h"
 
-/* Adaptive moderation profiles */
-#define MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define MLX5E_RX_AM_DEF_PROFILE_CQE 1
-#define MLX5E_RX_AM_DEF_PROFILE_EQE 1
-#define MLX5E_PARAMS_AM_NUM_PROFILES 5
-
-/* All profiles sizes must be MLX5E_PARAMS_AM_NUM_PROFILES */
-#define MLX5_AM_EQE_PROFILES { \
-   {1,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {128, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {256, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-}
-
-#define MLX5_AM_CQE_PROFILES { \
-   {2,  256}, \
-   {8,  128}, \
-   {16, 64},  \
-   {32, 64},  \
-   {64, 64}   \
-}
-
-static const struct mlx5e_cq_moder
-profile[MLX5_CQ_PERIOD_NUM_MODES][MLX5E_PARAMS_AM_NUM_PROFILES] = {
-   MLX5_AM_EQE_PROFILES,
-   MLX5_AM_CQE_PROFILES,
-};
-
-static inline struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, 
int ix)
-{
-   return profile[cq_period_mode][ix];
-}
-
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
-{
-   int default_profile_ix;
-
-   if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
-   default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_CQE;
-   else /* MLX5_CQ_PERIOD_MODE_START_FROM_EQE */
-   default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_EQE;
-
-   return profile[rx_cq_period_mode][default_profile_ix];
-}
-
-
-static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
-{
-   switch (am->tune_state) {
-   case MLX5E_AM_PARKING_ON_TOP:
-   case MLX5E_AM_PARKING_TIRED:
-   return true;
-   case MLX5E_AM_GOING_RIGHT:
-   return (am->steps_left > 1) && (am->steps_right == 1);
-   default: /* MLX5E_AM_GOING_LEFT */
-   return (am->steps_right > 1) && (am->steps_left == 1);
-   }
-}
-
-static void mlx5e_am_turn(struct mlx5e_rx_am *am)
-{
-   switch (am->tune_state) {
-   case MLX5E_AM_PARKING_ON_TOP:
-   case MLX5E_AM_PARKING_TIRED:
-   break;
-   case MLX5E_AM_GOING_RIGHT:
-   am->tune_state = MLX5E_AM_GOING_LEFT;
-   am->steps_left = 0;
-   break;
-   case MLX5E_AM_GOING_LEFT:
-   am->tune_state = MLX5E_AM_GOING_RIGHT;
-   am->steps_right = 0;
-   break;
-   }
-}
-
-static int mlx5e_am_step(struct mlx5e_rx_am *am)
-{
-   if (am->tired == (MLX5E_PARAMS_AM_NUM_PROFILES * 2))
-   return MLX5E_AM_TOO_TIRED;
-
-   switch (am->tune_state) {
-   case MLX5E_AM_PARKING_ON_TOP:
-   case MLX5E_AM_PARKING_TIRED:
-   break;
-   case MLX5E_AM_GOING_RIGHT:
-   if (am->profile_ix == (MLX5E_PARAMS_AM_NUM_PROFILES - 1))
-   return MLX5E_AM_ON_EDGE;
-   am->profile_ix++;
-   am->steps_right++;
-   break;
-   case MLX5E_AM_GOING_LEFT:
-   if (am->profile_ix == 0)
-   return MLX5E_AM_ON_EDGE;
-   am->profile_ix--;
-   am->ste

[RFC 6/9] mlx5_en: rename en_rx_am.h to net_rx_am.h

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This is so net_rx_am.h can be easily moved out of mlx5/core.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 108 -
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h| 108 +
 3 files changed, 109 insertions(+), 109 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 1c56d16..a9dc118 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_rx_am.h"
+#include "net_rx_am.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
deleted file mode 100644
index ef86bf8..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ /dev/null
@@ -1,108 +0,0 @@
-/*
- * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
- * Copyright (c) 2017, Broadcom Limited
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
-*/
-
-#ifndef MLX5_AM_H
-#define MLX5_AM_H
-
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_statsprev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
-enum {
-   MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-   MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-   MLX5_CQ_PERIOD_NUM_MODES
-};
-
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
-
-void mlx5e_rx_am(struct mlx5e_rx_am *am,
-u16 event_ctr,
-u64 packets,
-u64 bytes);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
-
-#endif /* MLX5_AM_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/net

[RFC 4/9] mlx5_en: move AM logic enums

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 26 ++
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 845dbb8..02d4f80 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -78,31 +78,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 
rx_cq_period_mode)
return profile[rx_cq_period_mode][default_profile_ix];
 }
 
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
index 90e4913..efbee99 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -71,6 +71,32 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+/* Adaptive moderation logic */
+enum {
+   MLX5E_AM_START_MEASURE,
+   MLX5E_AM_MEASURE_IN_PROGRESS,
+   MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+   MLX5E_AM_PARKING_ON_TOP,
+   MLX5E_AM_PARKING_TIRED,
+   MLX5E_AM_GOING_RIGHT,
+   MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+   MLX5E_AM_STATS_WORSE,
+   MLX5E_AM_STATS_SAME,
+   MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+   MLX5E_AM_STEPPED,
+   MLX5E_AM_TOO_TIRED,
+   MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u16 event_ctr,
 u64 packets,
-- 
2.7.4



[RFC 9/9] bnxt_en: add support for software adaptive interrupt moderation

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This implements the changes needed for the bnxt_en driver to add support
for adaptive interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  7 
 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c   | 32 ++
 5 files changed, 114 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile 
b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..1b0c78c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_rx_am.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4e3d569..e1110d9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1482,6 +1482,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
u32 tmp_raw_cons = *raw_cons;
u16 cfa_code, cons, prod, cp_cons = RING_CMP(tmp_raw_cons);
struct bnxt_sw_rx_bd *rx_buf;
+   unsigned int pkts = 0;
unsigned int len;
u8 *data_ptr, agg_bufs, cmp_type;
dma_addr_t dma_addr;
@@ -1522,6 +1523,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
 
rc = -ENOMEM;
if (likely(skb)) {
+   struct skb_shared_info *shinfo = skb_shinfo(skb);
+   pkts = shinfo->nr_frags;
bnxt_deliver_skb(bp, bnapi, skb);
rc = 1;
}
@@ -1645,6 +1648,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+   cpr->rx_packets += pkts ? : 1;
+   cpr->rx_bytes += len;
*raw_cons = tmp_raw_cons;
 
return rc;
@@ -1798,6 +1803,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+   cpr->event_ctr++;
prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
napi_schedule(&bnapi->napi);
return IRQ_HANDLED;
@@ -2021,6 +2027,11 @@ static int bnxt_poll(struct napi_struct *napi, int 
budget)
break;
}
}
+   if (bp->flags & BNXT_FLAG_RX_AM)
+   net_rx_am(&cpr->am,
+ cpr->event_ctr,
+ cpr->rx_packets,
+ cpr->rx_bytes);
mmiowb();
return work_done;
 }
@@ -2606,6 +2617,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
ring->fw_ring_id = INVALID_HW_RING_ID;
+   cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+   cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
}
 }
 
@@ -4579,6 +4592,38 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal 
*hw_coal,
req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_coal coal;
+   int rc = 0;
+
+/* Tick values in micro seconds.
+ * 1 coal_buf x bufs_per_record = 1 completion record.
+ */
+   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+   if (!bnapi->rx_ring)
+   return -ENODEV;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+   bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+   mutex_lock(&bp->hwrm_cmd_lock);
+   req_rx.ring_id = cpr->cp_ring_struct.fw_ring_id;
+
+   rc = _hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+   HWRM_CMD_TIMEOUT);
+

[RFC 8/9] net: move adaptive interrupt coalescing code to lib/

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This moves the code that is now generically named to lib/.

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.
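
As a rough illustration of the data such a handler would work from, the
sketch below turns two snapshots of the per-ring counters into
per-millisecond rates.  It is a userspace model and not the code from
this series: am_sample, am_stats and am_calc_stats are hypothetical
names, and a plain microsecond timestamp stands in for ktime_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the running per-ring counters. */
struct am_sample {
	uint64_t time_us;   /* timestamp in microseconds (stand-in for ktime_t) */
	uint32_t pkt_ctr;   /* running packet counter */
	uint32_t byte_ctr;  /* running byte counter */
	uint16_t event_ctr; /* running interrupt-event counter */
};

/* Rates derived from two snapshots. */
struct am_stats {
	int ppms; /* packets per msec */
	int bpms; /* bytes per msec */
	int epms; /* events per msec */
};

/*
 * Compute per-msec rates between two snapshots.
 * Returns false (and leaves *out untouched) if no time has elapsed.
 */
static bool am_calc_stats(const struct am_sample *start,
			  const struct am_sample *end,
			  struct am_stats *out)
{
	uint64_t delta_us;
	uint32_t npkts, nbytes;
	uint16_t nevents;

	assert(start && end && out);

	delta_us = end->time_us - start->time_us;
	if (delta_us == 0)
		return false;

	/* Unsigned subtraction keeps the deltas correct across counter wrap. */
	npkts = end->pkt_ctr - start->pkt_ctr;
	nbytes = end->byte_ctr - start->byte_ctr;
	nevents = (uint16_t)(end->event_ctr - start->event_ctr);

	/* Scale by 1000 to go from "per usec" to "per msec". */
	out->ppms = (int)(((uint64_t)npkts * 1000) / delta_us);
	out->bpms = (int)(((uint64_t)nbytes * 1000) / delta_us);
	out->epms = (int)(((uint64_t)nevents * 1000) / delta_us);
	return true;
}
```

The counters are only ever compared as differences, so the driver can
let them run freely and never needs to reset them between samples.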

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   1 +
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 303 
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h| 107 ---
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 8 files changed, 419 insertions(+), 413 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d5d6d3d..19b21b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_rx_am.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 203dc7b..04b36fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_rx_am.h"
+#include <linux/net_rx_am.h>
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1f8fda1..391f1ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -31,6 +31,7 @@
  */
 
 #include "en.h"
+#include <linux/net_rx_am.h>
 
 void mlx5e_rx_am_work(struct work_struct *work)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
deleted file mode 100644
index 37ea6d1..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
+++ /dev/null
@@ -1,303 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_PARAMS_AM_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_RX_AM_DEF_PROFILE_CQE 1
-#define NET_RX_AM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_AM_NUM_PROFILES */
-#define NET_AM_EQE_PROFILES { \
-   {1,   NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_E

[RFC 7/9] mlx5_en: remove Mellanox references in AM code

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Replace all mlx5* and MLX* references with net_ and NET_, respectively,
in code that handles software interrupt moderation.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   7 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 214 ++---
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h|  57 +++---
 8 files changed, 157 insertions(+), 157 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a9dc118..203dc7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -221,8 +221,8 @@ struct mlx5e_params {
u8  num_tc;
u8  rx_cq_period_mode;
bool rx_cqe_compress_def;
-   struct mlx5e_cq_moder rx_cq_moderation;
-   struct mlx5e_cq_moder tx_cq_moderation;
+   struct net_cq_moder rx_cq_moderation;
+   struct net_cq_moder tx_cq_moderation;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -505,7 +505,7 @@ struct mlx5e_rq {
unsigned long  state;
intix;
 
-   struct mlx5e_rx_am am; /* Adaptive Moderation */
+   struct net_rx_am am; /* Adaptive Moderation */
 
/* XDP */
struct bpf_prog   *xdp_prog;
@@ -1036,4 +1036,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
 
+void mlx5e_rx_am_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b34aa8e..3955521 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1454,11 +1454,11 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
int err = 0;
 
rx_cq_period_mode = enable ?
-   MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
-   MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+   NET_CQ_PERIOD_MODE_START_FROM_CQE :
+   NET_CQ_PERIOD_MODE_START_FROM_EQE;
rx_mode_changed = rx_cq_period_mode != 
priv->channels.params.rx_cq_period_mode;
 
-   if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE &&
+   if (rx_cq_period_mode == NET_CQ_PERIOD_MODE_START_FROM_CQE &&
!MLX5_CAP_GEN(mdev, cq_period_start_from_cqe))
return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 28ae00b..dcd96fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1571,7 +1571,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 }
 
 static int mlx5e_open_cq(struct mlx5e_channel *c,
-struct mlx5e_cq_moder moder,
+struct net_cq_moder moder,
 struct mlx5e_cq_param *param,
 struct mlx5e_cq *cq)
 {
@@ -1748,7 +1748,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
  struct mlx5e_channel_param *cparam,
  struct mlx5e_channel **cp)
 {
-   struct mlx5e_cq_moder icocq_moder = {0, 0};
+   struct net_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
@@ -1987,7 +1987,7 @@ static void mlx5e_build_tx_cq_param(struct mlx5e_priv 
*priv,
 
mlx5e_build_common_cq_param(priv, param);
 
-   param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+   param->cq_period_mode = NET_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_ico_cq_param(struct mlx5e_priv *priv,
@@ -2000,7 +2000,7 @@ static void mlx5e_build_ico_cq_param(struct mlx5e_priv 
*priv,
 
mlx5e_build_common_cq_param(priv, param);
 
-   param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+   param->cq_period_mode = NET_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
@@ -3996,16 +3996,16 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params 
*params, u8 cq_period_mode)
params->rx_cq_moderation.usec =
MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
 
-   if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+   if (cq

[RFC 1/9] mlx5_en: move interrupt moderation structs to new file

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Create new header file to prepare to move code that handles irq
moderation to a library.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 32 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 74 ++
 include/linux/mlx5/mlx5_ifc.h  |  6 --
 3 files changed, 75 insertions(+), 37 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e613ce0..1bde086 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,6 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_rx_am.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -209,11 +210,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-};
-
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -449,32 +445,6 @@ struct mlx5e_mpw_info {
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_statsprev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
new file mode 100644
index 000..176a732
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_statsprev_stats;
+   struct mlx5e_rx_am_sample   start_sample;
+   struct work_struct  work;
+   u8  profile_ix;
+   u8  mode;
+   

[RFC 2/9] mlx5_en: move interrupt moderation forward declarations

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <go...@broadcom.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 1bde086..1c56d16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -798,10 +798,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
index 176a732..869e4e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -71,4 +71,8 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4



[RFC 0/9] net: create adaptive software irq moderation library

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This RFC converts the adaptive interrupt moderation code from the
mlx5_en driver into a library so it can be used by any driver.  The last
patch in this set adds support for interrupt moderation in the bnxt_en
driver.

The main purpose of this code in the mlx5 driver is to allow an
administrator to make sure that default coalesce settings are optimized
for low latency, but quickly adapt to handle high throughput traffic and
optimize how many packets are received during each napi poll.
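
To make the "quickly adapt" part concrete, here is a much simplified
userspace sketch of the idea: compare the byte rate of the last two
measurement windows and walk a profile index left or right.  It is not
the algorithm from this series (which also tracks parking and "tired"
states); am_tuner, am_compare, am_step and am_decide are hypothetical
names, the 10% dead band is an assumption made for this example, and
"right" is taken to mean a profile with a longer moderation period.

```c
#include <assert.h>

#define AM_NUM_PROFILES 5

enum am_cmp { AM_STATS_WORSE, AM_STATS_SAME, AM_STATS_BETTER };
enum am_dir { AM_GOING_RIGHT, AM_GOING_LEFT };
enum am_step_result { AM_STEPPED, AM_ON_EDGE };

/* Hypothetical per-ring tuning state. */
struct am_tuner {
	enum am_dir dir;
	int profile_ix;
};

/*
 * Classify the new byte rate against the previous one.  Changes of 10%
 * or less count as "same" (an assumption made for this sketch).
 */
static enum am_cmp am_compare(int prev_bpms, int curr_bpms)
{
	long long diff = (long long)curr_bpms - prev_bpms;
	long long mag = diff < 0 ? -diff : diff;

	if (prev_bpms <= 0)
		return curr_bpms > 0 ? AM_STATS_BETTER : AM_STATS_SAME;
	if (100 * mag / prev_bpms <= 10)
		return AM_STATS_SAME;
	return diff > 0 ? AM_STATS_BETTER : AM_STATS_WORSE;
}

/* Move one profile in the current direction unless already at the edge. */
static enum am_step_result am_step(struct am_tuner *t)
{
	assert(t->profile_ix >= 0 && t->profile_ix < AM_NUM_PROFILES);

	if (t->dir == AM_GOING_RIGHT) {
		if (t->profile_ix == AM_NUM_PROFILES - 1)
			return AM_ON_EDGE;
		t->profile_ix++;
	} else {
		if (t->profile_ix == 0)
			return AM_ON_EDGE;
		t->profile_ix--;
	}
	return AM_STEPPED;
}

/* Keep going while things improve, turn around when they get worse. */
static void am_decide(struct am_tuner *t, int prev_bpms, int curr_bpms)
{
	enum am_cmp cmp = am_compare(prev_bpms, curr_bpms);

	if (cmp == AM_STATS_SAME)
		return; /* stay on the current profile */
	if (cmp == AM_STATS_WORSE)
		t->dir = (t->dir == AM_GOING_RIGHT) ? AM_GOING_LEFT
						    : AM_GOING_RIGHT;
	(void)am_step(t);
}
```

Each call to am_decide() moves at most one profile, so a ring that sees
a sustained change in load converges over a few measurement windows
rather than jumping straight to an extreme setting.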

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
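
The two items above might look roughly like the following on the driver
side.  This is a userspace sketch under stated assumptions, not code
from the series: my_ring, my_ring_account_irq, my_ring_account_rx and
my_ring_set_coal are hypothetical names, the profile values are copied
from the EQE table in the mlx5 code being moved, and a real driver
would program its hardware where the sketch only stores the values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define AM_NUM_PROFILES 5

/* One moderation setting: wait up to usec or pkts, whichever comes first. */
struct cq_moder {
	uint16_t usec;
	uint16_t pkts;
};

/* Values taken from the EQE profile table in the mlx5 code. */
static const struct cq_moder eqe_profiles[AM_NUM_PROFILES] = {
	{1, 256}, {8, 256}, {64, 256}, {128, 256}, {256, 256},
};

/* Hypothetical per-ring state for a driver adopting the library. */
struct my_ring {
	/* Item 1: counters the library samples. */
	uint16_t event_ctr;  /* interrupts seen on this ring */
	uint64_t rx_packets; /* packets received on this ring */
	uint64_t rx_bytes;   /* bytes received on this ring */

	/* Coalescing values most recently applied to this ring. */
	uint16_t coal_ticks;
	uint16_t coal_bufs;
};

/* Called from the interrupt handler. */
static void my_ring_account_irq(struct my_ring *r)
{
	r->event_ctr++;
}

/* Called from the receive path; an aggregate of 0 counts as one packet. */
static void my_ring_account_rx(struct my_ring *r, unsigned int pkts,
			       unsigned int len)
{
	r->rx_packets += pkts ? pkts : 1;
	r->rx_bytes += len;
}

/* Item 2: apply the coalesce settings of the chosen profile to one ring. */
static int my_ring_set_coal(struct my_ring *r, unsigned int profile_ix)
{
	assert(r != NULL);

	if (profile_ix >= AM_NUM_PROFILES)
		return -EINVAL;

	/* A real driver would send these to the device here. */
	r->coal_ticks = eqe_profiles[profile_ix].usec;
	r->coal_bufs = eqe_profiles[profile_ix].pkts;
	return 0;
}
```

Keeping the counters and the applied settings in the ring structure is
what lets each queue be tuned independently of the others.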

My main reason for making this an RFC is that I would like verification
from Mellanox that the performance of their driver does not change in an
unintended way.  I did some basic testing (netperf) and did not note a
statistically significant change in throughput or CPU utilization before
and after this set.

Andy Gospodarek (9):
  mlx5_en: move interrupt moderation structs to new file
  mlx5_en: move interrupt moderation forward declarations
  mlx5_en: remove rq references in mlx5e_rx_am
  mlx5_en: move AM logic enums
  mlx5_en: move generic functions to new file
  mlx5_en: rename en_rx_am.h to net_rx_am.h
  mlx5_en: remove Mellanox references in AM code
  net: move adaptive interrpt coalescing code to lib/
  bnxt_en: add support for software adaptive interrupt moderation

 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  51 
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   7 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c|  32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  43 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 298 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +-
 include/linux/mlx5/mlx5_ifc.h  |   6 -
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 15 files changed, 558 insertions(+), 365 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

-- 
2.7.4



[RFC 0/9] net: create adaptive software irq moderation library

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek <go...@broadcom.com>

This RFC converts the adaptive interrupt moderation library from the
mlx5_en driver into a library so it can be used by any driver.  The last
patch in this set adds support for interrupt moderation in the bnxt_en
driver.

The main purpose of this code in the mlx5_en driver is to allow an
administrator to make sure that default coalesce settings are optimized
for low latency, but quickly adapt to handle high throughput traffic and
optimize how many packets are received during each napi poll.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
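
As a rough illustration of the two bullets above, here is a userspace
sketch. It is not the code from this series: am_ring_state, am_sample,
set_coalesce, the profile table and both thresholds are all made-up names
and values, and the step-up/step-down rule is a deliberately simplified
stand-in for the real heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coalesce profile: wait time and frame budget per interrupt. */
struct am_profile {
	uint16_t usecs;
	uint16_t frames;
};

/* Illustrative values only, ordered from low latency to high throughput. */
static const struct am_profile am_profiles[] = {
	{ 1, 64 }, { 8, 64 }, { 32, 64 }, { 64, 128 }, { 128, 256 },
};
#define AM_NUM_PROFILES (sizeof(am_profiles) / sizeof(am_profiles[0]))

/* Assumed thresholds, in packets per sampling window. */
#define AM_HIGH_PKTS 1000
#define AM_LOW_PKTS   100

/* First bullet: state a driver would embed in its ring struct. */
struct am_ring_state {
	uint64_t prev_pkts;	/* running packet count at the last sample */
	size_t profile;		/* current index into am_profiles */
	/* Second bullet: driver hook that actually programs the settings. */
	void (*set_coalesce)(void *ring, const struct am_profile *p);
	void *ring;
};

/* Called once per sampling window with the ring's running packet count. */
static void am_sample(struct am_ring_state *s, uint64_t pkts)
{
	uint64_t delta = pkts - s->prev_pkts;
	size_t next = s->profile;

	assert(s->set_coalesce != NULL);
	s->prev_pkts = pkts;

	if (delta >= AM_HIGH_PKTS && next + 1 < AM_NUM_PROFILES)
		next++;		/* busy window: moderate more */
	else if (delta < AM_LOW_PKTS && next > 0)
		next--;		/* quiet window: favor latency */

	if (next != s->profile) {
		s->profile = next;
		s->set_coalesce(s->ring, &am_profiles[next]);
	}
}

/* Stand-in for a driver's hardware-programming function. */
struct fake_ring {
	struct am_profile applied;
	int calls;
};

static void fake_set_coalesce(void *ring, const struct am_profile *p)
{
	struct fake_ring *r = ring;

	r->applied = *p;
	r->calls++;
}
```

The point of the split is that the sampling logic never touches hardware;
only the hook supplied by the driver does.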

My main reason for making this an RFC is that I would like verification
from Mellanox that the performance of their driver does not change in an
unintended way.  I did some basic testing (netperf) and did not note a
statistically significant change in throughput or CPU utilization before
and after this set.  

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch.

Andy Gospodarek (9):
  mlx5_en: move interrupt moderation structs to new file
  mlx5_en: move interrupt moderation forward delcarations
  mlx5_en: remove rq references in mlx5e_rx_am
  mlx5_en: move AM logic enums
  mlx5_en: move generic functions to new file
  mlx5_en: rename en_rx_am.h to net_rx_am.h
  mlx5_en: remove Mellanox references in AM code
  net: move adaptive interrpt coalescing code to lib/
  bnxt_en: add support for software adaptive interrupt moderation

 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  51 
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   7 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c|  32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  43 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 298 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +-
 include/linux/mlx5/mlx5_ifc.h  |   6 -
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 15 files changed, 558 insertions(+), 365 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

-- 
2.7.4



Re: [RFC 1/3] devlink: Add config parameter get/set operations

2017-10-12 Thread Andy Gospodarek
On Thu, Oct 12, 2017 at 04:03:17PM +0200, Jiri Pirko wrote:
> Thu, Oct 12, 2017 at 03:34:20PM CEST, steven.l...@broadcom.com wrote:
> >Add support for config parameter get/set commands. Initially used by
> >bnxt driver, but other drivers can use the same, or new, attributes.
> >The config_get() and config_set() operations operate as expected, but
> >note that the driver implementation of the config_set() operation can
> >indicate whether a restart is necessary for the setting to take
> >effect.
> >
> 
> First of all, I like this approach.
> 
> I would like to see this patch split into:
> 1) config-options infrastructure introduction
> 2) specific config options introductions - would be best to have it
>    per-option. We need to make sure every option is very well described
>    and its use cases explained. This is needed in order for vendors to
>    share attributes among drivers.
> 
> More nits inlined.
> 
> 
> >Signed-off-by: Steve Lin <steven.l...@broadcom.com>
> >Acked-by: Andy Gospodarek <go...@broadcom.com>
> >---
> > include/net/devlink.h|   4 +
> > include/uapi/linux/devlink.h | 108 ++
> > net/core/devlink.c   | 207 +++
> > 3 files changed, 319 insertions(+)
> >
> > static inline void *devlink_priv(struct devlink *devlink)
> >diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> >index 0cbca96..e959716 100644
> >--- a/include/uapi/linux/devlink.h
> >+++ b/include/uapi/linux/devlink.h
[...]
> >@@ -202,6 +267,49 @@ enum devlink_attr {
> > 
> > DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */
> > 
> >+/* Configuration Parameters */
> >+DEVLINK_ATTR_SRIOV_ENABLED, /* u8 */
> >+DEVLINK_ATTR_NUM_VF_PER_PF, /* u32 */
> >+DEVLINK_ATTR_MAX_NUM_PF_MSIX_VECT,  /* u32 */
> >+DEVLINK_ATTR_MSIX_VECTORS_PER_VF,   /* u32 */
> >+DEVLINK_ATTR_NPAR_NUM_PARTITIONS_PER_PORT,  /* u32 */
> >+DEVLINK_ATTR_NPAR_BW_IN_PERCENT,/* u8 */
> >+DEVLINK_ATTR_NPAR_BW_RESERVATION,   /* u8 */
> >+DEVLINK_ATTR_NPAR_BW_RESERVATION_VALID, /* u8 */
> >+DEVLINK_ATTR_NPAR_BW_LIMIT, /* u8 */
> >+DEVLINK_ATTR_NPAR_BW_LIMIT_VALID,   /* u8 */
> >+DEVLINK_ATTR_DCBX_MODE, /* u8 */
> >+DEVLINK_ATTR_RDMA_ENABLED,  /* u8 */
> >+DEVLINK_ATTR_MULTIFUNC_MODE,/* u8 */
> >+DEVLINK_ATTR_SECURE_NIC_ENABLED,/* u8 */
> >+DEVLINK_ATTR_IGNORE_ARI_CAPABILITY, /* u8 */
> >+DEVLINK_ATTR_LLDP_NEAREST_BRIDGE_ENABLED,   /* u8 */
> >+DEVLINK_ATTR_LLDP_NEAREST_NONTPMR_BRIDGE_ENABLED,   /* u8 */
> >+DEVLINK_ATTR_PME_CAPABILITY_ENABLED,/* u8 */
> >+DEVLINK_ATTR_MAGIC_PACKET_WOL_ENABLED,  /* u8 */
> >+DEVLINK_ATTR_EEE_PWR_SAVE_ENABLED,  /* u8 */
> >+DEVLINK_ATTR_AUTONEG_PROTOCOL,  /* u8 */
> >+DEVLINK_ATTR_MEDIA_AUTO_DETECT, /* u8 */
> >+DEVLINK_ATTR_PHY_SELECT,/* u8 */
> >+DEVLINK_ATTR_PRE_OS_LINK_SPEED_D0,  /* u8 */
> >+DEVLINK_ATTR_PRE_OS_LINK_SPEED_D3,  /* u8 */
> >+DEVLINK_ATTR_MBA_ENABLED,   /* u8 */
> >+DEVLINK_ATTR_MBA_BOOT_TYPE, /* u8 */
> >+DEVLINK_ATTR_MBA_DELAY_TIME,/* u32 */
> >+DEVLINK_ATTR_MBA_SETUP_HOT_KEY, /* u8 */
> >+DEVLINK_ATTR_MBA_HIDE_SETUP_PROMPT, /* u8 */
> >+DEVLINK_ATTR_MBA_BOOT_RETRY_COUNT,  /* u32 */
> >+DEVLINK_ATTR_MBA_VLAN_ENABLED,  /* u8 */
> >+DEVLINK_ATTR_MBA_VLAN_TAG,  /* u16 */
> >+DEVLINK_ATTR_MBA_BOOT_PROTOCOL, /* u8 */
> >+DEVLINK_ATTR_MBA_LINK_SPEED,/* u8 */
> 
> Okay, I think it is about the time we should start thinking about
> putting these new config attributes under a nested attribute. What do you
> think?
> 

Steve and I actually had a similar discussion yesterday when I was doing
a final review of the patches.

My only objection to nesting was coming up with a way to describe these
functions that made them seem different than existing configuration
options.  In the case of the hardware we are trying to support, these
are all permanent config options, so we would call them
DEVLINK_ATTR_NVRAM or DEVLINK_ATTR_PERM.  Does that seem reasonable to
others?


Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access

2017-09-28 Thread Andy Gospodarek
On Thu, Sep 28, 2017 at 1:59 AM, Waskiewicz Jr, Peter
<peter.waskiewicz...@intel.com> wrote:
> On 9/26/17 10:21 AM, Andy Gospodarek wrote:
>> On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:
>>> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:
>>> [...]
>>>> First, thanks for this detailed description.  It was helpful to read
>>>> along with the patches.
>>>>
>>>> My only concern about this area being generic is that you are now in a
>>>> state where any bpf program must know about all the bpf programs in the
>>>> receive pipeline before it can properly parse what is stored in the
>>>> meta-data and add it to an skb (or perform any other action).
>>>> Especially if each program adds its own meta-data along the way.
>>>>
>>>> Maybe this isn't a big concern based on the number of users of this
>>>> today, but it just starts to seem like a concern as there are these
>>>> hints being passed between layers that are challenging to track due to a
>>>> lack of a standard format for passing data between.
>>>
>>> Btw, we do have similar kind of programmable scratch buffer also today
>>> wrt skb cb[] that you can program from tc side, the perf ring buffer,
>>> which doesn't have any fixed layout for the slots, or a per-cpu map
>>> where you can transfer data between tail calls for example, then tail
>>> calls themselves that need to coordinate, or simply mangling of packets
>>> itself if you will, but more below to your use case ...
>>>
>>>> The main reason I bring this up is that Michael and I had discussed and
>>>> designed a way for drivers to communicate between each other that rx
>>>> resources could be freed after a tx completion on an XDP_REDIRECT
>>>> action.  Much like this code, it involved adding a new element to
>>>> struct xdp_md that could point to the important information.  Now that
>>>> there is a generic way to handle this, it would seem nice to be able to
>>>> leverage it, but I'm not sure how reliable this meta-data area would be
>>>> without the ability to mark it in some manner.
>>>>
>>>> For additional background, the minimum amount of data needed in the case
>>>> Michael and I were discussing was really 2 words.  One to serve as a
>>>> pointer to an rx_ring structure and one to have a counter to the rx
>>>> producer entry.  This data could be accessed by the driver processing the
>>>> tx completions and callback to the driver that received the frame off the
>>>> wire to perform any needed processing.  (For those curious this would also
>>>> require a new callback/netdev op to act on this data stored in the XDP
>>>> buffer.)
>>>
>>> What you describe above doesn't seem to be fitting to the use-case of
>>> this set, meaning the area here is fully programmable out of the BPF
>>> program, the infrastructure you're describing is some sort of means of
>>> communication between drivers for the XDP_REDIRECT, and should be
>>> outside of the control of the BPF program to mangle.
>>
>> OK, I understand that perspective.  I think saying this is really meant
>> as a BPF<->BPF communication channel for now is fine.
>>
>>> You could probably reuse the base infra here and make a part of that
>>> inaccessible for the program with some sort of a fixed layout, but I
>>> haven't seen your code yet to be able to fully judge. Intention here
>>> is to allow for programmability within the BPF prog in a generic way,
>>> such that based on the use-case it can be populated in specific ways
>>> and propagated to the skb w/o having to define a fixed layout and
>>> bloat xdp_buff all the way to an skb while still retaining all the
>>> flexibility.
>>
>> Some level of reuse might be proper, but I'd rather it be explicit for
>> my use since it's not exclusively something that will need to be used by
>> a BPF prog, but rather the driver.  I'll produce some patches this week
>> for reference.
>
> Sorry for chiming in late, I've been offline.
>
> We're looking to add some functionality from driver to XDP inside this
> xdp_buff->data_meta region.  We want to assign it to an opaque
> structure, that would be specific per driver (think of a flex descriptor
> coming out of the hardware).  We'd like to pass these offloaded
> computations into XDP programs to help accelerate them, such as packet
> type, where headers are loc

Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access

2017-09-26 Thread Andy Gospodarek
On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:
> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:
> [...]
> > First, thanks for this detailed description.  It was helpful to read
> > along with the patches.
> > 
> > My only concern about this area being generic is that you are now in a
> > state where any bpf program must know about all the bpf programs in the
> > receive pipeline before it can properly parse what is stored in the
> > meta-data and add it to an skb (or perform any other action).
> > Especially if each program adds its own meta-data along the way.
> > 
> > Maybe this isn't a big concern based on the number of users of this
> > today, but it just starts to seem like a concern as there are these
> > hints being passed between layers that are challenging to track due to a
> > lack of a standard format for passing data between.
> 
> Btw, we do have similar kind of programmable scratch buffer also today
> wrt skb cb[] that you can program from tc side, the perf ring buffer,
> which doesn't have any fixed layout for the slots, or a per-cpu map
> where you can transfer data between tail calls for example, then tail
> calls themselves that need to coordinate, or simply mangling of packets
> itself if you will, but more below to your use case ...
> 
> > The main reason I bring this up is that Michael and I had discussed and
> > designed a way for drivers to communicate between each other that rx
> > resources could be freed after a tx completion on an XDP_REDIRECT
> > action.  Much like this code, it involved adding a new element to
> > struct xdp_md that could point to the important information.  Now that
> > there is a generic way to handle this, it would seem nice to be able to
> > leverage it, but I'm not sure how reliable this meta-data area would be
> > without the ability to mark it in some manner.
> > 
> > For additional background, the minimum amount of data needed in the case
> > Michael and I were discussing was really 2 words.  One to serve as a
> > pointer to an rx_ring structure and one to have a counter to the rx
> > producer entry.  This data could be accessed by the driver processing the
> > tx completions and callback to the driver that received the frame off the
> > wire to perform any needed processing.  (For those curious this would also
> > require a new callback/netdev op to act on this data stored in the XDP
> > buffer.)
> 
> What you describe above doesn't seem to be fitting to the use-case of
> this set, meaning the area here is fully programmable out of the BPF
> program, the infrastructure you're describing is some sort of means of
> communication between drivers for the XDP_REDIRECT, and should be
> outside of the control of the BPF program to mangle.

OK, I understand that perspective.  I think saying this is really meant
as a BPF<->BPF communication channel for now is fine.

> You could probably reuse the base infra here and make a part of that
> inaccessible for the program with some sort of a fixed layout, but I
> haven't seen your code yet to be able to fully judge. Intention here
> is to allow for programmability within the BPF prog in a generic way,
> such that based on the use-case it can be populated in specific ways
> and propagated to the skb w/o having to define a fixed layout and
> bloat xdp_buff all the way to an skb while still retaining all the
> flexibility.

Some level of reuse might be proper, but I'd rather it be explicit for
my use since it's not exclusively something that will need to be used by
a BPF prog, but rather the driver.  I'll produce some patches this week
for reference.



Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access

2017-09-25 Thread Andy Gospodarek
On Mon, Sep 25, 2017 at 02:25:51AM +0200, Daniel Borkmann wrote:
> This work enables generic transfer of metadata from XDP into skb. The
> basic idea is that we can make use of the fact that the resulting skb
> must be linear and already comes with a larger headroom for supporting
> bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
> on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
> for adjusting a new pointer called xdp->data_meta. Thus, the packet has
> a flexible and programmable room for meta data, followed by the actual
> packet data. struct xdp_buff is therefore laid out that we first point
> to data_hard_start, then data_meta directly prepended to data followed
> by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
> account whether we have meta data already prepended and if so, memmove()s
> this along with the given offset provided there's enough room.
> 
> xdp->data_meta is optional and programs are not required to use it. The
> rationale is that when we process the packet in XDP (e.g. as DoS filter),
> we can push further meta data along with it for the XDP_PASS case, and
> give the guarantee that a clsact ingress BPF program on the same device
> can pick this up for further post-processing. Since we work with skb
> there, we can also set skb->mark, skb->priority or other skb meta data
> out of BPF, thus having this scratch space generic and programmable
> allows for more flexibility than defining a direct 1:1 transfer of
> potentially new XDP members into skb (it's also more efficient as we
> don't need to initialize/handle each of such new members).  The facility
> also works together with GRO aggregation. The scratch space at the head
> of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
> yet supporting xdp->data_meta can simply be set up with xdp->data_meta
> as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
> such that the subsequent match against xdp->data for later access is
> guaranteed to fail.
> 
> The verifier treats xdp->data_meta/xdp->data the same way as we treat
> xdp->data/xdp->data_end pointer comparisons. The requirement for doing
> the compare against xdp->data is that it hasn't been modified from its
> original address we got from ctx access. It may have a range marking
> already from prior successful xdp->data/xdp->data_end pointer comparisons
> though.

First, thanks for this detailed description.  It was helpful to read
along with the patches.

My only concern about this area being generic is that you are now in a
state where any bpf program must know about all the bpf programs in the
receive pipeline before it can properly parse what is stored in the
meta-data and add it to an skb (or perform any other action).
Especially if each program adds its own meta-data along the way.

Maybe this isn't a big concern based on the number of users of this
today, but it just starts to seem like a concern as there are these
hints being passed between layers that are challenging to track due to a
lack of a standard format for passing data between.

The main reason I bring this up is that Michael and I had discussed and
designed a way for drivers to communicate between each other that rx
resources could be freed after a tx completion on an XDP_REDIRECT
action.  Much like this code, it involved adding a new element to
struct xdp_md that could point to the important information.  Now that
there is a generic way to handle this, it would seem nice to be able to
leverage it, but I'm not sure how reliable this meta-data area would be
without the ability to mark it in some manner.

For additional background, the minimum amount of data needed in the case
Michael and I were discussing was really 2 words.  One to serve as a
pointer to an rx_ring structure and one to have a counter to the rx
producer entry.  This data could be accessed by the driver processing the
tx completions and callback to the driver that received the frame off the wire
to perform any needed processing.  (For those curious this would also require a
new callback/netdev op to act on this data stored in the XDP buffer.)

IIUC, I could use this meta_data area to store this information, but would it
also be useful to create some type field/marker that could also be stored in
the meta_data to indicate what type of information is there?  I hate to propose
such a thing as it may add unneeded complexity, but I just wanted to make sure
to say something before it was too late as there may be more users of this
right away.  :-)
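
To make the type field/marker idea concrete, here is a small userspace
model.  Everything in it is hypothetical: struct pkt_buf only mimics the
pointers described for struct xdp_buff, adjust_meta() only models the
limits quoted above for bpf_xdp_adjust_meta() (multiple of 4 bytes, at
most 32), and the tag values and struct rx_return_meta layout are invented
for illustration.  On common ABIs the record is 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_MAX_LEN 32	/* limit quoted in the patch description */

/* Simplified stand-in for the pointers kept in struct xdp_buff. */
struct pkt_buf {
	unsigned char *data_hard_start;
	unsigned char *data_meta;
	unsigned char *data;
	unsigned char *data_end;
};

/* Hypothetical type tags; nothing like this exists in the posted series. */
enum meta_type {
	META_TYPE_RX_RETURN = 1,
};

/* Hypothetical tagged record: 4-byte header plus the two words discussed. */
struct rx_return_meta {
	uint16_t type;		/* enum meta_type */
	uint16_t len;		/* total record length in bytes */
	uint32_t rx_prod;	/* rx producer index of the buffer */
	uint64_t rx_ring;	/* cookie identifying the rx_ring */
};

/*
 * Model of the described rules: a negative delta grows the area in front
 * of data; the result must fit in the headroom, be a multiple of 4 bytes
 * and be at most META_MAX_LEN bytes.
 */
static int adjust_meta(struct pkt_buf *b, int delta)
{
	ptrdiff_t headroom = b->data - b->data_hard_start;
	ptrdiff_t new_len = (b->data - b->data_meta) - delta;

	if (new_len < 0 || new_len > headroom)
		return -1;
	if (new_len > META_MAX_LEN || (new_len & 3))
		return -1;
	b->data_meta = b->data - new_len;
	return 0;
}

/* Prepend a tagged record to the meta area. */
static int meta_put_rx_return(struct pkt_buf *b, uint64_t ring, uint32_t prod)
{
	struct rx_return_meta m = {
		.type = META_TYPE_RX_RETURN,
		.len = sizeof(m),
		.rx_prod = prod,
		.rx_ring = ring,
	};

	assert(sizeof(m) % 4 == 0);
	if (adjust_meta(b, -(int)sizeof(m)))
		return -1;
	memcpy(b->data_meta, &m, sizeof(m));
	return 0;
}

/* Only trust the area if the tag and length match what is expected. */
static int meta_get_rx_return(const struct pkt_buf *b,
			      struct rx_return_meta *out)
{
	if ((size_t)(b->data - b->data_meta) < sizeof(*out))
		return -1;
	memcpy(out, b->data_meta, sizeof(*out));
	if (out->type != META_TYPE_RX_RETURN || out->len != sizeof(*out))
		return -1;
	return 0;
}
```

With a tag like this, a consumer that does not recognize the type can
skip the record instead of misreading another program's data.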

> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 
> Acked-by: John Fastabend 
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   1 +
>  drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   1 +
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c|   1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  

Re: [PATCH net-next] xdp: implement xdp_redirect_map for generic XDP

2017-09-06 Thread Andy Gospodarek
On Wed, Sep 6, 2017 at 11:26 AM, Jesper Dangaard Brouer
<bro...@redhat.com> wrote:
> Using bpf_redirect_map is allowed for generic XDP programs, but the
> appropriate map lookup was never performed in xdp_do_generic_redirect().
>
> Instead the map-index is directly used as the ifindex.  For the
> xdp_redirect_map sample in SKB-mode '-S', this resulted in trying to
> send on ifindex 0 which isn't valid, resulting in getting SKB
> packets dropped.  Thus, the reported performance numbers are wrong in
> commit 24251c264798 ("samples/bpf: add option for native and skb mode
> for redirect apps") for the 'xdp_redirect_map -S' case.
>
> It might seem innocent this was lacking, but it can actually crash the
> kernel.  The potential crash is caused by not consuming redirect_info->map.
> The bpf_redirect_map helper will set this_cpu_ptr(&redirect_info)->map
> pointer, which will survive even after unloading the xdp bpf_prog and
> deallocating the devmap data-structure.  This leaves a dead map
> pointer around.  The kernel will crash when loading the xdp_redirect
> sample (in native XDP mode) as it doesn't reset map (via bpf_redirect)
> and returns XDP_REDIRECT, which will cause it to dereference the map
> pointer.

Nice catch!

Since 'net-next' is closed and this is a bugfix it seems like this is
a good candidate for 'net' right?

>
> Fixes: 6103aa96ec07 ("net: implement XDP_REDIRECT for xdp generic")
> Fixes: 24251c264798 ("samples/bpf: add option for native and skb mode for redirect apps")
> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>

Acked-by: Andy Gospodarek <a...@greyhouse.net>


> ---
>  net/core/filter.c |   29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5912c738a7b2..6a4745bf2c9f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2562,6 +2562,32 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
>  }
>  EXPORT_SYMBOL_GPL(xdp_do_redirect);
>
> +static int xdp_do_generic_redirect_map(struct net_device *dev,
> +  struct sk_buff *skb,
> +  struct bpf_prog *xdp_prog)
> +{
> +   struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> +   struct bpf_map *map = ri->map;
> +   u32 index = ri->ifindex;
> +   struct net_device *fwd;
> +   int err;
> +
> +   ri->ifindex = 0;
> +   ri->map = NULL;
> +
> +   fwd = __dev_map_lookup_elem(map, index);
> +   if (!fwd) {
> +   err = -EINVAL;
> +   goto err;
> +   }
> +   skb->dev = fwd;
> +   _trace_xdp_redirect_map(dev, xdp_prog, fwd, map, index);
> +   return 0;
> +err:
> +   _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err);
> +   return err;
> +}
> +
>  int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
> struct bpf_prog *xdp_prog)
>  {
> @@ -2571,6 +2597,9 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
> unsigned int len;
> int err = 0;
>
> +   if (ri->map)
> +   return xdp_do_generic_redirect_map(dev, skb, xdp_prog);
> +
> fwd = dev_get_by_index_rcu(dev_net(dev), index);
> ri->ifindex = 0;
> if (unlikely(!fwd)) {
>
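
For readers following along, the pattern the fix relies on can be modeled
in userspace as below.  This is only a sketch: struct redirect_state,
struct dev_map, generic_redirect() and dev_by_ifindex() are hypothetical
stand-ins, not the kernel's types or functions.  It shows the two points
from the commit message: when a map is set the stored index is a map slot
rather than an ifindex, and the state is consumed on use so that no stale
map pointer is left behind.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SLOTS 4

struct net_dev {
	unsigned int ifindex;
};

/* Hypothetical device map: slot number -> device. */
struct dev_map {
	struct net_dev *slots[MAP_SLOTS];
};

/* Model of the per-CPU redirect state set by the BPF helpers. */
struct redirect_state {
	unsigned int ifindex;	/* an ifindex, or a map slot if map is set */
	struct dev_map *map;
};

static struct net_dev *dev_by_ifindex(struct net_dev *devs, size_t n,
				      unsigned int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].ifindex == ifindex)
			return &devs[i];
	return NULL;
}

/* Returns the target device, or NULL if the lookup fails. */
static struct net_dev *generic_redirect(struct redirect_state *ri,
					struct net_dev *devs, size_t n)
{
	struct dev_map *map;
	unsigned int index;

	assert(ri != NULL);
	map = ri->map;
	index = ri->ifindex;

	/* Consume the state so no stale map pointer survives this call. */
	ri->ifindex = 0;
	ri->map = NULL;

	if (map)
		return index < MAP_SLOTS ? map->slots[index] : NULL;
	return dev_by_ifindex(devs, n, index);
}
```

In this model, slot 0 of a map can resolve to a valid device even though
ifindex 0 never does, which is the mismatch the patch description
reports for the '-S' case.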


Re: XDP redirect measurements, gotchas and tracepoints

2017-08-29 Thread Andy Gospodarek
On Tue, Aug 29, 2017 at 09:23:49AM -0700, Alexander Duyck wrote:
> On Tue, Aug 29, 2017 at 6:26 AM, Jesper Dangaard Brouer
>  wrote:
> >
> > On Mon, 28 Aug 2017 09:11:25 -0700 Alexander Duyck 
> >  wrote:
> >
> >> My advice would be to not over complicate this. My big concern with
> >> all this buffer recycling is what happens the first time somebody
> >> introduces something like mirroring? Are you going to copy the data to
> >> a new page which would be quite expensive or just have to introduce
> >> reference counts? You are going to have to deal with stuff like
> >> reference counts eventually so you might as well bite that bullet now.
> >> My advice would be to not bother with optimizing for performance right
> >> now and instead focus on just getting functionality. The approach we
> >> took in ixgbe for the transmit path should work for almost any other
> >> driver since all you are looking at is having to free the page
> >> reference which takes care of reference counting already.
> >
> > This return API is not about optimizing performance right now.  It is
> > actually about allowing us to change the underlying memory model per RX
> > queue for XDP.
> 
> I would disagree. To me this is an obvious case of premature optimization.
> 

I'm with Jesper on this.  Though it may seem to you that this is an
optimization, that is not the goal.

> > If a RX-ring is used for both SKBs and XDP, then the refcnt model is
> > still enforced.  Although a driver using the 1-packet-per-page model,
> > should be able to reuse refcnt==1 pages when returned from XDP.
> 
> Isn't this the case for all Rx on XDP enabled rings. Last I knew there
> was an option to pass packets up via an SKB if XDP_PASS is returned.
> Are you saying we need to do a special allocation path if an XDP
> program doesn't make use of XDP_PASS?

I am not proposing that a special allocation path is needed depending on the
return code from the XDP program.  I'm proposing that in a case where
the return code is XDP_REDIRECT (or really anytime the ndo_xdp_xmit
operation is called), that there should be:

(1) notification back to the driver/resource/etc that allocated the page
that resources are no longer in use.

or 

(2) common alloc/free framework used by drivers that operate on
xdp->data so that framework takes care of refcounting, etc.

My preference is (1) since it provides drivers the most flexibility in
the event that some hardware resource (rx ring buffer pointer) or
software resource (page or other chunk of memory) can be freed.
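
A minimal userspace sketch of option (1) follows, assuming a per-frame
handle that carries a return hook.  None of these names (struct
xdp_frame_handle, return_frame, rx_lend_frame, tx_complete) exist in the
kernel; they are made up to show the shape of the notification.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SIZE 4

/* Hypothetical rx ring that tracks which buffers are lent out. */
struct rx_ring {
	unsigned char lent[RX_RING_SIZE];
	unsigned int outstanding;
};

/* Hypothetical per-frame handle handed to the transmitting driver. */
struct xdp_frame_handle {
	struct rx_ring *owner;
	unsigned int slot;
	void (*return_frame)(struct xdp_frame_handle *f);
};

/* rx driver side: runs when another driver is done with the buffer. */
static void rx_return_frame(struct xdp_frame_handle *f)
{
	struct rx_ring *r = f->owner;

	assert(r->lent[f->slot]);
	r->lent[f->slot] = 0;
	r->outstanding--;
	/* A real driver could sync and repost the buffer here. */
}

/* rx driver side: hand a buffer to another device on XDP_REDIRECT. */
static int rx_lend_frame(struct rx_ring *r, unsigned int slot,
			 struct xdp_frame_handle *f)
{
	if (slot >= RX_RING_SIZE || r->lent[slot])
		return -1;
	r->lent[slot] = 1;
	r->outstanding++;
	f->owner = r;
	f->slot = slot;
	f->return_frame = rx_return_frame;
	return 0;
}

/* tx driver side: the completion path knows nothing about the rx ring. */
static void tx_complete(struct xdp_frame_handle *f)
{
	if (f->return_frame)
		f->return_frame(f);
}
```

The transmitting driver only calls the hook; what gets freed or reused
(a ring slot, a page, or something else) stays entirely up to the driver
that owns the buffer.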

> > If a RX-ring is _ONLY_ used for XDP, then the driver has the freedom to
> > implement another memory model, with the return-API.  We need to
> > experiment with the most optimal memory model.  The 1-packet-per-page
> > model is actually not the fastest, because of PCI-e bottlenecks.  With
> > HW support for packing descriptors and packets over the PCI-e bus, much
> > higher rates can be achieved.  Mellanox mlx5-Lx already have the needed HW
> > support.  And companies like NetCope also have 100G HW that does
> > similar tricks, and they even have a whitepaper[1][2] how they are
> > faster than DPDK with their NDP (Netcope Data Plane) API.
> >
> > We do need the ability/flexibility to change the RX memory model, to
> > take advantage of this new NIC hardware.
> 
> Looking over the white paper I see nothing that prevents us from using
> the same memory model we do with the Intel NICs. If anything I think
> the Intel drivers in "legacy-rx" mode could support something like
> this now, even if the hardware doesn't simply because we can get away
> with keeping the memory pseudo-pinned. My bigger concern is that we
> keep coming back to this idea that we need to have the network stack
> taking care of the 1 page per packet recycling when I really think it
> has no business being there. We either need to look at implementing
> this in the way we did in the Intel drivers where we use the reference
> counts or implement our own memory handling API like SLUB or something
> similar based on compound page destructors. I would much rather see us
> focus on getting this going with an agnostic memory model where we
> don't have to make the stack aware of where the memory came from or
> where it has to be returned to.
> 
> > [1] https://www.netcope.com/en/resources/improving-dpdk-performance
> > [2] https://www.netcope.com/en/company/press-center/press-releases/read-new-netcope-whitepaper-on-dpdk-acceleration
> 
> My only concern with something like this is the fact that it is
> optimized for a setup where the data is left in place and nothing
> extra is added. Trying to work with something like this gets more
> expensive when you have to deal with the full stack as you have to
> copy out the headers and still deal with all the skb metadata. I fully
> agree with the basic premise that writing in large blocks provides
> significant gains in throughput, specifically with small packets. 

Re: XDP redirect measurements, gotchas and tracepoints

2017-08-28 Thread Andy Gospodarek
On Mon, Aug 28, 2017 at 09:14:20AM -0700, John Fastabend wrote:
> On 08/28/2017 09:02 AM, Andy Gospodarek wrote:
> > On Fri, Aug 25, 2017 at 08:28:55AM -0700, Michael Chan wrote:
> >> On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend
> >> <john.fastab...@gmail.com> wrote:
> >>> On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
> >>>> On Thu, 24 Aug 2017 20:36:28 -0700
> >>>> Michael Chan <michael.c...@broadcom.com> wrote:
> >>>>
> >>>>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> >>>>> <bro...@redhat.com> wrote:
> >>>>>> On Tue, 22 Aug 2017 23:59:05 -0700
> >>>>>> Michael Chan <michael.c...@broadcom.com> wrote:
> >>>>>>
> >>>>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >>>>>>> <alexander.du...@gmail.com> wrote:
> >>>>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan 
> >>>>>>>> <michael.c...@broadcom.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
> >>>>>>>>> the input device, right?
> >>>>>>
> >>>>>> Yes, I would really like to see an API like this.
> >>>>>>
> >>>>>>>>
> >>>>>>>> You could, it is just added complexity. "just free the buffer" in
> >>>>>>>> ixgbe usually just amounts to one atomic operation to decrement the
> >>>>>>>> total page count since page recycling is already implemented in the
> >>>>>>>> driver. You still would have to unmap the buffer regardless of if you
> >>>>>>>> were recycling it or not so all you would save is 1.000015259 atomic
> >>>>>>>> operations per packet. The fraction is because once every 64K uses we
> >>>>>>>> have to bulk update the count on the page.
> >>>>>>>>
> >>>>>>>
> >>>>>>> If the buffer is returned to the input device, the input device can
> >>>>>>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
> >>>>>>> the input device when the buffer is returned.
> >>>>>>
> >>>>>> Yes, exactly, return to the input device. I really think we should
> >>>>>> work on a solution where we can keep the DMA mapping around.  We have
> >>>>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> >>>>>> page return call, to achieve this. (I imagine other arch's have a higher
> >>>>>> DMA overhead than Intel)
> >>>>>>
> >>>>>> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> >>>>>> splitting the page (into two packets) actually complicates things, and
> >>>>>> tie us into a page-refcnt based model.  We could get around this by
> >>>>>> each driver implementing a page-return-callback, that allow us to
> >>>>>> return the page to the input device?  Then, drivers implementing the
> >>>>>> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> >>>>>> "1" DMA-sync and reuse it in the RX queue.
> >>>>>>
> >>>>>
> >>>>> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> >>>>> redirecting to a non-intel NIC or vice versa will actually work.  It
> >>>>> sounds like the output device has to make some assumptions about how
> >>>>> the page was allocated by the input device.
> >>>>
> >>>> Yes, exactly. We are tied into a page refcnt based scheme.
> >>>>
> >>>> Besides the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
> >>>> is also tied to the RX queue size, plus how fast the pages are returned.
> >>>> This makes it very hard to tune.  As I demonstrated, default ixgbe
> >>>> settings do not work well with XDP_REDIRECT.  I needed to increase
> >>>> TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
> >>>> 10Mpps) so I also needed to increase RX-ring size.  But perf is best if
> >>>> RX-ring size is smaller, 

Re: XDP redirect measurements, gotchas and tracepoints

2017-08-28 Thread Andy Gospodarek
On Fri, Aug 25, 2017 at 08:28:55AM -0700, Michael Chan wrote:
> On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend
>  wrote:
> > On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
> >> On Thu, 24 Aug 2017 20:36:28 -0700
> >> Michael Chan  wrote:
> >>
> >>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> >>>  wrote:
> >>>> On Tue, 22 Aug 2017 23:59:05 -0700
> >>>> Michael Chan <michael.c...@broadcom.com> wrote:
> >>>>
> >>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >>>>> <alexander.du...@gmail.com> wrote:
> >>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan
> >>>>>> <michael.c...@broadcom.com> wrote:
> >>>>>>>
> >>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
> >>>>>>> the input device, right?
> >>>>
> >>>> Yes, I would really like to see an API like this.
> >>>>
> >>>>>>
> >>>>>> You could, it is just added complexity. "just free the buffer" in
> >>>>>> ixgbe usually just amounts to one atomic operation to decrement the
> >>>>>> total page count since page recycling is already implemented in the
> >>>>>> driver. You still would have to unmap the buffer regardless of if you
> >>>>>> were recycling it or not so all you would save is 1.15259 atomic
> >>>>>> operations per packet. The fraction is because once every 64K uses we
> >>>>>> have to bulk update the count on the page.
> >>>>>>
> >>>>>
> >>>>> If the buffer is returned to the input device, the input device can
> >>>>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
> >>>>> the input device when the buffer is returned.
> >>>>
> >>>> Yes, exactly, return to the input device. I really think we should
> >>>> work on a solution where we can keep the DMA mapping around.  We have
> >>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> >>>> page return call, to achieve this. (I imagine other archs have a higher
> >>>> DMA overhead than Intel)
> >>>>
> >>>> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> >>>> splitting the page (into two packets) actually complicates things, and
> >>>> ties us into a page-refcnt based model.  We could get around this by
> >>>> each driver implementing a page-return-callback, that allows us to
> >>>> return the page to the input device?  Then, drivers implementing the
> >>>> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> >>>> "1" DMA-sync and reuse it in the RX queue.
> >>>>
> >>>
> >>> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> >>> redirecting to a non-intel NIC or vice versa will actually work.  It
> >>> sounds like the output device has to make some assumptions about how
> >>> the page was allocated by the input device.
> >>
> >> Yes, exactly. We are tied into a page refcnt based scheme.
> >>
> >> Besides the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
> >> is also tied to the RX queue size, plus how fast the pages are returned.
> >> This makes it very hard to tune.  As I demonstrated, default ixgbe
> >> settings do not work well with XDP_REDIRECT.  I needed to increase
> >> TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
> >> 10Mpps) so I also needed to increase RX-ring size.  But perf is best if
> >> RX-ring size is smaller, thus two contradicting tunings are needed.
> >>
> >
> > The changes to decouple the ixgbe page recycle scheme (1pg per descriptor
> > split into two halves being the default) from the number of descriptors
> > doesn't look too bad IMO. It seems like it could be done by having some
> > extra pages allocated upfront and pulling those in when we need another
> > page.
> >
> > This would be a nice iterative step we could take on the existing API.
> >
> >>
> >>> With buffer return API,
> >>> each driver can cleanly recycle or free its own buffers properly.
> >>
> >> Yes, exactly. And RX-driver can implement a special memory model for
> >> this queue.  E.g. RX-driver can know this is a dedicated XDP RX-queue
> >> which is never used for SKBs, thus opening for new RX memory models.
> >>
> >> Another advantage of a return API.  There is also an opportunity for
> >> avoiding the DMA map on TX. As we need to know the from-device.  Thus,
> >> we can add a DMA API, where we can query if the two devices uses the
> >> same DMA engine, and can reuse the same DMA address the RX-side already
> >> knows.
> >>
> >>
> >>> Let me discuss this further with Andy to see if we can come up with a
> >>> good scheme.
> >>
> >> Sound good, looking forward to hear what you come-up with :-)
> >>
> >
> > I guess by this thread we will see a broadcom nic with redirect support
> > soon ;)
> 
> Yes, Andy actually has finished the coding for XDP_REDIRECT, but the
> buffer recycling scheme has some problems.  We can make it work for
> Broadcom to Broadcom only, but we want a better solution.

(Sorry for the radio silence I was AFK last week...)

I finished it a 

Re: [RFC] switchdev: clarify ndo_get_phys_port_name() formats

2017-07-26 Thread Andy Gospodarek
On Tue, Jul 25, 2017 at 07:34:47PM -0700, Jakub Kicinski wrote:
> On Tue, 25 Jul 2017 21:48:15 -0400, Andy Gospodarek wrote:
> > On Tue, Jul 25, 2017 at 03:26:47PM -0700, Jakub Kicinski wrote:
> > > On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:  
> > > > On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:  
> > > > > We are still in position where we can suggest uniform naming
> > > > > convention for ndo_get_phys_port_name().  switchdev.txt file
> > > > > already contained a suggestion of how to name external ports.
> > > > > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > > > > establish a format for ports of those devices as well.
> > > > > 
> > > > > Signed-off-by: Jakub Kicinski <jakub.kicin...@netronome.com>
> > > > 
> > > > This is a nice addition and I suspect there could be even more done to
> > > > update this file to cover the VF rep usage.
> > > >   
> > > > > ---
> > > > >  Documentation/networking/switchdev.txt | 14 +++---
> > > > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/Documentation/networking/switchdev.txt 
> > > > > b/Documentation/networking/switchdev.txt
> > > > > index 3e7b946dea27..7c4b6025fb4b 100644
> > > > > --- a/Documentation/networking/switchdev.txt
> > > > > +++ b/Documentation/networking/switchdev.txt
> > > > > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, 
> > > > > the device can give a unique
> > > > >  SUBSYSTEM=="net", ACTION=="add", 
> > > > > ATTR{phys_switch_id}=="", \
> > > > >   ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > > > >  
> > > > > -Suggested naming convention is "swXpYsZ", where X is the switch name 
> > > > > or ID, Y
> > > > > -is the port name or ID, and Z is the sub-port name or ID.  For 
> > > > > example, sw1p1s0
> > > > > -would be sub-port 0 on port 1 on switch 1.
> > > > > +Suggested formats of the port name returned by 
> > > > > ndo_get_phys_port_name are:
> > > > > > + - pA     for external ports;
> > > > > > + - pAsB   for split external ports;
> > > > > > + - pfC    for PF ports (so called PF representors);
> > > > > > + - pfCvfD for VF ports (so called VF representors).
> > > > > 
> > > > > I hate to clutter this up, but we might also need to add:
> > > > > 
> > > > >  - pfCsB    for split PF ports (so called PF representors);
> > > > >  - pfCsBvfD for split VF ports (so called VF representors).
> > > > 
> > > > or are we comfortable that these additions to the name for split ports
> > > > are implied?  
> > > 
> > > Hm..  What is a split PF port?  Splits happen on the physical port - see
> > > my rant on the thread this is a reply to ;)  PFs are PCIe functions,
> > > on the opposite side of the eswitch from the wires.  
> > 
> > > I'm with you that I think there is value in separate netdevs to
> > > represent "PFs, VFs and external ports/MACs" -- particularly for the
> > > use-case where you want to create rules to control PF<->VF traffic.
> > > 
> > > So while I'm not saying it is a _great_ idea to support such a thing as
> > > port-splitting of PFs, I suggested this addition as I'm not willing to
> > > restrict such a design/implementation if a vendor or customer desired it.
> > > It seemed useful to provide some guidance on how to name them -- even if
> > > we do not like them.  :-)
> 
> If I understand you correctly split PF would be a situation where
> device has multiple port instances on the PCIe PF side?  IOW switch sees
> multiple endpoints on the PF side?  Let me attempt an ASCII diagram :)
> 
>
> HOST A ||  HOST B  
>||  
> PF A   | V | V

Re: [RFC] switchdev: clarify ndo_get_phys_port_name() formats

2017-07-25 Thread Andy Gospodarek
On Tue, Jul 25, 2017 at 03:26:47PM -0700, Jakub Kicinski wrote:
> On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:
> > On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:
> > > We are still in position where we can suggest uniform naming
> > > convention for ndo_get_phys_port_name().  switchdev.txt file
> > > already contained a suggestion of how to name external ports.
> > > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > > establish a format for ports of those devices as well.
> > > 
> > > Signed-off-by: Jakub Kicinski <jakub.kicin...@netronome.com>  
> > 
> > This is a nice addition and I suspect there could be even more done to
> > update this file to cover the VF rep usage.
> > 
> > > ---
> > >  Documentation/networking/switchdev.txt | 14 +++---
> > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/Documentation/networking/switchdev.txt 
> > > b/Documentation/networking/switchdev.txt
> > > index 3e7b946dea27..7c4b6025fb4b 100644
> > > --- a/Documentation/networking/switchdev.txt
> > > +++ b/Documentation/networking/switchdev.txt
> > > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the 
> > > device can give a unique
> > >  SUBSYSTEM=="net", ACTION=="add", 
> > > ATTR{phys_switch_id}=="", \
> > >   ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > >  
> > > -Suggested naming convention is "swXpYsZ", where X is the switch name or 
> > > ID, Y
> > > -is the port name or ID, and Z is the sub-port name or ID.  For example, 
> > > sw1p1s0
> > > -would be sub-port 0 on port 1 on switch 1.
> > > +Suggested formats of the port name returned by ndo_get_phys_port_name 
> > > are:
> > > > + - pA     for external ports;
> > > > + - pAsB   for split external ports;
> > > > + - pfC    for PF ports (so called PF representors);
> > > > + - pfCvfD for VF ports (so called VF representors).
> > > 
> > > I hate to clutter this up, but we might also need to add:
> > > 
> > >  - pfCsB    for split PF ports (so called PF representors);
> > >  - pfCsBvfD for split VF ports (so called VF representors).
> > 
> > or are we comfortable that these additions to the name for split ports
> > are implied?
> 
> Hm..  What is a split PF port?  Splits happen on the physical port - see
> my rant on the thread this is a reply to ;)  PFs are PCIe functions,
> on the opposite side of the eswitch from the wires.

I'm with you that I think there is value in separate netdevs to
represent "PFs, VFs and external ports/MACs" -- particularly for the
use-case where you want to create rules to control PF<->VF traffic.

So while I'm not saying it is a _great_ idea to support such a thing as
port-splitting of PFs, I suggested this addition as I'm not willing to restrict
such a design/implementation if a vendor or customer desired it.  It seemed
useful to provide some guidance on how to name them -- even if we do not
like them.  :-)



Re: [RFC] switchdev: clarify ndo_get_phys_port_name() formats

2017-07-25 Thread Andy Gospodarek
On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:
> We are still in position where we can suggest uniform naming
> convention for ndo_get_phys_port_name().  switchdev.txt file
> already contained a suggestion of how to name external ports.
> Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> establish a format for ports of those devices as well.
> 
> Signed-off-by: Jakub Kicinski 

This is a nice addition and I suspect there could be even more done to
update this file to cover the VF rep usage.

> ---
>  Documentation/networking/switchdev.txt | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/switchdev.txt 
> b/Documentation/networking/switchdev.txt
> index 3e7b946dea27..7c4b6025fb4b 100644
> --- a/Documentation/networking/switchdev.txt
> +++ b/Documentation/networking/switchdev.txt
> @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the 
> device can give a unique
>  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="", \
>   ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
>  
> -Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> -is the port name or ID, and Z is the sub-port name or ID.  For example, 
> sw1p1s0
> -would be sub-port 0 on port 1 on switch 1.
> +Suggested formats of the port name returned by ndo_get_phys_port_name are:
> + - pA     for external ports;
> + - pAsB   for split external ports;
> + - pfC    for PF ports (so called PF representors);
> + - pfCvfD for VF ports (so called VF representors).

I hate to clutter this up, but we might also need to add:

 - pfCsB    for split PF ports (so called PF representors);
 - pfCsBvfD for split VF ports (so called VF representors).

or are we comfortable that these additions to the name for split ports
are implied?

> +Where A is the port name or ID; B is the sub-port name or ID; C is PCIe
> +Physical Function name or ID and D is PCIe Virtual Function name or ID.
> +
> +Suggested naming convention for switches is "swX", where X is the switch name
> +or ID, plus the port name.  For example, sw1p1s0 would be sub-port 0 on port 
> 1
> +on switch 1.
>  
>  Port Features
>  ^^^^^^^^^^^^^


Re: [PATCH] net: bonding: Fix transmit load balancing in balance-alb mode

2017-07-20 Thread Andy Gospodarek
On Thu, Jul 20, 2017 at 1:20 AM, Kosuke Tatsukawa <ta...@ab.jp.nec.com> wrote:
> balance-alb mode used to have transmit dynamic load balancing feature
> enabled by default.  However, transmit dynamic load balancing no longer
> works in balance-alb after commit 8b426dc54cf4 ("bonding: remove
> hardcoded value").
>
> Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
> send packets.  This function uses the parameter tlb_dynamic_lb.
> tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
> now the value is set to 0 except in balance-tlb.
>
> Re-enable transmit dynamic load balancing by initializing tlb_dynamic_lb
> for balance-alb similar to balance-tlb.
>
> Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
> Cc: sta...@vger.kernel.org

You probably should add:

Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")

Otherwise this looks reasonable to me.

Acked-by: Andy Gospodarek <a...@greyhouse.net>


> ---
>  drivers/net/bonding/bond_main.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 14ff622..181839d 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4596,7 +4596,7 @@ static int bond_check_params(struct bond_params *params)
> }
> ad_user_port_key = valptr->value;
>
> -   if (bond_mode == BOND_MODE_TLB) {
> +   if ((bond_mode == BOND_MODE_TLB) || (bond_mode == BOND_MODE_ALB)) {
> bond_opt_initstr(, "default");
> valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB),
> );
>


[PATCH net-next] samples/bpf: add option for native and skb mode for redirect apps

2017-07-17 Thread Andy Gospodarek
When testing with a driver that has both native and generic redirect support:

$ sudo ./samples/bpf/xdp_redirect -N 5 6
input: 5 output: 6
ifindex 6:4961879 pkt/s
ifindex 6:6391319 pkt/s
ifindex 6:6419468 pkt/s

$ sudo ./samples/bpf/xdp_redirect -S 5 6
input: 5 output: 6
ifindex 6:1845435 pkt/s
ifindex 6:3882850 pkt/s
ifindex 6:3893974 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -N 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:2207374 pkt/s
ifindex 6:6212869 pkt/s
ifindex 6:6286515 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -S 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:5052528 pkt/s
ifindex 6:5736631 pkt/s
ifindex 6:5739962 pkt/s

Signed-off-by: Andy Gospodarek <a...@greyhouse.net>
---
 samples/bpf/xdp_redirect_map_user.c | 50 ++---
 samples/bpf/xdp_redirect_user.c | 50 ++---
 2 files changed, 82 insertions(+), 18 deletions(-)

diff --git a/samples/bpf/xdp_redirect_map_user.c 
b/samples/bpf/xdp_redirect_map_user.c
index 0b8009a..a1ad00f 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -10,6 +10,7 @@
  * General Public License for more details.
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -17,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bpf_load.h"
 #include "bpf_util.h"
@@ -25,9 +27,11 @@
 static int ifindex_in;
 static int ifindex_out;
 
+static __u32 xdp_flags;
+
 static void int_exit(int sig)
 {
-   set_link_xdp_fd(ifindex_in, -1, 0);
+   set_link_xdp_fd(ifindex_in, -1, xdp_flags);
exit(0);
 }
 
@@ -56,20 +60,47 @@ static void poll_stats(int interval, int ifindex)
}
 }
 
-int main(int ac, char **argv)
+static void usage(const char *prog)
 {
-   char filename[256];
-   int ret, key = 0;
+   fprintf(stderr,
+   "usage: %s [OPTS] IFINDEX_IN IFINDEX_OUT\n\n"
+   "OPTS:\n"
+   "    -S    use skb-mode\n"
+   "    -N    enforce native mode\n",
+   prog);
+}
 
-   snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 
-   if (ac != 3) {
+int main(int argc, char **argv)
+{
+   const char *optstr = "SN";
+   char filename[256];
+   int ret, opt, key = 0;
+
+   while ((opt = getopt(argc, argv, optstr)) != -1) {
+   switch (opt) {
+   case 'S':
+   xdp_flags |= XDP_FLAGS_SKB_MODE;
+   break;
+   case 'N':
+   xdp_flags |= XDP_FLAGS_DRV_MODE;
+   break;
+   default:
+   usage(basename(argv[0]));
+   return 1;
+   }
+   }
+
+   if (optind == argc) {
printf("usage: %s IFINDEX_IN IFINDEX_OUT\n", argv[0]);
return 1;
}
 
-   ifindex_in = strtoul(argv[1], NULL, 0);
-   ifindex_out = strtoul(argv[2], NULL, 0);
+   ifindex_in = strtoul(argv[optind], NULL, 0);
+   ifindex_out = strtoul(argv[optind + 1], NULL, 0);
+   printf("input: %d output: %d\n", ifindex_in, ifindex_out);
+
+   snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
@@ -82,8 +113,9 @@ int main(int ac, char **argv)
}
 
signal(SIGINT, int_exit);
+   signal(SIGTERM, int_exit);
 
-   if (set_link_xdp_fd(ifindex_in, prog_fd[0], 0) < 0) {
+   if (set_link_xdp_fd(ifindex_in, prog_fd[0], xdp_flags) < 0) {
printf("link set xdp fd failed\n");
return 1;
}
diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c
index 761a91d..f705a19 100644
--- a/samples/bpf/xdp_redirect_user.c
+++ b/samples/bpf/xdp_redirect_user.c
@@ -10,6 +10,7 @@
  * General Public License for more details.
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -17,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bpf_load.h"
 #include "bpf_util.h"
@@ -25,9 +27,11 @@
 static int ifindex_in;
 static int ifindex_out;
 
+static __u32 xdp_flags;
+
 static void int_exit(int sig)
 {
-   set_link_xdp_fd(ifindex_in, -1, 0);
+   set_link_xdp_fd(ifindex_in, -1, xdp_flags);
exit(0);
 }
 
@@ -56,20 +60,47 @@ static void poll_stats(int interval, int ifindex)
}
 }
 
-int main(int ac, char **argv)
+static void usage(const char *prog)
 {
-   char filename[256];
-   int ret, key = 0;
+   fprintf(stderr,
+   "usage: %s [OPTS] IFINDEX_IN IFINDEX_OUT\n\n"
+   "OPTS:\n"
+   "-Suse skb-mo
