Re: [PATCH net 0/2] IB/ipoib: ip link support

2018-01-02 Thread Erez Shitrit
On Tue, Jan 2, 2018 at 9:29 AM, Or Gerlitz  wrote:
> On Sun, Dec 31, 2017 at 2:28 PM, Or Gerlitz  wrote:
>> On Sun, Dec 31, 2017 at 1:16 PM, Denis Drozdov  wrote:
>>> IP link was broken due to the changes in IPoIB for the rdma_netdev
>>> support after commit cd565b4b51e5
>>> ("IB/IPoIB: Support acceleration options callbacks").
>>
>> you are approaching stable kernels, right? lets make sure dave is okay
>> to pick the net (touching dev.c etc) patch to stable even if the
>> offending patch didn't change anything there.
>
> Erez, Denis
>
> The offending commit went in 4.12-rc1, so the fix should be getting
> a while back. RU sure you need the net/core patch to fix that up or
> we can come up with ipoib only patch?

The preferred approach was to have two patches, one for ipoib and one for core/net.
So we can't fix this with an ipoib-only patch.

Denis, the author of the patches, is on New Year vacation; he will
close all the gaps as soon as he is back.

>
> Or.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/IPoIB: Check the headroom size

2017-04-25 Thread Erez Shitrit
On Tue, Apr 25, 2017 at 2:14 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 2:11 PM, Erez Shitrit <ere...@dev.mellanox.co.il> wrote:
>> On Tue, Apr 25, 2017 at 1:32 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>> On Tue, Apr 25, 2017 at 12:55 PM, Honggang LI <ho...@redhat.com> wrote:
>>>> From: Honggang Li <ho...@redhat.com>
>>>>
>>>> Minimal hard_header_len set by bond_compute_features is ETH_HLEN, which
>>>> is smaller than IPOIB_HARD_LEN. ipoib_hard_header should check the
>>>> size of headroom to avoid skb_under_panic.
>>>
>>> sounds terrible, ipoib bonding is supported since ~2007, thanks for
>>> reporting on that.
>>>
>>>> [  122.871493] ipoib_hard_header: skb->head= 8808179d9400, skb->data= 8808179d9420, skb_headroom= 0x20
>>>> [  123.055400] bond0: Releasing backup interface mthca_ib1
>>>> [  123.560529] bond_compute_features:1112 bond0 bond_dev->hard_header_len = 14
>>>> [  123.568822] CPU: 0 PID: 12336 Comm: ifdown-ib Not tainted 4.9.0-debug #1
>>>
>>> did you generate this trace by calling dump_stack or this is existing
>>> kernel code.
>>>
>>>> Fixes: fc791b633515 ('IB/ipoib: move back IB LL address into the hard header')
>>>
>>> this is more of WA to avoid some crash or failure but not fixing the
>>> actual problem
>>>
>>> Erez, can you comment?
>>
>> We saw that after commit fc791b633515. It happens while removing the bond
>> interface after its slaves (the ipoib interfaces) were removed.
>> At that point the bond interface sets its hard_header_len to the
>> Ethernet value (14 instead of 24). If a process that is not aware of
>> the slaves' removal, or that was in the middle of sending, tries to
>> send an (igmp) packet, the packet reaches ipoib without enough
>> headroom in the skb, and here comes the panic.
>
> thanks for the info. Is this bug there since ipoib/bonding day one
> (and hence my bug...)
> or was indeed introduced later? if later, can you explain how
> fc791b633515 introduced
> that or you only know it by bisection?

Commit fc791b633515 changed the device hard_header_len to 24, so ipoib now
requires 24 bytes of headroom in the skb; before it needed only 4. If the
skb is laid out for an Ethernet-style device it already has 14 bytes for
the hard header, which covers 4 but not 24.
So we have the issue only after that commit.
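
To make the arithmetic concrete, here is a minimal userspace sketch of the
headroom check. It is not kernel code: struct fake_skb, fake_headroom() and
can_push_header() are hypothetical names, and the constants only mirror the
numbers mentioned in this thread (14, 4 and 24).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Header sizes discussed in the thread. */
#define ETH_HLEN        14  /* Ethernet hard header */
#define IPOIB_ENCAP_LEN  4  /* IPoIB encapsulation header only */
#define IPOIB_HARD_LEN  24  /* encap header + 20-byte IB link-layer address */

/* Hypothetical stand-in for an skb: only the two pointers that matter here. */
struct fake_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the payload */
};

/* Bytes available in front of the payload. */
static size_t fake_headroom(const struct fake_skb *skb)
{
    assert(skb->data >= skb->head);
    return (size_t)(skb->data - skb->head);
}

/* True when a header of hdr_len bytes can be pushed without underflow. */
static bool can_push_header(const struct fake_skb *skb, size_t hdr_len)
{
    return fake_headroom(skb) >= hdr_len;
}
```

With 14 bytes reserved, can_push_header() accepts the old 4-byte header and
rejects the 24-byte one, which is the case that ends in skb_under_panic.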

>
>> I agree with you that this fix is a workaround, and it adds a check to
>> the data path for all packets while the panic comes from a control
>> flow. It probably should be fixed in the bonding driver.
>
> so what's your suggestion? fc791b633515 is 6m old, and it means the bug
> is in stable kernels and probably also in inbox drivers
>
> Or.


Re: [PATCH] IB/IPoIB: Check the headroom size

2017-04-25 Thread Erez Shitrit
On Tue, Apr 25, 2017 at 1:32 PM, Or Gerlitz  wrote:
> On Tue, Apr 25, 2017 at 12:55 PM, Honggang LI  wrote:
>> From: Honggang Li 
>>
>> Minimal hard_header_len set by bond_compute_features is ETH_HLEN, which
>> is smaller than IPOIB_HARD_LEN. ipoib_hard_header should check the
>> size of headroom to avoid skb_under_panic.
>
> sounds terrible, ipoib bonding is supported since ~2007, thanks for
> reporting on that.
>
>> [  122.871493] ipoib_hard_header: skb->head= 8808179d9400, skb->data= 8808179d9420, skb_headroom= 0x20
>> [  123.055400] bond0: Releasing backup interface mthca_ib1
>> [  123.560529] bond_compute_features:1112 bond0 bond_dev->hard_header_len = 14
>> [  123.568822] CPU: 0 PID: 12336 Comm: ifdown-ib Not tainted 4.9.0-debug #1
>
> did you generate this trace by calling dump_stack or this is existing
> kernel code.
>
>> Fixes: fc791b633515 ('IB/ipoib: move back IB LL address into the hard header')
>
> this is more of WA to avoid some crash or failure but not fixing the
> actual problem
>
> Erez, can you comment?

We saw that after commit fc791b633515. It happens while removing the bond
interface after its slaves (the ipoib interfaces) were removed.
At that point the bond interface sets its hard_header_len to the
Ethernet value (14 instead of 24). If a process that is not aware of
the slaves' removal, or that was in the middle of sending, tries to
send an (igmp) packet, the packet reaches ipoib without enough
headroom in the skb, and here comes the panic.

I agree with you that this fix is a workaround, and it adds a check to
the data path for all packets while the panic comes from a control
flow. It probably should be fixed in the bonding driver.

>
> Or.


Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API

2017-03-16 Thread Erez Shitrit
On Mon, Mar 13, 2017 at 10:01 PM, Jason Gunthorpe
<jguntho...@obsidianresearch.com> wrote:
> On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
>
>> diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
>> new file mode 100644
>> index ..148a5529a559
>> +++ b/include/rdma/ib_ipoib_accel_ops.h
>
> Both patches need a better naming scheme for this file..
>
> rn_opa_vnic.h
> rn_ipoib.h
>
> Maybe?
>
>> +struct rdma_netdev {
>> + void *clnt_priv;
>> +
>> + /* control functions */
>> + void (*set_id)(struct net_device *netdev, int id);
>
>> + /* IB resource allocation function, returns new UD QP */
>> + int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
>> +int *qp_num);
>
> Why can't some combination of alloc_rdma_netdev and ndo.open do this stuff?
>
>> + void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);
>
> Ditto
>
>> + /* send packet */
>> + void (*send)(struct net_device *dev, struct sk_buff *skb,
>> +  struct ipoib_ah *address, u32 dqpn, u32 dqkey);
>
>> + /* multicast */
>> + int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>> + union ib_gid *gid, u16 lid, int set_qkey);
>> + int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>> + union ib_gid *gid, u16 lid);
>
> It would make more sense to store the struct ib_device pointer in the
> struct rdma_netdev.
>
> Should 'lid' be 'mlid'?
>
>> + int qp_num;
>
> This one probably belongs in ipoib_rdma_netdev
>
>> + void *context;

The QP is part of the HW resources: it is created in the low-level
driver and used by the upper ipoib layer for a few reasons (for example,
the MAC of the ipoib interface includes the qp_num).
Now, if we want to use the ndo's init/uninit, I need to store the member
variables (qp_num and context) in the rdma_netdev; that will let me
use the ndos as is.
rdma_netdev is the one struct that belongs to both layers, ipoib and the
low-level driver.
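
As an illustration of why the upper layer needs the qp_num: the 20-byte
IPoIB link-layer address carries a flags byte, the 24-bit QPN and the
16-byte port GID. The sketch below is plain userspace C, and the helper
names ipoib_fill_hw_addr() and ipoib_hw_addr_qpn() are hypothetical, not
functions from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INFINIBAND_ALEN 20  /* IPoIB link-layer address length */

/* Fill a 20-byte IPoIB address: flags, 24-bit QPN, then the 16-byte GID. */
static void ipoib_fill_hw_addr(uint8_t addr[INFINIBAND_ALEN], uint8_t flags,
                               uint32_t qpn, const uint8_t gid[16])
{
    assert(qpn <= 0xffffff);  /* QP numbers are 24 bits wide */
    addr[0] = flags;
    addr[1] = (uint8_t)(qpn >> 16);
    addr[2] = (uint8_t)(qpn >> 8);
    addr[3] = (uint8_t)qpn;
    memcpy(&addr[4], gid, 16);
}

/* Recover the QPN from bytes 1..3 of the address. */
static uint32_t ipoib_hw_addr_qpn(const uint8_t addr[INFINIBAND_ALEN])
{
    return ((uint32_t)addr[1] << 16) | ((uint32_t)addr[2] << 8) | addr[3];
}
```

Since the address cannot be built before the QP exists, whichever layer
creates the QP has to hand the number to the layer that sets the address.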

>
> What is this? Why is something other than ipoib_priv or ipoib_dev_priv
> needed?
>
>
>>   struct ib_wq_attr *attr,
>>   u32 wq_attr_mask,
>>   struct ib_udata *udata);
>> + struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);
>
> rebase error? Not sure how this compiles
>
> Jason


Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops

2017-03-14 Thread Erez Shitrit
On Tue, Mar 14, 2017 at 6:10 PM, Jason Gunthorpe
<jguntho...@obsidianresearch.com> wrote:
> On Tue, Mar 14, 2017 at 04:53:24PM +0200, Erez Shitrit wrote:
>
>> > Why isn't this stuff in open/close?
>>
>> According to the ipoib control flows, there is a difference between
>> open/close and init/cleanup. For example, in open/close the driver
>> doesn't destroy HW resources, it just changes the state; it destroys
>> them in cleanup.
>
> So put it in mlx5_alloc_rdma_netdev then?
>
> Or ndo.init as was suggested?

I can do that. As I said in reply to your previous suggestion, I will add
the ib_device to the rdma_netdev and will use ndo.init.

>
> Or in the void (*setup)(struct net_device *)
>
>> >> + param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
>> >
>> > This is really weird, the code in mlx5i_create_netdev calls
>> > ipoib_dev_priv so it must assume the struct is a ipoib_rdma_netdev.
>>
>> It is the same approach as in the vnic/hfi
>> (https://patchwork.kernel.org/patch/9587815/)
>
> Not quite, they call alloc_netdev_mqs directly, here indirects through
> mlx5i_create_netdev which assumes a priv layout, Just drop
> param.size_base_priv and put that same calculation in
> mlx5i_create_netdev..

We share two drivers as the low-level driver; anyway, I will find
a way to do that.

>
> Jason
>


Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops

2017-03-14 Thread Erez Shitrit
On Mon, Mar 13, 2017 at 10:27 PM, Jason Gunthorpe
<jguntho...@obsidianresearch.com> wrote:
> On Mon, Mar 13, 2017 at 08:31:36PM +0200, Erez Shitrit wrote:
>
>> +struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
>> +  u8 port_num,
>> +  enum rdma_netdev_t type,
>> +  const char *name,
>> +  unsigned char name_assign_type,
>> +  void (*setup)(struct net_device *));
>> +void mlx5_free_rdma_netdev(struct net_device *netdev);
>
> Seems like OK signatures to me..
>
>> + dev->ib_dev.alloc_rdma_netdev   = mlx5_alloc_rdma_netdev;
>> + dev->ib_dev.free_rdma_netdev= mlx5_free_rdma_netdev;
>
> Since mlx5_free_rdma_netdev is empty this should just be NULL

OK,

>
>> +int mlx5_ib_dev_init(struct net_device *dev, struct ib_device *hca,
>> +  int *qp_num)
>> +{
>> + void *next_priv = ipoib_dev_priv(dev);
>> + struct rdma_netdev *rn = netdev_priv(dev);
>> + struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>> + int ret;
>> +
>> + ret = mlx5i_attach(ib_dev->mdev, next_priv);
>> + if (ret) {
>> + pr_err("Failed resources allocation for device: %s ret: %d\n",
>> +dev->name, ret);
>> + return ret;
>> + }
>> +
>> + *qp_num = rn->qp_num;
>> +
>> + pr_debug("resources allocated for device: %s\n", dev->name);
>> +
>> + return 0;
>> +}
>> +
>> +void mlx5_ib_dev_cleanup(struct net_device *dev, struct ib_device *hca)
>> +{
>> + void *next_priv = ipoib_dev_priv(dev);
>> + struct rdma_netdev *rn = netdev_priv(dev);
>> + struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>> + struct mlx5_qp_context context;
>> + int ret;
>> +
>> + /* detach qp from flow-steering by reset it */
>> + ret = mlx5_core_qp_modify(ib_dev->mdev,
>> +   MLX5_CMD_OP_2RST_QP, 0, &context,
>> +   (struct mlx5_core_qp *)rn->context);
>> + if (ret)
>> + pr_err("%s failed (ret: %d) to reset QP\n", __func__, ret);
>> +
>> + mlx5i_detach(ib_dev->mdev, next_priv);
>> +
>> + mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)rn->context);
>> +}
>
> Why isn't this stuff in open/close?

According to the ipoib control flows, there is a difference between
open/close and init/cleanup.
For example, in open/close the driver doesn't destroy HW resources,
it just changes the state; it destroys them in cleanup.
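
A small userspace sketch of that split, assuming a toy device model
(struct demo_dev and the demo_* functions are hypothetical and only
illustrate the lifecycle, they are not the ipoib or mlx5 code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device model; the names are illustrative, not kernel API. */
struct demo_dev {
    int *hw_res;   /* stands in for QPs, CQs and other HW resources */
    bool running;  /* toggled by open/close only */
};

/* init: allocate the HW resources once. */
static int demo_init(struct demo_dev *d)
{
    d->hw_res = malloc(sizeof(*d->hw_res));
    if (!d->hw_res)
        return -1;
    *d->hw_res = 0;
    d->running = false;
    return 0;
}

/* open/close: change state only, the resources stay allocated. */
static void demo_open(struct demo_dev *d)
{
    assert(d->hw_res);
    d->running = true;
}

static void demo_close(struct demo_dev *d)
{
    assert(d->hw_res);
    d->running = false;
}

/* cleanup: the only place where the resources are destroyed. */
static void demo_cleanup(struct demo_dev *d)
{
    assert(!d->running);
    free(d->hw_res);
    d->hw_res = NULL;
}
```

The device can go through open/close any number of times between one
init and one cleanup, which is why the two pairs cannot simply be merged.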

>
>> +void mlx5_ib_send(struct net_device *dev, struct sk_buff *skb,
>> +   struct ipoib_ah *address, u32 dqpn, u32 dqkey)
>> +{
>> + void *next_priv = ipoib_dev_priv(dev);
>> +
>> + mlx5i_xmit(skb, next_priv, &to_mah(address->ah)->av, dqpn, dqkey);
>
> How come the qkey is not available via ipoib_ah ?
>
> to_mah(address->ah)->av->key.qkey.qkey
>
> ?

It is; I will change the signature of that function accordingly.

>
>> +static const struct net_device_ops ipoib_netdev_default_pf = {
>
> That is a weird name for a mlx5 specific structure.

OK, will change that.

>
>> + param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
>
> This is really weird, the code in mlx5i_create_netdev calls
> ipoib_dev_priv so it must assume the struct is a ipoib_rdma_netdev.

It is the same approach as in the vnic/hfi
(https://patchwork.kernel.org/patch/9587815/)
The lower driver allocates space for the rdma_netdev.
The only struct that is known to both layers is rdma_netdev.
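
Roughly, the layout idea is one private area that starts with the shared
struct and continues with data only the lower driver understands. The
sketch below is userspace C with hypothetical, trimmed-down types
(demo_rdma_netdev, demo_lower_priv, demo_priv_area); the real structs in
the patches have more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down version of the shared struct. */
struct demo_rdma_netdev {
    void *clnt_priv;
    int qp_num;
};

/* Stands in for the low-level driver's private data. */
struct demo_lower_priv {
    int channels;
};

/* One allocation: the shared struct first, the driver-private part after it. */
struct demo_priv_area {
    struct demo_rdma_netdev rn;
    struct demo_lower_priv lower;
};

/* Lower driver: allocates the whole area, zero-initialized. */
static void *demo_alloc_priv(void)
{
    return calloc(1, sizeof(struct demo_priv_area));
}

/* Upper layer: only knows that the area starts with the shared struct. */
static struct demo_rdma_netdev *demo_rn(void *priv)
{
    return priv;
}

/* Lower layer: finds its own data right behind the shared struct. */
static struct demo_lower_priv *demo_lower(void *priv)
{
    return (struct demo_lower_priv *)
        ((char *)priv + offsetof(struct demo_priv_area, lower));
}
```

The upper layer never needs the definition of demo_lower_priv, which is
the property the size_base_priv parameter is trying to preserve.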

>
>> + /* set func pointers */
>> + rn = netdev_priv(dev);
>> + rn->qp_num = qp->qpn;
>> + rn->context = qp;
>
> No for using context.. You need your own driver priv, like this:
>
> struct mlx4_rn_priv
> {
> struct mlx5e_priv priv;
> struct mlx5_core_qp *qp;
> };

OK, will try to fix it. (I have a priv which is shared with the en
driver, so I don't want to mix it with IB objects like the qp; I will
find a solution for that, thanks.)



>
> Jason


Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks

2017-03-14 Thread Erez Shitrit
On Tue, Mar 14, 2017 at 8:35 AM, Vishwanathapura, Niranjana
<niranjana.vishwanathap...@intel.com> wrote:
> On Mon, Mar 13, 2017 at 08:31:16PM +0200, Erez Shitrit wrote:
>>
>> +static struct net_device *ipoib_create_netdev_default(struct ib_device *hca,
>> + const char *name,
>> + void (*setup)(struct net_device *))
>> {
>> struct net_device *dev;
>> +   struct rdma_netdev *rn;
>>
>> -   dev = alloc_netdev((int)sizeof(struct ipoib_dev_priv), name,
>> -  NET_NAME_UNKNOWN, ipoib_setup);
>> +   dev = alloc_netdev((int)sizeof(struct ipoib_rdma_netdev),
>> +  name,
>> +  NET_NAME_UNKNOWN, setup);
>> if (!dev)
>> return NULL;
>>
>> -   return netdev_priv(dev);
>> +   rn = netdev_priv(dev);
>> +
>> +   rn->ib_dev_init = ipoib_dev_init_default;
>> +   rn->ib_dev_cleanup = ipoib_dev_uninit_default;
>> +   rn->send = ipoib_send;
>> +   rn->attach_mcast = ipoib_mcast_attach;
>> +   rn->detach_mcast = ipoib_mcast_detach;
>> +
>> +   dev->netdev_ops = &ipoib_netdev_default_pf;
>> +
>
>
> Probably no need to set netdev_ops here as it gets overwritten.

No, it is switched, and used.

>
>
>> +   return dev;
>> +}
>> +
>> +struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
>> +   const char *name)
>> +{
>> +   struct net_device *dev;
>> +   struct ipoib_dev_priv *priv;
>> +   struct rdma_netdev *rn;
>> +
>> +   priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> +   if (!priv) {
>> +   pr_err("%s failed allocting priv\n", __func__);
>> +   return NULL;
>> +   }
>> +
>> +   if (!hca->alloc_rdma_netdev)
>> +   dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
>> +   else
>> +   dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>> +name, NET_NAME_UNKNOWN,
>> +ipoib_setup_common);
>> +   if (!dev) {
>> +   kfree(priv);
>> +   return NULL;
>> +   }
>
>
> This will break ipoib on hfi1 as hfi1 will define alloc_rdma_netdev for
> OPA_VNIC type. We should probably look for a dedicated return type
> (-ENODEV?) to determine of the driver supports specified rdma netdev type.
> Or use a ib device attribute to suggest driver support ipoib rdma netdev.

Sorry, I don't understand that. We are in the ipoib driver, so the type is
RDMA_NETDEV_IPOIB. If hfi wants to implement it, it should use the same
flag, and use OPA_VNIC for vnic.


>
> Niranjana


Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API

2017-03-14 Thread Erez Shitrit
On Tue, Mar 14, 2017 at 9:01 AM, Vishwanathapura, Niranjana
 wrote:
> On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
>>>
>>> +   /* multicast */
>>> +   int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>>> +   union ib_gid *gid, u16 lid, int set_qkey);
>>> +   int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>>> +   union ib_gid *gid, u16 lid);
>>
>>
>> It would make more sense to store the struct ib_device pointer in the
>> struct rdma_netdev.
>>
>
> Agree that it shouldn't be a function parameters.
> For opa_vnic, I found it convenient to store ib_device pointer in client and
> device private structures as those will be available in most places anyhow.

Will add it to the rdma_netdev obj, as Jason suggested.
Thanks,

>
> Niranjana


[RFC v1 for accelerated IPoIB 18/25] net/mlx5e: Export open/close api for IB link

2017-03-13 Thread Erez Shitrit

The IB device can now call open or close on its net device.

TBD:
One change is waiting for the new channels API; until it is available,
the code uses an "if" instead.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 23 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 13 -
 include/linux/mlx5/driver.h   |  3 +++
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2f9242ae06f3..154cab2a301b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -934,4 +934,5 @@ int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
 
 bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv);
+bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ca1867cdce48..24efc8ccc075 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2269,7 +2269,9 @@ int mlx5e_open_locked(struct mlx5e_priv *priv)
}
 
mlx5e_redirect_rqts(priv);
-   mlx5e_update_carrier(priv);
+   /* only for the RFC, will use channels api when available */
+   if (MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH)
+   mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
 #ifdef CONFIG_RFS_ACCEL
priv->netdev->rx_cpu_rmap = priv->mdev->rmap;
@@ -2277,7 +2279,7 @@ int mlx5e_open_locked(struct mlx5e_priv *priv)
if (priv->profile->update_stats)
queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
-   if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+   if (mlx5e_is_eswitch_vport_mngr(mdev)) {
err = mlx5e_add_sqs_fwd_rules(priv);
if (err)
goto err_close_channels;
@@ -3899,6 +3901,7 @@ static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
struct mlx5e_priv *priv = ipoib_dev_priv(netdev);
 
mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
+   priv->ppriv = NULL;
 }
 
 static int mlx5i_init_nic_rx(struct mlx5e_priv *priv)
@@ -4025,6 +4028,22 @@ void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
 }
 EXPORT_SYMBOL(mlx5i_detach);
 
+int mlx5i_open(void *vpriv)
+{
+   struct mlx5e_priv *priv = vpriv;
+
+   return mlx5e_open_locked(priv);
+}
+EXPORT_SYMBOL(mlx5i_open);
+
+int mlx5i_close(void *vpriv)
+{
+   struct mlx5e_priv *priv = vpriv;
+
+   return mlx5e_close_locked(priv);
+}
+EXPORT_SYMBOL(mlx5i_close);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
.init  = mlx5e_nic_init,
.cleanup   = mlx5e_nic_cleanup,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 21d3d8e0bab7..cbb10924 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -219,7 +219,12 @@ int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-   struct mlx5_eswitch_rep *rep = priv->ppriv;
+   struct mlx5_eswitch_rep *rep;
+
+   if (!priv->ppriv)
+   return;
+
+   rep = priv->ppriv;
 
mlx5_eswitch_sqs2vport_stop(esw, rep);
 }
@@ -323,6 +328,12 @@ bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv)
return false;
 }
 
+bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev)
+{
+   return (MLX5_CAP_GEN(mdev, vport_group_manager) &&
+   MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH);
+}
+
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
 {
struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d0060cfb2a4f..c18be51287e7 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1139,4 +1139,7 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
   struct mlx5i_create_ext_param *param);
 int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv);
 void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
+int mlx5i_close(void *vpriv);
+int mlx5i_open(void *vpriv);
+
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 13/25] net/mlx5e: Export resource creation function to be used in IB link

2017-03-13 Thread Erez Shitrit

mlx5i_attach creates the resources of the IB network device.
mlx5i_detach cleans up the resources of the IB device.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 122 +++---
 1 file changed, 87 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 87881f9ddf35..5b3c2e67607f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3844,6 +3844,54 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
mlx5_lag_remove(mdev);
 }
 
+static int mlx5n_attach_netdev_common(struct mlx5_core_dev *mdev,
+ struct mlx5e_priv *priv)
+{
+   const struct mlx5e_profile *profile;
+   struct net_device *netdev;
+   int err;
+
+   netdev = priv->netdev;
+   profile = priv->profile;
+   clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
+
+   err = profile->init_tx(priv);
+   if (err)
+   goto out;
+
+   err = mlx5e_open_drop_rq(priv);
+   if (err) {
+   mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
+   goto err_cleanup_tx;
+   }
+
+   err = profile->init_rx(priv);
+   if (err)
+   goto err_close_drop_rq;
+
+   mlx5e_create_q_counter(priv);
+
+   if (profile->enable)
+   profile->enable(priv);
+
+   rtnl_lock();
+   if (netif_running(netdev))
+   mlx5e_open(netdev);
+   netif_device_attach(netdev);
+   rtnl_unlock();
+
+   return 0;
+
+err_close_drop_rq:
+   mlx5e_close_drop_rq(priv);
+
+err_cleanup_tx:
+   profile->cleanup_tx(priv);
+
+out:
+   return err;
+}
+
 static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
   struct net_device *netdev,
   const struct mlx5e_profile *profile,
@@ -3942,6 +3990,42 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5i_create_netdev);
 
+int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv)
+{
+   struct mlx5e_priv *priv = vpriv;
+   struct net_device *netdev = priv->netdev;
+   int err;
+
+   if (netif_device_present(netdev))
+   return 0;
+
+   err = mlx5e_create_mdev_resources(mdev);
+   if (err)
+   return err;
+
+   err = mlx5n_attach_netdev_common(mdev, priv);
+   if (err) {
+   mlx5e_destroy_mdev_resources(mdev);
+   return err;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(mlx5i_attach);
+
+void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
+{
+   struct mlx5e_priv *priv = vpriv;
+   struct net_device *netdev = priv->netdev;
+
+   if (!netif_device_present(netdev))
+   return;
+
+   mlx5e_detach_netdev(mdev, netdev);
+   mlx5e_destroy_mdev_resources(mdev);
+}
+EXPORT_SYMBOL(mlx5i_detach);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
.init  = mlx5e_nic_init,
.cleanup   = mlx5e_nic_cleanup,
@@ -3996,31 +4080,17 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
-   const struct mlx5e_profile *profile;
struct net_device *netdev;
u16 max_mtu;
int err;
 
netdev = priv->netdev;
-   profile = priv->profile;
-   clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
-
-   err = profile->init_tx(priv);
-   if (err)
-   goto out;
 
-   err = mlx5e_open_drop_rq(priv);
+   err = mlx5n_attach_netdev_common(mdev, priv);
if (err) {
-   mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
-   goto err_cleanup_tx;
+   mlx5_core_err(mdev, "failed attach netdev %d\n", err);
+   return err;
}
-
-   err = profile->init_rx(priv);
-   if (err)
-   goto err_close_drop_rq;
-
-   mlx5e_create_q_counter(priv);
-   //TBD do i need to change that?
mlx5e_init_l2_addr(priv);
 
/* MTU range: 68 - hw-specific max */
@@ -4030,25 +4100,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 
mlx5e_set_dev_port_mtu(netdev);
 
-   if (profile->enable)
-   profile->enable(priv);
-
-   rtnl_lock();
-   if (netif_running(netdev))
-   mlx5e_open(netdev);
-   netif_device_attach(netdev);
-   rtnl_unlock();
-
return 0;
-
-err_close_drop_rq:
-   mlx5e_close_drop_rq(priv);
-
-err_cleanup_tx:
-   profile->cleanup_tx(priv);
-
-out:
-   return err;
 }
 
 static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 16/25] net/mlx5e: Change cleanup API in order to enable IB link

2017-03-13 Thread Erez Shitrit

1. Change the mlx5e_detach_netdev API to take the priv instead of the netdev.
2. Allow that function to be called when the rtnl_lock is already held,
as is done in the IB link.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  |  4 ++--
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 84db4761f09c..a10966df24f6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -921,7 +921,7 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
   void *ppriv);
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
+void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
  struct udp_tunnel_info *ti);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5b3c2e67607f..b91bd7a179fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4021,7 +4021,7 @@ void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
if (!netif_device_present(netdev))
return;
 
-   mlx5e_detach_netdev(mdev, netdev);
+   mlx5e_detach_netdev(mdev, priv);
mlx5e_destroy_mdev_resources(mdev);
 }
 EXPORT_SYMBOL(mlx5i_detach);
@@ -4126,18 +4126,22 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
}
 }
 
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
+void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
-   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct net_device *netdev = priv->netdev;
const struct mlx5e_profile *profile = priv->profile;
-
+   bool locked = false;
set_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
-   rtnl_lock();
+   if (!rtnl_is_locked()) {
+   rtnl_lock();
+   locked = true;
+   }
if (netif_running(netdev))
mlx5e_close(netdev);
netif_device_detach(netdev);
-   rtnl_unlock();
+   if (locked)
+   rtnl_unlock();
 
if (profile->disable)
profile->disable(priv);
@@ -4183,7 +4187,7 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv)
if (!netif_device_present(netdev))
return;
 
-   mlx5e_detach_netdev(mdev, netdev);
+   mlx5e_detach_netdev(mdev, priv);
mlx5e_destroy_mdev_resources(mdev);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 0aad28da1638..21d3d8e0bab7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -586,7 +586,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
return 0;
 
 err_detach_netdev:
-   mlx5e_detach_netdev(esw->dev, netdev);
+   mlx5e_detach_netdev(esw->dev, priv);
 
 err_destroy_netdev:
mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
@@ -601,6 +601,6 @@ void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
struct net_device *netdev = rep->netdev;
 
unregister_netdev(netdev);
-   mlx5e_detach_netdev(esw->dev, netdev);
+   mlx5e_detach_netdev(esw->dev, netdev_priv(netdev));
mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 }
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 17/25] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api

2017-03-13 Thread Erez Shitrit

Let the IB link call them directly with the relevant priv.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  4 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 24 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 31 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  4 +--
 4 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a10966df24f6..2f9242ae06f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -797,8 +797,8 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
enum mlx5e_traffic_types tt);
 
-int mlx5e_open_locked(struct net_device *netdev);
-int mlx5e_close_locked(struct net_device *netdev);
+int mlx5e_open_locked(struct mlx5e_priv *priv);
+int mlx5e_close_locked(struct mlx5e_priv *priv);
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   u32 *indirection_rqt, int len,
   int num_channels);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index bb67863aa361..0c8773718292 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -516,14 +516,14 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
if (was_opened)
-   mlx5e_close_locked(dev);
+   mlx5e_close_locked(priv);
 
priv->params.log_rq_size = log_rq_size;
priv->params.log_sq_size = log_sq_size;
priv->params.min_rx_wqes = min_rx_wqes;
 
if (was_opened)
-   err = mlx5e_open_locked(dev);
+   err = mlx5e_open_locked(priv);
 
mutex_unlock(&priv->state_lock);
 
@@ -561,7 +561,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
if (was_opened)
-   mlx5e_close_locked(dev);
+   mlx5e_close_locked(priv);
 
arfs_enabled = dev->features & NETIF_F_NTUPLE;
if (arfs_enabled)
@@ -572,7 +572,7 @@ static int mlx5e_set_channels(struct net_device *dev,
  MLX5E_INDIR_RQT_SIZE, count);
 
if (was_opened)
-   err = mlx5e_open_locked(dev);
+   err = mlx5e_open_locked(priv);
if (err)
goto out;
 
@@ -626,7 +626,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
 
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
if (was_opened && restart) {
-   mlx5e_close_locked(netdev);
+   mlx5e_close_locked(priv);
priv->params.rx_am_enabled = !!coal->use_adaptive_rx_coalesce;
}
 
@@ -655,7 +655,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
 
 out:
if (was_opened && restart)
-   err = mlx5e_open_locked(netdev);
+   err = mlx5e_open_locked(priv);
 
mutex_unlock(&priv->state_lock);
return err;
@@ -1112,12 +1112,12 @@ static int mlx5e_set_tunable(struct net_device *dev,
 
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
if (was_opened)
-   mlx5e_close_locked(dev);
+   mlx5e_close_locked(priv);
 
priv->params.tx_max_inline = val;
 
if (was_opened)
-   err = mlx5e_open_locked(dev);
+   err = mlx5e_open_locked(priv);
 
mutex_unlock(&priv->state_lock);
break;
@@ -1444,12 +1444,12 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
 
reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
if (reset)
-   mlx5e_close_locked(netdev);
+   mlx5e_close_locked(priv);
 
mlx5e_set_rx_cq_mode_params(&priv->params, rx_cq_period_mode);
 
if (reset)
-   err = mlx5e_open_locked(netdev);
+   err = mlx5e_open_locked(priv);
 
return err;
 }
@@ -1473,13 +1473,13 @@ static int set_pflag_rx_cqe_compress(struct net_device 
*netdev,
reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
 
if (reset)
-   mlx5e_close_locked(netdev);
+   mlx5e_close_locked(priv);
 
MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, enable);
priv->params.rx_cqe_compress_def = enable;
 
if (reset)
-   err = mlx5e_open_locked(netdev);
+   err = mlx5e_open_locked(priv

[RFC v1 for accelerated IPoIB 09/25] net/mlx5e: Creating and Destroying flow-steering tables for IB link

2017-03-13 Thread Erez Shitrit

New function to handle RSS table for IB link type.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h|  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 41 +
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 39f8ac849af7..f3337ec4457f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -755,6 +755,8 @@ void mlx5e_page_release(struct mlx5e_rq *rq, struct 
mlx5e_dma_info *dma_info,
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
+int mlx5i_create_flow_steering(struct mlx5e_priv *priv);
+void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_init_l2_addr(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_table(struct mlx5e_flow_table *ft);
 int mlx5e_self_test_num(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index a0e5a69402b3..c6b40003007c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -1081,6 +1081,40 @@ static void mlx5e_destroy_vlan_table(struct mlx5e_priv 
*priv)
mlx5e_destroy_flow_table(&priv->fs.vlan.ft);
 }
 
+int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
+{
+   int err;
+
+   priv->fs.ns = mlx5_get_flow_namespace(priv->mdev,
+  MLX5_FLOW_NAMESPACE_KERNEL);
+
+   if (!priv->fs.ns)
+   return -EINVAL;
+
+   err = mlx5e_arfs_create_tables(priv);
+   if (err) {
+   netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n",
+  err);
+   priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
+   }
+
+   err = mlx5e_create_ttc_table(priv);
+   if (err) {
+   netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
+  err);
+   goto err_destroy_arfs_tables;
+   }
+
+   mlx5e_ethtool_init_steering(priv);
+
+   return 0;
+
+err_destroy_arfs_tables:
+   mlx5e_arfs_destroy_tables(priv);
+
+   return err;
+}
+
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv)
 {
int err;
@@ -1141,3 +1175,10 @@ void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv)
mlx5e_arfs_destroy_tables(priv);
mlx5e_ethtool_cleanup_steering(priv);
 }
+
+void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv)
+{
+   mlx5e_destroy_ttc_table(priv);
+   mlx5e_arfs_destroy_tables(priv);
+   mlx5e_ethtool_cleanup_steering(priv);
+}
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 11/25] net/mlx5e: Refactor attach_netdev API

2017-03-13 Thread Erez Shitrit

Use the priv object instead of the netdev object; this gives the ability
to use it for the IB link.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 4 +++-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f3337ec4457f..e5c8badc38c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -918,7 +918,7 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev 
*mdev,
   const struct mlx5e_profile *profile,
   void *ppriv);
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
+int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device 
*netdev);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3db0334cdba0..0eb16ada0ae6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3993,14 +3993,14 @@ struct net_device *mlx5e_create_netdev(struct 
mlx5_core_dev *mdev,
return NULL;
 }
 
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
+int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
const struct mlx5e_profile *profile;
-   struct mlx5e_priv *priv;
+   struct net_device *netdev;
u16 max_mtu;
int err;
 
-   priv = netdev_priv(netdev);
+   netdev = priv->netdev;
profile = priv->profile;
clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
@@ -4113,7 +4113,7 @@ static int mlx5e_attach(struct mlx5_core_dev *mdev, void 
*vpriv)
if (err)
return err;
 
-   err = mlx5e_attach_netdev(mdev, netdev);
+   err = mlx5e_attach_netdev(mdev, priv);
if (err) {
mlx5e_destroy_mdev_resources(mdev);
return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 850378893b25..0aad28da1638 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -556,6 +556,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 struct mlx5_eswitch_rep *rep)
 {
struct net_device *netdev;
+   struct mlx5e_priv *priv;
int err;
 
netdev = mlx5e_create_netdev(esw->dev, &mlx5e_rep_profile, rep);
@@ -567,7 +568,8 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 
rep->netdev = netdev;
 
-   err = mlx5e_attach_netdev(esw->dev, netdev);
+   priv = netdev_priv(netdev);
+   err = mlx5e_attach_netdev(esw->dev, priv);
if (err) {
pr_warn("Failed to attach representor netdev for vport %d\n",
rep->vport);
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 22/25] net/mlx5e: New function pointer for build_rx_skb is

2017-03-13 Thread Erez Shitrit

Add a build_rx_skb function pointer to the RQ, in order to have the ability
to support the IB link with the same base code.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  6 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  7 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 10 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b6758d0b93a5..84de1ca11524 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -298,6 +298,11 @@ typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq *rq, 
struct mlx5e_rx_wqe *wqe,
 
 typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq *rq, u16 ix);
 
+typedef void (*mlx5e_fp_build_rx_skb)(struct mlx5_cqe64 *cqe,
+ u32 cqe_bcnt,
+ struct mlx5e_rq *rq,
+ struct sk_buff *skb);
+
 struct mlx5e_dma_info {
struct page *page;
dma_addr_t  addr;
@@ -367,6 +372,7 @@ struct mlx5e_rq {
mlx5e_fp_handle_rx_cqe handle_rx_cqe;
mlx5e_fp_alloc_wqe alloc_wqe;
mlx5e_fp_dealloc_wqe   dealloc_wqe;
+   mlx5e_fp_build_rx_skb  build_rx_skb;
 
unsigned long  state;
intix;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 24efc8ccc075..4dc8b21d011d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -493,6 +493,12 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, &rq->umr_mkey);
 }
 
+/* forward declaration */
+inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+  u32 cqe_bcnt,
+  struct mlx5e_rq *rq,
+  struct sk_buff *skb);
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
   struct mlx5e_rq_param *param,
   struct mlx5e_rq *rq)
@@ -538,6 +544,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
if (rq->xdp_prog)
rq->buff.map_dir = DMA_BIDIRECTIONAL;
 
+   rq->build_rx_skb = mlx5e_build_rx_skb;
switch (priv->params.rq_wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
if (mlx5e_is_vf_vport_rep(priv)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ad08e64fee1a..98546b3395df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -590,10 +590,10 @@ static inline void mlx5e_handle_csum(struct net_device 
*netdev,
rq->stats.csum_none++;
 }
 
-static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
- u32 cqe_bcnt,
- struct mlx5e_rq *rq,
- struct sk_buff *skb)
+inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+  u32 cqe_bcnt,
+  struct mlx5e_rq *rq,
+  struct sk_buff *skb)
 {
struct net_device *netdev = rq->netdev;
struct mlx5e_tstamp *tstamp = rq->tstamp;
@@ -632,7 +632,7 @@ static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq 
*rq,
 {
rq->stats.packets++;
rq->stats.bytes += cqe_bcnt;
-   mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+   rq->build_rx_skb(cqe, cqe_bcnt, rq, skb);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_sq *sq)
-- 
1.8.3.1
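
The indirection introduced by this patch can be sketched outside the kernel. The following is a minimal, self-contained illustration, assuming simplified stand-in types: every demo_* name is hypothetical and none of them is the real mlx5 structure or API. It only shows the shape of the change: the RQ carries a build_rx_skb pointer, a default is installed at RQ creation, and the completion path calls through the pointer instead of calling one fixed function.

```c
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real mlx5e_rq / sk_buff. */
struct demo_skb {
	uint32_t len;
};

struct demo_rq;

typedef void (*demo_fp_build_rx_skb)(uint32_t cqe_bcnt,
				     struct demo_rq *rq,
				     struct demo_skb *skb);

struct demo_rq {
	demo_fp_build_rx_skb build_rx_skb; /* per-RQ builder, chosen at creation */
	uint64_t packets;
	uint64_t bytes;
};

/* Ethernet-style builder: the whole completion is payload. */
static void demo_eth_build_rx_skb(uint32_t cqe_bcnt, struct demo_rq *rq,
				  struct demo_skb *skb)
{
	(void)rq;
	skb->len = cqe_bcnt;
}

#define DEMO_GRH_BYTES 40u

/* IB-style builder: drop a 40-byte GRH that precedes the payload. */
static void demo_ib_build_rx_skb(uint32_t cqe_bcnt, struct demo_rq *rq,
				 struct demo_skb *skb)
{
	(void)rq;
	skb->len = cqe_bcnt > DEMO_GRH_BYTES ? cqe_bcnt - DEMO_GRH_BYTES : 0;
}

/* Install the default builder, then override it for a non-Ethernet link. */
static void demo_create_rq(struct demo_rq *rq, int is_ib_link)
{
	rq->packets = 0;
	rq->bytes = 0;
	rq->build_rx_skb = demo_eth_build_rx_skb;
	if (is_ib_link)
		rq->build_rx_skb = demo_ib_build_rx_skb;
}

/* Completion path: account the packet, then dispatch through the pointer. */
static void demo_complete_rx_cqe(struct demo_rq *rq, uint32_t cqe_bcnt,
				 struct demo_skb *skb)
{
	rq->packets++;
	rq->bytes += cqe_bcnt;
	rq->build_rx_skb(cqe_bcnt, rq, skb);
}
```

The design trade-off is one indirect call per completion in exchange for sharing the rest of the RX path between link types.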



[RFC v1 for accelerated IPoIB 12/25] net/mlx5e: Use underlay_qpn in tis creation

2017-03-13 Thread Erez Shitrit

Enable the IB link to use the same code. By default the underlay_qp is
zero for the ETH link.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e5c8badc38c7..84db4761f09c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -222,6 +222,7 @@ struct mlx5e_params {
bool rx_am_enabled;
u32 lro_timeout;
u32 pflags;
+   u32 underlay_qpn;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -718,6 +719,7 @@ struct mlx5e_priv {
 
const struct mlx5e_profile *profile;
void  *ppriv;
+   u32   underlay_qpn;
 };
 
 void mlx5e_build_ptys2ethtool_map(void);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 0eb16ada0ae6..87881f9ddf35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2453,6 +2453,7 @@ static int mlx5e_create_tis(struct mlx5e_priv *priv, int 
tc)
 
MLX5_SET(tisc, tisc, prio, tc << 1);
MLX5_SET(tisc, tisc, transport_domain, mdev->mlx5e_res.td.tdn);
+   MLX5_SET(tisc, tisc, underlay_qpn, priv->underlay_qpn);
 
if (mlx5_lag_is_lacp_owner(mdev))
MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1);
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link

2017-03-13 Thread Erez Shitrit

Get the relevant capabilities if the device supports
ipoib_enhanced_offloads, and init the flow steering table accordingly.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |  3 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fa4edd88daf1..dd21fc557281 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1991,9 +1991,6 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
struct mlx5_flow_steering *steering;
int err = 0;
 
-   if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-   return 0;
-
err = mlx5_init_fc_stats(dev);
if (err)
return err;
@@ -2004,8 +2001,11 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
steering->dev = dev;
dev->priv.steering = steering;
 
-   if (MLX5_CAP_GEN(dev, nic_flow_table) &&
-   MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
+   if ((((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
+ (MLX5_CAP_GEN(dev, nic_flow_table))) ||
+((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+ MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)))
+   && MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
err = init_root_ns(steering);
if (err)
goto err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 5718aada6605..f95bc78b02f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -123,7 +123,8 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
-   if (MLX5_CAP_GEN(dev, nic_flow_table)) {
+   if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+   MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
if (err)
return err;
-- 
1.8.3.1
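
The nested condition added to mlx5_init_fs() is easier to check when written as a small predicate. The sketch below is an illustration only, assuming a hypothetical demo_caps structure in place of the MLX5_CAP_GEN() and MLX5_CAP_FLOWTABLE_NIC_RX() queries; it mirrors the logic in the hunk above: the root namespace is initialised when (ETH port with nic_flow_table, or IB port with ipoib_enhanced_offloads) and NIC RX flow-table support is present.

```c
#include <stdbool.h>

/* Hypothetical capability snapshot; the driver reads these from firmware. */
enum demo_port_type {
	DEMO_PORT_TYPE_IB,
	DEMO_PORT_TYPE_ETH
};

struct demo_caps {
	enum demo_port_type port_type;
	bool nic_flow_table;
	bool ipoib_enhanced_offloads;
	bool ft_support; /* NIC RX flow-table support */
};

/* True when the root flow-steering namespace should be initialised. */
static bool demo_should_init_root_ns(const struct demo_caps *c)
{
	bool eth_ok = c->port_type == DEMO_PORT_TYPE_ETH && c->nic_flow_table;
	bool ib_ok = c->port_type == DEMO_PORT_TYPE_IB &&
		     c->ipoib_enhanced_offloads;

	return (eth_ok || ib_ok) && c->ft_support;
}
```

Splitting the test into eth_ok and ib_ok keeps the parenthesis depth low, which is where the original expression is hardest to read.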



[RFC v1 for accelerated IPoIB 23/25] net/mlx5e: Change the function that checks the packet type

2017-03-13 Thread Erez Shitrit

Now we can use it for non-Ethernet type packets (like IB).
After changing the order of the skb processing, the function can now
check the skb->protocol field to see whether it is an ETH_P_IP/ETH_P_IPV6
packet.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 98546b3395df..071a6ecce720 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -549,9 +549,8 @@ static inline void mlx5e_skb_set_hash(struct mlx5_cqe64 
*cqe,
 
 static inline bool is_first_ethertype_ip(struct sk_buff *skb)
 {
-   __be16 ethertype = ((struct ethhdr *)skb->data)->h_proto;
-
-   return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6));
+   return (skb->protocol == htons(ETH_P_IP) ||
+   skb->protocol == htons(ETH_P_IPV6));
 }
 
 static inline void mlx5e_handle_csum(struct net_device *netdev,
@@ -621,8 +620,8 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 
skb->mark = be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK;
 
-   mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
skb->protocol = eth_type_trans(skb, netdev);
+   mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 }
 
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops

2017-03-13 Thread Erez Shitrit

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/Makefile |   2 +-
 drivers/infiniband/hw/mlx5/main.c   |  10 +
 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c | 289 
 3 files changed, 300 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c

diff --git a/drivers/infiniband/hw/mlx5/Makefile 
b/drivers/infiniband/hw/mlx5/Makefile
index 90ad2adc752f..0c4caa339565 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_MLX5_INFINIBAND)  += mlx5_ib.o
 
-mlx5_ib-y :=   main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o 
ib_virt.o cmd.o
+mlx5_ib-y :=   main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o 
ib_virt.o cmd.o mlx5_ipoib_ops.o
 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 4a043cf35b9a..c9bcaf2cc0c6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -72,6 +72,14 @@ enum {
MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
+u8 port_num,
+enum rdma_netdev_t type,
+const char *name,
+unsigned char name_assign_type,
+void (*setup)(struct net_device *));
+void mlx5_free_rdma_netdev(struct net_device *netdev);
+
 static enum rdma_link_layer
 mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
@@ -3436,6 +3444,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.alloc_mr= mlx5_ib_alloc_mr;
dev->ib_dev.map_mr_sg   = mlx5_ib_map_mr_sg;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;
+   dev->ib_dev.alloc_rdma_netdev   = mlx5_alloc_rdma_netdev;
+   dev->ib_dev.free_rdma_netdev= mlx5_free_rdma_netdev;
dev->ib_dev.get_port_immutable  = mlx5_port_immutable;
dev->ib_dev.get_dev_fw_str  = get_dev_fw_str;
if (mlx5_core_is_pf(mdev)) {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c 
b/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
new file mode 100644
index ..9ca2fc4fbc15
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
@@ -0,0 +1,289 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include 
+#include 
+#include "mlx5_ib.h"
+#include 
+#include 
+
+/*FIX ME*/
+#include "../../ulp/ipoib/ipoib.h"
+
+#define IB_DEFAULT_Q_KEY   0xb1b
+
+int mlx5_ib_config_ipoib_qp(struct mlx5_ib_dev *ib_dev, struct mlx5_core_qp 
*qp)
+{
+   u32 *in;
+   struct mlx5_qp_context *context;
+   int inlen;
+   void *addr_path;
+   void *qpc;
+   int ret;
+
+   inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+   in = mlx5_vzalloc(inlen);
+   if (!in)
+   return -ENOMEM;
+
+   qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+   MLX5_SET(qpc, qpc, st, MLX5_QP_ST_UD);
+   MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+   MLX5_SET(qpc, qpc, ulp_stateless_offload_mode,
+MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
+
+   addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+   MLX5_SET(ads, addr_path, port, 1);
+   MLX5_SET(ads, addr_path, grh, 1);
+
+   ret = mlx5_core_create_qp(ib_dev-&

[RFC v1 for accelerated IPoIB 21/25] net/mlx5e: Export send function for IB link type

2017-03-13 Thread Erez Shitrit

The function will be used in IB link in order to send packets.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 55 +
 include/linux/mlx5/driver.h |  5 ++-
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 22443ce778ff..fea06be30393 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -435,6 +435,61 @@ netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct 
net_device *dev)
return mlx5e_sq_xmit(sq, skb);
 }
 
+static int s_ctrl_seg = sizeof(struct mlx5_wqe_ctrl_seg);
+static int s_datagram_seg = sizeof(struct mlx5_wqe_datagram_seg);
+static int s_pad = sizeof(struct mlx5_wqe_eth_pad);
+static int s_eth_seg = sizeof(struct mlx5_wqe_eth_seg);
+static netdev_tx_t mlx5i_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb,
+struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+   struct mlx5_wq_cyc   *wq   = &sq->wq;
+   u16  pi= sq->pc & wq->sz_m1;
+   void *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+   struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
+
+   struct mlx5_wqe_ctrl_seg *ctrl_seg = wqe;
+   struct mlx5_wqe_datagram_seg *datagram_seg =
+   wqe + s_ctrl_seg;
+   struct mlx5_wqe_eth_pad *pad =
+   (void *)datagram_seg + s_datagram_seg;
+   struct mlx5_wqe_eth_seg  *ether_seg =
+   (void *)pad + s_pad;
+   struct mlx5_wqe_data_seg *data_seg;
+
+   int tot = s_ctrl_seg + s_datagram_seg + s_pad + s_eth_seg;
+
+   memset(wqe, 0, tot);
+
+   mlx5n_sq_build_datagram_seg(sq, datagram_seg, av, dqpn, dqkey, skb);
+
+   mlx5n_sq_build_ether_seg(sq, wi, ether_seg, skb);
+
+   wi->ds_cnt  = tot / MLX5_SEND_WQE_DS;
+   wi->ds_cnt += DIV_ROUND_UP(wi->ihs - sizeof(ether_seg->inline_hdr_start),
+   MLX5_SEND_WQE_DS);
+   data_seg = (struct mlx5_wqe_data_seg *)ctrl_seg + wi->ds_cnt;
+
+   if (mlx5n_sq_build_data_seg(sq, wi, data_seg, skb) < 0)
+   goto out;
+
+   mlx5n_sq_fill_ctrl_seg_and_send(sq, wi, ctrl_seg, skb, pi);
+
+out:
+   return NETDEV_TX_OK;
+}
+
+netdev_tx_t mlx5i_xmit(struct sk_buff *skb, void *p,
+  struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+   struct mlx5e_priv *priv = p;
+   struct mlx5e_sq *sq;
+
+   sq = priv->txq_to_sq_map[skb_get_queue_mapping(skb)];
+
+   return mlx5i_sq_xmit(sq, skb, av, dqpn, dqkey);
+}
+EXPORT_SYMBOL(mlx5i_xmit);
+
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 {
struct mlx5e_sq *sq;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c18be51287e7..6d2ac932d321 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1133,6 +1133,8 @@ struct mlx5i_create_ext_param {
u32 qpn;
 };
 
+struct mlx5_av;
+
 struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
   const char *name,
   void (*setup)(struct net_device *dev),
@@ -1141,5 +1143,6 @@ struct net_device *mlx5i_create_netdev(struct 
mlx5_core_dev *mdev,
 void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
 int mlx5i_close(void *vpriv);
 int mlx5i_open(void *vpriv);
-
+netdev_tx_t mlx5i_xmit(struct sk_buff *skb, void *p, struct mlx5_av *av,
+  u32 dqpn, u32 dqkey);
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1
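
The segment-count arithmetic in mlx5i_sq_xmit() can be followed with plain integers. The sketch below is a hedged illustration: the demo_* names are hypothetical, the 16-byte data-segment unit and the four units per WQE basic block are assumed values for MLX5_SEND_WQE_DS and MLX5_SEND_WQEBB_NUM_DS, and the segment sizes are passed in by the caller rather than taken from the real structures. Unlike the patch, it also guards against an inline header shorter than the inline_hdr_start field, which is an addition made here for safety.

```c
#include <stdint.h>

#define DEMO_SEND_WQE_DS	16u /* bytes per data-segment unit (assumed) */
#define DEMO_SEND_WQEBB_NUM_DS	4u  /* units per WQE basic block (assumed) */
#define DEMO_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Sizes in bytes of the fixed segments that start the WQE. */
struct demo_wqe_sizes {
	uint32_t ctrl;
	uint32_t datagram;
	uint32_t pad;
	uint32_t eth;
	uint32_t inline_hdr_start; /* inline bytes already inside the eth segment */
};

/* Count 16-byte units: fixed segments plus the inline-header overflow. */
static uint32_t demo_ds_cnt(const struct demo_wqe_sizes *s, uint32_t ihs)
{
	uint32_t tot = s->ctrl + s->datagram + s->pad + s->eth;
	uint32_t ds_cnt = tot / DEMO_SEND_WQE_DS;

	if (ihs > s->inline_hdr_start)
		ds_cnt += DEMO_DIV_ROUND_UP(ihs - s->inline_hdr_start,
					    DEMO_SEND_WQE_DS);
	return ds_cnt;
}

/* Number of WQE basic blocks consumed by ds_cnt units. */
static uint32_t demo_num_wqebbs(uint32_t ds_cnt)
{
	return DEMO_DIV_ROUND_UP(ds_cnt, DEMO_SEND_WQEBB_NUM_DS);
}
```

The data segments start right after these units, which is why the patch computes the data_seg pointer from ctrl_seg plus ds_cnt.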



[RFC v1 for accelerated IPoIB 24/25] net/mlx5e: Add support for build_rx_skb for packet from IB type

2017-03-13 Thread Erez Shitrit

New function that parses and builds the skb for IPoIB traffic.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  8 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 49 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4dc8b21d011d..3b609bcc0914 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -499,6 +499,11 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
   struct mlx5e_rq *rq,
   struct sk_buff *skb);
 
+inline void mlx5i_build_rx_skb(struct mlx5_cqe64 *cqe,
+  u32 cqe_bcnt,
+  struct mlx5e_rq *rq,
+  struct sk_buff *skb);
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
   struct mlx5e_rq_param *param,
   struct mlx5e_rq *rq)
@@ -584,6 +589,9 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
else
rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
 
+   if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
+   rq->build_rx_skb = mlx5i_build_rx_skb;
+
rq->alloc_wqe = mlx5e_alloc_rx_wqe;
rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 071a6ecce720..db3064c4b052 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -624,6 +624,55 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 }
 
+#define MLX5_IB_GRH_DGID_OFFSET 24
+#define MLX5_IB_GRH_BYTES   40
+#define MLX5_IPOIB_ENCAP_LEN4
+#define MLX5_GID_SIZE   16
+
+inline void mlx5i_build_rx_skb(struct mlx5_cqe64 *cqe,
+  u32 cqe_bcnt,
+  struct mlx5e_rq *rq,
+  struct sk_buff *skb)
+{
+   struct net_device *netdev = rq->netdev;
+   u8 *dgid;
+   u8 g;
+
+   skb_put(skb, cqe_bcnt);
+
+   g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
+   dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET;
+   if ((!g) || dgid[0] != 0xff)
+   skb->pkt_type = PACKET_HOST;
+   else if (memcmp(dgid, netdev->broadcast + 4, MLX5_GID_SIZE) == 0)
+   skb->pkt_type = PACKET_BROADCAST;
+   else
+   skb->pkt_type = PACKET_MULTICAST;
+
+   /* TODO: IB/ipoib: Allow mcast packets from other VFs
+* 68996a6e760e5c74654723eeb57bf65628ae87f4
+*/
+
+   skb_pull(skb, MLX5_IB_GRH_BYTES);
+
+   skb->protocol = *((__be16 *)(skb->data));
+
+   mlx5e_handle_csum(netdev, cqe, rq, skb, rq->priv->params.lro_en);
+
+   skb_record_rx_queue(skb, rq->ix);
+
+   if (likely(netdev->features & NETIF_F_RXHASH))
+   mlx5e_skb_set_hash(cqe, skb);
+
+   skb_reset_mac_header(skb);
+   skb_pull(skb, MLX5_IPOIB_ENCAP_LEN);
+
+   ++netdev->stats.rx_packets;
+   netdev->stats.rx_bytes += skb->len;
+
+   skb->dev = netdev;
+}
+
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 struct mlx5_cqe64 *cqe,
 u32 cqe_bcnt,
-- 
1.8.3.1
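
The packet-type decision in mlx5i_build_rx_skb() depends only on the GRH-present bits of the CQE and on the destination GID inside the GRH. The following self-contained sketch restates that decision with hypothetical demo_* names; the offsets are copied from the defines in the patch, and bcast_gid stands for the 16 bytes the patch reads from netdev->broadcast + 4 (the 20-byte IPoIB hardware address minus its 4-byte QPN prefix).

```c
#include <stdint.h>
#include <string.h>

#define DEMO_GRH_DGID_OFFSET	24 /* DGID offset inside the 40-byte GRH */
#define DEMO_GID_SIZE		16

enum demo_pkt_type {
	DEMO_PACKET_HOST,
	DEMO_PACKET_BROADCAST,
	DEMO_PACKET_MULTICAST
};

/*
 * data points at the start of the received buffer (the GRH).
 * grh_present is non-zero when the CQE reports a GRH.
 */
static enum demo_pkt_type demo_classify(int grh_present,
					const uint8_t *data,
					const uint8_t *bcast_gid)
{
	const uint8_t *dgid = data + DEMO_GRH_DGID_OFFSET;

	/* No GRH, or a DGID that is not multicast (0xff prefix): unicast. */
	if (!grh_present || dgid[0] != 0xff)
		return DEMO_PACKET_HOST;

	/* Multicast DGID equal to the device broadcast GID. */
	if (memcmp(dgid, bcast_gid, DEMO_GID_SIZE) == 0)
		return DEMO_PACKET_BROADCAST;

	return DEMO_PACKET_MULTICAST;
}
```

After this classification the patch pulls the 40-byte GRH, reads the protocol from the first two bytes of the 4-byte IPoIB encapsulation header, and then pulls that header as well.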



[RFC v1 for accelerated IPoIB 15/25] net/mlx5e: Enhanced flow table creation to support ETH and IB links.

2017-03-13 Thread Erez Shitrit

The IB link needs the underlay_qp to support flow-steering, so change
the API of the flow-steering creation to support both link types in the
same set of functions.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 12 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 39 -
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  9 ++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 19 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  8 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 67 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  1 +
 include/linux/mlx5/fs.h| 16 --
 8 files changed, 125 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 68419a01db36..ea3032d97b0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -325,10 +325,18 @@ static int arfs_create_table(struct mlx5e_priv *priv,
 {
struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
struct mlx5e_flow_table *ft = &arfs->arfs_tables[type].ft;
+   struct create_flow_table_param param = {0};
int err;
 
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_ARFS_TABLE_SIZE, 
MLX5E_ARFS_FT_LEVEL, 0);
+   ft->num_groups = 0;
+
+   param.ns = priv->fs.ns;
+   param.prio = MLX5E_NIC_PRIO;
+   param.max_fte = MLX5E_ARFS_TABLE_SIZE;
+   param.level = MLX5E_ARFS_FT_LEVEL;
+   param.flags = 0;
+
+   ft->t = mlx5_create_flow_table(&param);
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
ft->t = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index c6b40003007c..46b48b76e7ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -779,9 +779,16 @@ static int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
struct mlx5e_ttc_table *ttc = &priv->fs.ttc;
struct mlx5e_flow_table *ft = &ttc->ft;
int err;
+   struct create_flow_table_param param = {0};
 
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_TTC_TABLE_SIZE, 
MLX5E_TTC_FT_LEVEL, 0);
+   param.ns = priv->fs.ns;
+   param.prio = MLX5E_NIC_PRIO;
+   param.max_fte = MLX5E_TTC_TABLE_SIZE;
+   param.level = MLX5E_TTC_FT_LEVEL;
+   param.flags = 0;
+   param.underlay_qpn = priv->underlay_qpn;
+
+   ft->t = mlx5_create_flow_table(&param);
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
ft->t = NULL;
@@ -952,10 +959,16 @@ static int mlx5e_create_l2_table(struct mlx5e_priv *priv)
struct mlx5e_l2_table *l2_table = &priv->fs.l2;
struct mlx5e_flow_table *ft = &l2_table->ft;
int err;
+   struct create_flow_table_param param = {0};
+
+   param.ns = priv->fs.ns;
+   param.prio = MLX5E_NIC_PRIO;
+   param.max_fte = MLX5E_L2_TABLE_SIZE;
+   param.level = MLX5E_L2_FT_LEVEL;
+   param.flags = 0;
 
ft->num_groups = 0;
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_L2_TABLE_SIZE, MLX5E_L2_FT_LEVEL, 
0);
+   ft->t = mlx5_create_flow_table(&param);
 
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
@@ -1041,11 +1054,18 @@ static int mlx5e_create_vlan_table_groups(struct 
mlx5e_flow_table *ft)
 static int mlx5e_create_vlan_table(struct mlx5e_priv *priv)
 {
struct mlx5e_flow_table *ft = &priv->fs.vlan.ft;
+   struct create_flow_table_param param = {0};
int err;
 
ft->num_groups = 0;
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_VLAN_TABLE_SIZE, 
MLX5E_VLAN_FT_LEVEL, 0);
+
+   param.ns = priv->fs.ns;
+   param.prio = MLX5E_NIC_PRIO;
+   param.max_fte = MLX5E_VLAN_TABLE_SIZE;
+   param.level = MLX5E_VLAN_FT_LEVEL;
+   param.flags = 0;
+
+   ft->t = mlx5_create_flow_table(&param);
 
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
@@ -1091,13 +,6 @@ int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
if (!priv->fs.ns)
return -EINVAL;
 
-   err = mlx5e_arfs_create_tables(priv);
-   if (err) {
-   netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n",
-  err);
-   priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
-   }
-
err = mlx5e_create_ttc_table(priv);
if (err) {
netdev_err(priv->netdev,

[RFC v1 for accelerated IPoIB 20/25] net/mlx5e: Refactor TX send flow

2017-03-13 Thread Erez Shitrit

Prepare for sending IB link type packets.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h|   4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 237 ++--
 2 files changed, 141 insertions(+), 100 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 154cab2a301b..b6758d0b93a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -402,6 +402,10 @@ struct mlx5e_tx_wqe_info {
u32 num_bytes;
u8  num_wqebbs;
u8  num_dma;
+   u16 ds_cnt;
+   u16 ihs;
+   u8 opcode;
+   bool bf;
 };
 
 enum mlx5e_dma_map_type {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index cfb68371c397..22443ce778ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -216,94 +216,65 @@ static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs,
mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz);
 }
 
-static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
+static inline void mlx5n_sq_fill_ctrl_seg_and_send(struct mlx5e_sq *sq,
+  struct mlx5e_tx_wqe_info *wi,
+  struct mlx5_wqe_ctrl_seg *cseg,
+  struct sk_buff *skb, u16 pi)
 {
struct mlx5_wq_cyc   *wq   = &sq->wq;
 
-   u16 pi = sq->pc & wq->sz_m1;
-   struct mlx5e_tx_wqe  *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
-   struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
+   cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | wi->opcode);
+   cseg->qpn_ds   = cpu_to_be32((sq->sqn << 8) | wi->ds_cnt);
 
-   struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
-   struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
-   struct mlx5_wqe_data_seg *dseg;
+   sq->db.txq.skb[pi] = skb;
 
-   unsigned char *skb_data = skb->data;
-   unsigned int skb_len = skb->len;
-   u8  opcode = MLX5_OPCODE_SEND;
-   dma_addr_t dma_addr = 0;
-   unsigned int num_bytes;
-   bool bf = false;
-   u16 headlen;
-   u16 ds_cnt;
-   u16 ihs;
-   int i;
+   wi->num_wqebbs = DIV_ROUND_UP(wi->ds_cnt, MLX5_SEND_WQEBB_NUM_DS);
+   sq->pc += wi->num_wqebbs;
 
-   memset(wqe, 0, sizeof(*wqe));
+   netdev_tx_sent_queue(sq->txq, wi->num_bytes);
 
-   if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
-   eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
-   if (skb->encapsulation) {
-   eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
- MLX5_ETH_WQE_L4_INNER_CSUM;
-   sq->stats.csum_partial_inner++;
-   } else {
-   eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
-   }
-   } else
-   sq->stats.csum_none++;
+   if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+   skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
-   if (sq->cc != sq->prev_cc) {
-   sq->prev_cc = sq->cc;
-   sq->bf_budget = (sq->cc == sq->pc) ? MLX5E_SQ_BF_BUDGET : 0;
+   if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
+   netif_tx_stop_queue(sq->txq);
+   sq->stats.stopped++;
}
 
-   if (skb_is_gso(skb)) {
-   eseg->mss= cpu_to_be16(skb_shinfo(skb)->gso_size);
-   opcode   = MLX5_OPCODE_LSO;
+   if (!skb->xmit_more || netif_xmit_stopped(sq->txq)) {
+   int bf_sz = 0;
 
-   if (skb->encapsulation) {
-   ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
-   sq->stats.tso_inner_packets++;
-   sq->stats.tso_inner_bytes += skb->len - ihs;
-   } else {
-   ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
-   sq->stats.tso_packets++;
-   sq->stats.tso_bytes += skb->len - ihs;
-   }
+   if (wi->bf && test_bit(MLX5E_SQ_STATE_BF_ENABLE, &sq->state))
+   bf_sz = wi->num_wqebbs << 3;
 
-   num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs;
-   } else {
-   bf = sq->bf_budget &&
-!skb->xmit_more &&
-!skb_shinfo(skb)->nr_frags;
-   ihs = mlx5e_get_inline_hdr_size(sq, skb, bf);
-   num_bytes = max_t(unsig

[RFC v1 for accelerated IPoIB 19/25] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode

2017-03-13 Thread Erez Shitrit

mlx5_wqe_eth_pad will be used in the TX flow for the IB link type.
enhanced-ipoib-qp-mode will be used for QP creation.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 4 
 include/linux/mlx5/qp.h | 8 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 41e14d57fec9..d6918e6b6f28 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -71,10 +71,6 @@ enum {
[MLX5_IB_WR_UMR]= MLX5_OPCODE_UMR,
 };
 
-struct mlx5_wqe_eth_pad {
-   u8 rsvd0[16];
-};
-
 enum raw_qp_set_mask_map {
MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID = 1UL << 0,
MLX5_RAW_QP_RATE_LIMIT  = 1UL << 1,
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 219c699c17b7..568f8ac9 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -221,6 +221,14 @@ enum {
MLX5_ETH_WQE_L4_CSUM= 1 << 7,
 };
 
+struct mlx5_wqe_eth_pad {
+   u8 rsvd0[16];
+};
+
+enum {
+   MLX5_QP_ENHANCED_ULP_STATELESS_MODE = 2,
+};
+
 struct mlx5_wqe_eth_seg {
u8  rsvd0[4];
u8  cs_flags;
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 10/25] net/mlx5e: Support netdevice creation for IB link type

2017-03-13 Thread Erez Shitrit

Implement the required interface that enables the IB link to run on top
of the ETH data structures.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 98 +++
 include/linux/mlx5/driver.h   | 12 +++
 2 files changed, 110 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 88541f99d37b..3db0334cdba0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3843,6 +3843,104 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
mlx5_lag_remove(mdev);
 }
 
+static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
+  struct net_device *netdev,
+  const struct mlx5e_profile *profile,
+  void *ppriv)
+{
+   struct mlx5e_priv *priv = ipoib_dev_priv(netdev);
+
+   mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
+}
+
+static int mlx5i_init_nic_rx(struct mlx5e_priv *priv)
+{
+   struct mlx5_core_dev *mdev = priv->mdev;
+   int err;
+
+   err = mlx5n_init_nic_rx_common(priv);
+   if (err) {
+   mlx5_core_warn(mdev, "failed create nic rx res, %d\n", err);
+   return err;
+   }
+
+   err = mlx5i_create_flow_steering(priv);
+   if (err) {
+   mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
+   return err;
+   }
+
+   return 0;
+}
+
+static void mlx5i_cleanup_nic_rx(struct mlx5e_priv *priv)
+{
+   mlx5i_destroy_flow_steering(priv);
+   mlx5n_cleanup_nic_rx_common(priv);
+}
+
+static const struct mlx5e_profile mlx5i_nic_profile = {
+   .init  = mlx5i_nic_init,
+   .cleanup   = NULL,
+   .init_rx   = mlx5i_init_nic_rx,
+   .cleanup_rx= mlx5i_cleanup_nic_rx,
+   .init_tx   = mlx5e_init_nic_tx,
+   .cleanup_tx= mlx5e_cleanup_nic_tx,
+   .enable= NULL,/*mlx5e_nic_enable,*/
+   .disable   = NULL,
+   .update_stats  = NULL,/*mlx5e_update_stats,*/
+   .max_nch   = mlx5e_get_max_num_channels,
+   .max_tc= MLX5E_MAX_NUM_TC,
+};
+
+struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
+  const char *name,
+  void (*setup)(struct net_device *dev),
+  struct mlx5i_create_ext_param *param)
+{
+   const struct mlx5e_profile *profile = &mlx5i_nic_profile;
+   int nch = profile->max_nch(mdev);
+   struct net_device *netdev;
+   struct mlx5e_priv *priv;
+
+   if (mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_IB))
+   return NULL;
+
+   netdev = alloc_netdev_mqs(sizeof(struct mlx5e_priv) + param->size_base_priv,
+ name, NET_NAME_UNKNOWN,
+ setup,
+ nch * MLX5E_MAX_NUM_TC,
+ nch);
+   if (!netdev) {
+   pr_err("alloc_netdev_mqs failed\n");
+   return NULL;
+   }
+
+   if (profile->init)
+   profile->init(mdev, netdev, profile, &param->size_base_priv);
+
+   netif_carrier_off(netdev);
+
+   priv = ipoib_dev_priv(netdev);
+
+   priv->underlay_qpn = param->qpn;
+
+   priv->wq = create_singlethread_workqueue("mlx5i");
+   if (!priv->wq)
+   goto err_cleanup_nic;
+
+   return netdev;
+
+err_cleanup_nic:
+   if (profile->cleanup)
+   profile->cleanup(priv);
+
+   free_netdev(netdev);
+
+   return NULL;
+}
+EXPORT_SYMBOL(mlx5i_create_netdev);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
.init  = mlx5e_nic_init,
.cleanup   = mlx5e_nic_cleanup,
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 886ff2b00500..d0060cfb2a4f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 enum {
MLX5_BOARD_ID_LEN = 64,
@@ -1127,4 +1128,15 @@ enum {
MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32,
 };
 
+struct mlx5i_create_ext_param {
+   int size_base_priv;
+   u32 qpn;
+};
+
+struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
+  const char *name,
+  void (*setup)(struct net_device *dev),
+  struct mlx5i_create_ext_param *param);
+int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv);
+void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks

2017-03-13 Thread Erez Shitrit

The IPoIB driver now uses the new set of callback functions.
If the HW provider supports the new ipoib_options implementation, the
driver uses the callbacks in its datapath flows; otherwise it uses the
driver's default implementation for all data flows.
The default implementation is exactly the driver implementation as it
was without HW vendor support.

TODO: We added the remote qkey to ipoib_send in order to match the send
op signature.
In accel mode this parameter is used, but in regular mode it is
redundant. We need to think about a better solution.
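
The fallback rule described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: every demo_* name
is hypothetical and is not part of this series. The real driver works on
struct net_device, struct sk_buff and struct ipoib_ah, which are replaced
here by a small stand-in struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver types; not the kernel's structs. */
struct demo_dev;

struct demo_accel_ops {
	int (*send)(struct demo_dev *dev, uint32_t dqpn, uint32_t dqkey);
};

struct demo_dev {
	const struct demo_accel_ops *accel; /* NULL when HW has no acceleration */
	uint32_t qkey;       /* device-wide qkey used by the default path */
	uint32_t last_qkey;  /* qkey the most recent send actually used */
	int accel_sends;
	int default_sends;
};

/* Default path: same signature, but dqkey is redundant here. */
static int demo_send_default(struct demo_dev *dev, uint32_t dqpn, uint32_t dqkey)
{
	(void)dqpn;
	(void)dqkey;
	dev->last_qkey = dev->qkey;
	dev->default_sends++;
	return 0;
}

/* Accelerated path: consumes the remote qkey passed by the caller. */
static int demo_send_accel(struct demo_dev *dev, uint32_t dqpn, uint32_t dqkey)
{
	(void)dqpn;
	dev->last_qkey = dqkey;
	dev->accel_sends++;
	return 0;
}

static const struct demo_accel_ops demo_hw_ops = {
	.send = demo_send_accel,
};

/* Use the provider callback when present, else the default code. */
static int demo_send(struct demo_dev *dev, uint32_t dqpn, uint32_t dqkey)
{
	if (dev->accel && dev->accel->send)
		return dev->accel->send(dev, dqpn, dqkey);
	return demo_send_default(dev, dqpn, dqkey);
}
```

A device whose accel pointer is NULL goes through demo_send_default(),
and one set up with &demo_hw_ops goes through demo_send_accel(). The
unused dqkey argument in the default path is the redundancy that the
TODO above refers to.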

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |  30 ++--
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  66 
 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c   |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_fs.c|   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|  57 +++
 drivers/infiniband/ulp/ipoib/ipoib_main.c  | 207 -
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  39 +++--
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c   |  12 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  24 ++-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c  |   9 +-
 10 files changed, 275 insertions(+), 179 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index d94a7a953338..48da1b5be183 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -51,8 +51,8 @@
 #include 
 #include 
 #include 
+#include 
 #include 
-
 /* constants */
 
 enum ipoib_flush_level {
@@ -357,6 +357,7 @@ struct ipoib_dev_priv {
struct ib_cq *recv_cq;
struct ib_cq *send_cq;
struct ib_qp *qp;
+   u32   qp_num;
u32   qkey;
 
union ib_gid local_gid;
@@ -404,6 +405,7 @@ struct ipoib_dev_priv {
struct timer_list poll_timer;
unsigned max_send_sge;
bool sm_fullmember_sendonly_support;
+   const struct net_device_ops *rn_ops;
 };
 
 struct ipoib_ah {
@@ -483,22 +485,26 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-   struct ipoib_ah *address, u32 dqpn);
+   struct ipoib_ah *address, u32 dqpn, u32 dqkey);
 void ipoib_reap_ah(struct work_struct *work);
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid);
 void ipoib_mark_paths_invalid(struct net_device *dev);
 void ipoib_flush_paths(struct net_device *dev);
 int ipoib_check_sm_sendonly_fullmember_support(struct ipoib_dev_priv *priv);
-struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
+struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
+   const char *format);
 void ipoib_ib_tx_timer_func(unsigned long ctx);
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
 void ipoib_ib_dev_flush_heavy(struct work_struct *work);
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
-void ipoib_dev_uninit_default(struct net_device *dev);
+void ipoib_dev_uninit_default(struct net_device *dev, struct ib_device *hca);
+int ipoib_ib_dev_open_default(struct net_device *dev);
+int ipoib_ib_dev_stop_default(struct net_device *dev);
 int ipoib_ib_dev_open(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
 int ipoib_ib_dev_stop_default(struct net_device *dev);
@@ -561,8 +567,10 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
  struct ipoib_path *path);
 #endif
 
-int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
-  union ib_gid *mgid, int set_qkey);
+int ipoib_mcast_attach(struct net_device *dev, struct ib_device *hca,
+  union ib_gid *mgid, u16 mlid, int set_qkey);
+int ipoib_mcast_detach(struct net_device *dev, struct ib_device *hca,
+  union ib_gid *mgid, u16 mlid);
 void ipoib_mcast_remove_list(struct list_head *remove_list);
 void ipoib_check_and_add_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
struct list_head *remove_list);
@@ -586,7 +594,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 void ipoib_set_umcast(struct net_device *ndev, int umcast_val);
 int  ipoib_set_mode(struct net_device *dev, const char *buf);
 
-void ipoib_setup(struct net_device *dev);
+void ipoib_setup_common(struct net_device *dev);
 
 void ipoib_pkey_open(struct ipoib_dev_priv *priv);
 void ipoib_drain_cq(struct net_device *dev);
@@ -606,14 +614,14 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 
 static inli

[RFC v1 for accelerated IPoIB 03/25] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions

2017-03-13 Thread Erez Shitrit

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h| 2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index b3900b253ad5..d94a7a953338 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -483,7 +483,7 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-   struct ipoib_ah *address, u32 qpn);
+   struct ipoib_ah *address, u32 dqpn);
 void ipoib_reap_ah(struct work_struct *work);
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5d732c5f01ee..dd5fb2964e63 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -514,7 +514,7 @@ void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr)
 
 static inline int post_send(struct ipoib_dev_priv *priv,
unsigned int wr_id,
-   struct ib_ah *address, u32 qpn,
+   struct ib_ah *address, u32 dqpn,
struct ipoib_tx_buf *tx_req,
void *head, int hlen)
 {
@@ -524,7 +524,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
ipoib_build_sge(priv, tx_req);
 
priv->tx_wr.wr.wr_id= wr_id;
-   priv->tx_wr.remote_qpn  = qpn;
+   priv->tx_wr.remote_qpn  = dqpn;
priv->tx_wr.ah  = address;
 
if (head) {
@@ -539,7 +539,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 }
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-   struct ipoib_ah *address, u32 qpn)
+   struct ipoib_ah *address, u32 dqpn)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ipoib_tx_buf *tx_req;
@@ -621,7 +621,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
skb_dst_drop(skb);
 
rc = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
-  address->ah, qpn, tx_req, phead, hlen);
+  address->ah, dqpn, tx_req, phead, hlen);
if (unlikely(rc)) {
ipoib_warn(priv, "post_send failed, error %d\n", rc);
++dev->stats.tx_errors;
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API

2017-03-13 Thread Erez Shitrit

The idea is to allow vendors to optimize the IPoIB data path.
A new struct that includes functions and data members is exposed.
It exposes a set of callback functions for handling data path flows in the
IPoIB driver.
Each vendor can support this set of functions in order to optimize its
specific data path, and let IPoIB leverage that data path.
The code of the IPoIB driver was changed accordingly, and works both
with and without a vendor-specific implementation.
The assumption is that a vendor provides the full set of functions,
not only part of them, in order to work properly.
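
The "full set or nothing" assumption could be enforced along the lines
below. This is a hedged userspace sketch: struct demo_rn_ops and all
demo_* helpers are hypothetical and only loosely mirror the callbacks in
struct rdma_netdev. The patch itself does not add such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table, loosely modelled on struct rdma_netdev. */
struct demo_rn_ops {
	int  (*dev_init)(void *dev);
	void (*dev_cleanup)(void *dev);
	int  (*send)(void *dev, const void *pkt, size_t len);
	int  (*attach_mcast)(void *dev, uint16_t lid);
	int  (*detach_mcast)(void *dev, uint16_t lid);
};

/* True only when every callback is provided (NULL table counts as empty). */
static bool demo_ops_complete(const struct demo_rn_ops *ops)
{
	return ops != NULL &&
	       ops->dev_init != NULL &&
	       ops->dev_cleanup != NULL &&
	       ops->send != NULL &&
	       ops->attach_mcast != NULL &&
	       ops->detach_mcast != NULL;
}

/* A partial vendor table is rejected as a whole; defaults are used instead. */
static const struct demo_rn_ops *
demo_select_ops(const struct demo_rn_ops *vendor,
		const struct demo_rn_ops *defaults)
{
	return demo_ops_complete(vendor) ? vendor : defaults;
}

/* Trivial stubs so the sketch can be exercised. */
static int demo_stub_init(void *dev) { (void)dev; return 0; }
static void demo_stub_cleanup(void *dev) { (void)dev; }
static int demo_stub_send(void *dev, const void *pkt, size_t len)
{
	(void)dev;
	(void)pkt;
	return (int)len;
}
static int demo_stub_mcast(void *dev, uint16_t lid)
{
	(void)dev;
	(void)lid;
	return 0;
}

static const struct demo_rn_ops demo_default_ops = {
	.dev_init     = demo_stub_init,
	.dev_cleanup  = demo_stub_cleanup,
	.send         = demo_stub_send,
	.attach_mcast = demo_stub_mcast,
	.detach_mcast = demo_stub_mcast,
};
```

Rejecting a partial table as a whole avoids mixing vendor and default
callbacks that may not agree on who owns the QP and the rings.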

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 include/rdma/ib_ipoib_accel_ops.h | 59 +++
 include/rdma/ib_verbs.h   | 36 
 2 files changed, 95 insertions(+)
 create mode 100644 include/rdma/ib_ipoib_accel_ops.h

diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
new file mode 100644
index ..148a5529a559
--- /dev/null
+++ b/include/rdma/ib_ipoib_accel_ops.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2017 Mellanox Technologies Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(IB_IPOIB_ACCEL_OPS_H)
+#define IB_IPOIB_ACCEL_OPS_H
+
+#include 
+
+/* ipoib rdma netdev's private data structure */
+struct ipoib_rdma_netdev {
+   struct rdma_netdev rn;  /* keep this first */
+   /* followed by device private data */
+   char *dev_priv[0];
+};
+
+static inline void *ipoib_priv(const struct net_device *dev)
+{
+   struct rdma_netdev *rn = netdev_priv(dev);
+
+   return rn->clnt_priv;
+}
+
+static inline void *ipoib_dev_priv(const struct net_device *dev)
+{
+   struct ipoib_rdma_netdev *ipoib_rn = netdev_priv(dev);
+
+   return ipoib_rn->dev_priv;
+}
+
+#endif /* IB_IPOIB_ACCEL_OPS_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 85b9034c8cfc..9b090efccdba 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1901,6 +1901,41 @@ struct ib_port_immutable {
u32   max_mad_size;
 };
 
+/* rdma netdev type - specifies protocol type */
+enum rdma_netdev_t {
+   RDMA_NETDEV_OPA_VNIC,
+   RDMA_NETDEV_IPOIB
+};
+
+struct ipoib_ah;
+
+/**
+ * struct rdma_netdev - rdma netdev
+ * For cases where netstack interfacing is required.
+ */
+struct rdma_netdev {
+   void *clnt_priv;
+
+   /* control functions */
+   void (*set_id)(struct net_device *netdev, int id);
+   /* IB resource allocation function, returns new UD QP */
+   int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
+  int *qp_num);
+   void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);
+
+   /* send packet */
+   void (*send)(struct net_device *dev, struct sk_buff *skb,
+struct ipoib_ah *address, u32 dqpn, u32 dqkey);
+
+   /* multicast */
+   int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
+   union ib_gid *gid, u16 lid, int set_qkey);
+   int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
+   union ib_gid *gid, u16 lid);
+   int qp_num;
+   void *context;
+};
+
 struct ib_device {
struct device*dma_device;
 
@@ -2149,6 +2184,7 @@ struct ib_device {
struct ib_wq_attr *attr,
u32 wq_attr_mask,
struct ib_udata *

[RFC v1 for accelerated IPoIB 08/25] net/mlx5e: Refactor EN code to support IB link

2017-03-13 Thread Erez Shitrit

The idea is to use the same infrastructure for both ETH and IB link
types, so the first step is to refactor the ETH handling so that it can
be used for the IB link as well.

1. Check requirements for ETH and for IB.
2. Move code to common functions, where it will be used for both link
types.
3. Change the init and cleanup flows so they are not specific to the ETH
link.
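
Step 1 amounts to one capability check that is keyed by link type. The
sketch below is a userspace illustration under assumptions: struct
demo_caps, the DEMO_LINK_* values and demo_check_required_caps() are
hypothetical, and only a few of the ETH capabilities from the patch are
modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical link-type values; the patch uses MLX5_INTERFACE_PROTOCOL_*. */
enum {
	DEMO_LINK_IB  = 0,
	DEMO_LINK_ETH = 1,
};

/* Hypothetical, reduced capability set. */
struct demo_caps {
	bool eth_net_offloads;
	bool nic_flow_table;
	unsigned int max_ft_level;
	bool ipoib_enhanced_offloads;
};

/* Returns 0 when the device can back a netdev of the given link type. */
static int demo_check_required_caps(const struct demo_caps *caps, int link_type)
{
	switch (link_type) {
	case DEMO_LINK_ETH:
		if (!caps->eth_net_offloads ||
		    !caps->nic_flow_table ||
		    caps->max_ft_level < 3)
			return -EOPNOTSUPP;
		return 0;
	case DEMO_LINK_IB:
		return caps->ipoib_enhanced_offloads ? 0 : -EOPNOTSUPP;
	default:
		/* Unknown link types are never supported. */
		return -EOPNOTSUPP;
	}
}
```

The sketch returns -EOPNOTSUPP because ENOTSUPP, which the patch uses,
is a kernel-internal error code that userspace <errno.h> does not
provide.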

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 120 ++
 1 file changed, 80 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 041e0ac16096..88541f99d37b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3323,29 +3323,39 @@ static void mlx5e_netpoll(struct net_device *dev)
.ndo_get_offload_stats   = mlx5e_get_offload_stats,
 };
 
-static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
-{
-   if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-   return -EOPNOTSUPP;
-   if (!MLX5_CAP_GEN(mdev, eth_net_offloads) ||
-   !MLX5_CAP_GEN(mdev, nic_flow_table) ||
-   !MLX5_CAP_ETH(mdev, csum_cap) ||
-   !MLX5_CAP_ETH(mdev, max_lso_cap) ||
-   !MLX5_CAP_ETH(mdev, vlan_cap) ||
-   !MLX5_CAP_ETH(mdev, rss_ind_tbl_cap) ||
-   MLX5_CAP_FLOWTABLE(mdev,
-  flow_table_properties_nic_receive.max_ft_level)
-  < 3) {
-   mlx5_core_warn(mdev,
-  "Not creating net device, some required device capabilities are missing\n");
-   return -EOPNOTSUPP;
+static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev,
+   int link_type)
+{
+   if (link_type == MLX5_INTERFACE_PROTOCOL_ETH) {
+   if (!MLX5_CAP_GEN(mdev, eth_net_offloads) ||
+   !MLX5_CAP_GEN(mdev, nic_flow_table) ||
+   !MLX5_CAP_ETH(mdev, csum_cap) ||
+   !MLX5_CAP_ETH(mdev, max_lso_cap) ||
+   !MLX5_CAP_ETH(mdev, vlan_cap) ||
+   !MLX5_CAP_ETH(mdev, rss_ind_tbl_cap) ||
+   MLX5_CAP_FLOWTABLE(mdev,
+  flow_table_properties_nic_receive.max_ft_level)
+  < 3) {
+   mlx5_core_warn(mdev,
+  "Not creating net device, some required device capabilities are missing\n");
+   return -ENOTSUPP;
+   }
+   if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
+   mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
+   if (!MLX5_CAP_GEN(mdev, cq_moderation))
+   mlx5_core_warn(mdev, "CQ modiration is not supported\n");
+
+   return 0;
+   } else if (link_type == MLX5_INTERFACE_PROTOCOL_IB) {
+   if (!MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads)) {
+   pr_warn("Not creating net device (IB), some required device capabilities are missing\n");
+   return -ENOTSUPP;
+   }
+   return 0;
}
-   if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
-   mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
-   if (!MLX5_CAP_GEN(mdev, cq_moderation))
-   mlx5_core_warn(mdev, "CQ modiration is not supported\n");
 
-   return 0;
+   return -ENOTSUPP;
+
 }
 
 u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
@@ -3455,12 +3465,12 @@ u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
return MLX5_CAP_ETH(mdev, lro_timer_supported_periods[i]);
 }
 
-static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
+void mlx5n_build_nic_netdev_priv_common(struct mlx5_core_dev *mdev,
struct net_device *netdev,
+   struct mlx5e_priv *priv,
const struct mlx5e_profile *profile,
void *ppriv)
 {
-   struct mlx5e_priv *priv = netdev_priv(netdev);
u32 link_speed = 0;
u32 pci_bw = 0;
u8 cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
@@ -3524,6 +3534,15 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, priv->params.rx_cqe_compress_def);
 
mutex_init(&priv->state_lock);
+}
+
+static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
+   struct net_device *netdev,
+   struct mlx5e

[RFC v1 for accelerated IPoIB 01/25] IB/ipoib: Separate control and data related initializations

2017-03-13 Thread Erez Shitrit

This patch prepares the init and teardown flows so we can call them
through ipoib_options function pointers.
It arranges that area of code as follows:
All operations that deal with resource allocation/deletion are done
in one place.
All operations that are control oriented, meaning that they are not
connected to specific HW beneath, are done in a separate place.

The operations that allocate HW resources are now in the function
ipoib_dev_init_default, and the deletion of all the resources is in
ipoib_dev_uninit_default. The only exception is the creation of the pd
object, which is used both for resource allocation (create QP etc.) and for
control flows like creating an ah.

It also does the following:
Move the creation of rx_ring and tx_ring into the resource allocation
area.
Move the function ipoib_ib_dev_open, which opens the device, to the
control area instead of dev_init, which creates resources.
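
The split between resource handling and control handling can be
pictured with the userspace sketch below. It is an illustration under
assumptions, not the driver code: struct demo_priv and every demo_*
function are hypothetical, and calloc()/free() stand in for ring, QP and
PD allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_RING_ENTRIES 64

/* Hypothetical private data; ints stand in for the real objects. */
struct demo_priv {
	int *pd;      /* shared by resource and control flows */
	int *rx_ring; /* HW resource */
	int *tx_ring; /* HW resource */
	bool opened;  /* control state */
};

/* Resource area: allocate everything that a HW provider could replace. */
static int demo_dev_init_default(struct demo_priv *p)
{
	p->rx_ring = calloc(DEMO_RING_ENTRIES, sizeof(*p->rx_ring));
	p->tx_ring = calloc(DEMO_RING_ENTRIES, sizeof(*p->tx_ring));
	if (!p->rx_ring || !p->tx_ring) {
		free(p->rx_ring);
		free(p->tx_ring);
		p->rx_ring = NULL;
		p->tx_ring = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Resource area: release exactly what demo_dev_init_default() created. */
static void demo_dev_uninit_default(struct demo_priv *p)
{
	free(p->rx_ring);
	free(p->tx_ring);
	p->rx_ring = NULL;
	p->tx_ring = NULL;
}

/* The pd is the exception: it is created outside the resource area. */
static int demo_dev_init(struct demo_priv *p)
{
	int err;

	p->pd = calloc(1, sizeof(*p->pd));
	if (!p->pd)
		return -ENOMEM;

	err = demo_dev_init_default(p);
	if (err) {
		free(p->pd);
		p->pd = NULL;
	}
	return err;
}

static void demo_dev_cleanup(struct demo_priv *p)
{
	demo_dev_uninit_default(p);
	free(p->pd);
	p->pd = NULL;
}

/* Control area: open/stop never allocate or free resources. */
static int demo_dev_open(struct demo_priv *p)
{
	if (!p->pd || !p->rx_ring || !p->tx_ring)
		return -ENODEV;
	p->opened = true;
	return 0;
}

static void demo_dev_stop(struct demo_priv *p)
{
	p->opened = false;
}
```

With this layout, an accelerated provider would only need to swap the
two *_default functions, while open/stop and the pd handling stay in
common code.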

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |   5 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|  39 +++---
 drivers/infiniband/ulp/ipoib/ipoib_main.c  | 116 -
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  40 +-
 4 files changed, 110 insertions(+), 90 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index bed233bf45c3..7cd9befd7d54 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -491,14 +491,13 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 void ipoib_flush_paths(struct net_device *dev);
 int ipoib_check_sm_sendonly_fullmember_support(struct ipoib_dev_priv *priv);
 struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
-
-int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
+void ipoib_ib_tx_timer_func(unsigned long ctx);
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
 void ipoib_ib_dev_flush_heavy(struct work_struct *work);
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
-
+void ipoib_dev_uninit_default(struct net_device *dev);
 int ipoib_ib_dev_open(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 12c4f84a6639..3c0a35d883e2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -40,6 +40,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ipoib.h"
 
@@ -692,7 +693,7 @@ static void ipoib_stop_ah(struct net_device *dev)
ipoib_flush_ah(dev);
 }
 
-static void ipoib_ib_tx_timer_func(unsigned long ctx)
+void ipoib_ib_tx_timer_func(unsigned long ctx)
 {
drain_tx_cq((struct net_device *)ctx);
 }
@@ -913,32 +914,6 @@ void ipoib_ib_dev_stop(struct net_device *dev)
ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
 }
 
-int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
-{
-   struct ipoib_dev_priv *priv = netdev_priv(dev);
-
-   priv->ca = ca;
-   priv->port = port;
-   priv->qp = NULL;
-
-   if (ipoib_transport_dev_init(dev, ca)) {
-   printk(KERN_WARNING "%s: ipoib_transport_dev_init failed\n", ca->name);
-   return -ENODEV;
-   }
-
-   setup_timer(&priv->poll_timer, ipoib_ib_tx_timer_func,
-   (unsigned long) dev);
-
-   if (dev->flags & IFF_UP) {
-   if (ipoib_ib_dev_open(dev)) {
-   ipoib_transport_dev_cleanup(dev);
-   return -ENODEV;
-   }
-   }
-
-   return 0;
-}
-
 /*
  * Takes whatever value which is in pkey index 0 and updates priv->pkey
  * returns 0 if the pkey value was changed.
@@ -1236,7 +1211,13 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 */
ipoib_stop_ah(dev);
 
-   ipoib_transport_dev_cleanup(dev);
-}
+   clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
+
+   ipoib_dev_uninit_default(dev);
 
+   if (priv->pd) {
+   ib_dealloc_pd(priv->pd);
+   priv->pd = NULL;
+   }
+}
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 259c59f67394..8c644bbc2828 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1622,8 +1622,23 @@ static void ipoib_neigh_hash_uninit(struct net_device *dev)
wait_for_completion(&priv->ntbl.deleted);
 }
 
+void ipoib_dev_uninit_default(struct net_device *dev)
+{
+   struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
+   ipoib_transport_dev_cleanup(dev);
+
+   ipoib_cm_dev_clea

[RFC v1 for accelerated IPoIB 07/25] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects

2017-03-13 Thread Erez Shitrit

Add the underlay_qpn field to the TIS and flow_table objects, and a new
bit to the HCA capability table.

1. New capability bit: ipoib_enhanced_offloads, which indicates the new
ability of a UD QP to do RSS.
2. New underlay_qpn field: needed to support the SET_ROOT cmd, which
connects the QP to the FS table.
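
Since each new field is carved out of an existing reserved range, the
widths in every hunk must add up to the old reserved width, and each
reserved_at_* name must match the running bit offset. A small
self-check, using only the numbers from the hunks in this patch
(demo_next_offset() is a hypothetical helper):

```c
#include <assert.h>

/* Bit offset of the field that follows one starting at `start`. */
static unsigned int demo_next_offset(unsigned int start, unsigned int width)
{
	return start + width;
}

/* hca_cap: reserved_at_202[0x2] -> reserved[0x1] + ipoib_enhanced_offloads[0x1] */
_Static_assert(0x1 + 0x1 == 0x2, "hca_cap split must keep 2 bits");

/* tisc: reserved_at_140[0x3c0] -> 0x8 + underlay_qpn[0x18] + 0x3a0 */
_Static_assert(0x8 + 0x18 + 0x3a0 == 0x3c0, "tisc split must keep 0x3c0 bits");

/* set_flow_table_root_in: reserved_at_c0[0x140] -> 0x8 + underlay_qpn[0x18] + 0x120 */
_Static_assert(0x8 + 0x18 + 0x120 == 0x140, "set_root split must keep 0x140 bits");
```

For example, in the TIS context underlay_qpn starts at 0x140 + 0x8 =
0x148 and ends at 0x148 + 0x18 = 0x160, which matches the new
reserved_at_160 name.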

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2d197d8a7025..afb6c8ab156a 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -859,7 +859,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
u8 compact_address_vector[0x1];
u8 striding_rq[0x1];
-   u8 reserved_at_202[0x2];
+   u8 reserved_at_202[0x1];
+   u8 ipoib_enhanced_offloads[0x1];
u8 ipoib_basic_offloads[0x1];
u8 reserved_at_205[0xa];
u8 drain_sigerr[0x1];
@@ -2217,7 +2218,9 @@ struct mlx5_ifc_tisc_bits {
u8 reserved_at_120[0x8];
u8 transport_domain[0x18];
 
-   u8 reserved_at_140[0x3c0];
+   u8 reserved_at_140[0x8];
+   u8 underlay_qpn[0x18];
+   u8 reserved_at_160[0x3a0];
 };
 
 enum {
@@ -7906,7 +7909,9 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
u8 reserved_at_a0[0x8];
u8 table_id[0x18];
 
-   u8 reserved_at_c0[0x140];
+   u8 reserved_at_c0[0x8];
+   u8 underlay_qpn[0x18];
+   u8 reserved_at_e0[0x120];
 };
 
 enum {
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 06/25] hw/mlx5: Add New bit to check over QP creation

2017-03-13 Thread Erez Shitrit

Add a check for the IB_QP_CREATE_NETIF_QP bit when creating a QP.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc0ea63900c1..41e14d57fec9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -897,6 +897,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN |
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
IB_QP_CREATE_IPOIB_UD_LSO |
+   IB_QP_CREATE_NETIF_QP |
mlx5_ib_create_qp_sqpn_qp1()))
return -EINVAL;
 
-- 
1.8.3.1



[RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver

2017-03-13 Thread Erez Shitrit
The IPoIB protocol encapsulates IP packets over InfiniBand datagrams.
As a direct RDMA Upper Layer Protocol (ULP), IPoIB cannot support HW
features that are specific to the IP protocol stack.

Nevertheless, RDMA interfaces have been extended to support some of the
prominent IP offload features, such as TCP/UDP checksum and TSO.
This provided reasonable performance gain for IPoIB but is still
insufficient to cope with the increasing network bandwidth demand.

However, new features exist in common network interfaces that
are very hard to implement in IPoIB interfaces as long as it uses the RDMA
layer; examples include TSS and RSS, tunneling offloads, and XDP.
Rather than continuously porting IP network interface developments into
the RDMA stack, we propose adding an abstract network data-path interface
to RDMA devices.

In order to present a consistent interface to users, the IPoIB ULP
continues to represent the network device to the IP stack.
The common code also manages the IPoIB control plane, such as resolving
path queries and registering to multicast groups.
Data path operations are forwarded to devices that implement the new
API, or fall back to the standard implementation otherwise.
Using the foregoing approach, we show how IPoIB closes the performance
gap compared to state-of-the-art Ethernet network interfaces.

The implementation idea is to expose a struct that has data members and a
set of functions that are used for network interfaces, like create, delete,
init hw resources, send, and attach/detach multicast to qp.
That set of functions is encapsulated in a new struct, and this struct may
or may not be provided by the specific HW layer.

The IPoIB code will be adapted to enable the option of accelerating the
network interface, but the code will work as before if the HW below
doesn't support the acceleration.
Each HW vendor can supply the acceleration for IPoIB or leave IPoIB to
work as before.
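
The idea described above can be sketched in plain user-space C as follows. This is only an illustration under stated assumptions: the demo_* names, the fields of the ops struct and the packet counter are hypothetical and are not the API proposed in this series. It shows the one property the text relies on, namely that the common code always calls through an ops table, and that the table is either supplied by the HW layer or replaced by the default implementation.

```c
#include <stddef.h>

struct demo_netdev;

/* Hypothetical table of data-path functions a HW driver may provide. */
struct demo_accel_ops {
	int (*open)(struct demo_netdev *dev);
	int (*stop)(struct demo_netdev *dev);
	int (*send)(struct demo_netdev *dev, const void *buf, size_t len);
};

struct demo_netdev {
	const struct demo_accel_ops *ops;	/* never NULL after init */
	int accelerated;			/* 1 if the HW layer supplied ops */
	unsigned long tx_packets;
};

/* Default (non-accelerated) implementation, standing in for the old path. */
static int default_open(struct demo_netdev *dev) { (void)dev; return 0; }
static int default_stop(struct demo_netdev *dev) { (void)dev; return 0; }
static int default_send(struct demo_netdev *dev, const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	dev->tx_packets++;
	return 0;
}

static const struct demo_accel_ops default_ops = {
	.open = default_open,
	.stop = default_stop,
	.send = default_send,
};

/* Use the HW-provided ops when given, fall back to the defaults otherwise. */
static void demo_netdev_init(struct demo_netdev *dev,
			     const struct demo_accel_ops *hw_ops)
{
	dev->ops = hw_ops ? hw_ops : &default_ops;
	dev->accelerated = hw_ops != NULL;
	dev->tx_packets = 0;
}

/* Common code never checks which implementation is behind the table. */
static int demo_xmit(struct demo_netdev *dev, const void *buf, size_t len)
{
	return dev->ops->send(dev, buf, len);
}
```

Choosing the table once at init time keeps the per-packet path free of "is it accelerated?" branches, which is the main reason to prefer a function table over scattered conditionals.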

TBD:
1. A few functions in the API might change, at least the send function,
   which is not going to use the ipoib_ah struct.
2. Currently I use separate functions for init/cleanup; perhaps later they
   will be pushed into the ndo_ops struct.
3. The low-level functions will have a new design that will reduce the
   use of exported functions from the mlx5_core layer to the ib layer.

Changes from v0:
---
1. Use the vnic/hfi API as a base for the new design/impl.
2. Change the low level driver to support the new struct. 


Erez Shitrit (25):
  IB/ipoib: Separate control and data related initializations
  IB/ipoib: separate control from HW operation on ipoib_open/stop ndo
  IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions
  IB/verb: Add ipoib_options struct and API
  IB/ipoib: Support ipoib acceleration options callbacks
  hw/mlx5: Add New bit to check over QP creation
  linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects
  net/mlx5e: Refactor EN code to support IB link
  net/mlx5e: Creating and Destroying flow-steering tables for IB link
  net/mlx5e: Support netdevice creation for IB link type
  net/mlx5e: Refactor attach_netdev API
  net/mlx5e: Use underlay_qpn in tis creation
  net/mlx5e: Export resource creation function to be used in IB link
  net/mlx5: Enable flow-steering for IB link
  net/mlx5e: Enhanced flow table creation to support ETH and IB links.
  net/mlx5e: Change cleanup API in order to enable IB link
  net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api
  net/mlx5e: Export open/close api for IB link
  include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode
  net/mlx5e: Refactor TX send flow
  net/mlx5e: Export send function for IB link type
  net/mlx5e: New function pointer for build_rx_skb is
  net/mlx5e: Change the function that checks the packet type
  net/mlx5e: Add support for build_rx_skb for packet from IB type
  mlx5_ib: skeleton for mlx5_ib to support ipoib_ops

 drivers/infiniband/hw/mlx5/Makefile            |   2 +-
 drivers/infiniband/hw/mlx5/main.c              |  10 +
 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c    | 289 ++
 drivers/infiniband/hw/mlx5/qp.c                |   5 +-
 drivers/infiniband/ulp/ipoib/ipoib.h           |  35 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  66 ++--
 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c   |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_fs.c        |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        | 316 +++
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 299 ++
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  39 +-
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c   |  12 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  62 +--
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h

[RFC v1 for accelerated IPoIB 02/25] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo

2017-03-13 Thread Erez Shitrit

This patch prepares the netdev part of the ipoib driver to be able
to use the ipoib_options.
It deals with the two flows from the ndo: ipoib_open and ipoib_stop.
It arranges that area of code as follows:
All operations which deal with the HW resources (for example changing the
QP state, post-receive, etc.) are done in one place.
All operations that are control oriented (like restarting the multicast
task, starting the reap_ah, etc.) are done in a separate place.

The functions that deal with the HW resources are now located in
__ipoib_ib_dev_open for the ipoib_open flow and __ipoib_ib_dev_stop for
the ipoib_stop flow.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h  |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   | 228 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   2 +-
 3 files changed, 129 insertions(+), 103 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 7cd9befd7d54..b3900b253ad5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -501,7 +501,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 int ipoib_ib_dev_open(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
-void ipoib_ib_dev_stop(struct net_device *dev);
+int ipoib_ib_dev_stop_default(struct net_device *dev);
 void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 3c0a35d883e2..5d732c5f01ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -693,24 +693,113 @@ static void ipoib_stop_ah(struct net_device *dev)
ipoib_flush_ah(dev);
 }
 
-void ipoib_ib_tx_timer_func(unsigned long ctx)
+static int recvs_pending(struct net_device *dev)
 {
-   drain_tx_cq((struct net_device *)ctx);
+   struct ipoib_dev_priv *priv = netdev_priv(dev);
+   int pending = 0;
+   int i;
+
+   for (i = 0; i < ipoib_recvq_size; ++i)
+   if (priv->rx_ring[i].skb)
+   ++pending;
+
+   return pending;
 }
 
-int ipoib_ib_dev_open(struct net_device *dev)
+int ipoib_ib_dev_stop_default(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   int ret;
+   struct ib_qp_attr qp_attr;
+   unsigned long begin;
+   struct ipoib_tx_buf *tx_req;
+   int i;
 
-   ipoib_pkey_dev_check_presence(dev);
+   if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
+   napi_disable(&priv->napi);
 
-   if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) {
-   ipoib_warn(priv, "P_Key 0x%04x is %s\n", priv->pkey,
-  (!(priv->pkey & 0x7fff) ? "Invalid" : "not found"));
-   return -1;
+   ipoib_cm_dev_stop(dev);
+
+   /*
+* Move our QP to the error state and then reinitialize in
+* when all work requests have completed or have been flushed.
+*/
+   qp_attr.qp_state = IB_QPS_ERR;
+   if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
+   ipoib_warn(priv, "Failed to modify QP to ERROR state\n");
+
+   /* Wait for all sends and receives to complete */
+   begin = jiffies;
+
+   while (priv->tx_head != priv->tx_tail || recvs_pending(dev)) {
+   if (time_after(jiffies, begin + 5 * HZ)) {
+   ipoib_warn(priv, "timing out; %d sends %d receives not completed\n",
+  priv->tx_head - priv->tx_tail, recvs_pending(dev));
+
+   /*
+* assume the HW is wedged and just free up
+* all our pending work requests.
+*/
+   while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
+   tx_req = &priv->tx_ring[priv->tx_tail &
+   (ipoib_sendq_size - 1)];
+   ipoib_dma_unmap_tx(priv, tx_req);
+   dev_kfree_skb_any(tx_req->skb);
+   ++priv->tx_tail;
+   --priv->tx_outstanding;
+   }
+
+   for (i = 0; i < ipoib_recvq_size; ++i) {
+   struct ipoib_rx_buf *rx_req;
+
+   rx_req = &priv->rx_ring[i];
+   if (!rx_req->skb)
+   continue;
+   ipoib_ud_dma_unmap_rx(priv,
+ priv->

Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink

2015-04-16 Thread Erez Shitrit
On Wed, Apr 15, 2015 at 7:06 PM, Jason Gunthorpe
<jguntho...@obsidianresearch.com> wrote:
 On Wed, Apr 15, 2015 at 09:17:14AM +0300, Erez Shitrit wrote:
 +   /* parent interface */
 +   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
 +   return dev->ifindex;
 +
 +   /* child/vlan interface */
 +   if (!priv->parent)
 +   return -1;

 Like was said for other drivers, I can't see how parent can be null
 while IPOIB_FLAG_SUBINTERFACE is set. Drop the last if.

 It can, at least for ipoib child interface (AKA vlan), you can't
 control the call for that ndo and it can be called before the parent
 was set.

 If the ndo can be called before the netdev private structures are fully
 prepared then we have another bug, and returning -1 or 0 is not the right
 answer anyhow.

 For safety, fold this into your patch.

OK, will do that.


 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 index 9fad7b5ac8b9..e62b007adf5d 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 /* MTU will be reset when mcast join happens */
 priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
 priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
 +   priv->parent = ppriv->dev;
 set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);

 result = ipoib_set_dev_features(priv, ppriv->ca);
 @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 goto register_failed;
 }

 -   priv->parent = ppriv->dev;
 -
 ipoib_create_debug_files(priv->dev);

 /* RTNL childs don't need proprietary sysfs entries */
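
To illustrate the ordering issue discussed in this thread, here is a small user-space model. It is a sketch only: the demo_* types and functions are hypothetical stand-ins for ipoib_dev_priv, ipoib_get_iflink() and __ipoib_vlan_add(), and the is_subinterface field models IPOIB_FLAG_SUBINTERFACE. The point it shows is that once the sub-interface flag is visible, the parent pointer must already be valid, because the iflink callback may run as soon as the device is registered.

```c
#include <stddef.h>

struct demo_dev {
	int ifindex;
};

struct demo_priv {
	struct demo_dev *dev;
	struct demo_dev *parent;	/* NULL for a parent interface */
	int is_subinterface;		/* models IPOIB_FLAG_SUBINTERFACE */
};

/* Parent interfaces report their own ifindex, children report the parent's. */
static int demo_get_iflink(const struct demo_priv *priv)
{
	if (!priv->is_subinterface)
		return priv->dev->ifindex;

	/* Safe only because parent is set before the flag, see below. */
	return priv->parent->ifindex;
}

/* Models the fixed ordering: set parent first, then flag, then register. */
static void demo_vlan_add(struct demo_priv *ppriv, struct demo_priv *priv)
{
	priv->parent = ppriv->dev;
	priv->is_subinterface = 1;
	/* registration would happen here; callbacks may run from now on */
}
```

With this ordering there is no window in which the flag is set and the parent pointer is still NULL, so the extra NULL check in the callback is not needed.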
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Erez Shitrit
Currently, iflink of the parent interface was always accessed, event 
when interface didn't have a parent and hence we rashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li <ho...@redhat.com>
Signed-off-by: Erez Shitrit <ere...@mellanox.com>
Signed-off-by: Honggang Li <ho...@redhat.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 657b89b..915ad04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+   /* parent interface */
+   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+   return dev->ifindex;
+
+   /* child/vlan interface */
return priv->parent->ifindex;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 4dd1313..fca1a88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
/* MTU will be reset when mcast join happens */
priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
+   priv->parent = ppriv->dev;
set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
 
result = ipoib_set_dev_features(priv, ppriv->ca);
@@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
goto register_failed;
}
 
-   priv->parent = ppriv->dev;
-
ipoib_create_debug_files(priv->dev);
 
/* RTNL childs don't need proprietary sysfs entries */
-- 
1.7.11.3



[PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Erez Shitrit
Currently, iflink of the parent interface was always accessed, even 
when interface didn't have a parent and hence we crashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li <ho...@redhat.com>
Signed-off-by: Erez Shitrit <ere...@mellanox.com>
Signed-off-by: Honggang Li <ho...@redhat.com>
---

changes from V0:
 - fixed two typos in the change-log

 drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 657b89b..915ad04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+   /* parent interface */
+   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+   return dev->ifindex;
+
+   /* child/vlan interface */
return priv->parent->ifindex;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 4dd1313..fca1a88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
/* MTU will be reset when mcast join happens */
priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
+   priv->parent = ppriv->dev;
set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
 
result = ipoib_set_dev_features(priv, ppriv->ca);
@@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
goto register_failed;
}
 
-   priv->parent = ppriv->dev;
-
ipoib_create_debug_files(priv->dev);
 
/* RTNL childs don't need proprietary sysfs entries */
-- 
1.7.11.3
