Re: [PATCH v2 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active

2016-12-06 Thread Saeed Mahameed
On Tue, Dec 6, 2016 at 8:27 PM, Martin KaFai Lau <ka...@fb.com> wrote:
> On Tue, Dec 06, 2016 at 06:50:47PM +0200, Saeed Mahameed wrote:
>> On Mon, Dec 5, 2016 at 9:55 PM, Martin KaFai Lau <ka...@fb.com> wrote:
>> > On Mon, Dec 05, 2016 at 02:54:06AM +0200, Saeed Mahameed wrote:
>> >> On Sun, Dec 4, 2016 at 5:17 AM, Martin KaFai Lau <ka...@fb.com> wrote:
>> >> > Reserve XDP_PACKET_HEADROOM and honor bpf_xdp_adjust_head()
>> >> > when XDP prog is active.  This patch only affects the code
>> >> > path when XDP is active.
>> >> >
>> >> > Signed-off-by: Martin KaFai Lau <ka...@fb.com>
>> >> > ---
>> >>
>> >> Hi Martin, sorry for the late review. I have some comments below.
>> >>
>> >> >  drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 +++--
>> >> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 23 +--
>> >> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c |  9 +
>> >> >  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  3 ++-
>> >> >  4 files changed, 39 insertions(+), 13 deletions(-)
>> >> >
>> >> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> >> > index 311c14153b8b..094a13b52cf6 100644
>> >> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> >> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> >> > @@ -51,7 +51,8 @@
>> >> >  #include "mlx4_en.h"
>> >> >  #include "en_port.h"
>> >> >
>> >> > -#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN)))
>> >> > +#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
>> >> > +  XDP_PACKET_HEADROOM))
>> >> >
>> >> >  int mlx4_en_setup_tc(struct net_device *dev, u8 up)
>> >> >  {
>> >> > @@ -1551,6 +1552,7 @@ int mlx4_en_start_port(struct net_device *dev)
>> >> > struct mlx4_en_tx_ring *tx_ring;
>> >> > int rx_index = 0;
>> >> > int err = 0;
>> >> > +   int mtu;
>> >> > int i, t;
>> >> > int j;
>> >> > u8 mc_list[16] = {0};
>> >> > @@ -1684,8 +1686,12 @@ int mlx4_en_start_port(struct net_device *dev)
>> >> > }
>> >> >
>> >> > /* Configure port */
>> >> > +   mtu = priv->rx_skb_size + ETH_FCS_LEN;
>> >> > +   if (priv->tx_ring_num[TX_XDP])
>> >> > +   mtu += XDP_PACKET_HEADROOM;
>> >> > +
>> >>
>> >> Why would the physical MTU care about the headroom you reserve for the
>> >> XDP prog?
>> >> This is the wire MTU; it shouldn't be changed, please keep it as
>> >> before. Any headroom you reserve in the packet buffers is needed only
>> >> for the FWD or modify case (the HW/wire should not care about it).
>> >
>> > Thanks for your feedback!
>>
>> Just doing my job :))
>>
>> >
>> > FWD:
>> > packet received from a port
>> > => process by a XDP prog
>> > => XDP_TX out to the same port.
>> >
>> > For example, if the received packet has 1500 payload and the XDP prog
>> > encapsulates it in an IPv6 header (+40 bytes).  After testing, it cannot
>> > be sent out due to the HW/wire MTU is 1500.
>> >
>> > Even if the wire MTU info were passed to the XDP prog, there is not
>> > much a XDP prog could do here other than drop the packet.
>> >
>> > Hence, this patch gives a guarantee to the XDP prog that it can always
>> > send out what it has received + XDP_PACKET_HEADROOM.
>> >
>>
>> Still, I am not convinced! This goes against common sense:
>> it means that the XDP prog can send packets larger than the MTU
>> seen on the netdev!
>>
>> Anyway, if a packet of size (MTU + XDP_PACKET_HEADROOM) was sent
>> from the XDP ring and the HW somehow allowed it to exit (with the code
>> you provided :)), it will most likely be dropped
>> at the other end.
> The MTU of our receiver side is larger than 1500.
>
> If the other side could not handle >1500, 

Re: [BUG] ethernet:mellanox:mlx5: Oops in health_recover get_nic_state(dev)

2017-03-28 Thread Saeed Mahameed
On Tue, Mar 28, 2017 at 2:45 AM, Goel, Sameer  wrote:
> Stack frame:
> [ 1744.418958] [] get_nic_state+0x24/0x40 [mlx5_core]
> [ 1744.425273] [] health_recover+0x28/0x80 [mlx5_core]
> [ 1744.431496] [] process_one_work+0x150/0x460
> [ 1744.437218] [] worker_thread+0x50/0x4b8
> [ 1744.442609] [] kthread+0xd8/0xf0
> [ 1744.447377] [] ret_from_fork+0x10/0x20
>
> Summary:
> This issue was seen on a QDF2400 system about 30 minutes into a speccpu
> 2006 run. During the test, a recoverable PCIe error was seen that gave
> the following log:
> [ 1673.170969] pcieport 0002:00:00.0: aer_status: 0x4000, aer_mask: 0x0040
> [ 1673.177961] pcieport 0002:00:00.0: aer_layer=Transaction Layer, aer_agent=Requester ID
> [ 1673.185832] pcieport 0002:00:00.0: aer_uncor_severity: 0x00462030
> [ 1675.536391] mlx5_core 0002:01:00.0: assert_var[0] 0x
> [ 1675.541093] mlx5_core 0002:01:00.0: assert_var[1] 0x
> [ 1675.546750] mlx5_core 0002:01:00.0: assert_var[2] 0x
> [ 1675.552377] mlx5_core 0002:01:00.0: assert_var[3] 0x
> [ 1675.558040] mlx5_core 0002:01:00.0: assert_var[4] 0x
> [ 1675.563661] mlx5_core 0002:01:00.0: assert_exit_ptr 0x
> [ 1675.569488] mlx5_core 0002:01:00.0: assert_callra 0x
> [ 1675.575120] mlx5_core 0002:01:00.0: fw_ver 15.4095.65535
> [ 1675.580426] mlx5_core 0002:01:00.0: hw_id 0x
> [ 1675.585363] mlx5_core 0002:01:00.0: irisc_index 255
> [ 1675.590242] mlx5_core 0002:01:00.0: synd 0xff: unrecognized error
> [ 1675.596301] mlx5_core 0002:01:00.0: ext_synd 0x
> [ 1675.601209] mlx5_core 0002:01:00.0: mlx5_enter_error_state:120:(pid 7205): start
> [ 1675.608613] mlx5_core 0002:01:00.0: mlx5_enter_error_state:127:(pid 7205): end
>
> After the above log we see the above stack frame and a page fault due to
> an invalid dev pointer.
>
> So the recovery work is queued and the timer is stopped. Somehow the
> workqueue is not cleared, and when it runs the dev pointer is invalid.
>
> This issue was difficult to repro and was seen only once in multiple
> runs on a specific device.

Hi Sameer,

Thanks for the report,
adding more relevant people.

Mohamad/Daniel, does the above ring a bell?
Can you check?

Thanks
Saeed.


[PATCH net 1/1] net/mlx5: Avoid dereferencing uninitialized pointer

2017-03-28 Thread Saeed Mahameed
From: Talat Batheesh <tal...@mellanox.com>

In the NETDEV_CHANGEUPPER event, the upper_info field is valid
only when linking is true. Otherwise it should be ignored.

Fixes: 7907f23adc18 (net/mlx5: Implement RoCE LAG feature)
Signed-off-by: Talat Batheesh <tal...@mellanox.com>
Reviewed-by: Aviv Heller <av...@mellanox.com>
Reviewed-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---

Hi Dave,

I would appreciate it if you queued up this patch for v4.9-stable.

Thanks in advance,
Saeed.

 drivers/net/ethernet/mellanox/mlx5/core/lag.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index 55957246c0e8..b5d5519542e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -294,7 +294,7 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev,
 struct netdev_notifier_changeupper_info *info)
 {
struct net_device *upper = info->upper_dev, *ndev_tmp;
-   struct netdev_lag_upper_info *lag_upper_info;
+   struct netdev_lag_upper_info *lag_upper_info = NULL;
bool is_bonded;
int bond_status = 0;
int num_slaves = 0;
@@ -303,7 +303,8 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev,
if (!netif_is_lag_master(upper))
return 0;
 
-   lag_upper_info = info->upper_info;
+   if (info->linking)
+   lag_upper_info = info->upper_info;
 
/* The event may still be of interest if the slave does not belong to
 * us, but is enslaved to a master which has one or more of our netdevs
-- 
2.11.0



Re: [PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization

2017-03-26 Thread Saeed Mahameed
On Sat, Mar 25, 2017 at 6:54 PM, Tom Herbert <t...@herbertland.com> wrote:
> On Fri, Mar 24, 2017 at 2:52 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
>> Hi Dave,
>>
>> This series provides some performance optimizations for the mlx5e
>> driver, especially for XDP TX flows.
>>
>> 1st patch is a simple change of rmb to dma_rmb in CQE fetch routine
>> which shows a huge gain for both RX and TX packet rates.
>>
>> 2nd patch removes write combining logic from the driver TX handler
>> and simplifies the TX logic while improving TX CPU utilization.
>>
>> All other patches combined provide some refactoring to the driver TX
>> flows to allow some significant XDP TX improvements.
>>
>> More details and performance numbers per patch can be found in each patch
>> commit message compared to the preceding patch.
>>
>> Overall performance improvements
>>   System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>
>> Test case   Baseline  Now  improvement
>> ---
>> TX packets (24 threads) 45Mpps54Mpps  20%
>> TC stack Drop (1 core)  3.45Mpps  3.6Mpps 5%
>> XDP Drop  (1 core)  14Mpps16.9Mpps20%
>> XDP TX(1 core)  10.4Mpps  13.7Mpps31%
>>
> Awesome, and good timing. I'll be presenting XDP at IETF next and
> would like to include these numbers in the presentation if you don't
> mind...
>

Not at all, please go ahead.

But as you see, the system I tested on is not that powerful. We can
get even better results with a modern system.
If you want, I can provide you those numbers by mid-week.


[PATCH net-next 12/12] net/mlx5e: Different SQ types

2017-03-24 Thread Saeed Mahameed
Different SQ types (tx, xdp, ico) are growing apart; separate them
and remove the unwanted parts from each one, to simplify the data path
and make better use of the data cache.

Remove the db union from the SQ structures, since it is no longer
needed now that we have a different SQ data type for each SQ.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  99 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 464 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  33 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |  50 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |   2 +-
 5 files changed, 392 insertions(+), 256 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 50f895fa5f31..bace9233dc1f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -319,13 +319,7 @@ struct mlx5e_sq_wqe_info {
u8  num_wqebbs;
 };
 
-enum mlx5e_sq_type {
-   MLX5E_SQ_TXQ,
-   MLX5E_SQ_ICO,
-   MLX5E_SQ_XDP
-};
-
-struct mlx5e_sq {
+struct mlx5e_txqsq {
/* data path */
 
/* dirtied @completion */
@@ -339,18 +333,11 @@ struct mlx5e_sq {
 
struct mlx5e_cq        cq;
 
-   /* pointers to per tx element info: write@xmit, read@completion */
-   union {
-   struct {
-   struct sk_buff   **skb;
-   struct mlx5e_sq_dma   *dma_fifo;
-   struct mlx5e_tx_wqe_info  *wqe_info;
-   } txq;
-   struct mlx5e_sq_wqe_info *ico_wqe;
-   struct {
-   struct mlx5e_dma_info *di;
-   bool   doorbell;
-   } xdp;
+   /* write@xmit, read@completion */
+   struct {
+   struct sk_buff   **skb;
+   struct mlx5e_sq_dma   *dma_fifo;
+   struct mlx5e_tx_wqe_info  *wqe_info;
} db;
 
/* read only */
@@ -372,7 +359,67 @@ struct mlx5e_sq {
struct mlx5e_channel  *channel;
inttc;
u32rate_limit;
-   u8 type;
+} cacheline_aligned_in_smp;
+
+struct mlx5e_xdpsq {
+   /* data path */
+
+   /* dirtied @rx completion */
+   u16cc;
+   u16pc;
+
+   struct mlx5e_cq        cq;
+
+   /* write@xmit, read@completion */
+   struct {
+   struct mlx5e_dma_info *di;
+   bool   doorbell;
+   } db;
+
+   /* read only */
+   struct mlx5_wq_cyc wq;
+   void __iomem  *uar_map;
+   u32sqn;
+   struct device *pdev;
+   __be32 mkey_be;
+   u8 min_inline_mode;
+   unsigned long  state;
+
+   /* control path */
+   struct mlx5_wq_ctrl    wq_ctrl;
+   struct mlx5e_channel  *channel;
+} cacheline_aligned_in_smp;
+
+struct mlx5e_icosq {
+   /* data path */
+
+   /* dirtied @completion */
+   u16cc;
+
+   /* dirtied @xmit */
+   u16pc cacheline_aligned_in_smp;
+   u32dma_fifo_pc;
+   u16prev_cc;
+
+   struct mlx5e_cq        cq;
+
+   /* write@xmit, read@completion */
+   struct {
+   struct mlx5e_sq_wqe_info *ico_wqe;
+   } db;
+
+   /* read only */
+   struct mlx5_wq_cyc wq;
+   void __iomem  *uar_map;
+   u32sqn;
+   u16edge;
+   struct device *pdev;
+   __be32 mkey_be;
+   unsigned long  state;
+
+   /* control path */
+   struct mlx5_wq_ctrl    wq_ctrl;
+   struct mlx5e_channel  *channel;
 } cacheline_aligned_in_smp;
 
 static inline bool
@@ -477,7 +524,7 @@ struct mlx5e_rq {
 
/* XDP */
struct bpf_prog   *xdp_prog;
-   struct mlx5e_sq        xdpsq;
+   struct mlx5e_xdpsq xdpsq;
 
/* control */
struct mlx5_wq_ctrl    wq_ctrl;
@@ -497,8 +544,8 @@ enum channel_flags {
 struct mlx5e_channel {
/* data path */
struct mlx5e_rq        rq;
-   struct mlx5e_sq        sq[MLX5E_MAX_NUM_TC];
-   struct mlx5e_sq        icosq;   /* internal control operations */
+   struct mlx5e_txqsq sq[MLX5E_MAX_NUM_TC];
+   struct mlx5e_icosq icosq;   /* internal control operations */
bool   xdp;
struct napi_struct napi;
struct device *pde

[PATCH net-next 03/12] net/mlx5e: Single bfreg (UAR) for all mlx5e SQs and netdevs

2017-03-24 Thread Saeed Mahameed
One bfreg is sufficient, since Blue Flame is not supported anymore.
This will also come in handy for switchdev mode to save resources, since
the VF representors will use the same single UAR for their own SQs.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h|  1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c |  9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 17 +++--
 include/linux/mlx5/driver.h |  1 +
 4 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 85261e9ccf4a..22e4bad03f05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -483,7 +483,6 @@ struct mlx5e_sq {
 
/* control path */
struct mlx5_wq_ctrlwq_ctrl;
-   struct mlx5_sq_bfreg   bfreg;
struct mlx5e_channel  *channel;
inttc;
u32rate_limit;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index bd898d8deda0..20bdbe685795 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -107,10 +107,18 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev)
goto err_dealloc_transport_domain;
}
 
+   err = mlx5_alloc_bfreg(mdev, &res->bfreg, false, false);
+   if (err) {
+   mlx5_core_err(mdev, "alloc bfreg failed, %d\n", err);
+   goto err_destroy_mkey;
+   }
+
INIT_LIST_HEAD(&mdev->mlx5e_res.td.tirs_list);
 
return 0;
 
+err_destroy_mkey:
+   mlx5_core_destroy_mkey(mdev, &res->mkey);
 err_dealloc_transport_domain:
mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
 err_dealloc_pd:
@@ -122,6 +130,7 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev)
 {
struct mlx5e_resources *res = &mdev->mlx5e_res;
 
+   mlx5_free_bfreg(mdev, &res->bfreg);
mlx5_core_destroy_mkey(mdev, &res->mkey);
mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
mlx5_core_dealloc_pd(mdev, res->pdn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f9bcbd277adb..49c1769d13b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1016,18 +1016,14 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
sq->mkey_be   = c->mkey_be;
sq->channel   = c;
sq->tc= tc;
+   sq->uar_map   = mdev->mlx5e_res.bfreg.map;
 
-   err = mlx5_alloc_bfreg(mdev, &sq->bfreg, false, false);
-   if (err)
-   return err;
-
-   sq->uar_map = sq->bfreg.map;
param->wq.db_numa_node = cpu_to_node(c->cpu);
 
err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, &sq->wq,
 &sq->wq_ctrl);
if (err)
-   goto err_unmap_free_uar;
+   return err;
 
sq->wq.db   = &sq->wq.db[MLX5_SND_DBR];
 
@@ -1053,20 +1049,13 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
 err_sq_wq_destroy:
mlx5_wq_destroy(&sq->wq_ctrl);
 
-err_unmap_free_uar:
-   mlx5_free_bfreg(mdev, &sq->bfreg);
-
return err;
 }
 
 static void mlx5e_destroy_sq(struct mlx5e_sq *sq)
 {
-   struct mlx5e_channel *c = sq->channel;
-   struct mlx5e_priv *priv = c->priv;
-
mlx5e_free_sq_db(sq);
mlx5_wq_destroy(&sq->wq_ctrl);
-   mlx5_free_bfreg(priv->mdev, &sq->bfreg);
 }
 
 static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
@@ -1103,7 +1092,7 @@ static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
MLX5_SET(sqc,  sqc, tis_lst_sz, param->type == MLX5E_SQ_ICO ? 0 : 1);
 
MLX5_SET(wq,   wq, wq_type,   MLX5_WQ_TYPE_CYCLIC);
-   MLX5_SET(wq,   wq, uar_page,  sq->bfreg.index);
+   MLX5_SET(wq,   wq, uar_page,  mdev->mlx5e_res.bfreg.index);
MLX5_SET(wq,   wq, log_wq_pg_sz,  sq->wq_ctrl.buf.page_shift -
  MLX5_ADAPTER_PAGE_SHIFT);
MLX5_SET64(wq, wq, dbr_addr,  sq->wq_ctrl.db.dma);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 2fcff6b4503f..f50864626230 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -728,6 +728,7 @@ struct mlx5e_resources {
u32pdn;
struct mlx5_td td;
struct mlx5_core_mkey  mkey;
+   struct mlx5_sq_bfreg   bfreg;
 };
 
 struct mlx5_core_dev {
-- 
2.11.0



[PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization

2017-03-24 Thread Saeed Mahameed
Hi Dave,

This series provides some performance optimizations for the mlx5e
driver, especially for XDP TX flows.

1st patch is a simple change of rmb to dma_rmb in CQE fetch routine
which shows a huge gain for both RX and TX packet rates.

2nd patch removes write combining logic from the driver TX handler
and simplifies the TX logic while improving TX CPU utilization.

All other patches combined provide some refactoring to the driver TX
flows to allow some significant XDP TX improvements.

More details and performance numbers per patch can be found in each patch
commit message compared to the preceding patch.

Overall performance improvements
  System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case   Baseline  Now  improvement
---
TX packets (24 threads) 45Mpps54Mpps  20%
TC stack Drop (1 core)  3.45Mpps  3.6Mpps 5%
XDP Drop  (1 core)  14Mpps16.9Mpps20%
XDP TX(1 core)  10.4Mpps  13.7Mpps31%

Thanks,
Saeed.

Saeed Mahameed (12):
  net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routine
  net/mlx5e: Xmit, no write combining
  net/mlx5e: Single bfreg (UAR) for all mlx5e SQs and netdevs
  net/mlx5e: Move XDP completion functions to rx file
  net/mlx5e: Move mlx5e_rq struct declaration
  net/mlx5e: Move XDP SQ instance into RQ
  net/mlx5e: Poll XDP TX CQ before RX CQ
  net/mlx5e: Optimize XDP frame xmit
  net/mlx5e: Generalize tx helper functions for different SQ types
  net/mlx5e: Proper names for SQ/RQ/CQ functions
  net/mlx5e: Generalize SQ create/modify/destroy functions
  net/mlx5e: Different SQ types

 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 319 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|   9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 644 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 124 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 147 +
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  70 +--
 include/linux/mlx5/driver.h|   1 +
 7 files changed, 716 insertions(+), 598 deletions(-)

-- 
2.11.0



[PATCH net-next 09/12] net/mlx5e: Generalize tx helper functions for different SQ types

2017-03-24 Thread Saeed Mahameed
In the next patches we will introduce different SQ types. In preparation,
generalize some TX helper functions to work with more basic SQ parameters,
so that they can be reused for the different SQ types.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 35 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 10 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   | 37 +--
 4 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f02d2cb8d148..50f895fa5f31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -375,10 +375,10 @@ struct mlx5e_sq {
u8 type;
 } cacheline_aligned_in_smp;
 
-static inline bool mlx5e_sq_has_room_for(struct mlx5e_sq *sq, u16 n)
+static inline bool
+mlx5e_wqc_has_room_for(struct mlx5_wq_cyc *wq, u16 cc, u16 pc, u16 n)
 {
-   return (((sq->wq.sz_m1 & (sq->cc - sq->pc)) >= n) ||
-   (sq->cc  == sq->pc));
+   return (((wq->sz_m1 & (cc - pc)) >= n) || (cc == pc));
 }
 
 struct mlx5e_dma_info {
@@ -721,7 +721,6 @@ struct mlx5e_priv {
 
 void mlx5e_build_ptys2ethtool_map(void);
 
-void mlx5e_send_nop(struct mlx5e_sq *sq, bool notify_hw);
 u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
   void *accel_priv, select_queue_fallback_t fallback);
 netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev);
@@ -807,20 +806,40 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 u8 cq_period_mode);
 void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type);
 
-static inline void
-mlx5e_tx_notify_hw(struct mlx5e_sq *sq, struct mlx5_wqe_ctrl_seg *ctrl)
+static inline
+struct mlx5e_tx_wqe *mlx5e_post_nop(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc)
 {
+   u16 pi   = *pc & wq->sz_m1;
+   struct mlx5e_tx_wqe *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+   struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
+
+   memset(cseg, 0, sizeof(*cseg));
+
+   cseg->opmod_idx_opcode = cpu_to_be32((*pc << 8) | MLX5_OPCODE_NOP);
+   cseg->qpn_ds   = cpu_to_be32((sqn << 8) | 0x01);
+
+   (*pc)++;
+
+   return wqe;
+}
+
+static inline
+void mlx5e_notify_hw(struct mlx5_wq_cyc *wq, u16 pc,
+void __iomem *uar_map,
+struct mlx5_wqe_ctrl_seg *ctrl)
+{
+   ctrl->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
/* ensure wqe is visible to device before updating doorbell record */
dma_wmb();
 
-   *sq->wq.db = cpu_to_be32(sq->pc);
+   *wq->db = cpu_to_be32(pc);
 
/* ensure doorbell record is visible to device before ringing the
 * doorbell
 */
wmb();
 
-   mlx5_write64((__be32 *)ctrl, sq->uar_map, NULL);
+   mlx5_write64((__be32 *)ctrl, uar_map, NULL);
 }
 
 static inline void mlx5e_cq_arm(struct mlx5e_cq *cq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d39ee6669b8e..7faf2bcccfa6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -847,6 +847,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
 {
struct mlx5e_sq *sq = &c->icosq;
u16 pi = sq->pc & sq->wq.sz_m1;
+   struct mlx5e_tx_wqe *nopwqe;
int err;
 
err = mlx5e_create_rq(c, param, rq);
@@ -867,8 +868,9 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
 
sq->db.ico_wqe[pi].opcode = MLX5_OPCODE_NOP;
sq->db.ico_wqe[pi].num_wqebbs = 1;
-   mlx5e_send_nop(sq, true); /* trigger mlx5e_post_rx_wqes() */
-
+   nopwqe = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc);
+   mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, &nopwqe->ctrl);
+   sq->stats.nop++; /* TODO no need for SQ stats in ico */
return 0;
 
 err_disable_rq:
@@ -1202,9 +1204,12 @@ static void mlx5e_close_sq(struct mlx5e_sq *sq)
netif_tx_disable_queue(sq->txq);
 
/* last doorbell out, godspeed .. */
-   if (mlx5e_sq_has_room_for(sq, 1)) {
+   if (mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, 1)) {
+   struct mlx5e_tx_wqe *nop;
+
sq->db.txq.skb[(sq->pc & sq->wq.sz_m1)] = NULL;
-   mlx5e_send_nop(sq, true);
+   nop = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc);
+   mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, &nop->ctrl);
 

[PATCH net-next 08/12] net/mlx5e: Optimize XDP frame xmit

2017-03-24 Thread Saeed Mahameed
The XDP SQ has a fixed-size WQE (MLX5E_XDP_TX_WQEBBS = 1) and only posts
one kind of WQE (MLX5_OPCODE_SEND).

Also, we initialize the static fields of the SQ descriptors once at
open_xdpsq time, rather than every time on the critical path.

Optimize the code in light of those facts and add a prefetch of the TX
descriptor first thing in the xdp xmit function.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case  Before Nowimprovement
---
XDP TX   (1 core)  13Mpps13.7Mpps   5%

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  5 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 43 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 41 ++---
 3 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5e4ae94c9f6a..f02d2cb8d148 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -116,12 +116,8 @@
(DIV_ROUND_UP(sizeof(struct mlx5e_umr_wqe), MLX5_SEND_WQE_BB))
 
 #define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
-#define MLX5E_XDP_IHS_DS_COUNT \
-   DIV_ROUND_UP(MLX5E_XDP_MIN_INLINE - 2, MLX5_SEND_WQE_DS)
 #define MLX5E_XDP_TX_DS_COUNT \
((sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS) + 1 /* SG DS */)
-#define MLX5E_XDP_TX_WQEBBS \
-   DIV_ROUND_UP(MLX5E_XDP_TX_DS_COUNT, MLX5_SEND_WQEBB_NUM_DS)
 
 #define MLX5E_NUM_MAIN_GROUPS 9
 
@@ -352,7 +348,6 @@ struct mlx5e_sq {
} txq;
struct mlx5e_sq_wqe_info *ico_wqe;
struct {
-   struct mlx5e_sq_wqe_info  *wqe_info;
struct mlx5e_dma_info *di;
bool   doorbell;
} xdp;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 210033187bfe..d39ee6669b8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -894,7 +894,6 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq)
 static void mlx5e_free_sq_xdp_db(struct mlx5e_sq *sq)
 {
kfree(sq->db.xdp.di);
-   kfree(sq->db.xdp.wqe_info);
 }
 
 static int mlx5e_alloc_sq_xdp_db(struct mlx5e_sq *sq, int numa)
@@ -903,9 +902,7 @@ static int mlx5e_alloc_sq_xdp_db(struct mlx5e_sq *sq, int numa)
 
sq->db.xdp.di = kzalloc_node(sizeof(*sq->db.xdp.di) * wq_sz,
 GFP_KERNEL, numa);
-   sq->db.xdp.wqe_info = kzalloc_node(sizeof(*sq->db.xdp.wqe_info) * wq_sz,
-  GFP_KERNEL, numa);
-   if (!sq->db.xdp.di || !sq->db.xdp.wqe_info) {
+   if (!sq->db.xdp.di) {
mlx5e_free_sq_xdp_db(sq);
return -ENOMEM;
}
@@ -993,7 +990,7 @@ static int mlx5e_sq_get_max_wqebbs(u8 sq_type)
case MLX5E_SQ_ICO:
return MLX5E_ICOSQ_MAX_WQEBBS;
case MLX5E_SQ_XDP:
-   return MLX5E_XDP_TX_WQEBBS;
+   return 1;
}
return MLX5_SEND_WQE_MAX_WQEBBS;
 }
@@ -1513,6 +1510,40 @@ static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
  MLX5E_MAX_NUM_CHANNELS);
 }
 
+static int mlx5e_open_xdpsq(struct mlx5e_channel *c,
+   struct mlx5e_sq_param *param,
+   struct mlx5e_sq *sq)
+{
+   unsigned int ds_cnt = MLX5E_XDP_TX_DS_COUNT;
+   unsigned int inline_hdr_sz = 0;
+   int err;
+   int i;
+
+   err = mlx5e_open_sq(c, 0, param, sq);
+   if (err)
+   return err;
+
+   if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
+   inline_hdr_sz = MLX5E_XDP_MIN_INLINE;
+   ds_cnt++;
+   }
+
+   /* Pre initialize fixed WQE fields */
+   for (i = 0; i < mlx5_wq_cyc_get_size(&sq->wq); i++) {
+   struct mlx5e_tx_wqe  *wqe  = mlx5_wq_cyc_get_wqe(&sq->wq, i);
+   struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
+   struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
+   struct mlx5_wqe_data_seg *dseg;
+
+   cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | ds_cnt);
+   eseg->inline_hdr.sz = cpu_to_be16(inline_hdr_sz);
+
+   dseg = (struct mlx5_wqe_data_seg *)cseg + (ds_cnt - 1);
+   dseg->lkey = sq->mkey_be;
+   }
+   return 0;
+}
+
 static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
  struct mlx5e_channel_param *cparam,
  struct mlx5e_channel **cp)
@@ -15

[PATCH net-next 02/12] net/mlx5e: Xmit, no write combining

2017-03-24 Thread Saeed Mahameed
mlx5e netdev Blue Flame (write combining) support demands a lot of
overhead for a small latency gain in some special cases, and this
overhead hurts the common case.

Here we remove xmit Blue Flame support by creating all bfregs with no
write combining for all SQs, and we remove a lot of BF logic and
conditions from the xmit data path.

Simplify mlx5e_tx_notify_hw (doorbell function) by removing BF related
code and by removing one memory barrier needed for WC mapped SQ doorbell
buffers, which no longer exist.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case   Before  Now  improvement
---
TX packets (24 threads) 50Mpps  54Mpps8%

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 20 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  6 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  4 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   | 42 ++-
 4 files changed, 9 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index dc52053128bc..85261e9ccf4a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -111,7 +111,6 @@
#define MLX5E_MAX_NUM_SQS  (MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC)
 #define MLX5E_TX_CQ_POLL_BUDGET128
 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
-#define MLX5E_SQ_BF_BUDGET 16
 
 #define MLX5E_ICOSQ_MAX_WQEBBS \
(DIV_ROUND_UP(sizeof(struct mlx5e_umr_wqe), MLX5_SEND_WQE_BB))
@@ -426,7 +425,6 @@ struct mlx5e_sq_dma {
 
 enum {
MLX5E_SQ_STATE_ENABLED,
-   MLX5E_SQ_STATE_BF_ENABLE,
 };
 
 struct mlx5e_sq_wqe_info {
@@ -450,9 +448,6 @@ struct mlx5e_sq {
/* dirtied @xmit */
u16pc cacheline_aligned_in_smp;
u32dma_fifo_pc;
-   u16bf_offset;
-   u16prev_cc;
-   u8 bf_budget;
struct mlx5e_sq_stats  stats;
 
struct mlx5e_cq        cq;
@@ -478,7 +473,6 @@ struct mlx5e_sq {
void __iomem  *uar_map;
struct netdev_queue   *txq;
u32sqn;
-   u16bf_buf_size;
u16max_inline;
u8 min_inline_mode;
u16edge;
@@ -818,11 +812,9 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 u8 cq_period_mode);
 void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type);
 
-static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq,
- struct mlx5_wqe_ctrl_seg *ctrl, int bf_sz)
+static inline void
+mlx5e_tx_notify_hw(struct mlx5e_sq *sq, struct mlx5_wqe_ctrl_seg *ctrl)
 {
-   u16 ofst = sq->bf_offset;
-
/* ensure wqe is visible to device before updating doorbell record */
dma_wmb();
 
@@ -832,14 +824,8 @@ static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq,
 * doorbell
 */
wmb();
-   if (bf_sz)
-   __iowrite64_copy(sq->uar_map + ofst, ctrl, bf_sz);
-   else
-   mlx5_write64((__be32 *)ctrl, sq->uar_map + ofst, NULL);
-   /* flush the write-combining mapped buffer */
-   wmb();
 
-   sq->bf_offset ^= sq->bf_buf_size;
+   mlx5_write64((__be32 *)ctrl, sq->uar_map, NULL);
 }
 
 static inline void mlx5e_cq_arm(struct mlx5e_cq *cq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ddd7464c6b45..f9bcbd277adb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1017,7 +1017,7 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
sq->channel   = c;
sq->tc= tc;
 
-   err = mlx5_alloc_bfreg(mdev, >bfreg, MLX5_CAP_GEN(mdev, bf), false);
+   err = mlx5_alloc_bfreg(mdev, >bfreg, false, false);
if (err)
return err;
 
@@ -1030,10 +1030,7 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
goto err_unmap_free_uar;
 
sq->wq.db   = &sq->wq.db[MLX5_SND_DBR];
-   if (sq->bfreg.wc)
-   set_bit(MLX5E_SQ_STATE_BF_ENABLE, &sq->state);
 
-   sq->bf_buf_size = (1 << MLX5_CAP_GEN(mdev, log_bf_reg_size)) / 2;
sq->max_inline  = param->max_inline;
sq->min_inline_mode = param->min_inline_mode;
 
@@ -1050,7 +1047,6 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
}
 
sq

[PATCH net-next 05/12] net/mlx5e: Move mlx5e_rq struct declaration

2017-03-24 Thread Saeed Mahameed
Move struct mlx5e_rq and friends to appear after mlx5e_sq declaration in
en.h.

We will need this for next patch to move the mlx5e_sq instance into
mlx5e_rq struct for XDP SQs.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 213 +--
 1 file changed, 105 insertions(+), 108 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index fce0eca0701c..8d789a25a1c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -297,19 +297,113 @@ struct mlx5e_cq {
struct mlx5_frag_wq_ctrl   wq_ctrl;
 } ____cacheline_aligned_in_smp;
 
-struct mlx5e_rq;
-typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq *rq,
-  struct mlx5_cqe64 *cqe);
-typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq *rq, struct mlx5e_rx_wqe 
*wqe,
- u16 ix);
+struct mlx5e_tx_wqe_info {
+   u32 num_bytes;
+   u8  num_wqebbs;
+   u8  num_dma;
+};
+
+enum mlx5e_dma_map_type {
+   MLX5E_DMA_MAP_SINGLE,
+   MLX5E_DMA_MAP_PAGE
+};
+
+struct mlx5e_sq_dma {
+   dma_addr_t  addr;
+   u32 size;
+   enum mlx5e_dma_map_type type;
+};
+
+enum {
+   MLX5E_SQ_STATE_ENABLED,
+};
+
+struct mlx5e_sq_wqe_info {
+   u8  opcode;
+   u8  num_wqebbs;
+};
 
-typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq *rq, u16 ix);
+enum mlx5e_sq_type {
+   MLX5E_SQ_TXQ,
+   MLX5E_SQ_ICO,
+   MLX5E_SQ_XDP
+};
+
+struct mlx5e_sq {
+   /* data path */
+
+   /* dirtied @completion */
+   u16cc;
+   u32dma_fifo_cc;
+
+   /* dirtied @xmit */
+   u16pc ____cacheline_aligned_in_smp;
+   u32dma_fifo_pc;
+   struct mlx5e_sq_stats  stats;
+
+   struct mlx5e_cqcq;
+
+   /* pointers to per tx element info: write@xmit, read@completion */
+   union {
+   struct {
+   struct sk_buff   **skb;
+   struct mlx5e_sq_dma   *dma_fifo;
+   struct mlx5e_tx_wqe_info  *wqe_info;
+   } txq;
+   struct mlx5e_sq_wqe_info *ico_wqe;
+   struct {
+   struct mlx5e_sq_wqe_info  *wqe_info;
+   struct mlx5e_dma_info *di;
+   bool   doorbell;
+   } xdp;
+   } db;
+
+   /* read only */
+   struct mlx5_wq_cyc wq;
+   u32dma_fifo_mask;
+   void __iomem  *uar_map;
+   struct netdev_queue   *txq;
+   u32sqn;
+   u16max_inline;
+   u8 min_inline_mode;
+   u16edge;
+   struct device *pdev;
+   struct mlx5e_tstamp   *tstamp;
+   __be32 mkey_be;
+   unsigned long  state;
+
+   /* control path */
+   struct mlx5_wq_ctrlwq_ctrl;
+   struct mlx5e_channel  *channel;
+   inttc;
+   u32rate_limit;
+   u8 type;
+} ____cacheline_aligned_in_smp;
+
+static inline bool mlx5e_sq_has_room_for(struct mlx5e_sq *sq, u16 n)
+{
+   return (((sq->wq.sz_m1 & (sq->cc - sq->pc)) >= n) ||
+   (sq->cc  == sq->pc));
+}
 
 struct mlx5e_dma_info {
struct page *page;
dma_addr_t  addr;
 };
 
+struct mlx5e_umr_dma_info {
+   __be64*mtt;
+   dma_addr_t mtt_addr;
+   struct mlx5e_dma_info  dma_info[MLX5_MPWRQ_PAGES_PER_WQE];
+   struct mlx5e_umr_wqe   wqe;
+};
+
+struct mlx5e_mpw_info {
+   struct mlx5e_umr_dma_info umr;
+   u16 consumed_strides;
+   u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
+};
+
 struct mlx5e_rx_am_stats {
int ppms; /* packets per msec */
int epms; /* events per msec */
@@ -346,6 +440,11 @@ struct mlx5e_page_cache {
struct mlx5e_dma_info page_cache[MLX5E_CACHE_SIZE];
 };
 
+struct mlx5e_rq;
+typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*);
+typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq*, struct mlx5e_rx_wqe*, u16);
+typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq*, u16);
+
 struct mlx5e_rq {
/* data path */
struct mlx5_wq_ll  wq;
@@ -393,108 +492,6 @@ struct mlx5e_rq {
struct mlx5_core_mkey  umr_mkey;
 } ____cacheline_aligned_in_smp;
 
-struct mlx5e_umr_dma_info {
-   __be64*mtt;
-   dma_addr_t mtt_addr;
-   struct mlx5e_dma_info  dma_info[MLX5_MPWRQ_PAGES_PER_

[PATCH net-next 10/12] net/mlx5e: Proper names for SQ/RQ/CQ functions

2017-03-24 Thread Saeed Mahameed
Rename mlx5e_{create,destroy}_{sq,rq,cq} to
mlx5e_{alloc,free}_{sq,rq,cq}.

Rename mlx5e_{enable,disable}_{sq,rq,cq} to
mlx5e_{create,destroy}_{sq,rq,cq}.

mlx5e_{enable,disable}_{sq,rq,cq} used to actually create/destroy the SQ
in FW, so we rename them to align the functions names with FW semantics.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 126 +++---
 1 file changed, 63 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7faf2bcccfa6..d03afa535064 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -539,9 +539,9 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, &rq->umr_mkey);
 }
 
-static int mlx5e_create_rq(struct mlx5e_channel *c,
-  struct mlx5e_rq_param *param,
-  struct mlx5e_rq *rq)
+static int mlx5e_alloc_rq(struct mlx5e_channel *c,
+ struct mlx5e_rq_param *param,
+ struct mlx5e_rq *rq)
 {
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
@@ -674,7 +674,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
return err;
 }
 
-static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
+static void mlx5e_free_rq(struct mlx5e_rq *rq)
 {
int i;
 
@@ -699,7 +699,7 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
mlx5_wq_destroy(&rq->wq_ctrl);
 }
 
-static int mlx5e_enable_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
+static int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
 {
struct mlx5e_priv *priv = rq->priv;
struct mlx5_core_dev *mdev = priv->mdev;
@@ -798,7 +798,7 @@ static int mlx5e_modify_rq_vsd(struct mlx5e_rq *rq, bool 
vsd)
return err;
 }
 
-static void mlx5e_disable_rq(struct mlx5e_rq *rq)
+static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
 {
mlx5_core_destroy_rq(rq->priv->mdev, rq->rqn);
 }
@@ -850,18 +850,18 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
struct mlx5e_tx_wqe *nopwqe;
int err;
 
-   err = mlx5e_create_rq(c, param, rq);
+   err = mlx5e_alloc_rq(c, param, rq);
if (err)
return err;
 
-   err = mlx5e_enable_rq(rq, param);
+   err = mlx5e_create_rq(rq, param);
if (err)
-   goto err_destroy_rq;
+   goto err_free_rq;
 
set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
if (err)
-   goto err_disable_rq;
+   goto err_destroy_rq;
 
if (param->am_enabled)
set_bit(MLX5E_RQ_STATE_AM, &c->rq.state);
@@ -873,11 +873,11 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
sq->stats.nop++; /* TODO no need for SQ stats in ico */
return 0;
 
-err_disable_rq:
-   clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
-   mlx5e_disable_rq(rq);
 err_destroy_rq:
+   clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
mlx5e_destroy_rq(rq);
+err_free_rq:
+   mlx5e_free_rq(rq);
 
return err;
 }
@@ -888,9 +888,9 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq)
napi_synchronize(&rq->channel->napi); /* prevent mlx5e_post_rx_wqes */
cancel_work_sync(&rq->am.work);
 
-   mlx5e_disable_rq(rq);
-   mlx5e_free_rx_descs(rq);
mlx5e_destroy_rq(rq);
+   mlx5e_free_rx_descs(rq);
+   mlx5e_free_rq(rq);
 }
 
 static void mlx5e_free_sq_xdp_db(struct mlx5e_sq *sq)
@@ -997,10 +997,10 @@ static int mlx5e_sq_get_max_wqebbs(u8 sq_type)
return MLX5_SEND_WQE_MAX_WQEBBS;
 }
 
-static int mlx5e_create_sq(struct mlx5e_channel *c,
-  int tc,
-  struct mlx5e_sq_param *param,
-  struct mlx5e_sq *sq)
+static int mlx5e_alloc_sq(struct mlx5e_channel *c,
+ int tc,
+ struct mlx5e_sq_param *param,
+ struct mlx5e_sq *sq)
 {
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
@@ -1051,13 +1051,13 @@ static int mlx5e_create_sq(struct mlx5e_channel *c,
return err;
 }
 
-static void mlx5e_destroy_sq(struct mlx5e_sq *sq)
+static void mlx5e_free_sq(struct mlx5e_sq *sq)
 {
mlx5e_free_sq_db(sq);
mlx5_wq_destroy(&sq->wq_ctrl);
 }
 
-static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
+static int mlx5e_create_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
 {
struct mlx5e_channel *c = sq->channel;
struct mlx5e_priv *priv = c->priv;
@@ -

[PATCH net-next 04/12] net/mlx5e: Move XDP completion functions to rx file

2017-03-24 Thread Saeed Mahameed
XDP code belongs to RX path, move mlx5e_poll_xdp_tx_cq and
mlx5e_free_xdp_tx_descs to en_rx.c.

Rename them to mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 82 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   | 24 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 62 +
 4 files changed, 86 insertions(+), 84 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 22e4bad03f05..fce0eca0701c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -737,6 +737,8 @@ void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum 
mlx5_event event);
 int mlx5e_napi_poll(struct napi_struct *napi, int budget);
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget);
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
+bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq);
+void mlx5e_free_xdpsq_descs(struct mlx5e_sq *sq);
 void mlx5e_free_sq_descs(struct mlx5e_sq *sq);
 
 void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 873b3085756c..bc74d6032a5c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -989,3 +989,85 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 
return work_done;
 }
+
+bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
+{
+   struct mlx5e_sq *sq;
+   u16 sqcc;
+   int i;
+
+   sq = container_of(cq, struct mlx5e_sq, cq);
+
+   if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state)))
+   return false;
+
+   /* sq->cc must be updated only after mlx5_cqwq_update_db_record(),
+* otherwise a cq overrun may occur
+*/
+   sqcc = sq->cc;
+
+   for (i = 0; i < MLX5E_TX_CQ_POLL_BUDGET; i++) {
+   struct mlx5_cqe64 *cqe;
+   u16 wqe_counter;
+   bool last_wqe;
+
+   cqe = mlx5e_get_cqe(cq);
+   if (!cqe)
+   break;
+
+   mlx5_cqwq_pop(&cq->wq);
+
+   wqe_counter = be16_to_cpu(cqe->wqe_counter);
+
+   do {
+   struct mlx5e_sq_wqe_info *wi;
+   struct mlx5e_dma_info *di;
+   u16 ci;
+
+   last_wqe = (sqcc == wqe_counter);
+
+   ci = sqcc & sq->wq.sz_m1;
+   di = &sq->db.xdp.di[ci];
+   wi = &sq->db.xdp.wqe_info[ci];
+
+   if (unlikely(wi->opcode == MLX5_OPCODE_NOP)) {
+   sqcc++;
+   continue;
+   }
+
+   sqcc += wi->num_wqebbs;
+   /* Recycle RX page */
+   mlx5e_page_release(&sq->channel->rq, di, true);
+   } while (!last_wqe);
+   }
+
+   mlx5_cqwq_update_db_record(&cq->wq);
+
+   /* ensure cq space is freed before enabling more cqes */
+   wmb();
+
+   sq->cc = sqcc;
+   return (i == MLX5E_TX_CQ_POLL_BUDGET);
+}
+
+void mlx5e_free_xdpsq_descs(struct mlx5e_sq *sq)
+{
+   struct mlx5e_sq_wqe_info *wi;
+   struct mlx5e_dma_info *di;
+   u16 ci;
+
+   while (sq->cc != sq->pc) {
+   ci = sq->cc & sq->wq.sz_m1;
+   di = &sq->db.xdp.di[ci];
+   wi = &sq->db.xdp.wqe_info[ci];
+
+   if (wi->opcode == MLX5_OPCODE_NOP) {
+   sq->cc++;
+   continue;
+   }
+
+   sq->cc += wi->num_wqebbs;
+
+   mlx5e_page_release(&sq->channel->rq, di, false);
+   }
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index eec4354208ee..7497b6ac4382 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -493,28 +493,6 @@ static void mlx5e_free_txq_sq_descs(struct mlx5e_sq *sq)
}
 }
 
-static void mlx5e_free_xdp_sq_descs(struct mlx5e_sq *sq)
-{
-   struct mlx5e_sq_wqe_info *wi;
-   struct mlx5e_dma_info *di;
-   u16 ci;
-
-   while (sq->cc != sq->pc) {
-   ci = sq->cc & sq->wq.sz_m1;
-   di = &sq->db.xdp.di[ci];
-   wi = &sq->db.xdp.wqe_info[ci];
-
-   if (wi->opcode == MLX5_OPCODE_NOP) {
-   sq->cc++;
-   continue;
-   }
-
-   sq->cc += wi->

[PATCH net-next 01/12] net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routine

2017-03-24 Thread Saeed Mahameed
Use dma_rmb in mlx5e_get_cqe rather than the more aggressive rmb (at
least on some architectures); this should improve performance on CPU
archs where dma_rmb is optimized.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case   Baseline  Now  improvement
---
TX packets (24 threads) 45Mpps50Mpps  11%
TC stack Drop (1 core)  3.45Mpps  3.6Mpps 5%
XDP Drop  (1 core)  14Mpps16.9Mpps20%
XDP TX(1 core)  10.4Mpps  12Mpps  15%

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index e5c12a732aa1..d8cda2f6239b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -44,7 +44,7 @@ struct mlx5_cqe64 *mlx5e_get_cqe(struct mlx5e_cq *cq)
return NULL;
 
/* ensure cqe content is read after cqe ownership bit */
-   rmb();
+   dma_rmb();
 
return cqe;
 }
-- 
2.11.0



[PATCH net-next 06/12] net/mlx5e: Move XDP SQ instance into RQ

2017-03-24 Thread Saeed Mahameed
To save many rq->channel->sq dereferences in fast-path.
And rename it to xdpsq.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  4 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 18 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |  2 +-
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8d789a25a1c0..5e4ae94c9f6a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -479,7 +479,10 @@ struct mlx5e_rq {
u16rx_headroom;
 
struct mlx5e_rx_am am; /* Adaptive Moderation */
+
+   /* XDP */
struct bpf_prog   *xdp_prog;
+   struct mlx5e_sqxdpsq;
 
/* control */
struct mlx5_wq_ctrlwq_ctrl;
@@ -499,7 +502,6 @@ enum channel_flags {
 struct mlx5e_channel {
/* data path */
struct mlx5e_rqrq;
-   struct mlx5e_sqxdp_sq;
struct mlx5e_sqsq[MLX5E_MAX_NUM_TC];
struct mlx5e_sqicosq;   /* internal control operations */
bool   xdp;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 49c1769d13b9..210033187bfe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1562,7 +1562,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
goto err_close_tx_cqs;
 
/* XDP SQ CQ params are same as normal TXQ sq CQ params */
-   err = c->xdp ? mlx5e_open_cq(c, &cparam->tx_cq, &c->xdp_sq.cq,
+   err = c->xdp ? mlx5e_open_cq(c, &cparam->tx_cq, &c->rq.xdpsq.cq,
 priv->params.tx_cq_moderation) : 0;
if (err)
goto err_close_rx_cq;
@@ -1587,7 +1587,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
}
}
 
-   err = c->xdp ? mlx5e_open_sq(c, 0, &cparam->xdp_sq, &c->xdp_sq) : 0;
+   err = c->xdp ? mlx5e_open_sq(c, 0, &cparam->xdp_sq, &c->rq.xdpsq) : 0;
if (err)
goto err_close_sqs;
 
@@ -1601,7 +1601,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
return 0;
 err_close_xdp_sq:
if (c->xdp)
-   mlx5e_close_sq(&c->xdp_sq);
+   mlx5e_close_sq(&c->rq.xdpsq);
 
 err_close_sqs:
mlx5e_close_sqs(c);
@@ -1612,7 +1612,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
 err_disable_napi:
napi_disable(&c->napi);
if (c->xdp)
-   mlx5e_close_cq(&c->xdp_sq.cq);
+   mlx5e_close_cq(&c->rq.xdpsq.cq);
 
 err_close_rx_cq:
mlx5e_close_cq(&c->rq.cq);
@@ -1634,12 +1634,12 @@ static void mlx5e_close_channel(struct mlx5e_channel *c)
 {
mlx5e_close_rq(&c->rq);
if (c->xdp)
-   mlx5e_close_sq(&c->xdp_sq);
+   mlx5e_close_sq(&c->rq.xdpsq);
mlx5e_close_sqs(c);
mlx5e_close_sq(&c->icosq);
napi_disable(&c->napi);
if (c->xdp)
-   mlx5e_close_cq(&c->xdp_sq.cq);
+   mlx5e_close_cq(&c->rq.xdpsq.cq);
mlx5e_close_cq(&c->rq.cq);
mlx5e_close_tx_cqs(c);
mlx5e_close_cq(&c->icosq.cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index bc74d6032a5c..040074f36313 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -653,7 +653,7 @@ static inline bool mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq,
struct mlx5e_dma_info *di,
const struct xdp_buff *xdp)
 {
-   struct mlx5e_sq  *sq   = &rq->channel->xdp_sq;
+   struct mlx5e_sq  *sq   = &rq->xdpsq;
struct mlx5_wq_cyc   *wq   = &sq->wq;
u16  pi= sq->pc & wq->sz_m1;
struct mlx5e_tx_wqe  *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
@@ -950,7 +950,7 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct 
mlx5_cqe64 *cqe)
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 {
struct mlx5e_rq *rq = container_of(cq, struct mlx5e_rq, cq);
-   struct mlx5e_sq *xdp_sq = &rq->channel->xdp_sq;
+   struct mlx5e_sq *xdpsq = &rq->xdpsq;
int work_done = 0;
 
if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
@@ -977,9 +977,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
rq->handle_rx_cqe(rq, cqe);
}
 
-   if (xdp_sq-&

[PATCH net-next 07/12] net/mlx5e: Poll XDP TX CQ before RX CQ

2017-03-24 Thread Saeed Mahameed
Handle XDP TX completions before handling RX packets, so that more
free space is available for XDP TX packets just before the RX packets
are processed.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case  Before Now  improvement
---
XDP Drop (1 core)  16.9Mpps  16.9MppsNo change
XDP TX   (1 core)  12Mpps13Mpps  8%

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 9beeb4a1212f..c880022bb21a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -118,12 +118,12 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
for (i = 0; i < c->num_tc; i++)
busy |= mlx5e_poll_tx_cq(&c->sq[i].cq, budget);
 
-   work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
-   busy |= work_done == budget;
-
if (c->xdp)
busy |= mlx5e_poll_xdpsq_cq(&c->rq.xdpsq.cq);
 
+   work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
+   busy |= work_done == budget;
+
mlx5e_poll_ico_cq(&c->icosq.cq);
 
busy |= mlx5e_post_rx_wqes(&c->rq);
-- 
2.11.0



[PATCH net-next 11/12] net/mlx5e: Generalize SQ create/modify/destroy functions

2017-03-24 Thread Saeed Mahameed
In the next patches we will introduce different SQ types and will want
to reuse these functions, so in this patch we make them agnostic to
the SQ type (txq, xdp, ico).

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 111 ++
 1 file changed, 69 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d03afa535064..dcc67df54a5c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1057,10 +1057,19 @@ static void mlx5e_free_sq(struct mlx5e_sq *sq)
mlx5_wq_destroy(&sq->wq_ctrl);
 }
 
-static int mlx5e_create_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
+struct mlx5e_create_sq_param {
+   struct mlx5_wq_ctrl*wq_ctrl;
+   u32 cqn;
+   u32 tisn;
+   u8  tis_lst_sz;
+   u8  min_inline_mode;
+};
+
+static int mlx5e_create_sq(struct mlx5e_priv *priv,
+  struct mlx5e_sq_param *param,
+  struct mlx5e_create_sq_param *csp,
+  u32 *sqn)
 {
-   struct mlx5e_channel *c = sq->channel;
-   struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
 
void *in;
@@ -1070,7 +1079,7 @@ static int mlx5e_create_sq(struct mlx5e_sq *sq, struct 
mlx5e_sq_param *param)
int err;
 
inlen = MLX5_ST_SZ_BYTES(create_sq_in) +
-   sizeof(u64) * sq->wq_ctrl.buf.npages;
+   sizeof(u64) * csp->wq_ctrl->buf.npages;
in = mlx5_vzalloc(inlen);
if (!in)
return -ENOMEM;
@@ -1079,38 +1088,41 @@ static int mlx5e_create_sq(struct mlx5e_sq *sq, struct 
mlx5e_sq_param *param)
wq = MLX5_ADDR_OF(sqc, sqc, wq);
 
memcpy(sqc, param->sqc, sizeof(param->sqc));
-
-   MLX5_SET(sqc,  sqc, tis_num_0, param->type == MLX5E_SQ_ICO ?
-  0 : priv->tisn[sq->tc]);
-   MLX5_SET(sqc,  sqc, cqn,sq->cq.mcq.cqn);
+   MLX5_SET(sqc,  sqc, tis_lst_sz, csp->tis_lst_sz);
+   MLX5_SET(sqc,  sqc, tis_num_0, csp->tisn);
+   MLX5_SET(sqc,  sqc, cqn, csp->cqn);
 
if (MLX5_CAP_ETH(mdev, wqe_inline_mode) == 
MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-   MLX5_SET(sqc,  sqc, min_wqe_inline_mode, sq->min_inline_mode);
+   MLX5_SET(sqc,  sqc, min_wqe_inline_mode, csp->min_inline_mode);
 
-   MLX5_SET(sqc,  sqc, state,  MLX5_SQC_STATE_RST);
-   MLX5_SET(sqc,  sqc, tis_lst_sz, param->type == MLX5E_SQ_ICO ? 0 : 1);
+   MLX5_SET(sqc,  sqc, state, MLX5_SQC_STATE_RST);
 
MLX5_SET(wq,   wq, wq_type,   MLX5_WQ_TYPE_CYCLIC);
-   MLX5_SET(wq,   wq, uar_page,  mdev->mlx5e_res.bfreg.index);
-   MLX5_SET(wq,   wq, log_wq_pg_sz,  sq->wq_ctrl.buf.page_shift -
+   MLX5_SET(wq,   wq, uar_page,  priv->mdev->mlx5e_res.bfreg.index);
+   MLX5_SET(wq,   wq, log_wq_pg_sz,  csp->wq_ctrl->buf.page_shift -
  MLX5_ADAPTER_PAGE_SHIFT);
-   MLX5_SET64(wq, wq, dbr_addr,  sq->wq_ctrl.db.dma);
+   MLX5_SET64(wq, wq, dbr_addr,  csp->wq_ctrl->db.dma);
 
-   mlx5_fill_page_array(&sq->wq_ctrl.buf,
-(__be64 *)MLX5_ADDR_OF(wq, wq, pas));
+   mlx5_fill_page_array(&csp->wq_ctrl->buf, (__be64 *)MLX5_ADDR_OF(wq, wq, pas));
 
-   err = mlx5_core_create_sq(mdev, in, inlen, &sq->sqn);
+   err = mlx5_core_create_sq(mdev, in, inlen, sqn);
 
kvfree(in);
 
return err;
 }
 
-static int mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state,
-  int next_state, bool update_rl, int rl_index)
+struct mlx5e_modify_sq_param {
+   int curr_state;
+   int next_state;
+   bool rl_update;
+   int rl_index;
+};
+
+static int mlx5e_modify_sq(struct mlx5e_priv *priv,
+  u32 sqn,
+  struct mlx5e_modify_sq_param *p)
 {
-   struct mlx5e_channel *c = sq->channel;
-   struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
 
void *in;
@@ -1125,29 +1137,24 @@ static int mlx5e_modify_sq(struct mlx5e_sq *sq, int 
curr_state,
 
sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx);
 
-   MLX5_SET(modify_sq_in, in, sq_state, curr_state);
-   MLX5_SET(sqc, sqc, state, next_state);
-   if (update_rl && next_state == MLX5_SQC_STATE_RDY) {
+   MLX5_SET(modify_sq_in, in, sq_state, p->curr_state);
+   MLX5_SET(sqc, sqc, state, p->next_state);
+   if (p->rl_update && p->next_state == MLX5_SQC_STATE_RDY

Re: [PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization

2017-03-25 Thread Saeed Mahameed
On Sat, Mar 25, 2017 at 2:26 AM, Alexei Starovoitov <a...@fb.com> wrote:
> On 3/24/17 2:52 PM, Saeed Mahameed wrote:
>>
>> Hi Dave,
>>
>> This series provides some performance optimizations for the mlx5e
>> driver, especially for XDP TX flows.
>>
>> 1st patch is a simple change of rmb to dma_rmb in CQE fetch routine
>> which shows a huge gain for both RX and TX packet rates.
>>
>> 2nd patch removes write combining logic from the driver TX handler
>> and simplifies the TX logic while improving TX CPU utilization.
>>
>> All other patches combined provide some refactoring to the driver TX
>> flows to allow some significant XDP TX improvements.
>>
>> More details and performance numbers per patch can be found in each patch
>> commit message compared to the preceding patch.
>>
>> Overall performance improvements
>>   System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>
>> Test case   Baseline  Now  improvement
>> ---
>> TX packets (24 threads) 45Mpps54Mpps  20%
>> TC stack Drop (1 core)  3.45Mpps  3.6Mpps 5%
>> XDP Drop  (1 core)  14Mpps16.9Mpps20%
>> XDP TX(1 core)  10.4Mpps  13.7Mpps31%
>
>
> Excellent work!
> All patches look great, so for the series:
> Acked-by: Alexei Starovoitov <a...@kernel.org>
>

Thanks Alexei !

> in patch 12 I noticed that inline_mode is being evaluated.
> I think for xdp queues it's guaranteed to be fixed.
> Can we optimize that path little bit more as well?

Yes, you are right, we do evaluate it in mlx5e_alloc_xdpsq:
+   if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
+   inline_hdr_sz = MLX5E_XDP_MIN_INLINE;
+   ds_cnt++;
+   }

and check it again in mlx5e_xmit_xdp_frame

+  /* copy the inline part if required */
+  if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {

sq->min_inline_mode is fixed at run-time, but it differs across HW
versions. The condition is needed so we do not copy inline headers and
waste CPU cycles when it is not required, which is the case from
ConnectX-5 onward. In fact, this is a 5% XDP_TX optimization you get
when running over ConnectX-5 [1].

In ConnectX-4 and ConnectX-4 LX the driver is still required to copy
L2 headers into the TX descriptor so the HW can make the loopback
decision correctly (needed in case you want an XDP program to switch
packets between different PFs/VFs running on the same box/NIC).

So I don't see any way to do this without breaking XDP loopback
functionality or removing the ConnectX-5 optimization.

For my taste, this condition is good as is.

[1] https://www.spinics.net/lists/netdev/msg419215.html

> Thanks!


[net-next 03/12] net/mlx5e: Add intermediate struct for TC flow parsing attributes

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Add intermediate structure to store attributes parsed from TC filter
matching/actions parts which are soon to be configured into the HW.

For now, store there the flow matching spec after it has been parsed.
More content will be added in a downstream patch.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 29 +++--
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 2a9df0a0b859..9f900afcd7ea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -70,6 +70,10 @@ struct mlx5e_tc_flow {
};
 };
 
+struct mlx5e_tc_flow_parse_attr {
+   struct mlx5_flow_spec spec;
+};
+
 enum {
MLX5_HEADER_TYPE_VXLAN = 0x0,
MLX5_HEADER_TYPE_NVGRE = 0x1,
@@ -80,7 +84,7 @@ enum {
 
 static struct mlx5_flow_handle *
 mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
- struct mlx5_flow_spec *spec,
+ struct mlx5e_tc_flow_parse_attr *parse_attr,
  struct mlx5_nic_flow_attr *attr)
 {
struct mlx5_core_dev *dev = priv->mdev;
@@ -123,8 +127,9 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
table_created = true;
}
 
-   spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
-   rule = mlx5_add_flow_rules(priv->fs.tc.t, spec, &flow_act, &dest, 1);
+   parse_attr->spec.match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
rule = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
+  &flow_act, &dest, 1);
 
if (IS_ERR(rule))
goto err_add_rule;
@@ -161,7 +166,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 
 static struct mlx5_flow_handle *
 mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
- struct mlx5_flow_spec *spec,
+ struct mlx5e_tc_flow_parse_attr *parse_attr,
  struct mlx5_esw_flow_attr *attr)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
@@ -171,7 +176,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
if (err)
return ERR_PTR(err);
 
-   return mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
+   return mlx5_eswitch_add_offloaded_rule(esw, &parse_attr->spec, attr);
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
@@ -1173,8 +1178,8 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
   struct tc_cls_flower_offload *f)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   struct mlx5e_tc_flow_parse_attr *parse_attr;
struct mlx5e_tc_table *tc = &priv->fs.tc;
-   struct mlx5_flow_spec *spec;
struct mlx5e_tc_flow *flow;
int attr_size, err = 0;
u8 flow_flags = 0;
@@ -1188,8 +1193,8 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
}
 
flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL);
-   spec = mlx5_vzalloc(sizeof(*spec));
-   if (!spec || !flow) {
+   parse_attr = mlx5_vzalloc(sizeof(*parse_attr));
+   if (!parse_attr || !flow) {
err = -ENOMEM;
goto err_free;
}
@@ -1197,7 +1202,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
flow->cookie = f->cookie;
flow->flags = flow_flags;
 
-   err = parse_cls_flower(priv, flow, spec, f);
+   err = parse_cls_flower(priv, flow, &parse_attr->spec, f);
if (err < 0)
goto err_free;
 
@@ -1205,12 +1210,12 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
err = parse_tc_fdb_actions(priv, f->exts, flow);
if (err < 0)
goto err_free;
-   flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->esw_attr);
+   flow->rule = mlx5e_tc_add_fdb_flow(priv, parse_attr, 
flow->esw_attr);
} else {
err = parse_tc_nic_actions(priv, f->exts, flow->nic_attr);
if (err < 0)
goto err_free;
-   flow->rule = mlx5e_tc_add_nic_flow(priv, spec, flow->nic_attr);
+   flow->rule = mlx5e_tc_add_nic_flow(priv, parse_attr, 
flow->nic_attr);
}
 
if (IS_ERR(flow->rule)) {
@@ -1231,7 +1236,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
 err_free:
kfree(flow);
 out:
-   kvfree(spec);
+   kvfree(parse_attr);
return err;
 }
 
-- 
2.11.0



[net-next 11/12] net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 35 +
 1 file changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3a31195f0d9c..4045b4768294 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -52,6 +52,7 @@
 struct mlx5_nic_flow_attr {
u32 action;
u32 flow_tag;
+   u32 mod_hdr_id;
 };
 
 enum {
@@ -97,10 +98,12 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
.action = attr->action,
.flow_tag = attr->flow_tag,
.encap_id = 0,
+   .modify_id = attr->mod_hdr_id,
};
struct mlx5_fc *counter = NULL;
struct mlx5_flow_handle *rule;
bool table_created = false;
+   int err;
 
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
@@ -114,6 +117,18 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
dest.counter = counter;
}
 
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
+   err = mlx5_modify_header_alloc(dev, MLX5_FLOW_NAMESPACE_KERNEL,
+  parse_attr->num_mod_hdr_actions,
+  parse_attr->mod_hdr_actions,
+  &attr->mod_hdr_id);
+   kfree(parse_attr->mod_hdr_actions);
+   if (err) {
+   rule = ERR_PTR(err);
+   goto err_create_mod_hdr_id;
+   }
+   }
+
if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
priv->fs.tc.t =
mlx5_create_auto_grouped_flow_table(priv->fs.ns,
@@ -146,6 +161,10 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
priv->fs.tc.t = NULL;
}
 err_create_ft:
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   mlx5_modify_header_dealloc(priv->mdev,
+  attr->mod_hdr_id);
+err_create_mod_hdr_id:
mlx5_fc_destroy(dev, counter);
 
return rule;
@@ -164,6 +183,10 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
mlx5_destroy_flow_table(priv->fs.tc.t);
priv->fs.tc.t = NULL;
}
+
+   if (flow->nic_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   mlx5_modify_header_dealloc(priv->mdev,
+  flow->nic_attr->mod_hdr_id);
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
@@ -955,6 +978,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
struct mlx5_nic_flow_attr *attr = flow->nic_attr;
const struct tc_action *a;
LIST_HEAD(actions);
+   int err;
 
if (tc_no_actions(exts))
return -EINVAL;
@@ -976,6 +1000,17 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
continue;
}
 
+   if (is_tcf_pedit(a)) {
+   err = parse_tc_pedit_action(priv, a, MLX5_FLOW_NAMESPACE_KERNEL,
+   parse_attr);
+   if (err)
+   return err;
+
+   attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
+   MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+   continue;
+   }
+
if (is_tcf_skbedit_mark(a)) {
u32 mark = tcf_skbedit_mark(a);
 
-- 
2.11.0



[net-next 02/12] net/mlx5e: Add NIC attributes for offloaded TC flows

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Add structure that contains the attributes related to offloaded
NIC flows. Currently it has the actions and flow tag.

While here, do xmas tree cleanup of the TC configure function.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 51 +++--
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index b2501987988b..2a9df0a0b859 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -48,8 +48,14 @@
 #include "eswitch.h"
 #include "vxlan.h"
 
+struct mlx5_nic_flow_attr {
+   u32 action;
+   u32 flow_tag;
+};
+
 enum {
MLX5E_TC_FLOW_ESWITCH   = BIT(0),
+   MLX5E_TC_FLOW_NIC   = BIT(1),
 };
 
 struct mlx5e_tc_flow {
@@ -58,7 +64,10 @@ struct mlx5e_tc_flow {
u8  flags;
struct mlx5_flow_handle *rule;
struct list_headencap; /* flows sharing the same encap */
-   struct mlx5_esw_flow_attr esw_attr[0];
+   union {
+   struct mlx5_esw_flow_attr esw_attr[0];
+   struct mlx5_nic_flow_attr nic_attr[0];
+   };
 };
 
 enum {
@@ -72,23 +81,23 @@ enum {
 static struct mlx5_flow_handle *
 mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
  struct mlx5_flow_spec *spec,
- u32 action, u32 flow_tag)
+ struct mlx5_nic_flow_attr *attr)
 {
struct mlx5_core_dev *dev = priv->mdev;
struct mlx5_flow_destination dest = { 0 };
struct mlx5_flow_act flow_act = {
-   .action = action,
-   .flow_tag = flow_tag,
+   .action = attr->action,
+   .flow_tag = attr->flow_tag,
.encap_id = 0,
};
struct mlx5_fc *counter = NULL;
struct mlx5_flow_handle *rule;
bool table_created = false;
 
-   if (action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
dest.ft = priv->fs.vlan.ft.t;
-   } else if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
+   } else if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
counter = mlx5_fc_create(dev, true);
if (IS_ERR(counter))
return ERR_CAST(counter);
@@ -651,7 +660,7 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 }
 
 static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
-   u32 *action, u32 *flow_tag)
+   struct mlx5_nic_flow_attr *attr)
 {
const struct tc_action *a;
LIST_HEAD(actions);
@@ -659,20 +668,20 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
if (tc_no_actions(exts))
return -EINVAL;
 
-   *flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
-   *action = 0;
+   attr->flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
+   attr->action = 0;
 
	tcf_exts_to_list(exts, &actions);
	list_for_each_entry(a, &actions, list) {
/* Only support a single action per rule */
-   if (*action)
+   if (attr->action)
return -EINVAL;
 
if (is_tcf_gact_shot(a)) {
-   *action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
+   attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
		if (MLX5_CAP_FLOWTABLE(priv->mdev,
				       flow_table_properties_nic_receive.flow_counter))
-   *action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
+   attr->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
continue;
}
 
@@ -685,8 +694,8 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
return -EINVAL;
}
 
-   *flow_tag = mark;
-   *action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+   attr->flow_tag = mark;
+   attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
continue;
}
 
@@ -1163,17 +1172,19 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
   struct tc_cls_flower_offload *f)
 {
+   struct mlx5_eswitch *esw 

[net-next 06/12] net/mlx5: Reorder few command cases to reflect their natural order

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Move the commands related to scheduling elements and vport qos to
a suitable location (according to the MLX5_CMD_OP enum values) in
the command string and internal error helpers.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index a380353a78c2..c3c6e931cc35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -279,6 +279,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
case MLX5_CMD_OP_DESTROY_XRC_SRQ:
case MLX5_CMD_OP_DESTROY_DCT:
case MLX5_CMD_OP_DEALLOC_Q_COUNTER:
+   case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
+   case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT:
case MLX5_CMD_OP_DEALLOC_PD:
case MLX5_CMD_OP_DEALLOC_UAR:
case MLX5_CMD_OP_DETACH_FROM_MCG:
@@ -305,8 +307,6 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT:
case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER:
-   case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
-   case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT:
return MLX5_CMD_STAT_OK;
 
case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -363,6 +363,10 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
case MLX5_CMD_OP_QUERY_Q_COUNTER:
case MLX5_CMD_OP_SET_RATE_LIMIT:
case MLX5_CMD_OP_QUERY_RATE_LIMIT:
+   case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
+   case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+   case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT:
+   case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT:
case MLX5_CMD_OP_ALLOC_PD:
case MLX5_CMD_OP_ALLOC_UAR:
case MLX5_CMD_OP_CONFIG_INT_MODERATION:
@@ -414,10 +418,6 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
case MLX5_CMD_OP_ALLOC_FLOW_COUNTER:
case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
case MLX5_CMD_OP_ALLOC_ENCAP_HEADER:
-   case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
-   case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
-   case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT:
-   case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT:
*status = MLX5_DRIVER_STATUS_ABORTED;
*synd = MLX5_DRIVER_SYND;
return -EIO;
@@ -501,6 +501,12 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER);
MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT);
MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT);
+   MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
+   MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
+   MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT);
+   MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT);
+   MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT);
+   MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT);
MLX5_COMMAND_STR_CASE(ALLOC_PD);
MLX5_COMMAND_STR_CASE(DEALLOC_PD);
MLX5_COMMAND_STR_CASE(ALLOC_UAR);
@@ -576,12 +582,6 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE);
MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER);
MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER);
-   MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
-   MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
-   MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT);
-   MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT);
-   MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT);
-   MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT);
default: return "unknown command opcode";
}
 }
-- 
2.11.0



[net-next 09/12] net/sched: Add accessor functions to pedit keys for offloading drivers

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

HW drivers will use the header-type and command fields from the extended
keys, and some fields (e.g. mask, val, offset) from the legacy keys.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/net/tc_act/tc_pedit.h | 45 +++
 1 file changed, 45 insertions(+)

diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h
index dfbd6ee0bc7c..a46c3f2ace70 100644
--- a/include/net/tc_act/tc_pedit.h
+++ b/include/net/tc_act/tc_pedit.h
@@ -2,6 +2,7 @@
 #define __NET_TC_PED_H
 
#include <net/act_api.h>
+#include <linux/tc_act/tc_pedit.h>
 
 struct tcf_pedit_key_ex {
enum pedit_header_type htype;
@@ -17,4 +18,48 @@ struct tcf_pedit {
 };
 #define to_pedit(a) ((struct tcf_pedit *)a)
 
+static inline bool is_tcf_pedit(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+   if (a->ops && a->ops->type == TCA_ACT_PEDIT)
+   return true;
+#endif
+   return false;
+}
+
+static inline int tcf_pedit_nkeys(const struct tc_action *a)
+{
+   return to_pedit(a)->tcfp_nkeys;
+}
+
+static inline u32 tcf_pedit_htype(const struct tc_action *a, int index)
+{
+   if (to_pedit(a)->tcfp_keys_ex)
+   return to_pedit(a)->tcfp_keys_ex[index].htype;
+
+   return TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK;
+}
+
+static inline u32 tcf_pedit_cmd(const struct tc_action *a, int index)
+{
+   if (to_pedit(a)->tcfp_keys_ex)
+   return to_pedit(a)->tcfp_keys_ex[index].cmd;
+
+   return __PEDIT_CMD_MAX;
+}
+
+static inline u32 tcf_pedit_mask(const struct tc_action *a, int index)
+{
+   return to_pedit(a)->tcfp_keys[index].mask;
+}
+
+static inline u32 tcf_pedit_val(const struct tc_action *a, int index)
+{
+   return to_pedit(a)->tcfp_keys[index].val;
+}
+
+static inline u32 tcf_pedit_offset(const struct tc_action *a, int index)
+{
+   return to_pedit(a)->tcfp_keys[index].off;
+}
 #endif /* __NET_TC_PED_H */
-- 
2.11.0



[net-next 05/12] net/mlx5: Add helper to initialize a flow steering actions struct instance

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

There are a bunch of places in the code where the intermediate struct
that keeps the elements related to flow actions is initialized with
the same default values. Put that into a small DECLARE-type helper.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 14 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c   | 18 +++---
 include/linux/mlx5/fs.h   |  5 -
 3 files changed, 10 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 68419a01db36..c4e9cc79f5c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -174,13 +174,9 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
 enum arfs_type type)
 {
	struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type];
-   struct mlx5_flow_act flow_act = {
-   .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
-   .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG,
-   .encap_id = 0,
-   };
-   struct mlx5_flow_destination dest;
struct mlx5e_tir *tir = priv->indir_tir;
+   struct mlx5_flow_destination dest;
+   MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
int err = 0;
 
@@ -469,15 +465,11 @@ static struct arfs_table *arfs_get_table(struct mlx5e_arfs_tables *arfs,
 static struct mlx5_flow_handle *arfs_add_rule(struct mlx5e_priv *priv,
  struct arfs_rule *arfs_rule)
 {
-   struct mlx5_flow_act flow_act = {
-   .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
-   .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG,
-   .encap_id = 0,
-   };
	struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
	struct arfs_tuple *tuple = &arfs_rule->tuple;
struct mlx5_flow_handle *rule = NULL;
struct mlx5_flow_destination dest;
+   MLX5_DECLARE_FLOW_ACT(flow_act);
struct arfs_table *arfs_table;
struct mlx5_flow_spec *spec;
struct mlx5_flow_table *ft;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index f2762e45c8ae..5376d69a6b1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -159,14 +159,10 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
 enum mlx5e_vlan_rule_type rule_type,
 u16 vid, struct mlx5_flow_spec *spec)
 {
-   struct mlx5_flow_act flow_act = {
-   .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
-   .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG,
-   .encap_id = 0,
-   };
struct mlx5_flow_table *ft = priv->fs.vlan.ft.t;
struct mlx5_flow_destination dest;
struct mlx5_flow_handle **rule_p;
+   MLX5_DECLARE_FLOW_ACT(flow_act);
int err = 0;
 
dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
@@ -659,11 +655,7 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv,
u16 etype,
u8 proto)
 {
-   struct mlx5_flow_act flow_act = {
-   .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
-   .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG,
-   .encap_id = 0,
-   };
+   MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_handle *rule;
struct mlx5_flow_spec *spec;
int err = 0;
@@ -848,13 +840,9 @@ static void mlx5e_del_l2_flow_rule(struct mlx5e_priv *priv,
 static int mlx5e_add_l2_flow_rule(struct mlx5e_priv *priv,
  struct mlx5e_l2_rule *ai, int type)
 {
-   struct mlx5_flow_act flow_act = {
-   .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
-   .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG,
-   .encap_id = 0,
-   };
struct mlx5_flow_table *ft = priv->fs.l2.ft.t;
struct mlx5_flow_destination dest;
+   MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
int err = 0;
u8 *mc_dmac;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 949b24b6c479..5eea1ba2e593 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -136,6 +136,10 @@ struct mlx5_flow_act {
u32 encap_id;
 };
 
+#define MLX5_DECLARE_FLOW_ACT(name) \
+   struct mlx5_flow_act name = {MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,\
+MLX5_FS_DEFAULT_FLOW_TAG, 0}
+
 /* Single destination per rule.
  * Group ID is implied by the match criteri

[net-next 10/12] net/mlx5e: Add parsing of TC pedit actions to HW format

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Parse/translate a set of TC pedit actions to be formed in the HW API format.

User-space provides a set of keys, where each one of them is made of:
command (add or set), header-type, and byte offset within that header,
along with a 32 bit mask and value.

The mask dictates which bits in the 32 bit word that starts at the offset
we should be dealing with, but under negative polarity (unset bits are to
be modified).

We do a 1st pass over the set of keys, using the header-type and offset to
fill the masks and the values into a data-structure containing all the
supported network headers.

We then do a 2nd pass over the set of fields supported for re-write by
the HW, where for each such candidate field, we use the masks filled on
the 1st pass to determine whether we should offload a re-write of it.

In case offloading is required, we fill a HW descriptor with the following:

(1) the header field to modify
(2) the bit offset within the field from where to modify (set command only)
(3) the value to set/add
(4) the length in bits 1...32 to modify (set command only)

Note that it's possible for a given pedit mask to dictate modifying the
same header field multiple times or to modify multiple header fields.
Currently such combinations are not supported for offloading, hence, for set
commands, the offset within the field is always zero, and the length to modify
is the field size.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 273 
 1 file changed, 273 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index af92d9c1a619..3a31195f0d9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -42,6 +42,7 @@
#include <net/tc_act/tc_mirred.h>
#include <net/tc_act/tc_vlan.h>
#include <net/tc_act/tc_tunnel_key.h>
+#include <net/tc_act/tc_pedit.h>
#include <net/vxlan.h>
 #include "en.h"
 #include "en_tc.h"
@@ -72,6 +73,8 @@ struct mlx5e_tc_flow {
 
 struct mlx5e_tc_flow_parse_attr {
struct mlx5_flow_spec spec;
+   int num_mod_hdr_actions;
+   void *mod_hdr_actions;
 };
 
 enum {
@@ -675,6 +678,276 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
return err;
 }
 
+struct pedit_headers {
+   struct ethhdr  eth;
+   struct iphdr   ip4;
+   struct ipv6hdr ip6;
+   struct tcphdr  tcp;
+   struct udphdr  udp;
+};
+
+static int pedit_header_offsets[] = {
+   [TCA_PEDIT_KEY_EX_HDR_TYPE_ETH] = offsetof(struct pedit_headers, eth),
+   [TCA_PEDIT_KEY_EX_HDR_TYPE_IP4] = offsetof(struct pedit_headers, ip4),
+   [TCA_PEDIT_KEY_EX_HDR_TYPE_IP6] = offsetof(struct pedit_headers, ip6),
+   [TCA_PEDIT_KEY_EX_HDR_TYPE_TCP] = offsetof(struct pedit_headers, tcp),
+   [TCA_PEDIT_KEY_EX_HDR_TYPE_UDP] = offsetof(struct pedit_headers, udp),
+};
+
+#define pedit_header(_ph, _htype) ((void *)(_ph) + pedit_header_offsets[_htype])
+
+static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset,
+struct pedit_headers *masks,
+struct pedit_headers *vals)
+{
+   u32 *curr_pmask, *curr_pval;
+
+   if (hdr_type >= __PEDIT_HDR_TYPE_MAX)
+   goto out_err;
+
+   curr_pmask = (u32 *)(pedit_header(masks, hdr_type) + offset);
+   curr_pval  = (u32 *)(pedit_header(vals, hdr_type) + offset);
+
+   if (*curr_pmask & mask)  /* disallow acting twice on the same location */
+   goto out_err;
+
+   *curr_pmask |= mask;
+   *curr_pval  |= (val & mask);
+
+   return 0;
+
+out_err:
+   return -EOPNOTSUPP;
+}
+
+struct mlx5_fields {
+   u8  field;
+   u8  size;
+   u32 offset;
+};
+
+static struct mlx5_fields fields[] = {
+   {MLX5_ACTION_IN_FIELD_OUT_DMAC_47_16, 4, offsetof(struct pedit_headers, eth.h_dest[0])},
+   {MLX5_ACTION_IN_FIELD_OUT_DMAC_15_0,  2, offsetof(struct pedit_headers, eth.h_dest[4])},
+   {MLX5_ACTION_IN_FIELD_OUT_SMAC_47_16, 4, offsetof(struct pedit_headers, eth.h_source[0])},
+   {MLX5_ACTION_IN_FIELD_OUT_SMAC_15_0,  2, offsetof(struct pedit_headers, eth.h_source[4])},
+   {MLX5_ACTION_IN_FIELD_OUT_ETHERTYPE,  2, offsetof(struct pedit_headers, eth.h_proto)},
+
+   {MLX5_ACTION_IN_FIELD_OUT_IP_DSCP, 1, offsetof(struct pedit_headers, ip4.tos)},
+   {MLX5_ACTION_IN_FIELD_OUT_IP_TTL,  1, offsetof(struct pedit_headers, ip4.ttl)},
+   {MLX5_ACTION_IN_FIELD_OUT_SIPV4,   4, offsetof(struct pedit_headers, ip4.saddr)},
+   {MLX5_ACTION_IN_FIELD_OUT_DIPV4,   4, offsetof(struct pedit_headers, ip4.daddr)},
+
+   {MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96, 4, offsetof(struct pedit_headers, ip6.saddr.s6_addr32[0])},
+   {MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64,  4, offsetof(struct pedit_headers, ip6.saddr.s6_addr32[1])},
+   {ML

[net-next 01/12] net/mlx5e: Add prefix for e-switch offloaded TC flow attributes

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Add esw_ prefix to the flow attributes attached to offloaded e-switch
TC flows. This is a pre-step to add attributes to offloaded NIC TC flows.

Also, save one pointer space by using gcc's zero size array, this would
be beneficial for environments where 100Ks (or Ms) of flows are offloaded.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index fade7233dac5..b2501987988b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -58,7 +58,7 @@ struct mlx5e_tc_flow {
u8  flags;
struct mlx5_flow_handle *rule;
struct list_headencap; /* flows sharing the same encap */
-   struct mlx5_esw_flow_attr *attr;
+   struct mlx5_esw_flow_attr esw_attr[0];
 };
 
 enum {
@@ -173,11 +173,11 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 
-   mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->attr);
+   mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->esw_attr);
 
-   mlx5_eswitch_del_vlan_action(esw, flow->attr);
+   mlx5_eswitch_del_vlan_action(esw, flow->esw_attr);
 
-   if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
+   if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
mlx5e_detach_encap(priv, flow);
 }
 
@@ -1073,7 +1073,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
struct mlx5e_tc_flow *flow)
 {
-   struct mlx5_esw_flow_attr *attr = flow->attr;
+   struct mlx5_esw_flow_attr *attr = flow->esw_attr;
struct ip_tunnel_info *info = NULL;
const struct tc_action *a;
LIST_HEAD(actions);
@@ -1191,11 +1191,10 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
goto err_free;
 
if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
-   flow->attr  = (struct mlx5_esw_flow_attr *)(flow + 1);
err = parse_tc_fdb_actions(priv, f->exts, flow);
if (err < 0)
goto err_free;
-   flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->attr);
+   flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->esw_attr);
} else {
		err = parse_tc_nic_actions(priv, f->exts, &action, &flow_tag);
if (err < 0)
-- 
2.11.0



[net-next 12/12] net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 37 --
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  1 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  5 ++-
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4045b4768294..9dec11c00a49 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -98,7 +98,6 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
.action = attr->action,
.flow_tag = attr->flow_tag,
.encap_id = 0,
-   .modify_id = attr->mod_hdr_id,
};
struct mlx5_fc *counter = NULL;
struct mlx5_flow_handle *rule;
@@ -122,6 +121,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
   parse_attr->num_mod_hdr_actions,
   parse_attr->mod_hdr_actions,
   >mod_hdr_id);
+   flow_act.modify_id = attr->mod_hdr_id;
kfree(parse_attr->mod_hdr_actions);
if (err) {
rule = ERR_PTR(err);
@@ -208,6 +208,18 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
goto err_add_vlan;
}
 
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
+   err = mlx5_modify_header_alloc(priv->mdev, MLX5_FLOW_NAMESPACE_FDB,
+  parse_attr->num_mod_hdr_actions,
+  parse_attr->mod_hdr_actions,
+  &attr->mod_hdr_id);
+   kfree(parse_attr->mod_hdr_actions);
+   if (err) {
+   rule = ERR_PTR(err);
+   goto err_mod_hdr;
+   }
+   }
+
	rule = mlx5_eswitch_add_offloaded_rule(esw, &parse_attr->spec, attr);
if (IS_ERR(rule))
goto err_add_rule;
@@ -215,11 +227,14 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
return rule;
 
 err_add_rule:
+   if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   mlx5_modify_header_dealloc(priv->mdev,
+  attr->mod_hdr_id);
+err_mod_hdr:
mlx5_eswitch_del_vlan_action(esw, attr);
 err_add_vlan:
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
mlx5e_detach_encap(priv, flow);
-
return rule;
 }
 
@@ -227,6 +242,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow *flow)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 
mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->esw_attr);
 
@@ -234,6 +250,10 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 
if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
mlx5e_detach_encap(priv, flow);
+
+   if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   mlx5_modify_header_dealloc(priv->mdev,
+  attr->mod_hdr_id);
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
@@ -1406,6 +1426,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 }
 
 static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
+   struct mlx5e_tc_flow_parse_attr *parse_attr,
struct mlx5e_tc_flow *flow)
 {
struct mlx5_esw_flow_attr *attr = flow->esw_attr;
@@ -1429,6 +1450,16 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
continue;
}
 
+   if (is_tcf_pedit(a)) {
+   err = parse_tc_pedit_action(priv, a, MLX5_FLOW_NAMESPACE_FDB,
+   parse_attr);
+   if (err)
+   return err;
+
+   attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+   continue;
+   }
+
if (is_tcf_mirred_egress_redirect(a)) {
int ifindex = tcf_mirred_if

[net-next 07/12] net/mlx5: Introduce modify header structures, commands and steering action definitions

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Add the definitions related to creation/deletion of a modify header
context and the modify header steering action which are used for HW
packet header modify (re-write) as part of steering. Add as well the
modify header id into two intermediate structs and set it to the FTE.

Note that as the push/pop vlan steering actions are emulated by the
eswitch management code, we're not breaking any compatibility while
changing their values to make room for the modify header action which
is not emulated and whose value is part of the FW API. The new bit
values for the emulated actions are at the end of the possible range.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   1 +
 include/linux/mlx5/fs.h   |   3 +-
 include/linux/mlx5/mlx5_ifc.h | 113 +-
 6 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index ad329b1680b4..cd9240c3a7f0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -285,8 +285,8 @@ enum {
SET_VLAN_INSERT = BIT(1)
 };
 
-#define MLX5_FLOW_CONTEXT_ACTION_VLAN_POP  0x40
-#define MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH 0x80
+#define MLX5_FLOW_CONTEXT_ACTION_VLAN_POP  0x4000
+#define MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH 0x8000
 
 struct mlx5_encap_entry {
struct hlist_node encap_hlist;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index b64a781c7e85..20d1fd516d03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -249,6 +249,7 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
MLX5_SET(flow_context, in_flow_context, flow_tag, fte->flow_tag);
MLX5_SET(flow_context, in_flow_context, action, fte->action);
MLX5_SET(flow_context, in_flow_context, encap_id, fte->encap_id);
+   MLX5_SET(flow_context, in_flow_context, modify_header_id, fte->modify_id);
in_match_value = MLX5_ADDR_OF(flow_context, in_flow_context,
  match_value);
	memcpy(in_match_value, &fte->val, MLX5_ST_SZ_BYTES(fte_match_param));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index ded27bb9a3b6..27ff815600f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -476,6 +476,7 @@ static struct fs_fte *alloc_fte(struct mlx5_flow_act *flow_act,
fte->index = index;
fte->action = flow_act->action;
fte->encap_id = flow_act->encap_id;
+   fte->modify_id = flow_act->modify_id;
 
return fte;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 8e668c63f69e..03af2e7989f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -152,6 +152,7 @@ struct fs_fte {
u32 index;
u32 action;
u32 encap_id;
+   u32 modify_id;
enum fs_fte_status  status;
struct mlx5_fc  *counter;
 };
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 5eea1ba2e593..ae91a4bda1a3 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -134,11 +134,12 @@ struct mlx5_flow_act {
u32 action;
u32 flow_tag;
u32 encap_id;
+   u32 modify_id;
 };
 
 #define MLX5_DECLARE_FLOW_ACT(name) \
struct mlx5_flow_act name = {MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,\
-MLX5_FS_DEFAULT_FLOW_TAG, 0}
+MLX5_FS_DEFAULT_FLOW_TAG, 0, 0}
 
 /* Single destination per rule.
  * Group ID is implied by the match criteria.
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 838242697541..56bc842b0620 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -227,6 +227,8 @@ enum {
MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c,
MLX5_CMD_OP_ALLOC_ENCAP_HEADER= 0x93d,
MLX5_CMD_OP_DEALLOC_ENCAP_HEADER  = 0x93e,
+   MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT   = 0x940,
+   MLX5_CMD_OP_DE

[net-next 04/12] net/mlx5e: Properly deal with resource cleanup when adding TC flow fails

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

The code for adding tc fdb flows leaves things half set when it fails
in the middle. Currently we are not leaking resources (e.g. the eswitch
vlan reference, the encap reference and HW resources) only because the
main code that adds flower rules cleans up by calling mlx5e_tc_del_flow().

That cleanup works only because we check there whether the HW rule for
the flow we are attempting to delete is valid before touching it, and
because under the currently supported combinations of actions it happens
to be safe to blindly deref or delete all the action-related resources
(encap, vlan).

Instead, do things properly: make sure that if adding a flow fails we
clean up everything that was allocated or referenced. Now the flow delete
code can blindly deref/deallocate both the rule and the action-related
resources, and when more action combinations are introduced (such as the
upcoming header re-write) the code stays clear and robust.

While here, align all of the nic/fdb parse-actions/add-flow functions to
take a mlx5e_tc_flow struct param and pick the attributes or whatever
else is needed from there.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 59 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 23 +
 2 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9f900afcd7ea..af92d9c1a619 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -85,10 +85,11 @@ enum {
 static struct mlx5_flow_handle *
 mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow_parse_attr *parse_attr,
- struct mlx5_nic_flow_attr *attr)
+ struct mlx5e_tc_flow *flow)
 {
+   struct mlx5_nic_flow_attr *attr = flow->nic_attr;
struct mlx5_core_dev *dev = priv->mdev;
-   struct mlx5_flow_destination dest = { 0 };
+   struct mlx5_flow_destination dest = {};
struct mlx5_flow_act flow_act = {
.action = attr->action,
.flow_tag = attr->flow_tag,
@@ -152,11 +153,9 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 {
struct mlx5_fc *counter = NULL;
 
-   if (!IS_ERR(flow->rule)) {
-   counter = mlx5_flow_rule_counter(flow->rule);
-   mlx5_del_flow_rules(flow->rule);
-   mlx5_fc_destroy(priv->mdev, counter);
-   }
+   counter = mlx5_flow_rule_counter(flow->rule);
+   mlx5_del_flow_rules(flow->rule);
+   mlx5_fc_destroy(priv->mdev, counter);
 
if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
mlx5_destroy_flow_table(priv->fs.tc.t);
@@ -164,23 +163,39 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
}
 }
 
+static void mlx5e_detach_encap(struct mlx5e_priv *priv,
+  struct mlx5e_tc_flow *flow);
+
 static struct mlx5_flow_handle *
 mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow_parse_attr *parse_attr,
- struct mlx5_esw_flow_attr *attr)
+ struct mlx5e_tc_flow *flow)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   struct mlx5_esw_flow_attr *attr = flow->esw_attr;
+   struct mlx5_flow_handle *rule;
int err;
 
err = mlx5_eswitch_add_vlan_action(esw, attr);
-   if (err)
-   return ERR_PTR(err);
+   if (err) {
+   rule = ERR_PTR(err);
+   goto err_add_vlan;
+   }
 
-   return mlx5_eswitch_add_offloaded_rule(esw, &parse_attr->spec, attr);
-}
+   rule = mlx5_eswitch_add_offloaded_rule(esw, &parse_attr->spec, attr);
+   if (IS_ERR(rule))
+   goto err_add_rule;
 
-static void mlx5e_detach_encap(struct mlx5e_priv *priv,
-  struct mlx5e_tc_flow *flow);
+   return rule;
+
+err_add_rule:
+   mlx5_eswitch_del_vlan_action(esw, attr);
+err_add_vlan:
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
+   mlx5e_detach_encap(priv, flow);
+
+   return rule;
+}
 
 static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow *flow)
@@ -214,10 +229,6 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
}
 }
 
-/* we get here also when setting rule to the FW failed, etc. It means that the
- * flow rule itself might not exist, but some offloading related to the actions
- * should be cleaned.
- */
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow *flow)

[net-next 08/12] net/mlx5: Introduce alloc/dealloc modify header context commands

2017-03-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Implement the low-level commands to support packet header re-write.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   | 66 ++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  5 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index c3c6e931cc35..5bdaf3d545b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -307,6 +307,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT:
case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER:
+   case MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT:
return MLX5_CMD_STAT_OK;
 
case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -418,6 +419,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_ALLOC_FLOW_COUNTER:
case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
case MLX5_CMD_OP_ALLOC_ENCAP_HEADER:
+   case MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT:
*status = MLX5_DRIVER_STATUS_ABORTED;
*synd = MLX5_DRIVER_SYND;
return -EIO;
@@ -582,6 +584,8 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE);
MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER);
MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER);
+   MLX5_COMMAND_STR_CASE(ALLOC_MODIFY_HEADER_CONTEXT);
+   MLX5_COMMAND_STR_CASE(DEALLOC_MODIFY_HEADER_CONTEXT);
default: return "unknown command opcode";
}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 20d1fd516d03..c6178ea1a461 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -516,3 +516,69 @@ void mlx5_encap_dealloc(struct mlx5_core_dev *dev, u32 
encap_id)
 
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+int mlx5_modify_header_alloc(struct mlx5_core_dev *dev,
+u8 namespace, u8 num_actions,
+void *modify_actions, u32 *modify_header_id)
+{
+   u32 out[MLX5_ST_SZ_DW(alloc_modify_header_context_out)];
+   int max_actions, actions_size, inlen, err;
+   void *actions_in;
+   u8 table_type;
+   u32 *in;
+
+   switch (namespace) {
+   case MLX5_FLOW_NAMESPACE_FDB:
+   max_actions = MLX5_CAP_ESW_FLOWTABLE_FDB(dev, 
max_modify_header_actions);
+   table_type = FS_FT_FDB;
+   break;
+   case MLX5_FLOW_NAMESPACE_KERNEL:
+   max_actions = MLX5_CAP_FLOWTABLE_NIC_RX(dev, 
max_modify_header_actions);
+   table_type = FS_FT_NIC_RX;
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   if (num_actions > max_actions) {
+   mlx5_core_warn(dev, "too many modify header actions %d, max 
supported %d\n",
+  num_actions, max_actions);
+   return -EOPNOTSUPP;
+   }
+
+   actions_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto) * 
num_actions;
+   inlen = MLX5_ST_SZ_BYTES(alloc_modify_header_context_in) + actions_size;
+
+   in = kzalloc(inlen, GFP_KERNEL);
+   if (!in)
+   return -ENOMEM;
+
+   MLX5_SET(alloc_modify_header_context_in, in, opcode,
+MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT);
+   MLX5_SET(alloc_modify_header_context_in, in, table_type, table_type);
+   MLX5_SET(alloc_modify_header_context_in, in, num_of_actions, 
num_actions);
+
+   actions_in = MLX5_ADDR_OF(alloc_modify_header_context_in, in, actions);
+   memcpy(actions_in, modify_actions, actions_size);
+
+   memset(out, 0, sizeof(out));
+   err = mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
+
+   *modify_header_id = MLX5_GET(alloc_modify_header_context_out, out, 
modify_header_id);
+   kfree(in);
+   return err;
+}
+
+void mlx5_modify_header_dealloc(struct mlx5_core_dev *dev, u32 
modify_header_id)
+{
+   u32 in[MLX5_ST_SZ_DW(dealloc_modify_header_context_in)];
+   u32 out[MLX5_ST_SZ_DW(dealloc_modify_header_context_out)];
+
+   memset(in, 0, sizeof(in));
+   MLX5_SET(dealloc_modify_header_context_in, in, opcode,
+MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT);
+   MLX5_SET(dealloc_modify_header_context_in, in, modify_header_id,
+modify_header_id);
+
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff 

[pull request][net-next 00/12] Mellanox mlx5 offloading of TC pedit (header re-write) action

2017-03-28 Thread Saeed Mahameed
Hi Dave,

The following changes from Or Gerlitz provide mlx5 offloading support of
TC pedit (header re-write) action.

For more information please see below.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit cc628c9680c212d9dbf68785fbf5d454ccb2313e:

  Merge tag 'mlx5e-failsafe' of 
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (2017-03-27 21:16:03 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5e-pedit

for you to fetch changes up to d7e75a325cb2d2b72e7ac9a185abc1cd59bc9922:

  net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions 
(2017-03-28 15:34:10 +0300)


mlx5e-pedit 2017-03-28

Or Gerlitz says:

This series adds support for offloading modifications of packet headers using
ConnectX-5 HW header re-write as an action applied during packet steering.

The offloaded SW mechanism is TC's pedit action. The offloading is
supported for E-Switch steering of VF traffic in the SRIOV
switchdev mode and for NIC (non eswitch) RX.

One use-case for this offload on virtual networks is when the hypervisor
implements a flow-based router such as OpenStack's DVR, where the L2 headers
of guest packets are re-written with the routers' MAC addresses and the IP
TTL is decremented.

Another use case (which can be applied in parallel with routing) is
stateless NAT, where guest L3/L4 headers are re-written.

The series is built as follows: the 1st six patches are preparations which
don't yet add new functionality, patches 7-8 add the FW APIs (data-structures
and commands) for header re-write, and patch nine allows offloading drivers
to access pedit keys.

The 10th patch is essentially the core of the series, where we translate from
pedit's way of representing a set of header modification elements to the FW
API for that same matter.

Once a set of HW modification is established, we register it with the FW
and get a modify header ID. When this ID is used with an action during
packet steering, the HW applies the header modification on the packet.

Patches 11 and 12 implement the above logic as an offload for pedit action
for the NIC and E-Switch use-cases.

I'd like to thank Elijah Shakkour for implementing this functionality and
helping me test it on a HW simulator, before it could be done with FW.

- Or.
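For context, a pedit rule of the kind this series offloads could look roughly like the following (a hypothetical configuration fragment, not from the series; the device names, match, and MAC address are made up):

```shell
# Rewrite the destination MAC of matched guest traffic, then forward it.
# "pedit ex munge" is the extended pedit syntax that names header fields
# explicitly, which is what the offload parsing works from.
tc filter add dev eth0 protocol ip parent ffff: \
	flower dst_ip 192.0.2.0/24 \
	action pedit ex munge eth dst set 00:11:22:33:44:55 pipe \
	action mirred egress redirect dev eth1
```

When such a rule is offloaded, the driver registers the set of header modifications with the FW, gets back a modify-header ID, and attaches it as a steering action so the HW applies the rewrite per packet.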


Or Gerlitz (12):
  net/mlx5e: Add prefix for e-switch offloaded TC flow attributes
  net/mlx5e: Add NIC attributes for offloaded TC flows
  net/mlx5e: Add intermediate struct for TC flow parsing attributes
  net/mlx5e: Properly deal with resource cleanup when adding TC flow fails
  net/mlx5: Add helper to initialize a flow steering actions struct instance
  net/mlx5: Reorder few command cases to reflect their natural order
  net/mlx5: Introduce modify header structures, commands and steering 
action definitions
  net/mlx5: Introduce alloc/dealloc modify header context commands
  net/sched: Add accessor functions to pedit keys for offloading drivers
  net/mlx5e: Add parsing of TC pedit actions to HW format
  net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions
  net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |  28 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 473 ++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   5 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  28 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  67 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   1 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   5 +
 include/linux/mlx5/fs.h|   6 +-
 include/linux/mlx5/mlx5_ifc.h  | 113 -
 include/net/tc_act/tc_pedit.h  |  45 ++
 13 files changed, 698 insertions(+), 106 deletions(-)


[net-next 08/14] net/mlx5e: CQ and RQ don't need priv pointer

2017-03-27 Thread Saeed Mahameed
Remove the mlx5e_priv pointer from the CQ and RQ structs;
it was needed only to access the mdev pointer via the priv pointer.

Instead, we now pass mdev where needed.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  38 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 181 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   3 +-
 4 files changed, 99 insertions(+), 125 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 007f91f54fda..44c454b34754 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -280,7 +280,6 @@ struct mlx5e_cq {
struct napi_struct*napi;
struct mlx5_core_cqmcq;
struct mlx5e_channel  *channel;
-   struct mlx5e_priv *priv;
 
/* cqe decompression */
struct mlx5_cqe64  title;
@@ -290,6 +289,7 @@ struct mlx5e_cq {
u16decmprs_wqe_counter;
 
/* control */
+   struct mlx5_core_dev  *mdev;
struct mlx5_frag_wq_ctrl   wq_ctrl;
 } cacheline_aligned_in_smp;
 
@@ -533,7 +533,7 @@ struct mlx5e_rq {
u32mpwqe_num_strides;
u32rqn;
struct mlx5e_channel  *channel;
-   struct mlx5e_priv *priv;
+   struct mlx5_core_dev  *mdev;
struct mlx5_core_mkey  umr_mkey;
 } cacheline_aligned_in_smp;
 
@@ -556,6 +556,8 @@ struct mlx5e_channel {
 
/* control */
struct mlx5e_priv *priv;
+   struct mlx5_core_dev  *mdev;
+   struct mlx5e_tstamp   *tstamp;
intix;
intcpu;
 };
@@ -715,22 +717,6 @@ enum {
MLX5E_NIC_PRIO
 };
 
-struct mlx5e_profile {
-   void(*init)(struct mlx5_core_dev *mdev,
-   struct net_device *netdev,
-   const struct mlx5e_profile *profile, void *ppriv);
-   void(*cleanup)(struct mlx5e_priv *priv);
-   int (*init_rx)(struct mlx5e_priv *priv);
-   void(*cleanup_rx)(struct mlx5e_priv *priv);
-   int (*init_tx)(struct mlx5e_priv *priv);
-   void(*cleanup_tx)(struct mlx5e_priv *priv);
-   void(*enable)(struct mlx5e_priv *priv);
-   void(*disable)(struct mlx5e_priv *priv);
-   void(*update_stats)(struct mlx5e_priv *priv);
-   int (*max_nch)(struct mlx5_core_dev *mdev);
-   int max_tc;
-};
-
 struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -770,6 +756,22 @@ struct mlx5e_priv {
void  *ppriv;
 };
 
+struct mlx5e_profile {
+   void(*init)(struct mlx5_core_dev *mdev,
+   struct net_device *netdev,
+   const struct mlx5e_profile *profile, void *ppriv);
+   void(*cleanup)(struct mlx5e_priv *priv);
+   int (*init_rx)(struct mlx5e_priv *priv);
+   void(*cleanup_rx)(struct mlx5e_priv *priv);
+   int (*init_tx)(struct mlx5e_priv *priv);
+   void(*cleanup_tx)(struct mlx5e_priv *priv);
+   void(*enable)(struct mlx5e_priv *priv);
+   void(*disable)(struct mlx5e_priv *priv);
+   void(*update_stats)(struct mlx5e_priv *priv);
+   int (*max_nch)(struct mlx5_core_dev *mdev);
+   int max_tc;
+};
+
 void mlx5e_build_ptys2ethtool_map(void);
 
 u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cf8df1d3275e..a6e09c46440b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -491,11 +491,10 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
kfree(rq->mpwqe.info);
 }
 
-static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv,
+static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev,
 u64 npages, u8 page_shift,
 struct mlx5_core_mkey *umr_mkey)
 {
-   struct mlx5_core_dev *mdev = priv->mdev;
int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
void *mkc;
u32 *in;
@@ -529,12 +528,11 @@ static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv,
return err;
 }
 
-static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
+static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct 
mlx5e_rq *rq)
 {
-   struct mlx5e_priv *priv = rq->priv;
u64 num_mtts = MLX5E_REQUIRED_MTTS(mlx5_wq_ll_get_size(&rq->wq));
 
-   return mlx5e_create_umr_mkey(priv, num_mtt

[net-next 07/14] net/mlx5e: Isolate open_channels from priv->params

2017-03-27 Thread Saeed Mahameed
In order to have a clean separation between the channel resources creation
flows and the currently active mlx5e netdev parameters, make sure each
resource creation function does not access priv->params, and works only
on a fresh set of parameters.

For this we add a new mlx5e_params field to the mlx5e_channels structure
and pass it down the road to mlx5e_open_{cq,rq,sq} and so on.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 119 +++---
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 448 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  61 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   7 +-
 8 files changed, 328 insertions(+), 341 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f1895ebe7fe5..007f91f54fda 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -182,15 +182,15 @@ enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1),
 };
 
-#define MLX5E_SET_PFLAG(priv, pflag, enable)   \
+#define MLX5E_SET_PFLAG(params, pflag, enable) \
do {\
if (enable) \
-   (priv)->params.pflags |= (pflag);   \
+   (params)->pflags |= (pflag);\
else\
-   (priv)->params.pflags &= ~(pflag);  \
+   (params)->pflags &= ~(pflag);   \
} while (0)
 
-#define MLX5E_GET_PFLAG(priv, pflag) (!!((priv)->params.pflags & (pflag)))
+#define MLX5E_GET_PFLAG(params, pflag) (!!((params)->pflags & (pflag)))
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
@@ -213,7 +213,6 @@ struct mlx5e_params {
bool rx_cqe_compress_def;
struct mlx5e_cq_moder rx_cq_moderation;
struct mlx5e_cq_moder tx_cq_moderation;
-   u16 min_rx_wqes;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -225,6 +224,7 @@ struct mlx5e_params {
bool rx_am_enabled;
u32 lro_timeout;
u32 pflags;
+   struct bpf_prog *xdp_prog;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -357,7 +357,6 @@ struct mlx5e_txqsq {
/* control path */
struct mlx5_wq_ctrlwq_ctrl;
struct mlx5e_channel  *channel;
-   inttc;
inttxq_ix;
u32rate_limit;
 } cacheline_aligned_in_smp;
@@ -564,6 +563,7 @@ struct mlx5e_channel {
 struct mlx5e_channels {
struct mlx5e_channel **c;
unsigned int   num;
+   struct mlx5e_paramsparams;
 };
 
 enum mlx5e_traffic_types {
@@ -735,7 +735,6 @@ struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
-   struct bpf_prog *xdp_prog;
/* priv data path fields - end */
 
unsigned long  state;
@@ -752,7 +751,6 @@ struct mlx5e_priv {
struct mlx5e_flow_steering fs;
struct mlx5e_vxlan_db  vxlan;
 
-   struct mlx5e_paramsparams;
struct workqueue_struct*wq;
struct work_struct update_carrier_work;
struct work_struct set_rx_mode_work;
@@ -857,8 +855,9 @@ struct mlx5e_redirect_rqt_param {
 
 int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz,
   struct mlx5e_redirect_rqt_param rrp);
-void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
-   enum mlx5e_traffic_types tt);
+void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params *params,
+   enum mlx5e_traffic_types tt,
+   void *tirc);
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
@@ -869,7 +868,8 @@ int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 
*speed);
 
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 u8 cq_period_mode);
-void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type);
+void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
+ struct mlx5e_params *params, u8 rq_ty

[net-next 02/14] net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation

2017-03-27 Thread Saeed Mahameed
To simplify the mlx5e_open_locked flow, we set netdev->rx_cpu_rmap on netdev
creation rather than on netdev open; it is redundant to set it every time in
mlx5e_open_locked.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 469d6c147db7..f0eff5e30729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2478,9 +2478,7 @@ int mlx5e_open_locked(struct net_device *netdev)
mlx5e_redirect_rqts(priv);
mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
-#ifdef CONFIG_RFS_ACCEL
-   priv->netdev->rx_cpu_rmap = priv->mdev->rmap;
-#endif
+
if (priv->profile->update_stats)
queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
@@ -4022,6 +4020,10 @@ struct net_device *mlx5e_create_netdev(struct 
mlx5_core_dev *mdev,
return NULL;
}
 
+#ifdef CONFIG_RFS_ACCEL
+   netdev->rx_cpu_rmap = mdev->rmap;
+#endif
+
profile->init(mdev, netdev, profile, ppriv);
 
netif_carrier_off(netdev);
-- 
2.11.0



[net-next 03/14] net/mlx5e: Introduce mlx5e_channels

2017-03-27 Thread Saeed Mahameed
Add a dedicated "channels" handler that will serve as the holder of the
channels (RQs/SQs/etc.) to help separate channel operations from parameter
operations, for the downstream fail-safe configuration flow, where we will
create a new instance of mlx5e_channels with the newly requested parameters
and switch to the new channels on the fly.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 ++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 27 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 86 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 14 ++--
 4 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index bace9233dc1f..b00c6688ddcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -560,6 +560,11 @@ struct mlx5e_channel {
intcpu;
 };
 
+struct mlx5e_channels {
+   struct mlx5e_channel **c;
+   unsigned int   num;
+};
+
 enum mlx5e_traffic_types {
MLX5E_TT_IPV4_TCP,
MLX5E_TT_IPV6_TCP,
@@ -736,7 +741,7 @@ struct mlx5e_priv {
struct mutex   state_lock; /* Protects Interface state */
struct mlx5e_rqdrop_rq;
 
-   struct mlx5e_channel **channel;
+   struct mlx5e_channels  channels;
u32tisn[MLX5E_MAX_NUM_TC];
struct mlx5e_rqt   indir_rqt;
struct mlx5e_tir   indir_tir[MLX5E_NUM_INDIR_TIRS];
@@ -836,7 +841,7 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, 
__always_unused __be16 proto,
 void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv);
 void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv);
 
-int mlx5e_modify_rqs_vsd(struct mlx5e_priv *priv, bool vsd);
+int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd);
 
 int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix);
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index a004a5a1a4c2..2e54a6564d86 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -152,12 +152,9 @@ static bool mlx5e_query_global_pause_combined(struct 
mlx5e_priv *priv)
 }
 
 #define MLX5E_NUM_Q_CNTRS(priv) (NUM_Q_COUNTERS * (!!priv->q_counter))
-#define MLX5E_NUM_RQ_STATS(priv) \
-   (NUM_RQ_STATS * priv->params.num_channels * \
test_bit(MLX5E_STATE_OPENED, &priv->state))
+#define MLX5E_NUM_RQ_STATS(priv) (NUM_RQ_STATS * (priv)->channels.num)
 #define MLX5E_NUM_SQ_STATS(priv) \
-   (NUM_SQ_STATS * priv->params.num_channels * priv->params.num_tc * \
-test_bit(MLX5E_STATE_OPENED, &priv->state))
+   (NUM_SQ_STATS * (priv)->channels.num * (priv)->params.num_tc)
 #define MLX5E_NUM_PFC_COUNTERS(priv) \
((mlx5e_query_global_pause_combined(priv) + 
hweight8(mlx5e_query_pfc_combined(priv))) * \
  NUM_PPORT_PER_PRIO_PFC_COUNTERS)
@@ -262,13 +259,13 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
return;
 
/* per channel counters */
-   for (i = 0; i < priv->params.num_channels; i++)
+   for (i = 0; i < priv->channels.num; i++)
for (j = 0; j < NUM_RQ_STATS; j++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
rq_stats_desc[j].format, i);
 
for (tc = 0; tc < priv->params.num_tc; tc++)
-   for (i = 0; i < priv->params.num_channels; i++)
+   for (i = 0; i < priv->channels.num; i++)
for (j = 0; j < NUM_SQ_STATS; j++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
sq_stats_desc[j].format,
@@ -303,6 +300,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev,
struct ethtool_stats *stats, u64 *data)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5e_channels *channels;
struct mlx5_priv *mlx5_priv;
int i, j, tc, prio, idx = 0;
unsigned long pfc_combined;
@@ -313,6 +311,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev,
mutex_lock(&priv->state_lock);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_update_stats(priv);
+   channels = &priv->channels;
mutex_unlock(&priv->state_lock);
 
for (i = 0; i < NUM_SW_COUNTERS; i++)
@@ -382,16 +381,16 @@ static void mlx5e_get_ethtool_stats(struct net_device 
*dev,
return;
 
/* p

[net-next 09/14] net/mlx5e: Minimize mlx5e_{open/close}_locked

2017-03-27 Thread Saeed Mahameed
Move mlx5e_redirect_rqts_to_{channels,drop}, mlx5e_{add,del}_sqs_fwd_rules
and the setting of the real number of tx/rx queues into
mlx5e_{activate,deactivate}_priv_channels, minimizing the mlx5e_open/close
flows.

This will be needed in downstream patches to replace old channels with new
ones without the need to call mlx5e_close/open.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 40 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 10 --
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a6e09c46440b..a94f84ec2c1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2498,14 +2498,33 @@ static void mlx5e_build_channels_tx_maps(struct 
mlx5e_priv *priv)
 
 static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
 {
+   int num_txqs = priv->channels.num * priv->channels.params.num_tc;
+   struct net_device *netdev = priv->netdev;
+
+   mlx5e_netdev_set_tcs(netdev);
+   if (netdev->real_num_tx_queues != num_txqs)
+   netif_set_real_num_tx_queues(netdev, num_txqs);
+   if (netdev->real_num_rx_queues != priv->channels.num)
+   netif_set_real_num_rx_queues(netdev, priv->channels.num);
+
mlx5e_build_channels_tx_maps(priv);
mlx5e_activate_channels(&priv->channels);
netif_tx_start_all_queues(priv->netdev);
+
+   if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   mlx5e_add_sqs_fwd_rules(priv);
+
mlx5e_wait_channels_min_rx_wqes(&priv->channels);
+   mlx5e_redirect_rqts_to_channels(priv, &priv->channels);
 }
 
 static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv)
 {
+   mlx5e_redirect_rqts_to_drop(priv);
+
+   if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   mlx5e_remove_sqs_fwd_rules(priv);
+
/* FIXME: This is a W/A only for tx timeout watch dog false alarm when
 * polling for inactive tx queues.
 */
@@ -2517,40 +2536,24 @@ static void mlx5e_deactivate_priv_channels(struct 
mlx5e_priv *priv)
 int mlx5e_open_locked(struct net_device *netdev)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5_core_dev *mdev = priv->mdev;
-   int num_txqs;
int err;
 
set_bit(MLX5E_STATE_OPENED, &priv->state);
 
-   mlx5e_netdev_set_tcs(netdev);
-
-   num_txqs = priv->channels.params.num_channels * 
priv->channels.params.num_tc;
-   netif_set_real_num_tx_queues(netdev, num_txqs);
-   netif_set_real_num_rx_queues(netdev, 
priv->channels.params.num_channels);
-
err = mlx5e_open_channels(priv, &priv->channels);
if (err)
goto err_clear_state_opened_flag;
 
mlx5e_refresh_tirs(priv, false);
mlx5e_activate_priv_channels(priv);
-   mlx5e_redirect_rqts_to_channels(priv, &priv->channels);
mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
 
if (priv->profile->update_stats)
queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
-   if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
-   err = mlx5e_add_sqs_fwd_rules(priv);
-   if (err)
-   goto err_close_channels;
-   }
return 0;
 
-err_close_channels:
-   mlx5e_close_channels(&priv->channels);
 err_clear_state_opened_flag:
clear_bit(MLX5E_STATE_OPENED, &priv->state);
return err;
@@ -2571,7 +2574,6 @@ int mlx5e_open(struct net_device *netdev)
 int mlx5e_close_locked(struct net_device *netdev)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5_core_dev *mdev = priv->mdev;
 
/* May already be CLOSED in case a previous configuration operation
 * (e.g RX/TX queue size change) that involves close failed.
@@ -2581,12 +2583,8 @@ int mlx5e_close_locked(struct net_device *netdev)
 
clear_bit(MLX5E_STATE_OPENED, &priv->state);
 
-   if (MLX5_CAP_GEN(mdev, vport_group_manager))
-   mlx5e_remove_sqs_fwd_rules(priv);
-
mlx5e_timestamp_cleanup(priv);
netif_carrier_off(priv->netdev);
-   mlx5e_redirect_rqts_to_drop(priv);
mlx5e_deactivate_priv_channels(priv);
mlx5e_close_channels(&priv->channels);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index d277c1979b2a..53db5ec2c122 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -189,12 +189,13 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_eswitch

[pull request][net-next 00/14] Mellanox mlx5e Fail-safe config

2017-03-27 Thread Saeed Mahameed
Hi Dave,

This series provides a fail-safe mechanism to allow safely re-configuring
mlx5e netdevice and provides a resiliency against sporadic
configuration failures.

For additional information please see below.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 88275ed0cb3ac89ed869a925337b951801b154d7:

  Merge branch 'netvsc-next' (2017-03-25 20:15:56 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5e-failsafe

for you to fetch changes up to 2e20a151205be8e7efa9644cdb942381e7bec787:

  net/mlx5e: Fail safe mtu and lro setting (2017-03-27 15:08:24 +0300)


mlx5e-failsafe 27-03-2017

This series provides a fail-safe mechanism to allow safely re-configuring
mlx5e netdevice and provides resiliency against sporadic
configuration failures.

To enable this we do some refactoring and code reorganizing to allow
breaking the driver's open/close flows into stages:
  open -> activate -> deactivate -> close.

In addition we need to allow creating fresh HW ring resources
(mlx5e_channels) with their own "new" set of parameters, while keeping
the current ones running and active until the new channels are
successfully created with the new configuration; only then can we
safely replace (switch) the old channels with the new ones.

For that we introduce mlx5e_channels object and an API to manage it:
 - channels = open_channels(new_params):
   open fresh TX/RX channels
 - activate_channels(channels):
   redirect traffic to them and attach them to the netdev
 - deactivate_channels(channels):
   stop traffic and detach from netdev
 - close_channels(channels):
   free the TX/RX HW resources of those channels

With the above strategy it is straightforward to achieve the desired
behavior of fail-safe configuration.  In pseudo code:

make_new_config(new_params)
{
old_channels = current_active_channels;
new_channels = create_channels(new_params);
if (!new_channels)
return "Failed, but current channels are still active :)"

deactivate_channels(old_channels); /* Can't fail */
set_hw_new_state();/* If needed  */
activate_channels(new_channels);   /* Can't fail */
close_channels(old_channels);
current_active_channels = new_channels;

return "SUCCESS";
}
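The fail-safe pseudo code above can be sketched as a small, self-contained C program (stub types and hypothetical names only; this is not the driver code, just the invariant it describes: a failed `create_channels` leaves the current channels untouched and active):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stub standing in for mlx5e_channels: just records its params. */
struct channels {
	int params;
	int active;
};

static struct channels *current_active;	/* the live configuration */

/* May fail; on failure the caller's old channels are untouched. */
static struct channels *create_channels(int params)
{
	struct channels *c;

	if (params < 0)		/* simulate an unsupported configuration */
		return NULL;
	c = calloc(1, sizeof(*c));
	if (c)
		c->params = params;
	return c;
}

static void deactivate_channels(struct channels *c) { c->active = 0; }
static void activate_channels(struct channels *c)   { c->active = 1; }
static void close_channels(struct channels *c)      { free(c); }

/* Fail-safe reconfiguration: old channels stay active on any failure. */
static int make_new_config(int new_params)
{
	struct channels *old = current_active;
	struct channels *new = create_channels(new_params);

	if (!new)
		return -1;	/* failed, but current channels still active */

	deactivate_channels(old);	/* can't fail */
	activate_channels(new);		/* can't fail */
	close_channels(old);
	current_active = new;
	return 0;
}
```

Note that the only fallible step happens before the old channels are touched, which is what makes the whole flow safe.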

At the top of this series, we change the following flows to be fail-safe:
ethtool:
   - ring parameters
   - coalesce parameters
   - tx copy break parameters
   - cqe compressing/moderation mode setting (priv flags)
ndos:
   - tc setup
   - set features: LRO
   - change mtu

----
Saeed Mahameed (14):
  net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel
  net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation
  net/mlx5e: Introduce mlx5e_channels
  net/mlx5e: Redirect RQT refactoring
  net/mlx5e: Refactor refresh TIRs
  net/mlx5e: Split open/close channels to stages
  net/mlx5e: Isolate open_channels from priv->params
  net/mlx5e: CQ and RQ don't need priv pointer
  net/mlx5e: Minimize mlx5e_{open/close}_locked
  net/mlx5e: Introduce switch channels
  net/mlx5e: Fail safe ethtool settings
  net/mlx5e: Fail safe cqe compressing/moderation mode setting
  net/mlx5e: Fail safe tc setup
  net/mlx5e: Fail safe mtu and lro setting

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  106 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   10 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|   17 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  336 +++---
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 1172 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   83 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   22 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |2 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  |9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |3 +-
 12 files changed, 984 insertions(+), 789 deletions(-)


[net-next 05/14] net/mlx5e: Refactor refresh TIRs

2017-03-27 Thread Saeed Mahameed
Rename mlx5e_refresh_tirs_self_loopback to mlx5e_refresh_tirs,
as it will be used in downstream (Safe config flow) patches, and make it
fail safe on mlx5e_open.
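The error-reporting shape of the refactored loop can be sketched in plain C (hypothetical stand-in names, not the driver code): the last attempted TIRN is remembered so the single error message after the loop can name the TIR that failed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for mlx5_core_modify_tir(): TIRN 3 fails. */
static int modify_tir(unsigned int tirn)
{
	return tirn == 3 ? -5 : 0;	/* -5 ~ -EIO, for illustration */
}

/* Walk all TIRs; remember the last attempted TIRN so the failure
 * message can name it, mirroring the netdev_err() this patch adds. */
static int refresh_tirs(const unsigned int *tirns, int n)
{
	unsigned int tirn = 0;
	int err = 0;
	int i;

	for (i = 0; i < n; i++) {
		tirn = tirns[i];
		err = modify_tir(tirn);
		if (err)
			break;
	}
	if (err)
		fprintf(stderr, "refresh tir(0x%x) failed, %d\n", tirn, err);
	return err;
}
```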

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c   | 17 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  8 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c |  9 +++--
 4 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 50dfc4c6c8e4..5f7cc58d900c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -959,8 +959,7 @@ void mlx5e_destroy_tir(struct mlx5_core_dev *mdev,
   struct mlx5e_tir *tir);
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
-int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev,
-bool enable_uc_lb);
+int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb);
 
 struct mlx5_eswitch_rep;
 int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 20bdbe685795..f1f17f7a3cd0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -136,18 +136,20 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev 
*mdev)
mlx5_core_dealloc_pd(mdev, res->pdn);
 }
 
-int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev,
-bool enable_uc_lb)
+int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
 {
+   struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_tir *tir;
-   void *in;
+   int err  = -ENOMEM;
+   u32 tirn = 0;
int inlen;
-   int err = 0;
+   void *in;
+
 
inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
in = mlx5_vzalloc(inlen);
if (!in)
-   return -ENOMEM;
+   goto out;
 
if (enable_uc_lb)
MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
@@ -156,13 +158,16 @@ int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev 
*mdev,
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
 
	list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) {
-   err = mlx5_core_modify_tir(mdev, tir->tirn, in, inlen);
+   tirn = tir->tirn;
+   err = mlx5_core_modify_tir(mdev, tirn, in, inlen);
if (err)
goto out;
}
 
 out:
kvfree(in);
+   if (err)
+   netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", 
tirn, err);
 
return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index aec77f075714..a98d01684247 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2498,13 +2498,7 @@ int mlx5e_open_locked(struct net_device *netdev)
goto err_clear_state_opened_flag;
}
 
-   err = mlx5e_refresh_tirs_self_loopback(priv->mdev, false);
-   if (err) {
-   netdev_err(netdev, "%s: mlx5e_refresh_tirs_self_loopback_enable 
failed, %d\n",
-  __func__, err);
-   goto err_close_channels;
-   }
-
+   mlx5e_refresh_tirs(priv, false);
mlx5e_redirect_rqts_to_channels(priv, >channels);
mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 5621dcfda4f1..5225f2226a67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -236,12 +236,9 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv 
*priv,
 {
int err = 0;
 
-   err = mlx5e_refresh_tirs_self_loopback(priv->mdev, true);
-   if (err) {
-   netdev_err(priv->netdev,
-  "\tFailed to enable UC loopback err(%d)\n", err);
+   err = mlx5e_refresh_tirs(priv, true);
+   if (err)
return err;
-   }
 
lbtp->loopback_ok = false;
init_completion(>comp);
@@ -258,7 +255,7 @@ static void mlx5e_test_loopback_cleanup(struct mlx5e_priv 
*priv,
struct mlx5e_lbt_priv *lbtp)
 {
	dev_remove_pack(&lbtp->pt);
-   mlx5e_refresh_tirs_self_loopback(priv->mdev, fa

[net-next 12/14] net/mlx5e: Fail safe cqe compressing/moderation mode setting

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set new
CQE compressing and CQE moderation mode settings.

We also move the RX CQE compression modify function out of en_rx.c to
a more appropriate place.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  8 +++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 53 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 22 -
 4 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2f259dfbf844..8b93d8d02116 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -833,7 +833,7 @@ void mlx5e_pps_event_handler(struct mlx5e_priv *priv,
 struct ptp_clock_event *event);
 int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr);
 int mlx5e_hwstamp_get(struct net_device *dev, struct ifreq *ifr);
-void mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val);
+int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val);
 
 int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
  u16 vid);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index 485c23b59f93..e706a87fc8b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -90,6 +90,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq 
*ifr)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
struct hwtstamp_config config;
+   int err;
 
if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz))
return -EOPNOTSUPP;
@@ -129,7 +130,12 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq 
*ifr)
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
/* Disable CQE compression */
netdev_warn(dev, "Disabling cqe compression");
-   mlx5e_modify_rx_cqe_compression_locked(priv, false);
+   err = mlx5e_modify_rx_cqe_compression_locked(priv, false);
+   if (err) {
+   netdev_err(dev, "Failed disabling cqe compression 
err=%d\n", err);
+   mutex_unlock(&priv->state_lock);
+   return err;
+   }
config.rx_filter = HWTSTAMP_FILTER_ALL;
break;
default:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 457a796cc248..c5f49e294987 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1474,10 +1474,10 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5e_channels new_channels = {};
bool rx_mode_changed;
u8 rx_cq_period_mode;
int err = 0;
-   bool reset;
 
rx_cq_period_mode = enable ?
MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
@@ -1491,16 +1491,51 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
if (!rx_mode_changed)
return 0;
 
-   reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (reset)
-   mlx5e_close_locked(netdev);
+   new_channels.params = priv->channels.params;
+   mlx5e_set_rx_cq_mode_params(&new_channels.params, rx_cq_period_mode);
 
-   mlx5e_set_rx_cq_mode_params(&priv->channels.params, rx_cq_period_mode);
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   return 0;
+   }
+
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   return err;
 
-   if (reset)
-   err = mlx5e_open_locked(netdev);
+   mlx5e_switch_priv_channels(priv, &new_channels);
+   return 0;
+}
 
-   return err;
+int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool 
new_val)
+{
+   bool curr_val = MLX5E_GET_PFLAG(&priv->channels.params, 
MLX5E_PFLAG_RX_CQE_COMPRESS);
+   struct mlx5e_channels new_channels = {};
+   int err = 0;
+
+   if (!MLX5_CAP_GEN(priv->mdev, cqe_compression))
+   return new_val ? -EOPNOTSUPP : 0;
+
+   if (curr_val == new_val)
+   return 0;
+
+   new_channels.params = priv->channels.params;
+   MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS, 
new_val)

[net-next 14/14] net/mlx5e: Fail safe mtu and lro setting

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set new
netdev mtu and lro settings.

MTU and lro settings demand some HW configuration changes after new
channels are created and ready for action. In order to unify switch
channels routine for LRO and MTU changes, and maybe future configuration
features, we now pass to it a modify HW function pointer to be
invoked directly after old channels are de-activated and before new
channels are activated.
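The callback placement described above can be sketched with stub types (hypothetical names such as `switch_channels` and `set_port_mtu`; this is not the driver code): the optional `hw_modify` hook runs in the quiescent window between deactivating the old channels and activating the new ones.

```c
#include <assert.h>
#include <stddef.h>

/* Stub standing in for mlx5e_priv; hw_mtu models a port register. */
struct priv {
	int hw_mtu;
	int params;
};

/* Mirrors the mlx5e_fp_hw_modify typedef added by this patch. */
typedef int (*hw_modify_fn)(struct priv *priv);

static void deactivate_channels(struct priv *p) { (void)p; }
static void activate_channels(struct priv *p)   { (void)p; }

/* The switch flow: an optional hw_modify callback runs after old
 * channels are deactivated and before the new ones are activated. */
static void switch_channels(struct priv *p, int new_params,
			    hw_modify_fn hw_modify)
{
	deactivate_channels(p);
	if (hw_modify)
		hw_modify(p);	/* e.g. program the new port MTU */
	p->params = new_params;
	activate_channels(p);
}

/* Hypothetical callback in the spirit of mlx5e_set_dev_port_mtu(). */
static int set_port_mtu(struct priv *p)
{
	p->hw_mtu = 9000;
	return 0;
}
```

Callers that need no HW change (the ethtool paths in this series) simply pass a NULL callback.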

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  8 ++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 70 ++
 3 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8b93d8d02116..150fb52a0737 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -867,8 +867,14 @@ int mlx5e_close_locked(struct net_device *netdev);
 int mlx5e_open_channels(struct mlx5e_priv *priv,
struct mlx5e_channels *chs);
 void mlx5e_close_channels(struct mlx5e_channels *chs);
+
+/* Function pointer to be used to modify HW settings while
+ * switching channels
+ */
+typedef int (*mlx5e_fp_hw_modify)(struct mlx5e_priv *priv);
 void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
-   struct mlx5e_channels *new_chs);
+   struct mlx5e_channels *new_chs,
+   mlx5e_fp_hw_modify hw_modify);
 
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   u32 *indirection_rqt, int len,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index c5f49e294987..40912937d211 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -540,7 +540,7 @@ static int mlx5e_set_ringparam(struct net_device *dev,
if (err)
goto unlock;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
 unlock:
	mutex_unlock(&priv->state_lock);
@@ -597,7 +597,7 @@ static int mlx5e_set_channels(struct net_device *dev,
mlx5e_arfs_disable(priv);
 
/* Switch to new channels, set new parameters and close old ones */
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
if (arfs_enabled) {
err = mlx5e_arfs_enable(priv);
@@ -691,7 +691,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
if (err)
goto out;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
 out:
	mutex_unlock(&priv->state_lock);
@@ -1166,7 +1166,7 @@ static int mlx5e_set_tunable(struct net_device *dev,
	err = mlx5e_open_channels(priv, &new_channels);
if (err)
break;
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
break;
default:
@@ -1503,7 +1503,7 @@ static int set_pflag_rx_cqe_based_moder(struct net_device 
*netdev, bool enable)
if (err)
return err;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
return 0;
 }
 
@@ -1534,7 +1534,7 @@ int mlx5e_modify_rx_cqe_compression_locked(struct 
mlx5e_priv *priv, bool new_val
if (err)
return err;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1e29f40d84ca..68d6c3c58ba7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2437,9 +2437,9 @@ static void mlx5e_query_mtu(struct mlx5e_priv *priv, u16 
*mtu)
*mtu = MLX5E_HW2SW_MTU(hw_mtu);
 }
 
-static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
+static int mlx5e_set_dev_port_mtu(struct mlx5e_priv *priv)
 {
-   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct net_device *netdev = priv->netdev;
u16 mtu;
int err;
 
@@ -2534,7 +2534,8 @@ static void mlx5e_deactivate_priv_channels(struct 
mlx5e_priv *priv)
 }
 
 void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
-   struct mlx5e_channels *new_chs)
+   struct mlx5e_channels *new_chs,
+   mlx5e_f

[net-next 01/14] net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel

2017-03-27 Thread Saeed Mahameed
Instead of iterating over the channel SQs to set their max rate, do it
on SQ creation per TXQ SQ.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e849a0fc2653..469d6c147db7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1207,6 +1207,9 @@ static int mlx5e_create_sq_rdy(struct mlx5e_priv *priv,
return err;
 }
 
+static int mlx5e_set_sq_maxrate(struct net_device *dev,
+   struct mlx5e_txqsq *sq, u32 rate);
+
 static int mlx5e_open_txqsq(struct mlx5e_channel *c,
int tc,
struct mlx5e_sq_param *param,
@@ -1214,6 +1217,8 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 {
struct mlx5e_create_sq_param csp = {};
struct mlx5e_priv *priv = c->priv;
+   u32 tx_rate;
+   int txq_ix;
int err;
 
err = mlx5e_alloc_txqsq(c, tc, param, sq);
@@ -1230,6 +1235,11 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
if (err)
goto err_free_txqsq;
 
+   txq_ix = c->ix + tc * priv->params.num_channels;
+   tx_rate = priv->tx_rates[txq_ix];
+   if (tx_rate)
+   mlx5e_set_sq_maxrate(priv->netdev, sq, tx_rate);
+
netdev_tx_reset_queue(sq->txq);
netif_tx_start_queue(sq->txq);
return 0;
@@ -1692,7 +1702,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
int cpu = mlx5e_get_cpu(priv, ix);
struct mlx5e_channel *c;
int err;
-   int i;
 
c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
if (!c)
@@ -1745,17 +1754,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
if (err)
goto err_close_icosq;
 
-   for (i = 0; i < priv->params.num_tc; i++) {
-   u32 txq_ix = priv->channeltc_to_txq_map[ix][i];
-
-   if (priv->tx_rates[txq_ix]) {
-   struct mlx5e_txqsq *sq = priv->txq_to_sq_map[txq_ix];
-
-   mlx5e_set_sq_maxrate(priv->netdev, sq,
-priv->tx_rates[txq_ix]);
-   }
-   }
-
	err = c->xdp ? mlx5e_open_xdpsq(c, &cparam->xdp_sq, &c->rq.xdpsq) : 0;
if (err)
goto err_close_sqs;
-- 
2.11.0



[net-next 04/14] net/mlx5e: Redirect RQT refactoring

2017-03-27 Thread Saeed Mahameed
RQ Tables are always created once (on netdev creation); at that stage,
all RQ tables (indirection tables) are directed to the drop RQ.

We don't need to use mlx5e_fill_{direct,indir}_rqt_rqns to fill the drop
RQ in the create RQT procedure.

Instead of having separate flows to redirect direct and indirect RQ Tables
to the current active channels Receive Queues (RQs), we unify the two
flows by introducing mlx5e_redirect_rqt function and redirect_rqt_param
struct. Combined, they provide one generic logic to fill the RQ table RQ
numbers regardless of the RQ table purpose (direct/indirect).

We demonstrate the usage with mlx5e_redirect_rqts_to_channels, which will
be called on mlx5e_open, and with mlx5e_redirect_rqts_to_drop, which will
be called on mlx5e_close.
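The shape of the unified parameter struct can be sketched in self-contained C (stub field types and a hypothetical `rqn_for_entry` helper; the real per-entry logic lives inside mlx5e_redirect_rqt()): one tagged union serves both the direct and the RSS redirect cases.

```c
#include <assert.h>

/* Shape of the new mlx5e_redirect_rqt_param: one argument covers both
 * the direct (single RQN) and the RSS (indirection) redirect cases. */
struct redirect_rqt_param {
	int is_rss;
	union {
		unsigned int rqn;	/* direct RQN (non-RSS) */
		struct {
			unsigned char hfunc;
			const unsigned int *rqns;	/* channels' RQ numbers */
			int num;
		} rss;			/* RSS data */
	};
};

/* Pick the RQN for RQ-table entry ix, regardless of table purpose:
 * direct tables always yield the single RQN, RSS tables spread the
 * entries across the channels' RQs. */
static unsigned int rqn_for_entry(const struct redirect_rqt_param *rrp, int ix)
{
	if (!rrp->is_rss)
		return rrp->rqn;
	return rrp->rss.rqns[ix % rrp->rss.num];
}
```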

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  24 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 169 -
 3 files changed, 129 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b00c6688ddcf..50dfc4c6c8e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -843,7 +843,19 @@ void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv);
 
 int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd);
 
-int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix);
+struct mlx5e_redirect_rqt_param {
+   bool is_rss;
+   union {
+   u32 rqn; /* Direct RQN (Non-RSS) */
+   struct {
+   u8 hfunc;
+   struct mlx5e_channels *channels;
+   } rss; /* RSS data */
+   };
+};
+
+int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz,
+  struct mlx5e_redirect_rqt_param rrp);
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
enum mlx5e_traffic_types tt);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 2e54a6564d86..faa21848c9dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1027,20 +1027,28 @@ static int mlx5e_set_rxfh(struct net_device *dev, const 
u32 *indir,
 
	mutex_lock(&priv->state_lock);
 
-   if (indir) {
-   u32 rqtn = priv->indir_rqt.rqtn;
-
-   memcpy(priv->params.indirection_rqt, indir,
-  sizeof(priv->params.indirection_rqt));
-   mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, 0);
-   }
-
if (hfunc != ETH_RSS_HASH_NO_CHANGE &&
hfunc != priv->params.rss_hfunc) {
priv->params.rss_hfunc = hfunc;
hash_changed = true;
}
 
+   if (indir) {
+   memcpy(priv->params.indirection_rqt, indir,
+  sizeof(priv->params.indirection_rqt));
+
+   if (test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   u32 rqtn = priv->indir_rqt.rqtn;
+   struct mlx5e_redirect_rqt_param rrp = {
+   .is_rss = true,
+   .rss.hfunc = priv->params.rss_hfunc,
+   .rss.channels  = &priv->channels
+   };
+
+   mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, 
rrp);
+   }
+   }
+
if (key) {
memcpy(priv->params.toeplitz_hash_key, key,
   sizeof(priv->params.toeplitz_hash_key));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 920e72ae992e..aec77f075714 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2046,61 +2046,15 @@ static void mlx5e_close_channels(struct mlx5e_priv 
*priv)
chs->num = 0;
 }
 
-static int mlx5e_rx_hash_fn(int hfunc)
-{
-   return (hfunc == ETH_RSS_HASH_TOP) ?
-  MLX5_RX_HASH_FN_TOEPLITZ :
-  MLX5_RX_HASH_FN_INVERTED_XOR8;
-}
-
-static int mlx5e_bits_invert(unsigned long a, int size)
-{
-   int inv = 0;
-   int i;
-
-   for (i = 0; i < size; i++)
-   inv |= (test_bit(size - i - 1, &a) ? 1 : 0) << i;
-
-   return inv;
-}
-
-static void mlx5e_fill_indir_rqt_rqns(struct mlx5e_priv *priv, void *rqtc)
-{
-   int i;
-
-   for (i = 0; i < MLX5E_INDIR_RQT_SIZE; i++) {
-   int ix = i;
-   u32 rqn;
-
-   if (priv->params.rss_hfunc == ETH_RSS_HASH

[net-next 06/14] net/mlx5e: Split open/close channels to stages

2017-03-27 Thread Saeed Mahameed
As a foundation for the safe config flow, introduce a simple, clear API
of (Open then Activate), where "open" handles the heavy, unsafe creation
operations and "activate" is fast and fail-safe, merely enabling the
newly created channels.

For this we split the RQ/TXQ SQ and channel open/close flows into
open => activate, deactivate => close.

This will simplify the ability to have fail safe configuration changes
in downstream patches as follows:

make_new_config(new_params)
{
 old_channels = current_active_channels;
 new_channels = create_channels(new_params);
 if (!new_channels)
  return "Failed, but current channels still active :)"
 deactivate_channels(old_channels); /* Can't fail */
 activate_channels(new_channels); /* Can't fail */
 close_channels(old_channels);
 current_active_channels = new_channels;

 return "SUCCESS";
}

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 214 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   4 +-
 4 files changed, 148 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5f7cc58d900c..f1895ebe7fe5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -358,6 +358,7 @@ struct mlx5e_txqsq {
	struct mlx5_wq_ctrl wq_ctrl;
struct mlx5e_channel  *channel;
	int tc;
+	int txq_ix;
	u32 rate_limit;
 } cacheline_aligned_in_smp;
 
@@ -732,8 +733,8 @@ struct mlx5e_profile {
 
 struct mlx5e_priv {
/* priv data path fields - start */
-   struct mlx5e_txqsq **txq_to_sq_map;
-   int channeltc_to_txq_map[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
+   struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
+   int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
struct bpf_prog *xdp_prog;
/* priv data path fields - end */
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index faa21848c9dc..5159358a242d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -269,7 +269,7 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
for (j = 0; j < NUM_SQ_STATS; j++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
sq_stats_desc[j].format,
-   priv->channeltc_to_txq_map[i][tc]);
+   priv->channel_tc2txq[i][tc]);
 }
 
 static void mlx5e_get_strings(struct net_device *dev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a98d01684247..6be7c2367d41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -820,6 +820,8 @@ static int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq)
msleep(20);
}
 
+   netdev_warn(priv->netdev, "Failed to get min RX wqes on RQN[0x%x] wq 
cur_sz(%d) min_rx_wqes(%d)\n",
+   rq->rqn, wq->cur_sz, priv->params.min_rx_wqes);
return -ETIMEDOUT;
 }
 
@@ -848,9 +850,6 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
 struct mlx5e_rq_param *param,
 struct mlx5e_rq *rq)
 {
-   struct mlx5e_icosq *sq = &c->icosq;
-   u16 pi = sq->pc & sq->wq.sz_m1;
-   struct mlx5e_tx_wqe *nopwqe;
int err;
 
err = mlx5e_alloc_rq(c, param, rq);
@@ -861,7 +860,6 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
if (err)
goto err_free_rq;
 
-   set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
if (err)
goto err_destroy_rq;
@@ -869,14 +867,9 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
if (param->am_enabled)
	set_bit(MLX5E_RQ_STATE_AM, &c->rq.state);
 
-   sq->db.ico_wqe[pi].opcode = MLX5_OPCODE_NOP;
-   sq->db.ico_wqe[pi].num_wqebbs = 1;
-   nopwqe = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc);
-   mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, &nopwqe->ctrl);
return 0;
 
 err_destroy_rq:
-   clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
mlx5e_destroy_rq(rq);
 err_free

[net-next 11/14] net/mlx5e: Fail safe ethtool settings

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set new ethtool
settings:
 - ring parameters
 - coalesce parameters
 - tx copy break parameters

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 120 +
 1 file changed, 73 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index e5cee400a4d3..457a796cc248 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -457,8 +457,8 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
int rq_wq_type = priv->channels.params.rq_wq_type;
+   struct mlx5e_channels new_channels = {};
u32 rx_pending_wqes;
-   bool was_opened;
u32 min_rq_size;
u32 max_rq_size;
u8 log_rq_size;
@@ -527,16 +527,22 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 
	mutex_lock(&priv->state_lock);
 
-   was_opened = test_bit(MLX5E_STATE_OPENED, >state);
-   if (was_opened)
-   mlx5e_close_locked(dev);
+   new_channels.params = priv->channels.params;
+   new_channels.params.log_rq_size = log_rq_size;
+   new_channels.params.log_sq_size = log_sq_size;
 
-   priv->channels.params.log_rq_size = log_rq_size;
-   priv->channels.params.log_sq_size = log_sq_size;
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto unlock;
+   }
 
-   if (was_opened)
-   err = mlx5e_open_locked(dev);
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   goto unlock;
+
+   mlx5e_switch_priv_channels(priv, &new_channels);
 
+unlock:
	mutex_unlock(&priv->state_lock);
 
return err;
@@ -623,36 +629,13 @@ static int mlx5e_get_coalesce(struct net_device *netdev,
return 0;
 }
 
-static int mlx5e_set_coalesce(struct net_device *netdev,
- struct ethtool_coalesce *coal)
+static void
+mlx5e_set_priv_channels_coalesce(struct mlx5e_priv *priv, struct 
ethtool_coalesce *coal)
 {
-   struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
-   bool restart =
-   !!coal->use_adaptive_rx_coalesce != 
priv->channels.params.rx_am_enabled;
-   bool was_opened;
-   int err = 0;
int tc;
int i;
 
-   if (!MLX5_CAP_GEN(mdev, cq_moderation))
-   return -EOPNOTSUPP;
-
-   mutex_lock(&priv->state_lock);
-
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened && restart) {
-   mlx5e_close_locked(netdev);
-   priv->channels.params.rx_am_enabled = 
!!coal->use_adaptive_rx_coalesce;
-   }
-
-   priv->channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs;
-   priv->channels.params.tx_cq_moderation.pkts = 
coal->tx_max_coalesced_frames;
-   priv->channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
-   priv->channels.params.rx_cq_moderation.pkts = 
coal->rx_max_coalesced_frames;
-
-   if (!was_opened || restart)
-   goto out;
for (i = 0; i < priv->channels.num; ++i) {
struct mlx5e_channel *c = priv->channels.c[i];
 
@@ -667,11 +650,50 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
   coal->rx_coalesce_usecs,
   coal->rx_max_coalesced_frames);
}
+}
 
-out:
-   if (was_opened && restart)
-   err = mlx5e_open_locked(netdev);
+static int mlx5e_set_coalesce(struct net_device *netdev,
+ struct ethtool_coalesce *coal)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5e_channels new_channels = {};
+   int err = 0;
+   bool reset;
 
+   if (!MLX5_CAP_GEN(mdev, cq_moderation))
+   return -EOPNOTSUPP;
+
+   mutex_lock(&priv->state_lock);
+   new_channels.params = priv->channels.params;
+
+   new_channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs;
+   new_channels.params.tx_cq_moderation.pkts = 
coal->tx_max_coalesced_frames;
+   new_channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
+   new_channels.params.rx_cq_moderation.pkts = 
coal->rx_max_coalesced_frames;
+   new_channels.params.rx_am_enabled = 
!!coal->use_adaptive_rx_coalesce;
+
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params =

[net-next 13/14] net/mlx5e: Fail safe tc setup

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set up new
tc parameters.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 97e153209834..1e29f40d84ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2910,7 +2910,7 @@ int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd)
 static int mlx5e_setup_tc(struct net_device *netdev, u8 tc)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   bool was_opened;
+   struct mlx5e_channels new_channels = {};
int err = 0;
 
if (tc && tc != MLX5E_MAX_NUM_TC)
@@ -2918,17 +2918,21 @@ static int mlx5e_setup_tc(struct net_device *netdev, u8 tc)
 
mutex_lock(&priv->state_lock);
 
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened)
-   mlx5e_close_locked(priv->netdev);
+   new_channels.params = priv->channels.params;
+   new_channels.params.num_tc = tc ? tc : 1;
 
-   priv->channels.params.num_tc = tc ? tc : 1;
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto out;
+   }
 
-   if (was_opened)
-   err = mlx5e_open_locked(priv->netdev);
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   goto out;
 
+   mlx5e_switch_priv_channels(priv, &new_channels);
+out:
mutex_unlock(&priv->state_lock);
-
return err;
 }
 
-- 
2.11.0



[net-next 10/14] net/mlx5e: Introduce switch channels

2017-03-27 Thread Saeed Mahameed
A fail-safe helper function that allows switching to new channels on the
fly. In simple words:

make_new_config(new_params)
{
new_channels = open_channels(new_params);
if (!new_channels)
 return "Failed, but current channels are still active :)"

switch_channels(new_channels);

return "SUCCESS";
}

Demonstrate mlx5e_switch_priv_channels usage in set channels ethtool
callback and make it fail-safe using the new switch channels mechanism.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  7 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 29 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 30 +++---
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 44c454b34754..2f259dfbf844 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -863,6 +863,13 @@ void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params *params,
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
+
+int mlx5e_open_channels(struct mlx5e_priv *priv,
+   struct mlx5e_channels *chs);
+void mlx5e_close_channels(struct mlx5e_channels *chs);
+void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
+   struct mlx5e_channels *new_chs);
+
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   u32 *indirection_rqt, int len,
   int num_channels);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b2cd0ef7921e..e5cee400a4d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -556,8 +556,8 @@ static int mlx5e_set_channels(struct net_device *dev,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
unsigned int count = ch->combined_count;
+   struct mlx5e_channels new_channels = {};
bool arfs_enabled;
-   bool was_opened;
int err = 0;
 
if (!count) {
@@ -571,22 +571,27 @@ static int mlx5e_set_channels(struct net_device *dev,
 
mutex_lock(&priv->state_lock);
 
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened)
-   mlx5e_close_locked(dev);
+   new_channels.params = priv->channels.params;
+   new_channels.params.num_channels = count;
+   mlx5e_build_default_indir_rqt(priv->mdev, new_channels.params.indirection_rqt,
+ MLX5E_INDIR_RQT_SIZE, count);
+
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto out;
+   }
+
+   /* Create fresh channels with new parameters */
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   goto out;
 
arfs_enabled = dev->features & NETIF_F_NTUPLE;
if (arfs_enabled)
mlx5e_arfs_disable(priv);
 
-   priv->channels.params.num_channels = count;
-   mlx5e_build_default_indir_rqt(priv->mdev, priv->channels.params.indirection_rqt,
- MLX5E_INDIR_RQT_SIZE, count);
-
-   if (was_opened)
-   err = mlx5e_open_locked(dev);
-   if (err)
-   goto out;
+   /* Switch to new channels, set new parameters and close old ones */
+   mlx5e_switch_priv_channels(priv, &new_channels);
 
if (arfs_enabled) {
err = mlx5e_arfs_enable(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a94f84ec2c1a..97e153209834 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1972,8 +1972,8 @@ static void mlx5e_build_channel_param(struct mlx5e_priv *priv,
mlx5e_build_ico_cq_param(priv, icosq_log_wq_sz, &cparam->icosq_cq);
 }
 
-static int mlx5e_open_channels(struct mlx5e_priv *priv,
-  struct mlx5e_channels *chs)
+int mlx5e_open_channels(struct mlx5e_priv *priv,
+   struct mlx5e_channels *chs)
 {
struct mlx5e_channel_param *cparam;
int err = -ENOMEM;
@@ -2037,7 +2037,7 @@ static void mlx5e_deactivate_channels(struct mlx5e_channels *chs)
mlx5e_deactivate_channel(chs->c[i]);
 }
 
-static void mlx5e_close_channels(struct mlx5e_channels *chs)
+void mlx5e_close_channels(struct mlx5e_channels *chs)
 {
int i;
 
@@ -2533,6 +2533,30 @@ static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv)

Re: mlx5e backports for v4.9 -stable

2017-03-20 Thread Saeed Mahameed


On 03/17/2017 02:06 AM, David Miller wrote:
> 
> Commits:
> 
> 
> From b0d4660b4cc52e6477ca3a43435351d565dfcedc Mon Sep 17 00:00:00 2001
> From: Tariq Toukan <tar...@mellanox.com>
> Date: Wed, 22 Feb 2017 17:20:14 +0200
> Subject: [PATCH] net/mlx5e: Fix broken CQE compression initialization
> 
> 
> and
> 
> 
> From 6dc4b54e77282caf17f0ff72aa32dd296037fbc0 Mon Sep 17 00:00:00 2001
> From: Saeed Mahameed <sae...@mellanox.com>
> Date: Wed, 22 Feb 2017 17:20:15 +0200
> Subject: [PATCH] net/mlx5e: Update MPWQE stride size when modifying CQE
>  compress state
> 
> 
> do not apply even closely to v4.9 while I was working on -stable backports.
> 
> Please provide proper backports of these two patches if you want them to
> show up in v4.9 -stable.
> 

Hi Dave,

thank you for trying, we will provide the patches, but I don't know what is the 
right procedure
to do so.

is it ok to post the patches applied on top of tag v4.9.16 of
kernel/git/stable/linux-stable.git?
to whom should I send them?

thanks,
Saeed.


[PATCH net 0/8] Mellanox mlx5 fixes 2017-03-21

2017-03-21 Thread Saeed Mahameed
Hi Dave,

This series contains some mlx5 core and ethernet driver fixes.

For -stable:
net/mlx5e: Count LRO packets correctly (for kernel >= 4.2)
net/mlx5e: Count GSO packets correctly (for kernel >= 4.2)
net/mlx5: Increase number of max QPs in default profile (for kernel >= 4.0)
net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps (for kernel >= 4.10)
net/mlx5e: Use the proper UAPI values when offloading TC vlan actions (for kernel >= v4.9)
net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured (for kernel >= 4.10)
net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch (for kernel >= 4.10)
net/mlx5: Add missing entries for set/query rate limit commands (for kernel >= 4.8)

Thanks,
Saeed.

Gal Pressman (2):
  net/mlx5e: Count GSO packets correctly
  net/mlx5e: Count LRO packets correctly

Maor Gottlieb (1):
  net/mlx5: Increase number of max QPs in default profile

Or Gerlitz (3):
  net/mlx5: Add missing entries for set/query rate limit commands
  net/mlx5e: Change the TC offload rule add/del code path to be per NIC
or E-Switch
  net/mlx5e: Use the proper UAPI values when offloading TC vlan actions

Paul Blakey (1):
  net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps

Roi Dayan (1):
  net/mlx5: E-Switch, Don't allow changing inline mode when flows are
configured

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  4 --
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  8 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  2 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 74 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|  5 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  6 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 22 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  2 +-
 10 files changed, 94 insertions(+), 37 deletions(-)

-- 
2.11.0



[PATCH net 8/8] net/mlx5e: Count LRO packets correctly

2017-03-21 Thread Saeed Mahameed
From: Gal Pressman <g...@mellanox.com>

RX packets statistics ('rx_packets' counter) used to count LRO packets
as one, even though it contains multiple segments.
This patch will increment the counter by the number of segments, and
align the driver with the behavior of other drivers in the stack.

Note that no information is lost in this patch due to 'rx_lro_packets'
counter existence.

Before, ethtool showed:
$ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
 rx_packets: 435277
 rx_lro_packets: 35847
 rx_packets_phy: 1935066

Now, we will see the more logical statistics:
$ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
 rx_packets: 1935066
 rx_lro_packets: 35847
 rx_packets_phy: 1935066

Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <g...@mellanox.com>
Cc: kernel-t...@fb.com
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3d371688fbbb..bafcb349a50c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -601,6 +601,10 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
if (lro_num_seg > 1) {
mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt);
skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt, lro_num_seg);
+   /* Subtract one since we already counted this as one
+* "regular" packet in mlx5e_complete_rx_cqe()
+*/
+   rq->stats.packets += lro_num_seg - 1;
rq->stats.lro_packets++;
rq->stats.lro_bytes += cqe_bcnt;
}
-- 
2.11.0



[PATCH net 5/8] net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps

2017-03-21 Thread Saeed Mahameed
From: Paul Blakey <pa...@mellanox.com>

This was added to allow the TC offloading code to identify offloading
encap/decap vxlan rules.

The VF reps are effectively related to the same mlx5 PCI device as the
PF. Since the kernel invokes the (say) delete ndo for each netdev, the
FW erred on multiple vxlan dst port deletes when the port was deleted
from the system.

We fix that by keeping the registration to be carried out only by the
PF. Since the PF serves as the uplink device, the VF reps will look
up a port there and realize if they are ok to offload that.

Tested:
 
 
 ip link add vxlan1 type vxlan id 44 dev ens5f0 dstport 
 ip link set vxlan1 up
 ip link del dev vxlan1

Fixes: 4a25730eb202 ('net/mlx5e: Add ndo_udp_tunnel_add to VF representors')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 9 +++--
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f6a6ded204f6..dc52053128bc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -928,10 +928,6 @@ void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
 void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
-void mlx5e_add_vxlan_port(struct net_device *netdev,
- struct udp_tunnel_info *ti);
-void mlx5e_del_vxlan_port(struct net_device *netdev,
- struct udp_tunnel_info *ti);
 
 int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
void *sp);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8ef64c4db2c2..66c133757a5e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3100,8 +3100,8 @@ static int mlx5e_get_vf_stats(struct net_device *dev,
vf_stats);
 }
 
-void mlx5e_add_vxlan_port(struct net_device *netdev,
- struct udp_tunnel_info *ti)
+static void mlx5e_add_vxlan_port(struct net_device *netdev,
+struct udp_tunnel_info *ti)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
 
@@ -3114,8 +3114,8 @@ void mlx5e_add_vxlan_port(struct net_device *netdev,
mlx5e_vxlan_queue_work(priv, ti->sa_family, be16_to_cpu(ti->port), 1);
 }
 
-void mlx5e_del_vxlan_port(struct net_device *netdev,
- struct udp_tunnel_info *ti)
+static void mlx5e_del_vxlan_port(struct net_device *netdev,
+struct udp_tunnel_info *ti)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 2c864574a9d5..f621373bd7a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -393,8 +393,6 @@ static const struct net_device_ops mlx5e_netdev_ops_rep = {
.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
.ndo_setup_tc= mlx5e_rep_ndo_setup_tc,
.ndo_get_stats64 = mlx5e_rep_get_stats,
-   .ndo_udp_tunnel_add  = mlx5e_add_vxlan_port,
-   .ndo_udp_tunnel_del  = mlx5e_del_vxlan_port,
.ndo_has_offload_stats   = mlx5e_has_offload_stats,
.ndo_get_offload_stats   = mlx5e_get_offload_stats,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9c13abaf3885..fade7233dac5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -267,12 +267,15 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
skb_flow_dissector_target(f->dissector,
  FLOW_DISSECTOR_KEY_ENC_PORTS,
  f->mask);
+   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   struct net_device *up_dev = mlx5_eswitch_get_uplink_netdev(esw);
+   struct mlx5e_priv *up_priv = netdev_priv(up_dev);
 
/* Full udp dst port must be given */
if (memchr_inv(&mask->dst, 0xff, sizeof(mask->dst)))

[PATCH net 4/8] net/mlx5e: Use the proper UAPI values when offloading TC vlan actions

2017-03-21 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Currently we use the non UAPI values and we miss erring on
the modify action which is not supported, fix that.

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reported-by: Petr Machata <pe...@mellanox.com>
Reviewed-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 2825b5665456..9c13abaf3885 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1131,14 +1131,16 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
}
 
if (is_tcf_vlan(a)) {
-   if (tcf_vlan_action(a) == VLAN_F_POP) {
+   if (tcf_vlan_action(a) == TCA_VLAN_ACT_POP) {
attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
-   } else if (tcf_vlan_action(a) == VLAN_F_PUSH) {
+   } else if (tcf_vlan_action(a) == TCA_VLAN_ACT_PUSH) {
if (tcf_vlan_push_proto(a) != htons(ETH_P_8021Q))
return -EOPNOTSUPP;
 
attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH;
attr->vlan = tcf_vlan_push_vid(a);
+   } else { /* action is TCA_VLAN_ACT_MODIFY */
+   return -EOPNOTSUPP;
}
continue;
}
-- 
2.11.0



[PATCH net 6/8] net/mlx5: Increase number of max QPs in default profile

2017-03-21 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

With ConnectX-4, SRQs share the same space as QPs, and we hit a limit
preventing some applications from allocating the needed amount of QPs.
Double the size to 256K.

Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index e2bd600d19de..60154a175bd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -87,7 +87,7 @@ static struct mlx5_profile profile[] = {
[2] = {
.mask   = MLX5_PROF_MASK_QP_SIZE |
  MLX5_PROF_MASK_MR_CACHE,
-   .log_max_qp = 17,
+   .log_max_qp = 18,
.mr_cache[0]= {
.size   = 500,
.limit  = 250
-- 
2.11.0



[PATCH net 7/8] net/mlx5e: Count GSO packets correctly

2017-03-21 Thread Saeed Mahameed
From: Gal Pressman <g...@mellanox.com>

TX packets statistics ('tx_packets' counter) used to count GSO packets
as one, even though it contains multiple segments.
This patch will increment the counter by the number of segments, and
align the driver with the behavior of other drivers in the stack.

Note that no information is lost in this patch due to 'tx_tso_packets'
counter existence.

Before, ethtool showed:
$ ethtool -S ens6 | egrep "tx_packets|tx_tso_packets"
 tx_packets: 61340
 tx_tso_packets: 60954
 tx_packets_phy: 2451115

Now, we will see the more logical statistics:
$ ethtool -S ens6 | egrep "tx_packets|tx_tso_packets"
 tx_packets: 2451115
 tx_tso_packets: 60954
 tx_packets_phy: 2451115

Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <g...@mellanox.com>
Cc: kernel-t...@fb.com
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index f193128bac4b..57f5e2d7ebd1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -274,15 +274,18 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
sq->stats.tso_bytes += skb->len - ihs;
}
 
+   sq->stats.packets += skb_shinfo(skb)->gso_segs;
num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs;
} else {
bf = sq->bf_budget &&
 !skb->xmit_more &&
 !skb_shinfo(skb)->nr_frags;
ihs = mlx5e_get_inline_hdr_size(sq, skb, bf);
+   sq->stats.packets++;
num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
}
 
+   sq->stats.bytes += num_bytes;
wi->num_bytes = num_bytes;
 
ds_cnt = sizeof(*wqe) / MLX5_SEND_WQE_DS;
@@ -381,8 +384,6 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
if (bf)
sq->bf_budget--;
 
-   sq->stats.packets++;
-   sq->stats.bytes += num_bytes;
return NETDEV_TX_OK;
 
 dma_unmap_wqe_err:
-- 
2.11.0



Re: mlx5e backports for v4.9 -stable

2017-03-21 Thread Saeed Mahameed
On Mon, Mar 20, 2017 at 11:32 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
>
>
> On 03/17/2017 02:06 AM, David Miller wrote:
>>
>> Commits:
>>
>> 
>> From b0d4660b4cc52e6477ca3a43435351d565dfcedc Mon Sep 17 00:00:00 2001
>> From: Tariq Toukan <tar...@mellanox.com>
>> Date: Wed, 22 Feb 2017 17:20:14 +0200
>> Subject: [PATCH] net/mlx5e: Fix broken CQE compression initialization
>> 
>>
>> and
>>
>> ====
>> From 6dc4b54e77282caf17f0ff72aa32dd296037fbc0 Mon Sep 17 00:00:00 2001
>> From: Saeed Mahameed <sae...@mellanox.com>
>> Date: Wed, 22 Feb 2017 17:20:15 +0200
>> Subject: [PATCH] net/mlx5e: Update MPWQE stride size when modifying CQE
>>  compress state
>> 
>>
>> do not apply even closely to v4.9 while I was working on -stable backports.
>>
>> Please provide proper backports of these two patches if you want them to
>> show up in v4.9 -stable.
>>
>
> Hi Dave,
>
> thank you for trying, we will provide the patches, but I don't know what is 
> the right procedure
> to do so.
>
> is it ok to post the patches applied on top of tag v4.9.16 of
> kernel/git/stable/linux-stable.git?
> to whom should I send them?
>

Actually, it looks like the Fixes tag is not accurate: the bug is exposed
only with the new user control of the CQE compression feature, which should
have taken care of those issues, 9bcc86064bb5 ("net/mlx5e: Add CQE
compression user control"), and that patch was only introduced in 4.10.

No need to apply those patches into 4.9-stable.

Sorry for the inconvenience.

thanks,
Saeed.


[PATCH net 3/8] net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured

2017-03-21 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

Changing the eswitch inline mode can potentially cause already configured
flows not to match the policy. E.g. set policy L4, add some L4 rules,
set policy to L2 --> bad! Hence we disallow it.

Keep track of how many offloaded rules are now set and refuse
inline mode changes if this isn't zero.

Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 9227a83a97e3..ad329b1680b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -209,6 +209,7 @@ struct mlx5_esw_offload {
struct mlx5_eswitch_rep *vport_reps;
DECLARE_HASHTABLE(encap_tbl, 8);
u8 inline_mode;
+   u64 num_flows;
 };
 
 struct mlx5_eswitch {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index bfabefe20ac0..307ec6c5fd3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -93,6 +93,8 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
   spec, &flow_act, dest, i);
if (IS_ERR(rule))
mlx5_fc_destroy(esw->dev, counter);
+   else
+   esw->offloads.num_flows++;
 
return rule;
 }
@@ -108,6 +110,7 @@ mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
counter = mlx5_flow_rule_counter(rule);
mlx5_del_flow_rules(rule);
mlx5_fc_destroy(esw->dev, counter);
+   esw->offloads.num_flows--;
}
 }
 
@@ -922,6 +925,11 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode)
MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
return -EOPNOTSUPP;
 
+   if (esw->offloads.num_flows > 0) {
+   esw_warn(dev, "Can't set inline mode when flows are configured\n");
+   return -EOPNOTSUPP;
+   }
+
err = esw_inline_mode_from_devlink(mode, &mlx5_mode);
if (err)
goto out;
-- 
2.11.0



[PATCH net 1/8] net/mlx5: Add missing entries for set/query rate limit commands

2017-03-21 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

The switch cases for the rate limit set and query commands were
missing, which could get us wrong under fw error or driver reset
flow, fix that.

Fixes: 1466cc5b23d1 ('net/mlx5: Rate limit tables support')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index caa837e5e2b9..a380353a78c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -361,6 +361,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
case MLX5_CMD_OP_QUERY_VPORT_COUNTER:
case MLX5_CMD_OP_ALLOC_Q_COUNTER:
case MLX5_CMD_OP_QUERY_Q_COUNTER:
+   case MLX5_CMD_OP_SET_RATE_LIMIT:
+   case MLX5_CMD_OP_QUERY_RATE_LIMIT:
case MLX5_CMD_OP_ALLOC_PD:
case MLX5_CMD_OP_ALLOC_UAR:
case MLX5_CMD_OP_CONFIG_INT_MODERATION:
@@ -497,6 +499,8 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(ALLOC_Q_COUNTER);
MLX5_COMMAND_STR_CASE(DEALLOC_Q_COUNTER);
MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER);
+   MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT);
+   MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT);
MLX5_COMMAND_STR_CASE(ALLOC_PD);
MLX5_COMMAND_STR_CASE(DEALLOC_PD);
MLX5_COMMAND_STR_CASE(ALLOC_UAR);
-- 
2.11.0



[PATCH net 2/8] net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch

2017-03-21 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch
offloading use case, and push the latter into the e-switch code. This provides
better separation and is to be used in down-stream patch for applying a fix.

Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 59 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  5 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 14 +
 3 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 79481f4cf264..2825b5665456 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -133,6 +133,23 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
return rule;
 }
 
+static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
+ struct mlx5e_tc_flow *flow)
+{
+   struct mlx5_fc *counter = NULL;
+
+   if (!IS_ERR(flow->rule)) {
+   counter = mlx5_flow_rule_counter(flow->rule);
+   mlx5_del_flow_rules(flow->rule);
+   mlx5_fc_destroy(priv->mdev, counter);
+   }
+
+   if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
+   mlx5_destroy_flow_table(priv->fs.tc.t);
+   priv->fs.tc.t = NULL;
+   }
+}
+
 static struct mlx5_flow_handle *
 mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
  struct mlx5_flow_spec *spec,
@@ -149,7 +166,24 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
-  struct mlx5e_tc_flow *flow) {
+  struct mlx5e_tc_flow *flow);
+
+static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
+ struct mlx5e_tc_flow *flow)
+{
+   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+   mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->attr);
+
+   mlx5_eswitch_del_vlan_action(esw, flow->attr);
+
+   if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
+   mlx5e_detach_encap(priv, flow);
+}
+
+static void mlx5e_detach_encap(struct mlx5e_priv *priv,
+  struct mlx5e_tc_flow *flow)
+{
struct list_head *next = flow->encap.next;
 
list_del(&flow->encap);
@@ -173,25 +207,10 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow *flow)
 {
-   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-   struct mlx5_fc *counter = NULL;
-
-   if (!IS_ERR(flow->rule)) {
-   counter = mlx5_flow_rule_counter(flow->rule);
-   mlx5_del_flow_rules(flow->rule);
-   mlx5_fc_destroy(priv->mdev, counter);
-   }
-
-   if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
-   mlx5_eswitch_del_vlan_action(esw, flow->attr);
-   if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
-   mlx5e_detach_encap(priv, flow);
-   }
-
-   if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
-   mlx5_destroy_flow_table(priv->fs.tc.t);
-   priv->fs.tc.t = NULL;
-   }
+   if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+   mlx5e_tc_del_fdb_flow(priv, flow);
+   else
+   mlx5e_tc_del_nic_flow(priv, flow);
 }
 
 static void parse_vxlan_attr(struct mlx5_flow_spec *spec,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 5b78883d5654..9227a83a97e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -271,6 +271,11 @@ struct mlx5_flow_handle *
 mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
struct mlx5_flow_spec *spec,
struct mlx5_esw_flow_attr *attr);
+void
+mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
+   struct mlx5_flow_handle *rule,
+   struct mlx5_esw_flow_attr *attr);
+
 struct mlx5_flow_handle *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 4f5b0d47d5f3..bfabefe20

Re: [PATCH net-next] net/mlx5e: fix build error without CONFIG_SYSFS

2017-04-05 Thread Saeed Mahameed
On Wed, Apr 5, 2017 at 5:11 AM, Tobias Regnery <tobias.regn...@gmail.com> wrote:
> Commit 9008ae074885 ("net/mlx5e: Minimize mlx5e_{open/close}_locked")
> copied the calls to netif_set_real_num_{tx,rx}_queues from
> mlx5e_open_locked to mlx5e_activate_priv_channels and wraps them in an
> if condition to test for netdev->real_num_{tx,rx}_queues.
>
> But netdev->real_num_rx_queues is conditionally compiled in if CONFIG_SYSFS
> is set. Without CONFIG_SYSFS the build fails:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 
> 'mlx5e_activate_priv_channels':
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2515:12: error: 'struct 
> net_device' has no member named 'real_num_rx_queues'; did you mean 
> 'real_num_tx_queues'?
>
> Fix this by unconditionally calling netif_set_real_num_{tx,rx}_queues like
> before commit 9008ae074885.
>
> Fixes: 9008ae074885 ("net/mlx5e: Minimize mlx5e_{open/close}_locked")
> Signed-off-by: Tobias Regnery <tobias.regn...@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index ec389b1b51cb..d5248637d44f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -2510,10 +2510,8 @@ static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
> struct net_device *netdev = priv->netdev;
>
> mlx5e_netdev_set_tcs(netdev);
> -   if (netdev->real_num_tx_queues != num_txqs)
> -   netif_set_real_num_tx_queues(netdev, num_txqs);
> -   if (netdev->real_num_rx_queues != priv->channels.num)
> -   netif_set_real_num_rx_queues(netdev, priv->channels.num);
> +   netif_set_real_num_tx_queues(netdev, num_txqs);
> +   netif_set_real_num_rx_queues(netdev, priv->channels.num);
>

Acked-by: Saeed Mahameed <sae...@mellanox.com>

Thanks Tobias for the fix. Although it is redundant for most of the mlx5
reconfiguration options to call set_real_num_{rx,tx}_queues every time, it
is not that big of a deal, so it is ok with me to align the code with the
previous behavior we had before ("net/mlx5e: Minimize
mlx5e_{open/close}_locked").


Re: [PATCH net-next] mlx4: trust shinfo->gso_segs

2017-04-05 Thread Saeed Mahameed
On Wed, Apr 5, 2017 at 11:49 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> mlx4 is the only driver in the tree making a point to recompute
> shinfo->gso_segs.
>
> Lets remove superfluous code.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Tariq Toukan <tar...@mellanox.com>
> Cc: Saeed Mahameed <sae...@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 
> e0c5ffb3e3a6607456e1f191b0b8c8becfc71219..3ba89bc43d74d8c023776079bcd0bbadd70fb5c6
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -978,8 +978,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>
> ring->tso_packets++;
>
> -   i = ((skb->len - lso_header_size) / shinfo->gso_size) +
> -   !!((skb->len - lso_header_size) % shinfo->gso_size);
> +   i = shinfo->gso_segs;
>     tx_info->nr_bytes = skb->len + (i - 1) * lso_header_size;
> ring->packets += i;
> } else {
>
>

Reviewed-by: Saeed Mahameed <sae...@mellanox.com>


[PATCH net-next 09/16] net/mlx5e: IPoIB, Basic netdev ndos open/close

2017-04-12 Thread Saeed Mahameed
Implement open/close of IPoIB netdevice ndos using mlx5e's
channels API to manage data path resources (RQs/SQs/CQs).

Set IPoIB netdev address on dev_init ndo.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c   | 90 ++-
 3 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5345d875b695..23b92ec54e12 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -883,6 +883,8 @@ typedef int (*mlx5e_fp_hw_modify)(struct mlx5e_priv *priv);
 void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
struct mlx5e_channels *new_chs,
mlx5e_fp_hw_modify hw_modify);
+void mlx5e_activate_priv_channels(struct mlx5e_priv *priv);
+void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv);
 
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   u32 *indirection_rqt, int len,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1fde4e2301a4..eb657987e9b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2547,7 +2547,7 @@ static void mlx5e_build_channels_tx_maps(struct 
mlx5e_priv *priv)
}
 }
 
-static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
+void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
 {
int num_txqs = priv->channels.num * priv->channels.params.num_tc;
struct net_device *netdev = priv->netdev;
@@ -2567,7 +2567,7 @@ static void mlx5e_activate_priv_channels(struct 
mlx5e_priv *priv)
mlx5e_redirect_rqts_to_channels(priv, &priv->channels);
 }
 
-static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv)
+void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv)
 {
mlx5e_redirect_rqts_to_drop(priv);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index d7d705c840ae..e188d067bc97 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -34,6 +34,18 @@
 #include "en.h"
 #include "ipoib.h"
 
+static int mlx5i_open(struct net_device *netdev);
+static int mlx5i_close(struct net_device *netdev);
+static int  mlx5i_dev_init(struct net_device *dev);
+static void mlx5i_dev_cleanup(struct net_device *dev);
+
+static const struct net_device_ops mlx5i_netdev_ops = {
+   .ndo_open    = mlx5i_open,
+   .ndo_stop    = mlx5i_close,
+   .ndo_init    = mlx5i_dev_init,
+   .ndo_uninit  = mlx5i_dev_cleanup,
+};
+
 /* IPoIB mlx5 netdev profile */
 
 /* Called directly after IPoIB netdevice was created to initialize SW structs 
*/
@@ -52,7 +64,17 @@ static void mlx5i_init(struct mlx5_core_dev *mdev,
mlx5e_build_nic_params(mdev, &priv->channels.params, 
profile->max_nch(mdev));
 
mutex_init(&priv->state_lock);
-   /* TODO : init netdev features here */
+
+   netdev->hw_features    |= NETIF_F_SG;
+   netdev->hw_features    |= NETIF_F_IP_CSUM;
+   netdev->hw_features    |= NETIF_F_IPV6_CSUM;
+   netdev->hw_features    |= NETIF_F_GRO;
+   netdev->hw_features    |= NETIF_F_TSO;
+   netdev->hw_features    |= NETIF_F_TSO6;
+   netdev->hw_features    |= NETIF_F_RXCSUM;
+   netdev->hw_features    |= NETIF_F_RXHASH;
+
+   netdev->netdev_ops = &mlx5i_netdev_ops;
 }
 
 /* Called directly before IPoIB netdevice is destroyed to cleanup SW structs */
@@ -181,6 +203,72 @@ static const struct mlx5e_profile mlx5i_nic_profile = {
.max_tc= MLX5I_MAX_NUM_TC,
 };
 
+/* mlx5i netdev NDos */
+
+static int mlx5i_dev_init(struct net_device *dev)
+{
+   struct mlx5e_priv *priv  = mlx5i_epriv(dev);
+   struct mlx5i_priv *ipriv = priv->ppriv;
+
+   /* Set dev address using underlay QP */
+   dev->dev_addr[1] = (ipriv->qp.qpn >> 16) & 0xff;
+   dev->dev_addr[2] = (ipriv->qp.qpn >>  8) & 0xff;
+   dev->dev_addr[3] = (ipriv->qp.qpn) & 0xff;
+
+   return 0;
+}
+
+static void mlx5i_dev_cleanup(struct net_device *dev)
+{
+   /* TODO: detach underlay qp from flow-steering by reset it */
+}
+
+static int mlx5i_open(struct net_device *netdev)
+{
+   struct mlx5e_priv *priv = mlx5i_epriv(netdev);
+   int err;
+
+   mutex_lock(&priv->state_lock);
+
+   set_bit(MLX5E_STATE_OPENED, &priv->state);
+
+   err = mlx5e_open_cha

[PATCH net-next 07/16] net/mlx5e: IPoIB, RSS flow steering tables

2017-04-12 Thread Saeed Mahameed
As in mlx5e ethernet mode, in IPoIB mode we need to create RX steering
tables. IPoIB does not require MAC and VLAN steering tables, so the only
tables we create here are:
1. TTC Table (Traffic Type Classifier table for RSS steering)
2. ARFS Table (for accelerated RFS support)

Creation of those tables is identical to mlx5e ethernet mode, hence the
use of mlx5e_create_ttc_table and mlx5e_arfs_create_tables.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h|  4 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c |  7 ++--
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 46 +
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e5518536d56f..c813eab5d764 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -999,6 +999,7 @@ int mlx5e_attr_get(struct net_device *dev, struct 
switchdev_attr *attr);
 void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 void mlx5e_update_hw_rep_counters(struct mlx5e_priv *priv);
 
+/* common netdev helpers */
 int mlx5e_create_indirect_rqt(struct mlx5e_priv *priv);
 
 int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv);
@@ -1010,6 +1011,9 @@ int mlx5e_create_direct_tirs(struct mlx5e_priv *priv);
 void mlx5e_destroy_direct_tirs(struct mlx5e_priv *priv);
 void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
 
+int mlx5e_create_ttc_table(struct mlx5e_priv *priv, u32 underlay_qpn);
+void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv);
+
 int mlx5e_create_tises(struct mlx5e_priv *priv);
 void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv);
 int mlx5e_close(struct net_device *netdev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 729904c43801..576d6787b484 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -792,7 +792,7 @@ static int mlx5e_create_ttc_table_groups(struct 
mlx5e_ttc_table *ttc)
return err;
 }
 
-static void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv)
+void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.ttc;
 
@@ -800,7 +800,7 @@ static void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv)
mlx5e_destroy_flow_table(&ttc->ft);
 }
 
-static int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
+int mlx5e_create_ttc_table(struct mlx5e_priv *priv, u32 underlay_qpn)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.ttc;
struct mlx5_flow_table_attr ft_attr = {};
@@ -810,6 +810,7 @@ static int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
ft_attr.max_fte = MLX5E_TTC_TABLE_SIZE;
ft_attr.level = MLX5E_TTC_FT_LEVEL;
ft_attr.prio = MLX5E_NIC_PRIO;
+   ft_attr.underlay_qpn = underlay_qpn;
 
ft->t = mlx5_create_flow_table(priv->fs.ns, &ft_attr);
if (IS_ERR(ft->t)) {
@@ -1146,7 +1147,7 @@ int mlx5e_create_flow_steering(struct mlx5e_priv *priv)
priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
}
 
-   err = mlx5e_create_ttc_table(priv);
+   err = mlx5e_create_ttc_table(priv, 0);
if (err) {
netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
   err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index f0318920844e..e16e1c7b246e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -72,6 +72,45 @@ static void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
 {
 }
 
+static int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
+{
+   struct mlx5i_priv *ipriv = priv->ppriv;
+   int err;
+
+   priv->fs.ns = mlx5_get_flow_namespace(priv->mdev,
+  MLX5_FLOW_NAMESPACE_KERNEL);
+
+   if (!priv->fs.ns)
+   return -EINVAL;
+
+   err = mlx5e_arfs_create_tables(priv);
+   if (err) {
+   netdev_err(priv->netdev, "Failed to create arfs tables, 
err=%d\n",
+  err);
+   priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
+   }
+
+   err = mlx5e_create_ttc_table(priv, ipriv->qp.qpn);
+   if (err) {
+   netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
+  err);
+   goto err_destroy_arfs_tables;
+   }
+
+   return 0;
+
+err_destroy_arfs_tables:
+   mlx5e_arfs_destroy_tables(priv);
+
+   return err;
+}
+
+static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv)
+{
+   mlx5e_destroy_tt

[PATCH net-next 04/16] net/mlx5e: More generic netdev management API

2017-04-12 Thread Saeed Mahameed
In preparation for mlx5e RDMA net_device support, here we generalize
mlx5e_attach/detach so that those functions are agnostic to link type.
For that we move ethernet-specific NIC net device logic out of those
functions into the {nic,rep}_{enable,disable} callbacks of the mlx5e NIC
and representor profiles.

Also, some of the logic is moved only to the NIC profile, since it is not
right to have it in the representor net device (e.g. setting the port MTU).

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 160 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  |  12 +-
 3 files changed, 96 insertions(+), 91 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b7feecfbb5a5..ced31906b8fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -999,12 +999,6 @@ void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv);
 int mlx5e_close(struct net_device *netdev);
 int mlx5e_open(struct net_device *netdev);
 void mlx5e_update_stats_work(struct work_struct *work);
-struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
-  const struct mlx5e_profile *profile,
-  void *ppriv);
-void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device 
*netdev);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 
 int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
@@ -1013,4 +1007,13 @@ bool mlx5e_has_offload_stats(const struct net_device 
*dev, int attr_id);
 
 bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv);
+
+/* mlx5e generic netdev management API */
+struct net_device*
+mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile 
*profile,
+   void *ppriv);
+int mlx5e_attach_netdev(struct mlx5e_priv *priv);
+void mlx5e_detach_netdev(struct mlx5e_priv *priv);
+void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
+
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8b7b7e604ea0..cdc34ba354c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4121,12 +4121,57 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
return 0;
 }
 
+static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
+{
+   struct mlx5_eswitch *esw = mdev->priv.eswitch;
+   int total_vfs = MLX5_TOTAL_VPORTS(mdev);
+   int vport;
+   u8 mac[ETH_ALEN];
+
+   if (!MLX5_CAP_GEN(mdev, vport_group_manager))
+   return;
+
+   mlx5_query_nic_vport_mac_address(mdev, 0, mac);
+
+   for (vport = 1; vport < total_vfs; vport++) {
+   struct mlx5_eswitch_rep rep;
+
+   rep.load = mlx5e_vport_rep_load;
+   rep.unload = mlx5e_vport_rep_unload;
+   rep.vport = vport;
+   ether_addr_copy(rep.hw_id, mac);
+   mlx5_eswitch_register_vport_rep(esw, vport, &rep);
+   }
+}
+
+static void mlx5e_unregister_vport_rep(struct mlx5_core_dev *mdev)
+{
+   struct mlx5_eswitch *esw = mdev->priv.eswitch;
+   int total_vfs = MLX5_TOTAL_VPORTS(mdev);
+   int vport;
+
+   if (!MLX5_CAP_GEN(mdev, vport_group_manager))
+   return;
+
+   for (vport = 1; vport < total_vfs; vport++)
+   mlx5_eswitch_unregister_vport_rep(esw, vport);
+}
+
 static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 {
struct net_device *netdev = priv->netdev;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_eswitch *esw = mdev->priv.eswitch;
struct mlx5_eswitch_rep rep;
+   u16 max_mtu;
+
+   mlx5e_init_l2_addr(priv);
+
+   /* MTU range: 68 - hw-specific max */
+   netdev->min_mtu = ETH_MIN_MTU;
+   mlx5_query_port_max_mtu(priv->mdev, &max_mtu, 1);
+   netdev->max_mtu = MLX5E_HW2SW_MTU(max_mtu);
+   mlx5e_set_dev_port_mtu(priv);
 
mlx5_lag_add(mdev, netdev);
 
@@ -4141,6 +4186,8 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
mlx5_eswitch_register_vport_rep(esw, 0, &rep);
}
 
+   mlx5e_register_vport_rep(mdev);
+
if (netdev->reg_state != NETREG_REGISTERED)
return;
 
@@ -4152,6 +4199,12 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
}
 
queue_work(priv->wq, &priv->set_rx_mode_work);
+
+ 

[PATCH net-next 02/16] net/mlx5: Refactor create flow table method to accept underlay QP

2017-04-12 Thread Saeed Mahameed
From: Erez Shitrit <ere...@mellanox.com>

IB flow tables need the underlay qp to perform flow steering.
Here we change the API of the flow tables creation to accept the
underlay QP number as a parameter in order to support IB (IPoIB) flow
steering.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 10 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 25 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  5 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 16 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  8 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 84 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  1 +
 include/linux/mlx5/fs.h| 14 ++--
 8 files changed, 113 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index c4e9cc79f5c7..c8a005326e30 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -321,10 +321,16 @@ static int arfs_create_table(struct mlx5e_priv *priv,
 {
struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
struct mlx5e_flow_table *ft = &arfs->arfs_tables[type].ft;
+   struct mlx5_flow_table_attr ft_attr = {};
int err;
 
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_ARFS_TABLE_SIZE, 
MLX5E_ARFS_FT_LEVEL, 0);
+   ft->num_groups = 0;
+
+   ft_attr.max_fte = MLX5E_ARFS_TABLE_SIZE;
+   ft_attr.level = MLX5E_ARFS_FT_LEVEL;
+   ft_attr.prio = MLX5E_NIC_PRIO;
+
+   ft->t = mlx5_create_flow_table(priv->fs.ns, &ft_attr);
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
ft->t = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 5376d69a6b1a..729904c43801 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -803,11 +803,15 @@ static void mlx5e_destroy_ttc_table(struct mlx5e_priv 
*priv)
 static int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.ttc;
+   struct mlx5_flow_table_attr ft_attr = {};
struct mlx5e_flow_table *ft = &ttc->ft;
int err;
 
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_TTC_TABLE_SIZE, 
MLX5E_TTC_FT_LEVEL, 0);
+   ft_attr.max_fte = MLX5E_TTC_TABLE_SIZE;
+   ft_attr.level = MLX5E_TTC_FT_LEVEL;
+   ft_attr.prio = MLX5E_NIC_PRIO;
+
+   ft->t = mlx5_create_flow_table(priv->fs.ns, &ft_attr);
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
ft->t = NULL;
@@ -973,12 +977,16 @@ static int mlx5e_create_l2_table(struct mlx5e_priv *priv)
 {
struct mlx5e_l2_table *l2_table = &priv->fs.l2;
struct mlx5e_flow_table *ft = &l2_table->ft;
+   struct mlx5_flow_table_attr ft_attr = {};
int err;
 
ft->num_groups = 0;
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_L2_TABLE_SIZE, MLX5E_L2_FT_LEVEL, 
0);
 
+   ft_attr.max_fte = MLX5E_L2_TABLE_SIZE;
+   ft_attr.level = MLX5E_L2_FT_LEVEL;
+   ft_attr.prio = MLX5E_NIC_PRIO;
+
+   ft->t = mlx5_create_flow_table(priv->fs.ns, &ft_attr);
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
ft->t = NULL;
@@ -1076,11 +1084,16 @@ static int mlx5e_create_vlan_table_groups(struct 
mlx5e_flow_table *ft)
 static int mlx5e_create_vlan_table(struct mlx5e_priv *priv)
 {
struct mlx5e_flow_table *ft = &priv->fs.vlan.ft;
+   struct mlx5_flow_table_attr ft_attr = {};
int err;
 
ft->num_groups = 0;
-   ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-  MLX5E_VLAN_TABLE_SIZE, 
MLX5E_VLAN_FT_LEVEL, 0);
+
+   ft_attr.max_fte = MLX5E_VLAN_TABLE_SIZE;
+   ft_attr.level = MLX5E_VLAN_FT_LEVEL;
+   ft_attr.prio = MLX5E_NIC_PRIO;
+
+   ft->t = mlx5_create_flow_table(priv->fs.ns, &ft_attr);
 
if (IS_ERR(ft->t)) {
err = PTR_ERR(ft->t);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index fcd5bc7e31db..b3281d1118b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -337,6 +337,7 @@ esw_fdb_set_vport_promisc_rule(struct mlx5_eswitch *esw, 
u32 vport)
 static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
 {
int inlen = MLX5_ST_SZ_BYTES(crea

[PATCH net-next 00/16] Mellanox, mlx5 RDMA net device support

2017-04-12 Thread Saeed Mahameed
Hi Dave and Doug.

This series provides the lower level mlx5 support of RDMA netdevice
creation API [1] suggested and introduced by Intel's HFI OPA VNIC
netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation.

mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current
IPoIB ULP generic netdevice, providing:
- mlx5 RSS support.
- mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..).
- Full mlx5 HW features transparent to the ULP itself.

The idea here is to reuse and benefit from the already implemented mlx5e
netdevice management and channels API for both ethernet and RDMA netdevices.
Since IPoIB and Ethernet netdevices share the same common mlx5 HW resources
(with some small exceptions) and most of the control/data path logic, it is
more natural to have them share the same code.

The differences between IPoIB and Ethernet netdevices can be summarized to:

Steering:
In mlx5, IPoIB traffic is sent and received from a special underlay QP, while
in Ethernet the traffic is handled by vports, and vport steering is managed by
the e-switch or FW.

For IPoIB traffic to get steered correctly, the only thing we need to do is
create RSS HW contexts for RX and TX HW contexts for TX (similar to mlx5e),
with the underlay QP attached to them (the underlay QP is 0 in the Ethernet
case).

RX,TX:
Since IPoIB traffic is different, slightly modified RX and TX handlers are
required; still, we reuse some data path code via common helper functions.

All of the other generic netdevice and mlx5 aspects will be shared between mlx5 
Ethernet
and IPoIB netdevices, e.g.
- Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt 
moderation, etc..)
- Offloads, checksum, GRO, LRO, TSO, and more.
- netdevice logic and non Ethernet specific ndos (open/close, etc..)

In order to achieve what we want:

In patches 1 to 3, Erez added support for the underlay QP in mlx5_ifc,
refactored the mlx5 steering code to accept the underlay QP as a parameter
when creating steering objects, and enabled flow steering for the IB link.

Then we use the mlx5e netdevice profile, which is already used to separate
NIC and VF representor netdevices, to create a new type of IPoIB netdevice
profile.

For that, one small refactoring is required to make mlx5e netdevice profile
management more generic and agnostic to link type, which is done in patch #4.

In patch #5, we introduce ipoib.c to host all mlx5 IPoIB (mlx5i) specific
logic and a skeleton for the IPoIB mlx5 netdevice profile, which we start
filling in the next patches using mlx5e's already existing APIs.

Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to 
create mlx5 RSS
resources, same as mlx5e but without vlan and L2 steering tables.

Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX 
resources
same as mlx5e but with one TC (tc = 0) support.

Patch #9, Implement mlx5i open/close ndos, where we reuse the mlx5e channels
API to start/stop TX/RX channels.

Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts.

Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function 
and implement the
mlx5i IPoIB xmit routine.

Patch #13 and #14, Have an RX handler per netdevice profile. Before this
series we already did this, in a non-clean way, to separate the NIC netdev
and VF representor RX handlers; in patch 13 we make the RX handler generic
and bound to a profile, and in patch 14 we implement the IPoIB RX handlers.

Patch #15, Small cleanup to avoid e-switch with IPoIB netdev. 


In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offload
support [3] - which was already submitted to the rdma mailing list - and this
series is required, plus an extra small patch [4] which connects both sides
and actually enables the offload.

Once both patch sets are merged into Linux, we will submit the extra small
patch [4] to enable the feature.

Thanks,
Saeed.

[1] https://patchwork.kernel.org/patch/9676637/

[2] https://lwn.net/Articles/715453/
https://patchwork.kernel.org/patch/9587815/

[3] https://patchwork.kernel.org/patch/9672069/
[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8


Erez Shitrit (4):
  net/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc
  net/mlx5: Refactor create flow table method to accept underlay QP
  net/mlx5: Enable flow-steering for IB link
  hw/mlx5: Add New bit to check over QP creation

Saeed Mahameed (12):
  net/mlx5e: More generic netdev management API
  net/mlx5e: IPoIB, Add netdevice profile skeleton
  net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs
  net/mlx5e: IPoIB, RSS flow steering tables
  net/mlx5e: IPoIB, TX TIS creation
  net/mlx5e: IPoIB, Basic netdev ndos open/close
  net/mlx5e: IPoIB, Underlay QP
  net/mlx5e: Xmit flow break down
  net

[PATCH net-next 03/16] net/mlx5: Enable flow-steering for IB link

2017-04-12 Thread Saeed Mahameed
From: Erez Shitrit <ere...@mellanox.com>

Get the relevant capabilities if the device supports ipoib_enhanced_offloads
and init the flow steering table accordingly.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 11 ---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |  3 ++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 55182d0b06e8..b8a176503d38 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1905,9 +1905,6 @@ void mlx5_cleanup_fs(struct mlx5_core_dev *dev)
 {
struct mlx5_flow_steering *steering = dev->priv.steering;
 
-   if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-   return;
-
cleanup_root_ns(steering->root_ns);
cleanup_root_ns(steering->esw_egress_root_ns);
cleanup_root_ns(steering->esw_ingress_root_ns);
@@ -2010,9 +2007,6 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
struct mlx5_flow_steering *steering;
int err = 0;
 
-   if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-   return 0;
-
err = mlx5_init_fc_stats(dev);
if (err)
return err;
@@ -2023,7 +2017,10 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
steering->dev = dev;
dev->priv.steering = steering;
 
-   if (MLX5_CAP_GEN(dev, nic_flow_table) &&
+   if ((((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
+ (MLX5_CAP_GEN(dev, nic_flow_table))) ||
+((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+ MLX5_CAP_GEN(dev, ipoib_enhanced_offloads))) &&
MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
err = init_root_ns(steering);
if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index d0bbefa08af7..1bc14d0fded8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -137,7 +137,8 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
-   if (MLX5_CAP_GEN(dev, nic_flow_table)) {
+   if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+   MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
if (err)
return err;
-- 
2.11.0



[PATCH net-next 08/16] net/mlx5e: IPoIB, TX TIS creation

2017-04-12 Thread Saeed Mahameed
Modify mlx5e tis creation function to accept underlay qp number, which
will be needed by IPoIB.

Implement mlx5i (IPoIB) tx init/cleanup netdevice profile flows to
create one TIS with the IPoIB underlay qp, for IPoIB TX SQs.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  4 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 ++
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c   | 14 --
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index c813eab5d764..5345d875b695 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1014,6 +1014,10 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct 
mlx5e_rqt *rqt);
 int mlx5e_create_ttc_table(struct mlx5e_priv *priv, u32 underlay_qpn);
 void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv);
 
+int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc,
+u32 underlay_qpn, u32 *tisn);
+void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn);
+
 int mlx5e_create_tises(struct mlx5e_priv *priv);
 void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv);
 int mlx5e_close(struct net_device *netdev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 08b67aa24644..1fde4e2301a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2759,24 +2759,25 @@ static void mlx5e_close_drop_rq(struct mlx5e_rq 
*drop_rq)
mlx5e_free_cq(&drop_rq->cq);
 }
 
-static int mlx5e_create_tis(struct mlx5e_priv *priv, int tc)
+int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc,
+u32 underlay_qpn, u32 *tisn)
 {
-   struct mlx5_core_dev *mdev = priv->mdev;
u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
void *tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
 
MLX5_SET(tisc, tisc, prio, tc << 1);
+   MLX5_SET(tisc, tisc, underlay_qpn, underlay_qpn);
MLX5_SET(tisc, tisc, transport_domain, mdev->mlx5e_res.td.tdn);
 
if (mlx5_lag_is_lacp_owner(mdev))
MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1);
 
-   return mlx5_core_create_tis(mdev, in, sizeof(in), &priv->tisn[tc]);
+   return mlx5_core_create_tis(mdev, in, sizeof(in), tisn);
 }
 
-static void mlx5e_destroy_tis(struct mlx5e_priv *priv, int tc)
+void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn)
 {
-   mlx5_core_destroy_tis(priv->mdev, priv->tisn[tc]);
+   mlx5_core_destroy_tis(mdev, tisn);
 }
 
 int mlx5e_create_tises(struct mlx5e_priv *priv)
@@ -2785,7 +2786,7 @@ int mlx5e_create_tises(struct mlx5e_priv *priv)
int tc;
 
for (tc = 0; tc < priv->profile->max_tc; tc++) {
-   err = mlx5e_create_tis(priv, tc);
+   err = mlx5e_create_tis(priv->mdev, tc, 0, &priv->tisn[tc]);
if (err)
goto err_close_tises;
}
@@ -2794,7 +2795,7 @@ int mlx5e_create_tises(struct mlx5e_priv *priv)
 
 err_close_tises:
for (tc--; tc >= 0; tc--)
-   mlx5e_destroy_tis(priv, tc);
+   mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 
return err;
 }
@@ -2804,7 +2805,7 @@ void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
int tc;
 
for (tc = 0; tc < priv->profile->max_tc; tc++)
-   mlx5e_destroy_tis(priv, tc);
+   mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
 
 static void mlx5e_build_indir_tir_ctx(struct mlx5e_priv *priv,
@@ -3841,6 +3842,7 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
mlx5e_set_rq_params(mdev, params);
 
/* HW LRO */
+   /* TODO: && MLX5_CAP_ETH(mdev, lro_cap) */
if (params->rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
params->lro_en = true;
params->lro_timeout = mlx5e_choose_lro_timeout(mdev, 
MLX5E_DEFAULT_LRO_TIMEOUT);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index e16e1c7b246e..d7d705c840ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -63,13 +63,23 @@ static void mlx5i_cleanup(struct mlx5e_priv *priv)
 
 static int mlx5i_init_tx(struct mlx5e_priv *priv)
 {
+   struct mlx5i_priv *ipriv = priv->ppriv;
+   int err;
+
/* TODO: Create IPoIB underlay QP */
-   /* TODO: create IPoIB TX HW TIS */
+
+   err = mlx5e_create_tis(priv->mdev, 0 /* tc */, ipriv->qp.qpn, 
&priv->tisn[0]);
+   if (err) {
+   mlx5_core_warn(priv->mdev, "create tis failed, %d\n", err);
+ 

[PATCH net-next 13/16] net/mlx5e: RX handlers per netdev profile

2017-04-12 Thread Saeed Mahameed
In order to have a different RX handler per profile, fix and refactor the
current code to take the rx handler directly from the netdevice profile
rather than computing it at runtime, as was done with the switchdev mode
representor rx handler.

This will also remove the current wrong assumption in mlx5e_alloc_rq
code that mlx5e_priv->ppriv is of the type vport_rep.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  5 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 28 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  |  4 +++-
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 25185f8c3562..0881325fba04 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -779,6 +779,10 @@ struct mlx5e_profile {
void(*disable)(struct mlx5e_priv *priv);
void(*update_stats)(struct mlx5e_priv *priv);
int (*max_nch)(struct mlx5_core_dev *mdev);
+   struct {
+   mlx5e_fp_handle_rx_cqe handle_rx_cqe;
+   mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe;
+   } rx_handlers;
int max_tc;
 };
 
@@ -1032,7 +1036,6 @@ int mlx5e_get_offload_stats(int attr_id, const struct 
net_device *dev,
 bool mlx5e_has_offload_stats(const struct net_device *dev, int attr_id);
 
 bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
-bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv);
 
 /* mlx5e generic netdev management API */
 struct net_device*
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2201b7ea05f4..6a164aff404c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -585,15 +585,17 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 
switch (rq->wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
-   if (mlx5e_is_vf_vport_rep(c->priv)) {
-   err = -EINVAL;
-   goto err_rq_wq_destroy;
-   }
 
-   rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
rq->dealloc_wqe = mlx5e_dealloc_rx_mpwqe;
 
+   rq->handle_rx_cqe = c->priv->profile->rx_handlers.handle_rx_cqe_mpwqe;
+   if (!rq->handle_rx_cqe) {
+   err = -EINVAL;
+   netdev_err(c->netdev, "RX handler of MPWQE RQ is not set, err %d\n", err);
+   goto err_rq_wq_destroy;
+   }
+
rq->mpwqe_stride_sz = BIT(params->mpwqe_log_stride_sz);
rq->mpwqe_num_strides = BIT(params->mpwqe_log_num_strides);
 
@@ -616,15 +618,17 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
err = -ENOMEM;
goto err_rq_wq_destroy;
}
-
-   if (mlx5e_is_vf_vport_rep(c->priv))
-   rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
-   else
-   rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
-
rq->alloc_wqe = mlx5e_alloc_rx_wqe;
rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
 
+   rq->handle_rx_cqe = c->priv->profile->rx_handlers.handle_rx_cqe;
+   if (!rq->handle_rx_cqe) {
+   kfree(rq->dma_info);
+   err = -EINVAL;
+   netdev_err(c->netdev, "RX handler of RQ is not set, err %d\n", err);
+   goto err_rq_wq_destroy;
+   }
+
rq->buff.wqe_sz = params->lro_en  ?
params->lro_wqe_sz :
MLX5E_SW2HW_MTU(c->netdev->mtu);
@@ -4229,6 +4233,8 @@ static const struct mlx5e_profile mlx5e_nic_profile = {
.disable   = mlx5e_nic_disable,
.update_stats  = mlx5e_update_stats,
.max_nch   = mlx5e_get_max_num_channels,
+   .rx_handlers.handle_rx_cqe   = mlx5e_handle_rx_cqe,
+   .rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq,
.max_tc= MLX5E_MAX_NUM_TC,
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index da85b0ad3e92..16b683e8226d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -329,7 +329,7 @@ bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv)
return false;
 }
 
-bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
+static bool mlx5e_is_vf_vpor

[PATCH net-next 12/16] net/mlx5e: IPoIB, Xmit flow

2017-04-12 Thread Saeed Mahameed
Implement mlx5e's IPoIB SKB transmit using the helper functions
provided by the mlx5e ethernet TX flow. The only difference in the code
between mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to
fill (the UD datagram segment) in the TX descriptor (WQE), and it does
not need any vlan handling.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h| 10 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 87 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 10 +++
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.h |  3 +
 include/linux/mlx5/qp.h | 10 +++
 5 files changed, 110 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 3cd064b5f0bf..ce8ba617d46e 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -729,16 +729,6 @@ static inline struct mlx5_ib_mw *to_mmw(struct ib_mw *ibmw)
return container_of(ibmw, struct mlx5_ib_mw, ibmw);
 }
 
-struct mlx5_ib_ah {
-   struct ib_ahibah;
-   struct mlx5_av  av;
-};
-
-static inline struct mlx5_ib_ah *to_mah(struct ib_ah *ibah)
-{
-   return container_of(ibah, struct mlx5_ib_ah, ibah);
-}
-
 int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt,
struct mlx5_db *db);
 void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db 
*db);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index ba664a1126cf..dda7db503043 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -503,3 +503,90 @@ void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq)
sq->cc += wi->num_wqebbs;
}
 }
+
+#ifdef CONFIG_MLX5_CORE_IPOIB
+
+struct mlx5_wqe_eth_pad {
+   u8 rsvd0[16];
+};
+
+struct mlx5i_tx_wqe {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   struct mlx5_wqe_datagram_seg datagram;
+   struct mlx5_wqe_eth_pad  pad;
+   struct mlx5_wqe_eth_seg  eth;
+};
+
+static inline void
+mlx5i_txwqe_build_datagram(struct mlx5_av *av, u32 dqpn, u32 dqkey,
+  struct mlx5_wqe_datagram_seg *dseg)
+{
+   memcpy(&dseg->av, av, sizeof(struct mlx5_av));
+   dseg->av.dqp_dct = cpu_to_be32(dqpn | MLX5_EXTENDED_UD_AV);
+   dseg->av.key.qkey.qkey = cpu_to_be32(dqkey);
+}
+
+netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+ struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+   struct mlx5_wq_cyc   *wq   = &sq->wq;
+   u16   pi   = sq->pc & wq->sz_m1;
+   struct mlx5i_tx_wqe  *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+   struct mlx5e_tx_wqe_info *wi   = &sq->db.wqe_info[pi];
+
+   struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
+   struct mlx5_wqe_datagram_seg *datagram = &wqe->datagram;
+   struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
+
+   unsigned char *skb_data = skb->data;
+   unsigned int skb_len = skb->len;
+   u8  opcode = MLX5_OPCODE_SEND;
+   unsigned int num_bytes;
+   int num_dma;
+   u16 headlen;
+   u16 ds_cnt;
+   u16 ihs;
+
+   memset(wqe, 0, sizeof(*wqe));
+
+   mlx5i_txwqe_build_datagram(av, dqpn, dqkey, datagram);
+
+   mlx5e_txwqe_build_eseg_csum(sq, skb, eseg);
+
+   if (skb_is_gso(skb)) {
+   opcode = MLX5_OPCODE_LSO;
+   ihs = mlx5e_txwqe_build_eseg_gso(sq, skb, eseg, &num_bytes);
+   } else {
+   ihs = mlx5e_calc_min_inline(sq->min_inline_mode, skb);
+   num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
+   }
+
+   ds_cnt = sizeof(*wqe) / MLX5_SEND_WQE_DS;
+   if (ihs) {
+   memcpy(eseg->inline_hdr.start, skb_data, ihs);
+   mlx5e_tx_skb_pull_inline(&skb_data, &skb_len, ihs);
+   eseg->inline_hdr.sz = cpu_to_be16(ihs);
+   ds_cnt += DIV_ROUND_UP(ihs - sizeof(eseg->inline_hdr.start), MLX5_SEND_WQE_DS);
+   }
+
+   headlen = skb_len - skb->data_len;
+   num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb_data, headlen,
+ (struct mlx5_wqe_data_seg *)cseg + ds_cnt);
+   if (unlikely(num_dma < 0))
+   goto dma_unmap_wqe_err;
+
+   mlx5e_txwqe_complete(sq, skb, opcode, ds_cnt + num_dma,
+num_bytes, num_dma, wi, cseg);
+
+   return NETDEV_TX_OK;
+
+dma_unmap_wqe_err:
+   sq->stats.dropped++;
+   mlx5e_dma_unmap_wqe_err(sq, wi->num_dma);
+
+   dev_kfree_skb_any(skb);
+
+   return NETDEV_TX_OK;
+}
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/cor

[PATCH net-next 15/16] net/mlx5e: E-switch vport manager is valid for ethernet only

2017-04-12 Thread Saeed Mahameed
Currently the driver supports only an Ethernet eswitch, and we want to
protect the downstream IPoIB netdev from trying to access it when the
link is IB.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6a164aff404c..061b20c73071 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2548,6 +2548,12 @@ static void mlx5e_build_channels_tx_maps(struct 
mlx5e_priv *priv)
}
 }
 
+static bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev)
+{
+   return (MLX5_CAP_GEN(mdev, vport_group_manager) &&
+   MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH);
+}
+
 void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
 {
int num_txqs = priv->channels.num * priv->channels.params.num_tc;
@@ -2561,7 +2567,7 @@ void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
mlx5e_activate_channels(&priv->channels);
netif_tx_start_all_queues(priv->netdev);
 
-   if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   if (mlx5e_is_eswitch_vport_mngr(priv->mdev))
mlx5e_add_sqs_fwd_rules(priv);
 
mlx5e_wait_channels_min_rx_wqes(&priv->channels);
@@ -2572,7 +2578,7 @@ void mlx5e_deactivate_priv_channels(struct mlx5e_priv 
*priv)
 {
mlx5e_redirect_rqts_to_drop(priv);
 
-   if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   if (mlx5e_is_eswitch_vport_mngr(priv->mdev))
mlx5e_remove_sqs_fwd_rules(priv);
 
/* FIXME: This is a W/A only for tx timeout watch dog false alarm when
-- 
2.11.0



[PATCH net-next 14/16] net/mlx5e: IPoIB, RX handler

2017-04-12 Thread Saeed Mahameed
Implement IPoIB RX SKB handler.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 78 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.h |  1 +
 3 files changed, 81 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1a9532b31635..43308243f519 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1031,3 +1031,81 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
mlx5e_page_release(rq, di, false);
}
 }
+
+#ifdef CONFIG_MLX5_CORE_IPOIB
+
+#define MLX5_IB_GRH_DGID_OFFSET 24
+#define MLX5_IB_GRH_BYTES   40
+#define MLX5_IPOIB_ENCAP_LEN4
+#define MLX5_GID_SIZE   16
+
+static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq,
+struct mlx5_cqe64 *cqe,
+u32 cqe_bcnt,
+struct sk_buff *skb)
+{
+   struct net_device *netdev = rq->netdev;
+   u8 *dgid;
+   u8 g;
+
+   g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
+   dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET;
+   if ((!g) || dgid[0] != 0xff)
+   skb->pkt_type = PACKET_HOST;
+   else if (memcmp(dgid, netdev->broadcast + 4, MLX5_GID_SIZE) == 0)
+   skb->pkt_type = PACKET_BROADCAST;
+   else
+   skb->pkt_type = PACKET_MULTICAST;
+
+   /* TODO: IB/ipoib: Allow mcast packets from other VFs
+* 68996a6e760e5c74654723eeb57bf65628ae87f4
+*/
+
+   skb_pull(skb, MLX5_IB_GRH_BYTES);
+
+   skb->protocol = *((__be16 *)(skb->data));
+
+   skb->ip_summed = CHECKSUM_COMPLETE;
+   skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
+
+   skb_record_rx_queue(skb, rq->ix);
+
+   if (likely(netdev->features & NETIF_F_RXHASH))
+   mlx5e_skb_set_hash(cqe, skb);
+
+   skb_reset_mac_header(skb);
+   skb_pull(skb, MLX5_IPOIB_ENCAP_LEN);
+
+   skb->dev = netdev;
+
+   rq->stats.csum_complete++;
+   rq->stats.packets++;
+   rq->stats.bytes += cqe_bcnt;
+}
+
+void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
+{
+   struct mlx5e_rx_wqe *wqe;
+   __be16 wqe_counter_be;
+   struct sk_buff *skb;
+   u16 wqe_counter;
+   u32 cqe_bcnt;
+
+   wqe_counter_be = cqe->wqe_counter;
+   wqe_counter= be16_to_cpu(wqe_counter_be);
+   wqe= mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
+   cqe_bcnt   = be32_to_cpu(cqe->byte_cnt);
+
+   skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt);
+   if (!skb)
+   goto wq_ll_pop;
+
+   mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+   napi_gro_receive(rq->cq.napi, skb);
+
+wq_ll_pop:
+   mlx5_wq_ll_pop(&rq->wq, wqe_counter_be,
+  &wqe->next.next_wqe_index);
+}
+
+#endif /* CONFIG_MLX5_CORE_IPOIB */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index c468aaedf0a6..001d2953cb6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -282,6 +282,8 @@ static const struct mlx5e_profile mlx5i_nic_profile = {
.disable   = NULL, /* mlx5i_disable */
.update_stats  = NULL, /* mlx5i_update_stats */
.max_nch   = mlx5e_get_max_num_channels,
+   .rx_handlers.handle_rx_cqe   = mlx5i_handle_rx_cqe,
+   .rx_handlers.handle_rx_cqe_mpwqe = NULL, /* Not supported */
.max_tc= MLX5I_MAX_NUM_TC,
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h
index 89bca182464c..bae0a5cbc8ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h
@@ -49,5 +49,6 @@ struct mlx5i_priv {
 
 netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
  struct mlx5_av *av, u32 dqpn, u32 dqkey);
+void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 
 #endif /* __MLX5E_IPOB_H__ */
-- 
2.11.0



[PATCH net-next 06/16] net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs

2017-04-12 Thread Saeed Mahameed
Implement IPoIB RX RSS (RQTs and TIRs) HW object creation. All we do
here is reuse the mlx5e implementation to create direct and indirect
(RSS) steering HW objects.

For that we just expose the
mlx5e_{create,destroy}_{direct,indirect}_{rqt,tir} functions in en.h
and call them from ipoib.c in the init/cleanup_rx IPoIB netdevice
profile callbacks.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 12 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 56 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 17 ++-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c   | 42 +++--
 4 files changed, 83 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 02aa3cc59dc3..e5518536d56f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -999,10 +999,17 @@ int mlx5e_attr_get(struct net_device *dev, struct 
switchdev_attr *attr);
 void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 void mlx5e_update_hw_rep_counters(struct mlx5e_priv *priv);
 
+int mlx5e_create_indirect_rqt(struct mlx5e_priv *priv);
+
+int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv);
+void mlx5e_destroy_indirect_tirs(struct mlx5e_priv *priv);
+
 int mlx5e_create_direct_rqts(struct mlx5e_priv *priv);
-void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
+void mlx5e_destroy_direct_rqts(struct mlx5e_priv *priv);
 int mlx5e_create_direct_tirs(struct mlx5e_priv *priv);
 void mlx5e_destroy_direct_tirs(struct mlx5e_priv *priv);
+void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
+
 int mlx5e_create_tises(struct mlx5e_priv *priv);
 void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv);
 int mlx5e_close(struct net_device *netdev);
@@ -1024,5 +1031,8 @@ mlx5e_create_netdev(struct mlx5_core_dev *mdev, const 
struct mlx5e_profile *prof
 int mlx5e_attach_netdev(struct mlx5e_priv *priv);
 void mlx5e_detach_netdev(struct mlx5e_priv *priv);
 void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
+void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
+   struct mlx5e_params *params,
+   u16 max_channels);
 
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 14c7452a6348..08b67aa24644 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2115,11 +2115,15 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct 
mlx5e_rqt *rqt)
mlx5_core_destroy_rqt(priv->mdev, rqt->rqtn);
 }
 
-static int mlx5e_create_indirect_rqts(struct mlx5e_priv *priv)
+int mlx5e_create_indirect_rqt(struct mlx5e_priv *priv)
 {
struct mlx5e_rqt *rqt = &priv->indir_rqt;
+   int err;
 
-   return mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, rqt);
+   err = mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, rqt);
+   if (err)
+   mlx5_core_warn(priv->mdev, "create indirect rqts failed, %d\n", err);
+   return err;
 }
 
 int mlx5e_create_direct_rqts(struct mlx5e_priv *priv)
@@ -2138,12 +2142,21 @@ int mlx5e_create_direct_rqts(struct mlx5e_priv *priv)
return 0;
 
 err_destroy_rqts:
+   mlx5_core_warn(priv->mdev, "create direct rqts failed, %d\n", err);
for (ix--; ix >= 0; ix--)
mlx5e_destroy_rqt(priv, &priv->direct_tir[ix].rqt);
 
return err;
 }
 
+void mlx5e_destroy_direct_rqts(struct mlx5e_priv *priv)
+{
+   int i;
+
+   for (i = 0; i < priv->profile->max_nch(priv->mdev); i++)
+   mlx5e_destroy_rqt(priv, &priv->direct_tir[i].rqt);
+}
+
 static int mlx5e_rx_hash_fn(int hfunc)
 {
return (hfunc == ETH_RSS_HASH_TOP) ?
@@ -2818,7 +2831,7 @@ static void mlx5e_build_direct_tir_ctx(struct mlx5e_priv 
*priv, u32 rqtn, u32 *t
MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_INVERTED_XOR8);
 }
 
-static int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv)
+int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv)
 {
struct mlx5e_tir *tir;
void *tirc;
@@ -2847,6 +2860,7 @@ static int mlx5e_create_indirect_tirs(struct mlx5e_priv 
*priv)
return 0;
 
 err_destroy_tirs:
+   mlx5_core_warn(priv->mdev, "create indirect tirs failed, %d\n", err);
for (tt--; tt >= 0; tt--)
mlx5e_destroy_tir(priv->mdev, &priv->indir_tir[tt]);
 
@@ -2885,6 +2899,7 @@ int mlx5e_create_direct_tirs(struct mlx5e_priv *priv)
return 0;
 
 err_destroy_ch_tirs:
+   mlx5_core_warn(priv->mdev, "create direct tirs failed, %d\n", err);
for (ix--; ix >= 0; ix--)
mlx5e_destroy_tir(

[PATCH net-next 01/16] net/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc

2017-04-12 Thread Saeed Mahameed
From: Erez Shitrit <ere...@mellanox.com>

New capability bit: ipoib_enhanced_offloads, which indicates a new
ability for a UD QP to do RSS and enhanced IPoIB offloads and
acceleration.

Add underlay_qpn to the TIS and flow_table objects in order to support
the SET_ROOT command, to connect IPoIB QPs and flow steering tables.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 1993adbd2c82..7c50bd39b297 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -872,7 +872,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
u8 compact_address_vector[0x1];
u8 striding_rq[0x1];
-   u8 reserved_at_202[0x2];
+   u8 reserved_at_202[0x1];
+   u8 ipoib_enhanced_offloads[0x1];
u8 ipoib_basic_offloads[0x1];
u8 reserved_at_205[0xa];
u8 drain_sigerr[0x1];
@@ -2293,7 +2294,9 @@ struct mlx5_ifc_tisc_bits {
u8 reserved_at_120[0x8];
u8 transport_domain[0x18];
 
-   u8 reserved_at_140[0x3c0];
+   u8 reserved_at_140[0x8];
+   u8 underlay_qpn[0x18];
+   u8 reserved_at_160[0x3a0];
 };
 
 enum {
@@ -8218,7 +8221,9 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
u8 reserved_at_a0[0x8];
u8 table_id[0x18];
 
-   u8 reserved_at_c0[0x140];
+   u8 reserved_at_c0[0x8];
+   u8 underlay_qpn[0x18];
+   u8 reserved_at_e0[0x120];
 };
 
 enum {
-- 
2.11.0



[PATCH net-next 05/16] net/mlx5e: IPoIB, Add netdevice profile skeleton

2017-04-12 Thread Saeed Mahameed
Create the mlx5e IPoIB netdevice profile skeleton in the new ipoib.c
file with an empty implementation.

Downstream patches will add the full mlx5 rdma netdevice acceleration
support for IPoIB to this new file, using the mlx5e netdevice profile
and the new mlx5_channels APIs and infrastructure, the same as already
done in the mlx5e NIC netdevice and switchdev-mode VF representors.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   7 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   9 -
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c| 181 +
 .../ethernet/mellanox/mlx5/core/ipoib.h}   |  22 ++-
 6 files changed, 215 insertions(+), 15 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
 copy drivers/{infiniband/hw/mlx5/cmd.h => 
net/ethernet/mellanox/mlx5/core/ipoib.h} (77%)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 117170014e88..a84b652f9b54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -31,3 +31,10 @@ config MLX5_CORE_EN_DCB
  This flag is depended on the kernel's DCB support.
 
  If unsure, set to Y
+
+config MLX5_CORE_IPOIB
+   bool "Mellanox Technologies ConnectX-4 IPoIB offloads support"
+   depends on MLX5_CORE_EN
+   default y
+   ---help---
+ MLX5 IPoIB offloads & acceleration support.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9f43beb86250..9e644615f07a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -11,3 +11,5 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o 
eswitch_offloads.o \
en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
+
+mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ced31906b8fd..02aa3cc59dc3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -153,6 +154,14 @@ static inline int mlx5_max_log_rq_size(int wq_type)
}
 }
 
+static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
+{
+   return is_kdump_kernel() ?
+   MLX5E_MIN_NUM_CHANNELS :
+   min_t(int, mdev->priv.eq_table.num_comp_vectors,
+ MLX5E_MAX_NUM_CHANNELS);
+}
+
 struct mlx5e_tx_wqe {
struct mlx5_wqe_ctrl_seg ctrl;
struct mlx5_wqe_eth_seg  eth;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cdc34ba354c8..14c7452a6348 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -31,7 +31,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1710,14 +1709,6 @@ static int mlx5e_set_tx_maxrate(struct net_device *dev, 
int index, u32 rate)
return err;
 }
 
-static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
-{
-   return is_kdump_kernel() ?
-   MLX5E_MIN_NUM_CHANNELS :
-   min_t(int, mdev->priv.eq_table.num_comp_vectors,
- MLX5E_MAX_NUM_CHANNELS);
-}
-
 static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
  struct mlx5e_params *params,
  struct mlx5e_channel_param *cparam,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
new file mode 100644
index ..2f65927a8d03
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -0,0 +1,181 @@
+/*
+ * Copyright (c) 2017, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in b

[PATCH net-next 10/16] net/mlx5e: IPoIB, Underlay QP

2017-04-12 Thread Saeed Mahameed
Create the IPoIB underlay QP needed by the IPoIB netdevice profile for
RSS and the TX HW context to operate on IPoIB traffic.

Reset the underlay QP in the dev_uninit ndo to stop IPoIB traffic going
through this QP when the ULP IPoIB decides to clean up.

Implement the attach/detach mcast RDMA netdev callbacks for later RDMA
netdev use.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 126 +++-
 1 file changed, 124 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index e188d067bc97..bd56f36066b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -34,6 +34,8 @@
 #include "en.h"
 #include "ipoib.h"
 
+#define IB_DEFAULT_Q_KEY   0xb1b
+
 static int mlx5i_open(struct net_device *netdev);
 static int mlx5i_close(struct net_device *netdev);
 static int  mlx5i_dev_init(struct net_device *dev);
@@ -83,12 +85,89 @@ static void mlx5i_cleanup(struct mlx5e_priv *priv)
/* Do nothing .. */
 }
 
+#define MLX5_QP_ENHANCED_ULP_STATELESS_MODE 2
+
static int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp *qp)
+{
+   struct mlx5_qp_context *context = NULL;
+   u32 *in = NULL;
+   void *addr_path;
+   int ret = 0;
+   int inlen;
+   void *qpc;
+
+   inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+   in = mlx5_vzalloc(inlen);
+   if (!in)
+   return -ENOMEM;
+
+   qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+   MLX5_SET(qpc, qpc, st, MLX5_QP_ST_UD);
+   MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+   MLX5_SET(qpc, qpc, ulp_stateless_offload_mode,
+MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
+
+   addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+   MLX5_SET(ads, addr_path, port, 1);
+   MLX5_SET(ads, addr_path, grh, 1);
+
+   ret = mlx5_core_create_qp(mdev, qp, in, inlen);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed creating IPoIB QP err : %d\n", ret);
+   goto out;
+   }
+
+   /* QP states */
+   context = kzalloc(sizeof(*context), GFP_KERNEL);
+   if (!context) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11);
+   context->pri_path.port = 1;
+   context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY);
+
+   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RST2INIT_QP, 0, context, qp);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed to modify qp RST2INIT, err: %d\n", ret);
+   goto out;
+   }
+   memset(context, 0, sizeof(*context));
+
+   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_INIT2RTR_QP, 0, context, qp);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed to modify qp INIT2RTR, err: %d\n", ret);
+   goto out;
+   }
+
+   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RTR2RTS_QP, 0, context, qp);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed to modify qp RTR2RTS, err: %d\n", ret);
+   goto out;
+   }
+
+out:
+   kfree(context);
+   kvfree(in);
+   return ret;
+}
+
static void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp *qp)
+{
+   mlx5_core_destroy_qp(mdev, qp);
+}
+
 static int mlx5i_init_tx(struct mlx5e_priv *priv)
 {
struct mlx5i_priv *ipriv = priv->ppriv;
int err;
 
-   /* TODO: Create IPoIB underlay QP */
+   err = mlx5i_create_underlay_qp(priv->mdev, &ipriv->qp);
+   if (err) {
+   mlx5_core_warn(priv->mdev, "create underlay QP failed, %d\n", err);
+   return err;
+   }
 
err = mlx5e_create_tis(priv->mdev, 0 /* tc */, ipriv->qp.qpn, &priv->tisn[0]);
if (err) {
@@ -101,7 +180,10 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv)
 
 void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
 {
+   struct mlx5i_priv *ipriv = priv->ppriv;
+
mlx5e_destroy_tis(priv->mdev, priv->tisn[0]);
+   mlx5i_destroy_underlay_qp(priv->mdev, &ipriv->qp);
 }
 
 static int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
@@ -220,7 +302,13 @@ static int mlx5i_dev_init(struct net_device *dev)
 
 static void mlx5i_dev_cleanup(struct net_device *dev)
 {
-   /* TODO: detach underlay qp from flow-steering by reset it */
+   struct mlx5e_priv*priv   = mlx5i_epriv(dev);
+   struct mlx5_core_dev *mdev   = priv->mdev;
+   struct mlx5i_priv*ipriv  = priv->ppriv;
+   struct mlx5_qp_context context;
+
+   /* detach qp from flow-steering by reset it */
+   mlx5_core_qp_modify(mdev, MLX5_CMD_OP_2RST_QP, 0, &context, &ipriv->qp);
 }
 
 static int mlx

[PATCH net-next 16/16] hw/mlx5: Add New bit to check over QP creation

2017-04-12 Thread Saeed Mahameed
From: Erez Shitrit <ere...@mellanox.com>

Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.

Signed-off-by: Erez Shitrit <ere...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a2638e339..ed6320186f89 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -897,6 +897,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN |
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
IB_QP_CREATE_IPOIB_UD_LSO |
+   IB_QP_CREATE_NETIF_QP |
mlx5_ib_create_qp_sqpn_qp1()))
return -EINVAL;
 
-- 
2.11.0



[PATCH net-next 11/16] net/mlx5e: Xmit flow break down

2017-04-12 Thread Saeed Mahameed
Break the current mlx5e xmit flow into smaller blocks (helper
functions) in order to reuse them for IPoIB SKB transmission.

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Erez Shitrit <ere...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   | 199 +-
 3 files changed, 119 insertions(+), 89 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 23b92ec54e12..25185f8c3562 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -304,6 +304,7 @@ struct mlx5e_cq {
 } cacheline_aligned_in_smp;
 
 struct mlx5e_tx_wqe_info {
+   struct sk_buff *skb;
u32 num_bytes;
u8  num_wqebbs;
u8  num_dma;
@@ -345,7 +346,6 @@ struct mlx5e_txqsq {
 
/* write@xmit, read@completion */
struct {
-   struct sk_buff   **skb;
struct mlx5e_sq_dma   *dma_fifo;
struct mlx5e_tx_wqe_info  *wqe_info;
} db;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index eb657987e9b5..2201b7ea05f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1042,7 +1042,6 @@ static void mlx5e_free_txqsq_db(struct mlx5e_txqsq *sq)
 {
kfree(sq->db.wqe_info);
kfree(sq->db.dma_fifo);
-   kfree(sq->db.skb);
 }
 
 static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
@@ -1050,13 +1049,11 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, 
int numa)
int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
int df_sz = wq_sz * MLX5_SEND_WQEBB_NUM_DS;
 
-   sq->db.skb = kzalloc_node(wq_sz * sizeof(*sq->db.skb),
- GFP_KERNEL, numa);
sq->db.dma_fifo = kzalloc_node(df_sz * sizeof(*sq->db.dma_fifo),
   GFP_KERNEL, numa);
sq->db.wqe_info = kzalloc_node(wq_sz * sizeof(*sq->db.wqe_info),
   GFP_KERNEL, numa);
-   if (!sq->db.skb || !sq->db.dma_fifo || !sq->db.wqe_info) {
+   if (!sq->db.dma_fifo || !sq->db.wqe_info) {
mlx5e_free_txqsq_db(sq);
return -ENOMEM;
}
@@ -1295,7 +1292,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
if (mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, 1)) {
struct mlx5e_tx_wqe *nop;
 
-   sq->db.skb[(sq->pc & sq->wq.sz_m1)] = NULL;
+   sq->db.wqe_info[(sq->pc & sq->wq.sz_m1)].skb = NULL;
nop = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc);
mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, &nop->ctrl);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 5bbc313e70c5..ba664a1126cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -177,30 +177,9 @@ static inline void mlx5e_insert_vlan(void *start, struct 
sk_buff *skb, u16 ihs,
mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz);
 }
 
-static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb)
+static inline void
+mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb, 
struct mlx5_wqe_eth_seg *eseg)
 {
-   struct mlx5_wq_cyc   *wq   = &sq->wq;
-
-   u16 pi = sq->pc & wq->sz_m1;
-   struct mlx5e_tx_wqe  *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
-   struct mlx5e_tx_wqe_info *wi   = &sq->db.wqe_info[pi];
-
-   struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
-   struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
-   struct mlx5_wqe_data_seg *dseg;
-
-   unsigned char *skb_data = skb->data;
-   unsigned int skb_len = skb->len;
-   u8  opcode = MLX5_OPCODE_SEND;
-   dma_addr_t dma_addr = 0;
-   unsigned int num_bytes;
-   u16 headlen;
-   u16 ds_cnt;
-   u16 ihs;
-   int i;
-
-   memset(wqe, 0, sizeof(*wqe));
-
if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
if (skb->encapsulation) {
@@ -212,66 +191,51 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, 
struct sk_buff *skb)
}
} else
sq->stats.csum_none++;
+}
 
-   if (skb_is_gso(skb)) {
-   eseg->mss= cpu_to_be16(skb_shinfo(skb)->gso_size);
-   opcode   = MLX5_OPCODE_LSO;
+static inline u16
+mlx5e_txwqe_build_eseg_gso(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+   

Re: [PATCH net v2] net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats

2017-04-20 Thread Saeed Mahameed
On Thu, Apr 20, 2017 at 2:32 AM, Martin KaFai Lau <ka...@fb.com> wrote:
> We have observed a sudden spike in rx/tx_packets and rx/tx_bytes
> reported under /proc/net/dev.  There is a race in mlx5e_update_stats()
> and some of the get-stats functions (the one that we hit is the
> mlx5e_get_stats() which is called by ndo_get_stats64()).
>
> In particular, the very first thing mlx5e_update_sw_counters()
> does is 'memset(s, 0, sizeof(*s))'.  For example, if mlx5e_get_stats()
> is unlucky at one point, rx_bytes and rx_packets could be 0.  One second
> later, a normal (and much bigger than 0) value will be reported.
>
> This patch is to use a 'struct mlx5e_sw_stats temp' to avoid
> a direct memset zero on priv->stats.sw.
>
> mlx5e_update_vport_counters() has a similar race.  Hence, addressed
> together.
>
> I am lucky enough to catch this 0-reset in rx multicast:
> eth0: 41457665   76804   700070  0 47085 15586634   
> 87502300 0   3  0
> eth0: 41459860   76815   700070  0 47094 15588376   
> 87516300 0   3  0
> eth0: 41460577   76822   700070  0 0 15589083   
> 87521300 0   3  0
> eth0: 41463293   76838   700070  0 47108 15595872   
> 87538300 0   3  0
> eth0: 41463379   76839   700070      0 47116 15596138   
> 87539300 0   3  0
>
> Cc: Saeed Mahameed <sae...@mellanox.com>
> Suggested-by: Eric Dumazet <eric.duma...@gmail.com>
> Signed-off-by: Martin KaFai Lau <ka...@fb.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 66c133757a5e..246786bb861b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -174,7 +174,7 @@ static void mlx5e_tx_timeout_work(struct work_struct 
> *work)
>
>  static void mlx5e_update_sw_counters(struct mlx5e_priv *priv)
>  {
> -   struct mlx5e_sw_stats *s = &priv->stats.sw;
> +   struct mlx5e_sw_stats temp, *s = &temp;
> struct mlx5e_rq_stats *rq_stats;
> struct mlx5e_sq_stats *sq_stats;
> u64 tx_offload_none = 0;
> @@ -229,12 +229,14 @@ static void mlx5e_update_sw_counters(struct mlx5e_priv 
> *priv)
> s->link_down_events_phy = MLX5_GET(ppcnt_reg,
> priv->stats.pport.phy_counters,
> 
> counter_set.phys_layer_cntrs.link_down_events);
> +   memcpy(&priv->stats.sw, s, sizeof(*s));
>  }
>
>  static void mlx5e_update_vport_counters(struct mlx5e_priv *priv)
>  {
> +   struct mlx5e_vport_stats temp;
> int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out);
> -   u32 *out = (u32 *)priv->stats.vport.query_vport_out;
> +   u32 *out = (u32 *)temp.query_vport_out;
> u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)] = {0};
> struct mlx5_core_dev *mdev = priv->mdev;
>
> @@ -245,6 +247,7 @@ static void mlx5e_update_vport_counters(struct mlx5e_priv 
> *priv)
>
> memset(out, 0, outlen);

Actually you don't need any temp here, it is safe to just remove this
redundant memset
and mlx5_cmd_exec will do the copy for you.

> mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen);
> +   memcpy(priv->stats.vport.query_vport_out, out, outlen);
>  }

Anyway we still need a spin lock here, and also for all the counters
under priv->stats which are affected by this race as well.

If you want, I can accept this as a temporary fix for net, and Gal will
work on a spin-lock-based mechanism to fix the memcpy race for all the
counters.

>
>  static void mlx5e_update_pport_counters(struct mlx5e_priv *priv)
> --
> 2.9.3
>


Re: [PATCH net v2] net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats

2017-04-20 Thread Saeed Mahameed
On Thu, Apr 20, 2017 at 5:15 PM, Martin KaFai Lau <ka...@fb.com> wrote:
> On Thu, Apr 20, 2017 at 05:00:13PM +0300, Saeed Mahameed wrote:
>> On Thu, Apr 20, 2017 at 2:32 AM, Martin KaFai Lau <ka...@fb.com> wrote:
>> > We have observed a sudden spike in rx/tx_packets and rx/tx_bytes
>> > reported under /proc/net/dev.  There is a race in mlx5e_update_stats()
>> > and some of the get-stats functions (the one that we hit is the
>> > mlx5e_get_stats() which is called by ndo_get_stats64()).
>> >
>> > In particular, the very first thing mlx5e_update_sw_counters()
>> > does is 'memset(s, 0, sizeof(*s))'.  For example, if mlx5e_get_stats()
>> > is unlucky at one point, rx_bytes and rx_packets could be 0.  One second
>> > later, a normal (and much bigger than 0) value will be reported.
>> >
>> > This patch is to use a 'struct mlx5e_sw_stats temp' to avoid
>> > a direct memset zero on priv->stats.sw.
>> >
>> > mlx5e_update_vport_counters() has a similar race.  Hence, addressed
>> > together.
>> >
>> > I am lucky enough to catch this 0-reset in rx multicast:
>> > eth0: 41457665   76804   700070  0 47085 15586634  
>> >  87502300 0   3  0
>> > eth0: 41459860   76815   700070  0 47094 15588376  
>> >  87516300 0   3  0
>> > eth0: 41460577   76822   700070  0 0 15589083  
>> >  87521300 0   3  0
>> > eth0: 41463293   76838   700    0    70  0 47108 15595872  
>> >  87538300 0   3  0
>> > eth0: 41463379   76839   700070  0 47116 15596138  
>> >  87539300 0   3  0
>> >
>> > Cc: Saeed Mahameed <sae...@mellanox.com>
>> > Suggested-by: Eric Dumazet <eric.duma...@gmail.com>
>> > Signed-off-by: Martin KaFai Lau <ka...@fb.com>
>> > ---
>> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +--
>> >  1 file changed, 5 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
>> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> > index 66c133757a5e..246786bb861b 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> > @@ -174,7 +174,7 @@ static void mlx5e_tx_timeout_work(struct work_struct 
>> > *work)
>> >
>> >  static void mlx5e_update_sw_counters(struct mlx5e_priv *priv)
>> >  {
>> > -   struct mlx5e_sw_stats *s = &priv->stats.sw;
>> > +   struct mlx5e_sw_stats temp, *s = &temp;
>> > struct mlx5e_rq_stats *rq_stats;
>> > struct mlx5e_sq_stats *sq_stats;
>> > u64 tx_offload_none = 0;
>> > @@ -229,12 +229,14 @@ static void mlx5e_update_sw_counters(struct 
>> > mlx5e_priv *priv)
>> > s->link_down_events_phy = MLX5_GET(ppcnt_reg,
>> > priv->stats.pport.phy_counters,
>> > 
>> > counter_set.phys_layer_cntrs.link_down_events);
>> > +   memcpy(&priv->stats.sw, s, sizeof(*s));
>> >  }
>> >
>> >  static void mlx5e_update_vport_counters(struct mlx5e_priv *priv)
>> >  {
>> > +   struct mlx5e_vport_stats temp;
>> > int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out);
>> > -   u32 *out = (u32 *)priv->stats.vport.query_vport_out;
>> > +   u32 *out = (u32 *)temp.query_vport_out;
>> > u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)] = {0};
>> > struct mlx5_core_dev *mdev = priv->mdev;
>> >
>> > @@ -245,6 +247,7 @@ static void mlx5e_update_vport_counters(struct 
>> > mlx5e_priv *priv)
>> >
>> > memset(out, 0, outlen);
>>
>> Actually you don't need any temp here, it is safe to just remove this
>> redundant memset
>> and mlx5_cmd_exec will do the copy for you.
>>
>> > mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen);
>> > +   memcpy(priv->stats.vport.query_vport_out, out, outlen);
>> >  }
>>
>> Anyway we still need a spin lock here, and also for all the counters
>> under priv->stats which are affected by this race as well.
>>
>> If you want I can accept this as a temporary  fix for net and Gal will
>> work on a spin lock based mechanism to fix the memcpy race for all the
>> counters.
> A follow-up patch approach by Gal will be nice.
>

Ok, I will expect V3 with the removal of the memset from
mlx5e_update_vport_counters instead of the memcpy,
and Gal will work on the follow-up patch.


Re: [RFC PATCH net] net/mlx5e: Race between mlx5e_update_stats() and getting the stats

2017-04-20 Thread Saeed Mahameed
On Thu, Apr 20, 2017 at 2:35 AM, Eric Dumazet  wrote:
> On Wed, 2017-04-19 at 14:53 -0700, Martin KaFai Lau wrote:
>
>> Right, a temp and a memcpy should be enough to solve our spike problem.
>> It may be the right fix for net.
>>
>> Agree that using a spinlock is better (likely changing state_lock
>> to spinlock).  A quick grep shows 80 line changes.  Saeed, thoughts?

No, changing the current state_lock is a bad idea; as Eric said, it is
there for a reason: to synchronize arbitrary ring/device state changes
which might sleep.

memcpy is a good idea to better hide or delay the issue :); a new
dedicated spin lock is the right way to go, as Eric suggested below.

BTW, very nice catch Martin, I just got back from vacation to work on this bug,
and you already root caused it, Thanks !!

>
> I was not advising replacing the mutex (maybe it is a mutex for good
> reason), I simply suggested to use another spinlock only for this very
> specific section.
>
> Something like :
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 
> dc52053128bc752ccd398449330c24c0bdf8b3a1..9b2e1b79fded22d55e9409cb572308190679cfdd
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -722,6 +722,7 @@ struct mlx5e_priv {
> struct mlx5_core_dev  *mdev;
> struct net_device *netdev;
> struct mlx5e_stats stats;
> +   spinlock_t stats_lock;
> struct mlx5e_tstamptstamp;
> u16 q_counter;
>  #ifdef CONFIG_MLX5_CORE_EN_DCB
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> index 
> a004a5a1a4c22a742ef3f9939769c6b5c9445f46..b4b7d43bf899cadca2c2a17151d35acac9773859
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> @@ -315,9 +315,11 @@ static void mlx5e_get_ethtool_stats(struct net_device 
> *dev,
> mlx5e_update_stats(priv);
> mutex_unlock(&priv->state_lock);
>
> +   spin_lock(&priv->stats_lock);
> for (i = 0; i < NUM_SW_COUNTERS; i++)
> data[idx++] = MLX5E_READ_CTR64_CPU(&priv->stats.sw,
> sw_stats_desc, i);
> +   spin_unlock(&priv->stats_lock);
>
> for (i = 0; i < MLX5E_NUM_Q_CNTRS(priv); i++)
> data[idx++] = MLX5E_READ_CTR32_CPU(&priv->stats.qcnt,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 
> 66c133757a5ee8daae122e93322306b1c5c44336..4d6672045b1126a8bab4d6f2035e6a9b830560d2
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -174,7 +174,7 @@ static void mlx5e_tx_timeout_work(struct work_struct 
> *work)
>
>  static void mlx5e_update_sw_counters(struct mlx5e_priv *priv)
>  {
> -   struct mlx5e_sw_stats *s = &priv->stats.sw;
> +   struct mlx5e_sw_stats temp, *s = &temp;
> struct mlx5e_rq_stats *rq_stats;
> struct mlx5e_sq_stats *sq_stats;
> u64 tx_offload_none = 0;
> @@ -229,6 +229,9 @@ static void mlx5e_update_sw_counters(struct mlx5e_priv 
> *priv)
> s->link_down_events_phy = MLX5_GET(ppcnt_reg,
> priv->stats.pport.phy_counters,
> 
> counter_set.phys_layer_cntrs.link_down_events);
> +   spin_lock(&priv->stats_lock);
> +   memcpy(&priv->stats.sw, s, sizeof(*s));
> +   spin_unlock(&priv->stats_lock);

I like this ! minimized the critical section with a temp buffer and a
memcpy .. perfect.

>  }
>
>  static void mlx5e_update_vport_counters(struct mlx5e_priv *priv)
> @@ -2754,11 +2757,13 @@ mlx5e_get_stats(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
> stats->tx_packets = PPORT_802_3_GET(pstats, 
> a_frames_transmitted_ok);
> stats->tx_bytes   = PPORT_802_3_GET(pstats, 
> a_octets_transmitted_ok);
> } else {
> +   spin_lock(&priv->stats_lock);
> stats->rx_packets = sstats->rx_packets;
> stats->rx_bytes   = sstats->rx_bytes;
> stats->tx_packets = sstats->tx_packets;
> stats->tx_bytes   = sstats->tx_bytes;
> stats->tx_dropped = sstats->tx_queue_dropped;
> +   spin_unlock(&priv->stats_lock);
> }
>
> stats->rx_dropped = priv->stats.qcnt.rx_out_of_buffer;
> @@ -3561,6 +3566,8 @@ static void mlx5e_build_nic_netdev_priv(struct 
> mlx5_core_dev *mdev,
>
> mutex_init(&priv->state_lock);
>
> +   spin_lock_init(&priv->stats_lock);
> +
> INIT_WORK(&priv->update_carrier_work, mlx5e_update_carrier_work);
> INIT_WORK(&priv->set_rx_mode_work, mlx5e_set_rx_mode_work);
> INIT_WORK(&priv->tx_timeout_work, mlx5e_tx_timeout_work);
>
>


[net 1/7] net/mlx5: Fix driver load bad flow when having fw initializing timeout

2017-04-23 Thread Saeed Mahameed
From: Mohamad Haj Yahia <moha...@mellanox.com>

If FW is stuck in initializing state, we will skip the driver load, but
the current error handling flow doesn't clean previously allocated command
interface resources.

Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup')
Signed-off-by: Mohamad Haj Yahia <moha...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 60154a175bd3..0ad66324247f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1029,7 +1029,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv,
if (err) {
dev_err(&dev->pdev->dev, "Firmware over %d MS in initializing state, aborting\n",
FW_INIT_TIMEOUT_MILI);
-   goto out_err;
+   goto err_cmd_cleanup;
}
 
err = mlx5_core_enable_hca(dev, 0);
-- 
2.11.0



[net 4/7] net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels

2017-04-23 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Otherwise the code that fills the ipv6 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for 
IPv6 tunnels')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 42 ++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index fc7c1d30461c..5436866798f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -819,16 +819,15 @@ static void gen_vxlan_header_ipv4(struct net_device 
*out_dev,
vxh->vx_vni = vxlan_vni_field(vx_vni);
 }
 
-static int gen_vxlan_header_ipv6(struct net_device *out_dev,
-char buf[],
-unsigned char h_dest[ETH_ALEN],
-int ttl,
-struct in6_addr *daddr,
-struct in6_addr *saddr,
-__be16 udp_dst_port,
-__be32 vx_vni)
+static void gen_vxlan_header_ipv6(struct net_device *out_dev,
+ char buf[], int encap_size,
+ unsigned char h_dest[ETH_ALEN],
+ int ttl,
+ struct in6_addr *daddr,
+ struct in6_addr *saddr,
+ __be16 udp_dst_port,
+ __be32 vx_vni)
 {
-   int encap_size = VXLAN_HLEN + sizeof(struct ipv6hdr) + ETH_HLEN;
struct ethhdr *eth = (struct ethhdr *)buf;
struct ipv6hdr *ip6h = (struct ipv6hdr *)((char *)eth + sizeof(struct 
ethhdr));
struct udphdr *udp = (struct udphdr *)((char *)ip6h + sizeof(struct 
ipv6hdr));
@@ -850,8 +849,6 @@ static int gen_vxlan_header_ipv6(struct net_device *out_dev,
udp->dest = udp_dst_port;
vxh->vx_flags = VXLAN_HF_VNI;
vxh->vx_vni = vxlan_vni_field(vx_vni);
-
-   return encap_size;
 }
 
 static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv,
@@ -935,13 +932,20 @@ static int mlx5e_create_encap_header_ipv6(struct 
mlx5e_priv *priv,
 
 {
int max_encap_size = MLX5_CAP_ESW(priv->mdev, max_encap_header_size);
+   int ipv6_encap_size = ETH_HLEN + sizeof(struct ipv6hdr) + VXLAN_HLEN;
struct ip_tunnel_key *tun_key = &e->tun_info.key;
-   int encap_size, err, ttl = 0;
struct neighbour *n = NULL;
struct flowi6 fl6 = {};
char *encap_header;
+   int err, ttl = 0;
 
-   encap_header = kzalloc(max_encap_size, GFP_KERNEL);
+   if (max_encap_size < ipv6_encap_size) {
+   mlx5_core_warn(priv->mdev, "encap size %d too big, max supported is %d\n",
+  ipv6_encap_size, max_encap_size);
+   return -EOPNOTSUPP;
+   }
+
+   encap_header = kzalloc(ipv6_encap_size, GFP_KERNEL);
if (!encap_header)
return -ENOMEM;
 
@@ -977,11 +981,11 @@ static int mlx5e_create_encap_header_ipv6(struct 
mlx5e_priv *priv,
 
switch (e->tunnel_type) {
case MLX5_HEADER_TYPE_VXLAN:
-   encap_size = gen_vxlan_header_ipv6(*out_dev, encap_header,
-  e->h_dest, ttl,
-  &fl6.daddr,
-  &fl6.saddr, tun_key->tp_dst,
-  tunnel_id_to_key32(tun_key->tun_id));
+   gen_vxlan_header_ipv6(*out_dev, encap_header,
+ ipv6_encap_size, e->h_dest, ttl,
+ &fl6.daddr,
+ &fl6.saddr, tun_key->tp_dst,
+ tunnel_id_to_key32(tun_key->tun_id));
break;
default:
err = -EOPNOTSUPP;
@@ -989,7 +993,7 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv 
*priv,
}
 
err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
-  encap_size, encap_header, >encap_id);
+  ipv6_encap_size, encap_header, >encap_id);
 out:
if (err && n)
neigh_release(n);
-- 
2.11.0



[net 6/7] net/mlx5e: Fix small packet threshold

2017-04-23 Thread Saeed Mahameed
From: Eugenia Emantayev <euge...@mellanox.com>

RX packet headers are meant to be contained in the SKB linear part,
for which a threshold of 128 bytes was chosen.
It turns out this is not enough, e.g. for an IPv6 packet over VxLAN.
In this case, UDP/IPv4 needs 42 bytes, the GENEVE header is 8 bytes,
and TCP/IPv6 takes 86 bytes. In total 136 bytes, which is more than
the current 128 bytes. In this case the expand-header flow is reached.
The warning in skb_try_coalesce() caused by a wrong truesize
was already fixed here:
commit 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()").
Still, we prefer to totally avoid the expand-header flow for performance reasons.
Tested regular TCP_STREAM with iperf for 1 and 8 streams; no degradation was found.

Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <euge...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index dc52053128bc..3d9490cd2db1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -90,7 +90,7 @@
 #define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) - 1 <= U16_MAX)
 
 #define MLX5_UMR_ALIGN (2048)
-#define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD  (128)
+#define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD  (256)
 
 #define MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ (64 * 1024)
 #define MLX5E_DEFAULT_LRO_TIMEOUT   32
-- 
2.11.0



[pull request][net 0/7] Mellanox, mlx5 fixes 2017-04-22

2017-04-23 Thread Saeed Mahameed
Hi Dave,

This series contains some mlx5 fixes for net.

For your convenience, the series doesn't introduce any conflict with
the ongoing net-next pull request.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10
("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8
("net/mlx5e: Fix small packet threshold")   kernels >= 4.7
("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4

Thanks,
Saeed.

The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772:

  Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 
16:37:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-fixes-2017-04-22

for you to fetch changes up to 5e82c9e4ed60beba83f46a1a5a8307b99a23e982:

  net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling (2017-04-22 21:52:37 +0300)


mlx5-fixes-2017-04-22


Eugenia Emantayev (1):
  net/mlx5e: Fix small packet threshold

Ilan Tayari (1):
  net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling

Maor Gottlieb (1):
  net/mlx5: Fix UAR memory leak

Mohamad Haj Yahia (1):
  net/mlx5: Fix driver load bad flow when having fw initializing timeout

Or Gerlitz (3):
  net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
  net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels
  net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +-
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 87 --
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  1 +
 6 files changed, 76 insertions(+), 53 deletions(-)


[net 3/7] net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels

2017-04-23 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Otherwise the code that fills the ipv4 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV 
offloads')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 42 ++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index b7c99c38a7c4..fc7c1d30461c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -786,16 +786,15 @@ static int mlx5e_route_lookup_ipv6(struct mlx5e_priv 
*priv,
return 0;
 }
 
-static int gen_vxlan_header_ipv4(struct net_device *out_dev,
-char buf[],
-unsigned char h_dest[ETH_ALEN],
-int ttl,
-__be32 daddr,
-__be32 saddr,
-__be16 udp_dst_port,
-__be32 vx_vni)
+static void gen_vxlan_header_ipv4(struct net_device *out_dev,
+ char buf[], int encap_size,
+ unsigned char h_dest[ETH_ALEN],
+ int ttl,
+ __be32 daddr,
+ __be32 saddr,
+ __be16 udp_dst_port,
+ __be32 vx_vni)
 {
-   int encap_size = VXLAN_HLEN + sizeof(struct iphdr) + ETH_HLEN;
struct ethhdr *eth = (struct ethhdr *)buf;
struct iphdr  *ip = (struct iphdr *)((char *)eth + sizeof(struct 
ethhdr));
struct udphdr *udp = (struct udphdr *)((char *)ip + sizeof(struct 
iphdr));
@@ -818,8 +817,6 @@ static int gen_vxlan_header_ipv4(struct net_device *out_dev,
udp->dest = udp_dst_port;
vxh->vx_flags = VXLAN_HF_VNI;
vxh->vx_vni = vxlan_vni_field(vx_vni);
-
-   return encap_size;
 }
 
 static int gen_vxlan_header_ipv6(struct net_device *out_dev,
@@ -863,13 +860,20 @@ static int mlx5e_create_encap_header_ipv4(struct 
mlx5e_priv *priv,
  struct net_device **out_dev)
 {
int max_encap_size = MLX5_CAP_ESW(priv->mdev, max_encap_header_size);
+   int ipv4_encap_size = ETH_HLEN + sizeof(struct iphdr) + VXLAN_HLEN;
struct ip_tunnel_key *tun_key = &e->tun_info.key;
-   int encap_size, ttl, err;
struct neighbour *n = NULL;
struct flowi4 fl4 = {};
char *encap_header;
+   int ttl, err;
 
-   encap_header = kzalloc(max_encap_size, GFP_KERNEL);
+   if (max_encap_size < ipv4_encap_size) {
+   mlx5_core_warn(priv->mdev, "encap size %d too big, max supported is %d\n",
+  ipv4_encap_size, max_encap_size);
+   return -EOPNOTSUPP;
+   }
+
+   encap_header = kzalloc(ipv4_encap_size, GFP_KERNEL);
if (!encap_header)
return -ENOMEM;
 
@@ -904,11 +908,11 @@ static int mlx5e_create_encap_header_ipv4(struct 
mlx5e_priv *priv,
 
switch (e->tunnel_type) {
case MLX5_HEADER_TYPE_VXLAN:
-   encap_size = gen_vxlan_header_ipv4(*out_dev, encap_header,
-  e->h_dest, ttl,
-  fl4.daddr,
-  fl4.saddr, tun_key->tp_dst,
-  tunnel_id_to_key32(tun_key->tun_id));
+   gen_vxlan_header_ipv4(*out_dev, encap_header,
+ ipv4_encap_size, e->h_dest, ttl,
+ fl4.daddr,
+ fl4.saddr, tun_key->tp_dst,
+ tunnel_id_to_key32(tun_key->tun_id));
break;
default:
err = -EOPNOTSUPP;
@@ -916,7 +920,7 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv 
*priv,
}
 
err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
-  encap_size, encap_header, >encap_id);
+  ipv4_encap_size, encap_header, >encap_id);
 out:
if (err && n)
neigh_release(n);
-- 
2.11.0



[net 2/7] net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5

2017-04-23 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

On ConnectX5 the wqe inline mode is "none" and hence the FW
reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED.

Fix our devlink callbacks to deal with that on get and set.

Also fix the tc flow parsing code not to fail anything when
inline isn't required.

Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index fade7233dac5..b7c99c38a7c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -639,7 +639,8 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 
if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH) &&
rep->vport != FDB_UPLINK_VPORT) {
-   if (min_inline > esw->offloads.inline_mode) {
+   if (esw->offloads.inline_mode != MLX5_INLINE_MODE_NONE &&
+   esw->offloads.inline_mode < min_inline) {
netdev_warn(priv->netdev,
"Flow is not offloaded due to min inline 
setting, required %d actual %d\n",
min_inline, esw->offloads.inline_mode);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 307ec6c5fd3b..d111cebca9f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -911,8 +911,7 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink 
*devlink, u8 mode)
struct mlx5_core_dev *dev = devlink_priv(devlink);
struct mlx5_eswitch *esw = dev->priv.eswitch;
int num_vports = esw->enabled_vports;
-   int err;
-   int vport;
+   int err, vport;
u8 mlx5_mode;
 
if (!MLX5_CAP_GEN(dev, vport_group_manager))
@@ -921,9 +920,17 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink 
*devlink, u8 mode)
if (esw->mode == SRIOV_NONE)
return -EOPNOTSUPP;
 
-   if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
-   MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+   switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) {
+   case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
+   if (mode == DEVLINK_ESWITCH_INLINE_MODE_NONE)
+   return 0;
+   /* fall through */
+   case MLX5_CAP_INLINE_MODE_L2:
+   esw_warn(dev, "Inline mode can't be set\n");
return -EOPNOTSUPP;
+   case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT:
+   break;
+   }
 
if (esw->offloads.num_flows > 0) {
esw_warn(dev, "Can't set inline mode when flows are configured\n");
@@ -966,18 +973,14 @@ int mlx5_devlink_eswitch_inline_mode_get(struct devlink 
*devlink, u8 *mode)
if (esw->mode == SRIOV_NONE)
return -EOPNOTSUPP;
 
-   if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
-   MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-   return -EOPNOTSUPP;
-
return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode);
 }
 
 int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
 {
+   u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
struct mlx5_core_dev *dev = esw->dev;
int vport;
-   u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
 
if (!MLX5_CAP_GEN(dev, vport_group_manager))
return -EOPNOTSUPP;
@@ -985,10 +988,18 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch 
*esw, int nvfs, u8 *mode)
if (esw->mode == SRIOV_NONE)
return -EOPNOTSUPP;
 
-   if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
-   MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-   return -EOPNOTSUPP;
+   switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) {
+   case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
+   mlx5_mode = MLX5_INLINE_MODE_NONE;
+   goto out;
+   case MLX5_CAP_INLINE_MODE_L2:
+   mlx5_mode = MLX5_INLINE_MODE_L2;
+   goto out;
+   case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT:
+   goto query_vports;
+   }
 
+query_vports:
for (vport = 1; vport <= nvfs; vport++) {
mlx5_query_nic_vport_min_inline(dev, vport, &mlx5_mode);
if (vport > 1 && prev_mlx5_mode != mlx5_mode)
@@ -996,6 +1007,7 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, 
int nvfs, u8 *mode)
prev_mlx5_mode = mlx5_mode;
}
 
+out:
*mode = mlx5_mode;
return 0;
 }
-- 
2.11.0



[net 7/7] net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling

2017-04-23 Thread Saeed Mahameed
From: Ilan Tayari <il...@mellanox.com>

Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size
of the table, regardless of the number of entries in it.
Existing code does not do that, and this breaks all usage of ethtool -N
or -n without explicit location, with this error:
rmgr: Invalid RX class rules table size: Success

Set info->data to the table size.

Tested:
ethtool -n ens8
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55
ethtool -n ens8
ethtool -N ens8 delete 1023
ethtool -N ens8 delete 55

Fixes: f913a72aa008 ("net/mlx5e: Add support to get ethtool flow rules")
Signed-off-by: Ilan Tayari <il...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index d55fff0ba388..26fc77e80f7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -564,6 +564,7 @@ int mlx5e_ethtool_get_all_flows(struct mlx5e_priv *priv, 
struct ethtool_rxnfc *i
int idx = 0;
int err = 0;
 
+   info->data = MAX_NUM_OF_ETHTOOL_RULES;
while ((!err || err == -ENOENT) && idx < info->rule_cnt) {
err = mlx5e_ethtool_get_flow(priv, info, location);
if (!err)
-- 
2.11.0



[net 5/7] net/mlx5: Fix UAR memory leak

2017-04-23 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

When a UAR is released, we deallocate the device resource, but
don't unmap the UAR mapping memory.
Fix the leak by unmapping this memory.

Fixes: a6d51b68611e9 ('net/mlx5: Introduce blue flame register allocator')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c 
b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index 2e6b0f290ddc..222b25908d01 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -87,6 +87,7 @@ static void up_rel_func(struct kref *kref)
struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, 
ref_count);
 
list_del(&up->list);
+   iounmap(up->map);
if (mlx5_cmd_free_uar(up->mdev, up->index))
mlx5_core_warn(up->mdev, "failed to free uar index %d\n", 
up->index);
kfree(up->reg_bitmap);
-- 
2.11.0



[pull request][net-next 0/5] Mellanox, mlx5 updates 2017-04-22

2017-04-22 Thread Saeed Mahameed
Hi Dave,

This series contains some updates to mlx5 driver.

Sparse and compiler warnings fixes from Stephen Hemminger.

From Roi Dayan and Or Gerlitz, add devlink and mlx5 support for controlling
E-Switch encapsulation mode, this knob will enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-21 
20:23:53 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2017-04-22

for you to fetch changes up to 8bf3198a5e394ed6815aeb8fedaf49722986bbd3:

  mlx5: fix warning about missing prototype (2017-04-22 20:26:42 +0300)

Or Gerlitz (1):
  net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev 
mode

Roi Dayan (2):
  net/devlink: Add E-Switch encapsulation control
  net/mlx5: E-Switch, Add control for encapsulation

Stephen Hemminger (2):
  mlx5: hide unused functions
  mlx5: fix warning about missing prototype

 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   3 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 132 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c|  24 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   2 +
 include/net/devlink.h  |   2 +
 include/uapi/linux/devlink.h   |   7 ++
 net/core/devlink.c |  26 +++-
 10 files changed, 167 insertions(+), 36 deletions(-)


[net-next 4/5] mlx5: hide unused functions

2017-04-22 Thread Saeed Mahameed
From: Stephen Hemminger <step...@networkplumber.org>

Fix sparse warnings in the recent ipoib support.
The RDMA functions are not used yet; hide them behind #ifdef.
Based on review comments, they will eventually be local, so make them static.

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index ec78e637840f..3c84e36af018 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -178,7 +178,7 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv)
return 0;
 }
 
-void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
+static void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
 {
struct mlx5i_priv *ipriv = priv->ppriv;
 
@@ -359,9 +359,10 @@ static int mlx5i_close(struct net_device *netdev)
return 0;
 }
 
+#ifdef notusedyet
 /* IPoIB RDMA netdev callbacks */
-int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
-  union ib_gid *gid, u16 lid, int set_qkey)
+static int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
+ union ib_gid *gid, u16 lid, int set_qkey)
 {
struct mlx5e_priv *epriv = mlx5i_epriv(netdev);
struct mlx5_core_dev *mdev  = epriv->mdev;
@@ -377,8 +378,8 @@ int mlx5i_attach_mcast(struct net_device *netdev, struct 
ib_device *hca,
return err;
 }
 
-int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
-  union ib_gid *gid, u16 lid)
+static int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
+ union ib_gid *gid, u16 lid)
 {
struct mlx5e_priv *epriv = mlx5i_epriv(netdev);
struct mlx5_core_dev *mdev  = epriv->mdev;
@@ -395,7 +396,7 @@ int mlx5i_detach_mcast(struct net_device *netdev, struct 
ib_device *hca,
return err;
 }
 
-int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
+static int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
   struct ib_ah *address, u32 dqpn, u32 dqkey)
 {
struct mlx5e_priv *epriv = mlx5i_epriv(dev);
@@ -404,6 +405,7 @@ int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
 
return mlx5i_sq_xmit(sq, skb, &mah->av, dqpn, dqkey);
 }
+#endif
 
 static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
 {
@@ -418,10 +420,10 @@ static int mlx5i_check_required_hca_cap(struct 
mlx5_core_dev *mdev)
return 0;
 }
 
-struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
- struct ib_device *ibdev,
- const char *name,
- void (*setup)(struct net_device *))
+static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
+struct ib_device *ibdev,
+const char *name,
+void (*setup)(struct 
net_device *))
 {
const struct mlx5e_profile *profile = &mlx5i_nic_profile;
int nch = profile->max_nch(mdev);
@@ -480,7 +482,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5_rdma_netdev_alloc);
 
-void mlx5_rdma_netdev_free(struct net_device *netdev)
+static void mlx5_rdma_netdev_free(struct net_device *netdev)
 {
struct mlx5e_priv *priv = mlx5i_epriv(netdev);
const struct mlx5e_profile *profile = priv->profile;
-- 
2.11.0



[net-next 5/5] mlx5: fix warning about missing prototype

2017-04-22 Thread Saeed Mahameed
From: Stephen Hemminger <step...@networkplumber.org>

Fix sparse warnings about missing prototypes. The rx/tx code paths
define functions whose prototypes are in ipoib.h.

Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 43308243f519..ae66fad98244 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -39,6 +39,7 @@
 #include "en.h"
 #include "en_tc.h"
 #include "eswitch.h"
+#include "ipoib.h"
 
 static inline bool mlx5e_rx_hw_stamp(struct mlx5e_tstamp *tstamp)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index dda7db503043..ab3bb026ff9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -33,6 +33,7 @@
#include <linux/tcp.h>
#include <linux/if_vlan.h>
 #include "en.h"
+#include "ipoib.h"
 
 #define MLX5E_SQ_NOPS_ROOM  MLX5_SEND_WQE_MAX_WQEBBS
 #define MLX5E_SQ_STOP_ROOM (MLX5_SEND_WQE_MAX_WQEBBS +\
-- 
2.11.0



[net-next 1/5] net/devlink: Add E-Switch encapsulation control

2017-04-22 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

The actual encap/decap is carried out (along with the matching and other 
actions)
per offloaded e-switch rules, e.g as done when offloading the TC tunnel key 
action.

Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/net/devlink.h|  2 ++
 include/uapi/linux/devlink.h |  7 +++
 net/core/devlink.c   | 26 +++---
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 24de13f8c94f..ed7687bbf5d0 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -268,6 +268,8 @@ struct devlink_ops {
int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
int (*eswitch_inline_mode_get)(struct devlink *devlink, u8 
*p_inline_mode);
int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
+   int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 
*p_encap_mode);
+   int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b47bee277347..b0e807ac53bb 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -119,6 +119,11 @@ enum devlink_eswitch_inline_mode {
DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT,
 };
 
+enum devlink_eswitch_encap_mode {
+   DEVLINK_ESWITCH_ENCAP_MODE_NONE,
+   DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
+};
+
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -195,6 +200,8 @@ enum devlink_attr {
 
DEVLINK_ATTR_PAD,
 
+   DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0afac5800b57..b0b87a292e7c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1397,10 +1397,10 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, 
struct devlink *devlink,
   u32 seq, int flags)
 {
const struct devlink_ops *ops = devlink->ops;
+   u8 inline_mode, encap_mode;
void *hdr;
int err = 0;
u16 mode;
-   u8 inline_mode;
 
hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
if (!hdr)
@@ -1429,6 +1429,15 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, 
struct devlink *devlink,
goto nla_put_failure;
}
 
+   if (ops->eswitch_encap_mode_get) {
err = ops->eswitch_encap_mode_get(devlink, &encap_mode);
+   if (err)
+   goto nla_put_failure;
+   err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, 
encap_mode);
+   if (err)
+   goto nla_put_failure;
+   }
+
genlmsg_end(msg, hdr);
return 0;
 
@@ -1468,9 +1477,9 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff 
*skb,
 {
struct devlink *devlink = info->user_ptr[0];
const struct devlink_ops *ops = devlink->ops;
-   u16 mode;
-   u8 inline_mode;
+   u8 inline_mode, encap_mode;
int err = 0;
+   u16 mode;
 
if (!ops)
return -EOPNOTSUPP;
@@ -1493,6 +1502,16 @@ static int devlink_nl_cmd_eswitch_set_doit(struct 
sk_buff *skb,
if (err)
return err;
}
+
+   if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) {
+   if (!ops->eswitch_encap_mode_set)
+   return -EOPNOTSUPP;
+   encap_mode = 
nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]);
+   err = ops->eswitch_encap_mode_set(devlink, encap_mode);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
@@ -2190,6 +2209,7 @@ static const struct nla_policy 
devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 },
 };
-- 
2.11.0



[net-next 3/5] net/mlx5: E-Switch, Add control for encapsulation

2017-04-22 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

Implement the devlink e-switch encapsulation control set and get
callbacks. Apply the value set by the user on the switchdev offloads
mode when creating the fast FDB table where offloaded rules will be set.

Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  5 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  3 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 63 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  2 +
 4 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b3281d1118b3..21bed3c3334d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1806,6 +1806,11 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
esw->enabled_vports = 0;
esw->mode = SRIOV_NONE;
esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
+   if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) &&
+   MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
+   esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
+   else
+   esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
 
dev->priv.eswitch = esw;
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1f56ed9f5a6f..1e7f21be1233 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -210,6 +210,7 @@ struct mlx5_esw_offload {
DECLARE_HASHTABLE(encap_tbl, 8);
u8 inline_mode;
u64 num_flows;
+   u8 encap;
 };
 
 struct mlx5_eswitch {
@@ -322,6 +323,8 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, 
u16 *mode);
 int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode);
 int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode);
 int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode);
+int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap);
+int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink, u8 *encap);
 void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
 int vport_index,
 struct mlx5_eswitch_rep *rep);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index ce3a2c040706..189d24dbd3e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -450,8 +450,7 @@ static int esw_create_offloads_fast_fdb_table(struct 
mlx5_eswitch *esw)
esw_size = min_t(int, MLX5_CAP_GEN(dev, max_flow_counter) * 
ESW_OFFLOADS_NUM_GROUPS,
 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
 
-   if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) &&
-   MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
+   if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
flags |= MLX5_FLOW_TABLE_TUNNEL_EN;
 
fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH,
@@ -1045,6 +1044,66 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch 
*esw, int nvfs, u8 *mode)
return 0;
 }
 
+int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   struct mlx5_eswitch *esw = dev->priv.eswitch;
+   int err;
+
+   if (!MLX5_CAP_GEN(dev, vport_group_manager))
+   return -EOPNOTSUPP;
+
+   if (esw->mode == SRIOV_NONE)
+   return -EOPNOTSUPP;
+
+   if (encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE &&
+   (!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) ||
+!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap)))
+   return -EOPNOTSUPP;
+
+   if (encap && encap != DEVLINK_ESWITCH_ENCAP_MODE_BASIC)
+   return -EOPNOTSUPP;
+
+   if (esw->mode == SRIOV_LEGACY) {
+   esw->offloads.encap = encap;
+   return 0;
+   }
+
+   if (esw->offloads.encap == encap)
+   return 0;
+
+   if (esw->offloads.num_flows > 0) {
+   esw_warn(dev, "Can't set encapsulation when flows are 
configured\n");
+   return -EOPNOTSUPP;
+   }
+
+   esw_destroy_offloads_fast_fdb_table(esw);
+
+   esw->offloads.encap = encap;
+   err = esw_create_offloads_fast_fdb_table(esw);
+   if (err) {
+   esw_warn(esw->dev, &

[net-next 2/5] net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode

2017-04-22 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Refactor the creation of the fast path FDB table that holds the
offloaded rules in SRIOV switchdev mode into its own function.

This will be used in the next patch to re-create the table under
different settings without going through legacy mode.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 69 +++---
 1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 992b380d36be..ce3a2c040706 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -426,31 +426,21 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
return err;
 }
 
-#define MAX_PF_SQ 256
 #define ESW_OFFLOADS_NUM_GROUPS  4
 
-static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
+static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
 {
-   int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
-   struct mlx5_flow_table_attr ft_attr = {};
-   int table_size, ix, esw_size, err = 0;
struct mlx5_core_dev *dev = esw->dev;
struct mlx5_flow_namespace *root_ns;
struct mlx5_flow_table *fdb = NULL;
-   struct mlx5_flow_group *g;
-   u32 *flow_group_in;
-   void *match_criteria;
+   int esw_size, err = 0;
u32 flags = 0;
 
-   flow_group_in = mlx5_vzalloc(inlen);
-   if (!flow_group_in)
-   return -ENOMEM;
-
root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
if (!root_ns) {
esw_warn(dev, "Failed to get FDB flow namespace\n");
err = -EOPNOTSUPP;
-   goto ns_err;
+   goto out;
}
 
esw_debug(dev, "Create offloads FDB table, min (max esw size(2^%d), max 
counters(%d)*groups(%d))\n",
@@ -471,10 +461,49 @@ static int esw_create_offloads_fdb_table(struct 
mlx5_eswitch *esw, int nvports)
if (IS_ERR(fdb)) {
err = PTR_ERR(fdb);
esw_warn(dev, "Failed to create Fast path FDB Table err %d\n", 
err);
-   goto fast_fdb_err;
+   goto out;
}
esw->fdb_table.fdb = fdb;
 
+out:
+   return err;
+}
+
+static void esw_destroy_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
+{
+   mlx5_destroy_flow_table(esw->fdb_table.fdb);
+}
+
+#define MAX_PF_SQ 256
+
+static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int 
nvports)
+{
+   int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+   struct mlx5_flow_table_attr ft_attr = {};
+   struct mlx5_core_dev *dev = esw->dev;
+   struct mlx5_flow_namespace *root_ns;
+   struct mlx5_flow_table *fdb = NULL;
+   int table_size, ix, err = 0;
+   struct mlx5_flow_group *g;
+   void *match_criteria;
+   u32 *flow_group_in;
+
+   esw_debug(esw->dev, "Create offloads FDB Tables\n");
+   flow_group_in = mlx5_vzalloc(inlen);
+   if (!flow_group_in)
+   return -ENOMEM;
+
+   root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
+   if (!root_ns) {
+   esw_warn(dev, "Failed to get FDB flow namespace\n");
+   err = -EOPNOTSUPP;
+   goto ns_err;
+   }
+
+   err = esw_create_offloads_fast_fdb_table(esw);
+   if (err)
+   goto fast_fdb_err;
+
table_size = nvports + MAX_PF_SQ + 1;
 
ft_attr.max_fte = table_size;
@@ -545,18 +574,18 @@ static int esw_create_offloads_fdb_table(struct 
mlx5_eswitch *esw, int nvports)
return err;
 }
 
-static void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
+static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw)
 {
if (!esw->fdb_table.fdb)
return;
 
-   esw_debug(esw->dev, "Destroy offloads FDB Table\n");
+   esw_debug(esw->dev, "Destroy offloads FDB Tables\n");
mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule);
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
 
mlx5_destroy_flow_table(esw->fdb_table.offloads.fdb);
-   mlx5_destroy_flow_table(esw->fdb_table.fdb);
+   esw_destroy_offloads_fast_fdb_table(esw);
 }
 
 static int esw_create_offloads_table(struct mlx5_eswitch *esw)
@@ -716,7 +745,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
 

Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

2017-03-12 Thread Saeed Mahameed
On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazet  wrote:
> On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote:
>
>> Problem is XDP TX :
>>
>> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI
>> RX is owned by current cpu.
>>
>> Since TX completion is using a different NAPI, I really do not believe
>> we can avoid an atomic operation, like a spinlock, to protect the list
>> of pages ( ring->page_cache )
>
> A quick fix for net-next would be :
>

Hi Eric, Good catch.

I don't think we need to complicate this with an expensive spinlock.
We can simply fix it by not enabling interrupts on the XDP TX CQ (not
arming this CQ at all), and handling XDP TX CQ completions from the RX
NAPI context, in a serial (atomic) manner before handling the RX
completions themselves. This way locking is not required, since all
page cache handling is done from the same context (RX NAPI).

This is how we do it in mlx5, and it is the best approach
(performance wise) since we delay XDP TX CQ completion handling
until we really need the space it holds (on new RX packets).

> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 
> aa074e57ce06fb2842fa1faabd156c3cd2fe10f5..e0b2ea8cefd6beef093c41bade199e3ec4f0291c
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -137,13 +137,17 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv 
> *priv,
> struct mlx4_en_rx_desc *rx_desc = ring->buf + (index * ring->stride);
> struct mlx4_en_rx_alloc *frags = ring->rx_info +
> (index << priv->log_rx_info);
> +
> if (ring->page_cache.index > 0) {
> +   spin_lock(&ring->page_cache.lock);
> +
> /* XDP uses a single page per frame */
> if (!frags->page) {
> ring->page_cache.index--;
> frags->page = 
> ring->page_cache.buf[ring->page_cache.index].page;
> frags->dma  = 
> ring->page_cache.buf[ring->page_cache.index].dma;
> }
> +   spin_unlock(&ring->page_cache.lock);
> frags->page_offset = XDP_PACKET_HEADROOM;
> rx_desc->data[0].addr = cpu_to_be64(frags->dma +
> XDP_PACKET_HEADROOM);
> @@ -277,6 +281,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> }
> }
>
> +   spin_lock_init(&ring->page_cache.lock);
> ring->prod = 0;
> ring->cons = 0;
> ring->size = size;
> @@ -419,10 +424,13 @@ bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
>
> if (cache->index >= MLX4_EN_CACHE_SIZE)
> return false;
> -
> -   cache->buf[cache->index].page = frame->page;
> -   cache->buf[cache->index].dma = frame->dma;
> -   cache->index++;
> +   spin_lock(&cache->lock);
> +   if (cache->index < MLX4_EN_CACHE_SIZE) {
> +   cache->buf[cache->index].page = frame->page;
> +   cache->buf[cache->index].dma = frame->dma;
> +   cache->index++;
> +   }
> +   spin_unlock(&cache->lock);
> return true;
>  }
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h 
> b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 
> 39f401aa30474e61c0b0029463b23a829ec35fa3..090a08020d13d8e11cc163ac9fc6ac6affccc463
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -258,7 +258,8 @@ struct mlx4_en_rx_alloc {
>  #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
>
>  struct mlx4_en_page_cache {
> -   u32 index;
> +   u32 index;
> +   spinlock_t  lock;
> struct {
> struct page *page;
> dma_addr_t  dma;
>
>


Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

2017-03-13 Thread Saeed Mahameed
On Sun, Mar 12, 2017 at 6:49 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Sun, 2017-03-12 at 17:49 +0200, Saeed Mahameed wrote:
>> On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote:
>> >
>> >> Problem is XDP TX :
>> >>
>> >> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI
>> >> RX is owned by current cpu.
>> >>
>> >> Since TX completion is using a different NAPI, I really do not believe
>> >> we can avoid an atomic operation, like a spinlock, to protect the list
>> >> of pages ( ring->page_cache )
>> >
>> > A quick fix for net-next would be :
>> >
>>
>> Hi Eric, Good catch.
>>
>> I don't think we need to complicate this with an expensive spinlock.
>> We can simply fix it by not enabling interrupts on the XDP TX CQ (not
>> arming this CQ at all), and handling XDP TX CQ completions from the RX
>> NAPI context, in a serial (atomic) manner before handling the RX
>> completions themselves. This way locking is not required, since all
>> page cache handling is done from the same context (RX NAPI).
>>
>> This is how we do it in mlx5, and it is the best approach
>> (performance wise) since we delay XDP TX CQ completion handling
>> until we really need the space it holds (on new RX packets).
>
> SGTM, can you provide the patch for mlx4 ?
>

of course, We will send it soon.

> Thanks !
>
>


[PATCH net 2/5] net/mlx5: Don't save PCI state when PCI error is detected

2017-03-10 Thread Saeed Mahameed
From: Daniel Jurgens <dani...@mellanox.com>

When a PCI error is detected, the PCI state could be corrupt, so don't
save it in that flow. Instead, save the state after initialization. After
restoring the PCI state during slot reset, save it again, since restoring
the state destroys the previously saved state info.

Fixes: 05ac2c0b7438 ('net/mlx5: Fix race between PCI error handlers and health work')
Signed-off-by: Daniel Jurgens <dani...@mellanox.com>

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index c4242a4e8130..e2bd600d19de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1352,6 +1352,7 @@ static int init_one(struct pci_dev *pdev,
if (err)
goto clean_load;
 
+   pci_save_state(pdev);
return 0;
 
 clean_load:
@@ -1407,9 +1408,8 @@ static pci_ers_result_t mlx5_pci_err_detected(struct 
pci_dev *pdev,
 
mlx5_enter_error_state(dev);
mlx5_unload_one(dev, priv, false);
-   /* In case of kernel call save the pci state and drain the health wq */
+   /* In case of kernel call drain the health wq */
if (state) {
-   pci_save_state(pdev);
mlx5_drain_health_wq(dev);
mlx5_pci_disable_device(dev);
}
@@ -1461,6 +1461,7 @@ static pci_ers_result_t mlx5_pci_slot_reset(struct 
pci_dev *pdev)
 
pci_set_master(pdev);
pci_restore_state(pdev);
+   pci_save_state(pdev);
 
if (wait_vital(pdev)) {
dev_err(>dev, "%s: wait_vital timed out\n", __func__);
-- 
2.11.0



[PATCH net 4/5] net/mlx5e: Avoid wrong identification of rules on deletion

2017-03-10 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

When deleting offloaded TC flows, we must correctly identify E-switch
rules. The current check could get it wrong w.r.t. rules set on the
PF, since it's possible to set NIC rules on the PF, switch to SRIOV
offloads mode and then attempt to delete a NIC rule.

To solve that, we add a flags field to offloaded rules, set it at
creation time and use it throughout the code where currently needed.

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 44406a5ec15d..79481f4cf264 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -48,9 +48,14 @@
 #include "eswitch.h"
 #include "vxlan.h"
 
+enum {
+   MLX5E_TC_FLOW_ESWITCH   = BIT(0),
+};
+
 struct mlx5e_tc_flow {
struct rhash_head   node;
u64 cookie;
+   u8  flags;
struct mlx5_flow_handle *rule;
struct list_headencap; /* flows sharing the same encap */
struct mlx5_esw_flow_attr *attr;
@@ -177,7 +182,7 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
mlx5_fc_destroy(priv->mdev, counter);
}
 
-   if (esw && esw->mode == SRIOV_OFFLOADS) {
+   if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
mlx5_eswitch_del_vlan_action(esw, flow->attr);
if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
mlx5e_detach_encap(priv, flow);
@@ -598,6 +603,7 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 }
 
 static int parse_cls_flower(struct mlx5e_priv *priv,
+   struct mlx5e_tc_flow *flow,
struct mlx5_flow_spec *spec,
struct tc_cls_flower_offload *f)
 {
@@ -609,7 +615,7 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 
err = __parse_cls_flower(priv, spec, f, &min_inline);
 
-   if (!err && esw->mode == SRIOV_OFFLOADS &&
+   if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH) &&
rep->vport != FDB_UPLINK_VPORT) {
if (min_inline > esw->offloads.inline_mode) {
netdev_warn(priv->netdev,
@@ -1132,23 +1138,19 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
   struct tc_cls_flower_offload *f)
 {
struct mlx5e_tc_table *tc = >fs.tc;
-   int err = 0;
-   bool fdb_flow = false;
+   int err, attr_size = 0;
u32 flow_tag, action;
struct mlx5e_tc_flow *flow;
struct mlx5_flow_spec *spec;
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   u8 flow_flags = 0;
 
-   if (esw && esw->mode == SRIOV_OFFLOADS)
-   fdb_flow = true;
-
-   if (fdb_flow)
-   flow = kzalloc(sizeof(*flow) +
-  sizeof(struct mlx5_esw_flow_attr),
-  GFP_KERNEL);
-   else
-   flow = kzalloc(sizeof(*flow), GFP_KERNEL);
+   if (esw && esw->mode == SRIOV_OFFLOADS) {
+   flow_flags = MLX5E_TC_FLOW_ESWITCH;
+   attr_size  = sizeof(struct mlx5_esw_flow_attr);
+   }
 
+   flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL);
spec = mlx5_vzalloc(sizeof(*spec));
if (!spec || !flow) {
err = -ENOMEM;
@@ -1156,12 +1158,13 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
}
 
flow->cookie = f->cookie;
+   flow->flags = flow_flags;
 
-   err = parse_cls_flower(priv, spec, f);
+   err = parse_cls_flower(priv, flow, spec, f);
if (err < 0)
goto err_free;
 
-   if (fdb_flow) {
+   if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
flow->attr  = (struct mlx5_esw_flow_attr *)(flow + 1);
err = parse_tc_fdb_actions(priv, f->exts, flow);
if (err < 0)
-- 
2.11.0



[PATCH net 1/5] net/mlx5: Fix create autogroup prev initializer

2017-03-10 Thread Saeed Mahameed
From: Paul Blakey <pa...@mellanox.com>

The autogroups list is a list of non overlapping group boundaries
sorted by their start index. If the autogroups list wasn't empty
and an empty group slot was found at the start of the list,
the new group was added to the end of the list instead of the
beginning, as the prev initializer was incorrect.
When this was repeated, it caused multiple groups to have
overlapping boundaries.

Fixed that by correctly initializing the prev pointer to the
start of the list.

Fixes: eccec8da3b4e ('net/mlx5: Keep autogroups list ordered')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Mark Bloch <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 2478516a61e2..ded27bb9a3b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1136,7 +1136,7 @@ static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft,
u32 *match_criteria)
 {
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
-   struct list_head *prev = ft->node.children.prev;
+   struct list_head *prev = &ft->node.children;
unsigned int candidate_index = 0;
struct mlx5_flow_group *fg;
void *match_criteria_addr;
-- 
2.11.0



[PATCH net 0/5] Mellanox mlx5 fixes 2017-03-09

2017-03-10 Thread Saeed Mahameed
Hi Dave,

This series contains some mlx5 core and ethernet driver fixes.

For -stable:
net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode (for kernel >= 4.10)
net/mlx5e: Avoid wrong identification of rules on deletion (for kernel >= 4.9)
net/mlx5: Don't save PCI state when PCI error is detected (for kernel >= 4.9)
net/mlx5: Fix create autogroup prev initializer (for kernel >= 4.9)

Thanks,
Saeed.

Daniel Jurgens (1):
  net/mlx5: Don't save PCI state when PCI error is detected

Eugenia Emantayev (1):
  net/mlx5e: Fix loopback selftest

Huy Nguyen (1):
  net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode

Or Gerlitz (1):
  net/mlx5e: Avoid wrong identification of rules on deletion

Paul Blakey (1):
  net/mlx5: Fix create autogroup prev initializer

 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 10 +++
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  |  5 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 33 --
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  5 ++--
 5 files changed, 28 insertions(+), 27 deletions(-)

-- 
2.11.0



[PATCH net 5/5] net/mlx5e: Fix loopback selftest

2017-03-10 Thread Saeed Mahameed
From: Eugenia Emantayev <euge...@mellanox.com>

Change packet type handler to ETH_P_IP instead of ETH_P_ALL
since we are already expecting an IP packet.

Also, using ETH_P_ALL will cause the loopback test packet type handler
to be called on all outgoing packets, especially our own self loopback
test SKB, which will be validated on xmit as well, and we don't want that.

Tested with:
ethtool -t ethX
validated that the loopback test passes.

Fixes: 0952da791c97 ('net/mlx5e: Add support for loopback selftest')
Signed-off-by: Eugenia Emantayev <euge...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 31e3cb7ee5fe..5621dcfda4f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -204,9 +204,6 @@ mlx5e_test_loopback_validate(struct sk_buff *skb,
struct iphdr *iph;
 
/* We are only going to peek, no need to clone the SKB */
-   if (skb->protocol != htons(ETH_P_IP))
-   goto out;
-
if (MLX5E_TEST_PKT_SIZE - ETH_HLEN > skb_headlen(skb))
goto out;
 
@@ -249,7 +246,7 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv *priv,
lbtp->loopback_ok = false;
	init_completion(&lbtp->comp);
 
-   lbtp->pt.type = htons(ETH_P_ALL);
+   lbtp->pt.type = htons(ETH_P_IP);
lbtp->pt.func = mlx5e_test_loopback_validate;
lbtp->pt.dev = priv->netdev;
lbtp->pt.af_packet_priv = lbtp;
-- 
2.11.0


