Re: [PATCH v2 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active
On Tue, Dec 6, 2016 at 8:27 PM, Martin KaFai Lau <ka...@fb.com> wrote: > On Tue, Dec 06, 2016 at 06:50:47PM +0200, Saeed Mahameed wrote: >> On Mon, Dec 5, 2016 at 9:55 PM, Martin KaFai Lau <ka...@fb.com> wrote: >> > On Mon, Dec 05, 2016 at 02:54:06AM +0200, Saeed Mahameed wrote: >> >> On Sun, Dec 4, 2016 at 5:17 AM, Martin KaFai Lau <ka...@fb.com> wrote: >> >> > Reserve XDP_PACKET_HEADROOM and honor bpf_xdp_adjust_head() >> >> > when XDP prog is active. This patch only affects the code >> >> > path when XDP is active. >> >> > >> >> > Signed-off-by: Martin KaFai Lau <ka...@fb.com> >> >> > --- >> >> >> >> Hi Martin, Sorry for the late review, i have some comments below >> >> >> >> > drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 +++-- >> >> > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 23 >> >> > +-- >> >> > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 9 + >> >> > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++- >> >> > 4 files changed, 39 insertions(+), 13 deletions(-) >> >> > >> >> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c >> >> > b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c >> >> > index 311c14153b8b..094a13b52cf6 100644 >> >> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c >> >> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c >> >> > @@ -51,7 +51,8 @@ >> >> > #include "mlx4_en.h" >> >> > #include "en_port.h" >> >> > >> >> > -#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * >> >> > VLAN_HLEN))) >> >> > +#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * >> >> > VLAN_HLEN) - \ >> >> > + XDP_PACKET_HEADROOM)) >> >> > >> >> > int mlx4_en_setup_tc(struct net_device *dev, u8 up) >> >> > { >> >> > @@ -1551,6 +1552,7 @@ int mlx4_en_start_port(struct net_device *dev) >> >> > struct mlx4_en_tx_ring *tx_ring; >> >> > int rx_index = 0; >> >> > int err = 0; >> >> > + int mtu; >> >> > int i, t; >> >> > int j; >> >> > u8 mc_list[16] = {0}; >> >> > @@ -1684,8 +1686,12 @@ int mlx4_en_start_port(struct net_device *dev) >> >> > } >> >> > >> >> > /* Configure port */ >> >> > + mtu = priv->rx_skb_size + ETH_FCS_LEN; >> >> > + if (priv->tx_ring_num[TX_XDP]) >> >> > + mtu += XDP_PACKET_HEADROOM; >> >> > + >> >> >> >> Why would the physical MTU care for the headroom you preserve for XDP >> >> prog? >> >> This is the wire MTU, it shouldn't be changed, please keep it as >> >> before, any preservation you make in packets buffers are needed only >> >> for FWD case or modify case (HW or wire should not care about them). >> > >> > Thanks for your feedback! >> >> Just doing my job :)) >> >> > >> > FWD: >> > packet received from a port >> > => process by a XDP prog >> > => XDP_TX out to the same port. >> > >> > For example, if the received packet has 1500 payload and the XDP prog >> > encapsulates it in an IPv6 header (+40 bytes). After testing, it cannot >> > be sent out due to the HW/wire MTU is 1500. >> > >> > Even the wire MTU info was passed to the XDP prog, there is not much a >> > XDP prog could do here other than dropping it. >> > >> > Hence, this patch gives guarantee to the XDP prog such that >> > it can always send out what it has received + XDP_PACKET_HEADROOM. >> > >> >> Still i am not convinced ! this is against common sense, >> this means that the XDP prog can send packets larger than the MTU >> seen on netdev! >> >> anyway if a packet with the size (MTU + XDP_PACKET_HEADROOM) was sent >> from XDP ring and HW allowed it to exit somehow (with the code you >> provided :)), most likely it will be dropped >> at the other end. > The MTU of our receiver side is larger than 1500. > > If the otherside could not handle >1500,
Re: [BUG] ethernet:mellanox:mlx5: Oops in health_recover get_nic_state(dev)
On Tue, Mar 28, 2017 at 2:45 AM, Goel, Sameerwrote: > Stack frame: > [ 1744.418958] [] get_nic_state+0x24/0x40 [mlx5_core] > [ 1744.425273] [] health_recover+0x28/0x80 [mlx5_core] > [ 1744.431496] [] process_one_work+0x150/0x460 > [ 1744.437218] [] worker_thread+0x50/0x4b8 > [ 1744.442609] [] kthread+0xd8/0xf0 > [ 1744.447377] [] ret_from_fork+0x10/0x20 > > Summary: > This issue was seen on QDF2400 system 30 mins after while running speccpu > 2006. During the test a recoverable PCIe error was seen that gave the > following log: > [ 1673.170969] pcieport 0002:00:00.0: aer_status: 0x4000, aer_mask: > 0x0040 > [ 1673.177961] pcieport 0002:00:00.0: aer_layer=Transaction Layer, > aer_agent=Requester ID > [ 1673.185832] pcieport 0002:00:00.0: aer_uncor_severity: 0x00462030 > [ 1675.536391] mlx5_core 0002:01:00.0: assert_var[0] 0x > [ 1675.541093] mlx5_core 0002:01:00.0: assert_var[1] 0x > [ 1675.546750] mlx5_core 0002:01:00.0: assert_var[2] 0x > [ 1675.552377] mlx5_core 0002:01:00.0: assert_var[3] 0x > [ 1675.558040] mlx5_core 0002:01:00.0: assert_var[4] 0x > [ 1675.563661] mlx5_core 0002:01:00.0: assert_exit_ptr 0x > [ 1675.569488] mlx5_core 0002:01:00.0: assert_callra 0x > [ 1675.575120] mlx5_core 0002:01:00.0: fw_ver 15.4095.65535 > [ 1675.580426] mlx5_core 0002:01:00.0: hw_id 0x > [ 1675.585363] mlx5_core 0002:01:00.0: irisc_index 255 > [ 1675.590242] mlx5_core 0002:01:00.0: synd 0xff: unrecognized error > [ 1675.596301] mlx5_core 0002:01:00.0: ext_synd 0x > [ 1675.601209] mlx5_core 0002:01:00.0: mlx5_enter_error_state:120:(pid 7205): > start > [ 1675.608613] mlx5_core 0002:01:00.0: mlx5_enter_error_state:127:(pid 7205): > end > > After the above log we see the above stackframe and a page fault due to > invalid dev pointer. > > So the the recovery work is queued and the timer is stopped. Somehow the > workqueue is not cleared and when it runs the dev pointer is invalid. > > This issue was difficult to repro and was seen only once in multiple runs on > a specific device. Hi Sameer, Thanks for the report, adding more relevant ppl Mohamad/Daniel Does the above ring a bell ? can you check ? Thanks Saeed.
[PATCH net 1/1] net/mlx5: Avoid dereferencing uninitialized pointer
From: Talat Batheesh <tal...@mellanox.com> In NETDEV_CHANGEUPPER event the upper_info field is valid only when linking is true. Otherwise it should be ignored. Fixes: 7907f23adc18 (net/mlx5: Implement RoCE LAG feature) Signed-off-by: Talat Batheesh <tal...@mellanox.com> Reviewed-by: Aviv Heller <av...@mellanox.com> Reviewed-by: Moni Shoua <mo...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- Hi Dave, I will appreciate it if you queue up this patch for v4.9-stable. Thanks in advance, Saeed. drivers/net/ethernet/mellanox/mlx5/core/lag.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c index 55957246c0e8..b5d5519542e8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c @@ -294,7 +294,7 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev, struct netdev_notifier_changeupper_info *info) { struct net_device *upper = info->upper_dev, *ndev_tmp; - struct netdev_lag_upper_info *lag_upper_info; + struct netdev_lag_upper_info *lag_upper_info = NULL; bool is_bonded; int bond_status = 0; int num_slaves = 0; @@ -303,7 +303,8 @@ static int mlx5_handle_changeupper_event(struct mlx5_lag *ldev, if (!netif_is_lag_master(upper)) return 0; - lag_upper_info = info->upper_info; + if (info->linking) + lag_upper_info = info->upper_info; /* The event may still be of interest if the slave does not belong to * us, but is enslaved to a master which has one or more of our netdevs -- 2.11.0
Re: [PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization
On Sat, Mar 25, 2017 at 6:54 PM, Tom Herbert <t...@herbertland.com> wrote: > On Fri, Mar 24, 2017 at 2:52 PM, Saeed Mahameed <sae...@mellanox.com> wrote: >> Hi Dave, >> >> This series provides some preformancee optimizations for mlx5e >> driver, especially for XDP TX flows. >> >> 1st patch is a simple change of rmb to dma_rmb in CQE fetch routine >> which shows a huge gain for both RX and TX packet rates. >> >> 2nd patch removes write combining logic from the driver TX handler >> and simplifies the TX logic while improving TX CPU utilization. >> >> All other patches combined provide some refactoring to the driver TX >> flows to allow some significant XDP TX improvements. >> >> More details and performance numbers per patch can be found in each patch >> commit message compared to the preceding patch. >> >> Overall performance improvemnets >> System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz >> >> Test case Baseline Now improvement >> --- >> TX packets (24 threads) 45Mpps54Mpps 20% >> TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5% >> XDP Drop (1 core) 14Mpps16.9Mpps20% >> XDP TX(1 core) 10.4Mpps 13.7Mpps31% >> > Awesome, and good timing. I'll be presenting XDP at IETF next and > would like to include these numbers in the presentation if you don't > mind... > Not at all, please go ahead. But as you see, the system i tested on is not that powerful. We can get even better results with a modern system. If you want i can provide you those numbers by mid-week.
[PATCH net-next 12/12] net/mlx5e: Different SQ types
Different SQ types (tx, xdp, ico) are growing apart, we separate them and remove unwanted parts in each one of them, to simplify data path and utilize data cache. Remove DB union from SQ structures since it is not needed anymore as we now have different SQ data type for each SQ. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 99 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 464 +- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 33 +- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 50 +-- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 2 +- 5 files changed, 392 insertions(+), 256 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 50f895fa5f31..bace9233dc1f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -319,13 +319,7 @@ struct mlx5e_sq_wqe_info { u8 num_wqebbs; }; -enum mlx5e_sq_type { - MLX5E_SQ_TXQ, - MLX5E_SQ_ICO, - MLX5E_SQ_XDP -}; - -struct mlx5e_sq { +struct mlx5e_txqsq { /* data path */ /* dirtied @completion */ @@ -339,18 +333,11 @@ struct mlx5e_sq { struct mlx5e_cqcq; - /* pointers to per tx element info: write@xmit, read@completion */ - union { - struct { - struct sk_buff **skb; - struct mlx5e_sq_dma *dma_fifo; - struct mlx5e_tx_wqe_info *wqe_info; - } txq; - struct mlx5e_sq_wqe_info *ico_wqe; - struct { - struct mlx5e_dma_info *di; - bool doorbell; - } xdp; + /* write@xmit, read@completion */ + struct { + struct sk_buff **skb; + struct mlx5e_sq_dma *dma_fifo; + struct mlx5e_tx_wqe_info *wqe_info; } db; /* read only */ @@ -372,7 +359,67 @@ struct mlx5e_sq { struct mlx5e_channel *channel; inttc; u32rate_limit; - u8 type; +} cacheline_aligned_in_smp; + +struct mlx5e_xdpsq { + /* data path */ + + /* dirtied @rx completion */ + u16cc; + u16pc; + + struct mlx5e_cqcq; + + /* write@xmit, read@completion */ + struct { + struct mlx5e_dma_info *di; + bool doorbell; + } db; + + /* read only */ + struct mlx5_wq_cyc wq; + void __iomem *uar_map; + u32sqn; + struct device *pdev; + __be32 mkey_be; + u8 min_inline_mode; + unsigned long state; + + /* control path */ + struct mlx5_wq_ctrlwq_ctrl; + struct mlx5e_channel *channel; +} cacheline_aligned_in_smp; + +struct mlx5e_icosq { + /* data path */ + + /* dirtied @completion */ + u16cc; + + /* dirtied @xmit */ + u16pc cacheline_aligned_in_smp; + u32dma_fifo_pc; + u16prev_cc; + + struct mlx5e_cqcq; + + /* write@xmit, read@completion */ + struct { + struct mlx5e_sq_wqe_info *ico_wqe; + } db; + + /* read only */ + struct mlx5_wq_cyc wq; + void __iomem *uar_map; + u32sqn; + u16edge; + struct device *pdev; + __be32 mkey_be; + unsigned long state; + + /* control path */ + struct mlx5_wq_ctrlwq_ctrl; + struct mlx5e_channel *channel; } cacheline_aligned_in_smp; static inline bool @@ -477,7 +524,7 @@ struct mlx5e_rq { /* XDP */ struct bpf_prog *xdp_prog; - struct mlx5e_sqxdpsq; + struct mlx5e_xdpsq xdpsq; /* control */ struct mlx5_wq_ctrlwq_ctrl; @@ -497,8 +544,8 @@ enum channel_flags { struct mlx5e_channel { /* data path */ struct mlx5e_rqrq; - struct mlx5e_sqsq[MLX5E_MAX_NUM_TC]; - struct mlx5e_sqicosq; /* internal control operations */ + struct mlx5e_txqsq sq[MLX5E_MAX_NUM_TC]; + struct mlx5e_icosq icosq; /* internal control operations */ bool xdp; struct napi_struct napi; struct device *pde
[PATCH net-next 03/12] net/mlx5e: Single bfreg (UAR) for all mlx5e SQs and netdevs
One is sufficient since Blue Flame is not supported anymore. This will also come in handy for switchdev mode to save resources, since VF representors will use same single UAR as well for their own SQs. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h| 1 - drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 9 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 17 +++-- include/linux/mlx5/driver.h | 1 + 4 files changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 85261e9ccf4a..22e4bad03f05 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -483,7 +483,6 @@ struct mlx5e_sq { /* control path */ struct mlx5_wq_ctrlwq_ctrl; - struct mlx5_sq_bfreg bfreg; struct mlx5e_channel *channel; inttc; u32rate_limit; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index bd898d8deda0..20bdbe685795 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -107,10 +107,18 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev) goto err_dealloc_transport_domain; } + err = mlx5_alloc_bfreg(mdev, >bfreg, false, false); + if (err) { + mlx5_core_err(mdev, "alloc bfreg failed, %d\n", err); + goto err_destroy_mkey; + } + INIT_LIST_HEAD(>mlx5e_res.td.tirs_list); return 0; +err_destroy_mkey: + mlx5_core_destroy_mkey(mdev, >mkey); err_dealloc_transport_domain: mlx5_core_dealloc_transport_domain(mdev, res->td.tdn); err_dealloc_pd: @@ -122,6 +130,7 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) { struct mlx5e_resources *res = >mlx5e_res; + mlx5_free_bfreg(mdev, >bfreg); mlx5_core_destroy_mkey(mdev, >mkey); mlx5_core_dealloc_transport_domain(mdev, res->td.tdn); mlx5_core_dealloc_pd(mdev, res->pdn); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f9bcbd277adb..49c1769d13b9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1016,18 +1016,14 @@ static int mlx5e_create_sq(struct mlx5e_channel *c, sq->mkey_be = c->mkey_be; sq->channel = c; sq->tc= tc; + sq->uar_map = mdev->mlx5e_res.bfreg.map; - err = mlx5_alloc_bfreg(mdev, >bfreg, false, false); - if (err) - return err; - - sq->uar_map = sq->bfreg.map; param->wq.db_numa_node = cpu_to_node(c->cpu); err = mlx5_wq_cyc_create(mdev, >wq, sqc_wq, >wq, >wq_ctrl); if (err) - goto err_unmap_free_uar; + return err; sq->wq.db = >wq.db[MLX5_SND_DBR]; @@ -1053,20 +1049,13 @@ static int mlx5e_create_sq(struct mlx5e_channel *c, err_sq_wq_destroy: mlx5_wq_destroy(>wq_ctrl); -err_unmap_free_uar: - mlx5_free_bfreg(mdev, >bfreg); - return err; } static void mlx5e_destroy_sq(struct mlx5e_sq *sq) { - struct mlx5e_channel *c = sq->channel; - struct mlx5e_priv *priv = c->priv; - mlx5e_free_sq_db(sq); mlx5_wq_destroy(>wq_ctrl); - mlx5_free_bfreg(priv->mdev, >bfreg); } static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) @@ -1103,7 +1092,7 @@ static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) MLX5_SET(sqc, sqc, tis_lst_sz, param->type == MLX5E_SQ_ICO ? 0 : 1); MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC); - MLX5_SET(wq, wq, uar_page, sq->bfreg.index); + MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.bfreg.index); MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT); MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 2fcff6b4503f..f50864626230 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -728,6 +728,7 @@ struct mlx5e_resources { u32pdn; struct mlx5_td td; struct mlx5_core_mkey mkey; + struct mlx5_sq_bfreg bfreg; }; struct mlx5_core_dev { -- 2.11.0
[PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization
Hi Dave, This series provides some preformancee optimizations for mlx5e driver, especially for XDP TX flows. 1st patch is a simple change of rmb to dma_rmb in CQE fetch routine which shows a huge gain for both RX and TX packet rates. 2nd patch removes write combining logic from the driver TX handler and simplifies the TX logic while improving TX CPU utilization. All other patches combined provide some refactoring to the driver TX flows to allow some significant XDP TX improvements. More details and performance numbers per patch can be found in each patch commit message compared to the preceding patch. Overall performance improvemnets System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Baseline Now improvement --- TX packets (24 threads) 45Mpps54Mpps 20% TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5% XDP Drop (1 core) 14Mpps16.9Mpps20% XDP TX(1 core) 10.4Mpps 13.7Mpps31% Thanks, Saeed. Saeed Mahameed (12): net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routine net/mlx5e: Xmit, no write combining net/mlx5e: Single bfreg (UAR) for all mlx5e SQs and netdevs net/mlx5e: Move XDP completion functions to rx file net/mlx5e: Move mlx5e_rq struct declaration net/mlx5e: Move XDP SQ instance into RQ net/mlx5e: Poll XDP TX CQ before RX CQ net/mlx5e: Optimize XDP frame xmit net/mlx5e: Generalize tx helper functions for different SQ types net/mlx5e: Proper names for SQ/RQ/CQ functions net/mlx5e: Generalize SQ create/modify/destroy functions net/mlx5e: Different SQ types drivers/net/ethernet/mellanox/mlx5/core/en.h | 319 +- .../net/ethernet/mellanox/mlx5/core/en_common.c| 9 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 644 + drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 124 +++- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 147 + drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 70 +-- include/linux/mlx5/driver.h| 1 + 7 files changed, 716 insertions(+), 598 deletions(-) -- 2.11.0
[PATCH net-next 09/12] net/mlx5e: Generalize tx helper functions for different SQ types
In the next patches we will introduce different SQ types, for that we here generalize some TX helper functions to work with more basic SQ parameters, in order to re-use them for the different SQ types. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 35 - drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 +--- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 10 +++--- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 37 +-- 4 files changed, 48 insertions(+), 47 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index f02d2cb8d148..50f895fa5f31 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -375,10 +375,10 @@ struct mlx5e_sq { u8 type; } cacheline_aligned_in_smp; -static inline bool mlx5e_sq_has_room_for(struct mlx5e_sq *sq, u16 n) +static inline bool +mlx5e_wqc_has_room_for(struct mlx5_wq_cyc *wq, u16 cc, u16 pc, u16 n) { - return (((sq->wq.sz_m1 & (sq->cc - sq->pc)) >= n) || - (sq->cc == sq->pc)); + return (((wq->sz_m1 & (cc - pc)) >= n) || (cc == pc)); } struct mlx5e_dma_info { @@ -721,7 +721,6 @@ struct mlx5e_priv { void mlx5e_build_ptys2ethtool_map(void); -void mlx5e_send_nop(struct mlx5e_sq *sq, bool notify_hw); u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb, void *accel_priv, select_queue_fallback_t fallback); netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev); @@ -807,20 +806,40 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode); void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type); -static inline void -mlx5e_tx_notify_hw(struct mlx5e_sq *sq, struct mlx5_wqe_ctrl_seg *ctrl) +static inline +struct mlx5e_tx_wqe *mlx5e_post_nop(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc) { + u16 pi = *pc & wq->sz_m1; + struct mlx5e_tx_wqe*wqe = mlx5_wq_cyc_get_wqe(wq, pi); + struct mlx5_wqe_ctrl_seg *cseg = >ctrl; + + memset(cseg, 0, sizeof(*cseg)); + + cseg->opmod_idx_opcode = cpu_to_be32((*pc << 8) | MLX5_OPCODE_NOP); + cseg->qpn_ds = cpu_to_be32((sqn << 8) | 0x01); + + (*pc)++; + + return wqe; +} + +static inline +void mlx5e_notify_hw(struct mlx5_wq_cyc *wq, u16 pc, +void __iomem *uar_map, +struct mlx5_wqe_ctrl_seg *ctrl) +{ + ctrl->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE; /* ensure wqe is visible to device before updating doorbell record */ dma_wmb(); - *sq->wq.db = cpu_to_be32(sq->pc); + *wq->db = cpu_to_be32(pc); /* ensure doorbell record is visible to device before ringing the * doorbell */ wmb(); - mlx5_write64((__be32 *)ctrl, sq->uar_map, NULL); + mlx5_write64((__be32 *)ctrl, uar_map, NULL); } static inline void mlx5e_cq_arm(struct mlx5e_cq *cq) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index d39ee6669b8e..7faf2bcccfa6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -847,6 +847,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, { struct mlx5e_sq *sq = >icosq; u16 pi = sq->pc & sq->wq.sz_m1; + struct mlx5e_tx_wqe *nopwqe; int err; err = mlx5e_create_rq(c, param, rq); @@ -867,8 +868,9 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, sq->db.ico_wqe[pi].opcode = MLX5_OPCODE_NOP; sq->db.ico_wqe[pi].num_wqebbs = 1; - mlx5e_send_nop(sq, true); /* trigger mlx5e_post_rx_wqes() */ - + nopwqe = mlx5e_post_nop(>wq, sq->sqn, >pc); + mlx5e_notify_hw(>wq, sq->pc, sq->uar_map, >ctrl); + sq->stats.nop++; /* TODO no need for SQ stats in ico */ return 0; err_disable_rq: @@ -1202,9 +1204,12 @@ static void mlx5e_close_sq(struct mlx5e_sq *sq) netif_tx_disable_queue(sq->txq); /* last doorbell out, godspeed .. */ - if (mlx5e_sq_has_room_for(sq, 1)) { + if (mlx5e_wqc_has_room_for(>wq, sq->cc, sq->pc, 1)) { + struct mlx5e_tx_wqe *nop; + sq->db.txq.skb[(sq->pc & sq->wq.sz_m1)] = NULL; - mlx5e_send_nop(sq, true); + nop = mlx5e_post_nop(>wq, sq->sqn, >pc); + mlx5e_notify_hw(>wq, sq->pc, sq->uar_map, >ctrl);
[PATCH net-next 08/12] net/mlx5e: Optimize XDP frame xmit
XDP SQ has a fixed size WQE (MLX5E_XDP_TX_WQEBBS = 1) and only posts one kind of WQE (MLX5_OPCODE_SEND), Also we initialize SQ descriptors static fields once on open_xdpsq, rather than every time on critical path. Optimize the code in light of those facts and add a prefetch of the TX descriptor first thing in the xdp xmit function. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Before Nowimprovement --- XDP TX (1 core) 13Mpps13.7Mpps 5% Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 43 +++ drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 41 ++--- 3 files changed, 47 insertions(+), 42 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 5e4ae94c9f6a..f02d2cb8d148 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -116,12 +116,8 @@ (DIV_ROUND_UP(sizeof(struct mlx5e_umr_wqe), MLX5_SEND_WQE_BB)) #define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN) -#define MLX5E_XDP_IHS_DS_COUNT \ - DIV_ROUND_UP(MLX5E_XDP_MIN_INLINE - 2, MLX5_SEND_WQE_DS) #define MLX5E_XDP_TX_DS_COUNT \ ((sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS) + 1 /* SG DS */) -#define MLX5E_XDP_TX_WQEBBS \ - DIV_ROUND_UP(MLX5E_XDP_TX_DS_COUNT, MLX5_SEND_WQEBB_NUM_DS) #define MLX5E_NUM_MAIN_GROUPS 9 @@ -352,7 +348,6 @@ struct mlx5e_sq { } txq; struct mlx5e_sq_wqe_info *ico_wqe; struct { - struct mlx5e_sq_wqe_info *wqe_info; struct mlx5e_dma_info *di; bool doorbell; } xdp; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 210033187bfe..d39ee6669b8e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -894,7 +894,6 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq) static void mlx5e_free_sq_xdp_db(struct mlx5e_sq *sq) { kfree(sq->db.xdp.di); - kfree(sq->db.xdp.wqe_info); } static int mlx5e_alloc_sq_xdp_db(struct mlx5e_sq *sq, int numa) @@ -903,9 +902,7 @@ static int mlx5e_alloc_sq_xdp_db(struct mlx5e_sq *sq, int numa) sq->db.xdp.di = kzalloc_node(sizeof(*sq->db.xdp.di) * wq_sz, GFP_KERNEL, numa); - sq->db.xdp.wqe_info = kzalloc_node(sizeof(*sq->db.xdp.wqe_info) * wq_sz, - GFP_KERNEL, numa); - if (!sq->db.xdp.di || !sq->db.xdp.wqe_info) { + if (!sq->db.xdp.di) { mlx5e_free_sq_xdp_db(sq); return -ENOMEM; } @@ -993,7 +990,7 @@ static int mlx5e_sq_get_max_wqebbs(u8 sq_type) case MLX5E_SQ_ICO: return MLX5E_ICOSQ_MAX_WQEBBS; case MLX5E_SQ_XDP: - return MLX5E_XDP_TX_WQEBBS; + return 1; } return MLX5_SEND_WQE_MAX_WQEBBS; } @@ -1513,6 +1510,40 @@ static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev) MLX5E_MAX_NUM_CHANNELS); } +static int mlx5e_open_xdpsq(struct mlx5e_channel *c, + struct mlx5e_sq_param *param, + struct mlx5e_sq *sq) +{ + unsigned int ds_cnt = MLX5E_XDP_TX_DS_COUNT; + unsigned int inline_hdr_sz = 0; + int err; + int i; + + err = mlx5e_open_sq(c, 0, param, sq); + if (err) + return err; + + if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) { + inline_hdr_sz = MLX5E_XDP_MIN_INLINE; + ds_cnt++; + } + + /* Pre initialize fixed WQE fields */ + for (i = 0; i < mlx5_wq_cyc_get_size(>wq); i++) { + struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(>wq, i); + struct mlx5_wqe_ctrl_seg *cseg = >ctrl; + struct mlx5_wqe_eth_seg *eseg = >eth; + struct mlx5_wqe_data_seg *dseg; + + cseg->qpn_ds = cpu_to_be32((sq->sqn << 8) | ds_cnt); + eseg->inline_hdr.sz = cpu_to_be16(inline_hdr_sz); + + dseg = (struct mlx5_wqe_data_seg *)cseg + (ds_cnt - 1); + dseg->lkey = sq->mkey_be; + } + return 0; +} + static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, struct mlx5e_channel_param *cparam, struct mlx5e_channel **cp) @@ -15
[PATCH net-next 02/12] net/mlx5e: Xmit, no write combining
mlx5e netdev Blue Flame (write combining) support demands a lot of overhead for a little latency gain for some special cases, this overhead is hurting the common case. Here we remove xmit Blue Flame support by creating all bfregs with no write combining for all SQs, and we remove a lot of BF logic and conditions from xmit data path. Simplify mlx5e_tx_notify_hw (doorbell function) by removing BF related code and by removing one memory barrier needed for WC mapped SQ doorbell buffers, which no longer exist. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Before Now improvement --- TX packets (24 threads) 50Mpps 54Mpps8% Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 20 ++- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 +--- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +-- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 42 ++- 4 files changed, 9 insertions(+), 63 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index dc52053128bc..85261e9ccf4a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -111,7 +111,6 @@ #define MLX5E_MAX_NUM_SQS (MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC) #define MLX5E_TX_CQ_POLL_BUDGET128 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */ -#define MLX5E_SQ_BF_BUDGET 16 #define MLX5E_ICOSQ_MAX_WQEBBS \ (DIV_ROUND_UP(sizeof(struct mlx5e_umr_wqe), MLX5_SEND_WQE_BB)) @@ -426,7 +425,6 @@ struct mlx5e_sq_dma { enum { MLX5E_SQ_STATE_ENABLED, - MLX5E_SQ_STATE_BF_ENABLE, }; struct mlx5e_sq_wqe_info { @@ -450,9 +448,6 @@ struct mlx5e_sq { /* dirtied @xmit */ u16pc cacheline_aligned_in_smp; u32dma_fifo_pc; - u16bf_offset; - u16prev_cc; - u8 bf_budget; struct mlx5e_sq_stats stats; struct mlx5e_cqcq; @@ -478,7 +473,6 @@ struct mlx5e_sq { void __iomem *uar_map; struct netdev_queue *txq; u32sqn; - u16bf_buf_size; u16max_inline; u8 min_inline_mode; u16edge; @@ -818,11 +812,9 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode); void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type); -static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq, - struct mlx5_wqe_ctrl_seg *ctrl, int bf_sz) +static inline void +mlx5e_tx_notify_hw(struct mlx5e_sq *sq, struct mlx5_wqe_ctrl_seg *ctrl) { - u16 ofst = sq->bf_offset; - /* ensure wqe is visible to device before updating doorbell record */ dma_wmb(); @@ -832,14 +824,8 @@ static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq, * doorbell */ wmb(); - if (bf_sz) - __iowrite64_copy(sq->uar_map + ofst, ctrl, bf_sz); - else - mlx5_write64((__be32 *)ctrl, sq->uar_map + ofst, NULL); - /* flush the write-combining mapped buffer */ - wmb(); - sq->bf_offset ^= sq->bf_buf_size; + mlx5_write64((__be32 *)ctrl, sq->uar_map, NULL); } static inline void mlx5e_cq_arm(struct mlx5e_cq *cq) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index ddd7464c6b45..f9bcbd277adb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1017,7 +1017,7 @@ static int mlx5e_create_sq(struct mlx5e_channel *c, sq->channel = c; sq->tc= tc; - err = mlx5_alloc_bfreg(mdev, >bfreg, MLX5_CAP_GEN(mdev, bf), false); + err = mlx5_alloc_bfreg(mdev, >bfreg, false, false); if (err) return err; @@ -1030,10 +1030,7 @@ static int mlx5e_create_sq(struct mlx5e_channel *c, goto err_unmap_free_uar; sq->wq.db = >wq.db[MLX5_SND_DBR]; - if (sq->bfreg.wc) - set_bit(MLX5E_SQ_STATE_BF_ENABLE, >state); - sq->bf_buf_size = (1 << MLX5_CAP_GEN(mdev, log_bf_reg_size)) / 2; sq->max_inline = param->max_inline; sq->min_inline_mode = param->min_inline_mode; @@ -1050,7 +1047,6 @@ static int mlx5e_create_sq(struct mlx5e_channel *c, } sq
[PATCH net-next 05/12] net/mlx5e: Move mlx5e_rq struct declaration
Move struct mlx5e_rq and friends to appear after mlx5e_sq declaration in en.h. We will need this for next patch to move the mlx5e_sq instance into mlx5e_rq struct for XDP SQs. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 213 +-- 1 file changed, 105 insertions(+), 108 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index fce0eca0701c..8d789a25a1c0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -297,19 +297,113 @@ struct mlx5e_cq { struct mlx5_frag_wq_ctrl wq_ctrl; } cacheline_aligned_in_smp; -struct mlx5e_rq; -typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq *rq, - struct mlx5_cqe64 *cqe); -typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq *rq, struct mlx5e_rx_wqe *wqe, - u16 ix); +struct mlx5e_tx_wqe_info { + u32 num_bytes; + u8 num_wqebbs; + u8 num_dma; +}; + +enum mlx5e_dma_map_type { + MLX5E_DMA_MAP_SINGLE, + MLX5E_DMA_MAP_PAGE +}; + +struct mlx5e_sq_dma { + dma_addr_t addr; + u32 size; + enum mlx5e_dma_map_type type; +}; + +enum { + MLX5E_SQ_STATE_ENABLED, +}; + +struct mlx5e_sq_wqe_info { + u8 opcode; + u8 num_wqebbs; +}; -typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq *rq, u16 ix); +enum mlx5e_sq_type { + MLX5E_SQ_TXQ, + MLX5E_SQ_ICO, + MLX5E_SQ_XDP +}; + +struct mlx5e_sq { + /* data path */ + + /* dirtied @completion */ + u16cc; + u32dma_fifo_cc; + + /* dirtied @xmit */ + u16pc cacheline_aligned_in_smp; + u32dma_fifo_pc; + struct mlx5e_sq_stats stats; + + struct mlx5e_cqcq; + + /* pointers to per tx element info: write@xmit, read@completion */ + union { + struct { + struct sk_buff **skb; + struct mlx5e_sq_dma *dma_fifo; + struct mlx5e_tx_wqe_info *wqe_info; + } txq; + struct mlx5e_sq_wqe_info *ico_wqe; + struct { + struct mlx5e_sq_wqe_info *wqe_info; + struct mlx5e_dma_info *di; + bool doorbell; + } xdp; + } db; + + /* read only */ + struct mlx5_wq_cyc wq; + u32dma_fifo_mask; + void __iomem *uar_map; + struct netdev_queue *txq; + u32sqn; + u16max_inline; + u8 min_inline_mode; + u16edge; + struct device *pdev; + struct mlx5e_tstamp *tstamp; + __be32 mkey_be; + unsigned long state; + + /* control path */ + struct mlx5_wq_ctrlwq_ctrl; + struct mlx5e_channel *channel; + inttc; + u32rate_limit; + u8 type; +} cacheline_aligned_in_smp; + +static inline bool mlx5e_sq_has_room_for(struct mlx5e_sq *sq, u16 n) +{ + return (((sq->wq.sz_m1 & (sq->cc - sq->pc)) >= n) || + (sq->cc == sq->pc)); +} struct mlx5e_dma_info { struct page *page; dma_addr_t addr; }; +struct mlx5e_umr_dma_info { + __be64*mtt; + dma_addr_t mtt_addr; + struct mlx5e_dma_info dma_info[MLX5_MPWRQ_PAGES_PER_WQE]; + struct mlx5e_umr_wqe wqe; +}; + +struct mlx5e_mpw_info { + struct mlx5e_umr_dma_info umr; + u16 consumed_strides; + u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE]; +}; + struct mlx5e_rx_am_stats { int ppms; /* packets per msec */ int epms; /* events per msec */ @@ -346,6 +440,11 @@ struct mlx5e_page_cache { struct mlx5e_dma_info page_cache[MLX5E_CACHE_SIZE]; }; +struct mlx5e_rq; +typedef void (*mlx5e_fp_handle_rx_cqe)(struct mlx5e_rq*, struct mlx5_cqe64*); +typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq*, struct mlx5e_rx_wqe*, u16); +typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq*, u16); + struct mlx5e_rq { /* data path */ struct mlx5_wq_ll wq; @@ -393,108 +492,6 @@ struct mlx5e_rq { struct mlx5_core_mkey umr_mkey; } cacheline_aligned_in_smp; -struct mlx5e_umr_dma_info { - __be64*mtt; - dma_addr_t mtt_addr; - struct mlx5e_dma_info dma_info[MLX5_MPWRQ_PAGES_PER_
[PATCH net-next 10/12] net/mlx5e: Proper names for SQ/RQ/CQ functions
Rename mlx5e_{create,destroy}_{sq,rq,cq} to mlx5e_{alloc,free}_{sq,rq,cq}. Rename mlx5e_{enable,disable}_{sq,rq,cq} to mlx5e_{create,destroy}_{sq,rq,cq}. mlx5e_{enable,disable}_{sq,rq,cq} used to actually create/destroy the SQ in FW, so we rename them to align the functions names with FW semantics. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 126 +++--- 1 file changed, 63 insertions(+), 63 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 7faf2bcccfa6..d03afa535064 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -539,9 +539,9 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq) return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, >umr_mkey); } -static int mlx5e_create_rq(struct mlx5e_channel *c, - struct mlx5e_rq_param *param, - struct mlx5e_rq *rq) +static int mlx5e_alloc_rq(struct mlx5e_channel *c, + struct mlx5e_rq_param *param, + struct mlx5e_rq *rq) { struct mlx5e_priv *priv = c->priv; struct mlx5_core_dev *mdev = priv->mdev; @@ -674,7 +674,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, return err; } -static void mlx5e_destroy_rq(struct mlx5e_rq *rq) +static void mlx5e_free_rq(struct mlx5e_rq *rq) { int i; @@ -699,7 +699,7 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq) mlx5_wq_destroy(>wq_ctrl); } -static int mlx5e_enable_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param) +static int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param) { struct mlx5e_priv *priv = rq->priv; struct mlx5_core_dev *mdev = priv->mdev; @@ -798,7 +798,7 @@ static int mlx5e_modify_rq_vsd(struct mlx5e_rq *rq, bool vsd) return err; } -static void mlx5e_disable_rq(struct mlx5e_rq *rq) +static void mlx5e_destroy_rq(struct mlx5e_rq *rq) { mlx5_core_destroy_rq(rq->priv->mdev, rq->rqn); } @@ -850,18 +850,18 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_tx_wqe *nopwqe; int err; - err = mlx5e_create_rq(c, param, rq); + err = mlx5e_alloc_rq(c, param, rq); if (err) return err; - err = mlx5e_enable_rq(rq, param); + err = mlx5e_create_rq(rq, param); if (err) - goto err_destroy_rq; + goto err_free_rq; set_bit(MLX5E_RQ_STATE_ENABLED, >state); err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY); if (err) - goto err_disable_rq; + goto err_destroy_rq; if (param->am_enabled) set_bit(MLX5E_RQ_STATE_AM, >rq.state); @@ -873,11 +873,11 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, sq->stats.nop++; /* TODO no need for SQ stats in ico */ return 0; -err_disable_rq: - clear_bit(MLX5E_RQ_STATE_ENABLED, >state); - mlx5e_disable_rq(rq); err_destroy_rq: + clear_bit(MLX5E_RQ_STATE_ENABLED, >state); mlx5e_destroy_rq(rq); +err_free_rq: + mlx5e_free_rq(rq); return err; } @@ -888,9 +888,9 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq) napi_synchronize(>channel->napi); /* prevent mlx5e_post_rx_wqes */ cancel_work_sync(>am.work); - mlx5e_disable_rq(rq); - mlx5e_free_rx_descs(rq); mlx5e_destroy_rq(rq); + mlx5e_free_rx_descs(rq); + mlx5e_free_rq(rq); } static void mlx5e_free_sq_xdp_db(struct mlx5e_sq *sq) @@ -997,10 +997,10 @@ static int mlx5e_sq_get_max_wqebbs(u8 sq_type) return MLX5_SEND_WQE_MAX_WQEBBS; } -static int mlx5e_create_sq(struct mlx5e_channel *c, - int tc, - struct mlx5e_sq_param *param, - struct mlx5e_sq *sq) +static int mlx5e_alloc_sq(struct mlx5e_channel *c, + int tc, + struct mlx5e_sq_param *param, + struct mlx5e_sq *sq) { struct mlx5e_priv *priv = c->priv; struct mlx5_core_dev *mdev = priv->mdev; @@ -1051,13 +1051,13 @@ static int mlx5e_create_sq(struct mlx5e_channel *c, return err; } -static void mlx5e_destroy_sq(struct mlx5e_sq *sq) +static void mlx5e_free_sq(struct mlx5e_sq *sq) { mlx5e_free_sq_db(sq); mlx5_wq_destroy(>wq_ctrl); } -static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) +static int mlx5e_create_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) { struct mlx5e_channel *c = sq->channel; struct mlx5e_priv *priv = c->priv; @@ -
[PATCH net-next 04/12] net/mlx5e: Move XDP completion functions to rx file
XDP code belongs to RX path, move mlx5e_poll_xdp_tx_cq and mlx5e_free_xdp_tx_descs to en_rx.c. Rename them to mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 + drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 82 +++ drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 24 +-- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 62 + 4 files changed, 86 insertions(+), 84 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 22e4bad03f05..fce0eca0701c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -737,6 +737,8 @@ void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event); int mlx5e_napi_poll(struct napi_struct *napi, int budget); bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget); int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget); +bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq); +void mlx5e_free_xdpsq_descs(struct mlx5e_sq *sq); void mlx5e_free_sq_descs(struct mlx5e_sq *sq); void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 873b3085756c..bc74d6032a5c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -989,3 +989,85 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget) return work_done; } + +bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq) +{ + struct mlx5e_sq *sq; + u16 sqcc; + int i; + + sq = container_of(cq, struct mlx5e_sq, cq); + + if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, >state))) + return false; + + /* sq->cc must be updated only after mlx5_cqwq_update_db_record(), +* otherwise a cq overrun may occur +*/ + sqcc = sq->cc; + + for (i = 0; i < MLX5E_TX_CQ_POLL_BUDGET; i++) { + struct mlx5_cqe64 *cqe; + u16 wqe_counter; + bool last_wqe; + + cqe = mlx5e_get_cqe(cq); + if (!cqe) + break; + + mlx5_cqwq_pop(>wq); + + wqe_counter = be16_to_cpu(cqe->wqe_counter); + + do { + struct mlx5e_sq_wqe_info *wi; + struct mlx5e_dma_info *di; + u16 ci; + + last_wqe = (sqcc == wqe_counter); + + ci = sqcc & sq->wq.sz_m1; + di = >db.xdp.di[ci]; + wi = >db.xdp.wqe_info[ci]; + + if (unlikely(wi->opcode == MLX5_OPCODE_NOP)) { + sqcc++; + continue; + } + + sqcc += wi->num_wqebbs; + /* Recycle RX page */ + mlx5e_page_release(>channel->rq, di, true); + } while (!last_wqe); + } + + mlx5_cqwq_update_db_record(>wq); + + /* ensure cq space is freed before enabling more cqes */ + wmb(); + + sq->cc = sqcc; + return (i == MLX5E_TX_CQ_POLL_BUDGET); +} + +void mlx5e_free_xdpsq_descs(struct mlx5e_sq *sq) +{ + struct mlx5e_sq_wqe_info *wi; + struct mlx5e_dma_info *di; + u16 ci; + + while (sq->cc != sq->pc) { + ci = sq->cc & sq->wq.sz_m1; + di = >db.xdp.di[ci]; + wi = >db.xdp.wqe_info[ci]; + + if (wi->opcode == MLX5_OPCODE_NOP) { + sq->cc++; + continue; + } + + sq->cc += wi->num_wqebbs; + + mlx5e_page_release(>channel->rq, di, false); + } +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index eec4354208ee..7497b6ac4382 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -493,28 +493,6 @@ static void mlx5e_free_txq_sq_descs(struct mlx5e_sq *sq) } } -static void mlx5e_free_xdp_sq_descs(struct mlx5e_sq *sq) -{ - struct mlx5e_sq_wqe_info *wi; - struct mlx5e_dma_info *di; - u16 ci; - - while (sq->cc != sq->pc) { - ci = sq->cc & sq->wq.sz_m1; - di = >db.xdp.di[ci]; - wi = >db.xdp.wqe_info[ci]; - - if (wi->opcode == MLX5_OPCODE_NOP) { - sq->cc++; - continue; - } - - sq->cc += wi->
[PATCH net-next 01/12] net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routine
Use dma_rmb in mlx5e_get_cqe rather than aggressive rmb (at least on some architectures), this should help improve the performance on such CPU archs where dma_rmb is optimized. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Baseline Now improvement --- TX packets (24 threads) 45Mpps50Mpps 11% TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5% XDP Drop (1 core) 14Mpps16.9Mpps20% XDP TX(1 core) 10.4Mpps 12Mpps 15% Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c index e5c12a732aa1..d8cda2f6239b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c @@ -44,7 +44,7 @@ struct mlx5_cqe64 *mlx5e_get_cqe(struct mlx5e_cq *cq) return NULL; /* ensure cqe content is read after cqe ownership bit */ - rmb(); + dma_rmb(); return cqe; } -- 2.11.0
[PATCH net-next 06/12] net/mlx5e: Move XDP SQ instance into RQ
To save many rq->channel->sq dereferences in fast-path. And rename it to xdpsq. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +++- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 18 +++--- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 2 +- 4 files changed, 21 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 8d789a25a1c0..5e4ae94c9f6a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -479,7 +479,10 @@ struct mlx5e_rq { u16rx_headroom; struct mlx5e_rx_am am; /* Adaptive Moderation */ + + /* XDP */ struct bpf_prog *xdp_prog; + struct mlx5e_sqxdpsq; /* control */ struct mlx5_wq_ctrlwq_ctrl; @@ -499,7 +502,6 @@ enum channel_flags { struct mlx5e_channel { /* data path */ struct mlx5e_rqrq; - struct mlx5e_sqxdp_sq; struct mlx5e_sqsq[MLX5E_MAX_NUM_TC]; struct mlx5e_sqicosq; /* internal control operations */ bool xdp; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 49c1769d13b9..210033187bfe 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1562,7 +1562,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, goto err_close_tx_cqs; /* XDP SQ CQ params are same as normal TXQ sq CQ params */ - err = c->xdp ? mlx5e_open_cq(c, >tx_cq, >xdp_sq.cq, + err = c->xdp ? mlx5e_open_cq(c, >tx_cq, >rq.xdpsq.cq, priv->params.tx_cq_moderation) : 0; if (err) goto err_close_rx_cq; @@ -1587,7 +1587,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, } } - err = c->xdp ? mlx5e_open_sq(c, 0, >xdp_sq, >xdp_sq) : 0; + err = c->xdp ? mlx5e_open_sq(c, 0, >xdp_sq, >rq.xdpsq) : 0; if (err) goto err_close_sqs; @@ -1601,7 +1601,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, return 0; err_close_xdp_sq: if (c->xdp) - mlx5e_close_sq(>xdp_sq); + mlx5e_close_sq(>rq.xdpsq); err_close_sqs: mlx5e_close_sqs(c); @@ -1612,7 +1612,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, err_disable_napi: napi_disable(>napi); if (c->xdp) - mlx5e_close_cq(>xdp_sq.cq); + mlx5e_close_cq(>rq.xdpsq.cq); err_close_rx_cq: mlx5e_close_cq(>rq.cq); @@ -1634,12 +1634,12 @@ static void mlx5e_close_channel(struct mlx5e_channel *c) { mlx5e_close_rq(>rq); if (c->xdp) - mlx5e_close_sq(>xdp_sq); + mlx5e_close_sq(>rq.xdpsq); mlx5e_close_sqs(c); mlx5e_close_sq(>icosq); napi_disable(>napi); if (c->xdp) - mlx5e_close_cq(>xdp_sq.cq); + mlx5e_close_cq(>rq.xdpsq.cq); mlx5e_close_cq(>rq.cq); mlx5e_close_tx_cqs(c); mlx5e_close_cq(>icosq.cq); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index bc74d6032a5c..040074f36313 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -653,7 +653,7 @@ static inline bool mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq, struct mlx5e_dma_info *di, const struct xdp_buff *xdp) { - struct mlx5e_sq *sq = >channel->xdp_sq; + struct mlx5e_sq *sq = >xdpsq; struct mlx5_wq_cyc *wq = >wq; u16 pi= sq->pc & wq->sz_m1; struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(wq, pi); @@ -950,7 +950,7 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe) int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget) { struct mlx5e_rq *rq = container_of(cq, struct mlx5e_rq, cq); - struct mlx5e_sq *xdp_sq = >channel->xdp_sq; + struct mlx5e_sq *xdpsq = >xdpsq; int work_done = 0; if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, >state))) @@ -977,9 +977,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget) rq->handle_rx_cqe(rq, cqe); } - if (xdp_sq-&
[PATCH net-next 07/12] net/mlx5e: Poll XDP TX CQ before RX CQ
Handle XDP TX completions before handling RX packets, to make sure more free space is available for XDP TX packets a moment before handling RX packets. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Before Now improvement --- XDP Drop (1 core) 16.9Mpps 16.9MppsNo change XDP TX (1 core) 12Mpps13Mpps 8% Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c index 9beeb4a1212f..c880022bb21a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c @@ -118,12 +118,12 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget) for (i = 0; i < c->num_tc; i++) busy |= mlx5e_poll_tx_cq(>sq[i].cq, budget); - work_done = mlx5e_poll_rx_cq(>rq.cq, budget); - busy |= work_done == budget; - if (c->xdp) busy |= mlx5e_poll_xdpsq_cq(>rq.xdpsq.cq); + work_done = mlx5e_poll_rx_cq(>rq.cq, budget); + busy |= work_done == budget; + mlx5e_poll_ico_cq(>icosq.cq); busy |= mlx5e_post_rx_wqes(>rq); -- 2.11.0
[PATCH net-next 11/12] net/mlx5e: Generalize SQ create/modify/destroy functions
In the next patches we will introduce different SQ types, and we would want to reuse those functions, in this patch we make them agnostic to SQ type (txq, xdp, ico). Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 111 ++ 1 file changed, 69 insertions(+), 42 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index d03afa535064..dcc67df54a5c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1057,10 +1057,19 @@ static void mlx5e_free_sq(struct mlx5e_sq *sq) mlx5_wq_destroy(>wq_ctrl); } -static int mlx5e_create_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) +struct mlx5e_create_sq_param { + struct mlx5_wq_ctrl*wq_ctrl; + u32 cqn; + u32 tisn; + u8 tis_lst_sz; + u8 min_inline_mode; +}; + +static int mlx5e_create_sq(struct mlx5e_priv *priv, + struct mlx5e_sq_param *param, + struct mlx5e_create_sq_param *csp, + u32 *sqn) { - struct mlx5e_channel *c = sq->channel; - struct mlx5e_priv *priv = c->priv; struct mlx5_core_dev *mdev = priv->mdev; void *in; @@ -1070,7 +1079,7 @@ static int mlx5e_create_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) int err; inlen = MLX5_ST_SZ_BYTES(create_sq_in) + - sizeof(u64) * sq->wq_ctrl.buf.npages; + sizeof(u64) * csp->wq_ctrl->buf.npages; in = mlx5_vzalloc(inlen); if (!in) return -ENOMEM; @@ -1079,38 +1088,41 @@ static int mlx5e_create_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param) wq = MLX5_ADDR_OF(sqc, sqc, wq); memcpy(sqc, param->sqc, sizeof(param->sqc)); - - MLX5_SET(sqc, sqc, tis_num_0, param->type == MLX5E_SQ_ICO ? - 0 : priv->tisn[sq->tc]); - MLX5_SET(sqc, sqc, cqn,sq->cq.mcq.cqn); + MLX5_SET(sqc, sqc, tis_lst_sz, csp->tis_lst_sz); + MLX5_SET(sqc, sqc, tis_num_0, csp->tisn); + MLX5_SET(sqc, sqc, cqn, csp->cqn); if (MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_VPORT_CONTEXT) - MLX5_SET(sqc, sqc, min_wqe_inline_mode, sq->min_inline_mode); + MLX5_SET(sqc, sqc, min_wqe_inline_mode, csp->min_inline_mode); - MLX5_SET(sqc, sqc, state, MLX5_SQC_STATE_RST); - MLX5_SET(sqc, sqc, tis_lst_sz, param->type == MLX5E_SQ_ICO ? 0 : 1); + MLX5_SET(sqc, sqc, state, MLX5_SQC_STATE_RST); MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC); - MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.bfreg.index); - MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift - + MLX5_SET(wq, wq, uar_page, priv->mdev->mlx5e_res.bfreg.index); + MLX5_SET(wq, wq, log_wq_pg_sz, csp->wq_ctrl->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT); - MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma); + MLX5_SET64(wq, wq, dbr_addr, csp->wq_ctrl->db.dma); - mlx5_fill_page_array(>wq_ctrl.buf, -(__be64 *)MLX5_ADDR_OF(wq, wq, pas)); + mlx5_fill_page_array(>wq_ctrl->buf, (__be64 *)MLX5_ADDR_OF(wq, wq, pas)); - err = mlx5_core_create_sq(mdev, in, inlen, >sqn); + err = mlx5_core_create_sq(mdev, in, inlen, sqn); kvfree(in); return err; } -static int mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state, - int next_state, bool update_rl, int rl_index) +struct mlx5e_modify_sq_param { + int curr_state; + int next_state; + bool rl_update; + int rl_index; +}; + +static int mlx5e_modify_sq(struct mlx5e_priv *priv, + u32 sqn, + struct mlx5e_modify_sq_param *p) { - struct mlx5e_channel *c = sq->channel; - struct mlx5e_priv *priv = c->priv; struct mlx5_core_dev *mdev = priv->mdev; void *in; @@ -1125,29 +1137,24 @@ static int mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state, sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx); - MLX5_SET(modify_sq_in, in, sq_state, curr_state); - MLX5_SET(sqc, sqc, state, next_state); - if (update_rl && next_state == MLX5_SQC_STATE_RDY) { + MLX5_SET(modify_sq_in, in, sq_state, p->curr_state); + MLX5_SET(sqc, sqc, state, p->next_state); + if (p->rl_update && p->next_state == MLX5_SQC_STATE_RDY
Re: [PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization
On Sat, Mar 25, 2017 at 2:26 AM, Alexei Starovoitov <a...@fb.com> wrote: > On 3/24/17 2:52 PM, Saeed Mahameed wrote: >> >> Hi Dave, >> >> This series provides some preformancee optimizations for mlx5e >> driver, especially for XDP TX flows. >> >> 1st patch is a simple change of rmb to dma_rmb in CQE fetch routine >> which shows a huge gain for both RX and TX packet rates. >> >> 2nd patch removes write combining logic from the driver TX handler >> and simplifies the TX logic while improving TX CPU utilization. >> >> All other patches combined provide some refactoring to the driver TX >> flows to allow some significant XDP TX improvements. >> >> More details and performance numbers per patch can be found in each patch >> commit message compared to the preceding patch. >> >> Overall performance improvemnets >> System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz >> >> Test case Baseline Now improvement >> --- >> TX packets (24 threads) 45Mpps54Mpps 20% >> TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5% >> XDP Drop (1 core) 14Mpps16.9Mpps20% >> XDP TX(1 core) 10.4Mpps 13.7Mpps31% > > > Excellent work! > All patches look great, so for the series: > Acked-by: Alexei Starovoitov <a...@kernel.org> > Thanks Alexei ! > in patch 12 I noticed that inline_mode is being evaluated. > I think for xdp queues it's guaranteed to be fixed. > Can we optimize that path little bit more as well? Yes, you are right, we do evaluate it in mlx5e_alloc_xdpsq + if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) { + inline_hdr_sz = MLX5E_XDP_MIN_INLINE; + ds_cnt++; + } and check it again in mlx5e_xmit_xdp_frame + /* copy the inline part if required */ + if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) { sq->min_inline_mode is fixed in run-time, but it is different across HW versions. This condition is needed so we would not copy inline headers and waste CPU cycles while it is not required from ConnectX-5 and later. Actually this is a 5% XDP_TX optimization you get when you run over ConnectX-5 [1]. in ConnectX-4 and 4-LX driver is still required to copy L2 headers into TX descriptor so the HW can make the loopback decision correctly (needed in case you want XDP program to switch packets between different PFs/VFs running on the same box/NIC). So i don't see anyway to do this without breaking XDP loopback functionality or removing the connectX-5 optimization. for my taste this condition is good as is. [1] https://www.spinics.net/lists/netdev/msg419215.html > Thanks!
[net-next 03/12] net/mlx5e: Add intermediate struct for TC flow parsing attributes
From: Or Gerlitz <ogerl...@mellanox.com> Add intermediate structure to store attributes parsed from TC filter matching/actions parts which are soon to be configured into the HW. Currently put there the flow matching spec after being parsed. More content to be added in down-stream patch. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 29 +++-- 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 2a9df0a0b859..9f900afcd7ea 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -70,6 +70,10 @@ struct mlx5e_tc_flow { }; }; +struct mlx5e_tc_flow_parse_attr { + struct mlx5_flow_spec spec; +}; + enum { MLX5_HEADER_TYPE_VXLAN = 0x0, MLX5_HEADER_TYPE_NVGRE = 0x1, @@ -80,7 +84,7 @@ enum { static struct mlx5_flow_handle * mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, - struct mlx5_flow_spec *spec, + struct mlx5e_tc_flow_parse_attr *parse_attr, struct mlx5_nic_flow_attr *attr) { struct mlx5_core_dev *dev = priv->mdev; @@ -123,8 +127,9 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, table_created = true; } - spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; - rule = mlx5_add_flow_rules(priv->fs.tc.t, spec, _act, , 1); + parse_attr->spec.match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; + rule = mlx5_add_flow_rules(priv->fs.tc.t, _attr->spec, + _act, , 1); if (IS_ERR(rule)) goto err_add_rule; @@ -161,7 +166,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, static struct mlx5_flow_handle * mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, - struct mlx5_flow_spec *spec, + struct mlx5e_tc_flow_parse_attr *parse_attr, struct mlx5_esw_flow_attr *attr) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; @@ -171,7 +176,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, if (err) return ERR_PTR(err); - return mlx5_eswitch_add_offloaded_rule(esw, spec, attr); + return mlx5_eswitch_add_offloaded_rule(esw, _attr->spec, attr); } static void mlx5e_detach_encap(struct mlx5e_priv *priv, @@ -1173,8 +1178,8 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, struct tc_cls_flower_offload *f) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5e_tc_flow_parse_attr *parse_attr; struct mlx5e_tc_table *tc = >fs.tc; - struct mlx5_flow_spec *spec; struct mlx5e_tc_flow *flow; int attr_size, err = 0; u8 flow_flags = 0; @@ -1188,8 +1193,8 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, } flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL); - spec = mlx5_vzalloc(sizeof(*spec)); - if (!spec || !flow) { + parse_attr = mlx5_vzalloc(sizeof(*parse_attr)); + if (!parse_attr || !flow) { err = -ENOMEM; goto err_free; } @@ -1197,7 +1202,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, flow->cookie = f->cookie; flow->flags = flow_flags; - err = parse_cls_flower(priv, flow, spec, f); + err = parse_cls_flower(priv, flow, _attr->spec, f); if (err < 0) goto err_free; @@ -1205,12 +1210,12 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, err = parse_tc_fdb_actions(priv, f->exts, flow); if (err < 0) goto err_free; - flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->esw_attr); + flow->rule = mlx5e_tc_add_fdb_flow(priv, parse_attr, flow->esw_attr); } else { err = parse_tc_nic_actions(priv, f->exts, flow->nic_attr); if (err < 0) goto err_free; - flow->rule = mlx5e_tc_add_nic_flow(priv, spec, flow->nic_attr); + flow->rule = mlx5e_tc_add_nic_flow(priv, parse_attr, flow->nic_attr); } if (IS_ERR(flow->rule)) { @@ -1231,7 +1236,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, err_free: kfree(flow); out: - kvfree(spec); + kvfree(parse_attr); return err; } -- 2.11.0
[net-next 11/12] net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions
From: Or Gerlitz <ogerl...@mellanox.com> This includes calling the parsing code that translates from pedit speak to the HW API, allocation (deallocation) of a modify header context and setting the modify header id associated with this context to the FTE of that flow. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 35 + 1 file changed, 35 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 3a31195f0d9c..4045b4768294 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -52,6 +52,7 @@ struct mlx5_nic_flow_attr { u32 action; u32 flow_tag; + u32 mod_hdr_id; }; enum { @@ -97,10 +98,12 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, .action = attr->action, .flow_tag = attr->flow_tag, .encap_id = 0, + .modify_id = attr->mod_hdr_id, }; struct mlx5_fc *counter = NULL; struct mlx5_flow_handle *rule; bool table_created = false; + int err; if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; @@ -114,6 +117,18 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, dest.counter = counter; } + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { + err = mlx5_modify_header_alloc(dev, MLX5_FLOW_NAMESPACE_KERNEL, + parse_attr->num_mod_hdr_actions, + parse_attr->mod_hdr_actions, + >mod_hdr_id); + kfree(parse_attr->mod_hdr_actions); + if (err) { + rule = ERR_PTR(err); + goto err_create_mod_hdr_id; + } + } + if (IS_ERR_OR_NULL(priv->fs.tc.t)) { priv->fs.tc.t = mlx5_create_auto_grouped_flow_table(priv->fs.ns, @@ -146,6 +161,10 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, priv->fs.tc.t = NULL; } err_create_ft: + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + mlx5_modify_header_dealloc(priv->mdev, + attr->mod_hdr_id); +err_create_mod_hdr_id: mlx5_fc_destroy(dev, counter); return rule; @@ -164,6 +183,10 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, mlx5_destroy_flow_table(priv->fs.tc.t); priv->fs.tc.t = NULL; } + + if (flow->nic_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + mlx5_modify_header_dealloc(priv->mdev, + flow->nic_attr->mod_hdr_id); } static void mlx5e_detach_encap(struct mlx5e_priv *priv, @@ -955,6 +978,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, struct mlx5_nic_flow_attr *attr = flow->nic_attr; const struct tc_action *a; LIST_HEAD(actions); + int err; if (tc_no_actions(exts)) return -EINVAL; @@ -976,6 +1000,17 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, continue; } + if (is_tcf_pedit(a)) { + err = parse_tc_pedit_action(priv, a, MLX5_FLOW_NAMESPACE_KERNEL, + parse_attr); + if (err) + return err; + + attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR | + MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + continue; + } + if (is_tcf_skbedit_mark(a)) { u32 mark = tcf_skbedit_mark(a); -- 2.11.0
[net-next 02/12] net/mlx5e: Add NIC attributes for offloaded TC flows
From: Or Gerlitz <ogerl...@mellanox.com> Add structure that contains the attributes related to offloaded NIC flows. Currently it has the actions and flow tag. While here, do xmas tree cleanup of the TC configure function. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 51 +++-- 1 file changed, 31 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index b2501987988b..2a9df0a0b859 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -48,8 +48,14 @@ #include "eswitch.h" #include "vxlan.h" +struct mlx5_nic_flow_attr { + u32 action; + u32 flow_tag; +}; + enum { MLX5E_TC_FLOW_ESWITCH = BIT(0), + MLX5E_TC_FLOW_NIC = BIT(1), }; struct mlx5e_tc_flow { @@ -58,7 +64,10 @@ struct mlx5e_tc_flow { u8 flags; struct mlx5_flow_handle *rule; struct list_headencap; /* flows sharing the same encap */ - struct mlx5_esw_flow_attr esw_attr[0]; + union { + struct mlx5_esw_flow_attr esw_attr[0]; + struct mlx5_nic_flow_attr nic_attr[0]; + }; }; enum { @@ -72,23 +81,23 @@ enum { static struct mlx5_flow_handle * mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec, - u32 action, u32 flow_tag) + struct mlx5_nic_flow_attr *attr) { struct mlx5_core_dev *dev = priv->mdev; struct mlx5_flow_destination dest = { 0 }; struct mlx5_flow_act flow_act = { - .action = action, - .flow_tag = flow_tag, + .action = attr->action, + .flow_tag = attr->flow_tag, .encap_id = 0, }; struct mlx5_fc *counter = NULL; struct mlx5_flow_handle *rule; bool table_created = false; - if (action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; dest.ft = priv->fs.vlan.ft.t; - } else if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { + } else if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { counter = mlx5_fc_create(dev, true); if (IS_ERR(counter)) return ERR_CAST(counter); @@ -651,7 +660,7 @@ static int parse_cls_flower(struct mlx5e_priv *priv, } static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, - u32 *action, u32 *flow_tag) + struct mlx5_nic_flow_attr *attr) { const struct tc_action *a; LIST_HEAD(actions); @@ -659,20 +668,20 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, if (tc_no_actions(exts)) return -EINVAL; - *flow_tag = MLX5_FS_DEFAULT_FLOW_TAG; - *action = 0; + attr->flow_tag = MLX5_FS_DEFAULT_FLOW_TAG; + attr->action = 0; tcf_exts_to_list(exts, ); list_for_each_entry(a, , list) { /* Only support a single action per rule */ - if (*action) + if (attr->action) return -EINVAL; if (is_tcf_gact_shot(a)) { - *action |= MLX5_FLOW_CONTEXT_ACTION_DROP; + attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP; if (MLX5_CAP_FLOWTABLE(priv->mdev, flow_table_properties_nic_receive.flow_counter)) - *action |= MLX5_FLOW_CONTEXT_ACTION_COUNT; + attr->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT; continue; } @@ -685,8 +694,8 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, return -EINVAL; } - *flow_tag = mark; - *action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + attr->flow_tag = mark; + attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; continue; } @@ -1163,17 +1172,19 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, struct tc_cls_flower_offload *f) { + struct mlx5_eswitch *esw
[net-next 06/12] net/mlx5: Reorder few command cases to reflect their natural order
From: Or Gerlitz <ogerl...@mellanox.com> Move the commands related to scheduling elements and vport qos to a suitable location (according to the MLX5_CMD_OP enum values) in the command string and internal error helpers. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index a380353a78c2..c3c6e931cc35 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -279,6 +279,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_DESTROY_XRC_SRQ: case MLX5_CMD_OP_DESTROY_DCT: case MLX5_CMD_OP_DEALLOC_Q_COUNTER: + case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT: + case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT: case MLX5_CMD_OP_DEALLOC_PD: case MLX5_CMD_OP_DEALLOC_UAR: case MLX5_CMD_OP_DETACH_FROM_MCG: @@ -305,8 +307,6 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY: case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT: case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER: - case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT: - case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT: return MLX5_CMD_STAT_OK; case MLX5_CMD_OP_QUERY_HCA_CAP: @@ -363,6 +363,10 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_QUERY_Q_COUNTER: case MLX5_CMD_OP_SET_RATE_LIMIT: case MLX5_CMD_OP_QUERY_RATE_LIMIT: + case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT: + case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT: + case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT: + case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT: case MLX5_CMD_OP_ALLOC_PD: case MLX5_CMD_OP_ALLOC_UAR: case MLX5_CMD_OP_CONFIG_INT_MODERATION: @@ -414,10 +418,6 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_ALLOC_FLOW_COUNTER: case MLX5_CMD_OP_QUERY_FLOW_COUNTER: case MLX5_CMD_OP_ALLOC_ENCAP_HEADER: - case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT: - case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT: - case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT: - case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT: *status = MLX5_DRIVER_STATUS_ABORTED; *synd = MLX5_DRIVER_SYND; return -EIO; @@ -501,6 +501,12 @@ const char *mlx5_command_str(int command) MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER); MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT); MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT); + MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT); + MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT); + MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT); + MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT); + MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT); + MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT); MLX5_COMMAND_STR_CASE(ALLOC_PD); MLX5_COMMAND_STR_CASE(DEALLOC_PD); MLX5_COMMAND_STR_CASE(ALLOC_UAR); @@ -576,12 +582,6 @@ const char *mlx5_command_str(int command) MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE); MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER); MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER); - MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT); - MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT); - MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT); - MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT); - MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT); - MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT); default: return "unknown command opcode"; } } -- 2.11.0
[net-next 09/12] net/sched: Add accessor functions to pedit keys for offloading drivers
From: Or Gerlitz <ogerl...@mellanox.com> HW drivers will use the header-type and command fields from the extended keys, and some fields (e.g mask, val, offset) from the legacy keys. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- include/net/tc_act/tc_pedit.h | 45 +++ 1 file changed, 45 insertions(+) diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h index dfbd6ee0bc7c..a46c3f2ace70 100644 --- a/include/net/tc_act/tc_pedit.h +++ b/include/net/tc_act/tc_pedit.h @@ -2,6 +2,7 @@ #define __NET_TC_PED_H #include +#include struct tcf_pedit_key_ex { enum pedit_header_type htype; @@ -17,4 +18,48 @@ struct tcf_pedit { }; #define to_pedit(a) ((struct tcf_pedit *)a) +static inline bool is_tcf_pedit(const struct tc_action *a) +{ +#ifdef CONFIG_NET_CLS_ACT + if (a->ops && a->ops->type == TCA_ACT_PEDIT) + return true; +#endif + return false; +} + +static inline int tcf_pedit_nkeys(const struct tc_action *a) +{ + return to_pedit(a)->tcfp_nkeys; +} + +static inline u32 tcf_pedit_htype(const struct tc_action *a, int index) +{ + if (to_pedit(a)->tcfp_keys_ex) + return to_pedit(a)->tcfp_keys_ex[index].htype; + + return TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; +} + +static inline u32 tcf_pedit_cmd(const struct tc_action *a, int index) +{ + if (to_pedit(a)->tcfp_keys_ex) + return to_pedit(a)->tcfp_keys_ex[index].cmd; + + return __PEDIT_CMD_MAX; +} + +static inline u32 tcf_pedit_mask(const struct tc_action *a, int index) +{ + return to_pedit(a)->tcfp_keys[index].mask; +} + +static inline u32 tcf_pedit_val(const struct tc_action *a, int index) +{ + return to_pedit(a)->tcfp_keys[index].val; +} + +static inline u32 tcf_pedit_offset(const struct tc_action *a, int index) +{ + return to_pedit(a)->tcfp_keys[index].off; +} #endif /* __NET_TC_PED_H */ -- 2.11.0
[net-next 05/12] net/mlx5: Add helper to initialize a flow steering actions struct instance
From: Or Gerlitz <ogerl...@mellanox.com> There are bunch of places in the code where the intermediate struct that keeps the elements related to flow actions is initialized with the same default values. Put that into a small DECLARE type helper. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 14 +++--- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 18 +++--- include/linux/mlx5/fs.h | 5 - 3 files changed, 10 insertions(+), 27 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index 68419a01db36..c4e9cc79f5c7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -174,13 +174,9 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv, enum arfs_type type) { struct arfs_table *arfs_t = >fs.arfs.arfs_tables[type]; - struct mlx5_flow_act flow_act = { - .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, - .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG, - .encap_id = 0, - }; - struct mlx5_flow_destination dest; struct mlx5e_tir *tir = priv->indir_tir; + struct mlx5_flow_destination dest; + MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_spec *spec; int err = 0; @@ -469,15 +465,11 @@ static struct arfs_table *arfs_get_table(struct mlx5e_arfs_tables *arfs, static struct mlx5_flow_handle *arfs_add_rule(struct mlx5e_priv *priv, struct arfs_rule *arfs_rule) { - struct mlx5_flow_act flow_act = { - .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, - .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG, - .encap_id = 0, - }; struct mlx5e_arfs_tables *arfs = >fs.arfs; struct arfs_tuple *tuple = _rule->tuple; struct mlx5_flow_handle *rule = NULL; struct mlx5_flow_destination dest; + MLX5_DECLARE_FLOW_ACT(flow_act); struct arfs_table *arfs_table; struct mlx5_flow_spec *spec; struct mlx5_flow_table *ft; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index f2762e45c8ae..5376d69a6b1a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -159,14 +159,10 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv, enum mlx5e_vlan_rule_type rule_type, u16 vid, struct mlx5_flow_spec *spec) { - struct mlx5_flow_act flow_act = { - .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, - .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG, - .encap_id = 0, - }; struct mlx5_flow_table *ft = priv->fs.vlan.ft.t; struct mlx5_flow_destination dest; struct mlx5_flow_handle **rule_p; + MLX5_DECLARE_FLOW_ACT(flow_act); int err = 0; dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; @@ -659,11 +655,7 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv, u16 etype, u8 proto) { - struct mlx5_flow_act flow_act = { - .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, - .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG, - .encap_id = 0, - }; + MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_handle *rule; struct mlx5_flow_spec *spec; int err = 0; @@ -848,13 +840,9 @@ static void mlx5e_del_l2_flow_rule(struct mlx5e_priv *priv, static int mlx5e_add_l2_flow_rule(struct mlx5e_priv *priv, struct mlx5e_l2_rule *ai, int type) { - struct mlx5_flow_act flow_act = { - .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, - .flow_tag = MLX5_FS_DEFAULT_FLOW_TAG, - .encap_id = 0, - }; struct mlx5_flow_table *ft = priv->fs.l2.ft.t; struct mlx5_flow_destination dest; + MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_spec *spec; int err = 0; u8 *mc_dmac; diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 949b24b6c479..5eea1ba2e593 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -136,6 +136,10 @@ struct mlx5_flow_act { u32 encap_id; }; +#define MLX5_DECLARE_FLOW_ACT(name) \ + struct mlx5_flow_act name = {MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,\ +MLX5_FS_DEFAULT_FLOW_TAG, 0} + /* Single destination per rule. * Group ID is implied by the match criteri
[net-next 10/12] net/mlx5e: Add parsing of TC pedit actions to HW format
From: Or Gerlitz <ogerl...@mellanox.com> Parse/translate a set of TC pedit actions to be formed in the HW API format. User-space provides set of keys where each one of them is made of: command (add or set), header-type, byte offset within that header along with a 32 bit mask and value. The mask dictates what bits in the 32 bit word that starts on the offset we should be dealing with, but under negative polarity (unset bits are to be modified). We do a 1st pass over the set of keys while using the header-type and offset to fill the masks and the values into a data-structure containting all the supported network headers. We then do a 2nd pass over the set of fields to re-write supported by the HW, where for each such candidate field, we use the masks filled on the 1st pass to realize if we should offloading re-write it. In case offloading is required, we fill a HW descriptor with the following: (1) the header field to modify (2) the bit offset within the field from where to modify (set command only) (3) the value to set/add (4) the length in bits 1...32 to modify (set command only) Note that it's possible for a given pedit mask to dictate modifying the same header field multiple times or to modify multiple header fields. Currently such combinations are not supported for offloading, hence, for set commands, the offset within the field is always zero, and the length to modify is the field size. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Amir Vadai <a...@vadai.me> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 273 1 file changed, 273 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index af92d9c1a619..3a31195f0d9c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include "en.h" #include "en_tc.h" @@ -72,6 +73,8 @@ struct mlx5e_tc_flow { struct mlx5e_tc_flow_parse_attr { struct mlx5_flow_spec spec; + int num_mod_hdr_actions; + void *mod_hdr_actions; }; enum { @@ -675,6 +678,276 @@ static int parse_cls_flower(struct mlx5e_priv *priv, return err; } +struct pedit_headers { + struct ethhdr eth; + struct iphdr ip4; + struct ipv6hdr ip6; + struct tcphdr tcp; + struct udphdr udp; +}; + +static int pedit_header_offsets[] = { + [TCA_PEDIT_KEY_EX_HDR_TYPE_ETH] = offsetof(struct pedit_headers, eth), + [TCA_PEDIT_KEY_EX_HDR_TYPE_IP4] = offsetof(struct pedit_headers, ip4), + [TCA_PEDIT_KEY_EX_HDR_TYPE_IP6] = offsetof(struct pedit_headers, ip6), + [TCA_PEDIT_KEY_EX_HDR_TYPE_TCP] = offsetof(struct pedit_headers, tcp), + [TCA_PEDIT_KEY_EX_HDR_TYPE_UDP] = offsetof(struct pedit_headers, udp), +}; + +#define pedit_header(_ph, _htype) ((void *)(_ph) + pedit_header_offsets[_htype]) + +static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset, +struct pedit_headers *masks, +struct pedit_headers *vals) +{ + u32 *curr_pmask, *curr_pval; + + if (hdr_type >= __PEDIT_HDR_TYPE_MAX) + goto out_err; + + curr_pmask = (u32 *)(pedit_header(masks, hdr_type) + offset); + curr_pval = (u32 *)(pedit_header(vals, hdr_type) + offset); + + if (*curr_pmask & mask) /* disallow acting twice on the same location */ + goto out_err; + + *curr_pmask |= mask; + *curr_pval |= (val & mask); + + return 0; + +out_err: + return -EOPNOTSUPP; +} + +struct mlx5_fields { + u8 field; + u8 size; + u32 offset; +}; + +static struct mlx5_fields fields[] = { + {MLX5_ACTION_IN_FIELD_OUT_DMAC_47_16, 4, offsetof(struct pedit_headers, eth.h_dest[0])}, + {MLX5_ACTION_IN_FIELD_OUT_DMAC_15_0, 2, offsetof(struct pedit_headers, eth.h_dest[4])}, + {MLX5_ACTION_IN_FIELD_OUT_SMAC_47_16, 4, offsetof(struct pedit_headers, eth.h_source[0])}, + {MLX5_ACTION_IN_FIELD_OUT_SMAC_15_0, 2, offsetof(struct pedit_headers, eth.h_source[4])}, + {MLX5_ACTION_IN_FIELD_OUT_ETHERTYPE, 2, offsetof(struct pedit_headers, eth.h_proto)}, + + {MLX5_ACTION_IN_FIELD_OUT_IP_DSCP, 1, offsetof(struct pedit_headers, ip4.tos)}, + {MLX5_ACTION_IN_FIELD_OUT_IP_TTL, 1, offsetof(struct pedit_headers, ip4.ttl)}, + {MLX5_ACTION_IN_FIELD_OUT_SIPV4, 4, offsetof(struct pedit_headers, ip4.saddr)}, + {MLX5_ACTION_IN_FIELD_OUT_DIPV4, 4, offsetof(struct pedit_headers, ip4.daddr)}, + + {MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96, 4, offsetof(struct pedit_headers, ip6.saddr.s6_addr32[0])}, + {MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64, 4, offsetof(struct pedit_headers, ip6.saddr.s6_addr32[1])}, + {ML
[net-next 01/12] net/mlx5e: Add prefix for e-switch offloaded TC flow attributes
From: Or Gerlitz <ogerl...@mellanox.com> Add esw_ prefix to the flow attributes attached to offloaded e-switch TC flows. This is a pre-step to add attributes to offloaded NIC TC flows. Also, save one pointer space by using gcc's zero size array, this would be beneficial for environments where 100Ks (or Ms) of flows are offloaded. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index fade7233dac5..b2501987988b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -58,7 +58,7 @@ struct mlx5e_tc_flow { u8 flags; struct mlx5_flow_handle *rule; struct list_headencap; /* flows sharing the same encap */ - struct mlx5_esw_flow_attr *attr; + struct mlx5_esw_flow_attr esw_attr[0]; }; enum { @@ -173,11 +173,11 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; - mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->attr); + mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->esw_attr); - mlx5_eswitch_del_vlan_action(esw, flow->attr); + mlx5_eswitch_del_vlan_action(esw, flow->esw_attr); - if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) + if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) mlx5e_detach_encap(priv, flow); } @@ -1073,7 +1073,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv, static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, struct mlx5e_tc_flow *flow) { - struct mlx5_esw_flow_attr *attr = flow->attr; + struct mlx5_esw_flow_attr *attr = flow->esw_attr; struct ip_tunnel_info *info = NULL; const struct tc_action *a; LIST_HEAD(actions); @@ -1191,11 +1191,10 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, goto err_free; if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { - flow->attr = (struct mlx5_esw_flow_attr *)(flow + 1); err = parse_tc_fdb_actions(priv, f->exts, flow); if (err < 0) goto err_free; - flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->attr); + flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->esw_attr); } else { err = parse_tc_nic_actions(priv, f->exts, , _tag); if (err < 0) -- 2.11.0
[net-next 12/12] net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions
From: Or Gerlitz <ogerl...@mellanox.com> This includes calling the parsing code that translates from pedit speak to the HW API, allocation (deallocation) of a modify header context and setting the modify header id associated with this context to the FTE of that flow. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 37 -- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 1 + .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 5 ++- 3 files changed, 39 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 4045b4768294..9dec11c00a49 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -98,7 +98,6 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, .action = attr->action, .flow_tag = attr->flow_tag, .encap_id = 0, - .modify_id = attr->mod_hdr_id, }; struct mlx5_fc *counter = NULL; struct mlx5_flow_handle *rule; @@ -122,6 +121,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, parse_attr->num_mod_hdr_actions, parse_attr->mod_hdr_actions, >mod_hdr_id); + flow_act.modify_id = attr->mod_hdr_id; kfree(parse_attr->mod_hdr_actions); if (err) { rule = ERR_PTR(err); @@ -208,6 +208,18 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, goto err_add_vlan; } + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { + err = mlx5_modify_header_alloc(priv->mdev, MLX5_FLOW_NAMESPACE_FDB, + parse_attr->num_mod_hdr_actions, + parse_attr->mod_hdr_actions, + >mod_hdr_id); + kfree(parse_attr->mod_hdr_actions); + if (err) { + rule = ERR_PTR(err); + goto err_mod_hdr; + } + } + rule = mlx5_eswitch_add_offloaded_rule(esw, _attr->spec, attr); if (IS_ERR(rule)) goto err_add_rule; @@ -215,11 +227,14 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, return rule; err_add_rule: + if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + mlx5_modify_header_dealloc(priv->mdev, + attr->mod_hdr_id); +err_mod_hdr: mlx5_eswitch_del_vlan_action(esw, attr); err_add_vlan: if (attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) mlx5e_detach_encap(priv, flow); - return rule; } @@ -227,6 +242,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5_esw_flow_attr *attr = flow->esw_attr; mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->esw_attr); @@ -234,6 +250,10 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) mlx5e_detach_encap(priv, flow); + + if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + mlx5_modify_header_dealloc(priv->mdev, + attr->mod_hdr_id); } static void mlx5e_detach_encap(struct mlx5e_priv *priv, @@ -1406,6 +1426,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv, } static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, + struct mlx5e_tc_flow_parse_attr *parse_attr, struct mlx5e_tc_flow *flow) { struct mlx5_esw_flow_attr *attr = flow->esw_attr; @@ -1429,6 +1450,16 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, continue; } + if (is_tcf_pedit(a)) { + err = parse_tc_pedit_action(priv, a, MLX5_FLOW_NAMESPACE_FDB, + parse_attr); + if (err) + return err; + + attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR; + continue; + } + if (is_tcf_mirred_egress_redirect(a)) { int ifindex = tcf_mirred_if
[net-next 07/12] net/mlx5: Introduce modify header structures, commands and steering action definitions
From: Or Gerlitz <ogerl...@mellanox.com> Add the definitions related to creation/deletion of a modify header context and the modify header steering action which are used for HW packet header modify (re-write) as part of steering. Add as well the modify header id into two intermediate structs and set it to the FTE. Note that as the push/pop vlan steering actions are emulated by the ewitch management code, we're not breaking any compatibility while changing their values to make room for the modify header action which is not emulated and whose value is part of the FW API. The new bit values for the emulated actions are at the end of the possible range. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 4 +- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 1 + include/linux/mlx5/fs.h | 3 +- include/linux/mlx5/mlx5_ifc.h | 113 +- 6 files changed, 118 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index ad329b1680b4..cd9240c3a7f0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -285,8 +285,8 @@ enum { SET_VLAN_INSERT = BIT(1) }; -#define MLX5_FLOW_CONTEXT_ACTION_VLAN_POP 0x40 -#define MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH 0x80 +#define MLX5_FLOW_CONTEXT_ACTION_VLAN_POP 0x4000 +#define MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH 0x8000 struct mlx5_encap_entry { struct hlist_node encap_hlist; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c index b64a781c7e85..20d1fd516d03 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c @@ -249,6 +249,7 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev, MLX5_SET(flow_context, in_flow_context, flow_tag, fte->flow_tag); MLX5_SET(flow_context, in_flow_context, action, fte->action); MLX5_SET(flow_context, in_flow_context, encap_id, fte->encap_id); + MLX5_SET(flow_context, in_flow_context, modify_header_id, fte->modify_id); in_match_value = MLX5_ADDR_OF(flow_context, in_flow_context, match_value); memcpy(in_match_value, >val, MLX5_ST_SZ_BYTES(fte_match_param)); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index ded27bb9a3b6..27ff815600f7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -476,6 +476,7 @@ static struct fs_fte *alloc_fte(struct mlx5_flow_act *flow_act, fte->index = index; fte->action = flow_act->action; fte->encap_id = flow_act->encap_id; + fte->modify_id = flow_act->modify_id; return fte; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h index 8e668c63f69e..03af2e7989f3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h @@ -152,6 +152,7 @@ struct fs_fte { u32 index; u32 action; u32 encap_id; + u32 modify_id; enum fs_fte_status status; struct mlx5_fc *counter; }; diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 5eea1ba2e593..ae91a4bda1a3 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -134,11 +134,12 @@ struct mlx5_flow_act { u32 action; u32 flow_tag; u32 encap_id; + u32 modify_id; }; #define MLX5_DECLARE_FLOW_ACT(name) \ struct mlx5_flow_act name = {MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,\ -MLX5_FS_DEFAULT_FLOW_TAG, 0} +MLX5_FS_DEFAULT_FLOW_TAG, 0, 0} /* Single destination per rule. * Group ID is implied by the match criteria. diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 838242697541..56bc842b0620 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -227,6 +227,8 @@ enum { MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c, MLX5_CMD_OP_ALLOC_ENCAP_HEADER= 0x93d, MLX5_CMD_OP_DEALLOC_ENCAP_HEADER = 0x93e, + MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT = 0x940, + MLX5_CMD_OP_DE
[net-next 04/12] net/mlx5e: Properly deal with resource cleanup when adding TC flow fails
From: Or Gerlitz <ogerl...@mellanox.com> The code for adding tc fdb flows leaves things half set when it fails in the middle. Currently we are not leaking things (e.g eswitch vlan reference, encap reference and HW resources) since the main code to add flower rules does a cleanup by calling mlx5e_tc_del_flow(). This cleanup further works just b/c we're checking there if the HW rule for the flow we are attempting to delete is valid before touching it, and since under the current possible combinations of supported actions it's okay to go and blidnly deref or delete all the action related resources (encap, vlan). Instead, do things properly, namely make sure that if add flow fails we clean all what was allocated or referenced. Now, the flow delete code can blindly deref/deallocate both the rule and the actions related resources and when more action combinations are introduced (such as the upcoming header re-write) we are fine with clear and robust code. While here, align all of nic/fdb parse actions/add flow functions to get mlx5e_tc_flow struct param and pick the attributes or whatever else needed from there. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 59 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 23 + 2 files changed, 50 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 9f900afcd7ea..af92d9c1a619 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -85,10 +85,11 @@ enum { static struct mlx5_flow_handle * mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow_parse_attr *parse_attr, - struct mlx5_nic_flow_attr *attr) + struct mlx5e_tc_flow *flow) { + struct mlx5_nic_flow_attr *attr = flow->nic_attr; struct mlx5_core_dev *dev = priv->mdev; - struct mlx5_flow_destination dest = { 0 }; + struct mlx5_flow_destination dest = {}; struct mlx5_flow_act flow_act = { .action = attr->action, .flow_tag = attr->flow_tag, @@ -152,11 +153,9 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, { struct mlx5_fc *counter = NULL; - if (!IS_ERR(flow->rule)) { - counter = mlx5_flow_rule_counter(flow->rule); - mlx5_del_flow_rules(flow->rule); - mlx5_fc_destroy(priv->mdev, counter); - } + counter = mlx5_flow_rule_counter(flow->rule); + mlx5_del_flow_rules(flow->rule); + mlx5_fc_destroy(priv->mdev, counter); if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) { mlx5_destroy_flow_table(priv->fs.tc.t); @@ -164,23 +163,39 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, } } +static void mlx5e_detach_encap(struct mlx5e_priv *priv, + struct mlx5e_tc_flow *flow); + static struct mlx5_flow_handle * mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow_parse_attr *parse_attr, - struct mlx5_esw_flow_attr *attr) + struct mlx5e_tc_flow *flow) { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct mlx5_esw_flow_attr *attr = flow->esw_attr; + struct mlx5_flow_handle *rule; int err; err = mlx5_eswitch_add_vlan_action(esw, attr); - if (err) - return ERR_PTR(err); + if (err) { + rule = ERR_PTR(err); + goto err_add_vlan; + } - return mlx5_eswitch_add_offloaded_rule(esw, _attr->spec, attr); -} + rule = mlx5_eswitch_add_offloaded_rule(esw, _attr->spec, attr); + if (IS_ERR(rule)) + goto err_add_rule; -static void mlx5e_detach_encap(struct mlx5e_priv *priv, - struct mlx5e_tc_flow *flow); + return rule; + +err_add_rule: + mlx5_eswitch_del_vlan_action(esw, attr); +err_add_vlan: + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) + mlx5e_detach_encap(priv, flow); + + return rule; +} static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow) @@ -214,10 +229,6 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv, } } -/* we get here also when setting rule to the FW failed, etc. It means that the - * flow rule itself might not exist, but some offloading related to the actions - * should be cleaned. - */ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow)
[net-next 08/12] net/mlx5: Introduce alloc/dealloc modify header context commands
From: Or Gerlitz <ogerl...@mellanox.com> Implement the low-level commands to support packet header re-write. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 ++ drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 66 ++ .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 5 ++ 3 files changed, 75 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index c3c6e931cc35..5bdaf3d545b2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -307,6 +307,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY: case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT: case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER: + case MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT: return MLX5_CMD_STAT_OK; case MLX5_CMD_OP_QUERY_HCA_CAP: @@ -418,6 +419,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_ALLOC_FLOW_COUNTER: case MLX5_CMD_OP_QUERY_FLOW_COUNTER: case MLX5_CMD_OP_ALLOC_ENCAP_HEADER: + case MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT: *status = MLX5_DRIVER_STATUS_ABORTED; *synd = MLX5_DRIVER_SYND; return -EIO; @@ -582,6 +584,8 @@ const char *mlx5_command_str(int command) MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE); MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER); MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER); + MLX5_COMMAND_STR_CASE(ALLOC_MODIFY_HEADER_CONTEXT); + MLX5_COMMAND_STR_CASE(DEALLOC_MODIFY_HEADER_CONTEXT); default: return "unknown command opcode"; } } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c index 20d1fd516d03..c6178ea1a461 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c @@ -516,3 +516,69 @@ void mlx5_encap_dealloc(struct mlx5_core_dev *dev, u32 encap_id) mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } + +int mlx5_modify_header_alloc(struct mlx5_core_dev *dev, +u8 namespace, u8 num_actions, +void *modify_actions, u32 *modify_header_id) +{ + u32 out[MLX5_ST_SZ_DW(alloc_modify_header_context_out)]; + int max_actions, actions_size, inlen, err; + void *actions_in; + u8 table_type; + u32 *in; + + switch (namespace) { + case MLX5_FLOW_NAMESPACE_FDB: + max_actions = MLX5_CAP_ESW_FLOWTABLE_FDB(dev, max_modify_header_actions); + table_type = FS_FT_FDB; + break; + case MLX5_FLOW_NAMESPACE_KERNEL: + max_actions = MLX5_CAP_FLOWTABLE_NIC_RX(dev, max_modify_header_actions); + table_type = FS_FT_NIC_RX; + break; + default: + return -EOPNOTSUPP; + } + + if (num_actions > max_actions) { + mlx5_core_warn(dev, "too many modify header actions %d, max supported %d\n", + num_actions, max_actions); + return -EOPNOTSUPP; + } + + actions_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto) * num_actions; + inlen = MLX5_ST_SZ_BYTES(alloc_modify_header_context_in) + actions_size; + + in = kzalloc(inlen, GFP_KERNEL); + if (!in) + return -ENOMEM; + + MLX5_SET(alloc_modify_header_context_in, in, opcode, +MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT); + MLX5_SET(alloc_modify_header_context_in, in, table_type, table_type); + MLX5_SET(alloc_modify_header_context_in, in, num_of_actions, num_actions); + + actions_in = MLX5_ADDR_OF(alloc_modify_header_context_in, in, actions); + memcpy(actions_in, modify_actions, actions_size); + + memset(out, 0, sizeof(out)); + err = mlx5_cmd_exec(dev, in, inlen, out, sizeof(out)); + + *modify_header_id = MLX5_GET(alloc_modify_header_context_out, out, modify_header_id); + kfree(in); + return err; +} + +void mlx5_modify_header_dealloc(struct mlx5_core_dev *dev, u32 modify_header_id) +{ + u32 in[MLX5_ST_SZ_DW(dealloc_modify_header_context_in)]; + u32 out[MLX5_ST_SZ_DW(dealloc_modify_header_context_out)]; + + memset(in, 0, sizeof(in)); + MLX5_SET(dealloc_modify_header_context_in, in, opcode, +MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT); + MLX5_SET(dealloc_modify_header_context_in, in, modify_header_id, +modify_header_id); + + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff
[pull request][net-next 00/12] Mellanox mlx5 offloading of TC pedit (header re-write) action
Hi Dave, The following changes from Or Gerlitz provide mlx5 offloading support of TC pedit (header re-write) action. For more information please see below. Please pull and let me know if there's any problem. Thanks, Saeed. --- The following changes since commit cc628c9680c212d9dbf68785fbf5d454ccb2313e: Merge tag 'mlx5e-failsafe' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (2017-03-27 21:16:03 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5e-pedit for you to fetch changes up to d7e75a325cb2d2b72e7ac9a185abc1cd59bc9922: net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions (2017-03-28 15:34:10 +0300) mlx5e-pedit 2017-03-28 Or Gerlitz says: This series adds support for offloading modifications of packet headers using ConnectX-5 HW header re-write as an action applied during packet steering. The offloaded SW mechanism is TC's pedit action. The offloading is supported for E-Switch steering of VF traffic in the SRIOV switchdev mode and for NIC (non eswitch) RX. One use-case for this offload on virtual networks, is when the hypervisor implements flow based router such as Open-Stack's DVR, where L2 headers of guest packets re-written with routers' MAC addresses and the IP TTL is decremented. Another use case (which can be applied in parallel with routing) is stateless NAT where guest L3/L4 headers are re-written. The series is built as follows: the 1st six patches are preperations which don't yet add new functionality, patches 7-8 add the FW APIs (data-structures and commands) for header re-write, and patch nine allows offloading driver to access pedit keys. The 10th patch is somehow the core of the series, where we translate from the pedit way to represent set of header modification elements to the FW API for that same matter. Once a set of HW modification is established, we register it with the FW and get a modify header ID. When this ID is used with an action during packet steering, the HW applies the header modification on the packet. Patches 11 and 12 implement the above logic as an offload for pedit action for the NIC and E-Switch use-cases. I'd like to thanks Elijah Shakkourfor implementing and helping me testing this functionality on HW simulator, before it could be done with FW. - Or. Or Gerlitz (12): net/mlx5e: Add prefix for e-switch offloaded TC flow attributes net/mlx5e: Add NIC attributes for offloaded TC flows net/mlx5e: Add intermediate struct for TC flow parsing attributes net/mlx5e: Properly deal with resource cleanup when adding TC flow fails net/mlx5: Add helper to initialize a flow steering actions struct instance net/mlx5: Reorder few command cases to reflect their natural order net/mlx5: Introduce modify header structures, commands and steering action definitions net/mlx5: Introduce alloc/dealloc modify header context commands net/sched: Add accessor functions to pedit keys for offloading drivers net/mlx5e: Add parsing of TC pedit actions to HW format net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 28 +- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 14 +- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 18 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 473 ++--- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 5 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 28 +- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 67 +++ drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 1 + .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 5 + include/linux/mlx5/fs.h| 6 +- include/linux/mlx5/mlx5_ifc.h | 113 - include/net/tc_act/tc_pedit.h | 45 ++ 13 files changed, 698 insertions(+), 106 deletions(-)
[net-next 08/14] net/mlx5e: CQ and RQ don't need priv pointer
Remove mlx5e_priv pointer from CQ and RQ structs, it was needed only to access mdev pointer from priv pointer. Instead we now pass mdev where needed. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 38 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 181 + drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 3 +- 4 files changed, 99 insertions(+), 125 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 007f91f54fda..44c454b34754 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -280,7 +280,6 @@ struct mlx5e_cq { struct napi_struct*napi; struct mlx5_core_cqmcq; struct mlx5e_channel *channel; - struct mlx5e_priv *priv; /* cqe decompression */ struct mlx5_cqe64 title; @@ -290,6 +289,7 @@ struct mlx5e_cq { u16decmprs_wqe_counter; /* control */ + struct mlx5_core_dev *mdev; struct mlx5_frag_wq_ctrl wq_ctrl; } cacheline_aligned_in_smp; @@ -533,7 +533,7 @@ struct mlx5e_rq { u32mpwqe_num_strides; u32rqn; struct mlx5e_channel *channel; - struct mlx5e_priv *priv; + struct mlx5_core_dev *mdev; struct mlx5_core_mkey umr_mkey; } cacheline_aligned_in_smp; @@ -556,6 +556,8 @@ struct mlx5e_channel { /* control */ struct mlx5e_priv *priv; + struct mlx5_core_dev *mdev; + struct mlx5e_tstamp *tstamp; intix; intcpu; }; @@ -715,22 +717,6 @@ enum { MLX5E_NIC_PRIO }; -struct mlx5e_profile { - void(*init)(struct mlx5_core_dev *mdev, - struct net_device *netdev, - const struct mlx5e_profile *profile, void *ppriv); - void(*cleanup)(struct mlx5e_priv *priv); - int (*init_rx)(struct mlx5e_priv *priv); - void(*cleanup_rx)(struct mlx5e_priv *priv); - int (*init_tx)(struct mlx5e_priv *priv); - void(*cleanup_tx)(struct mlx5e_priv *priv); - void(*enable)(struct mlx5e_priv *priv); - void(*disable)(struct mlx5e_priv *priv); - void(*update_stats)(struct mlx5e_priv *priv); - int (*max_nch)(struct mlx5_core_dev *mdev); - int max_tc; -}; - struct mlx5e_priv { /* priv data path fields - start */ struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC]; @@ -770,6 +756,22 @@ struct mlx5e_priv { void *ppriv; }; +struct mlx5e_profile { + void(*init)(struct mlx5_core_dev *mdev, + struct net_device *netdev, + const struct mlx5e_profile *profile, void *ppriv); + void(*cleanup)(struct mlx5e_priv *priv); + int (*init_rx)(struct mlx5e_priv *priv); + void(*cleanup_rx)(struct mlx5e_priv *priv); + int (*init_tx)(struct mlx5e_priv *priv); + void(*cleanup_tx)(struct mlx5e_priv *priv); + void(*enable)(struct mlx5e_priv *priv); + void(*disable)(struct mlx5e_priv *priv); + void(*update_stats)(struct mlx5e_priv *priv); + int (*max_nch)(struct mlx5_core_dev *mdev); + int max_tc; +}; + void mlx5e_build_ptys2ethtool_map(void); u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index cf8df1d3275e..a6e09c46440b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -491,11 +491,10 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq) kfree(rq->mpwqe.info); } -static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv, +static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev, u64 npages, u8 page_shift, struct mlx5_core_mkey *umr_mkey) { - struct mlx5_core_dev *mdev = priv->mdev; int inlen = MLX5_ST_SZ_BYTES(create_mkey_in); void *mkc; u32 *in; @@ -529,12 +528,11 @@ static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv, return err; } -static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq) +static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq *rq) { - struct mlx5e_priv *priv = rq->priv; u64 num_mtts = MLX5E_REQUIRED_MTTS(mlx5_wq_ll_get_size(>wq)); - return mlx5e_create_umr_mkey(priv, num_mtt
[net-next 07/14] net/mlx5e: Isolate open_channels from priv->params
In order to have a clean separation between channels resources creation flows and current active mlx5e netdev parameters, make sure each resource creation function do not access priv->params, and only works with on a new fresh set of parameters. For this we add "new" mlx5e_params field to mlx5e_channels structure and use it down the road to mlx5e_open_{cq,rq,sq} and so on. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 22 +- drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 119 +++--- .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c| 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 448 ++--- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 61 ++- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 8 +- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 7 +- 8 files changed, 328 insertions(+), 341 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index f1895ebe7fe5..007f91f54fda 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -182,15 +182,15 @@ enum mlx5e_priv_flag { MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1), }; -#define MLX5E_SET_PFLAG(priv, pflag, enable) \ +#define MLX5E_SET_PFLAG(params, pflag, enable) \ do {\ if (enable) \ - (priv)->params.pflags |= (pflag); \ + (params)->pflags |= (pflag);\ else\ - (priv)->params.pflags &= ~(pflag); \ + (params)->pflags &= ~(pflag); \ } while (0) -#define MLX5E_GET_PFLAG(priv, pflag) (!!((priv)->params.pflags & (pflag))) +#define MLX5E_GET_PFLAG(params, pflag) (!!((params)->pflags & (pflag))) #ifdef CONFIG_MLX5_CORE_EN_DCB #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */ @@ -213,7 +213,6 @@ struct mlx5e_params { bool rx_cqe_compress_def; struct mlx5e_cq_moder rx_cq_moderation; struct mlx5e_cq_moder tx_cq_moderation; - u16 min_rx_wqes; bool lro_en; u32 lro_wqe_sz; u16 tx_max_inline; @@ -225,6 +224,7 @@ struct mlx5e_params { bool rx_am_enabled; u32 lro_timeout; u32 pflags; + struct bpf_prog *xdp_prog; }; #ifdef CONFIG_MLX5_CORE_EN_DCB @@ -357,7 +357,6 @@ struct mlx5e_txqsq { /* control path */ struct mlx5_wq_ctrlwq_ctrl; struct mlx5e_channel *channel; - inttc; inttxq_ix; u32rate_limit; } cacheline_aligned_in_smp; @@ -564,6 +563,7 @@ struct mlx5e_channel { struct mlx5e_channels { struct mlx5e_channel **c; unsigned int num; + struct mlx5e_paramsparams; }; enum mlx5e_traffic_types { @@ -735,7 +735,6 @@ struct mlx5e_priv { /* priv data path fields - start */ struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC]; int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC]; - struct bpf_prog *xdp_prog; /* priv data path fields - end */ unsigned long state; @@ -752,7 +751,6 @@ struct mlx5e_priv { struct mlx5e_flow_steering fs; struct mlx5e_vxlan_db vxlan; - struct mlx5e_paramsparams; struct workqueue_struct*wq; struct work_struct update_carrier_work; struct work_struct set_rx_mode_work; @@ -857,8 +855,9 @@ struct mlx5e_redirect_rqt_param { int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, struct mlx5e_redirect_rqt_param rrp); -void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc, - enum mlx5e_traffic_types tt); +void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params *params, + enum mlx5e_traffic_types tt, + void *tirc); int mlx5e_open_locked(struct net_device *netdev); int mlx5e_close_locked(struct net_device *netdev); @@ -869,7 +868,8 @@ int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode); -void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type); +void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, u8 rq_ty
[net-next 02/14] net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation
To simplify mlx5e_open_locked flow we set netdev->rx_cpu_rmap on netdev creation rather on netdev open, it is redundant to set it every time on mlx5e_open_locked. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 469d6c147db7..f0eff5e30729 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2478,9 +2478,7 @@ int mlx5e_open_locked(struct net_device *netdev) mlx5e_redirect_rqts(priv); mlx5e_update_carrier(priv); mlx5e_timestamp_init(priv); -#ifdef CONFIG_RFS_ACCEL - priv->netdev->rx_cpu_rmap = priv->mdev->rmap; -#endif + if (priv->profile->update_stats) queue_delayed_work(priv->wq, >update_stats_work, 0); @@ -4022,6 +4020,10 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev, return NULL; } +#ifdef CONFIG_RFS_ACCEL + netdev->rx_cpu_rmap = mdev->rmap; +#endif + profile->init(mdev, netdev, profile, ppriv); netif_carrier_off(netdev); -- 2.11.0
[net-next 03/14] net/mlx5e: Introduce mlx5e_channels
Have a dedicated "channels" handler that will serve as channels (RQs/SQs/etc..) holder to help with separating channels/parameters operations, for the downstream fail-safe configuration flow, where we will create a new instance of mlx5e_channels with the new requested parameters and switch to the new channels on the fly. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 9 ++- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 27 +++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 86 +++--- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 14 ++-- 4 files changed, 71 insertions(+), 65 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index bace9233dc1f..b00c6688ddcf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -560,6 +560,11 @@ struct mlx5e_channel { intcpu; }; +struct mlx5e_channels { + struct mlx5e_channel **c; + unsigned int num; +}; + enum mlx5e_traffic_types { MLX5E_TT_IPV4_TCP, MLX5E_TT_IPV6_TCP, @@ -736,7 +741,7 @@ struct mlx5e_priv { struct mutex state_lock; /* Protects Interface state */ struct mlx5e_rqdrop_rq; - struct mlx5e_channel **channel; + struct mlx5e_channels channels; u32tisn[MLX5E_MAX_NUM_TC]; struct mlx5e_rqt indir_rqt; struct mlx5e_tir indir_tir[MLX5E_NUM_INDIR_TIRS]; @@ -836,7 +841,7 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto, void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv); void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv); -int mlx5e_modify_rqs_vsd(struct mlx5e_priv *priv, bool vsd); +int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd); int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix); void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index a004a5a1a4c2..2e54a6564d86 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -152,12 +152,9 @@ static bool mlx5e_query_global_pause_combined(struct mlx5e_priv *priv) } #define MLX5E_NUM_Q_CNTRS(priv) (NUM_Q_COUNTERS * (!!priv->q_counter)) -#define MLX5E_NUM_RQ_STATS(priv) \ - (NUM_RQ_STATS * priv->params.num_channels * \ -test_bit(MLX5E_STATE_OPENED, >state)) +#define MLX5E_NUM_RQ_STATS(priv) (NUM_RQ_STATS * (priv)->channels.num) #define MLX5E_NUM_SQ_STATS(priv) \ - (NUM_SQ_STATS * priv->params.num_channels * priv->params.num_tc * \ -test_bit(MLX5E_STATE_OPENED, >state)) + (NUM_SQ_STATS * (priv)->channels.num * (priv)->params.num_tc) #define MLX5E_NUM_PFC_COUNTERS(priv) \ ((mlx5e_query_global_pause_combined(priv) + hweight8(mlx5e_query_pfc_combined(priv))) * \ NUM_PPORT_PER_PRIO_PFC_COUNTERS) @@ -262,13 +259,13 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv *priv, uint8_t *data) return; /* per channel counters */ - for (i = 0; i < priv->params.num_channels; i++) + for (i = 0; i < priv->channels.num; i++) for (j = 0; j < NUM_RQ_STATS; j++) sprintf(data + (idx++) * ETH_GSTRING_LEN, rq_stats_desc[j].format, i); for (tc = 0; tc < priv->params.num_tc; tc++) - for (i = 0; i < priv->params.num_channels; i++) + for (i = 0; i < priv->channels.num; i++) for (j = 0; j < NUM_SQ_STATS; j++) sprintf(data + (idx++) * ETH_GSTRING_LEN, sq_stats_desc[j].format, @@ -303,6 +300,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *stats, u64 *data) { struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5e_channels *channels; struct mlx5_priv *mlx5_priv; int i, j, tc, prio, idx = 0; unsigned long pfc_combined; @@ -313,6 +311,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev, mutex_lock(>state_lock); if (test_bit(MLX5E_STATE_OPENED, >state)) mlx5e_update_stats(priv); + channels = >channels; mutex_unlock(>state_lock); for (i = 0; i < NUM_SW_COUNTERS; i++) @@ -382,16 +381,16 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev, return; /* p
[net-next 09/14] net/mlx5e: Minimize mlx5e_{open/close}_locked
mlx5e_redirect_rqts_to_{channels,drop} and mlx5e_{add,del}_sqs_fwd_rules and Set real num tx/rx queues belong to mlx5e_{activate,deactivate}_priv_channels, for that we move those functions and minimize mlx5e_open/close flows. This will be needed in downstream patches to replace old channels with new ones without the need to call mlx5e_close/open. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 40 +++ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 -- 2 files changed, 26 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a6e09c46440b..a94f84ec2c1a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2498,14 +2498,33 @@ static void mlx5e_build_channels_tx_maps(struct mlx5e_priv *priv) static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) { + int num_txqs = priv->channels.num * priv->channels.params.num_tc; + struct net_device *netdev = priv->netdev; + + mlx5e_netdev_set_tcs(netdev); + if (netdev->real_num_tx_queues != num_txqs) + netif_set_real_num_tx_queues(netdev, num_txqs); + if (netdev->real_num_rx_queues != priv->channels.num) + netif_set_real_num_rx_queues(netdev, priv->channels.num); + mlx5e_build_channels_tx_maps(priv); mlx5e_activate_channels(>channels); netif_tx_start_all_queues(priv->netdev); + + if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + mlx5e_add_sqs_fwd_rules(priv); + mlx5e_wait_channels_min_rx_wqes(>channels); + mlx5e_redirect_rqts_to_channels(priv, >channels); } static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) { + mlx5e_redirect_rqts_to_drop(priv); + + if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + mlx5e_remove_sqs_fwd_rules(priv); + /* FIXME: This is a W/A only for tx timeout watch dog false alarm when * polling for inactive tx queues. */ @@ -2517,40 +2536,24 @@ static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) int mlx5e_open_locked(struct net_device *netdev) { struct mlx5e_priv *priv = netdev_priv(netdev); - struct mlx5_core_dev *mdev = priv->mdev; - int num_txqs; int err; set_bit(MLX5E_STATE_OPENED, >state); - mlx5e_netdev_set_tcs(netdev); - - num_txqs = priv->channels.params.num_channels * priv->channels.params.num_tc; - netif_set_real_num_tx_queues(netdev, num_txqs); - netif_set_real_num_rx_queues(netdev, priv->channels.params.num_channels); - err = mlx5e_open_channels(priv, >channels); if (err) goto err_clear_state_opened_flag; mlx5e_refresh_tirs(priv, false); mlx5e_activate_priv_channels(priv); - mlx5e_redirect_rqts_to_channels(priv, >channels); mlx5e_update_carrier(priv); mlx5e_timestamp_init(priv); if (priv->profile->update_stats) queue_delayed_work(priv->wq, >update_stats_work, 0); - if (MLX5_CAP_GEN(mdev, vport_group_manager)) { - err = mlx5e_add_sqs_fwd_rules(priv); - if (err) - goto err_close_channels; - } return 0; -err_close_channels: - mlx5e_close_channels(>channels); err_clear_state_opened_flag: clear_bit(MLX5E_STATE_OPENED, >state); return err; @@ -2571,7 +2574,6 @@ int mlx5e_open(struct net_device *netdev) int mlx5e_close_locked(struct net_device *netdev) { struct mlx5e_priv *priv = netdev_priv(netdev); - struct mlx5_core_dev *mdev = priv->mdev; /* May already be CLOSED in case a previous configuration operation * (e.g RX/TX queue size change) that involves close failed. @@ -2581,12 +2583,8 @@ int mlx5e_close_locked(struct net_device *netdev) clear_bit(MLX5E_STATE_OPENED, >state); - if (MLX5_CAP_GEN(mdev, vport_group_manager)) - mlx5e_remove_sqs_fwd_rules(priv); - mlx5e_timestamp_cleanup(priv); netif_carrier_off(priv->netdev); - mlx5e_redirect_rqts_to_drop(priv); mlx5e_deactivate_priv_channels(priv); mlx5e_close_channels(>channels); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index d277c1979b2a..53db5ec2c122 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -189,12 +189,13 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv) struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; struct mlx5_eswitch
[pull request][net-next 00/14] Mellanox mlx5e Fail-safe config
Hi Dave, This series provides a fail-safe mechanism to allow safely re-configuring mlx5e netdevice and provides a resiliency against sporadic configuration failures. For additional information please see below. Please pull and let me know if there's any problem. Thanks, Saeed. --- The following changes since commit 88275ed0cb3ac89ed869a925337b951801b154d7: Merge branch 'netvsc-next' (2017-03-25 20:15:56 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5e-failsafe for you to fetch changes up to 2e20a151205be8e7efa9644cdb942381e7bec787: net/mlx5e: Fail safe mtu and lro setting (2017-03-27 15:08:24 +0300) mlx5e-failsafe 27-03-2017 This series provides a fail-safe mechanism to allow safely re-configuring mlx5e netdevice and provides a resiliency against sporadic configuration failures. To enable this we do some refactoring and code reorganizing to allow breaking the drivers open/close flows to stages: open -> activate -> deactivate -> close. In addition we need to allow creating fresh HW ring resources (mlx5e_channels) with their own "new" set of parameters, while keeping the current ones running and active until the new channels are successfully created with the new configuration, and only then we can safly replace (switch) old channels with new ones. For that we introduce mlx5e_channels object and an API to manage it: - channels = open_channels(new_params): open fresh TX/RX channels - activate_channels(channels): redirect traffic to them and attach them to the netdev - deactivate_channes(channels) stop traffic and detach from netdev - close(channels) Free the TX/RX HW resources of those channels With the above strategy it is straightforward to achieve the desired behavior of fail-safe configuration. In pseudo code: make_new_config(new_params) { old_channels = current_active_channels; new_channels = create_channels(new_params); if (!new_channels) return "Failed, but current channels are still active :)" deactivate_channels(old_channels); /* Can't fail */ set_hw_new_state();/* If needed */ activate_channels(new_channels); /* Can't fail */ close_channels(old_channels); current_active_channels = new_channels; return "SUCCESS"; } At the top of this series, we change the following flows to be fail-safe: ethtool: - ring parameters - coalesce parameters - tx copy break parameters - cqe compressing/moderation mode setting (priv flags) ndos: - tc setup - set features: LRO - change mtu ---- Saeed Mahameed (14): net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation net/mlx5e: Introduce mlx5e_channels net/mlx5e: Redirect RQT refactoring net/mlx5e: Refactor refresh TIRs net/mlx5e: Split open/close channels to stages net/mlx5e: Isolate open_channels from priv->params net/mlx5e: CQ and RQ don't need priv pointer net/mlx5e: Minimize mlx5e_{open/close}_locked net/mlx5e: Introduce switch channels net/mlx5e: Fail safe ethtool settings net/mlx5e: Fail safe cqe compressing/moderation mode setting net/mlx5e: Fail safe tc setup net/mlx5e: Fail safe mtu and lro setting drivers/net/ethernet/mellanox/mlx5/core/en.h | 106 +- drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 10 +- .../net/ethernet/mellanox/mlx5/core/en_common.c| 17 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 336 +++--- .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1172 +++- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 83 +- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 22 - drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |2 +- .../net/ethernet/mellanox/mlx5/core/en_selftest.c |9 +- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 11 +- drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |3 +- 12 files changed, 984 insertions(+), 789 deletions(-)
[net-next 05/14] net/mlx5e: Refactor refresh TIRs
Rename mlx5e_refresh_tirs_self_loopback to mlx5e_refresh_tirs, as it will be used in downstream (Safe config flow) patches, and make it fail safe on mlx5e_open. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +-- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 17 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 +--- drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 9 +++-- 4 files changed, 16 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 50dfc4c6c8e4..5f7cc58d900c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -959,8 +959,7 @@ void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir); int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev); void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev); -int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev, -bool enable_uc_lb); +int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb); struct mlx5_eswitch_rep; int mlx5e_vport_rep_load(struct mlx5_eswitch *esw, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 20bdbe685795..f1f17f7a3cd0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -136,18 +136,20 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) mlx5_core_dealloc_pd(mdev, res->pdn); } -int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev, -bool enable_uc_lb) +int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) { + struct mlx5_core_dev *mdev = priv->mdev; struct mlx5e_tir *tir; - void *in; + int err = -ENOMEM; + u32 tirn = 0; int inlen; - int err = 0; + void *in; + inlen = MLX5_ST_SZ_BYTES(modify_tir_in); in = mlx5_vzalloc(inlen); if (!in) - return -ENOMEM; + goto out; if (enable_uc_lb) MLX5_SET(modify_tir_in, in, ctx.self_lb_block, @@ -156,13 +158,16 @@ int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev, MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1); list_for_each_entry(tir, >mlx5e_res.td.tirs_list, list) { - err = mlx5_core_modify_tir(mdev, tir->tirn, in, inlen); + tirn = tir->tirn; + err = mlx5_core_modify_tir(mdev, tirn, in, inlen); if (err) goto out; } out: kvfree(in); + if (err) + netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", tirn, err); return err; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index aec77f075714..a98d01684247 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2498,13 +2498,7 @@ int mlx5e_open_locked(struct net_device *netdev) goto err_clear_state_opened_flag; } - err = mlx5e_refresh_tirs_self_loopback(priv->mdev, false); - if (err) { - netdev_err(netdev, "%s: mlx5e_refresh_tirs_self_loopback_enable failed, %d\n", - __func__, err); - goto err_close_channels; - } - + mlx5e_refresh_tirs(priv, false); mlx5e_redirect_rqts_to_channels(priv, >channels); mlx5e_update_carrier(priv); mlx5e_timestamp_init(priv); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c index 5621dcfda4f1..5225f2226a67 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -236,12 +236,9 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv *priv, { int err = 0; - err = mlx5e_refresh_tirs_self_loopback(priv->mdev, true); - if (err) { - netdev_err(priv->netdev, - "\tFailed to enable UC loopback err(%d)\n", err); + err = mlx5e_refresh_tirs(priv, true); + if (err) return err; - } lbtp->loopback_ok = false; init_completion(>comp); @@ -258,7 +255,7 @@ static void mlx5e_test_loopback_cleanup(struct mlx5e_priv *priv, struct mlx5e_lbt_priv *lbtp) { dev_remove_pack(>pt); - mlx5e_refresh_tirs_self_loopback(priv->mdev, fa
[net-next 12/14] net/mlx5e: Fail safe cqe compressing/moderation mode setting
Use the new fail-safe channels switch mechanism to set new CQE compressing and CQE moderation mode settings. We also move RX CQE compression modify function out of en_rx file to a more appropriate place. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 8 +++- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 53 ++ drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 22 - 4 files changed, 51 insertions(+), 34 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 2f259dfbf844..8b93d8d02116 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -833,7 +833,7 @@ void mlx5e_pps_event_handler(struct mlx5e_priv *priv, struct ptp_clock_event *event); int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr); int mlx5e_hwstamp_get(struct net_device *dev, struct ifreq *ifr); -void mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val); +int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val); int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto, u16 vid); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c index 485c23b59f93..e706a87fc8b2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c @@ -90,6 +90,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr) { struct mlx5e_priv *priv = netdev_priv(dev); struct hwtstamp_config config; + int err; if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz)) return -EOPNOTSUPP; @@ -129,7 +130,12 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr) case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: /* Disable CQE compression */ netdev_warn(dev, "Disabling cqe compression"); - mlx5e_modify_rx_cqe_compression_locked(priv, false); + err = mlx5e_modify_rx_cqe_compression_locked(priv, false); + if (err) { + netdev_err(dev, "Failed disabling cqe compression err=%d\n", err); + mutex_unlock(>state_lock); + return err; + } config.rx_filter = HWTSTAMP_FILTER_ALL; break; default: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 457a796cc248..c5f49e294987 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1474,10 +1474,10 @@ static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable) { struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5_core_dev *mdev = priv->mdev; + struct mlx5e_channels new_channels = {}; bool rx_mode_changed; u8 rx_cq_period_mode; int err = 0; - bool reset; rx_cq_period_mode = enable ? MLX5_CQ_PERIOD_MODE_START_FROM_CQE : @@ -1491,16 +1491,51 @@ static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable) if (!rx_mode_changed) return 0; - reset = test_bit(MLX5E_STATE_OPENED, >state); - if (reset) - mlx5e_close_locked(netdev); + new_channels.params = priv->channels.params; + mlx5e_set_rx_cq_mode_params(_channels.params, rx_cq_period_mode); - mlx5e_set_rx_cq_mode_params(>channels.params, rx_cq_period_mode); + if (!test_bit(MLX5E_STATE_OPENED, >state)) { + priv->channels.params = new_channels.params; + return 0; + } + + err = mlx5e_open_channels(priv, _channels); + if (err) + return err; - if (reset) - err = mlx5e_open_locked(netdev); + mlx5e_switch_priv_channels(priv, _channels); + return 0; +} - return err; +int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool new_val) +{ + bool curr_val = MLX5E_GET_PFLAG(>channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS); + struct mlx5e_channels new_channels = {}; + int err = 0; + + if (!MLX5_CAP_GEN(priv->mdev, cqe_compression)) + return new_val ? -EOPNOTSUPP : 0; + + if (curr_val == new_val) + return 0; + + new_channels.params = priv->channels.params; + MLX5E_SET_PFLAG(_channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS, new_val)
[net-next 14/14] net/mlx5e: Fail safe mtu and lro setting
Use the new fail-safe channels switch mechanism to set new netdev mtu and lro settings. MTU and lro settings demand some HW configuration changes after new channels are created and ready for action. In order to unify switch channels routine for LRO and MTU changes, and maybe future configuration features, we now pass to it a modify HW function pointer to be invoked directly after old channels are de-activated and before new channels are activated. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 8 ++- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 12 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 70 ++ 3 files changed, 58 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 8b93d8d02116..150fb52a0737 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -867,8 +867,14 @@ int mlx5e_close_locked(struct net_device *netdev); int mlx5e_open_channels(struct mlx5e_priv *priv, struct mlx5e_channels *chs); void mlx5e_close_channels(struct mlx5e_channels *chs); + +/* Function pointer to be used to modify WH settings while + * switching channels + */ +typedef int (*mlx5e_fp_hw_modify)(struct mlx5e_priv *priv); void mlx5e_switch_priv_channels(struct mlx5e_priv *priv, - struct mlx5e_channels *new_chs); + struct mlx5e_channels *new_chs, + mlx5e_fp_hw_modify hw_modify); void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev, u32 *indirection_rqt, int len, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index c5f49e294987..40912937d211 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -540,7 +540,7 @@ static int mlx5e_set_ringparam(struct net_device *dev, if (err) goto unlock; - mlx5e_switch_priv_channels(priv, _channels); + mlx5e_switch_priv_channels(priv, _channels, NULL); unlock: mutex_unlock(>state_lock); @@ -597,7 +597,7 @@ static int mlx5e_set_channels(struct net_device *dev, mlx5e_arfs_disable(priv); /* Switch to new channels, set new parameters and close old ones */ - mlx5e_switch_priv_channels(priv, _channels); + mlx5e_switch_priv_channels(priv, _channels, NULL); if (arfs_enabled) { err = mlx5e_arfs_enable(priv); @@ -691,7 +691,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev, if (err) goto out; - mlx5e_switch_priv_channels(priv, _channels); + mlx5e_switch_priv_channels(priv, _channels, NULL); out: mutex_unlock(>state_lock); @@ -1166,7 +1166,7 @@ static int mlx5e_set_tunable(struct net_device *dev, err = mlx5e_open_channels(priv, _channels); if (err) break; - mlx5e_switch_priv_channels(priv, _channels); + mlx5e_switch_priv_channels(priv, _channels, NULL); break; default: @@ -1503,7 +1503,7 @@ static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable) if (err) return err; - mlx5e_switch_priv_channels(priv, _channels); + mlx5e_switch_priv_channels(priv, _channels, NULL); return 0; } @@ -1534,7 +1534,7 @@ int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool new_val if (err) return err; - mlx5e_switch_priv_channels(priv, _channels); + mlx5e_switch_priv_channels(priv, _channels, NULL); return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 1e29f40d84ca..68d6c3c58ba7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2437,9 +2437,9 @@ static void mlx5e_query_mtu(struct mlx5e_priv *priv, u16 *mtu) *mtu = MLX5E_HW2SW_MTU(hw_mtu); } -static int mlx5e_set_dev_port_mtu(struct net_device *netdev) +static int mlx5e_set_dev_port_mtu(struct mlx5e_priv *priv) { - struct mlx5e_priv *priv = netdev_priv(netdev); + struct net_device *netdev = priv->netdev; u16 mtu; int err; @@ -2534,7 +2534,8 @@ static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) } void mlx5e_switch_priv_channels(struct mlx5e_priv *priv, - struct mlx5e_channels *new_chs) + struct mlx5e_channels *new_chs, + mlx5e_f
[net-next 01/14] net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel
Instead of iterating over the channel SQs to set their max rate, do it on SQ creation per TXQ SQ. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index e849a0fc2653..469d6c147db7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1207,6 +1207,9 @@ static int mlx5e_create_sq_rdy(struct mlx5e_priv *priv, return err; } +static int mlx5e_set_sq_maxrate(struct net_device *dev, + struct mlx5e_txqsq *sq, u32 rate); + static int mlx5e_open_txqsq(struct mlx5e_channel *c, int tc, struct mlx5e_sq_param *param, @@ -1214,6 +1217,8 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c, { struct mlx5e_create_sq_param csp = {}; struct mlx5e_priv *priv = c->priv; + u32 tx_rate; + int txq_ix; int err; err = mlx5e_alloc_txqsq(c, tc, param, sq); @@ -1230,6 +1235,11 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c, if (err) goto err_free_txqsq; + txq_ix = c->ix + tc * priv->params.num_channels; + tx_rate = priv->tx_rates[txq_ix]; + if (tx_rate) + mlx5e_set_sq_maxrate(priv->netdev, sq, tx_rate); + netdev_tx_reset_queue(sq->txq); netif_tx_start_queue(sq->txq); return 0; @@ -1692,7 +1702,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, int cpu = mlx5e_get_cpu(priv, ix); struct mlx5e_channel *c; int err; - int i; c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu)); if (!c) @@ -1745,17 +1754,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, if (err) goto err_close_icosq; - for (i = 0; i < priv->params.num_tc; i++) { - u32 txq_ix = priv->channeltc_to_txq_map[ix][i]; - - if (priv->tx_rates[txq_ix]) { - struct mlx5e_txqsq *sq = priv->txq_to_sq_map[txq_ix]; - - mlx5e_set_sq_maxrate(priv->netdev, sq, -priv->tx_rates[txq_ix]); - } - } - err = c->xdp ? mlx5e_open_xdpsq(c, >xdp_sq, >rq.xdpsq) : 0; if (err) goto err_close_sqs; -- 2.11.0
[net-next 04/14] net/mlx5e: Redirect RQT refactoring
RQ Tables are always created once (on netdev creation) pointing to drop RQ and at that stage, RQ tables (indirection tables) are always directed to drop RQ. We don't need to use mlx5e_fill_{direct,indir}_rqt_rqns to fill the drop RQ in create RQT procedure. Instead of having separate flows to redirect direct and indirect RQ Tables to the current active channels Receive Queues (RQs), we unify the two flows by introducing mlx5e_redirect_rqt function and redirect_rqt_param struct. Combined, they provide one generic logic to fill the RQ table RQ numbers regardless of the RQ table purpose (direct/indirect). Demonstrated the usage with mlx5e_redirect_rqts_to_channels which will be called on mlx5e_open and with mlx5e_redirect_rqts_to_drop which will be called on mlx5e_close. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 14 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 24 ++- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 169 - 3 files changed, 129 insertions(+), 78 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index b00c6688ddcf..50dfc4c6c8e4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -843,7 +843,19 @@ void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv); int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd); -int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix); +struct mlx5e_redirect_rqt_param { + bool is_rss; + union { + u32 rqn; /* Direct RQN (Non-RSS) */ + struct { + u8 hfunc; + struct mlx5e_channels *channels; + } rss; /* RSS data */ + }; +}; + +int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, + struct mlx5e_redirect_rqt_param rrp); void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc, enum mlx5e_traffic_types tt); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 2e54a6564d86..faa21848c9dc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1027,20 +1027,28 @@ static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir, mutex_lock(>state_lock); - if (indir) { - u32 rqtn = priv->indir_rqt.rqtn; - - memcpy(priv->params.indirection_rqt, indir, - sizeof(priv->params.indirection_rqt)); - mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, 0); - } - if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != priv->params.rss_hfunc) { priv->params.rss_hfunc = hfunc; hash_changed = true; } + if (indir) { + memcpy(priv->params.indirection_rqt, indir, + sizeof(priv->params.indirection_rqt)); + + if (test_bit(MLX5E_STATE_OPENED, >state)) { + u32 rqtn = priv->indir_rqt.rqtn; + struct mlx5e_redirect_rqt_param rrp = { + .is_rss = true, + .rss.hfunc = priv->params.rss_hfunc, + .rss.channels = >channels + }; + + mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, rrp); + } + } + if (key) { memcpy(priv->params.toeplitz_hash_key, key, sizeof(priv->params.toeplitz_hash_key)); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 920e72ae992e..aec77f075714 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2046,61 +2046,15 @@ static void mlx5e_close_channels(struct mlx5e_priv *priv) chs->num = 0; } -static int mlx5e_rx_hash_fn(int hfunc) -{ - return (hfunc == ETH_RSS_HASH_TOP) ? - MLX5_RX_HASH_FN_TOEPLITZ : - MLX5_RX_HASH_FN_INVERTED_XOR8; -} - -static int mlx5e_bits_invert(unsigned long a, int size) -{ - int inv = 0; - int i; - - for (i = 0; i < size; i++) - inv |= (test_bit(size - i - 1, ) ? 1 : 0) << i; - - return inv; -} - -static void mlx5e_fill_indir_rqt_rqns(struct mlx5e_priv *priv, void *rqtc) -{ - int i; - - for (i = 0; i < MLX5E_INDIR_RQT_SIZE; i++) { - int ix = i; - u32 rqn; - - if (priv->params.rss_hfunc == ETH_RSS_HASH
[net-next 06/14] net/mlx5e: Split open/close channels to stages
As a foundation for safe config flow, a simple clear API such as (Open then Activate) where the "Open" handles the heavy unsafe creation operation and the "activate" will be fast and fail safe, to enable the newly created channels. For this we split the RQs/TXQ SQs and channels open/close flows to open => activate, deactivate => close. This will simplify the ability to have fail safe configuration changes in downstream patches as follows: make_new_config(new_params) { old_channels = current_active_channels; new_channels = create_channels(new_params); if (!new_channels) return "Failed, but current channels still active :)" deactivate_channels(old_channels); /* Can't fail */ activate_channels(new_channels); /* Can't fail */ close_channels(old_channels); current_active_channels = new_channels; return "SUCCESS"; } Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 214 ++--- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 4 +- 4 files changed, 148 insertions(+), 77 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 5f7cc58d900c..f1895ebe7fe5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -358,6 +358,7 @@ struct mlx5e_txqsq { struct mlx5_wq_ctrlwq_ctrl; struct mlx5e_channel *channel; inttc; + inttxq_ix; u32rate_limit; } cacheline_aligned_in_smp; @@ -732,8 +733,8 @@ struct mlx5e_profile { struct mlx5e_priv { /* priv data path fields - start */ - struct mlx5e_txqsq **txq_to_sq_map; - int channeltc_to_txq_map[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC]; + struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC]; + int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC]; struct bpf_prog *xdp_prog; /* priv data path fields - end */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index faa21848c9dc..5159358a242d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -269,7 +269,7 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv *priv, uint8_t *data) for (j = 0; j < NUM_SQ_STATS; j++) sprintf(data + (idx++) * ETH_GSTRING_LEN, sq_stats_desc[j].format, - priv->channeltc_to_txq_map[i][tc]); + priv->channel_tc2txq[i][tc]); } static void mlx5e_get_strings(struct net_device *dev, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a98d01684247..6be7c2367d41 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -820,6 +820,8 @@ static int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq) msleep(20); } + netdev_warn(priv->netdev, "Failed to get min RX wqes on RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n", + rq->rqn, wq->cur_sz, priv->params.min_rx_wqes); return -ETIMEDOUT; } @@ -848,9 +850,6 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_rq_param *param, struct mlx5e_rq *rq) { - struct mlx5e_icosq *sq = >icosq; - u16 pi = sq->pc & sq->wq.sz_m1; - struct mlx5e_tx_wqe *nopwqe; int err; err = mlx5e_alloc_rq(c, param, rq); @@ -861,7 +860,6 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, if (err) goto err_free_rq; - set_bit(MLX5E_RQ_STATE_ENABLED, >state); err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY); if (err) goto err_destroy_rq; @@ -869,14 +867,9 @@ static int mlx5e_open_rq(struct mlx5e_channel *c, if (param->am_enabled) set_bit(MLX5E_RQ_STATE_AM, >rq.state); - sq->db.ico_wqe[pi].opcode = MLX5_OPCODE_NOP; - sq->db.ico_wqe[pi].num_wqebbs = 1; - nopwqe = mlx5e_post_nop(>wq, sq->sqn, >pc); - mlx5e_notify_hw(>wq, sq->pc, sq->uar_map, >ctrl); return 0; err_destroy_rq: - clear_bit(MLX5E_RQ_STATE_ENABLED, >state); mlx5e_destroy_rq(rq); err_free
[net-next 11/14] net/mlx5e: Fail safe ethtool settings
Use the new fail-safe channels switch mechanism to set new ethtool settings: - ring parameters - coalesce parameters - tx copy break parameters Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 120 + 1 file changed, 73 insertions(+), 47 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index e5cee400a4d3..457a796cc248 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -457,8 +457,8 @@ static int mlx5e_set_ringparam(struct net_device *dev, { struct mlx5e_priv *priv = netdev_priv(dev); int rq_wq_type = priv->channels.params.rq_wq_type; + struct mlx5e_channels new_channels = {}; u32 rx_pending_wqes; - bool was_opened; u32 min_rq_size; u32 max_rq_size; u8 log_rq_size; @@ -527,16 +527,22 @@ static int mlx5e_set_ringparam(struct net_device *dev, mutex_lock(>state_lock); - was_opened = test_bit(MLX5E_STATE_OPENED, >state); - if (was_opened) - mlx5e_close_locked(dev); + new_channels.params = priv->channels.params; + new_channels.params.log_rq_size = log_rq_size; + new_channels.params.log_sq_size = log_sq_size; - priv->channels.params.log_rq_size = log_rq_size; - priv->channels.params.log_sq_size = log_sq_size; + if (!test_bit(MLX5E_STATE_OPENED, >state)) { + priv->channels.params = new_channels.params; + goto unlock; + } - if (was_opened) - err = mlx5e_open_locked(dev); + err = mlx5e_open_channels(priv, _channels); + if (err) + goto unlock; + + mlx5e_switch_priv_channels(priv, _channels); +unlock: mutex_unlock(>state_lock); return err; @@ -623,36 +629,13 @@ static int mlx5e_get_coalesce(struct net_device *netdev, return 0; } -static int mlx5e_set_coalesce(struct net_device *netdev, - struct ethtool_coalesce *coal) +static void +mlx5e_set_priv_channels_coalesce(struct mlx5e_priv *priv, struct ethtool_coalesce *coal) { - struct mlx5e_priv *priv= netdev_priv(netdev); struct mlx5_core_dev *mdev = priv->mdev; - bool restart = - !!coal->use_adaptive_rx_coalesce != priv->channels.params.rx_am_enabled; - bool was_opened; - int err = 0; int tc; int i; - if (!MLX5_CAP_GEN(mdev, cq_moderation)) - return -EOPNOTSUPP; - - mutex_lock(>state_lock); - - was_opened = test_bit(MLX5E_STATE_OPENED, >state); - if (was_opened && restart) { - mlx5e_close_locked(netdev); - priv->channels.params.rx_am_enabled = !!coal->use_adaptive_rx_coalesce; - } - - priv->channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs; - priv->channels.params.tx_cq_moderation.pkts = coal->tx_max_coalesced_frames; - priv->channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs; - priv->channels.params.rx_cq_moderation.pkts = coal->rx_max_coalesced_frames; - - if (!was_opened || restart) - goto out; for (i = 0; i < priv->channels.num; ++i) { struct mlx5e_channel *c = priv->channels.c[i]; @@ -667,11 +650,50 @@ static int mlx5e_set_coalesce(struct net_device *netdev, coal->rx_coalesce_usecs, coal->rx_max_coalesced_frames); } +} -out: - if (was_opened && restart) - err = mlx5e_open_locked(netdev); +static int mlx5e_set_coalesce(struct net_device *netdev, + struct ethtool_coalesce *coal) +{ + struct mlx5e_priv *priv= netdev_priv(netdev); + struct mlx5_core_dev *mdev = priv->mdev; + struct mlx5e_channels new_channels = {}; + int err = 0; + bool reset; + if (!MLX5_CAP_GEN(mdev, cq_moderation)) + return -EOPNOTSUPP; + + mutex_lock(>state_lock); + new_channels.params = priv->channels.params; + + new_channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs; + new_channels.params.tx_cq_moderation.pkts = coal->tx_max_coalesced_frames; + new_channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs; + new_channels.params.rx_cq_moderation.pkts = coal->rx_max_coalesced_frames; + new_channels.params.rx_am_enabled = !!coal->use_adaptive_rx_coalesce; + + if (!test_bit(MLX5E_STATE_OPENED, >state)) { + priv->channels.params =
[net-next 13/14] net/mlx5e: Fail safe tc setup
Use the new fail-safe channels switch mechanism to set up new tc parameters. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 97e153209834..1e29f40d84ca 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2910,7 +2910,7 @@ int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd) static int mlx5e_setup_tc(struct net_device *netdev, u8 tc) { struct mlx5e_priv *priv = netdev_priv(netdev); - bool was_opened; + struct mlx5e_channels new_channels = {}; int err = 0; if (tc && tc != MLX5E_MAX_NUM_TC) @@ -2918,17 +2918,21 @@ static int mlx5e_setup_tc(struct net_device *netdev, u8 tc) mutex_lock(>state_lock); - was_opened = test_bit(MLX5E_STATE_OPENED, >state); - if (was_opened) - mlx5e_close_locked(priv->netdev); + new_channels.params = priv->channels.params; + new_channels.params.num_tc = tc ? tc : 1; - priv->channels.params.num_tc = tc ? tc : 1; + if (test_bit(MLX5E_STATE_OPENED, >state)) { + priv->channels.params = new_channels.params; + goto out; + } - if (was_opened) - err = mlx5e_open_locked(priv->netdev); + err = mlx5e_open_channels(priv, _channels); + if (err) + goto out; + mlx5e_switch_priv_channels(priv, _channels); +out: mutex_unlock(>state_lock); - return err; } -- 2.11.0
[net-next 10/14] net/mlx5e: Introduce switch channels
A fail safe helper functions that allows switching to new channels on the fly, In simple words: make_new_config(new_params) { new_channels = open_channels(new_params); if (!new_channels) return "Failed, but current channels are still active :)" switch_channels(new_channels); return "SUCCESS"; } Demonstrate mlx5e_switch_priv_channels usage in set channels ethtool callback and make it fail-safe using the new switch channels mechanism. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Tariq Toukan <tar...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 7 + .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 29 - drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 30 +++--- 3 files changed, 51 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 44c454b34754..2f259dfbf844 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -863,6 +863,13 @@ void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params *params, int mlx5e_open_locked(struct net_device *netdev); int mlx5e_close_locked(struct net_device *netdev); + +int mlx5e_open_channels(struct mlx5e_priv *priv, + struct mlx5e_channels *chs); +void mlx5e_close_channels(struct mlx5e_channels *chs); +void mlx5e_switch_priv_channels(struct mlx5e_priv *priv, + struct mlx5e_channels *new_chs); + void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev, u32 *indirection_rqt, int len, int num_channels); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index b2cd0ef7921e..e5cee400a4d3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -556,8 +556,8 @@ static int mlx5e_set_channels(struct net_device *dev, { struct mlx5e_priv *priv = netdev_priv(dev); unsigned int count = ch->combined_count; + struct mlx5e_channels new_channels = {}; bool arfs_enabled; - bool was_opened; int err = 0; if (!count) { @@ -571,22 +571,27 @@ static int mlx5e_set_channels(struct net_device *dev, mutex_lock(>state_lock); - was_opened = test_bit(MLX5E_STATE_OPENED, >state); - if (was_opened) - mlx5e_close_locked(dev); + new_channels.params = priv->channels.params; + new_channels.params.num_channels = count; + mlx5e_build_default_indir_rqt(priv->mdev, new_channels.params.indirection_rqt, + MLX5E_INDIR_RQT_SIZE, count); + + if (!test_bit(MLX5E_STATE_OPENED, >state)) { + priv->channels.params = new_channels.params; + goto out; + } + + /* Create fresh channels with new parameters */ + err = mlx5e_open_channels(priv, _channels); + if (err) + goto out; arfs_enabled = dev->features & NETIF_F_NTUPLE; if (arfs_enabled) mlx5e_arfs_disable(priv); - priv->channels.params.num_channels = count; - mlx5e_build_default_indir_rqt(priv->mdev, priv->channels.params.indirection_rqt, - MLX5E_INDIR_RQT_SIZE, count); - - if (was_opened) - err = mlx5e_open_locked(dev); - if (err) - goto out; + /* Switch to new channels, set new parameters and close old ones */ + mlx5e_switch_priv_channels(priv, _channels); if (arfs_enabled) { err = mlx5e_arfs_enable(priv); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a94f84ec2c1a..97e153209834 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1972,8 +1972,8 @@ static void mlx5e_build_channel_param(struct mlx5e_priv *priv, mlx5e_build_ico_cq_param(priv, icosq_log_wq_sz, >icosq_cq); } -static int mlx5e_open_channels(struct mlx5e_priv *priv, - struct mlx5e_channels *chs) +int mlx5e_open_channels(struct mlx5e_priv *priv, + struct mlx5e_channels *chs) { struct mlx5e_channel_param *cparam; int err = -ENOMEM; @@ -2037,7 +2037,7 @@ static void mlx5e_deactivate_channels(struct mlx5e_channels *chs) mlx5e_deactivate_channel(chs->c[i]); } -static void mlx5e_close_channels(struct mlx5e_channels *chs) +void mlx5e_close_channels(struct mlx5e_channels *chs) { int i; @@ -2533,6 +2533,30 @@ static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv)
Re: mlx5e backports for v4.9 -stable
On 03/17/2017 02:06 AM, David Miller wrote: > > Commits: > > > From b0d4660b4cc52e6477ca3a43435351d565dfcedc Mon Sep 17 00:00:00 2001 > From: Tariq Toukan <tar...@mellanox.com> > Date: Wed, 22 Feb 2017 17:20:14 +0200 > Subject: [PATCH] net/mlx5e: Fix broken CQE compression initialization > > > and > > > From 6dc4b54e77282caf17f0ff72aa32dd296037fbc0 Mon Sep 17 00:00:00 2001 > From: Saeed Mahameed <sae...@mellanox.com> > Date: Wed, 22 Feb 2017 17:20:15 +0200 > Subject: [PATCH] net/mlx5e: Update MPWQE stride size when modifying CQE > compress state > > > do not apply even closely to v4.9 while I was working on -stable backports. > > Please provide proper backports of these two patches if you want them to > show up in v4.9 -stable. > Hi Dave, thank you for trying, we will provide the patches, but I don't know what is the right procedure to do so. is it ok to post the patches applied on top tag v4.9.16 of kernel/git/stable/linux-stable.git ? to whom should I send them ? thanks, Saeed.
[PATCH net 0/8] Mellanox mlx5 fixes 2017-03-21
Hi Dave, This series contains some mlx5 core and ethernet driver fixes. For -stable: net/mlx5e: Count LRO packets correctly (for kernel >= 4.2) net/mlx5e: Count GSO packets correctly (for kernel >= 4.2) net/mlx5: Increase number of max QPs in default profile (for kernel >= 4.0) net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps (for kernel >= 4.10) net/mlx5e: Use the proper UAPI values when offloading TC vlan actions (for kernel >= v4.9) net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured (for kernel >= 4.10) net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch (for kernel >= 4.10) net/mlx5: Add missing entries for set/query rate limit commands (for kernel >= 4.8) Thanks, Saeed. Gal Pressman (2): net/mlx5e: Count GSO packets correctly net/mlx5e: Count LRO packets correctly Maor Gottlieb (1): net/mlx5: Increase number of max QPs in default profile Or Gerlitz (3): net/mlx5: Add missing entries for set/query rate limit commands net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch net/mlx5e: Use the proper UAPI values when offloading TC vlan actions Paul Blakey (1): net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps Roi Dayan (1): net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 ++ drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 -- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 +-- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 - drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 4 ++ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 74 +++--- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 5 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 6 ++ .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 22 +++ drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- 10 files changed, 94 insertions(+), 37 deletions(-) -- 2.11.0
[PATCH net 8/8] net/mlx5e: Count LRO packets correctly
From: Gal Pressman <g...@mellanox.com> RX packets statistics ('rx_packets' counter) used to count LRO packets as one, even though it contains multiple segments. This patch will increment the counter by the number of segments, and align the driver with the behavior of other drivers in the stack. Note that no information is lost in this patch due to 'rx_lro_packets' counter existence. Before, ethtool showed: $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets" rx_packets: 435277 rx_lro_packets: 35847 rx_packets_phy: 1935066 Now, we will see the more logical statistics: $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets" rx_packets: 1935066 rx_lro_packets: 35847 rx_packets_phy: 1935066 Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Gal Pressman <g...@mellanox.com> Cc: kernel-t...@fb.com Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 3d371688fbbb..bafcb349a50c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -601,6 +601,10 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, if (lro_num_seg > 1) { mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt); skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt, lro_num_seg); + /* Subtract one since we already counted this as one +* "regular" packet in mlx5e_complete_rx_cqe() +*/ + rq->stats.packets += lro_num_seg - 1; rq->stats.lro_packets++; rq->stats.lro_bytes += cqe_bcnt; } -- 2.11.0
[PATCH net 5/8] net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
From: Paul Blakey <pa...@mellanox.com> This was added to allow the TC offloading code to identify offloading encap/decap vxlan rules. The VF reps are effectively related to the same mlx5 PCI device as the PF. Since the kernel invokes the (say) delete ndo for each netdev, the FW erred on multiple vxlan dst port deletes when the port was deleted from the system. We fix that by keeping the registration to be carried out only by the PF. Since the PF serves as the uplink device, the VF reps will look up a port there and realize if they are ok to offload that. Tested: ip link add vxlan1 type vxlan id 44 dev ens5f0 dstport ip link set vxlan1 up ip link del dev vxlan1 Fixes: 4a25730eb202 ('net/mlx5e: Add ndo_udp_tunnel_add to VF representors') Signed-off-by: Paul Blakey <pa...@mellanox.com> Reviewed-by: Or Gerlitz <ogerl...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 -- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 +++-- 4 files changed, 11 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index f6a6ded204f6..dc52053128bc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -928,10 +928,6 @@ void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv); int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev); void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev); u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout); -void mlx5e_add_vxlan_port(struct net_device *netdev, - struct udp_tunnel_info *ti); -void mlx5e_del_vxlan_port(struct net_device *netdev, - struct udp_tunnel_info *ti); int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev, void *sp); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 8ef64c4db2c2..66c133757a5e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3100,8 +3100,8 @@ static int mlx5e_get_vf_stats(struct net_device *dev, vf_stats); } -void mlx5e_add_vxlan_port(struct net_device *netdev, - struct udp_tunnel_info *ti) +static void mlx5e_add_vxlan_port(struct net_device *netdev, +struct udp_tunnel_info *ti) { struct mlx5e_priv *priv = netdev_priv(netdev); @@ -3114,8 +3114,8 @@ void mlx5e_add_vxlan_port(struct net_device *netdev, mlx5e_vxlan_queue_work(priv, ti->sa_family, be16_to_cpu(ti->port), 1); } -void mlx5e_del_vxlan_port(struct net_device *netdev, - struct udp_tunnel_info *ti) +static void mlx5e_del_vxlan_port(struct net_device *netdev, +struct udp_tunnel_info *ti) { struct mlx5e_priv *priv = netdev_priv(netdev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 2c864574a9d5..f621373bd7a5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -393,8 +393,6 @@ static const struct net_device_ops mlx5e_netdev_ops_rep = { .ndo_get_phys_port_name = mlx5e_rep_get_phys_port_name, .ndo_setup_tc= mlx5e_rep_ndo_setup_tc, .ndo_get_stats64 = mlx5e_rep_get_stats, - .ndo_udp_tunnel_add = mlx5e_add_vxlan_port, - .ndo_udp_tunnel_del = mlx5e_del_vxlan_port, .ndo_has_offload_stats = mlx5e_has_offload_stats, .ndo_get_offload_stats = mlx5e_get_offload_stats, }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 9c13abaf3885..fade7233dac5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -267,12 +267,15 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, skb_flow_dissector_target(f->dissector, FLOW_DISSECTOR_KEY_ENC_PORTS, f->mask); + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + struct net_device *up_dev = mlx5_eswitch_get_uplink_netdev(esw); + struct mlx5e_priv *up_priv = netdev_priv(up_dev); /* Full udp dst port must be given */ if (memchr_inv(>dst, 0xff, sizeof(mask->dst)))
[PATCH net 4/8] net/mlx5e: Use the proper UAPI values when offloading TC vlan actions
From: Or Gerlitz <ogerl...@mellanox.com> Currently we use the non UAPI values and we miss erring on the modify action which is not supported, fix that. Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reported-by: Petr Machata <pe...@mellanox.com> Reviewed-by: Jiri Pirko <j...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 2825b5665456..9c13abaf3885 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1131,14 +1131,16 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, } if (is_tcf_vlan(a)) { - if (tcf_vlan_action(a) == VLAN_F_POP) { + if (tcf_vlan_action(a) == TCA_VLAN_ACT_POP) { attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP; - } else if (tcf_vlan_action(a) == VLAN_F_PUSH) { + } else if (tcf_vlan_action(a) == TCA_VLAN_ACT_PUSH) { if (tcf_vlan_push_proto(a) != htons(ETH_P_8021Q)) return -EOPNOTSUPP; attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH; attr->vlan = tcf_vlan_push_vid(a); + } else { /* action is TCA_VLAN_ACT_MODIFY */ + return -EOPNOTSUPP; } continue; } -- 2.11.0
[PATCH net 6/8] net/mlx5: Increase number of max QPs in default profile
From: Maor Gottlieb <ma...@mellanox.com> With ConnectX-4 sharing SRQs from the same space as QPs, we hit a limit preventing some applications to allocate needed QPs amount. Double the size to 256K. Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb <ma...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index e2bd600d19de..60154a175bd3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -87,7 +87,7 @@ static struct mlx5_profile profile[] = { [2] = { .mask = MLX5_PROF_MASK_QP_SIZE | MLX5_PROF_MASK_MR_CACHE, - .log_max_qp = 17, + .log_max_qp = 18, .mr_cache[0]= { .size = 500, .limit = 250 -- 2.11.0
[PATCH net 7/8] net/mlx5e: Count GSO packets correctly
From: Gal Pressman <g...@mellanox.com> TX packets statistics ('tx_packets' counter) used to count GSO packets as one, even though it contains multiple segments. This patch will increment the counter by the number of segments, and align the driver with the behavior of other drivers in the stack. Note that no information is lost in this patch due to 'tx_tso_packets' counter existence. Before, ethtool showed: $ ethtool -S ens6 | egrep "tx_packets|tx_tso_packets" tx_packets: 61340 tx_tso_packets: 60954 tx_packets_phy: 2451115 Now, we will see the more logical statistics: $ ethtool -S ens6 | egrep "tx_packets|tx_tso_packets" tx_packets: 2451115 tx_tso_packets: 60954 tx_packets_phy: 2451115 Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Gal Pressman <g...@mellanox.com> Cc: kernel-t...@fb.com Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index f193128bac4b..57f5e2d7ebd1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -274,15 +274,18 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb) sq->stats.tso_bytes += skb->len - ihs; } + sq->stats.packets += skb_shinfo(skb)->gso_segs; num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs; } else { bf = sq->bf_budget && !skb->xmit_more && !skb_shinfo(skb)->nr_frags; ihs = mlx5e_get_inline_hdr_size(sq, skb, bf); + sq->stats.packets++; num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN); } + sq->stats.bytes += num_bytes; wi->num_bytes = num_bytes; ds_cnt = sizeof(*wqe) / MLX5_SEND_WQE_DS; @@ -381,8 +384,6 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb) if (bf) sq->bf_budget--; - sq->stats.packets++; - sq->stats.bytes += num_bytes; return NETDEV_TX_OK; dma_unmap_wqe_err: -- 2.11.0
Re: mlx5e backports for v4.9 -stable
On Mon, Mar 20, 2017 at 11:32 PM, Saeed Mahameed <sae...@mellanox.com> wrote: > > > On 03/17/2017 02:06 AM, David Miller wrote: >> >> Commits: >> >> >> From b0d4660b4cc52e6477ca3a43435351d565dfcedc Mon Sep 17 00:00:00 2001 >> From: Tariq Toukan <tar...@mellanox.com> >> Date: Wed, 22 Feb 2017 17:20:14 +0200 >> Subject: [PATCH] net/mlx5e: Fix broken CQE compression initialization >> >> >> and >> >> ==== >> From 6dc4b54e77282caf17f0ff72aa32dd296037fbc0 Mon Sep 17 00:00:00 2001 >> From: Saeed Mahameed <sae...@mellanox.com> >> Date: Wed, 22 Feb 2017 17:20:15 +0200 >> Subject: [PATCH] net/mlx5e: Update MPWQE stride size when modifying CQE >> compress state >> >> >> do not apply even closely to v4.9 while I was working on -stable backports. >> >> Please provide proper backports of these two patches if you want them to >> show up in v4.9 -stable. >> > > Hi Dave, > > thank you for trying, we will provide the patches, but I don't know what is > the right procedure > to do so. > > is it ok to post the patches applied on top tag v4.9.16 of > kernel/git/stable/linux-stable.git ? > to whom should I send them ? > Actually, it looks like the fixes tag is not accurate, the bug is exposed only with the new user control of CQE compression feature which should have took care of those issues, 9bcc86064bb5 ("net/mlx5e: Add CQE compression user control") and this patch was only introduced in 4.10. No need to apply those patches into 4.9-stable. Sorry for the inconvenience. thanks, Saeed.
[PATCH net 3/8] net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured
From: Roi Dayan <r...@mellanox.com> Changing the eswitch inline mode can potentially cause already configured flows not to match the policy. E.g. set policy L4, add some L4 rules, set policy to L2 --> bad! Hence we disallow it. Keep track of how many offloaded rules are now set and refuse inline mode changes if this isn't zero. Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") Signed-off-by: Roi Dayan <r...@mellanox.com> Reviewed-by: Or Gerlitz <ogerl...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 8 2 files changed, 9 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 9227a83a97e3..ad329b1680b4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -209,6 +209,7 @@ struct mlx5_esw_offload { struct mlx5_eswitch_rep *vport_reps; DECLARE_HASHTABLE(encap_tbl, 8); u8 inline_mode; + u64 num_flows; }; struct mlx5_eswitch { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index bfabefe20ac0..307ec6c5fd3b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -93,6 +93,8 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, spec, _act, dest, i); if (IS_ERR(rule)) mlx5_fc_destroy(esw->dev, counter); + else + esw->offloads.num_flows++; return rule; } @@ -108,6 +110,7 @@ mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw, counter = mlx5_flow_rule_counter(rule); mlx5_del_flow_rules(rule); mlx5_fc_destroy(esw->dev, counter); + esw->offloads.num_flows--; } } @@ -922,6 +925,11 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode) MLX5_CAP_INLINE_MODE_VPORT_CONTEXT) return -EOPNOTSUPP; + if (esw->offloads.num_flows > 0) { + esw_warn(dev, "Can't set inline mode when flows are configured\n"); + return -EOPNOTSUPP; + } + err = esw_inline_mode_from_devlink(mode, _mode); if (err) goto out; -- 2.11.0
[PATCH net 1/8] net/mlx5: Add missing entries for set/query rate limit commands
From: Or Gerlitz <ogerl...@mellanox.com> The switch cases for the rate limit set and query commands were missing, which could get us wrong under fw error or driver reset flow, fix that. Fixes: 1466cc5b23d1 ('net/mlx5: Rate limit tables support') Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Hadar Hen Zion <had...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index caa837e5e2b9..a380353a78c2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -361,6 +361,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_QUERY_VPORT_COUNTER: case MLX5_CMD_OP_ALLOC_Q_COUNTER: case MLX5_CMD_OP_QUERY_Q_COUNTER: + case MLX5_CMD_OP_SET_RATE_LIMIT: + case MLX5_CMD_OP_QUERY_RATE_LIMIT: case MLX5_CMD_OP_ALLOC_PD: case MLX5_CMD_OP_ALLOC_UAR: case MLX5_CMD_OP_CONFIG_INT_MODERATION: @@ -497,6 +499,8 @@ const char *mlx5_command_str(int command) MLX5_COMMAND_STR_CASE(ALLOC_Q_COUNTER); MLX5_COMMAND_STR_CASE(DEALLOC_Q_COUNTER); MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER); + MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT); + MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT); MLX5_COMMAND_STR_CASE(ALLOC_PD); MLX5_COMMAND_STR_CASE(DEALLOC_PD); MLX5_COMMAND_STR_CASE(ALLOC_UAR); -- 2.11.0
[PATCH net 2/8] net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch
From: Or Gerlitz <ogerl...@mellanox.com> Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch offloading use case, and push the latter into the e-switch code. This provides better separation and is to be used in down-stream patch for applying a fix. Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 59 ++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 5 ++ .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 14 + 3 files changed, 58 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 79481f4cf264..2825b5665456 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -133,6 +133,23 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, return rule; } +static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, + struct mlx5e_tc_flow *flow) +{ + struct mlx5_fc *counter = NULL; + + if (!IS_ERR(flow->rule)) { + counter = mlx5_flow_rule_counter(flow->rule); + mlx5_del_flow_rules(flow->rule); + mlx5_fc_destroy(priv->mdev, counter); + } + + if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) { + mlx5_destroy_flow_table(priv->fs.tc.t); + priv->fs.tc.t = NULL; + } +} + static struct mlx5_flow_handle * mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec, @@ -149,7 +166,24 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, } static void mlx5e_detach_encap(struct mlx5e_priv *priv, - struct mlx5e_tc_flow *flow) { + struct mlx5e_tc_flow *flow); + +static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, + struct mlx5e_tc_flow *flow) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + + mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->attr); + + mlx5_eswitch_del_vlan_action(esw, flow->attr); + + if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) + mlx5e_detach_encap(priv, flow); +} + +static void mlx5e_detach_encap(struct mlx5e_priv *priv, + struct mlx5e_tc_flow *flow) +{ struct list_head *next = flow->encap.next; list_del(>encap); @@ -173,25 +207,10 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv, static void mlx5e_tc_del_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow) { - struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; - struct mlx5_fc *counter = NULL; - - if (!IS_ERR(flow->rule)) { - counter = mlx5_flow_rule_counter(flow->rule); - mlx5_del_flow_rules(flow->rule); - mlx5_fc_destroy(priv->mdev, counter); - } - - if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { - mlx5_eswitch_del_vlan_action(esw, flow->attr); - if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) - mlx5e_detach_encap(priv, flow); - } - - if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) { - mlx5_destroy_flow_table(priv->fs.tc.t); - priv->fs.tc.t = NULL; - } + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) + mlx5e_tc_del_fdb_flow(priv, flow); + else + mlx5e_tc_del_nic_flow(priv, flow); } static void parse_vxlan_attr(struct mlx5_flow_spec *spec, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 5b78883d5654..9227a83a97e3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -271,6 +271,11 @@ struct mlx5_flow_handle * mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, struct mlx5_flow_spec *spec, struct mlx5_esw_flow_attr *attr); +void +mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw, + struct mlx5_flow_handle *rule, + struct mlx5_esw_flow_attr *attr); + struct mlx5_flow_handle * mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 4f5b0d47d5f3..bfabefe20
Re: [PATCH net-next] net/mlx5e: fix build error without CONFIG_SYSFS
On Wed, Apr 5, 2017 at 5:11 AM, Tobias Regnery <tobias.regn...@gmail.com> wrote: > Commit 9008ae074885 ("net/mlx5e: Minimize mlx5e_{open/close}_locked") > copied the calls to netif_set_real_num_{tx,rx}_queues from > mlx5e_open_locked to mlx5e_activate_priv_channels and wraps them in an > if condition to test for netdev->real_num_{tx,rx}_queues. > > But netdev->real_num_rx_queues is conditionally compiled in if CONFIG_SYSFS > is set. Without CONFIG_SYSFS the build fails: > > drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function > 'mlx5e_activate_priv_channels': > drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2515:12: error: 'struct > net_device' has no member named 'real_num_rx_queues'; did you mean > 'real_num_tx_queues'? > > Fix this by unconditionally call netif_set_real_num{tx,rx}_queues like before > commit 9008ae074885. > > Fixes: 9008ae074885 ("net/mlx5e: Minimize mlx5e_{open/close}_locked") > Signed-off-by: Tobias Regnery <tobias.regn...@gmail.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > index ec389b1b51cb..d5248637d44f 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > @@ -2510,10 +2510,8 @@ static void mlx5e_activate_priv_channels(struct > mlx5e_priv *priv) > struct net_device *netdev = priv->netdev; > > mlx5e_netdev_set_tcs(netdev); > - if (netdev->real_num_tx_queues != num_txqs) > - netif_set_real_num_tx_queues(netdev, num_txqs); > - if (netdev->real_num_rx_queues != priv->channels.num) > - netif_set_real_num_rx_queues(netdev, priv->channels.num); > + netif_set_real_num_tx_queues(netdev, num_txqs); > + netif_set_real_num_rx_queues(netdev, priv->channels.num); > Acked-by: Saeed Mahameed <sae...@mellanox.com> Thanks Tobias for the fix, although it is redundant for most of the reconfiguration options of mlx5 to call set_real_num_{rx,tx} queues every time, but it is not that big of a deal, so it is ok with me to align the code with the previous behavior we had before ("net/mlx5e: Minimize mlx5e_{open/close}_locked").
Re: [PATCH net-next] mlx4: trust shinfo->gso_segs
On Wed, Apr 5, 2017 at 11:49 AM, Eric Dumazet <eric.duma...@gmail.com> wrote: > From: Eric Dumazet <eduma...@google.com> > > mlx4 is the only driver in the tree making a point to recompute > shinfo->gso_segs. > > Lets remove superfluous code. > > Signed-off-by: Eric Dumazet <eduma...@google.com> > Cc: Tariq Toukan <tar...@mellanox.com> > Cc: Saeed Mahameed <sae...@mellanox.com> > --- > drivers/net/ethernet/mellanox/mlx4/en_tx.c |3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c > b/drivers/net/ethernet/mellanox/mlx4/en_tx.c > index > e0c5ffb3e3a6607456e1f191b0b8c8becfc71219..3ba89bc43d74d8c023776079bcd0bbadd70fb5c6 > 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c > @@ -978,8 +978,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct > net_device *dev) > > ring->tso_packets++; > > - i = ((skb->len - lso_header_size) / shinfo->gso_size) + > - !!((skb->len - lso_header_size) % shinfo->gso_size); > + i = shinfo->gso_segs; > tx_info->nr_bytes = skb->len + (i - 1) * lso_header_size; > ring->packets += i; > } else { > > Reviewed-by: Saeed Mahameed <sae...@mellanox.com>
[PATCH net-next 09/16] net/mlx5e: IPoIB, Basic netdev ndos open/close
Implement open/close of IPoIB netdevice ndos using mlx5e's channels API to manage data path resources (RQs/SQs/CQs). Set IPoIB netdev address on dev_init ndo. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 90 ++- 3 files changed, 93 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 5345d875b695..23b92ec54e12 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -883,6 +883,8 @@ typedef int (*mlx5e_fp_hw_modify)(struct mlx5e_priv *priv); void mlx5e_switch_priv_channels(struct mlx5e_priv *priv, struct mlx5e_channels *new_chs, mlx5e_fp_hw_modify hw_modify); +void mlx5e_activate_priv_channels(struct mlx5e_priv *priv); +void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv); void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev, u32 *indirection_rqt, int len, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 1fde4e2301a4..eb657987e9b5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2547,7 +2547,7 @@ static void mlx5e_build_channels_tx_maps(struct mlx5e_priv *priv) } } -static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) +void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) { int num_txqs = priv->channels.num * priv->channels.params.num_tc; struct net_device *netdev = priv->netdev; @@ -2567,7 +2567,7 @@ static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) mlx5e_redirect_rqts_to_channels(priv, >channels); } -static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) +void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) { mlx5e_redirect_rqts_to_drop(priv); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c index d7d705c840ae..e188d067bc97 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -34,6 +34,18 @@ #include "en.h" #include "ipoib.h" +static int mlx5i_open(struct net_device *netdev); +static int mlx5i_close(struct net_device *netdev); +static int mlx5i_dev_init(struct net_device *dev); +static void mlx5i_dev_cleanup(struct net_device *dev); + +static const struct net_device_ops mlx5i_netdev_ops = { + .ndo_open= mlx5i_open, + .ndo_stop= mlx5i_close, + .ndo_init= mlx5i_dev_init, + .ndo_uninit = mlx5i_dev_cleanup, +}; + /* IPoIB mlx5 netdev profile */ /* Called directly after IPoIB netdevice was created to initialize SW structs */ @@ -52,7 +64,17 @@ static void mlx5i_init(struct mlx5_core_dev *mdev, mlx5e_build_nic_params(mdev, >channels.params, profile->max_nch(mdev)); mutex_init(>state_lock); - /* TODO : init netdev features here */ + + netdev->hw_features|= NETIF_F_SG; + netdev->hw_features|= NETIF_F_IP_CSUM; + netdev->hw_features|= NETIF_F_IPV6_CSUM; + netdev->hw_features|= NETIF_F_GRO; + netdev->hw_features|= NETIF_F_TSO; + netdev->hw_features|= NETIF_F_TSO6; + netdev->hw_features|= NETIF_F_RXCSUM; + netdev->hw_features|= NETIF_F_RXHASH; + + netdev->netdev_ops = _netdev_ops; } /* Called directly before IPoIB netdevice is destroyed to cleanup SW structs */ @@ -181,6 +203,72 @@ static const struct mlx5e_profile mlx5i_nic_profile = { .max_tc= MLX5I_MAX_NUM_TC, }; +/* mlx5i netdev NDos */ + +static int mlx5i_dev_init(struct net_device *dev) +{ + struct mlx5e_priv*priv = mlx5i_epriv(dev); + struct mlx5i_priv*ipriv = priv->ppriv; + + /* Set dev address using underlay QP */ + dev->dev_addr[1] = (ipriv->qp.qpn >> 16) & 0xff; + dev->dev_addr[2] = (ipriv->qp.qpn >> 8) & 0xff; + dev->dev_addr[3] = (ipriv->qp.qpn) & 0xff; + + return 0; +} + +static void mlx5i_dev_cleanup(struct net_device *dev) +{ + /* TODO: detach underlay qp from flow-steering by reset it */ +} + +static int mlx5i_open(struct net_device *netdev) +{ + struct mlx5e_priv *priv = mlx5i_epriv(netdev); + int err; + + mutex_lock(>state_lock); + + set_bit(MLX5E_STATE_OPENED, >state); + + err = mlx5e_open_cha
[PATCH net-next 07/16] net/mlx5e: IPoIB, RSS flow steering tables
Like the mlx5e ethernet mode, on IPoIB mode we need to create RX steering tables, but IPoIB do not require MAC and VLAN steering tables so the only tables we create in here are: 1. TTC Table (Traffic Type Classifier table for RSS steering) 2. ARFS Table (for accelerated RFS support) Creation of those tables is identical to mlx5e ethernet mode, hence the use of mlx5e_create_ttc_table and mlx5e_arfs_create_tables. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h| 4 +++ drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 7 ++-- drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 46 + 3 files changed, 54 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index e5518536d56f..c813eab5d764 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -999,6 +999,7 @@ int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr); void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe); void mlx5e_update_hw_rep_counters(struct mlx5e_priv *priv); +/* common netdev helpers */ int mlx5e_create_indirect_rqt(struct mlx5e_priv *priv); int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv); @@ -1010,6 +1011,9 @@ int mlx5e_create_direct_tirs(struct mlx5e_priv *priv); void mlx5e_destroy_direct_tirs(struct mlx5e_priv *priv); void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt); +int mlx5e_create_ttc_table(struct mlx5e_priv *priv, u32 underlay_qpn); +void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv); + int mlx5e_create_tises(struct mlx5e_priv *priv); void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv); int mlx5e_close(struct net_device *netdev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 729904c43801..576d6787b484 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -792,7 +792,7 @@ static int mlx5e_create_ttc_table_groups(struct mlx5e_ttc_table *ttc) return err; } -static void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv) +void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv) { struct mlx5e_ttc_table *ttc = >fs.ttc; @@ -800,7 +800,7 @@ static void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv) mlx5e_destroy_flow_table(>ft); } -static int mlx5e_create_ttc_table(struct mlx5e_priv *priv) +int mlx5e_create_ttc_table(struct mlx5e_priv *priv, u32 underlay_qpn) { struct mlx5e_ttc_table *ttc = >fs.ttc; struct mlx5_flow_table_attr ft_attr = {}; @@ -810,6 +810,7 @@ static int mlx5e_create_ttc_table(struct mlx5e_priv *priv) ft_attr.max_fte = MLX5E_TTC_TABLE_SIZE; ft_attr.level = MLX5E_TTC_FT_LEVEL; ft_attr.prio = MLX5E_NIC_PRIO; + ft_attr.underlay_qpn = underlay_qpn; ft->t = mlx5_create_flow_table(priv->fs.ns, _attr); if (IS_ERR(ft->t)) { @@ -1146,7 +1147,7 @@ int mlx5e_create_flow_steering(struct mlx5e_priv *priv) priv->netdev->hw_features &= ~NETIF_F_NTUPLE; } - err = mlx5e_create_ttc_table(priv); + err = mlx5e_create_ttc_table(priv, 0); if (err) { netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n", err); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c index f0318920844e..e16e1c7b246e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -72,6 +72,45 @@ static void mlx5i_cleanup_tx(struct mlx5e_priv *priv) { } +static int mlx5i_create_flow_steering(struct mlx5e_priv *priv) +{ + struct mlx5i_priv *ipriv = priv->ppriv; + int err; + + priv->fs.ns = mlx5_get_flow_namespace(priv->mdev, + MLX5_FLOW_NAMESPACE_KERNEL); + + if (!priv->fs.ns) + return -EINVAL; + + err = mlx5e_arfs_create_tables(priv); + if (err) { + netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n", + err); + priv->netdev->hw_features &= ~NETIF_F_NTUPLE; + } + + err = mlx5e_create_ttc_table(priv, ipriv->qp.qpn); + if (err) { + netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n", + err); + goto err_destroy_arfs_tables; + } + + return 0; + +err_destroy_arfs_tables: + mlx5e_arfs_destroy_tables(priv); + + return err; +} + +static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv) +{ + mlx5e_destroy_tt
[PATCH net-next 04/16] net/mlx5e: More generic netdev management API
In preparation for mlx5e RDMA net_device support, here we generalize mlx5e_attach/detach in a way that those functions will be agnostic to link type. For that we move ethernet specific NIC net device logic out of those functions into {nic,rep}_{enable/disable} mlx5e NIC and representor profiles callbacks. Also some of the logic was moved only to NIC profile since it is not right to have this logic for representor net device (e.g. set port MTU). Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 15 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 160 +++--- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 12 +- 3 files changed, 96 insertions(+), 91 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index b7feecfbb5a5..ced31906b8fd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -999,12 +999,6 @@ void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv); int mlx5e_close(struct net_device *netdev); int mlx5e_open(struct net_device *netdev); void mlx5e_update_stats_work(struct work_struct *work); -struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev, - const struct mlx5e_profile *profile, - void *ppriv); -void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv); -int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev); -void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev); u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout); int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev, @@ -1013,4 +1007,13 @@ bool mlx5e_has_offload_stats(const struct net_device *dev, int attr_id); bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv); bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv); + +/* mlx5e generic netdev management API */ +struct net_device* +mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *profile, + void *ppriv); +int mlx5e_attach_netdev(struct mlx5e_priv *priv); +void mlx5e_detach_netdev(struct mlx5e_priv *priv); +void mlx5e_destroy_netdev(struct mlx5e_priv *priv); + #endif /* __MLX5_EN_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 8b7b7e604ea0..cdc34ba354c8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4121,12 +4121,57 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv) return 0; } +static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev) +{ + struct mlx5_eswitch *esw = mdev->priv.eswitch; + int total_vfs = MLX5_TOTAL_VPORTS(mdev); + int vport; + u8 mac[ETH_ALEN]; + + if (!MLX5_CAP_GEN(mdev, vport_group_manager)) + return; + + mlx5_query_nic_vport_mac_address(mdev, 0, mac); + + for (vport = 1; vport < total_vfs; vport++) { + struct mlx5_eswitch_rep rep; + + rep.load = mlx5e_vport_rep_load; + rep.unload = mlx5e_vport_rep_unload; + rep.vport = vport; + ether_addr_copy(rep.hw_id, mac); + mlx5_eswitch_register_vport_rep(esw, vport, ); + } +} + +static void mlx5e_unregister_vport_rep(struct mlx5_core_dev *mdev) +{ + struct mlx5_eswitch *esw = mdev->priv.eswitch; + int total_vfs = MLX5_TOTAL_VPORTS(mdev); + int vport; + + if (!MLX5_CAP_GEN(mdev, vport_group_manager)) + return; + + for (vport = 1; vport < total_vfs; vport++) + mlx5_eswitch_unregister_vport_rep(esw, vport); +} + static void mlx5e_nic_enable(struct mlx5e_priv *priv) { struct net_device *netdev = priv->netdev; struct mlx5_core_dev *mdev = priv->mdev; struct mlx5_eswitch *esw = mdev->priv.eswitch; struct mlx5_eswitch_rep rep; + u16 max_mtu; + + mlx5e_init_l2_addr(priv); + + /* MTU range: 68 - hw-specific max */ + netdev->min_mtu = ETH_MIN_MTU; + mlx5_query_port_max_mtu(priv->mdev, _mtu, 1); + netdev->max_mtu = MLX5E_HW2SW_MTU(max_mtu); + mlx5e_set_dev_port_mtu(priv); mlx5_lag_add(mdev, netdev); @@ -4141,6 +4186,8 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv) mlx5_eswitch_register_vport_rep(esw, 0, ); } + mlx5e_register_vport_rep(mdev); + if (netdev->reg_state != NETREG_REGISTERED) return; @@ -4152,6 +4199,12 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv) } queue_work(priv->wq, >set_rx_mode_work); + +
[PATCH net-next 02/16] net/mlx5: Refactor create flow table method to accept underlay QP
From: Erez Shitrit <ere...@mellanox.com> IB flow tables need the underlay qp to perform flow steering. Here we change the API of the flow tables creation to accept the underlay QP number as a parameter in order to support IB (IPoIB) flow steering. Signed-off-by: Erez Shitrit <ere...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 ++- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 25 +-- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 16 +++-- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 8 +++ drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 84 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 1 + include/linux/mlx5/fs.h| 14 ++-- 8 files changed, 113 insertions(+), 50 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index c4e9cc79f5c7..c8a005326e30 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -321,10 +321,16 @@ static int arfs_create_table(struct mlx5e_priv *priv, { struct mlx5e_arfs_tables *arfs = >fs.arfs; struct mlx5e_flow_table *ft = >arfs_tables[type].ft; + struct mlx5_flow_table_attr ft_attr = {}; int err; - ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO, - MLX5E_ARFS_TABLE_SIZE, MLX5E_ARFS_FT_LEVEL, 0); + ft->num_groups = 0; + + ft_attr.max_fte = MLX5E_ARFS_TABLE_SIZE; + ft_attr.level = MLX5E_ARFS_FT_LEVEL; + ft_attr.prio = MLX5E_NIC_PRIO; + + ft->t = mlx5_create_flow_table(priv->fs.ns, _attr); if (IS_ERR(ft->t)) { err = PTR_ERR(ft->t); ft->t = NULL; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 5376d69a6b1a..729904c43801 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -803,11 +803,15 @@ static void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv) static int mlx5e_create_ttc_table(struct mlx5e_priv *priv) { struct mlx5e_ttc_table *ttc = >fs.ttc; + struct mlx5_flow_table_attr ft_attr = {}; struct mlx5e_flow_table *ft = >ft; int err; - ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO, - MLX5E_TTC_TABLE_SIZE, MLX5E_TTC_FT_LEVEL, 0); + ft_attr.max_fte = MLX5E_TTC_TABLE_SIZE; + ft_attr.level = MLX5E_TTC_FT_LEVEL; + ft_attr.prio = MLX5E_NIC_PRIO; + + ft->t = mlx5_create_flow_table(priv->fs.ns, _attr); if (IS_ERR(ft->t)) { err = PTR_ERR(ft->t); ft->t = NULL; @@ -973,12 +977,16 @@ static int mlx5e_create_l2_table(struct mlx5e_priv *priv) { struct mlx5e_l2_table *l2_table = >fs.l2; struct mlx5e_flow_table *ft = _table->ft; + struct mlx5_flow_table_attr ft_attr = {}; int err; ft->num_groups = 0; - ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO, - MLX5E_L2_TABLE_SIZE, MLX5E_L2_FT_LEVEL, 0); + ft_attr.max_fte = MLX5E_L2_TABLE_SIZE; + ft_attr.level = MLX5E_L2_FT_LEVEL; + ft_attr.prio = MLX5E_NIC_PRIO; + + ft->t = mlx5_create_flow_table(priv->fs.ns, _attr); if (IS_ERR(ft->t)) { err = PTR_ERR(ft->t); ft->t = NULL; @@ -1076,11 +1084,16 @@ static int mlx5e_create_vlan_table_groups(struct mlx5e_flow_table *ft) static int mlx5e_create_vlan_table(struct mlx5e_priv *priv) { struct mlx5e_flow_table *ft = >fs.vlan.ft; + struct mlx5_flow_table_attr ft_attr = {}; int err; ft->num_groups = 0; - ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO, - MLX5E_VLAN_TABLE_SIZE, MLX5E_VLAN_FT_LEVEL, 0); + + ft_attr.max_fte = MLX5E_VLAN_TABLE_SIZE; + ft_attr.level = MLX5E_VLAN_FT_LEVEL; + ft_attr.prio = MLX5E_NIC_PRIO; + + ft->t = mlx5_create_flow_table(priv->fs.ns, _attr); if (IS_ERR(ft->t)) { err = PTR_ERR(ft->t); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index fcd5bc7e31db..b3281d1118b3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -337,6 +337,7 @@ esw_fdb_set_vport_promisc_rule(struct mlx5_eswitch *esw, u32 vport) static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports) { int inlen = MLX5_ST_SZ_BYTES(crea
[PATCH net-next 00/16] Mellanox, mlx5 RDMA net device support
Hi Dave and Doug. This series provides the lower level mlx5 support of RDMA netdevice creation API [1] suggested and introduced by Intel's HFI OPA VNIC netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation. mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current IPoIB ULP generic netdevice, providing: - mlx5 RSS support. - mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..). - Full mlx5 HW features transparent to the ULP itself. The idea here is to reuse and benefit from the already implemented mlx5e netdevice management and channels API for both etherent and RDMA netdevices, since both IPoIB and Ethernet netdevices share same common mlx5 HW resources (with some small exceptions) and share most of the control/data path logic, it is more natural to have them share the same code. The differences between IPoIB and Ethernet netdevices can be summarized to: Steering: In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet the traffic is handled by vports and vport steering is managed by e-switch or FW. For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to them (underlay QP will be 0 in case of Ethernet). RX,TX: Since IPoIB traffic is different, slightly modified RX and TX handlers are required, still we do some code reuse in data path via common helper functions. All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet and IPoIB netdevices, e.g. - Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..) - Offloads, checksum, GRO, LRO, TSO, and more. - netdevice logic and non Ethernet specific ndos (open/close, etc..) In order to achieve what we want: In patchet 1 to 3, Erez added the supported for underlay QP in mlx5_ifc and refactored the mlx5 steering code to accept the underlay QP as a parameter for creating steering objects and enabled flow steering for IB link. Then we are going to use the mlx5e netdevice profile, which is already used to separate between NIC and VF representors netdevices, to create new type of IPoIB netdevice profile. For that, one small refactoring is required to make mlx5e netdevice profile management more genetic and agnostic to link type which is done in patch #4. In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches, using mlx5e already existing APIs. Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS resources, same as mlx5e but without vlan and L2 steering tables. Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources same as mlx5e but with one TC (tc = 0) support. Patch #9, Implement mlx5i open/close ndos, where we reuese the mlx5e channels API, to start/stop TX/RX channels. Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts. Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function and implement the mlx5i IPoIB xmit routine. Patch #13 and #14, Have an RX handler per netdevice profile. We already do this before this series in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers. Patch #15, Small cleanup to avoid e-switch with IPoIB netdev. In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offolad support [3] - which was alread submitted to the rdma mailing list - and this series is required plus an extra small patch [4] which will connect between both sides and actually enables the offload. Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable the feature. Thanks, Saeed. [1] https://patchwork.kernel.org/patch/9676637/ [2] https://lwn.net/Articles/715453/ https://patchwork.kernel.org/patch/9587815/ [3] https://patchwork.kernel.org/patch/9672069/ [4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8 Erez Shitrit (4): net/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc net/mlx5: Refactor create flow table method to accept underlay QP net/mlx5: Enable flow-steering for IB link hw/mlx5: Add New bit to check over QP creation Saeed Mahameed (12): net/mlx5e: More generic netdev management API net/mlx5e: IPoIB, Add netdevice profile skeleton net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs net/mlx5e: IPoIB, RSS flow steering tables net/mlx5e: IPoIB, TX TIS creation net/mlx5e: IPoIB, Basic netdev ndos open/close net/mlx5e: IPoIB, Underlay QP net/mlx5e: Xmit flow break down net
[PATCH net-next 03/16] net/mlx5: Enable flow-steering for IB link
From: Erez Shitrit <ere...@mellanox.com> Get the relevant capabilities if supports ipoib_enhanced_offloads and init the flow steering table accordingly. Signed-off-by: Erez Shitrit <ere...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 11 --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 3 ++- 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 55182d0b06e8..b8a176503d38 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1905,9 +1905,6 @@ void mlx5_cleanup_fs(struct mlx5_core_dev *dev) { struct mlx5_flow_steering *steering = dev->priv.steering; - if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH) - return; - cleanup_root_ns(steering->root_ns); cleanup_root_ns(steering->esw_egress_root_ns); cleanup_root_ns(steering->esw_ingress_root_ns); @@ -2010,9 +2007,6 @@ int mlx5_init_fs(struct mlx5_core_dev *dev) struct mlx5_flow_steering *steering; int err = 0; - if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH) - return 0; - err = mlx5_init_fc_stats(dev); if (err) return err; @@ -2023,7 +2017,10 @@ int mlx5_init_fs(struct mlx5_core_dev *dev) steering->dev = dev; dev->priv.steering = steering; - if (MLX5_CAP_GEN(dev, nic_flow_table) && + if MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) && + (MLX5_CAP_GEN(dev, nic_flow_table))) || +((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) && + MLX5_CAP_GEN(dev, ipoib_enhanced_offloads))) && MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) { err = init_root_ns(steering); if (err) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index d0bbefa08af7..1bc14d0fded8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -137,7 +137,8 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) return err; } - if (MLX5_CAP_GEN(dev, nic_flow_table)) { + if (MLX5_CAP_GEN(dev, nic_flow_table) || + MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) { err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE); if (err) return err; -- 2.11.0
[PATCH net-next 08/16] net/mlx5e: IPoIB, TX TIS creation
Modify mlx5e tis creation function to accept underlay qp number, which will be needed by IPoIB. Implement mlx5i (IPoIB) tx init/cleanup netdevice profile flows to create one TIS with the IPoIB underlay qp, for IPoIB TX SQs. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 ++ drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 14 -- 3 files changed, 26 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index c813eab5d764..5345d875b695 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -1014,6 +1014,10 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt); int mlx5e_create_ttc_table(struct mlx5e_priv *priv, u32 underlay_qpn); void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv); +int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc, +u32 underlay_qpn, u32 *tisn); +void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn); + int mlx5e_create_tises(struct mlx5e_priv *priv); void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv); int mlx5e_close(struct net_device *netdev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 08b67aa24644..1fde4e2301a4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2759,24 +2759,25 @@ static void mlx5e_close_drop_rq(struct mlx5e_rq *drop_rq) mlx5e_free_cq(_rq->cq); } -static int mlx5e_create_tis(struct mlx5e_priv *priv, int tc) +int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc, +u32 underlay_qpn, u32 *tisn) { - struct mlx5_core_dev *mdev = priv->mdev; u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {0}; void *tisc = MLX5_ADDR_OF(create_tis_in, in, ctx); MLX5_SET(tisc, tisc, prio, tc << 1); + MLX5_SET(tisc, tisc, underlay_qpn, underlay_qpn); MLX5_SET(tisc, tisc, transport_domain, mdev->mlx5e_res.td.tdn); if (mlx5_lag_is_lacp_owner(mdev)) MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1); - return mlx5_core_create_tis(mdev, in, sizeof(in), >tisn[tc]); + return mlx5_core_create_tis(mdev, in, sizeof(in), tisn); } -static void mlx5e_destroy_tis(struct mlx5e_priv *priv, int tc) +void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn) { - mlx5_core_destroy_tis(priv->mdev, priv->tisn[tc]); + mlx5_core_destroy_tis(mdev, tisn); } int mlx5e_create_tises(struct mlx5e_priv *priv) @@ -2785,7 +2786,7 @@ int mlx5e_create_tises(struct mlx5e_priv *priv) int tc; for (tc = 0; tc < priv->profile->max_tc; tc++) { - err = mlx5e_create_tis(priv, tc); + err = mlx5e_create_tis(priv->mdev, tc, 0, >tisn[tc]); if (err) goto err_close_tises; } @@ -2794,7 +2795,7 @@ int mlx5e_create_tises(struct mlx5e_priv *priv) err_close_tises: for (tc--; tc >= 0; tc--) - mlx5e_destroy_tis(priv, tc); + mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]); return err; } @@ -2804,7 +2805,7 @@ void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv) int tc; for (tc = 0; tc < priv->profile->max_tc; tc++) - mlx5e_destroy_tis(priv, tc); + mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]); } static void mlx5e_build_indir_tir_ctx(struct mlx5e_priv *priv, @@ -3841,6 +3842,7 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev, mlx5e_set_rq_params(mdev, params); /* HW LRO */ + /* TODO: && MLX5_CAP_ETH(mdev, lro_cap) */ if (params->rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) params->lro_en = true; params->lro_timeout = mlx5e_choose_lro_timeout(mdev, MLX5E_DEFAULT_LRO_TIMEOUT); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c index e16e1c7b246e..d7d705c840ae 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -63,13 +63,23 @@ static void mlx5i_cleanup(struct mlx5e_priv *priv) static int mlx5i_init_tx(struct mlx5e_priv *priv) { + struct mlx5i_priv *ipriv = priv->ppriv; + int err; + /* TODO: Create IPoIB underlay QP */ - /* TODO: create IPoIB TX HW TIS */ + + err = mlx5e_create_tis(priv->mdev, 0 /* tc */, ipriv->qp.qpn, >tisn[0]); + if (err) { + mlx5_core_warn(priv->mdev, "create tis failed, %d\n", err); +
[PATCH net-next 13/16] net/mlx5e: RX handlers per netdev profile
In order to have different RX handler per profile, fix and refactor the current code to take the rx handler directly from the netdevice profile rather than computing it on runtime as it was done with the switchdev mode representor rx handler. This will also remove the current wrong assumption in mlx5e_alloc_rq code that mlx5e_priv->ppriv is of the type vport_rep. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 +++- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 28 ++- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 4 +++- 3 files changed, 24 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 25185f8c3562..0881325fba04 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -779,6 +779,10 @@ struct mlx5e_profile { void(*disable)(struct mlx5e_priv *priv); void(*update_stats)(struct mlx5e_priv *priv); int (*max_nch)(struct mlx5_core_dev *mdev); + struct { + mlx5e_fp_handle_rx_cqe handle_rx_cqe; + mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe; + } rx_handlers; int max_tc; }; @@ -1032,7 +1036,6 @@ int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev, bool mlx5e_has_offload_stats(const struct net_device *dev, int attr_id); bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv); -bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv); /* mlx5e generic netdev management API */ struct net_device* diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 2201b7ea05f4..6a164aff404c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -585,15 +585,17 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, switch (rq->wq_type) { case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: - if (mlx5e_is_vf_vport_rep(c->priv)) { - err = -EINVAL; - goto err_rq_wq_destroy; - } - rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq; rq->alloc_wqe = mlx5e_alloc_rx_mpwqe; rq->dealloc_wqe = mlx5e_dealloc_rx_mpwqe; + rq->handle_rx_cqe = c->priv->profile->rx_handlers.handle_rx_cqe_mpwqe; + if (!rq->handle_rx_cqe) { + err = -EINVAL; + netdev_err(c->netdev, "RX handler of MPWQE RQ is not set, err %d\n", err); + goto err_rq_wq_destroy; + } + rq->mpwqe_stride_sz = BIT(params->mpwqe_log_stride_sz); rq->mpwqe_num_strides = BIT(params->mpwqe_log_num_strides); @@ -616,15 +618,17 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, err = -ENOMEM; goto err_rq_wq_destroy; } - - if (mlx5e_is_vf_vport_rep(c->priv)) - rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep; - else - rq->handle_rx_cqe = mlx5e_handle_rx_cqe; - rq->alloc_wqe = mlx5e_alloc_rx_wqe; rq->dealloc_wqe = mlx5e_dealloc_rx_wqe; + rq->handle_rx_cqe = c->priv->profile->rx_handlers.handle_rx_cqe; + if (!rq->handle_rx_cqe) { + kfree(rq->dma_info); + err = -EINVAL; + netdev_err(c->netdev, "RX handler of RQ is not set, err %d\n", err); + goto err_rq_wq_destroy; + } + rq->buff.wqe_sz = params->lro_en ? params->lro_wqe_sz : MLX5E_SW2HW_MTU(c->netdev->mtu); @@ -4229,6 +4233,8 @@ static const struct mlx5e_profile mlx5e_nic_profile = { .disable = mlx5e_nic_disable, .update_stats = mlx5e_update_stats, .max_nch = mlx5e_get_max_num_channels, + .rx_handlers.handle_rx_cqe = mlx5e_handle_rx_cqe, + .rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq, .max_tc= MLX5E_MAX_NUM_TC, }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index da85b0ad3e92..16b683e8226d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -329,7 +329,7 @@ bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv) return false; } -bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv) +static bool mlx5e_is_vf_vpor
[PATCH net-next 12/16] net/mlx5e: IPoIB, Xmit flow
Implement mlx5e's IPoIB SKB transmit using the helper functions provided by mlx5e ethernet tx flow, the only difference in the code between mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill (UD datagram segment) in the TX descriptor (WQE) and it doesn't need to have any vlan handling. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/infiniband/hw/mlx5/mlx5_ib.h| 10 --- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 87 + drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 10 +++ drivers/net/ethernet/mellanox/mlx5/core/ipoib.h | 3 + include/linux/mlx5/qp.h | 10 +++ 5 files changed, 110 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 3cd064b5f0bf..ce8ba617d46e 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -729,16 +729,6 @@ static inline struct mlx5_ib_mw *to_mmw(struct ib_mw *ibmw) return container_of(ibmw, struct mlx5_ib_mw, ibmw); } -struct mlx5_ib_ah { - struct ib_ahibah; - struct mlx5_av av; -}; - -static inline struct mlx5_ib_ah *to_mah(struct ib_ah *ibah) -{ - return container_of(ibah, struct mlx5_ib_ah, ibah); -} - int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt, struct mlx5_db *db); void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index ba664a1126cf..dda7db503043 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -503,3 +503,90 @@ void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq) sq->cc += wi->num_wqebbs; } } + +#ifdef CONFIG_MLX5_CORE_IPOIB + +struct mlx5_wqe_eth_pad { + u8 rsvd0[16]; +}; + +struct mlx5i_tx_wqe { + struct mlx5_wqe_ctrl_seg ctrl; + struct mlx5_wqe_datagram_seg datagram; + struct mlx5_wqe_eth_pad pad; + struct mlx5_wqe_eth_seg eth; +}; + +static inline void +mlx5i_txwqe_build_datagram(struct mlx5_av *av, u32 dqpn, u32 dqkey, + struct mlx5_wqe_datagram_seg *dseg) +{ + memcpy(>av, av, sizeof(struct mlx5_av)); + dseg->av.dqp_dct = cpu_to_be32(dqpn | MLX5_EXTENDED_UD_AV); + dseg->av.key.qkey.qkey = cpu_to_be32(dqkey); +} + +netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb, + struct mlx5_av *av, u32 dqpn, u32 dqkey) +{ + struct mlx5_wq_cyc *wq = >wq; + u16 pi = sq->pc & wq->sz_m1; + struct mlx5i_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(wq, pi); + struct mlx5e_tx_wqe_info *wi = >db.wqe_info[pi]; + + struct mlx5_wqe_ctrl_seg *cseg = >ctrl; + struct mlx5_wqe_datagram_seg *datagram = >datagram; + struct mlx5_wqe_eth_seg *eseg = >eth; + + unsigned char *skb_data = skb->data; + unsigned int skb_len = skb->len; + u8 opcode = MLX5_OPCODE_SEND; + unsigned int num_bytes; + int num_dma; + u16 headlen; + u16 ds_cnt; + u16 ihs; + + memset(wqe, 0, sizeof(*wqe)); + + mlx5i_txwqe_build_datagram(av, dqpn, dqkey, datagram); + + mlx5e_txwqe_build_eseg_csum(sq, skb, eseg); + + if (skb_is_gso(skb)) { + opcode = MLX5_OPCODE_LSO; + ihs = mlx5e_txwqe_build_eseg_gso(sq, skb, eseg, _bytes); + } else { + ihs = mlx5e_calc_min_inline(sq->min_inline_mode, skb); + num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN); + } + + ds_cnt = sizeof(*wqe) / MLX5_SEND_WQE_DS; + if (ihs) { + memcpy(eseg->inline_hdr.start, skb_data, ihs); + mlx5e_tx_skb_pull_inline(_data, _len, ihs); + eseg->inline_hdr.sz = cpu_to_be16(ihs); + ds_cnt += DIV_ROUND_UP(ihs - sizeof(eseg->inline_hdr.start), MLX5_SEND_WQE_DS); + } + + headlen = skb_len - skb->data_len; + num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb_data, headlen, + (struct mlx5_wqe_data_seg *)cseg + ds_cnt); + if (unlikely(num_dma < 0)) + goto dma_unmap_wqe_err; + + mlx5e_txwqe_complete(sq, skb, opcode, ds_cnt + num_dma, +num_bytes, num_dma, wi, cseg); + + return NETDEV_TX_OK; + +dma_unmap_wqe_err: + sq->stats.dropped++; + mlx5e_dma_unmap_wqe_err(sq, wi->num_dma); + + dev_kfree_skb_any(skb); + + return NETDEV_TX_OK; +} + +#endif diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/cor
[PATCH net-next 15/16] net/mlx5e: E-switch vport manager is valid for ethernet only
Currently the driver support only ethernet eswitch, and we want to protect downstream IPoIB netdev from trying to access it in IB link. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 6a164aff404c..061b20c73071 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2548,6 +2548,12 @@ static void mlx5e_build_channels_tx_maps(struct mlx5e_priv *priv) } } +static bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev) +{ + return (MLX5_CAP_GEN(mdev, vport_group_manager) && + MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH); +} + void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) { int num_txqs = priv->channels.num * priv->channels.params.num_tc; @@ -2561,7 +2567,7 @@ void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) mlx5e_activate_channels(>channels); netif_tx_start_all_queues(priv->netdev); - if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + if (mlx5e_is_eswitch_vport_mngr(priv->mdev)) mlx5e_add_sqs_fwd_rules(priv); mlx5e_wait_channels_min_rx_wqes(>channels); @@ -2572,7 +2578,7 @@ void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv) { mlx5e_redirect_rqts_to_drop(priv); - if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + if (mlx5e_is_eswitch_vport_mngr(priv->mdev)) mlx5e_remove_sqs_fwd_rules(priv); /* FIXME: This is a W/A only for tx timeout watch dog false alarm when -- 2.11.0
[PATCH net-next 14/16] net/mlx5e: IPoIB, RX handler
Implement IPoIB RX SKB handler. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 78 + drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 2 + drivers/net/ethernet/mellanox/mlx5/core/ipoib.h | 1 + 3 files changed, 81 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 1a9532b31635..43308243f519 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -1031,3 +1031,81 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq) mlx5e_page_release(rq, di, false); } } + +#ifdef CONFIG_MLX5_CORE_IPOIB + +#define MLX5_IB_GRH_DGID_OFFSET 24 +#define MLX5_IB_GRH_BYTES 40 +#define MLX5_IPOIB_ENCAP_LEN4 +#define MLX5_GID_SIZE 16 + +static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq, +struct mlx5_cqe64 *cqe, +u32 cqe_bcnt, +struct sk_buff *skb) +{ + struct net_device *netdev = rq->netdev; + u8 *dgid; + u8 g; + + g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3; + dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET; + if ((!g) || dgid[0] != 0xff) + skb->pkt_type = PACKET_HOST; + else if (memcmp(dgid, netdev->broadcast + 4, MLX5_GID_SIZE) == 0) + skb->pkt_type = PACKET_BROADCAST; + else + skb->pkt_type = PACKET_MULTICAST; + + /* TODO: IB/ipoib: Allow mcast packets from other VFs +* 68996a6e760e5c74654723eeb57bf65628ae87f4 +*/ + + skb_pull(skb, MLX5_IB_GRH_BYTES); + + skb->protocol = *((__be16 *)(skb->data)); + + skb->ip_summed = CHECKSUM_COMPLETE; + skb->csum = csum_unfold((__force __sum16)cqe->check_sum); + + skb_record_rx_queue(skb, rq->ix); + + if (likely(netdev->features & NETIF_F_RXHASH)) + mlx5e_skb_set_hash(cqe, skb); + + skb_reset_mac_header(skb); + skb_pull(skb, MLX5_IPOIB_ENCAP_LEN); + + skb->dev = netdev; + + rq->stats.csum_complete++; + rq->stats.packets++; + rq->stats.bytes += cqe_bcnt; +} + +void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe) +{ + struct mlx5e_rx_wqe *wqe; + __be16 wqe_counter_be; + struct sk_buff *skb; + u16 wqe_counter; + u32 cqe_bcnt; + + wqe_counter_be = cqe->wqe_counter; + wqe_counter= be16_to_cpu(wqe_counter_be); + wqe= mlx5_wq_ll_get_wqe(>wq, wqe_counter); + cqe_bcnt = be32_to_cpu(cqe->byte_cnt); + + skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt); + if (!skb) + goto wq_ll_pop; + + mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb); + napi_gro_receive(rq->cq.napi, skb); + +wq_ll_pop: + mlx5_wq_ll_pop(>wq, wqe_counter_be, + >next.next_wqe_index); +} + +#endif /* CONFIG_MLX5_CORE_IPOIB */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c index c468aaedf0a6..001d2953cb6d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -282,6 +282,8 @@ static const struct mlx5e_profile mlx5i_nic_profile = { .disable = NULL, /* mlx5i_disable */ .update_stats = NULL, /* mlx5i_update_stats */ .max_nch = mlx5e_get_max_num_channels, + .rx_handlers.handle_rx_cqe = mlx5i_handle_rx_cqe, + .rx_handlers.handle_rx_cqe_mpwqe = NULL, /* Not supported */ .max_tc= MLX5I_MAX_NUM_TC, }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h index 89bca182464c..bae0a5cbc8ad 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.h @@ -49,5 +49,6 @@ struct mlx5i_priv { netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb, struct mlx5_av *av, u32 dqpn, u32 dqkey); +void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe); #endif /* __MLX5E_IPOB_H__ */ -- 2.11.0
[PATCH net-next 06/16] net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs
Implement IPoIB RX RSS (RQTs and TIRs) HW objects creation, All we do here is simply reuse the mlx5e implementation to create direct and indirect (RSS) steering HW objects. For that we just expose mlx5e_{create,destroy}_{direct,indirect}_{rqt,tir} functions into en.h and call them from ipoib.c in init/cleanup_rx IPoIB netdevice profile callbacks. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 12 - drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 56 --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 17 ++- drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 42 +++-- 4 files changed, 83 insertions(+), 44 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 02aa3cc59dc3..e5518536d56f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -999,10 +999,17 @@ int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr); void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe); void mlx5e_update_hw_rep_counters(struct mlx5e_priv *priv); +int mlx5e_create_indirect_rqt(struct mlx5e_priv *priv); + +int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv); +void mlx5e_destroy_indirect_tirs(struct mlx5e_priv *priv); + int mlx5e_create_direct_rqts(struct mlx5e_priv *priv); -void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt); +void mlx5e_destroy_direct_rqts(struct mlx5e_priv *priv); int mlx5e_create_direct_tirs(struct mlx5e_priv *priv); void mlx5e_destroy_direct_tirs(struct mlx5e_priv *priv); +void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt); + int mlx5e_create_tises(struct mlx5e_priv *priv); void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv); int mlx5e_close(struct net_device *netdev); @@ -1024,5 +1031,8 @@ mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *prof int mlx5e_attach_netdev(struct mlx5e_priv *priv); void mlx5e_detach_netdev(struct mlx5e_priv *priv); void mlx5e_destroy_netdev(struct mlx5e_priv *priv); +void mlx5e_build_nic_params(struct mlx5_core_dev *mdev, + struct mlx5e_params *params, + u16 max_channels); #endif /* __MLX5_EN_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 14c7452a6348..08b67aa24644 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2115,11 +2115,15 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt) mlx5_core_destroy_rqt(priv->mdev, rqt->rqtn); } -static int mlx5e_create_indirect_rqts(struct mlx5e_priv *priv) +int mlx5e_create_indirect_rqt(struct mlx5e_priv *priv) { struct mlx5e_rqt *rqt = >indir_rqt; + int err; - return mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, rqt); + err = mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, rqt); + if (err) + mlx5_core_warn(priv->mdev, "create indirect rqts failed, %d\n", err); + return err; } int mlx5e_create_direct_rqts(struct mlx5e_priv *priv) @@ -2138,12 +2142,21 @@ int mlx5e_create_direct_rqts(struct mlx5e_priv *priv) return 0; err_destroy_rqts: + mlx5_core_warn(priv->mdev, "create direct rqts failed, %d\n", err); for (ix--; ix >= 0; ix--) mlx5e_destroy_rqt(priv, >direct_tir[ix].rqt); return err; } +void mlx5e_destroy_direct_rqts(struct mlx5e_priv *priv) +{ + int i; + + for (i = 0; i < priv->profile->max_nch(priv->mdev); i++) + mlx5e_destroy_rqt(priv, >direct_tir[i].rqt); +} + static int mlx5e_rx_hash_fn(int hfunc) { return (hfunc == ETH_RSS_HASH_TOP) ? @@ -2818,7 +2831,7 @@ static void mlx5e_build_direct_tir_ctx(struct mlx5e_priv *priv, u32 rqtn, u32 *t MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_INVERTED_XOR8); } -static int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv) +int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv) { struct mlx5e_tir *tir; void *tirc; @@ -2847,6 +2860,7 @@ static int mlx5e_create_indirect_tirs(struct mlx5e_priv *priv) return 0; err_destroy_tirs: + mlx5_core_warn(priv->mdev, "create indirect tirs failed, %d\n", err); for (tt--; tt >= 0; tt--) mlx5e_destroy_tir(priv->mdev, >indir_tir[tt]); @@ -2885,6 +2899,7 @@ int mlx5e_create_direct_tirs(struct mlx5e_priv *priv) return 0; err_destroy_ch_tirs: + mlx5_core_warn(priv->mdev, "create direct tirs failed, %d\n", err); for (ix--; ix >= 0; ix--) mlx5e_destroy_tir(
[PATCH net-next 01/16] net/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc
From: Erez Shitrit <ere...@mellanox.com> New capability bit: ipoib_enhanced_offloads, indicates new ability for UD QP to do RSS and enhanced IPoIB offloads and acceleration. Add underlay_qpn to the TIS and flow_table objects In order to support SET_ROOT command, to connect between IPoIB QPs and flow steering tables. Signed-off-by: Erez Shitrit <ere...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- include/linux/mlx5/mlx5_ifc.h | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 1993adbd2c82..7c50bd39b297 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -872,7 +872,8 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 compact_address_vector[0x1]; u8 striding_rq[0x1]; - u8 reserved_at_202[0x2]; + u8 reserved_at_202[0x1]; + u8 ipoib_enhanced_offloads[0x1]; u8 ipoib_basic_offloads[0x1]; u8 reserved_at_205[0xa]; u8 drain_sigerr[0x1]; @@ -2293,7 +2294,9 @@ struct mlx5_ifc_tisc_bits { u8 reserved_at_120[0x8]; u8 transport_domain[0x18]; - u8 reserved_at_140[0x3c0]; + u8 reserved_at_140[0x8]; + u8 underlay_qpn[0x18]; + u8 reserved_at_160[0x3a0]; }; enum { @@ -8218,7 +8221,9 @@ struct mlx5_ifc_set_flow_table_root_in_bits { u8 reserved_at_a0[0x8]; u8 table_id[0x18]; - u8 reserved_at_c0[0x140]; + u8 reserved_at_c0[0x8]; + u8 underlay_qpn[0x18]; + u8 reserved_at_e0[0x120]; }; enum { -- 2.11.0
[PATCH net-next 05/16] net/mlx5e: IPoIB, Add netdevice profile skeleton
Create mlx5e IPoIB netdevice profile skeleton in the new ipoib.c file with empty implementation. Downstream patches will provide the full mlx5 rdma netdevice acceleration support for IPoIB into this new file, by using the mlx5e netdevice profile and new mlx5_channels APIs and infrastructures. Same as already done in mlx5e NIC netdevice and switchdev mode VF representors. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 7 + drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 + drivers/net/ethernet/mellanox/mlx5/core/en.h | 9 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 9 - drivers/net/ethernet/mellanox/mlx5/core/ipoib.c| 181 + .../ethernet/mellanox/mlx5/core/ipoib.h} | 22 ++- 6 files changed, 215 insertions(+), 15 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c copy drivers/{infiniband/hw/mlx5/cmd.h => net/ethernet/mellanox/mlx5/core/ipoib.h} (77%) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig index 117170014e88..a84b652f9b54 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig @@ -31,3 +31,10 @@ config MLX5_CORE_EN_DCB This flag is depended on the kernel's DCB support. If unsure, set to Y + +config MLX5_CORE_IPOIB + bool "Mellanox Technologies ConnectX-4 IPoIB offloads support" + depends on MLX5_CORE_EN + default y + ---help--- + MLX5 IPoIB offloads & acceleration support. diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 9f43beb86250..9e644615f07a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -11,3 +11,5 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \ en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o + +mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib.o diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index ced31906b8fd..02aa3cc59dc3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -153,6 +154,14 @@ static inline int mlx5_max_log_rq_size(int wq_type) } } +static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev) +{ + return is_kdump_kernel() ? + MLX5E_MIN_NUM_CHANNELS : + min_t(int, mdev->priv.eq_table.num_comp_vectors, + MLX5E_MAX_NUM_CHANNELS); +} + struct mlx5e_tx_wqe { struct mlx5_wqe_ctrl_seg ctrl; struct mlx5_wqe_eth_seg eth; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index cdc34ba354c8..14c7452a6348 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -31,7 +31,6 @@ */ #include -#include #include #include #include @@ -1710,14 +1709,6 @@ static int mlx5e_set_tx_maxrate(struct net_device *dev, int index, u32 rate) return err; } -static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev) -{ - return is_kdump_kernel() ? - MLX5E_MIN_NUM_CHANNELS : - min_t(int, mdev->priv.eq_table.num_comp_vectors, - MLX5E_MAX_NUM_CHANNELS); -} - static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, struct mlx5e_params *params, struct mlx5e_channel_param *cparam, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c new file mode 100644 index ..2f65927a8d03 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -0,0 +1,181 @@ +/* + * Copyright (c) 2017, Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in b
[PATCH net-next 10/16] net/mlx5e: IPoIB, Underlay QP
Create IPoIB underlay QP needed by the IPoIB netdevice profile for RSS and TX HW context to perform on IPoIB traffic. Reset the underlay QP on dev_uninit ndo to stop IPoIB traffic going through this QP when the ULP IPoIB decides to cleanup. Implement attach/detach mcast RDMA netdev callbacks for later RDMA netdev use. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 126 +++- 1 file changed, 124 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c index e188d067bc97..bd56f36066b3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -34,6 +34,8 @@ #include "en.h" #include "ipoib.h" +#define IB_DEFAULT_Q_KEY 0xb1b + static int mlx5i_open(struct net_device *netdev); static int mlx5i_close(struct net_device *netdev); static int mlx5i_dev_init(struct net_device *dev); @@ -83,12 +85,89 @@ static void mlx5i_cleanup(struct mlx5e_priv *priv) /* Do nothing .. */ } +#define MLX5_QP_ENHANCED_ULP_STATELESS_MODE 2 + +static int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp *qp) +{ + struct mlx5_qp_context *context = NULL; + u32 *in = NULL; + void *addr_path; + int ret = 0; + int inlen; + void *qpc; + + inlen = MLX5_ST_SZ_BYTES(create_qp_in); + in = mlx5_vzalloc(inlen); + if (!in) + return -ENOMEM; + + qpc = MLX5_ADDR_OF(create_qp_in, in, qpc); + MLX5_SET(qpc, qpc, st, MLX5_QP_ST_UD); + MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED); + MLX5_SET(qpc, qpc, ulp_stateless_offload_mode, +MLX5_QP_ENHANCED_ULP_STATELESS_MODE); + + addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path); + MLX5_SET(ads, addr_path, port, 1); + MLX5_SET(ads, addr_path, grh, 1); + + ret = mlx5_core_create_qp(mdev, qp, in, inlen); + if (ret) { + mlx5_core_err(mdev, "Failed creating IPoIB QP err : %d\n", ret); + goto out; + } + + /* QP states */ + context = kzalloc(sizeof(*context), GFP_KERNEL); + if (!context) { + ret = -ENOMEM; + goto out; + } + + context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11); + context->pri_path.port = 1; + context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY); + + ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RST2INIT_QP, 0, context, qp); + if (ret) { + mlx5_core_err(mdev, "Failed to modify qp RST2INIT, err: %d\n", ret); + goto out; + } + memset(context, 0, sizeof(*context)); + + ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_INIT2RTR_QP, 0, context, qp); + if (ret) { + mlx5_core_err(mdev, "Failed to modify qp INIT2RTR, err: %d\n", ret); + goto out; + } + + ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RTR2RTS_QP, 0, context, qp); + if (ret) { + mlx5_core_err(mdev, "Failed to modify qp RTR2RTS, err: %d\n", ret); + goto out; + } + +out: + kfree(context); + kvfree(in); + return ret; +} + +static void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp *qp) +{ + mlx5_core_destroy_qp(mdev, qp); +} + static int mlx5i_init_tx(struct mlx5e_priv *priv) { struct mlx5i_priv *ipriv = priv->ppriv; int err; - /* TODO: Create IPoIB underlay QP */ + err = mlx5i_create_underlay_qp(priv->mdev, >qp); + if (err) { + mlx5_core_warn(priv->mdev, "create underlay QP failed, %d\n", err); + return err; + } err = mlx5e_create_tis(priv->mdev, 0 /* tc */, ipriv->qp.qpn, >tisn[0]); if (err) { @@ -101,7 +180,10 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv) void mlx5i_cleanup_tx(struct mlx5e_priv *priv) { + struct mlx5i_priv *ipriv = priv->ppriv; + mlx5e_destroy_tis(priv->mdev, priv->tisn[0]); + mlx5i_destroy_underlay_qp(priv->mdev, >qp); } static int mlx5i_create_flow_steering(struct mlx5e_priv *priv) @@ -220,7 +302,13 @@ static int mlx5i_dev_init(struct net_device *dev) static void mlx5i_dev_cleanup(struct net_device *dev) { - /* TODO: detach underlay qp from flow-steering by reset it */ + struct mlx5e_priv*priv = mlx5i_epriv(dev); + struct mlx5_core_dev *mdev = priv->mdev; + struct mlx5i_priv*ipriv = priv->ppriv; + struct mlx5_qp_context context; + + /* detach qp from flow-steering by reset it */ + mlx5_core_qp_modify(mdev, MLX5_CMD_OP_2RST_QP, 0, , >qp); } static int mlx
[PATCH net-next 16/16] hw/mlx5: Add New bit to check over QP creation
From: Erez Shitrit <ere...@mellanox.com> Add check for bit IB_QP_CREATE_NETIF_QP while creating QP. Signed-off-by: Erez Shitrit <ere...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/infiniband/hw/mlx5/qp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index ad8a2638e339..ed6320186f89 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -897,6 +897,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev, if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN | IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK | IB_QP_CREATE_IPOIB_UD_LSO | + IB_QP_CREATE_NETIF_QP | mlx5_ib_create_qp_sqpn_qp1())) return -EINVAL; -- 2.11.0
[PATCH net-next 11/16] net/mlx5e: Xmit flow break down
Break current mlx5e xmit flow into smaller blocks (helper functions) in order to reuse them for IPoIB SKB transmission. Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Reviewed-by: Erez Shitrit <ere...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 199 +- 3 files changed, 119 insertions(+), 89 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 23b92ec54e12..25185f8c3562 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -304,6 +304,7 @@ struct mlx5e_cq { } cacheline_aligned_in_smp; struct mlx5e_tx_wqe_info { + struct sk_buff *skb; u32 num_bytes; u8 num_wqebbs; u8 num_dma; @@ -345,7 +346,6 @@ struct mlx5e_txqsq { /* write@xmit, read@completion */ struct { - struct sk_buff **skb; struct mlx5e_sq_dma *dma_fifo; struct mlx5e_tx_wqe_info *wqe_info; } db; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index eb657987e9b5..2201b7ea05f4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1042,7 +1042,6 @@ static void mlx5e_free_txqsq_db(struct mlx5e_txqsq *sq) { kfree(sq->db.wqe_info); kfree(sq->db.dma_fifo); - kfree(sq->db.skb); } static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa) @@ -1050,13 +1049,11 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa) int wq_sz = mlx5_wq_cyc_get_size(>wq); int df_sz = wq_sz * MLX5_SEND_WQEBB_NUM_DS; - sq->db.skb = kzalloc_node(wq_sz * sizeof(*sq->db.skb), - GFP_KERNEL, numa); sq->db.dma_fifo = kzalloc_node(df_sz * sizeof(*sq->db.dma_fifo), GFP_KERNEL, numa); sq->db.wqe_info = kzalloc_node(wq_sz * sizeof(*sq->db.wqe_info), GFP_KERNEL, numa); - if (!sq->db.skb || !sq->db.dma_fifo || !sq->db.wqe_info) { + if (!sq->db.dma_fifo || !sq->db.wqe_info) { mlx5e_free_txqsq_db(sq); return -ENOMEM; } @@ -1295,7 +1292,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq) if (mlx5e_wqc_has_room_for(>wq, sq->cc, sq->pc, 1)) { struct mlx5e_tx_wqe *nop; - sq->db.skb[(sq->pc & sq->wq.sz_m1)] = NULL; + sq->db.wqe_info[(sq->pc & sq->wq.sz_m1)].skb = NULL; nop = mlx5e_post_nop(>wq, sq->sqn, >pc); mlx5e_notify_hw(>wq, sq->pc, sq->uar_map, >ctrl); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 5bbc313e70c5..ba664a1126cf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -177,30 +177,9 @@ static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs, mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz); } -static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb) +static inline void +mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb, struct mlx5_wqe_eth_seg *eseg) { - struct mlx5_wq_cyc *wq = >wq; - - u16 pi = sq->pc & wq->sz_m1; - struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(wq, pi); - struct mlx5e_tx_wqe_info *wi = >db.wqe_info[pi]; - - struct mlx5_wqe_ctrl_seg *cseg = >ctrl; - struct mlx5_wqe_eth_seg *eseg = >eth; - struct mlx5_wqe_data_seg *dseg; - - unsigned char *skb_data = skb->data; - unsigned int skb_len = skb->len; - u8 opcode = MLX5_OPCODE_SEND; - dma_addr_t dma_addr = 0; - unsigned int num_bytes; - u16 headlen; - u16 ds_cnt; - u16 ihs; - int i; - - memset(wqe, 0, sizeof(*wqe)); - if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) { eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM; if (skb->encapsulation) { @@ -212,66 +191,51 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb) } } else sq->stats.csum_none++; +} - if (skb_is_gso(skb)) { - eseg->mss= cpu_to_be16(skb_shinfo(skb)->gso_size); - opcode = MLX5_OPCODE_LSO; +static inline u16 +mlx5e_txwqe_build_eseg_gso(struct mlx5e_txqsq *sq, struct sk_buff *skb, +
Re: [PATCH net v2] net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats
On Thu, Apr 20, 2017 at 2:32 AM, Martin KaFai Lau <ka...@fb.com> wrote: > We have observed a sudden spike in rx/tx_packets and rx/tx_bytes > reported under /proc/net/dev. There is a race in mlx5e_update_stats() > and some of the get-stats functions (the one that we hit is the > mlx5e_get_stats() which is called by ndo_get_stats64()). > > In particular, the very first thing mlx5e_update_sw_counters() > does is 'memset(s, 0, sizeof(*s))'. For example, if mlx5e_get_stats() > is unlucky at one point, rx_bytes and rx_packets could be 0. One second > later, a normal (and much bigger than 0) value will be reported. > > This patch is to use a 'struct mlx5e_sw_stats temp' to avoid > a direct memset zero on priv->stats.sw. > > mlx5e_update_vport_counters() has a similar race. Hence, addressed > together. > > I am lucky enough to catch this 0-reset in rx multicast: > eth0: 41457665 76804 700070 0 47085 15586634 > 87502300 0 3 0 > eth0: 41459860 76815 700070 0 47094 15588376 > 87516300 0 3 0 > eth0: 41460577 76822 700070 0 0 15589083 > 87521300 0 3 0 > eth0: 41463293 76838 700070 0 47108 15595872 > 87538300 0 3 0 > eth0: 41463379 76839 700070 0 47116 15596138 > 87539300 0 3 0 > > Cc: Saeed Mahameed <sae...@mellanox.com> > Suggested-by: Eric Dumazet <eric.duma...@gmail.com> > Signed-off-by: Martin KaFai Lau <ka...@fb.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > index 66c133757a5e..246786bb861b 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > @@ -174,7 +174,7 @@ static void mlx5e_tx_timeout_work(struct work_struct > *work) > > static void mlx5e_update_sw_counters(struct mlx5e_priv *priv) > { > - struct mlx5e_sw_stats *s = >stats.sw; > + struct mlx5e_sw_stats temp, *s = > struct mlx5e_rq_stats *rq_stats; > struct mlx5e_sq_stats *sq_stats; > u64 tx_offload_none = 0; > @@ -229,12 +229,14 @@ static void mlx5e_update_sw_counters(struct mlx5e_priv > *priv) > s->link_down_events_phy = MLX5_GET(ppcnt_reg, > priv->stats.pport.phy_counters, > > counter_set.phys_layer_cntrs.link_down_events); > + memcpy(>stats.sw, s, sizeof(*s)); > } > > static void mlx5e_update_vport_counters(struct mlx5e_priv *priv) > { > + struct mlx5e_vport_stats temp; > int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out); > - u32 *out = (u32 *)priv->stats.vport.query_vport_out; > + u32 *out = (u32 *)temp.query_vport_out; > u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)] = {0}; > struct mlx5_core_dev *mdev = priv->mdev; > > @@ -245,6 +247,7 @@ static void mlx5e_update_vport_counters(struct mlx5e_priv > *priv) > > memset(out, 0, outlen); Actually you don't need any temp here, it is safe to just remove this redundant memset and mlx5_cmd_exec will do the copy for you. > mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen); > + memcpy(priv->stats.vport.query_vport_out, out, outlen); > } Anyway we still need a spin lock here, and also for all the counters under priv->stats which are affected by this race as well. If you want I can accept this as a temporary fix for net and Gal will work on a spin lock based mechanism to fix the memcpy race for all the counters. > > static void mlx5e_update_pport_counters(struct mlx5e_priv *priv) > -- > 2.9.3 >
Re: [PATCH net v2] net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats
On Thu, Apr 20, 2017 at 5:15 PM, Martin KaFai Lau <ka...@fb.com> wrote: > On Thu, Apr 20, 2017 at 05:00:13PM +0300, Saeed Mahameed wrote: >> On Thu, Apr 20, 2017 at 2:32 AM, Martin KaFai Lau <ka...@fb.com> wrote: >> > We have observed a sudden spike in rx/tx_packets and rx/tx_bytes >> > reported under /proc/net/dev. There is a race in mlx5e_update_stats() >> > and some of the get-stats functions (the one that we hit is the >> > mlx5e_get_stats() which is called by ndo_get_stats64()). >> > >> > In particular, the very first thing mlx5e_update_sw_counters() >> > does is 'memset(s, 0, sizeof(*s))'. For example, if mlx5e_get_stats() >> > is unlucky at one point, rx_bytes and rx_packets could be 0. One second >> > later, a normal (and much bigger than 0) value will be reported. >> > >> > This patch is to use a 'struct mlx5e_sw_stats temp' to avoid >> > a direct memset zero on priv->stats.sw. >> > >> > mlx5e_update_vport_counters() has a similar race. Hence, addressed >> > together. >> > >> > I am lucky enough to catch this 0-reset in rx multicast: >> > eth0: 41457665 76804 700070 0 47085 15586634 >> > 87502300 0 3 0 >> > eth0: 41459860 76815 700070 0 47094 15588376 >> > 87516300 0 3 0 >> > eth0: 41460577 76822 700070 0 0 15589083 >> > 87521300 0 3 0 >> > eth0: 41463293 76838 700 0 70 0 47108 15595872 >> > 87538300 0 3 0 >> > eth0: 41463379 76839 700070 0 47116 15596138 >> > 87539300 0 3 0 >> > >> > Cc: Saeed Mahameed <sae...@mellanox.com> >> > Suggested-by: Eric Dumazet <eric.duma...@gmail.com> >> > Signed-off-by: Martin KaFai Lau <ka...@fb.com> >> > --- >> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +-- >> > 1 file changed, 5 insertions(+), 2 deletions(-) >> > >> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >> > index 66c133757a5e..246786bb861b 100644 >> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >> > @@ -174,7 +174,7 @@ static void mlx5e_tx_timeout_work(struct work_struct >> > *work) >> > >> > static void mlx5e_update_sw_counters(struct mlx5e_priv *priv) >> > { >> > - struct mlx5e_sw_stats *s = >stats.sw; >> > + struct mlx5e_sw_stats temp, *s = >> > struct mlx5e_rq_stats *rq_stats; >> > struct mlx5e_sq_stats *sq_stats; >> > u64 tx_offload_none = 0; >> > @@ -229,12 +229,14 @@ static void mlx5e_update_sw_counters(struct >> > mlx5e_priv *priv) >> > s->link_down_events_phy = MLX5_GET(ppcnt_reg, >> > priv->stats.pport.phy_counters, >> > >> > counter_set.phys_layer_cntrs.link_down_events); >> > + memcpy(>stats.sw, s, sizeof(*s)); >> > } >> > >> > static void mlx5e_update_vport_counters(struct mlx5e_priv *priv) >> > { >> > + struct mlx5e_vport_stats temp; >> > int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out); >> > - u32 *out = (u32 *)priv->stats.vport.query_vport_out; >> > + u32 *out = (u32 *)temp.query_vport_out; >> > u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)] = {0}; >> > struct mlx5_core_dev *mdev = priv->mdev; >> > >> > @@ -245,6 +247,7 @@ static void mlx5e_update_vport_counters(struct >> > mlx5e_priv *priv) >> > >> > memset(out, 0, outlen); >> >> Actually you don't need any temp here, it is safe to just remove this >> redundant memset >> and mlx5_cmd_exec will do the copy for you. >> >> > mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen); >> > + memcpy(priv->stats.vport.query_vport_out, out, outlen); >> > } >> >> Anyway we still need a spin lock here, and also for all the counters >> under priv->stats which are affected by this race as well. >> >> If you want I can accept this as a temporary fix for net and Gal will >> work on a spin lock based mechanism to fix the memcpy race for all the >> counters. > A follow-up patch approach by Gal will be nice. > Ok, I will expect V3 with the removal of the memset from mlx5e_update_vport_counters instead of the memcpy. and Gal will work on the follow up patch
Re: [RFC PATCH net] net/mlx5e: Race between mlx5e_update_stats() and getting the stats
On Thu, Apr 20, 2017 at 2:35 AM, Eric Dumazetwrote: > On Wed, 2017-04-19 at 14:53 -0700, Martin KaFai Lau wrote: > >> Right, a temp and a memcpy should be enough to solve our spike problem. >> It may be the right fix for net. >> >> Agree that using a spinlock is better (likely changing state_lock >> to spinlock). A quick grep shows 80 line changes. Saeed, thoughts? No, changing the current state_lock is a bad idea, as Eric said it is for a reason, to synchronize between arbitrary ring/device state changes which might sleep. memcpy is a good idea to better hide or delay the issue :), new dedicated spin lock is the right way to go, As Eric suggested below. BTW, very nice catch Martin, I just got back from vacation to work on this bug, and you already root caused it, Thanks !! > > I was not advising replacing the mutex (maybe it is a mutex for good > reason), I simply suggested to use another spinlock only for this very > specific section. > > Something like : > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h > b/drivers/net/ethernet/mellanox/mlx5/core/en.h > index > dc52053128bc752ccd398449330c24c0bdf8b3a1..9b2e1b79fded22d55e9409cb572308190679cfdd > 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h > @@ -722,6 +722,7 @@ struct mlx5e_priv { > struct mlx5_core_dev *mdev; > struct net_device *netdev; > struct mlx5e_stats stats; > + spinlock_t stats_lock; > struct mlx5e_tstamptstamp; > u16 q_counter; > #ifdef CONFIG_MLX5_CORE_EN_DCB > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c > index > a004a5a1a4c22a742ef3f9939769c6b5c9445f46..b4b7d43bf899cadca2c2a17151d35acac9773859 > 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c > @@ -315,9 +315,11 @@ static void mlx5e_get_ethtool_stats(struct net_device > *dev, > mlx5e_update_stats(priv); > mutex_unlock(>state_lock); > > + spin_lock(>stats_lock); > for (i = 0; i < NUM_SW_COUNTERS; i++) > data[idx++] = MLX5E_READ_CTR64_CPU(>stats.sw, >sw_stats_desc, i); > + spin_unlock(>stats_lock); > > for (i = 0; i < MLX5E_NUM_Q_CNTRS(priv); i++) > data[idx++] = MLX5E_READ_CTR32_CPU(>stats.qcnt, > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > index > 66c133757a5ee8daae122e93322306b1c5c44336..4d6672045b1126a8bab4d6f2035e6a9b830560d2 > 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > @@ -174,7 +174,7 @@ static void mlx5e_tx_timeout_work(struct work_struct > *work) > > static void mlx5e_update_sw_counters(struct mlx5e_priv *priv) > { > - struct mlx5e_sw_stats *s = >stats.sw; > + struct mlx5e_sw_stats temp, *s = > struct mlx5e_rq_stats *rq_stats; > struct mlx5e_sq_stats *sq_stats; > u64 tx_offload_none = 0; > @@ -229,6 +229,9 @@ static void mlx5e_update_sw_counters(struct mlx5e_priv > *priv) > s->link_down_events_phy = MLX5_GET(ppcnt_reg, > priv->stats.pport.phy_counters, > > counter_set.phys_layer_cntrs.link_down_events); > + spin_lock(>stats_lock); > + memcpy(>stats.sw, s, sizeof(*s)); > + spin_unlock(>stats_lock); I like this ! minimized the critical section with a temp buffer and a memcpy .. perfect. > } > > static void mlx5e_update_vport_counters(struct mlx5e_priv *priv) > @@ -2754,11 +2757,13 @@ mlx5e_get_stats(struct net_device *dev, struct > rtnl_link_stats64 *stats) > stats->tx_packets = PPORT_802_3_GET(pstats, > a_frames_transmitted_ok); > stats->tx_bytes = PPORT_802_3_GET(pstats, > a_octets_transmitted_ok); > } else { > + spin_lock(>stats_lock); > stats->rx_packets = sstats->rx_packets; > stats->rx_bytes = sstats->rx_bytes; > stats->tx_packets = sstats->tx_packets; > stats->tx_bytes = sstats->tx_bytes; > stats->tx_dropped = sstats->tx_queue_dropped; > + spin_unlock(>stats_lock); > } > > stats->rx_dropped = priv->stats.qcnt.rx_out_of_buffer; > @@ -3561,6 +3566,8 @@ static void mlx5e_build_nic_netdev_priv(struct > mlx5_core_dev *mdev, > > mutex_init(>state_lock); > > + spin_lock_init(>stats_lock); > + > INIT_WORK(>update_carrier_work, mlx5e_update_carrier_work); > INIT_WORK(>set_rx_mode_work, mlx5e_set_rx_mode_work); > INIT_WORK(>tx_timeout_work, mlx5e_tx_timeout_work); > >
[net 1/7] net/mlx5: Fix driver load bad flow when having fw initializing timeout
From: Mohamad Haj Yahia <moha...@mellanox.com> If FW is stuck in initializing state we will skip the driver load, but current error handling flow doesn't clean previously allocated command interface resources. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Mohamad Haj Yahia <moha...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Cc: kernel-t...@fb.com --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 60154a175bd3..0ad66324247f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1029,7 +1029,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, if (err) { dev_err(>pdev->dev, "Firmware over %d MS in initializing state, aborting\n", FW_INIT_TIMEOUT_MILI); - goto out_err; + goto err_cmd_cleanup; } err = mlx5_core_enable_hca(dev, 0); -- 2.11.0
[net 4/7] net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels
From: Or Gerlitz <ogerl...@mellanox.com> Otherwise the code that fills the ipv6 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 42 ++--- 1 file changed, 23 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index fc7c1d30461c..5436866798f4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -819,16 +819,15 @@ static void gen_vxlan_header_ipv4(struct net_device *out_dev, vxh->vx_vni = vxlan_vni_field(vx_vni); } -static int gen_vxlan_header_ipv6(struct net_device *out_dev, -char buf[], -unsigned char h_dest[ETH_ALEN], -int ttl, -struct in6_addr *daddr, -struct in6_addr *saddr, -__be16 udp_dst_port, -__be32 vx_vni) +static void gen_vxlan_header_ipv6(struct net_device *out_dev, + char buf[], int encap_size, + unsigned char h_dest[ETH_ALEN], + int ttl, + struct in6_addr *daddr, + struct in6_addr *saddr, + __be16 udp_dst_port, + __be32 vx_vni) { - int encap_size = VXLAN_HLEN + sizeof(struct ipv6hdr) + ETH_HLEN; struct ethhdr *eth = (struct ethhdr *)buf; struct ipv6hdr *ip6h = (struct ipv6hdr *)((char *)eth + sizeof(struct ethhdr)); struct udphdr *udp = (struct udphdr *)((char *)ip6h + sizeof(struct ipv6hdr)); @@ -850,8 +849,6 @@ static int gen_vxlan_header_ipv6(struct net_device *out_dev, udp->dest = udp_dst_port; vxh->vx_flags = VXLAN_HF_VNI; vxh->vx_vni = vxlan_vni_field(vx_vni); - - return encap_size; } static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv, @@ -935,13 +932,20 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv *priv, { int max_encap_size = MLX5_CAP_ESW(priv->mdev, max_encap_header_size); + int ipv6_encap_size = ETH_HLEN + sizeof(struct ipv6hdr) + VXLAN_HLEN; struct ip_tunnel_key *tun_key = >tun_info.key; - int encap_size, err, ttl = 0; struct neighbour *n = NULL; struct flowi6 fl6 = {}; char *encap_header; + int err, ttl = 0; - encap_header = kzalloc(max_encap_size, GFP_KERNEL); + if (max_encap_size < ipv6_encap_size) { + mlx5_core_warn(priv->mdev, "encap size %d too big, max supported is %d\n", + ipv6_encap_size, max_encap_size); + return -EOPNOTSUPP; + } + + encap_header = kzalloc(ipv6_encap_size, GFP_KERNEL); if (!encap_header) return -ENOMEM; @@ -977,11 +981,11 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv *priv, switch (e->tunnel_type) { case MLX5_HEADER_TYPE_VXLAN: - encap_size = gen_vxlan_header_ipv6(*out_dev, encap_header, - e->h_dest, ttl, - , - , tun_key->tp_dst, - tunnel_id_to_key32(tun_key->tun_id)); + gen_vxlan_header_ipv6(*out_dev, encap_header, + ipv6_encap_size, e->h_dest, ttl, + , + , tun_key->tp_dst, + tunnel_id_to_key32(tun_key->tun_id)); break; default: err = -EOPNOTSUPP; @@ -989,7 +993,7 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv *priv, } err = mlx5_encap_alloc(priv->mdev, e->tunnel_type, - encap_size, encap_header, >encap_id); + ipv6_encap_size, encap_header, >encap_id); out: if (err && n) neigh_release(n); -- 2.11.0
[net 6/7] net/mlx5e: Fix small packet threshold
From: Eugenia Emantayev <euge...@mellanox.com> RX packet headers are meant to be contained in SKB linear part, and chose a threshold of 128. It turns out this is not enough, i.e. for IPv6 packet over VxLAN. In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes, and 86 bytes for TCP/IPv6. In total 136 bytes that is more than current 128 bytes. In this case expand header flow is reached. The warning in skb_try_coalesce() caused by a wrong truesize was already fixed here: commit 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()"). Still, we prefer to totally avoid the expand header flow for performance reasons. Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found. Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Signed-off-by: Eugenia Emantayev <euge...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Cc: kernel-t...@fb.com --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index dc52053128bc..3d9490cd2db1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -90,7 +90,7 @@ #define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) - 1 <= U16_MAX) #define MLX5_UMR_ALIGN (2048) -#define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD (128) +#define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD (256) #define MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ (64 * 1024) #define MLX5E_DEFAULT_LRO_TIMEOUT 32 -- 2.11.0
[pull request][net 0/7] Mellanox, mlx5 fixes 2017-04-22
Hi Dave, This series contains some mlx5 fixes for net. For your convenience, the series doesn't introduce any conflict with the ongoing net-next pull request. Please pull and let me know if there's any problem. For -stable: ("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10 ("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8 ("net/mlx5e: Fix small packet threshold") kernels >= 4.7 ("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4 Thanks, Saeed. The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772: Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 16:37:48 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2017-04-22 for you to fetch changes up to 5e82c9e4ed60beba83f46a1a5a8307b99a23e982: net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling (2017-04-22 21:52:37 +0300) mlx5-fixes-2017-04-22 Eugenia Emantayev (1): net/mlx5e: Fix small packet threshold Ilan Tayari (1): net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling Maor Gottlieb (1): net/mlx5: Fix UAR memory leak Mohamad Haj Yahia (1): net/mlx5: Fix driver load bad flow when having fw initializing timeout Or Gerlitz (3): net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5 net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +- .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c| 1 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 87 -- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++--- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/uar.c | 1 + 6 files changed, 76 insertions(+), 53 deletions(-)
[net 3/7] net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels
From: Or Gerlitz <ogerl...@mellanox.com> Otherwise the code that fills the ipv4 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 42 ++--- 1 file changed, 23 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index b7c99c38a7c4..fc7c1d30461c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -786,16 +786,15 @@ static int mlx5e_route_lookup_ipv6(struct mlx5e_priv *priv, return 0; } -static int gen_vxlan_header_ipv4(struct net_device *out_dev, -char buf[], -unsigned char h_dest[ETH_ALEN], -int ttl, -__be32 daddr, -__be32 saddr, -__be16 udp_dst_port, -__be32 vx_vni) +static void gen_vxlan_header_ipv4(struct net_device *out_dev, + char buf[], int encap_size, + unsigned char h_dest[ETH_ALEN], + int ttl, + __be32 daddr, + __be32 saddr, + __be16 udp_dst_port, + __be32 vx_vni) { - int encap_size = VXLAN_HLEN + sizeof(struct iphdr) + ETH_HLEN; struct ethhdr *eth = (struct ethhdr *)buf; struct iphdr *ip = (struct iphdr *)((char *)eth + sizeof(struct ethhdr)); struct udphdr *udp = (struct udphdr *)((char *)ip + sizeof(struct iphdr)); @@ -818,8 +817,6 @@ static int gen_vxlan_header_ipv4(struct net_device *out_dev, udp->dest = udp_dst_port; vxh->vx_flags = VXLAN_HF_VNI; vxh->vx_vni = vxlan_vni_field(vx_vni); - - return encap_size; } static int gen_vxlan_header_ipv6(struct net_device *out_dev, @@ -863,13 +860,20 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv, struct net_device **out_dev) { int max_encap_size = MLX5_CAP_ESW(priv->mdev, max_encap_header_size); + int ipv4_encap_size = ETH_HLEN + sizeof(struct iphdr) + VXLAN_HLEN; struct ip_tunnel_key *tun_key = >tun_info.key; - int encap_size, ttl, err; struct neighbour *n = NULL; struct flowi4 fl4 = {}; char *encap_header; + int ttl, err; - encap_header = kzalloc(max_encap_size, GFP_KERNEL); + if (max_encap_size < ipv4_encap_size) { + mlx5_core_warn(priv->mdev, "encap size %d too big, max supported is %d\n", + ipv4_encap_size, max_encap_size); + return -EOPNOTSUPP; + } + + encap_header = kzalloc(ipv4_encap_size, GFP_KERNEL); if (!encap_header) return -ENOMEM; @@ -904,11 +908,11 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv, switch (e->tunnel_type) { case MLX5_HEADER_TYPE_VXLAN: - encap_size = gen_vxlan_header_ipv4(*out_dev, encap_header, - e->h_dest, ttl, - fl4.daddr, - fl4.saddr, tun_key->tp_dst, - tunnel_id_to_key32(tun_key->tun_id)); + gen_vxlan_header_ipv4(*out_dev, encap_header, + ipv4_encap_size, e->h_dest, ttl, + fl4.daddr, + fl4.saddr, tun_key->tp_dst, + tunnel_id_to_key32(tun_key->tun_id)); break; default: err = -EOPNOTSUPP; @@ -916,7 +920,7 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv, } err = mlx5_encap_alloc(priv->mdev, e->tunnel_type, - encap_size, encap_header, >encap_id); + ipv4_encap_size, encap_header, >encap_id); out: if (err && n) neigh_release(n); -- 2.11.0
[net 2/7] net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
From: Or Gerlitz <ogerl...@mellanox.com> On ConnectX5 the wqe inline mode is "none" and hence the FW reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED. Fix our devlink callbacks to deal with that on get and set. Also fix the tc flow parsing code not to fail anything when inline isn't required. Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode') Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 3 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++ 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index fade7233dac5..b7c99c38a7c4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -639,7 +639,8 @@ static int parse_cls_flower(struct mlx5e_priv *priv, if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH) && rep->vport != FDB_UPLINK_VPORT) { - if (min_inline > esw->offloads.inline_mode) { + if (esw->offloads.inline_mode != MLX5_INLINE_MODE_NONE && + esw->offloads.inline_mode < min_inline) { netdev_warn(priv->netdev, "Flow is not offloaded due to min inline setting, required %d actual %d\n", min_inline, esw->offloads.inline_mode); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 307ec6c5fd3b..d111cebca9f1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -911,8 +911,7 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode) struct mlx5_core_dev *dev = devlink_priv(devlink); struct mlx5_eswitch *esw = dev->priv.eswitch; int num_vports = esw->enabled_vports; - int err; - int vport; + int err, vport; u8 mlx5_mode; if (!MLX5_CAP_GEN(dev, vport_group_manager)) @@ -921,9 +920,17 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode) if (esw->mode == SRIOV_NONE) return -EOPNOTSUPP; - if (MLX5_CAP_ETH(dev, wqe_inline_mode) != - MLX5_CAP_INLINE_MODE_VPORT_CONTEXT) + switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) { + case MLX5_CAP_INLINE_MODE_NOT_REQUIRED: + if (mode == DEVLINK_ESWITCH_INLINE_MODE_NONE) + return 0; + /* fall through */ + case MLX5_CAP_INLINE_MODE_L2: + esw_warn(dev, "Inline mode can't be set\n"); return -EOPNOTSUPP; + case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT: + break; + } if (esw->offloads.num_flows > 0) { esw_warn(dev, "Can't set inline mode when flows are configured\n"); @@ -966,18 +973,14 @@ int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode) if (esw->mode == SRIOV_NONE) return -EOPNOTSUPP; - if (MLX5_CAP_ETH(dev, wqe_inline_mode) != - MLX5_CAP_INLINE_MODE_VPORT_CONTEXT) - return -EOPNOTSUPP; - return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode); } int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode) { + u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2; struct mlx5_core_dev *dev = esw->dev; int vport; - u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2; if (!MLX5_CAP_GEN(dev, vport_group_manager)) return -EOPNOTSUPP; @@ -985,10 +988,18 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode) if (esw->mode == SRIOV_NONE) return -EOPNOTSUPP; - if (MLX5_CAP_ETH(dev, wqe_inline_mode) != - MLX5_CAP_INLINE_MODE_VPORT_CONTEXT) - return -EOPNOTSUPP; + switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) { + case MLX5_CAP_INLINE_MODE_NOT_REQUIRED: + mlx5_mode = MLX5_INLINE_MODE_NONE; + goto out; + case MLX5_CAP_INLINE_MODE_L2: + mlx5_mode = MLX5_INLINE_MODE_L2; + goto out; + case MLX5_CAP_INLINE_MODE_VPORT_CONTEXT: + goto query_vports; + } +query_vports: for (vport = 1; vport <= nvfs; vport++) { mlx5_query_nic_vport_min_inline(dev, vport, _mode); if (vport > 1 && prev_mlx5_mode != mlx5_mode) @@ -996,6 +1007,7 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode) prev_mlx5_mode = mlx5_mode; } +out: *mode = mlx5_mode; return 0; } -- 2.11.0
[net 7/7] net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling
From: Ilan Tayari <il...@mellanox.com> Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size of the table, regardless of the amount of entries in it. Existing code does not do that, and this breaks all usage of ethtool -N or -n without explicit location, with this error: rmgr: Invalid RX class rules table size: Success Set info->data to the table size. Tested: ethtool -n ens8 ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55 ethtool -n ens8 ethtool -N ens8 delete 1023 ethtool -N ens8 delete 55 Fixes: f913a72aa008 ("net/mlx5e: Add support to get ethtool flow rules") Signed-off-by: Ilan Tayari <il...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> Cc: kernel-t...@fb.com --- drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c index d55fff0ba388..26fc77e80f7b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c @@ -564,6 +564,7 @@ int mlx5e_ethtool_get_all_flows(struct mlx5e_priv *priv, struct ethtool_rxnfc *i int idx = 0; int err = 0; + info->data = MAX_NUM_OF_ETHTOOL_RULES; while ((!err || err == -ENOENT) && idx < info->rule_cnt) { err = mlx5e_ethtool_get_flow(priv, info, location); if (!err) -- 2.11.0
[net 5/7] net/mlx5: Fix UAR memory leak
From: Maor Gottlieb <ma...@mellanox.com> When UAR is released, we deallocate the device resource, but don't unmmap the UAR mapping memory. Fix the leak by unmapping this memory. Fixes: a6d51b68611e9 ('net/mlx5: Introduce blue flame register allocator) Signed-off-by: Maor Gottlieb <ma...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/uar.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c b/drivers/net/ethernet/mellanox/mlx5/core/uar.c index 2e6b0f290ddc..222b25908d01 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c @@ -87,6 +87,7 @@ static void up_rel_func(struct kref *kref) struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, ref_count); list_del(>list); + iounmap(up->map); if (mlx5_cmd_free_uar(up->mdev, up->index)) mlx5_core_warn(up->mdev, "failed to free uar index %d\n", up->index); kfree(up->reg_bitmap); -- 2.11.0
[pull request][net-next 0/5] Mellanox, mlx5 updates 2017-04-22
Hi Dave, This series contains some updates to mlx5 driver. Sparse and compiler warnings fixes from Stephen Hemminger. >From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling E-Switch encapsulation mode, this knob will enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. Please pull and let me know if there's any problem. Thanks, Saeed. --- The following changes since commit fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-21 20:23:53 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-04-22 for you to fetch changes up to 8bf3198a5e394ed6815aeb8fedaf49722986bbd3: mlx5: fix warning about missing prototype (2017-04-22 20:26:42 +0300) Or Gerlitz (1): net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode Roi Dayan (2): net/devlink: Add E-Switch encapsulation control net/mlx5: E-Switch, Add control for encapsulation Stephen Hemminger (2): mlx5: hide unused functions mlx5: fix warning about missing prototype drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 1 + drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 1 + drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 + drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 3 + .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 132 + drivers/net/ethernet/mellanox/mlx5/core/ipoib.c| 24 ++-- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 + include/net/devlink.h | 2 + include/uapi/linux/devlink.h | 7 ++ net/core/devlink.c | 26 +++- 10 files changed, 167 insertions(+), 36 deletions(-)
[net-next 4/5] mlx5: hide unused functions
From: Stephen Hemminger <step...@networkplumber.org> Fix sparse warnings in recent ipoib support. The RDMA functions are not used yet, hide behind #ifdef. Based on comment, they will eventually be local so make static. Signed-off-by: Stephen Hemminger <sthem...@microsoft.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c index ec78e637840f..3c84e36af018 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c @@ -178,7 +178,7 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv) return 0; } -void mlx5i_cleanup_tx(struct mlx5e_priv *priv) +static void mlx5i_cleanup_tx(struct mlx5e_priv *priv) { struct mlx5i_priv *ipriv = priv->ppriv; @@ -359,9 +359,10 @@ static int mlx5i_close(struct net_device *netdev) return 0; } +#ifdef notusedyet /* IPoIB RDMA netdev callbacks */ -int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca, - union ib_gid *gid, u16 lid, int set_qkey) +static int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca, + union ib_gid *gid, u16 lid, int set_qkey) { struct mlx5e_priv*epriv = mlx5i_epriv(netdev); struct mlx5_core_dev *mdev = epriv->mdev; @@ -377,8 +378,8 @@ int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca, return err; } -int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca, - union ib_gid *gid, u16 lid) +static int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca, + union ib_gid *gid, u16 lid) { struct mlx5e_priv*epriv = mlx5i_epriv(netdev); struct mlx5_core_dev *mdev = epriv->mdev; @@ -395,7 +396,7 @@ int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca, return err; } -int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb, +static int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb, struct ib_ah *address, u32 dqpn, u32 dqkey) { struct mlx5e_priv *epriv = mlx5i_epriv(dev); @@ -404,6 +405,7 @@ int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb, return mlx5i_sq_xmit(sq, skb, >av, dqpn, dqkey); } +#endif static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev) { @@ -418,10 +420,10 @@ static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev) return 0; } -struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev, - struct ib_device *ibdev, - const char *name, - void (*setup)(struct net_device *)) +static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev, +struct ib_device *ibdev, +const char *name, +void (*setup)(struct net_device *)) { const struct mlx5e_profile *profile = _nic_profile; int nch = profile->max_nch(mdev); @@ -480,7 +482,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev, } EXPORT_SYMBOL(mlx5_rdma_netdev_alloc); -void mlx5_rdma_netdev_free(struct net_device *netdev) +static void mlx5_rdma_netdev_free(struct net_device *netdev) { struct mlx5e_priv *priv= mlx5i_epriv(netdev); const struct mlx5e_profile *profile = priv->profile; -- 2.11.0
[net-next 5/5] mlx5: fix warning about missing prototype
From: Stephen Hemminger <step...@networkplumber.org> Fix sparse warning about missing prototypes. The rx/tx code path defines functions with prototypes in ipoib.h. Signed-off-by: Stephen Hemminger <sthem...@microsoft.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 43308243f519..ae66fad98244 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -39,6 +39,7 @@ #include "en.h" #include "en_tc.h" #include "eswitch.h" +#include "ipoib.h" static inline bool mlx5e_rx_hw_stamp(struct mlx5e_tstamp *tstamp) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index dda7db503043..ab3bb026ff9e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -33,6 +33,7 @@ #include #include #include "en.h" +#include "ipoib.h" #define MLX5E_SQ_NOPS_ROOM MLX5_SEND_WQE_MAX_WQEBBS #define MLX5E_SQ_STOP_ROOM (MLX5_SEND_WQE_MAX_WQEBBS +\ -- 2.11.0
[net-next 1/5] net/devlink: Add E-Switch encapsulation control
From: Roi Dayan <r...@mellanox.com> This is an e-switch global knob to enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. The actual encap/decap is carried out (along with the matching and other actions) per offloaded e-switch rules, e.g as done when offloading the TC tunnel key action. Signed-off-by: Roi Dayan <r...@mellanox.com> Reviewed-by: Or Gerlitz <ogerl...@mellanox.com> Acked-by: Jiri Pirko <j...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- include/net/devlink.h| 2 ++ include/uapi/linux/devlink.h | 7 +++ net/core/devlink.c | 26 +++--- 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index 24de13f8c94f..ed7687bbf5d0 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -268,6 +268,8 @@ struct devlink_ops { int (*eswitch_mode_set)(struct devlink *devlink, u16 mode); int (*eswitch_inline_mode_get)(struct devlink *devlink, u8 *p_inline_mode); int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode); + int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode); + int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode); }; static inline void *devlink_priv(struct devlink *devlink) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index b47bee277347..b0e807ac53bb 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -119,6 +119,11 @@ enum devlink_eswitch_inline_mode { DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT, }; +enum devlink_eswitch_encap_mode { + DEVLINK_ESWITCH_ENCAP_MODE_NONE, + DEVLINK_ESWITCH_ENCAP_MODE_BASIC, +}; + enum devlink_attr { /* don't change the order or add anything between, this is ABI! */ DEVLINK_ATTR_UNSPEC, @@ -195,6 +200,8 @@ enum devlink_attr { DEVLINK_ATTR_PAD, + DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */ + /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/net/core/devlink.c b/net/core/devlink.c index 0afac5800b57..b0b87a292e7c 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -1397,10 +1397,10 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink, u32 seq, int flags) { const struct devlink_ops *ops = devlink->ops; + u8 inline_mode, encap_mode; void *hdr; int err = 0; u16 mode; - u8 inline_mode; hdr = genlmsg_put(msg, portid, seq, _nl_family, flags, cmd); if (!hdr) @@ -1429,6 +1429,15 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink, goto nla_put_failure; } + if (ops->eswitch_encap_mode_get) { + err = ops->eswitch_encap_mode_get(devlink, _mode); + if (err) + goto nla_put_failure; + err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, encap_mode); + if (err) + goto nla_put_failure; + } + genlmsg_end(msg, hdr); return 0; @@ -1468,9 +1477,9 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, { struct devlink *devlink = info->user_ptr[0]; const struct devlink_ops *ops = devlink->ops; - u16 mode; - u8 inline_mode; + u8 inline_mode, encap_mode; int err = 0; + u16 mode; if (!ops) return -EOPNOTSUPP; @@ -1493,6 +1502,16 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, if (err) return err; } + + if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) { + if (!ops->eswitch_encap_mode_set) + return -EOPNOTSUPP; + encap_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]); + err = ops->eswitch_encap_mode_set(devlink, encap_mode); + if (err) + return err; + } + return 0; } @@ -2190,6 +2209,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { [DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 }, [DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 }, [DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 }, [DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING }, [DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 }, }; -- 2.11.0
[net-next 3/5] net/mlx5: E-Switch, Add control for encapsulation
From: Roi Dayan <r...@mellanox.com> Implement the devlink e-switch encapsulation control set and get callbacks. Apply the value set by the user on the switchdev offloads mode when creating the fast FDB table where offloaded rules will be set. Signed-off-by: Roi Dayan <r...@mellanox.com> Reviewed-by: Or Gerlitz <ogerl...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 ++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 3 ++ .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 63 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 + 4 files changed, 71 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index b3281d1118b3..21bed3c3334d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1806,6 +1806,11 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) esw->enabled_vports = 0; esw->mode = SRIOV_NONE; esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE; + if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) && + MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap)) + esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC; + else + esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE; dev->priv.eswitch = esw; return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 1f56ed9f5a6f..1e7f21be1233 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -210,6 +210,7 @@ struct mlx5_esw_offload { DECLARE_HASHTABLE(encap_tbl, 8); u8 inline_mode; u64 num_flows; + u8 encap; }; struct mlx5_eswitch { @@ -322,6 +323,8 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode); int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode); int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode); int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode); +int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap); +int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink, u8 *encap); void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw, int vport_index, struct mlx5_eswitch_rep *rep); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index ce3a2c040706..189d24dbd3e3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -450,8 +450,7 @@ static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw) esw_size = min_t(int, MLX5_CAP_GEN(dev, max_flow_counter) * ESW_OFFLOADS_NUM_GROUPS, 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size)); - if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) && - MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap)) + if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE) flags |= MLX5_FLOW_TABLE_TUNNEL_EN; fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH, @@ -1045,6 +1044,66 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode) return 0; } +int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + struct mlx5_eswitch *esw = dev->priv.eswitch; + int err; + + if (!MLX5_CAP_GEN(dev, vport_group_manager)) + return -EOPNOTSUPP; + + if (esw->mode == SRIOV_NONE) + return -EOPNOTSUPP; + + if (encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE && + (!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) || +!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))) + return -EOPNOTSUPP; + + if (encap && encap != DEVLINK_ESWITCH_ENCAP_MODE_BASIC) + return -EOPNOTSUPP; + + if (esw->mode == SRIOV_LEGACY) { + esw->offloads.encap = encap; + return 0; + } + + if (esw->offloads.encap == encap) + return 0; + + if (esw->offloads.num_flows > 0) { + esw_warn(dev, "Can't set encapsulation when flows are configured\n"); + return -EOPNOTSUPP; + } + + esw_destroy_offloads_fast_fdb_table(esw); + + esw->offloads.encap = encap; + err = esw_create_offloads_fast_fdb_table(esw); + if (err) { + esw_warn(esw->dev, &
[net-next 2/5] net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode
From: Or Gerlitz <ogerl...@mellanox.com> Refactor the creation of the fast path FDB table that holds the offloaded rules in SRIOV switchdev mode into it's own function. This will be used in the next patch to be able and re-create the table under different settings without going through legacy mode. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 69 +++--- 1 file changed, 49 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 992b380d36be..ce3a2c040706 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -426,31 +426,21 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw) return err; } -#define MAX_PF_SQ 256 #define ESW_OFFLOADS_NUM_GROUPS 4 -static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports) +static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw) { - int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); - struct mlx5_flow_table_attr ft_attr = {}; - int table_size, ix, esw_size, err = 0; struct mlx5_core_dev *dev = esw->dev; struct mlx5_flow_namespace *root_ns; struct mlx5_flow_table *fdb = NULL; - struct mlx5_flow_group *g; - u32 *flow_group_in; - void *match_criteria; + int esw_size, err = 0; u32 flags = 0; - flow_group_in = mlx5_vzalloc(inlen); - if (!flow_group_in) - return -ENOMEM; - root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB); if (!root_ns) { esw_warn(dev, "Failed to get FDB flow namespace\n"); err = -EOPNOTSUPP; - goto ns_err; + goto out; } esw_debug(dev, "Create offloads FDB table, min (max esw size(2^%d), max counters(%d)*groups(%d))\n", @@ -471,10 +461,49 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports) if (IS_ERR(fdb)) { err = PTR_ERR(fdb); esw_warn(dev, "Failed to create Fast path FDB Table err %d\n", err); - goto fast_fdb_err; + goto out; } esw->fdb_table.fdb = fdb; +out: + return err; +} + +static void esw_destroy_offloads_fast_fdb_table(struct mlx5_eswitch *esw) +{ + mlx5_destroy_flow_table(esw->fdb_table.fdb); +} + +#define MAX_PF_SQ 256 + +static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int nvports) +{ + int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); + struct mlx5_flow_table_attr ft_attr = {}; + struct mlx5_core_dev *dev = esw->dev; + struct mlx5_flow_namespace *root_ns; + struct mlx5_flow_table *fdb = NULL; + int table_size, ix, err = 0; + struct mlx5_flow_group *g; + void *match_criteria; + u32 *flow_group_in; + + esw_debug(esw->dev, "Create offloads FDB Tables\n"); + flow_group_in = mlx5_vzalloc(inlen); + if (!flow_group_in) + return -ENOMEM; + + root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB); + if (!root_ns) { + esw_warn(dev, "Failed to get FDB flow namespace\n"); + err = -EOPNOTSUPP; + goto ns_err; + } + + err = esw_create_offloads_fast_fdb_table(esw); + if (err) + goto fast_fdb_err; + table_size = nvports + MAX_PF_SQ + 1; ft_attr.max_fte = table_size; @@ -545,18 +574,18 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports) return err; } -static void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw) +static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw) { if (!esw->fdb_table.fdb) return; - esw_debug(esw->dev, "Destroy offloads FDB Table\n"); + esw_debug(esw->dev, "Destroy offloads FDB Tables\n"); mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule); mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp); mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp); mlx5_destroy_flow_table(esw->fdb_table.offloads.fdb); - mlx5_destroy_flow_table(esw->fdb_table.fdb); + esw_destroy_offloads_fast_fdb_table(esw); } static int esw_create_offloads_table(struct mlx5_eswitch *esw) @@ -716,7 +745,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int nvports) mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX
On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazetwrote: > On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote: > >> Problem is XDP TX : >> >> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI >> RX is owned by current cpu. >> >> Since TX completion is using a different NAPI, I really do not believe >> we can avoid an atomic operation, like a spinlock, to protect the list >> of pages ( ring->page_cache ) > > A quick fix for net-next would be : > Hi Eric, Good catch. I don't think we need to complicate with an expensive spinlock, we can simply fix this by not enabling interrupts on XDP TX CQ (not arm this CQ at all). and handle XDP TX CQ completion from the RX NAPI context, in a serial (Atomic) manner before handling RX completions themselves. This way locking is not required since all page cache handling is done from the same context (RX NAPI). This is how we do this in mlx5, and this is the best approach (performance wise) since we dealy XDP TX CQ completions handling until we really need the space they hold (On new RX packets). > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c > b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > index > aa074e57ce06fb2842fa1faabd156c3cd2fe10f5..e0b2ea8cefd6beef093c41bade199e3ec4f0291c > 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > @@ -137,13 +137,17 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv > *priv, > struct mlx4_en_rx_desc *rx_desc = ring->buf + (index * ring->stride); > struct mlx4_en_rx_alloc *frags = ring->rx_info + > (index << priv->log_rx_info); > + > if (ring->page_cache.index > 0) { > + spin_lock(>page_cache.lock); > + > /* XDP uses a single page per frame */ > if (!frags->page) { > ring->page_cache.index--; > frags->page = > ring->page_cache.buf[ring->page_cache.index].page; > frags->dma = > ring->page_cache.buf[ring->page_cache.index].dma; > } > + spin_unlock(>page_cache.lock); > frags->page_offset = XDP_PACKET_HEADROOM; > rx_desc->data[0].addr = cpu_to_be64(frags->dma + > XDP_PACKET_HEADROOM); > @@ -277,6 +281,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv, > } > } > > + spin_lock_init(>page_cache.lock); > ring->prod = 0; > ring->cons = 0; > ring->size = size; > @@ -419,10 +424,13 @@ bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring, > > if (cache->index >= MLX4_EN_CACHE_SIZE) > return false; > - > - cache->buf[cache->index].page = frame->page; > - cache->buf[cache->index].dma = frame->dma; > - cache->index++; > + spin_lock(>lock); > + if (cache->index < MLX4_EN_CACHE_SIZE) { > + cache->buf[cache->index].page = frame->page; > + cache->buf[cache->index].dma = frame->dma; > + cache->index++; > + } > + spin_unlock(>lock); > return true; > } > > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h > b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h > index > 39f401aa30474e61c0b0029463b23a829ec35fa3..090a08020d13d8e11cc163ac9fc6ac6affccc463 > 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h > @@ -258,7 +258,8 @@ struct mlx4_en_rx_alloc { > #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT) > > struct mlx4_en_page_cache { > - u32 index; > + u32 index; > + spinlock_t lock; > struct { > struct page *page; > dma_addr_t dma; > >
Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX
On Sun, Mar 12, 2017 at 6:49 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: > On Sun, 2017-03-12 at 17:49 +0200, Saeed Mahameed wrote: >> On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: >> > On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote: >> > >> >> Problem is XDP TX : >> >> >> >> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI >> >> RX is owned by current cpu. >> >> >> >> Since TX completion is using a different NAPI, I really do not believe >> >> we can avoid an atomic operation, like a spinlock, to protect the list >> >> of pages ( ring->page_cache ) >> > >> > A quick fix for net-next would be : >> > >> >> Hi Eric, Good catch. >> >> I don't think we need to complicate with an expensive spinlock, >> we can simply fix this by not enabling interrupts on XDP TX CQ (not >> arm this CQ at all). >> and handle XDP TX CQ completion from the RX NAPI context, in a serial >> (Atomic) manner before handling RX completions themselves. >> This way locking is not required since all page cache handling is done >> from the same context (RX NAPI). >> >> This is how we do this in mlx5, and this is the best approach >> (performance wise) since we dealy XDP TX CQ completions handling >> until we really need the space they hold (On new RX packets). > > SGTM, can you provide the patch for mlx4 ? > of course, We will send it soon. > Thanks ! > >
[PATCH net 2/5] net/mlx5: Don't save PCI state when PCI error is detected
From: Daniel Jurgens <dani...@mellanox.com> When a PCI error is detected the PCI state could be corrupt, don't save it in that flow. Save the state after initialization. After restoring the PCI state during slot reset save it again, restoring the state destroys the previously saved state info. Fixes: 05ac2c0b7438 ('net/mlx5: Fix race between PCI error handlers and health work') Signed-off-by: Daniel Jurgens <dani...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index c4242a4e8130..e2bd600d19de 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1352,6 +1352,7 @@ static int init_one(struct pci_dev *pdev, if (err) goto clean_load; + pci_save_state(pdev); return 0; clean_load: @@ -1407,9 +1408,8 @@ static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev, mlx5_enter_error_state(dev); mlx5_unload_one(dev, priv, false); - /* In case of kernel call save the pci state and drain the health wq */ + /* In case of kernel call drain the health wq */ if (state) { - pci_save_state(pdev); mlx5_drain_health_wq(dev); mlx5_pci_disable_device(dev); } @@ -1461,6 +1461,7 @@ static pci_ers_result_t mlx5_pci_slot_reset(struct pci_dev *pdev) pci_set_master(pdev); pci_restore_state(pdev); + pci_save_state(pdev); if (wait_vital(pdev)) { dev_err(>dev, "%s: wait_vital timed out\n", __func__); -- 2.11.0
[PATCH net 4/5] net/mlx5e: Avoid wrong identification of rules on deletion
From: Or Gerlitz <ogerl...@mellanox.com> When deleting offloaded TC flows, we must correctly identify E-switch rules. The current check could get us wrong w.r.t to rules set on the PF. Since it's possible to set NIC rules on the PF, switch to SRIOV offloads mode and then attempt to delete a NIC rule. To solve that, we add a flags field to offloaded rules, set it on creation time and use that over the code where currently needed. Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerl...@mellanox.com> Reviewed-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 44406a5ec15d..79481f4cf264 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -48,9 +48,14 @@ #include "eswitch.h" #include "vxlan.h" +enum { + MLX5E_TC_FLOW_ESWITCH = BIT(0), +}; + struct mlx5e_tc_flow { struct rhash_head node; u64 cookie; + u8 flags; struct mlx5_flow_handle *rule; struct list_headencap; /* flows sharing the same encap */ struct mlx5_esw_flow_attr *attr; @@ -177,7 +182,7 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv, mlx5_fc_destroy(priv->mdev, counter); } - if (esw && esw->mode == SRIOV_OFFLOADS) { + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { mlx5_eswitch_del_vlan_action(esw, flow->attr); if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) mlx5e_detach_encap(priv, flow); @@ -598,6 +603,7 @@ static int __parse_cls_flower(struct mlx5e_priv *priv, } static int parse_cls_flower(struct mlx5e_priv *priv, + struct mlx5e_tc_flow *flow, struct mlx5_flow_spec *spec, struct tc_cls_flower_offload *f) { @@ -609,7 +615,7 @@ static int parse_cls_flower(struct mlx5e_priv *priv, err = __parse_cls_flower(priv, spec, f, _inline); - if (!err && esw->mode == SRIOV_OFFLOADS && + if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH) && rep->vport != FDB_UPLINK_VPORT) { if (min_inline > esw->offloads.inline_mode) { netdev_warn(priv->netdev, @@ -1132,23 +1138,19 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, struct tc_cls_flower_offload *f) { struct mlx5e_tc_table *tc = >fs.tc; - int err = 0; - bool fdb_flow = false; + int err, attr_size = 0; u32 flow_tag, action; struct mlx5e_tc_flow *flow; struct mlx5_flow_spec *spec; struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + u8 flow_flags = 0; - if (esw && esw->mode == SRIOV_OFFLOADS) - fdb_flow = true; - - if (fdb_flow) - flow = kzalloc(sizeof(*flow) + - sizeof(struct mlx5_esw_flow_attr), - GFP_KERNEL); - else - flow = kzalloc(sizeof(*flow), GFP_KERNEL); + if (esw && esw->mode == SRIOV_OFFLOADS) { + flow_flags = MLX5E_TC_FLOW_ESWITCH; + attr_size = sizeof(struct mlx5_esw_flow_attr); + } + flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL); spec = mlx5_vzalloc(sizeof(*spec)); if (!spec || !flow) { err = -ENOMEM; @@ -1156,12 +1158,13 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol, } flow->cookie = f->cookie; + flow->flags = flow_flags; - err = parse_cls_flower(priv, spec, f); + err = parse_cls_flower(priv, flow, spec, f); if (err < 0) goto err_free; - if (fdb_flow) { + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { flow->attr = (struct mlx5_esw_flow_attr *)(flow + 1); err = parse_tc_fdb_actions(priv, f->exts, flow); if (err < 0) -- 2.11.0
[PATCH net 1/5] net/mlx5: Fix create autogroup prev initializer
From: Paul Blakey <pa...@mellanox.com> The autogroups list is a list of non overlapping group boundaries sorted by their start index. If the autogroups list wasn't empty and an empty group slot was found at the start of the list, the new group was added to the end of the list instead of the beginning, as the prev initializer was incorrect. When this was repeated, it caused multiple groups to have overlapping boundaries. Fixed that by correctly initializing the prev pointer to the start of the list. Fixes: eccec8da3b4e ('net/mlx5: Keep autogroups list ordered') Signed-off-by: Paul Blakey <pa...@mellanox.com> Reviewed-by: Mark Bloch <ma...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 2478516a61e2..ded27bb9a3b6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1136,7 +1136,7 @@ static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft, u32 *match_criteria) { int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in); - struct list_head *prev = ft->node.children.prev; + struct list_head *prev = >node.children; unsigned int candidate_index = 0; struct mlx5_flow_group *fg; void *match_criteria_addr; -- 2.11.0
[PATCH net 0/5] Mellanox mlx5 fixes 2017-03-09
Hi Dave, This series contains some mlx5 core and ethernet driver fixes. For -stable: net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode (for kernel >= 4.10) net/mlx5e: Avoid wrong identification of rules on deletion (for kernel >= 4.9) net/mlx5: Don't save PCI state when PCI error is detected (for kernel >= 4.9) net/mlx5: Fix create autogroup prev initializer (for kernel >=4.9) Thanks, Saeed. Daniel Jurgens (1): net/mlx5: Don't save PCI state when PCI error is detected Eugenia Emantayev (1): net/mlx5e: Fix loopback selftest Huy Nguyen (1): net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode Or Gerlitz (1): net/mlx5e: Avoid wrong identification of rules on deletion Paul Blakey (1): net/mlx5: Fix create autogroup prev initializer drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 10 +++ .../net/ethernet/mellanox/mlx5/core/en_selftest.c | 5 +--- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 33 -- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 ++-- 5 files changed, 28 insertions(+), 27 deletions(-) -- 2.11.0
[PATCH net 5/5] net/mlx5e: Fix loopback selftest
From: Eugenia Emantayev <euge...@mellanox.com> Change packet type handler to ETH_P_IP instead of ETH_P_ALL since we are already expecting an IP packet. Also, using ETH_P_ALL will cause the loopback test packet type handler to be called on all outgoing packets, especially our own self loopback test SKB, which will be validated on xmit as well, and we don't want that. Tested with: ethtool -t ethX validated that the loopback test passes. Fixes: 0952da791c97 ('net/mlx5e: Add support for loopback selftest') Signed-off-by: Eugenia Emantayev <euge...@mellanox.com> Signed-off-by: Saeed Mahameed <sae...@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c index 31e3cb7ee5fe..5621dcfda4f1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -204,9 +204,6 @@ mlx5e_test_loopback_validate(struct sk_buff *skb, struct iphdr *iph; /* We are only going to peek, no need to clone the SKB */ - if (skb->protocol != htons(ETH_P_IP)) - goto out; - if (MLX5E_TEST_PKT_SIZE - ETH_HLEN > skb_headlen(skb)) goto out; @@ -249,7 +246,7 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv *priv, lbtp->loopback_ok = false; init_completion(>comp); - lbtp->pt.type = htons(ETH_P_ALL); + lbtp->pt.type = htons(ETH_P_IP); lbtp->pt.func = mlx5e_test_loopback_validate; lbtp->pt.dev = priv->netdev; lbtp->pt.af_packet_priv = lbtp; -- 2.11.0