Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 11:00 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote:
>> On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
>> >
>> >> we had/still have the proper stats; they are the ones that
>> >> mlx4_en_fold_software_stats is trying to cache into (they always
>> >> exist),
>> >> but the ones that you are trying to read from (the mlx4 rings) are gone!
>> >>
>> >> This bug is totally new and, as I warned, it is another symptom of
>> >> the real root cause (we can't sleep while reading stats).
>> >>
>> >> Eric, what do you suggest? Keep pre-allocated MAX_RINGS stats and
>> >> always iterate over all of them to query stats?
>> >> What if you have one ring/none/1K? How would you know how many to query?
>> >
>> > I am suggesting I will fix the bug I introduced.
>> >
>> > Do not panic.
>> >
>> >
>>
>> Not at all, I trust you are the only one who is capable of providing
>> the best solution.
>> I am just trying to read your mind :-).
>>
>> As I said, I like the solution and I want to adapt it to mlx5, so I am
>> a little bit enthusiastic :)
>
> What about the following fix guys ?
>
> As a bonus we update the stats right before they are sent to monitors
> via rtnetlink ;)

Hi Eric, Thanks for the patch, I already acked it.

I have one educational question (not related to this patch, but
related to stats reading in general).
I was wondering why we need to disable BH every time we read stats
("spin_lock_bh"). Is it essential?

I checked, and in mlx4 we don't hold stats_lock in softirq
(en_rx.c/en_tx.c), so I don't see any deadlock risk here.

Thanks,
Saeed.


Re: [PATCH net-next] mlx4: fix use-after-free in mlx4_en_fold_software_stats()

2016-12-01 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 3:02 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> My recent commit to get more precise rx/tx counters in ndo_get_stats64()
> can lead to crashes at device dismantle, as Jesper found out.
>
> We must prevent mlx4_en_fold_software_stats() trying to access
> tx/rx rings if they are deleted.
>
> Fix this by adding a test against priv->port_up in
> mlx4_en_fold_software_stats()
>
> Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
> allows us to eventually broadcast the latest/current counters to
> rtnetlink monitors.
>
> Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Reported-and-bisected-by: Jesper Dangaard Brouer <bro...@redhat.com>
> Tested-by: Jesper Dangaard Brouer <bro...@redhat.com>
> Cc: Tariq Toukan <tar...@mellanox.com>
> Cc: Saeed Mahameed <sae...@dev.mellanox.co.il>

Acked-by: Saeed Mahameed <sae...@mellanox.com>
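
For reference, the fix described above boils down to an early bail-out
while the rings are being torn down. A minimal sketch of the idea
(abbreviated, not the verbatim upstream diff; see the commit itself for
the full function):

void mlx4_en_fold_software_stats(struct net_device *dev)
{
	struct mlx4_en_priv *priv = netdev_priv(dev);
	unsigned long packets = 0, bytes = 0;
	int i;

	/* rings are freed at dismantle: keep the last cached dev->stats */
	if (!priv->port_up)
		return;

	for (i = 0; i < priv->rx_ring_num; i++) {
		const struct mlx4_en_rx_ring *ring = priv->rx_ring[i];

		packets += READ_ONCE(ring->packets);
		bytes   += READ_ONCE(ring->bytes);
	}
	dev->stats.rx_packets = packets;
	dev->stats.rx_bytes = bytes;

	/* ... same folding loop for the TX rings ... */
}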


Re: [WIP] net+mlx4: auto doorbell

2016-11-30 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 5:44 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2016-11-30 at 15:50 +0200, Saeed Mahameed wrote:
>> On Tue, Nov 29, 2016 at 8:58 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Mon, 2016-11-21 at 10:10 -0800, Eric Dumazet wrote:
>> >
>> >
>> >> Not sure if this has been tried before, but the doorbell avoidance could
>> >> be done by the driver itself, because it knows a TX completion will come
>> >> shortly (well... if softirqs are not delayed too much!)
>> >>
>> >> Doorbell would be forced only if :
>> >>
>> >> ("skb->xmit_more is not set" AND "TX engine is not 'started yet'" )
>> >> OR
>> >> ( too many [1] packets were put in TX ring buffer, no point deferring
>> >> more)
>> >>
>> >> Start the pump, but once it is started, let the doorbells be rung by
>> >> TX completion.
>> >>
>> >> ndo_start_xmit and TX completion handler would have to maintain a shared
>> >> state describing if packets were ready but doorbell deferred.
>> >>
>> >>
>> >> Note that TX completion means "if at least one packet was drained",
>> >> otherwise busy polling, constantly calling napi->poll() would force a
>> >> doorbell too soon for devices sharing a NAPI for both RX and TX.
>> >>
>> >> But then, maybe busy poll would like to force a doorbell...
>> >>
>> >> I could try these ideas on mlx4 shortly.
>> >>
>> >>
>> >> [1] limit could be derived from active "ethtool -c" params, eg tx-frames
>> >
>> > I have a WIP, that increases pktgen rate by 75 % on mlx4 when bulking is
>> > not used.
>>
>> Hi Eric, nice idea indeed, and we need something like this;
>> today we almost don't exploit the TX bulking at all.
>>
>> But please see below, I am not sure different contexts should share
>> the doorbell ringing; it is really risky.
>>
>> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    2
>> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   90 +++--
>> >  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    4
>> >  include/linux/netdevice.h                    |    1
>> >  net/core/net-sysfs.c                         |   18 +++
>> >  5 files changed, 83 insertions(+), 32 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
>> > b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > index 6562f78b07f4..fbea83218fc0 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > @@ -1089,7 +1089,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, 
>> > struct mlx4_en_cq *cq, int bud
>> >
>> > if (polled) {
>> > if (doorbell_pending)
>> > -   mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
>> > +   mlx4_en_xmit_doorbell(dev, priv->tx_ring[TX_XDP][cq->ring]);
>> >
>> > mlx4_cq_set_ci(&cq->mcq);
>> > wmb(); /* ensure HW sees CQ consumer before we post new buffers */
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
>> > b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > index 4b597dca5c52..affebb435679 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > @@ -67,7 +67,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
>> > ring->size = size;
>> > ring->size_mask = size - 1;
>> > ring->sp_stride = stride;
>> > -   ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
>> > +   ring->full_size = ring->size - HEADROOM - 2*MAX_DESC_TXBBS;
>> >
>> > tmp = size * sizeof(struct mlx4_en_tx_info);
>> > ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
>> > @@ -193,6 +193,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
>> > ring->sp_cqn = cq;
>> > ring->prod = 0;
>> > ring->cons = 0xffffffff;
>> > +   ring->ncons = 0;
>> > ring->last_nr_txbb = 1;
>> > memset(ring->tx_info, 0, ring->size * sizeof(struct 
>>
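
The deferral idea above in compact form, as a sketch (kernel-style C;
the names doorbell_prod, tx_frames and hw_running are illustrative, not
from the actual WIP patch): ndo_start_xmit rings the doorbell only to
start the pump or on overflow, and the TX completion handler rings it
whenever packets were drained while a doorbell is still owed.

struct tx_ring {
	unsigned int prod;          /* descriptors posted by ndo_start_xmit */
	unsigned int cons;          /* descriptors completed by HW */
	unsigned int doorbell_prod; /* last prod value made visible to HW */
	unsigned int tx_frames;     /* deferral limit, e.g. from "ethtool -c" */
	bool hw_running;            /* TX engine started, completions expected */
	bool doorbell_pending;      /* packets posted, doorbell not rung yet */
};

static void ring_doorbell(struct tx_ring *ring)
{
	/* the writel() of ring->prod to the HW register would go here */
	ring->doorbell_prod = ring->prod;
	ring->doorbell_pending = false;
	ring->hw_running = true;
}

/* tail of ndo_start_xmit, after prod has been advanced */
static void tx_maybe_doorbell(struct tx_ring *ring, bool xmit_more)
{
	if ((!xmit_more && !ring->hw_running) ||
	    ring->prod - ring->doorbell_prod >= ring->tx_frames)
		ring_doorbell(ring);           /* start the pump / overflow */
	else
		ring->doorbell_pending = true; /* defer to TX completion */
}

/* TX completion: only a real drain counts, otherwise busy polling would
 * ring too soon on devices sharing one NAPI for RX and TX */
static void tx_completion(struct tx_ring *ring, unsigned int new_cons)
{
	if (new_cons == ring->cons)
		return;
	ring->cons = new_cons;
	if (ring->doorbell_pending)
		ring_doorbell(ring);           /* keep the pump running */
	else if (ring->cons == ring->doorbell_prod)
		ring->hw_running = false;      /* HW went idle */
}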

[PATCH net-next V2 6/7] net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance

2016-11-30 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches, it doesn't change any functionality.

Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4d71445..3875c1c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -143,18 +143,17 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 }
 
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
- struct mlx5_flow_handle *rule,
- struct mlx5_esw_flow_attr *attr)
+ struct mlx5e_tc_flow *flow)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_fc *counter = NULL;
 
-   counter = mlx5_flow_rule_counter(rule);
+   counter = mlx5_flow_rule_counter(flow->rule);
 
-   mlx5_del_flow_rules(rule);
+   mlx5_del_flow_rules(flow->rule);
 
if (esw && esw->mode == SRIOV_OFFLOADS)
-   mlx5_eswitch_del_vlan_action(esw, attr);
+   mlx5_eswitch_del_vlan_action(esw, flow->attr);
 
mlx5_fc_destroy(priv->mdev, counter);
 
@@ -1005,7 +1004,7 @@ int mlx5e_delete_flower(struct mlx5e_priv *priv,
 
rhashtable_remove_fast(&tc->ht, &flow->node, tc->ht_params);
 
-   mlx5e_tc_del_flow(priv, flow->rule, flow->attr);
+   mlx5e_tc_del_flow(priv, flow);
 
if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
mlx5e_detach_encap(priv, flow);
@@ -1065,7 +1064,7 @@ static void _mlx5e_tc_del_flow(void *ptr, void *arg)
struct mlx5e_tc_flow *flow = ptr;
struct mlx5e_priv *priv = arg;
 
-   mlx5e_tc_del_flow(priv, flow->rule, flow->attr);
+   mlx5e_tc_del_flow(priv, flow);
kfree(flow);
 }
 
-- 
2.7.4



[PATCH net-next V2 4/7] net/mlx5e: Remove redundant hashtable lookup in configure flower

2016-11-30 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.

Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 26 +++--
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4d06fab..dd6d954 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -915,25 +915,17 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
u32 flow_tag, action;
struct mlx5e_tc_flow *flow;
struct mlx5_flow_spec *spec;
-   struct mlx5_flow_handle *old = NULL;
-   struct mlx5_esw_flow_attr *old_attr = NULL;
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 
if (esw && esw->mode == SRIOV_OFFLOADS)
fdb_flow = true;
 
-   flow = rhashtable_lookup_fast(&tc->ht, &f->cookie,
- tc->ht_params);
-   if (flow) {
-   old = flow->rule;
-   old_attr = flow->attr;
-   } else {
-   if (fdb_flow)
-   flow = kzalloc(sizeof(*flow) + sizeof(struct 
mlx5_esw_flow_attr),
-  GFP_KERNEL);
-   else
-   flow = kzalloc(sizeof(*flow), GFP_KERNEL);
-   }
+   if (fdb_flow)
+   flow = kzalloc(sizeof(*flow) +
+  sizeof(struct mlx5_esw_flow_attr),
+  GFP_KERNEL);
+   else
+   flow = kzalloc(sizeof(*flow), GFP_KERNEL);
 
spec = mlx5_vzalloc(sizeof(*spec));
if (!spec || !flow) {
@@ -970,17 +962,13 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, 
__be16 protocol,
if (err)
goto err_del_rule;
 
-   if (old)
-   mlx5e_tc_del_flow(priv, old, old_attr);
-
goto out;
 
 err_del_rule:
mlx5_del_flow_rules(flow->rule);
 
 err_free:
-   if (!old)
-   kfree(flow);
+   kfree(flow);
 out:
kvfree(spec);
return err;
-- 
2.7.4



[PATCH net-next V2 1/7] net/mlx5e: Implement Fragmented Work Queue (WQ)

2016-11-30 Thread Saeed Mahameed
From: Tariq Toukan <tar...@mellanox.com>

Add new type of struct mlx5_frag_buf which is used to allocate fragmented
buffers rather than contiguous, and make the Completion Queues (CQs) use
it as they are big (default of 2MB per CQ in Striding RQ).

This fixes failures of the type:
"mlx5e_open_locked: mlx5e_open_channels failed, -12"
caused by dma_zalloc_coherent failing to find enough contiguous
coherent memory to satisfy the driver's request when the user tries to
set up more or larger rings.

Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Reported-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c   | 66 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 ++--
 drivers/net/ethernet/mellanox/mlx5/core/wq.c  | 26 ++---
 drivers/net/ethernet/mellanox/mlx5/core/wq.h  | 18 +--
 include/linux/mlx5/driver.h   | 11 
 6 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
index 2c6e3c7..44791de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
@@ -106,6 +106,63 @@ void mlx5_buf_free(struct mlx5_core_dev *dev, struct 
mlx5_buf *buf)
 }
 EXPORT_SYMBOL_GPL(mlx5_buf_free);
 
+int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size,
+struct mlx5_frag_buf *buf, int node)
+{
+   int i;
+
+   buf->size = size;
+   buf->npages = 1 << get_order(size);
+   buf->page_shift = PAGE_SHIFT;
+   buf->frags = kcalloc(buf->npages, sizeof(struct mlx5_buf_list),
+GFP_KERNEL);
+   if (!buf->frags)
+   goto err_out;
+
+   for (i = 0; i < buf->npages; i++) {
+   struct mlx5_buf_list *frag = &buf->frags[i];
+   int frag_sz = min_t(int, size, PAGE_SIZE);
+
+   frag->buf = mlx5_dma_zalloc_coherent_node(dev, frag_sz,
+ &frag->map, node);
+   if (!frag->buf)
+   goto err_free_buf;
+   if (frag->map & ((1 << buf->page_shift) - 1)) {
+   dma_free_coherent(&dev->pdev->dev, frag_sz,
+ buf->frags[i].buf, buf->frags[i].map);
+   mlx5_core_warn(dev, "unexpected map alignment: 0x%p, page_shift=%d\n",
+  (void *)frag->map, buf->page_shift);
+   goto err_free_buf;
+   }
+   size -= frag_sz;
+   }
+
+   return 0;
+
+err_free_buf:
+   while (i--)
+   dma_free_coherent(&dev->pdev->dev, PAGE_SIZE, buf->frags[i].buf,
+ buf->frags[i].map);
+   kfree(buf->frags);
+err_out:
+   return -ENOMEM;
+}
+
+void mlx5_frag_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf)
+{
+   int size = buf->size;
+   int i;
+
+   for (i = 0; i < buf->npages; i++) {
+   int frag_sz = min_t(int, size, PAGE_SIZE);
+
+   dma_free_coherent(&dev->pdev->dev, frag_sz, buf->frags[i].buf,
+ buf->frags[i].map);
+   size -= frag_sz;
+   }
+   kfree(buf->frags);
+}
+
 static struct mlx5_db_pgdir *mlx5_alloc_db_pgdir(struct mlx5_core_dev *dev,
 int node)
 {
@@ -230,3 +287,12 @@ void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 
*pas)
}
 }
 EXPORT_SYMBOL_GPL(mlx5_fill_page_array);
+
+void mlx5_fill_page_frag_array(struct mlx5_frag_buf *buf, __be64 *pas)
+{
+   int i;
+
+   for (i = 0; i < buf->npages; i++)
+   pas[i] = cpu_to_be64(buf->frags[i].map);
+}
+EXPORT_SYMBOL_GPL(mlx5_fill_page_frag_array);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 442dbc3..f16f7fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -286,7 +286,7 @@ struct mlx5e_cq {
u16decmprs_wqe_counter;
 
/* control */
-   struct mlx5_wq_ctrl        wq_ctrl;
+   struct mlx5_frag_wq_ctrl   wq_ctrl;
 } cacheline_aligned_in_smp;
 
 struct mlx5e_rq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6b492ca..ba25cd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1201,7 +1201,7 @@ static int mlx5e_create_cq(struct mlx5e_channel *c,
 
 static void mlx5e
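
A hypothetical caller of the new API above would follow the usual buffer
lifecycle (sketch only; mdev, cq_size_bytes, numa_node and pas are
placeholders, and error handling is abbreviated):

	struct mlx5_frag_buf buf;
	int err;

	/* allocate a big CQ buffer as PAGE_SIZE fragments */
	err = mlx5_frag_buf_alloc_node(mdev, cq_size_bytes, &buf, numa_node);
	if (err)
		return err;

	/* hand the per-fragment DMA addresses to FW via the pas array */
	mlx5_fill_page_frag_array(&buf, pas);

	/* ... use the CQ ... */

	mlx5_frag_buf_free(mdev, &buf);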

[PATCH net-next V2 7/7] net/mlx5e: Remove flow encap entry in the correct place

2016-11-30 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

Handling flow encap entry should be inside tc del flow
and is only relevant for offloaded eswitch TC rules.

Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 43 +
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3875c1c..f07ef8c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -142,6 +142,24 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
return mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
 }
 
+static void mlx5e_detach_encap(struct mlx5e_priv *priv,
+  struct mlx5e_tc_flow *flow) {
+   struct list_head *next = flow->encap.next;
+
+   list_del(&flow->encap);
+   if (list_empty(next)) {
+   struct mlx5_encap_entry *e;
+
+   e = list_entry(next, struct mlx5_encap_entry, flows);
+   if (e->n) {
+   mlx5_encap_dealloc(priv->mdev, e->encap_id);
+   neigh_release(e->n);
+   }
+   hlist_del_rcu(&e->encap_hlist);
+   kfree(e);
+   }
+}
+
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow *flow)
 {
@@ -152,8 +170,11 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 
mlx5_del_flow_rules(flow->rule);
 
-   if (esw && esw->mode == SRIOV_OFFLOADS)
+   if (esw && esw->mode == SRIOV_OFFLOADS) {
mlx5_eswitch_del_vlan_action(esw, flow->attr);
+   if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
+   mlx5e_detach_encap(priv, flow);
+   }
 
mlx5_fc_destroy(priv->mdev, counter);
 
@@ -973,24 +994,6 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 
protocol,
return err;
 }
 
-static void mlx5e_detach_encap(struct mlx5e_priv *priv,
-  struct mlx5e_tc_flow *flow) {
-   struct list_head *next = flow->encap.next;
-
-   list_del(&flow->encap);
-   if (list_empty(next)) {
-   struct mlx5_encap_entry *e;
-
-   e = list_entry(next, struct mlx5_encap_entry, flows);
-   if (e->n) {
-   mlx5_encap_dealloc(priv->mdev, e->encap_id);
-   neigh_release(e->n);
-   }
-   hlist_del_rcu(&e->encap_hlist);
-   kfree(e);
-   }
-}
-
 int mlx5e_delete_flower(struct mlx5e_priv *priv,
struct tc_cls_flower_offload *f)
 {
@@ -1006,8 +1009,6 @@ int mlx5e_delete_flower(struct mlx5e_priv *priv,
 
mlx5e_tc_del_flow(priv, flow);
 
-   if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
-   mlx5e_detach_encap(priv, flow);
 
kfree(flow);
 
-- 
2.7.4



[PATCH net-next V2 2/7] net/mlx5e: Move function mlx5e_create_umr_mkey

2016-11-30 Thread Saeed Mahameed
From: Tariq Toukan <tar...@mellanox.com>

In next patch we are going to create a UMR MKey per RQ, we need
mlx5e_create_umr_mkey declared before mlx5e_create_rq.

Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 74 +++
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ba25cd3..49ca30b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -471,6 +471,43 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
kfree(rq->mpwqe.info);
 }
 
+static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
+{
+   struct mlx5_core_dev *mdev = priv->mdev;
+   u64 npages = MLX5E_REQUIRED_MTTS(priv->profile->max_nch(mdev),
+
BIT(MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW));
+   int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
+   void *mkc;
+   u32 *in;
+   int err;
+
+   in = mlx5_vzalloc(inlen);
+   if (!in)
+   return -ENOMEM;
+
+   mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+
+   npages = min_t(u32, ALIGN(U16_MAX, 4) * 2, npages);
+
+   MLX5_SET(mkc, mkc, free, 1);
+   MLX5_SET(mkc, mkc, umr_en, 1);
+   MLX5_SET(mkc, mkc, lw, 1);
+   MLX5_SET(mkc, mkc, lr, 1);
+   MLX5_SET(mkc, mkc, access_mode, MLX5_MKC_ACCESS_MODE_MTT);
+
+   MLX5_SET(mkc, mkc, qpn, 0xff);
+   MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.pdn);
+   MLX5_SET64(mkc, mkc, len, npages << PAGE_SHIFT);
+   MLX5_SET(mkc, mkc, translations_octword_size,
+MLX5_MTT_OCTW(npages));
+   MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
+
+   err = mlx5_core_create_mkey(mdev, &priv->umr_mkey, in, inlen);
+
+   kvfree(in);
+   return err;
+}
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
   struct mlx5e_rq_param *param,
   struct mlx5e_rq *rq)
@@ -3625,43 +3662,6 @@ static void mlx5e_destroy_q_counter(struct mlx5e_priv 
*priv)
mlx5_core_dealloc_q_counter(priv->mdev, priv->q_counter);
 }
 
-static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
-{
-   struct mlx5_core_dev *mdev = priv->mdev;
-   u64 npages = MLX5E_REQUIRED_MTTS(priv->profile->max_nch(mdev),
-
BIT(MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW));
-   int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
-   void *mkc;
-   u32 *in;
-   int err;
-
-   in = mlx5_vzalloc(inlen);
-   if (!in)
-   return -ENOMEM;
-
-   mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
-
-   npages = min_t(u32, ALIGN(U16_MAX, 4) * 2, npages);
-
-   MLX5_SET(mkc, mkc, free, 1);
-   MLX5_SET(mkc, mkc, umr_en, 1);
-   MLX5_SET(mkc, mkc, lw, 1);
-   MLX5_SET(mkc, mkc, lr, 1);
-   MLX5_SET(mkc, mkc, access_mode, MLX5_MKC_ACCESS_MODE_MTT);
-
-   MLX5_SET(mkc, mkc, qpn, 0xff);
-   MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.pdn);
-   MLX5_SET64(mkc, mkc, len, npages << PAGE_SHIFT);
-   MLX5_SET(mkc, mkc, translations_octword_size,
-MLX5_MTT_OCTW(npages));
-   MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
-
-   err = mlx5_core_create_mkey(mdev, &priv->umr_mkey, in, inlen);
-
-   kvfree(in);
-   return err;
-}
-
 static void mlx5e_nic_init(struct mlx5_core_dev *mdev,
   struct net_device *netdev,
   const struct mlx5e_profile *profile,
-- 
2.7.4



[PATCH net-next V2 3/7] net/mlx5e: Create UMR MKey per RQ

2016-11-30 Thread Saeed Mahameed
From: Tariq Toukan <tar...@mellanox.com>

In the Striding RQ implementation, we used a single UMR
(User-Mode Memory Registration) memory key for all RQs.
When the product of the number of RQs and their size gets high, we hit
the limitation of a u16 field size in FW.

Here we move to using a UMR memory key per RQ, so we can
scale to any number of rings, with the maximum buffer
size in each.

Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 12 ++---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 53 --
 3 files changed, 35 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f16f7fb..63dd639 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -77,9 +77,9 @@
 MLX5_MPWRQ_WQE_PAGE_ORDER)
 
 #define MLX5_MTT_OCTW(npages) (ALIGN(npages, 8) / 2)
-#define MLX5E_REQUIRED_MTTS(rqs, wqes)\
-   (rqs * wqes * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8))
-#define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) <= U16_MAX)
+#define MLX5E_REQUIRED_MTTS(wqes)  \
+   (wqes * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8))
+#define MLX5E_VALID_NUM_MTTS(num_mtts) (MLX5_MTT_OCTW(num_mtts) - 1 <= U16_MAX)
 
 #define MLX5_UMR_ALIGN (2048)
 #define MLX5_MPWRQ_SMALL_PACKET_THRESHOLD  (128)
@@ -347,7 +347,6 @@ struct mlx5e_rq {
struct {
struct mlx5e_mpw_info *info;
void  *mtt_no_align;
-   u32mtt_offset;
} mpwqe;
};
struct {
@@ -382,6 +381,7 @@ struct mlx5e_rq {
u32rqn;
struct mlx5e_channel  *channel;
struct mlx5e_priv *priv;
+   struct mlx5_core_mkey  umr_mkey;
 } cacheline_aligned_in_smp;
 
 struct mlx5e_umr_dma_info {
@@ -689,7 +689,6 @@ struct mlx5e_priv {
 
unsigned long  state;
struct mutex   state_lock; /* Protects Interface state */
-   struct mlx5_core_mkey  umr_mkey;
struct mlx5e_rqdrop_rq;
 
struct mlx5e_channel **channel;
@@ -838,8 +837,7 @@ static inline void mlx5e_cq_arm(struct mlx5e_cq *cq)
 
 static inline u32 mlx5e_get_wqe_mtt_offset(struct mlx5e_rq *rq, u16 wqe_ix)
 {
-   return rq->mpwqe.mtt_offset +
-   wqe_ix * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8);
+   return wqe_ix * ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8);
 }
 
 static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index aa963d7..352462a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -499,8 +499,7 @@ static int mlx5e_set_ringparam(struct net_device *dev,
return -EINVAL;
}
 
-   num_mtts = MLX5E_REQUIRED_MTTS(priv->params.num_channels,
-  rx_pending_wqes);
+   num_mtts = MLX5E_REQUIRED_MTTS(rx_pending_wqes);
if (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ &&
!MLX5E_VALID_NUM_MTTS(num_mtts)) {
netdev_info(dev, "%s: rx_pending (%d) request can't be satisfied, try to reduce.\n",
@@ -565,7 +564,6 @@ static int mlx5e_set_channels(struct net_device *dev,
unsigned int count = ch->combined_count;
bool arfs_enabled;
bool was_opened;
-   u32 num_mtts;
int err = 0;
 
if (!count) {
@@ -584,14 +582,6 @@ static int mlx5e_set_channels(struct net_device *dev,
return -EINVAL;
}
 
-   num_mtts = MLX5E_REQUIRED_MTTS(count, BIT(priv->params.log_rq_size));
-   if (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ &&
-   !MLX5E_VALID_NUM_MTTS(num_mtts)) {
-   netdev_info(dev, "%s: rx count (%d) request can't be satisfied, try to reduce.\n",
-   __func__, count);
-   return -EINVAL;
-   }
-
if (priv->params.num_channels == count)
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 49ca30b..84a4adb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -471,24 +471,25 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
kfree(rq->mpwqe.info);
 }
 
-static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
+static int mlx5e_crea
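
A back-of-the-envelope illustration of the u16 limit fixed here (the
ring geometry below is assumed purely for illustration; MLX5_MTT_OCTW()
feeds the u16-limited translations octword count in the mkey context):

#include <stdio.h>

#define ALIGN8(n)   (((n) + 7UL) / 8 * 8)
#define MTT_OCTW(n) (ALIGN8(n) / 2)            /* MLX5_MTT_OCTW */

int main(void)
{
	unsigned long pages_per_wqe = 16;      /* assumed */
	unsigned long wqes_per_rq   = 1UL << 13;
	unsigned long rqs           = 16;

	/* old scheme: one UMR mkey covering every RQ */
	unsigned long all = rqs * wqes_per_rq * ALIGN8(pages_per_wqe);
	/* new scheme: one UMR mkey per RQ */
	unsigned long one = wqes_per_rq * ALIGN8(pages_per_wqe);

	printf("single mkey: %lu octwords -> %s\n", MTT_OCTW(all),
	       MTT_OCTW(all) - 1 <= 65535 ? "fits u16" : "overflows u16");
	printf("per-RQ mkey: %lu octwords -> %s\n", MTT_OCTW(one),
	       MTT_OCTW(one) - 1 <= 65535 ? "fits u16" : "overflows u16");
	return 0;
}

With these (made-up) numbers the shared mkey needs 1048576 octwords,
far beyond U16_MAX, while a per-RQ mkey needs 65536, which is exactly
what the relaxed MLX5E_VALID_NUM_MTTS() check above still accepts.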

[PATCH net-next V2 5/7] net/mlx5e: Correct cleanup order when deleting offloaded TC rules

2016-11-30 Thread Saeed Mahameed
From: Roi Dayan <r...@mellanox.com>

According to the reverse unwinding principle, at delete time we should
first handle deletion of the steering rule and only later handle the
vlan deletion from the eswitch.

Fixes: 8b32580df1cb ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Roi Dayan <r...@mellanox.com>
Reviewed-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index dd6d954..4d71445 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -151,11 +151,11 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 
counter = mlx5_flow_rule_counter(rule);
 
+   mlx5_del_flow_rules(rule);
+
if (esw && esw->mode == SRIOV_OFFLOADS)
mlx5_eswitch_del_vlan_action(esw, attr);
 
-   mlx5_del_flow_rules(rule);
-
mlx5_fc_destroy(priv->mdev, counter);
 
if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
-- 
2.7.4
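
The principle generalizes: teardown should mirror setup in reverse
order. A generic sketch with hypothetical resource names:

int add_vlan_action(void);
int add_steering_rule(void);
void del_vlan_action(void);
void del_steering_rule(void);

static int flow_setup(void)
{
	int err = add_vlan_action();   /* step 1 */

	if (err)
		return err;

	err = add_steering_rule();     /* step 2 */
	if (err) {
		del_vlan_action();     /* error path unwinds step 1 */
		return err;
	}
	return 0;
}

static void flow_teardown(void)
{
	del_steering_rule();           /* undo step 2 first */
	del_vlan_action();             /* then undo step 1 */
}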



[PATCH net-next V2 0/7] Mellanox 100G mlx5 updates 2016-11-29

2016-11-30 Thread Saeed Mahameed
Hi Dave,

The following series from Tariq and Roi, provides some critical fixes
and updates for the mlx5e driver.

From Tariq:
 - Fix the driver's huge coherent memory allocation issues by fragmenting
   completion queues, in a way that is transparent to the netdev driver,
   by providing a new buffer type "mlx5_frag_buf" with the same access API.
 - Create a UMR MKey per RQ for better scalability.

From Roi:
 - Some fixes for the encap-decap support and the tc flower offload
   recently added to the mlx5e driver.

v1->v2:
 - Fix start index in error flow of mlx5_frag_buf_alloc_node, pointed out by 
Eric.

This series was generated against commit:
31ac1c19455f ("geneve: fix ip_hdr_len reserved for geneve6 tunnel.")

Thanks,
Saeed.

Roi Dayan (4):
  net/mlx5e: Remove redundant hashtable lookup in configure flower
  net/mlx5e: Correct cleanup order when deleting offloaded TC rules
  net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance
  net/mlx5e: Remove flow encap entry in the correct place

Tariq Toukan (3):
  net/mlx5e: Implement Fragmented Work Queue (WQ)
  net/mlx5e: Move function mlx5e_create_umr_mkey
  net/mlx5e: Create UMR MKey per RQ

 drivers/net/ethernet/mellanox/mlx5/core/alloc.c|  66 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  14 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 121 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  82 ++
 drivers/net/ethernet/mellanox/mlx5/core/wq.c   |  26 +++--
 drivers/net/ethernet/mellanox/mlx5/core/wq.h   |  18 ++-
 include/linux/mlx5/driver.h|  11 ++
 8 files changed, 215 insertions(+), 135 deletions(-)

-- 
2.7.4



Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 5:58 PM, Eric Dumazet  wrote:
> On Wed, 2016-11-30 at 15:08 +0100, Jesper Dangaard Brouer wrote:
>> On Fri, 25 Nov 2016 07:46:20 -0800 Eric Dumazet  
>> wrote:
>>
>> > From: Eric Dumazet 
>>
>> Ended up-in net-next as:
>>
>>  commit 40931b85113dad7881d49e8759e5ad41d30a5e6c
>>  Author: Eric Dumazet 
>>  Date:   Fri Nov 25 07:46:20 2016 -0800
>>
>> mlx4: give precise rx/tx bytes/packets counters
>>
>> mlx4 stats are chaotic because a deferred work queue is responsible
>> to update them every 250 ms.
>>
>> Likely after this patch I get this crash (below), when rebooting my machine.
>> Looks like a device removal order thing.
>> Tested with net-next at commit 93ba5504.
>>
>> [...]
>> [ 1967.248453] mlx5_core :02:00.1: Shutdown was called
>> [ 1967.854556] mlx5_core :02:00.0: Shutdown was called
>> [ 1968.443015] e1000e: EEE TX LPI TIMER: 0011
>> [ 1968.484676] sd 3:0:0:0: [sda] Synchronizing SCSI cache
>> [ 1968.528354] mlx4_core :01:00.0: mlx4_shutdown was called
>> [ 1968.534054] mlx4_en: mlx4p1: Close port called
>> [ 1968.571156] mlx4_en :01:00.0: removed PHC
>> [ 1968.575677] mlx4_en: mlx4p2: Close port called
>> [ 1969.506602] BUG: unable to handle kernel NULL pointer dereference at 
>> 0d08
>> [ 1969.514530] IP: [] 
>> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
>> [ 1969.522963] PGD 0 [ 1969.524803]
>> [ 1969.526332] Oops:  [#1] PREEMPT SMP
>> [ 1969.530201] Modules linked in: coretemp kvm_intel kvm irqbypass 
>> intel_cstate mxm_wmi i2c_i801 intel_rapl_perf i2c_smbus sg pcspkr i2c_core 
>> shpchp nfsd wmi video acpi_pad auth_rpcgss oid_registry nfs_acl lockd grace 
>> sunrpc ip_tables x_tables mlx4_en e1000e mlx5_core ptp serio_raw sd_mod 
>> mlx4_core pps_core devlink hid_generic
>> [ 1969.559616] CPU: 3 PID: 3104 Comm: kworker/3:1 Not tainted 
>> 4.9.0-rc6-net-next3-01390-g93ba5504 #12
>> [ 1969.568984] Hardware name: To Be Filled By O.E.M. To Be Filled By 
>> O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
>> [ 1969.578877] Workqueue: events linkwatch_event
>> [ 1969.583285] task: 8803f42a task.stack: 88040b2d
>> [ 1969.589238] RIP: 0010:[]  [] 
>> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
>> [ 1969.600102] RSP: 0018:88040b2d3bd8  EFLAGS: 00010282
>> [ 1969.605442] RAX: 8803f432efc8 RBX: 8803f432 RCX: 
>> 
>> [ 1969.612604] RDX:  RSI:  RDI: 
>> 8803f432
>> [ 1969.619772] RBP: 88040b2d3bd8 R08: 000c R09: 
>> 8803f432f000
>> [ 1969.626938] R10:  R11: 88040d64ac00 R12: 
>> 8803e5aff8dc
>> [ 1969.634104] R13: 8803f4320a28 R14: 8803e5aff800 R15: 
>> 
>> [ 1969.641273] FS:  () GS:88041fac() 
>> knlGS:
>> [ 1969.649422] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 1969.655197] CR2: 0d08 CR3: 01c07000 CR4: 
>> 001406e0
>> [ 1969.662366] Stack:
>> [ 1969.664412]  88040b2d3be8 a0127f8e 88040b2d3c10 
>> a012a23b
>> [ 1969.671948]  8803e5aff8dc 8803f432 8803f432 
>> 88040b2d3c30
>> [ 1969.679478]  8160ae29 8803e5aff8d8 8804088ff300 
>> 88040b2d3c58
>> [ 1969.687001] Call Trace:
>> [ 1969.689484]  [] mlx4_en_fold_software_stats+0x1e/0x20 
>> [mlx4_en]
>> [ 1969.697026]  [] mlx4_en_get_stats64+0x2b/0x50 [mlx4_en]
>> [ 1969.703844]  [] dev_get_stats+0x39/0xa0
>> [ 1969.709274]  [] rtnl_fill_stats+0x40/0x130
>> [ 1969.714968]  [] rtnl_fill_ifinfo+0x55b/0x1010
>> [ 1969.720921]  [] rtmsg_ifinfo_build_skb+0x73/0xd0
>> [ 1969.727136]  [] rtmsg_ifinfo.part.25+0x16/0x50
>> [ 1969.733176]  [] rtmsg_ifinfo+0x18/0x20
>> [ 1969.738522]  [] netdev_state_change+0x47/0x50
>> [ 1969.744478]  [] linkwatch_do_dev+0x38/0x50
>> [ 1969.750170]  [] __linkwatch_run_queue+0xe7/0x160
>> [ 1969.756385]  [] linkwatch_event+0x25/0x30
>> [ 1969.761991]  [] process_one_work+0x15b/0x460
>> [ 1969.767857]  [] worker_thread+0x4e/0x480
>> [ 1969.773378]  [] ? process_one_work+0x460/0x460
>> [ 1969.779420]  [] ? process_one_work+0x460/0x460
>> [ 1969.785460]  [] kthread+0xca/0xe0
>> [ 1969.790372]  [] ? kthread_worker_fn+0x120/0x120
>> [ 1969.796495]  [] ret_from_fork+0x22/0x30
>> [ 1969.801924] Code: 00 00 55 48 89 e5 85 d2 0f 84 90 00 00 00 83 ea 01 31 
>> c9 31 f6 48 8d 87 c0 ef 00 00 4c 8d 8c d7 c8 ef 00 00 48 8b 10 48 83 c0 08 
>> <4c> 8b 82 08 0d 00 00 48 8b 92 00 0d 00 00 4c 01 c6 48 01 d1 4c
>> [ 1969.821969] RIP  [] 
>> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
>> [ 1969.830486]  RSP 
>> [ 1969.834002] CR2: 0d08
>> [ 1969.837440] ---[ end trace 80b9fbc1e7baed9b ]---
>> [ 1969.842102] Kernel panic - not syncing: Fatal exception in interrupt
>> [ 1969.848520] Kernel Offset: disabled
>> [ 1969.852050] ---[ end Kernel panic - 

Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 5:55 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote:
>
>>
>> Hi Eric, Thanks for the patch, I already acked it.
>
> Thanks !
>
>>
>> I have one educational question (not related to this patch, but
>> related to stats reading in general).
>> I was wondering why do we need to disable bh every time we read stats
>> "spin_lock_bh" ? is it essential ?
>>
>> I checked and in mlx4 we don't hold stats_lock in softirq
>> (en_rx.c/en_tx.c), so I don't see any deadlock risk in here..
>
> Excellent question, and I chose to keep the spinlock.
>
> That would be doable, only if we do not overwrite dev->stats.
>
> Current code is :
>
> static struct rtnl_link_stats64 *
> mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> spin_lock_bh(&priv->stats_lock);
> mlx4_en_fold_software_stats(dev);
> netdev_stats_to_stats64(stats, &dev->stats);
> spin_unlock_bh(&priv->stats_lock);
>
> return stats;
> }
>
> If you remove the spin_lock_bh() :
>
>
> static struct rtnl_link_stats64 *
> mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> mlx4_en_fold_software_stats(dev); // possible races
>
> netdev_stats_to_stats64(stats, &dev->stats);
>
> return stats;
> }
>
> 1) one mlx4_en_fold_software_stats(dev) could be preempted
> on a CONFIG_PREEMPT kernel, or interrupted by long irqs.
>
> 2) Another cpu would also call mlx4_en_fold_software_stats(dev) while
>first cpu is busy.
>
> 3) Then, when resuming the first cpu/thread, part of the dev->stats fields
> would be updated with 'old counters',
> while another thread might have updated them with newer values.
>
> 4) A SNMP reader could then get counters that are not monotonically
> increasing,
> which would be confusing/buggy.
>
> So removing the spinlock is doable, but needs to add a new parameter
> to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
> before mlx4_en_fold_software_stats(dev)
>
> static struct rtnl_link_stats64 *
> mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> netdev_stats_to_stats64(stats, &dev->stats);
>
> // Passing a non NULL stats asks mlx4_en_fold_software_stats()
> // to not update dev->stats, but stats directly.
>
> mlx4_en_fold_software_stats(dev, stats);
>
>
> return stats;
> }
>
>

Thanks for the detailed answer!!

BTW, you went 5 steps ahead of my original question :)), so far you
already have a patch without locking at all (really impressive).

What I wanted to ask originally was about the "_bh"; I didn't
mean to completely remove the "spin_lock_bh".
I meant: what happens if we replace "spin_lock_bh" with "spin_lock",
without disabling BH?
I guess a raw "spin_lock" handles points (2 to 4) from above, but it
won't handle long IRQs.


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
>
>> we had/still have the proper stats; they are the ones that
>> mlx4_en_fold_software_stats is trying to cache into (they always
>> exist),
>> but the ones that you are trying to read from (the mlx4 rings) are gone!
>>
>> This bug is totally new and, as I warned, it is another symptom of
>> the real root cause (we can't sleep while reading stats).
>>
>> Eric, what do you suggest? Keep pre-allocated MAX_RINGS stats and
>> always iterate over all of them to query stats?
>> What if you have one ring/none/1K? How would you know how many to query?
>
> I am suggesting I will fix the bug I introduced.
>
> Do not panic.
>
>

Not at all, I trust you are the only one who is capable of providing
the best solution.
I am just trying to read your mind :-).

As I said, I like the solution and I want to adapt it to mlx5, so I am
a little bit enthusiastic :)

Thanks.


[PATCH net 6/6] net/mlx5e: Change the SQ/RQ operational state to positive logic

2016-12-04 Thread Saeed Mahameed
From: Mohamad Haj Yahia <moha...@mellanox.com>

When using negative logic (i.e. a FLUSH state), after an RQ/SQ reopen
we will have a time interval in which the RQ/SQ is not really ready while
the state indicates that it is not in the FLUSH state, because the initial
SQ/RQ struct memory starts as zeros.
Now we change the state to indicate whether the SQ/RQ is opened, and we
set the READY state only after finishing preparing all the SQ/RQ resources.

Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close")
Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close")
Signed-off-by: Mohamad Haj Yahia <moha...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |  4 ++--
 5 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7a43502..71382df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -241,7 +241,7 @@ struct mlx5e_tstamp {
 };
 
 enum {
-   MLX5E_RQ_STATE_FLUSH,
+   MLX5E_RQ_STATE_ENABLED,
MLX5E_RQ_STATE_UMR_WQE_IN_PROGRESS,
MLX5E_RQ_STATE_AM,
 };
@@ -394,7 +394,7 @@ struct mlx5e_sq_dma {
 };
 
 enum {
-   MLX5E_SQ_STATE_FLUSH,
+   MLX5E_SQ_STATE_ENABLED,
MLX5E_SQ_STATE_BF_ENABLE,
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5bf7f86..246d98e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -759,6 +759,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
if (err)
goto err_destroy_rq;
 
+   set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
if (err)
goto err_disable_rq;
@@ -773,6 +774,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
return 0;
 
 err_disable_rq:
+   clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
mlx5e_disable_rq(rq);
 err_destroy_rq:
mlx5e_destroy_rq(rq);
@@ -782,7 +784,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
 
 static void mlx5e_close_rq(struct mlx5e_rq *rq)
 {
-   set_bit(MLX5E_RQ_STATE_FLUSH, &rq->state);
+   clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
napi_synchronize(&rq->channel->napi); /* prevent mlx5e_post_rx_wqes */
cancel_work_sync(&rq->am.work);
 
@@ -1082,6 +1084,7 @@ static int mlx5e_open_sq(struct mlx5e_channel *c,
if (err)
goto err_destroy_sq;
 
+   set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RST, MLX5_SQC_STATE_RDY,
  false, 0);
if (err)
@@ -1095,6 +1098,7 @@ static int mlx5e_open_sq(struct mlx5e_channel *c,
return 0;
 
 err_disable_sq:
+   clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
mlx5e_disable_sq(sq);
 err_destroy_sq:
mlx5e_destroy_sq(sq);
@@ -1111,7 +1115,7 @@ static inline void netif_tx_disable_queue(struct
netdev_queue *txq)
 
 static void mlx5e_close_sq(struct mlx5e_sq *sq)
 {
-   set_bit(MLX5E_SQ_STATE_FLUSH, &sq->state);
+   clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
/* prevent netif_tx_wake_queue */
napi_synchronize(&sq->channel->napi);
 
@@ -3091,7 +3095,7 @@ static void mlx5e_tx_timeout(struct net_device *dev)
if (!netif_xmit_stopped(netdev_get_tx_queue(dev, i)))
continue;
sched_work = true;
-   set_bit(MLX5E_SQ_STATE_FLUSH, &sq->state);
+   clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
netdev_err(dev, "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x\n",
   i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc);
}
@@ -3146,13 +3150,13 @@ static int mlx5e_xdp_set(struct net_device *netdev, 
struct bpf_prog *prog)
for (i = 0; i < priv->params.num_channels; i++) {
struct mlx5e_channel *c = priv->channel[i];
 
-   set_bit(MLX5E_RQ_STATE_FLUSH, &c->rq.state);
+   clear_bit(MLX5E_RQ_STATE_ENABLED, &c->rq.state);
napi_synchronize(&c->napi);
/* prevent mlx5e_poll_rx_cq from accessing rq->xdp_prog */
 
old_prog = xchg(&c->rq.xdp_prog, prog);
 
-   clear_bit(MLX5E_RQ_STATE_FLUSH, &c->rq.state);
+   set_bit(MLX5E_RQ_STATE_ENABLED, &c->rq.state);
/* napi_schedule in case
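
The underlying issue is that a freshly kzalloc'ed (zeroed) SQ/RQ struct
must read as "not usable". A tiny standalone illustration (hypothetical
flags, not driver code):

#include <stdio.h>

#define STATE_FLUSH   (1u << 0)  /* negative logic: set when NOT usable */
#define STATE_ENABLED (1u << 0)  /* positive logic: set when usable */

int main(void)
{
	unsigned int state = 0;  /* struct memory starts as zeros */

	/* negative logic: the zeroed state wrongly looks ready */
	printf("negative logic, usable? %d (wrong)\n",
	       !(state & STATE_FLUSH));

	/* positive logic: the zeroed state correctly reads as disabled */
	printf("positive logic, usable? %d (right)\n",
	       !!(state & STATE_ENABLED));
	return 0;
}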

[PATCH net 0/6] Mellanox 100G mlx5 fixes 2016-12-04

2016-12-04 Thread Saeed Mahameed
Hi Dave,

Some bug fixes for mlx5 core and mlx5e driver.

Thanks,
Saeed.

Kamal Heib (3):
  net/mlx5: Verify module parameters
  net/mlx5: Remove duplicate pci dev name print
  net/mlx5: Fix query ISSI flow

Mohamad Haj Yahia (1):
  net/mlx5e: Change the SQ/RQ operational state to positive logic

Saeed Mahameed (2):
  net/mlx5e: Don't notify HW when filling the edge of ICO SQ
  net/mlx5e: Don't flush SQ on error

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |  5 ---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  4 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 15 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  8 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  4 +--
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 42 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 15 +---
 8 files changed, 53 insertions(+), 42 deletions(-)

-- 
2.7.4



[PATCH net 1/6] net/mlx5: Verify module parameters

2016-12-04 Thread Saeed Mahameed
From: Kamal Heib <kam...@mellanox.com>

Verify the mlx5_core module parameters by making sure that they are in
the expected range and if they aren't restore them to their default
values.

Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Kamal Heib <kam...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 27 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  2 +-
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3b7c6a9f..ab8f2b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -62,13 +62,13 @@ MODULE_DESCRIPTION("Mellanox Connect-IB, ConnectX-4 core 
driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRIVER_VERSION);
 
-int mlx5_core_debug_mask;
-module_param_named(debug_mask, mlx5_core_debug_mask, int, 0644);
+uint mlx5_core_debug_mask;
+module_param_named(debug_mask, mlx5_core_debug_mask, uint, 0644);
 MODULE_PARM_DESC(debug_mask, "debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0");
 
 #define MLX5_DEFAULT_PROF  2
-static int prof_sel = MLX5_DEFAULT_PROF;
-module_param_named(prof_sel, prof_sel, int, 0444);
+static uint prof_sel = MLX5_DEFAULT_PROF;
+module_param_named(prof_sel, prof_sel, uint, 0444);
 MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
 
 enum {
@@ -1227,13 +1227,6 @@ static int init_one(struct pci_dev *pdev,
 
dev->pdev = pdev;
dev->event = mlx5_core_event;
-
-   if (prof_sel < 0 || prof_sel >= ARRAY_SIZE(profile)) {
-   mlx5_core_warn(dev,
-  "selected profile out of range, selecting 
default (%d)\n",
-  MLX5_DEFAULT_PROF);
-   prof_sel = MLX5_DEFAULT_PROF;
-   }
dev->profile = &profile[prof_sel];
 
INIT_LIST_HEAD(>ctx_list);
@@ -1450,10 +1443,22 @@ static struct pci_driver mlx5_core_driver = {
.sriov_configure   = mlx5_core_sriov_configure,
 };
 
+static void mlx5_core_verify_params(void)
+{
+   if (prof_sel >= ARRAY_SIZE(profile)) {
+   pr_warn("mlx5_core: WARNING: Invalid module parameter prof_sel 
%d, valid range 0-%zu, changing back to default(%d)\n",
+   prof_sel,
+   ARRAY_SIZE(profile) - 1,
+   MLX5_DEFAULT_PROF);
+   prof_sel = MLX5_DEFAULT_PROF;
+   }
+}
+
 static int __init init(void)
 {
int err;
 
+   mlx5_core_verify_params();
mlx5_register_debugfs();
 
err = pci_register_driver(&mlx5_core_driver);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 187662c..20d16b1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -44,7 +44,7 @@
 
 #define MLX5_TOTAL_VPORTS(mdev) (1 + pci_sriov_get_totalvfs(mdev->pdev))
 
-extern int mlx5_core_debug_mask;
+extern uint mlx5_core_debug_mask;
 
 #define mlx5_core_dbg(__dev, format, ...)  \
dev_dbg(&(__dev)->pdev->dev, "%s:%s:%d:(pid %d): " format,  \
-- 
2.7.4



[PATCH net 2/6] net/mlx5: Remove duplicate pci dev name print

2016-12-04 Thread Saeed Mahameed
From: Kamal Heib <kam...@mellanox.com>

Remove duplicate pci dev name printing from mlx5_core_warn/dbg.

Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages')
Signed-off-by: Kamal Heib <kam...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 20d16b1..2ce0346 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -47,8 +47,8 @@
 extern uint mlx5_core_debug_mask;
 
 #define mlx5_core_dbg(__dev, format, ...)  \
-   dev_dbg(&(__dev)->pdev->dev, "%s:%s:%d:(pid %d): " format,  \
-(__dev)->priv.name, __func__, __LINE__, current->pid,  \
+   dev_dbg(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
+__func__, __LINE__, current->pid,  \
 ##__VA_ARGS__)
 
 #define mlx5_core_dbg_mask(__dev, mask, format, ...)   \
@@ -63,8 +63,8 @@ do {  
\
   ##__VA_ARGS__)
 
 #define mlx5_core_warn(__dev, format, ...) \
-   dev_warn(&(__dev)->pdev->dev, "%s:%s:%d:(pid %d): " format, \
-   (__dev)->priv.name, __func__, __LINE__, current->pid,   \
+   dev_warn(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format,\
+__func__, __LINE__, current->pid,  \
##__VA_ARGS__)
 
 #define mlx5_core_info(__dev, format, ...) \
-- 
2.7.4



[PATCH net 5/6] net/mlx5e: Don't flush SQ on error

2016-12-04 Thread Saeed Mahameed
We are doing SQ descriptor cleanup in the driver.

Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close")
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 84e8b25..5bf7f86 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1006,7 +1006,6 @@ static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct 
mlx5e_sq_param *param)
MLX5_SET(sqc,  sqc, min_wqe_inline_mode, sq->min_inline_mode);
MLX5_SET(sqc,  sqc, state,  MLX5_SQC_STATE_RST);
MLX5_SET(sqc,  sqc, tis_lst_sz, param->type == MLX5E_SQ_ICO ? 0 : 1);
-   MLX5_SET(sqc,  sqc, flush_in_error_en,  1);
 
MLX5_SET(wq,   wq, wq_type,   MLX5_WQ_TYPE_CYCLIC);
MLX5_SET(wq,   wq, uar_page,  sq->uar.index);
-- 
2.7.4



[PATCH net 4/6] net/mlx5e: Don't notify HW when filling the edge of ICO SQ

2016-12-04 Thread Saeed Mahameed
We are going to do this a couple of steps ahead anyway.

Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs")
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index c6de6fb..e9abb6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -340,7 +340,7 @@ static inline void mlx5e_post_umr_wqe(struct mlx5e_rq *rq, 
u16 ix)
while ((pi = (sq->pc & wq->sz_m1)) > sq->edge) {
sq->db.ico_wqe[pi].opcode = MLX5_OPCODE_NOP;
sq->db.ico_wqe[pi].num_wqebbs = 1;
-   mlx5e_send_nop(sq, true);
+   mlx5e_send_nop(sq, false);
}
 
wqe = mlx5_wq_cyc_get_wqe(wq, pi);
-- 
2.7.4



[PATCH net 3/6] net/mlx5: Fix query ISSI flow

2016-12-04 Thread Saeed Mahameed
From: Kamal Heib <kam...@mellanox.com>

In old FWs the query ISSI command is not supported, and for some of
those FWs it might fail with a status other than "MLX5_CMD_STAT_BAD_OP_ERR".

In such a case, instead of failing the driver load, we will treat any FW
status other than 0 for the Query ISSI FW command as ISSI not supported,
and assume ISSI=0 (the most basic driver/FW interface).

In case of a driver syndrome (query ISSI failure by the driver) we will
fail the driver load.

Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Kamal Heib <kam...@mellanox.com>

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c   |  5 -
 drivers/net/ethernet/mellanox/mlx5/core/main.c  | 15 +--
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  5 +
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 1e639f8..bfe410e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -268,11 +268,6 @@ static void dump_buf(void *buf, int size, int data_only, 
int offset)
pr_debug("\n");
 }
 
-enum {
-   MLX5_DRIVER_STATUS_ABORTED = 0xfe,
-   MLX5_DRIVER_SYND = 0xbadd00de,
-};
-
 static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
   u32 *synd, u8 *status)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ab8f2b4..296e45b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -732,13 +732,15 @@ static int mlx5_core_set_issi(struct mlx5_core_dev *dev)
u8 status;
 
mlx5_cmd_mbox_status(query_out, &status, &syndrome);
-   if (status == MLX5_CMD_STAT_BAD_OP_ERR) {
-   pr_debug("Only ISSI 0 is supported\n");
-   return 0;
+   if (!status || syndrome == MLX5_DRIVER_SYND) {
+   mlx5_core_err(dev, "Failed to query ISSI err(%d) status(%d) synd(%d)\n",
+ err, status, syndrome);
+   return err;
}
 
-   pr_err("failed to query ISSI err(%d)\n", err);
-   return err;
+   mlx5_core_warn(dev, "Query ISSI is not supported by FW, ISSI is 0\n");
+   dev->issi = 0;
+   return 0;
}
 
sup_issi = MLX5_GET(query_issi_out, query_out, supported_issi_dw0);
@@ -752,7 +754,8 @@ static int mlx5_core_set_issi(struct mlx5_core_dev *dev)
err = mlx5_cmd_exec(dev, set_in, sizeof(set_in),
set_out, sizeof(set_out));
if (err) {
-   pr_err("failed to set ISSI=1 err(%d)\n", err);
+   mlx5_core_err(dev, "Failed to set ISSI to 1 err(%d)\n",
+ err);
return err;
}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 2ce0346..63b9a0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -75,6 +75,11 @@ enum {
MLX5_CMD_TIME, /* print command execution time */
 };
 
+enum {
+   MLX5_DRIVER_STATUS_ABORTED = 0xfe,
+   MLX5_DRIVER_SYND = 0xbadd00de,
+};
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
 int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
-- 
2.7.4



Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-04 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 7:08 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote:
>
>> Thanks for the detailed answer !!
>
> You're welcome.
>
>>
>> BTW you went 5 steps ahead of my original question :)), so far you
>> already have a patch without locking at all (really impressive).
>>
>> What I wanted to ask originally was about the "_bh"; I didn't
>> mean to completely remove the "spin_lock_bh".
>> I meant: what happens if we replace "spin_lock_bh" with "spin_lock",
>> without disabling BH?
>> I guess a raw "spin_lock" handles points (2 to 4) from above, but it
>> won't handle long IRQs.
>
> Thats a very good point, the _bh prefix can totally be removed, since
> stats_lock is only acquired from process context.
>
>

That was my initial point, thanks for the help.
Will provide a fix patch later once 4.9 is released.
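
The rule of thumb settled on in this thread, as a sketch (illustrative
kernel-style code, not a patch): the _bh variant is required only when
the same lock may also be taken from softirq (BH) context, e.g. from a
NAPI handler, where a softirq preempting the process-context holder on
the same CPU would self-deadlock.

#include <linux/spinlock.h>

struct example_priv {
	spinlock_t stats_lock;
};

/* lock also taken somewhere in softirq: BH must be disabled here */
static void read_stats_shared_with_softirq(struct example_priv *priv)
{
	spin_lock_bh(&priv->stats_lock);
	/* ... fold/copy counters ... */
	spin_unlock_bh(&priv->stats_lock);
}

/* lock only ever taken from process context (the conclusion above for
 * mlx4's stats_lock): a plain spin_lock() is sufficient */
static void read_stats_process_only(struct example_priv *priv)
{
	spin_lock(&priv->stats_lock);
	/* ... fold/copy counters ... */
	spin_unlock(&priv->stats_lock);
}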


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-04 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 7:36 PM, Eric Dumazet  wrote:
> On Thu, 2016-12-01 at 08:08 -0800, Eric Dumazet wrote:
>> On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote:
>>
>> > So removing the spinlock is doable, but needs to add a new parameter
>> > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
>> > before mlx4_en_fold_software_stats(dev)
>>
>> Untested patch would be :
>>
>>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 -
>>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   10 +
>>  drivers/net/ethernet/mellanox/mlx4/en_port.c|   24 +-
>>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 +
>>  4 files changed, 23 insertions(+), 16 deletions(-)
>
> The patch is wrong, since priv->port_up could change to false while we
> are running and using the about to be deleted tx/rx rings.
>

Right, hence the regression Jesper saw ;).

>
> So the only safe thing to do is to remove the _bh suffix.
>
> Not worth trying to avoid taking a spinlock in this code.
>

Ack.


Re: [PATCH v2 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active

2016-12-04 Thread Saeed Mahameed
On Sun, Dec 4, 2016 at 5:17 AM, Martin KaFai Lau  wrote:
> Reserve XDP_PACKET_HEADROOM and honor bpf_xdp_adjust_head()
> when XDP prog is active.  This patch only affects the code
> path when XDP is active.
>
> Signed-off-by: Martin KaFai Lau 
> ---

Hi Martin, sorry for the late review; I have some comments below.

>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 +++--
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 23 +--
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c |  9 +
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  3 ++-
>  4 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 311c14153b8b..094a13b52cf6 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -51,7 +51,8 @@
>  #include "mlx4_en.h"
>  #include "en_port.h"
>
> -#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN)))
> +#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
> +  XDP_PACKET_HEADROOM))
>
>  int mlx4_en_setup_tc(struct net_device *dev, u8 up)
>  {
> @@ -1551,6 +1552,7 @@ int mlx4_en_start_port(struct net_device *dev)
> struct mlx4_en_tx_ring *tx_ring;
> int rx_index = 0;
> int err = 0;
> +   int mtu;
> int i, t;
> int j;
> u8 mc_list[16] = {0};
> @@ -1684,8 +1686,12 @@ int mlx4_en_start_port(struct net_device *dev)
> }
>
> /* Configure port */
> +   mtu = priv->rx_skb_size + ETH_FCS_LEN;
> +   if (priv->tx_ring_num[TX_XDP])
> +   mtu += XDP_PACKET_HEADROOM;
> +

Why would the physical MTU account for the headroom you reserve for the XDP prog?
This is the wire MTU; it shouldn't be changed, please keep it as
before. Any headroom you reserve in packet buffers is needed only
for the FWD or modify cases (the HW and the wire should not care
about it).
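
I.e. keep something like this minimal sketch (same arguments as the
current code, without XDP_PACKET_HEADROOM):

	err = mlx4_SET_PORT_general(mdev->dev, priv->port,
				    priv->rx_skb_size + ETH_FCS_LEN,
				    priv->prof->tx_pause, priv->prof->tx_ppp,
				    priv->prof->rx_pause, priv->prof->rx_ppp);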

> err = mlx4_SET_PORT_general(mdev->dev, priv->port,
> -   priv->rx_skb_size + ETH_FCS_LEN,
> +   mtu,
> priv->prof->tx_pause,
> priv->prof->tx_ppp,
> priv->prof->rx_pause,
> @@ -2255,6 +2261,13 @@ static bool mlx4_en_check_xdp_mtu(struct net_device 
> *dev, int mtu)
>  {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> +   if (mtu + XDP_PACKET_HEADROOM > priv->max_mtu) {
> +   en_err(priv,
> +  "Device max mtu:%d does not allow %d bytes reserved 
> headroom for XDP prog\n",
> +  priv->max_mtu, XDP_PACKET_HEADROOM);
> +   return false;
> +   }
> +
> if (mtu > MLX4_EN_MAX_XDP_MTU) {
> en_err(priv, "mtu:%d > max:%d when XDP prog is attached\n",
>mtu, MLX4_EN_MAX_XDP_MTU);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 23e9d04d1ef4..324771ac929e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -96,7 +96,6 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
> const struct mlx4_en_frag_info *frag_info;
> struct page *page;
> -   dma_addr_t dma;
> int i;
>
> for (i = 0; i < priv->num_frags; i++) {
> @@ -115,9 +114,10 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>
> for (i = 0; i < priv->num_frags; i++) {
> frags[i] = ring_alloc[i];
> -   dma = ring_alloc[i].dma + ring_alloc[i].page_offset;
> +   frags[i].page_offset += priv->frag_info[i].rx_headroom;

I don't see any need for headroom on any frag_info other than frag0
(which is where the packet starts).
What is the meaning of a headroom for a frag in the middle of a packet?

if you agree with me, then you can use XDP_PACKET_HEADROOM as-is where
needed (i.e. the frag0 page offset) and remove
"priv->frag_info[i].rx_headroom"

...

After going through the code a little bit, I see that this code is
shared between the XDP and common paths, and you didn't want to add
boolean conditions.

Ok, I see what you did here.

Maybe we can pass the headroom as a function parameter and split frag0
handling from the rest, as sketched below?
If that is too much, then I am ok with the code as it is.
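
A hedged sketch of what I mean (the xdp_active flag is hypothetical):
only frag0 carries XDP_PACKET_HEADROOM, the remaining frags get none.

	for (i = 0; i < priv->num_frags; i++) {
		u32 headroom = (i == 0 && xdp_active) ?
			       XDP_PACKET_HEADROOM : 0;

		frags[i] = ring_alloc[i];
		frags[i].page_offset += headroom;
		rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
						    frags[i].page_offset);
		ring_alloc[i] = page_alloc[i];
	}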

> +   rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
> +   frags[i].page_offset);
> ring_alloc[i] = page_alloc[i];
> -   rx_desc->data[i].addr = cpu_to_be64(dma);
> }
>
> return 0;
> @@ -250,7 +250,8 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv 
> *priv,
>
>  

Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs

2016-12-04 Thread Saeed Mahameed
On Sat, Dec 3, 2016 at 2:53 AM, Alexei Starovoitov  wrote:
> On 12/2/16 4:38 PM, Eric Dumazet wrote:
>>
>> On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
>>>
>>> When XDP prog is attached, it is currently limiting
>>> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
>>> in x86.
>>>
>>> AFAICT, since mlx4 is doing one page per packet for XDP,
>>> we can at least raise the MTU limitation up to
>>> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
>>> doing.  It will be useful in the next patch which allows
>>> XDP program to extend the packet by adding new header(s).
>>>
>>> Signed-off-by: Martin KaFai Lau 
>>> ---
>>
>>
>> Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
>>
>> Looks XDP really kills arches with bigger pages :(
>
>
> I'm afraid xdp mlx[45] support was not tested on arches
> with 64k pages at all. Not just this patch.

Yep, in mlx5 page-per-packet became the default, with or without XDP,
unlike mlx4.
Currently we allow 64KB pages per packet! That is wrong and needs to
be fixed.

I will get to this task soon.

> I think people who care about such archs should test?

We do test mlx5 and mlx4 on the PPC arch. Other than requiring more
memory than we need, we don't see any issues. But we don't test XDP
on those archs.

> Note page per packet is not a hard requirement for all drivers
> and all archs. For mlx[45] it was the easiest and the most
> convenient way to achieve desired performance.
> If there are ways to do the same performance differently,
> I'm all ears :)
>

With bigger pages, i.e. PAGE_SIZE > 8K, my current low-hanging-fruit
options for mlx5 are:
1. start sharing pages across multiple packets (see the sketch below).
2. go back to the SKB allocator (allocate a ring of SKBs in advance
rather than a page per packet).

This means that the default RX memory scheme will differ from XDP's
on such archs (XDP will still use a page per packet).

Alexei, we should start considering PPC archs for XDP use cases;
demanding a page per packet on those archs is a rather heavy
requirement.
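
A hedged sketch of option 1 (names are illustrative, not the actual
mlx5 code): carve a big page into fixed-size strides and hand out one
stride per packet, taking a page reference per packet in flight.

	struct rx_page_frag {
		struct page *page;
		u32 offset;		/* next free stride in the page */
	};

	static bool rx_get_stride(struct rx_page_frag *f, u32 stride_sz,
				  struct page **page, u32 *offset)
	{
		if (f->offset + stride_sz > PAGE_SIZE)
			return false;	/* exhausted, refill a fresh page */
		*page = f->page;
		*offset = f->offset;
		f->offset += stride_sz;
		page_ref_inc(f->page);	/* one ref per packet in flight */
		return true;
	}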


Re: [WIP] net+mlx4: auto doorbell

2016-11-30 Thread Saeed Mahameed
On Tue, Nov 29, 2016 at 8:58 AM, Eric Dumazet  wrote:
> On Mon, 2016-11-21 at 10:10 -0800, Eric Dumazet wrote:
>
>
>> Not sure it this has been tried before, but the doorbell avoidance could
>> be done by the driver itself, because it knows a TX completion will come
>> shortly (well... if softirqs are not delayed too much !)
>>
>> Doorbell would be forced only if :
>>
>> ("skb->xmit_more is not set" AND "TX engine is not 'started yet'" )
>> OR
>> ( too many [1] packets were put in TX ring buffer, no point deferring
>> more)
>>
>> Start the pump, but once it is started, let the doorbells being done by
>> TX completion.
>>
>> ndo_start_xmit and TX completion handler would have to maintain a shared
>> state describing if packets were ready but doorbell deferred.
>>
>>
>> Note that TX completion means "if at least one packet was drained",
>> otherwise busy polling, constantly calling napi->poll() would force a
>> doorbell too soon for devices sharing a NAPI for both RX and TX.
>>
>> But then, maybe busy poll would like to force a doorbell...
>>
>> I could try these ideas on mlx4 shortly.
>>
>>
>> [1] limit could be derived from active "ethtool -c" params, eg tx-frames
>
> I have a WIP, that increases pktgen rate by 75 % on mlx4 when bulking is
> not used.

Hi Eric, nice idea indeed, and we need something like this;
today we almost don't exploit the TX bulking at all.

But please see below, I am not sure different contexts should share
the doorbell ringing, it is really risky.
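
To make sure I read the quoted policy right, here is a hedged sketch
(illustrative only, not the WIP patch below): ring the doorbell from
ndo_start_xmit only when nothing is in flight yet, or when too much was
deferred already; otherwise leave it to the TX completion handler.

	static bool tx_need_doorbell(const struct mlx4_en_tx_ring *ring,
				     bool xmit_more, u32 max_deferred)
	{
		u32 pending = READ_ONCE(ring->prod) - READ_ONCE(ring->cons);

		/* Engine idle: no TX completion will come along to ring
		 * the doorbell for us, so ring it now.
		 */
		if (!xmit_more && pending == 0)
			return true;

		return pending >= max_deferred; /* cap the deferral */
	}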

>  drivers/net/ethernet/mellanox/mlx4/en_rx.c   |2
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   90 +++--
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |4
>  include/linux/netdevice.h|1
>  net/core/net-sysfs.c |   18 +++
>  5 files changed, 83 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 6562f78b07f4..fbea83218fc0 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -1089,7 +1089,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, 
> struct mlx4_en_cq *cq, int bud
>
> if (polled) {
> if (doorbell_pending)
> -   
> mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> +   mlx4_en_xmit_doorbell(dev, 
> priv->tx_ring[TX_XDP][cq->ring]);
>
> mlx4_cq_set_ci(&cq->mcq);
> wmb(); /* ensure HW sees CQ consumer before we post new 
> buffers */
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 4b597dca5c52..affebb435679 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -67,7 +67,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> ring->size = size;
> ring->size_mask = size - 1;
> ring->sp_stride = stride;
> -   ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
> +   ring->full_size = ring->size - HEADROOM - 2*MAX_DESC_TXBBS;
>
> tmp = size * sizeof(struct mlx4_en_tx_info);
> ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
> @@ -193,6 +193,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
> ring->sp_cqn = cq;
> ring->prod = 0;
> +   ring->cons = 0xffffffff;
> +   ring->ncons = 0;
> ring->last_nr_txbb = 1;
> memset(ring->tx_info, 0, ring->size * sizeof(struct mlx4_en_tx_info));
> memset(ring->buf, 0, ring->buf_size);
> @@ -227,9 +228,9 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
>MLX4_QP_STATE_RST, NULL, 0, 0, &ring->sp_qp);
>  }
>
> -static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
> +static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring 
> *ring)
>  {
> -   return ring->prod - ring->cons > ring->full_size;
> +   return READ_ONCE(ring->prod) - READ_ONCE(ring->cons) > 
> ring->full_size;
>  }
>
>  static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
> @@ -374,6 +375,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct 
> mlx4_en_tx_ring *ring)
>
> /* Skip last polled descriptor */
> ring->cons += ring->last_nr_txbb;
> +   ring->ncons += ring->last_nr_txbb;
> en_dbg(DRV, priv, "Freeing Tx buf - cons:0x%x prod:0x%x\n",
>  ring->cons, ring->prod);
>
> @@ -389,6 +391,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct 
> mlx4_en_tx_ring *ring)
> !!(ring->cons & ring->size), 
> 0,
> 0 /* Non-NAPI caller */);
> ring->cons += ring->last_nr_txbb;
> +   ring->ncons += ring->last_nr_txbb;
> cnt++;
> }
>
> @@ -401,6 +404,38 @@ int 

[PATCH for-next 01/11] net/mlx5: Fix offset naming for reserved fields in hca_cap_bits

2017-01-01 Thread Saeed Mahameed
From: Max Gurtovoy <m...@mellanox.com>

Fix offset for reserved fields.
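
For illustration (not part of the patch itself): in mlx5_ifc structs
every field is sized in bits and every reserved gap is named by its
absolute bit offset, so a gap must be renamed whenever its real offset
changes:

	u8         local_ca_ack_delay[0x5];  /* bits 0x1ab..0x1af */
	u8         port_module_event[0x1];   /* bit  0x1b0 */
	u8         reserved_at_1b1[0x1];     /* bit  0x1b1; was misnamed _1b0 */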

Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: b4ff3a36d3e4 ("net/mlx5: Use offset based reserved field names in the 
IFC header file")
Fixes: 7d5e14237a55 ("net/mlx5: Update mlx5_ifc hardware features")
Signed-off-by: Max Gurtovoy <m...@mellanox.com>
Reviewed-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 57bec54..4792c85 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -826,9 +826,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_at_1a9[0x2];
u8 local_ca_ack_delay[0x5];
u8 port_module_event[0x1];
-   u8 reserved_at_1b0[0x1];
+   u8 reserved_at_1b1[0x1];
u8 ports_check[0x1];
-   u8 reserved_at_1b2[0x1];
+   u8 reserved_at_1b3[0x1];
u8 disable_link_up[0x1];
u8 beacon_led[0x1];
u8 port_type[0x2];
@@ -858,7 +858,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
u8 compact_address_vector[0x1];
u8 striding_rq[0x1];
-   u8 reserved_at_201[0x2];
+   u8 reserved_at_202[0x2];
u8 ipoib_basic_offloads[0x1];
u8 reserved_at_205[0xa];
u8 drain_sigerr[0x1];
@@ -1009,10 +1009,10 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 rndv_offload_rc[0x1];
u8 rndv_offload_dc[0x1];
u8 log_tag_matching_list_sz[0x5];
-   u8 reserved_at_5e8[0x3];
+   u8 reserved_at_5f8[0x3];
u8 log_max_xrq[0x5];
 
-   u8 reserved_at_5f0[0x200];
+   u8 reserved_at_600[0x200];
 };
 
 enum mlx5_flow_destination_type {
-- 
2.7.4



[PATCH for-next 09/11] {net,IB}/mlx5: Refactor page fault handling

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

* Update the page fault event according to the latest specification.
* Separate the code paths for the page fault EQ, completion EQ and async EQ.
* Move the page fault handling work queue from an mlx5_ib static variable
  into the mlx5_core page fault EQ.
* Allocate memory to store ODP events dynamically as the events arrive;
  since this happens in atomic context, use a mempool (sketched below).
* Make the mlx5_ib page fault handler run in process context.
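
For illustration, the mempool idea boils down to roughly this (names are
hypothetical, not the actual driver code); the handler is expected to
mempool_free() the entry when done:

	struct pfault_work {
		struct work_struct work;
		struct mlx5_pagefault pfault;
	};

	static void queue_pagefault(struct workqueue_struct *wq,
				    mempool_t *pool,
				    const struct mlx5_pagefault *pf,
				    work_func_t handler)
	{
		struct pfault_work *w = mempool_alloc(pool, GFP_ATOMIC);

		if (!w)
			return;	/* drop; the fault is resumed with error */
		w->pfault = *pf;
		INIT_WORK(&w->work, handler);
		queue_work(wq, &w->work);
	}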

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  |  14 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  49 +---
 drivers/infiniband/hw/mlx5/odp.c   | 300 -
 drivers/infiniband/hw/mlx5/qp.c|  26 --
 drivers/net/ethernet/mellanox/mlx5/core/dev.c  |  33 +++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 290 +---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  21 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   2 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c   | 108 
 include/linux/mlx5/device.h|   6 +-
 include/linux/mlx5/driver.h|  97 ++-
 include/linux/mlx5/qp.h|  44 ---
 12 files changed, 522 insertions(+), 468 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index b871272..86c61e7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3319,6 +3319,9 @@ static struct mlx5_interface mlx5_ib_interface = {
.add= mlx5_ib_add,
.remove = mlx5_ib_remove,
.event  = mlx5_ib_event,
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+   .pfault = mlx5_ib_pfault,
+#endif
.protocol   = MLX5_INTERFACE_PROTOCOL_IB,
 };
 
@@ -3329,25 +3332,14 @@ static int __init mlx5_ib_init(void)
if (deprecated_prof_sel != 2)
pr_warn("prof_sel is deprecated for mlx5_ib, set it for 
mlx5_core\n");
 
-   err = mlx5_ib_odp_init();
-   if (err)
-   return err;
-
err = mlx5_register_interface(&mlx5_ib_interface);
-   if (err)
-   goto clean_odp;
-
-   return err;
 
-clean_odp:
-   mlx5_ib_odp_cleanup();
return err;
 }
 
 static void __exit mlx5_ib_cleanup(void)
 {
mlx5_unregister_interface(&mlx5_ib_interface);
-   mlx5_ib_odp_cleanup();
 }
 
 module_init(mlx5_ib_init);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 02d9255..a51c805 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -277,29 +277,6 @@ struct mlx5_ib_rwq_ind_table {
u32 rqtn;
 };
 
-/*
- * Connect-IB can trigger up to four concurrent pagefaults
- * per-QP.
- */
-enum mlx5_ib_pagefault_context {
-   MLX5_IB_PAGEFAULT_RESPONDER_READ,
-   MLX5_IB_PAGEFAULT_REQUESTOR_READ,
-   MLX5_IB_PAGEFAULT_RESPONDER_WRITE,
-   MLX5_IB_PAGEFAULT_REQUESTOR_WRITE,
-   MLX5_IB_PAGEFAULT_CONTEXTS
-};
-
-static inline enum mlx5_ib_pagefault_context
-   mlx5_ib_get_pagefault_context(struct mlx5_pagefault *pagefault)
-{
-   return pagefault->flags & (MLX5_PFAULT_REQUESTOR | MLX5_PFAULT_WRITE);
-}
-
-struct mlx5_ib_pfault {
-   struct work_struct  work;
-   struct mlx5_pagefault   mpfault;
-};
-
 struct mlx5_ib_ubuffer {
struct ib_umem *umem;
int buf_size;
@@ -385,20 +362,6 @@ struct mlx5_ib_qp {
/* Store signature errors */
boolsignature_en;
 
-#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-   /*
-* A flag that is true for QP's that are in a state that doesn't
-* allow page faults, and shouldn't schedule any more faults.
-*/
-   int disable_page_faults;
-   /*
-* The disable_page_faults_lock protects a QP's disable_page_faults
-* field, allowing for a thread to atomically check whether the QP
-* allows page faults, and if so schedule a page fault.
-*/
-   spinlock_t  disable_page_faults_lock;
-   struct mlx5_ib_pfault   pagefaults[MLX5_IB_PAGEFAULT_CONTEXTS];
-#endif
struct list_headqps_list;
struct list_headcq_recv_list;
struct list_headcq_send_list;
@@ -869,18 +832,13 @@ struct ib_rwq_ind_table 
*mlx5_ib_create_rwq_ind_table(struct ib_device *device,
 int mlx5_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *wq_ind_table);
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-extern struct workqueue_struct *mlx5_ib_page_fault_wq;
-
 void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev);
-void mlx5_ib_mr_pfault_handler(struct mlx5_ib_qp *

[PATCH for-next 10/11] IB/mlx5: Add ODP atomics support

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Handle ODP atomic operations. When the initiator of an RDMA atomic
operation uses an ODP MR to provide the source data, handle the page
fault properly.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 88 +++-
 include/linux/mlx5/mlx5_ifc.h|  2 +-
 include/linux/mlx5/qp.h  | 18 
 3 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 26f96c7..971b288 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -144,6 +144,9 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
if (MLX5_CAP_ODP(dev->mdev, rc_odp_caps.read))
caps->per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
 
+   if (MLX5_CAP_ODP(dev->mdev, rc_odp_caps.atomic))
+   caps->per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
+
return;
 }
 
@@ -386,6 +389,17 @@ static int pagefault_data_segments(struct mlx5_ib_dev *dev,
return ret < 0 ? ret : npages;
 }
 
+static const u32 mlx5_ib_odp_opcode_cap[] = {
+   [MLX5_OPCODE_SEND] = IB_ODP_SUPPORT_SEND,
+   [MLX5_OPCODE_SEND_IMM] = IB_ODP_SUPPORT_SEND,
+   [MLX5_OPCODE_SEND_INVAL]   = IB_ODP_SUPPORT_SEND,
+   [MLX5_OPCODE_RDMA_WRITE]   = IB_ODP_SUPPORT_WRITE,
+   [MLX5_OPCODE_RDMA_WRITE_IMM]   = IB_ODP_SUPPORT_WRITE,
+   [MLX5_OPCODE_RDMA_READ]= IB_ODP_SUPPORT_READ,
+   [MLX5_OPCODE_ATOMIC_CS]= IB_ODP_SUPPORT_ATOMIC,
+   [MLX5_OPCODE_ATOMIC_FA]= IB_ODP_SUPPORT_ATOMIC,
+};
+
 /*
  * Parse initiator WQE. Advances the wqe pointer to point at the
  * scatter-gather list, and set wqe_end to the end of the WQE.
@@ -396,6 +410,8 @@ static int mlx5_ib_mr_initiator_pfault_handler(
 {
struct mlx5_wqe_ctrl_seg *ctrl = *wqe;
u16 wqe_index = pfault->wqe.wqe_index;
+   u32 transport_caps;
+   struct mlx5_base_av *av;
unsigned ds, opcode;
 #if defined(DEBUG)
u32 ctrl_wqe_index, ctrl_qpn;
@@ -441,53 +457,49 @@ static int mlx5_ib_mr_initiator_pfault_handler(
 
opcode = be32_to_cpu(ctrl->opmod_idx_opcode) &
 MLX5_WQE_CTRL_OPCODE_MASK;
+
switch (qp->ibqp.qp_type) {
case IB_QPT_RC:
-   switch (opcode) {
-   case MLX5_OPCODE_SEND:
-   case MLX5_OPCODE_SEND_IMM:
-   case MLX5_OPCODE_SEND_INVAL:
-   if (!(dev->odp_caps.per_transport_caps.rc_odp_caps &
- IB_ODP_SUPPORT_SEND))
-   goto invalid_transport_or_opcode;
-   break;
-   case MLX5_OPCODE_RDMA_WRITE:
-   case MLX5_OPCODE_RDMA_WRITE_IMM:
-   if (!(dev->odp_caps.per_transport_caps.rc_odp_caps &
- IB_ODP_SUPPORT_WRITE))
-   goto invalid_transport_or_opcode;
-   *wqe += sizeof(struct mlx5_wqe_raddr_seg);
-   break;
-   case MLX5_OPCODE_RDMA_READ:
-   if (!(dev->odp_caps.per_transport_caps.rc_odp_caps &
- IB_ODP_SUPPORT_READ))
-   goto invalid_transport_or_opcode;
-   *wqe += sizeof(struct mlx5_wqe_raddr_seg);
-   break;
-   default:
-   goto invalid_transport_or_opcode;
-   }
+   transport_caps = dev->odp_caps.per_transport_caps.rc_odp_caps;
break;
case IB_QPT_UD:
-   switch (opcode) {
-   case MLX5_OPCODE_SEND:
-   case MLX5_OPCODE_SEND_IMM:
-   if (!(dev->odp_caps.per_transport_caps.ud_odp_caps &
- IB_ODP_SUPPORT_SEND))
-   goto invalid_transport_or_opcode;
-   *wqe += sizeof(struct mlx5_wqe_datagram_seg);
-   break;
-   default:
-   goto invalid_transport_or_opcode;
-   }
+   transport_caps = dev->odp_caps.per_transport_caps.ud_odp_caps;
break;
default:
-invalid_transport_or_opcode:
-   mlx5_ib_err(dev, "ODP fault on QP of an unsupported opcode or 
transport. transport: 0x%x opcode: 0x%x.\n",
-   qp->ibqp.qp_type, opcode);
+   mlx5_ib_err(dev, "ODP fault on QP of an unsupported transport 
0x%x\n",
+   qp->ibqp.qp_type);
+   return -EFAULT;
+   }
+
+   if (unlikely(opcode &

[PATCH for-next 04/11] net/mlx5: Support new MR features

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

This patch adds the following items to IFC file.

1. MLX5_MKC_ACCESS_MODE_KSM enum value for creating KSM memory keys.
The KSM access mode is used when an indirect MKey is associated with
fixed-size memory entries.

2. null_mkey field, which is used to indicate non-present KLM/KSM
entries; it causes the device to generate a page fault event when
such an entry is accessed.

3. struct mlx5_ifc_cmd_hca_cap_bits capability bits indicating that the
related value/field is supported (see the usage illustration below):
* fixed_buffer_size - MLX5_MKC_ACCESS_MODE_KSM
* umr_extended_translation_offset - translation_offset_42_16
in the UMR ctrl segment
* null_mkey - null_mkey in QUERY_SPECIAL_CONTEXTS
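
For illustration (not from the patch): once the bits exist in the IFC
file, a driver can gate KSM usage on the capability, e.g.:

	if (MLX5_CAP_GEN(mdev, fixed_buffer_size))
		access_mode = MLX5_MKC_ACCESS_MODE_KSM;
	else
		access_mode = MLX5_MKC_ACCESS_MODE_KLMS;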

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 4792c85..7c760e5 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -782,11 +782,12 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 log_max_eq[0x4];
 
u8 max_indirection[0x8];
-   u8 reserved_at_108[0x1];
+   u8 fixed_buffer_size[0x1];
u8 log_max_mrw_sz[0x7];
u8 reserved_at_110[0x2];
u8 log_max_bsf_list_size[0x6];
-   u8 reserved_at_118[0x2];
+   u8 umr_extended_translation_offset[0x1];
+   u8 null_mkey[0x1];
u8 log_max_klm_list_size[0x6];
 
u8 reserved_at_120[0xa];
@@ -2569,6 +2570,7 @@ enum {
MLX5_MKC_ACCESS_MODE_PA= 0x0,
MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
MLX5_MKC_ACCESS_MODE_KLMS  = 0x2,
+   MLX5_MKC_ACCESS_MODE_KSM   = 0x3,
 };
 
 struct mlx5_ifc_mkc_bits {
@@ -3677,6 +3679,10 @@ struct mlx5_ifc_query_special_contexts_out_bits {
u8 dump_fill_mkey[0x20];
 
u8 resd_lkey[0x20];
+
+   u8 null_mkey[0x20];
+
+   u8 reserved_at_a0[0x60];
 };
 
 struct mlx5_ifc_query_special_contexts_in_bits {
-- 
2.7.4



[PATCH for-next 00/11] Mellanox mlx5 core and ODP updates 2017-01-01

2017-01-01 Thread Saeed Mahameed
Hi Dave and Doug,

The following eleven patches mainly come from Artemy Kovalyov,
who expanded mlx5 on-demand-paging (ODP) support. In addition
there are three cleanup patches which don't change any functionality,
but are needed to align the codebase prior to accepting the other patches.

A memory region (MR) in IB can be huge, and the ODP (on-demand paging)
technique allows the use of unpinned memory, which can be consumed and
released on demand. This allows applications not to pin down the
underlying physical pages of the address space, and saves them the
need to track the validity of the mappings.

Rather, the HCA requests the latest translations from the OS when pages
are not present, and the OS invalidates translations which are no longer
valid due to either non-present pages or mapping changes.

In the existing ODP implementation, applications need to register
memory buffers for communication, though the registered memory regions
need not have valid mappings at registration time.

This patch set performs the following steps to expand
the current ODP implementation:

1. It refactors UMR to support large regions, by introducing a generic
   function to perform HCA translation table modifications. This
   function supports both atomic and process contexts and is not limited
   by the number of modified entries.

   This function allows enabling preallocated memory regions of
   arbitrary size, adding MR cache buckets to support up to 16GB MRs.

2. It changes the page fault event format and refactors the page fault
   logic, together with adding atomic support.

3. It prepares mlx5 core code to support implicit registration with
   simplified and relaxed semantics.

   Implicit ODP semantics allow applications to provide a special memory
   key that represents their complete address space. Thus all IO accesses
   referencing this key (with the proper access rights associated with the key)
   don't require registering any virtual address range.

Thanks,
Artemy, Ilya and Leon

The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
  Linux 4.10-rc1

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-odp-for-4.11

for you to fetch changes up to 4ca4c0713ca3097f1be94355d9a36bd1fc7243a2:
  IB/mlx5: Improve MR check

Regards,
Saeed.

Artemy Kovalyov (8):
  net/mlx5: Support new MR features
  IB/mlx5: Refactor UMR post send format
  IB/mlx5: Add support for big MRs
  IB/mlx5: Add MR cache for large UMR regions
  net/mlx5: Update PAGE_FAULT_RESUME layout
  {net,IB}/mlx5: Refactor page fault handling
  IB/mlx5: Add ODP atomics support
  IB/mlx5: Improve MR check

Binoy Jayan (1):
  IB/mlx5: Add helper mlx5_ib_post_send_wait

Leon Romanovsky (1):
  IB/mlx5: Reorder code in query device command

Max Gurtovoy (1):
  net/mlx5: Fix offset naming for reserved fields in hca_cap_bits

 drivers/infiniband/hw/mlx5/main.c  |  50 +-
 drivers/infiniband/hw/mlx5/mem.c   |  32 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  89 ++--
 drivers/infiniband/hw/mlx5/mr.c| 518 -
 drivers/infiniband/hw/mlx5/odp.c   | 424 -
 drivers/infiniband/hw/mlx5/qp.c| 154 ++
 drivers/net/ethernet/mellanox/mlx5/core/dev.c  |  33 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 290 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  41 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   2 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c   | 114 -
 include/linux/mlx5/device.h|   6 +-
 include/linux/mlx5/driver.h| 105 -
 include/linux/mlx5/mlx5_ifc.h  |  31 +-
 include/linux/mlx5/qp.h|  76 ++-
 16 files changed, 1000 insertions(+), 967 deletions(-)

-- 
2.7.4



[PATCH for-next 03/11] IB/mlx5: Add helper mlx5_ib_post_send_wait

2017-01-01 Thread Saeed Mahameed
From: Binoy Jayan <binoy.ja...@linaro.org>

Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
 - Acquire the semaphore
 - call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore
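
For illustration, a converted call site collapses to roughly this
(sketch mirroring the reg_umr() hunk below):

	struct mlx5_umr_wr umrwr = {};

	prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
			 page_shift, virt_addr, len, access_flags);

	err = mlx5_ib_post_send_wait(dev, &umrwr);
	if (err && err != -EFAULT)
		goto unmap_dma;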

Signed-off-by: Binoy Jayan <binoy.ja...@linaro.org>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 115 +++-
 1 file changed, 32 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 8f608deb..afb6dc1 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -891,16 +891,40 @@ static inline void mlx5_ib_init_umr_context(struct 
mlx5_ib_umr_context *context)
init_completion(&context->done);
 }
 
+static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
+struct mlx5_umr_wr *umrwr)
+{
+   struct umr_common *umrc = &dev->umrc;
+   struct ib_send_wr *bad;
+   int err;
+   struct mlx5_ib_umr_context umr_context;
+
+   mlx5_ib_init_umr_context(&umr_context);
+   umrwr->wr.wr_cqe = &umr_context.cqe;
+
+   down(&umrc->sem);
+   err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
+   if (err) {
+   mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
+   } else {
+   wait_for_completion(&umr_context.done);
+   if (umr_context.status != IB_WC_SUCCESS) {
+   mlx5_ib_warn(dev, "reg umr failed (%u)\n",
+umr_context.status);
+   err = -EFAULT;
+   }
+   }
+   up(&umrc->sem);
+   return err;
+}
+
 static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
  u64 virt_addr, u64 len, int npages,
  int page_shift, int order, int access_flags)
 {
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct device *ddev = dev->ib_dev.dma_device;
-   struct umr_common *umrc = &dev->umrc;
-   struct mlx5_ib_umr_context umr_context;
struct mlx5_umr_wr umrwr = {};
-   struct ib_send_wr *bad;
struct mlx5_ib_mr *mr;
struct ib_sge sg;
int size;
@@ -929,24 +953,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, 
struct ib_umem *umem,
if (err)
goto free_mr;
 
-   mlx5_ib_init_umr_context(&umr_context);
-
-   umrwr.wr.wr_cqe = &umr_context.cqe;
prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
 page_shift, virt_addr, len, access_flags);
 
-   down(&umrc->sem);
-   err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
-   if (err) {
-   mlx5_ib_warn(dev, "post send failed, err %d\n", err);
+   err = mlx5_ib_post_send_wait(dev, &umrwr);
+   if (err && err != -EFAULT)
goto unmap_dma;
-   } else {
-   wait_for_completion(&umr_context.done);
-   if (umr_context.status != IB_WC_SUCCESS) {
-   mlx5_ib_warn(dev, "reg umr failed\n");
-   err = -EFAULT;
-   }
-   }
 
mr->mmkey.iova = virt_addr;
mr->mmkey.size = len;
@@ -955,7 +967,6 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct 
ib_umem *umem,
mr->live = 1;
 
 unmap_dma:
-   up(&umrc->sem);
dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
 
kfree(mr_pas);
@@ -975,13 +986,10 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
 {
struct mlx5_ib_dev *dev = mr->dev;
struct device *ddev = dev->ib_dev.dma_device;
-   struct umr_common *umrc = >umrc;
-   struct mlx5_ib_umr_context umr_context;
struct ib_umem *umem = mr->umem;
int size;
__be64 *pas;
dma_addr_t dma;
-   struct ib_send_wr *bad;
struct mlx5_umr_wr wr;
struct ib_sge sg;
int err = 0;
@@ -1046,10 +1054,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
 
dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);
 
-   mlx5_ib_init_umr_context(&umr_context);
-
memset(&wr, 0, sizeof(wr));
-   wr.wr.wr_cqe = &umr_context.cqe;
 
sg.addr = dma;
sg.length = ALIGN(npages * sizeof(u64),
@@ -1066,19 +1071,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
wr.mkey = mr->mmkey.key;
 

[PATCH for-next 02/11] IB/mlx5: Reorder code in query device command

2017-01-01 Thread Saeed Mahameed
From: Leon Romanovsky <l...@kernel.org>

The order of features exposed by private mlx5-abi.h
file is CQE zipping, packet pacing and multi-packet WQE.

The internal order implemented in mlx5_ib_query_device() is
multi-packet WQE, CQE zipping and packet pacing.

Such a difference hurts code readability, so let's sync them,
with mlx5-abi.h (exposed to userspace) as the primary
order.

This commit doesn't change any functionality.

Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index d566f67..2ab4e32 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -672,17 +672,6 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
1 << MLX5_CAP_GEN(dev->mdev, log_max_rq);
}
 
-   if (field_avail(typeof(resp), mlx5_ib_support_multi_pkt_send_wqes,
-   uhw->outlen)) {
-   resp.mlx5_ib_support_multi_pkt_send_wqes =
-   MLX5_CAP_ETH(mdev, multi_pkt_send_wqe);
-   resp.response_length +=
-   sizeof(resp.mlx5_ib_support_multi_pkt_send_wqes);
-   }
-
-   if (field_avail(typeof(resp), reserved, uhw->outlen))
-   resp.response_length += sizeof(resp.reserved);
-
if (field_avail(typeof(resp), cqe_comp_caps, uhw->outlen)) {
resp.cqe_comp_caps.max_num =
MLX5_CAP_GEN(dev->mdev, cqe_compression) ?
@@ -706,6 +695,17 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
resp.response_length += sizeof(resp.packet_pacing_caps);
}
 
+   if (field_avail(typeof(resp), mlx5_ib_support_multi_pkt_send_wqes,
+   uhw->outlen)) {
+   resp.mlx5_ib_support_multi_pkt_send_wqes =
+   MLX5_CAP_ETH(mdev, multi_pkt_send_wqe);
+   resp.response_length +=
+   sizeof(resp.mlx5_ib_support_multi_pkt_send_wqes);
+   }
+
+   if (field_avail(typeof(resp), reserved, uhw->outlen))
+   resp.response_length += sizeof(resp.reserved);
+
if (uhw->outlen) {
err = ib_copy_to_udata(uhw, &resp, resp.response_length);
 
-- 
2.7.4



[PATCH for-next 11/11] IB/mlx5: Improve MR check

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Add a "type" field to the mlx5_core MKEY struct.
Check whether the page fault happens on an MKEY corresponding to an MR.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mr.c  | 5 +
 drivers/infiniband/hw/mlx5/odp.c | 9 +++--
 include/linux/mlx5/driver.h  | 6 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index e4333c8..f56b249 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -129,6 +129,7 @@ static void reg_mr_callback(int status, void *context)
return;
}
 
+   mr->mmkey.type = MLX5_MKEY_MR;
spin_lock_irqsave(&dev->mdev->priv.mkey_lock, flags);
key = dev->mdev->priv.mkey_key++;
spin_unlock_irqrestore(&dev->mdev->priv.mkey_lock, flags);
@@ -728,6 +729,7 @@ struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc)
goto err_in;
 
kfree(in);
+   mr->mmkey.type = MLX5_MKEY_MR;
mr->ibmr.lkey = mr->mmkey.key;
mr->ibmr.rkey = mr->mmkey.key;
mr->umem = NULL;
@@ -1088,6 +1090,7 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr *ibmr, 
struct ib_pd *pd,
mlx5_ib_warn(dev, "create mkey failed\n");
goto err_2;
}
+   mr->mmkey.type = MLX5_MKEY_MR;
mr->umem = umem;
mr->dev = dev;
mr->live = 1;
@@ -1533,6 +1536,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
if (err)
goto err_destroy_psv;
 
+   mr->mmkey.type = MLX5_MKEY_MR;
mr->ibmr.lkey = mr->mmkey.key;
mr->ibmr.rkey = mr->mmkey.key;
mr->umem = NULL;
@@ -1613,6 +1617,7 @@ struct ib_mw *mlx5_ib_alloc_mw(struct ib_pd *pd, enum 
ib_mw_type type,
if (err)
goto free;
 
+   mw->mmkey.type = MLX5_MKEY_MW;
mw->ibmw.rkey = mw->mmkey.key;
 
resp.response_length = min(offsetof(typeof(resp), response_length) +
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 971b288..e5bc267 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -155,9 +155,14 @@ static struct mlx5_ib_mr *mlx5_ib_odp_find_mr_lkey(struct 
mlx5_ib_dev *dev,
 {
u32 base_key = mlx5_base_mkey(key);
struct mlx5_core_mkey *mmkey = __mlx5_mr_lookup(dev->mdev, base_key);
-   struct mlx5_ib_mr *mr = container_of(mmkey, struct mlx5_ib_mr, mmkey);
+   struct mlx5_ib_mr *mr;
+
+   if (!mmkey || mmkey->key != key || mmkey->type != MLX5_MKEY_MR)
+   return NULL;
+
+   mr = container_of(mmkey, struct mlx5_ib_mr, mmkey);
 
-   if (!mmkey || mmkey->key != key || !mr->live)
+   if (!mr->live)
return NULL;
 
return container_of(mmkey, struct mlx5_ib_mr, mmkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index b52d074..cfa49bc 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -394,11 +394,17 @@ struct mlx5_core_sig_ctx {
u32 sigerr_count;
 };
 
+enum {
+   MLX5_MKEY_MR = 1,
+   MLX5_MKEY_MW,
+};
+
 struct mlx5_core_mkey {
u64 iova;
u64 size;
u32 key;
u32 pd;
+   u32 type;
 };
 
 #define MLX5_24BIT_MASK((1 << 24) - 1)
-- 
2.7.4



[PATCH for-next 05/11] IB/mlx5: Refactor UMR post send format

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

* Update struct mlx5_wqe_umr_ctrl_seg.
* Currently the UMR send_flags target only certain use cases:
  enabling/disabling cached MRs and modifying the XLT for ODP. Making
  the flags independent makes UMR more flexible, allowing arbitrary
  manipulations.
* Since different UMR formats have different entry sizes, a UMR request
  should receive the exact size of the translation table update instead
  of the number of entries. Rename the npages field to xlt_size in
  struct mlx5_umr_wr and update the relevant code accordingly (see the
  illustration below).
* Add support for the length64 bit.
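
For illustration (based on the diff below), a caller now fills the byte
size of the XLT update rather than a page count, e.g. for MTT entries:

	umrwr.xlt_size = ALIGN(npages * sizeof(struct mlx5_mtt),
			       MLX5_IB_UMR_XLT_ALIGNMENT);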

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  24 ++--
 drivers/infiniband/hw/mlx5/mr.c   |  50 +
 drivers/infiniband/hw/mlx5/odp.c  |   3 +-
 drivers/infiniband/hw/mlx5/qp.c   | 128 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   2 +-
 include/linux/mlx5/qp.h   |  14 ++-
 6 files changed, 103 insertions(+), 118 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6c6057e..d79580d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -174,13 +174,12 @@ struct mlx5_ib_flow_db {
  * enum ib_send_flags and enum ib_qp_type for low-level driver
  */
 
-#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START
-#define MLX5_IB_SEND_UMR_FAIL_IF_FREE (IB_SEND_RESERVED_START << 1)
-#define MLX5_IB_SEND_UMR_UPDATE_MTT (IB_SEND_RESERVED_START << 2)
-
-#define MLX5_IB_SEND_UMR_UPDATE_TRANSLATION(IB_SEND_RESERVED_START << 3)
-#define MLX5_IB_SEND_UMR_UPDATE_PD (IB_SEND_RESERVED_START << 4)
-#define MLX5_IB_SEND_UMR_UPDATE_ACCESS IB_SEND_RESERVED_END
+#define MLX5_IB_SEND_UMR_ENABLE_MR(IB_SEND_RESERVED_START << 0)
+#define MLX5_IB_SEND_UMR_DISABLE_MR   (IB_SEND_RESERVED_START << 1)
+#define MLX5_IB_SEND_UMR_FAIL_IF_FREE (IB_SEND_RESERVED_START << 2)
+#define MLX5_IB_SEND_UMR_UPDATE_XLT   (IB_SEND_RESERVED_START << 3)
+#define MLX5_IB_SEND_UMR_UPDATE_TRANSLATION(IB_SEND_RESERVED_START << 4)
+#define MLX5_IB_SEND_UMR_UPDATE_PD_ACCESS   IB_SEND_RESERVED_END
 
 #define MLX5_IB_QPT_REG_UMRIB_QPT_RESERVED1
 /*
@@ -190,6 +189,9 @@ struct mlx5_ib_flow_db {
 #define MLX5_IB_QPT_HW_GSI IB_QPT_RESERVED2
 #define MLX5_IB_WR_UMR IB_WR_RESERVED1
 
+#define MLX5_IB_UMR_OCTOWORD  16
+#define MLX5_IB_UMR_XLT_ALIGNMENT  64
+
 /* Private QP creation flags to be passed in ib_qp_init_attr.create_flags.
  *
  * These flags are intended for internal use by the mlx5_ib driver, and they
@@ -414,13 +416,11 @@ enum mlx5_ib_qp_flags {
 
 struct mlx5_umr_wr {
struct ib_send_wr   wr;
-   union {
-   u64 virt_addr;
-   u64 offset;
-   } target;
+   u64 virt_addr;
+   u64 offset;
struct ib_pd   *pd;
unsigned intpage_shift;
-   unsigned intnpages;
+   unsigned intxlt_size;
u64 length;
int access_flags;
u32 mkey;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index afb6dc1..738ba13 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -774,7 +774,7 @@ static int dma_map_mr_pas(struct mlx5_ib_dev *dev, struct 
ib_umem *umem,
 * To avoid copying garbage after the pas array, we allocate
 * a little more.
 */
-   *size = ALIGN(sizeof(u64) * npages, MLX5_UMR_MTT_ALIGNMENT);
+   *size = ALIGN(sizeof(struct mlx5_mtt) * npages, MLX5_UMR_MTT_ALIGNMENT);
*mr_pas = kmalloc(*size + MLX5_UMR_ALIGN - 1, GFP_KERNEL);
if (!(*mr_pas))
return -ENOMEM;
@@ -782,7 +782,7 @@ static int dma_map_mr_pas(struct mlx5_ib_dev *dev, struct 
ib_umem *umem,
pas = PTR_ALIGN(*mr_pas, MLX5_UMR_ALIGN);
mlx5_ib_populate_pas(dev, umem, page_shift, pas, MLX5_IB_MTT_PRESENT);
/* Clear padding after the actual pages. */
-   memset(pas + npages, 0, *size - npages * sizeof(u64));
+   memset(pas + npages, 0, *size - npages * sizeof(struct mlx5_mtt));
 
*dma = dma_map_single(ddev, pas, *size, DMA_TO_DEVICE);
if (dma_mapping_error(ddev, *dma)) {
@@ -801,7 +801,8 @@ static void prep_umr_wqe_common(struct ib_pd *pd, struct 
ib_send_wr *wr,
struct mlx5_umr_wr *umrwr = umr_wr(wr);
 
sg->addr = dma;
-   sg->length 

[PATCH for-next 08/11] net/mlx5: Update PAGE_FAULT_RESUME layout

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Update PAGE_FAULT_RESUME command layout.

Three bit fields describing a page fault (rdma, rdma_write, req_res)
gave 8 possible combinations, while only a few were legal. Now they
are interpreted as a three-bit type field, where the formerly legal
combinations turn into corresponding types and the unused ones were
added as new types.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 10 ++
 include/linux/mlx5/mlx5_ifc.h|  9 -
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index d0a4005..5378a5f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -515,14 +515,8 @@ int mlx5_core_page_fault_resume(struct mlx5_core_dev *dev, 
u32 qpn,
 
MLX5_SET(page_fault_resume_in, in, opcode,
 MLX5_CMD_OP_PAGE_FAULT_RESUME);
-   MLX5_SET(page_fault_resume_in, in, qpn, qpn);
-
-   if (flags & MLX5_PAGE_FAULT_RESUME_REQUESTOR)
-   MLX5_SET(page_fault_resume_in, in, req_res, 1);
-   if (flags & MLX5_PAGE_FAULT_RESUME_WRITE)
-   MLX5_SET(page_fault_resume_in, in, read_write, 1);
-   if (flags & MLX5_PAGE_FAULT_RESUME_RDMA)
-   MLX5_SET(page_fault_resume_in, in, rdma, 1);
+   MLX5_SET(page_fault_resume_in, in, wq_number, qpn);
+   MLX5_SET(page_fault_resume_in, in, page_fault_type, flags);
if (error)
MLX5_SET(page_fault_resume_in, in, error, 1);
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 7c760e5..608dc98 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -4775,12 +4775,11 @@ struct mlx5_ifc_page_fault_resume_in_bits {
 
u8 error[0x1];
u8 reserved_at_41[0x4];
-   u8 rdma[0x1];
-   u8 read_write[0x1];
-   u8 req_res[0x1];
-   u8 qpn[0x18];
+   u8 page_fault_type[0x3];
+   u8 wq_number[0x18];
 
-   u8 reserved_at_60[0x20];
+   u8 reserved_at_60[0x8];
+   u8 token[0x18];
 };
 
 struct mlx5_ifc_nop_out_bits {
-- 
2.7.4



[PATCH for-next 07/11] IB/mlx5: Add MR cache for large UMR regions

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

In this change we turn mlx5_ib_update_mtt() into a generic
mlx5_ib_update_xlt() that performs HCA translation table modifications,
supporting both atomic and process contexts and not limited by the
number of modified entries.
Using this function we increase the preallocated MRs up to 16GB.
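
For illustration, the shape of the new helper is roughly (hypothetical
sketch; see the diff below for the real prototype):

	/* Update npages worth of XLT entries of an MR, starting at
	 * entry index idx; flags select atomic vs. process context
	 * behavior.
	 */
	err = mlx5_ib_update_xlt(mr, idx, npages, page_shift, flags);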

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  |  14 +-
 drivers/infiniband/hw/mlx5/mem.c   |  32 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  15 +-
 drivers/infiniband/hw/mlx5/mr.c| 386 ++---
 drivers/infiniband/hw/mlx5/odp.c   |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  20 ++
 include/linux/mlx5/driver.h|   2 +-
 7 files changed, 240 insertions(+), 248 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 2ab4e32..b871272 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1112,11 +1112,18 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
context->ibucontext.invalidate_range = &mlx5_ib_invalidate_range;
 #endif
 
+   context->upd_xlt_page = __get_free_page(GFP_KERNEL);
+   if (!context->upd_xlt_page) {
+   err = -ENOMEM;
+   goto out_uars;
+   }
+   mutex_init(&context->upd_xlt_page_mutex);
+
if (MLX5_CAP_GEN(dev->mdev, log_max_transport_domain)) {
err = mlx5_core_alloc_transport_domain(dev->mdev,
   &context->tdn);
if (err)
-   goto out_uars;
+   goto out_page;
}
 
INIT_LIST_HEAD(&context->vma_private_list);
@@ -1168,6 +1175,9 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (MLX5_CAP_GEN(dev->mdev, log_max_transport_domain))
mlx5_core_dealloc_transport_domain(dev->mdev, context->tdn);
 
+out_page:
+   free_page(context->upd_xlt_page);
+
 out_uars:
for (i--; i >= 0; i--)
mlx5_cmd_free_uar(dev->mdev, uars[i].index);
@@ -1195,6 +1205,8 @@ static int mlx5_ib_dealloc_ucontext(struct ib_ucontext 
*ibcontext)
if (MLX5_CAP_GEN(dev->mdev, log_max_transport_domain))
mlx5_core_dealloc_transport_domain(dev->mdev, context->tdn);
 
+   free_page(context->upd_xlt_page);
+
for (i = 0; i < uuari->num_uars; i++) {
if (mlx5_cmd_free_uar(dev->mdev, uuari->uars[i].index))
mlx5_ib_warn(dev, "failed to free UAR 0x%x\n", 
uuari->uars[i].index);
diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index 6851357..778d8a1 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -159,7 +159,7 @@ void __mlx5_ib_populate_pas(struct mlx5_ib_dev *dev, struct 
ib_umem *umem,
unsigned long umem_page_shift = ilog2(umem->page_size);
int shift = page_shift - umem_page_shift;
int mask = (1 << shift) - 1;
-   int i, k;
+   int i, k, idx;
u64 cur = 0;
u64 base;
int len;
@@ -185,18 +185,36 @@ void __mlx5_ib_populate_pas(struct mlx5_ib_dev *dev, 
struct ib_umem *umem,
for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
len = sg_dma_len(sg) >> umem_page_shift;
base = sg_dma_address(sg);
-   for (k = 0; k < len; k++) {
+
+   /* Skip elements below offset */
+   if (i + len < offset << shift) {
+   i += len;
+   continue;
+   }
+
+   /* Skip pages below offset */
+   if (i < offset << shift) {
+   k = (offset << shift) - i;
+   i = offset << shift;
+   } else {
+   k = 0;
+   }
+
+   for (; k < len; k++) {
if (!(i & mask)) {
cur = base + (k << umem_page_shift);
cur |= access_flags;
+   idx = (i >> shift) - offset;
 
-   pas[i >> shift] = cpu_to_be64(cur);
+   pas[idx] = cpu_to_be64(cur);
mlx5_ib_dbg(dev, "pas[%d] 0x%llx\n",
-   i >> shift, be64_to_cpu(pas[i >> 
shift]));
-   }  else
-   mlx5_ib_dbg(dev, "=> 0x%llx\n",
-   base + 

[PATCH for-next 06/11] IB/mlx5: Add support for big MRs

2017-01-01 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Make use of extended UMR translation offset.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
 drivers/infiniband/hw/mlx5/mr.c  | 8 +---
 drivers/infiniband/hw/mlx5/odp.c | 5 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d79580d..73bff77 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -634,6 +634,7 @@ struct mlx5_ib_dev {
int fill_delay;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
struct ib_odp_caps  odp_caps;
+   u64 odp_max_size;
/*
 * Sleepable RCU that prevents destruction of MRs while they are still
 * being used by a page fault handler.
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 738ba13..271b78e 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1121,8 +1121,9 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr *ibmr, 
struct ib_pd *pd,
goto err_1;
}
pas = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
-   mlx5_ib_populate_pas(dev, umem, page_shift, pas,
-pg_cap ? MLX5_IB_MTT_PRESENT : 0);
+   if (!(access_flags & IB_ACCESS_ON_DEMAND))
+   mlx5_ib_populate_pas(dev, umem, page_shift, pas,
+pg_cap ? MLX5_IB_MTT_PRESENT : 0);
 
/* The pg_access bit allows setting the access flags
 * in the page list submitted with the command. */
@@ -1210,7 +1211,8 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
mlx5_ib_dbg(dev, "cache empty for order %d", order);
mr = NULL;
}
-   } else if (access_flags & IB_ACCESS_ON_DEMAND) {
+   } else if (access_flags & IB_ACCESS_ON_DEMAND &&
+  !MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) {
err = -EINVAL;
pr_err("Got MR registration for ODP MR > 512MB, not supported 
for Connect-IB");
goto error;
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 67651ec..1e73c12 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -121,6 +121,11 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev 
*dev)
 
caps->general_caps = IB_ODP_SUPPORT;
 
+   if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset))
+   dev->odp_max_size = U64_MAX;
+   else
+   dev->odp_max_size = BIT_ULL(MLX5_MAX_UMR_SHIFT + PAGE_SHIFT);
+
if (MLX5_CAP_ODP(dev->mdev, ud_odp_caps.send))
caps->per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
 
-- 
2.7.4



[PATCH for-next V2 02/11] IB/mlx5: Reorder code in query device command

2017-01-02 Thread Saeed Mahameed
From: Leon Romanovsky <l...@kernel.org>

The order of features exposed by private mlx5-abi.h
file is CQE zipping, packet pacing and multi-packet WQE.

The internal order implemented in mlx5_ib_query_device() is
multi-packet WQE, CQE zipping and packet pacing.

Such difference hurts code readability, so let's sync,
while mlx5-abi.h (exposed to userspace) is the primary
order.

This commit doesn't change any functionality.

Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index d566f67..2ab4e32 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -672,17 +672,6 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
1 << MLX5_CAP_GEN(dev->mdev, log_max_rq);
}
 
-   if (field_avail(typeof(resp), mlx5_ib_support_multi_pkt_send_wqes,
-   uhw->outlen)) {
-   resp.mlx5_ib_support_multi_pkt_send_wqes =
-   MLX5_CAP_ETH(mdev, multi_pkt_send_wqe);
-   resp.response_length +=
-   sizeof(resp.mlx5_ib_support_multi_pkt_send_wqes);
-   }
-
-   if (field_avail(typeof(resp), reserved, uhw->outlen))
-   resp.response_length += sizeof(resp.reserved);
-
if (field_avail(typeof(resp), cqe_comp_caps, uhw->outlen)) {
resp.cqe_comp_caps.max_num =
MLX5_CAP_GEN(dev->mdev, cqe_compression) ?
@@ -706,6 +695,17 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
resp.response_length += sizeof(resp.packet_pacing_caps);
}
 
+   if (field_avail(typeof(resp), mlx5_ib_support_multi_pkt_send_wqes,
+   uhw->outlen)) {
+   resp.mlx5_ib_support_multi_pkt_send_wqes =
+   MLX5_CAP_ETH(mdev, multi_pkt_send_wqe);
+   resp.response_length +=
+   sizeof(resp.mlx5_ib_support_multi_pkt_send_wqes);
+   }
+
+   if (field_avail(typeof(resp), reserved, uhw->outlen))
+   resp.response_length += sizeof(resp.reserved);
+
if (uhw->outlen) {
err = ib_copy_to_udata(uhw, , resp.response_length);
 
-- 
2.7.4



[PATCH for-next V2 08/11] net/mlx5: Update PAGE_FAULT_RESUME layout

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Update PAGE_FAULT_RESUME command layout.

Three bit fields describing page fault: rdma, rdma_write, req_res gave 8
possible combinations, while only a few were legal. Now they
are interpreted as three-bit type field, where former legal
combinations turns into corresponding types and unused were added as new
types.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 10 ++
 include/linux/mlx5/mlx5_ifc.h|  9 -
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index d0a4005..5378a5f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -515,14 +515,8 @@ int mlx5_core_page_fault_resume(struct mlx5_core_dev *dev, 
u32 qpn,
 
MLX5_SET(page_fault_resume_in, in, opcode,
 MLX5_CMD_OP_PAGE_FAULT_RESUME);
-   MLX5_SET(page_fault_resume_in, in, qpn, qpn);
-
-   if (flags & MLX5_PAGE_FAULT_RESUME_REQUESTOR)
-   MLX5_SET(page_fault_resume_in, in, req_res, 1);
-   if (flags & MLX5_PAGE_FAULT_RESUME_WRITE)
-   MLX5_SET(page_fault_resume_in, in, read_write, 1);
-   if (flags & MLX5_PAGE_FAULT_RESUME_RDMA)
-   MLX5_SET(page_fault_resume_in, in, rdma, 1);
+   MLX5_SET(page_fault_resume_in, in, wq_number, qpn);
+   MLX5_SET(page_fault_resume_in, in, page_fault_type, flags);
if (error)
MLX5_SET(page_fault_resume_in, in, error, 1);
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 7c760e5..608dc98 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -4775,12 +4775,11 @@ struct mlx5_ifc_page_fault_resume_in_bits {
 
u8 error[0x1];
u8 reserved_at_41[0x4];
-   u8 rdma[0x1];
-   u8 read_write[0x1];
-   u8 req_res[0x1];
-   u8 qpn[0x18];
+   u8 page_fault_type[0x3];
+   u8 wq_number[0x18];
 
-   u8 reserved_at_60[0x20];
+   u8 reserved_at_60[0x8];
+   u8 token[0x18];
 };
 
 struct mlx5_ifc_nop_out_bits {
-- 
2.7.4



[PATCH for-next V2 07/11] IB/mlx5: Add MR cache for large UMR regions

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

In this change we turn mlx5_ib_update_mtt() into generic
mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions
supporting both atomic and process contexts and not limited by number
of modified entries.
Using this function we increase preallocated MRs up to 16GB.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  |  14 +-
 drivers/infiniband/hw/mlx5/mem.c   |  32 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  15 +-
 drivers/infiniband/hw/mlx5/mr.c| 386 ++---
 drivers/infiniband/hw/mlx5/odp.c   |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  20 ++
 include/linux/mlx5/driver.h|   2 +-
 7 files changed, 240 insertions(+), 248 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 2ab4e32..b871272 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1112,11 +1112,18 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
context->ibucontext.invalidate_range = _ib_invalidate_range;
 #endif
 
+   context->upd_xlt_page = __get_free_page(GFP_KERNEL);
+   if (!context->upd_xlt_page) {
+   err = -ENOMEM;
+   goto out_uars;
+   }
+   mutex_init(>upd_xlt_page_mutex);
+
if (MLX5_CAP_GEN(dev->mdev, log_max_transport_domain)) {
err = mlx5_core_alloc_transport_domain(dev->mdev,
   >tdn);
if (err)
-   goto out_uars;
+   goto out_page;
}
 
INIT_LIST_HEAD(>vma_private_list);
@@ -1168,6 +1175,9 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (MLX5_CAP_GEN(dev->mdev, log_max_transport_domain))
mlx5_core_dealloc_transport_domain(dev->mdev, context->tdn);
 
+out_page:
+   free_page(context->upd_xlt_page);
+
 out_uars:
for (i--; i >= 0; i--)
mlx5_cmd_free_uar(dev->mdev, uars[i].index);
@@ -1195,6 +1205,8 @@ static int mlx5_ib_dealloc_ucontext(struct ib_ucontext 
*ibcontext)
if (MLX5_CAP_GEN(dev->mdev, log_max_transport_domain))
mlx5_core_dealloc_transport_domain(dev->mdev, context->tdn);
 
+   free_page(context->upd_xlt_page);
+
for (i = 0; i < uuari->num_uars; i++) {
if (mlx5_cmd_free_uar(dev->mdev, uuari->uars[i].index))
mlx5_ib_warn(dev, "failed to free UAR 0x%x\n", 
uuari->uars[i].index);
diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index 6851357..778d8a1 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -159,7 +159,7 @@ void __mlx5_ib_populate_pas(struct mlx5_ib_dev *dev, struct 
ib_umem *umem,
unsigned long umem_page_shift = ilog2(umem->page_size);
int shift = page_shift - umem_page_shift;
int mask = (1 << shift) - 1;
-   int i, k;
+   int i, k, idx;
u64 cur = 0;
u64 base;
int len;
@@ -185,18 +185,36 @@ void __mlx5_ib_populate_pas(struct mlx5_ib_dev *dev, 
struct ib_umem *umem,
for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
len = sg_dma_len(sg) >> umem_page_shift;
base = sg_dma_address(sg);
-   for (k = 0; k < len; k++) {
+
+   /* Skip elements below offset */
+   if (i + len < offset << shift) {
+   i += len;
+   continue;
+   }
+
+   /* Skip pages below offset */
+   if (i < offset << shift) {
+   k = (offset << shift) - i;
+   i = offset << shift;
+   } else {
+   k = 0;
+   }
+
+   for (; k < len; k++) {
if (!(i & mask)) {
cur = base + (k << umem_page_shift);
cur |= access_flags;
+   idx = (i >> shift) - offset;
 
-   pas[i >> shift] = cpu_to_be64(cur);
+   pas[idx] = cpu_to_be64(cur);
mlx5_ib_dbg(dev, "pas[%d] 0x%llx\n",
-   i >> shift, be64_to_cpu(pas[i >> 
shift]));
-   }  else
-   mlx5_ib_dbg(dev, "=> 0x%llx\n",
-   base + 

[PATCH for-next V2 10/11] IB/mlx5: Add ODP atomics support

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Handle ODP atomic operations. When initiator of RDMA atomic
operation use ODP MR to provide source data handle pagefault properly.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 88 +++-
 include/linux/mlx5/mlx5_ifc.h|  2 +-
 include/linux/mlx5/qp.h  | 18 
 3 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 26f96c7..971b288 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -144,6 +144,9 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
if (MLX5_CAP_ODP(dev->mdev, rc_odp_caps.read))
caps->per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
 
+   if (MLX5_CAP_ODP(dev->mdev, rc_odp_caps.atomic))
+   caps->per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
+
return;
 }
 
@@ -386,6 +389,17 @@ static int pagefault_data_segments(struct mlx5_ib_dev *dev,
return ret < 0 ? ret : npages;
 }
 
+static const u32 mlx5_ib_odp_opcode_cap[] = {
+   [MLX5_OPCODE_SEND] = IB_ODP_SUPPORT_SEND,
+   [MLX5_OPCODE_SEND_IMM] = IB_ODP_SUPPORT_SEND,
+   [MLX5_OPCODE_SEND_INVAL]   = IB_ODP_SUPPORT_SEND,
+   [MLX5_OPCODE_RDMA_WRITE]   = IB_ODP_SUPPORT_WRITE,
+   [MLX5_OPCODE_RDMA_WRITE_IMM]   = IB_ODP_SUPPORT_WRITE,
+   [MLX5_OPCODE_RDMA_READ]= IB_ODP_SUPPORT_READ,
+   [MLX5_OPCODE_ATOMIC_CS]= IB_ODP_SUPPORT_ATOMIC,
+   [MLX5_OPCODE_ATOMIC_FA]= IB_ODP_SUPPORT_ATOMIC,
+};
+
 /*
  * Parse initiator WQE. Advances the wqe pointer to point at the
  * scatter-gather list, and set wqe_end to the end of the WQE.
@@ -396,6 +410,8 @@ static int mlx5_ib_mr_initiator_pfault_handler(
 {
struct mlx5_wqe_ctrl_seg *ctrl = *wqe;
u16 wqe_index = pfault->wqe.wqe_index;
+   u32 transport_caps;
+   struct mlx5_base_av *av;
unsigned ds, opcode;
 #if defined(DEBUG)
u32 ctrl_wqe_index, ctrl_qpn;
@@ -441,53 +457,49 @@ static int mlx5_ib_mr_initiator_pfault_handler(
 
opcode = be32_to_cpu(ctrl->opmod_idx_opcode) &
 MLX5_WQE_CTRL_OPCODE_MASK;
+
switch (qp->ibqp.qp_type) {
case IB_QPT_RC:
-   switch (opcode) {
-   case MLX5_OPCODE_SEND:
-   case MLX5_OPCODE_SEND_IMM:
-   case MLX5_OPCODE_SEND_INVAL:
-   if (!(dev->odp_caps.per_transport_caps.rc_odp_caps &
- IB_ODP_SUPPORT_SEND))
-   goto invalid_transport_or_opcode;
-   break;
-   case MLX5_OPCODE_RDMA_WRITE:
-   case MLX5_OPCODE_RDMA_WRITE_IMM:
-   if (!(dev->odp_caps.per_transport_caps.rc_odp_caps &
- IB_ODP_SUPPORT_WRITE))
-   goto invalid_transport_or_opcode;
-   *wqe += sizeof(struct mlx5_wqe_raddr_seg);
-   break;
-   case MLX5_OPCODE_RDMA_READ:
-   if (!(dev->odp_caps.per_transport_caps.rc_odp_caps &
- IB_ODP_SUPPORT_READ))
-   goto invalid_transport_or_opcode;
-   *wqe += sizeof(struct mlx5_wqe_raddr_seg);
-   break;
-   default:
-   goto invalid_transport_or_opcode;
-   }
+   transport_caps = dev->odp_caps.per_transport_caps.rc_odp_caps;
break;
case IB_QPT_UD:
-   switch (opcode) {
-   case MLX5_OPCODE_SEND:
-   case MLX5_OPCODE_SEND_IMM:
-   if (!(dev->odp_caps.per_transport_caps.ud_odp_caps &
- IB_ODP_SUPPORT_SEND))
-   goto invalid_transport_or_opcode;
-   *wqe += sizeof(struct mlx5_wqe_datagram_seg);
-   break;
-   default:
-   goto invalid_transport_or_opcode;
-   }
+   transport_caps = dev->odp_caps.per_transport_caps.ud_odp_caps;
break;
default:
-invalid_transport_or_opcode:
-   mlx5_ib_err(dev, "ODP fault on QP of an unsupported opcode or transport. transport: 0x%x opcode: 0x%x.\n",
-   qp->ibqp.qp_type, opcode);
+   mlx5_ib_err(dev, "ODP fault on QP of an unsupported transport 0x%x\n",
+   qp->ibqp.qp_type);
+   return -EFAULT;
+   }
+
+   if (unlikely(opcode &

[PATCH for-next V2 06/11] IB/mlx5: Add support for big MRs

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Make use of extended UMR translation offset.
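
For context, a short sketch of where the old limit in the diff below
comes from (assuming MLX5_MAX_UMR_SHIFT is 17 and 4KB device pages):

	/* without umr_extended_translation_offset a single UMR can only
	 * address 2^MLX5_MAX_UMR_SHIFT pages:
	 */
	dev->odp_max_size = BIT_ULL(MLX5_MAX_UMR_SHIFT + PAGE_SHIFT);
	/* 1ULL << (17 + 12) == 512MB, the Connect-IB limit seen in mr.c */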

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
 drivers/infiniband/hw/mlx5/mr.c  | 8 +---
 drivers/infiniband/hw/mlx5/odp.c | 5 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d79580d..73bff77 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -634,6 +634,7 @@ struct mlx5_ib_dev {
int fill_delay;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
struct ib_odp_caps  odp_caps;
+   u64 odp_max_size;
/*
 * Sleepable RCU that prevents destruction of MRs while they are still
 * being used by a page fault handler.
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index be8d38d..4d40fe0 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1121,8 +1121,9 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr *ibmr, 
struct ib_pd *pd,
goto err_1;
}
pas = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
-   mlx5_ib_populate_pas(dev, umem, page_shift, pas,
-pg_cap ? MLX5_IB_MTT_PRESENT : 0);
+   if (!(access_flags & IB_ACCESS_ON_DEMAND))
+   mlx5_ib_populate_pas(dev, umem, page_shift, pas,
+pg_cap ? MLX5_IB_MTT_PRESENT : 0);
 
/* The pg_access bit allows setting the access flags
 * in the page list submitted with the command. */
@@ -1210,7 +1211,8 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
mlx5_ib_dbg(dev, "cache empty for order %d", order);
mr = NULL;
}
-   } else if (access_flags & IB_ACCESS_ON_DEMAND) {
+   } else if (access_flags & IB_ACCESS_ON_DEMAND &&
+  !MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) {
err = -EINVAL;
pr_err("Got MR registration for ODP MR > 512MB, not supported 
for Connect-IB");
goto error;
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 67651ec..1e73c12 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -121,6 +121,11 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev 
*dev)
 
caps->general_caps = IB_ODP_SUPPORT;
 
+   if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset))
+   dev->odp_max_size = U64_MAX;
+   else
+   dev->odp_max_size = BIT_ULL(MLX5_MAX_UMR_SHIFT + PAGE_SHIFT);
+
if (MLX5_CAP_ODP(dev->mdev, ud_odp_caps.send))
caps->per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
 
-- 
2.7.4



[PATCH for-next V2 09/11] {net,IB}/mlx5: Refactor page fault handling

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

* Update the page fault event according to the latest specification.
* Separate code paths for the page fault EQ, completion EQs and the async EQ.
* Move the page fault handling work queue from an mlx5_ib static variable
  into the mlx5_core page fault EQ.
* Allocate the memory that stores an ODP event dynamically as the events
  arrive; since this happens in atomic context, use a mempool (see the
  sketch below).
* Make the mlx5_ib page fault handler run in process context.
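
A minimal sketch of the mempool pattern referred to above; the pool size
and helper below are illustrative, not the patch's actual identifiers:

	#include <linux/mempool.h>

	static mempool_t *pf_pool;

	/* at EQ creation time (process context) */
	pf_pool = mempool_create_kmalloc_pool(16, sizeof(struct mlx5_pagefault));
	if (!pf_pool)
		return -ENOMEM;

	/* in the page fault EQ handler (atomic context) */
	struct mlx5_pagefault *pfault = mempool_alloc(pf_pool, GFP_ATOMIC);
	if (pfault) {
		/* fill *pfault from the EQE, then hand it off to process
		 * context, which ends with mempool_free(pfault, pf_pool)
		 */
	}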

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  |  14 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  49 +---
 drivers/infiniband/hw/mlx5/odp.c   | 300 -
 drivers/infiniband/hw/mlx5/qp.c|  26 --
 drivers/net/ethernet/mellanox/mlx5/core/dev.c  |  33 +++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 290 +---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  21 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   2 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c   | 108 
 include/linux/mlx5/device.h|   6 +-
 include/linux/mlx5/driver.h|  97 ++-
 include/linux/mlx5/qp.h|  44 ---
 12 files changed, 522 insertions(+), 468 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index b871272..86c61e7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3319,6 +3319,9 @@ static struct mlx5_interface mlx5_ib_interface = {
.add= mlx5_ib_add,
.remove = mlx5_ib_remove,
.event  = mlx5_ib_event,
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+   .pfault = mlx5_ib_pfault,
+#endif
.protocol   = MLX5_INTERFACE_PROTOCOL_IB,
 };
 
@@ -3329,25 +3332,14 @@ static int __init mlx5_ib_init(void)
if (deprecated_prof_sel != 2)
pr_warn("prof_sel is deprecated for mlx5_ib, set it for 
mlx5_core\n");
 
-   err = mlx5_ib_odp_init();
-   if (err)
-   return err;
-
	err = mlx5_register_interface(&mlx5_ib_interface);
-   if (err)
-   goto clean_odp;
-
-   return err;
 
-clean_odp:
-   mlx5_ib_odp_cleanup();
return err;
 }
 
 static void __exit mlx5_ib_cleanup(void)
 {
	mlx5_unregister_interface(&mlx5_ib_interface);
-   mlx5_ib_odp_cleanup();
 }
 
 module_init(mlx5_ib_init);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 02d9255..a51c805 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -277,29 +277,6 @@ struct mlx5_ib_rwq_ind_table {
u32 rqtn;
 };
 
-/*
- * Connect-IB can trigger up to four concurrent pagefaults
- * per-QP.
- */
-enum mlx5_ib_pagefault_context {
-   MLX5_IB_PAGEFAULT_RESPONDER_READ,
-   MLX5_IB_PAGEFAULT_REQUESTOR_READ,
-   MLX5_IB_PAGEFAULT_RESPONDER_WRITE,
-   MLX5_IB_PAGEFAULT_REQUESTOR_WRITE,
-   MLX5_IB_PAGEFAULT_CONTEXTS
-};
-
-static inline enum mlx5_ib_pagefault_context
-   mlx5_ib_get_pagefault_context(struct mlx5_pagefault *pagefault)
-{
-   return pagefault->flags & (MLX5_PFAULT_REQUESTOR | MLX5_PFAULT_WRITE);
-}
-
-struct mlx5_ib_pfault {
-   struct work_struct  work;
-   struct mlx5_pagefault   mpfault;
-};
-
 struct mlx5_ib_ubuffer {
struct ib_umem *umem;
int buf_size;
@@ -385,20 +362,6 @@ struct mlx5_ib_qp {
/* Store signature errors */
boolsignature_en;
 
-#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-   /*
-* A flag that is true for QP's that are in a state that doesn't
-* allow page faults, and shouldn't schedule any more faults.
-*/
-   int disable_page_faults;
-   /*
-* The disable_page_faults_lock protects a QP's disable_page_faults
-* field, allowing for a thread to atomically check whether the QP
-* allows page faults, and if so schedule a page fault.
-*/
-   spinlock_t  disable_page_faults_lock;
-   struct mlx5_ib_pfault   pagefaults[MLX5_IB_PAGEFAULT_CONTEXTS];
-#endif
struct list_headqps_list;
struct list_headcq_recv_list;
struct list_headcq_send_list;
@@ -869,18 +832,13 @@ struct ib_rwq_ind_table 
*mlx5_ib_create_rwq_ind_table(struct ib_device *device,
 int mlx5_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *wq_ind_table);
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-extern struct workqueue_struct *mlx5_ib_page_fault_wq;
-
 void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev);
-void mlx5_ib_mr_pfault_handler(struct mlx5_ib_qp *

[PATCH for-next V2 11/11] IB/mlx5: Improve MR check

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

Add "type" field to mlx5_core MKEY struct.
Check whether page fault happens on MKEY corresponding to MR.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mr.c  | 5 +
 drivers/infiniband/hw/mlx5/odp.c | 9 +++--
 include/linux/mlx5/driver.h  | 6 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index f4ecc10..8cf2a67 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -129,6 +129,7 @@ static void reg_mr_callback(int status, void *context)
return;
}
 
+   mr->mmkey.type = MLX5_MKEY_MR;
	spin_lock_irqsave(&dev->mdev->priv.mkey_lock, flags);
	key = dev->mdev->priv.mkey_key++;
	spin_unlock_irqrestore(&dev->mdev->priv.mkey_lock, flags);
@@ -728,6 +729,7 @@ struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc)
goto err_in;
 
kfree(in);
+   mr->mmkey.type = MLX5_MKEY_MR;
mr->ibmr.lkey = mr->mmkey.key;
mr->ibmr.rkey = mr->mmkey.key;
mr->umem = NULL;
@@ -1088,6 +1090,7 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr *ibmr, 
struct ib_pd *pd,
mlx5_ib_warn(dev, "create mkey failed\n");
goto err_2;
}
+   mr->mmkey.type = MLX5_MKEY_MR;
mr->umem = umem;
mr->dev = dev;
mr->live = 1;
@@ -1533,6 +1536,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
if (err)
goto err_destroy_psv;
 
+   mr->mmkey.type = MLX5_MKEY_MR;
mr->ibmr.lkey = mr->mmkey.key;
mr->ibmr.rkey = mr->mmkey.key;
mr->umem = NULL;
@@ -1613,6 +1617,7 @@ struct ib_mw *mlx5_ib_alloc_mw(struct ib_pd *pd, enum 
ib_mw_type type,
if (err)
goto free;
 
+   mw->mmkey.type = MLX5_MKEY_MW;
mw->ibmw.rkey = mw->mmkey.key;
 
resp.response_length = min(offsetof(typeof(resp), response_length) +
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 971b288..e5bc267 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -155,9 +155,14 @@ static struct mlx5_ib_mr *mlx5_ib_odp_find_mr_lkey(struct 
mlx5_ib_dev *dev,
 {
u32 base_key = mlx5_base_mkey(key);
struct mlx5_core_mkey *mmkey = __mlx5_mr_lookup(dev->mdev, base_key);
-   struct mlx5_ib_mr *mr = container_of(mmkey, struct mlx5_ib_mr, mmkey);
+   struct mlx5_ib_mr *mr;
+
+   if (!mmkey || mmkey->key != key || mmkey->type != MLX5_MKEY_MR)
+   return NULL;
+
+   mr = container_of(mmkey, struct mlx5_ib_mr, mmkey);
 
-   if (!mmkey || mmkey->key != key || !mr->live)
+   if (!mr->live)
return NULL;
 
return container_of(mmkey, struct mlx5_ib_mr, mmkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index b52d074..cfa49bc 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -394,11 +394,17 @@ struct mlx5_core_sig_ctx {
u32 sigerr_count;
 };
 
+enum {
+   MLX5_MKEY_MR = 1,
+   MLX5_MKEY_MW,
+};
+
 struct mlx5_core_mkey {
u64 iova;
u64 size;
u32 key;
u32 pd;
+   u32 type;
 };
 
 #define MLX5_24BIT_MASK((1 << 24) - 1)
-- 
2.7.4



[PATCH for-next V2 05/11] IB/mlx5: Refactor UMR post send format

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

* Update struct mlx5_wqe_umr_ctrl_seg.
* Currently the UMR send_flags target only certain use cases: enable/disable
  a cached MR, modifying the XLT for ODP. Making the flags independent makes
  UMR more flexible, allowing arbitrary manipulations.
* Since different UMR formats have different entry sizes, a UMR request
  should receive the exact size of the translation table update instead of
  the number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr
  and update relevant code accordingly.
* Add support for the length64 bit.

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  24 ++--
 drivers/infiniband/hw/mlx5/mr.c   |  50 +
 drivers/infiniband/hw/mlx5/odp.c  |   3 +-
 drivers/infiniband/hw/mlx5/qp.c   | 128 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   2 +-
 include/linux/mlx5/qp.h   |  14 ++-
 6 files changed, 103 insertions(+), 118 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6c6057e..d79580d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -174,13 +174,12 @@ struct mlx5_ib_flow_db {
  * enum ib_send_flags and enum ib_qp_type for low-level driver
  */
 
-#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START
-#define MLX5_IB_SEND_UMR_FAIL_IF_FREE (IB_SEND_RESERVED_START << 1)
-#define MLX5_IB_SEND_UMR_UPDATE_MTT (IB_SEND_RESERVED_START << 2)
-
-#define MLX5_IB_SEND_UMR_UPDATE_TRANSLATION(IB_SEND_RESERVED_START << 3)
-#define MLX5_IB_SEND_UMR_UPDATE_PD (IB_SEND_RESERVED_START << 4)
-#define MLX5_IB_SEND_UMR_UPDATE_ACCESS IB_SEND_RESERVED_END
+#define MLX5_IB_SEND_UMR_ENABLE_MR(IB_SEND_RESERVED_START << 0)
+#define MLX5_IB_SEND_UMR_DISABLE_MR   (IB_SEND_RESERVED_START << 1)
+#define MLX5_IB_SEND_UMR_FAIL_IF_FREE (IB_SEND_RESERVED_START << 2)
+#define MLX5_IB_SEND_UMR_UPDATE_XLT   (IB_SEND_RESERVED_START << 3)
+#define MLX5_IB_SEND_UMR_UPDATE_TRANSLATION(IB_SEND_RESERVED_START << 4)
+#define MLX5_IB_SEND_UMR_UPDATE_PD_ACCESS   IB_SEND_RESERVED_END
 
 #define MLX5_IB_QPT_REG_UMRIB_QPT_RESERVED1
 /*
@@ -190,6 +189,9 @@ struct mlx5_ib_flow_db {
 #define MLX5_IB_QPT_HW_GSI IB_QPT_RESERVED2
 #define MLX5_IB_WR_UMR IB_WR_RESERVED1
 
+#define MLX5_IB_UMR_OCTOWORD  16
+#define MLX5_IB_UMR_XLT_ALIGNMENT  64
+
 /* Private QP creation flags to be passed in ib_qp_init_attr.create_flags.
  *
  * These flags are intended for internal use by the mlx5_ib driver, and they
@@ -414,13 +416,11 @@ enum mlx5_ib_qp_flags {
 
 struct mlx5_umr_wr {
struct ib_send_wr   wr;
-   union {
-   u64 virt_addr;
-   u64 offset;
-   } target;
+   u64 virt_addr;
+   u64 offset;
struct ib_pd   *pd;
unsigned intpage_shift;
-   unsigned intnpages;
+   unsigned intxlt_size;
u64 length;
int access_flags;
u32 mkey;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 7ab9b67..be8d38d 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -774,7 +774,7 @@ static int dma_map_mr_pas(struct mlx5_ib_dev *dev, struct 
ib_umem *umem,
 * To avoid copying garbage after the pas array, we allocate
 * a little more.
 */
-   *size = ALIGN(sizeof(u64) * npages, MLX5_UMR_MTT_ALIGNMENT);
+   *size = ALIGN(sizeof(struct mlx5_mtt) * npages, MLX5_UMR_MTT_ALIGNMENT);
*mr_pas = kmalloc(*size + MLX5_UMR_ALIGN - 1, GFP_KERNEL);
if (!(*mr_pas))
return -ENOMEM;
@@ -782,7 +782,7 @@ static int dma_map_mr_pas(struct mlx5_ib_dev *dev, struct 
ib_umem *umem,
pas = PTR_ALIGN(*mr_pas, MLX5_UMR_ALIGN);
mlx5_ib_populate_pas(dev, umem, page_shift, pas, MLX5_IB_MTT_PRESENT);
/* Clear padding after the actual pages. */
-   memset(pas + npages, 0, *size - npages * sizeof(u64));
+   memset(pas + npages, 0, *size - npages * sizeof(struct mlx5_mtt));
 
*dma = dma_map_single(ddev, pas, *size, DMA_TO_DEVICE);
if (dma_mapping_error(ddev, *dma)) {
@@ -801,7 +801,8 @@ static void prep_umr_wqe_common(struct ib_pd *pd, struct 
ib_send_wr *wr,
struct mlx5_umr_wr *umrwr = umr_wr(wr);
 
sg->addr = dma;
-   sg->length 

[PATCH for-next V2 03/11] IB/mlx5: Add helper mlx5_ib_post_send_wait

2017-01-02 Thread Saeed Mahameed
From: Binoy Jayan <binoy.ja...@linaro.org>

Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
 - Acquire the semaphore
 - call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore

Signed-off-by: Binoy Jayan <binoy.ja...@linaro.org>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 115 +++-
 1 file changed, 32 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 8f608deb..7ab9b67 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -891,16 +891,40 @@ static inline void mlx5_ib_init_umr_context(struct 
mlx5_ib_umr_context *context)
	init_completion(&context->done);
 }
 
+static int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
+ struct mlx5_umr_wr *umrwr)
+{
+   struct umr_common *umrc = &dev->umrc;
+   struct ib_send_wr *bad;
+   int err;
+   struct mlx5_ib_umr_context umr_context;
+
+   mlx5_ib_init_umr_context(&umr_context);
+   umrwr->wr.wr_cqe = &umr_context.cqe;
+
+   down(&umrc->sem);
+   err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
+   if (err) {
+   mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
+   } else {
+   wait_for_completion(&umr_context.done);
+   if (umr_context.status != IB_WC_SUCCESS) {
+   mlx5_ib_warn(dev, "reg umr failed (%u)\n",
+                umr_context.status);
+   err = -EFAULT;
+   }
+   }
+   up(&umrc->sem);
+   return err;
+}
+
 static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
  u64 virt_addr, u64 len, int npages,
  int page_shift, int order, int access_flags)
 {
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct device *ddev = dev->ib_dev.dma_device;
-   struct umr_common *umrc = >umrc;
-   struct mlx5_ib_umr_context umr_context;
struct mlx5_umr_wr umrwr = {};
-   struct ib_send_wr *bad;
struct mlx5_ib_mr *mr;
struct ib_sge sg;
int size;
@@ -929,24 +953,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, 
struct ib_umem *umem,
if (err)
goto free_mr;
 
-   mlx5_ib_init_umr_context(&umr_context);
-
-   umrwr.wr.wr_cqe = &umr_context.cqe;
	prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
			 page_shift, virt_addr, len, access_flags);
 
-   down(&umrc->sem);
-   err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
-   if (err) {
-   mlx5_ib_warn(dev, "post send failed, err %d\n", err);
+   err = mlx5_ib_post_send_wait(dev, &umrwr);
+   if (err && err != -EFAULT)
		goto unmap_dma;
-   } else {
-   wait_for_completion(&umr_context.done);
-   if (umr_context.status != IB_WC_SUCCESS) {
-   mlx5_ib_warn(dev, "reg umr failed\n");
-   err = -EFAULT;
-   }
-   }
 
mr->mmkey.iova = virt_addr;
mr->mmkey.size = len;
@@ -955,7 +967,6 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct 
ib_umem *umem,
mr->live = 1;
 
 unmap_dma:
-   up(&umrc->sem);
dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
 
kfree(mr_pas);
@@ -975,13 +986,10 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
 {
struct mlx5_ib_dev *dev = mr->dev;
struct device *ddev = dev->ib_dev.dma_device;
-   struct umr_common *umrc = &dev->umrc;
-   struct mlx5_ib_umr_context umr_context;
struct ib_umem *umem = mr->umem;
int size;
__be64 *pas;
dma_addr_t dma;
-   struct ib_send_wr *bad;
struct mlx5_umr_wr wr;
struct ib_sge sg;
int err = 0;
@@ -1046,10 +1054,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
 
dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);
 
-   mlx5_ib_init_umr_context(&umr_context);
-
	memset(&wr, 0, sizeof(wr));
-   wr.wr.wr_cqe = &umr_context.cqe;
 
sg.addr = dma;
sg.length = ALIGN(npages * sizeof(u64),
@@ -1066,19 +1071,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
wr.mkey = mr->mmkey.key;
wr.target.offset = start_page_index;

[PATCH for-next V2 00/11] Mellanox mlx5 core and ODP updates 2017-01-01

2017-01-02 Thread Saeed Mahameed
Hi Dave and Doug,

The following eleven patches mainly come from Artemy Kovalyov
who expanded mlx5 on-demand-paging (ODP) support. In addition
there are three cleanup patches which don't change any functionality,
but are needed to align the codebase prior to accepting the other patches.

A memory region (MR) in IB can be huge, and the ODP (on-demand paging)
technique allows the use of unpinned memory, which can be consumed and
released on demand. This lets applications avoid pinning down the
underlying physical pages of the address space, and saves them the need
to track the validity of the mappings.

Rather, the HCA requests the latest translations from the OS when pages
are not present, and the OS invalidates translations which are no longer
valid due to either non-present pages or mapping changes.

In the existing ODP implementation, applications still need to register
memory buffers for communication, though the registered memory regions
need not have valid mappings at registration time.
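
For illustration, this is how an application asks for an ODP MR through
libibverbs (a sketch; error handling elided):

	#include <infiniband/verbs.h>

	/* pages backing buf are not pinned; the HCA faults them in on access */
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_READ |
				       IBV_ACCESS_REMOTE_WRITE |
				       IBV_ACCESS_ON_DEMAND);
	if (!mr)
		return -1; /* e.g. ODP unsupported for this transport/access */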

This patch set performs the following steps to expand
current ODP implementation:

1. It refactors UMR to support large regions, by introducing a generic
   function to perform HCA translation table modifications. This
   function supports both atomic and process contexts and is not limited
   by the number of modified entries.

   This function allows enabling memory regions of arbitrary size, so
   MR cache buckets are added to support MRs of up to 16GB.

2. It changes the page fault event format and refactors the page fault
   logic, together with the addition of atomic support.

3. It prepares the mlx5 core code to support implicit registration with
   simplified and relaxed semantics.

   Implicit ODP semantics allow an application to provide a special memory
   key that represents its complete address space. Thus, all IO accesses
   referencing this key (with the proper access rights associated with the
   key) need not register any virtual address range.

Thanks,
Artemy, Ilya and Leon

v1->v2:
  - Don't use 'inline' in .c files

The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77
Linux 4.10-rc1

are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-odp-for-4.11

for you to fetch changes up to 4ca4c0713ca3097f1be94355d9a36bd1fc7243a2
IB/mlx5: Improve MR check

Regards,
Saeed.

Artemy Kovalyov (8):
  net/mlx5: Support new MR features
  IB/mlx5: Refactor UMR post send format
  IB/mlx5: Add support for big MRs
  IB/mlx5: Add MR cache for large UMR regions
  net/mlx5: Update PAGE_FAULT_RESUME layout
  {net,IB}/mlx5: Refactor page fault handling
  IB/mlx5: Add ODP atomics support
  IB/mlx5: Improve MR check

Binoy Jayan (1):
  IB/mlx5: Add helper mlx5_ib_post_send_wait

Leon Romanovsky (1):
  IB/mlx5: Reorder code in query device command

Max Gurtovoy (1):
  net/mlx5: Fix offset naming for reserved fields in hca_cap_bits

 drivers/infiniband/hw/mlx5/main.c  |  50 +-
 drivers/infiniband/hw/mlx5/mem.c   |  32 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  89 ++--
 drivers/infiniband/hw/mlx5/mr.c| 518 -
 drivers/infiniband/hw/mlx5/odp.c   | 424 -
 drivers/infiniband/hw/mlx5/qp.c| 154 ++
 drivers/net/ethernet/mellanox/mlx5/core/dev.c  |  33 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 290 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  41 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   2 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c   | 114 -
 include/linux/mlx5/device.h|   6 +-
 include/linux/mlx5/driver.h| 105 -
 include/linux/mlx5/mlx5_ifc.h  |  31 +-
 include/linux/mlx5/qp.h|  76 ++-
 16 files changed, 1000 insertions(+), 967 deletions(-)

-- 
2.7.4



[PATCH for-next V2 01/11] net/mlx5: Fix offset naming for reserved fields in hca_cap_bits

2017-01-02 Thread Saeed Mahameed
From: Max Gurtovoy <m...@mellanox.com>

Fix offset for reserved fields.
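
For reference, the mlx5_ifc convention is that each reserved field is
named after its starting bit offset (in hex) within the struct, so a
named bit and the reserved gap after it must stay consistent; e.g. (from
the first hunk below):

	u8         port_module_event[0x1];   /* bit 0x1b0 */
	u8         reserved_at_1b1[0x1];     /* next bit; was misnamed _at_1b0 */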

Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: b4ff3a36d3e4 ("net/mlx5: Use offset based reserved field names in the 
IFC header file")
Fixes: 7d5e14237a55 ("net/mlx5: Update mlx5_ifc hardware features")
Signed-off-by: Max Gurtovoy <m...@mellanox.com>
Reviewed-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 57bec54..4792c85 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -826,9 +826,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_at_1a9[0x2];
u8 local_ca_ack_delay[0x5];
u8 port_module_event[0x1];
-   u8 reserved_at_1b0[0x1];
+   u8 reserved_at_1b1[0x1];
u8 ports_check[0x1];
-   u8 reserved_at_1b2[0x1];
+   u8 reserved_at_1b3[0x1];
u8 disable_link_up[0x1];
u8 beacon_led[0x1];
u8 port_type[0x2];
@@ -858,7 +858,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
u8 compact_address_vector[0x1];
u8 striding_rq[0x1];
-   u8 reserved_at_201[0x2];
+   u8 reserved_at_202[0x2];
u8 ipoib_basic_offloads[0x1];
u8 reserved_at_205[0xa];
u8 drain_sigerr[0x1];
@@ -1009,10 +1009,10 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 rndv_offload_rc[0x1];
u8 rndv_offload_dc[0x1];
u8 log_tag_matching_list_sz[0x5];
-   u8 reserved_at_5e8[0x3];
+   u8 reserved_at_5f8[0x3];
u8 log_max_xrq[0x5];
 
-   u8 reserved_at_5f0[0x200];
+   u8 reserved_at_600[0x200];
 };
 
 enum mlx5_flow_destination_type {
-- 
2.7.4



[PATCH for-next V2 04/11] net/mlx5: Support new MR features

2017-01-02 Thread Saeed Mahameed
From: Artemy Kovalyov <artem...@mellanox.com>

This patch adds the following items to IFC file.

1. MLX5_MKC_ACCESS_MODE_KSM enum value for creating KSM memory keys.
The KSM access mode is used when an indirect MKey is associated with
fixed memory size entries.

2. null_mkey field that is used to indicate non-present KLM/KSM
entries; accessing such an entry causes the device to generate a
page fault event.

3. struct mlx5_ifc_cmd_hca_cap_bits capability bits indicating
related value/field is supported:
* fixed_buffer_size - MLX5_MKC_ACCESS_MODE_KSM
* umr_extended_translation_offset - translation_offset_42_16
in UMR ctrl segment
* null_mkey - null_mkey in QUERY_SPECIAL_CONTEXTS
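
A sketch of how a consumer could read the new null_mkey through the
command interface (the helper name is hypothetical; the IFC fields are
the ones added below):

	static int query_null_mkey(struct mlx5_core_dev *dev, u32 *null_mkey)
	{
		u32 out[MLX5_ST_SZ_DW(query_special_contexts_out)] = {};
		u32 in[MLX5_ST_SZ_DW(query_special_contexts_in)] = {};
		int err;

		MLX5_SET(query_special_contexts_in, in, opcode,
			 MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS);
		err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
		if (!err)
			*null_mkey = MLX5_GET(query_special_contexts_out,
					      out, null_mkey);
		return err;
	}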

Signed-off-by: Artemy Kovalyov <artem...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 4792c85..7c760e5 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -782,11 +782,12 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 log_max_eq[0x4];
 
u8 max_indirection[0x8];
-   u8 reserved_at_108[0x1];
+   u8 fixed_buffer_size[0x1];
u8 log_max_mrw_sz[0x7];
u8 reserved_at_110[0x2];
u8 log_max_bsf_list_size[0x6];
-   u8 reserved_at_118[0x2];
+   u8 umr_extended_translation_offset[0x1];
+   u8 null_mkey[0x1];
u8 log_max_klm_list_size[0x6];
 
u8 reserved_at_120[0xa];
@@ -2569,6 +2570,7 @@ enum {
MLX5_MKC_ACCESS_MODE_PA= 0x0,
MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
MLX5_MKC_ACCESS_MODE_KLMS  = 0x2,
+   MLX5_MKC_ACCESS_MODE_KSM   = 0x3,
 };
 
 struct mlx5_ifc_mkc_bits {
@@ -3677,6 +3679,10 @@ struct mlx5_ifc_query_special_contexts_out_bits {
u8 dump_fill_mkey[0x20];
 
u8 resd_lkey[0x20];
+
+   u8 null_mkey[0x20];
+
+   u8 reserved_at_a0[0x60];
 };
 
 struct mlx5_ifc_query_special_contexts_in_bits {
-- 
2.7.4



[for-next 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Make use of the blue flame register allocator in mlx5_ib. Since blue
flame was not really supported, we remove all the code that is related to
blue flame and let all consumers use the same blue flame register.
Once blue flame is supported we will add the code back. As part of this
patch we also move the definition of struct mlx5_bf to mlx5_ib.h, as it
is only used by mlx5_ib.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c |   8 +-
 drivers/infiniband/hw/mlx5/main.c   |  28 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h|  11 ++-
 drivers/infiniband/hw/mlx5/qp.c |  73 +++
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h|   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c  |  16 +---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c   | 114 
 include/linux/mlx5/cq.h |   3 +-
 include/linux/mlx5/doorbell.h   |  32 +--
 include/linux/mlx5/driver.h |  19 
 11 files changed, 58 insertions(+), 252 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index bb7e91c..a28ec33 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
+   void __iomem *uar_page = mdev->priv.uar->map;
unsigned long irq_flags;
int ret = 0;
 
@@ -704,9 +704,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
	mlx5_cq_arm(&cq->mcq,
		    (flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
		    MLX5_CQ_DB_REQ_NOT_SOL : MLX5_CQ_DB_REQ_NOT,
-		    uar_page,
-		    MLX5_GET_DOORBELL_LOCK(&mdev->priv.cq_uar_lock),
-   to_mcq(ibcq)->mcq.cons_index);
+   uar_page, to_mcq(ibcq)->mcq.cons_index);
 
return ret;
 }
@@ -886,7 +884,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct 
mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.bfregi.uars[0].index;
+   *index = dev->mdev->priv.uar->index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index d5cf82b..e9f0830 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3074,8 +3074,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (mlx5_use_mad_ifc(dev))
get_ext_port_caps(dev);
 
-   MLX5_INIT_DOORBELL_LOCK(&dev->uar_lock);
-
if (!mlx5_lag_is_active(mdev))
name = "mlx5_%d";
else
@@ -3251,9 +3249,21 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (err)
goto err_odp;
 
+   dev->mdev->priv.uar = mlx5_get_uars_page(dev->mdev);
+   if (!dev->mdev->priv.uar)
+   goto err_q_cnt;
+
+   err = mlx5_alloc_bfreg(dev->mdev, &dev->bfreg, false, false);
+   if (err)
+   goto err_uar_page;
+
+   err = mlx5_alloc_bfreg(dev->mdev, &dev->fp_bfreg, false, true);
+   if (err)
+   goto err_bfreg;
+
	err = ib_register_device(&dev->ib_dev, NULL);
if (err)
-   goto err_q_cnt;
+   goto err_fp_bfreg;
 
err = create_umr_res(dev);
if (err)
@@ -3276,6 +3286,15 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 err_dev:
	ib_unregister_device(&dev->ib_dev);
 
+err_fp_bfreg:
+   mlx5_free_bfreg(dev->mdev, &dev->fp_bfreg);
+
+err_bfreg:
+   mlx5_free_bfreg(dev->mdev, &dev->bfreg);
+
+err_uar_page:
+   mlx5_put_uars_page(dev->mdev, dev->mdev->priv.uar);
+
 err_q_cnt:
mlx5_ib_dealloc_q_counters(dev);
 
@@ -3307,6 +3326,9 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, 
void *context)
 
mlx5_remove_netdev_notifier(dev);
	ib_unregister_device(&dev->ib_dev);
+   mlx5_free_bfreg(dev->mdev, &dev->fp_bfreg);
+   mlx5_free_bfreg(dev->mdev, &dev->bfreg);
+   mlx5_put_uars_page(dev->mdev, mdev->priv.uar);
mlx5_ib_dealloc_q_counters(dev);
destroy_umrc_res(dev);
mlx5_ib_odp_remove_one(dev);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d4d1329..ae3bc4a 100644

[for-next 01/10] IB/mlx5: Fix kernel to user leak prevention logic

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

The logic was broken as it failed to update the response length for
architectures with PAGE_SIZE larger than 4kB. As a result further
extension of the ucontext response struct would fail.
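
The check relies on the field_avail() helper already defined in main.c;
roughly:

	/* true if the user's output buffer is big enough to receive 'fld' */
	#define field_avail(typ, fld, sz) (offsetof(typ, fld) + \
					   sizeof(((typ *)0)->fld) <= (sz))

The point of the fix below is that resp.response_length must be advanced
whenever the field fits in udata->outlen, even when the PAGE_SIZE check
fails; otherwise fields added after hca_core_clock_offset could never be
returned on 64KB-page systems.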

Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to 
user-space')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 86c61e7..852b5b7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1148,13 +1148,13 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 * pretend we don't support reading the HCA's core clock. This is also
 * forced by mmap function.
 */
-   if (PAGE_SIZE <= 4096 &&
-   field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
-   resp.comp_mask |=
-   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
-   resp.hca_core_clock_offset =
-   offsetof(struct mlx5_init_seg, internal_timer_h) %
-   PAGE_SIZE;
+   if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
+   if (PAGE_SIZE <= 4096) {
+   resp.comp_mask |=
+   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
+   resp.hca_core_clock_offset =
+   offsetof(struct mlx5_init_seg, internal_timer_h) % PAGE_SIZE;
+   }
resp.response_length += sizeof(resp.hca_core_clock_offset) +
sizeof(resp.reserved2);
}
-- 
2.7.4



[for-next 02/10] IB/mlx5: Fix error handling order in create_kernel_qp

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Make sure order of cleanup is exactly the opposite of initialization.

Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 53f4dd3..42d021cd 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -994,12 +994,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
return 0;
 
 err_wrid:
-   mlx5_db_free(dev->mdev, &qp->db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, &qp->db);
 
 err_free:
kvfree(*in);
@@ -1014,12 +1014,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 
 static void destroy_qp_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp)
 {
-   mlx5_db_free(dev->mdev, &qp->db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, &qp->db);
	mlx5_buf_free(dev->mdev, &qp->buf);
	free_uuar(&dev->mdev->priv.uuari, qp->bf->uuarn);
 }
-- 
2.7.4



[for-next 03/10] mlx5: Fix naming convention with respect to UARs

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

This establishes solid naming conventions for UARs. A UAR (User Access
Region) can have a size identical to a system page or can be a fixed 4KB,
depending on a value queried from firmware. Each UAR always has four blue
flame registers which are used to post doorbells to the send queue. In
addition, a UAR has a section used for posting doorbells to CQs or EQs. In
this patch we change names to reflect these conventions.
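
The constants used throughout the series follow this layout (values as
introduced by this series; treat this as an illustrative summary):

	enum {
		MLX5_BFREGS_PER_UAR        = 4,
		MLX5_NON_FP_BFREGS_PER_UAR = 2,  /* regular doorbells */
		MLX5_FP_BFREGS_PER_UAR     = MLX5_BFREGS_PER_UAR -
					     MLX5_NON_FP_BFREGS_PER_UAR,
	};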

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c|   6 +-
 drivers/infiniband/hw/mlx5/main.c  |  80 +--
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |   6 +-
 drivers/infiniband/hw/mlx5/qp.c| 176 -
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  90 ++---
 include/linux/mlx5/device.h|   9 +-
 include/linux/mlx5/driver.h|  14 +-
 include/uapi/rdma/mlx5-abi.h   |  12 +-
 10 files changed, 206 insertions(+), 203 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b3ef47c..bb7e91c 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.uuari.uars[0].map;
+   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
unsigned long irq_flags;
int ret = 0;
 
@@ -790,7 +790,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->uuari.uars[0].index;
+   *index = to_mucontext(context)->bfregi.uars[0].index;
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
@@ -886,7 +886,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct 
mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.uuari.uars[0].index;
+   *index = dev->mdev->priv.bfregi.uars[0].index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 852b5b7..d5cf82b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -999,12 +999,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
struct mlx5_ib_alloc_ucontext_req_v2 req = {};
struct mlx5_ib_alloc_ucontext_resp resp = {};
struct mlx5_ib_ucontext *context;
-   struct mlx5_uuar_info *uuari;
+   struct mlx5_bfreg_info *bfregi;
struct mlx5_uar *uars;
-   int gross_uuars;
+   int gross_bfregs;
int num_uars;
int ver;
-   int uuarn;
+   int bfregn;
int err;
int i;
size_t reqlen;
@@ -1032,10 +1032,10 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
if (req.flags)
return ERR_PTR(-EINVAL);
 
-   if (req.total_num_uuars > MLX5_MAX_UUARS)
+   if (req.total_num_bfregs > MLX5_MAX_BFREGS)
return ERR_PTR(-ENOMEM);
 
-   if (req.total_num_uuars == 0)
+   if (req.total_num_bfregs == 0)
return ERR_PTR(-EINVAL);
 
if (req.comp_mask || req.reserved0 || req.reserved1 || req.reserved2)
@@ -1046,13 +1046,13 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 reqlen - sizeof(req)))
return ERR_PTR(-EOPNOTSUPP);
 
-   req.total_num_uuars = ALIGN(req.total_num_uuars,
-   MLX5_NON_FP_BF_REGS_PER_PAGE);
-   if (req.num_low_latency_uuars > req.total_num_uuars - 1)
+   req.total_num_bfregs = ALIGN(req.total_num_bfregs,
+   MLX5_NON_FP_BFREGS_PER_UAR);
+   if (req.num_low_latency_bfregs > req.total_num_bfregs - 1)
return ERR_PTR(-EINVAL);
 
-   num_uars = req.total_num_uuars / MLX5_NON_FP_BF_REGS_PER_PAGE;
-   gross_uuars = num_uars * MLX5_BF_REGS_PER_PAGE;
+   num_uars = req.total_num_bfregs / MLX5_NON_FP_BFREGS_PER_UAR;
+   gross_bfregs = num_uars * MLX5_BFREGS_PER_UAR;
resp.qp_tab_size = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp);
if (mlx5_core_is_pf(dev->mdev) && MLX5_CAP_GEN(dev->mdev, bf))
 

[for-next 00/10][pull request] Mellanox 100G mlx5 4K UAR support

2017-01-03 Thread Saeed Mahameed
Hi Dave and Doug,

Following the mlx5-odp submission, you can find here the 2nd mlx5
submission for 4.11 as a pull-request including mlx5 4K UAR support from
Eli Cohen (details below).  For you Doug, this pull request will provide 
you with both mlx5 odp and mlx5 4k UAR since it is based on Dave's
net-next mlx5-odp merge commit.

Thank you,
Saeed.

---

The following changes since commit 525dfa2cdce4f5ab76251b5e57ebabf4f2dfc40c:

  Merge branch 'mlx5-odp' (2017-01-02 15:51:21 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-4kuar-for-4.11

for you to fetch changes up to 9f02e17b1c2976a62ba7bdd99530b437b31a454d:

  net/mlx5: Activate support for 4K UARs (2017-01-03 23:00:03 +0200)


mlx5 4K UAR

The following series of patches optimizes the usage of the UAR area which is
contained within BARs 0-1. Previous versions of the firmware and the driver
assumed each system page contains a single UAR. This patch set will query the
firmware for a new capability that, if published, means that the firmware can
support UARs of a fixed 4KB size regardless of the system page size. In the
case of powerpc, where the page size equals 64KB, this means we can utilize
16 UARs per system page. Since user space processes by default consume eight
UARs per context, this means that with this change a process will need only a
single system page to fulfill that requirement, and can in fact make use of
more UARs, which is better in terms of performance.

In addition to optimizing user-space processes, we introduce an allocator
that can be used by kernel consumers to allocate blue flame registers
(which are areas within a UAR that are used to write doorbells). This provides
further optimization of the UAR area, since the Ethernet driver used to make
use of a single blue flame register per system page and now it will use two
blue flame registers per 4KB.
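
The arithmetic, as a sketch (the driver actually reads num_of_uars_per_page
from firmware rather than computing it):

	/* with uar_4k set, a UAR is fixed at MLX5_ADAPTER_PAGE_SIZE (4KB) */
	int uars_per_sys_page   = PAGE_SIZE / MLX5_ADAPTER_PAGE_SIZE;      /* 64K/4K = 16 */
	int bfregs_per_sys_page = uars_per_sys_page * MLX5_BFREGS_PER_UAR; /* 16*4 = 64 */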

The series also makes changes to naming conventions and now the terms used in
the driver code match the terms used in the PRM (programmers reference manual).
Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
register).

In order to support compatibility between different versions of
library/driver/firmware, the library has now means to notify the kernel driver
that it supports the new scheme and the kernel can notify the library if it
supports this extension. So mixed versions of libraries can run concurrently
without any issues.

As an additional cleanup, we explicitly requested support of 64bit in mlx5
core Kconfig.

Thanks,
Eli and Matan


Eli Cohen (10):
  IB/mlx5: Fix kernel to user leak prevention logic
  IB/mlx5: Fix error handling order in create_kernel_qp
  mlx5: Fix naming convention with respect to UARs
  IB/mlx5: Fix retrieval of index to first hi class bfreg
  net/mlx5: Introduce blue flame register allocator
  net/mlx5: Add interface to get reference to a UAR
  IB/mlx5: Use blue flame register allocator in mlx5_ib
  IB/mlx5: Allow future extension of libmlx5 input data
  IB/mlx5: Support 4k UAR for libmlx5
  net/mlx5: Activate support for 4K UARs

drivers/infiniband/hw/mlx5/cq.c|  10 +-
 drivers/infiniband/hw/mlx5/main.c  | 278 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  32 +-
 drivers/infiniband/hw/mlx5/qp.c| 290 +++--
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  26 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 351 +
 include/linux/mlx5/cq.h|   5 +-
 include/linux/mlx5/device.h|  23 +-
 include/linux/mlx5/doorbell.h  |  32 +-
 include/linux/mlx5/driver.h|  81 ++---
 include/linux/mlx5/mlx5_ifc.h  |   7 +-
 include/uapi/rdma/mlx5-abi.h   |  19 +-
 18 files changed, 670 insertions(+), 546 deletions(-)

-- 
2.7.4



[for-next 05/10] net/mlx5: Introduce blue flame register allocator

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Here is an implementation of an allocator that allocates blue flame
registers. A blue flame register is used for generating send doorbells.
A blue flame register can be used to generate either a regular doorbell
or a blue flame doorbell, where the data to be sent is written to the
device's I/O memory, hence saving the need to read the data from memory.
For blue flame doorbells to succeed, the blue flame register needs to be
mapped as write combining. The user can specify what kind of send
doorbells she wishes to use. If she requested a write combining mapping
but that failed, the allocator will fall back to a non write combining
mapping and will indicate that to the user.
Subsequent patches in this series will make use of this allocator.
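
Expected usage, sketched from how later patches in the series consume it
(assuming mlx5_sq_bfreg carries a wc flag reporting whether the write
combining request held):

	struct mlx5_sq_bfreg bfreg;
	int err;

	err = mlx5_alloc_bfreg(mdev, &bfreg, true /* map_wc */, false /* fast_path */);
	if (err)
		return err;
	if (!bfreg.wc)
		; /* fell back to non-WC: doorbells work, blue flame won't */
	/* ... */
	mlx5_free_bfreg(mdev, &bfreg);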

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c | 235 ++
 include/linux/mlx5/device.h   |   2 +
 include/linux/mlx5/driver.h   |  37 
 include/linux/mlx5/mlx5_ifc.h |   7 +-
 4 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c 
b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index ce7fceb..6a081a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -231,3 +231,238 @@ void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, 
struct mlx5_uar *uar)
mlx5_cmd_free_uar(mdev, uar->index);
 }
 EXPORT_SYMBOL(mlx5_unmap_free_uar);
+
+static int uars_per_sys_page(struct mlx5_core_dev *mdev)
+{
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   return MLX5_CAP_GEN(mdev, num_of_uars_per_page);
+
+   return 1;
+}
+
+static u64 uar2pfn(struct mlx5_core_dev *mdev, u32 index)
+{
+   u32 system_page_index;
+
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   system_page_index = index >> (PAGE_SHIFT - 
MLX5_ADAPTER_PAGE_SHIFT);
+   else
+   system_page_index = index;
+
+   return (pci_resource_start(mdev->pdev, 0) >> PAGE_SHIFT) + 
system_page_index;
+}
+
+static void up_rel_func(struct kref *kref)
+{
+   struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, 
ref_count);
+
+   list_del(&up->list);
+   if (mlx5_cmd_free_uar(up->mdev, up->index))
+   mlx5_core_warn(up->mdev, "failed to free uar index %d\n", 
up->index);
+   kfree(up->reg_bitmap);
+   kfree(up->fp_bitmap);
+   kfree(up);
+}
+
+static struct mlx5_uars_page *alloc_uars_page(struct mlx5_core_dev *mdev,
+ bool map_wc)
+{
+   struct mlx5_uars_page *up;
+   int err = -ENOMEM;
+   phys_addr_t pfn;
+   int bfregs;
+   int i;
+
+   bfregs = uars_per_sys_page(mdev) * MLX5_BFREGS_PER_UAR;
+   up = kzalloc(sizeof(*up), GFP_KERNEL);
+   if (!up)
+   return ERR_PTR(err);
+
+   up->mdev = mdev;
+   up->reg_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->reg_bitmap)
+   goto error1;
+
+   up->fp_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->fp_bitmap)
+   goto error1;
+
+   for (i = 0; i < bfregs; i++)
+   if ((i % MLX5_BFREGS_PER_UAR) < MLX5_NON_FP_BFREGS_PER_UAR)
+   set_bit(i, up->reg_bitmap);
+   else
+   set_bit(i, up->fp_bitmap);
+
+   up->bfregs = bfregs;
+   up->fp_avail = bfregs * MLX5_FP_BFREGS_PER_UAR / MLX5_BFREGS_PER_UAR;
+   up->reg_avail = bfregs * MLX5_NON_FP_BFREGS_PER_UAR / 
MLX5_BFREGS_PER_UAR;
+
+   err = mlx5_cmd_alloc_uar(mdev, &up->index);
+   if (err) {
+   mlx5_core_warn(mdev, "mlx5_cmd_alloc_uar() failed, %d\n", err);
+   goto error1;
+   }
+
+   pfn = uar2pfn(mdev, up->index);
+   if (map_wc) {
+   up->map = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -EAGAIN;
+   goto error2;
+   }
+   } else {
+   up->map = ioremap(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -ENOMEM;
+   goto error2;
+   }
+   }
+   kref_init(&up->ref_count);
+   mlx5_core_dbg(mdev, "allocated UAR page: index %d, total bfregs %d\n",
+ up->index, up->bfregs);
+   return up;
+
+error2:
+   if (mlx5_cmd_free_uar(mdev, up->index))
+   mlx5_core_warn(mdev, "failed to free uar index 

[for-next 10/10] net/mlx5: Activate support for 4K UARs

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Activate 4K UAR support for firmware versions that support it.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ff1f144..a16ee16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -530,6 +530,10 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
/* disable cmdif checksum */
MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0);
 
+   /* If the HCA supports 4K UARs use it */
+   if (MLX5_CAP_GEN_MAX(dev, uar_4k))
+   MLX5_SET(cmd_hca_cap, set_hca_cap, uar_4k, 1);
+
MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
 
err = set_caps(dev, set_ctx, set_sz,
-- 
2.7.4



[for-next 09/10] IB/mlx5: Support 4k UAR for libmlx5

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Add fields to structs to convey to the kernel an indication of whether the
library supports multiple UARs per page, and return to the library the size
of a UAR based on the queried value.
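
Library-side, the new response fields are meant to be consumed roughly as
follows (a sketch, not actual libmlx5 code):

	size_t uar_size      = 1ULL << resp.log_uar_size; /* 4KB or system page */
	int    uars_per_page = resp.num_uars_per_page;    /* 1 when uar_4k is off */
	/* per-UAR mmap length stays one system page:
	 * uar_size * uars_per_page == system page size */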

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  | 21 +++-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 ++--
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 21 
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 56 --
 include/linux/mlx5/cq.h|  2 +-
 include/linux/mlx5/driver.h| 12 -
 include/uapi/rdma/mlx5-abi.h   |  7 +++
 9 files changed, 42 insertions(+), 100 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 6640672..a191b93 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,12 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 
port, int mask,
return err;
 }
 
+static void print_lib_caps(struct mlx5_ib_dev *dev, u64 caps)
+{
+   mlx5_ib_dbg(dev, "MLX5_LIB_CAP_4K_UAR = %s\n",
+   caps & MLX5_LIB_CAP_4K_UAR ? "y" : "n");
+}
+
 static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
 struct mlx5_ib_alloc_ucontext_req_v2 *req,
 u32 *num_sys_pages)
@@ -1122,6 +1128,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
resp.cqe_version = min_t(__u8,
 (__u8)MLX5_CAP_GEN(dev->mdev, cqe_version),
 req.max_cqe_version);
+   resp.log_uar_size = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_ADAPTER_PAGE_SHIFT : PAGE_SHIFT;
+   resp.num_uars_per_page = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_CAP_GEN(dev->mdev, num_of_uars_per_page) : 1;
resp.response_length = min(offsetof(typeof(resp), response_length) +
   sizeof(resp.response_length), udata->outlen);
 
@@ -1129,7 +1139,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (!context)
return ERR_PTR(-ENOMEM);
 
-   lib_uar_4k = false;
+   lib_uar_4k = req.lib_caps & MLX5_LIB_CAP_4K_UAR;
	bfregi = &context->bfregi;
 
/* updates req->total_num_bfregs */
@@ -1209,6 +1219,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
sizeof(resp.reserved2);
}
 
+   if (field_avail(typeof(resp), log_uar_size, udata->outlen))
+   resp.response_length += sizeof(resp.log_uar_size);
+
+   if (field_avail(typeof(resp), num_uars_per_page, udata->outlen))
+   resp.response_length += sizeof(resp.num_uars_per_page);
+
	err = ib_copy_to_udata(udata, &resp, resp.response_length);
if (err)
goto out_td;
@@ -1216,7 +1232,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
bfregi->ver = ver;
bfregi->num_low_latency_bfregs = req.num_low_latency_bfregs;
context->cqe_version = resp.cqe_version;
-   context->lib_caps = false;
+   context->lib_caps = req.lib_caps;
+   print_lib_caps(dev, context->lib_caps);
 
	return &context->ibucontext;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 32d4af9..336d473 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -179,6 +179,8 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file 
system\n",
  cq->cqn);
 
+   cq->uar = dev->priv.uar;
+
return 0;
 
 err_cmd:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e3ef0b5..396c63d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -465,7 +465,6 @@ struct mlx5e_sq {
/* read only */
struct mlx5_wq_cyc wq;
u32dma_fifo_mask;
-   void __iomem  *uar_map;
struct netdev_queue   *txq;
u32sqn;
u16bf_buf_size;
@@ -479,7 +478

[for-next 04/10] IB/mlx5: Fix retrieval of index to first hi class bfreg

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Fix the function retrieving the index of the first high latency class
blue flame register. High latency class bfregs are located right above
medium latency class bfregs.
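
The bfreg index space of a context, for reference (a conceptual sketch
matching the allocator code below):

	/*
	 * index 0              - regular 64-bit doorbells, shared, no lock
	 * 1 .. num_med         - medium latency class, shared blue flame
	 * num_med+1 .. last    - high latency class, dedicated blue flame
	 */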

Fixes: c1be5232d21d ('IB/mlx5: Fix micro UAR allocator')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index fbea9bd..240fbb0 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -490,12 +490,21 @@ static int next_bfreg(int n)
return n;
 }
 
+enum {
+   /* this is the first blue flame register in the array of bfregs assigned
+    * to a process. Since we do not use it for blue flame but rather
+    * regular 64 bit doorbells, we do not need a lock for maintaining
+* "odd/even" order
+*/
+   NUM_NON_BLUE_FLAME_BFREGS = 1,
+};
+
 static int num_med_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int n;
 
n = bfregi->num_uars * MLX5_NON_FP_BFREGS_PER_UAR -
-   bfregi->num_low_latency_bfregs - 1;
+   bfregi->num_low_latency_bfregs - NUM_NON_BLUE_FLAME_BFREGS;
 
return n >= 0 ? n : 0;
 }
@@ -508,17 +517,9 @@ static int max_bfregi(struct mlx5_bfreg_info *bfregi)
 static int first_hi_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int med;
-   int i;
-   int t;
 
med = num_med_bfreg(bfregi);
-   for (t = 0, i = first_med_bfreg();; i = next_bfreg(i)) {
-   t++;
-   if (t == med)
-   return next_bfreg(i);
-   }
-
-   return 0;
+   return next_bfreg(med);
 }
 
 static int alloc_high_class_bfreg(struct mlx5_bfreg_info *bfregi)
@@ -544,6 +545,8 @@ static int alloc_med_class_bfreg(struct mlx5_bfreg_info 
*bfregi)
for (i = first_med_bfreg(); i < first_hi_bfreg(bfregi); i = 
next_bfreg(i)) {
if (bfregi->count[i] < bfregi->count[minidx])
minidx = i;
+   if (!bfregi->count[minidx])
+   break;
}
 
bfregi->count[minidx]++;
@@ -558,6 +561,7 @@ static int alloc_bfreg(struct mlx5_bfreg_info *bfregi,
mutex_lock(>lock);
switch (lat) {
case MLX5_IB_LATENCY_CLASS_LOW:
+   BUILD_BUG_ON(NUM_NON_BLUE_FLAME_BFREGS != 1);
bfregn = 0;
bfregi->count[bfregn]++;
break;
-- 
2.7.4



[for-next 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

A reference to a UAR is required to generate CQ or EQ doorbells. Since
CQ or EQ doorbells can all be generated using the same UAR area without
any effect on performance, we just get a reference to any available
UAR. If one is not available, we allocate it, but we don't waste the
blue flame registers it can provide; we will use them for subsequent
allocations.
We get a reference to such a UAR and put it in mlx5_priv so any kernel
consumer can make use of it.
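
As an illustration, the intended usage pattern for a kernel consumer is
roughly the following. The function names match the interface this patch
introduces (and the way mlx5_ib_add() uses it later in this series); the
surrounding error handling is only a sketch.

/* Sketch: take a reference to a shared UAR page for CQ/EQ doorbells
 * and drop it on teardown.
 */
mdev->priv.uar = mlx5_get_uars_page(mdev);
if (!mdev->priv.uar)
        return -ENOMEM;

/* ... CQ/EQ doorbells are generated through mdev->priv.uar->map ... */

mlx5_put_uars_page(mdev, mdev->priv.uar);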

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 14 ---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 22 ++
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 32 ++
 include/linux/mlx5/driver.h|  5 +++-
 4 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 11a8d63..9849ee9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -512,7 +512,7 @@ static void init_eq_buf(struct mlx5_eq *eq)
 
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 
vecidx,
   int nent, u64 mask, const char *name,
-  struct mlx5_uar *uar, enum mlx5_eq_type type)
+  enum mlx5_eq_type type)
 {
u32 out[MLX5_ST_SZ_DW(create_eq_out)] = {0};
struct mlx5_priv *priv = >priv;
@@ -556,7 +556,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq, u8 vecidx,
 
eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
-   MLX5_SET(eqc, eqc, uar_page, uar->index);
+   MLX5_SET(eqc, eqc, uar_page, priv->uar->index);
MLX5_SET(eqc, eqc, intr, vecidx);
MLX5_SET(eqc, eqc, log_page_size,
 eq->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
@@ -571,7 +571,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq, u8 vecidx,
eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
eq->irqn = priv->msix_arr[vecidx].vector;
eq->dev = dev;
-   eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
+   eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
err = request_irq(eq->irqn, handler, 0,
  priv->irq_info[vecidx].name, eq);
if (err)
@@ -686,8 +686,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, >cmd_eq, MLX5_EQ_VEC_CMD,
 MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
-"mlx5_cmd_eq", >priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_cmd_eq",  MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create cmd EQ %d\n", err);
return err;
@@ -697,8 +696,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, >async_eq, MLX5_EQ_VEC_ASYNC,
 MLX5_NUM_ASYNC_EQE, async_event_mask,
-"mlx5_async_eq", >priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_async_eq", MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create async EQ %d\n", err);
goto err1;
@@ -708,7 +706,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_EQ_VEC_PAGES,
 /* TODO: sriov max_vf + */ 1,
 1 << MLX5_EVENT_TYPE_PAGE_REQUEST, 
"mlx5_pages_eq",
->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create pages EQ %d\n", err);
@@ -722,7 +719,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_NUM_ASYNC_EQE,
 1 << MLX5_EVENT_TYPE_PAGE_FAULT,
 "mlx5_page_fault_eq",
->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_PF);
if (err) {
mlx5_core_warn(dev, "failed to create page fault EQ 
%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/co

[for-next 08/10] IB/mlx5: Allow future extension of libmlx5 input data

2017-01-03 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

The current check requires that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so new libraries passing additional information to
the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified
by old kernels that do not support their request by failing the
operation. This scheme is problematic since it requires libmlx5 to issue
the requests with descending input size for struct
mlx5_ib_alloc_ucontext_req_v2.

To avoid this, we require that new features obey the following rules:
If the feature requires one or more fields in the response and at
least one of the fields can be encoded such that a zero value means the
kernel ignored the request, then this field will provide the indication
to the library. If no response is required or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.
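
As an illustration of these rules: the kernel copies a new response
field only when the library's output buffer is large enough to hold it,
and a zero value left in such a field tells the library its request was
ignored. The field_avail() check used throughout this series could
plausibly be defined as below; the definition is an assumption
consistent with its call sites, not part of this patch.

/* Does the user's output buffer of size sz cover response field fld? */
#define field_avail(type, fld, sz) \
        (offsetof(type, fld) + sizeof(((type *)0)->fld) <= (sz))

        if (field_avail(typeof(resp), log_uar_size, udata->outlen))
                resp.response_length += sizeof(resp.log_uar_size);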

Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in 
init_ucontext')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c  |   2 +-
 drivers/infiniband/hw/mlx5/main.c| 201 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  15 ++-
 drivers/infiniband/hw/mlx5/qp.c  | 133 ++-
 include/linux/mlx5/device.h  |  12 ++-
 include/linux/mlx5/driver.h  |  12 +--
 6 files changed, 209 insertions(+), 166 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index a28ec33..31803b3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -788,7 +788,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->bfregi.uars[0].index;
+   *index = to_mucontext(context)->bfregi.sys_pages[0];
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index e9f0830..6640672 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,80 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 
port, int mask,
return err;
 }
 
+static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
+struct mlx5_ib_alloc_ucontext_req_v2 *req,
+u32 *num_sys_pages)
+{
+   int uars_per_sys_page;
+   int bfregs_per_sys_page;
+   int ref_bfregs = req->total_num_bfregs;
+
+   if (req->total_num_bfregs == 0)
+   return -EINVAL;
+
+   BUILD_BUG_ON(MLX5_MAX_BFREGS % MLX5_NON_FP_BFREGS_IN_PAGE);
+   BUILD_BUG_ON(MLX5_MAX_BFREGS < MLX5_NON_FP_BFREGS_IN_PAGE);
+
+   if (req->total_num_bfregs > MLX5_MAX_BFREGS)
+   return -ENOMEM;
+
+   uars_per_sys_page = get_uars_per_sys_page(dev, lib_uar_4k);
+   bfregs_per_sys_page = uars_per_sys_page * MLX5_NON_FP_BFREGS_PER_UAR;
+   req->total_num_bfregs = ALIGN(req->total_num_bfregs, 
bfregs_per_sys_page);
+   *num_sys_pages = req->total_num_bfregs / bfregs_per_sys_page;
+
+   if (req->num_low_latency_bfregs > req->total_num_bfregs - 1)
+   return -EINVAL;
+
+   mlx5_ib_dbg(dev, "uar_4k: fw support %s, lib support %s, user requested 
%d bfregs, allocated %d, using %d sys pages\n",
+   MLX5_CAP_GEN(dev->mdev, uar_4k) ? "yes" : "no",
+   lib_uar_4k ? "yes" : "no", ref_bfregs,
+   req->total_num_bfregs, *num_sys_pages);
+
+   return 0;
+}
+
+static int allocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+   bfregi = >bfregi;
+   for (i = 0; i < bfregi->num_sys_pages; i++) {
+   err = mlx5_cmd_alloc_uar(dev->mdev, >sys_pages[i]);
+   if (err)
+   goto error;
+
+   mlx5_ib_dbg(dev, "allocated uar %d\n", bfregi->sys_pages[i]);
+   }
+   return 0;
+
+error:
+   for (--i; i >= 0; i--)
+   if (mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]))
+   mlx5_ib_warn(dev, "failed to free uar %d\n", i);
+
+   return err;
+}
+
+static int deallocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+

Re: [for-next V2 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-09 Thread Saeed Mahameed
On Mon, Jan 9, 2017 at 5:47 PM, David Miller <da...@davemloft.net> wrote:
> From: Saeed Mahameed <sae...@dev.mellanox.co.il>
> Date: Mon, 9 Jan 2017 10:31:36 +0200
>
>> We will submit an incremental patch for this, as checkpatch doesn't
>> complain about such minor things.
>
> Please fix this and resubmit the series.
>

Sure, will do this.

> Checkpatch not complaining is not an argument for fixing up coding
> style issues reported to you in feedback.

I just thought that this pull request has been sitting on the mailing
list for too long and we need to move on.
Such minor issues can be fixed in incremental patches that don't
block net-next and rdma submissions.

Anyway will fix and submit v3.

thanks,
Saeed.


Re: [PATCH for-next V2 00/11] Mellanox mlx5 core and ODP updates 2017-01-01

2017-01-02 Thread Saeed Mahameed
On Mon, Jan 2, 2017 at 10:53 PM, David Miller <da...@davemloft.net> wrote:
> From: Saeed Mahameed <sae...@mellanox.com>
> Date: Mon,  2 Jan 2017 11:37:37 +0200
>
>> The following eleven patches mainly come from Artemy Kovalyov
>> who expanded mlx5 on-demand-paging (ODP) support. In addition
>> there are three cleanup patches which don't change any functionality,
>> but are needed to align codebase prior accepting other patches.
>
> Series applied to net-next, thanks.

Whoops,

This series was meant as a pull request; you can blame it on me, I
kinda messed up the V2 title.
Doug will have to pull the same patches later; will this produce a
conflict in the merge window?

Sorry for the confusion.


Re: [PATCH net-next 6/7] net/mlx5: E-Switch, Add control for inline mode

2017-01-08 Thread Saeed Mahameed
On Sun, Jan 8, 2017 at 11:56 AM, Jiri Pirko <j...@resnulli.us> wrote:
> Mon, Nov 21, 2016 at 02:06:00PM CET, sae...@mellanox.com wrote:
>>From: Roi Dayan <r...@mellanox.com>
>>
>>Implement devlink show and set of HW inline-mode.
>>The supported modes: none, link, network, transport.
>>We currently support one mode for all vports so set is done on all vports.
>>When eswitch is first initialized the inline-mode is queried from the FW.
>>
>>Signed-off-by: Roi Dayan <r...@mellanox.com>
>>Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
>
> Saeed, could you please use get_maintainer script and cc those people
> for you submissions? Thanks!

Sure,

Or, Roi, please make sure you do this in your future work.
I will verify prior to submission of course.


[for-next V2 10/10] net/mlx5: Activate support for 4K UARs

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Activate 4K UAR support for firmware versions that support it.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ff1f144..a16ee16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -530,6 +530,10 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
/* disable cmdif checksum */
MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0);
 
+   /* If the HCA supports 4K UARs use it */
+   if (MLX5_CAP_GEN_MAX(dev, uar_4k))
+   MLX5_SET(cmd_hca_cap, set_hca_cap, uar_4k, 1);
+
MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
 
err = set_caps(dev, set_ctx, set_sz,
-- 
2.7.4



[for-next V2 03/10] mlx5: Fix naming convention with respect to UARs

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

This establishes solid naming conventions for UARs. A UAR (User Access
Region) can have a size identical to a system page or can be a fixed 4KB,
depending on a value queried by firmware. Each UAR always has 4 blue
flame registers, which are used to post doorbells to the send queue. In
addition, a UAR has a section used for posting doorbells to CQs or EQs.
In this patch we change names to reflect these conventions.
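
As an illustration, the per-UAR register accounting implied by the text
above can be written as follows. The total of 4 bfregs per UAR comes
from this commit message; the 2/2 split between fast-path and
non-fast-path bfregs is an assumption.

/* Sketch of the per-UAR blue flame register accounting. */
enum {
        MLX5_BFREGS_PER_UAR        = 4,
        MLX5_FP_BFREGS_PER_UAR     = 2,     /* assumed split */
        MLX5_NON_FP_BFREGS_PER_UAR = MLX5_BFREGS_PER_UAR -
                                     MLX5_FP_BFREGS_PER_UAR,
};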

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c|   6 +-
 drivers/infiniband/hw/mlx5/main.c  |  80 +--
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |   6 +-
 drivers/infiniband/hw/mlx5/qp.c| 176 -
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  90 ++---
 include/linux/mlx5/device.h|   9 +-
 include/linux/mlx5/driver.h|  14 +-
 include/uapi/rdma/mlx5-abi.h   |  12 +-
 10 files changed, 206 insertions(+), 203 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b3ef47c..bb7e91c 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.uuari.uars[0].map;
+   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
unsigned long irq_flags;
int ret = 0;
 
@@ -790,7 +790,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->uuari.uars[0].index;
+   *index = to_mucontext(context)->bfregi.uars[0].index;
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
@@ -886,7 +886,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct 
mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.uuari.uars[0].index;
+   *index = dev->mdev->priv.bfregi.uars[0].index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 852b5b7..d5cf82b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -999,12 +999,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
struct mlx5_ib_alloc_ucontext_req_v2 req = {};
struct mlx5_ib_alloc_ucontext_resp resp = {};
struct mlx5_ib_ucontext *context;
-   struct mlx5_uuar_info *uuari;
+   struct mlx5_bfreg_info *bfregi;
struct mlx5_uar *uars;
-   int gross_uuars;
+   int gross_bfregs;
int num_uars;
int ver;
-   int uuarn;
+   int bfregn;
int err;
int i;
size_t reqlen;
@@ -1032,10 +1032,10 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
if (req.flags)
return ERR_PTR(-EINVAL);
 
-   if (req.total_num_uuars > MLX5_MAX_UUARS)
+   if (req.total_num_bfregs > MLX5_MAX_BFREGS)
return ERR_PTR(-ENOMEM);
 
-   if (req.total_num_uuars == 0)
+   if (req.total_num_bfregs == 0)
return ERR_PTR(-EINVAL);
 
if (req.comp_mask || req.reserved0 || req.reserved1 || req.reserved2)
@@ -1046,13 +1046,13 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 reqlen - sizeof(req)))
return ERR_PTR(-EOPNOTSUPP);
 
-   req.total_num_uuars = ALIGN(req.total_num_uuars,
-   MLX5_NON_FP_BF_REGS_PER_PAGE);
-   if (req.num_low_latency_uuars > req.total_num_uuars - 1)
+   req.total_num_bfregs = ALIGN(req.total_num_bfregs,
+   MLX5_NON_FP_BFREGS_PER_UAR);
+   if (req.num_low_latency_bfregs > req.total_num_bfregs - 1)
return ERR_PTR(-EINVAL);
 
-   num_uars = req.total_num_uuars / MLX5_NON_FP_BF_REGS_PER_PAGE;
-   gross_uuars = num_uars * MLX5_BF_REGS_PER_PAGE;
+   num_uars = req.total_num_bfregs / MLX5_NON_FP_BFREGS_PER_UAR;
+   gross_bfregs = num_uars * MLX5_BFREGS_PER_UAR;
resp.qp_tab_size = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp);
if (mlx5_core_is_pf(dev->mdev) && MLX5_CAP_GEN(dev->mdev, bf))
 

[for-next V2 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Make use of the blue flame register allocator in mlx5_ib. Since blue
flame was not really supported, we remove all the code that is related to
blue flame and let all consumers use the same blue flame register.
Once blue flame is supported we will add the code. As part of this patch
we also move the definition of struct mlx5_bf to mlx5_ib.h, as it is only
used by mlx5_ib.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c|   8 +-
 drivers/infiniband/hw/mlx5/main.c  |  28 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  11 ++-
 drivers/infiniband/hw/mlx5/qp.c|  73 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  16 +---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 114 -
 include/linux/mlx5/cq.h|   3 +-
 include/linux/mlx5/doorbell.h  |   6 +-
 include/linux/mlx5/driver.h|  19 -
 10 files changed, 59 insertions(+), 221 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index bb7e91c..a28ec33 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
+   void __iomem *uar_page = mdev->priv.uar->map;
unsigned long irq_flags;
int ret = 0;
 
@@ -704,9 +704,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
mlx5_cq_arm(>mcq,
(flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
MLX5_CQ_DB_REQ_NOT_SOL : MLX5_CQ_DB_REQ_NOT,
-   uar_page,
-   MLX5_GET_DOORBELL_LOCK(>priv.cq_uar_lock),
-   to_mcq(ibcq)->mcq.cons_index);
+   uar_page, to_mcq(ibcq)->mcq.cons_index);
 
return ret;
 }
@@ -886,7 +884,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct 
mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.bfregi.uars[0].index;
+   *index = dev->mdev->priv.uar->index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index d5cf82b..e9f0830 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3074,8 +3074,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (mlx5_use_mad_ifc(dev))
get_ext_port_caps(dev);
 
-   MLX5_INIT_DOORBELL_LOCK(>uar_lock);
-
if (!mlx5_lag_is_active(mdev))
name = "mlx5_%d";
else
@@ -3251,9 +3249,21 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (err)
goto err_odp;
 
+   dev->mdev->priv.uar = mlx5_get_uars_page(dev->mdev);
+   if (!dev->mdev->priv.uar)
+   goto err_q_cnt;
+
+   err = mlx5_alloc_bfreg(dev->mdev, >bfreg, false, false);
+   if (err)
+   goto err_uar_page;
+
+   err = mlx5_alloc_bfreg(dev->mdev, >fp_bfreg, false, true);
+   if (err)
+   goto err_bfreg;
+
err = ib_register_device(>ib_dev, NULL);
if (err)
-   goto err_q_cnt;
+   goto err_fp_bfreg;
 
err = create_umr_res(dev);
if (err)
@@ -3276,6 +3286,15 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 err_dev:
ib_unregister_device(>ib_dev);
 
+err_fp_bfreg:
+   mlx5_free_bfreg(dev->mdev, >fp_bfreg);
+
+err_bfreg:
+   mlx5_free_bfreg(dev->mdev, >bfreg);
+
+err_uar_page:
+   mlx5_put_uars_page(dev->mdev, dev->mdev->priv.uar);
+
 err_q_cnt:
mlx5_ib_dealloc_q_counters(dev);
 
@@ -3307,6 +3326,9 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, 
void *context)
 
mlx5_remove_netdev_notifier(dev);
ib_unregister_device(>ib_dev);
+   mlx5_free_bfreg(dev->mdev, >fp_bfreg);
+   mlx5_free_bfreg(dev->mdev, >bfreg);
+   mlx5_put_uars_page(dev->mdev, mdev->priv.uar);
mlx5_ib_dealloc_q_counters(dev);
destroy_umrc_res(dev);
mlx5_ib_odp_remove_one(dev);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d4d1329..ae3bc4a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniba

[for-next V2 05/10] net/mlx5: Introduce blue flame register allocator

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Here is an implementation of an allocator for blue flame
registers. A blue flame register is used for generating send doorbells.
A blue flame register can be used to generate either a regular doorbell
or a blue flame doorbell, where the data to be sent is written to the
device's I/O memory, hence saving the need for the device to read the
data from memory. For blue flame doorbells to succeed, the blue flame
register needs to be mapped as write combining. The user can specify
what kind of send doorbells she wishes to use. If she requested a write
combining mapping but that failed, the allocator will fall back to a non
write combining mapping and will indicate that to the user.
Subsequent patches in this series will make use of this allocator.
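
As an illustration of the fallback behaviour described above, the
allocator's top-level entry point could look roughly like this. The
helper alloc_bfreg() and the struct mlx5_sq_bfreg type are assumptions;
-EAGAIN denoting a failed write-combining ioremap matches
alloc_uars_page() in the diff below.

/* Sketch: try a write-combining bfreg first, fall back to non-WC. */
int mlx5_alloc_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg,
                     bool map_wc, bool fast_path)
{
        int err;

        err = alloc_bfreg(mdev, bfreg, map_wc, fast_path);
        if (!err)
                return 0;

        /* WC mapping failed; retry without it so the caller still gets
         * a register usable for regular doorbells.
         */
        if (err == -EAGAIN && map_wc)
                return alloc_bfreg(mdev, bfreg, false, fast_path);

        return err;
}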

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c | 235 ++
 include/linux/mlx5/device.h   |   2 +
 include/linux/mlx5/driver.h   |  37 
 include/linux/mlx5/mlx5_ifc.h |   7 +-
 4 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c 
b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index ce7fceb..6a081a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -231,3 +231,238 @@ void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, 
struct mlx5_uar *uar)
mlx5_cmd_free_uar(mdev, uar->index);
 }
 EXPORT_SYMBOL(mlx5_unmap_free_uar);
+
+static int uars_per_sys_page(struct mlx5_core_dev *mdev)
+{
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   return MLX5_CAP_GEN(mdev, num_of_uars_per_page);
+
+   return 1;
+}
+
+static u64 uar2pfn(struct mlx5_core_dev *mdev, u32 index)
+{
+   u32 system_page_index;
+
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   system_page_index = index >> (PAGE_SHIFT - 
MLX5_ADAPTER_PAGE_SHIFT);
+   else
+   system_page_index = index;
+
+   return (pci_resource_start(mdev->pdev, 0) >> PAGE_SHIFT) + 
system_page_index;
+}
+
+static void up_rel_func(struct kref *kref)
+{
+   struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, 
ref_count);
+
+   list_del(>list);
+   if (mlx5_cmd_free_uar(up->mdev, up->index))
+   mlx5_core_warn(up->mdev, "failed to free uar index %d\n", 
up->index);
+   kfree(up->reg_bitmap);
+   kfree(up->fp_bitmap);
+   kfree(up);
+}
+
+static struct mlx5_uars_page *alloc_uars_page(struct mlx5_core_dev *mdev,
+ bool map_wc)
+{
+   struct mlx5_uars_page *up;
+   int err = -ENOMEM;
+   phys_addr_t pfn;
+   int bfregs;
+   int i;
+
+   bfregs = uars_per_sys_page(mdev) * MLX5_BFREGS_PER_UAR;
+   up = kzalloc(sizeof(*up), GFP_KERNEL);
+   if (!up)
+   return ERR_PTR(err);
+
+   up->mdev = mdev;
+   up->reg_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->reg_bitmap)
+   goto error1;
+
+   up->fp_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->fp_bitmap)
+   goto error1;
+
+   for (i = 0; i < bfregs; i++)
+   if ((i % MLX5_BFREGS_PER_UAR) < MLX5_NON_FP_BFREGS_PER_UAR)
+   set_bit(i, up->reg_bitmap);
+   else
+   set_bit(i, up->fp_bitmap);
+
+   up->bfregs = bfregs;
+   up->fp_avail = bfregs * MLX5_FP_BFREGS_PER_UAR / MLX5_BFREGS_PER_UAR;
+   up->reg_avail = bfregs * MLX5_NON_FP_BFREGS_PER_UAR / 
MLX5_BFREGS_PER_UAR;
+
+   err = mlx5_cmd_alloc_uar(mdev, >index);
+   if (err) {
+   mlx5_core_warn(mdev, "mlx5_cmd_alloc_uar() failed, %d\n", err);
+   goto error1;
+   }
+
+   pfn = uar2pfn(mdev, up->index);
+   if (map_wc) {
+   up->map = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -EAGAIN;
+   goto error2;
+   }
+   } else {
+   up->map = ioremap(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -ENOMEM;
+   goto error2;
+   }
+   }
+   kref_init(>ref_count);
+   mlx5_core_dbg(mdev, "allocated UAR page: index %d, total bfregs %d\n",
+ up->index, up->bfregs);
+   return up;
+
+error2:
+   if (mlx5_cmd_free_uar(mdev, up->index))
+   mlx5_core_warn(mdev, "failed to free uar index 

[for-next V2 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

A reference to a UAR is required to generate CQ or EQ doorbells. Since
CQ or EQ doorbells can all be generated using the same UAR area without
any effect on performance, we just get a reference to any available
UAR. If one is not available, we allocate it, but we don't waste the
blue flame registers it can provide; we will use them for subsequent
allocations.
We get a reference to such a UAR and put it in mlx5_priv so any kernel
consumer can make use of it.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 14 ---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 22 ++
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 32 ++
 include/linux/mlx5/driver.h|  5 +++-
 4 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 11a8d63..9849ee9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -512,7 +512,7 @@ static void init_eq_buf(struct mlx5_eq *eq)
 
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 
vecidx,
   int nent, u64 mask, const char *name,
-  struct mlx5_uar *uar, enum mlx5_eq_type type)
+  enum mlx5_eq_type type)
 {
u32 out[MLX5_ST_SZ_DW(create_eq_out)] = {0};
struct mlx5_priv *priv = >priv;
@@ -556,7 +556,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq, u8 vecidx,
 
eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
-   MLX5_SET(eqc, eqc, uar_page, uar->index);
+   MLX5_SET(eqc, eqc, uar_page, priv->uar->index);
MLX5_SET(eqc, eqc, intr, vecidx);
MLX5_SET(eqc, eqc, log_page_size,
 eq->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
@@ -571,7 +571,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq, u8 vecidx,
eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
eq->irqn = priv->msix_arr[vecidx].vector;
eq->dev = dev;
-   eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
+   eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
err = request_irq(eq->irqn, handler, 0,
  priv->irq_info[vecidx].name, eq);
if (err)
@@ -686,8 +686,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, >cmd_eq, MLX5_EQ_VEC_CMD,
 MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
-"mlx5_cmd_eq", >priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_cmd_eq",  MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create cmd EQ %d\n", err);
return err;
@@ -697,8 +696,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, >async_eq, MLX5_EQ_VEC_ASYNC,
 MLX5_NUM_ASYNC_EQE, async_event_mask,
-"mlx5_async_eq", >priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_async_eq", MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create async EQ %d\n", err);
goto err1;
@@ -708,7 +706,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_EQ_VEC_PAGES,
 /* TODO: sriov max_vf + */ 1,
 1 << MLX5_EVENT_TYPE_PAGE_REQUEST, 
"mlx5_pages_eq",
->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create pages EQ %d\n", err);
@@ -722,7 +719,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_NUM_ASYNC_EQE,
 1 << MLX5_EVENT_TYPE_PAGE_FAULT,
 "mlx5_page_fault_eq",
->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_PF);
if (err) {
mlx5_core_warn(dev, "failed to create page fault EQ 
%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/co

[for-next V2 04/10] IB/mlx5: Fix retrieval of index to first hi class bfreg

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Fix the function retrieving the index of the first high latency class
blue flame register. High latency class bfregs are located right above
medium latency class bfregs.

Fixes: c1be5232d21d ('IB/mlx5: Fix micro UAR allocator')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index fbea9bd..240fbb0 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -490,12 +490,21 @@ static int next_bfreg(int n)
return n;
 }
 
+enum {
+   /* this is the first blue flame register in the array of bfregs assigned
+* to a process. Since we do not use it for blue flame but rather
+* regular 64 bit doorbells, we do not need a lock for maintaining
+* "odd/even" order
+*/
+   NUM_NON_BLUE_FLAME_BFREGS = 1,
+};
+
 static int num_med_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int n;
 
n = bfregi->num_uars * MLX5_NON_FP_BFREGS_PER_UAR -
-   bfregi->num_low_latency_bfregs - 1;
+   bfregi->num_low_latency_bfregs - NUM_NON_BLUE_FLAME_BFREGS;
 
return n >= 0 ? n : 0;
 }
@@ -508,17 +517,9 @@ static int max_bfregi(struct mlx5_bfreg_info *bfregi)
 static int first_hi_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int med;
-   int i;
-   int t;
 
med = num_med_bfreg(bfregi);
-   for (t = 0, i = first_med_bfreg();; i = next_bfreg(i)) {
-   t++;
-   if (t == med)
-   return next_bfreg(i);
-   }
-
-   return 0;
+   return next_bfreg(med);
 }
 
 static int alloc_high_class_bfreg(struct mlx5_bfreg_info *bfregi)
@@ -544,6 +545,8 @@ static int alloc_med_class_bfreg(struct mlx5_bfreg_info 
*bfregi)
for (i = first_med_bfreg(); i < first_hi_bfreg(bfregi); i = 
next_bfreg(i)) {
if (bfregi->count[i] < bfregi->count[minidx])
minidx = i;
+   if (!bfregi->count[minidx])
+   break;
}
 
bfregi->count[minidx]++;
@@ -558,6 +561,7 @@ static int alloc_bfreg(struct mlx5_bfreg_info *bfregi,
mutex_lock(>lock);
switch (lat) {
case MLX5_IB_LATENCY_CLASS_LOW:
+   BUILD_BUG_ON(NUM_NON_BLUE_FLAME_BFREGS != 1);
bfregn = 0;
bfregi->count[bfregn]++;
break;
-- 
2.7.4



[for-next V2 01/10] IB/mlx5: Fix kernel to user leak prevention logic

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

The logic was broken as it failed to update the response length for
architectures with PAGE_SIZE larger than 4kB. As a result, further
extension of the ucontext response struct would fail.

Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to 
user-space')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 86c61e7..852b5b7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1148,13 +1148,13 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 * pretend we don't support reading the HCA's core clock. This is also
 * forced by mmap function.
 */
-   if (PAGE_SIZE <= 4096 &&
-   field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
-   resp.comp_mask |=
-   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
-   resp.hca_core_clock_offset =
-   offsetof(struct mlx5_init_seg, internal_timer_h) %
-   PAGE_SIZE;
+   if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
+   if (PAGE_SIZE <= 4096) {
+   resp.comp_mask |=
+   
MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
+   resp.hca_core_clock_offset =
+   offsetof(struct mlx5_init_seg, 
internal_timer_h) % PAGE_SIZE;
+   }
resp.response_length += sizeof(resp.hca_core_clock_offset) +
sizeof(resp.reserved2);
}
-- 
2.7.4



[for-next V2 02/10] IB/mlx5: Fix error handling order in create_kernel_qp

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Make sure the order of cleanup is exactly the opposite of the order of
initialization.

Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 53f4dd3..42d021cd 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -994,12 +994,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
return 0;
 
 err_wrid:
-   mlx5_db_free(dev->mdev, >db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, >db);
 
 err_free:
kvfree(*in);
@@ -1014,12 +1014,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 
 static void destroy_qp_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp)
 {
-   mlx5_db_free(dev->mdev, >db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, >db);
mlx5_buf_free(dev->mdev, >buf);
free_uuar(>mdev->priv.uuari, qp->bf->uuarn);
 }
-- 
2.7.4



[for-next V2 00/10][pull request] Mellanox 100G mlx5 4K UAR support

2017-01-08 Thread Saeed Mahameed
Hi Dave and Doug,

Following the mlx5-odp submission, you can find here the 2nd mlx5
submission for 4.11 as a pull-request including mlx5 4K UAR support from
Eli Cohen (details below).  For you Doug, this pull request will provide 
you with both mlx5 odp and mlx5 4k UAR since it is based on Dave's
net-next mlx5-odp merge commit.

v1->v2:
  - Removed 64BIT arch dependency.

Thank you,
Saeed.

---

The following changes since commit 525dfa2cdce4f5ab76251b5e57ebabf4f2dfc40c:

  Merge branch 'mlx5-odp' (2017-01-02 15:51:21 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-4kuar-for-4.11

for you to fetch changes up to ca704520bd370758aec9d70afeeecc9d643fe132:

  net/mlx5: Activate support for 4K UARs (2017-01-08 11:21:27 +0200)


mlx5 4K UAR

The following series of patches optimizes the usage of the UAR area which is
contained within the BAR 0-1. Previous versions of the firmware and the driver
assumed each system page contains a single UAR. This patch set will query the
firmware for a new capability that, if published, means that the firmware can
support UARs of a fixed 4KB size regardless of system page size. In the case of
powerpc, where the page size equals 64KB, this means we can utilize 16 UARs per
system page. Since user space processes by default consume eight UARs per
context, this means that with this change a process will need a single system
page to fulfill that requirement, and in fact make use of more UARs, which is
better in terms of performance.
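
As an illustration of the arithmetic (not part of the series): with 4KB
UARs on a 64KB-page powerpc system, one system page carries 16 UARs, so
the default eight UARs per context fit in a single page. In practice the
count comes from the num_of_uars_per_page capability reported by
firmware; the shift below only shows why 16 is the expected value.

/* 4KB UAR: MLX5_ADAPTER_PAGE_SHIFT == 12; 64KB page: PAGE_SHIFT == 16 */
int uars_per_sys_page = 1 << (16 - 12);        /* = 16 */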

In addition to optimizing user-space processes, we introduce an allocator
that can be used by kernel consumers to allocate blue flame registers
(which are areas within a UAR that are used to write doorbells). This provides
further optimization on using the UAR area since the Ethernet driver makes
use of a single blue flame register per system page and now it will use two
blue flame registers per 4K.

The series also makes changes to naming conventions and now the terms used in
the driver code match the terms used in the PRM (programmers reference manual).
Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
register).

In order to support compatibility between different versions of
library/driver/firmware, the library now has a means to notify the kernel driver
that it supports the new scheme, and the kernel can notify the library if it
supports this extension. So mixed versions of libraries can run concurrently
without any issues.

Thanks,
Eli and Matan


Eli Cohen (10):
  IB/mlx5: Fix kernel to user leak prevention logic
  IB/mlx5: Fix error handling order in create_kernel_qp
  mlx5: Fix naming convention with respect to UARs
  IB/mlx5: Fix retrieval of index to first hi class bfreg
  net/mlx5: Introduce blue flame register allocator
  net/mlx5: Add interface to get reference to a UAR
  IB/mlx5: Use blue flame register allocator in mlx5_ib
  IB/mlx5: Allow future extension of libmlx5 input data
  IB/mlx5: Support 4k UAR for libmlx5
  net/mlx5: Activate support for 4K UARs

 drivers/infiniband/hw/mlx5/cq.c|  10 +-
 drivers/infiniband/hw/mlx5/main.c  | 278 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  32 +-
 drivers/infiniband/hw/mlx5/qp.c| 290 +++--
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  26 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 351 +
 include/linux/mlx5/cq.h|   5 +-
 include/linux/mlx5/device.h|  23 +-
 include/linux/mlx5/doorbell.h  |   6 +-
 include/linux/mlx5/driver.h|  81 ++---
 include/linux/mlx5/mlx5_ifc.h  |   7 +-
 include/uapi/rdma/mlx5-abi.h   |  19 +-
 17 files changed, 672 insertions(+), 516 deletions(-)

-- 
2.7.4



[for-next V2 08/10] IB/mlx5: Allow future extension of libmlx5 input data

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

The current check requires that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so new libraries passing additional information to
the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified
by old kernels that do not support their request by failing the
operation. This scheme is problematic since it requires libmlx5 to issue
the requests with descending input size for struct
mlx5_ib_alloc_ucontext_req_v2.

To avoid this, we require that new features obey the following rules:
If the feature requires one or more fields in the response and at
least one of the fields can be encoded such that a zero value means the
kernel ignored the request, then this field will provide the indication
to the library. If no response is required or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.

Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in 
init_ucontext')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c  |   2 +-
 drivers/infiniband/hw/mlx5/main.c| 201 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  15 ++-
 drivers/infiniband/hw/mlx5/qp.c  | 133 ++-
 include/linux/mlx5/device.h  |  12 ++-
 include/linux/mlx5/driver.h  |  12 +--
 6 files changed, 209 insertions(+), 166 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index a28ec33..31803b3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -788,7 +788,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->bfregi.uars[0].index;
+   *index = to_mucontext(context)->bfregi.sys_pages[0];
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index e9f0830..6640672 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,80 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 
port, int mask,
return err;
 }
 
+static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
+struct mlx5_ib_alloc_ucontext_req_v2 *req,
+u32 *num_sys_pages)
+{
+   int uars_per_sys_page;
+   int bfregs_per_sys_page;
+   int ref_bfregs = req->total_num_bfregs;
+
+   if (req->total_num_bfregs == 0)
+   return -EINVAL;
+
+   BUILD_BUG_ON(MLX5_MAX_BFREGS % MLX5_NON_FP_BFREGS_IN_PAGE);
+   BUILD_BUG_ON(MLX5_MAX_BFREGS < MLX5_NON_FP_BFREGS_IN_PAGE);
+
+   if (req->total_num_bfregs > MLX5_MAX_BFREGS)
+   return -ENOMEM;
+
+   uars_per_sys_page = get_uars_per_sys_page(dev, lib_uar_4k);
+   bfregs_per_sys_page = uars_per_sys_page * MLX5_NON_FP_BFREGS_PER_UAR;
+   req->total_num_bfregs = ALIGN(req->total_num_bfregs, 
bfregs_per_sys_page);
+   *num_sys_pages = req->total_num_bfregs / bfregs_per_sys_page;
+
+   if (req->num_low_latency_bfregs > req->total_num_bfregs - 1)
+   return -EINVAL;
+
+   mlx5_ib_dbg(dev, "uar_4k: fw support %s, lib support %s, user requested 
%d bfregs, allocated %d, using %d sys pages\n",
+   MLX5_CAP_GEN(dev->mdev, uar_4k) ? "yes" : "no",
+   lib_uar_4k ? "yes" : "no", ref_bfregs,
+   req->total_num_bfregs, *num_sys_pages);
+
+   return 0;
+}
+
+static int allocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+   bfregi = >bfregi;
+   for (i = 0; i < bfregi->num_sys_pages; i++) {
+   err = mlx5_cmd_alloc_uar(dev->mdev, >sys_pages[i]);
+   if (err)
+   goto error;
+
+   mlx5_ib_dbg(dev, "allocated uar %d\n", bfregi->sys_pages[i]);
+   }
+   return 0;
+
+error:
+   for (--i; i >= 0; i--)
+   if (mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]))
+   mlx5_ib_warn(dev, "failed to free uar %d\n", i);
+
+   return err;
+}
+
+static int deallocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+

[for-next V2 09/10] IB/mlx5: Support 4k UAR for libmlx5

2017-01-08 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Add fields to structs to convey to the kernel an indication of whether the
library supports multiple UARs per page, and return to the library the size
of a UAR based on the queried value.
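
As an illustration (an assumption about intended libmlx5 usage, not part
of the patch), the library would interpret the two new response fields
roughly as follows:

/* Sketch: derive the UAR geometry from the kernel's response. */
size_t uar_size = 1ULL << resp.log_uar_size;   /* 4KB or system page size */
int uars_per_page = resp.num_uars_per_page;    /* 1 without 4K UAR support */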

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  | 21 +++-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 ++--
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 21 
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 56 --
 include/linux/mlx5/cq.h|  2 +-
 include/linux/mlx5/driver.h| 12 -
 include/uapi/rdma/mlx5-abi.h   |  7 +++
 9 files changed, 42 insertions(+), 100 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 6640672..a191b93 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,12 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 
port, int mask,
return err;
 }
 
+static void print_lib_caps(struct mlx5_ib_dev *dev, u64 caps)
+{
+   mlx5_ib_dbg(dev, "MLX5_LIB_CAP_4K_UAR = %s\n",
+   caps & MLX5_LIB_CAP_4K_UAR ? "y" : "n");
+}
+
 static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
 struct mlx5_ib_alloc_ucontext_req_v2 *req,
 u32 *num_sys_pages)
@@ -1122,6 +1128,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
resp.cqe_version = min_t(__u8,
 (__u8)MLX5_CAP_GEN(dev->mdev, cqe_version),
 req.max_cqe_version);
+   resp.log_uar_size = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_ADAPTER_PAGE_SHIFT : PAGE_SHIFT;
+   resp.num_uars_per_page = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_CAP_GEN(dev->mdev, 
num_of_uars_per_page) : 1;
resp.response_length = min(offsetof(typeof(resp), response_length) +
   sizeof(resp.response_length), udata->outlen);
 
@@ -1129,7 +1139,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (!context)
return ERR_PTR(-ENOMEM);
 
-   lib_uar_4k = false;
+   lib_uar_4k = req.lib_caps & MLX5_LIB_CAP_4K_UAR;
bfregi = >bfregi;
 
/* updates req->total_num_bfregs */
@@ -1209,6 +1219,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
sizeof(resp.reserved2);
}
 
+   if (field_avail(typeof(resp), log_uar_size, udata->outlen))
+   resp.response_length += sizeof(resp.log_uar_size);
+
+   if (field_avail(typeof(resp), num_uars_per_page, udata->outlen))
+   resp.response_length += sizeof(resp.num_uars_per_page);
+
err = ib_copy_to_udata(udata, , resp.response_length);
if (err)
goto out_td;
@@ -1216,7 +1232,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
bfregi->ver = ver;
bfregi->num_low_latency_bfregs = req.num_low_latency_bfregs;
context->cqe_version = resp.cqe_version;
-   context->lib_caps = false;
+   context->lib_caps = req.lib_caps;
+   print_lib_caps(dev, context->lib_caps);
 
return >ibucontext;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 32d4af9..336d473 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -179,6 +179,8 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file 
system\n",
  cq->cqn);
 
+   cq->uar = dev->priv.uar;
+
return 0;
 
 err_cmd:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3037631..a473cea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -465,7 +465,6 @@ struct mlx5e_sq {
/* read only */
struct mlx5_wq_cyc wq;
u32dma_fifo_mask;
-   void __iomem  *uar_map;
struct netdev_queue   *txq;
u32sqn;
u16bf_buf_size;
@@ -479,7 +478

Re: [for-next V2 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-09 Thread Saeed Mahameed
On Sun, Jan 8, 2017 at 7:05 PM, Yuval Shaia <yuval.sh...@oracle.com> wrote:
> On Sun, Jan 08, 2017 at 05:54:47PM +0200, Saeed Mahameed wrote:
>> From: Eli Cohen <e...@mellanox.com>
>>
>>   err = mlx5_create_map_eq(dev, >cmd_eq, MLX5_EQ_VEC_CMD,
>>MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
>> -  "mlx5_cmd_eq", >priv.bfregi.uars[0],
>> -  MLX5_EQ_TYPE_ASYNC);
>> +  "mlx5_cmd_eq",  MLX5_EQ_TYPE_ASYNC);
>
> Remove extra space
>

Hi Yuval, thanks for the review,
We will submit an incremental patch for this, as checkpatch doesn't
complain about such minor things.


[for-next V3 01/10] IB/mlx5: Fix kernel to user leak prevention logic

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

The logic was broken as it failed to update the response length for
architectures with PAGE_SIZE larger than 4kB. As a result, further
extension of the ucontext response struct would fail.

Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to 
user-space')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 86c61e7..852b5b7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1148,13 +1148,13 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 * pretend we don't support reading the HCA's core clock. This is also
 * forced by mmap function.
 */
-   if (PAGE_SIZE <= 4096 &&
-   field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
-   resp.comp_mask |=
-   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
-   resp.hca_core_clock_offset =
-   offsetof(struct mlx5_init_seg, internal_timer_h) %
-   PAGE_SIZE;
+   if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
+   if (PAGE_SIZE <= 4096) {
+   resp.comp_mask |=
+   
MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
+   resp.hca_core_clock_offset =
+   offsetof(struct mlx5_init_seg, 
internal_timer_h) % PAGE_SIZE;
+   }
resp.response_length += sizeof(resp.hca_core_clock_offset) +
sizeof(resp.reserved2);
}
-- 
2.7.4



[for-next V3 00/10][pull request] Mellanox 100G mlx5 4K UAR support

2017-01-09 Thread Saeed Mahameed
Hi Dave and Doug,

Following the mlx5-odp submission, you can find here the 2nd mlx5
submission for 4.11 as a pull-request including mlx5 4K UAR support from
Eli Cohen (details below).  For you Doug, this pull request will provide
you with both mlx5 odp and mlx5 4k UAR since it is based on Dave's
net-next mlx5-odp merge commit.

v1->v2:
  - Removed 64BIT arch dependency.
v2->v3:
  - Removed extra space.

Thank you,
Saeed.

The following changes since commit 525dfa2cdce4f5ab76251b5e57ebabf4f2dfc40c:

  Merge branch 'mlx5-odp' (2017-01-02 15:51:21 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-4kuar-for-4.11

for you to fetch changes up to f502d834950a28e02651bb7e2cc7111ddd352644:

  net/mlx5: Activate support for 4K UARs (2017-01-09 20:25:10 +0200)


mlx5 4K UAR

The following series of patches optimizes the usage of the UAR area which is
contained within the BAR 0-1. Previous versions of the firmware and the driver
assumed each system page contains a single UAR. This patch set will query the
firmware for a new capability that, if published, means that the firmware can
support UARs of a fixed 4KB size regardless of system page size. In the case of
powerpc, where the page size equals 64KB, this means we can utilize 16 UARs per
system page. Since user space processes by default consume eight UARs per
context, this means that with this change a process will need a single system
page to fulfill that requirement, and in fact make use of more UARs, which is
better in terms of performance.

In addition to optimizing user-space processes, we introduce an allocator
that can be used by kernel consumers to allocate blue flame registers
(which are areas within a UAR that are used to write doorbells). This provides
further optimization on using the UAR area since the Ethernet driver makes
use of a single blue flame register per system page and now it will use two
blue flame registers per 4K.

The series also makes changes to naming conventions and now the terms used in
the driver code match the terms used in the PRM (programmers reference manual).
Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
register).

In order to support compatibility between different versions of
library/driver/firmware, the library now has a means to notify the kernel driver
that it supports the new scheme, and the kernel can notify the library if it
supports this extension. So mixed versions of libraries can run concurrently
without any issues.

Thanks,
Eli and Matan


Eli Cohen (10):
  IB/mlx5: Fix kernel to user leak prevention logic
  IB/mlx5: Fix error handling order in create_kernel_qp
  mlx5: Fix naming convention with respect to UARs
  IB/mlx5: Fix retrieval of index to first hi class bfreg
  net/mlx5: Introduce blue flame register allocator
  net/mlx5: Add interface to get reference to a UAR
  IB/mlx5: Use blue flame register allocator in mlx5_ib
  IB/mlx5: Allow future extension of libmlx5 input data
  IB/mlx5: Support 4k UAR for libmlx5
  net/mlx5: Activate support for 4K UARs

 drivers/infiniband/hw/mlx5/cq.c|  10 +-
 drivers/infiniband/hw/mlx5/main.c  | 278 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  32 +-
 drivers/infiniband/hw/mlx5/qp.c| 290 +++--
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  26 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 351 +
 include/linux/mlx5/cq.h|   5 +-
 include/linux/mlx5/device.h|  23 +-
 include/linux/mlx5/doorbell.h  |   6 +-
 include/linux/mlx5/driver.h|  81 ++---
 include/linux/mlx5/mlx5_ifc.h  |   7 +-
 include/uapi/rdma/mlx5-abi.h   |  19 +-
 17 files changed, 672 insertions(+), 516 deletions(-)


[for-next V3 04/10] IB/mlx5: Fix retrieval of index to first hi class bfreg

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Fix the function retrieving the index of the first high latency class
blue flame register. High latency class bfregs are located right above
medium latency class bfregs.

Fixes: c1be5232d21d ('IB/mlx5: Fix micro UAR allocator')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index fbea9bd..240fbb0 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -490,12 +490,21 @@ static int next_bfreg(int n)
return n;
 }
 
+enum {
+   /* this is the first blue flame register in the array of bfregs assigned
+* to a process. Since we do not use it for blue flame but rather
+* regular 64 bit doorbells, we do not need a lock for maintaining
+* "odd/even" order
+*/
+   NUM_NON_BLUE_FLAME_BFREGS = 1,
+};
+
 static int num_med_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int n;
 
n = bfregi->num_uars * MLX5_NON_FP_BFREGS_PER_UAR -
-   bfregi->num_low_latency_bfregs - 1;
+   bfregi->num_low_latency_bfregs - NUM_NON_BLUE_FLAME_BFREGS;
 
return n >= 0 ? n : 0;
 }
@@ -508,17 +517,9 @@ static int max_bfregi(struct mlx5_bfreg_info *bfregi)
 static int first_hi_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int med;
-   int i;
-   int t;
 
med = num_med_bfreg(bfregi);
-   for (t = 0, i = first_med_bfreg();; i = next_bfreg(i)) {
-   t++;
-   if (t == med)
-   return next_bfreg(i);
-   }
-
-   return 0;
+   return next_bfreg(med);
 }
 
 static int alloc_high_class_bfreg(struct mlx5_bfreg_info *bfregi)
@@ -544,6 +545,8 @@ static int alloc_med_class_bfreg(struct mlx5_bfreg_info 
*bfregi)
for (i = first_med_bfreg(); i < first_hi_bfreg(bfregi); i = 
next_bfreg(i)) {
if (bfregi->count[i] < bfregi->count[minidx])
minidx = i;
+   if (!bfregi->count[minidx])
+   break;
}
 
bfregi->count[minidx]++;
@@ -558,6 +561,7 @@ static int alloc_bfreg(struct mlx5_bfreg_info *bfregi,
mutex_lock(>lock);
switch (lat) {
case MLX5_IB_LATENCY_CLASS_LOW:
+   BUILD_BUG_ON(NUM_NON_BLUE_FLAME_BFREGS != 1);
bfregn = 0;
bfregi->count[bfregn]++;
break;
-- 
2.7.4



[for-next V3 03/10] mlx5: Fix naming convention with respect to UARs

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

This establishes solid naming conventions for UARs. A UAR (User Access
Region) can have a size identical to a system page or can be a fixed 4KB,
depending on a value queried by firmware. Each UAR always has 4 blue
flame registers, which are used to post doorbells to the send queue. In
addition, a UAR has a section used for posting doorbells to CQs or EQs.
In this patch we change names to reflect these conventions.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c|   6 +-
 drivers/infiniband/hw/mlx5/main.c  |  80 +--
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |   6 +-
 drivers/infiniband/hw/mlx5/qp.c| 176 -
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  90 ++---
 include/linux/mlx5/device.h|   9 +-
 include/linux/mlx5/driver.h|  14 +-
 include/uapi/rdma/mlx5-abi.h   |  12 +-
 10 files changed, 206 insertions(+), 203 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b3ef47c..bb7e91c 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.uuari.uars[0].map;
+   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
unsigned long irq_flags;
int ret = 0;
 
@@ -790,7 +790,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->uuari.uars[0].index;
+   *index = to_mucontext(context)->bfregi.uars[0].index;
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
@@ -886,7 +886,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct 
mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.uuari.uars[0].index;
+   *index = dev->mdev->priv.bfregi.uars[0].index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 852b5b7..d5cf82b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -999,12 +999,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
struct mlx5_ib_alloc_ucontext_req_v2 req = {};
struct mlx5_ib_alloc_ucontext_resp resp = {};
struct mlx5_ib_ucontext *context;
-   struct mlx5_uuar_info *uuari;
+   struct mlx5_bfreg_info *bfregi;
struct mlx5_uar *uars;
-   int gross_uuars;
+   int gross_bfregs;
int num_uars;
int ver;
-   int uuarn;
+   int bfregn;
int err;
int i;
size_t reqlen;
@@ -1032,10 +1032,10 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
if (req.flags)
return ERR_PTR(-EINVAL);
 
-   if (req.total_num_uuars > MLX5_MAX_UUARS)
+   if (req.total_num_bfregs > MLX5_MAX_BFREGS)
return ERR_PTR(-ENOMEM);
 
-   if (req.total_num_uuars == 0)
+   if (req.total_num_bfregs == 0)
return ERR_PTR(-EINVAL);
 
if (req.comp_mask || req.reserved0 || req.reserved1 || req.reserved2)
@@ -1046,13 +1046,13 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 reqlen - sizeof(req)))
return ERR_PTR(-EOPNOTSUPP);
 
-   req.total_num_uuars = ALIGN(req.total_num_uuars,
-   MLX5_NON_FP_BF_REGS_PER_PAGE);
-   if (req.num_low_latency_uuars > req.total_num_uuars - 1)
+   req.total_num_bfregs = ALIGN(req.total_num_bfregs,
+   MLX5_NON_FP_BFREGS_PER_UAR);
+   if (req.num_low_latency_bfregs > req.total_num_bfregs - 1)
return ERR_PTR(-EINVAL);
 
-   num_uars = req.total_num_uuars / MLX5_NON_FP_BF_REGS_PER_PAGE;
-   gross_uuars = num_uars * MLX5_BF_REGS_PER_PAGE;
+   num_uars = req.total_num_bfregs / MLX5_NON_FP_BFREGS_PER_UAR;
+   gross_bfregs = num_uars * MLX5_BFREGS_PER_UAR;
resp.qp_tab_size = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp);
if (mlx5_core_is_pf(dev->mdev) && MLX5_CAP_GEN(dev->mdev, bf))
 

[for-next V3 09/10] IB/mlx5: Support 4k UAR for libmlx5

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Add fields to structs to convey to the kernel an indication of whether
the library supports multiple UARs per page, and return to the library
the size of a UAR based on the queried value.
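
A sketch of what the two new response fields convey; fill_uar_resp() is
a hypothetical helper, and in the driver num_uars_per_page actually
comes from a firmware capability rather than being derived from page
shifts:

#include <stdio.h>

#define MLX5_ADAPTER_PAGE_SHIFT 12 /* 4KB UAR */
#define PAGE_SHIFT              16 /* e.g. 64KB system pages on ppc64 */

/* what the kernel would report when fw supports uar_4k and the
 * library advertised 4K UAR support; otherwise one UAR per page */
static void fill_uar_resp(int use_4k_uar, int *log_uar_size,
                          int *num_uars_per_page)
{
    *log_uar_size = use_4k_uar ? MLX5_ADAPTER_PAGE_SHIFT : PAGE_SHIFT;
    *num_uars_per_page =
        use_4k_uar ? 1 << (PAGE_SHIFT - MLX5_ADAPTER_PAGE_SHIFT) : 1;
}

int main(void)
{
    int log_sz, n;

    fill_uar_resp(1, &log_sz, &n);
    printf("UAR size = %d bytes, %d UARs per mapped system page\n",
           1 << log_sz, n);
    return 0;
}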

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c  | 21 +++-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 ++--
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 21 
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 56 --
 include/linux/mlx5/cq.h|  2 +-
 include/linux/mlx5/driver.h| 12 -
 include/uapi/rdma/mlx5-abi.h   |  7 +++
 9 files changed, 42 insertions(+), 100 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 6640672..a191b93 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,12 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 
port, int mask,
return err;
 }
 
+static void print_lib_caps(struct mlx5_ib_dev *dev, u64 caps)
+{
+   mlx5_ib_dbg(dev, "MLX5_LIB_CAP_4K_UAR = %s\n",
+   caps & MLX5_LIB_CAP_4K_UAR ? "y" : "n");
+}
+
 static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
 struct mlx5_ib_alloc_ucontext_req_v2 *req,
 u32 *num_sys_pages)
@@ -1122,6 +1128,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
resp.cqe_version = min_t(__u8,
 (__u8)MLX5_CAP_GEN(dev->mdev, cqe_version),
 req.max_cqe_version);
+   resp.log_uar_size = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_ADAPTER_PAGE_SHIFT : PAGE_SHIFT;
+   resp.num_uars_per_page = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_CAP_GEN(dev->mdev, 
num_of_uars_per_page) : 1;
resp.response_length = min(offsetof(typeof(resp), response_length) +
   sizeof(resp.response_length), udata->outlen);
 
@@ -1129,7 +1139,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (!context)
return ERR_PTR(-ENOMEM);
 
-   lib_uar_4k = false;
+   lib_uar_4k = req.lib_caps & MLX5_LIB_CAP_4K_UAR;
bfregi = >bfregi;
 
/* updates req->total_num_bfregs */
@@ -1209,6 +1219,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
sizeof(resp.reserved2);
}
 
+   if (field_avail(typeof(resp), log_uar_size, udata->outlen))
+   resp.response_length += sizeof(resp.log_uar_size);
+
+   if (field_avail(typeof(resp), num_uars_per_page, udata->outlen))
+   resp.response_length += sizeof(resp.num_uars_per_page);
+
err = ib_copy_to_udata(udata, , resp.response_length);
if (err)
goto out_td;
@@ -1216,7 +1232,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
bfregi->ver = ver;
bfregi->num_low_latency_bfregs = req.num_low_latency_bfregs;
context->cqe_version = resp.cqe_version;
-   context->lib_caps = false;
+   context->lib_caps = req.lib_caps;
+   print_lib_caps(dev, context->lib_caps);
 
return >ibucontext;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 32d4af9..336d473 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -179,6 +179,8 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file 
system\n",
  cq->cqn);
 
+   cq->uar = dev->priv.uar;
+
return 0;
 
 err_cmd:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3037631..a473cea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -465,7 +465,6 @@ struct mlx5e_sq {
/* read only */
struct mlx5_wq_cyc wq;
u32dma_fifo_mask;
-   void __iomem  *uar_map;
struct netdev_queue   *txq;
u32sqn;
u16bf_buf_size;
@@ -479,7 +478

[for-next V3 02/10] IB/mlx5: Fix error handling order in create_kernel_qp

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Make sure the order of cleanup is exactly the opposite of the order of
initialization.
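
As a generic illustration of the rule (hypothetical resources, not the
driver's code), both the error path and the teardown path release in the
exact reverse of acquisition order:

#include <stdio.h>
#include <stdlib.h>

/* acquire a, then b; on failure unwind in exactly the reverse order */
static int setup(void **a, void **b)
{
    *a = malloc(16);
    if (!*a)
        goto err;
    *b = malloc(16);
    if (!*b)
        goto err_free_a;
    return 0;

err_free_a:
    free(*a); /* undo the *last* successful step first */
err:
    return -1;
}

static void teardown(void *a, void *b)
{
    free(b); /* reverse of setup(): b was acquired last */
    free(a);
}

int main(void)
{
    void *a, *b;

    if (!setup(&a, &b))
        teardown(a, b);
    puts("done");
    return 0;
}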

Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 53f4dd3..42d021cd 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -994,12 +994,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
return 0;
 
 err_wrid:
-   mlx5_db_free(dev->mdev, >db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, >db);
 
 err_free:
kvfree(*in);
@@ -1014,12 +1014,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 
 static void destroy_qp_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp)
 {
-   mlx5_db_free(dev->mdev, >db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, >db);
mlx5_buf_free(dev->mdev, >buf);
free_uuar(>mdev->priv.uuari, qp->bf->uuarn);
 }
-- 
2.7.4



[for-next V3 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Make use of the blue flame registers allocator in mlx5_ib. Since blue
flame was not really supported, we remove all the code related to blue
flame and let all consumers use the same blue flame register. Once blue
flame is supported we will add the code back. As part of this patch we
also move the definition of struct mlx5_bf to mlx5_ib.h, as it is only
used by mlx5_ib.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c|   8 +-
 drivers/infiniband/hw/mlx5/main.c  |  28 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  11 ++-
 drivers/infiniband/hw/mlx5/qp.c|  73 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  16 +---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 114 -
 include/linux/mlx5/cq.h|   3 +-
 include/linux/mlx5/doorbell.h  |   6 +-
 include/linux/mlx5/driver.h|  19 -
 10 files changed, 59 insertions(+), 221 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index bb7e91c..a28ec33 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
+   void __iomem *uar_page = mdev->priv.uar->map;
unsigned long irq_flags;
int ret = 0;
 
@@ -704,9 +704,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum 
ib_cq_notify_flags flags)
mlx5_cq_arm(>mcq,
(flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
MLX5_CQ_DB_REQ_NOT_SOL : MLX5_CQ_DB_REQ_NOT,
-   uar_page,
-   MLX5_GET_DOORBELL_LOCK(>priv.cq_uar_lock),
-   to_mcq(ibcq)->mcq.cons_index);
+   uar_page, to_mcq(ibcq)->mcq.cons_index);
 
return ret;
 }
@@ -886,7 +884,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct 
mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.bfregi.uars[0].index;
+   *index = dev->mdev->priv.uar->index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index d5cf82b..e9f0830 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3074,8 +3074,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (mlx5_use_mad_ifc(dev))
get_ext_port_caps(dev);
 
-   MLX5_INIT_DOORBELL_LOCK(>uar_lock);
-
if (!mlx5_lag_is_active(mdev))
name = "mlx5_%d";
else
@@ -3251,9 +3249,21 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (err)
goto err_odp;
 
+   dev->mdev->priv.uar = mlx5_get_uars_page(dev->mdev);
+   if (!dev->mdev->priv.uar)
+   goto err_q_cnt;
+
+   err = mlx5_alloc_bfreg(dev->mdev, >bfreg, false, false);
+   if (err)
+   goto err_uar_page;
+
+   err = mlx5_alloc_bfreg(dev->mdev, >fp_bfreg, false, true);
+   if (err)
+   goto err_bfreg;
+
err = ib_register_device(>ib_dev, NULL);
if (err)
-   goto err_q_cnt;
+   goto err_fp_bfreg;
 
err = create_umr_res(dev);
if (err)
@@ -3276,6 +3286,15 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 err_dev:
ib_unregister_device(>ib_dev);
 
+err_fp_bfreg:
+   mlx5_free_bfreg(dev->mdev, >fp_bfreg);
+
+err_bfreg:
+   mlx5_free_bfreg(dev->mdev, >bfreg);
+
+err_uar_page:
+   mlx5_put_uars_page(dev->mdev, dev->mdev->priv.uar);
+
 err_q_cnt:
mlx5_ib_dealloc_q_counters(dev);
 
@@ -3307,6 +3326,9 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, 
void *context)
 
mlx5_remove_netdev_notifier(dev);
ib_unregister_device(>ib_dev);
+   mlx5_free_bfreg(dev->mdev, >fp_bfreg);
+   mlx5_free_bfreg(dev->mdev, >bfreg);
+   mlx5_put_uars_page(dev->mdev, mdev->priv.uar);
mlx5_ib_dealloc_q_counters(dev);
destroy_umrc_res(dev);
mlx5_ib_odp_remove_one(dev);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d4d1329..ae3bc4a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniba

[for-next V3 08/10] IB/mlx5: Allow future extension of libmlx5 input data

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

The current check requires that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so that new libraries passing additional information
to the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be
notified by old kernels that do not support their request by failing the
operation. This scheme is problematic since it requires libmlx5 to issue
requests with descending input sizes for struct
mlx5_ib_alloc_ucontext_req_v2.

To avoid this, we require that new features obey the following rules:
if a feature requires one or more fields in the response, and at least
one of those fields can be encoded such that a zero value means the
kernel ignored the request, then that field will provide the indication
to the library. If no response is required, or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.
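
As an illustration of the rule, a minimal userspace sketch of the
negotiation: the kernel copies out only as many response bytes as the
library asked for, and the library reads zero in a new field as "request
ignored". Struct and field names here are hypothetical, not the mlx5
uapi ones:

#include <stdio.h>
#include <stddef.h>
#include <string.h>

struct resp {                  /* old fields first, new ones appended */
    unsigned int old_field;
    unsigned int response_length;
    unsigned int new_feature;  /* 0 == kernel ignored the request */
};

#define offsetofend_(typ, fld) \
    (offsetof(typ, fld) + sizeof(((typ *)0)->fld))
/* true if the caller's buffer is long enough to receive 'fld' */
#define field_avail(typ, fld, outlen) ((outlen) >= offsetofend_(typ, fld))

static void kernel_side(void *out, size_t outlen)
{
    struct resp r = { .old_field = 1 };

    r.response_length = offsetofend_(struct resp, response_length);
    if (field_avail(struct resp, new_feature, outlen)) {
        r.new_feature = 42; /* non-zero: request honored */
        r.response_length += sizeof(r.new_feature);
    }
    memcpy(out, &r, r.response_length <= outlen ? r.response_length : outlen);
}

int main(void)
{
    struct resp r = {0};

    kernel_side(&r, sizeof(r));
    printf("new_feature = %u (%s)\n", r.new_feature,
           r.new_feature ? "request honored" : "ignored by old kernel");
    return 0;
}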

Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in 
init_ucontext')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c  |   2 +-
 drivers/infiniband/hw/mlx5/main.c| 201 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  15 ++-
 drivers/infiniband/hw/mlx5/qp.c  | 133 ++-
 include/linux/mlx5/device.h  |  12 ++-
 include/linux/mlx5/driver.h  |  12 +--
 6 files changed, 209 insertions(+), 166 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index a28ec33..31803b3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -788,7 +788,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->bfregi.uars[0].index;
+   *index = to_mucontext(context)->bfregi.sys_pages[0];
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index e9f0830..6640672 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,80 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 
port, int mask,
return err;
 }
 
+static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
+struct mlx5_ib_alloc_ucontext_req_v2 *req,
+u32 *num_sys_pages)
+{
+   int uars_per_sys_page;
+   int bfregs_per_sys_page;
+   int ref_bfregs = req->total_num_bfregs;
+
+   if (req->total_num_bfregs == 0)
+   return -EINVAL;
+
+   BUILD_BUG_ON(MLX5_MAX_BFREGS % MLX5_NON_FP_BFREGS_IN_PAGE);
+   BUILD_BUG_ON(MLX5_MAX_BFREGS < MLX5_NON_FP_BFREGS_IN_PAGE);
+
+   if (req->total_num_bfregs > MLX5_MAX_BFREGS)
+   return -ENOMEM;
+
+   uars_per_sys_page = get_uars_per_sys_page(dev, lib_uar_4k);
+   bfregs_per_sys_page = uars_per_sys_page * MLX5_NON_FP_BFREGS_PER_UAR;
+   req->total_num_bfregs = ALIGN(req->total_num_bfregs, 
bfregs_per_sys_page);
+   *num_sys_pages = req->total_num_bfregs / bfregs_per_sys_page;
+
+   if (req->num_low_latency_bfregs > req->total_num_bfregs - 1)
+   return -EINVAL;
+
+   mlx5_ib_dbg(dev, "uar_4k: fw support %s, lib support %s, user requested 
%d bfregs, allocated %d, using %d sys pages\n",
+   MLX5_CAP_GEN(dev->mdev, uar_4k) ? "yes" : "no",
+   lib_uar_4k ? "yes" : "no", ref_bfregs,
+   req->total_num_bfregs, *num_sys_pages);
+
+   return 0;
+}
+
+static int allocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+   bfregi = >bfregi;
+   for (i = 0; i < bfregi->num_sys_pages; i++) {
+   err = mlx5_cmd_alloc_uar(dev->mdev, >sys_pages[i]);
+   if (err)
+   goto error;
+
+   mlx5_ib_dbg(dev, "allocated uar %d\n", bfregi->sys_pages[i]);
+   }
+   return 0;
+
+error:
+   for (--i; i >= 0; i--)
+   if (mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]))
+   mlx5_ib_warn(dev, "failed to free uar %d\n", i);
+
+   return err;
+}
+
+static int deallocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+

[for-next V3 05/10] net/mlx5: Introduce blue flame register allocator

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Here is an implementation of an allocator that allocates blue flame
registers. A blue flame register is used for generating send doorbells.
A blue flame register can be used to generate either a regular doorbell
or a blue flame doorbell where the data to be sent is written to the
device's I/O memory, hence saving the need to read the data from memory.
For blue flame doorbells to succeed, the blue flame register
needs to be mapped as write combining. The user can specify what kind of
send doorbells she wishes to use. If she requested write combining
mapping but that failed, the allocator will fall back to non write
combining mapping and will indicate that to the user.
Subsequent patches in this series will make use of this allocator.
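
A minimal userspace model of the allocation idea (a free bitmap plus a
write-combining flag surfaced to the caller); all names and sizes are
illustrative, not the driver's:

#include <stdio.h>
#include <stdbool.h>

#define NBFREGS 8

struct bfreg_pool {
    unsigned long bitmap; /* bit set == register free */
    bool map_wc;          /* pool mapped write combining? */
};

/* allocate one register; *wc tells the caller whether blue flame
 * (write combining) doorbells will actually work on it */
static int alloc_bfreg(struct bfreg_pool *p, bool want_wc, bool *wc)
{
    for (int i = 0; i < NBFREGS; i++) {
        if (p->bitmap & (1UL << i)) {
            p->bitmap &= ~(1UL << i);
            /* fall back: caller asked for WC but the pool could only
             * be mapped non-WC, so indicate that to the caller */
            *wc = want_wc && p->map_wc;
            return i;
        }
    }
    return -1; /* pool exhausted */
}

int main(void)
{
    struct bfreg_pool pool = { .bitmap = (1UL << NBFREGS) - 1,
                               .map_wc = false /* ioremap_wc() failed */ };
    bool wc;
    int idx = alloc_bfreg(&pool, true, &wc);

    printf("got bfreg %d, blue flame usable: %s\n", idx, wc ? "yes" : "no");
    return 0;
}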

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c | 235 ++
 include/linux/mlx5/device.h   |   2 +
 include/linux/mlx5/driver.h   |  37 
 include/linux/mlx5/mlx5_ifc.h |   7 +-
 4 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c 
b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index ce7fceb..6a081a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -231,3 +231,238 @@ void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, 
struct mlx5_uar *uar)
mlx5_cmd_free_uar(mdev, uar->index);
 }
 EXPORT_SYMBOL(mlx5_unmap_free_uar);
+
+static int uars_per_sys_page(struct mlx5_core_dev *mdev)
+{
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   return MLX5_CAP_GEN(mdev, num_of_uars_per_page);
+
+   return 1;
+}
+
+static u64 uar2pfn(struct mlx5_core_dev *mdev, u32 index)
+{
+   u32 system_page_index;
+
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   system_page_index = index >> (PAGE_SHIFT - 
MLX5_ADAPTER_PAGE_SHIFT);
+   else
+   system_page_index = index;
+
+   return (pci_resource_start(mdev->pdev, 0) >> PAGE_SHIFT) + 
system_page_index;
+}
+
+static void up_rel_func(struct kref *kref)
+{
+   struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, 
ref_count);
+
+   list_del(>list);
+   if (mlx5_cmd_free_uar(up->mdev, up->index))
+   mlx5_core_warn(up->mdev, "failed to free uar index %d\n", 
up->index);
+   kfree(up->reg_bitmap);
+   kfree(up->fp_bitmap);
+   kfree(up);
+}
+
+static struct mlx5_uars_page *alloc_uars_page(struct mlx5_core_dev *mdev,
+ bool map_wc)
+{
+   struct mlx5_uars_page *up;
+   int err = -ENOMEM;
+   phys_addr_t pfn;
+   int bfregs;
+   int i;
+
+   bfregs = uars_per_sys_page(mdev) * MLX5_BFREGS_PER_UAR;
+   up = kzalloc(sizeof(*up), GFP_KERNEL);
+   if (!up)
+   return ERR_PTR(err);
+
+   up->mdev = mdev;
+   up->reg_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->reg_bitmap)
+   goto error1;
+
+   up->fp_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->fp_bitmap)
+   goto error1;
+
+   for (i = 0; i < bfregs; i++)
+   if ((i % MLX5_BFREGS_PER_UAR) < MLX5_NON_FP_BFREGS_PER_UAR)
+   set_bit(i, up->reg_bitmap);
+   else
+   set_bit(i, up->fp_bitmap);
+
+   up->bfregs = bfregs;
+   up->fp_avail = bfregs * MLX5_FP_BFREGS_PER_UAR / MLX5_BFREGS_PER_UAR;
+   up->reg_avail = bfregs * MLX5_NON_FP_BFREGS_PER_UAR / 
MLX5_BFREGS_PER_UAR;
+
+   err = mlx5_cmd_alloc_uar(mdev, >index);
+   if (err) {
+   mlx5_core_warn(mdev, "mlx5_cmd_alloc_uar() failed, %d\n", err);
+   goto error1;
+   }
+
+   pfn = uar2pfn(mdev, up->index);
+   if (map_wc) {
+   up->map = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -EAGAIN;
+   goto error2;
+   }
+   } else {
+   up->map = ioremap(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -ENOMEM;
+   goto error2;
+   }
+   }
+   kref_init(>ref_count);
+   mlx5_core_dbg(mdev, "allocated UAR page: index %d, total bfregs %d\n",
+ up->index, up->bfregs);
+   return up;
+
+error2:
+   if (mlx5_cmd_free_uar(mdev, up->index))
+   mlx5_core_warn(mdev, "failed to free uar index 

[for-next V3 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

A reference to a UAR is required to generate CQ or EQ doorbells. Since
CQ or EQ doorbells can all be generated using the same UAR area without
any effect on performance, we just get a reference to any available UAR.
If one is not available we allocate it, but we don't waste the blue
flame registers it can provide; we will use them for subsequent
allocations.
We get a reference to such a UAR and store it in mlx5_priv so any kernel
consumer can make use of it.
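
The sharing follows the usual kref get/put pattern; a minimal userspace
model with a plain counter standing in for struct kref and hypothetical
names:

#include <stdio.h>
#include <stdlib.h>

struct uars_page {
    int index;
    int refcount; /* stands in for struct kref */
};

static struct uars_page *cached; /* mdev->priv.uar in the driver */

static struct uars_page *get_uars_page(void)
{
    if (!cached) {
        cached = calloc(1, sizeof(*cached));
        cached->index = 7; /* pretend firmware gave us UAR 7 */
    }
    cached->refcount++;    /* every consumer takes a reference */
    return cached;
}

static void put_uars_page(struct uars_page *up)
{
    if (--up->refcount == 0) { /* last consumer frees the page */
        printf("freeing UAR %d\n", up->index);
        free(up);
        cached = NULL;
    }
}

int main(void)
{
    struct uars_page *a = get_uars_page(); /* e.g. EQ doorbells */
    struct uars_page *b = get_uars_page(); /* e.g. CQ doorbells */

    printf("both consumers share UAR %d/%d\n", a->index, b->index);
    put_uars_page(b);
    put_uars_page(a);
    return 0;
}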

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 14 ---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 22 ++
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 32 ++
 include/linux/mlx5/driver.h|  5 +++-
 4 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 11a8d63..5130d65 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -512,7 +512,7 @@ static void init_eq_buf(struct mlx5_eq *eq)
 
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 
vecidx,
   int nent, u64 mask, const char *name,
-  struct mlx5_uar *uar, enum mlx5_eq_type type)
+  enum mlx5_eq_type type)
 {
u32 out[MLX5_ST_SZ_DW(create_eq_out)] = {0};
struct mlx5_priv *priv = >priv;
@@ -556,7 +556,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq, u8 vecidx,
 
eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
-   MLX5_SET(eqc, eqc, uar_page, uar->index);
+   MLX5_SET(eqc, eqc, uar_page, priv->uar->index);
MLX5_SET(eqc, eqc, intr, vecidx);
MLX5_SET(eqc, eqc, log_page_size,
 eq->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
@@ -571,7 +571,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq, u8 vecidx,
eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
eq->irqn = priv->msix_arr[vecidx].vector;
eq->dev = dev;
-   eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
+   eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
err = request_irq(eq->irqn, handler, 0,
  priv->irq_info[vecidx].name, eq);
if (err)
@@ -686,8 +686,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, >cmd_eq, MLX5_EQ_VEC_CMD,
 MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
-"mlx5_cmd_eq", >priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_cmd_eq", MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create cmd EQ %d\n", err);
return err;
@@ -697,8 +696,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, >async_eq, MLX5_EQ_VEC_ASYNC,
 MLX5_NUM_ASYNC_EQE, async_event_mask,
-"mlx5_async_eq", >priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_async_eq", MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create async EQ %d\n", err);
goto err1;
@@ -708,7 +706,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_EQ_VEC_PAGES,
 /* TODO: sriov max_vf + */ 1,
 1 << MLX5_EVENT_TYPE_PAGE_REQUEST, 
"mlx5_pages_eq",
->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create pages EQ %d\n", err);
@@ -722,7 +719,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_NUM_ASYNC_EQE,
 1 << MLX5_EVENT_TYPE_PAGE_FAULT,
 "mlx5_page_fault_eq",
->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_PF);
if (err) {
mlx5_core_warn(dev, "failed to create page fault EQ 
%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/co

[for-next V3 10/10] net/mlx5: Activate support for 4K UARs

2017-01-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Activate 4K UAR support for firmware versions that support it.

Signed-off-by: Eli Cohen <e...@mellanox.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Leon Romanovsky <l...@kernel.org>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ff1f144..a16ee16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -530,6 +530,10 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
/* disable cmdif checksum */
MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0);
 
+   /* If the HCA supports 4K UARs use it */
+   if (MLX5_CAP_GEN_MAX(dev, uar_4k))
+   MLX5_SET(cmd_hca_cap, set_hca_cap, uar_4k, 1);
+
MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
 
err = set_caps(dev, set_ctx, set_sz,
-- 
2.7.4



[PATCH net 00/12] Mellanox 100G mlx5 fixes 28-12-2016

2016-12-28 Thread Saeed Mahameed
Hi Dave,

Some fixes for the mlx5 core and ethernet drivers.

for -stable:
net/mlx5: Check FW limitations on log_max_qp before setting it
net/mlx5: Cancel recovery work in remove flow
net/mlx5: Avoid shadowing numa_node
net/mlx5: Mask destination mac value in ethtool steering rules
net/mlx5: Prevent setting multicast macs for VFs
net/mlx5e: Don't sync netdev state when not registered
net/mlx5e: Disable netdev after close

Thanks,
Saeed.

Daniel Jurgens (1):
  net/mlx5: Cancel recovery work in remove flow

Eli Cohen (1):
  net/mlx5: Avoid shadowing numa_node

Gal Pressman (2):
  Revert "net/mlx5e: Expose PCIe statistics to ethtool"
  Revert "net/mlx5: Add MPCNT register infrastructure"

Huy Nguyen (1):
  net/mlx5e: Check ets capability before initializing ets settings

Maor Gottlieb (2):
  net/mlx5: Mask destination mac value in ethtool steering rules
  net/mlx5: Release FTE lock in error flow

Mohamad Haj Yahia (1):
  net/mlx5: Prevent setting multicast macs for VFs

Noa Osherovich (1):
  net/mlx5: Check FW limitations on log_max_qp before setting it

Or Gerlitz (1):
  net/mlx5: Disable RoCE on the e-switch management port under switchdev
mode

Saeed Mahameed (2):
  net/mlx5e: Don't sync netdev state when not registered
  net/mlx5e: Disable netdev after close

 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c |  3 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 17 
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 51 
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 32 +---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  2 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 11 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++-
 include/linux/mlx5/device.h|  5 --
 include/linux/mlx5/driver.h|  1 -
 include/linux/mlx5/mlx5_ifc.h  | 93 --
 12 files changed, 45 insertions(+), 187 deletions(-)

-- 
2.7.4



[PATCH net 02/12] net/mlx5: Check FW limitations on log_max_qp before setting it

2016-12-28 Thread Saeed Mahameed
From: Noa Osherovich <no...@mellanox.com>

When setting HCA capabilities, set log_max_qp to be the minimum
between the selected profile's value and the HCA limitation.
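
The clamp itself, as a stand-alone sketch with illustrative profile and
capability values:

#include <stdio.h>

int main(void)
{
    int profile_log_max_qp = 18; /* requested by the selected profile */
    int hca_log_max_qp     = 16; /* MLX5_CAP_GEN_MAX(dev, log_max_qp) */

    if (hca_log_max_qp < profile_log_max_qp) {
        printf("log_max_qp %d exceeds HCA limit, clamping to %d\n",
               profile_log_max_qp, hca_log_max_qp);
        profile_log_max_qp = hca_log_max_qp;
    }
    printf("using log_max_qp = %d (%d QPs)\n",
           profile_log_max_qp, 1 << profile_log_max_qp);
    return 0;
}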

Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...')
Signed-off-by: Noa Osherovich <no...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 54e5a78..23c12f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -503,6 +503,13 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
MLX5_SET(cmd_hca_cap, set_hca_cap, pkey_table_size,
 to_fw_pkey_sz(dev, 128));
 
+   /* Check log_max_qp from HCA caps to set in current profile */
+   if (MLX5_CAP_GEN_MAX(dev, log_max_qp) < profile[prof_sel].log_max_qp) {
+   mlx5_core_warn(dev, "log_max_qp value in current profile is %d, 
changing it to HCA capability limit (%d)\n",
+  profile[prof_sel].log_max_qp,
+  MLX5_CAP_GEN_MAX(dev, log_max_qp));
+   profile[prof_sel].log_max_qp = MLX5_CAP_GEN_MAX(dev, 
log_max_qp);
+   }
if (prof->mask & MLX5_PROF_MASK_QP_SIZE)
MLX5_SET(cmd_hca_cap, set_hca_cap, log_max_qp,
 prof->log_max_qp);
-- 
2.7.4



[PATCH net 01/12] net/mlx5: Disable RoCE on the e-switch management port under switchdev mode

2016-12-28 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

Under the switchdev/offloads mode, packets that don't match any
e-switch steering rule are sent towards the e-switch management
port. We use a NIC HW steering rule set per vport (uplink and VFs)
to make them be received into the host OS through the respective
vport representor netdevice.

Currently such missed RoCE packets will not get to this NIC steering
rule, and hence VF RoCE will not work over the slow path of the offloads
mode. This is because these packets will be matched by a steering rule
added by the firmware that serves RoCE traffic set on the PF NIC vport,
which is also the e-switch management port under SRIOV.
Disabling RoCE on the e-switch management vport when we are in the
offloads mode will signal the firmware to remove its RoCE rule, and then
the missed RoCE packets will be matched by the representor NIC steering
rule like any other missed packets.

To achieve that, we disable RoCE on the PF vport. We do that by removing
(hot-unplugging) the IB device instance associated with the PF. This is
also required by our current model where the PF serves as the uplink
representor and hence only SW switching (TC, bridge, OVS) applications
and slow path vport mlx5e net-device should be running over that vport.

Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode changes')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 466e161..03293ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -695,6 +695,12 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int 
nvports)
if (err)
goto err_reps;
}
+
+   /* disable PF RoCE so missed packets don't go through RoCE steering */
+   mlx5_dev_list_lock();
+   mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+   mlx5_dev_list_unlock();
+
return 0;
 
 err_reps:
@@ -718,6 +724,11 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw)
 {
int err, err1, num_vfs = esw->dev->priv.sriov.num_vfs;
 
+   /* enable back PF RoCE */
+   mlx5_dev_list_lock();
+   mlx5_add_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+   mlx5_dev_list_unlock();
+
mlx5_eswitch_disable_sriov(esw);
err = mlx5_eswitch_enable_sriov(esw, num_vfs, SRIOV_LEGACY);
if (err) {
-- 
2.7.4



[PATCH net 03/12] net/mlx5: Cancel recovery work in remove flow

2016-12-28 Thread Saeed Mahameed
From: Daniel Jurgens <dani...@mellanox.com>

If there is pending delayed work for health recovery, it must be
canceled when the device is being unloaded.

Fixes: 05ac2c0b7438 ("net/mlx5: Fix race between PCI error handlers and health 
work")
Signed-off-by: Daniel Jurgens <dani...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 23c12f1..0b49739 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1196,6 +1196,8 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv,
 {
int err = 0;
 
+   mlx5_drain_health_wq(dev);
+
mutex_lock(>intf_state_mutex);
if (test_bit(MLX5_INTERFACE_STATE_DOWN, >intf_state)) {
dev_warn(>pdev->dev, "%s: interface is down, NOP\n",
@@ -1358,10 +1360,9 @@ static pci_ers_result_t mlx5_pci_err_detected(struct 
pci_dev *pdev,
 
mlx5_enter_error_state(dev);
mlx5_unload_one(dev, priv, false);
-   /* In case of kernel call save the pci state and drain health wq */
+   /* In case of kernel call save the pci state */
if (state) {
pci_save_state(pdev);
-   mlx5_drain_health_wq(dev);
mlx5_pci_disable_device(dev);
}
 
-- 
2.7.4



[PATCH net 09/12] Revert "net/mlx5: Add MPCNT register infrastructure"

2016-12-28 Thread Saeed Mahameed
From: Gal Pressman <g...@mellanox.com>

This reverts commit 7f503169cabd70c1f13b9279c50eca7dfb9a7d51.

Fixes: 7f503169cabd ("net/mlx5: Add MPCNT register infrastructure")
Signed-off-by: Gal Pressman <g...@mellanox.com>
Reported-by: Jesper Dangaard Brouer <bro...@redhat.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/device.h   |  5 ---
 include/linux/mlx5/driver.h   |  1 -
 include/linux/mlx5/mlx5_ifc.h | 93 ---
 3 files changed, 99 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 9f48936..52b4374 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1071,11 +1071,6 @@ enum {
MLX5_INFINIBAND_PORT_COUNTERS_GROUP   = 0x20,
 };
 
-enum {
-   MLX5_PCIE_PERFORMANCE_COUNTERS_GROUP   = 0x0,
-   MLX5_PCIE_TIMERS_AND_STATES_COUNTERS_GROUP = 0x2,
-};
-
 static inline u16 mlx5_to_sw_pkey_sz(int pkey_sz)
 {
if (pkey_sz > MLX5_MAX_LOG_PKEY_TABLE)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0ae5536..735b363 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -123,7 +123,6 @@ enum {
MLX5_REG_HOST_ENDIANNESS = 0x7004,
MLX5_REG_MCIA= 0x9014,
MLX5_REG_MLCR= 0x902b,
-   MLX5_REG_MPCNT   = 0x9051,
 };
 
 enum mlx5_dcbx_oper_mode {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 57bec54..a852e9d 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1757,80 +1757,6 @@ struct mlx5_ifc_eth_802_3_cntrs_grp_data_layout_bits {
u8 reserved_at_4c0[0x300];
 };
 
-struct mlx5_ifc_pcie_perf_cntrs_grp_data_layout_bits {
-   u8 life_time_counter_high[0x20];
-
-   u8 life_time_counter_low[0x20];
-
-   u8 rx_errors[0x20];
-
-   u8 tx_errors[0x20];
-
-   u8 l0_to_recovery_eieos[0x20];
-
-   u8 l0_to_recovery_ts[0x20];
-
-   u8 l0_to_recovery_framing[0x20];
-
-   u8 l0_to_recovery_retrain[0x20];
-
-   u8 crc_error_dllp[0x20];
-
-   u8 crc_error_tlp[0x20];
-
-   u8 reserved_at_140[0x680];
-};
-
-struct mlx5_ifc_pcie_tas_cntrs_grp_data_layout_bits {
-   u8 life_time_counter_high[0x20];
-
-   u8 life_time_counter_low[0x20];
-
-   u8 time_to_boot_image_start[0x20];
-
-   u8 time_to_link_image[0x20];
-
-   u8 calibration_time[0x20];
-
-   u8 time_to_first_perst[0x20];
-
-   u8 time_to_detect_state[0x20];
-
-   u8 time_to_l0[0x20];
-
-   u8 time_to_crs_en[0x20];
-
-   u8 time_to_plastic_image_start[0x20];
-
-   u8 time_to_iron_image_start[0x20];
-
-   u8 perst_handler[0x20];
-
-   u8 times_in_l1[0x20];
-
-   u8 times_in_l23[0x20];
-
-   u8 dl_down[0x20];
-
-   u8 config_cycle1usec[0x20];
-
-   u8 config_cycle2to7usec[0x20];
-
-   u8 config_cycle_8to15usec[0x20];
-
-   u8 config_cycle_16_to_63usec[0x20];
-
-   u8 config_cycle_64usec[0x20];
-
-   u8 correctable_err_msg_sent[0x20];
-
-   u8 non_fatal_err_msg_sent[0x20];
-
-   u8 fatal_err_msg_sent[0x20];
-
-   u8 reserved_at_2e0[0x4e0];
-};
-
 struct mlx5_ifc_cmd_inter_comp_event_bits {
u8 command_completion_vector[0x20];
 
@@ -2995,12 +2921,6 @@ union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits {
u8 reserved_at_0[0x7c0];
 };
 
-union mlx5_ifc_pcie_cntrs_grp_data_layout_auto_bits {
-   struct mlx5_ifc_pcie_perf_cntrs_grp_data_layout_bits 
pcie_perf_cntrs_grp_data_layout;
-   struct mlx5_ifc_pcie_tas_cntrs_grp_data_layout_bits 
pcie_tas_cntrs_grp_data_layout;
-   u8 reserved_at_0[0x7c0];
-};
-
 union mlx5_ifc_event_auto_bits {
struct mlx5_ifc_comp_event_bits comp_event;
struct mlx5_ifc_dct_events_bits dct_events;
@@ -7320,18 +7240,6 @@ struct mlx5_ifc_ppcnt_reg_bits {
union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits counter_set;
 };
 
-struct mlx5_ifc_mpcnt_reg_bits {
-   u8 reserved_at_0[0x8];
-   u8 pcie_index[0x8];
-   u8 reserved_at_10[0xa];
-   u8 grp[0x6];
-
-   u8 clr[0x1];
-   u8 reserved_at_21[0x1f];
-
-   union mlx5_ifc_pcie_cntrs_grp_data_layout_auto_bits counter_set;
-};
-
 struct mlx5_ifc_ppad_reg_bits {
u8 reserved_at_0[0x3];
u8 single_mac[0x1];
@@ -7937,7 +7845,6 @@ union mlx5_ifc_ports_control_registers_document_bits {
struct mlx5_ifc_pmtu_reg_bits pmtu_reg;
struct mlx5_ifc_ppad_reg_bits ppad_reg;
struct mlx5_ifc_ppcnt_reg_bits ppcnt_reg;
-   struct mlx5_ifc_mpc

[PATCH net 05/12] net/mlx5: Mask destination mac value in ethtool steering rules

2016-12-28 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

We need to mask the destination mac value with the destination mac
mask when adding a steering rule via ethtool.
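
mask_spec() in the hunk below amounts to a byte-wise AND of the value
with its mask before the value is copied into the match criteria; a
stand-alone sketch of that idea:

#include <stdio.h>

#define ETH_ALEN 6

/* AND the value with the mask in place, so bytes the user did not
 * ask to match on cannot leak into the steering rule */
static void mask_spec(unsigned char *mask, unsigned char *val, int size)
{
    for (int i = 0; i < size; i++)
        val[i] &= mask[i];
}

int main(void)
{
    unsigned char mac[ETH_ALEN]  = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
    unsigned char mask[ETH_ALEN] = { 0xff, 0xff, 0xff, 0x00, 0x00, 0x00 };

    mask_spec(mask, mac, ETH_ALEN);
    printf("masked dmac: %02x:%02x:%02x:%02x:%02x:%02x\n",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return 0;
}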

Fixes: 1174fce8d1410 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow 
steering')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 3691451..d088eff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -247,6 +247,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v,
}
if (fs->flow_type & FLOW_MAC_EXT &&
!is_zero_ether_addr(fs->m_ext.h_dest)) {
+   mask_spec(fs->m_ext.h_dest, fs->h_ext.h_dest, ETH_ALEN);
ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4,
 outer_headers_c, dmac_47_16),
fs->m_ext.h_dest);
-- 
2.7.4



[PATCH net 08/12] Revert "net/mlx5e: Expose PCIe statistics to ethtool"

2016-12-28 Thread Saeed Mahameed
From: Gal Pressman <g...@mellanox.com>

This reverts commit 9c7262399ba12825f3ca4b00a76d8d5e77c720f5.
PCIe counters were introduced in a new firmware version; as a result,
users with old firmware encountered a syndrome every 200ms due to the
update stats work. This feature will be re-introduced later with an
appropriate capabilities infrastructure.

Fixes: 9c7262399ba1 ("net/mlx5e: Expose PCIe statistics to ethtool")
Signed-off-by: Gal Pressman <g...@mellanox.com>
Reported-by: Jesper Dangaard Brouer <bro...@redhat.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 17 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 24 
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 32 +-
 3 files changed, 1 insertion(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 352462a..33a399a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -171,7 +171,6 @@ static int mlx5e_get_sset_count(struct net_device *dev, int 
sset)
return NUM_SW_COUNTERS +
   MLX5E_NUM_Q_CNTRS(priv) +
   NUM_VPORT_COUNTERS + NUM_PPORT_COUNTERS +
-  NUM_PCIE_COUNTERS +
   MLX5E_NUM_RQ_STATS(priv) +
   MLX5E_NUM_SQ_STATS(priv) +
   MLX5E_NUM_PFC_COUNTERS(priv) +
@@ -219,14 +218,6 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
strcpy(data + (idx++) * ETH_GSTRING_LEN,
   pport_2819_stats_desc[i].format);
 
-   for (i = 0; i < NUM_PCIE_PERF_COUNTERS; i++)
-   strcpy(data + (idx++) * ETH_GSTRING_LEN,
-  pcie_perf_stats_desc[i].format);
-
-   for (i = 0; i < NUM_PCIE_TAS_COUNTERS; i++)
-   strcpy(data + (idx++) * ETH_GSTRING_LEN,
-  pcie_tas_stats_desc[i].format);
-
for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
for (i = 0; i < NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS; i++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
@@ -339,14 +330,6 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev,
data[idx++] = 
MLX5E_READ_CTR64_BE(>stats.pport.RFC_2819_counters,
  pport_2819_stats_desc, i);
 
-   for (i = 0; i < NUM_PCIE_PERF_COUNTERS; i++)
-   data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_perf_counters,
- pcie_perf_stats_desc, i);
-
-   for (i = 0; i < NUM_PCIE_TAS_COUNTERS; i++)
-   data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_tas_counters,
- pcie_tas_stats_desc, i);
-
for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
for (i = 0; i < NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS; i++)
data[idx++] = 
MLX5E_READ_CTR64_BE(>stats.pport.per_prio_counters[prio],
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cbfa38f..be5ef03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -291,36 +291,12 @@ static void mlx5e_update_q_counter(struct mlx5e_priv 
*priv)
  >rx_out_of_buffer);
 }
 
-static void mlx5e_update_pcie_counters(struct mlx5e_priv *priv)
-{
-   struct mlx5e_pcie_stats *pcie_stats = >stats.pcie;
-   struct mlx5_core_dev *mdev = priv->mdev;
-   int sz = MLX5_ST_SZ_BYTES(mpcnt_reg);
-   void *out;
-   u32 *in;
-
-   in = mlx5_vzalloc(sz);
-   if (!in)
-   return;
-
-   out = pcie_stats->pcie_perf_counters;
-   MLX5_SET(mpcnt_reg, in, grp, MLX5_PCIE_PERFORMANCE_COUNTERS_GROUP);
-   mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_MPCNT, 0, 0);
-
-   out = pcie_stats->pcie_tas_counters;
-   MLX5_SET(mpcnt_reg, in, grp, 
MLX5_PCIE_TIMERS_AND_STATES_COUNTERS_GROUP);
-   mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_MPCNT, 0, 0);
-
-   kvfree(in);
-}
-
 void mlx5e_update_stats(struct mlx5e_priv *priv)
 {
mlx5e_update_q_counter(priv);
mlx5e_update_vport_counters(priv);
mlx5e_update_pport_counters(priv);
mlx5e_update_sw_counters(priv);
-   mlx5e_update_pcie_counters(priv);
 }
 
 void mlx5e_update_stats_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index f202f87..ba5db1d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ether

[PATCH net 11/12] net/mlx5e: Don't sync netdev state when not registered

2016-12-28 Thread Saeed Mahameed
Skip setting netdev vxlan ports and netdev rx_mode on driver load
when netdev is not yet registered.

Synchronizing with netdev state is needed only on reset flow where the
netdev remains registered for the whole reset period.

This also fixes an access-before-initialization of
net_device.addr_list_lock (which, for some reason, is initialized in
register_netdev), where we queued the set_rx_mode work on driver load
before netdev registration.

Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach 
callbacks")
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reported-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
Reviewed-by: Mohamad Haj Yahia <moha...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index be5ef03..cf270f6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3781,14 +3781,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 
mlx5_lag_add(mdev, netdev);
 
-   if (mlx5e_vxlan_allowed(mdev)) {
-   rtnl_lock();
-   udp_tunnel_get_rx_info(netdev);
-   rtnl_unlock();
-   }
-
mlx5e_enable_async_events(priv);
-   queue_work(priv->wq, >set_rx_mode_work);
 
if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
mlx5_query_nic_vport_mac_address(mdev, 0, rep.hw_id);
@@ -3798,6 +3791,18 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
rep.netdev = netdev;
mlx5_eswitch_register_vport_rep(esw, 0, );
}
+
+   if (netdev->reg_state != NETREG_REGISTERED)
+   return;
+
+   /* Device already registered: sync netdev system state */
+   if (mlx5e_vxlan_allowed(mdev)) {
+   rtnl_lock();
+   udp_tunnel_get_rx_info(netdev);
+   rtnl_unlock();
+   }
+
+   queue_work(priv->wq, >set_rx_mode_work);
 }
 
 static void mlx5e_nic_disable(struct mlx5e_priv *priv)
-- 
2.7.4



[PATCH net 12/12] net/mlx5e: Disable netdev after close

2016-12-28 Thread Saeed Mahameed
Disabling the netdev should come after it has been closed. There is no
harm in doing it beforehand (hence the MLX5E_STATE_DESTROYING bit), but
it is more natural this way.

Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach 
callbacks")
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Reviewed-by: Mohamad Haj Yahia <moha...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cf270f6..1236b27 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3947,10 +3947,6 @@ void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, 
struct net_device *netdev)
const struct mlx5e_profile *profile = priv->profile;
 
set_bit(MLX5E_STATE_DESTROYING, >state);
-   if (profile->disable)
-   profile->disable(priv);
-
-   flush_workqueue(priv->wq);
 
rtnl_lock();
if (netif_running(netdev))
@@ -3958,6 +3954,10 @@ void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, 
struct net_device *netdev)
netif_device_detach(netdev);
rtnl_unlock();
 
+   if (profile->disable)
+   profile->disable(priv);
+   flush_workqueue(priv->wq);
+
mlx5e_destroy_q_counter(priv);
profile->cleanup_rx(priv);
mlx5e_close_drop_rq(priv);
-- 
2.7.4



[PATCH net 10/12] net/mlx5e: Check ets capability before initializing ets settings

2016-12-28 Thread Saeed Mahameed
From: Huy Nguyen <h...@mellanox.com>

During the initial setup, the ets command is sent to the firmware
without checking if the HCA supports ets. This causes an invalid
command error. Add the ets capability check before sending the firmware
command that initializes the ets settings.

Fixes: e207b7e99176 ("net/mlx5e: ConnectX-4 firmware support for DCBX")
Signed-off-by: Huy Nguyen <h...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 7f6c225..f0b460f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -723,6 +723,9 @@ static void mlx5e_ets_init(struct mlx5e_priv *priv)
int i;
struct ieee_ets ets;
 
+   if (!MLX5_CAP_GEN(priv->mdev, ets))
+   return;
+
memset(, 0, sizeof(ets));
ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
for (i = 0; i < ets.ets_cap; i++) {
-- 
2.7.4



[PATCH net 04/12] net/mlx5: Avoid shadowing numa_node

2016-12-28 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Do not use a local variable named numa_node, to avoid shadowing the
public one.

Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0b49739..6547f22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -582,7 +582,6 @@ static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev 
*mdev, int i)
struct mlx5_priv *priv  = >priv;
struct msix_entry *msix = priv->msix_arr;
int irq = msix[i + MLX5_EQ_VEC_COMP_BASE].vector;
-   int numa_node   = priv->numa_node;
int err;
 
if (!zalloc_cpumask_var(>irq_info[i].mask, GFP_KERNEL)) {
@@ -590,7 +589,7 @@ static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev 
*mdev, int i)
return -ENOMEM;
}
 
-   cpumask_set_cpu(cpumask_local_spread(i, numa_node),
+   cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node),
priv->irq_info[i].mask);
 
err = irq_set_affinity_hint(irq, priv->irq_info[i].mask);
-- 
2.7.4



[PATCH net 07/12] net/mlx5: Prevent setting multicast macs for VFs

2016-12-28 Thread Saeed Mahameed
From: Mohamad Haj Yahia <moha...@mellanox.com>

We need to check that the VF mac address entered by the admin user is
either zero or a unicast mac.
Multicast mac addresses are prohibited.
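
For reference, the multicast test reduces to the I/G bit (least
significant bit of the first octet); below is a simplified stand-in for
is_multicast_ether_addr() with the accept/reject logic sketched around
it:

#include <stdio.h>
#include <stdbool.h>

/* I/G bit set in the first octet means a group (multicast) address */
static bool is_multicast_ether(const unsigned char *mac)
{
    return mac[0] & 0x01;
}

/* admin-set VF mac must be zero (to clear) or a unicast address */
static int set_vport_mac(const unsigned char *mac)
{
    if (is_multicast_ether(mac))
        return -1; /* the driver returns -EINVAL here */
    return 0;
}

int main(void)
{
    unsigned char mcast[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };
    unsigned char ucast[6] = { 0x02, 0x11, 0x22, 0x33, 0x44, 0x55 };

    printf("multicast rejected: %d, unicast accepted: %d\n",
           set_vport_mac(mcast), set_vport_mac(ucast));
    return 0;
}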

Fixes: 77256579c6b4 ('net/mlx5: E-Switch, Introduce Vport administration 
functions')
Signed-off-by: Mohamad Haj Yahia <moha...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d6807c3..f14d9c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1860,7 +1860,7 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
 
if (!ESW_ALLOWED(esw))
return -EPERM;
-   if (!LEGAL_VPORT(esw, vport))
+   if (!LEGAL_VPORT(esw, vport) || is_multicast_ether_addr(mac))
return -EINVAL;
 
mutex_lock(>state_lock);
-- 
2.7.4



[PATCH net 06/12] net/mlx5: Release FTE lock in error flow

2016-12-28 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Release the FTE lock when adding a rule to the FTE has failed.

Fixes: 0fd758d6112f ('net/mlx5: Don't unlock fte while still using it')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Reviewed-by: Mark Bloch <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index a263d89..0ac7a2f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1263,6 +1263,7 @@ static struct mlx5_flow_handle *add_rule_fg(struct 
mlx5_flow_group *fg,
nested_lock_ref_node(>node, FS_MUTEX_CHILD);
handle = add_rule_fte(fte, fg, dest, dest_num, false);
if (IS_ERR(handle)) {
+   unlock_ref_node(>node);
kfree(fte);
goto unlock_fg;
}
-- 
2.7.4


