Re: [PATCH net-next 4/8] net/mlx5e: Consider IRQ affinity changes in NAPI poll

2015-11-02 Thread Saeed Mahameed
Hi Dave,

We agree with you. We will drop this patch for now and think of a
cleaner approach to fix this in the future.

> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, November 02, 2015 12:34 AM
> To: Or Gerlitz <ogerl...@mellanox.com>
> Cc: netdev@vger.kernel.org; Saeed Mahameed <sae...@mellanox.com>; Achiad 
> Shochat <ach...@mellanox.com>
> Subject: Re: [PATCH net-next 4/8] net/mlx5e: Consider IRQ affinity changes in 
> NAPI poll
>
> From: Or Gerlitz <ogerl...@mellanox.com>
> Date: Sun,  1 Nov 2015 19:35:18 +0200
>
>> @@ -49,6 +50,15 @@ struct mlx5_cqe64 *mlx5e_get_cqe(struct mlx5e_cq *cq)
>>   return cqe;
>>  }
>>
>> +static inline bool mlx5e_no_channel_affinity_change(struct
>> +mlx5e_channel *c) {
>> + int current_cpu = smp_processor_id();
>> + struct irq_data *d = irq_desc_get_irq_data(c->irq_desc);
>> + struct cpumask *aff = irq_data_get_affinity_mask(d);
>> +
>> + return cpumask_test_cpu(current_cpu, aff); }
>
> This is so much pointer dereferencing and then a bitmask test as well.
>
> Are you really sure an extremely rare situation warrants this test every 
> single NAPI poll call?
>
> If this is a real problem, then every driver is susceptible to the issue and 
> it therefore warrants a generic solution.  And if we have generic 
> infrastructure for this situation in the code NAPI polling networking code, I 
> guarantee that it will probably be implemented much more cheaply than this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/2] mlx5 minor SRIOV fixes

2015-12-09 Thread Saeed Mahameed
On Wed, Dec 9, 2015 at 4:41 AM, David Miller  wrote:
>
> Don't do this, submitting two disconnected patch series for the same
> driver (mlx5), for the same tree (net-next).
>
For next time? Or do you want me to resubmit those two patch sets?

> Just submit them all in one batch.

In case of resubmission, which would be preferred: resubmit as V1 of
the larger series, or resubmit under a new topic?

Sorry if I am asking trivial questions.


[PATCH net-next 3/7] net/mlx5_core: Add flow steering lookup algorithms

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the flow steering mlx5_flow_namespace (Namespace)
and fs_prio (Flow Steering Priority) tree nodes.

Namespaces are used to isolate different usages or types of
steering (for example, downstream patches will add different
namespaces for the NIC driver and for E-Switch FDB usages).

Flow Steering Priorities are objects that describe the priority
ranges between different flow objects under the same namespace.

For example, entries in priority i are matched before entries
in priority i+1.

This patch adds the following algorithms:

1) Calculate level:
Each flow table has a level (the priority between the flow tables).
When we initialize the flow steering tree, we assign a range of levels
to each priority; the level of a new flow table is therefore its
location within the range of levels assigned to its priority.

2) Match between match criteria. This function is used to search
for a flow group when a new flow rule is added.

3) Match between match values. This function is used to search
for a flow table entry when a new flow rule is added.

4) Add essential macros for traversing a node's children,
e.g. traversing all the flow tables of some priority.
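
For illustration only, algorithms 1) and 3) can be sketched in user-space C
roughly as follows. This is a simplified sketch and not the patch code:
sketch_next_free_level and sketch_masked_equal are hypothetical names, and
flat arrays stand in for the kernel lists and the MLX5 layout macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Level of the next flow table in a priority: one past the level of the
 * last existing table, or the start of the priority's range when the
 * priority holds no table yet. table_levels is assumed to be sorted.
 */
unsigned int sketch_next_free_level(const unsigned int *table_levels,
                                    size_t n_tables,
                                    unsigned int start_level)
{
    assert(n_tables == 0 || table_levels != NULL);

    if (n_tables > 0)
        return table_levels[n_tables - 1] + 1;
    return start_level;
}

/*
 * Compare two match values byte by byte, looking only at the bits that
 * are set in mask. Returns true when the masked values are equal.
 */
bool sketch_masked_equal(const uint8_t *mask, const uint8_t *val1,
                         const uint8_t *val2, size_t size)
{
    size_t i;

    assert(size == 0 || (mask != NULL && val1 != NULL && val2 != NULL));

    for (i = 0; i < size; i++)
        if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
            return false;
    return true;
}
```

With a mask of {0xff, 0x0f}, the values {0x12, 0xa4} and {0x12, 0xb4} compare
as equal because they differ only in bits the mask ignores.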

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   93 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   41 +
 2 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 3c54d7b..cac0d15 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -114,3 +114,96 @@ static int tree_remove_node(struct fs_node *node)
tree_put_node(node);
return 0;
 }
+
+static struct fs_prio *find_prio(struct mlx5_flow_namespace *ns,
+unsigned int prio)
+{
+   struct fs_prio *iter_prio;
+
+   fs_for_each_prio(iter_prio, ns) {
+   if (iter_prio->prio == prio)
+   return iter_prio;
+   }
+
+   return NULL;
+}
+
+static unsigned int find_next_free_level(struct fs_prio *prio)
+{
+   if (!list_empty(&prio->node.children)) {
+   struct mlx5_flow_table *ft;
+
+   ft = list_last_entry(&prio->node.children,
+struct mlx5_flow_table,
+node.list);
+   return ft->level + 1;
+   }
+   return prio->start_level;
+}
+
+static bool masked_memcmp(void *mask, void *val1, void *val2, size_t size)
+{
+   unsigned int i;
+
+   for (i = 0; i < size; i++, mask++, val1++, val2++)
+   if ((*((u8 *)val1) & (*(u8 *)mask)) !=
+   ((*(u8 *)val2) & (*(u8 *)mask)))
+   return false;
+
+   return true;
+}
+
+static bool compare_match_value(struct mlx5_flow_group_mask *mask,
+   void *fte_param1, void *fte_param2)
+{
+   if (mask->match_criteria_enable &
+   1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_OUTER_HEADERS) {
+   void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
+   fte_param1, outer_headers);
+   void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
+   fte_param2, outer_headers);
+   void *fte_mask = MLX5_ADDR_OF(fte_match_param,
+ mask->match_criteria, outer_headers);
+
+   if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
+  MLX5_ST_SZ_BYTES(fte_match_set_lyr_2_4)))
+   return false;
+   }
+
+   if (mask->match_criteria_enable &
+   1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS) {
+   void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
+   fte_param1, misc_parameters);
+   void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
+   fte_param2, misc_parameters);
+   void *fte_mask = MLX5_ADDR_OF(fte_match_param,
+ mask->match_criteria, misc_parameters);
+
+   if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
+  MLX5_ST_SZ_BYTES(fte_match_set_misc)))
+   return false;
+   }
+
+   if (mask->match_criteria_enable &
+   1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_INNER_HEADERS) {
+   void *fte_match1 = ML

[PATCH net-next 1/7] net/mlx5_core: Introduce flow steering firmware commands

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce new Flow Steering (FS) firmware commands
in order to support the new flow steering infrastructure.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |  239 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |   65 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   81 +++
 include/linux/mlx5/fs.h   |   47 
 include/linux/mlx5/mlx5_ifc.h |   32 ++-
 6 files changed, 455 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
 create mode 100644 include/linux/mlx5/fs.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index a075591..be10592 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o sriov.o
+   mad.o transobj.o vport.o sriov.o fs_cmd.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
new file mode 100644
index 000..5096f4f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "fs_core.h"
+#include "fs_cmd.h"
+#include "mlx5_core.h"
+
+int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
+  enum fs_flow_table_type type, unsigned int level,
+  unsigned int log_size, unsigned int *table_id)
+{
+   u32 out[MLX5_ST_SZ_DW(create_flow_table_out)];
+   u32 in[MLX5_ST_SZ_DW(create_flow_table_in)];
+   int err;
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(create_flow_table_in, in, opcode,
+MLX5_CMD_OP_CREATE_FLOW_TABLE);
+
+   MLX5_SET(create_flow_table_in, in, table_type, type);
+   MLX5_SET(create_flow_table_in, in, level, level);
+   MLX5_SET(create_flow_table_in, in, log_size, log_size);
+
+   memset(out, 0, sizeof(out));
+   err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+sizeof(out));
+
+   if (!err)
+   *table_id = MLX5_GET(create_flow_table_out, out,
+table_id);
+   return err;
+}
+
+int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_flow_table_in)];
+   u32 out[MLX5_ST_SZ_DW(destroy_flow_table_out)];
+
+   memset(in, 0, sizeof(in));
+   memset(out,

[PATCH net-next 2/7] net/mlx5_core: Add flow steering base data structures

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the base data structure, and its operations, that will
represent ConnectX-4 Flow Steering. The data structure is basically
a tree, and all flow steering objects (Flow Table/Flow Group/FTE/etc.)
are represented as fs_node(s).

fs_node is the base object which describes a basic tree node, with the
following extra info:
type: describes the runtime type of the node (Object).
lock: lock this node sub-tree.
ref_count: number of children + current references.
remove_func: a generic destructor.

fs_node types will be used and explained once the usage is added in the
following patches.
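
As a rough illustration of the node life cycle described above, here is a
minimal single-threaded user-space sketch. It is not the patch code: the
sketch_* names are hypothetical, a plain int stands in for the atomic
refcount, and the per-node lock is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
    int type;                                  /* runtime type of the object */
    int refcount;                              /* children + current references */
    struct sketch_node *parent;
    struct sketch_node *root;
    void (*remove_func)(struct sketch_node *); /* generic destructor */
};

/* Counts destructor calls so the effect of sketch_put() can be observed. */
int sketch_removed;

void sketch_count_remove(struct sketch_node *node)
{
    (void)node;
    sketch_removed++;
}

void sketch_init(struct sketch_node *node, int type, int refcount,
                 void (*remove_func)(struct sketch_node *))
{
    node->type = type;
    node->refcount = refcount;
    node->parent = NULL;
    node->root = node;
    node->remove_func = remove_func;
}

/* Attach node under parent; every child holds one reference on its parent. */
void sketch_add(struct sketch_node *node, struct sketch_node *parent)
{
    if (parent)
        parent->refcount++;
    node->parent = parent;
    node->root = parent ? parent->root : node;
}

/* Drop one reference; on the last one, run the destructor and release
 * the reference this node held on its parent.
 */
void sketch_put(struct sketch_node *node)
{
    struct sketch_node *parent = node->parent;

    assert(node->refcount > 0);
    if (--node->refcount > 0)
        return;
    if (node->remove_func)
        node->remove_func(node);
    if (parent)
        sketch_put(parent);
}
```

In this sketch a parent cannot be destroyed while it still has children,
because each child keeps the parent's count above zero until the child
itself is released.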

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  116 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |6 +
 3 files changed, 123 insertions(+), 1 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index be10592..7fc5e23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o sriov.o fs_cmd.o
+   mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
new file mode 100644
index 000..3c54d7b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "mlx5_core.h"
+#include "fs_core.h"
+
+static void tree_init_node(struct fs_node *node,
+  unsigned int refcount,
+  void (*remove_func)(struct fs_node *))
+{
+   atomic_set(&node->refcount, refcount);
+   INIT_LIST_HEAD(&node->list);
+   INIT_LIST_HEAD(&node->children);
+   mutex_init(&node->lock);
+   node->remove_func = remove_func;
+}
+
+static void tree_add_node(struct fs_node *node, struct fs_node *parent)
+{
+   if (parent)
+   atomic_inc(&parent->refcount);
+   node->parent = parent;
+
+   /* Parent is the root */
+   if (!parent)
+   node->root = node;
+   else
+   node->root = parent->root;
+}
+
+static void tree_get_node(struct fs_node *node)
+{
+   atomic_inc(&node->refcount);
+}
+
+static void nested_lock_ref_node(struct fs_node *node)
+{
+   if (node) {
+   mutex_lock_nested(&node->lock, SINGLE_DEPTH_NESTING);
+   atomic_inc(&node->refcount);
+   }
+}
+
+static void lock_ref_node(struct fs_node *node)
+{
+   if (node) {
+   mutex_lock(&node->lock);
+   atomic_inc(&node->refcount);
+   }
+}
+
+static void unlock_ref_node(struct 

[PATCH net-next 4/7] net/mlx5_core: Introduce flow steering API

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the following objects:

mlx5_flow_root_namespace: represents the root of the tree for a specific
flow table type (e.g. NIC receive, FDB, etc.).

mlx5_flow_group: defines the mask of the flow specification.

fs_fte (flow steering flow table entry): defines the value of the
flow specification.

The following describes the relationships between the tree objects:
root_namespace --> priorities --> namespaces -->
priorities --> flow-tables --> flow-groups -->
flow-entries --> destinations

When we create a new object (flow table/flow group/flow table entry), we
call the FW command and then add the related SW object to the tree.

When we destroy an object, e.g. by calling mlx5_destroy_flow_table, we use
the tree node destructor to destroy the FW object and remove the
node from the tree.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  464 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   23 +
 2 files changed, 487 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index cac0d15..1828351 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -35,6 +35,12 @@
 
 #include "mlx5_core.h"
 #include "fs_core.h"
+#include "fs_cmd.h"
+
+static void del_rule(struct fs_node *node);
+static void del_flow_table(struct fs_node *node);
+static void del_flow_group(struct fs_node *node);
+static void del_fte(struct fs_node *node);
 
 static void tree_init_node(struct fs_node *node,
   unsigned int refcount,
@@ -207,3 +213,461 @@ static bool compare_match_criteria(u8 match_criteria_enable1,
return match_criteria_enable1 == match_criteria_enable2 &&
!memcmp(mask1, mask2, MLX5_ST_SZ_BYTES(fte_match_param));
 }
+
+static struct mlx5_flow_root_namespace *find_root(struct fs_node *node)
+{
+   struct fs_node *root;
+   struct mlx5_flow_namespace *ns;
+
+   root = node->root;
+
+   if (WARN_ON(root->type != FS_TYPE_NAMESPACE)) {
+   pr_warn("mlx5: flow steering node is not in tree or garbaged\n");
+   return NULL;
+   }
+
+   ns = container_of(root, struct mlx5_flow_namespace, node);
+   return container_of(ns, struct mlx5_flow_root_namespace, ns);
+}
+
+static inline struct mlx5_core_dev *get_dev(struct fs_node *node)
+{
+   struct mlx5_flow_root_namespace *root = find_root(node);
+
+   if (root)
+   return root->dev;
+   return NULL;
+}
+
+static void del_flow_table(struct fs_node *node)
+{
+   struct mlx5_flow_table *ft;
+   struct mlx5_core_dev *dev;
+   struct fs_prio *prio;
+   int err;
+
+   fs_get_obj(ft, node);
+   dev = get_dev(&ft->node);
+
+   err = mlx5_cmd_destroy_flow_table(dev, ft);
+   if (err)
+   pr_warn("flow steering can't destroy ft\n");
+   fs_get_obj(prio, ft->node.parent);
+   prio->num_ft--;
+}
+
+static void del_rule(struct fs_node *node)
+{
+   struct mlx5_flow_rule *rule;
+   struct mlx5_flow_table *ft;
+   struct mlx5_flow_group *fg;
+   struct fs_fte *fte;
+   u32 *match_value;
+   struct mlx5_core_dev *dev = get_dev(node);
+   int match_len = MLX5_ST_SZ_BYTES(fte_match_param);
+   int err;
+
+   match_value = mlx5_vzalloc(match_len);
+   if (!match_value) {
+   pr_warn("failed to allocate inbox\n");
+   return;
+   }
+
+   fs_get_obj(rule, node);
+   fs_get_obj(fte, rule->node.parent);
+   fs_get_obj(fg, fte->node.parent);
+   memcpy(match_value, fte->val, sizeof(fte->val));
+   fs_get_obj(ft, fg->node.parent);
+   list_del(&rule->node.list);
+   fte->dests_size--;
+   if (fte->dests_size) {
+   err = mlx5_cmd_update_fte(dev, ft,
+ fg->id, fte);
+   if (err)
+   pr_warn("%s can't del rule fg id=%d fte_index=%d\n",
+   __func__, fg->id, fte->index);
+   }
+   kvfree(match_value);
+}
+
+static void del_fte(struct fs_node *node)
+{
+   struct mlx5_flow_table *ft;
+   struct mlx5_flow_group *fg;
+   struct mlx5_core_dev *dev;
+   struct fs_fte *fte;
+   int err;
+
+   fs_get_obj(fte, node);
+   fs_get_obj(fg, fte->node.parent);
+   fs_get_obj(ft, fg->node.parent);
+
+   dev = get_dev(&ft->node);
+   err = mlx5_cmd_delete_fte(dev, ft,
+ fte-

[PATCH net-next 0/7] mlx5 improved flow steering management

2015-12-08 Thread Saeed Mahameed
Hi Dave,

This patch series modifies the driver's code that manages flow steering
rules with ConnectX-4 devices.

Basic introduction:

The flow steering device specification model is composed of the following
entities:

Destination (either a TIR/Flow table/vport), where a TIR is an RSS end-point
and a vport is the VF eSwitch port in SRIOV.

Flow table entry (FTE) - the values used by the flow specification  
Flow table group (FG) - the masks used by the flow specification
Flow table (FT) - groups several FGs and can serve as destination

The flow steering software entities:

In addition to the device objects, the software has two more objects:

Priorities - group several FTs. Handle the order of packet matching.

Namespaces - group several priorities. Namespaces are used to isolate
different usages of steering (for example, two separate namespaces,
one for the NIC driver and one for E-Switch FDB).

The base data structure for the flow steering management is a tree, and
all the flow steering objects (Namespace/Flow table/Flow Group/FTE/etc.)
are represented as nodes in the tree, e.g.:
Priority-0 -> FT1 -> FG -> FTE -> TIR (destination)
Priority-1 -> FT2 -> FG -> FTE -> TIR (destination)

Matching begins in FT1 flow rules and if there is a miss on all the FTEs 
then matching continues on the FTEs in FT2.
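
The fall-through on a miss can be sketched in user-space C as below. This is
only an illustration under simplifying assumptions, not the driver code: the
sketch_* names are hypothetical, the packet key is a single 32-bit word, and
the FG mask is folded into each entry instead of being a separate object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_fte {
    uint32_t mask;   /* bits to compare (the FG role) */
    uint32_t value;  /* expected value under the mask (the FTE role) */
    int dest;        /* destination identifier, e.g. a TIR number */
};

struct sketch_ft {
    const struct sketch_fte *entries;
    size_t n_entries;
};

/*
 * Walk the tables in priority order (index 0 is matched first). The first
 * entry whose masked value equals the masked key wins; a miss on all the
 * entries of one table continues with the next table. Returns -1 when no
 * table matches.
 */
int sketch_lookup(const struct sketch_ft *tables, size_t n_tables,
                  uint32_t key)
{
    size_t t, e;

    assert(n_tables == 0 || tables != NULL);

    for (t = 0; t < n_tables; t++) {
        for (e = 0; e < tables[t].n_entries; e++) {
            const struct sketch_fte *fte = &tables[t].entries[e];

            if ((key & fte->mask) == (fte->value & fte->mask))
                return fte->dest;
        }
    }
    return -1;
}
```

A key that misses every entry in the first table is still delivered if an
entry in the second table matches it, which is the FT1 -> FT2 behaviour
described above.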

The new implementation solves/improves the following
issues in the current code:

1) The new impl. supports multiple destinations; the search for an existing
   rule with the same matching value is performed by the flow steering
   management. In the current impl. the E-switch FDB management code needs
   to search for existing rules before calling the add rule function.

2) The new impl. manages the flow table level. In the current impl. the
   consumer states the flow table level when a new flow table is created,
   without any knowledge about the levels of other flow tables.

3) In the current impl. the consumer can't create or destroy flow
   groups dynamically; the flow groups are passed as an argument to the
   create flow table API. The new impl. exposes an API to create/destroy
   flow groups.

The series is built as follows:

Patch #1 adds the flow steering API firmware commands.

Patch #2 adds the tree operations of the flow steering tree: add/remove node,
initialize node and take a reference count on a node.

Patch #3 adds essential algorithms for managing the flow steering.

Patch #4 is the main patch of the series. It introduces the flow steering API.

Patch #5 initializes the flow steering tree. Flow steering initialization is
based on a static tree which illustrates the flow steering tree when the
driver is loaded.

Patch #6 exposes the new flow steering API and removes the old one.
The Ethernet flow steering follows the existing implementation,
but uses the new steering API.

Patch #7 renames en_flow_table.c to en_fs.c in order to be aligned with
the new flow steering files.


Maor Gottlieb (7):
  net/mlx5_core: Introduce flow steering firmware commands
  net/mlx5_core: Add flow steering base data structures
  net/mlx5_core: Add flow steering lookup algorithms
  net/mlx5_core: Introduce flow steering API
  net/mlx5_core: Flow steering tree initialization
  net/mlx5: Use flow steering infrastructure for mlx5_en
  net/mlx5e: Rename en_flow_table.c to en_fs.c

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   23 +-
 .../ethernet/mellanox/mlx5/core/en_flow_table.c| 1046 -
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 1224 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  291 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   15 +-
 .../net/ethernet/mellanox/mlx5/core/flow_table.c   |  422 ---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  239 
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h   |   65 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 1039 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  155 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |9 +
 include/linux/mlx5/driver.h|2 +
 include/linux/mlx5/flow_table.h|   63 -
 include/linux/mlx5/fs.h|   93 ++
 include/linux/mlx5/mlx5_ifc.h  |   32 +-
 17 files changed, 2922 insertions(+), 1804 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/flow_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
 create mode 100644 

[PATCH net-next 7/7] net/mlx5e: Rename en_flow_table.c to en_fs.c

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Rename en_flow_table.c to en_fs.c in order to be aligned
with the new flow steering files.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 .../ethernet/mellanox/mlx5/core/en_flow_table.c| 1224 
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 1224 
 3 files changed, 1225 insertions(+), 1225 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 11ee062..fe11e96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -4,5 +4,5 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
-   en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
+   en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
deleted file mode 100644
index 80d81ab..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
+++ /dev/null
@@ -1,1224 +0,0 @@
-/*
- * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "en.h"
-
-#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
-
-enum {
-   MLX5E_FULLMATCH = 0,
-   MLX5E_ALLMULTI  = 1,
-   MLX5E_PROMISC   = 2,
-};
-
-enum {
-   MLX5E_UC= 0,
-   MLX5E_MC_IPV4   = 1,
-   MLX5E_MC_IPV6   = 2,
-   MLX5E_MC_OTHER  = 3,
-};
-
-enum {
-   MLX5E_ACTION_NONE = 0,
-   MLX5E_ACTION_ADD  = 1,
-   MLX5E_ACTION_DEL  = 2,
-};
-
-struct mlx5e_eth_addr_hash_node {
-   struct hlist_node  hlist;
-   u8 action;
-   struct mlx5e_eth_addr_info ai;
-};
-
-static inline int mlx5e_hash_eth_addr(u8 *addr)
-{
-   return addr[5];
-}
-
-static void mlx5e_add_eth_addr_to_hash(struct hlist_head *hash, u8 *addr)
-{
-   struct mlx5e_eth_addr_hash_node *hn;
-   int ix = mlx5e_hash_eth_addr(addr);
-   int found = 0;
-
-   hlist_for_each_entry(hn, &hash[ix], hlist)
-   if (ether_addr_equal_64bits(hn->ai.addr, addr)) {
-   found = 1;
-   break;
-   }
-
-   if (found) {
-   hn->action = MLX5E_ACTION_NONE;
-   return;
-   }
-
-   hn = kzalloc(sizeof(*hn), GFP_ATOMIC);
-   if (!hn)
-   return;
-
-   ether_addr_copy(hn->ai.addr, addr);
-   hn->action = MLX5E_ACTION_ADD;
-
-   hlist_add_head(&hn->hlist, &hash[ix]);
-}
-
-static void mlx5e_del_eth_addr_from_hash(struct mlx5e_eth_addr_hash_node *hn)
-{
-   hlist_del(&hn->hlist);
-   kfree(hn);
-}
-
-static void mlx5e_del_eth_addr_from_flow_table(struct mlx5e_priv *priv,
-  struct mlx5e_eth_ad

[PATCH net-next 5/7] net/mlx5_core: Flow steering tree initialization

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Flow steering initialization is based on a static tree which
illustrates the flow steering tree when the driver is loaded. The
initialization considers the max supported flow table level of the device;
a minimum of 2 kernel flow tables (vlan and mac) is required to have
kernel flow table functionality.

The tree structure when the driver is loaded:

root_namespace(receive nic)
  |
priority-0 (kernel priority)
  |
namespace(kernel namespace)
  |
priority-0 (flow tables priority)

In the following patches, when the EN driver uses the flow steering
API, it creates two flow tables and their flow groups under
priority-0 (flow tables priority).
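
As a rough user-space illustration of describing such a tree statically, the
sketch below builds the same four-node shape from nested compound literals.
It is not the patch code: all sketch_* and SKETCH_* names are hypothetical,
the level/max_ft fields are omitted, and a separate leaf macro is assumed so
that no empty initializer list is needed.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_type { SKETCH_NS, SKETCH_PRIO };

struct sketch_init_node {
    enum sketch_type type;
    const struct sketch_init_node *children;
    size_t n_children;
};

/* Number of child initializers passed to the macro. */
#define SKETCH_COUNT(...) \
    (sizeof((struct sketch_init_node[]){ __VA_ARGS__ }) / \
     sizeof(struct sketch_init_node))

/* Inner node: the children are the remaining macro arguments. */
#define SKETCH_NODE(node_type, ...) { \
    .type = (node_type), \
    .children = (const struct sketch_init_node[]){ __VA_ARGS__ }, \
    .n_children = SKETCH_COUNT(__VA_ARGS__) }

/* Leaf node: no children. */
#define SKETCH_LEAF(node_type) { \
    .type = (node_type), .children = NULL, .n_children = 0 }

/* root namespace -> priority-0 -> namespace -> priority-0 */
const struct sketch_init_node sketch_root =
    SKETCH_NODE(SKETCH_NS,
        SKETCH_NODE(SKETCH_PRIO,
            SKETCH_NODE(SKETCH_NS,
                SKETCH_LEAF(SKETCH_PRIO))));

/* Count the nodes of the static description, including the root. */
size_t sketch_count_nodes(const struct sketch_init_node *node)
{
    size_t i, total = 1;

    assert(node != NULL);
    for (i = 0; i < node->n_children; i++)
        total += sketch_count_nodes(&node->children[i]);
    return total;
}
```

An initialization routine can then walk such a description recursively and
create the runtime namespace and priority objects for each node it visits.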

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  366 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |4 +
 include/linux/mlx5/driver.h   |2 +
 include/linux/mlx5/fs.h   |8 +
 4 files changed, 380 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 1828351..4a83632 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -37,6 +37,54 @@
 #include "fs_core.h"
 #include "fs_cmd.h"
 
+#define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct init_tree_node[]){__VA_ARGS__}) /\
+sizeof(struct init_tree_node))
+
+#define INIT_PRIO(min_level_val, max_ft_val,\
+ start_level_val, ...) {.type = FS_TYPE_PRIO,\
+   .min_ft_level = min_level_val,\
+   .start_level = start_level_val,\
+   .max_ft = max_ft_val,\
+   .children = (struct init_tree_node[]) {__VA_ARGS__},\
+   .ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
+}
+
+#define ADD_PRIO(min_level_val, max_ft_val, start_level_val, ...)\
+   INIT_PRIO(min_level_val, max_ft_val, start_level_val,\
+ __VA_ARGS__)\
+
+#define ADD_FT_PRIO(max_ft_val, start_level_val, ...)\
+   INIT_PRIO(0, max_ft_val, start_level_val,\
+ __VA_ARGS__)\
+
+#define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
+   .children = (struct init_tree_node[]) {__VA_ARGS__},\
+   .ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
+}
+
+#define KERNEL_START_LEVEL 0
+#define KERNEL_P0_START_LEVEL KERNEL_START_LEVEL
+#define KERNEL_MAX_FT 2
+#define KENREL_MIN_LEVEL 2
+static struct init_tree_node {
+   enum fs_node_type   type;
+   struct init_tree_node *children;
+   int ar_size;
+   int min_ft_level;
+   int prio;
+   int max_ft;
+   int start_level;
+} root_fs = {
+   .type = FS_TYPE_NAMESPACE,
+   .ar_size = 1,
+   .children = (struct init_tree_node[]) {
+   ADD_PRIO(KENREL_MIN_LEVEL, KERNEL_MAX_FT,
+KERNEL_START_LEVEL,
+ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT,
+   KERNEL_P0_START_LEVEL))),
+   }
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -671,3 +719,321 @@ static void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
mlx5_core_warn(get_dev(&fg->node), "Flow group %d wasn't destroyed, refcount > 1\n",
   fg->id);
 }
+
+static struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
+  enum mlx5_flow_namespace_type type)
+{
+   struct mlx5_flow_root_namespace *root_ns = dev->priv.root_ns;
+   int prio;
+   static struct fs_prio *fs_prio;
+   struct mlx5_flow_namespace *ns;
+
+   if (!root_ns)
+   return NULL;
+
+   switch (type) {
+   case MLX5_FLOW_NAMESPACE_KERNEL:
+   prio = 0;
+   break;
+   case MLX5_FLOW_NAMESPACE_FDB:
+   if (dev->priv.fdb_root_ns)
+   return &dev->priv.fdb_root_ns->ns;
+   else
+   return NULL;
+   default:
+   return NULL;
+   }
+
+   fs_prio = find_prio(&root_ns->ns, prio);
+   if (!fs_prio)
+   return NULL;
+
+   ns = list_first_entry(&fs_prio->node.children,
+ typeof(*ns),
+ node.list);
+
+   return ns;
+}
+
+static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
+  

[PATCH net-next 6/7] net/mlx5: Use flow steering infrastructure for mlx5_en

2015-12-08 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Expose the new flow steering API and remove the old
one.

A few changes are required:

1. The Ethernet flow steering follows the existing implementation, but uses
the new steering API. The old flow steering implementation is removed.

2. Move the E-switch FDB management to use the new API.

3. When the driver is loaded, call mlx5_init_fs, which initializes
the flow steering tree structure and opens namespaces for NIC receive
and for E-switch FDB.

4. Call mlx5_cleanup_fs when the driver is unloaded.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   23 +-
 .../ethernet/mellanox/mlx5/core/en_flow_table.c|  824 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  291 ++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   15 +-
 .../net/ethernet/mellanox/mlx5/core/flow_table.c   |  422 --
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   26 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |9 +
 include/linux/mlx5/flow_table.h|   63 --
 include/linux/mlx5/fs.h|   38 +
 11 files changed, 633 insertions(+), 1082 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/flow_table.c
 delete mode 100644 include/linux/mlx5/flow_table.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 7fc5e23..11ee062 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -3,6 +3,6 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
-mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
+mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 89313d4..f689ce5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -64,6 +64,8 @@
 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
 #define MLX5E_SQ_BF_BUDGET 16
 
+#define MLX5E_NUM_MAIN_GROUPS 9
+
 static const char vport_strings[][ETH_GSTRING_LEN] = {
/* vport statistics */
"rx_packets",
@@ -442,7 +444,7 @@ enum mlx5e_rqt_ix {
 struct mlx5e_eth_addr_info {
u8  addr[ETH_ALEN + 2];
u32 tt_vec;
-   u32 ft_ix[MLX5E_NUM_TT]; /* flow table index per traffic type */
+   struct mlx5_flow_rule *ft_rule[MLX5E_NUM_TT];
 };
 
 #define MLX5E_ETH_ADDR_HASH_SIZE (1 << BITS_PER_BYTE)
@@ -466,15 +468,22 @@ enum {
 
 struct mlx5e_vlan_db {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
-   u32   active_vlans_ft_ix[VLAN_N_VID];
-   u32   untagged_rule_ft_ix;
-   u32   any_vlan_rule_ft_ix;
+   struct mlx5_flow_rule   *active_vlans_rule[VLAN_N_VID];
+   struct mlx5_flow_rule   *untagged_rule;
+   struct mlx5_flow_rule   *any_vlan_rule;
bool  filter_disabled;
 };
 
 struct mlx5e_flow_table {
-   void *vlan;
-   void *main;
+   int num_groups;
+   struct mlx5_flow_table  *t;
+   struct mlx5_flow_group  **g;
+};
+
+struct mlx5e_flow_tables {
+   struct mlx5_flow_namespace  *ns;
+   struct mlx5e_flow_table vlan;
+   struct mlx5e_flow_table main;
 };
 
 struct mlx5e_priv {
@@ -497,7 +506,7 @@ struct mlx5e_priv {
u32rqtn[MLX5E_NUM_RQT];
u32tirn[MLX5E_NUM_TT];
 
-   struct mlx5e_flow_tableft;
+   struct mlx5e_flow_tables   fts;
struct mlx5e_eth_addr_db   eth_addr;
struct mlx5e_vlan_db   vlan;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
index 5b93c9c..80d81ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
@@ -34,9 +34,11 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "en.h"
 
+#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
+
 enum {
MLX5E_FULLMATCH = 0,
MLX5E_ALLMULTI  = 1,
@@ -103,44 +105,38 @@ static void mlx5e_del_eth_addr_from

[PATCH net-next 1/2] net/mlx5: Fix query E-Switch capabilities

2015-12-08 Thread Saeed Mahameed
E-Switch capabilities should be queried only if the E-Switch flow table
is supported, not merely when the device is a vport group manager.

Fixes: d753c6e8 ("net/mlx5: E-Switch, Introduce HCA cap and E-Switch vport context")
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 1c9f9a5..aa1ab47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -173,7 +173,7 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
-   if (MLX5_CAP_GEN(dev, vport_group_manager)) {
+   if (MLX5_CAP_GEN(dev, eswitch_flow_table)) {
err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH,
 HCA_CAP_OPMOD_GET_CUR);
if (err)
-- 
1.7.1



[PATCH net-next 0/2] mlx5 minor SRIOV fixes

2015-12-08 Thread Saeed Mahameed
Hi Dave,

This short series fixes some minor issues in recently
introduced SRIOV code.
 
Saeed.

Saeed Mahameed (2):
  net/mlx5: Fix query E-Switch capabilities
  net/mlx5e: Assign random MAC address if needed

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |5 +
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   |2 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|3 +++
 3 files changed, 9 insertions(+), 1 deletions(-)



[PATCH net-next 2/2] net/mlx5e: Assign random MAC address if needed

2015-12-08 Thread Saeed Mahameed
Under SRIOV there might be a case where VFs are loaded
without a pre-assigned MAC address. In this case, the VF
will randomize its own MAC. This addresses the case where
the administrator did not assign a MAC to the VF through
the PF OS APIs, and keeps udev happy.
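
The decision described above can be sketched in plain userspace C. This is
only an illustration, not the driver code: mac_is_zero(), mac_randomize()
and mac_assign_if_needed() are hypothetical names, and rand() stands in for
the kernel's random source. The two bit operations reflect the usual
convention for a generated address (unicast, locally administered).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* True when all six octets are zero, i.e. no MAC was assigned. */
static bool mac_is_zero(const uint8_t addr[ETH_ALEN])
{
	assert(addr != NULL);
	for (int i = 0; i < ETH_ALEN; i++)
		if (addr[i] != 0)
			return false;
	return true;
}

/*
 * Fill addr with a random unicast, locally administered address.
 * rand() is an assumption standing in for a proper random source.
 */
static void mac_randomize(uint8_t addr[ETH_ALEN])
{
	assert(addr != NULL);
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast (group) bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/*
 * Hypothetical helper: randomize only when no MAC was assigned and the
 * function is not the vport group manager (the PF).
 * Returns true when a random address was written.
 */
static bool mac_assign_if_needed(uint8_t addr[ETH_ALEN], bool is_group_manager)
{
	if (!mac_is_zero(addr) || is_group_manager)
		return false;
	mac_randomize(addr);
	return true;
}
```

Because the locally administered bit is always set, the generated address
can never be all zeroes, so the check does not fire a second time.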

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |5 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|3 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d67058a..a20be56 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2103,6 +2103,11 @@ static void mlx5e_set_netdev_dev_addr(struct net_device *netdev)
struct mlx5e_priv *priv = netdev_priv(netdev);
 
mlx5_query_nic_vport_mac_address(priv->mdev, 0, netdev->dev_addr);
+   if (is_zero_ether_addr(netdev->dev_addr) &&
+   !MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
+   eth_hw_addr_random(netdev);
+   mlx5_core_info(priv->mdev, "Assigned random MAC address %pM\n", netdev->dev_addr);
+   }
 }
 
 static void mlx5e_build_netdev(struct net_device *netdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index bee7da8..ea6a137 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -65,6 +65,9 @@ do {  \
(__dev)->priv.name, __func__, __LINE__, current->pid,   \
##__VA_ARGS__)
 
+#define mlx5_core_info(__dev, format, ...) \
+   dev_info(&(__dev)->pdev->dev, format, ##__VA_ARGS__)
+
 enum {
MLX5_CMD_DATA, /* print command payload only */
MLX5_CMD_TIME, /* print command execution time */
-- 
1.7.1



[PATCH net-next 2/2] net/mlx5_en: Add HW timestamping (TS) support

2015-12-16 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add support for enabling/disabling HW timestamping for incoming and/or
outgoing packets. It adds and initializes all structs and callbacks
needed by the kernel TS API.  To enable/disable HW timestamping, the
appropriate ioctl should be used.  Currently only HWTSTAMP_FILTER_ALL/NONE
and HWTSTAMP_TX_ON/OFF are supported.  Make all relevant changes in the
RX/TX flows to consider TS requests and plant HW timestamps into the
relevant structures.

Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter, since every received packet needs to call
timecounter_cyc2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

This driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   25 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  226 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  103 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   14 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 ++-
 include/linux/mlx5/mlx5_ifc.h  |5 +-
 12 files changed, 462 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 158c88c..c503ea0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -13,6 +13,7 @@ config MLX5_CORE
 config MLX5_CORE_EN
bool "Mellanox Technologies ConnectX-4 Ethernet support"
depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE
+   select PTP_1588_CLOCK
default n
---help---
  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fe11e96..01c0256 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -5,4 +5,4 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
-   en_txrx.o
+   en_txrx.o en_clock.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f689ce5..7634eb2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -32,6 +32,9 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -63,6 +66,7 @@
 #define MLX5E_TX_CQ_POLL_BUDGET128
 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
 #define MLX5E_SQ_BF_BUDGET 16
+#define MLX5E_SERVICE_TASK_DELAY   (HZ / 4)
 
 #define MLX5E_NUM_MAIN_GROUPS 9
 
@@ -486,6 +490,18 @@ struct mlx5e_flow_tables {
struct mlx5e_flow_table main;
 };
 
+struct mlx5e_tstamp {
+   rwlock_t   lock;
+   struct cyclecountercycles;
+   struct timecounter clock;
+   struct ptp_clock  *ptp;
+   struct ptp_clock_info  ptp_info;
+   struct hwtstamp_config hwtstamp_config;
+   u32nominal_c_mult;
+   unsigned long  last_overflow_check;
+   unsigned long  overflow_period;
+};
+
 struct mlx5e_priv {
/* priv data path fields - start */
intdefault_vlan_prio;
@@ -515,10 +531,12 @@ struct mlx5e_priv {
struct work_struct update_carrier_work;
struct work_struct set_rx_mode_work;
struct delayed_workupdate_stats_work;
+   struct delayed_workservice_task;
 
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
+   struct mlx5e_tstamptstamp;
 };
 
 #define MLX5E_NET_IP_ALIGN 2
@@ -585,6 +603,13 @@ void mlx5e_destroy_flow_tables(struct mlx5e_priv *priv);
 void mlx5e_init_eth_addr(struct mlx5e_priv *priv);
 

[PATCH net-next 0/2] Introduce mlx5 ethernet timestamping

2015-12-16 Thread Saeed Mahameed
Hi Dave,

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

The first patch fixes a bug in the SKB data pointer handling in the device
xmit function. The second patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability

Added new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization.
- PTP registration and cleanup.
- PTP callbacks implementation.

Added the needed ioctl for setting/getting the current timestamping
configuration, and used this configuration in RX/TX data path to 
fill the SKB with the timestamp.

Eran Ben Elisha (2):
  net/mlx5_en: Restore the skb data pointer after xmit is finished
  net/mlx5_en: Add HW timestamping (TS) support

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   25 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  226 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  103 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   16 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 ++-
 include/linux/mlx5/mlx5_ifc.h  |5 +-
 12 files changed, 464 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c



[PATCH net-next 1/2] net/mlx5_en: Restore the skb data pointer after xmit is finished

2015-12-16 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Restore the skb data pointer after copying the data to the HW, so the skb
can be cloned with correct headers for future use (e.g. timestamping).
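
The save/restore pattern can be modelled with a toy buffer. This is a
minimal sketch, assuming a much simplified stand-in for struct sk_buff;
toy_skb, toy_pull(), toy_push() and toy_xmit() are hypothetical names that
only mimic the pointer arithmetic of skb_pull() and skb_push().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the data pointer and length. */
struct toy_skb {
	unsigned char *head;	/* start of the backing buffer */
	unsigned char *data;	/* start of the current payload */
	size_t len;		/* bytes from data to the end of the payload */
};

/* Consume n bytes from the front, similar in spirit to skb_pull(). */
static unsigned char *toy_pull(struct toy_skb *skb, size_t n)
{
	assert(skb && n <= skb->len);
	skb->data += n;
	skb->len -= n;
	return skb->data;
}

/* Give n bytes back at the front, similar in spirit to skb_push(). */
static unsigned char *toy_push(struct toy_skb *skb, size_t n)
{
	assert(skb && (size_t)(skb->data - skb->head) >= n);
	skb->data -= n;
	skb->len += n;
	return skb->data;
}

/* Transmit sketch: remember data, consume the headers, then restore. */
static void toy_xmit(struct toy_skb *skb, size_t inline_hdr_len)
{
	unsigned char *data_orig = skb->data;

	toy_pull(skb, inline_hdr_len);	/* headers were copied to the "HW" */
	/* Undo however many bytes were consumed since data_orig was saved. */
	toy_push(skb, (size_t)(skb->data - data_orig));
}
```

Computing the push length as the pointer difference means the restore stays
correct no matter how many bytes the transmit path consumed.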

Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 1341b1d..0fcfe64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -165,6 +165,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
struct mlx5_wqe_data_seg *dseg;
 
+   unsigned char *skb_data_orig = skb->data;
u8  opcode = MLX5_OPCODE_SEND;
dma_addr_t dma_addr = 0;
bool bf = false;
@@ -263,6 +264,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | opcode);
cseg->qpn_ds   = cpu_to_be32((sq->sqn << 8) | ds_cnt);
 
+   skb_push(skb, skb->data - skb_data_orig);
sq->skb[pi] = skb;
 
MLX5E_TX_SKB_CB(skb)->num_wqebbs = DIV_ROUND_UP(ds_cnt,
-- 
1.7.1



[PATCH net-next V1 7/9] net/mlx5_core: Flow steering tree initialization

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Flow steering initialization is based on a static tree which
illustrates the flow steering tree when the driver is loaded. The
initialization considers the max supported flow table level of the device;
a minimum of 2 kernel flow tables (vlan and mac) is required to have
kernel flow table functionality.

The tree structure when the driver is loaded:

root_namespace(receive nic)
  |
priority-0 (kernel priority)
  |
namespace(kernel namespace)
  |
priority-0 (flow tables priority)

In the following patches, when the EN driver uses the flow steering
API, it creates two flow tables and their flow groups under
priority-0 (flow tables priority).
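
The static-tree technique (nested compound literals plus a sizeof-based
child count) can be shown in isolation. The sketch below is standalone C
with hypothetical names (tree_node, NODE, LEAF, NODE_COUNT, sketch_root); it
mirrors the four-level shape drawn above but is not the patch's
init_tree_node definition, and it uses an explicit LEAF macro so that no
macro is invoked with empty variadic arguments.

```c
#include <assert.h>
#include <stddef.h>

enum node_type { NODE_NAMESPACE, NODE_PRIO };

struct tree_node {
	enum node_type type;
	const struct tree_node *children;
	int num_children;
};

/* Count the elements of a tree_node initializer list at compile time. */
#define NODE_COUNT(...) \
	(sizeof((struct tree_node[]){__VA_ARGS__}) / sizeof(struct tree_node))

/* Interior node: the children live in an anonymous static array. */
#define NODE(node_type_val, ...) {				\
	.type = (node_type_val),				\
	.children = (struct tree_node[]){__VA_ARGS__},		\
	.num_children = NODE_COUNT(__VA_ARGS__),		\
}

/* Leaf node: no children. */
#define LEAF(node_type_val) { .type = (node_type_val) }

/* root namespace -> priority -> namespace -> priority */
static const struct tree_node sketch_root =
	NODE(NODE_NAMESPACE,
	     NODE(NODE_PRIO,
		  NODE(NODE_NAMESPACE,
		       LEAF(NODE_PRIO))));

/* Number of levels on the longest path from n down to a leaf. */
static int tree_depth(const struct tree_node *n)
{
	int best = 0;

	assert(n != NULL);
	for (int i = 0; i < n->num_children; i++) {
		int d = tree_depth(&n->children[i]);

		if (d > best)
			best = d;
	}
	return best + 1;
}
```

Declaring the tree this way keeps its shape visible in one place; an init
routine can then walk it and create the runtime objects level by level.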

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  374 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |4 +
 include/linux/mlx5/driver.h   |2 +
 include/linux/mlx5/fs.h   |8 +
 4 files changed, 388 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 1828351..4264e8b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -37,6 +37,54 @@
 #include "fs_core.h"
 #include "fs_cmd.h"
 
+#define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct init_tree_node[]){__VA_ARGS__}) /\
+sizeof(struct init_tree_node))
+
+#define INIT_PRIO(min_level_val, max_ft_val,\
+ start_level_val, ...) {.type = FS_TYPE_PRIO,\
+   .min_ft_level = min_level_val,\
+   .start_level = start_level_val,\
+   .max_ft = max_ft_val,\
+   .children = (struct init_tree_node[]) {__VA_ARGS__},\
+   .ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
+}
+
+#define ADD_PRIO(min_level_val, max_ft_val, start_level_val, ...)\
+   INIT_PRIO(min_level_val, max_ft_val, start_level_val,\
+ __VA_ARGS__)\
+
+#define ADD_FT_PRIO(max_ft_val, start_level_val, ...)\
+   INIT_PRIO(0, max_ft_val, start_level_val,\
+ __VA_ARGS__)\
+
+#define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
+   .children = (struct init_tree_node[]) {__VA_ARGS__},\
+   .ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
+}
+
+#define KERNEL_START_LEVEL 0
+#define KERNEL_P0_START_LEVEL KERNEL_START_LEVEL
+#define KERNEL_MAX_FT 2
+#define KENREL_MIN_LEVEL 2
+static struct init_tree_node {
+   enum fs_node_type   type;
+   struct init_tree_node *children;
+   int ar_size;
+   int min_ft_level;
+   int prio;
+   int max_ft;
+   int start_level;
+} root_fs = {
+   .type = FS_TYPE_NAMESPACE,
+   .ar_size = 1,
+   .children = (struct init_tree_node[]) {
+   ADD_PRIO(KENREL_MIN_LEVEL, KERNEL_MAX_FT,
+KERNEL_START_LEVEL,
+ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT,
+   KERNEL_P0_START_LEVEL))),
+   }
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -671,3 +719,329 @@ static void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
mlx5_core_warn(get_dev(&fg->node), "Flow group %d wasn't destroyed, refcount > 1\n",
   fg->id);
 }
+
+static struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
+  enum mlx5_flow_namespace_type type)
+{
+   struct mlx5_flow_root_namespace *root_ns = dev->priv.root_ns;
+   int prio;
+   static struct fs_prio *fs_prio;
+   struct mlx5_flow_namespace *ns;
+
+   if (!root_ns)
+   return NULL;
+
+   switch (type) {
+   case MLX5_FLOW_NAMESPACE_KERNEL:
+   prio = 0;
+   break;
+   case MLX5_FLOW_NAMESPACE_FDB:
+   if (dev->priv.fdb_root_ns)
+   return &dev->priv.fdb_root_ns->ns;
+   else
+   return NULL;
+   default:
+   return NULL;
+   }
+
+   fs_prio = find_prio(&root_ns->ns, prio);
+   if (!fs_prio)
+   return NULL;
+
+   ns = list_first_entry(&fs_prio->node.children,
+ typeof(*ns),
+ node.list);
+
+   return ns;
+}
+
+static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
+  

[PATCH net-next V1 6/9] net/mlx5_core: Introduce flow steering API

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the following objects:

mlx5_flow_root_namespace: represents the root of a specific flow table
type tree (e.g. NIC receive, FDB, etc.)

mlx5_flow_group: define the mask of the flow specification.

fs_fte(flow steering flow table entry): defines the value of the
flow specification.

The following describes the relationships between the tree objects:
root_namespace --> priorities -->namespaces -->
priorities -->flow-tables --> flow-groups -->
flow-entries --> destinations

When we create a new object (flow table/flow group/flow table entry), we
call the FW command and then add the related SW object to the tree.

When we destroy an object, e.g. by calling mlx5_destroy_flow_table, we use
the tree node destructor to destroy the FW object and remove the
node from the tree.
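
The destructor-on-removal idea can be sketched with a reference-counted
node. This is an illustrative model only, with hypothetical names
(sketch_node, node_init, node_put, count_remove); it assumes each child
holds one reference on its parent, and it does not reproduce the fs_node
layout or the locking of the real code.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *parent;
	int refcount;
	/* Called once, when the last reference is dropped. */
	void (*remove)(struct sketch_node *node);
};

/* Initialize a node with one reference owned by the caller. */
static void node_init(struct sketch_node *n, struct sketch_node *parent,
		      void (*remove)(struct sketch_node *))
{
	assert(n != NULL);
	n->parent = parent;
	n->refcount = 1;
	n->remove = remove;
	if (parent)
		parent->refcount++;	/* a child pins its parent */
}

/*
 * Drop one reference. When a node reaches zero its destructor runs
 * (this is where a FW destroy command would be issued), and the
 * reference it held on its parent is dropped in turn.
 */
static void node_put(struct sketch_node *n)
{
	while (n && --n->refcount == 0) {
		struct sketch_node *parent = n->parent;

		if (n->remove)
			n->remove(n);
		n = parent;
	}
}

/* Example destructor: only counts how many nodes were torn down. */
static int removed_count;

static void count_remove(struct sketch_node *n)
{
	(void)n;
	removed_count++;
}
```

With this scheme a flow table that still has groups underneath it survives
the caller's put and is torn down only after its last child goes away.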

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  464 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   23 +
 2 files changed, 487 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index cac0d15..1828351 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -35,6 +35,12 @@
 
 #include "mlx5_core.h"
 #include "fs_core.h"
+#include "fs_cmd.h"
+
+static void del_rule(struct fs_node *node);
+static void del_flow_table(struct fs_node *node);
+static void del_flow_group(struct fs_node *node);
+static void del_fte(struct fs_node *node);
 
 static void tree_init_node(struct fs_node *node,
   unsigned int refcount,
@@ -207,3 +213,461 @@ static bool compare_match_criteria(u8 match_criteria_enable1,
return match_criteria_enable1 == match_criteria_enable2 &&
!memcmp(mask1, mask2, MLX5_ST_SZ_BYTES(fte_match_param));
 }
+
+static struct mlx5_flow_root_namespace *find_root(struct fs_node *node)
+{
+   struct fs_node *root;
+   struct mlx5_flow_namespace *ns;
+
+   root = node->root;
+
+   if (WARN_ON(root->type != FS_TYPE_NAMESPACE)) {
+   pr_warn("mlx5: flow steering node is not in tree or garbaged\n");
+   return NULL;
+   }
+
+   ns = container_of(root, struct mlx5_flow_namespace, node);
+   return container_of(ns, struct mlx5_flow_root_namespace, ns);
+}
+
+static inline struct mlx5_core_dev *get_dev(struct fs_node *node)
+{
+   struct mlx5_flow_root_namespace *root = find_root(node);
+
+   if (root)
+   return root->dev;
+   return NULL;
+}
+
+static void del_flow_table(struct fs_node *node)
+{
+   struct mlx5_flow_table *ft;
+   struct mlx5_core_dev *dev;
+   struct fs_prio *prio;
+   int err;
+
+   fs_get_obj(ft, node);
+   dev = get_dev(&ft->node);
+
+   err = mlx5_cmd_destroy_flow_table(dev, ft);
+   if (err)
+   pr_warn("flow steering can't destroy ft\n");
+   fs_get_obj(prio, ft->node.parent);
+   prio->num_ft--;
+}
+
+static void del_rule(struct fs_node *node)
+{
+   struct mlx5_flow_rule *rule;
+   struct mlx5_flow_table *ft;
+   struct mlx5_flow_group *fg;
+   struct fs_fte *fte;
+   u32 *match_value;
+   struct mlx5_core_dev *dev = get_dev(node);
+   int match_len = MLX5_ST_SZ_BYTES(fte_match_param);
+   int err;
+
+   match_value = mlx5_vzalloc(match_len);
+   if (!match_value) {
+   pr_warn("failed to allocate inbox\n");
+   return;
+   }
+
+   fs_get_obj(rule, node);
+   fs_get_obj(fte, rule->node.parent);
+   fs_get_obj(fg, fte->node.parent);
+   memcpy(match_value, fte->val, sizeof(fte->val));
+   fs_get_obj(ft, fg->node.parent);
+   list_del(&rule->node.list);
+   fte->dests_size--;
+   if (fte->dests_size) {
+   err = mlx5_cmd_update_fte(dev, ft,
+ fg->id, fte);
+   if (err)
+   pr_warn("%s can't del rule fg id=%d fte_index=%d\n",
+   __func__, fg->id, fte->index);
+   }
+   kvfree(match_value);
+}
+
+static void del_fte(struct fs_node *node)
+{
+   struct mlx5_flow_table *ft;
+   struct mlx5_flow_group *fg;
+   struct mlx5_core_dev *dev;
+   struct fs_fte *fte;
+   int err;
+
+   fs_get_obj(fte, node);
+   fs_get_obj(fg, fte->node.parent);
+   fs_get_obj(ft, fg->node.parent);
+
+   dev = get_dev(&ft->node);
+   err = mlx5_cmd_delete_fte(dev, ft,
+ fte-

[PATCH net-next V1 0/9] mlx5 improved flow steering management

2015-12-10 Thread Saeed Mahameed
Hi Dave,

The first two patches fix some minor issues in the recently
introduced SRIOV code.

The other seven patches modify the driver's code that
manages flow steering rules with ConnectX-4 devices.

Basic introduction:

The flow steering device specification model is composed of the following
entities:

Destination (either a TIR/Flow table/vport), where TIR is the RSS end-point
and vport is the VF eSwitch port in SRIOV.

Flow table entry (FTE) - the values used by the flow specification
Flow table group (FG) - the masks used by the flow specification
Flow table (FT) - groups several FGs and can serve as a destination

The flow steering software entities:

In addition to the device objects, the software has two more objects:

Priorities - group several FTs. Handle the order of packet matching.

Namespaces - group several priorities. Namespaces are used in order to
isolate different usages of steering (for example, add two separate
namespaces, one for the NIC driver and one for E-Switch FDB).

The base data structure for the flow steering management is a tree, and
all the flow steering objects (Namespace/Flow table/Flow Group/FTE/etc.)
are represented as nodes in the tree, e.g.:
Priority-0 -> FT1 -> FG -> FTE -> TIR (destination)
Priority-1 -> FT2 -> FG -> FTE -> TIR (destination)

Matching begins in FT1 flow rules and if there is a miss on all the FTEs 
then matching continues on the FTEs in FT2.
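
The matching order described above can be sketched as follows. This is a
simplified software model with hypothetical types (sketch_fte, sketch_group,
sketch_table) and a 32-bit key standing in for the full match parameters;
the real matching is done by the device, not by driver code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FTE: the value to match and where a hit is sent. */
struct sketch_fte {
	uint32_t value;
	int dest;
};

/* FG: one mask shared by all of its FTEs. */
struct sketch_group {
	uint32_t mask;
	const struct sketch_fte *ftes;
	size_t num_ftes;
};

/* FT: a list of groups. */
struct sketch_table {
	const struct sketch_group *groups;
	size_t num_groups;
};

/* Return the destination of the first matching FTE, or -1 on a miss. */
static int table_lookup(const struct sketch_table *ft, uint32_t key)
{
	for (size_t g = 0; g < ft->num_groups; g++) {
		const struct sketch_group *fg = &ft->groups[g];

		for (size_t e = 0; e < fg->num_ftes; e++)
			if ((key & fg->mask) == fg->ftes[e].value)
				return fg->ftes[e].dest;
	}
	return -1;
}

/*
 * Tables are ordered by priority: a miss on every FTE of one table
 * falls through to the next table.
 */
static int steer(const struct sketch_table *tables, size_t n, uint32_t key)
{
	assert(tables != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int dest = table_lookup(&tables[i], key);

		if (dest >= 0)
			return dest;
	}
	return -1;
}
```

A group with an all-zero mask and a zero value acts as a catch-all entry,
which is a convenient way to model a default destination in the last table.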

The new implementation solves/improves the following
issues in the current code:

1) The new implementation supports multiple destinations; the search for an
   existing rule with the same match value is performed by the flow steering
   management. In the current implementation, the E-switch FDB management
   code needs to search for existing rules before calling the add rule
   function.

2) The new implementation manages the flow table level. In the current
   implementation, the consumer states the flow table level when a new flow
   table is created, without any knowledge of the levels of other flow
   tables.

3) In the current implementation, the consumer can't create or destroy flow
   groups dynamically; the flow groups are passed as an argument to the
   create flow table API. The new implementation exposes an API to
   create/destroy flow groups.
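
Point 1 can be illustrated with a small model in which the add-rule call
itself looks for an FTE with the same match value and appends the new
destination to it. The names (multi_fte, multi_group, group_add_rule) and
the fixed array sizes are assumptions made for this sketch; they are not the
series' data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FTES  8
#define MAX_DESTS 4

struct multi_fte {
	uint32_t value;		/* match value */
	int dests[MAX_DESTS];	/* destinations sharing this match */
	size_t num_dests;
};

struct multi_group {
	struct multi_fte ftes[MAX_FTES];
	size_t num_ftes;
};

/*
 * Add a rule: reuse the FTE that already carries this match value, or
 * create a new one. The caller never searches for duplicates itself.
 * Returns 0 on success, -1 when the group or the FTE is full.
 */
static int group_add_rule(struct multi_group *fg, uint32_t value, int dest)
{
	struct multi_fte *fte = NULL;

	assert(fg != NULL);
	for (size_t i = 0; i < fg->num_ftes; i++) {
		if (fg->ftes[i].value == value) {
			fte = &fg->ftes[i];
			break;
		}
	}
	if (!fte) {
		if (fg->num_ftes == MAX_FTES)
			return -1;
		fte = &fg->ftes[fg->num_ftes++];
		fte->value = value;
		fte->num_dests = 0;
	}
	if (fte->num_dests == MAX_DESTS)
		return -1;
	fte->dests[fte->num_dests++] = dest;
	return 0;
}
```

Two calls with the same value and different destinations therefore end up
as one entry with two destinations, rather than two competing entries.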

The series is built as follows:

Patch #1 adds the flow steering API firmware commands.

Patch #2 adds the tree operations of the flow steering tree: add/remove node,
initialize node, and take a reference count on a node.

Patch #3 adds essential algorithms for managing the flow steering.

Patch #4 initializes the flow steering tree. Flow steering initialization is
based on a static tree which illustrates the flow steering tree when the
driver is loaded.

Patch #5 is the main patch of the series. It introduces the flow steering API.

Patch #6 exposes the new flow steering API and removes the old one.
The Ethernet flow steering follows the existing implementation,
but uses the new steering API.

Patch #7 renames en_flow_table.c to en_fs.c in order to be aligned with
the new flow steering files.



Maor Gottlieb (7):
  net/mlx5_core: Introduce flow steering firmware commands
  net/mlx5_core: Add flow steering base data structures
  net/mlx5_core: Add flow steering lookup algorithms
  net/mlx5_core: Introduce flow steering API
  net/mlx5_core: Flow steering tree initialization
  net/mlx5: Use flow steering infrastructure for mlx5_en
  net/mlx5e: Rename en_flow_table.c to en_fs.c

Saeed Mahameed (2):
  net/mlx5: Fix query E-Switch capabilities
  net/mlx5e: Assign random MAC address if needed

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   23 +-
 .../ethernet/mellanox/mlx5/core/en_flow_table.c| 1046 -
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 1224 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |7 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  291 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   15 +-
 .../net/ethernet/mellanox/mlx5/core/flow_table.c   |  422 ---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  239 
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h   |   65 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 1047 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  155 +++
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |9 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|3 +
 include/linux/mlx5/driver.h|2 +
 include/linux/mlx5/flow_table.h|   63 -
 include/linux/mlx5/fs.h|   93 ++
 include/linux/mlx5/mlx5_ifc.h  |   32 +-
 19 files changed, 2939 insertions(+), 1805 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
 create mode 100644 drivers/ne

[PATCH net-next V1 9/9] net/mlx5e: Rename en_flow_table.c to en_fs.c

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Rename en_flow_table.c to en_fs.c in order to be aligned
with the new flow steering files.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 .../ethernet/mellanox/mlx5/core/en_flow_table.c| 1224 
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 1224 
 3 files changed, 1225 insertions(+), 1225 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 11ee062..fe11e96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -4,5 +4,5 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
-   en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
+   en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
deleted file mode 100644
index 80d81ab..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
+++ /dev/null
@@ -1,1224 +0,0 @@
-/*
- * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "en.h"
-
-#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
-
-enum {
-   MLX5E_FULLMATCH = 0,
-   MLX5E_ALLMULTI  = 1,
-   MLX5E_PROMISC   = 2,
-};
-
-enum {
-   MLX5E_UC= 0,
-   MLX5E_MC_IPV4   = 1,
-   MLX5E_MC_IPV6   = 2,
-   MLX5E_MC_OTHER  = 3,
-};
-
-enum {
-   MLX5E_ACTION_NONE = 0,
-   MLX5E_ACTION_ADD  = 1,
-   MLX5E_ACTION_DEL  = 2,
-};
-
-struct mlx5e_eth_addr_hash_node {
-   struct hlist_node  hlist;
-   u8 action;
-   struct mlx5e_eth_addr_info ai;
-};
-
-static inline int mlx5e_hash_eth_addr(u8 *addr)
-{
-   return addr[5];
-}
-
-static void mlx5e_add_eth_addr_to_hash(struct hlist_head *hash, u8 *addr)
-{
-   struct mlx5e_eth_addr_hash_node *hn;
-   int ix = mlx5e_hash_eth_addr(addr);
-   int found = 0;
-
-   hlist_for_each_entry(hn, &hash[ix], hlist)
-   if (ether_addr_equal_64bits(hn->ai.addr, addr)) {
-   found = 1;
-   break;
-   }
-
-   if (found) {
-   hn->action = MLX5E_ACTION_NONE;
-   return;
-   }
-
-   hn = kzalloc(sizeof(*hn), GFP_ATOMIC);
-   if (!hn)
-   return;
-
-   ether_addr_copy(hn->ai.addr, addr);
-   hn->action = MLX5E_ACTION_ADD;
-
-   hlist_add_head(&hn->hlist, &hash[ix]);
-}
-
-static void mlx5e_del_eth_addr_from_hash(struct mlx5e_eth_addr_hash_node *hn)
-{
-   hlist_del(&hn->hlist);
-   kfree(hn);
-}
-
-static void mlx5e_del_eth_addr_from_flow_table(struct mlx5e_priv *priv,
-  struct mlx5e_eth_ad

[PATCH net-next V1 4/9] net/mlx5_core: Add flow steering base data structures

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the base data structure, and its operations, that will
represent ConnectX-4 Flow Steering. The data structure is a tree in
which all flow steering objects (Flow Table, Flow Group, FTE, etc.)
are represented as fs_node(s).

fs_node is the base object. It describes a basic tree node with the
following extra info:
type: the runtime type of the node (object).
lock: locks this node's sub-tree.
ref_count: number of children + current references.
remove_func: a generic destructor.

The fs_node types are used and explained in the following patches,
once their usage is added.
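
The ref_count and remove_func rules above can be illustrated with a
small userspace sketch. It is not the driver code: the demo_* names
are hypothetical, plain integers replace atomics, and the per-node
lock is omitted. It only assumes what the description states, namely
that a node's count covers its children plus current references and
that a generic destructor runs when the count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for fs_node (no lock, no atomics). */
struct demo_node {
	struct demo_node *parent;
	int refcount;				/* children + current references */
	void (*remove_func)(struct demo_node *);	/* generic destructor */
};

/* Counts destructor calls so the sketch can be checked. */
static int demo_removed;

static void demo_count_remove(struct demo_node *n)
{
	(void)n;
	demo_removed++;
}

static void demo_init(struct demo_node *n, int refcount,
		      void (*remove_func)(struct demo_node *))
{
	n->parent = NULL;
	n->refcount = refcount;
	n->remove_func = remove_func;
}

/* Attaching a child takes one reference on the parent. */
static void demo_add(struct demo_node *n, struct demo_node *parent)
{
	if (parent)
		parent->refcount++;
	n->parent = parent;
}

/*
 * Drop one reference. At zero, run the destructor and release the
 * reference this node held on its parent.
 */
static void demo_put(struct demo_node *n)
{
	/* Read the parent first: the destructor may free the node. */
	struct demo_node *parent = n->parent;

	assert(n->refcount > 0);
	if (--n->refcount == 0) {
		if (n->remove_func)
			n->remove_func(n);
		if (parent)
			demo_put(parent);
	}
}
```

With a root and one child, dropping the root's own reference leaves
it alive until the child is released as well.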

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  116 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |6 +
 3 files changed, 123 insertions(+), 1 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index be10592..7fc5e23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o sriov.o fs_cmd.o
+   mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
new file mode 100644
index 000..3c54d7b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "mlx5_core.h"
+#include "fs_core.h"
+
+static void tree_init_node(struct fs_node *node,
+  unsigned int refcount,
+  void (*remove_func)(struct fs_node *))
+{
+   atomic_set(&node->refcount, refcount);
+   INIT_LIST_HEAD(&node->list);
+   INIT_LIST_HEAD(&node->children);
+   mutex_init(&node->lock);
+   node->remove_func = remove_func;
+}
+
+static void tree_add_node(struct fs_node *node, struct fs_node *parent)
+{
+   if (parent)
+   atomic_inc(&parent->refcount);
+   node->parent = parent;
+
+   /* Parent is the root */
+   if (!parent)
+   node->root = node;
+   else
+   node->root = parent->root;
+}
+
+static void tree_get_node(struct fs_node *node)
+{
+   atomic_inc(&node->refcount);
+}
+
+static void nested_lock_ref_node(struct fs_node *node)
+{
+   if (node) {
+   mutex_lock_nested(&node->lock, SINGLE_DEPTH_NESTING);
+   atomic_inc(&node->refcount);
+   }
+}
+
+static void lock_ref_node(struct fs_node *node)
+{
+   if (node) {
+   mutex_lock(&node->lock);
+   atomic_inc(&node->refcount);
+   }
+}
+
+static void unlock_ref_node(struct 

[PATCH net-next V1 1/9] net/mlx5: Fix query E-Switch capabilities

2015-12-10 Thread Saeed Mahameed
E-Switch capabilities should be queried only if the E-Switch flow table
is supported, not merely when the device is a vport group manager.

Fixes: d753c6e8 ("net/mlx5: E-Switch, Introduce HCA cap and E-Switch vport context")
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 1c9f9a5..aa1ab47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -173,7 +173,7 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
-   if (MLX5_CAP_GEN(dev, vport_group_manager)) {
+   if (MLX5_CAP_GEN(dev, eswitch_flow_table)) {
err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH,
 HCA_CAP_OPMOD_GET_CUR);
if (err)
-- 
1.7.1



[PATCH net-next V1 8/9] net/mlx5: Use flow steering infrastructure for mlx5_en

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Expose the new flow steering API and remove the old
one.

A few changes are required:

1. The Ethernet flow steering follows the existing implementation, but uses
the new steering API. The old flow steering implementation is removed.

2. Move the E-switch FDB management to use the new API.

3. When the driver is loaded, call mlx5_init_fs, which initializes
the flow steering tree structure and opens namespaces for NIC receive
and for E-switch FDB.

4. Call mlx5_cleanup_fs when the driver is unloaded.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   23 +-
 .../ethernet/mellanox/mlx5/core/en_flow_table.c|  824 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  291 ++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   15 +-
 .../net/ethernet/mellanox/mlx5/core/flow_table.c   |  422 --
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   26 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |9 +
 include/linux/mlx5/flow_table.h|   63 --
 include/linux/mlx5/fs.h|   38 +
 11 files changed, 633 insertions(+), 1082 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/flow_table.c
 delete mode 100644 include/linux/mlx5/flow_table.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 7fc5e23..11ee062 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -3,6 +3,6 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
-mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
+mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 89313d4..f689ce5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -64,6 +64,8 @@
 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
 #define MLX5E_SQ_BF_BUDGET 16
 
+#define MLX5E_NUM_MAIN_GROUPS 9
+
 static const char vport_strings[][ETH_GSTRING_LEN] = {
/* vport statistics */
"rx_packets",
@@ -442,7 +444,7 @@ enum mlx5e_rqt_ix {
 struct mlx5e_eth_addr_info {
u8  addr[ETH_ALEN + 2];
u32 tt_vec;
-   u32 ft_ix[MLX5E_NUM_TT]; /* flow table index per traffic type */
+   struct mlx5_flow_rule *ft_rule[MLX5E_NUM_TT];
 };
 
 #define MLX5E_ETH_ADDR_HASH_SIZE (1 << BITS_PER_BYTE)
@@ -466,15 +468,22 @@ enum {
 
 struct mlx5e_vlan_db {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
-   u32   active_vlans_ft_ix[VLAN_N_VID];
-   u32   untagged_rule_ft_ix;
-   u32   any_vlan_rule_ft_ix;
+   struct mlx5_flow_rule   *active_vlans_rule[VLAN_N_VID];
+   struct mlx5_flow_rule   *untagged_rule;
+   struct mlx5_flow_rule   *any_vlan_rule;
bool  filter_disabled;
 };
 
 struct mlx5e_flow_table {
-   void *vlan;
-   void *main;
+   int num_groups;
+   struct mlx5_flow_table  *t;
+   struct mlx5_flow_group  **g;
+};
+
+struct mlx5e_flow_tables {
+   struct mlx5_flow_namespace  *ns;
+   struct mlx5e_flow_table vlan;
+   struct mlx5e_flow_table main;
 };
 
 struct mlx5e_priv {
@@ -497,7 +506,7 @@ struct mlx5e_priv {
u32rqtn[MLX5E_NUM_RQT];
u32tirn[MLX5E_NUM_TT];
 
-   struct mlx5e_flow_tableft;
+   struct mlx5e_flow_tables   fts;
struct mlx5e_eth_addr_db   eth_addr;
struct mlx5e_vlan_db   vlan;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
index 5b93c9c..80d81ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
@@ -34,9 +34,11 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "en.h"
 
+#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
+
 enum {
MLX5E_FULLMATCH = 0,
MLX5E_ALLMULTI  = 1,
@@ -103,44 +105,38 @@ static void mlx5e_del_eth_addr_from

[PATCH net-next V1 5/9] net/mlx5_core: Add flow steering lookup algorithms

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the flow steering mlx5_flow_namespace (Namespace)
and fs_prio (Flow Steering Priority) tree nodes.

Namespaces are used to isolate different usages or types
of steering (for example, downstream patches will add different
namespaces for the NIC driver and for E-Switch FDB usages).

Flow Steering Priorities are objects that describe priority
ranges between different flow objects under the same namespace.

For example, entries in priority i are matched before entries
in priority i+1.

This patch adds the following algorithms:

1) Calculate level:
Each flow table has a level (the priority between the flow tables).
When we initialize the flow steering tree, we assign a range of levels
to each priority; the level of a new flow table is therefore its
location within the priority, relative to the priority's level range.

2) Match between match criteria. This function is used
to search for a flow group when a new flow rule is added.

3) Match between match values. This function is used
to search for a flow table entry when a new flow rule is added.

4) Add essential macros for traversing a node's children,
e.g. traversing all the flow tables of some priority.
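
Two of the algorithms above (level calculation and masked value
matching) can be sketched in userspace C. The demo_* names and the
demo_prio layout are hypothetical and only mirror the description:
a new table takes the level after the last table of its priority, or
the priority's start level when the priority is empty, and two values
match when they agree on every bit selected by the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a priority: its level range starts at start_level. */
struct demo_prio {
	unsigned int start_level;	/* first level assigned to this priority */
	bool has_tables;		/* true once a flow table was added */
	unsigned int last_level;	/* level of the last table, if any */
};

/* Next free level: after the last table, or the start of the range. */
static unsigned int demo_next_free_level(const struct demo_prio *prio)
{
	assert(prio != NULL);
	if (prio->has_tables)
		return prio->last_level + 1;
	return prio->start_level;
}

/* True when val1 and val2 agree on every bit selected by mask. */
static bool demo_masked_equal(const uint8_t *mask, const uint8_t *val1,
			      const uint8_t *val2, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++)
		if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
			return false;

	return true;
}
```

Bytes whose mask is zero never affect the result, which is why one
comparison routine can serve any combination of match fields.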

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   93 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   41 +
 2 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 3c54d7b..cac0d15 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -114,3 +114,96 @@ static int tree_remove_node(struct fs_node *node)
tree_put_node(node);
return 0;
 }
+
+static struct fs_prio *find_prio(struct mlx5_flow_namespace *ns,
+unsigned int prio)
+{
+   struct fs_prio *iter_prio;
+
+   fs_for_each_prio(iter_prio, ns) {
+   if (iter_prio->prio == prio)
+   return iter_prio;
+   }
+
+   return NULL;
+}
+
+static unsigned int find_next_free_level(struct fs_prio *prio)
+{
+   if (!list_empty(&prio->node.children)) {
+   struct mlx5_flow_table *ft;
+
+   ft = list_last_entry(&prio->node.children,
+struct mlx5_flow_table,
+node.list);
+   return ft->level + 1;
+   }
+   return prio->start_level;
+}
+
+static bool masked_memcmp(void *mask, void *val1, void *val2, size_t size)
+{
+   unsigned int i;
+
+   for (i = 0; i < size; i++, mask++, val1++, val2++)
+   if ((*((u8 *)val1) & (*(u8 *)mask)) !=
+   ((*(u8 *)val2) & (*(u8 *)mask)))
+   return false;
+
+   return true;
+}
+
+static bool compare_match_value(struct mlx5_flow_group_mask *mask,
+   void *fte_param1, void *fte_param2)
+{
+   if (mask->match_criteria_enable &
+   1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_OUTER_HEADERS) {
+   void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
+   fte_param1, outer_headers);
+   void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
+   fte_param2, outer_headers);
+   void *fte_mask = MLX5_ADDR_OF(fte_match_param,
+ mask->match_criteria, outer_headers);
+
+   if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
+  MLX5_ST_SZ_BYTES(fte_match_set_lyr_2_4)))
+   return false;
+   }
+
+   if (mask->match_criteria_enable &
+   1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS) {
+   void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
+   fte_param1, misc_parameters);
+   void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
+   fte_param2, misc_parameters);
+   void *fte_mask = MLX5_ADDR_OF(fte_match_param,
+ mask->match_criteria, misc_parameters);
+
+   if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
+  MLX5_ST_SZ_BYTES(fte_match_set_misc)))
+   return false;
+   }
+
+   if (mask->match_criteria_enable &
+   1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_INNER_HEADERS) {
+   void *fte_match1 = ML

[PATCH net-next V1 3/9] net/mlx5_core: Introduce flow steering firmware commands

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce new Flow Steering (FS) firmware commands
in order to support the new flow steering infrastructure.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |  239 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |   65 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   81 +++
 include/linux/mlx5/fs.h   |   47 
 include/linux/mlx5/mlx5_ifc.h |   32 ++-
 6 files changed, 455 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
 create mode 100644 include/linux/mlx5/fs.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index a075591..be10592 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o sriov.o
+   mad.o transobj.o vport.o sriov.o fs_cmd.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
new file mode 100644
index 000..5096f4f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "fs_core.h"
+#include "fs_cmd.h"
+#include "mlx5_core.h"
+
+int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
+  enum fs_flow_table_type type, unsigned int level,
+  unsigned int log_size, unsigned int *table_id)
+{
+   u32 out[MLX5_ST_SZ_DW(create_flow_table_out)];
+   u32 in[MLX5_ST_SZ_DW(create_flow_table_in)];
+   int err;
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(create_flow_table_in, in, opcode,
+MLX5_CMD_OP_CREATE_FLOW_TABLE);
+
+   MLX5_SET(create_flow_table_in, in, table_type, type);
+   MLX5_SET(create_flow_table_in, in, level, level);
+   MLX5_SET(create_flow_table_in, in, log_size, log_size);
+
+   memset(out, 0, sizeof(out));
+   err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+sizeof(out));
+
+   if (!err)
+   *table_id = MLX5_GET(create_flow_table_out, out,
+table_id);
+   return err;
+}
+
+int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_flow_table_in)];
+   u32 out[MLX5_ST_SZ_DW(destroy_flow_table_out)];
+
+   memset(in, 0, sizeof(in));
+   memset(out,

[PATCH net-next V1 2/9] net/mlx5e: Assign random MAC address if needed

2015-12-10 Thread Saeed Mahameed
Under SRIOV, VFs may be loaded without a pre-assigned MAC
address. In this case, the VF randomizes its own MAC. This
covers the case where the administrator did not assign a MAC
to the VF through the PF OS APIs, and keeps udev happy.
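
The fallback described above can be sketched in userspace C as
follows. The demo_* names are hypothetical and rand() merely stands
in for the kernel's random source. The sketch assumes the usual
convention for generated addresses: unicast (multicast bit clear)
with the locally administered bit set. A random address is assigned
only when the queried address is all zeros and the function is not a
vport group manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

static bool demo_is_zero_mac(const uint8_t *addr)
{
	static const uint8_t zero[DEMO_ETH_ALEN];

	return memcmp(addr, zero, DEMO_ETH_ALEN) == 0;
}

/* Fill addr with a random unicast, locally administered address. */
static void demo_random_mac(uint8_t *addr)
{
	int i;

	for (i = 0; i < DEMO_ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/*
 * Returns true when a random address was assigned. Mirrors the
 * condition in the patch: zero address and not a vport group manager.
 */
static bool demo_assign_mac_if_needed(uint8_t *addr,
				      bool is_vport_group_manager)
{
	assert(addr != NULL);
	if (!demo_is_zero_mac(addr) || is_vport_group_manager)
		return false;

	demo_random_mac(addr);
	return true;
}
```

Because the locally administered bit is always set, the generated
address can never be all zeros and cannot collide with a
vendor-assigned address.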

Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |5 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|3 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d67058a..a20be56 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2103,6 +2103,11 @@ static void mlx5e_set_netdev_dev_addr(struct net_device *netdev)
struct mlx5e_priv *priv = netdev_priv(netdev);
 
mlx5_query_nic_vport_mac_address(priv->mdev, 0, netdev->dev_addr);
+   if (is_zero_ether_addr(netdev->dev_addr) &&
+   !MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
+   eth_hw_addr_random(netdev);
+   mlx5_core_info(priv->mdev, "Assigned random MAC address %pM\n", netdev->dev_addr);
+   }
 }
 
 static void mlx5e_build_netdev(struct net_device *netdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index bee7da8..ea6a137 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -65,6 +65,9 @@ do { \
(__dev)->priv.name, __func__, __LINE__, current->pid,   \
##__VA_ARGS__)
 
+#define mlx5_core_info(__dev, format, ...) \
+   dev_info(&(__dev)->pdev->dev, format, ##__VA_ARGS__)
+
 enum {
MLX5_CMD_DATA, /* print command payload only */
MLX5_CMD_TIME, /* print command execution time */
-- 
1.7.1



Re: [PATCH net-next V3 0/4] Introduce mlx5 ethernet timestamping

2016-01-04 Thread Saeed Mahameed
Hi Dave,

Any chance you are giving this a shot?

On Tue, Dec 29, 2015 at 2:58 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
> Hi Dave,
>
> This patch series introduces the support for ConnectX-4 timestamping
> and the PTP kernel interface.
>
> Changes from V2:
> net/mlx5_core: Introduce access function to read internal_timer
> - Remove one line function
> - Change function name
>
> net/mlx5e: Add HW timestamping (TS) support:
> - Data path performance optimization (caching tstamp struct in rq,sq)
> - Change read/write_lock_irqsave to read/write_lock
> - Move ioctl functions to en_clock file
> - Changed overflow start algorithm according to comments from Richard
> - Move timestamp init/cleanup to open/close ndos.
>
> In details:
>
> 1st patch prevents the driver from modifying skb->data and SKB CB in
> device xmit function.
>
> 2nd patch adds the needed low level helpers for:
> - Fetching the hardware clock (hardware internal timer)
> - Parsing CQEs timestamps
> - Device frequency capability
>
> 3rd patch adds new en_clock.c file that handles all needed timestamping
> operations:
> - Internal clock structure initialization and other helper functions
> - Added the needed ioctl for setting/getting the current timestamping
>   configuration.
> - used this configuration in RX/TX data path to fill the SKB with
>   the timestamp.
>
> 4th patch Introduces PTP (PHC) support.
>
> Achiad Shochat (1):
>   net/mlx5e: Do not modify the TX SKB
>
> Eran Ben Elisha (3):
>   net/mlx5_core: Introduce access function to read internal timer
>   net/mlx5e: Add HW timestamping (TS) support
>   net/mlx5e: Add PTP Hardware Clock (PHC) support
>
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en.h   |   31 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  287 
> 
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   30 ++
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   24 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   85 --
>  drivers/net/ethernet/mellanox/mlx5/core/main.c |   13 +
>  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
>  include/linux/mlx5/device.h|   20 ++-
>  include/linux/mlx5/mlx5_ifc.h  |6 +-
>  12 files changed, 467 insertions(+), 42 deletions(-)
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
>


Re: [PATCH net-next V3 0/4] Introduce mlx5 ethernet timestamping

2016-01-04 Thread Saeed Mahameed
Sorry, my mistake.

Please ignore my last email; I just got your response on the matter.

On Mon, Jan 4, 2016 at 11:57 PM, Saeed Mahameed
<sae...@dev.mellanox.co.il> wrote:
> Hi Dave,
>
> Any chance you are giving this a shot ?
>
> On Tue, Dec 29, 2015 at 2:58 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
>> Hi Dave,
>>
>> This patch series introduces the support for ConnectX-4 timestamping
>> and the PTP kernel interface.
>>
>> Changes from V2:
>> net/mlx5_core: Introduce access function to read internal_timer
>> - Remove one line function
>> - Change function name
>>
>> net/mlx5e: Add HW timestamping (TS) support:
>> - Data path performance optimization (caching tstamp struct in rq,sq)
>> - Change read/write_lock_irqsave to read/write_lock
>> - Move ioctl functions to en_clock file
>> - Changed overflow start algorithm according to comments from Richard
>> - Move timestamp init/cleanup to open/close ndos.
>>
>> In details:
>>
>> 1st patch prevents the driver from modifying skb->data and SKB CB in
>> device xmit function.
>>
>> 2nd patch adds the needed low level helpers for:
>> - Fetching the hardware clock (hardware internal timer)
>> - Parsing CQEs timestamps
>> - Device frequency capability
>>
>> 3rd patch adds new en_clock.c file that handles all needed timestamping
>> operations:
>> - Internal clock structure initialization and other helper functions
>> - Added the needed ioctl for setting/getting the current timestamping
>>   configuration.
>> - used this configuration in RX/TX data path to fill the SKB with
>>   the timestamp.
>>
>> 4th patch Introduces PTP (PHC) support.
>>
>> Achiad Shochat (1):
>>   net/mlx5e: Do not modify the TX SKB
>>
>> Eran Ben Elisha (3):
>>   net/mlx5_core: Introduce access function to read internal timer
>>   net/mlx5e: Add HW timestamping (TS) support
>>   net/mlx5e: Add PTP Hardware Clock (PHC) support
>>
>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/en.h   |   31 ++-
>>  drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  287 
>> 
>>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   30 ++
>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   24 ++-
>>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
>>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   85 --
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c |   13 +
>>  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
>>  include/linux/mlx5/device.h|   20 ++-
>>  include/linux/mlx5/mlx5_ifc.h  |6 +-
>>  12 files changed, 467 insertions(+), 42 deletions(-)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
>>


Re: [PATCH net-next V3 0/4] Introduce mlx5 ethernet timestamping

2016-01-05 Thread Saeed Mahameed
Thank you David and Richard.

On Tue, Jan 5, 2016 at 9:12 PM, David Miller <da...@davemloft.net> wrote:
> From: Saeed Mahameed <sae...@mellanox.com>
> Date: Tue, 29 Dec 2015 14:58:28 +0200
>
>> This patch series introduces the support for ConnectX-4 timestamping
>> and the PTP kernel interface.
>
> Series applied, thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 03/12] net/mlx5_core: Managing root flow table

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

The root Flow Table for each Flow Table Type is defined,
by default, as the Flow Table with level 0.

To avoid using empty flow tables and introducing new hops,
while still preserving space for flow tables that have a higher
priority (lower number) than the current flow table, we introduce a
new set root flow table command.
This command tells the HW to start matching packets from the
assigned root flow table.
The command is used when we create a new flow table whose level is lower
than that of the current lowest flow table, or when it is the first flow table.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   18 
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |2 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   97 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |6 ++
 include/linux/mlx5/mlx5_ifc.h |   31 +++-
 5 files changed, 144 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 5096f4f..d8b1195 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -38,6 +38,24 @@
 #include "fs_cmd.h"
 #include "mlx5_core.h"
 
+int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft)
+{
+   u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)];
+   u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)];
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(set_flow_table_root_in, in, opcode,
+MLX5_CMD_OP_SET_FLOW_TABLE_ROOT);
+   MLX5_SET(set_flow_table_root_in, in, table_type, ft->type);
+   MLX5_SET(set_flow_table_root_in, in, table_id, ft->id);
+
+   memset(out, 0, sizeof(out));
+   return mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+ sizeof(out));
+}
+
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
   unsigned int log_size, unsigned int *table_id)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index f39304e..70d18ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -62,4 +62,6 @@ int mlx5_cmd_delete_fte(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft,
unsigned int index);
 
+int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft);
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e62cc59..6445489 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -510,6 +510,29 @@ static struct mlx5_flow_table *find_prev_chained_ft(struct 
fs_prio *prio)
return find_closest_ft(prio, true);
 }
 
+static int update_root_ft_create(struct mlx5_flow_table *ft, struct fs_prio
+*prio)
+{
+   struct mlx5_flow_root_namespace *root = find_root(>node);
+   int min_level = INT_MAX;
+   int err;
+
+   if (root->root_ft)
+   min_level = root->root_ft->level;
+
+   if (ft->level >= min_level)
+   return 0;
+
+   err = mlx5_cmd_update_root_ft(root->dev, ft);
+   if (err)
+   mlx5_core_warn(root->dev, "Update root flow table of id=%u 
failed\n",
+  ft->id);
+   else
+   root->root_ft = ft;
+
+   return err;
+}
+
 struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
   int prio,
   int max_fte)
@@ -526,14 +549,15 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
return ERR_PTR(-ENODEV);
}
 
+   mutex_lock(>chain_lock);
fs_prio = find_prio(ns, prio);
-   if (!fs_prio)
-   return ERR_PTR(-EINVAL);
-
-   lock_ref_node(_prio->node);
+   if (!fs_prio) {
+   err = -EINVAL;
+   goto unlock_root;
+   }
if (fs_prio->num_ft == fs_prio->max_ft) {
err = -ENOSPC;
-   goto unlock_prio;
+   goto unlock_root;
}
 
ft = alloc_flow_table(find_next_free_level(fs_prio),
@@ -541,7 +565,7 @@ struct mlx5_flow_t

[PATCH net-next 05/12] net/mlx5_core: Connect flow tables

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Flow tables from different priorities should be chained together.
When a packet arrives we search for a match in the
by-pass flow tables (first we search for a match in priority 0
and if we don't find a match we move to the next priority).
If we can't find a match in any of the bypass flow-tables, we continue
searching in the flow-tables of the next priority, which are the
kernel's flow tables.

Setting the miss flow table in a new flow table to be the next one in
the list is performed via the create flow table API. If we want to change an
existing flow table, for example in order to point from an
existing flow table to the new next-in-list flow table, we use the
modify flow table API.
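
As an illustration only (this is not the driver code), the linking described
above can be modelled as insertion into a singly linked list ordered by level.
Every name prefixed with sketch_ is hypothetical, and the sketch assumes that
a table with a lower level is searched earlier. The "create" and "modify"
comments mark where the create flow table and modify flow table commands
would be issued.

```c
#include <stddef.h>

/* Hypothetical stand-in for a flow table: only the fields the sketch needs. */
struct sketch_ft {
	unsigned int level;        /* lower level is searched first (assumption) */
	struct sketch_ft *miss;    /* table searched next when nothing matches */
};

/*
 * Insert ft into the chain that starts at root, keeping the chain ordered
 * by level. Returns the (possibly new) first table of the chain.
 */
static struct sketch_ft *sketch_chain_insert(struct sketch_ft *root,
					     struct sketch_ft *ft)
{
	struct sketch_ft *prev = NULL;
	struct sketch_ft *next = root;

	/* Find the first table whose level is not lower than the new one. */
	while (next && next->level < ft->level) {
		prev = next;
		next = next->miss;
	}

	/* "create": the new table is created already pointing at its successor. */
	ft->miss = next;

	/* No predecessor: the new table becomes the first table of the chain. */
	if (!prev)
		return ft;

	/* "modify": the existing predecessor is re-pointed at the new table. */
	prev->miss = ft;
	return root;
}
```

Inserting tables with levels 5, 10, 7 and then 1 into an empty chain yields
the order 1, 5, 7, 10. Only the last insertion changes the first table of the
chain, which is the case the set root flow table command exists for.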

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |7 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |3 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  104 +++--
 3 files changed, 104 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 2b55625..a9894d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -58,7 +58,8 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
-  unsigned int log_size, unsigned int *table_id)
+  unsigned int log_size, struct mlx5_flow_table
+  *next_ft, unsigned int *table_id)
 {
u32 out[MLX5_ST_SZ_DW(create_flow_table_out)];
u32 in[MLX5_ST_SZ_DW(create_flow_table_in)];
@@ -69,6 +70,10 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
MLX5_SET(create_flow_table_in, in, opcode,
 MLX5_CMD_OP_CREATE_FLOW_TABLE);
 
+   if (next_ft) {
+   MLX5_SET(create_flow_table_in, in, table_miss_mode, 1);
+   MLX5_SET(create_flow_table_in, in, table_miss_id, next_ft->id);
+   }
MLX5_SET(create_flow_table_in, in, table_type, type);
MLX5_SET(create_flow_table_in, in, level, level);
MLX5_SET(create_flow_table_in, in, log_size, log_size);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 1ae9b68..9814d47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -35,7 +35,8 @@
 
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
-  unsigned int log_size, unsigned int *table_id);
+  unsigned int log_size, struct mlx5_flow_table
+  *next_ft, unsigned int *table_id);
 
 int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 6445489..4b4f2b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -510,6 +510,48 @@ static struct mlx5_flow_table *find_prev_chained_ft(struct 
fs_prio *prio)
return find_closest_ft(prio, true);
 }
 
+static int connect_fts_in_prio(struct mlx5_core_dev *dev,
+  struct fs_prio *prio,
+  struct mlx5_flow_table *ft)
+{
+   struct mlx5_flow_table *iter;
+   int i = 0;
+   int err;
+
+   fs_for_each_ft(iter, prio) {
+   i++;
+   err = mlx5_cmd_modify_flow_table(dev,
+iter,
+ft);
+   if (err) {
+   mlx5_core_warn(dev, "Failed to modify flow table %d\n",
+  iter->id);
+   /* The driver is out of sync with the FW */
+   if (i > 1)
+   WARN_ON(true);
+   return err;
+   }
+   }
+   return 0;
+}
+
+/* Connect flow tables from previous priority of prio to ft */
+static int connect_prev_fts(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft,
+   struct fs_prio *prio)
+{
+   struct mlx5_flow_table *prev_ft;
+
+   prev_ft = find_prev_chained_ft(prio);
+   if (prev_ft) {
+   struct fs_prio *prev_prio

[PATCH net-next 02/12] net/mlx5_core: Add utilities to find next and prev flow-tables

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add two utility functions for finding the next and previous flow tables.
The find-next function takes a priority and returns the
first flow table of the next priority in the tree.
The find-prev function returns the last flow table of
the previous priority in the tree.

These utility functions are used for chaining flow tables from different
priorities.
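
The search can be sketched outside the kernel as follows. This is a
simplified model, not the patch itself: it uses a plain array of children in
place of the kernel list helpers, and all sketch_ names are hypothetical. The
idea is the same as in the description above: scan the siblings that follow
(or precede) the starting priority, descend into each one looking for a flow
table, and climb one level up when the siblings are exhausted.

```c
#include <stddef.h>

enum sketch_type { SKETCH_INTERIOR, SKETCH_FLOW_TABLE };

/* Hypothetical tree node; index is the node's position in parent->children. */
struct sketch_node {
	enum sketch_type type;
	struct sketch_node *parent;
	struct sketch_node **children;
	size_t num_children;
	size_t index;
};

/* First flow table in the sub-tree of node (last one when reverse is set). */
static struct sketch_node *sketch_first_ft(struct sketch_node *node, int reverse)
{
	size_t i;

	if (node->type == SKETCH_FLOW_TABLE)
		return node;

	for (i = 0; i < node->num_children; i++) {
		size_t idx = reverse ? node->num_children - 1 - i : i;
		struct sketch_node *ft = sketch_first_ft(node->children[idx],
							 reverse);
		if (ft)
			return ft;
	}
	return NULL;
}

/* Closest flow table after prio in the tree (before it when reverse is set). */
static struct sketch_node *sketch_find_closest_ft(struct sketch_node *prio,
						  int reverse)
{
	struct sketch_node *curr = prio;

	while (curr->parent) {
		struct sketch_node *parent = curr->parent;
		size_t i = curr->index;

		/* Visit only the siblings strictly after (or before) curr. */
		while (reverse ? i > 0 : i + 1 < parent->num_children) {
			struct sketch_node *ft;

			i = reverse ? i - 1 : i + 1;
			ft = sketch_first_ft(parent->children[i], reverse);
			if (ft)
				return ft;
		}
		curr = parent;	/* nothing at this depth, climb one level */
	}
	return NULL;
}
```

With two priorities under one parent, the first holding table A and the
second holding tables B and C, the forward search from the first priority
returns B and the reverse search from the second priority returns A.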

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   67 +
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 743a475..e62cc59 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -443,6 +443,73 @@ static struct mlx5_flow_table *alloc_flow_table(int level, 
int max_fte,
return ft;
 }
 
+/* If reverse is false, then we search for the first flow table in the
+ * root sub-tree from start(closest from right), else we search for the
+ * last flow table in the root sub-tree till start(closest from left).
+ */
+static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node  *root,
+struct list_head 
*start,
+bool reverse)
+{
+#define list_advance_entry(pos, reverse)   \
+   ((reverse) ? list_prev_entry(pos, list) : list_next_entry(pos, list))
+
+#define list_for_each_advance_continue(pos, head, reverse) \
+   for (pos = list_advance_entry(pos, reverse);\
+>list != (head);  \
+pos = list_advance_entry(pos, reverse))
+
+   struct fs_node *iter = list_entry(start, struct fs_node, list);
+   struct mlx5_flow_table *ft = NULL;
+
+   if (!root)
+   return NULL;
+
+   list_for_each_advance_continue(iter, >children, reverse) {
+   if (iter->type == FS_TYPE_FLOW_TABLE) {
+   fs_get_obj(ft, iter);
+   return ft;
+   }
+   ft = find_closest_ft_recursive(iter, >children, reverse);
+   if (ft)
+   return ft;
+   }
+
+   return ft;
+}
+
+/* If reverse is false then return the first flow table in next priority of
+ * prio in the tree, else return the last flow table in the previous priority
+ * of prio in the tree.
+ */
+static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool 
reverse)
+{
+   struct mlx5_flow_table *ft = NULL;
+   struct fs_node *curr_node;
+   struct fs_node *parent;
+
+   parent = prio->node.parent;
+   curr_node = >node;
+   while (!ft && parent) {
+   ft = find_closest_ft_recursive(parent, _node->list, 
reverse);
+   curr_node = parent;
+   parent = curr_node->parent;
+   }
+   return ft;
+}
+
+/* Assuming all the tree is locked by mutex chain lock */
+static struct mlx5_flow_table *find_next_chained_ft(struct fs_prio *prio)
+{
+   return find_closest_ft(prio, false);
+}
+
+/* Assuming all the tree is locked by mutex chain lock */
+static struct mlx5_flow_table *find_prev_chained_ft(struct fs_prio *prio)
+{
+   return find_closest_ft(prio, true);
+}
+
 struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
   int prio,
   int max_fte)
-- 
1.7.1



[PATCH net-next 00/12] net/mlx5_core: Enhance flow steering support

2016-01-05 Thread Saeed Mahameed
Hi Dave,

This series adds three new functionalities to the driver flow-steering
infrastructure:
auto-grouped flow tables, chaining of flow tables and updates for the
root flow table.

1. Auto-grouped flow tables - Flow table with auto grouping management.
When a flow table is created, hints regarding the number of rule types
and the number of rules are given in advance. Thus, a flow table is
divided into #NUM_TYPES+1 groups, each containing
(#NUM_RULES)/(#NUM_TYPES+1) rules. The first #NUM_TYPES parts are groups
which are filled if the added rule matches the group specification or
the group is empty. The last part is filled by rules that can't fit
any of the former groups.

2. Chaining flow tables - Flow tables from different priorities are chained
together: if there is no match in the flow table of priority i, we continue
searching for a match in priority i+1. This holds whether or not priorities
i and i+1 belong to the same namespace.

3. Updating the root flow table - the root flow table is the flow table
with the lowest level. The hardware starts searching for a match in the
root flow table and continues according to the matches it finds along
the way.
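
To make points 2 and 3 concrete, here is a minimal model of the lookup that
the chaining produces. It is a sketch under stated assumptions, not the
hardware behaviour or the driver code: a "table" is reduced to a set of
integer keys, and every sketch_ name is hypothetical. The search starts at
the root table and follows the miss pointer until some table matches.

```c
#include <stddef.h>

/* Hypothetical table: a set of keys plus the table to try on a miss. */
struct sketch_table {
	const unsigned int *keys;
	size_t num_keys;
	const struct sketch_table *miss;
};

/* Return 1 if the table holds an entry for key, 0 otherwise. */
static int sketch_table_has(const struct sketch_table *t, unsigned int key)
{
	size_t i;

	for (i = 0; i < t->num_keys; i++)
		if (t->keys[i] == key)
			return 1;
	return 0;
}

/*
 * Start at the root table and walk the miss chain. Returns the first table
 * that matches key, or NULL when no table in the chain matches.
 */
static const struct sketch_table *
sketch_lookup(const struct sketch_table *root, unsigned int key)
{
	const struct sketch_table *t;

	for (t = root; t; t = t->miss)
		if (sketch_table_has(t, key))
			return t;
	return NULL;
}
```

Chaining a bypass table, a kernel table and a leftovers table in that order
means a key found only in the kernel table is still reached from the root,
after one miss in the bypass table.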

The first usage for the new functionality is flow steering for user-space
ConnectX-4 offloaded HW Eth RX queues done through the mlx5 IB driver.

When the mlx5 core driver is loaded, it opens three flow namespaces:
1. By-pass namespace (used by mlx5 IB driver).
2. Kernel namespace (used in order to get packets to the networking stack
through mlx5 EN driver).
3. Leftovers namespace (used by mlx5 IB and future sniffer)

The series is built as follows:

Patch #1 introduces auto-grouped flow tables support.

Patch #2 adds utility functions for finding the next and the previous
flow tables in different priorities. This is used in order to chain
the flow tables in a downstream patch.

Patch #3 introduces a firmware command for updating the root flow table.

Patch #4 introduces the modify flow table firmware command. This command is used
when we want to change the next flow table of an existing flow table.
This is used for chaining flow tables as well.

Patch #5 connects/disconnects flow tables. This is actually the chaining
process when we want to link flow tables. This means that if we couldn't
find a match in the first flow table, we'll continue in the chained
flow table.

Patch #6 updates the priority attributes that are required for flow table
level allocation. We update both the max_fts (the number of allowed FTs
in the sub-tree of this priority) and the start_level (which is the first
level we'll assign to the flow-tables created inside the priority).

Patch #7 adds checking of required device capabilities. Some namespaces
could be only created if the hardware supports certain attributes.
This is especially true for the Bypass and leftovers namespaces. This
adds a generic mechanism to check these required attributes.

Patch #8 creates two additional namespaces:
a. Bypass flow rules (has nine priorities).
b. Leftovers packets (has one priority) - for unmatched packets.

Patch #9 refactors the ipv4/ipv6 match fields in the mlx5 firmware interface
header to make them clearer.

Patch #10 exports the flow steering API for mlx5_ib usage

Patches #11 and #12 implement the required support in mlx5_ib in order
to support the RDMA flow steering verbs.

Regards,
Moni, Matan and Maor

Maor Gottlieb (12):
  net/mlx5_core: Introduce flow steering autogrouped flow table
  net/mlx5_core: Add utilities to find next and prev flow-tables
  net/mlx5_core: Managing root flow table
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Export flow steering API
  IB/mlx5: Add flow steering utilities
  IB/mlx5: Add flow steering support

 drivers/infiniband/hw/mlx5/main.c |  462 
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   45 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   52 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |9 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  605 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   14 +
 include/linux/mlx5/device.h   |2 +
 include/linux/mlx5/fs.h   |   18 +
 include/linux/mlx5/mlx5_ifc.h |  105 -
 9 files changed, 1234 insertions(+), 78 deletions(-)



[PATCH net-next 12/12] IB/mlx5: Add flow steering support

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add flow steering support by creating a flow table per
priority (if rules exist in the priority).
mlx5_ib uses autogrouping and thus only creates the
required destinations.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c|  285 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   45 +-
 2 files changed, 329 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 6886d81..c1acd52 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "user.h"
 #include "mlx5_ib.h"
 
@@ -1012,6 +1013,281 @@ static bool is_valid_attr(struct ib_flow_attr 
*flow_attr)
return !has_ipv4_spec || eth_type_ipv4;
 }
 
+static void put_flow_table(struct mlx5_ib_dev *dev,
+  struct mlx5_ib_flow_prio *prio, bool ft_added)
+{
+   prio->refcount -= !!ft_added;
+   if (!prio->refcount) {
+   mlx5_destroy_flow_table(prio->flow_table);
+   prio->flow_table = NULL;
+   }
+}
+
+static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
+{
+   struct mlx5_ib_dev *dev = to_mdev(flow_id->qp->device);
+   struct mlx5_ib_flow_handler *handler = container_of(flow_id,
+ struct 
mlx5_ib_flow_handler,
+ ibflow);
+   struct mlx5_ib_flow_handler *iter, *tmp;
+
+   mutex_lock(>flow_db.lock);
+
+   list_for_each_entry_safe(iter, tmp, >list, list) {
+   mlx5_del_flow_rule(iter->rule);
+   list_del(>list);
+   kfree(iter);
+   }
+
+   mlx5_del_flow_rule(handler->rule);
+   put_flow_table(dev, >flow_db.prios[handler->prio], true);
+   mutex_unlock(>flow_db.lock);
+
+   kfree(handler);
+
+   return 0;
+}
+
+#define MLX5_FS_MAX_TYPES   10
+#define MLX5_FS_MAX_ENTRIES 32000UL
+static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev,
+   struct ib_flow_attr *flow_attr)
+{
+   struct mlx5_flow_namespace *ns = NULL;
+   struct mlx5_ib_flow_prio *prio;
+   struct mlx5_flow_table *ft;
+   int num_entries;
+   int num_groups;
+   int priority;
+   int err = 0;
+
+   if (flow_attr->type == IB_FLOW_ATTR_NORMAL) {
+   if (flow_is_multicast_only(flow_attr))
+   priority = MLX5_IB_FLOW_MCAST_PRIO;
+   else
+   priority = flow_attr->priority;
+   ns = mlx5_get_flow_namespace(dev->mdev,
+MLX5_FLOW_NAMESPACE_BYPASS);
+   num_entries = MLX5_FS_MAX_ENTRIES;
+   num_groups = MLX5_FS_MAX_TYPES;
+   prio = >flow_db.prios[priority];
+   } else if (flow_attr->type == IB_FLOW_ATTR_ALL_DEFAULT ||
+  flow_attr->type == IB_FLOW_ATTR_MC_DEFAULT) {
+   ns = mlx5_get_flow_namespace(dev->mdev,
+MLX5_FLOW_NAMESPACE_LEFTOVERS);
+   build_leftovers_ft_param(,
+_entries,
+_groups);
+   prio = >flow_db.prios[MLX5_IB_FLOW_LEFTOVERS_PRIO];
+   }
+
+   if (!ns)
+   return ERR_PTR(-ENOTSUPP);
+
+   ft = prio->flow_table;
+   if (!ft) {
+   ft = mlx5_create_auto_grouped_flow_table(ns, priority,
+num_entries,
+num_groups);
+
+   if (!IS_ERR(ft)) {
+   prio->refcount = 0;
+   prio->flow_table = ft;
+   } else {
+   err = PTR_ERR(ft);
+   }
+   }
+
+   return err ? ERR_PTR(err) : prio;
+}
+
+static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev,
+struct mlx5_ib_flow_prio 
*ft_prio,
+struct ib_flow_attr 
*flow_attr,
+struct 
mlx5_flow_destination *dst)
+{
+   struct mlx5_flow_table  *ft = ft_prio->flow_table;
+   struct mlx5_ib_flow_handler *handler;
+   void *ib_flow = flow_attr + 1;
+   u8 match_criteria_enable = 0;
+   unsigned int spec_index;
+   u32 *match_c;
+   u32 *match_v;
+   int err = 0;
+
+   if (

[PATCH net-next 11/12] IB/mlx5: Add flow steering utilities

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add three utility functions to support flow steering:

1. Parse verbs flow attributes into hardware steering specs.

2. Check whether a flow is multicast - this is required in order to decide
to which flow table we will add the steering rule.

3. Check whether the outer headers in the flow match criteria are all zeros.
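
As a worked example for point 1, the parsing code in this patch feeds one
16-bit VLAN tag into three match fields (first_vid, first_cfi, first_prio) by
shifting it by 0, 12 and 13 bits. The sketch below shows that split with
explicit masks, using the standard 802.1Q layout of 3 priority bits, 1 CFI
bit and 12 VLAN ID bits. The sketch_ names are hypothetical; the patch itself
passes the shifted value straight to the field setter, which is assumed here
to truncate it to the field width.

```c
#include <stdint.h>

/* Hypothetical container for the three parts of an 802.1Q tag. */
struct sketch_vlan_fields {
	uint16_t vid;	/* VLAN ID, low 12 bits */
	uint8_t cfi;	/* CFI/DEI, bit 12 */
	uint8_t prio;	/* priority, top 3 bits */
};

/* Split a VLAN tag that has already been converted to host byte order. */
static struct sketch_vlan_fields sketch_split_tci(uint16_t tci_host)
{
	struct sketch_vlan_fields f;

	f.vid = tci_host & 0x0fff;
	f.cfi = (tci_host >> 12) & 0x1;
	f.prio = (tci_host >> 13) & 0x7;
	return f;
}
```

For the tag value 0xA064 this gives priority 5, CFI 0 and VLAN ID 100.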

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c |  177 +
 include/linux/mlx5/fs.h   |   10 ++
 2 files changed, 187 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 7e97cb5..6886d81 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "user.h"
 #include "mlx5_ib.h"
 
@@ -835,6 +837,181 @@ static int mlx5_ib_dealloc_pd(struct ib_pd *pd)
return 0;
 }
 
+static bool outer_header_zero(u32 *match_criteria)
+{
+   int size = MLX5_ST_SZ_BYTES(fte_match_param);
+   char *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_criteria,
+outer_headers);
+
+   return outer_headers_c[0] == 0 && !memcmp(outer_headers_c,
+ outer_headers_c + 1,
+ size - 1);
+}
+
+static int parse_flow_attr(u32 *match_c, u32 *match_v,
+  union ib_flow_spec *ib_spec)
+{
+   void *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_c,
+outer_headers);
+   void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
+outer_headers);
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   if (ib_spec->size != sizeof(ib_spec->eth))
+   return -EINVAL;
+
+   ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, 
outer_headers_c,
+dmac_47_16),
+   ib_spec->eth.mask.dst_mac);
+   ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, 
outer_headers_v,
+dmac_47_16),
+   ib_spec->eth.val.dst_mac);
+
+   if (ib_spec->eth.mask.vlan_tag) {
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+vlan_tag, 1);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+vlan_tag, 1);
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_vid, ntohs(ib_spec->eth.mask.vlan_tag));
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_vid, ntohs(ib_spec->eth.val.vlan_tag));
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_cfi,
+ntohs(ib_spec->eth.mask.vlan_tag) >> 12);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_cfi,
+ntohs(ib_spec->eth.val.vlan_tag) >> 12);
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_prio,
+ntohs(ib_spec->eth.mask.vlan_tag) >> 13);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_prio,
+ntohs(ib_spec->eth.val.vlan_tag) >> 13);
+   }
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ethertype, ntohs(ib_spec->eth.mask.ether_type));
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ethertype, ntohs(ib_spec->eth.val.ether_type));
+   break;
+   case IB_FLOW_SPEC_IPV4:
+   if (ib_spec->size != sizeof(ib_spec->ipv4))
+   return -EINVAL;
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ethertype, 0x);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ethertype, ETH_P_IP);
+
+   memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+   src_ipv4_src_ipv6.ipv4_layout.ipv4),
+  _spec->ipv4.mask.src_ip,
+  sizeof(ib_spec->ipv4.mask.src_ip));
+ 

[PATCH net-next 04/12] net/mlx5_core: Introduce modify flow table command

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the modify flow table command. This command is used when
we want to change the next flow table of an existing flow table.
The next flow table is the table we search for a match when
none of the flow table entries in the current flow table
matches.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c |   27 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h |4 ++
 include/linux/mlx5/mlx5_ifc.h|   56 --
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index d8b1195..2b55625 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -101,6 +101,33 @@ int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
  sizeof(out));
 }
 
+int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
+  struct mlx5_flow_table *ft,
+  struct mlx5_flow_table *next_ft)
+{
+   u32 in[MLX5_ST_SZ_DW(modify_flow_table_in)];
+   u32 out[MLX5_ST_SZ_DW(modify_flow_table_out)];
+
+   memset(in, 0, sizeof(in));
+   memset(out, 0, sizeof(out));
+
+   MLX5_SET(modify_flow_table_in, in, opcode,
+MLX5_CMD_OP_MODIFY_FLOW_TABLE);
+   MLX5_SET(modify_flow_table_in, in, table_type, ft->type);
+   MLX5_SET(modify_flow_table_in, in, table_id, ft->id);
+   MLX5_SET(modify_flow_table_in, in, modify_field_select,
+MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID);
+   if (next_ft) {
+   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 1);
+   MLX5_SET(modify_flow_table_in, in, table_miss_id, next_ft->id);
+   } else {
+   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 0);
+   }
+
+   return mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+ sizeof(out));
+}
+
 int mlx5_cmd_create_flow_group(struct mlx5_core_dev *dev,
   struct mlx5_flow_table *ft,
   u32 *in,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 70d18ec..1ae9b68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -40,6 +40,10 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
 int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft);
 
+int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
+  struct mlx5_flow_table *ft,
+  struct mlx5_flow_table *next_ft);
+
 int mlx5_cmd_create_flow_group(struct mlx5_core_dev *dev,
   struct mlx5_flow_table *ft,
   u32 *in, unsigned int *group_id);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 323e713..7f16695 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -194,7 +194,8 @@ enum {
MLX5_CMD_OP_QUERY_FLOW_GROUP  = 0x935,
MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY  = 0x936,
MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY= 0x937,
-   MLX5_CMD_OP_DELETE_FLOW_TABLE_ENTRY   = 0x938
+   MLX5_CMD_OP_DELETE_FLOW_TABLE_ENTRY   = 0x938,
+   MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c
 };
 
 struct mlx5_ifc_flow_table_fields_supported_bits {
@@ -260,7 +261,9 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
u8 reserved_0[0x2];
u8 flow_modify_en[0x1];
u8 modify_root[0x1];
-   u8 reserved_1[0x1b];
+   u8 identified_miss_table_mode[0x1];
+   u8 flow_table_modify[0x1];
+   u8 reserved_1[0x19];
 
u8 reserved_2[0x2];
u8 log_max_ft_size[0x6];
@@ -5669,12 +5672,16 @@ struct mlx5_ifc_create_flow_table_in_bits {
 
u8 reserved_4[0x20];
 
-   u8 reserved_5[0x8];
+   u8 reserved_5[0x4];
+   u8 table_miss_mode[0x4];
u8 level[0x8];
u8 reserved_6[0x8];
u8 log_size[0x8];
 
-   u8 reserved_7[0x120];
+   u8 reserved_7[0x8];
+   u8 table_miss_id[0x18];
+
+   u8 reserved_8[0x100];
 };
 
 struct mlx5_ifc_create_flow_group_out_bits {
@@ -6975,4 +6982,45 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
u8

[PATCH net-next 09/12] net/mlx5_core: Make ipv4/ipv6 location more clear

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Change the mlx5 firmware interface header to make it
clearer which bytes should be used by IPv4 or
IPv6 addresses.
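
The layout introduced below reserves 0x60 bits (12 bytes) ahead of the
0x20-bit IPv4 field, so an IPv4 address occupies the last 4 bytes of the
16-byte address field, while an IPv6 address fills all 16 bytes. A
byte-level sketch of that placement follows. The sketch_ helpers and
constants are hypothetical and operate on a plain byte array, not on the
mlx5_ifc structures.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_IP_FIELD_BYTES	16	/* one src or dst address field */
#define SKETCH_IPV4_OFFSET	12	/* 0x60 reserved bits = 12 bytes */

/* Place an IPv4 address in the last 4 bytes; the reserved bytes stay zero. */
static void sketch_set_ipv4(uint8_t field[SKETCH_IP_FIELD_BYTES],
			    const uint8_t ipv4[4])
{
	memset(field, 0, SKETCH_IP_FIELD_BYTES);
	memcpy(field + SKETCH_IPV4_OFFSET, ipv4, 4);
}

/* An IPv6 address uses the whole field. */
static void sketch_set_ipv6(uint8_t field[SKETCH_IP_FIELD_BYTES],
			    const uint8_t ipv6[SKETCH_IP_FIELD_BYTES])
{
	memcpy(field, ipv6, SKETCH_IP_FIELD_BYTES);
}
```

Writing 192.168.0.1 with sketch_set_ipv4() leaves bytes 0 to 11 zero and puts
192, 168, 0, 1 in bytes 12 to 15.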

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h |   20 ++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 7f16695..68d73f8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -298,6 +298,22 @@ struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 reserved_1[0x1a];
 };
 
+struct mlx5_ifc_ipv4_layout_bits {
+   u8 reserved_0[0x60];
+
+   u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+   u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+   struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+   struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+   u8 reserved_0[0x80];
+};
+
 struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 smac_47_16[0x20];
 
@@ -328,9 +344,9 @@ struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 udp_sport[0x10];
u8 udp_dport[0x10];
 
-   u8 src_ip[4][0x20];
+   union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
 
-   u8 dst_ip[4][0x20];
+   union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
 };
 
 struct mlx5_ifc_fte_match_set_misc_bits {
-- 
1.7.1



[PATCH net-next 01/12] net/mlx5_core: Introduce flow steering autogrouped flow table

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

When a user adds a rule to an autogrouped flow table, we search
for a flow group with the same match criteria. If we don't
find such a group, we create a new flow group with the
required match criteria and insert the rule into this group.

We divide the flow table into required_groups + 1 parts,
in order to reserve a part of the flow table for rules
which don't match any existing group.
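
The arithmetic behind this division can be sketched as below. This is an
assumption-laden illustration rather than the code in the patch: the part
size is taken to be the table size divided by required_groups + 1 with
integer division, as the description suggests, and the sketch_ names are
hypothetical. The rejection of more groups than entries mirrors the -EINVAL
check in mlx5_create_auto_grouped_flow_table().

```c
/* Hypothetical description of how an autogrouped table is carved up. */
struct sketch_autogroup_layout {
	unsigned int num_parts;	/* required_groups + 1 */
	unsigned int part_size;	/* entries in each part */
};

/* Returns 0 on success, -1 when more groups than entries are requested. */
static int sketch_autogroup_layout_init(struct sketch_autogroup_layout *l,
					unsigned int num_entries,
					unsigned int required_groups)
{
	if (required_groups > num_entries)
		return -1;

	l->num_parts = required_groups + 1;
	l->part_size = num_entries / l->num_parts;
	return 0;
}

/* Index of the first entry of a part; the last part holds the leftovers. */
static unsigned int
sketch_autogroup_part_start(const struct sketch_autogroup_layout *l,
			    unsigned int part)
{
	return part * l->part_size;
}
```

With 32000 entries and 10 required groups the table is split into 11 parts of
2909 entries each, and the reserved part starts at entry 29090.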

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  170 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |5 +
 include/linux/mlx5/fs.h   |6 +
 3 files changed, 160 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index f7d62fe..743a475 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -85,6 +85,12 @@ static struct init_tree_node {
}
 };
 
+enum fs_i_mutex_lock_class {
+   FS_MUTEX_GRANDPARENT,
+   FS_MUTEX_PARENT,
+   FS_MUTEX_CHILD
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -119,10 +125,11 @@ static void tree_get_node(struct fs_node *node)
atomic_inc(>refcount);
 }
 
-static void nested_lock_ref_node(struct fs_node *node)
+static void nested_lock_ref_node(struct fs_node *node,
+enum fs_i_mutex_lock_class class)
 {
if (node) {
-   mutex_lock_nested(>lock, SINGLE_DEPTH_NESTING);
+   mutex_lock_nested(>lock, class);
atomic_inc(>refcount);
}
 }
@@ -481,9 +488,7 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
list_add_tail(>node.list, _prio->node.children);
fs_prio->num_ft++;
unlock_ref_node(_prio->node);
-
return ft;
-
 free_ft:
kfree(ft);
 unlock_prio:
@@ -491,8 +496,32 @@ unlock_prio:
return ERR_PTR(err);
 }
 
-struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
-  u32 *fg_in)
+struct mlx5_flow_table *mlx5_create_auto_grouped_flow_table(struct 
mlx5_flow_namespace *ns,
+   int prio,
+   int 
num_flow_table_entries,
+   int max_num_groups)
+{
+   struct mlx5_flow_table *ft;
+
+   if (max_num_groups > num_flow_table_entries)
+   return ERR_PTR(-EINVAL);
+
+   ft = mlx5_create_flow_table(ns, prio, num_flow_table_entries);
+   if (IS_ERR(ft))
+   return ft;
+
+   ft->autogroup.active = true;
+   ft->autogroup.required_groups = max_num_groups;
+
+   return ft;
+}
+
+/* Flow table should be locked */
+static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table 
*ft,
+   u32 *fg_in,
+   struct list_head
+   *prev_fg,
+   bool is_auto_fg)
 {
struct mlx5_flow_group *fg;
struct mlx5_core_dev *dev = get_dev(>node);
@@ -505,18 +534,33 @@ struct mlx5_flow_group *mlx5_create_flow_group(struct 
mlx5_flow_table *ft,
if (IS_ERR(fg))
return fg;
 
-   lock_ref_node(>node);
err = mlx5_cmd_create_flow_group(dev, ft, fg_in, >id);
if (err) {
kfree(fg);
-   unlock_ref_node(>node);
return ERR_PTR(err);
}
-   /* Add node to tree */
-   tree_init_node(>node, 1, del_flow_group);
+
+   if (ft->autogroup.active)
+   ft->autogroup.num_groups++;
+   /*Add node to tree*/
+   tree_init_node(>node, !is_auto_fg, del_flow_group);
tree_add_node(>node, >node);
-   /* Add node to group list */
+   /*Add node to group list*/
list_add(>node.list, ft->node.children.prev);
+
+   return fg;
+}
+
+struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
+  u32 *fg_in)
+{
+   struct mlx5_flow_group *fg;
+
+   if (ft->autogroup.active)
+   return ERR_PTR(-EPERM);
+
+   lock_ref_node(>node);
+   fg = create_flow_group_common(ft, fg_in, >node.children, false);
unlock_ref_node(>node);
 
return fg;
@@ -614,7 +658,63 @@ static st

[PATCH net-next 10/12] net/mlx5_core: Export flow steering API

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Export the flow steering API for use by mlx5_ib.
The following functions are exported:

1. mlx5_create_auto_grouped_flow_table - used to create a flow
table with automatic flow group management (creation and destruction
of flow groups). In auto-grouped flow tables, we create groups
automatically when needed (when no existing flow group has the
same match criteria as a newly added rule).

2. mlx5_destroy_flow_table - used to destroy a flow table.

3. mlx5_add_flow_rule - used to add a flow rule to a flow table.

4. mlx5_del_flow_rule - used to delete a flow rule from its flow table.

5. mlx5_get_flow_namespace - used to get a handle to the required
namespace sub-tree.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 7198528..fa144e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -702,6 +702,7 @@ struct mlx5_flow_table 
*mlx5_create_auto_grouped_flow_table(struct mlx5_flow_nam
 
return ft;
 }
+EXPORT_SYMBOL(mlx5_create_auto_grouped_flow_table);
 
 /* Flow table should be locked */
 static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table 
*ft,
@@ -1013,11 +1014,13 @@ unlock:
unlock_ref_node(>node);
return rule;
 }
+EXPORT_SYMBOL(mlx5_add_flow_rule);
 
 void mlx5_del_flow_rule(struct mlx5_flow_rule *rule)
 {
tree_remove_node(>node);
 }
+EXPORT_SYMBOL(mlx5_del_flow_rule);
 
 /* Assuming prio->node.children(flow tables) is sorted by level */
 static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft)
@@ -1099,6 +1102,7 @@ int mlx5_destroy_flow_table(struct mlx5_flow_table *ft)
 
return err;
 }
+EXPORT_SYMBOL(mlx5_destroy_flow_table);
 
 void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
 {
@@ -1143,6 +1147,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 
return ns;
 }
+EXPORT_SYMBOL(mlx5_get_flow_namespace);
 
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
  unsigned prio, int max_ft)
-- 
1.7.1



[PATCH net-next 06/12] net/mlx5_core: Set priority attributes

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Each priority has two attributes:
1. max_ft - the maximum number of flow tables allowed under this priority.
2. start_level - the first level of the range used by the flow tables
in the priority.

These attributes are set by traversing the tree nodes in DFS order
and assigning a start level and max flow tables to each priority.
The start level depends on the max flow tables of the preceding
priorities in the tree.

The leaves of the tree have max_ft set in them. Each node accumulates
the max_ft of its children and sets its own max_ft accordingly.
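
The traversal can be sketched outside the driver as follows. This is a
minimal illustration, not the patch code: struct prio_node, struct
ns_node and the set_attrs_* helpers are hypothetical names, a max_ft of
0 is taken to mean "derive it from the children", and sibling
namespaces under one priority are assumed to occupy consecutive levels.

```c
#include <assert.h>
#include <stddef.h>

struct ns_node;

/* Hypothetical, simplified priority node. */
struct prio_node {
	int max_ft;		/* 0: derive from the children */
	int start_level;	/* filled in by the traversal */
	struct ns_node *ns;	/* child namespaces, may be NULL */
	size_t num_ns;
};

/* Hypothetical, simplified namespace node. */
struct ns_node {
	struct prio_node *prios;
	size_t num_prios;
};

static void set_attrs_in_prio(struct prio_node *prio, int acc_level);

/* Visit the priorities of a namespace in order. Each priority starts
 * where the previous one ended. Returns the first level after the
 * namespace.
 */
static int set_attrs_in_ns(struct ns_node *ns, int acc_level)
{
	size_t i;

	for (i = 0; i < ns->num_prios; i++) {
		set_attrs_in_prio(&ns->prios[i], acc_level);
		acc_level += ns->prios[i].max_ft;
	}
	return acc_level;
}

/* Set start_level, descend into the child namespaces, then derive
 * max_ft from the levels the children consumed if it was not preset.
 */
static void set_attrs_in_prio(struct prio_node *prio, int acc_level)
{
	int end_level = acc_level;
	size_t i;

	prio->start_level = acc_level;
	for (i = 0; i < prio->num_ns; i++)
		end_level = set_attrs_in_ns(&prio->ns[i], end_level);
	if (!prio->max_ft)
		prio->max_ft = end_level - prio->start_level;
}
```

Calling set_attrs_in_ns() on the root namespace with acc_level 0 fills
in every priority; the return value is the total number of levels the
tree needs.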

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   71 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |3 +
 2 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 4b4f2b8..2c064ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -41,20 +41,19 @@
 sizeof(struct init_tree_node))
 
 #define INIT_PRIO(min_level_val, max_ft_val,\
- start_level_val, ...) {.type = FS_TYPE_PRIO,\
+ ...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
-   .start_level = start_level_val,\
.max_ft = max_ft_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_PRIO(min_level_val, max_ft_val, start_level_val, ...)\
-   INIT_PRIO(min_level_val, max_ft_val, start_level_val,\
+#define ADD_PRIO(min_level_val, max_ft_val, ...)\
+   INIT_PRIO(min_level_val, max_ft_val,\
  __VA_ARGS__)\
 
-#define ADD_FT_PRIO(max_ft_val, start_level_val, ...)\
-   INIT_PRIO(0, max_ft_val, start_level_val,\
+#define ADD_FT_PRIO(max_ft_val, ...)\
+   INIT_PRIO(0, max_ft_val,\
  __VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
@@ -62,8 +61,6 @@
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define KERNEL_START_LEVEL 0
-#define KERNEL_P0_START_LEVEL KERNEL_START_LEVEL
 #define KERNEL_MAX_FT 2
 #define KENREL_MIN_LEVEL 2
 static struct init_tree_node {
@@ -73,15 +70,12 @@ static struct init_tree_node {
int min_ft_level;
int prio;
int max_ft;
-   int start_level;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
.ar_size = 1,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, KERNEL_MAX_FT,
-KERNEL_START_LEVEL,
-ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT,
-   KERNEL_P0_START_LEVEL))),
+   ADD_PRIO(KENREL_MIN_LEVEL, 0,
+ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
}
 };
 
@@ -1117,8 +,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 }
 
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
- unsigned prio, int max_ft,
- int start_level)
+ unsigned prio, int max_ft)
 {
struct fs_prio *fs_prio;
 
@@ -1131,7 +1124,6 @@ static struct fs_prio *fs_create_prio(struct 
mlx5_flow_namespace *ns,
tree_add_node(_prio->node, >node);
fs_prio->max_ft = max_ft;
fs_prio->prio = prio;
-   fs_prio->start_level = start_level;
list_add_tail(_prio->node.list, >node.children);
 
return fs_prio;
@@ -1177,8 +1169,7 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
return -ENOTSUPP;
 
fs_get_obj(fs_ns, fs_parent_node);
-   fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft,
-init_node->start_level);
+   fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft);
if (IS_ERR(fs_prio))
return PTR_ERR(fs_prio);
base = _prio->node;
@@ -1245,6 +1236,46 @@ static struct mlx5_flow_root_namespace 
*create_root_ns(struct mlx5_core_dev *dev
return root_ns;
 }
 
+static void set_prio_attrs_in_prio(struct fs_prio *prio, int acc_level);
+
+static int set_prio_attrs_in_ns(struct mlx5_flow_namespace *ns, int acc_level)
+{
+   struct fs_prio *prio;
+
+   fs_for_each_prio(prio, ns) {
+/* This updates prio start_level and max_ft */
+   set_prio_attrs_in_prio(prio, acc_level);
+   acc_level += prio->max_ft;
+   }
+   return acc_level;
+}
+
+static void set_prio_at

[PATCH net-next 08/12] net/mlx5_core: Enable flow steering support for the IB driver

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

When the driver is loaded, we create a flow steering namespace
for kernel bypass with nine priorities and another namespace
for leftovers (in order to catch packets that weren't matched).
Verbs applications will use these priorities.
We found nine to be a number that balances the requirements from the
user and retains performance.

The bypass namespace is used by verbs applications that want to bypass
the kernel networking stack. The leftovers namespace is used by verbs
applications and the sniffer in order to catch packets that weren't
handled by any preceding rules.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   55 ++---
 include/linux/mlx5/device.h   |2 +
 include/linux/mlx5/fs.h   |2 +
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 7e39b69..7198528 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,18 +40,19 @@
 #define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct 
init_tree_node[]){__VA_ARGS__}) /\
 sizeof(struct init_tree_node))
 
-#define ADD_PRIO(min_level_val, max_ft_val, caps_val,\
+#define ADD_PRIO(num_prios_val, min_level_val, max_ft_val, caps_val,\
 ...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
.max_ft = max_ft_val,\
+   .num_leaf_prios = num_prios_val,\
.caps = caps_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_FT_PRIO(max_ft_val, ...)\
-   ADD_PRIO(0, max_ft_val, {},\
- __VA_ARGS__)\
+#define ADD_MULTIPLE_PRIO(num_prios_val, max_ft_val, ...)\
+   ADD_PRIO(num_prios_val, 0, max_ft_val, {},\
+__VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
@@ -66,7 +67,14 @@
 #define FS_REQUIRED_CAPS(...) {.arr_sz = INIT_CAPS_ARRAY_SIZE(__VA_ARGS__), \
   .caps = (long[]) {__VA_ARGS__} }
 
+#define LEFTOVERS_MAX_FT 1
+#define LEFTOVERS_NUM_PRIOS 1
+#define BY_PASS_PRIO_MAX_FT 1
+#define BY_PASS_MIN_LEVEL (KENREL_MIN_LEVEL + MLX5_BY_PASS_NUM_PRIOS +\
+  LEFTOVERS_MAX_FT)
+
 #define KERNEL_MAX_FT 2
+#define KERNEL_NUM_PRIOS 1
 #define KENREL_MIN_LEVEL 2
 
 struct node_caps {
@@ -79,14 +87,27 @@ static struct init_tree_node {
int ar_size;
struct node_caps caps;
int min_ft_level;
+   int num_leaf_prios;
int prio;
int max_ft;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
-   .ar_size = 1,
+   .ar_size = 3,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, 0, {},
-ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
+   ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
+
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
+ 
FS_CAP(flow_table_properties_nic_receive.modify_root),
+ 
FS_CAP(flow_table_properties_nic_receive.identified_miss_table_mode),
+ 
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
+ADD_NS(ADD_MULTIPLE_PRIO(MLX5_BY_PASS_NUM_PRIOS, 
BY_PASS_PRIO_MAX_FT))),
+   ADD_PRIO(0, KENREL_MIN_LEVEL, 0, {},
+ADD_NS(ADD_MULTIPLE_PRIO(KERNEL_NUM_PRIOS, 
KERNEL_MAX_FT))),
+   ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
+
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
+ 
FS_CAP(flow_table_properties_nic_receive.modify_root),
+ 
FS_CAP(flow_table_properties_nic_receive.identified_miss_table_mode),
+ 
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
+ADD_NS(ADD_MULTIPLE_PRIO(LEFTOVERS_NUM_PRIOS, 
LEFTOVERS_MAX_FT))),
}
 };
 
@@ -1098,8 +1119,10 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
return NULL;
 
switch (type) {
+   case MLX5_FLOW_NAMESPACE_BYPASS:
case MLX5_FLOW_NAMESPACE_KERNEL:
-   prio = 0;
+   case MLX5_FLOW_NAMESPACE_LEFTOVERS:
+   prio = type;
break;
case MLX5_FLOW_NAMESPACE_FDB:
if (dev->priv.fdb_root_

[PATCH net-next 07/12] net/mlx5_core: Initialize namespaces only when supported by device

2016-01-05 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Before we create the sub-tree of a steering namespace (kernel, bypass,
leftovers), we check that the device has the capabilities required
to create this sub-tree.
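
The check amounts to testing a list of single-bit capabilities in a
big-endian capability buffer. A minimal standalone sketch, assuming the
buffer is addressed as bytes and that bit offset 0 is the most
significant bit of the first byte (cap_bit_is_set() and has_all_caps()
are hypothetical names, not driver functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test one capability bit. 'bit_off' counts bits from the most
 * significant bit of caps[0], which matches a big-endian layout.
 */
static bool cap_bit_is_set(const uint8_t *caps, size_t bit_off)
{
	return (caps[bit_off / 8] >> (7 - bit_off % 8)) & 1;
}

/* Return true only if every required bit is set. An empty list
 * (n == 0) means "no requirements" and always passes.
 */
static bool has_all_caps(const uint8_t *caps, const size_t *required,
			 size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!cap_bit_is_set(caps, required[i]))
			return false;
	return true;
}
```

Working on bytes avoids the byte-swap step; reading a big-endian 32-bit
word at bit_off / 32 and shifting by 31 - (bit_off % 32) selects the
same bit.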

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   70 ++--
 1 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 2c064ba..7e39b69 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,20 +40,17 @@
 #define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct 
init_tree_node[]){__VA_ARGS__}) /\
 sizeof(struct init_tree_node))
 
-#define INIT_PRIO(min_level_val, max_ft_val,\
- ...) {.type = FS_TYPE_PRIO,\
+#define ADD_PRIO(min_level_val, max_ft_val, caps_val,\
+...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
.max_ft = max_ft_val,\
+   .caps = caps_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_PRIO(min_level_val, max_ft_val, ...)\
-   INIT_PRIO(min_level_val, max_ft_val,\
- __VA_ARGS__)\
-
 #define ADD_FT_PRIO(max_ft_val, ...)\
-   INIT_PRIO(0, max_ft_val,\
+   ADD_PRIO(0, max_ft_val, {},\
  __VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
@@ -61,12 +58,26 @@
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
+#define INIT_CAPS_ARRAY_SIZE(...) (sizeof((long[]){__VA_ARGS__}) /\
+  sizeof(long))
+
+#define FS_CAP(cap) (__mlx5_bit_off(flow_table_nic_cap, cap))
+
+#define FS_REQUIRED_CAPS(...) {.arr_sz = INIT_CAPS_ARRAY_SIZE(__VA_ARGS__), \
+  .caps = (long[]) {__VA_ARGS__} }
+
 #define KERNEL_MAX_FT 2
 #define KENREL_MIN_LEVEL 2
+
+struct node_caps {
+   size_t  arr_sz;
+   long*caps;
+};
 static struct init_tree_node {
enum fs_node_type   type;
struct init_tree_node *children;
int ar_size;
+   struct node_caps caps;
int min_ft_level;
int prio;
int max_ft;
@@ -74,7 +85,7 @@ static struct init_tree_node {
.type = FS_TYPE_NAMESPACE,
.ar_size = 1,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, 0,
+   ADD_PRIO(KENREL_MIN_LEVEL, 0, {},
 ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
}
 };
@@ -1153,11 +1164,31 @@ static struct mlx5_flow_namespace 
*fs_create_namespace(struct fs_prio *prio)
return ns;
 }
 
-static int init_root_tree_recursive(int max_ft_level, struct init_tree_node 
*init_node,
+#define FLOW_TABLE_BIT_SZ 1
+#define GET_FLOW_TABLE_CAP(dev, offset) \
+   ((be32_to_cpu(*((__be32 *)(dev->hca_caps_cur[MLX5_CAP_FLOW_TABLE]) + \
+   offset / 32)) >> \
+ (32 - FLOW_TABLE_BIT_SZ - (offset & 0x1f))) & FLOW_TABLE_BIT_SZ)
+static bool has_required_caps(struct mlx5_core_dev *dev, struct node_caps 
*caps)
+{
+   int i;
+
+   for (i = 0; i < caps->arr_sz; i++) {
+   if (!GET_FLOW_TABLE_CAP(dev, caps->caps[i]))
+   return false;
+   }
+   return true;
+}
+
+static int init_root_tree_recursive(struct mlx5_core_dev *dev,
+   struct init_tree_node *init_node,
struct fs_node *fs_parent_node,
struct init_tree_node *init_parent_node,
int index)
 {
+   int max_ft_level = MLX5_CAP_FLOWTABLE(dev,
+ flow_table_properties_nic_receive.
+ max_ft_level);
struct mlx5_flow_namespace *fs_ns;
struct fs_prio *fs_prio;
struct fs_node *base;
@@ -1165,8 +1196,9 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
int err;
 
if (init_node->type == FS_TYPE_PRIO) {
-   if (init_node->min_ft_level > max_ft_level)
-   return -ENOTSUPP;
+   if ((init_node->min_ft_level > max_ft_level) ||
+   !has_required_caps(dev, &init_node->caps))
+   return 0;
 
fs_get_obj(fs_ns, fs_parent_node);
fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft);
@@ -1183,9 +1215,8 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_t

[PATCH net-next V1 00/12] net/mlx5_core: Enhance flow steering support

2016-01-06 Thread Saeed Mahameed
Hi Dave,

This series adds three new functionalities to the driver flow-steering
infrastructure:
auto-grouped flow tables, chaining of flow tables and updates for the
root flow table.

Changes since V0:
- Fixed improperly formatted comments.
- Compare value of ib_spec->eth.mask.ether_type in network byte order
  in ('IB/mlx5: Add flow steering utilities').

1. Auto-grouped flow tables - Flow tables with automatic group management.
When a flow table is created, hints regarding the number of rule types
and the number of rules are given in advance. Thus, a flow table is
divided into #NUM_TYPES+1 groups, each containing
(#NUM_RULES)/(#NUM_TYPES+1) rules. The first #NUM_TYPES parts are groups
which are filled if the added rule matches the group specification or
the group is empty. The last part is filled by rules that can't fit
any of the former groups.

2. Chaining flow tables - Flow tables from different priorities are chained
together: if there is no match in the flow table of priority i, we continue
searching for a match in priority i+1. This holds whether or not priorities
i and i+1 belong to the same namespace.

3. Updating the root flow table - the root flow table is the flow table
with the lowest level. The hardware starts searching for a match in the
root flow table and continues according to the matches it finds along
the way.
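
The sizing arithmetic of item 1 can be written out as a short sketch.
The helper names autogroup_size() and autogroup_start() are
hypothetical, and placing part idx at offset idx * size is an
assumption of the sketch, not something stated above.

```c
#include <assert.h>

/* Number of rules per part when a table with 'num_rules' entries is
 * split into num_types + 1 parts. Returns -1 on invalid input.
 */
static int autogroup_size(int num_rules, int num_types)
{
	if (num_rules <= 0 || num_types < 0)
		return -1;
	return num_rules / (num_types + 1);
}

/* First entry index of part 'idx' (0 .. num_types), assuming the
 * parts are laid out back to back. Part num_types is the last part,
 * reserved for rules that fit none of the other groups.
 */
static int autogroup_start(int num_rules, int num_types, int idx)
{
	int size = autogroup_size(num_rules, num_types);

	if (size < 0 || idx < 0 || idx > num_types)
		return -1;
	return idx * size;
}
```

For example, a table of 1024 rules with 3 rule types is split into four
parts of 256 entries, and the last part starts at entry 768. The
division is an integer division, so any remainder entries are not
covered by this simple layout.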

The first usage for the new functionality is flow steering for user-space
ConnectX-4 offloaded HW Eth RX queues done through the mlx5 IB driver.

When the mlx5 core driver is loaded, it opens three flow namespaces:
1. By-pass namespace (used by mlx5 IB driver).
2. Kernel namespace (used in order to get packets to the networking stack
through mlx5 EN driver).
3. Leftovers namespace (used by mlx5 IB and future sniffer)

The series is built as follows:

Patch #1 introduces auto-grouped flow tables support.

Patch #2 adds utility functions for finding the next and the previous
flow tables in different priorities. These are used to chain
the flow tables in a later patch.

Patch #3 introduces a firmware command for updating the root flow table.

Patch #4 introduces the modify flow table firmware command, which is used
when we want to change the next flow table of an existing flow table.
It is used for chaining flow tables as well.

Patch #5 connects/disconnects flow tables. This is the actual chaining
process, used when we want to link flow tables. It means that if we couldn't
find a match in the first flow table, we'll continue in the chained
flow table.

Patch #6 updates the priority attributes that are required for flow table
level allocation. We update both max_ft (the number of FTs allowed
in the sub-tree of this priority) and start_level (the first
level we'll assign to the flow tables created inside the priority).

Patch #7 adds checking of required device capabilities. Some namespaces
can only be created if the hardware supports certain attributes.
This is especially true for the bypass and leftovers namespaces. This
adds a generic mechanism to check these required attributes.

Patch #8 creates two additional namespaces:
a. Bypass flow rules (has nine priorities)
b. Leftovers packets (has one priority) - for unmatched packets.

Patch #9 refactors the ipv4/ipv6 match fields in the mlx5 firmware interface
header to make them clearer.

Patch #10 exports the flow steering API for mlx5_ib usage.

Patches #11 and #12 implement the required support in mlx5_ib in order
to support the RDMA flow steering verbs.
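
The chaining behaviour that patch #5 sets up (a miss in one flow table
continues in the next chained table) can be modelled in a few lines.
This is only a software model of the lookup order: struct flow_table
with integer keys and chained_lookup() are hypothetical and do not
reflect how the hardware or the driver represents rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a flow table: a set of integer keys it has
 * rules for, plus the table to try next on a miss.
 */
struct flow_table {
	const int *keys;
	size_t num_keys;
	const struct flow_table *next;	/* NULL ends the chain */
};

/* Walk the chain starting at 'ft'. Returns how many tables were
 * skipped before the match (0 for a hit in the first table), or -1
 * if no table in the chain matches 'key'.
 */
static int chained_lookup(const struct flow_table *ft, int key)
{
	int hops = 0;
	size_t i;

	for (; ft; ft = ft->next, hops++)
		for (i = 0; i < ft->num_keys; i++)
			if (ft->keys[i] == key)
				return hops;
	return -1;
}
```

In this model, a table of priority i points at the first table of
priority i+1 through 'next', regardless of which namespace either
table belongs to.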

Regards,
Moni, Matan and Maor


Maor Gottlieb (12):
  net/mlx5_core: Introduce flow steering autogrouped flow table
  net/mlx5_core: Add utilities to find next and prev flow-tables
  net/mlx5_core: Managing root flow table
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Export flow steering API
  IB/mlx5: Add flow steering utilities
  IB/mlx5: Add flow steering support

 drivers/infiniband/hw/mlx5/main.c |  462 
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   45 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   52 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |9 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  601 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   14 +
 include/linux/mlx5/device.h   |2 +
 include/linux/mlx5/fs.h   |   18 +
 include/linux/mlx5/mlx5_ifc.h |  105 -
 9 files changed, 1232 insertions(+), 76 deletions(-)


[PATCH net-next V1 03/12] net/mlx5_core: Managing root flow table

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

The root flow table for each flow table type is defined,
by default, as the flow table with level 0.

To avoid using empty flow tables and introducing new hops,
while still preserving space for flow tables that have a higher
priority (lower number) than the current flow table, we introduce
a new set root flow table command.
This command tells the HW to start matching packets from the
assigned root flow table.
The command is used when we create a new flow table with a level lower
than that of the current lowest flow table, or when it is the first
flow table.
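
The decision of when to move the root can be sketched in isolation.
The sketch follows the rule stated above but leaves out the firmware
command and error handling; struct ft, struct root_ns and
maybe_update_root() are hypothetical, simplified names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified flow table and root namespace. */
struct ft {
	int level;
	unsigned int id;
};

struct root_ns {
	const struct ft *root_ft;	/* NULL until a table exists */
};

/* Make 'ft' the root if it is the first table or has a lower level
 * than the current root. Returns true if the root changed. The real
 * driver would issue the set root command before recording the new
 * root; that step is omitted here.
 */
static bool maybe_update_root(struct root_ns *root, const struct ft *ft)
{
	int min_level = root->root_ft ? root->root_ft->level : INT_MAX;

	if (ft->level >= min_level)
		return false;

	root->root_ft = ft;
	return true;
}
```

With this rule, creating tables at levels 3, 5 and 1 in that order
moves the root twice: to the level 3 table first, then to the level 1
table.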

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   18 
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |2 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   97 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |6 ++
 include/linux/mlx5/mlx5_ifc.h |   31 +++-
 5 files changed, 144 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 5096f4f..d8b1195 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -38,6 +38,24 @@
 #include "fs_cmd.h"
 #include "mlx5_core.h"
 
+int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft)
+{
+   u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)];
+   u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)];
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(set_flow_table_root_in, in, opcode,
+MLX5_CMD_OP_SET_FLOW_TABLE_ROOT);
+   MLX5_SET(set_flow_table_root_in, in, table_type, ft->type);
+   MLX5_SET(set_flow_table_root_in, in, table_id, ft->id);
+
+   memset(out, 0, sizeof(out));
+   return mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+ sizeof(out));
+}
+
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
   unsigned int log_size, unsigned int *table_id)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index f39304e..70d18ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -62,4 +62,6 @@ int mlx5_cmd_delete_fte(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft,
unsigned int index);
 
+int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft);
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c5a96e6..64bdb54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -510,6 +510,29 @@ static struct mlx5_flow_table *find_prev_chained_ft(struct 
fs_prio *prio)
return find_closest_ft(prio, true);
 }
 
+static int update_root_ft_create(struct mlx5_flow_table *ft, struct fs_prio
+*prio)
+{
+   struct mlx5_flow_root_namespace *root = find_root(>node);
+   int min_level = INT_MAX;
+   int err;
+
+   if (root->root_ft)
+   min_level = root->root_ft->level;
+
+   if (ft->level >= min_level)
+   return 0;
+
+   err = mlx5_cmd_update_root_ft(root->dev, ft);
+   if (err)
+   mlx5_core_warn(root->dev, "Update root flow table of id=%u 
failed\n",
+  ft->id);
+   else
+   root->root_ft = ft;
+
+   return err;
+}
+
 struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
   int prio,
   int max_fte)
@@ -526,14 +549,15 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
return ERR_PTR(-ENODEV);
}
 
+   mutex_lock(>chain_lock);
fs_prio = find_prio(ns, prio);
-   if (!fs_prio)
-   return ERR_PTR(-EINVAL);
-
-   lock_ref_node(_prio->node);
+   if (!fs_prio) {
+   err = -EINVAL;
+   goto unlock_root;
+   }
if (fs_prio->num_ft == fs_prio->max_ft) {
err = -ENOSPC;
-   goto unlock_prio;
+   goto unlock_root;
}
 
ft = alloc_flow_table(find_next_free_level(fs_prio),
@@ -541,7 +565,7 @@ struct mlx5_flow_t

[PATCH net-next V1 07/12] net/mlx5_core: Initialize namespaces only when supported by device

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Before we create the sub-tree of a steering namespace (kernel, bypass,
leftovers), we check that the device has the capabilities required
to create this sub-tree.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   70 ++--
 1 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e1282e8..96e287a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,20 +40,17 @@
 #define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct 
init_tree_node[]){__VA_ARGS__}) /\
 sizeof(struct init_tree_node))
 
-#define INIT_PRIO(min_level_val, max_ft_val,\
- ...) {.type = FS_TYPE_PRIO,\
+#define ADD_PRIO(min_level_val, max_ft_val, caps_val,\
+...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
.max_ft = max_ft_val,\
+   .caps = caps_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_PRIO(min_level_val, max_ft_val, ...)\
-   INIT_PRIO(min_level_val, max_ft_val,\
- __VA_ARGS__)\
-
 #define ADD_FT_PRIO(max_ft_val, ...)\
-   INIT_PRIO(0, max_ft_val,\
+   ADD_PRIO(0, max_ft_val, {},\
  __VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
@@ -61,12 +58,26 @@
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
+#define INIT_CAPS_ARRAY_SIZE(...) (sizeof((long[]){__VA_ARGS__}) /\
+  sizeof(long))
+
+#define FS_CAP(cap) (__mlx5_bit_off(flow_table_nic_cap, cap))
+
+#define FS_REQUIRED_CAPS(...) {.arr_sz = INIT_CAPS_ARRAY_SIZE(__VA_ARGS__), \
+  .caps = (long[]) {__VA_ARGS__} }
+
 #define KERNEL_MAX_FT 2
 #define KENREL_MIN_LEVEL 2
+
+struct node_caps {
+   size_t  arr_sz;
+   long*caps;
+};
 static struct init_tree_node {
enum fs_node_type   type;
struct init_tree_node *children;
int ar_size;
+   struct node_caps caps;
int min_ft_level;
int prio;
int max_ft;
@@ -74,7 +85,7 @@ static struct init_tree_node {
.type = FS_TYPE_NAMESPACE,
.ar_size = 1,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, 0,
+   ADD_PRIO(KENREL_MIN_LEVEL, 0, {},
 ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
}
 };
@@ -1153,11 +1164,31 @@ static struct mlx5_flow_namespace 
*fs_create_namespace(struct fs_prio *prio)
return ns;
 }
 
-static int init_root_tree_recursive(int max_ft_level, struct init_tree_node 
*init_node,
+#define FLOW_TABLE_BIT_SZ 1
+#define GET_FLOW_TABLE_CAP(dev, offset) \
+   ((be32_to_cpu(*((__be32 *)(dev->hca_caps_cur[MLX5_CAP_FLOW_TABLE]) + \
+   offset / 32)) >> \
+ (32 - FLOW_TABLE_BIT_SZ - (offset & 0x1f))) & FLOW_TABLE_BIT_SZ)
+static bool has_required_caps(struct mlx5_core_dev *dev, struct node_caps 
*caps)
+{
+   int i;
+
+   for (i = 0; i < caps->arr_sz; i++) {
+   if (!GET_FLOW_TABLE_CAP(dev, caps->caps[i]))
+   return false;
+   }
+   return true;
+}
+
+static int init_root_tree_recursive(struct mlx5_core_dev *dev,
+   struct init_tree_node *init_node,
struct fs_node *fs_parent_node,
struct init_tree_node *init_parent_node,
int index)
 {
+   int max_ft_level = MLX5_CAP_FLOWTABLE(dev,
+ flow_table_properties_nic_receive.
+ max_ft_level);
struct mlx5_flow_namespace *fs_ns;
struct fs_prio *fs_prio;
struct fs_node *base;
@@ -1165,8 +1196,9 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
int err;
 
if (init_node->type == FS_TYPE_PRIO) {
-   if (init_node->min_ft_level > max_ft_level)
-   return -ENOTSUPP;
+   if ((init_node->min_ft_level > max_ft_level) ||
+   !has_required_caps(dev, &init_node->caps))
+   return 0;
 
fs_get_obj(fs_ns, fs_parent_node);
fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft);
@@ -1183,9 +1215,8 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_t

[PATCH net-next V1 01/12] net/mlx5_core: Introduce flow steering autogrouped flow table

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

When a user adds a rule to an auto-grouped flow table, we search
for a flow group with the same match criteria. If no such group
exists, we create a new flow group with the required match
criteria and insert the rule into this group.

We divide the flow table into required_groups + 1 parts
in order to reserve a part of the flow table for rules
which don't match any existing group.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  166 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |5 +
 include/linux/mlx5/fs.h   |6 +
 3 files changed, 158 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index f7d62fe..7d24bbb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -85,6 +85,12 @@ static struct init_tree_node {
}
 };
 
+enum fs_i_mutex_lock_class {
+   FS_MUTEX_GRANDPARENT,
+   FS_MUTEX_PARENT,
+   FS_MUTEX_CHILD
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -119,10 +125,11 @@ static void tree_get_node(struct fs_node *node)
atomic_inc(&node->refcount);
 }
 
-static void nested_lock_ref_node(struct fs_node *node)
+static void nested_lock_ref_node(struct fs_node *node,
+enum fs_i_mutex_lock_class class)
 {
if (node) {
-   mutex_lock_nested(&node->lock, SINGLE_DEPTH_NESTING);
+   mutex_lock_nested(&node->lock, class);
atomic_inc(&node->refcount);
}
 }
@@ -481,9 +488,7 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
list_add_tail(&ft->node.list, &fs_prio->node.children);
fs_prio->num_ft++;
unlock_ref_node(&fs_prio->node);
-
return ft;
-
 free_ft:
kfree(ft);
 unlock_prio:
@@ -491,8 +496,32 @@ unlock_prio:
return ERR_PTR(err);
 }
 
-struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
-  u32 *fg_in)
+struct mlx5_flow_table *mlx5_create_auto_grouped_flow_table(struct 
mlx5_flow_namespace *ns,
+   int prio,
+   int 
num_flow_table_entries,
+   int max_num_groups)
+{
+   struct mlx5_flow_table *ft;
+
+   if (max_num_groups > num_flow_table_entries)
+   return ERR_PTR(-EINVAL);
+
+   ft = mlx5_create_flow_table(ns, prio, num_flow_table_entries);
+   if (IS_ERR(ft))
+   return ft;
+
+   ft->autogroup.active = true;
+   ft->autogroup.required_groups = max_num_groups;
+
+   return ft;
+}
+
+/* Flow table should be locked */
+static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table 
*ft,
+   u32 *fg_in,
+   struct list_head
+   *prev_fg,
+   bool is_auto_fg)
 {
struct mlx5_flow_group *fg;
struct mlx5_core_dev *dev = get_dev(&ft->node);
@@ -505,18 +534,33 @@ struct mlx5_flow_group *mlx5_create_flow_group(struct 
mlx5_flow_table *ft,
if (IS_ERR(fg))
return fg;
 
-   lock_ref_node(&ft->node);
err = mlx5_cmd_create_flow_group(dev, ft, fg_in, &fg->id);
if (err) {
kfree(fg);
-   unlock_ref_node(&ft->node);
return ERR_PTR(err);
}
+
+   if (ft->autogroup.active)
+   ft->autogroup.num_groups++;
/* Add node to tree */
-   tree_init_node(&fg->node, 1, del_flow_group);
+   tree_init_node(&fg->node, !is_auto_fg, del_flow_group);
tree_add_node(&fg->node, &ft->node);
/* Add node to group list */
list_add(&fg->node.list, ft->node.children.prev);
+
+   return fg;
+}
+
+struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
+  u32 *fg_in)
+{
+   struct mlx5_flow_group *fg;
+
+   if (ft->autogroup.active)
+   return ERR_PTR(-EPERM);
+
+   lock_ref_node(&ft->node);
+   fg = create_flow_group_common(ft, fg_in, &ft->node.children, false);
unlock_ref_node(&ft->node);
 
return fg;
@@ -614,7 +658,63 @@ static struct fs_fte *create_fte(struct mlx5_flow_group 
*fg,
return ft

[PATCH net-next V1 02/12] net/mlx5_core: Add utilities to find next and prev flow-tables

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add two utility functions for finding the next and previous flow tables.
The find-next function takes a priority and returns the
first flow table of the next priority in the tree.
The find-prev function returns the last flow table of
the previous priority in the tree.

These utility functions are used for chaining flow tables from different
priorities.
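
As a rough illustration of the search described above, the sketch below
walks a small tree in user space. It is a simplified model and not the
driver code: toy_node, toy_scan and toy_find_closest_ft are hypothetical
names, and children are kept in a fixed array rather than a kernel list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8

enum toy_type { TOY_NAMESPACE, TOY_PRIO, TOY_FLOW_TABLE };

/* Hypothetical tree node: namespaces hold priorities, priorities hold
 * flow tables or nested namespaces.
 */
struct toy_node {
	enum toy_type type;
	int id;
	struct toy_node *parent;
	struct toy_node *children[TOY_MAX_CHILDREN];
	int num_children;
};

static void toy_add_child(struct toy_node *parent, struct toy_node *child)
{
	assert(parent->num_children < TOY_MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->num_children++] = child;
}

static int toy_index_of(const struct toy_node *parent,
			const struct toy_node *child)
{
	int i;

	for (i = 0; i < parent->num_children; i++)
		if (parent->children[i] == child)
			return i;
	return -1;
}

/* Depth-first scan of root's children strictly after position 'start'
 * (or strictly before it when 'reverse' is set).
 */
static struct toy_node *toy_scan(struct toy_node *root, int start,
				 bool reverse)
{
	int step = reverse ? -1 : 1;
	int i;

	for (i = start + step; i >= 0 && i < root->num_children; i += step) {
		struct toy_node *child = root->children[i];
		struct toy_node *ft;

		if (child->type == TOY_FLOW_TABLE)
			return child;
		/* Enter the sub-tree from its first or last child */
		ft = toy_scan(child, reverse ? child->num_children : -1,
			      reverse);
		if (ft)
			return ft;
	}
	return NULL;
}

/* Climb from 'prio' towards the root, scanning the siblings that follow
 * (or precede) the node we came from at each level.
 */
static struct toy_node *toy_find_closest_ft(struct toy_node *prio,
					    bool reverse)
{
	struct toy_node *curr = prio;
	struct toy_node *parent = prio->parent;
	struct toy_node *ft = NULL;

	while (!ft && parent) {
		ft = toy_scan(parent, toy_index_of(parent, curr), reverse);
		curr = parent;
		parent = curr->parent;
	}
	return ft;
}
```

Calling toy_find_closest_ft(prio, false) corresponds to the "next" lookup
and toy_find_closest_ft(prio, true) to the "prev" lookup in this model.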

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   67 +
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 7d24bbb..c5a96e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -443,6 +443,73 @@ static struct mlx5_flow_table *alloc_flow_table(int level, 
int max_fte,
return ft;
 }
 
+/* If reverse is false, then we search for the first flow table in the
+ * root sub-tree from start(closest from right), else we search for the
+ * last flow table in the root sub-tree till start(closest from left).
+ */
+static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node  *root,
+struct list_head 
*start,
+bool reverse)
+{
+#define list_advance_entry(pos, reverse)   \
+   ((reverse) ? list_prev_entry(pos, list) : list_next_entry(pos, list))
+
+#define list_for_each_advance_continue(pos, head, reverse) \
+   for (pos = list_advance_entry(pos, reverse);\
+&pos->list != (head);  \
+pos = list_advance_entry(pos, reverse))
+
+   struct fs_node *iter = list_entry(start, struct fs_node, list);
+   struct mlx5_flow_table *ft = NULL;
+
+   if (!root)
+   return NULL;
+
+   list_for_each_advance_continue(iter, &root->children, reverse) {
+   if (iter->type == FS_TYPE_FLOW_TABLE) {
+   fs_get_obj(ft, iter);
+   return ft;
+   }
+   ft = find_closest_ft_recursive(iter, &iter->children, reverse);
+   if (ft)
+   return ft;
+   }
+
+   return ft;
+}
+
+/* If reverse is false then return the first flow table in next priority of
+ * prio in the tree, else return the last flow table in the previous priority
+ * of prio in the tree.
+ */
+static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool 
reverse)
+{
+   struct mlx5_flow_table *ft = NULL;
+   struct fs_node *curr_node;
+   struct fs_node *parent;
+
+   parent = prio->node.parent;
+   curr_node = &prio->node;
+   while (!ft && parent) {
+   ft = find_closest_ft_recursive(parent, &curr_node->list, 
reverse);
+   curr_node = parent;
+   parent = curr_node->parent;
+   }
+   return ft;
+}
+
+/* Assuming all the tree is locked by mutex chain lock */
+static struct mlx5_flow_table *find_next_chained_ft(struct fs_prio *prio)
+{
+   return find_closest_ft(prio, false);
+}
+
+/* Assuming all the tree is locked by mutex chain lock */
+static struct mlx5_flow_table *find_prev_chained_ft(struct fs_prio *prio)
+{
+   return find_closest_ft(prio, true);
+}
+
 struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
   int prio,
   int max_fte)
-- 
1.7.1



[PATCH net-next V1 08/12] net/mlx5_core: Enable flow steering support for the IB driver

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

When the driver is loaded, we create a flow steering namespace
for kernel bypass with nine priorities and another namespace
for leftovers (in order to catch packets that weren't matched).
Verbs applications will use these priorities.
We found nine to be a number that balances the requirements from the
user and retains performance.

The bypass namespace is used by verbs applications that want to bypass
the kernel networking stack. The leftovers namespace is used by verbs
applications and the sniffer in order to catch packets that weren't
handled by any preceding rules.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   55 ++---
 include/linux/mlx5/device.h   |2 +
 include/linux/mlx5/fs.h   |2 +
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 96e287a..757725b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,18 +40,19 @@
 #define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct 
init_tree_node[]){__VA_ARGS__}) /\
 sizeof(struct init_tree_node))
 
-#define ADD_PRIO(min_level_val, max_ft_val, caps_val,\
+#define ADD_PRIO(num_prios_val, min_level_val, max_ft_val, caps_val,\
 ...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
.max_ft = max_ft_val,\
+   .num_leaf_prios = num_prios_val,\
.caps = caps_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_FT_PRIO(max_ft_val, ...)\
-   ADD_PRIO(0, max_ft_val, {},\
- __VA_ARGS__)\
+#define ADD_MULTIPLE_PRIO(num_prios_val, max_ft_val, ...)\
+   ADD_PRIO(num_prios_val, 0, max_ft_val, {},\
+__VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
@@ -66,7 +67,14 @@
 #define FS_REQUIRED_CAPS(...) {.arr_sz = INIT_CAPS_ARRAY_SIZE(__VA_ARGS__), \
   .caps = (long[]) {__VA_ARGS__} }
 
+#define LEFTOVERS_MAX_FT 1
+#define LEFTOVERS_NUM_PRIOS 1
+#define BY_PASS_PRIO_MAX_FT 1
+#define BY_PASS_MIN_LEVEL (KENREL_MIN_LEVEL + MLX5_BY_PASS_NUM_PRIOS +\
+  LEFTOVERS_MAX_FT)
+
 #define KERNEL_MAX_FT 2
+#define KERNEL_NUM_PRIOS 1
 #define KENREL_MIN_LEVEL 2
 
 struct node_caps {
@@ -79,14 +87,27 @@ static struct init_tree_node {
int ar_size;
struct node_caps caps;
int min_ft_level;
+   int num_leaf_prios;
int prio;
int max_ft;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
-   .ar_size = 1,
+   .ar_size = 3,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, 0, {},
-ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
+   ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
+
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
+ 
FS_CAP(flow_table_properties_nic_receive.modify_root),
+ 
FS_CAP(flow_table_properties_nic_receive.identified_miss_table_mode),
+ 
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
+ADD_NS(ADD_MULTIPLE_PRIO(MLX5_BY_PASS_NUM_PRIOS, 
BY_PASS_PRIO_MAX_FT))),
+   ADD_PRIO(0, KENREL_MIN_LEVEL, 0, {},
+ADD_NS(ADD_MULTIPLE_PRIO(KERNEL_NUM_PRIOS, 
KERNEL_MAX_FT))),
+   ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
+
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
+ 
FS_CAP(flow_table_properties_nic_receive.modify_root),
+ 
FS_CAP(flow_table_properties_nic_receive.identified_miss_table_mode),
+ 
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
+ADD_NS(ADD_MULTIPLE_PRIO(LEFTOVERS_NUM_PRIOS, 
LEFTOVERS_MAX_FT))),
}
 };
 
@@ -1098,8 +1119,10 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
return NULL;
 
switch (type) {
+   case MLX5_FLOW_NAMESPACE_BYPASS:
case MLX5_FLOW_NAMESPACE_KERNEL:
-   prio = 0;
+   case MLX5_FLOW_NAMESPACE_LEFTOVERS:
+   prio = type;
break;
case MLX5_FLOW_NAMESPACE_FDB:
if (dev->priv.fdb_root_

[PATCH net-next V1 11/12] IB/mlx5: Add flow steering utilities

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add three utility functions to support flow steering:

1. Parse verbs flow attributes into hardware steering specs.

2. Check if a flow is multicast - this is required in order to decide
which flow table we will add the steering rule to.

3. Check whether the outer headers in the flow match criteria are all zeros.
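
Two of the building blocks used by these utilities can be illustrated in
isolation: the all-zero buffer test and the split of a VLAN tag into its
sub-fields. The sketch below is a user-space approximation with
hypothetical names (toy_all_zero, toy_vlan, toy_split_vlan_tag). It
assumes the tag is already in host byte order and masks each field
explicitly, whereas the patch passes shifted values to MLX5_SET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True when all 'size' bytes of 'buf' are zero. Comparing the buffer
 * with itself shifted by one byte shows that every byte equals its
 * neighbour; together with buf[0] == 0 this means every byte is zero.
 */
static bool toy_all_zero(const unsigned char *buf, size_t size)
{
	assert(buf != NULL);
	if (size == 0)
		return true;
	return buf[0] == 0 && !memcmp(buf, buf + 1, size - 1);
}

/* Hypothetical holder for the three sub-fields of a 16-bit VLAN tag. */
struct toy_vlan {
	uint16_t vid;	/* bits 0..11 */
	uint8_t cfi;	/* bit 12 */
	uint8_t prio;	/* bits 13..15 */
};

/* Split a host-order VLAN tag into VLAN id, CFI bit and priority. */
static struct toy_vlan toy_split_vlan_tag(uint16_t tag)
{
	struct toy_vlan v;

	v.vid = tag & 0x0fff;
	v.cfi = (tag >> 12) & 0x1;
	v.prio = (tag >> 13) & 0x7;
	return v;
}
```

For instance, the tag 0xB064 would split into priority 5, CFI 1 and
VLAN id 100 in this model.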

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c |  177 +
 include/linux/mlx5/fs.h   |   10 ++
 2 files changed, 187 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 7e97cb5..e16f13f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "user.h"
 #include "mlx5_ib.h"
 
@@ -835,6 +837,181 @@ static int mlx5_ib_dealloc_pd(struct ib_pd *pd)
return 0;
 }
 
+static bool outer_header_zero(u32 *match_criteria)
+{
+   int size = MLX5_ST_SZ_BYTES(fte_match_param);
+   char *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_criteria,
+outer_headers);
+
+   return outer_headers_c[0] == 0 && !memcmp(outer_headers_c,
+ outer_headers_c + 1,
+ size - 1);
+}
+
+static int parse_flow_attr(u32 *match_c, u32 *match_v,
+  union ib_flow_spec *ib_spec)
+{
+   void *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_c,
+outer_headers);
+   void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
+outer_headers);
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   if (ib_spec->size != sizeof(ib_spec->eth))
+   return -EINVAL;
+
+   ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, 
outer_headers_c,
+dmac_47_16),
+   ib_spec->eth.mask.dst_mac);
+   ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, 
outer_headers_v,
+dmac_47_16),
+   ib_spec->eth.val.dst_mac);
+
+   if (ib_spec->eth.mask.vlan_tag) {
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+vlan_tag, 1);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+vlan_tag, 1);
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_vid, ntohs(ib_spec->eth.mask.vlan_tag));
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_vid, ntohs(ib_spec->eth.val.vlan_tag));
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_cfi,
+ntohs(ib_spec->eth.mask.vlan_tag) >> 12);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_cfi,
+ntohs(ib_spec->eth.val.vlan_tag) >> 12);
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_prio,
+ntohs(ib_spec->eth.mask.vlan_tag) >> 13);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_prio,
+ntohs(ib_spec->eth.val.vlan_tag) >> 13);
+   }
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ethertype, ntohs(ib_spec->eth.mask.ether_type));
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ethertype, ntohs(ib_spec->eth.val.ether_type));
+   break;
+   case IB_FLOW_SPEC_IPV4:
+   if (ib_spec->size != sizeof(ib_spec->ipv4))
+   return -EINVAL;
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ethertype, 0x);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ethertype, ETH_P_IP);
+
+   memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+   src_ipv4_src_ipv6.ipv4_layout.ipv4),
+  &ib_spec->ipv4.mask.src_ip,
+  sizeof(ib_spec->ipv4.mask.src_ip));
+ 

[PATCH net-next V1 09/12] net/mlx5_core: Make ipv4/ipv6 location more clear

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Change the mlx5 firmware interface header to make it
clearer which bytes should be used by IPv4 or
IPv6 addresses.
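
To make the byte placement concrete, here is a small user-space sketch of
the same idea. It is only a model: the toy_* types are hypothetical and
count bytes, while the mlx5_ifc structures in the patch count bits
(0x60 reserved bits followed by 0x20 IPv4 bits inside a 0x80-bit field).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-based model of the 128-bit address field. */
struct toy_ipv4_layout {
	uint8_t reserved[12];	/* 0x60 bits left unused for IPv4 */
	uint8_t ipv4[4];	/* 0x20 bits holding the IPv4 address */
};

struct toy_ipv6_layout {
	uint8_t ipv6[16];	/* the full field holds an IPv6 address */
};

union toy_ip_layout {
	struct toy_ipv6_layout ipv6_layout;
	struct toy_ipv4_layout ipv4_layout;
	uint8_t raw[16];
};

/* Store an IPv4 address in the last four bytes and clear the rest. */
static void toy_set_ipv4(union toy_ip_layout *field, const uint8_t addr[4])
{
	assert(field != NULL);
	memset(field, 0, sizeof(*field));
	memcpy(field->ipv4_layout.ipv4, addr, 4);
}
```

In this model an IPv4 address always lands in bytes 12..15 of the field,
which is the placement the named ipv4_layout member is meant to make
explicit.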

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h |   20 ++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 7f16695..68d73f8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -298,6 +298,22 @@ struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 reserved_1[0x1a];
 };
 
+struct mlx5_ifc_ipv4_layout_bits {
+   u8 reserved_0[0x60];
+
+   u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+   u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+   struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+   struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+   u8 reserved_0[0x80];
+};
+
 struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 smac_47_16[0x20];
 
@@ -328,9 +344,9 @@ struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 udp_sport[0x10];
u8 udp_dport[0x10];
 
-   u8 src_ip[4][0x20];
+   union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
 
-   u8 dst_ip[4][0x20];
+   union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
 };
 
 struct mlx5_ifc_fte_match_set_misc_bits {
-- 
1.7.1



[PATCH net-next V1 04/12] net/mlx5_core: Introduce modify flow table command

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Introduce the modify flow table command. This command is used when
we want to change the next flow table of an existing flow table.
The next flow table is the table we search (in order to find a
match) when none of the flow table entries in the current flow
table matches.
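
The role of the next (miss) table can be pictured with a small user-space
model. This is a sketch under assumptions, not the firmware behaviour:
toy_flow_table, toy_modify_miss_table and toy_lookup are hypothetical
names, and matching is reduced to comparing one integer key.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 4

/* Hypothetical flow table: a few keys plus a pointer to the next table. */
struct toy_flow_table {
	int id;
	unsigned int keys[TOY_MAX_ENTRIES];
	int num_entries;
	struct toy_flow_table *miss_table;	/* NULL: no next table */
};

/* Model of the modify command: re-point (or clear) the next table. */
static void toy_modify_miss_table(struct toy_flow_table *ft,
				  struct toy_flow_table *next_ft)
{
	assert(ft != NULL);
	ft->miss_table = next_ft;
}

/* Return the id of the first table in the chain that holds 'key',
 * or -1 when the chain ends without a match.
 */
static int toy_lookup(const struct toy_flow_table *ft, unsigned int key)
{
	for (; ft; ft = ft->miss_table) {
		int i;

		for (i = 0; i < ft->num_entries; i++)
			if (ft->keys[i] == key)
				return ft->id;
	}
	return -1;
}
```

Passing NULL as next_ft in this model mirrors the branch of the command
that sets table_miss_mode to 0.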

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c |   27 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h |4 ++
 include/linux/mlx5/mlx5_ifc.h|   56 --
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index d8b1195..2b55625 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -101,6 +101,33 @@ int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
  sizeof(out));
 }
 
+int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
+  struct mlx5_flow_table *ft,
+  struct mlx5_flow_table *next_ft)
+{
+   u32 in[MLX5_ST_SZ_DW(modify_flow_table_in)];
+   u32 out[MLX5_ST_SZ_DW(modify_flow_table_out)];
+
+   memset(in, 0, sizeof(in));
+   memset(out, 0, sizeof(out));
+
+   MLX5_SET(modify_flow_table_in, in, opcode,
+MLX5_CMD_OP_MODIFY_FLOW_TABLE);
+   MLX5_SET(modify_flow_table_in, in, table_type, ft->type);
+   MLX5_SET(modify_flow_table_in, in, table_id, ft->id);
+   MLX5_SET(modify_flow_table_in, in, modify_field_select,
+MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID);
+   if (next_ft) {
+   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 1);
+   MLX5_SET(modify_flow_table_in, in, table_miss_id, next_ft->id);
+   } else {
+   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 0);
+   }
+
+   return mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+ sizeof(out));
+}
+
 int mlx5_cmd_create_flow_group(struct mlx5_core_dev *dev,
   struct mlx5_flow_table *ft,
   u32 *in,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 70d18ec..1ae9b68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -40,6 +40,10 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
 int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft);
 
+int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
+  struct mlx5_flow_table *ft,
+  struct mlx5_flow_table *next_ft);
+
 int mlx5_cmd_create_flow_group(struct mlx5_core_dev *dev,
   struct mlx5_flow_table *ft,
   u32 *in, unsigned int *group_id);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 323e713..7f16695 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -194,7 +194,8 @@ enum {
MLX5_CMD_OP_QUERY_FLOW_GROUP  = 0x935,
MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY  = 0x936,
MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY= 0x937,
-   MLX5_CMD_OP_DELETE_FLOW_TABLE_ENTRY   = 0x938
+   MLX5_CMD_OP_DELETE_FLOW_TABLE_ENTRY   = 0x938,
+   MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c
 };
 
 struct mlx5_ifc_flow_table_fields_supported_bits {
@@ -260,7 +261,9 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
u8 reserved_0[0x2];
u8 flow_modify_en[0x1];
u8 modify_root[0x1];
-   u8 reserved_1[0x1b];
+   u8 identified_miss_table_mode[0x1];
+   u8 flow_table_modify[0x1];
+   u8 reserved_1[0x19];
 
u8 reserved_2[0x2];
u8 log_max_ft_size[0x6];
@@ -5669,12 +5672,16 @@ struct mlx5_ifc_create_flow_table_in_bits {
 
u8 reserved_4[0x20];
 
-   u8 reserved_5[0x8];
+   u8 reserved_5[0x4];
+   u8 table_miss_mode[0x4];
u8 level[0x8];
u8 reserved_6[0x8];
u8 log_size[0x8];
 
-   u8 reserved_7[0x120];
+   u8 reserved_7[0x8];
+   u8 table_miss_id[0x18];
+
+   u8 reserved_8[0x100];
 };
 
 struct mlx5_ifc_create_flow_group_out_bits {
@@ -6975,4 +6982,45 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
u8

[PATCH net-next V1 06/12] net/mlx5_core: Set priority attributes

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Each priority has two attributes:
1. max_ft - maximum allowed flow tables under this priority.
2. start_level - start level range of the flow tables
in the priority.

These attributes are set by traversing the tree nodes in
DFS order and setting the start level and max flow tables of each priority.
The start level depends on the max flow tables of the prior priorities
in the tree.

The leaves of the tree have max_ft set in them. Each node accumulates
the max_ft of its children and sets its own max_ft accordingly.
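
The traversal described above can be approximated in user space as shown
below. This is a sketch, not the driver code: toy_node and the toy_*
functions are hypothetical, namespaces and priorities share one node
type, and accumulating across several namespaces under one priority is
an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8

/* Hypothetical node used for both namespaces and priorities. */
struct toy_node {
	int max_ft;		/* preset on leaf priorities, 0 elsewhere */
	int start_level;	/* filled in by the traversal */
	struct toy_node *children[TOY_MAX_CHILDREN];
	int num_children;
};

static int toy_set_attrs_in_ns(struct toy_node *ns, int acc_level);

/* Set start_level of 'prio', descend into its namespaces, and derive
 * max_ft from the levels they consumed when it was not preset.
 */
static void toy_set_attrs_in_prio(struct toy_node *prio, int acc_level)
{
	int end_level = acc_level;
	int i;

	prio->start_level = acc_level;
	for (i = 0; i < prio->num_children; i++)
		end_level = toy_set_attrs_in_ns(prio->children[i], end_level);
	if (!prio->max_ft)
		prio->max_ft = end_level - acc_level;
	/* A preset max_ft must cover what the children consumed */
	assert(prio->max_ft >= end_level - acc_level);
}

/* Visit the priorities of 'ns' in order; each one starts where the
 * previous one ends. Returns the first level after the namespace.
 */
static int toy_set_attrs_in_ns(struct toy_node *ns, int acc_level)
{
	int i;

	for (i = 0; i < ns->num_children; i++) {
		toy_set_attrs_in_prio(ns->children[i], acc_level);
		acc_level += ns->children[i]->max_ft;
	}
	return acc_level;
}
```

In this model a priority whose namespace holds three leaf priorities with
max_ft = 1 ends up with max_ft = 3, and the priority after it starts at
level 3.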

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   71 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |3 +
 2 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c6f864d..e1282e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -41,20 +41,19 @@
 sizeof(struct init_tree_node))
 
 #define INIT_PRIO(min_level_val, max_ft_val,\
- start_level_val, ...) {.type = FS_TYPE_PRIO,\
+ ...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
-   .start_level = start_level_val,\
.max_ft = max_ft_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_PRIO(min_level_val, max_ft_val, start_level_val, ...)\
-   INIT_PRIO(min_level_val, max_ft_val, start_level_val,\
+#define ADD_PRIO(min_level_val, max_ft_val, ...)\
+   INIT_PRIO(min_level_val, max_ft_val,\
  __VA_ARGS__)\
 
-#define ADD_FT_PRIO(max_ft_val, start_level_val, ...)\
-   INIT_PRIO(0, max_ft_val, start_level_val,\
+#define ADD_FT_PRIO(max_ft_val, ...)\
+   INIT_PRIO(0, max_ft_val,\
  __VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
@@ -62,8 +61,6 @@
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define KERNEL_START_LEVEL 0
-#define KERNEL_P0_START_LEVEL KERNEL_START_LEVEL
 #define KERNEL_MAX_FT 2
 #define KENREL_MIN_LEVEL 2
 static struct init_tree_node {
@@ -73,15 +70,12 @@ static struct init_tree_node {
int min_ft_level;
int prio;
int max_ft;
-   int start_level;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
.ar_size = 1,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, KERNEL_MAX_FT,
-KERNEL_START_LEVEL,
-ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT,
-   KERNEL_P0_START_LEVEL))),
+   ADD_PRIO(KENREL_MIN_LEVEL, 0,
+ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
}
 };
 
@@ -1117,8 +,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 }
 
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
- unsigned prio, int max_ft,
- int start_level)
+ unsigned prio, int max_ft)
 {
struct fs_prio *fs_prio;
 
@@ -1131,7 +1124,6 @@ static struct fs_prio *fs_create_prio(struct 
mlx5_flow_namespace *ns,
tree_add_node(&fs_prio->node, &ns->node);
fs_prio->max_ft = max_ft;
fs_prio->prio = prio;
-   fs_prio->start_level = start_level;
list_add_tail(&fs_prio->node.list, &ns->node.children);
 
return fs_prio;
@@ -1177,8 +1169,7 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
return -ENOTSUPP;
 
fs_get_obj(fs_ns, fs_parent_node);
-   fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft,
-init_node->start_level);
+   fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft);
if (IS_ERR(fs_prio))
return PTR_ERR(fs_prio);
base = &fs_prio->node;
@@ -1245,6 +1236,46 @@ static struct mlx5_flow_root_namespace 
*create_root_ns(struct mlx5_core_dev *dev
return root_ns;
 }
 
+static void set_prio_attrs_in_prio(struct fs_prio *prio, int acc_level);
+
+static int set_prio_attrs_in_ns(struct mlx5_flow_namespace *ns, int acc_level)
+{
+   struct fs_prio *prio;
+
+   fs_for_each_prio(prio, ns) {
+/* This updates prio start_level and max_ft */
+   set_prio_attrs_in_prio(prio, acc_level);
+   acc_level += prio->max_ft;
+   }
+   return acc_level;
+}
+
+static void set_prio_at

[PATCH net-next V1 12/12] IB/mlx5: Add flow steering support

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add flow steering support by creating a flow table per
priority (if rules exist in the priority).
mlx5_ib uses autogrouping and thus only creates the
required destinations.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c|  285 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   45 +-
 2 files changed, 329 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index e16f13f..01f7ef5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "user.h"
 #include "mlx5_ib.h"
 
@@ -1012,6 +1013,281 @@ static bool is_valid_attr(struct ib_flow_attr 
*flow_attr)
return !has_ipv4_spec || eth_type_ipv4;
 }
 
+static void put_flow_table(struct mlx5_ib_dev *dev,
+  struct mlx5_ib_flow_prio *prio, bool ft_added)
+{
+   prio->refcount -= !!ft_added;
+   if (!prio->refcount) {
+   mlx5_destroy_flow_table(prio->flow_table);
+   prio->flow_table = NULL;
+   }
+}
+
+static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
+{
+   struct mlx5_ib_dev *dev = to_mdev(flow_id->qp->device);
+   struct mlx5_ib_flow_handler *handler = container_of(flow_id,
+ struct 
mlx5_ib_flow_handler,
+ ibflow);
+   struct mlx5_ib_flow_handler *iter, *tmp;
+
+   mutex_lock(&dev->flow_db.lock);
+
+   list_for_each_entry_safe(iter, tmp, &handler->list, list) {
+   mlx5_del_flow_rule(iter->rule);
+   list_del(&iter->list);
+   kfree(iter);
+   }
+
+   mlx5_del_flow_rule(handler->rule);
+   put_flow_table(dev, &dev->flow_db.prios[handler->prio], true);
+   mutex_unlock(&dev->flow_db.lock);
+
+   kfree(handler);
+
+   return 0;
+}
+
+#define MLX5_FS_MAX_TYPES   10
+#define MLX5_FS_MAX_ENTRIES 32000UL
+static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev,
+   struct ib_flow_attr *flow_attr)
+{
+   struct mlx5_flow_namespace *ns = NULL;
+   struct mlx5_ib_flow_prio *prio;
+   struct mlx5_flow_table *ft;
+   int num_entries;
+   int num_groups;
+   int priority;
+   int err = 0;
+
+   if (flow_attr->type == IB_FLOW_ATTR_NORMAL) {
+   if (flow_is_multicast_only(flow_attr))
+   priority = MLX5_IB_FLOW_MCAST_PRIO;
+   else
+   priority = flow_attr->priority;
+   ns = mlx5_get_flow_namespace(dev->mdev,
+MLX5_FLOW_NAMESPACE_BYPASS);
+   num_entries = MLX5_FS_MAX_ENTRIES;
+   num_groups = MLX5_FS_MAX_TYPES;
+   prio = &dev->flow_db.prios[priority];
+   } else if (flow_attr->type == IB_FLOW_ATTR_ALL_DEFAULT ||
+  flow_attr->type == IB_FLOW_ATTR_MC_DEFAULT) {
+   ns = mlx5_get_flow_namespace(dev->mdev,
+MLX5_FLOW_NAMESPACE_LEFTOVERS);
+   build_leftovers_ft_param(&priority,
+&num_entries,
+&num_groups);
+   prio = &dev->flow_db.prios[MLX5_IB_FLOW_LEFTOVERS_PRIO];
+   }
+
+   if (!ns)
+   return ERR_PTR(-ENOTSUPP);
+
+   ft = prio->flow_table;
+   if (!ft) {
+   ft = mlx5_create_auto_grouped_flow_table(ns, priority,
+num_entries,
+num_groups);
+
+   if (!IS_ERR(ft)) {
+   prio->refcount = 0;
+   prio->flow_table = ft;
+   } else {
+   err = PTR_ERR(ft);
+   }
+   }
+
+   return err ? ERR_PTR(err) : prio;
+}
+
+static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev,
+struct mlx5_ib_flow_prio 
*ft_prio,
+struct ib_flow_attr 
*flow_attr,
+struct 
mlx5_flow_destination *dst)
+{
+   struct mlx5_flow_table  *ft = ft_prio->flow_table;
+   struct mlx5_ib_flow_handler *handler;
+   void *ib_flow = flow_attr + 1;
+   u8 match_criteria_enable = 0;
+   unsigned int spec_index;
+   u32 *match_c;
+   u32 *match_v;
+   int err = 0;
+
+   if (

[PATCH net-next V1 05/12] net/mlx5_core: Connect flow tables

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Flow tables from different priorities should be chained together.
When a packet arrives we search for a match in the
bypass flow tables (first we search for a match in priority 0
and if we don't find a match we move to the next priority).
If we can't find a match in any of the bypass flow tables, we continue
searching in the flow tables of the next priority, which are the
kernel's flow tables.

Setting the miss flow table of a new flow table to be the next one in
the list is performed via the create flow table API. If we want to change an
existing flow table, for example in order to point from an
existing flow table to the new next-in-list flow table, we use the
modify flow table API.
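
The chaining rule above can be illustrated with a small user-space model.
It is a sketch under assumptions and not the driver logic: toy_ft,
toy_prio and toy_add_first_ft are hypothetical names, the model only
handles adding the first table of a priority, and all tables of the
closest non-empty previous priority are re-pointed at the new table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FT 4

/* Hypothetical flow table with a miss pointer to the next table. */
struct toy_ft {
	int id;
	struct toy_ft *miss;
};

/* Hypothetical priority holding its flow tables in creation order. */
struct toy_prio {
	struct toy_ft *fts[TOY_MAX_FT];
	int num_ft;
};

/* Add 'ft' as the first table of prios[idx] and fix up the chain. */
static void toy_add_first_ft(struct toy_prio *prios, int num_prios, int idx,
			     struct toy_ft *ft)
{
	int p, i;

	assert(idx >= 0 && idx < num_prios);
	assert(prios[idx].num_ft == 0);

	/* "Create" step: miss to the first table of the next priority */
	ft->miss = NULL;
	for (p = idx + 1; p < num_prios; p++) {
		if (prios[p].num_ft) {
			ft->miss = prios[p].fts[0];
			break;
		}
	}

	/* "Modify" step: previous priority now misses to the new table */
	for (p = idx - 1; p >= 0; p--) {
		if (prios[p].num_ft) {
			for (i = 0; i < prios[p].num_ft; i++)
				prios[p].fts[i]->miss = ft;
			break;
		}
	}

	prios[idx].fts[prios[idx].num_ft++] = ft;
}
```

The first loop corresponds to passing the next table at creation time,
and the second to issuing a modify command for each table of the previous
priority.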

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |7 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |3 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  104 +++--
 3 files changed, 104 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 2b55625..a9894d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -58,7 +58,8 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
-  unsigned int log_size, unsigned int *table_id)
+  unsigned int log_size, struct mlx5_flow_table
+  *next_ft, unsigned int *table_id)
 {
u32 out[MLX5_ST_SZ_DW(create_flow_table_out)];
u32 in[MLX5_ST_SZ_DW(create_flow_table_in)];
@@ -69,6 +70,10 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
MLX5_SET(create_flow_table_in, in, opcode,
 MLX5_CMD_OP_CREATE_FLOW_TABLE);
 
+   if (next_ft) {
+   MLX5_SET(create_flow_table_in, in, table_miss_mode, 1);
+   MLX5_SET(create_flow_table_in, in, table_miss_id, next_ft->id);
+   }
MLX5_SET(create_flow_table_in, in, table_type, type);
MLX5_SET(create_flow_table_in, in, level, level);
MLX5_SET(create_flow_table_in, in, log_size, log_size);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 1ae9b68..9814d47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -35,7 +35,8 @@
 
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
-  unsigned int log_size, unsigned int *table_id);
+  unsigned int log_size, struct mlx5_flow_table
+  *next_ft, unsigned int *table_id);
 
 int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 64bdb54..c6f864d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -510,6 +510,48 @@ static struct mlx5_flow_table *find_prev_chained_ft(struct 
fs_prio *prio)
return find_closest_ft(prio, true);
 }
 
+static int connect_fts_in_prio(struct mlx5_core_dev *dev,
+  struct fs_prio *prio,
+  struct mlx5_flow_table *ft)
+{
+   struct mlx5_flow_table *iter;
+   int i = 0;
+   int err;
+
+   fs_for_each_ft(iter, prio) {
+   i++;
+   err = mlx5_cmd_modify_flow_table(dev,
+iter,
+ft);
+   if (err) {
+   mlx5_core_warn(dev, "Failed to modify flow table %d\n",
+  iter->id);
+   /* The driver is out of sync with the FW */
+   if (i > 1)
+   WARN_ON(true);
+   return err;
+   }
+   }
+   return 0;
+}
+
+/* Connect flow tables from previous priority of prio to ft */
+static int connect_prev_fts(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft,
+   struct fs_prio *prio)
+{
+   struct mlx5_flow_table *prev_ft;
+
+   prev_ft = find_prev_chained_ft(prio);
+   if (prev_ft) {
+   struct fs_prio *prev_prio

[PATCH net-next V1 10/12] net/mlx5_core: Export flow steering API

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Export the flow steering API for use by mlx5_ib.
The following functions are exported:

1. mlx5_create_auto_grouped_flow_table - used to create a flow
table with automatic flow group management (creating and destroying
flow groups). In auto-grouped flow tables, a group is created
automatically when a new rule is added and no existing flow
group has the same match criteria.

2. mlx5_destroy_flow_table - used to destroy a flow table.

3. mlx5_add_flow_rule - used to add a flow rule to a flow table.

4. mlx5_del_flow_rule - used to delete a flow rule from its flow table.

5. mlx5_get_flow_namespace - used to get a handle to the required
namespace sub-tree.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 757725b..6f68dba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -702,6 +702,7 @@ struct mlx5_flow_table 
*mlx5_create_auto_grouped_flow_table(struct mlx5_flow_nam
 
return ft;
 }
+EXPORT_SYMBOL(mlx5_create_auto_grouped_flow_table);
 
 /* Flow table should be locked */
 static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table 
*ft,
@@ -1013,11 +1014,13 @@ unlock:
unlock_ref_node(&ft->node);
return rule;
 }
+EXPORT_SYMBOL(mlx5_add_flow_rule);
 
 void mlx5_del_flow_rule(struct mlx5_flow_rule *rule)
 {
tree_remove_node(&rule->node);
 }
+EXPORT_SYMBOL(mlx5_del_flow_rule);
 
 /* Assuming prio->node.children(flow tables) is sorted by level */
 static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft)
@@ -1099,6 +1102,7 @@ int mlx5_destroy_flow_table(struct mlx5_flow_table *ft)
 
return err;
 }
+EXPORT_SYMBOL(mlx5_destroy_flow_table);
 
 void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
 {
@@ -1143,6 +1147,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 
return ns;
 }
+EXPORT_SYMBOL(mlx5_get_flow_namespace);
 
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
  unsigned prio, int max_ft)
-- 
1.7.1



[PATCH net-next V2 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-20 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add support for enabling/disabling HW timestamping for incoming and/or
outgoing packets. HW timestamping is enabled or disabled through the
appropriate ioctl. Currently only HWTSTAMP_FILTER_ALL/NONE and
HWTSTAMP_TX_ON/OFF are supported. Make all relevant changes in the
RX/TX flows to honor TS requests and plant HW timestamps into the
relevant structures.

Add an internal clock for converting hardware timestamps to nanoseconds.
In addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   19 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  134 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   31 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   77 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   10 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   17 +++
 7 files changed, 288 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fe11e96..01c0256 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -5,4 +5,4 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
-   en_txrx.o
+   en_txrx.o en_clock.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ae3f0e3..0395e72 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -32,6 +32,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -485,6 +487,16 @@ struct mlx5e_flow_tables {
struct mlx5e_flow_table main;
 };
 
+struct mlx5e_tstamp {
+   rwlock_t               lock;
+   struct cyclecounter    cycles;
+   struct timecounter     clock;
+   struct hwtstamp_config hwtstamp_config;
+   u32                    nominal_c_mult;
+   unsigned long          overflow_period;
+   struct delayed_work    overflow_work;
+};
+
 struct mlx5e_priv {
/* priv data path fields - start */
int default_vlan_prio;
@@ -518,6 +530,7 @@ struct mlx5e_priv {
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
+   struct mlx5e_tstamp tstamp;
 };
 
 #define MLX5E_NET_IP_ALIGN 2
@@ -584,6 +597,12 @@ void mlx5e_destroy_flow_tables(struct mlx5e_priv *priv);
 void mlx5e_init_eth_addr(struct mlx5e_priv *priv);
 void mlx5e_set_rx_mode_work(struct work_struct *work);
 
+void mlx5e_fill_hwstamp(struct mlx5e_tstamp *clock,
+   struct skb_shared_hwtstamps *hwts,
+   u64 timestamp);
+void mlx5e_timestamp_init(struct mlx5e_priv *priv);
+void mlx5e_timestamp_cleanup(struct mlx5e_priv *priv);
+
 int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
  u16 vid);
 int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 
proto,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
new file mode 100644
index 0000000..b85863e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * E

[PATCH net-next V2 4/4] net/mlx5e: Add PTP Hardware Clock (PHC) support

2015-12-20 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  104 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |2 +
 4 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 158c88c..c503ea0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -13,6 +13,7 @@ config MLX5_CORE
 config MLX5_CORE_EN
bool "Mellanox Technologies ConnectX-4 Ethernet support"
depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE
+   select PTP_1588_CLOCK
default n
---help---
  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0395e72..9fa933d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -495,6 +496,8 @@ struct mlx5e_tstamp {
u32 nominal_c_mult;
unsigned long  overflow_period;
struct delayed_work overflow_work;
+   struct ptp_clock  *ptp;
+   struct ptp_clock_info  ptp_info;
 };
 
 struct mlx5e_priv {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index b85863e..eacf633 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -52,6 +52,93 @@ void mlx5e_fill_hwstamp(struct mlx5e_tstamp *tstamp,
hwts->hwtstamp = ns_to_ktime(nsec);
 }
 
+static int mlx5e_ptp_settime(struct ptp_clock_info *ptp,
+const struct timespec64 *ts)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   u64 ns = timespec64_to_ns(ts);
+   unsigned long flags;
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   timecounter_init(&tstamp->clock, &tstamp->cycles, ns);
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   return 0;
+}
+
+static int mlx5e_ptp_gettime(struct ptp_clock_info *ptp,
+struct timespec64 *ts)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   u64 ns;
+   unsigned long flags;
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   ns = timecounter_read(&tstamp->clock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   *ts = ns_to_timespec64(ns);
+
+   return 0;
+}
+
+static int mlx5e_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   unsigned long flags;
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   timecounter_adjtime(&tstamp->clock, delta);
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   return 0;
+}
+
+static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 delta)
+{
+   u64 adj;
+   u32 diff;
+   int neg_adj = 0;
+   unsigned long flags;
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+ ptp_info);
+
+   if (delta < 0) {
+   neg_adj = 1;
+   delta = -delta;
+   }
+
+   adj = tstamp->nominal_c_mult;
+   adj *= delta;
+   diff = div_u64(adj, 1000000000ULL);
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   timecounter_read(&tstamp->clock);
+   tstamp->cycles.mult = neg_adj ? tstamp->nominal_c_mult - diff :
+   tstamp->nominal_c_mult + diff;
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   return 0;
+}
+
+static const struct ptp_clock_info mlx5e_ptp_clock_info = {
+   .owner  = THIS_MODULE,
+   .max_adj= 1,
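
The frequency adjustment in mlx5e_ptp_adjfreq above scales the nominal
cyclecounter multiplier by the requested delta. A minimal userspace sketch
of that arithmetic follows, assuming delta is given in parts per billion
(as the PTP adjfreq callback defines it). The function name adjusted_mult
is hypothetical and not part of the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the multiplier to use after a frequency
 * adjustment of delta_ppb parts per billion, starting from nominal_mult.
 * diff = nominal_mult * |delta_ppb| / 10^9, added or subtracted by sign.
 */
static uint32_t adjusted_mult(uint32_t nominal_mult, int32_t delta_ppb)
{
    int neg = delta_ppb < 0;
    /* Take the magnitude in 64 bits so INT32_MIN does not overflow. */
    uint64_t mag = neg ? (uint64_t)(-(int64_t)delta_ppb)
                       : (uint64_t)delta_ppb;
    uint64_t diff = ((uint64_t)nominal_mult * mag) / 1000000000ULL;

    return neg ? nominal_mult - (uint32_t)diff
               : nominal_mult + (uint32_t)diff;
}
```

With a nominal multiplier of 1000000, a request of +1000 ppb moves the
multiplier by one unit; smaller requests are lost to integer truncation,
which is one reason a larger shift (and hence a larger multiplier) gives
finer adjustment steps.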

[PATCH net-next V2 2/4] net/mlx5_core: Add support for reading hardware timestamp

2015-12-20 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

A preparation step which adds support for reading the hardware
timestamp from the internal clock and from the CQE.
In addition, advertise the device_frequency_khz HCA capability.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 +++--
 include/linux/mlx5/mlx5_ifc.h  |5 ++-
 4 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 789882b..b16eb42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -504,6 +504,37 @@ int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 
func_id)
return mlx5_cmd_status_to_err_v2(out);
 }
 
+static u32 internal_timer_h(struct mlx5_core_dev *dev)
+{
+   return ioread32be(&dev->iseg->internal_timer_h);
+}
+
+static u32 internal_timer_l(struct mlx5_core_dev *dev)
+{
+   return ioread32be(&dev->iseg->internal_timer_l);
+}
+
+cycle_t mlx5_core_read_clock(struct mlx5_core_dev *dev)
+{
+   u32 timer_h, timer_h1, timer_l;
+
+   /*  Reading the internal timer using 2 PCI reads in a non-atomic manner
+* may hit the wraparound of the 32 LSBs. Reading the 32 MSBs twice can
+* verify a wraparound did not happen.
+*/
+   timer_h = internal_timer_h(dev);
+   timer_l = internal_timer_l(dev);
+   timer_h1 = internal_timer_h(dev);
+   if (timer_h == timer_h1)
+   goto ret;
+
+   /* In case of overflow or wraparound, re-read the LSB */
+   timer_l = internal_timer_l(dev);
+
+ret:
+   return (u64)timer_l | (u64)timer_h1 << 32;
+}
+
 static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
 {
struct mlx5_priv *priv  = &mdev->priv;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index ea6a137..b6651b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -98,6 +98,7 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int 
num_vfs);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
+cycle_t mlx5_core_read_clock(struct mlx5_core_dev *dev);
 
 void mlx5e_init(void);
 void mlx5e_cleanup(void);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 7d3a85f..df2f79e 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -443,9 +443,12 @@ struct mlx5_init_seg {
__be32  rsvd1[120];
__be32  initializing;
struct health_buffer health;
-   __be32  rsvd2[884];
+   __be32  rsvd2[880];
+   __be32  internal_timer_h;
+   __be32  internal_timer_l;
+   __be32  rsrv3[2];
__be32  health_counter;
-   __be32  rsvd3[1019];
+   __be32  rsvd4[1019];
__be64  ieee1588_clk;
__be32  ieee1588_clk_type;
__be32  clr_intx;
@@ -601,7 +604,8 @@ struct mlx5_cqe64 {
__be32  imm_inval_pkey;
u8  rsvd40[4];
__be32  byte_cnt;
-   __be64  timestamp;
+   __be32  timestamp_h;
+   __be32  timestamp_l;
__be32  sop_drop_qpn;
__be16  wqe_counter;
u8  signature;
@@ -623,6 +627,16 @@ static inline int cqe_has_vlan(struct mlx5_cqe64 *cqe)
return !!(cqe->l4_hdr_type_etc & 0x1);
 }
 
+static inline u64 get_cqe_ts(struct mlx5_cqe64 *cqe)
+{
+   u32 hi, lo;
+
+   hi = be32_to_cpu(cqe->timestamp_h);
+   lo = be32_to_cpu(cqe->timestamp_l);
+
+   return (u64)lo | ((u64)hi << 32);
+}
+
 enum {
CQE_L4_HDR_TYPE_NONE= 0x0,
CQE_L4_HDR_TYPE_TCP_NO_ACK  = 0x1,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 131a273..e4da900 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -829,9 +829,10 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_66[0x8];
u8 log_uar_page_sz[0x10];
 
-   u8 reserved_67[0xe0];
+   u8 reserved_67[0x40];
+   u8 device_frequency_khz[0x20];
+   u8 reserved_68[0x5f];
 
-   u8 reserved_68[0x1f];
u8  

[PATCH net-next V2 1/4] net/mlx5e: Do not modify the TX SKB

2015-12-20 Thread Saeed Mahameed
From: Achiad Shochat <ach...@mellanox.com>

If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.

Signed-off-by: Achiad Shochat <ach...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |   73 -
 3 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f689ce5..ae3f0e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -328,14 +328,12 @@ struct mlx5e_rq {
struct mlx5e_priv *priv;
 } cacheline_aligned_in_smp;
 
-struct mlx5e_tx_skb_cb {
+struct mlx5e_tx_wqe_info {
u32 num_bytes;
u8  num_wqebbs;
u8  num_dma;
 };
 
-#define MLX5E_TX_SKB_CB(__skb) ((struct mlx5e_tx_skb_cb *)__skb->cb)
-
 enum mlx5e_dma_map_type {
MLX5E_DMA_MAP_SINGLE,
MLX5E_DMA_MAP_PAGE
@@ -371,6 +369,7 @@ struct mlx5e_sq {
/* pointers to per packet info: write@xmit, read@completion */
struct sk_buff   **skb;
struct mlx5e_sq_dma   *dma_fifo;
+   struct mlx5e_tx_wqe_info  *wqe_info;
 
/* read only */
struct mlx5_wq_cyc wq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d4601a5..96775a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -507,6 +507,7 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq)
 
 static void mlx5e_free_sq_db(struct mlx5e_sq *sq)
 {
+   kfree(sq->wqe_info);
kfree(sq->dma_fifo);
kfree(sq->skb);
 }
@@ -519,8 +520,10 @@ static int mlx5e_alloc_sq_db(struct mlx5e_sq *sq, int numa)
sq->skb = kzalloc_node(wq_sz * sizeof(*sq->skb), GFP_KERNEL, numa);
sq->dma_fifo = kzalloc_node(df_sz * sizeof(*sq->dma_fifo), GFP_KERNEL,
numa);
+   sq->wqe_info = kzalloc_node(wq_sz * sizeof(*sq->wqe_info), GFP_KERNEL,
+   numa);
 
-   if (!sq->skb || !sq->dma_fifo) {
+   if (!sq->skb || !sq->dma_fifo || !sq->wqe_info) {
mlx5e_free_sq_db(sq);
return -ENOMEM;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 1341b1d..aa037eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -92,11 +92,11 @@ static inline struct mlx5e_sq_dma *mlx5e_dma_get(struct 
mlx5e_sq *sq, u32 i)
return &sq->dma_fifo[i & sq->dma_fifo_mask];
 }
 
-static void mlx5e_dma_unmap_wqe_err(struct mlx5e_sq *sq, struct sk_buff *skb)
+static void mlx5e_dma_unmap_wqe_err(struct mlx5e_sq *sq, u8 num_dma)
 {
int i;
 
-   for (i = 0; i < MLX5E_TX_SKB_CB(skb)->num_dma; i++) {
+   for (i = 0; i < num_dma; i++) {
struct mlx5e_sq_dma *last_pushed_dma =
mlx5e_dma_get(sq, --sq->dma_fifo_pc);
 
@@ -139,19 +139,28 @@ static inline u16 mlx5e_get_inline_hdr_size(struct 
mlx5e_sq *sq,
return MLX5E_MIN_INLINE;
 }
 
-static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs)
+static inline void mlx5e_tx_skb_pull_inline(unsigned char **skb_data,
+   unsigned int *skb_len,
+   unsigned int len)
+{
+   *skb_len -= len;
+   *skb_data += len;
+}
+
+static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs,
+unsigned char **skb_data,
+unsigned int *skb_len)
 {
struct vlan_ethhdr *vhdr = (struct vlan_ethhdr *)start;
int cpy1_sz = 2 * ETH_ALEN;
int cpy2_sz = ihs - cpy1_sz;
 
-   skb_copy_from_linear_data(skb, vhdr, cpy1_sz);
-   skb_pull_inline(skb, cpy1_sz);
+   memcpy(vhdr, *skb_data, cpy1_sz);
+   mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy1_sz);
vhdr->h_vlan_proto = skb->vlan_proto;
vhdr->h_vlan_TCI = cpu_to_be16(skb_vlan_tag_get(skb));
-   skb_copy_from_linear_data(skb, &vhdr->h_vlan_encapsulated_proto,
- cpy2_sz);
-   skb_pull_inline(skb, cpy2_sz);
+   memcpy(&vhdr->h_vlan_encapsulated_proto, *skb_data, cpy2_sz);
+   mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz);
 }
 
 static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
@@ -160,11 +169,14 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
 
u16 pi = sq-&

[PATCH net-next V2 0/4] Introduce mlx5 ethernet timestamping

2015-12-20 Thread Saeed Mahameed
Hi Dave,

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

This version addresses all the comments received on v1. The first patch
was replaced with one by Achiad that fixes the SKB data issue properly.
It also addresses all of Richard's comments on the timestamping patches
and fixes the delayed work to run with the correct delay.

In detail:

1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.

2nd patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability

3rd patch adds a new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization and other helper functions
- The ioctl needed for setting/getting the current timestamping
  configuration
- Use of this configuration in the RX/TX data path to fill the SKB with
  the timestamp

4th patch introduces PTP (PHC) support.

Achiad Shochat (1):
  net/mlx5e: Do not modify the TX SKB

Eran Ben Elisha (3):
  net/mlx5_core: Add support for reading hardware timestamp
  net/mlx5e: Add HW timestamping (TS) support
  net/mlx5e: Add PTP Hardware Clock (PHC) support

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   27 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  238 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   33 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   82 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   10 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   90 +---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 ++-
 include/linux/mlx5/mlx5_ifc.h  |5 +-
 12 files changed, 499 insertions(+), 41 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
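
As an illustration of the second patch's clock read, the 64-bit internal
timer is fetched with two 32-bit reads, and the high word is read twice so
a wraparound of the low word can be detected. The sketch below models this
in userspace. The struct timer_regs callbacks and the fake_timer device
are hypothetical stand-ins for the MMIO reads, not driver code.

```c
#include <stdint.h>

/* Hypothetical register access: one callback per 32-bit half. */
struct timer_regs {
    uint32_t (*read_hi)(void *ctx);
    uint32_t (*read_lo)(void *ctx);
    void *ctx;
};

/*
 * Read a 64-bit free-running counter exposed as two 32-bit registers.
 * If the high word changed between the two high reads, the low word
 * wrapped in between, so it is read again and paired with the newer
 * high word.
 */
static uint64_t read_timer64(const struct timer_regs *r)
{
    uint32_t hi = r->read_hi(r->ctx);
    uint32_t lo = r->read_lo(r->ctx);
    uint32_t hi1 = r->read_hi(r->ctx);

    if (hi != hi1)
        lo = r->read_lo(r->ctx);

    return ((uint64_t)hi1 << 32) | lo;
}

/* Fake device for experimenting: the counter advances on every read. */
struct fake_timer {
    uint64_t now;
    uint64_t step;
};

static uint32_t fake_read_hi(void *ctx)
{
    struct fake_timer *t = ctx;
    uint32_t v = (uint32_t)(t->now >> 32);

    t->now += t->step;
    return v;
}

static uint32_t fake_read_lo(void *ctx)
{
    struct fake_timer *t = ctx;
    uint32_t v = (uint32_t)t->now;

    t->now += t->step;
    return v;
}
```

Pairing the first high word with a low word read after the wrap would
yield a value roughly 2^32 cycles in the past, which is the error the
second high read guards against.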



Re: [PATCH net-next V1 4/4] net/mlx5e: Add PTP Hardware Clock (PHC) support

2015-12-20 Thread Saeed Mahameed
On Thu, Dec 17, 2015 at 10:20 PM, Richard Cochran
<richardcoch...@gmail.com> wrote:
> On Thu, Dec 17, 2015 at 02:35:35PM +0200, Saeed Mahameed wrote:
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> index 8e86f2c..b2e5014 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> @@ -880,6 +880,9 @@ static int mlx5e_get_ts_info(struct net_device *dev,
>>   (1 << HWTSTAMP_FILTER_ALL);
>>   }
>>
>> + if (priv->tstamp.ptp)
>> + info->phc_index = ptp_clock_index(priv->tstamp.ptp);

> else
> info->phc_index = -1;
Will fix this.
Thanks

>
> Thanks,
> Richard


Re: [PATCH net-next V1 1/4] net/mlx5e: Restore the skb data pointer after xmit is finished

2015-12-20 Thread Saeed Mahameed
On Thu, Dec 17, 2015 at 10:21 PM, David Miller <da...@davemloft.net> wrote:
> From: Saeed Mahameed <sae...@mellanox.com>
> Date: Thu, 17 Dec 2015 14:35:32 +0200
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> index 1341b1d..0fcfe64 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> @@ -165,6 +165,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
>> struct sk_buff *skb)
>>   struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
>>   struct mlx5_wqe_data_seg *dseg;
>>
>> + unsigned char *skb_data_orig = skb->data;
>>   u8  opcode = MLX5_OPCODE_SEND;
>>   dma_addr_t dma_addr = 0;
>>   bool bf = false;
>> @@ -263,6 +264,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
>> struct sk_buff *skb)
>>   cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | opcode);
>>   cseg->qpn_ds   = cpu_to_be32((sq->sqn << 8) | ds_cnt);
>>
>> + skb_push(skb, skb->data - skb_data_orig);
>>   sq->skb[pi] = skb;
>>
>>   MLX5E_TX_SKB_CB(skb)->num_wqebbs = DIV_ROUND_UP(ds_cnt,
>
> And in the middle of this we have:
>
> skb_pull_inline(skb, ihs);
>
> This looks illegal.
>
> You must not modify the data pointers of any SKB that you receive for
> sending via ->ndo_start_xmit() unless you know that absolutely you are
> the one and only reference that exists to that SKB.
>
> And exactly for the case you are trying to "fix" here, you do not.  If
> the SKB is cloned, or has an elevated users count, someone else can be
> looking at it exactly at the same time you are messing with the data
> pointers.

Agree, we will provide a fix soon.

>
> I bet mlx4 has this bug too.

I did a quick review and didn't see mlx4 modifying the SKB data pointer.
If you know of such a bug in mlx4, please share it with us and we will
handle it ASAP.

> You must fix this properly, by keeping track of an offset or similar
> internally to your driver, rather than changing the SKB data pointers.
>
> Thanks.
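
The suggestion above, tracking an offset inside the driver instead of
moving the SKB data pointer, can be sketched in userspace as follows. The
types pkt_view and pkt_cursor and both functions are hypothetical
illustrations and not the driver's actual structures. The point is only
that the shared buffer descriptor is never written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a packet that other parties may also hold. */
struct pkt_view {
    const unsigned char *data;
    size_t len;
};

/* Private cursor owned by the transmit path. */
struct pkt_cursor {
    const unsigned char *pos;
    size_t remaining;
};

static struct pkt_cursor cursor_start(const struct pkt_view *p)
{
    struct pkt_cursor c = { p->data, p->len };

    return c;
}

/*
 * Copy n bytes from the current position into dst and advance the
 * private cursor. Returns 0 on success, -1 if fewer than n bytes remain.
 * The pkt_view itself is left untouched.
 */
static int cursor_pull(struct pkt_cursor *c, void *dst, size_t n)
{
    if (n > c->remaining)
        return -1;

    memcpy(dst, c->pos, n);
    c->pos += n;
    c->remaining -= n;
    return 0;
}
```

Because only the cursor advances, another holder of the same packet
still sees the original data pointer and length at any time.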


Re: [PATCH net-next V1 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-20 Thread Saeed Mahameed
On Thu, Dec 17, 2015 at 10:11 PM, Richard Cochran
<richardcoch...@gmail.com> wrote:
> On Thu, Dec 17, 2015 at 02:35:34PM +0200, Saeed Mahameed wrote:
>> @@ -63,6 +65,7 @@
>>  #define MLX5E_TX_CQ_POLL_BUDGET128
>>  #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
>>  #define MLX5E_SQ_BF_BUDGET 16
>> +#define MLX5E_SERVICE_TASK_DELAY   (HZ / 4)
>
> Hm...
>
>> +void mlx5e_timestamp_overflow_check(struct mlx5e_priv *priv)
>> +{
>> + bool timeout = time_is_before_jiffies(priv->tstamp.last_overflow_check 
>> +
>> +   priv->tstamp.overflow_period);
>> + unsigned long flags;
>> +
>> + if (timeout) {
>> + write_lock_irqsave(&priv->tstamp.lock, flags);
>> + timecounter_read(&priv->tstamp.clock);
>> + write_unlock_irqrestore(&priv->tstamp.lock, flags);
>> + priv->tstamp.last_overflow_check = jiffies;
>
> Here you have extra book keeping, because the rate of the work
> callbacks is not the same as the rate of the overflow checks.
>
>> + }
>> +}
>
>> +void mlx5e_timestamp_init(struct mlx5e_priv *priv)
>> +{
>> + struct mlx5e_tstamp *tstamp = &priv->tstamp;
>> + u64 ns;
>> + u64 frac = 0;
>> + u32 dev_freq;
>> +
>> + mlx5e_timestamp_init_config(tstamp);
>> + dev_freq = MLX5_CAP_GEN(priv->mdev, device_frequency_khz);
>> + if (!dev_freq) {
>> + mlx5_core_warn(priv->mdev, "invalid device_frequency_khz. %s 
>> failed\n",
>> +__func__);
>> + return;
>> + }
>> + rwlock_init(&tstamp->lock);
>> + memset(&tstamp->cycles, 0, sizeof(tstamp->cycles));
>> + tstamp->cycles.read = mlx5e_read_clock;
>> + tstamp->cycles.shift = MLX5E_CYCLES_SHIFT;
>> + tstamp->cycles.mult = clocksource_khz2mult(dev_freq,
>> +tstamp->cycles.shift);
>> + tstamp->nominal_c_mult = tstamp->cycles.mult;
>> + tstamp->cycles.mask = CLOCKSOURCE_MASK(41);
>> +
>> + timecounter_init(&tstamp->clock, &tstamp->cycles,
>> +  ktime_to_ns(ktime_get_real()));
>> +
>> + /* Calculate period in seconds to call the overflow watchdog - to make
>> +  * sure counter is checked at least once every wrap around.
>> +  */
>> + ns = cyclecounter_cyc2ns(&tstamp->cycles, tstamp->cycles.mask, frac,
>> +  &frac);
>> + do_div(ns, NSEC_PER_SEC / 2 / HZ);
>> + tstamp->overflow_period = ns;
>> +}
>
> And here you take great pains to calculate the rate of overflow checks...
>
>> +/* mlx5e_service_task - Run service task for tasks that needed to be done
>> + * periodically
>> + */
>> +static void mlx5e_service_task(struct work_struct *work)
>> +{
>> + struct delayed_work *dwork = to_delayed_work(work);
>> + struct mlx5e_priv *priv = container_of(dwork, struct mlx5e_priv,
>> +service_task);
>> +
>> + mutex_lock(&priv->state_lock);
>> + if (test_bit(MLX5E_STATE_OPENED, &priv->state) &&
>> + !test_bit(MLX5E_STATE_DESTROYING, &priv->state)) {
>> + if (MLX5_CAP_GEN(priv->mdev, device_frequency_khz)) {
>> + mlx5e_timestamp_overflow_check(priv);
>> + /* Only mlx5e_timestamp_overflow_check is called from
>> +  * this service task. schedule a new task only if clock
>> +  * is initialized. if changed, move the scheduler.
>> +  */
>> + schedule_delayed_work(dwork, MLX5E_SERVICE_TASK_DELAY);
>
> Why not simply use the rate you calculated, rather than some hard
> coded value?
>

This task was meant to serve several kinds of periodic work, but currently
its only purpose is the overflow check.
We will make it specific to the overflow check for now and use a more
accurate delay.

> Consider What happens if MLX5E_SERVICE_TASK_DELAY is too long or way
> too short.
>

Agreed, but what happens if the calculated period is too short?
Shouldn't we have some kind of minimum?


>> + }
>> + }
>> + mutex_unlock(&priv->state_lock);
>> +}
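
To make the period question concrete, here is a minimal userspace sketch
of the calculation discussed above: convert the counter's full range to
nanoseconds with the mult/shift scheme, halve it so the counter is sampled
at least twice per wrap, and apply a lower bound. The function names and
the min_period_ns floor are hypothetical. The thread does not settle
whether a minimum should exist, and the sketch ignores the fractional
remainder that the kernel helper tracks.

```c
#include <stdint.h>

/*
 * ns = (cycles * mult) >> shift.
 * The caller must ensure cycles * mult fits in 64 bits.
 */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/*
 * Hypothetical helper: half of the time a counter_bits-wide counter takes
 * to wrap, in nanoseconds, but never less than min_period_ns.
 * Note that a floor larger than the wrap time itself would allow a wrap
 * to be missed, so it only makes sense as a guard against very small
 * values.
 */
static uint64_t overflow_period_ns(unsigned int counter_bits, uint32_t mult,
                                   uint32_t shift, uint64_t min_period_ns)
{
    uint64_t mask = counter_bits >= 64 ? UINT64_MAX
                                       : (1ULL << counter_bits) - 1;
    uint64_t period = cyc_to_ns(mask, mult, shift) / 2;

    return period < min_period_ns ? min_period_ns : period;
}
```

For example, an 8-bit counter with mult 4 and shift 2 wraps after 255 ns,
so the check period comes out as 127 ns unless the floor is higher.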


Re: [PATCH net-next V2 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-21 Thread Saeed Mahameed
On Mon, Dec 21, 2015 at 11:15 AM, Richard Cochran
<richardcoch...@gmail.com> wrote:
> On Sun, Dec 20, 2015 at 11:46:30PM +0200, Saeed Mahameed wrote:
>> +/* Should run once every mlx5e_tstamp->overflow_period */
>> +static void mlx5e_timestamp_overflow(struct work_struct *work)
>> +{
>> + struct delayed_work *dwork = to_delayed_work(work);
>> + struct mlx5e_tstamp *tstamp = container_of(dwork, struct mlx5e_tstamp, 
>> overflow_work);
>> + unsigned long flags;
>> +
>> + write_lock_irqsave(&tstamp->lock, flags);
>> + timecounter_read(&tstamp->clock);
>> + if (tstamp->overflow_period)
>> + schedule_delayed_work(&tstamp->overflow_work,
>> tstamp->overflow_period);
>
> You don't need this test, and the call to schedule_delayed_work can be
> outside of the lock.
>

Think of a case where:
CPU1: is just about to call
"schedule_delayed_work(&tstamp->overflow_work,
tstamp->overflow_period);"
CPU2: cancel_delayed_work

In this case cancel_delayed_work_sync (CPU2) will wait for CPU1 to
complete, but CPU1 will re-arm the work, and we will
be left with tstamp->overflow_work running forever.

>> + write_unlock_irqrestore(&tstamp->lock, flags);
>> +}
>
>> +void mlx5e_timestamp_cleanup(struct mlx5e_priv *priv)
>> +{
>> + struct mlx5e_tstamp *tstamp = &priv->tstamp;
>> +
>> + if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz))
>> + return;
>> +
>> + write_lock(&tstamp->lock);
>> + tstamp->overflow_period = 0; /* Signal overflow_check to stop */
>> + write_unlock(&tstamp->lock);
>
> This is unnecessary because
>
>> +
>> + cancel_delayed_work_sync(&tstamp->overflow_work);
>
> this will block until the work is cancelled.
>
See my previous comment: it will indeed block, but without the protected
signal "tstamp->overflow_period = 0;"
the work can reschedule itself.

>> +}
>
> Thanks,
> Richard
>
>


Re: [PATCH net-next V2 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-22 Thread Saeed Mahameed
On Mon, Dec 21, 2015 at 8:35 PM, Richard Cochran
<richardcoch...@gmail.com> wrote:
> On Mon, Dec 21, 2015 at 04:35:23PM +0200, Saeed Mahameed wrote:
>> think of a case where:
>> CPU1: is just about to call
>> "schedule_delayed_work(&tstamp->overflow_work,
>> tstamp->overflow_period);"
>> CPU2: cancel_delayed_work
>>
>> In this case cancel_delayed_work_sync (CPU2) will wait for CPU1 to
>> complete but CPU1 will re-arm the work, and we will
>> be left with tstamp->overflow_work running forever.
>
> This is my understanding:  Once the work becomes re-queued, it will be
> canceled before running again.
True, will fix this.
Thanks

>
> Thanks,
> Richard
>


Re: [PATCH net-next V2 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-22 Thread Saeed Mahameed
On Mon, Dec 21, 2015 at 11:15 AM, Richard Cochran
<richardcoch...@gmail.com> wrote:
> On Sun, Dec 20, 2015 at 11:46:30PM +0200, Saeed Mahameed wrote:
>> +/* Should run once every mlx5e_tstamp->overflow_period */
>> +static void mlx5e_timestamp_overflow(struct work_struct *work)
>> +{
>> + struct delayed_work *dwork = to_delayed_work(work);
>> + struct mlx5e_tstamp *tstamp = container_of(dwork, struct mlx5e_tstamp, 
>> overflow_work);
>> + unsigned long flags;
>> +
>> + write_lock_irqsave(&tstamp->lock, flags);
>> + timecounter_read(&tstamp->clock);
>> + if (tstamp->overflow_period)
>> + schedule_delayed_work(&tstamp->overflow_work,
>> tstamp->overflow_period);
>
> You don't need this test, and the call to schedule_delayed_work can be
> outside of the lock.
>
OK, but what happens if tstamp->overflow_period is somehow zero?
The work would then run far too often.
Don't we need protection against that case?


[PATCH net-next V1 4/4] net/mlx5e: Add PTP Hardware Clock (PHC) support

2015-12-17 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  107 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |2 +
 5 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 158c88c..c503ea0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -13,6 +13,7 @@ config MLX5_CORE
 config MLX5_CORE_EN
bool "Mellanox Technologies ConnectX-4 Ethernet support"
depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE
+   select PTP_1588_CLOCK
default n
---help---
  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 84e65a5..f0a36d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -497,6 +498,8 @@ struct mlx5e_tstamp {
u32nominal_c_mult;
unsigned long  last_overflow_check;
unsigned long  overflow_period;
+   struct ptp_clock  *ptp;
+   struct ptp_clock_info  ptp_info;
 };
 
 struct mlx5e_priv {
@@ -605,6 +608,7 @@ void mlx5e_fill_hwstamp(struct mlx5e_tstamp *clock,
u64 timestamp);
 void mlx5e_timestamp_overflow_check(struct mlx5e_priv *priv);
 void mlx5e_timestamp_init(struct mlx5e_priv *priv);
+void mlx5e_timestamp_cleanup(struct mlx5e_priv *priv);
 
 int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
  u16 vid);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index 9bc0058..7542f17 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -52,6 +52,93 @@ void mlx5e_fill_hwstamp(struct mlx5e_tstamp *tstamp,
hwts->hwtstamp = ns_to_ktime(nsec);
 }
 
+static int mlx5e_ptp_settime(struct ptp_clock_info *ptp,
+const struct timespec64 *ts)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   u64 ns = timespec64_to_ns(ts);
+   unsigned long flags;
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   timecounter_init(&tstamp->clock, &tstamp->cycles, ns);
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   return 0;
+}
+
+static int mlx5e_ptp_gettime(struct ptp_clock_info *ptp,
+struct timespec64 *ts)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   u64 ns;
+   unsigned long flags;
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   ns = timecounter_read(&tstamp->clock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   *ts = ns_to_timespec64(ns);
+
+   return 0;
+}
+
+static int mlx5e_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   unsigned long flags;
+
+   write_lock_irqsave(&tstamp->lock, flags);
+   timecounter_adjtime(&tstamp->clock, delta);
+   write_unlock_irqrestore(&tstamp->lock, flags);
+
+   return 0;
+}
+
+static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 delta)
+{
+   u64 adj;
+   u32 diff;
+   int neg_adj = 0;
+   unsigned long flags;
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+ ptp_info);
+
+   if (delta < 0) {
+   neg_adj = 1;
+   delta = -delta;
+   }
+
+   adj = tstamp->nominal_c_mult;
+   adj *= delta;
+   diff = div_u64

[PATCH net-next V1 2/4] net/mlx5_core: Add support for reading hardware timestamp

2015-12-17 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

A preparation step which adds support for reading the hardware timestamp
from the internal clock and from the CQE.
In addition, advertise the device_frequency_khz HCA capability.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 +++--
 include/linux/mlx5/mlx5_ifc.h  |5 ++-
 4 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 789882b..b16eb42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -504,6 +504,37 @@ int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 
func_id)
return mlx5_cmd_status_to_err_v2(out);
 }
 
+static u32 internal_timer_h(struct mlx5_core_dev *dev)
+{
+   return ioread32be(&dev->iseg->internal_timer_h);
+}
+
+static u32 internal_timer_l(struct mlx5_core_dev *dev)
+{
+   return ioread32be(&dev->iseg->internal_timer_l);
+}
+
+cycle_t mlx5_core_read_clock(struct mlx5_core_dev *dev)
+{
+   u32 timer_h, timer_h1, timer_l;
+
+   /*  Reading the internal timer using 2 PCI reads in a non-atomic manner
+* may hit the wraparound of the 32 LSBs. Reading the 32 MSBs twice can
+* verify a wraparound did not happen.
+*/
+   timer_h = internal_timer_h(dev);
+   timer_l = internal_timer_l(dev);
+   timer_h1 = internal_timer_h(dev);
+   if (timer_h == timer_h1)
+   goto ret;
+
+   /* In case of overflow or wraparound, re-read the LSB */
+   timer_l = internal_timer_l(dev);
+
+ret:
+   return (u64)timer_l | (u64)timer_h1 << 32;
+}
+
 static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
 {
struct mlx5_priv *priv  = &mdev->priv;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index ea6a137..b6651b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -98,6 +98,7 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int 
num_vfs);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
+cycle_t mlx5_core_read_clock(struct mlx5_core_dev *dev);
 
 void mlx5e_init(void);
 void mlx5e_cleanup(void);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 7d3a85f..df2f79e 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -443,9 +443,12 @@ struct mlx5_init_seg {
__be32  rsvd1[120];
__be32  initializing;
struct health_bufferhealth;
-   __be32  rsvd2[884];
+   __be32  rsvd2[880];
+   __be32  internal_timer_h;
+   __be32  internal_timer_l;
+   __be32  rsrv3[2];
__be32  health_counter;
-   __be32  rsvd3[1019];
+   __be32  rsvd4[1019];
__be64  ieee1588_clk;
__be32  ieee1588_clk_type;
__be32  clr_intx;
@@ -601,7 +604,8 @@ struct mlx5_cqe64 {
__be32  imm_inval_pkey;
u8  rsvd40[4];
__be32  byte_cnt;
-   __be64  timestamp;
+   __be32  timestamp_h;
+   __be32  timestamp_l;
__be32  sop_drop_qpn;
__be16  wqe_counter;
u8  signature;
@@ -623,6 +627,16 @@ static inline int cqe_has_vlan(struct mlx5_cqe64 *cqe)
return !!(cqe->l4_hdr_type_etc & 0x1);
 }
 
+static inline u64 get_cqe_ts(struct mlx5_cqe64 *cqe)
+{
+   u32 hi, lo;
+
+   hi = be32_to_cpu(cqe->timestamp_h);
+   lo = be32_to_cpu(cqe->timestamp_l);
+
+   return (u64)lo | ((u64)hi << 32);
+}
+
 enum {
CQE_L4_HDR_TYPE_NONE= 0x0,
CQE_L4_HDR_TYPE_TCP_NO_ACK  = 0x1,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 131a273..e4da900 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -829,9 +829,10 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_66[0x8];
u8 log_uar_page_sz[0x10];
 
-   u8 reserved_67[0xe0];
+   u8 reserved_67[0x40];
+   u8 device_frequency_khz[0x20];
+   u8 reserved_68[0x5f];
 
-   u8 reserved_68[0x1f];
u8  
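
For readers following the thread: the wraparound-safe read in
mlx5_core_read_clock() and the hi/lo combination in get_cqe_ts() can be
tried out in userspace. This is only a minimal sketch. The struct
fake_timer, read_hi(), read_lo(), read_clock() and combine_hi_lo() names
are hypothetical stand-ins, not driver code, and the "device" is a plain
struct whose low word is made to wrap between two reads.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit timer registers. */
struct fake_timer {
	uint32_t hi;
	uint32_t lo;
	int wrap_after_hi_read;	/* simulate the low word wrapping mid-read */
};

static uint32_t read_hi(struct fake_timer *t)
{
	uint32_t hi = t->hi;

	if (t->wrap_after_hi_read) {
		/* The low word wraps right after the caller sampled hi. */
		t->wrap_after_hi_read = 0;
		t->hi += 1;
		t->lo = 5;
	}
	return hi;
}

static uint32_t read_lo(struct fake_timer *t)
{
	return t->lo;
}

/* Same combination as get_cqe_ts(): low word OR-ed with high word << 32. */
static uint64_t combine_hi_lo(uint32_t hi, uint32_t lo)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Read hi, lo, hi again; if hi changed, lo may be stale, so re-read it. */
static uint64_t read_clock(struct fake_timer *t)
{
	uint32_t hi = read_hi(t);
	uint32_t lo = read_lo(t);
	uint32_t hi1 = read_hi(t);

	if (hi != hi1)
		lo = read_lo(t);

	return combine_hi_lo(hi1, lo);
}
```

Without the second high-word read, the wrapped case would pair the old
high word with the new low word and report a time roughly 2^32 cycles in
the past.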

[PATCH net-next V1 0/4] Introduce mlx5 ethernet timestamping

2015-12-17 Thread Saeed Mahameed
Hi Dave,

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

This version addresses all the comments received on v0 and splits
the original series into four patches.

First patch fixes a bug in SKB data pointer in device xmit function.

Second patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability

3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization and other helper functions.
- Added the needed ioctl for setting/getting the current timestamping
  configuration.
- used this configuration in RX/TX data path to fill the SKB with 
  the timestamp.

4th patch introduces PTP (PHC) support.

Eran Ben Elisha (4):
  net/mlx5e: Restore the skb data pointer after xmit is finished
  net/mlx5_core: Add support for reading hardware timestamp
  net/mlx5e: Add HW timestamping (TS) support
  net/mlx5e: Add PTP Hardware Clock (PHC) support

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   25 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  226 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  103 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   16 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   31 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 ++-
 include/linux/mlx5/mlx5_ifc.h  |5 +-
 12 files changed, 464 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c



[PATCH net-next V1 1/4] net/mlx5e: Restore the skb data pointer after xmit is finished

2015-12-17 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Restore the skb data pointer after copying the data to the HW, so the skb
can be cloned with correct headers for future use (e.g. timestamping).

Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 1341b1d..0fcfe64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -165,6 +165,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
struct mlx5_wqe_eth_seg  *eseg = >eth;
struct mlx5_wqe_data_seg *dseg;
 
+   unsigned char *skb_data_orig = skb->data;
u8  opcode = MLX5_OPCODE_SEND;
dma_addr_t dma_addr = 0;
bool bf = false;
@@ -263,6 +264,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | opcode);
cseg->qpn_ds   = cpu_to_be32((sq->sqn << 8) | ds_cnt);
 
+   skb_push(skb, skb->data - skb_data_orig);
sq->skb[pi] = skb;
 
MLX5E_TX_SKB_CB(skb)->num_wqebbs = DIV_ROUND_UP(ds_cnt,
-- 
1.7.1
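
As an illustration of what the two added lines do, here is a minimal
userspace sketch. It assumes a toy buffer type: struct toy_skb,
toy_pull(), toy_push() and toy_xmit() are hypothetical and only mimic the
data/len bookkeeping of skb_pull()/skb_push(), nothing else about a real
sk_buff.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
	unsigned char *data;
	unsigned int len;
};

/* Advance past n header bytes, as skb_pull() would. */
static void toy_pull(struct toy_skb *skb, unsigned int n)
{
	skb->data += n;
	skb->len -= n;
}

/* Move the data pointer back by n bytes, as skb_push() would. */
static void toy_push(struct toy_skb *skb, unsigned int n)
{
	skb->data -= n;
	skb->len += n;
}

/* Pull the inlined headers, then restore the original data pointer. */
static void toy_xmit(struct toy_skb *skb, unsigned int inline_hdr_len)
{
	unsigned char *data_orig = skb->data;

	toy_pull(skb, inline_hdr_len);
	/* ... the remaining bytes would be DMA-mapped here ... */

	/* Push back exactly as many bytes as were pulled. */
	toy_push(skb, (unsigned int)(skb->data - data_orig));
}
```

After toy_xmit() returns, data and len are back at their initial values,
so a later clone sees the full headers again.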



[PATCH net-next V1 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-17 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add support for enabling/disabling HW timestamping for incoming and/or
outgoing packets. To enable/disable HW timestamping, the appropriate
ioctl should be used.  Currently only HWTSTAMP_FILTER_ALL/NONE and
HWTSTAMP_TX_ON/OFF are supported.  Make all relevant changes in
RX/TX flows to consider TS request and plant HW timestamps into
relevant structures.

Add internal clock for converting hardware timestamp to nanoseconds.  In
addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   21 
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  119 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   29 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  101 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   14 +++
 7 files changed, 293 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fe11e96..01c0256 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -5,4 +5,4 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
-   en_txrx.o
+   en_txrx.o en_clock.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f689ce5..84e65a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -32,6 +32,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -63,6 +65,7 @@
 #define MLX5E_TX_CQ_POLL_BUDGET128
 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
 #define MLX5E_SQ_BF_BUDGET 16
+#define MLX5E_SERVICE_TASK_DELAY   (HZ / 4)
 
 #define MLX5E_NUM_MAIN_GROUPS 9
 
@@ -486,6 +489,16 @@ struct mlx5e_flow_tables {
struct mlx5e_flow_table main;
 };
 
+struct mlx5e_tstamp {
+   rwlock_t   lock;
+   struct cyclecountercycles;
+   struct timecounter clock;
+   struct hwtstamp_config hwtstamp_config;
+   u32nominal_c_mult;
+   unsigned long  last_overflow_check;
+   unsigned long  overflow_period;
+};
+
 struct mlx5e_priv {
/* priv data path fields - start */
intdefault_vlan_prio;
@@ -515,10 +528,12 @@ struct mlx5e_priv {
struct work_struct update_carrier_work;
struct work_struct set_rx_mode_work;
struct delayed_workupdate_stats_work;
+   struct delayed_workservice_task;
 
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
+   struct mlx5e_tstamptstamp;
 };
 
 #define MLX5E_NET_IP_ALIGN 2
@@ -585,6 +600,12 @@ void mlx5e_destroy_flow_tables(struct mlx5e_priv *priv);
 void mlx5e_init_eth_addr(struct mlx5e_priv *priv);
 void mlx5e_set_rx_mode_work(struct work_struct *work);
 
+void mlx5e_fill_hwstamp(struct mlx5e_tstamp *clock,
+   struct skb_shared_hwtstamps *hwts,
+   u64 timestamp);
+void mlx5e_timestamp_overflow_check(struct mlx5e_priv *priv);
+void mlx5e_timestamp_init(struct mlx5e_priv *priv);
+
 int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
  u16 vid);
 int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 
proto,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
new file mode 100644
index 000..9bc0058
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributi
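
The patch body is cut off here, so the actual conversion code is not
visible. As a rough sketch of how a free-running cycle counter is
usually turned into nanoseconds with a mult/shift pair, assuming the
common ns = (cycles * mult) >> shift form: khz_to_mult() and
cycles_to_ns() are hypothetical names, and the 250000 kHz frequency and
shift of 14 in the note below are example values only, not taken from
the driver.

```c
#include <stdint.h>

/*
 * Pick mult so that ns = (cycles * mult) >> shift for a counter running
 * at khz kHz, rounded to nearest.
 */
static uint32_t khz_to_mult(uint32_t khz, uint32_t shift)
{
	uint64_t tmp = (uint64_t)1000000 << shift;

	tmp += khz / 2;		/* round to nearest */
	return (uint32_t)(tmp / khz);
}

/*
 * Convert a cycle delta to nanoseconds. The product can overflow 64 bits
 * for very large deltas, which is why the counter has to be sampled
 * periodically (the "overflow" service task in the commit message).
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

With a 250 MHz counter (250000 kHz) and shift 14, 250 cycles convert to
1000 ns.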

[PATCH net-next V3 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-29 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add support for enabling/disabling HW timestamping for incoming and/or
outgoing packets. To enable/disable HW timestamping, the appropriate
ioctl should be used. Currently only HWTSTAMP_FILTER_ALL/NONE and
HWTSTAMP_TX_ON/OFF are supported. Make all relevant changes in
RX/TX flows to consider TS request and plant HW timestamps into
relevant structures.

Add internal clock for converting hardware timestamp to nanoseconds. In
addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   23 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  187 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   29 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   19 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   12 ++
 7 files changed, 279 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fe11e96..01c0256 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -5,4 +5,4 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
-   en_txrx.o
+   en_txrx.o en_clock.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ae3f0e3..477e248 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -32,6 +32,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -284,6 +286,17 @@ struct mlx5e_params {
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
 };
 
+struct mlx5e_tstamp {
+   rwlock_t   lock;
+   struct cyclecountercycles;
+   struct timecounter clock;
+   struct hwtstamp_config hwtstamp_config;
+   u32nominal_c_mult;
+   unsigned long  overflow_period;
+   struct delayed_workoverflow_work;
+   struct mlx5_core_dev  *mdev;
+};
+
 enum {
MLX5E_RQ_STATE_POST_WQES_ENABLE,
 };
@@ -315,6 +328,7 @@ struct mlx5e_rq {
 
struct device *pdev;
struct net_device *netdev;
+   struct mlx5e_tstamp   *tstamp;
struct mlx5e_rq_stats  stats;
struct mlx5e_cqcq;
 
@@ -382,6 +396,7 @@ struct mlx5e_sq {
u16max_inline;
u16edge;
struct device *pdev;
+   struct mlx5e_tstamp   *tstamp;
__be32 mkey_be;
unsigned long  state;
 
@@ -518,6 +533,7 @@ struct mlx5e_priv {
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
+   struct mlx5e_tstamptstamp;
 };
 
 #define MLX5E_NET_IP_ALIGN 2
@@ -584,6 +600,13 @@ void mlx5e_destroy_flow_tables(struct mlx5e_priv *priv);
 void mlx5e_init_eth_addr(struct mlx5e_priv *priv);
 void mlx5e_set_rx_mode_work(struct work_struct *work);
 
+void mlx5e_fill_hwstamp(struct mlx5e_tstamp *clock, u64 timestamp,
+   struct skb_shared_hwtstamps *hwts);
+void mlx5e_timestamp_init(struct mlx5e_priv *priv);
+void mlx5e_timestamp_cleanup(struct mlx5e_priv *priv);
+int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr);
+int mlx5e_hwstamp_get(struct net_device *dev, struct ifreq *ifr);
+
 int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
  u16 vid);
 int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 
proto,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
new file mode 100644
index 000..49a8238
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * wi

[PATCH net-next V3 1/4] net/mlx5e: Do not modify the TX SKB

2015-12-29 Thread Saeed Mahameed
From: Achiad Shochat <ach...@mellanox.com>

If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.

Signed-off-by: Achiad Shochat <ach...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |   73 -
 3 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f689ce5..ae3f0e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -328,14 +328,12 @@ struct mlx5e_rq {
struct mlx5e_priv *priv;
 } cacheline_aligned_in_smp;
 
-struct mlx5e_tx_skb_cb {
+struct mlx5e_tx_wqe_info {
u32 num_bytes;
u8  num_wqebbs;
u8  num_dma;
 };
 
-#define MLX5E_TX_SKB_CB(__skb) ((struct mlx5e_tx_skb_cb *)__skb->cb)
-
 enum mlx5e_dma_map_type {
MLX5E_DMA_MAP_SINGLE,
MLX5E_DMA_MAP_PAGE
@@ -371,6 +369,7 @@ struct mlx5e_sq {
/* pointers to per packet info: write@xmit, read@completion */
struct sk_buff   **skb;
struct mlx5e_sq_dma   *dma_fifo;
+   struct mlx5e_tx_wqe_info  *wqe_info;
 
/* read only */
struct mlx5_wq_cyc wq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d4601a5..96775a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -507,6 +507,7 @@ static void mlx5e_close_rq(struct mlx5e_rq *rq)
 
 static void mlx5e_free_sq_db(struct mlx5e_sq *sq)
 {
+   kfree(sq->wqe_info);
kfree(sq->dma_fifo);
kfree(sq->skb);
 }
@@ -519,8 +520,10 @@ static int mlx5e_alloc_sq_db(struct mlx5e_sq *sq, int numa)
sq->skb = kzalloc_node(wq_sz * sizeof(*sq->skb), GFP_KERNEL, numa);
sq->dma_fifo = kzalloc_node(df_sz * sizeof(*sq->dma_fifo), GFP_KERNEL,
numa);
+   sq->wqe_info = kzalloc_node(wq_sz * sizeof(*sq->wqe_info), GFP_KERNEL,
+   numa);
 
-   if (!sq->skb || !sq->dma_fifo) {
+   if (!sq->skb || !sq->dma_fifo || !sq->wqe_info) {
mlx5e_free_sq_db(sq);
return -ENOMEM;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 1341b1d..aa037eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -92,11 +92,11 @@ static inline struct mlx5e_sq_dma *mlx5e_dma_get(struct 
mlx5e_sq *sq, u32 i)
return &sq->dma_fifo[i & sq->dma_fifo_mask];
 }
 
-static void mlx5e_dma_unmap_wqe_err(struct mlx5e_sq *sq, struct sk_buff *skb)
+static void mlx5e_dma_unmap_wqe_err(struct mlx5e_sq *sq, u8 num_dma)
 {
int i;
 
-   for (i = 0; i < MLX5E_TX_SKB_CB(skb)->num_dma; i++) {
+   for (i = 0; i < num_dma; i++) {
struct mlx5e_sq_dma *last_pushed_dma =
mlx5e_dma_get(sq, --sq->dma_fifo_pc);
 
@@ -139,19 +139,28 @@ static inline u16 mlx5e_get_inline_hdr_size(struct 
mlx5e_sq *sq,
return MLX5E_MIN_INLINE;
 }
 
-static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs)
+static inline void mlx5e_tx_skb_pull_inline(unsigned char **skb_data,
+   unsigned int *skb_len,
+   unsigned int len)
+{
+   *skb_len -= len;
+   *skb_data += len;
+}
+
+static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs,
+unsigned char **skb_data,
+unsigned int *skb_len)
 {
struct vlan_ethhdr *vhdr = (struct vlan_ethhdr *)start;
int cpy1_sz = 2 * ETH_ALEN;
int cpy2_sz = ihs - cpy1_sz;
 
-   skb_copy_from_linear_data(skb, vhdr, cpy1_sz);
-   skb_pull_inline(skb, cpy1_sz);
+   memcpy(vhdr, *skb_data, cpy1_sz);
+   mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy1_sz);
vhdr->h_vlan_proto = skb->vlan_proto;
vhdr->h_vlan_TCI = cpu_to_be16(skb_vlan_tag_get(skb));
-   skb_copy_from_linear_data(skb, &vhdr->h_vlan_encapsulated_proto,
- cpy2_sz);
-   skb_pull_inline(skb, cpy2_sz);
+   memcpy(&vhdr->h_vlan_encapsulated_proto, *skb_data, cpy2_sz);
+   mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz);
 }
 
 static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
@@ -160,11 +169,14 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
 
u16 pi = sq-&
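
The idea behind mlx5e_tx_skb_pull_inline() above, advancing a private
cursor instead of the skb itself, can be sketched in userspace as
follows. cursor_pull() and inline_headers() are hypothetical names and
the packet is a plain byte array; this is an illustration under those
assumptions, not the driver's transmit path.

```c
#include <string.h>

/* Advance a local cursor; the packet buffer itself is never touched. */
static void cursor_pull(const unsigned char **data, unsigned int *len,
			unsigned int n)
{
	*data += n;
	*len -= n;
}

/*
 * Copy ihs header bytes into dst and return how many bytes remain to be
 * DMA-mapped. Only the local data/len copies change.
 */
static unsigned int inline_headers(const unsigned char *pkt,
				   unsigned int pkt_len,
				   unsigned char *dst, unsigned int ihs)
{
	const unsigned char *data = pkt;
	unsigned int len = pkt_len;

	memcpy(dst, data, ihs);
	cursor_pull(&data, &len, ihs);

	return len;
}
```

Because the caller's pointer and length are passed by value, anyone else
holding a reference to the same packet still sees it unmodified.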

[PATCH net-next V3 4/4] net/mlx5e: Add PTP Hardware Clock (PHC) support

2015-12-29 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  100 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |3 +-
 4 files changed, 106 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 158c88c..c503ea0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -13,6 +13,7 @@ config MLX5_CORE
 config MLX5_CORE_EN
bool "Mellanox Technologies ConnectX-4 Ethernet support"
depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE
+   select PTP_1588_CLOCK
default n
---help---
  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 477e248..9ea49a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -295,6 +296,8 @@ struct mlx5e_tstamp {
unsigned long  overflow_period;
struct delayed_workoverflow_work;
struct mlx5_core_dev  *mdev;
+   struct ptp_clock  *ptp;
+   struct ptp_clock_info  ptp_info;
 };
 
 enum {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index 49a8238..be65435 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -130,6 +130,89 @@ int mlx5e_hwstamp_get(struct net_device *dev, struct ifreq 
*ifr)
return copy_to_user(ifr->ifr_data, cfg, sizeof(*cfg)) ? -EFAULT : 0;
 }
 
+static int mlx5e_ptp_settime(struct ptp_clock_info *ptp,
+const struct timespec64 *ts)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   u64 ns = timespec64_to_ns(ts);
+
+   write_lock(&tstamp->lock);
+   timecounter_init(&tstamp->clock, &tstamp->cycles, ns);
+   write_unlock(&tstamp->lock);
+
+   return 0;
+}
+
+static int mlx5e_ptp_gettime(struct ptp_clock_info *ptp,
+struct timespec64 *ts)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+   u64 ns;
+
+   write_lock(&tstamp->lock);
+   ns = timecounter_read(&tstamp->clock);
+   write_unlock(&tstamp->lock);
+
+   *ts = ns_to_timespec64(ns);
+
+   return 0;
+}
+
+static int mlx5e_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+  ptp_info);
+
+   write_lock(&tstamp->lock);
+   timecounter_adjtime(&tstamp->clock, delta);
+   write_unlock(&tstamp->lock);
+
+   return 0;
+}
+
+static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 delta)
+{
+   u64 adj;
+   u32 diff;
+   int neg_adj = 0;
+   struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
+ ptp_info);
+
+   if (delta < 0) {
+   neg_adj = 1;
+   delta = -delta;
+   }
+
+   adj = tstamp->nominal_c_mult;
+   adj *= delta;
+   diff = div_u64(adj, 10ULL);
+
+   write_lock(&tstamp->lock);
+   timecounter_read(&tstamp->clock);
+   tstamp->cycles.mult = neg_adj ? tstamp->nominal_c_mult - diff :
+   tstamp->nominal_c_mult + diff;
+   write_unlock(&tstamp->lock);
+
+   return 0;
+}
+
+static const struct ptp_clock_info mlx5e_ptp_clock_info = {
+   .owner  = THIS_MODULE,
+   .max_adj= 1,
+   .n_alarm= 0,
+   .n_ext_ts   = 0,
+   .n_per_out  = 0,
+   .n_pins = 0,
+   .pps= 0,
+   .adjfreq= mlx5e_ptp_adjfreq,
+   .adjtime= mlx5e_ptp

[PATCH net-next V3 0/4] Introduce mlx5 ethernet timestamping

2015-12-29 Thread Saeed Mahameed
Hi Dave,

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

Changes from V2:
net/mlx5_core: Introduce access function to read internal_timer
- Remove one line function
- Change function name

net/mlx5e: Add HW timestamping (TS) support:
- Data path performance optimization (caching tstamp struct in rq,sq)
- Change read/write_lock_irqsave to read/write_lock
- Move ioctl functions to en_clock file
- Changed overflow start algorithm according to comments from Richard
- Move timestamp init/cleanup to open/close ndos.

In detail:

1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.

2nd patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability

3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization and other helper functions
- Added the needed ioctl for setting/getting the current timestamping
  configuration.
- Used this configuration in the RX/TX data path to fill the SKB with
  the timestamp.

4th patch introduces PTP (PHC) support.

Achiad Shochat (1):
  net/mlx5e: Do not modify the TX SKB

Eran Ben Elisha (3):
  net/mlx5_core: Introduce access function to read internal timer
  net/mlx5e: Add HW timestamping (TS) support
  net/mlx5e: Add PTP Hardware Clock (PHC) support

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   31 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  287 
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   30 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   24 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   85 --
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   13 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 ++-
 include/linux/mlx5/mlx5_ifc.h  |6 +-
 12 files changed, 467 insertions(+), 42 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c



[PATCH net-next V3 2/4] net/mlx5_core: Introduce access function to read internal timer

2015-12-29 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

A preparation step which adds support for reading the hardware
internal timer and the hardware timestamping from the CQE.
In addition, advertise device_frequency_khz HCA capability.
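
A minimal user-space sketch of the high/low/high read pattern that the
helper in this patch uses to assemble a 64-bit counter from two 32-bit
registers. The names fake_timer, read_hi, read_lo and read_timer64 are
hypothetical stand-ins for the device registers and ioread32be(); the
counter here simply advances by a fixed step on every register read so
that a wrap of the low word can be simulated.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device's free-running 64-bit timer. */
struct fake_timer {
    uint64_t value; /* current counter value */
    uint64_t step;  /* how far the counter advances on each register read */
};

/* Simulated read of the high 32 bits; the counter keeps running. */
static uint32_t read_hi(struct fake_timer *t)
{
    uint32_t v = (uint32_t)(t->value >> 32);

    t->value += t->step;
    return v;
}

/* Simulated read of the low 32 bits; the counter keeps running. */
static uint32_t read_lo(struct fake_timer *t)
{
    uint32_t v = (uint32_t)t->value;

    t->value += t->step;
    return v;
}

/* Read high, low, high again; if the high word changed, re-read low. */
static uint64_t read_timer64(struct fake_timer *t)
{
    uint32_t hi, hi1, lo;

    hi = read_hi(t);
    lo = read_lo(t);
    hi1 = read_hi(t);
    if (hi != hi1) /* low word wrapped between the reads */
        lo = read_lo(t);

    return (uint64_t)lo | ((uint64_t)hi1 << 32);
}
```

The second high read is what detects the wrap: without it, a low word
sampled just after a carry could be paired with a stale high word.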

Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   13 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|1 +
 include/linux/mlx5/device.h|   20 +---
 include/linux/mlx5/mlx5_ifc.h  |6 +++---
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 789882b..67676cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -504,6 +504,19 @@ int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 
func_id)
return mlx5_cmd_status_to_err_v2(out);
 }
 
+cycle_t mlx5_read_internal_timer(struct mlx5_core_dev *dev)
+{
+   u32 timer_h, timer_h1, timer_l;
+
+   timer_h = ioread32be(&dev->iseg->internal_timer_h);
+   timer_l = ioread32be(&dev->iseg->internal_timer_l);
+   timer_h1 = ioread32be(&dev->iseg->internal_timer_h);
+   if (timer_h != timer_h1) /* wrap around */
+   timer_l = ioread32be(&dev->iseg->internal_timer_l);
+
+   return (cycle_t)timer_l | (cycle_t)timer_h1 << 32;
+}
+
 static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
 {
struct mlx5_priv *priv  = &mdev->priv;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index ea6a137..0336847 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -98,6 +98,7 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int 
num_vfs);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
+cycle_t mlx5_read_internal_timer(struct mlx5_core_dev *dev);
 
 void mlx5e_init(void);
 void mlx5e_cleanup(void);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 7d3a85f..df2f79e 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -443,9 +443,12 @@ struct mlx5_init_seg {
__be32  rsvd1[120];
__be32  initializing;
struct health_buffer health;
-   __be32  rsvd2[884];
+   __be32  rsvd2[880];
+   __be32  internal_timer_h;
+   __be32  internal_timer_l;
+   __be32  rsrv3[2];
__be32  health_counter;
-   __be32  rsvd3[1019];
+   __be32  rsvd4[1019];
__be64  ieee1588_clk;
__be32  ieee1588_clk_type;
__be32  clr_intx;
@@ -601,7 +604,8 @@ struct mlx5_cqe64 {
__be32  imm_inval_pkey;
u8  rsvd40[4];
__be32  byte_cnt;
-   __be64  timestamp;
+   __be32  timestamp_h;
+   __be32  timestamp_l;
__be32  sop_drop_qpn;
__be16  wqe_counter;
u8  signature;
@@ -623,6 +627,16 @@ static inline int cqe_has_vlan(struct mlx5_cqe64 *cqe)
return !!(cqe->l4_hdr_type_etc & 0x1);
 }
 
+static inline u64 get_cqe_ts(struct mlx5_cqe64 *cqe)
+{
+   u32 hi, lo;
+
+   hi = be32_to_cpu(cqe->timestamp_h);
+   lo = be32_to_cpu(cqe->timestamp_l);
+
+   return (u64)lo | ((u64)hi << 32);
+}
+
 enum {
CQE_L4_HDR_TYPE_NONE= 0x0,
CQE_L4_HDR_TYPE_TCP_NO_ACK  = 0x1,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 131a273..1780a85 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -829,9 +829,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_66[0x8];
u8 log_uar_page_sz[0x10];
 
-   u8 reserved_67[0xe0];
-
-   u8 reserved_68[0x1f];
+   u8 reserved_67[0x40];
+   u8 device_frequency_khz[0x20];
+   u8 reserved_68[0x5f];
u8 cqe_zip[0x1];
 
u8 cqe_zip_timeout[0x10];
-- 
1.7.1



Re: [PATCH net-next 10/18] net/mlx5e: Write vlan list into vport context

2015-11-23 Thread Saeed Mahameed
On Mon, Nov 23, 2015 at 7:30 PM, Alexander Duyck
<alexander.du...@gmail.com> wrote:
> On 11/23/2015 03:11 AM, Or Gerlitz wrote:
>>
>> From: Saeed Mahameed <sae...@mellanox.com>
>>
>> Each Vport/vNIC must notify underlying e-Switch layer
>> for vlan table changes in-order to update SR-IOV FDB tables.
>>
>> We do that at vlan_rx_add_vid and vlan_rx_kill_vid ndos.
>>
>> Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
>> Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
>>   .../ethernet/mellanox/mlx5/core/en_flow_table.c| 49
>> ++
>>   2 files changed, 50 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> index 69f1c1a..89313d4 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> @@ -465,6 +465,7 @@ enum {
>>   };
>>
>>   struct mlx5e_vlan_db {
>> +   unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
>> u32   active_vlans_ft_ix[VLAN_N_VID];
>> u32   untagged_rule_ft_ix;
>> u32   any_vlan_rule_ft_ix;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> index 9a021be..3c0cf22 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> @@ -502,6 +502,46 @@ add_eth_addr_rule_out:
>> return err;
>>   }
>>
>> +static int mlx5e_vport_context_update_vlans(struct mlx5e_priv *priv)
>> +{
>> +   struct net_device *ndev = priv->netdev;
>> +   int max_list_size;
>> +   int list_size;
>> +   u16 *vlans;
>> +   int vlan;
>> +   int err;
>> +   int i;
>> +
>> +   list_size = 0;
>> +   for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
>> +   list_size++;
>> +
>> +   max_list_size = 1 << MLX5_CAP_GEN(priv->mdev, log_max_vlan_list);
>> +
>> +   if (list_size > max_list_size) {
>> +   netdev_warn(ndev,
>> +   "netdev vlans list size (%d) > (%d) max vport
>> list size, some vlans will be dropped\n",
>> +   list_size, max_list_size);
>> +   list_size = max_list_size;
>> +   }
>> +
>> +   vlans = kcalloc(list_size, sizeof(*vlans), GFP_KERNEL);
>> +   if (!vlans)
>> +   return -ENOMEM;
>> +
>> +   i = 0;
>> +   for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
>> +   vlans[i++] = vlan;
>> +
>
>
> You capped the allocation at max_list_size above, but you are technically
> populating up to the original value of list_size here.  I believe that opens
> you up to a buffer overrun.  You probably need to add a check for i >=
> list_size and exit the loop if true.
>
True, will fix this, thanks for noticing.
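
For illustration only, a user-space sketch of the bounded copy described
above; it is not the fix that was eventually posted. The names N_VID and
collect_vlans are hypothetical, and a plain byte-per-VLAN array stands in
for the kernel bitmap and for_each_set_bit().

```c
#include <stddef.h>
#include <stdint.h>

#define N_VID 4096 /* assumption: same range as the kernel's VLAN_N_VID */

/*
 * Copy the IDs of active VLANs into out[], never writing more than
 * cap entries. Returns the number of entries actually written.
 */
static size_t collect_vlans(const uint8_t active[N_VID], uint16_t *out,
                            size_t cap)
{
    size_t i = 0;
    unsigned int vid;

    for (vid = 0; vid < N_VID; vid++) {
        if (!active[vid])
            continue;
        if (i >= cap) /* buffer is full: stop instead of overrunning */
            break;
        out[i++] = (uint16_t)vid;
    }

    return i;
}
```

The check against cap sits inside the loop, so the number of writes is
bounded by the allocation size regardless of how many VLANs are active.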

>
>> +   err = mlx5_modify_nic_vport_vlans(priv->mdev, vlans, list_size);
>> +   if (err)
>> +   netdev_err(ndev, "Failed to modify vport vlans list
>> err(%d)\n",
>> +  err);
>> +
>> +   kfree(vlans);
>> +   return err;
>> +}
>> +
>>   enum mlx5e_vlan_rule_type {
>> MLX5E_VLAN_RULE_TYPE_UNTAGGED,
>> MLX5E_VLAN_RULE_TYPE_ANY_VID,
>> @@ -552,6 +592,10 @@ static int mlx5e_add_vlan_rule(struct mlx5e_priv
>> *priv,
>>  1);
>> break;
>> default: /* MLX5E_VLAN_RULE_TYPE_MATCH_VID */
>> +   err = mlx5e_vport_context_update_vlans(priv);
>> +   if (err)
>> +   goto add_vlan_rule_out;
>> +
>> ft_ix = &priv->vlan.active_vlans_ft_ix[vid];
>> MLX5_SET(fte_match_param, match_value,
>> outer_headers.vlan_tag,
>>  1);
>> @@ -588,6 +632,7 @@ static void mlx5e_del_vlan_rule(struct mlx5e_priv
>> *priv,
>> case MLX5E_VLAN_RULE_TYPE_MATCH_VID:
>> mlx5_del_flow_table_entry(priv->ft.vlan,
>>
>> priv->vlan.active_vlans_ft_ix[vid]);
>> +   mlx5e_vport_context_update_vlans(priv);
>> break;
>> }
>>   }
>>

Re: [PATCH net-next 09/18] net/mlx5e: Write UC/MC list and promisc mode into vport context

2015-11-23 Thread Saeed Mahameed
On Mon, Nov 23, 2015 at 7:23 PM, Alexander Duyck
<alexander.du...@gmail.com> wrote:
> On 11/23/2015 03:11 AM, Or Gerlitz wrote:
>>
>> From: Saeed Mahameed <sae...@mellanox.com>
>>
>> Each Vport/vNIC must notify underlying e-Switch layer
>> for UC/MC list and promisc mode updates, in-order to update
>> l2 tables and SR-IOV FDB tables.
>>
>> We do that at set_rx_mode ndo.
>>
>> preparation for ethernet-SRIOV and l2 table management.
>>
>> Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
>> Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
>> ---
>>   .../ethernet/mellanox/mlx5/core/en_flow_table.c| 99
>> ++
>>   1 file changed, 99 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> index 22d603f..9a021be 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> @@ -671,6 +671,103 @@ static void mlx5e_sync_netdev_addr(struct mlx5e_priv
>> *priv)
>> netif_addr_unlock_bh(netdev);
>>   }
>>
>> +/* Returns a pointer to an array of type u8[][ETH_ALEN] */
>> +static u8 (*mlx5e_build_addr_array(struct mlx5e_priv *priv, int
>> list_type,
>> +  int *size))[ETH_ALEN]
>
>
> This is just ugly.  Isn't there a way you can just return a u8 pointer and
> assume the ETH_ALEN stride?  If nothing else it seems like it would be
> better to just create a structure or typedef containing the u8 array and to
> return a pointer to that since all the ETH_ALEN here really represents is a
> stride within your array.
>
I thought twice before writing the code this way. Indeed it looks ugly,
although it is standard C syntax; it might be cleaner to just have a
typedef or a structure.
Will try your suggestion.
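
A rough user-space sketch of the structure-based alternative, only to
show the shape of the declarations. struct mac_addr, alloc_addr_array and
set_addr are hypothetical names, and calloc() stands in for kcalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical wrapper type: one MAC address per array element. */
struct mac_addr {
    uint8_t addr[ETH_ALEN];
};

/* Returns a zeroed array of n addresses, or NULL; the caller frees it. */
static struct mac_addr *alloc_addr_array(size_t n)
{
    return calloc(n, sizeof(struct mac_addr));
}

/* Store mac as element i of the array. */
static void set_addr(struct mac_addr *arr, size_t i,
                     const uint8_t mac[ETH_ALEN])
{
    memcpy(arr[i].addr, mac, ETH_ALEN);
}
```

With this the builder can return a plain "struct mac_addr *" and the
function-returning-pointer-to-array declarator goes away.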

>
>> +{
>> +   bool is_uc = (list_type == MLX5_NVPRT_LIST_TYPE_UC);
>> +   struct net_device *ndev = priv->netdev;
>> +   struct mlx5e_eth_addr_hash_node *hn;
>> +   struct hlist_head *addr_list;
>> +   u8 (*addr_array)[ETH_ALEN];
>> +   struct hlist_node *tmp;
>> +   int max_list_size;
>> +   int list_size;
>> +   int hi;
>> +   int i;
>> +
>> +   list_size = is_uc ? 0 : (priv->eth_addr.broadcast_enabled ? 1 :
>> 0);
>> +   max_list_size = is_uc ?
>> +   1 << MLX5_CAP_GEN(priv->mdev, log_max_current_uc_list) :
>> +   1 << MLX5_CAP_GEN(priv->mdev, log_max_current_mc_list);
>> +
>> +   addr_list = is_uc ? priv->eth_addr.netdev_uc :
>> priv->eth_addr.netdev_mc;
>> +   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi)
>> +   list_size++;
>> +
>> +   if (list_size > max_list_size) {
>> +   netdev_warn(ndev,
>> +   "netdev %s list size (%d) > (%d) max vport
>> list size, some addresses will be dropped\n",
>> +   is_uc ? "UC" : "MC", list_size,
>> max_list_size);
>> +   list_size = max_list_size;
>> +   }
>> +
>> +   addr_array = kcalloc(list_size, ETH_ALEN, GFP_KERNEL);
>> +   if (!addr_array)
>> +   return NULL;
>> +
>> +   i = 0;
>> +   if (is_uc) { /* Make sure our own address is pushed first */
>> +   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
>> +   if (ether_addr_equal(ndev->dev_addr, hn->ai.addr))
>> {
>> +   ether_addr_copy(addr_array[i++],
>> ndev->dev_addr);
>> +   break;
>> +   }
>> +   }
>> +   }
>> +
>
>
> What is the point of this loop?  Is there a chance that the device address
> isn't going to be in the list somewhere?  Otherwise it seems like you could
> just follow the pattern you did for the broadcast address and just copy the
> dev_addr directly instead of crawling through the loop.
>
The main idea of traversing this loop is to handle the case where the
device UC list is empty; that way I don't need any special logic to
decide whether to push dev_addr directly or not at all.

As for whether the device address is going to be in the list: yes, it is
always there when the device is up; we push it ourselves in
mlx5e_sync_netdev_addr.

for the broadcast address th

[PATCH net-next] nfnetlink_queue: enable PID info retrieval

2016-06-09 Thread Saeed Mahameed
From: Matthew Finlay <m...@mellanox.com>

Allow the netlink_queue_module to get the PID associated with an outgoing
connection. Finding the PID based on the tuple in userspace is expensive.
This additional attribute makes it convenient and efficient to get the PID
associated with the outgoing connection in userspace, without the need to
parse procfs.

Signed-off-by: Matthew Finlay <m...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
CC: Pablo Neira Ayuso <pa...@netfilter.org>
CC: Patrick McHardy <ka...@trash.net>
CC: Jozsef Kadlecsik <kad...@blackhole.kfki.hu>
---
 include/linux/fs.h |  1 +
 include/uapi/linux/netfilter/nfnetlink_queue.h |  4 +++-
 net/netfilter/nfnetlink_queue.c| 25 +
 net/socket.c   |  1 +
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd28814..f6e0ae3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -871,6 +871,7 @@ extern struct block_device *I_BDEV(struct inode *inode);
 struct fown_struct {
rwlock_t lock;  /* protects pid, uid, euid fields */
struct pid *pid;/* pid or -pgrp where SIGIO should be sent */
+   struct pid *sock_pid;   /* pid of the process that created the socket */
enum pid_type pid_type; /* Kind of process group SIGIO should be sent 
to */
kuid_t uid, euid;   /* uid/euid of process setting the owner */
int signum; /* posix.1b rt signal to be delivered on IO */
diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h 
b/include/uapi/linux/netfilter/nfnetlink_queue.h
index ae30841..87379ae 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -60,6 +60,7 @@ enum nfqnl_attr_type {
NFQA_SECCTX,/* security context string */
NFQA_VLAN,  /* nested attribute: packet vlan info */
NFQA_L2HDR, /* full L2 header */
+   NFQA_PID,   /* __s32 sk pid */
 
__NFQA_MAX
 };
@@ -114,7 +115,8 @@ enum nfqnl_attr_config {
 #define NFQA_CFG_F_GSO (1 << 2)
 #define NFQA_CFG_F_UID_GID (1 << 3)
 #define NFQA_CFG_F_SECCTX  (1 << 4)
-#define NFQA_CFG_F_MAX (1 << 5)
+#define NFQA_CFG_F_PID (1 << 5)
+#define NFQA_CFG_F_MAX (1 << 6)
 
 /* flags for NFQA_SKB_INFO */
 /* packet appears to have wrong checksums, but they are ok */
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index aa93877..b7a7f5a3 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -278,6 +278,24 @@ nla_put_failure:
return -1;
 }
 
+static int nfqnl_put_sk_pid(struct sk_buff *skb, struct sock *sk)
+{
+   struct pid *sk_pid;
+   int err = 0;
+
+   if (!sk_fullsock(sk))
+   return 0;
+
+   read_lock_bh(&sk->sk_callback_lock);
+   if (sk->sk_socket && sk->sk_socket->file) {
+   sk_pid = sk->sk_socket->file->f_owner.sock_pid;
+   if (sk_pid)
+   err = nla_put_be32(skb, NFQA_PID, 
htonl(pid_nr(sk_pid)));
+   }
+   read_unlock_bh(&sk->sk_callback_lock);
+   return err;
+}
+
 static u32 nfqnl_get_sk_secctx(struct sk_buff *skb, char **secdata)
 {
u32 seclen = 0;
@@ -440,6 +458,9 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
size += nla_total_size(seclen);
}
 
+   if (queue->flags & NFQA_CFG_F_PID)
+   size += nla_total_size(sizeof(int32_t)); /* pid */
+
skb = alloc_skb(size, GFP_ATOMIC);
if (!skb) {
skb_tx_error(entskb);
@@ -570,6 +591,10 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nfqnl_put_sk_uidgid(skb, entskb->sk) < 0)
goto nla_put_failure;
 
+   if ((queue->flags & NFQA_CFG_F_PID) && entskb->sk &&
+   nfqnl_put_sk_pid(skb, entskb->sk) < 0)
+   goto nla_put_failure;
+
if (seclen && nla_put(skb, NFQA_SECCTX, seclen, secdata))
goto nla_put_failure;
 
diff --git a/net/socket.c b/net/socket.c
index a1bd161..67de200 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -382,6 +382,7 @@ struct file *sock_alloc_file(struct socket *sock, int 
flags, const char *dname)
}
 
sock->file = file;
+   file->f_owner.sock_pid  = find_get_pid(task_pid_nr(current));
file->f_flags = O_RDWR | (flags & O_NONBLOCK);
file->private_data = sock;
return file;
-- 
2.8.0



[PATCH net 02/13] net/mlx5: Fix masking of reserved bits in XRCD number

2016-06-09 Thread Saeed Mahameed
From: Majd Dibbiny <m...@mellanox.com>

Mask the reserved bits when reading the number of newly
created XRCD.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Majd Dibbiny <m...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index b720a27..b82d658 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -418,7 +418,7 @@ int mlx5_core_xrcd_alloc(struct mlx5_core_dev *dev, u32 
*xrcdn)
if (out.hdr.status)
err = mlx5_cmd_status_to_err(&out.hdr);
else
-   *xrcdn = be32_to_cpu(out.xrcdn);
+   *xrcdn = be32_to_cpu(out.xrcdn) & 0xff;
 
return err;
 }
-- 
2.8.0



[PATCH net 00/13] Mellanox 100G mlx5 fixes for 4.7-rc

2016-06-09 Thread Saeed Mahameed
Hi Dave,

The following series provides some small fixes for mlx5 driver.

Two small fixes for the mlx5e netdev, the 1st is for the blue flame 
quota accounting and the 2nd is a small refactoring in shutdown flow.

Five trivial fixes for mlx5 E-Switch.
- Allmulti mc_promisc flag was not set in a specific flow.
- Modify VF node guid when admin mac is changed.
- Race in vport enable flow.
- Misc code fixes (kvfree when needed and error pointers checking).

Three in mlx5 steering area.  Correct capabilities checking and root flow table 
update.

Three misc fixes in mlx5 commands enum and layouts.

Thanks,
Saeed.

Eli Cohen (1):
  net/mlx5e: Fix blue flame quota logic

Eran Ben Elisha (1):
  net/mlx5e: Use ndo_stop explicitly at shutdown flow

Majd Dibbiny (2):
  net/mlx5: Fix the size of modify QP mailbox
  net/mlx5: Fix masking of reserved bits in XRCD number

Maor Gottlieb (3):
  net/mlx5: Fix root flow table update
  net/mlx5: Fix flow steering NIC capabilities check
  net/mlx5: Fix E-Switch flow steering capabilities check

Mohamad Haj Yahia (2):
  net/mlx5: E-Switch, Fix vport enable flow
  net/mlx5: E-Switch, always set mc_promisc for allmulti vports

Noa Osherovich (1):
  net/mlx5: E-Switch, Modify node guid on vf set MAC

Or Gerlitz (2):
  net/mlx5: E-Switch, Use the correct free() function
  net/mlx5: E-Switch, Use the correct error check on returned pointers

Shahar Klein (1):
  net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |  3 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 69 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 40 -
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   | 38 +
 include/linux/mlx5/device.h   |  8 ++-
 include/linux/mlx5/mlx5_ifc.h | 12 +++-
 include/linux/mlx5/qp.h   |  1 +
 include/linux/mlx5/vport.h|  2 +
 10 files changed, 128 insertions(+), 52 deletions(-)

-- 
2.8.0



[PATCH net 04/13] net/mlx5: Fix root flow table update

2016-06-09 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

When we destroy the last flow table we need to update
the root_ft to NULL.

This fixes an issue where, when the last flow table is destroyed and
then recreated, the root_ft pointer is not updated and, as a result,
traffic is dropped.

Fixes: 2cc43b494a6c ('net/mlx5_core: Managing root flow table')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 8b5f0b2..fa6fec1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1292,8 +1292,8 @@ static int update_root_ft_destroy(struct mlx5_flow_table 
*ft)
   ft->id);
return err;
}
-   root->root_ft = new_root_ft;
}
+   root->root_ft = new_root_ft;
return 0;
 }
 
-- 
2.8.0



[PATCH net 03/13] net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly

2016-06-09 Thread Saeed Mahameed
From: Shahar Klein <shah...@mellanox.com>

Having MLX5_CMD_OP_MAX on another file causes us to repeatedly miss
accounting new commands added to the driver and hence there're no entries
for them in debugfs. To solve that, we integrate it into the commands enum
as the last entry.
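
A small sketch of the "last entry" idiom this relies on, using made-up
opcodes rather than the real mlx5 command list (demo_cmd_op and
demo_cmd_op_valid are hypothetical names). An enumerator without an
explicit value takes the previous enumerator's value plus one, so the
sentinel follows the highest opcode as long as new commands are added
above it.

```c
/* Hypothetical opcodes, not the real mlx5 command list. */
enum demo_cmd_op {
    DEMO_CMD_OP_QUERY  = 0x100,
    DEMO_CMD_OP_CREATE = 0x200,
    DEMO_CMD_OP_MODIFY = 0x93c,
    DEMO_CMD_OP_MAX    /* no explicit value: previous entry + 1 */
};

/* An opcode is in range if it is below the sentinel. */
static int demo_cmd_op_valid(unsigned int op)
{
    return op < (unsigned int)DEMO_CMD_OP_MAX;
}
```

The idiom only tracks the true maximum if the enum is kept in ascending
order with the sentinel last.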

Fixes: 34a40e689393 ('net/mlx5_core: Introduce modify flow table command')
Signed-off-by: Shahar Klein <shah...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/device.h   | 2 --
 include/linux/mlx5/mlx5_ifc.h | 3 ++-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 035abdf..51f0caf 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1240,8 +1240,6 @@ struct mlx5_destroy_psv_out {
u8  rsvd[8];
 };
 
-#define MLX5_CMD_OP_MAX 0x920
-
 enum {
VPORT_STATE_DOWN= 0x0,
VPORT_STATE_UP  = 0x1,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 9a05cd7..986a615 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -205,7 +205,8 @@ enum {
MLX5_CMD_OP_ALLOC_FLOW_COUNTER= 0x939,
MLX5_CMD_OP_DEALLOC_FLOW_COUNTER  = 0x93a,
MLX5_CMD_OP_QUERY_FLOW_COUNTER= 0x93b,
-   MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c
+   MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c,
+   MLX5_CMD_OP_MAX
 };
 
 struct mlx5_ifc_flow_table_fields_supported_bits {
-- 
2.8.0



[PATCH net 06/13] net/mlx5: Fix E-Switch flow steering capabilities check

2016-06-09 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Add missing capabilities check for E-Switch FDB and ACLs flow
tables before creating their namespace in flow steering.

Fixes: efdc810ba39d ('net/mlx5: Flow steering, Add vport ACL support')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 28 ---
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c1efa55..e912a3d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1846,19 +1846,21 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
}
 
if (MLX5_CAP_GEN(dev, eswitch_flow_table)) {
-   err = init_fdb_root_ns(dev);
-   if (err)
-   goto err;
-   }
-   if (MLX5_CAP_ESW_EGRESS_ACL(dev, ft_support)) {
-   err = init_egress_acl_root_ns(dev);
-   if (err)
-   goto err;
-   }
-   if (MLX5_CAP_ESW_INGRESS_ACL(dev, ft_support)) {
-   err = init_ingress_acl_root_ns(dev);
-   if (err)
-   goto err;
+   if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, ft_support)) {
+   err = init_fdb_root_ns(dev);
+   if (err)
+   goto err;
+   }
+   if (MLX5_CAP_ESW_EGRESS_ACL(dev, ft_support)) {
+   err = init_egress_acl_root_ns(dev);
+   if (err)
+   goto err;
+   }
+   if (MLX5_CAP_ESW_INGRESS_ACL(dev, ft_support)) {
+   err = init_ingress_acl_root_ns(dev);
+   if (err)
+   goto err;
+   }
}
 
return 0;
-- 
2.8.0



[PATCH net 01/13] net/mlx5: Fix the size of modify QP mailbox

2016-06-09 Thread Saeed Mahameed
From: Majd Dibbiny <m...@mellanox.com>

Add 16 reserved bytes at the end of mlx5_modify_qp_mbox_in to
match the hardware spec definition.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Majd Dibbiny <m...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 include/linux/mlx5/qp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 6422102..1532dcf 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -560,6 +560,7 @@ struct mlx5_modify_qp_mbox_in {
__be32  optparam;
u8  rsvd0[4];
struct mlx5_qp_context  ctx;
+   u8  rsvd2[16];
 };
 
 struct mlx5_modify_qp_mbox_out {
-- 
2.8.0



[PATCH net 05/13] net/mlx5: Fix flow steering NIC capabilities check

2016-06-09 Thread Saeed Mahameed
From: Maor Gottlieb <ma...@mellanox.com>

Flow steering infrastructure is currently used only on link layer
ethernet, therefore the driver should initialize the flow steering
when the device link layer is ethernet.

In addition, add missing capability check before initializing the
namespace of NIC RX flow tables.

Fixes: 2530236303d9 ('net/mlx5_core: Flow steering tree initialization')
Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +-
 include/linux/mlx5/device.h   |  6 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fa6fec1..c1efa55 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1767,6 +1767,9 @@ static void cleanup_root_ns(struct mlx5_core_dev *dev)
 
 void mlx5_cleanup_fs(struct mlx5_core_dev *dev)
 {
+   if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
+   return;
+
cleanup_root_ns(dev);
cleanup_single_prio_root_ns(dev, dev->priv.fdb_root_ns);
cleanup_single_prio_root_ns(dev, dev->priv.esw_egress_root_ns);
@@ -1828,15 +1831,20 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
 {
int err = 0;
 
+   if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
+   return 0;
+
err = mlx5_init_fc_stats(dev);
if (err)
return err;
 
-   if (MLX5_CAP_GEN(dev, nic_flow_table)) {
+   if (MLX5_CAP_GEN(dev, nic_flow_table) &&
+   MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
err = init_root_ns(dev);
if (err)
goto err;
}
+
if (MLX5_CAP_GEN(dev, eswitch_flow_table)) {
err = init_fdb_root_ns(dev);
if (err)
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 51f0caf..73a4847 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1367,6 +1367,12 @@ enum mlx5_cap_type {
 #define MLX5_CAP_FLOWTABLE_MAX(mdev, cap) \
MLX5_GET(flow_table_nic_cap, mdev->hca_caps_max[MLX5_CAP_FLOW_TABLE], 
cap)
 
+#define MLX5_CAP_FLOWTABLE_NIC_RX(mdev, cap) \
+   MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive.cap)
+
+#define MLX5_CAP_FLOWTABLE_NIC_RX_MAX(mdev, cap) \
+   MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive.cap)
+
 #define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \
MLX5_GET(flow_table_eswitch_cap, \
 mdev->hca_caps_cur[MLX5_CAP_ESWITCH_FLOW_TABLE], cap)
-- 
2.8.0



[PATCH net 13/13] net/mlx5e: Fix blue flame quota logic

2016-06-09 Thread Saeed Mahameed
From: Eli Cohen <e...@mellanox.com>

Blue flame is a latency enhancement feature that allows the driver to
write the packet data directly to the NIC's registers thus making the
read of the packet data from host memory redundant.

We maintain a quota for the blue flame which is reloaded whenever we
identify that the hardware is processing send requests and processes
them fast enough so by the time we post the next send request it was
able to process all the pending ones. This indicates that the hardware
is capable of processing more blue flame requests efficiently. The blue
flame quota is decremented whenever we send using blue flame.

The current code erroneously clears the budget if we did not use blue
flame for the current post send operation and we fix it here.
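
As a stand-alone illustration of the difference, the two budget updates
can be written as pure functions. bf_budget_old and bf_budget_new are
hypothetical names; the expressions mirror the removed and added lines in
the diff below.

```c
#include <stdbool.h>

/* Old update: the budget is cleared whenever blue flame was not used. */
static int bf_budget_old(int budget, bool used_bf)
{
    return used_bf ? budget - 1 : 0;
}

/* New update: the budget only changes when blue flame was used. */
static int bf_budget_new(int budget, bool used_bf)
{
    return used_bf ? budget - 1 : budget;
}
```

With a budget of 16, a single send that does not use blue flame leaves
the old logic at 0 and the new logic at 16.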

Fixes: 88a85f99e51f ('net/mlx5e: TX latency optimization to save DMA reads')
Signed-off-by: Eli Cohen <e...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 229ab16..b000ddc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -317,7 +317,8 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
while ((sq->pc & wq->sz_m1) > sq->edge)
mlx5e_send_nop(sq, false);
 
-   sq->bf_budget = bf ? sq->bf_budget - 1 : 0;
+   if (bf)
+   sq->bf_budget--;
 
sq->stats.packets++;
sq->stats.bytes += num_bytes;
-- 
2.8.0



[PATCH net 10/13] net/mlx5: E-Switch, Modify node guid on vf set MAC

2016-06-09 Thread Saeed Mahameed
From: Noa Osherovich <no...@mellanox.com>

In RoCE, the RDMA-CM needs the node guid to establish connection
between nodes.
Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore
RDMA-CM on the VF is broken.

Whenever the administrator sets a MAC for a VF, derive the node guid
from it and set it as well in the following way:
MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01

Fixes: 77256579c6b43 ('net/mlx5: E-Switch, Introduce Vport...')
Signed-off-by: Noa Osherovich <no...@mellanox.com>
Signed-off-by: Majd Dibbiny <m...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 23 --
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   | 38 +++
 include/linux/mlx5/mlx5_ifc.h |  9 --
 include/linux/mlx5/vport.h|  2 ++
 4 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index cfec20c..9b1855b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1725,11 +1725,24 @@ void mlx5_eswitch_vport_event(struct mlx5_eswitch *esw, 
struct mlx5_eqe *eqe)
(esw && MLX5_CAP_GEN(esw->dev, vport_group_manager) && 
mlx5_core_is_pf(esw->dev))
 #define LEGAL_VPORT(esw, vport) (vport >= 0 && vport < esw->total_vports)
 
+static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN])
+{
+   ((u8 *)node_guid)[7] = mac[0];
+   ((u8 *)node_guid)[6] = mac[1];
+   ((u8 *)node_guid)[5] = mac[2];
+   ((u8 *)node_guid)[4] = 0xff;
+   ((u8 *)node_guid)[3] = 0xfe;
+   ((u8 *)node_guid)[2] = mac[3];
+   ((u8 *)node_guid)[1] = mac[4];
+   ((u8 *)node_guid)[0] = mac[5];
+}
+
 int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
   int vport, u8 mac[ETH_ALEN])
 {
-   int err = 0;
struct mlx5_vport *evport;
+   u64 node_guid;
+   int err = 0;
 
if (!ESW_ALLOWED(esw))
return -EPERM;
@@ -1753,11 +1766,17 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
return err;
}
 
+   node_guid_gen_from_mac(&node_guid, mac);
+   err = mlx5_modify_nic_vport_node_guid(esw->dev, vport, node_guid);
+   if (err)
+   mlx5_core_warn(esw->dev,
+  "Failed to set vport %d node guid, err = %d. 
RDMA_CM will not function properly for this VF.\n",
+  vport, err);
+
mutex_lock(&esw->state_lock);
if (evport->enabled)
err = esw_vport_ingress_config(esw, evport);
mutex_unlock(&esw->state_lock);
-
return err;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c 
b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index b69dadc..daf44cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -508,6 +508,44 @@ int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev 
*mdev, u64 *node_guid)
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_node_guid);
 
+int mlx5_modify_nic_vport_node_guid(struct mlx5_core_dev *mdev,
+   u32 vport, u64 node_guid)
+{
+   int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+   void *nic_vport_context;
+   u8 *guid;
+   void *in;
+   int err;
+
+   if (!vport)
+   return -EINVAL;
+   if (!MLX5_CAP_GEN(mdev, vport_group_manager))
+   return -EACCES;
+   if (!MLX5_CAP_ESW(mdev, nic_vport_node_guid_modify))
+   return -ENOTSUPP;
+
+   in = mlx5_vzalloc(inlen);
+   if (!in)
+   return -ENOMEM;
+
+   MLX5_SET(modify_nic_vport_context_in, in,
+field_select.node_guid, 1);
+   MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
+   MLX5_SET(modify_nic_vport_context_in, in, other_vport, !!vport);
+
+   nic_vport_context = MLX5_ADDR_OF(modify_nic_vport_context_in,
+in, nic_vport_context);
+   guid = MLX5_ADDR_OF(nic_vport_context, nic_vport_context,
+   node_guid);
+   MLX5_SET64(nic_vport_context, nic_vport_context, node_guid, node_guid);
+
+   err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+   kvfree(in);
+
+   return err;
+}
+
 int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev,
u16 *qkey_viol_cntr)
 {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 986a615..e955a28 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -501,7 +501,9 @@ struct mlx5_ifc_e_switch_cap_bits {
u8 v
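
Editorial sketch of the MAC to node GUID mapping given in the commit message above (MAC e4:1d:2d:b3:f4:01 -> node_guid e4:1d:2d:ff:fe:b3:f4:01). This is a userspace illustration, not the driver code: node_guid_from_mac is a hypothetical name, and it builds the value with shifts so the result does not depend on host byte order. The patch's byte-wise version yields the same u64 value on a little-endian host.

```c
#include <stdint.h>

#define ETH_ALEN 6 /* bytes in an Ethernet MAC address */

/* Insert ff:fe between the upper and lower three bytes of the MAC. */
static uint64_t node_guid_from_mac(const uint8_t mac[ETH_ALEN])
{
	return ((uint64_t)mac[0] << 56) |
	       ((uint64_t)mac[1] << 48) |
	       ((uint64_t)mac[2] << 40) |
	       ((uint64_t)0xff   << 32) |
	       ((uint64_t)0xfe   << 24) |
	       ((uint64_t)mac[3] << 16) |
	       ((uint64_t)mac[4] << 8)  |
	       (uint64_t)mac[5];
}
```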

[PATCH net 09/13] net/mlx5: E-Switch, Fix vport enable flow

2016-06-09 Thread Saeed Mahameed
From: Mohamad Haj Yahia <moha...@mellanox.com>

Reorder the vport enable flow to mark the vport as enabled before calling
the vport change handler, which was modified to handle the case where
the vport is not enabled.

This fixes the case where the PF netdev is opened before sriov is
enabled: once sriov was enabled in esw_enable_vport,
esw_vport_change_handle_locked didn't read the PF context since it
thought the PF vport was not enabled.

When we enable the vport, arming for events is no longer required,
since it is done by the vport change handler.

Fixes: 586cfa7f1d58 ('net/mlx5: E-Switch, Use vport event handler for vport 
cleanup')
Signed-off-by: Mohamad Haj Yahia <moha...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index a350af2..cfec20c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1491,14 +1491,11 @@ static void esw_enable_vport(struct mlx5_eswitch *esw, 
int vport_num,
 
/* Sync with current vport context */
vport->enabled_events = enable_events;
-   esw_vport_change_handle_locked(vport);
-
vport->enabled = true;
 
/* only PF is trusted by default */
vport->trusted = (vport_num) ? false : true;
-
-   arm_vport_context_events_cmd(esw->dev, vport_num, enable_events);
+   esw_vport_change_handle_locked(vport);
 
esw->enabled_vports++;
esw_debug(esw->dev, "Enabled VPORT(%d)\n", vport_num);
-- 
2.8.0



[PATCH net 08/13] net/mlx5: E-Switch, Use the correct error check on returned pointers

2016-06-09 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

The mlx5 flow-steering API (mlx5_create_flow_table/group/rule) never
returns a NULL pointer on error. Even if it did, checking for
IS_ERR_OR_NULL(p) and then returning PTR_ERR(p) would have caused
bugs, since PTR_ERR(NULL) is 0: the caller sees success and later
crashes.

To make things more robust and protect against related future bugs,
convert all IS_ERR_OR_NULL checks on returned values to IS_ERR.

Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress 
ACLs')
Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reported-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 34 +++
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 5374796..a350af2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -383,7 +383,7 @@ __esw_fdb_set_vport_rule(struct mlx5_eswitch *esw, u32 
vport, bool rx_rule,
   match_v,
   MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
   0, &dest);
-   if (IS_ERR_OR_NULL(flow_rule)) {
+   if (IS_ERR(flow_rule)) {
pr_warn(
"FDB: Failed to add flow rule: dmac_v(%pM) dmac_c(%pM) 
-> vport(%d), err(%ld)\n",
 dmac_v, dmac_c, vport, PTR_ERR(flow_rule));
@@ -457,7 +457,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, 
int nvports)
 
table_size = BIT(MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
fdb = mlx5_create_flow_table(root_ns, 0, table_size, 0);
-   if (IS_ERR_OR_NULL(fdb)) {
+   if (IS_ERR(fdb)) {
err = PTR_ERR(fdb);
esw_warn(dev, "Failed to create FDB Table err %d\n", err);
goto out;
@@ -474,7 +474,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, 
int nvports)
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, 
table_size - 3);
eth_broadcast_addr(dmac);
g = mlx5_create_flow_group(fdb, flow_group_in);
-   if (IS_ERR_OR_NULL(g)) {
+   if (IS_ERR(g)) {
err = PTR_ERR(g);
esw_warn(dev, "Failed to create flow group err(%d)\n", err);
goto out;
@@ -489,7 +489,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, 
int nvports)
eth_zero_addr(dmac);
dmac[0] = 0x01;
g = mlx5_create_flow_group(fdb, flow_group_in);
-   if (IS_ERR_OR_NULL(g)) {
+   if (IS_ERR(g)) {
err = PTR_ERR(g);
esw_warn(dev, "Failed to create allmulti flow group err(%d)\n", 
err);
goto out;
@@ -506,7 +506,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, 
int nvports)
MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 
table_size - 1);
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, 
table_size - 1);
g = mlx5_create_flow_group(fdb, flow_group_in);
-   if (IS_ERR_OR_NULL(g)) {
+   if (IS_ERR(g)) {
err = PTR_ERR(g);
esw_warn(dev, "Failed to create promisc flow group err(%d)\n", 
err);
goto out;
@@ -1060,7 +1060,7 @@ static void esw_vport_enable_egress_acl(struct 
mlx5_eswitch *esw,
return;
 
acl = mlx5_create_vport_flow_table(root_ns, 0, table_size, 0, 
vport->vport);
-   if (IS_ERR_OR_NULL(acl)) {
+   if (IS_ERR(acl)) {
err = PTR_ERR(acl);
esw_warn(dev, "Failed to create E-Switch vport[%d] egress flow 
Table, err(%d)\n",
 vport->vport, err);
@@ -1075,7 +1075,7 @@ static void esw_vport_enable_egress_acl(struct 
mlx5_eswitch *esw,
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, 0);
 
vlan_grp = mlx5_create_flow_group(acl, flow_group_in);
-   if (IS_ERR_OR_NULL(vlan_grp)) {
+   if (IS_ERR(vlan_grp)) {
err = PTR_ERR(vlan_grp);
esw_warn(dev, "Failed to create E-Switch vport[%d] egress 
allowed vlans flow group, err(%d)\n",
 vport->vport, err);
@@ -1086,7 +1086,7 @@ static void esw_vport_enable_egress_acl(struct 
mlx5_eswitch *esw,
MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 1);
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, 1);
drop_grp = mlx5_create_flow_group(acl, flow_group_in);
-   if (IS_ERR_OR_NULL(drop_grp)) {
+   if (IS_ERR(drop_grp)) {
err = PTR_ERR(drop_grp);
esw_warn(dev, &q
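
Editorial sketch of why the IS_ERR_OR_NULL()/PTR_ERR() pairing discussed in the patch above is dangerous. The macros here are simplified userspace re-implementations of the kernel's error-pointer helpers, written for illustration; check_old and check_new are hypothetical names standing in for the before and after patterns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace models of the kernel helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Pre-patch pattern: for NULL this takes the error branch but returns 0. */
static long check_old(const void *p)
{
	if (IS_ERR_OR_NULL(p))
		return PTR_ERR(p);
	return 0;
}

/* Post-patch pattern: valid when the API never returns NULL on error. */
static long check_new(const void *p)
{
	if (IS_ERR(p))
		return PTR_ERR(p);
	return 0;
}
```

In check_old a NULL pointer is treated as a failure, yet the value handed back to the caller is 0, which reads as success.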

[PATCH net 11/13] net/mlx5: E-Switch, always set mc_promisc for allmulti vports

2016-06-09 Thread Saeed Mahameed
From: Mohamad Haj Yahia <moha...@mellanox.com>

Also set the mc_promisc flag when adding a new mc address to an
existing allmulti vport.

Fixes: a35f71f27a61 ('net/mlx5: E-Switch, Implement promiscuous rx modes vf 
request handling')
Signed-off-by: Mohamad Haj Yahia <moha...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 9b1855b..aebbd6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -651,6 +651,7 @@ static void update_allmulti_vports(struct mlx5_eswitch *esw,
esw_fdb_set_vport_rule(esw,
   mac,
   vport_idx);
+   iter_vaddr->mc_promisc = true;
break;
case MLX5_ACTION_DEL:
if (!iter_vaddr)
-- 
2.8.0



[PATCH net 12/13] net/mlx5e: Use ndo_stop explicitly at shutdown flow

2016-06-09 Thread Saeed Mahameed
From: Eran Ben Elisha <era...@mellanox.com>

The current implementation copies the flow of ndo_stop instead of
calling it explicitly. Fix it.

Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Eran Ben Elisha <era...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index fd43929..f5c8d5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3192,10 +3192,7 @@ static void mlx5e_destroy_netdev(struct mlx5_core_dev 
*mdev, void *vpriv)
flush_workqueue(priv->wq);
if (test_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &mdev->intf_state)) {
netif_device_detach(netdev);
-   mutex_lock(&priv->state_lock);
-   if (test_bit(MLX5E_STATE_OPENED, &priv->state))
-   mlx5e_close_locked(netdev);
-   mutex_unlock(&priv->state_lock);
+   mlx5e_close(netdev);
} else {
unregister_netdev(netdev);
}
-- 
2.8.0



[PATCH net 07/13] net/mlx5: E-Switch, Use the correct free() function

2016-06-09 Thread Saeed Mahameed
From: Or Gerlitz <ogerl...@mellanox.com>

We must use kvfree() for something that could have been allocated with
vzalloc(); do that.

Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress 
ACLs')
Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en')
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reported-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b84a691..5374796 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -529,7 +529,7 @@ out:
}
}
 
-   kfree(flow_group_in);
+   kvfree(flow_group_in);
return err;
 }
 
@@ -1097,7 +1097,7 @@ static void esw_vport_enable_egress_acl(struct 
mlx5_eswitch *esw,
vport->egress.drop_grp = drop_grp;
vport->egress.allowed_vlans_grp = vlan_grp;
 out:
-   kfree(flow_group_in);
+   kvfree(flow_group_in);
if (err && !IS_ERR_OR_NULL(vlan_grp))
mlx5_destroy_flow_group(vlan_grp);
if (err && !IS_ERR_OR_NULL(acl))
@@ -1259,7 +1259,7 @@ out:
mlx5_destroy_flow_table(vport->ingress.acl);
}
 
-   kfree(flow_group_in);
+   kvfree(flow_group_in);
 }
 
 static void esw_vport_cleanup_ingress_rules(struct mlx5_eswitch *esw,
-- 
2.8.0



Re: [PATCH 1/2] mlx5: only register devlink when ethernet is available

2016-06-15 Thread Saeed Mahameed
On Wed, Jun 15, 2016 at 6:27 PM, Arnd Bergmann  wrote:
> We get a build error with the mlx5 driver when the ethernet
> support (CONFIG_MLX5_CORE_EN) is disabled:
>
> drivers/net/ethernet/mellanox/mlx5/core/main.c:1320:22: error: 
> 'mlx5_devlink_eswitch_mode_set' undeclared here (not in a function)
> drivers/net/ethernet/mellanox/mlx5/core/main.c:1321:22: error: 
> 'mlx5_devlink_eswitch_mode_get' undeclared here (not in a function)
> drivers/net/built-in.o:(.rodata+0x25a68): undefined reference to 
> `mlx5_devlink_eswitch_mode_get'
> drivers/net/built-in.o:(.rodata+0x25a6c): undefined reference to 
> `mlx5_devlink_eswitch_mode_set'
>
> There are actually two problems here, but they are closely related,
> so I'm addressing them both:
>
> - The header is included under an #ifdef, which is usually a bad idea
>   as it hides the function declarations, so we fail to compile even
>   if we don't actually use the functions in the end.
> - The references to the functions are kept in the object file because
>   we don't check whether they are built-in or not.
>
> As we don't want to add any useless #ifdef here, this uses an
> IS_ENABLED() check to drop the mlx5_devlink_ops structure when we don't
> need it, and to skip the register/unregister step.
>
> Signed-off-by: Arnd Bergmann 
> Fixes: f7856daf57b9 ("net/mlx5: Add devlink interface")

Hi Arnd,

We already took care of those issues; they only apply to Leon's tree
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/,
which is meant to maintain MLX5 shared code between the netdev and
linux-rdma trees prior to submission to both trees.

This patch is non-shared code and it only exists in
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/net-next-mlx5.
It is yet to be submitted to Dave's net/net-next tree. Later on, this
patch and all the others will go through the normal submission
process.

In the future, I don't see any reason to CC the whole netdev, rdma and
kernel folks, unless you, Dave and Doug think otherwise.

Thanks
Saeed.


Re: [PATCH] net/mlx5: use mlx5_buf_alloc_node insteaf of mlx5_buf_alloc in mlx5_wq_ll_create

2016-06-22 Thread Saeed Mahameed
On Wed, Jun 22, 2016 at 11:06 AM, Wang Sheng-Hui  wrote:
> This patch introduces 2 changes:
> * use mlx5_buf_alloc_node() insteaf of mlx5_buf_alloc() in

insteaf => instead

>   mlx5_wq_ll_create
> * Update the failure warn messages with _node postfix for mlx5_*_alloc
>   function names
>

Nice catch,

Please add Fixes line, and let's take it to net and -stable.
Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
reader NUMA node")

Can I ask how you hit this? Did you see any performance impact on RX?


[PATCH net-next 1/9] net/mlx5: Rate limit tables support

2016-06-22 Thread Saeed Mahameed
From: Yevgeny Petrilin <yevge...@mellanox.com>

Configuring and managing HW rate limit tables.
The HW holds a table of rate limits, each rate is
associated with an index in that table.
Later a Send Queue uses this index to set the rate limit.
Multiple Send Queues can have the same rate limit, which is
represented by a single entry in this table.
Even though a rate can be shared, each queue is being rate
limited independently of others.

The SW shadow of this table holds the rate itself,
the index in the HW table and the refcount (number of queues)
working with this rate.

The exported functions are mlx5_rl_add_rate and mlx5_rl_remove_rate.
Number of different rates and their values are derived
from HW capabilities.

Signed-off-by: Yevgeny Petrilin <yevge...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c   |  10 ++
 drivers/net/ethernet/mellanox/mlx5/core/rl.c | 209 +++
 include/linux/mlx5/device.h  |   4 +
 include/linux/mlx5/driver.h  |  27 +++
 6 files changed, 259 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/rl.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9ea7b58..0c8a7dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -1,8 +1,9 @@
 obj-$(CONFIG_MLX5_CORE)+= mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
-   health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o fs_counters.o
+   health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
+   mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
+   fs_counters.o rl.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 75c7ae6..77fc1aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -151,6 +151,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
+   if (MLX5_CAP_GEN(dev, qos)) {
+   err = mlx5_core_get_caps(dev, MLX5_CAP_QOS);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index a19b593..08cae34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1144,6 +1144,13 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv)
dev_err(&pdev->dev, "Failed to init flow steering\n");
goto err_fs;
}
+
+   err = mlx5_init_rl_table(dev);
+   if (err) {
+   dev_err(&pdev->dev, "Failed to init rate limiting\n");
+   goto err_rl;
+   }
+
 #ifdef CONFIG_MLX5_CORE_EN
err = mlx5_eswitch_init(dev);
if (err) {
@@ -1183,6 +1190,8 @@ err_sriov:
mlx5_eswitch_cleanup(dev->priv.eswitch);
 #endif
 err_reg_dev:
+   mlx5_cleanup_rl_table(dev);
+err_rl:
mlx5_cleanup_fs(dev);
 err_fs:
mlx5_cleanup_mkey_table(dev);
@@ -1253,6 +1262,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv)
mlx5_eswitch_cleanup(dev->priv.eswitch);
 #endif
 
+   mlx5_cleanup_rl_table(dev);
mlx5_cleanup_fs(dev);
mlx5_cleanup_mkey_table(dev);
mlx5_cleanup_srq_table(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
new file mode 100644
index 0000000..c07c28b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (c) 2013-2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redis
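
Editorial sketch of the software shadow table described in the commit message above: each entry holds a rate, the index in the HW table and a refcount of the queues using it. This is a userspace model under stated assumptions, not the contents of rl.c: the rl_* names, the fixed RL_TABLE_SIZE, the use of rate 0 as "free slot" and the 1-based indices are all hypothetical. In the patch the exported functions are mlx5_rl_add_rate and mlx5_rl_remove_rate, and the table size comes from HW capabilities.

```c
#include <stddef.h>
#include <stdint.h>

#define RL_TABLE_SIZE 4 /* assumed; the real size is a HW capability */

struct rl_entry {
	uint32_t rate;     /* rate value; 0 marks a free slot (assumption) */
	uint16_t index;    /* index a send queue would reference */
	uint32_t refcount; /* number of queues using this rate */
};

struct rl_table {
	struct rl_entry e[RL_TABLE_SIZE];
};

static void rl_init(struct rl_table *t)
{
	for (size_t i = 0; i < RL_TABLE_SIZE; i++) {
		t->e[i].rate = 0;
		t->e[i].index = (uint16_t)(i + 1); /* 1-based, an assumption */
		t->e[i].refcount = 0;
	}
}

/* Share an existing entry for this rate or claim a free one.
 * Returns 0 and stores the index, or -1 if rate is 0 or the table is full.
 */
static int rl_add_rate(struct rl_table *t, uint32_t rate, uint16_t *index)
{
	struct rl_entry *free_slot = NULL;

	if (rate == 0)
		return -1;

	for (size_t i = 0; i < RL_TABLE_SIZE; i++) {
		struct rl_entry *e = &t->e[i];

		if (e->refcount && e->rate == rate) {
			e->refcount++;
			*index = e->index;
			return 0;
		}
		if (!e->refcount && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return -1;

	free_slot->rate = rate;
	free_slot->refcount = 1;
	*index = free_slot->index;
	return 0;
}

/* Drop one reference; the slot becomes free when the last user leaves. */
static void rl_remove_rate(struct rl_table *t, uint32_t rate)
{
	for (size_t i = 0; i < RL_TABLE_SIZE; i++) {
		struct rl_entry *e = &t->e[i];

		if (e->refcount && e->rate == rate) {
			if (--e->refcount == 0)
				e->rate = 0;
			return;
		}
	}
}
```

Sharing an entry only saves table space; as the commit message notes, each queue is still rate limited independently.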

[PATCH net-next 0/9] Mellanox 100G mlx5e Ethernet extensions

2016-06-22 Thread Saeed Mahameed
Hi Dave,

This series includes multiple feature extensions for the mlx5 Ethernet
netdevice driver, namely TX rate limiting, RX interrupt moderation and
ethtool settings.

TX Rate limiting:
- ConnectX-4 rate limiting infrastructure
- Set max rate NDO support

RX interrupt moderation:
- CQE based coalescing option (controlled via priv flags)
- Adaptive RX coalescing

ethtool settings:
- priv flags callbacks
- Support new ksettings API
- Add 50G missing link mode
- Support auto negotiation on/off

Applied on top: 0e9390ebf1fe ("Merge branch 'mlxsw-next'")

Thanks,
Saeed.

Gal Pressman (5):
  net/mlx5e: Introduce net device priv flags infrastructure
  net/mlx5e: Toggle link only after modifying port parameters
  net/mlx5e: Add 50G missing link mode to ethtool and mlx5 driver
  net/mlx5e: Use new ethtool get/set link ksettings API
  net/mlx5e: Report correct auto negotiation and allow toggling

Gil Rockah (1):
  net/mlx5e: Support adaptive RX coalescing

Tariq Toukan (1):
  net/mlx5e: CQE based moderation

Yevgeny Petrilin (2):
  net/mlx5: Rate limit tables support
  net/mlx5e: Add TXQ set max rate support

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  73 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 476 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 181 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 335 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  10 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  48 ++-
 drivers/net/ethernet/mellanox/mlx5/core/rl.c   | 209 +
 include/linux/mlx5/device.h|   4 +
 include/linux/mlx5/driver.h|  27 ++
 include/linux/mlx5/port.h  |  16 +-
 include/uapi/linux/ethtool.h   |   3 +-
 15 files changed, 1179 insertions(+), 231 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/rl.c

-- 
2.8.0



[PATCH net-next 7/9] net/mlx5e: Add 50G missing link mode to ethtool and mlx5 driver

2016-06-22 Thread Saeed Mahameed
From: Gal Pressman <g...@mellanox.com>

Add ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT and MLX5E_50GBASE_SR2
bits.

Signed-off-by: Gal Pressman <g...@mellanox.com>
CC: Ben Hutchings <b...@kernel.org>
CC: David Decotigny <de...@googlers.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
 include/uapi/linux/ethtool.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index aa36a3a..b8732e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -616,6 +616,7 @@ enum mlx5e_link_mode {
MLX5E_10GBASE_ER = 14,
MLX5E_40GBASE_SR4= 15,
MLX5E_40GBASE_LR4= 16,
+   MLX5E_50GBASE_SR2= 18,
MLX5E_100GBASE_CR4   = 20,
MLX5E_100GBASE_SR4   = 21,
MLX5E_100GBASE_KR4   = 22,
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 5f030b4..b8f38e8 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1362,6 +1362,7 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT= 37,
ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT= 38,
ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT= 39,
+   ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT = 40,
 
/* Last allowed bit for __ETHTOOL_LINK_MODE_LEGACY_MASK is bit
 * 31. Please do NOT define any SUPPORTED_* or ADVERTISED_*
@@ -1370,7 +1371,7 @@ enum ethtool_link_mode_bit_indices {
 */
 
__ETHTOOL_LINK_MODE_LAST
- = ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT,
+ = ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT,
 };
 
 #define __ETHTOOL_LINK_MODE_LEGACY_MASK(base_name) \
-- 
2.8.0



[PATCH net-next 6/9] net/mlx5e: Toggle link only after modifying port parameters

2016-06-22 Thread Saeed Mahameed
From: Gal Pressman <g...@mellanox.com>

Add a dedicated function to toggle the port link. It should be called
only after setting a port register.
The toggle sets the port link to down and brings it back up if its
admin status was up.

Signed-off-by: Gal Pressman <g...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c   |  9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  7 +--
 drivers/net/ethernet/mellanox/mlx5/core/port.c   | 12 
 include/linux/mlx5/port.h|  1 +
 4 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index b2db180..e688313 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -191,7 +191,6 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;
-   enum mlx5_port_status ps;
u8 curr_pfc_en;
int ret;
 
@@ -200,14 +199,8 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
if (pfc->pfc_en == curr_pfc_en)
return 0;
 
-   mlx5_query_port_admin_status(mdev, &ps);
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_DOWN);
-
ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
-
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_UP);
+   mlx5_toggle_port_link(mdev);
 
return ret;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index c4be394..d0d3dcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -795,7 +795,6 @@ static int mlx5e_set_settings(struct net_device *netdev,
u32 link_modes;
u32 speed;
u32 eth_proto_cap, eth_proto_admin;
-   enum mlx5_port_status ps;
int err;
 
speed = ethtool_cmd_speed(cmd);
@@ -829,12 +828,8 @@ static int mlx5e_set_settings(struct net_device *netdev,
if (link_modes == eth_proto_admin)
goto out;
 
-   mlx5_query_port_admin_status(mdev, &ps);
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_DOWN);
mlx5_set_port_proto(mdev, link_modes, MLX5_PTYS_EN);
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_UP);
+   mlx5_toggle_port_link(mdev);
 
 out:
return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 3e35611..1562e73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -222,6 +222,18 @@ int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 
proto_admin,
 }
 EXPORT_SYMBOL_GPL(mlx5_set_port_proto);
 
+/* This function should be used after setting a port register only */
+void mlx5_toggle_port_link(struct mlx5_core_dev *dev)
+{
+   enum mlx5_port_status ps;
+
+   mlx5_query_port_admin_status(dev, &ps);
+   mlx5_set_port_admin_status(dev, MLX5_PORT_DOWN);
+   if (ps == MLX5_PORT_UP)
+   mlx5_set_port_admin_status(dev, MLX5_PORT_UP);
+}
+EXPORT_SYMBOL_GPL(mlx5_toggle_port_link);
+
 int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
   enum mlx5_port_status status)
 {
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 9851862..4adfac1 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -67,6 +67,7 @@ int mlx5_query_port_proto_oper(struct mlx5_core_dev *dev,
   u8 local_port);
 int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 proto_admin,
int proto_mask);
+void mlx5_toggle_port_link(struct mlx5_core_dev *dev);
 int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
   enum mlx5_port_status status);
 int mlx5_query_port_admin_status(struct mlx5_core_dev *dev,
-- 
2.8.0
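
Editorial sketch of the toggle sequence added by the patch above: query the admin status, set the port down, and bring it back up only if it was up before. It is a userspace model, so struct port_model, the enum values and the transitions counter are hypothetical and exist only to make the behaviour observable.

```c
/* Hypothetical port model; values do not come from the driver headers. */
enum port_status { PORT_DOWN = 0, PORT_UP = 1 };

struct port_model {
	enum port_status admin; /* current admin status */
	int transitions;        /* number of actual status changes */
};

static void port_set_admin(struct port_model *p, enum port_status s)
{
	if (p->admin != s)
		p->transitions++;
	p->admin = s;
}

/* Same order of operations as mlx5_toggle_port_link() in the diff. */
static void port_toggle_link(struct port_model *p)
{
	enum port_status ps = p->admin; /* query before changing anything */

	port_set_admin(p, PORT_DOWN);
	if (ps == PORT_UP)
		port_set_admin(p, PORT_UP);
}
```

A port that was administratively down stays down, so the toggle never brings up a link the administrator had disabled.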



[PATCH net-next 4/9] net/mlx5e: CQE based moderation

2016-06-22 Thread Saeed Mahameed
From: Tariq Toukan <tar...@mellanox.com>

In this mode the moderation timer restarts upon new completion (CQE)
generation rather than upon interrupt generation.

The outcome is that for bursty traffic the period timer never expires,
so only the moderation frames counter dictates interrupt generation and
the interrupt rate is relative to the incoming packet size.
If the burst ceases for "moderation period" time, an interrupt is
issued immediately.

CQE based moderation is off by default and can be controlled
via ethtool set_priv_flags.

Performance tested on ConnectX4-Lx 50G.

Less packet loss in netperf UDP and TCP tests, with no bw degradation,
for both single and multi streams, with message sizes of
64, 1024, 1472 and 32768 bytes.

Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Achiad Shochat <ach...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
Signed-off-by: Gal Pressman <g...@mellanox.com>
Signed-off-by: Gil Rockah <g...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 20 +---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 54 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 54 --
 3 files changed, 95 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 02fa4da..36f625d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -79,6 +79,7 @@
 
 #define MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ (64 * 1024)
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC  0x10
+#define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE 0x3
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS  0x20
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC  0x10
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS  0x20
@@ -145,11 +146,11 @@ struct mlx5e_umr_wqe {
 };
 
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
-   "nop",
+   "rx_cqe_moder",
 };
 
 enum mlx5e_priv_flag {
-   MLX5E_PFLAG_NOP = (1 << 0),
+   MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
 };
 
 #define MLX5E_SET_PRIV_FLAG(priv, pflag, enable)\
@@ -165,6 +166,11 @@ enum mlx5e_priv_flag {
 #define MLX5E_MIN_BW_ALLOC 1   /* Min percentage of BW allocation */
 #endif
 
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+};
+
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -173,12 +179,11 @@ struct mlx5e_params {
u8  log_rq_size;
u16 num_channels;
u8  num_tc;
+   u8  rx_cq_period_mode;
bool rx_cqe_compress_admin;
bool rx_cqe_compress;
-   u16 rx_cq_moderation_usec;
-   u16 rx_cq_moderation_pkts;
-   u16 tx_cq_moderation_usec;
-   u16 tx_cq_moderation_pkts;
+   struct mlx5e_cq_moder rx_cq_moderation;
+   struct mlx5e_cq_moder tx_cq_moderation;
u16 min_rx_wqes;
bool lro_en;
u32 lro_wqe_sz;
@@ -667,6 +672,9 @@ void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   int num_channels);
 int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 
+void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
+u8 cq_period_mode);
+
 static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq,
  struct mlx5_wqe_ctrl_seg *ctrl, int bf_sz)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index f8bbc2b..4f433d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -524,10 +524,10 @@ static int mlx5e_get_coalesce(struct net_device *netdev,
if (!MLX5_CAP_GEN(priv->mdev, cq_moderation))
return -ENOTSUPP;
 
-   coal->rx_coalesce_usecs   = priv->params.rx_cq_moderation_usec;
-   coal->rx_max_coalesced_frames = priv->params.rx_cq_moderation_pkts;
-   coal->tx_coalesce_usecs   = priv->params.tx_cq_moderation_usec;
-   coal->tx_max_coalesced_frames = priv->params.tx_cq_moderation_pkts;
+   coal->rx_coalesce_usecs   = priv->params.rx_cq_moderation.usec;
+   coal->rx_max_coalesced_frames = priv->params.rx_cq_moderation.pkts;
+   coal->tx_coalesce_usecs   = priv->params.tx_cq_moderation.usec;
+   coal->tx_max_coalesced_frames = priv->params.tx_cq_moderation.pkts;
 
return 0;
 }
@@ -545,10 +545,11 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
return -ENOTSUPP;
 
mutex_lock(&priv->state_lock);
-   priv->params.tx_cq_moderation_usec = coal->tx_coalesce_usecs;
-   priv->par
